Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models

Jungkyoo Shin, Jieun Han, Seungjun Kim, Yoonseon Oh, Eunwoo Kim

Abstract

In the field of robot manipulation, learnable task planners are gaining attention, especially for long-horizon tasks such as cooking. However, existing methods that predominantly rely on symbolic representations generalize poorly, particularly when handling unseen objects. Because objects vary in real-world environments, this limitation may constrain their practical applicability. To address this issue, we propose a novel task-planning framework that leverages a pretrained large language model (LLM) for environmental interpretation. Our proposed framework extracts semantic features directly from textual data, enabling the planner to accommodate unfamiliar objects. We further incorporate a transformer-based encoder-decoder framework to understand environmental attributes derived from the language model and generate sequential predictions in line with object-oriented subgoals. To validate the effectiveness of our model, we utilize a dataset focused on cooking recipes. Going a step further, we propose a method that automatically generates object-oriented data from natural language descriptions using a recurrent LLM, enabling the framework to manage previously unseen targets as well. Our framework shows an average success rate of 95% when validated with test sets that involve unseen objects. By providing the automatically generated dataset to the framework, we achieve a significant 27% increase in success rate on unknown target recipes. We also provide evidence of the real-world viability of our planner by successfully deploying it on a robot platform.

Index terms

Task Planning; Software Architecture for Robotic and Automation