Dynamic Co-Speech Gesture Generation for Tutoring Agents

November 19th, 2024

Large Language Models (LLMs) can be applied to transform a natural language (NL) text query into an NL text answer. A common use case is personal assistants, e.g., for learning activities. In such teaching contexts, they can process knowledge recorded in plain text documents, create summaries, or teach knowledge according to a curriculum. However, interfaces for LLMs are currently typically text-based chats. These can be enhanced by body language, e.g., gestures that support the conveyed content. With the help of desktop-based virtual agents, the chat interface can be turned into a video call in which the LLM is personified by an agent that responds with gestures in addition to the output text.

Thesis Type	Master
Student	Sebastian Meinberger
Status	Running
Presentation room	Seminar room I5 6202
Supervisor(s)	Stefan Decker
Advisor(s)	Benedikt Hensen
Contact	hensen@dbis.rwth-aachen.de

This thesis explores the use of an LLM to teach knowledge recorded in plain NL text. The focus lies on enhancing the LLM-to-user interaction by creating multimodal visualizations of the answers. The user-to-LLM interaction consists of regular NL text queries entered via keyboard. As the interface, an established chat program should be used to provide the LLM as a chatbot. This existing interface is then extended with additional visualizations in the form of a virtual agent. In a comparative study, the thesis can investigate different visualization types and gather meaningful data about their impact. These visualizations can include a virtual face and text-to-speech output. The face can synchronize its lip movements with the speech and show suitable expressions. Another visualization type is a full 3D agent capable of performing gestures and using virtual objects for demonstration purposes. By comparing these visualizations to a traditional text chat, insights can be gained about the strengths and weaknesses of such visual personifications of the LLM.
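To make the idea of attaching gestures to an LLM answer more concrete, the following is a minimal sketch under stated assumptions, not the method developed in this thesis. It uses a simple rule-based keyword lookup as a stand-in for dynamic gesture generation: the answer text is split into sentences, and words found in a hypothetical lexicon (`GESTURE_LEXICON`, with placeholder gesture names such as `count_one`) are turned into gesture events. The names `GestureEvent` and `annotate_gestures` are likewise hypothetical. A 3D agent frontend could then align these word positions with text-to-speech timing.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword-to-gesture lexicon; the gesture names are placeholders
# for animation clips a virtual agent might provide.
GESTURE_LEXICON = {
    "first": "count_one",
    "second": "count_two",
    "large": "wide_arms",
    "small": "pinch",
    "here": "point_down",
    "you": "point_forward",
}


@dataclass
class GestureEvent:
    sentence_index: int  # which sentence of the answer the gesture belongs to
    word_index: int      # position of the triggering word within that sentence
    word: str            # the triggering word as it appears in the text
    gesture: str         # gesture identifier from the lexicon


def split_sentences(text: str) -> list[str]:
    # Naive split after sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def annotate_gestures(answer: str) -> list[GestureEvent]:
    # Scan each sentence word by word and emit an event for every lexicon hit.
    events = []
    for s_idx, sentence in enumerate(split_sentences(answer)):
        words = re.findall(r"[A-Za-z']+", sentence)
        for w_idx, word in enumerate(words):
            gesture = GESTURE_LEXICON.get(word.lower())
            if gesture is not None:
                events.append(GestureEvent(s_idx, w_idx, word, gesture))
    return events


if __name__ == "__main__":
    # Stand-in for an LLM answer; in the envisioned setup this text would
    # come from the chatbot and also be passed to text-to-speech.
    answer = "First, read the text. Then you summarize it."
    for event in annotate_gestures(answer):
        print(event)
```

A fixed lexicon only triggers gestures on exact keyword matches and ignores context; it serves as a simple baseline against which more dynamic generation approaches could be compared.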

Prerequisites:

Must: Knowledge of LLMs

Beneficial: Experience with the Unity 3D engine, C# and Python
