Enhancing Mixed Reality Instructional Agents with Large Language Models

May 5th, 2025

The integration of Mixed Reality (MR) and Large Language Models (LLMs) can yield highly interactive instructional MR agents. Used as automated instructors, these agents could significantly improve on traditional instruction manuals by providing visual guidance. For instance, they can show the next required action in a practical task, such as which screw to tighten during machine maintenance. LLMs can further increase the agents' interactivity by letting users hold a dialogue with them, asking questions and receiving responses in real time. A key challenge here is giving the LLM an understanding of the spatial layout so that it can refer to elements in the MR space.

Thesis Type	Bachelor
Student	Tomoaki Fujiwara
Status	Running
Presentation room	Seminar room I5 6202
Supervisor(s)	Stefan Decker
Advisor(s)	Benedikt Hensen
Contact	hensen@dbis.rwth-aachen.de

The goal of this thesis is to answer the question of how LLMs can be combined with MR agents to provide interactive, context-aware instructional support. Realizing this concept involves investigating and developing an instructional agent with our open-source Virtual Agents Framework and combining it with LLM technology. To overcome the limited spatial awareness of LLMs, a spatial description module should be devised that collects points of interest in the MR environment. These elements should then be converted into textual descriptions in which coordinates are expressed as natural-language directions, which can be added to the LLM prompt as additional input. The LLM can then dynamically reference elements in the MR space and guide users with instructions such as “turn to the left for the next step”. The effectiveness and utility of the developed solution should be evaluated in a user study, which should quantify the impact of the MR-LLM instructional agent on user experience and learning outcomes, e.g., by measuring the retention rate and by rating the clarity of the spatial instructions.
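To make the idea of a spatial description module more concrete, the following is a minimal sketch of how point-of-interest coordinates might be turned into natural-language directions for the prompt. It is not part of the Virtual Agents Framework; the names SpatialDescriber, PointOfInterest, UserPose, describe and buildPromptContext are hypothetical. It assumes a Unity-style frame (x right, y up, z forward), a user yaw measured in degrees clockwise from +z when seen from above, and arbitrary thresholds (30°/150° sectors, 0.5 m height difference). A real implementation inside Unity would likely be written in C#, but the logic carries over.

```java
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

/**
 * Hypothetical spatial description module (illustrative sketch only).
 * Assumes a Unity-style frame: x = right, y = up, z = forward, and a yaw
 * angle in degrees measured clockwise from +z when viewed from above.
 */
final class SpatialDescriber {

    /** A named element in the MR scene, e.g. a screw or a panel. */
    record PointOfInterest(String name, double x, double y, double z) {}

    /** The user's head position and horizontal viewing direction (yaw). */
    record UserPose(double x, double y, double z, double yawDegrees) {}

    /** Describes where a single point of interest lies relative to the user. */
    static String describe(PointOfInterest poi, UserPose user) {
        double dx = poi.x() - user.x();
        double dy = poi.y() - user.y();
        double dz = poi.z() - user.z();

        // Horizontal forward and right unit vectors derived from the yaw angle.
        double yaw = Math.toRadians(user.yawDegrees());
        double forwardX = Math.sin(yaw), forwardZ = Math.cos(yaw);
        double rightX = Math.cos(yaw), rightZ = -Math.sin(yaw);

        // Project the offset onto the user's local horizontal axes.
        double ahead = dx * forwardX + dz * forwardZ;
        double lateral = dx * rightX + dz * rightZ;

        // Signed angle: 0 = straight ahead, positive = to the right.
        double angle = Math.toDegrees(Math.atan2(lateral, ahead));
        String direction;
        if (Math.abs(angle) <= 30) {
            direction = "ahead of you";
        } else if (angle > 0 && angle <= 150) {
            direction = "to your right";
        } else if (angle < 0 && angle >= -150) {
            direction = "to your left";
        } else {
            direction = "behind you";
        }

        // Mention height only when it differs noticeably (assumed 0.5 m threshold).
        String height = "";
        if (dy > 0.5) {
            height = ", above eye level";
        } else if (dy < -0.5) {
            height = ", below eye level";
        }

        // Horizontal distance, rounded to one decimal for readability.
        double distance = Math.hypot(ahead, lateral);
        return String.format(Locale.ROOT, "%s: about %.1f m %s%s",
                poi.name(), distance, direction, height);
    }

    /** Builds a text block that could be added to the LLM prompt as context. */
    static String buildPromptContext(List<PointOfInterest> pois, UserPose user) {
        StringJoiner lines = new StringJoiner("\n",
                "Points of interest relative to the user:\n", "");
        for (PointOfInterest poi : pois) {
            lines.add("- " + describe(poi, user));
        }
        return lines.toString();
    }
}
```

For a user at the origin facing +z, a point of interest named "Screw A" at (0, 0, 2) would be described as "Screw A: about 2.0 m ahead of you". The block returned by buildPromptContext could then be combined with the user's question before it is sent to the LLM, so that the model can answer with directions such as "turn to the left".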

Prerequisites:

Required knowledge: C# or Java
Beneficial experience: Unity, Mixed Reality, usage of LLMs
