Conveying information on a tour traditionally requires either a human guide or an audio recording. However, a human guide might not always be available, and the quality of the tours varies. With audio recordings, extra care has to be taken to convey to the user which element is currently being discussed, since there are no visual cues. With recent advancements in mobile augmented reality (AR), another option becomes viable: mixed reality (MR) agents can point out landmarks, give directions, and provide guidance based on location and surroundings, making the user’s experience immersive and interactive. MR agents are virtual, human-like entities that interact with users within augmented or virtual environments, blending digital content with the real world. Large Language Models (LLMs) can add conversational intelligence to the experience. They understand natural language inputs, answer questions, and provide detailed, personalized information based on user preferences. Together, MR agents and LLMs could create a system in which the agent visually guides users through real-world environments while the LLM delivers rich, context-specific information, responds to queries, and adapts content dynamically to the user. In the context of a tour guide, this integration enables users to explore cities or other locations with the MR agent directing them and visually pointing out features, while the LLM provides detailed explanations, answers, and personalized insights.
Thesis Type |
Student | David Terhürne
Status | Running
Proposal on | 11/12/2024 2:00 pm
Proposal room | Seminar room I5 6202
Presentation room | Seminar room I5 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Benedikt Hensen
Contact | hensen@dbis.rwth-aachen.de
The goal of this thesis is to investigate how MR agents can be combined with LLMs to enrich the experience of guided tours and to improve the retention of information. These tour guides should be easily accessible on smartphones and directed at tourists or new residents who want to learn more about their city. The technical challenges of this thesis include designing a method for efficiently integrating AR with LLMs in one application. For personalization, conversation, and evaluation purposes, the application should be able to ask the user questions. Regarding the LLM, it must be ensured that the information gathered is reliable and relevant. Another technical aspect is localizing the user’s and the agent’s positions in the real world. The developed system should be evaluated with users to assess the effectiveness of the combination of AR and LLMs with regard to user interactions, engagement, and learning impact. Another aspect of the evaluation can be the usability of the interface and how well the application performs on a variety of phones. The interactions between the user and the LLM should be collected and evaluated based on the correctness, relevance, and clarity of the information provided.
Must: Experience with the Unity 3D engine and C#
Beneficial: Knowledge of Mixed Reality, Augmented Reality, and Large Language Models