YANKEE: YouTube-ANnotated Knowledge Extraction Engine

November 17th, 2025

The aim of this thesis is to extend an existing system for providing psychomotor feedback in a camera-based learning environment by automating or supporting the rule creation process.

The core objective is to leverage computer vision techniques and large language models (LLMs) to extract motion data from YouTube tutorial videos and automatically infer psychomotor feedback rules, which can be integrated into the existing feedback engine.

This would reduce the need for manual expert input when defining feedback rules and streamline the feedback process for learners across psychomotor skill domains.

Thesis Type	Master
Status	Running
Presentation room	Seminar room I5 6202
Supervisor(s)	Stefan Decker
Advisor(s)	Michal Slupczynski
Contact	slupczynski@dbis.rwth-aachen.de

Background and Related Work:

Recent advances in computer vision allow systems to accurately track and analyze human motion using pose-estimation models such as YOLO Pose and MediaPipe. These models extract skeletal motion data and have been applied successfully in fitness tracking, rehabilitation, and sports analysis. In addition, large language models (LLMs) such as GPT and BERT have demonstrated the ability to interpret text and infer rules from unstructured content.

Prior research has focused on using expert-defined rules to give learners real-time feedback on psychomotor tasks, typically by analyzing their movements. However, defining these rules manually is time-consuming and requires domain expertise. This thesis builds on that work by introducing an automated approach to rule generation that combines computer vision and natural language processing to reduce the reliance on human experts when creating feedback.
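To make the idea of a movement-based feedback rule concrete, the following is a minimal sketch in Python. It assumes that a pose model has already produced named 2D keypoints for one video frame. The AngleRule class, the keypoint names, and the 80 to 100 degree range are hypothetical illustrations and do not describe the rule format of the existing feedback engine.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle at b (in degrees) between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


@dataclass(frozen=True)
class AngleRule:
    """Hypothetical rule: a joint angle must stay within [min_deg, max_deg]."""
    joints: Tuple[str, str, str]  # (start, joint, end) keypoint names
    min_deg: float
    max_deg: float
    message: str  # feedback shown to the learner when the rule is violated

    def check(self, keypoints: Dict[str, Point]) -> Optional[str]:
        """Return the feedback message if the rule is violated, else None."""
        a, b, c = (keypoints[name] for name in self.joints)
        angle = joint_angle(a, b, c)
        if self.min_deg <= angle <= self.max_deg:
            return None
        return self.message


# Example rule with assumed keypoint names and an assumed target range.
elbow_rule = AngleRule(
    joints=("shoulder", "elbow", "wrist"),
    min_deg=80.0,
    max_deg=100.0,
    message="Bend your elbow to roughly 90 degrees.",
)

frame_bent = {"shoulder": (0.0, 1.0), "elbow": (0.0, 0.0), "wrist": (1.0, 0.0)}
frame_straight = {"shoulder": (0.0, 1.0), "elbow": (0.0, 0.0), "wrist": (0.0, -1.0)}

feedback_bent = elbow_rule.check(frame_bent)          # rule satisfied
feedback_straight = elbow_rule.check(frame_straight)  # rule violated
```

In a rule of this shape, the parts an expert currently specifies by hand are the joint triple, the angle range, and the message. Those are the kinds of parameters an automated approach would need to infer from tutorial videos.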

Prerequisites:

Knowledge of computer vision – Familiarity with models such as YOLO Pose, MediaPipe, or other skeletal tracking frameworks is essential.
Understanding of natural language processing (NLP) – Experience with large language models (LLMs) like GPT, BERT, or similar is important for rule inference.
Experience with machine learning frameworks – Proficiency in TensorFlow, PyTorch, or a similar framework would be beneficial for implementing and testing the models.
Background in human-computer interaction (HCI) – A general understanding of how feedback is used in psychomotor learning and skill acquisition is helpful.
