Seminar Large Language Models – UCD-driven Metrics and Benchmarks

May 29th, 2024

The development of a user-centered quality metric for the outputs of large language models (LLMs) in corporate contexts addresses a key question: How can the quality and relevance of AI-powered systems be evaluated and improved effectively, so that they best meet the specific requirements of companies and their employees? The motivation for this research concept is the need for a systematic, quantifiable method of assessing user satisfaction with LLM outputs and the usefulness of those outputs.

Type: Seminar
Term: WS 2024
Mentor(s): René Reiners
Assistant(s): Milad Morad

The metric addresses a central problem: there is currently no effective way to measure the quality of AI-powered systems, and user satisfaction with them, from the perspective of actual end users. Many existing evaluation methods focus on technical performance indicators or are not tailored to the particular needs and expectations of specific corporate domains. The proposed metric fills this gap with a user-oriented, comprehensive evaluation framework that considers both the subjective satisfaction of users and the objective performance of the systems, thereby enabling a holistic assessment of AI quality.

Objectives of the quality metric as a control instrument:

– Optimization of AI systems

– Improvement of employee satisfaction

– Support for an adaptive learning culture within the company

– Promotion of informed decision-making