{"id":7147,"date":"2026-04-29T17:00:24","date_gmt":"2026-04-29T15:00:24","guid":{"rendered":"https:\/\/dbis.rwth-aachen.de\/dbis\/?p=7147"},"modified":"2026-04-29T17:05:01","modified_gmt":"2026-04-29T15:05:01","slug":"traceability-framework-for-human-llm-assisted-tabular-data-transformations","status":"publish","type":"post","link":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/2026\/traceability-framework-for-human-llm-assisted-tabular-data-transformations\/","title":{"rendered":"Traceability Framework for Human\u2013LLM-Assisted Tabular Data Transformations"},"content":{"rendered":"\n<p>Large Language Models (LLMs) are increasingly used to support data wrangling, but their integration into interactive transformation workflows raises new challenges for auditability, reproducibility, and accountability. When users approve, reject, or refine LLM-generated suggestions, conventional data lineage systems often fail to capture why a change occurred, who was responsible for it, and which transformation produced the final dataset.<\/p>\n\n\n\n<p>This thesis investigates a compact traceability framework for human\u2013LLM-assisted transformations of uploaded tabular files. The target setting comprises a single structured tabular data file (e.g., CSV), column-level transformation workflows, and practical reproducibility. The framework tracks file versions, table versions, selected columns, LLM suggestions, human decisions, approved transformation specifications, generated code references, execution events, and resulting output versions, with the goal of enabling reconstruction and rollback without storing the full conversation.<\/p>\n\n\n\n<p><strong>This thesis is co-supervised by <a href=\"https:\/\/dbgroup.cs.tsinghua.edu.cn\/jnwang\/\">Prof. 
Jiannan Wang<\/a> (<a href=\"mailto:jnwang@tsinghua.edu.cn\">jnwang@tsinghua.edu.cn<\/a>) from the <a href=\"https:\/\/www.cs.tsinghua.edu.cn\/\" target=\"_blank\" rel=\"noreferrer noopener\">Department of Computer Science and Technology<\/a> at <a href=\"https:\/\/www.tsinghua.edu.cn\/\">Tsinghua University<\/a>, who serves as the second supervisor.<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"264\" src=\"https:\/\/dbis.rwth-aachen.de\/dbis\/wp-content\/uploads\/2026\/04\/image-1-1024x264.png\" alt=\"\" class=\"wp-image-7152\" srcset=\"https:\/\/dbis.rwth-aachen.de\/dbis\/wp-content\/uploads\/2026\/04\/image-1-1024x264.png 1024w, https:\/\/dbis.rwth-aachen.de\/dbis\/wp-content\/uploads\/2026\/04\/image-1-300x77.png 300w, https:\/\/dbis.rwth-aachen.de\/dbis\/wp-content\/uploads\/2026\/04\/image-1-768x198.png 768w, https:\/\/dbis.rwth-aachen.de\/dbis\/wp-content\/uploads\/2026\/04\/image-1.png 1431w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Large Language Models (LLMs) are increasingly used to support data wrangling, but their integration into interactive transformation workflows raises new challenges for auditability, reproducibility, and accountability. 
When users approve, reject, or refine LLM-generated suggestions, conventional data lineage systems often fail to capture why a change occurred, who was responsible for it, and which transformation produced [&hellip;]<\/p>\n","protected":false},"author":145,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[21],"tags":[89,46],"class_list":["post-7147","post","type-post","status-publish","format-standard","hentry","category-thesis","tag-data-lineage","tag-llm"],"acf":[],"_links":{"self":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/users\/145"}],"replies":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/comments?post=7147"}],"version-history":[{"count":4,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7147\/revisions"}],"predecessor-version":[{"id":7153,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7147\/revisions\/7153"}],"wp:attachment":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/media?parent=7147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/categories?post=7147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/tags?post=7147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}