{"id":6925,"date":"2026-02-19T20:24:58","date_gmt":"2026-02-19T19:24:58","guid":{"rendered":"https:\/\/dbis.rwth-aachen.de\/dbis\/?p=6925"},"modified":"2026-02-20T17:50:23","modified_gmt":"2026-02-20T16:50:23","slug":"deep-table-structure-integration-for-llm-based-semantic-table-understanding","status":"publish","type":"post","link":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/2026\/deep-table-structure-integration-for-llm-based-semantic-table-understanding\/","title":{"rendered":"Deep Table-Structure Integration for LLM-based Semantic Table Understanding"},"content":{"rendered":"\n<p>This thesis investigates how Large Language Models (LLMs) can be equipped with a&nbsp;<strong>deeper, architecture-level understanding of tabular data<\/strong>, going beyond \u201ctables-as-serialized-text\u201d toward&nbsp;<strong>tables-as-structured objects<\/strong>&nbsp;that expose row\/column topology, header semantics, cell neighborhoods, and inter-cell dependencies to the model in a principled way [1,2,8]. 
The target setting is&nbsp;<strong>Semantic Table Interpretation (STI)<\/strong>&nbsp;as studied in the&nbsp;<strong>SemTab challenge<\/strong>, focusing on three standard downstream tasks:&nbsp;<strong>Cell Entity Annotation (CEA)<\/strong>,&nbsp;<strong>Column Type Annotation (CTA)<\/strong>, and&nbsp;<strong>Column Property Annotation (CPA)<\/strong>&nbsp;[3,4].<\/p>\n\n\n\n<p>The work will be developed and evaluated primarily using the&nbsp;<strong>MammoTab 25<\/strong>&nbsp;benchmark (Wikipedia-scale tables annotated against&nbsp;<strong>Wikidata<\/strong>) and SemTab-style evaluation protocols [6,7,14].<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This thesis investigates how Large Language Models (LLMs) can be equipped with a&nbsp;deeper, architecture-level understanding of tabular data, going beyond \u201ctables-as-serialized-text\u201d toward&nbsp;tables-as-structured objects&nbsp;that expose row\/column topology, header semantics, cell neighborhoods, and inter-cell dependencies to the model in a principled way [1,2,8]. 
The target setting is&nbsp;Semantic Table Interpretation (STI)&nbsp;as studied in the&nbsp;SemTab challenge, focusing on three [&hellip;]<\/p>\n","protected":false},"author":145,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[21],"tags":[46,84],"class_list":["post-6925","post","type-post","status-publish","format-standard","hentry","category-thesis","tag-llm","tag-tabular-data"],"acf":[],"_links":{"self":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/6925","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/users\/145"}],"replies":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/comments?post=6925"}],"version-history":[{"count":3,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/6925\/revisions"}],"predecessor-version":[{"id":6928,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/6925\/revisions\/6928"}],"wp:attachment":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/media?parent=6925"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/categories?post=6925"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/tags?post=6925"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}