{"id":7142,"date":"2026-04-23T14:43:32","date_gmt":"2026-04-23T12:43:32","guid":{"rendered":"https:\/\/dbis.rwth-aachen.de\/dbis\/?p=7142"},"modified":"2026-04-23T14:43:34","modified_gmt":"2026-04-23T12:43:34","slug":"training-a-tiny-llm-with-block-attention-residuals-on-commonsenseqa","status":"publish","type":"post","link":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/2026\/training-a-tiny-llm-with-block-attention-residuals-on-commonsenseqa\/","title":{"rendered":"Training a Tiny LLM with Block Attention Residuals on CommonsenseQA"},"content":{"rendered":"\n<p>Knowledge-augmented multiple-choice question answering (MCQA) aims to improve robustness and factual grounding by integrating external structured knowledge (e.g., knowledge graphs) into language-model-based decision making. Current high-performing systems typically retrieve a local subgraph relevant to a question and candidate answers, then combine pretrained language representations with explicit graph reasoning modules.<\/p>\n\n\n\n<p>This thesis investigates an alternative representation path: instead of processing retrieved knowledge graph (KG) subgraphs as symbolic triples with graph neural networks, the subgraphs are deterministically rendered into a compact 2D \u201cvisual graph\u201d representation and encoded with a vision backbone. The resulting visual KG evidence is fused with an encoder-only language model via attention-based cross-modal interaction. The core research question is whether a visually encoded KG can preserve decision-relevant relational structure and support competitive knowledge-augmented MCQA performance on CommonsenseQA and OpenBookQA (optionally extending to MedQA-USMLE).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Knowledge-augmented multiple-choice question answering (MCQA) aims to improve robustness and factual grounding by integrating external structured knowledge (e.g., knowledge graphs) into language-model-based decision making. 
Current high-performing systems typically retrieve a local subgraph relevant to a question and candidate answers, then combine pretrained language representations with explicit graph reasoning modules. This thesis investigates an alternative representation [&hellip;]<\/p>\n","protected":false},"author":145,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[21],"tags":[87,46,82],"class_list":["post-7142","post","type-post","status-publish","format-standard","hentry","category-thesis","tag-commonsense-qa","tag-llm","tag-vlm"],"acf":[],"_links":{"self":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7142","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/users\/145"}],"replies":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/comments?post=7142"}],"version-history":[{"count":1,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7142\/revisions"}],"predecessor-version":[{"id":7143,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/7142\/revisions\/7143"}],"wp:attachment":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/media?parent=7142"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/categories?post=7142"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/tags?post=7142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}