{"id":6802,"date":"2025-12-23T11:47:13","date_gmt":"2025-12-23T10:47:13","guid":{"rendered":"https:\/\/dbis.rwth-aachen.de\/dbis\/?p=6802"},"modified":"2025-12-23T11:47:15","modified_gmt":"2025-12-23T10:47:15","slug":"spacedrag-spacing-aware-knowledge-corruption-against-clustering-based-detection-in-rag-systems","status":"publish","type":"post","link":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/2025\/spacedrag-spacing-aware-knowledge-corruption-against-clustering-based-detection-in-rag-systems\/","title":{"rendered":"SpacedRAG: Spacing-Aware Knowledge Corruption Against Clustering-Based Detection in RAG Systems"},"content":{"rendered":"\n<p>Student: Tim Vogelbacher<\/p>\n\n\n\n<p>Abstract:<\/p>\n\n\n\n<p>Retrieval-Augmented Generation (RAG) systems reduce hallucinations and enhance the relevance of the output of large language models (LLMs) by incorporating external knowledge sources. However, this architectural advantage introduces new security risks, including susceptibility to knowledge corruption attacks, in which an attacker crafts malicious documents that are injected into the knowledge base to manipulate an LLM\u2019s output. Prior work, such as PoisonedRAG, exploits this vulnerability but is mitigated by defenses like TrustRAG, which clusters the embeddings of the texts inside the knowledge base to identify and remove unusually dense document groups. In this thesis, we present SpacedRAG, an attack that circumvents clustering-based and ROUGE-L-based defenses by adapting the crafting of malicious documents with a new spacing condition. Unlike PoisonedRAG, which generates documents that are as similar to the query as possible and are consequently also highly similar to each other, SpacedRAG generates malicious documents that are intentionally dissimilar from each other while still satisfying the conditions for retrieval and resulting in the generation of malicious answers. 
We formulate the attack as an optimization problem and evaluate SpacedRAG under different levels of knowledge that the attacker has about the RAG system. The results show that up to 84% of the adversarial texts created with SpacedRAG bypass TrustRAG\u2019s defenses and lead to a 70% attack success rate when injected into knowledge bases containing millions of texts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Student: Tim Vogelbacher Abstract: Retrieval-Augmented Generation (RAG) systems reduce hallucinations and enhance the relevance of the output of large language models (LLMs) by incorporating external knowledge sources. However, this architectural advantage introduces new security risks, including the susceptibility to knowledge corruption attacks, where an attacker crafts malicious documents that are injected into the knowledge base to manipulate an LLM\u2019s output. Prior work, such as PoisonedRAG, exploits this vulnerability but is mitigated by [&hellip;]<\/p>\n","protected":false},"author":52,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[43],"class_list":["post-6802","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-thesis"],"acf":[],"_links":{"self":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/6802","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/users\/52"}],"replies":[{"embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/comments?post=6802"}],"version-history":[{"count":1,"href":"https:\/\/dbis.rwth-aachen.de\/dbis
\/index.php\/wp-json\/wp\/v2\/posts\/6802\/revisions"}],"predecessor-version":[{"id":6803,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/posts\/6802\/revisions\/6803"}],"wp:attachment":[{"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/media?parent=6802"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/categories?post=6802"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbis.rwth-aachen.de\/dbis\/index.php\/wp-json\/wp\/v2\/tags?post=6802"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}