Thesis Type: Bachelor
Student: Pouya Shekarchizadeh Esfahani
Status: Running
Proposal on: 10/09/2024 1:00 pm
Proposal room: Seminar room I5 6202
Supervisor(s): Stefan Decker
Advisor(s): Mehdi Akbari G.
Contact: mehdi.akbari.gurabi@fit.fraunhofer.de
This bachelor thesis proposes an approach to assist experts in creating comprehensive, machine-readable incident response playbooks. It combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to streamline playbook creation from user input. The system first gathers several types of information from the user, such as asset insights, security environment details, threat descriptions, and Common Platform Enumeration (CPE) tags, and also incorporates security advisories in the standardized Common Security Advisory Framework (CSAF) format to improve the quality of the output. The knowledge base will contain semi-structured playbooks in formats such as JSON, YAML, and BPMN as reference samples. The system will be interactive, prompting users for additional information to tailor playbooks to their specific needs.
The research questions guiding this thesis address the effectiveness and real-world applicability of the generated playbooks, possible enhancements to the RAG approach that reduce inaccuracies, and a performance comparison between different language models. A naïve RAG approach will be implemented as a baseline, followed by improvements aimed at increasing playbook quality and relevance. Ultimately, the approach seeks to advance the automation of playbook generation and reduce the time and effort required to create security playbooks.
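To make the naïve RAG baseline more concrete, the following Python sketch shows one possible shape of the pipeline: check which user inputs are still missing (the interactive step), parse a CPE 2.3 formatted string, pull remediation entries from a CSAF 2.0 advisory, retrieve the most similar reference playbooks, and assemble a prompt. It is a minimal illustration under stated assumptions, not the thesis implementation: all function names, the field names in `REQUIRED_FIELDS`, the toy corpus, and the prompt wording are hypothetical, and bag-of-words cosine similarity stands in for the embedding-based retrieval a real RAG system would typically use. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical input fields mirroring the information types listed above.
REQUIRED_FIELDS = ("asset", "environment", "threat", "cpe")


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a stand-in for an embedding model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def parse_cpe(cpe: str) -> dict[str, str]:
    """Extract part/vendor/product/version from a CPE 2.3 formatted string.

    Simplification: escaped colons (backslash-colon) are not handled.
    """
    fields = cpe.split(":")
    if len(fields) < 6 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(("part", "vendor", "product", "version"), fields[2:6]))


def csaf_remediations(advisory: dict) -> list[str]:
    """List 'CVE: category: details' lines from a CSAF 2.0 advisory dict."""
    out = []
    for vuln in advisory.get("vulnerabilities", []):
        for rem in vuln.get("remediations", []):
            out.append(f"{vuln.get('cve', 'n/a')}: {rem.get('category')}: {rem.get('details')}")
    return out


def missing_fields(user_input: dict[str, str]) -> list[str]:
    """Fields the interactive loop still has to ask the user for."""
    return [f for f in REQUIRED_FIELDS if not user_input.get(f, "").strip()]


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Ids of the k reference playbooks most similar to the query (naive RAG)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda pid: cosine(q, Counter(tokenize(corpus[pid]))), reverse=True)
    return ranked[:k]


def build_prompt(user_input: dict[str, str], corpus: dict[str, str], advisory: dict, k: int = 2) -> str:
    """Assemble the LLM prompt from user input, CSAF data, and retrieved samples."""
    cpe = parse_cpe(user_input["cpe"])
    query = " ".join((user_input["threat"], user_input["asset"], cpe["vendor"], cpe["product"]))
    samples = "\n\n".join(f"### {pid}\n{corpus[pid]}" for pid in retrieve(query, corpus, k))
    advice = "\n".join(csaf_remediations(advisory)) or "none"
    return (
        "Write a machine-readable incident response playbook in JSON.\n"
        f"Asset: {user_input['asset']}\nEnvironment: {user_input['environment']}\n"
        f"Threat: {user_input['threat']}\n"
        f"Affected product: {cpe['vendor']} {cpe['product']} {cpe['version']}\n"
        f"Advisory remediations:\n{advice}\n\nReference playbooks:\n{samples}\n"
    )


# Toy data for illustration only; the CVE id is a placeholder.
CORPUS = {
    "phishing.json": '{"name": "Phishing response", "steps": ["quarantine email", "reset credentials"]}',
    "ransomware.yml": "name: Ransomware response\nsteps:\n  - isolate host\n  - restore backups",
    "ddos.bpmn": "<bpmn:process name='DDoS mitigation'/>",
}
USER_INPUT = {
    "asset": "file server",
    "environment": "Windows domain",
    "threat": "ransomware encrypting shares",
    "cpe": "cpe:2.3:o:microsoft:windows_server_2019:-:*:*:*:*:*:*:*",
}
ADVISORY = {"vulnerabilities": [{"cve": "CVE-XXXX-XXXX",
                                 "remediations": [{"category": "vendor_fix", "details": "Apply the update."}]}]}

if __name__ == "__main__":
    print(missing_fields(USER_INPUT))
    print(build_prompt(USER_INPUT, CORPUS, ADVISORY, k=1))
```

Because `tokenize` and `cosine` are isolated, the baseline's retrieval could later be swapped for an embedding model and vector store without changing `build_prompt`, which is one way the "subsequent improvements" over the naïve baseline could be compared.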
Seed papers on LLM-based workflow generation:
- Process Modeling With Large Language Models: https://arxiv.org/pdf/2403.07541.pdf
- A Method for Extracting BPMN Models from Textual Descriptions Using Natural Language Processing: https://zir.nsk.hr/islandora/object/unipu:8207/datastream/PDF/download
Survey paper on RAG approaches:
- https://arxiv.org/pdf/2312.10997
Sample RAG tutorials and background information:
- https://research.aimultiple.com/retrieval-augmented-generation/
- https://haystack.deepset.ai/tutorials/07_rag_generator
Example playbook repositories that can serve as input for our RAG database:
- https://github.com/luduslibrum/awesome-playbooks (probably the best one, with 1,300+ playbooks)
- https://github.com/phantomcyber/playbooks
- https://gitlab.com/syntax-ir/playbooks
- https://publica-rest.fraunhofer.de/server/api/core/bitstreams/76b8ef20-de93-45cb-b8dc-17de7a8ad354/content
- https://www.cisa.gov/sites/default/files/publications/Federal_Government_Cybersecurity_Incident_and_Vulnerability_Response_Playbooks_508C.pdf
- https://github.com/guardsight/gsvsoc_cirt-playbook-battle-cards
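Before the repositories listed above can serve as a knowledge base, their files have to be collected and split into retrievable chunks. The sketch below assumes a local checkout of such a repository and treats files whose suffix is in the hypothetical `PLAYBOOK_SUFFIXES` set as playbooks; the line-window chunking with overlap and its default sizes are illustrative choices, not values taken from this proposal. PDF sources such as the CISA playbooks would need a separate text-extraction step, which is not shown.

```python
import tempfile
from pathlib import Path

# Assumed suffixes, following the formats named in the proposal (JSON, YAML, BPMN).
PLAYBOOK_SUFFIXES = {".json", ".yml", ".yaml", ".bpmn"}


def chunk_lines(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` lines; consecutive windows share `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    lines = text.splitlines()
    chunks: list[str] = []
    start = 0
    while start < len(lines):
        chunks.append("\n".join(lines[start:start + size]))
        if start + size >= len(lines):
            break
        start += size - overlap
    return chunks


def load_playbooks(root: Path, size: int = 40, overlap: int = 10) -> list[dict]:
    """Walk `root` recursively and return one record per chunk, keeping the source path."""
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in PLAYBOOK_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            for i, chunk in enumerate(chunk_lines(text, size, overlap)):
                records.append({"source": path.relative_to(root).as_posix(), "chunk": i, "text": chunk})
    return records


if __name__ == "__main__":
    # Tiny throwaway directory standing in for a cloned playbook repository.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "phishing.json").write_text('{"name": "Phishing response"}', encoding="utf-8")
        (root / "notes.txt").write_text("not a playbook", encoding="utf-8")
        for record in load_playbooks(root):
            print(record["source"], record["chunk"])
```

Keeping the `source` path on every chunk lets the generation step report which reference playbook a retrieved passage came from, which may help when assessing the real-world applicability of generated playbooks.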
Prerequisites: knowledge of cybersecurity and Natural Language Processing (specifically, state-of-the-art Generative AI and Retrieval-Augmented Generation methods).