Thesis Type |
|
Student |
Rabina Borici |
Status |
Running |
Proposal on |
03/05/2024 11:15 am |
Proposal room |
Seminar room I5 6202 |
Supervisor(s) |
Stefan Decker |
Advisor(s) |
Mehdi Akbari G. fehermsen |
Contact |
mehdi.akbari.gurabi@fit.fraunhofer.de felix.hermsen@fit.fraunhofer.de |
The master’s thesis project aims to automate the translation of unstructured or semi-structured cybersecurity playbooks into a standardized, machine-readable format (OASIS CACAO) using Large Language Models (LLMs). It highlights the benefits of structured processes, interoperable visualizations, collaborative knowledge sharing, and automated response actions, reducing human intervention. A key research focus is ensuring the accuracy, reliability, and effectiveness of LLM-generated workflows during playbook translation. It includes a concept where security operators use LLMs to convert unstructured text into structured workflows, with syntax checkers and playbook management components ensuring standard compliance and content accuracy. The main research problem of the master’s thesis is scarcity of data in both unstructured and machine-readable formats. It will focus on fine-tuning of LLMs (e.g., GPT-3, 4 or Llama 2) with small quantity of data (with the help of other strategies such as feature extraction) for the CACAO playbook translation and further development of an already existing CACAO syntax checker component for syntax verification.
These are the resources we talked about for initial thinking of the topic:
- Playbook Examples and guidelines: These links will provide simple practical examples of cybersecurity playbooks:
- https://github.com/phantomcyber/playbooks
- https://gitlab.com/syntax-ir/playbooks
- https://github.com/luduslibrum/awesome-playbooks
- OASIS CACAO Specification: This document details the Collaborative Automated Course of Action Operations (CACAO) standard for cybersecurity playbooks: https://docs.oasis-open.org/cacao/security-playbooks/v2.0/security-playbooks-v2.0.html
- CACAO v2.0 syntax validator: https://github.com/opencybersecurityalliance/cacao-roaster/tree/main/src/diagram/modules/features/validator
- Fine-tuning with OpenAI: This resource from OpenAI discusses fine-tuning, a method for effectively utilizing Large Language Models (LLMs) like GPT-3 or GPT-4. Understanding how to fine-tune models that results in better outcomes will be key in automating the playbook translation process: https://platform.openai.com/docs/guides/fine-tuning/when-to-use-fine-tuning
- Some relevant articles:
- https://arxiv.org/pdf/2010.07835.pdf
- https://medium.com/@bijit211987/advanced-techniques-for-fine-tuning-llms-46f849c6ece8
- https://pradeepundefned.medium.com/fine-tuning-a-pre-trained-llm-with-unlabelled-dataset-73aa5082a5ef
- https://lightning.ai/pages/community/tutorial/optimizing-llms-from-a-dataset-perspective/
- https://barryzhang.substack.com/p/our-humble-attempt-at-fine-tuning
Basic knowledge in the domains of cyber security, Natural Language Processing (Specifically, Generative AI).