Category: ‘Uncategorized’
Evaluation of poisoning attacks on Retrieval-Augmented Generation of Large Language Models
New Bachelor Thesis.
Updates TBD.
Assessing the limitations of Image Anonymization
In this thesis, we investigate the risk of singling out individuals in visual data.
This is closely tied to the potential risk of visual re-identification attacks.
We investigate this risk with a conceptual approach that first detects
bodies in visual data, then anonymizes their faces using a Gaussian
blur, and finally collects visual attributes such as age and gender.
These attributes are then analyzed for their risk of singling out. Furthermore, we
analyze how much the use of anonymization techniques
impacts the classification of visual attributes. We found
that singling out was possible for some people in the dataset,
but not for all. In addition, for some visual attributes the anonymization
process did not yield a significant deviation from the ground truth.
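The singling-out analysis described above can be illustrated with a minimal sketch. It assumes that each person is represented by a set of predicted visual attributes and that a person counts as singled out when nobody else in the dataset shares the same combination. The attribute names, the sample records, and the helper `singled_out` are hypothetical and not taken from the thesis.

```python
from collections import Counter


def singled_out(records, keys):
    """Return the indices of records whose attribute combination is unique.

    Assumption: uniqueness of the combination of `keys` is used as a
    simple proxy for singling out.
    """
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return [i for i, c in enumerate(combos) if counts[c] == 1]


# Hypothetical attribute predictions, for illustration only.
people = [
    {"age": "20-30", "gender": "f", "hat": False},
    {"age": "20-30", "gender": "f", "hat": False},
    {"age": "40-50", "gender": "m", "hat": True},
    {"age": "20-30", "gender": "m", "hat": False},
]

unique = singled_out(people, ["age", "gender", "hat"])
print(unique)  # indices of people with a unique attribute combination
```

In this toy data the first two people share every attribute and are therefore not singled out, which mirrors the finding that singling out succeeds for some individuals but not for all.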
Comparing Synthetically Generated Against Traditionally Anonymized Time Series Data With a Privacy and Utility Framework
This study compares the utility and privacy preservation capabilities of synthetically generated time series data against traditionally anonymized data, focusing on electrocardiogram (ECG) recordings. As data privacy regulations tighten, there’s an increasing need for methods that protect sensitive information while maintaining data utility for research.
Time series data, common across various domains, presents unique privacy challenges due to its temporal dependencies and long-term correlations. This study introduces a modular framework for evaluating synthetic and anonymized time series data, applying it to a large-scale ECG dataset from the PhysioNet Challenge.
A Transformer Time-Series Conditional Generative Adversarial Network (TTS-CGAN) is used to generate synthetic data, while (k,P)-anonymization serves as a benchmark for traditional methods. The resulting datasets are evaluated using visual analysis, privacy metrics like Nearest Neighbor Distance Ratio (NNDR) and Distance to Closest Record (DCR), and utility assessment through a CNN-based classification task.
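As a rough illustration of the two privacy metrics named above, the following sketch computes DCR and NNDR for each synthetic record against a set of real records. It assumes Euclidean distance between equal-length series and at least two real records; the distance measure actually used in the study is not stated here, and the function name `dcr_and_nndr` and the sample values are hypothetical.

```python
import math


def dcr_and_nndr(synthetic, real):
    """Return a (DCR, NNDR) pair for every synthetic record.

    DCR  = distance to the closest real record.
    NNDR = closest distance divided by the second-closest distance.
    Assumption: Euclidean distance; `real` holds at least two records.
    """
    results = []
    for s in synthetic:
        dists = sorted(math.dist(s, r) for r in real)
        d1, d2 = dists[0], dists[1]
        # Convention chosen for this sketch: if both nearest real
        # records coincide with s, report a ratio of 1.0.
        nndr = d1 / d2 if d2 > 0 else 1.0
        results.append((d1, nndr))
    return results


# Toy two-point "series", for illustration only.
real = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
synthetic = [[0.0, 0.0], [3.0, 0.0]]

results = dcr_and_nndr(synthetic, real)
for dcr, nndr in results:
    print(f"DCR={dcr:.2f}  NNDR={nndr:.2f}")
```

A DCR of zero means a synthetic record exactly copies a real one, and an NNDR close to zero means it lies much closer to one real record than to any other; both are usually read as signs of higher privacy risk.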
Results show that TTS-CGAN-generated synthetic data offers a promising balance between privacy protection and data utility, outperforming traditionally anonymized data in classification tasks (52% vs. 19% accuracy) while maintaining robust privacy guarantees.
The findings suggest that while synthetic data generation techniques show significant potential for privacy-preserving data sharing in sensitive domains, both synthetic and traditional anonymization methods have their place and unique applications, highlighting the need for continued research in both areas.