DBIS

Comparing Synthetically Generated Against Traditionally Anonymized Time Series Data With a Privacy and Utility Framework

October 15th, 2024

This study compares the utility and privacy preservation capabilities of synthetically generated time series data against traditionally anonymized data, focusing on electrocardiogram (ECG) recordings. As data privacy regulations tighten, there’s an increasing need for methods that protect sensitive information while maintaining data utility for research.

Time series data, common across various domains, presents unique privacy challenges due to its temporal dependencies and long-term correlations. This study introduces a modular framework for evaluating synthetic and anonymized time series data, applying it to a large-scale ECG dataset from the PhysioNet Challenge.

A Transformer Time-Series Conditional Generative Adversarial Network (TTS-CGAN) is used to generate synthetic data, while (k,P)-anonymization serves as a benchmark for traditional methods. The resulting datasets are evaluated using visual analysis, privacy metrics like Nearest Neighbor Distance Ratio (NNDR) and Distance to Closest Record (DCR), and utility assessment through a CNN-based classification task.
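To make the two privacy metrics concrete, the following is a minimal sketch of how DCR and NNDR are commonly computed; it is not taken from the study itself. It assumes equal-length, already aligned series and plain Euclidean distance, and the function name `dcr_and_nndr` is hypothetical. The framework described here may use a different distance measure or preprocessing.

```python
import numpy as np


def dcr_and_nndr(synthetic, real):
    """Compute DCR and NNDR for each synthetic record (illustrative sketch).

    synthetic: array of shape (m, T), one series per row.
    real:      array of shape (n, T) with n >= 2.
    Assumption: Euclidean distance between equal-length series.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    real = np.asarray(real, dtype=float)
    if real.shape[0] < 2:
        raise ValueError("NNDR needs at least two real records")

    # Pairwise Euclidean distances, shape (m, n).
    diffs = synthetic[:, None, :] - real[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Two smallest distances for every synthetic record.
    nearest_two = np.sort(dists, axis=1)[:, :2]
    dcr = nearest_two[:, 0]      # distance to the closest real record
    second = nearest_two[:, 1]   # distance to the second closest

    # NNDR = nearest / second nearest. If the second distance is zero,
    # the synthetic record coincides with duplicated real records; this
    # sketch reports 0.0 (the least private value) in that case.
    nndr = np.divide(dcr, second, out=np.zeros_like(dcr), where=second > 0)
    return dcr, nndr


if __name__ == "__main__":
    real = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]
    synthetic = [[0.0, 0.0], [3.0, 0.0]]
    dcr, nndr = dcr_and_nndr(synthetic, real)
    print("DCR:", dcr)
    print("NNDR:", nndr)
```

Under these definitions, a DCR near zero means a synthetic record is almost a copy of some real record, and an NNDR near zero means it sits much closer to one real record than to any other. Larger values of both are usually read as lower re-identification risk.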

Results show that TTS-CGAN-generated synthetic data offers a promising balance between privacy protection and data utility, outperforming traditionally anonymized data in classification tasks (52% vs. 19% accuracy) while maintaining robust privacy guarantees.

The findings suggest that synthetic data generation techniques show significant potential for privacy-preserving data sharing in sensitive domains. At the same time, both synthetic and traditional anonymization methods have their own applications, which highlights the need for continued research in both areas.