Convolutional Embedded Networks for Population Scale Clustering and Bio-ancestry Inferencing

Year 2020

The study of genetic variants (GVs) can help find correlating population groups, identify cohorts that are predisposed to common diseases, and explain differences in disease susceptibility and in how patients react to drugs. Machine learning (ML) algorithms are increasingly being applied to identify interacting GVs and understand the complex phenotypic traits associated with them. In this paper, we propose convolutional embedded networks (CEN), which combine two deep neural network (DNN) architectures: convolutional embedded clustering (CEC) for clustering individuals, and a convolutional autoencoder (CAE) classifier for predicting geographic ethnicity based on GVs. We employed CAE-based representation learning on 95 million GVs from the 1000 Genomes Project and the Simons Genome Diversity Project. Quantitative and qualitative analyses with a focus on accuracy and scalability show that our approach outperforms state-of-the-art approaches such as VariantSpark and ADMIXTURE. In particular, CEC can cluster targeted population groups in 22 hours with an adjusted Rand index (ARI) of 0.915, a normalized mutual information (NMI) of 0.92, and a clustering accuracy (ACC) of 89%. The CAE classifier, in turn, can predict the geographic ethnicity of unknown samples with an F1 score of 0.9004 and a Matthews correlation coefficient (MCC) of 0.8245. To provide interpretations of the predictions, we identify significant biomarkers using gradient boosted trees and SHAP.
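For readers unfamiliar with two of the evaluation metrics named in the abstract, the following is a minimal sketch of how ARI and MCC are commonly defined. It is not the authors' evaluation code: the function names are hypothetical, the labels in the usage note are toy values, and the MCC shown is the binary form, whereas a multi-population classifier would presumably use the multi-class generalization.

```python
from collections import Counter
from math import comb, sqrt


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Contingency counts: how many samples share each (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def matthews_corrcoef_binary(y_true, y_pred):
    """Binary MCC from confusion counts (assumption: labels are 0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

ARI ignores cluster naming, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` scores a relabeled but otherwise identical clustering as perfect agreement. MCC uses all four confusion counts, which makes it less sensitive to class imbalance than accuracy.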

Details

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Published in

IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 1, p. 1-1.