DBIS

Single-Cell Centric Biomedical Foundation Models for Cancer

October 17th, 2024

This thesis aims to develop a single-cell-centric biomedical foundation model that leverages generative pre-trained transformers to improve the analysis of single-cell RNA sequencing (scRNA-seq) data. The model will address key tasks in single-cell biology, including cell-type annotation, perturbation prediction, identification of pathogenic cells, and gene network inference.

This thesis is co-supervised by Sikander Hayat and Rafael Kramann, Department of Medicine II, University Hospital Aachen.

Please send your application to Yongli Mou, M.Sc. (mou@dbis.rwth-aachen.de), with Dr. Sikander Hayat (shayat@ukaachen.de) in CC.

Thesis Type
  • Master
Status
Open
Presentation room
Seminar room I5 6202
Supervisor(s)
Stefan Decker
Advisor(s)
Yongli Mou
Contact
mou@dbis.rwth-aachen.de

Background

The rapid growth of single-cell sequencing technologies has enabled researchers to study cellular diversity in greater detail, which is crucial for understanding disease mechanisms, developmental biology, and therapeutic responses. However, existing models for single-cell data often lack scalability and generalizability. Foundation models, particularly those built on transformer architectures, have proven versatile in domains such as natural language processing and computer vision because they capture task-agnostic knowledge during large-scale pre-training. By analogy, a single-cell foundation model could handle the high dimensionality of single-cell data and provide a unified framework that supports a wide range of biological questions.
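
As a minimal sketch of how a high-dimensional, sparse expression profile can be turned into transformer input, the Python snippet below normalizes one cell's raw counts to a fixed library size (10,000, a common scRNA-seq convention), applies log(1 + x), keeps only expressed genes, and maps each expressed value to a within-cell quantile bin, loosely in the spirit of the value binning described for scGPT [1]. The function names, the bin count, and the binning rule are illustrative assumptions, not the method this thesis will use.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale raw counts to a fixed library size, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


def tokenize_cell(counts, gene_names, n_bins=5):
    """Turn one cell's expression vector into (gene, bin) tokens.

    Hypothetical, simplified scheme: only expressed (non-zero) genes are kept,
    so sequence length depends on how many genes the cell expresses rather
    than on the full gene panel. Expressed values are assigned to quantile
    bins 1..n_bins computed within this cell.
    """
    values = normalize_log1p(counts)
    expressed = [(g, v) for g, v in zip(gene_names, values) if v > 0]
    if not expressed:
        return []
    ranked = sorted(v for _, v in expressed)
    tokens = []
    for gene, v in expressed:
        # Rank = number of expressed values strictly smaller than v (ties share a bin).
        rank = sum(1 for r in ranked if r < v)
        bin_id = 1 + rank * n_bins // len(ranked)  # always in 1..n_bins
        tokens.append((gene, bin_id))
    return tokens
```

For example, a toy cell with counts [0, 2, 10, 5] over the genes CD3E, MS4A1, LYZ, and GAPDH and n_bins=3 yields [("MS4A1", 1), ("LYZ", 3), ("GAPDH", 2)]: the unexpressed gene is dropped, and the remaining genes are ordered into bins by relative expression within the cell.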

Objectives

  • Develop a foundation model and pre-train it on massive single-cell datasets
  • Fine-tune the model for downstream tasks such as cell-type annotation, perturbation prediction, gene network inference, and prediction of pathogenic cells
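
Self-supervised pre-training can be illustrated with a generic masked-value objective, analogous to masked language modeling: a random subset of a cell's expression values is hidden, the model predicts them from the remaining context, and the loss is computed only on the hidden positions. The sketch below assumes a 15% default mask ratio and a sentinel mask value of -1.0 purely for illustration (a real model would more likely use a learned mask embedding), and it does not claim to reproduce the exact objective of [1].

```python
import random


def mask_tokens(values, mask_ratio=0.15, mask_value=-1.0, seed=None):
    """Randomly hide a fraction of expression values for self-supervised training.

    mask_value is a hypothetical sentinel. Returns the corrupted input and the
    sorted list of masked positions; at least one position is masked whenever
    the input is non-empty.
    """
    rng = random.Random(seed)
    n = len(values)
    if n == 0:
        return [], []
    k = max(1, round(mask_ratio * n))
    positions = sorted(rng.sample(range(n), k))
    corrupted = list(values)
    for i in positions:
        corrupted[i] = mask_value
    return corrupted, positions


def masked_mse(predictions, targets, positions):
    """Mean squared error computed only on the masked positions."""
    if not positions:
        return 0.0
    return sum((predictions[i] - targets[i]) ** 2 for i in positions) / len(positions)
```

During fine-tuning, the reconstruction head would typically be replaced or complemented by a task-specific head, for example a classifier over cell types, while the pre-trained encoder weights serve as the starting point.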

Tasks

  • Literature review and analysis of the current state of the art
    • Review the existing single-cell analysis tools and foundation models in biomedicine.
    • Identify gaps in current methodologies and challenges specific to single-cell data.
  • Data collection and preprocessing
  • Model development, pre-training, fine-tuning, and evaluation
    • Design a transformer-based model architecture that incorporates specialized attention mechanisms for single-cell data.
    • Train the model with self-supervised pre-training followed by supervised fine-tuning on specific tasks.
    • Evaluate model performance on cell-type classification, perturbation response prediction, and gene network inference tasks.
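
The specialized attention mechanisms are an open design question for the thesis; as a starting point, the sketch below shows plain scaled dot-product attention, softmax(QK^T / sqrt(d)) V, over a single cell's gene tokens, with a key-padding mask so that padding tokens (needed when batching cells that express different numbers of genes) receive zero attention weight. It is written in dependency-free Python for readability; the function names and the mask convention (True = valid token) are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax; -inf entries get probability 0.

    Assumes at least one score is finite (i.e., at least one valid key).
    """
    m = max(scores)
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, key_mask):
    """Scaled dot-product attention over a single sequence.

    queries, keys, values: lists of vectors (lists of floats), one per token.
    key_mask: list of bools; False marks padding keys that must be ignored.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Padding keys get a score of -inf, hence zero weight after softmax.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) if valid else float("-inf")
            for k, valid in zip(keys, key_mask)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

In PyTorch, the same computation is available as torch.nn.functional.scaled_dot_product_attention, and torch.nn.MultiheadAttention accepts a key_padding_mask argument; note that in the latter, True marks positions to ignore, the opposite of the convention used in this sketch.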

References

  1. Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, Wang B. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods. Published online 2024 Feb 26.

Prerequisites

Knowledge of machine learning, biology, and multi-omics (e.g., genomics, proteomics, transcriptomics, epigenomics)
Programming language – Python
Deep learning frameworks – PyTorch, Transformers