This thesis focuses on developing a Transcriptome-Language Model (TLM) that bridges the modality gap between transcriptomic data and natural-language text. You will explore advanced models for transcriptomic data representation and cross-modal learning techniques that align the transcriptomic and textual modalities. The model will be evaluated on tasks such as zero-shot cell property classification and text generation. This thesis is co-supervised by Sikander Hayat and Rafael Kramann, Department of Medicine II, University Hospital Aachen. Please send your application to Yongli Mou, M.Sc. (mou@dbis.rwth-aachen.de) and CC Dr. Sikander Hayat (shayat@ukaachen.de).
Thesis Type |
Status | Open
Presentation room | Seminar room I5 6202
Supervisor(s) | Stefan Decker
Advisor(s) | Yongli Mou
Contact | mou@dbis.rwth-aachen.de
Background
Understanding transcriptomic data is central to many areas of biomedical research, including disease modeling, therapeutic discovery, and personalized medicine. However, transcriptomic data is high-dimensional and complex, often requiring expert interpretation, which limits its accessibility and its integration with clinical knowledge stored in natural-language form. Existing multimodal models typically focus on aligning image and text data; few are equipped to handle the unique challenges of aligning transcriptomic data with natural language. This thesis will explore the development of a model that not only captures the biological features of transcriptomic data but also aligns these data with textual information for a range of biomedical applications.
Objectives
- Design and develop multimodal models that bridge transcriptomic data and natural language.
- Investigate modality-alignment methods that yield coherent cross-modal representations in a shared embedding space.
- Evaluate the models on tasks such as zero-shot cell property classification and text generation.
Tasks
- Systematic Literature Review
  - Review existing models for transcriptomic data representation, language models, and multimodal learning approaches, particularly for aligning high-dimensional biological data with natural language.
  - Identify specific challenges in modality alignment between transcriptomic and textual data, focusing on issues such as feature sparsity, dimensionality, and domain-specific terminology.
- Data Collection and Preprocessing
  - Gather large transcriptomic datasets from databases such as GEO and CELLxGENE, alongside relevant biomedical literature and annotations.
  - Preprocess the data with techniques such as normalization, feature selection, and transformation to address variability in transcriptomic profiles and enhance cross-modal alignment.
- Model Development
  - Develop a transformer-based architecture that supports multimodal embeddings, incorporating specialized mechanisms (e.g., contrastive and self-supervised learning) to enhance the alignment between transcriptomic and text data.
  - Integrate language models such as BioBERT or GPT variants, designing shared embedding spaces that bridge the transcriptomic and textual representations.
- Model Training and Evaluation
  - Conduct pre-training on large collections of transcriptome-text pairs to establish robust base representations, followed by task-specific fine-tuning.
  - Evaluate the TLM on cross-modal tasks such as zero-shot cell property classification and text generation, using metrics for accuracy, interpretability, and alignment quality.
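The normalization and feature-selection steps in the preprocessing task can be sketched in plain NumPy. This is a minimal illustration, not a prescribed pipeline: the library-size target of 10,000 counts and the variance-based gene cutoff are common single-cell defaults chosen here for the example, and a real pipeline would likely use a dedicated toolkit.

```python
import numpy as np

def preprocess(counts, n_top_genes=2000, target_sum=1e4):
    """Normalize a cells-x-genes count matrix and select variable genes.

    Steps (illustrative defaults):
      1. Library-size normalization: scale each cell to `target_sum` counts.
      2. log1p transform to stabilize variance.
      3. Keep the `n_top_genes` genes with the highest variance.
    """
    counts = np.asarray(counts, dtype=float)
    # 1. per-cell library-size normalization (guard against empty cells)
    lib_size = counts.sum(axis=1, keepdims=True)
    normed = counts / np.clip(lib_size, 1, None) * target_sum
    # 2. log-transform
    logged = np.log1p(normed)
    # 3. variance-based feature selection
    var = logged.var(axis=0)
    keep = np.sort(np.argsort(var)[::-1][: min(n_top_genes, logged.shape[1])])
    return logged[:, keep], keep

# toy example: 3 cells x 5 genes, keep the 3 most variable genes
X = np.array([[10, 0, 5, 0, 1],
              [ 0, 2, 8, 1, 0],
              [ 4, 4, 4, 4, 4]])
Xp, genes = preprocess(X, n_top_genes=3)
```

In practice, returning the kept gene indices alongside the matrix makes it possible to map selected features back to gene names for interpretation.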
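The contrastive objective mentioned under Model Development can be illustrated with a CLIP-style symmetric InfoNCE loss. The sketch below uses NumPy for self-containment (a PyTorch version would be the natural choice in the actual thesis); the temperature value and embedding shapes are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(cell_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss for paired embeddings.

    cell_emb, text_emb: (batch, dim) arrays where row i of each matrix
    comes from the same transcriptome-text pair.
    """
    # L2-normalize so the dot product is cosine similarity
    c = cell_emb / np.linalg.norm(cell_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = c @ t.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))       # matching pairs sit on the diagonal

    def xent(lg):
        # cross-entropy of the diagonal (true-pair) entries
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average over both retrieval directions: cells->texts and texts->cells
    return 0.5 * (xent(logits) + xent(logits.T))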
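Zero-shot cell property classification, one of the evaluation tasks, reduces in a CLIP-style setup to nearest-neighbour search in the shared embedding space: each cell is assigned the label whose text embedding it is most similar to, with no task-specific training. A minimal NumPy sketch (the label names and the identity-matrix label embeddings in the toy example are purely illustrative):

```python
import numpy as np

def zero_shot_classify(cell_emb, label_text_emb, label_names):
    """Assign each cell the label with the most similar text embedding."""
    c = cell_emb / np.linalg.norm(cell_emb, axis=1, keepdims=True)
    l = label_text_emb / np.linalg.norm(label_text_emb, axis=1, keepdims=True)
    sims = c @ l.T                  # (n_cells, n_labels) cosine similarities
    return [label_names[i] for i in sims.argmax(axis=1)]

# toy example: 3 cells in a 3-d shared space, label embeddings on the axes
cells = np.array([[0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.2],
                  [0.1, 0.0, 0.8]])
labels = np.eye(3)
preds = zero_shot_classify(cells, labels, ["T cell", "B cell", "Monocyte"])
# -> ['T cell', 'B cell', 'Monocyte']
```

In a real evaluation, `label_text_emb` would come from encoding label prompts (e.g. "a transcriptome of a T cell") with the model's text encoder.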
References
- Schaefer M, Peneder P, Malzl D, Peycheva M, Burton J, Hakobyan A, Sharma V, Krausgruber T, Menche J, Tomazou EM, Bock C. Multimodal learning of transcriptomes and text enables interactive single-cell RNA-seq data exploration with natural-language chats. bioRxiv. 2024:2024-10.
- Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G. Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning 2021 (pp. 8748-8763). PMLR.
- Li J, Li D, Xiong C, Hoi S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning 2022 (pp. 12888-12900). PMLR.
- Lu H, Liu W, Zhang B, Wang B, Dong K, Liu B, Sun J, Ren T, Li Z, Sun Y, Deng C. DeepSeek-VL: Towards real-world vision-language understanding. arXiv preprint arXiv:2403.05525. 2024.
Prerequisites
- Knowledge in machine learning, biology, and multi-omics (e.g., genomics, proteomics, transcriptomics, epigenomics)
- Programming language: Python
- Deep learning frameworks: PyTorch, Transformers