

Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Dynamic Topic Mining for Visual Analytics on Large Document Collections

Thesis type
  • Master
  • Diplom
Student Nikou Günnemann-Gholizadeh
Status Finished
Submitted in 2013
Proposal on 04. Sep 2012 15:30
Proposal room Seminarraum I5
Presentation on 21. Mar 2013 10:30
Presentation room Bibliothek I5

This thesis will conceive, implement and deploy a dynamic topic modelling approach to expose topic dynamics within existing large community mediabases. The approach should identify topics as well as their bursts and shifts over time across the different kinds of media in a community, e.g. blogs, wikis, research projects, and published papers.
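To make "bursts over time" concrete, the sketch below assumes that some topic model has already given every document a vector of topic weights and that documents are grouped into time slices (here: years). It averages the weights per slice and flags a topic as bursting in a slice when its weight lies more than `threshold` standard deviations above that topic's mean across all slices. This is a simple z-score heuristic chosen for illustration, not the method developed in the thesis; the input format, the function names `topic_weight_per_slice` and `find_bursts`, and the toy numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def topic_weight_per_slice(docs):
    """Average the per-document topic weights within each time slice.

    docs: iterable of (time_slice, weights) pairs, where weights is one
    document's list of topic proportions (hypothetical input format).
    Returns {time_slice: [mean weight of topic 0, topic 1, ...]}, sorted by slice.
    """
    sums = {}
    counts = defaultdict(int)
    for t, weights in docs:
        if t not in sums:
            sums[t] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[t][k] += w
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}


def find_bursts(series, threshold=1.5):
    """Flag (time_slice, topic) pairs whose mean weight lies more than
    `threshold` population standard deviations above that topic's mean
    over all slices (a simple z-score heuristic)."""
    slices = sorted(series)
    n_topics = len(series[slices[0]])
    bursts = []
    for k in range(n_topics):
        values = [series[t][k] for t in slices]
        mu, sigma = mean(values), pstdev(values)
        for t, v in zip(slices, values):
            # sigma == 0 means the topic never changes, so nothing can burst
            if sigma > 0 and v > mu + threshold * sigma:
                bursts.append((t, k))
    return bursts


# Toy data: two topics, one time slice per year (made-up numbers).
docs = [
    (2009, [0.9, 0.1]),
    (2010, [0.8, 0.2]), (2010, [1.0, 0.0]),
    (2011, [0.9, 0.1]),
    (2012, [0.2, 0.8]), (2012, [0.0, 1.0]),
]
series = topic_weight_per_slice(docs)
print(find_bursts(series))  # [(2012, 1)]: topic 1 bursts in 2012
```

With only n time slices, a population z-score can never exceed the square root of n - 1, so the threshold has to be chosen with the number of slices in mind. Detecting topic shifts (changes in a topic's term distribution) would need more than this, e.g. a dynamic topic model that explicitly links topics across slices.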

A Community Mediabase is a set of databases that comprise different media artifacts relevant to a specific community, together with tools to access that data. The artifacts typically include blogs, wikis, newslists, and similar social software artifacts; they may also include other relevant information such as the community's collaboration networks, shared projects, and publications. Within the scope of the TEL-Map EU project, a Community Mediabase for Technology Enhanced Learning (TEL) was created, including databases of TEL projects, publications, and blogs. In addition, social network analysis (SNA) was performed on these data sets to identify the most relevant authors, projects, organizations, etc. in TEL [1].
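This page does not say which SNA measures the TEL-Map analysis [1] used. As a hedged illustration of the general idea, the sketch below ranks authors by one simple measure: unnormalised degree centrality in a co-authorship graph, i.e. the number of distinct co-authors. The input format, the function names `coauthor_degree` and `most_central`, and the author names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_degree(publications):
    """Unnormalised degree centrality in an undirected co-authorship graph.

    publications: list of author-name lists, one per publication
    (hypothetical input format). Two authors are linked if they share at
    least one publication; an author's degree is the number of distinct
    co-authors. Authors who never co-authored do not appear in the result.
    """
    neighbours = defaultdict(set)
    for authors in publications:
        # every pair of distinct authors on the same publication is an edge
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(peers) for author, peers in neighbours.items()}


def most_central(publications, top=3):
    """Return the `top` authors by degree, ties broken alphabetically."""
    degree = coauthor_degree(publications)
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top]


pubs = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Dave"],
    ["Bob", "Carol"],
    ["Eve"],
]
print(most_central(pubs))  # [('Alice', 3), ('Bob', 2), ('Carol', 2)]
```

Degree only counts direct connections; measures such as betweenness or eigenvector centrality capture other notions of relevance and may rank authors differently.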

The aim of this thesis is to complement the SNA approach with a semantic view by conceiving, implementing and deploying a probabilistic topic modelling approach to expose topic dynamics within the Community Mediabase data. Topic modelling is an emerging field of unsupervised machine learning (see [2]), although library code for it already exists. The approach extracts topics from a text source using a "bag of words" paradigm, which ignores word order, and a model in which each topic is defined by a probability distribution over linguistic terms and each document is deemed to be about several topics, each with a different weighting. A number of algorithms exist to "reverse engineer" these distributions given the actual document content.
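As an illustration of such a "reverse engineering" algorithm, the sketch below implements collapsed Gibbs sampling for latent Dirichlet allocation (LDA), a widely used probabilistic topic model. The page does not say which model or algorithm the thesis uses, so this is only an assumption-laden example: `bag_of_words` and `lda_gibbs` are hypothetical helper names, the hyperparameters `alpha` and `beta` and the toy corpus are made up, and a real deployment would rely on existing library code instead. The sampler repeatedly reassigns each word token to a topic in proportion to how much its document already uses that topic and how strongly that topic already favours that term; the final counts yield the document-topic weights `theta` and the topic-term distributions `phi`.

```python
import random
import re


def bag_of_words(text):
    """Tokenise into lower-case words; the model below ignores word order."""
    return re.findall(r"[a-z]+", text.lower())


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation (LDA).

    docs: list of token lists. Returns (vocab, theta, phi) where
    theta[d][k] estimates the weight of topic k in document d and
    phi[k][v] estimates the probability of term vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # occurrences of term v assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional:
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jv + beta) / (n_j + V * beta).
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Point estimates of the document-topic and topic-term distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, theta, phi


texts = [
    "learning analytics dashboard learning analytics data",
    "analytics data dashboard learning analytics",
    "mobile app mobile learning app game",
    "game mobile app game mobile",
]
docs = [bag_of_words(t) for t in texts]
vocab, theta, phi = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print("topic", k, [vocab[v] for v in top])
```

Topic indices carry no meaning of their own; only the highest-probability terms in `phi[k]` indicate what a topic is about. A fixed `seed` makes a run reproducible, but different seeds can yield different, equally valid topics.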

If you are interested in this thesis, please contact Dr. Michael Derntl.


  1. M. Derntl et al.: Mediabase ready and first analysis report. TEL-Map deliverable D4.3.
  2. D.M. Blei: Introduction to Probabilistic Topic Models. Princeton University.



We expect the student to have existing skills in SQL and object-oriented programming, as well as interest and/or experience in data analysis, text mining, topic modelling, information visualisation, and R (the language and environment for statistical computing).
