Crawling and Scraping Scientific Events for Semantic Analysis
| Thesis type | |
|---|---|
| Status | Open |
| Supervisor(s) | |
Many Web-based services for scientific events build on communication tools such as email lists, wikis or blogs. Recently, some of these services have started to identify important semantic properties such as topics, deadlines, venues, PC members or organizers with automatic semantic extraction tools. These extraction tools are based on natural language processing and can produce structured data, i.e. RDF triples. Combined with a data model of the scientific event domain, this data would make complex queries possible. Such queries have the potential to improve scientific communication by helping researchers place their publications and events better.
The task of this bachelor thesis has three parts: creating crawlers and scrapers, built on existing tools, for communication tools such as email lists, wikis or blogs; storing the extracted data in an existing structured database; and implementing a Web-based query interface and/or a RESTful API. The bachelor thesis is embedded in ongoing projects at our chair.
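To give a rough idea of the scraping step, the following minimal Java sketch extracts two properties from a plain-text call for papers and serializes them as N-Triples. It is only an illustration under stated assumptions: the class name `CfpScraper`, the `http://example.org/...` IRIs, the property names and the `Label: value` line format are all hypothetical, and simple regular expressions stand in for the NLP-based extraction tools mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CfpScraper {
    // Hypothetical vocabulary namespace; a real project would reuse the existing data model.
    static final String VOCAB = "http://example.org/vocab#";

    // Assumed input format: one "Label: value" pair per line, labels case-insensitive.
    private static final Pattern DEADLINE = Pattern.compile(
            "(?im)^\\s*submission deadline\\s*:\\s*(\\d{4}-\\d{2}-\\d{2})\\s*$");
    private static final Pattern VENUE = Pattern.compile(
            "(?im)^\\s*venue\\s*:\\s*(.+?)\\s*$");

    /** Returns the properties found in the text, keyed by (hypothetical) property name. */
    static Map<String, String> extract(String text) {
        Map<String, String> props = new LinkedHashMap<>();
        Matcher deadline = DEADLINE.matcher(text);
        if (deadline.find()) {
            props.put("submissionDeadline", deadline.group(1));
        }
        Matcher venue = VENUE.matcher(text);
        if (venue.find()) {
            props.put("venue", venue.group(1));
        }
        return props;
    }

    /** Serializes each property as one N-Triples line with a plain string literal. */
    static String toNTriples(String eventIri, Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            // Escape backslashes and double quotes inside the literal.
            String literal = e.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('<').append(eventIri).append("> <")
              .append(VOCAB).append(e.getKey()).append("> \"")
              .append(literal).append("\" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cfp = "Call for Papers\n"
                + "Venue: Leipzig, Germany\n"
                + "Submission Deadline: 2025-03-15\n";
        System.out.print(toNTriples("http://example.org/event/demo", extract(cfp)));
    }
}
```

Running `main` prints one triple per extracted property. In the thesis itself, existing crawling and extraction tools would replace the regular expressions, and the triples would be written to the existing database rather than to standard output.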
The ideal candidate should be able to program in Java and JavaScript, have some knowledge of databases, and be interested in semantic technologies as well as Web services.

