HiWi (Student Research Assistant) - Web Crawling and Mining
We are looking for a student interested in crawlers and other tools that operate on web data and relational databases.
We develop and host so-called Mediabases, which offer tools for managing and analyzing the media resources (e.g., blogs, papers, websites, forums, mailing lists) of communities of practice on the web. These media resources are crawled from the web and stored in the Mediabase databases, which are updated every day.
Because the way these resources are represented on the web keeps evolving (e.g., new possibilities introduced by HTML5), the crawlers face many new challenges and must be adapted accordingly.
Your task will be to fix and further develop these crawlers and the related tools that operate on the Mediabase data. You will have the opportunity to work and learn in a stimulating environment, and the tools you develop will be publicly available on the Web to large communities of users all over the world.
We expect strong practical skills in Python and SQL. Experience with Perl is a plus.