
Informatik 5
Information Systems
Prof. Dr. M. Jarke

Contact

Prof. Dr. M. Jarke
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321


i5CloudMatch: An Entity-based Integration of Large-scale Datasets in the Cloud

Thesis type
  • Master
Student Ammar Sahib
Status Finished
Submitted in 2014
Proposal on 09. Jul 2013 00:00
Proposal room Bibliothek I5
Presentation on 25. Feb 2014 14:30
Presentation room Bibliothek
Supervisor(s)
Advisor(s)

The objective of this thesis is to design and implement a framework to analyze and integrate large-scale social networks in the cloud. The evaluation will be done using data from AERCS.

Background
Recent years have witnessed rapid growth in social networks. Facebook, Flickr, Twitter and YouTube all form complex social networks. Complex networks abstract the interactions among entities into a graph representation, and these graphs are usually large-scale. Analyzing large-scale complex networks often serves as a basis for building intelligent systems such as recommender systems. However, these networks pose computing challenges, mainly due to the large volumes of data and the irregular structure of the graphs.

In response, cloud computing has recently emerged as a means of obtaining scalable processing power on demand. For example, Google and Yahoo use MapReduce on thousands of commodity PCs to process large data volumes, delivering intelligent solutions to Web users. More advanced cloud computing solutions that build on MapReduce are also available.

In our system, AERCS, we are implementing a recommendation system for computer scientists based on social network analysis. The system works with the collaboration (co-authorship) and citation networks among researchers, publications and venues (conferences, journals, workshops, etc.), which comprise millions of nodes and edges. The raw data sources used for this analysis are also very large. Integrating data sources that are both large and heterogeneous into a framework for further analysis poses many problems.
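To make the co-authorship network concrete, the following is a minimal sketch of how weighted co-authorship edges could be derived from publication records in a MapReduce style. It is written in plain Python and only imitates the map, shuffle and reduce phases in a single process. The function names and the sample records are hypothetical and are not taken from AERCS.

```python
from collections import defaultdict
from itertools import combinations


def map_publication(authors):
    """Map step: emit ((author_a, author_b), 1) for each unordered author pair."""
    unique = sorted(set(authors))  # sort so each pair has one canonical key
    for a, b in combinations(unique, 2):
        yield (a, b), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_edge(key, values):
    """Reduce step: the edge weight is the number of joint publications."""
    return key, sum(values)


def coauthorship_edges(publications):
    """Run map, shuffle and reduce over a list of author lists."""
    mapped = (kv for authors in publications for kv in map_publication(authors))
    return dict(reduce_edge(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample data: one author list per publication.
publications = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Bob"],
    ["Dave"],
]
edges = coauthorship_edges(publications)
```

On a real cluster the shuffle phase would be handled by the framework; here it is spelled out only to show the data flow.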

The objective of this thesis is to design and implement a framework to analyze and integrate large-scale data sources and social networks in the cloud. The evaluation will be done using data from AERCS. The candidate should have basic knowledge of, and an interest in, graph analysis or social network analysis (SNA), as well as an interest in working with cutting-edge distributed data processing platforms.

Tasks
•    Work with large-scale datasets of citation and co-authorship networks
•    Develop efficient algorithms for the analysis of complex social networks that run on a cloud computing platform (MapReduce, Pregel)
•    Shape the system with your own ideas and enthusiasm
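
As an illustration of the vertex-centric (Pregel-style) model named in the tasks, the sketch below computes connected components by propagating the smallest vertex label in synchronous supersteps. It is a single-process Python imitation, not code for an actual Pregel implementation. The choice of connected components as the analysis, the function name and the sample graph are assumptions made for this example, and the adjacency map is assumed to be symmetric and to list every vertex as a key.

```python
def connected_components(adjacency):
    """Label each vertex with the smallest vertex id in its component.

    Assumes `adjacency` is symmetric (undirected graph) and contains
    every vertex as a key, possibly with an empty neighbour list.
    """
    labels = {v: v for v in adjacency}
    active = set(adjacency)  # in the first superstep every vertex sends
    while active:
        # Message phase: active vertices send their current label to neighbours.
        inbox = {}
        for v in active:
            for neighbour in adjacency[v]:
                inbox.setdefault(neighbour, []).append(labels[v])
        # Compute phase: a vertex that sees a smaller label adopts it and
        # stays active; all other vertices halt until they receive a message.
        active = set()
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)
    return labels


# Hypothetical sample graph: two components plus one isolated vertex.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
labels = connected_components(graph)
```

The loop ends when no vertex changes its label in a superstep, which mirrors the halting condition of the vertex-centric model.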
