
DBIS

Big Data & Model Management

December 9th, 2021

Big data is a buzzword that summarizes various aspects of handling large amounts of heterogeneous data. The goals are to perform efficient analytics and to derive new information from large collections of potentially heterogeneous data. Heterogeneity is an important issue in big data: data is not only large in volume and produced at high speed (velocity), it also has high variety. This research group applies and extends technologies that have been developed in the context of data integration and model management. Research in model management aims at developing technologies and mechanisms to support the integration, merging, evolution, and matching of complex data models, which in turn supports the management of complex, integrated, distributed, heterogeneous information systems.

Manager(s)

Overview

The research group has long-standing experience in developing systems and applications for handling complex, heterogeneous data. The model management system GeRoMeSuite has been developed as a platform for generic model management. This means that heterogeneous modeling languages (e.g., XML Schema, the Relational Data Model, OWL) are represented in a generic metamodel (GeRoMe) in order to enable the integration and mapping of models represented in different modeling languages.

In general, model management aims at developing technologies and mechanisms to support the integration, merging, evolution, and matching of complex data models. This support is required for the management of complex, integrated, distributed, heterogeneous information systems. The basic concepts in model management are models, mappings, and operators. Models describe the structure of data. Mappings represent relationships between elements from different models. Operators are operations on models and mappings (e.g., merging and matching of models, composition of mappings).
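The three concepts above can be illustrated with a minimal Python sketch. It is not the GeRoMeSuite API or the GeRoMe metamodel: the names `Model`, `Mapping`, `match`, and `compose` are hypothetical, a model is reduced to a flat set of element names, and the match operator is a toy that only compares names case-insensitively.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Model:
    """A model describes the structure of data (here: a named set of elements)."""
    name: str
    elements: FrozenSet[str]


@dataclass
class Mapping:
    """A mapping relates elements of a source model to elements of a target model."""
    source: Model
    target: Model
    pairs: Set[Tuple[str, str]] = field(default_factory=set)


def match(a: Model, b: Model) -> Mapping:
    """Toy match operator: relate elements whose names are equal ignoring case."""
    pairs = {(x, y) for x in a.elements for y in b.elements
             if x.lower() == y.lower()}
    return Mapping(a, b, pairs)


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Compose operator: relate x to z if m1 relates x to y and m2 relates y to z."""
    if m1.target != m2.source:
        raise ValueError("mappings are not composable")
    pairs = {(x, z) for (x, y1) in m1.pairs for (y2, z) in m2.pairs
             if y1 == y2}
    return Mapping(m1.source, m2.target, pairs)


# Hypothetical example models, one per modeling language mentioned in the text.
relational = Model("relational", frozenset({"Customer", "Order"}))
xml_schema = Model("xml_schema", frozenset({"customer", "invoice"}))
owl = Model("owl", frozenset({"CUSTOMER", "Agent"}))

rel_to_xml = match(relational, xml_schema)
xml_to_owl = match(xml_schema, owl)
rel_to_owl = compose(rel_to_xml, xml_to_owl)
```

In this sketch, operators take models and mappings as input and return new mappings, so they can be chained. Realistic matchers and composition operators work on much richer model and mapping representations than name pairs.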

The management of metadata is of particular importance for information integration, model management, and big data applications. Metadata is data about data and provides semantics for heterogeneous data; only with a description does data become understandable and potentially more valuable as information. Furthermore, using a metadata-based approach in the design and implementation of an integrated information system increases the flexibility and adaptability of the system, because information about the structure of data models and their dependencies is not hidden in the source code of the system. Instead, this information is captured in semantically rich metadata models, which enable the (re)use of the information in various contexts. A semantically rich representation of data models also supports the definition of model management operators.

The following topics are addressed in more detail in this research group:

  • Big Data Architectures
  • Systems to manage Big Data (Hadoop, NoSQL systems such as MongoDB, etc.)
  • Scientific Data Management, especially in Life Science
  • Schema Mapping & Matching
  • Quality-oriented Data Integration
  • Semantic Web

Software