
Informatik 5
Information Systems
Prof. Dr. M. Jarke

Contact

Prof. Dr. M. Jarke
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321

Visualising Data Trends using Complex Event Processing.

Thesis type: Master
Status: Running
Supervisor(s)
Advisor(s)

Overview:

An enormous amount of data is generated by heterogeneous sources, and not only the volume of data but also the velocity at which it arrives is increasing rapidly. There is a need for a system that supports real-time exploration of complex, high-velocity events in streaming data. Complex event processing (CEP) systems enable real-time analysis of high-velocity event streams and are widely used for use cases such as detecting anomalies, making real-time predictions and presenting data. In recent literature, researchers have proposed ways of using complex event processing to detect trends (an increase or decrease in values over a certain period) in streaming data.

The goal of this thesis is to develop a system with a CEP engine that is used to visualise trends and detect anomalies interactively.

Motivation: 

Although advanced applications and tools now handle real-time streaming data of high volume and velocity, visualising large, high-dimensional datasets that arrive at high velocity remains difficult. Many business requirements now call for analysing data and presenting it to the end user in real time so that they can respond quickly, which makes visualising such high-dimensional, high-velocity data a real challenge. In recent years there has been considerable work on reducing the dimensionality of streaming data with methods such as Linear Discriminant Analysis and Principal Component Analysis. Despite the various strategies adopted to minimise data loss during dimensionality reduction, information is still lost, which makes the resulting applications or approaches less effective. Moreover, reducing high-dimensional data that arrives at high velocity well enough to produce good visualisations is itself challenging. The outcome of this thesis will be a system that, instead of visualising the whole dataset, detects the important trends in data streams and visualises these trends.

Goals:

1. Define Data Model: Unlike static data, streaming data arrives in large volume and at high velocity, so it cannot all be stored on disk. We therefore need a mechanism that selects a fraction of the unbounded stream for processing. The sliding window is one such technique: it selects a block of data based on a time interval, and each element is accessible only once, so it cannot be used again after it has left the window. The window slides after a certain period, and only the data that remains inside it is processed; as new data arrives in the window, the oldest data is removed. To feed the window, a message broker such as Apache Kafka should be used to consume the data streams and receive the events at the consumer end.

2. Data Preprocessing using CEP: The event stream needs to be processed using complex event processing. The goal is to build a CEP engine that executes user-defined rules stored in a rule base to detect primitive events and generate composite events, such as alerts or trends, which are then used for visualisation.

3. Visualise Trends/Anomalies: Once the system is able to detect trends in a time-series stream (for example, stock prices) using CEP, the goal is to visualise these observed trends. This is an important part of the system because visualisation lets businesses or users understand exactly which trends occur in the data and detect anomalies for quick decision-making. Appropriate visualisation techniques or models for the dataset or for specific trends should be selected.

4. Evaluation of the system
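
To make goals 1 and 2 more concrete, the following is a minimal Python sketch of a time-based sliding window combined with a small rule base that turns primitive events into composite trend events. It is an illustration under stated assumptions, not the design of the thesis system: the names Event, CompositeEvent, SlidingWindow and CepEngine, the two example rules, and the choice of at least three strictly monotonic values as the definition of a trend are all hypothetical. An in-memory list stands in for the events that a Kafka consumer would deliver.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Event:
    """A primitive event: one timestamped reading from the stream."""
    timestamp: float  # seconds since the start of the stream
    value: float


@dataclass(frozen=True)
class CompositeEvent:
    """A higher-level event emitted when a rule matches the current window."""
    rule: str
    start: float  # timestamp of the oldest event in the window
    end: float    # timestamp of the newest event in the window


class SlidingWindow:
    """Time-based sliding window keeping only the last `length` seconds."""

    def __init__(self, length: float) -> None:
        self.length = length
        self.events: Deque[Event] = deque()

    def add(self, event: Event) -> None:
        self.events.append(event)
        # Evict events that are `length` seconds or more older than the newest.
        cutoff = event.timestamp - self.length
        while self.events[0].timestamp <= cutoff:
            self.events.popleft()


# A rule is a named predicate over the values currently in the window.
Rule = Callable[[List[float]], bool]


def increasing(values: List[float]) -> bool:
    # Assumption: a trend needs at least three strictly rising values.
    return len(values) >= 3 and all(a < b for a, b in zip(values, values[1:]))


def decreasing(values: List[float]) -> bool:
    return len(values) >= 3 and all(a > b for a, b in zip(values, values[1:]))


class CepEngine:
    """Applies every rule in the rule base to the window after each event."""

    def __init__(self, window_length: float, rules: Dict[str, Rule]) -> None:
        self.window = SlidingWindow(window_length)
        self.rules = rules

    def process(self, event: Event) -> List[CompositeEvent]:
        self.window.add(event)
        values = [e.value for e in self.window.events]
        start = self.window.events[0].timestamp
        return [
            CompositeEvent(name, start, event.timestamp)
            for name, rule in self.rules.items()
            if rule(values)
        ]


# Hypothetical price readings, one per second.
prices = [10.0, 11.0, 12.0, 13.0, 12.0, 11.0, 10.0]
engine = CepEngine(3.0, {"increasing": increasing, "decreasing": decreasing})

alerts: List[CompositeEvent] = []
for t, price in enumerate(prices):
    alerts.extend(engine.process(Event(float(t), price)))

for alert in alerts:
    print(alert)
```

Each event is seen exactly once and is dropped as soon as it falls out of the window, which mirrors the single-pass constraint described in goal 1. Keeping the rules in a dictionary separates the rule base from the engine, so new trend or anomaly rules could be added without changing the processing loop. The resulting composite events are the kind of output that the visualisation in goal 3 would consume.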

 

 
