Synopsis operators for a distributed on-the-edge streaming architecture

May 23rd, 2023

Thesis Type
  • Master
Presentation room
Seminar room I5 6202
Sandra Geisler
Stefan Decker
Sandra Geisler

In the GALOIS Data Stream Processing on the Edge (DSPoE) system [1], data in the form of streams is processed by continuous queries consisting of operators such as joins and filters. The operators are distributed over a network of processing nodes, mainly edge devices located close to the data sources.
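To make the notion of a continuous query more concrete, the following minimal Python sketch chains operators as generators, so each record flows through the pipeline as it arrives instead of being collected first. The record layout (sensor ID, value in millivolts) and the function names `source`, `filter_op`, and `map_op` are hypothetical illustrations, not the GALOIS API. In a distributed deployment, each stage could run on a different node.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical record layout for illustration: (sensor_id, value in millivolts).
Record = Tuple[str, float]


def source(readings: Iterable[Record]) -> Iterator[Record]:
    """Emit records one at a time, as an edge sensor would."""
    for record in readings:
        yield record


def filter_op(pred: Callable[[Record], bool], stream: Iterator[Record]) -> Iterator[Record]:
    """Stateless filter operator: forward only records that satisfy pred."""
    for record in stream:
        if pred(record):
            yield record


def map_op(fn: Callable[[Record], Record], stream: Iterator[Record]) -> Iterator[Record]:
    """Stateless map operator: transform each record independently."""
    for record in stream:
        yield fn(record)


# A continuous query: keep readings above 2000 mV and convert them to volts.
readings = [("s1", 1200.0), ("s2", 2500.0), ("s1", 3000.0)]
query = map_op(lambda r: (r[0], r[1] / 1000.0),
               filter_op(lambda r: r[1] > 2000.0, source(readings)))
results = list(query)
print(results)  # [('s2', 2.5), ('s1', 3.0)]
```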

Synopses are operators that maintain a summary of a data stream [2,3]. Each consists of a data structure and an algorithm that updates it as new elements arrive. Depending on the algorithm, a synopsis approximates the actual data stream and its characteristics with varying accuracy.
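As an illustration of the split into a data structure and an update algorithm, the sketch below implements a Count-Min sketch, one well-known frequency synopsis. It is chosen here only as an example; the text does not prescribe it. Its memory is fixed at `width * depth` counters regardless of stream length, and its accuracy depends on those two parameters: for width w and depth d, an estimate exceeds the true count by at most (e/w)·N with probability at least 1 - e^(-d), where N is the total count seen. The class and parameter names are illustrative, and salted SHA-256 stands in for the pairwise-independent hash functions usually assumed in the analysis.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Count-Min sketch: fixed-size synopsis for approximate item frequencies.

    Data structure: a depth x width matrix of integer counters.
    Update algorithm: each arriving item increments one counter per row.
    Query: the minimum over the item's counters; it never underestimates.
    """

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number to obtain a different hash per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row for the arriving item.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
for item in ["a"] * 5 + ["b"] * 3 + ["c"]:
    sketch.update(item)
print(sketch.estimate("a"), sketch.estimate("b"))  # at least 5 and at least 3
```

Doubling `width` roughly halves the error bound at the cost of twice the memory, which is the kind of accuracy-versus-resource trade-off an edge node has to make.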

The goal of this thesis is to implement new algorithms for creating synopses that are especially suitable for DSPoE systems, i.e., they must be lightweight, small in size, and efficient to compute in terms of CPU usage and battery consumption. Ideally, they should either adapt to the resources of the target node or be picked from a library of synopsis operators that semantically expose their properties, such as time bias (only recent data vs. the whole data stream), approximation bounds, supported data types, and so on. The tasks in the thesis may comprise:
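One way to expose such properties is a small machine-readable descriptor per synopsis operator, against which a node's requirements and resources can be matched. The following Python sketch assumes a hypothetical descriptor covering the properties named above (time bias, approximation bound, supported data types) plus a memory footprint. The operator names and all numbers in `LIBRARY` are placeholder values, not measurements, and `error_bound` is simplified to a single number even though real synopses state their guarantees in different forms.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class TimeBias(Enum):
    WHOLE_STREAM = "whole stream"
    RECENT_ONLY = "recent data only"


@dataclass(frozen=True)
class SynopsisDescriptor:
    """Hypothetical semantic description of a synopsis operator."""
    name: str
    time_bias: TimeBias
    data_types: FrozenSet[str]
    error_bound: float   # simplified relative error bound; 0.0 means exact
    memory_bytes: int    # fixed memory footprint of the data structure


# Placeholder library entries; values are illustrative only.
LIBRARY: List[SynopsisDescriptor] = [
    SynopsisDescriptor("reservoir_sample", TimeBias.WHOLE_STREAM,
                       frozenset({"int", "float", "str"}), 0.10, 8_192),
    SynopsisDescriptor("count_min", TimeBias.WHOLE_STREAM,
                       frozenset({"int", "str"}), 0.01, 16_384),
    SynopsisDescriptor("window_histogram", TimeBias.RECENT_ONLY,
                       frozenset({"float"}), 0.05, 4_096),
]


def select_synopsis(library: List[SynopsisDescriptor], data_type: str,
                    time_bias: TimeBias, memory_budget: int,
                    max_error: float) -> Optional[SynopsisDescriptor]:
    """Return the smallest synopsis meeting all requirements, or None."""
    candidates = [d for d in library
                  if data_type in d.data_types
                  and d.time_bias is time_bias
                  and d.memory_bytes <= memory_budget
                  and d.error_bound <= max_error]
    return min(candidates, key=lambda d: d.memory_bytes, default=None)


# A constrained node: only 10,000 bytes available for a synopsis over strings.
choice = select_synopsis(LIBRARY, "str", TimeBias.WHOLE_STREAM,
                         memory_budget=10_000, max_error=0.2)
print(choice.name if choice else None)  # reservoir_sample
```

A node with more memory could pass a larger budget and a tighter `max_error` (e.g., 32,000 bytes and 0.05), which would select `count_min` instead. Preferring the smallest footprint is only one possible selection policy.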

(1) identify characteristics of synopses suitable for DSPoE, especially for GALOIS

(2) research existing synopsis algorithms

(3) design a concept for a configurable (synopsis) operator that semantically describes its characteristics

(4) implement various synopsis operators in Wasm

(5) evaluate the efficiency and suitability of the operators for identified use cases (e.g., from the Internet of Production)

[1] Stolz, T., Koren, I., Tirpitz, L., & Geisler, S. (2023). GALOIS: A Hybrid and Platform-Agnostic Stream Processing Architecture. arXiv preprint arXiv:2305.02063.

[2] Kolomvatsos, K., Anagnostopoulos, C., Koziri, M., & Loukopoulos, T. (2020). Proactive & time-optimized data synopsis management at the edge. IEEE Transactions on Knowledge and Data Engineering, 34(7), 3478-3490.

[3] Buddhika, T., Malensek, M., Pallickara, S. L., & Pallickara, S. (2017). Synopsis: A distributed sketch over voluminous spatiotemporal observational streams. IEEE Transactions on Knowledge and Data Engineering, 29(11), 2552-2566.