Extending Data Stream Processors with Interoperable User-Defined Functions in WebAssembly

March 5th, 2024

Thesis Type
  • Bachelor
  • Master
Sandra Geisler
Liam Tirpitz

Over recent years, driven by an overall increase in data volume, increasingly complex data processing pipelines have been placed on heterogeneous resources across both centralized and distributed environments. To analyze data in real time, users can choose from many existing data stream processing platforms, such as Apache Spark Streaming and Apache Flink. However, these platforms have different strengths and weaknesses, and each excels at processing data in either edge or cloud environments, not both.

To build efficient and dynamic data pipelines from distributed edge environments to centralized cloud infrastructure, or even across organizations, multiple platforms can be chained together.
However, the heterogeneity of processing platforms and the lack of common interfaces between them pose a challenge. To dynamically shift operations between platforms, the operations themselves need to be interoperable.

This can, for example, be achieved by a common definition of functions and operators combined with platform-specific implementations, or by implementing interoperable function blocks supported across platforms. One promising technology to facilitate this kind of interoperability is WebAssembly (wasm).
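To illustrate the first option, a common function definition combined with platform-specific implementations, here is a minimal Python sketch. All names in it (StreamFunction, IteratorPlatform, CallbackPlatform, to_upper) are hypothetical and do not correspond to the API of any of the platforms mentioned here. A plain Python callable stands in for what would be a compiled Wasm module in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional

# Records are modeled as raw bytes, a simplifying assumption that avoids
# committing to any platform-specific record type.
Record = bytes


@dataclass(frozen=True)
class StreamFunction:
    """Platform-neutral definition of a per-record function (hypothetical)."""
    name: str
    apply: Callable[[Record], Optional[Record]]  # returning None drops the record


class IteratorPlatform:
    """Stand-in for a pull-based engine that consumes an iterable of records."""

    def run(self, fn: StreamFunction, source: Iterable[Record]) -> Iterator[Record]:
        for record in source:
            out = fn.apply(record)
            if out is not None:
                yield out


class CallbackPlatform:
    """Stand-in for a push-based engine that invokes a handler per record."""

    def __init__(self) -> None:
        self.sink: List[Record] = []

    def register(self, fn: StreamFunction) -> Callable[[Record], None]:
        def on_record(record: Record) -> None:
            out = fn.apply(record)
            if out is not None:
                self.sink.append(out)

        return on_record


# One shared definition: upper-case each record and drop empty ones.
to_upper = StreamFunction("to_upper", lambda r: r.upper() or None)
```

The same to_upper definition can be handed to either adapter without modification. Only the adapters know how their engine delivers and collects records.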

In the context of this thesis, you will enable cross-platform data stream processing function blocks with WebAssembly.

Towards that goal, you will:

  • Analyze data stream processing platforms that already support extensibility via WebAssembly, such as Fluvio, Redpanda, or Galois, for their similarities and differences.
  • Examine existing approaches to enabling cross-platform processing pipelines, such as Apache Wayang.
  • Define a common interface for interoperable (wasm-based) data stream processing functions.
  • Extend existing platforms that currently do not support interoperable operators (Spark Streaming, Apache Flink, or NebulaStream) so that processing operations can be dynamically shifted between platforms.

The scope of this thesis can be adapted to either a bachelor or master thesis.


Interested? Questions? Contact us with a CV and a current transcript of records!

Liam Tirpitz, M.Sc., Tel: +49 241 80-21542