Posted to dev@streampipes.apache.org by GitBox <gi...@apache.org> on 2023/01/17 13:32:17 UTC

[GitHub] [streampipes] tenthe edited a discussion: Harmonize data set and data stream API

## Harmonize `data set` and `data stream` APIs

We are currently looking at the Connect API and plan to refactor parts of it. While reviewing the current implementation, I noticed several cases that make it more complex than necessary.

## Distinction between `data set` and `data stream` adapters

For example, we distinguish between `data set` and `data stream` adapters. Set adapters are treated as bounded streams, i.e. they stream a data set only once. Originally, this was added because it allows the user to replay existing events (e.g., from databases or files). However, I don't think this feature is used very often and we only have three implementations of set adapters. This feature adds a lot of overhead in many different places, such as the UI, the core, and extension services.

## Main features of current data sets

We currently use data sets mainly for two purposes:
- Validating the processing elements in the e2e tests
- Importing a data set (e.g. CSV file) into the time-series storage

I think these are important and we should definitely keep them, but maybe we can find another solution to accomplish these tasks.

## Alternative solutions

New functionality:
- Add an option to create adapters without starting them
- Add an option to the `FileStreamAdapter` to play the file only once

To import a dataset, a user (or the e2e tests) would need to create an adapter without starting it, create the pipeline, and then start the adapter.
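
To make the order of these steps concrete, here is a minimal sketch in plain Python. It is only a toy in-memory model and does not use the StreamPipes API: `FileReplayAdapter`, `play_once`, `subscribe`, and `import_data_set` are hypothetical names chosen for illustration. The point it tries to show is that the pipeline has to be attached before the adapter starts, otherwise a file that is played only once could emit events that nobody consumes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class FileReplayAdapter:
    """Hypothetical stand-in for a file-based stream adapter (not the StreamPipes API)."""

    events: List[Event]
    play_once: bool = True  # assumed option: replay the file a single time
    running: bool = False   # a freshly created adapter is not started
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, sink: Callable[[Event], None]) -> None:
        """Attach a consumer, e.g. a pipeline that writes to storage."""
        self.subscribers.append(sink)

    def start(self, max_rounds: int = 1) -> int:
        """Emit the file contents and return the number of events emitted.

        With play_once=True the file is replayed exactly once; otherwise it is
        replayed max_rounds times (a real stream adapter would loop until stopped).
        """
        self.running = True
        rounds = 1 if self.play_once else max_rounds
        emitted = 0
        for _ in range(rounds):
            for event in self.events:
                for sink in self.subscribers:
                    sink(event)
                emitted += 1
        self.running = False
        return emitted


def import_data_set(rows: List[Event]) -> List[Event]:
    """Import rows into a list that stands in for the time-series storage."""
    storage: List[Event] = []
    adapter = FileReplayAdapter(events=rows)  # 1. create the adapter without starting it
    adapter.subscribe(storage.append)         # 2. create the pipeline (here: a plain sink)
    adapter.start()                           # 3. start the adapter only now
    return storage


if __name__ == "__main__":
    print(import_data_set([{"temperature": 21.5}, {"temperature": 22.0}]))
```

In this sketch the e2e tests could follow the same three steps and then compare the contents of the sink with the expected output of the processing element.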

## Recommendation

Since the data set API offers few benefits, I would recommend removing it. This would also give StreamPipes a clearer focus on streaming data produced by machines. Further, it would simplify the implementation in many places without losing functionality.
What do you think?


PS: I would also like to harmonize the model for `GenericAdapters` and `SpecificAdapters`, but that is another discussion ;).

Cheers,
Philipp

GitHub link: https://github.com/apache/streampipes/discussions/1115
