Posted to issues@streampipes.apache.org by "tenthe (via GitHub)" <gi...@apache.org> on 2023/02/15 06:54:00 UTC

[I] Harmonize data set and data stream API (streampipes)

tenthe opened a new issue, #1289:
URL: https://github.com/apache/streampipes/issues/1289

### Discussed in https://github.com/apache/streampipes/discussions/1115

<sup>Originally posted by **tenthe** January 17, 2023</sup>

## Harmonize `data set` and `data stream` APIs

We are currently looking at the Connect API and plan to refactor parts of it. Looking at the current implementation, I noticed several cases that make it more complex than necessary.

## Distinction between `data set` and `data stream` adapters

For example, we distinguish between `data set` and `data stream` adapters. Set adapters are treated as bounded streams, i.e., they stream a data set only once. Originally, this was added because it allows the user to replay existing events (e.g., from databases or files). However, I don't think this feature is used very often, and we only have three implementations of set adapters. This feature adds a lot of overhead in many different places, such as the UI, the core, and the extension services.

## Main uses of current data sets

We currently use data sets mainly to:
- Validate the processing elements in the e2e tests
- Import a data set (e.g., a CSV file) into the time-series storage

I think these are important and we should definitely keep them, but maybe we can find another solution to accomplish these tasks.

## Alternative solutions

New functionality:
- Add an option to create adapters without starting them
- Add an option to the `FileStreamAdapter` to play the file only once

To import a data set, a user (or the e2e tests) would create an adapter without starting it, create the pipeline, and then start the adapter.
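As a rough illustration of this workflow, here is a minimal in-memory sketch. It does not use the StreamPipes API: `FileReplayAdapter`, `replay_once`, `subscribe`, and `import_data_set` are hypothetical names chosen only to show the order of the three steps and the effect of a play-once option.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class FileReplayAdapter:
    """Hypothetical stand-in for a file-based stream adapter."""

    events: List[Event]
    replay_once: bool = True  # assumed option: play the file only once
    handlers: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        # Stands in for creating the pipeline that consumes the stream.
        self.handlers.append(handler)

    def start(self, rounds: int = 1) -> None:
        # With replay_once set, the file is emitted exactly once and
        # `rounds` is ignored; otherwise it is replayed `rounds` times.
        for _ in range(1 if self.replay_once else rounds):
            for event in self.events:
                for handler in self.handlers:
                    handler(event)


def import_data_set(events: List[Event]) -> List[Event]:
    storage: List[Event] = []            # stands in for the time-series storage
    adapter = FileReplayAdapter(events)  # 1. create the adapter, not started
    adapter.subscribe(storage.append)    # 2. create the pipeline
    adapter.start()                      # 3. start the adapter
    return storage
```

In this sketch, starting the adapter before the pipeline is attached would emit the events to nobody, which is why the option to create an adapter without starting it matters for a one-time import.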

## Recommendation

Since the data set API offers few benefits, I would recommend removing it. This would also sharpen the focus of StreamPipes on streaming data produced by machines. Further, it would simplify the implementation in many places without any loss of functionality.
What do you think?

PS: I would also like to harmonize the model for `GenericAdapters` and `SpecificAdapters`, but that is another discussion ;).

Cheers,
Philipp

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@streampipes.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [I] Harmonize data set and data stream API (streampipes)

Posted by "dominikriemer (via GitHub)" <gi...@apache.org>.

dominikriemer closed issue #1289: Harmonize data set and data stream API
URL: https://github.com/apache/streampipes/issues/1289



Re: [I] Harmonize data set and data stream API (streampipes)

Posted by "tenthe (via GitHub)" <gi...@apache.org>.

tenthe closed issue #1289: Harmonize data set and data stream API
URL: https://github.com/apache/streampipes/issues/1289

