Posted to dev@streampipes.apache.org by GitBox <gi...@apache.org> on 2023/01/30 23:53:57 UTC

[GitHub] [streampipes] flomickl created a discussion: Store and Handle Metadata in StreamPipes

GitHub user flomickl created a discussion: Store and Handle Metadata in StreamPipes

Hi,
We handle data in StreamPipes in two layers:

- data_layer
  - key-value pairs

- description_layer
  - the event schema, which contains a basic description of the data but mainly serves to create pipelines

The data layer is stored in the data lake.
Am I right that the description layer is not?
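
To make the two layers concrete, here is a minimal sketch in plain Python. It is only an illustration: the field names, the `schema` layout, and the `conforms` helper are assumptions made up for this example and are not StreamPipes data structures or APIs.

```python
# Hypothetical illustration only: none of these names come from StreamPipes.

# Data layer: a single event as plain key-value pairs.
event = {
    "timestamp": 1675122837000,
    "temperature": 21.5,
    "sensor_id": "s-01",
}

# Description layer: a schema that describes each field of the event.
schema = {
    "timestamp": {"type": int, "description": "milliseconds since epoch"},
    "temperature": {"type": float, "description": "temperature in degrees Celsius"},
    "sensor_id": {"type": str, "description": "identifier of the producing sensor"},
}


def conforms(event: dict, schema: dict) -> bool:
    """Return True if the event has exactly the schema's fields with matching types."""
    if event.keys() != schema.keys():
        return False
    return all(isinstance(event[name], spec["type"]) for name, spec in schema.items())
```

If only `event` is written to storage, the descriptions in `schema` are lost unless they are stored somewhere as well, which is the question above.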

General metadata about the source and the processing is also missing,
e.g. which source, the date of processing, who did it, ...
as well as other standard metadata fields.

Any ideas on whether and how we should handle this?

GitHub link: https://github.com/apache/streampipes/discussions/1186

----
This is an automatically sent email for dev@streampipes.apache.org.
To unsubscribe, please send an email to: dev-unsubscribe@streampipes.apache.org


[GitHub] [streampipes] flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

@tenthe 
Can you create a short sketch of the dataflow in StreamPipes that shows where data is persisted and where it is only held in memory?
It should go from producers via processors to sinks.
I also have an idea, but I think I am missing some information or do not have the full picture yet.


I think this is helpful and could be part of the technical introduction.


GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-5001705



[GitHub] [streampipes] flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

@tenthe Ah, I just saw the data lake. I did not know that there is also an application data database in the backend.
Is it possible to access this data somewhere together with the data lake data?

What are the differences between these two levels? Is there also a difference in the workflow?
> Sensor data is processed in two separate layers, one focused on streaming data that is not persistent and the other on long-term storage.



GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-4845887



[GitHub] [streampipes] tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

Hi @flomickl,
all data resides only on the message broker. It is persisted to the data lake only if you use a data lake sink.

GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-5007411



[GitHub] [streampipes] flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user flomickl added a comment to the discussion: Store and Handle Metadata in StreamPipes

How can we prevent the data lake from turning into a data swamp? Or is this information stored somewhere as well?

For example, the units of a specific field in the data lake.
Another important piece of metadata is the username of whoever executed the pipeline,
the producer source, and so on.

There are a lot of different metadata types.
I know it from the geo perspective:
http://opengeospatial.github.io/e-learning/metadata/text/main.html

Who What Why Where When?
I think this is transferrable to "plain" sensor data as well.
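
As a rough sketch of how the five questions could map onto sensor data, here is a small record type in plain Python. Everything in it is an assumption for illustration: the class name `MeasurementMetadata`, its fields, and the example values are hypothetical and do not exist in StreamPipes.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MeasurementMetadata:
    """Hypothetical metadata record following Who/What/Why/Where/When."""

    who: str        # user who executed the pipeline
    what: str       # field that is described, including its unit
    why: str        # purpose of the processing
    where: str      # producer source of the data
    when: datetime  # time of processing


meta = MeasurementMetadata(
    who="alice",
    what="temperature [degC]",
    why="quality monitoring",
    where="machine-1/opc-ua",
    when=datetime(2023, 1, 30, 23, 53, 57, tzinfo=timezone.utc),
)

# A plain dictionary that could be stored next to the measurements.
record = asdict(meta)
```

Such a record would have to be stored alongside the measurements so that a field in the data lake can still be interpreted later.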





GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-5001797



[GitHub] [streampipes] tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

Hi @flomickl,
yes, you are right. There are two separate databases: one for the actual sensor data and one for application data such as information about adapters, processors, sinks, pipelines, users, ...

For the storage of the application data we use a NoSQL database (Apache CouchDB).

Sensor data is processed in two separate layers, one focused on streaming data that is not persistent and the other on long-term storage. There are several options for the message broker (e.g. Kafka, NATS), but we are currently discussing whether to focus on one of them.
For the sensor data we use a time series database, currently InfluxDB. There are also ongoing discussions about changing this in the future.

If you or anyone else has experience with these technologies, please feel free to share your thoughts here. We are open to new ideas and suggestions.

Cheers,
Philipp


GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-4826264



[GitHub] [streampipes] tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

Posted by GitBox <gi...@apache.org>.
GitHub user tenthe added a comment to the discussion: Store and Handle Metadata in StreamPipes

The data in the data lake can be accessed via the UI or with the StreamPipes client.
We use a message broker to route the data between the processing elements. This data is not persisted.
If you want to perform offline analytics on the data, it must be persisted in the data lake.
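
The difference can be sketched with a toy model in plain Python. This is not StreamPipes code: `InMemoryBroker`, `DataLakeSink`, and `run` are hypothetical names, and a real broker such as Kafka or NATS behaves differently in detail. The sketch only shows that events routed through the broker are gone afterwards unless a sink stores them.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]


class InMemoryBroker:
    """Toy broker (assumption): forwards events to subscribers and keeps nothing."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        # The event is handed to every subscriber and then dropped.
        for handler in self._subscribers:
            handler(event)


class DataLakeSink:
    """Toy sink (assumption): the only component that stores events."""

    def __init__(self) -> None:
        self.stored: List[Event] = []

    def __call__(self, event: Event) -> None:
        self.stored.append(dict(event))


def run(events: List[Event], with_sink: bool) -> List[Event]:
    """Publish all events and return what is available for offline analytics."""
    broker = InMemoryBroker()
    sink = DataLakeSink()
    if with_sink:
        broker.subscribe(sink)
    for event in events:
        broker.publish(event)
    return sink.stored
```

Calling `run(events, with_sink=False)` returns an empty list, while `run(events, with_sink=True)` returns every published event.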

GitHub link: https://github.com/apache/streampipes/discussions/1186#discussioncomment-4850296
