Posted to issues@streampipes.apache.org by "flomickl (via GitHub)" <gi...@apache.org> on 2023/02/11 22:32:23 UTC

[I] Missing values from file set adapter to datalake (streampipes)

flomickl opened a new issue, #1269:
URL: https://github.com/apache/streampipes/issues/1269

   ### Body
   
   Hi 
   
   The file set adapter is losing some entries in the datalake
   (starts with 4 entries, but only 2 are saved in the datalake)
   Is something wrong with this csv? I don't think so.
   
   ## How to reproduce:
   1) Setup of adapter with this csv as file set and csv import (delimiter , and header)
   [distance.csv](https://github.com/apache/streampipes/files/10714833/distance.csv)
   
   2) create simple pipeline with adapter and directly connected to datalake
   does not matter if there are processors involved!
   
   ## What happens:
   Only 2 values are stored in the datalake: IDs 1 and 4.
   IDs 2 and 3 are missing.
   
   ## What is expected 
   all four values are stored in the datalake
   
   ### StreamPipes Committer
   
   I acknowledge that I am a maintainer/committer of the Apache StreamPipes project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@streampipes.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Missing values from file set adapter to datalake (streampipes)

Posted by "tenthe (via GitHub)" <gi...@apache.org>.
tenthe commented on issue #1269:
URL: https://github.com/apache/streampipes/issues/1269#issuecomment-1427943977

   Hello @flomickl,
   the problem with the CSV file is that it has no column for the "timestamp".
   When you replay the data, it is streamed as fast as it can be read. This means that all events are created almost at once, and the adapter appends a timestamp to each of them.
   It is very likely that multiple events will have the same timestamp and since this is the index in the data lake, only one of these events will be stored.
   
   One possible solution would be to add a timestamp to the raw events. Another solution is to mark one of the properties as a dimension property, since one event can be stored for each dimension even if it has the same timestamp.
   
   I hope this is helpful.
   Thanks a lot!
   Philipp
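
The explanation above can be illustrated with a small toy model. This is only a sketch under stated assumptions: a plain Python dict stands in for the data lake, the `store` helper, the row values, and the millisecond timestamps are all hypothetical, and none of this is StreamPipes code. It shows why events that share a timestamp overwrite each other, and why either a dimension property or a distinct timestamp per event keeps all four.

```python
# Toy model only: a dict stands in for the data lake; this is not StreamPipes code.
from typing import Dict, Iterable, List, Tuple

Event = Dict[str, object]


def store(events: Iterable[Event], dimensions: Tuple[str, ...] = ()) -> List[Event]:
    """Keep one event per (timestamp, dimension values) key; later writes win."""
    lake: Dict[tuple, Event] = {}
    for event in events:
        key = (event["timestamp"],) + tuple(event[d] for d in dimensions)
        lake[key] = event  # same key: the previous event is overwritten
    return list(lake.values())


# Four hypothetical rows; the values are made up, not taken from distance.csv.
rows = [{"id": i, "distance": 10.0 * i} for i in range(1, 5)]

# Case 1: events are stamped so quickly that some share a millisecond (assumed).
stamps = [1000, 1000, 1000, 1001]
colliding = [dict(r, timestamp=t) for r, t in zip(rows, stamps)]
stored_plain = store(colliding)

# Case 2: same timestamps, but "id" is treated as a dimension property.
stored_with_dimension = store(colliding, dimensions=("id",))

# Case 3: each raw event carries its own distinct timestamp.
unique = [dict(r, timestamp=1000 + i) for i, r in enumerate(rows)]
stored_unique = store(unique)

print([e["id"] for e in stored_plain])
print(len(stored_with_dimension), len(stored_unique))
```

Which events survive in a real run depends on how the timestamps happen to fall, which would be consistent with one reporter seeing 2 of 4 records and another seeing 3 of 4.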




Re: [I] Missing values from file set adapter to datalake (streampipes)

Posted by "bossenti (via GitHub)" <gi...@apache.org>.
bossenti commented on issue #1269:
URL: https://github.com/apache/streampipes/issues/1269#issuecomment-1427091417

   I could reproduce the issue, although I received 3 out of 4 records 😅
   Export of data lake is attached
   [2023-02-12_test-reproduce_all.csv](https://github.com/apache/streampipes/files/10716778/2023-02-12_test-reproduce_all.csv)
   




Re: [I] Missing values from file set adapter to datalake

Posted by "bossenti (via GitHub)" <gi...@apache.org>.
bossenti closed issue #1269: Missing values from file set adapter to datalake
URL: https://github.com/apache/streampipes/issues/1269

