Posted to issues@streampipes.apache.org by "flomickl (via GitHub)" <gi...@apache.org> on 2023/02/11 22:32:23 UTC
[I] Missing values from file set adapter to datalake (streampipes)
flomickl opened a new issue, #1269:
URL: https://github.com/apache/streampipes/issues/1269
### Body
Hi
The file set adapter loses some entries on the way to the data lake
(the CSV has 4 entries, but only 2 are saved in the data lake).
Is something wrong with this csv? I don't think so.
## How to reproduce:
1) Set up an adapter with this CSV as the file set, using CSV import (delimiter `,` and with header)
[distance.csv](https://github.com/apache/streampipes/files/10714833/distance.csv)
2) Create a simple pipeline with the adapter connected directly to the data lake.
It does not matter whether processors are involved!
## What happens:
Only 2 values are stored in the data lake: IDs 1 and 4.
IDs 2 and 3 are missing.
## What is expected
All four values are stored in the data lake.
### StreamPipes Committer
I acknowledge that I am a maintainer/committer of the Apache StreamPipes project.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@streampipes.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] Missing values from file set adapter to datalake (streampipes)
Posted by "tenthe (via GitHub)" <gi...@apache.org>.
tenthe commented on issue #1269:
URL: https://github.com/apache/streampipes/issues/1269#issuecomment-1427943977
Hello @flomickl,
the problem with the CSV file is that it has no column for the "timestamp".
When you replay the data, it is streamed as fast as it can be read. This means that all events are created almost at once, and a timestamp is appended to each one in the adapter.
It is very likely that multiple events will have the same timestamp and since this is the index in the data lake, only one of these events will be stored.
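The overwrite behaviour described here can be illustrated with a toy model. This is only a sketch, not StreamPipes or data lake code: the function name `store_events`, the field names, and the timestamp values are all made up for illustration, and a plain dictionary keyed by timestamp stands in for the time-indexed storage.

```python
def store_events(events):
    """Toy model of a store indexed by timestamp only (hypothetical helper)."""
    store = {}
    for event in events:
        # A later event with the same timestamp replaces the earlier one.
        store[event["timestamp"]] = event
    return store


# Made-up timestamps (milliseconds): events 2, 3 and 4 fall into the same millisecond.
events = [
    {"timestamp": 1676154743000, "id": 1},
    {"timestamp": 1676154743001, "id": 2},
    {"timestamp": 1676154743001, "id": 3},
    {"timestamp": 1676154743001, "id": 4},
]

stored = store_events(events)
stored_ids = sorted(event["id"] for event in stored.values())
print(stored_ids)
```

How many events survive depends on how the timestamps happen to fall, which would also fit the observation that different runs keep a different number of records.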
One possible solution would be to add a timestamp to the raw events. Another solution is to mark one of the properties as a dimension property, since one event can be stored for each dimension even if it has the same timestamp.
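For the first option, a minimal sketch of adding a timestamp column to the raw CSV before uploading it could look like this. The column names `id` and `distance`, the helper name `add_timestamps`, and the start time and step are assumptions for illustration; the real columns of `distance.csv` may differ.

```python
import csv
import io


def add_timestamps(csv_text, start_ms, step_ms=1000, column="timestamp"):
    """Return the CSV text with a leading timestamp column (hypothetical helper).

    Each row gets a distinct value: start_ms, start_ms + step_ms, ...
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = [column] + list(reader.fieldnames)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for i, row in enumerate(reader):
        row[column] = start_ms + i * step_ms
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data standing in for the attached file.
sample = "id,distance\n1,10\n2,20\n3,30\n4,40\n"
result = add_timestamps(sample, start_ms=1676154743000)
print(result)
```

With distinct values in that column, it can then be selected as the timestamp field when configuring the adapter, so no two events share the same index.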
I hope this is helpful.
Thanks a lot!
Philipp
Re: [I] Missing values from file set adapter to datalake (streampipes)
Posted by "bossenti (via GitHub)" <gi...@apache.org>.
bossenti commented on issue #1269:
URL: https://github.com/apache/streampipes/issues/1269#issuecomment-1427091417
I could reproduce the issue, although I received 3 out of 4 records 😅
The data lake export is attached:
[2023-02-12_test-reproduce_all.csv](https://github.com/apache/streampipes/files/10716778/2023-02-12_test-reproduce_all.csv)
Re: [I] Missing values from file set adapter to datalake
Posted by "bossenti (via GitHub)" <gi...@apache.org>.
bossenti closed issue #1269: Missing values from file set adapter to datalake
URL: https://github.com/apache/streampipes/issues/1269