You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Biplob Biswas <re...@gmail.com> on 2016/06/06 08:30:37 UTC

Data Source Generator emits 4 instances of the same tuple

Hi,

I am using a Data Source Generator, which very closely resembles the one
here on the dataartisans github page

https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java

This essentially reads from a file and then emits the data which is then
consumed by my flink program.

But when I run this, I get 4 instances of the same tuple, so if I have 10
samples, I am getting 40 samples.

I am not really sure what's the problem here, if needed I can post my code
as well.

Thanks and Regards
Biplob Biswas



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Data Source Generator emits 4 instances of the same tuple

Posted by Biplob Biswas <re...@gmail.com>.
Yes Thanks a lot, also the fact that I was using ParallelSourceFunction was
problematic. So as suggested by Fabian and Robert, I used Source Function
and then in the flink job, i set the output of map with a parallelism of 4
to get the desired result.

Thanks again.



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392p7513.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Data Source Generator emits 4 instances of the same tuple

Posted by Fabian Hueske <fh...@gmail.com>.
We solved this problem yesterday at the Flink Hackathon.
The issue was that the source function was started with parallelism 4 and
each function read the whole file.

Cheers, Fabian

2016-06-06 16:53 GMT+02:00 Biplob Biswas <re...@gmail.com>:

> Hi,
>
> I tried streaming the source data 2 ways
>
> 1. Is a simple straight forward way of sending data without using the
> serving speed concept
> http://pastebin.com/cTv0Pk5U
>
>
> 2. The one where I use the TaxiRide source which is exactly similar except
> loading the data in the proper data structures.
> http://pastebin.com/NenvXShH
>
>
> I hope to get a solution out of it.
>
> Thanks and Regards
> Biplob Biswas
>
>
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392p7405.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>

Re: Data Source Generator emits 4 instances of the same tuple

Posted by Biplob Biswas <re...@gmail.com>.
Hi,

I tried streaming the source data 2 ways 

1. Is a simple straight forward way of sending data without using the
serving speed concept
http://pastebin.com/cTv0Pk5U


2. The one where I use the TaxiRide source which is exactly similar except
loading the data in the proper data structures.
http://pastebin.com/NenvXShH


I hope to get a solution out of it.

Thanks and Regards
Biplob Biswas





--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392p7405.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Data Source Generator emits 4 instances of the same tuple

Posted by Ufuk Celebi <uc...@apache.org>.
Can you please post your code as well? The duplication might happen in
your part of the code and not the TaxiRideSource.

– Ufuk


On Mon, Jun 6, 2016 at 10:30 AM, Biplob Biswas <re...@gmail.com> wrote:
> Hi,
>
> I am using a Data Source Generator, which very closely resembles the one
> here on the dataartisans github page
>
> https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java
>
> This essentially reads from a file and then emits the data which is then
> consumed by my flink program.
>
> But when I run this, I get 4 instances of the same tuple, so if I have 10
> samples, I am getting 40 samples.
>
> I am not really sure what's the problem here, if needed I can post my code
> as well.
>
> Thanks and Regards
> Biplob Biswas
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.