You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ashish Attarde <as...@gmail.com> on 2018/03/01 08:21:53 UTC
Fwd: Hi Flink Team
Hi,
I am new to Flink and in general data processing using stream processors.
I am using flink to do real time correlation between multiple records which
are coming as part of same stream. I am doing is "apply" operation on
TimeWindowed stream. When I submit job with parallelism factor of 4, I am
still seeing apply operation is applied with parallelism factor of 1.
Here is the peice of code :
parsedInput.keyBy("mflowHash")
.timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
.allowedLateness(Time.seconds(10))
.apply(new CRWindow());
I am trying to correlate 2 streams, what is the right way to do it? I tried
the CEP library and experienced the worst performance. It is taking ~4
minutes to do the correlation. The corelation logic is very simple and not
compute intensive.
--
Thanks
-Ashish Attarde
--
Thanks
-Ashish Attarde
Re: Hi Flink Team
Posted by Ashish Attarde <as...@gmail.com>.
Thanks Piotrek for your response. Teena responsed for same. I am
implementing changes to try it out.
Yes, Originally I did call keyBy for same reason so that I can parallelize
the operation.
On Thu, Mar 1, 2018 at 1:24 AM, Piotr Nowojski <pi...@data-artisans.com>
wrote:
> Hi,
>
> timeWindowAll is a non parallel operation, since it gathers all of the
> elements and process them together:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/
> apache/flink/streaming/api/datastream/DataStream.html#
> timeWindowAll-org.apache.flink.streaming.api.windowing.
> time.Time-org.apache.flink.streaming.api.windowing.time.Time-
>
> Note that it’s defined in DataStream, not in the KeyedStream.
>
> In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use
> KeyedStream#timeWindows method?
>
> Piotrek
>
> On 1 Mar 2018, at 09:21, Ashish Attarde <as...@gmail.com> wrote:
>
> Hi,
>
> I am new to Flink and in general data processing using stream processors.
>
> I am using flink to do real time correlation between multiple records
> which are coming as part of same stream. I am doing is "apply" operation on
> TimeWindowed stream. When I submit job with parallelism factor of 4, I am
> still seeing apply operation is applied with parallelism factor of 1.
>
> Here is the peice of code :
>
> parsedInput.keyBy("mflowHash")
> .timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
> .allowedLateness(Time.seconds(10))
> .apply(new CRWindow());
>
>
> I am trying to correlate 2 streams, what is the right way to do it? I
> tried the CEP library and experienced the worst performance. It is taking
> ~4 minutes to do the correlation. The corelation logic is very simple and
> not compute intensive.
>
>
> --
>
> Thanks
> -Ashish Attarde
>
>
>
> --
>
> Thanks
> -Ashish Attarde
>
>
>
--
Thanks
-Ashish Attarde
Re: Hi Flink Team
Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,
timeWindowAll is a non parallel operation, since it gathers all of the elements and process them together:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#timeWindowAll-org.apache.flink.streaming.api.windowing.time.Time-org.apache.flink.streaming.api.windowing.time.Time- <https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#timeWindowAll-org.apache.flink.streaming.api.windowing.time.Time-org.apache.flink.streaming.api.windowing.time.Time->
Note that it’s defined in DataStream, not in the KeyedStream.
In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use KeyedStream#timeWindows method?
Piotrek
> On 1 Mar 2018, at 09:21, Ashish Attarde <as...@gmail.com> wrote:
>
> Hi,
>
> I am new to Flink and in general data processing using stream processors.
>
> I am using flink to do real time correlation between multiple records which are coming as part of same stream. I am doing is "apply" operation on TimeWindowed stream. When I submit job with parallelism factor of 4, I am still seeing apply operation is applied with parallelism factor of 1.
>
> Here is the peice of code :
>
> parsedInput.keyBy("mflowHash")
> .timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
> .allowedLateness(Time.seconds(10))
> .apply(new CRWindow());
>
> I am trying to correlate 2 streams, what is the right way to do it? I tried the CEP library and experienced the worst performance. It is taking ~4 minutes to do the correlation. The corelation logic is very simple and not compute intensive.
>
>
> --
>
> Thanks
> -Ashish Attarde
>
>
>
> --
>
> Thanks
> -Ashish Attarde