Posted to user@flink.apache.org by Saiph Kappa <sa...@gmail.com> on 2016/02/25 19:58:36 UTC

Counting tuples within a window in Flink Stream

Hi,

In Flink Streaming, what's the best way of counting the number of tuples within
a window of 10 seconds? Using a map-reduce task? I'm asking because in Spark
there is the method rawStream.countByWindow(Seconds(x)).

Thanks.

Re: Counting tuples within a window in Flink Stream

Posted by Stephan Ewen <se...@apache.org>.
Yes, Gyula, that should work. I would draw the random key from a range of
10 * parallelism values.
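
For illustration, a rough sketch of that key assignment (not part of the
original thread; it assumes the stream elements are Strings and that env is
the StreamExecutionEnvironment):

// Spread records over 10 * parallelism random keys so the per-key
// pre-aggregation uses all parallel subtasks.
final int numKeys = 10 * env.getParallelism();

DataStream<Tuple2<Integer, String>> withRandomKey = input
    .map(new MapFunction<String, Tuple2<Integer, String>>() {
        @Override
        public Tuple2<Integer, String> map(String value) throws Exception {
            // ThreadLocalRandom avoids shipping a Random instance with the function
            return new Tuple2<>(ThreadLocalRandom.current().nextInt(numKeys), value);
        }
    });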




On Fri, Feb 26, 2016 at 7:16 PM, Gyula Fóra <gy...@gmail.com> wrote:

> Hey,
>
> I am wondering if the following code will result in identical but more
> efficient (parallel):
>
> input.keyBy(assignRandomKey).window(Time.seconds(10)
> ).reduce(count).timeWindowAll(Time.seconds(10)).reduce(count)
>
> Effectively just assigning random keys to do the preaggregation and then
> do a window on the pre-aggregated values. I wonder if this actually leads
> to correct results or how does it interplay with the time semantics.
>
> Cheers,
> Gyula
>
> On Fri, Feb 26, 2016 at 7:10 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> True, at this point it does not pre-aggregate in parallel, that is
>> actually a feature on the list but not yet added...
>>
>> On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <sa...@gmail.com>
>> wrote:
>>
>>> That code will not run in parallel right? So, a map-reduce task would
>>> yield better performance no?
>>>
>>>
>>>
>>> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Then go for:
>>>>
>>>> input.timeWindowAll(Time.seconds(10)).fold(0, new
>>>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public
>>>> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception
>>>> { return integer + 1; } });
>>>>
>>>> Try to explore the API a bit, most things should be quite intuitive.
>>>> There are also some docs:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>>>
>>>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <sa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want
>>>>> to count all tuples that are contained in a window.
>>>>>
>>>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Saiph,
>>>>>>
>>>>>> you can do it the following way:
>>>>>>
>>>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>>>     @Override
>>>>>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>>>>         return integer + 1;
>>>>>>     }
>>>>>> });
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In Flink Stream what's the best way of counting the number of tuples
>>>>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>>>>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Re: Counting tuples within a window in Flink Stream

Posted by Gyula Fóra <gy...@gmail.com>.
Hey,

I am wondering if the following code would give identical but more
efficient (parallel) results:

input.keyBy(assignRandomKey).timeWindow(Time.seconds(10)).reduce(count)
     .timeWindowAll(Time.seconds(10)).reduce(count)

Effectively, this just assigns random keys to do the pre-aggregation and then
applies a window on the pre-aggregated values. I wonder whether this actually
leads to correct results, and how it interplays with the time semantics.

Cheers,
Gyula
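
A rough sketch of this two-phase pattern (not from the original message; it
assumes String elements, processing-time windows, and a key range of
10 * parallelism as Stephan suggests in his reply):

final int numKeys = 10 * env.getParallelism();

// Phase 1: pre-count per random key, in parallel.
DataStream<Tuple2<Integer, Long>> partialCounts = input
    .map(new MapFunction<String, Tuple2<Integer, Long>>() {
        @Override
        public Tuple2<Integer, Long> map(String value) throws Exception {
            return new Tuple2<>(ThreadLocalRandom.current().nextInt(numKeys), 1L);
        }
    })
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .sum(1);

// Phase 2: combine the partial counts into one total per window.
// How the two 10-second windows line up is exactly the time-semantics
// question raised above.
DataStream<Tuple2<Integer, Long>> totals = partialCounts
    .timeWindowAll(Time.seconds(10))
    .sum(1);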

On Fri, Feb 26, 2016 at 7:10 PM, Stephan Ewen <se...@apache.org> wrote:

> True, at this point it does not pre-aggregate in parallel, that is
> actually a feature on the list but not yet added...
>
> On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <sa...@gmail.com>
> wrote:
>
>> That code will not run in parallel right? So, a map-reduce task would
>> yield better performance no?
>>
>>
>>
>> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Then go for:
>>>
>>> input.timeWindowAll(Time.seconds(10)).fold(0, new
>>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public
>>> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception
>>> { return integer + 1; } });
>>>
>>> Try to explore the API a bit, most things should be quite intuitive.
>>> There are also some docs:
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>>
>>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <sa...@gmail.com>
>>> wrote:
>>>
>>>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to
>>>> count all tuples that are contained in a window.
>>>>
>>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Saiph,
>>>>>
>>>>> you can do it the following way:
>>>>>
>>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>>     @Override
>>>>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>>>         return integer + 1;
>>>>>     }
>>>>> });
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In Flink Stream what's the best way of counting the number of tuples
>>>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>>>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Counting tuples within a window in Flink Stream

Posted by Stephan Ewen <se...@apache.org>.
True, at this point it does not pre-aggregate in parallel; that is actually
a feature on the list but not yet added...

On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <sa...@gmail.com> wrote:

> That code will not run in parallel right? So, a map-reduce task would
> yield better performance no?
>
>
>
> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Then go for:
>>
>> input.timeWindowAll(Time.seconds(10)).fold(0, new
>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public
>> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception
>> { return integer + 1; } });
>>
>> Try to explore the API a bit, most things should be quite intuitive.
>> There are also some docs:
>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>
>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <sa...@gmail.com>
>> wrote:
>>
>>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to
>>> count all tuples that are contained in a window.
>>>
>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org>
>>> wrote:
>>>
>>>> Hi Saiph,
>>>>
>>>> you can do it the following way:
>>>>
>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>     @Override
>>>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>>         return integer + 1;
>>>>     }
>>>> });
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In Flink Stream what's the best way of counting the number of tuples
>>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Counting tuples within a window in Flink Stream

Posted by Saiph Kappa <sa...@gmail.com>.
That code will not run in parallel, right? So a map-reduce task would yield
better performance, no?



On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <se...@apache.org> wrote:

> Then go for:
>
> input.timeWindowAll(Time.seconds(10)).fold(0, new
> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public
> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception
> { return integer + 1; } });
>
> Try to explore the API a bit, most things should be quite intuitive.
> There are also some docs:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>
> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <sa...@gmail.com>
> wrote:
>
>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to
>> count all tuples that are contained in a window.
>>
>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>>> Hi Saiph,
>>>
>>> you can do it the following way:
>>>
>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>     @Override
>>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>         return integer + 1;
>>>     }
>>> });
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> In Flink Stream what's the best way of counting the number of tuples
>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>

Re: Counting tuples within a window in Flink Stream

Posted by Stephan Ewen <se...@apache.org>.
Then go for:

input.timeWindowAll(Time.seconds(10))
    .fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
        @Override
        public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
            return integer + 1;
        }
    });

Try to explore the API a bit; most things should be quite intuitive.
There are also some docs:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
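
For context, a minimal end-to-end sketch built around that snippet (not from
the original message; it assumes a text source on localhost:9999, e.g. fed by
nc -lk 9999 with one tuple per line, and the class name is made up):

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowTupleCount {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Illustrative source: every line typed into nc -lk 9999 is one element.
        DataStream<String> input = env.socketTextStream("localhost", 9999);

        // Count all elements per 10-second (processing-time) window.
        input.timeWindowAll(Time.seconds(10))
             .fold(0, new FoldFunction<String, Integer>() {
                 @Override
                 public Integer fold(Integer count, String value) throws Exception {
                     return count + 1;
                 }
             })
             .print();

        env.execute("Count tuples per 10s window");
    }
}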

On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <sa...@gmail.com> wrote:

> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to
> count all tuples that are contained in a window.
>
> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Hi Saiph,
>>
>> you can do it the following way:
>>
>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>     @Override
>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>         return integer + 1;
>>     }
>> });
>>
>> Cheers,
>> Till
>>
>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> In Flink Stream what's the best way of counting the number of tuples
>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>
>>> Thanks.
>>>
>>
>>
>

Re: Counting tuples within a window in Flink Stream

Posted by Saiph Kappa <sa...@gmail.com>.
Why the ".keyBy"? I don't want to count tuples by key. I simply want to
count all tuples that are contained in a window.

On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <tr...@apache.org> wrote:

> Hi Saiph,
>
> you can do it the following way:
>
> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>     @Override
>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>         return integer + 1;
>     }
> });
>
> Cheers,
> Till
>
> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> In Flink Stream what's the best way of counting the number of tuples
>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>
>> Thanks.
>>
>
>

Re: Counting tuples within a window in Flink Stream

Posted by Till Rohrmann <tr...@apache.org>.
Hi Saiph,

you can do it the following way:

input.keyBy(0).timeWindow(Time.seconds(10))
    .fold(0, new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
        @Override
        public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
            return integer + 1;
        }
    });

Cheers,
Till

On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <sa...@gmail.com> wrote:

> Hi,
>
> In Flink Stream what's the best way of counting the number of tuples
> within a window of 10 seconds? Using a map-reduce task? Asking because in
> spark there is the method rawStream.countByWindow(Seconds(x)).
>
> Thanks.
>