You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "m@xi" <ma...@gmail.com> on 2018/01/31 13:27:27 UTC

Re: Maintain heavy hitters in Flink application

Hello everyone and Happy New Year!

Regarding the Heavy Hitter tracking...I wanna do it in a distributed manner. 

Thus,
1 -- Round Robin the input stream to a number of parallel map instances (say
p = env.parallelism)
2 -- Each one of the p mappers maintains approximately the HH of its
corresponding portion of the input, utilizing an algorithm like Space
Saving, Misha-Gries etc etc.
3 -- Every now and then I would like to concatenate the state of all the p
mappers into one, thus producing the global Space Saving summary for the
entire input stream.
4 -- Due to the fact that I wanna balance out things given to the p mappers
in the beginning, I wanna use rebalance(), i.e. round robin algorithm -->
Thus, its is not possible to use Keyed State.
5 -- So, I am going to use ListCheckpointed state as described in [1].
6 -- When the "every now and then" happens, I wanna merge the partial
summaries and I will emit them through a side output, as described in [2].

The question is the following: [1] shows an example of state-redistribution.
So...can I change the parallelism of the p instance parallel .map() from
within the operator, and merge the summaries for the HH there just before
emitting them to the side output???

Essentially, how should I implement the 6th bullet is my question.

Any advice, on it or on the general guideline implementation for getting the
aforementioned thing done, is more than welcome.

Cheer,
Max

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/side_output.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Maintain heavy hitters in Flink application

Posted by "m@xi" <ma...@gmail.com>.
Hi Timo,

Thanks a lot for the advice. I am working on it.

Cheers,
Max



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Maintain heavy hitters in Flink application

Posted by Timo Walther <tw...@apache.org>.
Hi,

I think it would be easier to implement a custom key selector and 
introduce some artifical key that spreads the load more evenly. This 
would also allow you to use keyed state. You could use a ProcessFunction 
and set timers to define the "every now and then". Keyed state would 
also ease the state redistribution in case the parallelism changes. 
Maybe could could also do the summary merge in some downstream 
operators. Maybe this talk [1] gives you some additional inspiration.

Regards,
Timo

[1] https://www.youtube.com/watch?v=Do7C4UJyWCM



Am 2/1/18 um 9:31 AM schrieb m@xi:
> Anyone, someone, somebody?
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



Re: Maintain heavy hitters in Flink application

Posted by "m@xi" <ma...@gmail.com>.
Anyone, someone, somebody? 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/