Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2019/03/21 16:43:05 UTC

Flink and sketches

Hi to all,
I am looking for approx_count and freq_item functionality in Flink, and I'm
not sure which road to follow.
So far I have found two promising options:

   1. Wait for STREAMLINE to release their HLL_DISTINCT_COUNT code [1]
   2. Use the Yahoo Datasketches lib [2], following the example of Tobias
   Lindener [3][4] (and maybe release a better, reusable third-party lib
   for Flink)
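
To make option 2 a bit more concrete, here is a rough, dependency-free
sketch of the kind of mergeable accumulator I have in mind. It is NOT the
Datasketches API and not STREAMLINE's code: the class name TinyHll, the
hash function and the precision parameter p are all made up for
illustration. It is a textbook HyperLogLog with a linear-counting fallback
for small cardinalities. The add / merge / estimate methods are meant to
line up with the add / merge / getResult methods of Flink's
AggregateFunction, but that wiring is an assumption and is not shown.

```java
/** Minimal HyperLogLog accumulator (illustrative; hypothetical names throughout). */
final class TinyHll {
    private final int p;             // precision: number of bits used as register index
    private final int m;             // number of registers, m = 2^p
    private final byte[] registers;  // each register keeps the largest rank seen

    TinyHll(int p) {
        // The alpha constant used in estimate() is only valid for m >= 128.
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** 64-bit hash: FNV-1a over the chars, then the MurmurHash3 fmix64 finalizer. */
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Record one item; adding the same item again never changes the state. */
    void add(String item) {
        long h = hash64(item);
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining 64 - p bits, left-aligned
        // rank = 1-based position of the first 1 bit in the remaining bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Merge another sketch into this one (register-wise max). */
    void merge(TinyHll other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) registers[i] = other.registers[i];
        }
    }

    /** Approximate number of distinct items added so far. */
    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * m && zeros > 0) return m * Math.log((double) m / zeros);
        return raw;
    }
}
```

Because merge is just a register-wise max, partial sketches built on
different partitions or in different windows can be combined in any order,
which is what makes this kind of structure a natural fit for a Flink
aggregate. With p = 12 (4096 registers) the expected relative error is
about 1.04 / sqrt(4096), i.e. roughly 1.6%.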

What do you advise? Is there any other ongoing effort on approximate
statistics?

Best,
Flavio

[1] https://h2020-streamline-project.eu/wp-content/uploads/2018/10/Streamline-D5.5-Final.pdf
[2] https://datasketches.github.io
[3] https://github.com/tlindener/ApproximateQueries/
[4] https://www.slideshare.net/SeattleApacheFlinkMeetup/approximate-queries-and-graph-streams-on-apache-flink-theodore-vasiloudis-seattle-apache-flink-meetup