Posted to user@flink.apache.org by Alieh <sa...@informatik.uni-leipzig.de> on 2017/09/04 14:08:30 UTC

count + aggregation

Hello all,

1st question:
Is there any way to get the count or the content of a Flink DataSet
without calling count() or collect()? The problem is that I have a loop
whose number of iterations depends on the count of a DataSet. Calling
count() may force the whole pipeline to be executed again. I would
prefer not to use delta or bulk iterations.

2nd question:

Using "Aggregations.Max" on the second field of a DataSet of
Tuple2<String, Integer>, I observed that the second field of the result
is the true maximum of the whole dataset, but the first field is not the
one that belongs to that maximum!

Best,
Alieh

Re: count + aggregation

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Alieh,

I'm not aware of a solution to the first problem, but for the second issue
you should use maxBy() instead of max().
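
To illustrate the difference, here is a minimal plain-Java sketch that does
not use Flink itself. The names Pair, aggregateMax and selectMaxBy are
hypothetical stand-ins made up for this example. aggregateMax imitates a
field-wise aggregation such as max(1), where only the aggregated field is
meaningful and the other field comes from an arbitrary record (assumed here
to be the last one seen). selectMaxBy imitates maxBy(1), which picks the
whole record holding the maximum.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MaxVsMaxBy {

    // Hypothetical stand-in for Tuple2<String, Integer>; f0/f1 mirror the tuple fields.
    static final class Pair {
        final String f0;
        final int f1;

        Pair(String f0, int f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + ")";
        }
    }

    // Field-wise aggregation: only f1 is aggregated. f0 is NOT tied to the
    // maximum; this sketch assumes it is simply taken from the last record.
    static Pair aggregateMax(List<Pair> data) {
        String f0 = null;
        int max = Integer.MIN_VALUE;
        for (Pair p : data) {
            f0 = p.f0;
            max = Math.max(max, p.f1);
        }
        return new Pair(f0, max);
    }

    // Record selection: returns the complete record whose f1 is the maximum.
    static Pair selectMaxBy(List<Pair> data) {
        return data.stream()
                .max(Comparator.comparingInt((Pair p) -> p.f1))
                .get();
    }

    public static void main(String[] args) {
        List<Pair> data = Arrays.asList(
                new Pair("a", 3), new Pair("b", 7), new Pair("c", 5));
        System.out.println(aggregateMax(data)); // prints (c,7)
        System.out.println(selectMaxBy(data));  // prints (b,7)
    }
}
```

In an actual DataSet program the corresponding calls would be
dataSet.max(1) and dataSet.maxBy(1) on the Tuple2 data set.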

Best, Fabian
