Posted to user@flink.apache.org by Alieh <sa...@informatik.uni-leipzig.de> on 2017/09/04 14:08:30 UTC
count + aggregation
Hello all,
1st question:
Is there any way to know the count or the content of a Flink DataSet
without using count() or collect()? The problem is that I have a loop
whose number of iterations depends on the count of a DataSet, and calling
count() may force the whole pipeline to be executed again. I would prefer
not to use delta or bulk iterations.
2nd question:
When I use Aggregations.MAX on the second field of a DataSet of
Tuple2<String, Integer>, the second field of the result is the real
maximum of the whole dataset, but the first field is not the one that
belongs to that maximum!
Best,
Alieh
Re: count + aggregation
Posted by Fabian Hueske <fh...@gmail.com>.
Hi Alieh,
I'm not aware of a solution to the first problem, but for the second issue
you should use maxBy() instead of max().
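To illustrate why max() mismatches the fields while maxBy() does not, here is a minimal plain-Java model of the two semantics (no Flink dependency). It assumes, as a simplification, that the field-wise aggregation copies the non-aggregated field from the last record it processes; Flink does not promise which record that field comes from. The Pair class and the fieldWiseMax/maxBy methods are hypothetical stand-ins, not Flink API. In actual Flink code the change is from data.max(1) (or data.aggregate(Aggregations.MAX, 1)) to data.maxBy(1).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model (hypothetical, not Flink API) contrasting field-wise
// aggregation with selecting the whole record that holds the maximum.
final class MaxVsMaxBy {

    // Hypothetical stand-in for Flink's Tuple2<String, Integer>.
    static final class Pair {
        final String f0;
        final int f1;

        Pair(String f0, int f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        // Mimics the "(f0,f1)" style of Tuple2.toString().
        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + ")";
        }
    }

    // Models aggregate(Aggregations.MAX, 1): only f1 is aggregated.
    // Assumption of this model: f0 is copied from the last record seen,
    // so it need not belong to the record holding the maximum.
    static Pair fieldWiseMax(List<Pair> input) {
        Pair last = null;
        int max = Integer.MIN_VALUE;
        for (Pair p : input) {
            last = p;
            max = Math.max(max, p.f1);
        }
        if (last == null) throw new IllegalArgumentException("empty input");
        return new Pair(last.f0, max);
    }

    // Models maxBy(1): returns one complete input record whose f1 is
    // maximal, so f0 and f1 always come from the same record.
    // In this model, ties keep the first maximal record encountered.
    static Pair maxBy(List<Pair> input) {
        Pair best = null;
        for (Pair p : input) {
            if (best == null || p.f1 > best.f1) best = p;
        }
        if (best == null) throw new IllegalArgumentException("empty input");
        return best;
    }

    public static void main(String[] args) {
        List<Pair> data = Arrays.asList(
                new Pair("a", 3), new Pair("b", 7), new Pair("c", 5));
        System.out.println(fieldWiseMax(data)); // prints (c,7)
        System.out.println(maxBy(data));        // prints (b,7)
    }
}
```

The first result reproduces the reported symptom: 7 is the true maximum, but "c" belongs to a different record. The second keeps the record intact.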
Best, Fabian