Posted to user@flink.apache.org by ebru <b2...@cs.hacettepe.edu.tr> on 2017/11/29 14:26:07 UTC

Dataset using several count operator in the same environment

Hi all,

We are trying to use more than one count operator on a DataSet, but it executes the first count and skips the other operations. We also call env.execute().
How can we solve this problem?

-Ebru

Re: Dataset using several count operator in the same environment

Posted by Timo Walther <tw...@apache.org>.

Hi Ebru,

The count() operator is a very simple utility function that calls 
execute() internally. If you want to have a more complex pipeline you 
can take a look at how our WordCount [0] example works. The general 
concept is to emit a 1 for every record and sum the ones in parallel. If 
you need an overall count, you need to set the parallelism of the last 
operator to 1 (operator(xxx).setParallelism(1)), but this means that 
your job is not executed in parallel anymore.
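
As a rough illustration of the "emit a 1 for every record and sum the ones" idea, here is a minimal sketch in plain Java. It uses java.util.stream rather than the Flink API, and the class and method names are made up for this example. In a Flink job the same shape would roughly be a map step followed by a sum or reduce step, as in the WordCount example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: illustrates counting by
// mapping each record to 1 and summing the ones.
public class CountByOnes {

    // Emit a 1 for every record and sum the ones. parallelStream() lets
    // partial sums be computed on several threads and then combined into
    // one overall result, similar in spirit to a final operator that
    // runs with parallelism 1.
    public static <T> long countByOnes(List<T> records) {
        return records.parallelStream()
                .mapToLong(record -> 1L)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(countByOnes(words));
    }
}
```

Because the count is an ordinary step in the pipeline here, several such counts can sit side by side without each one triggering a separate execution.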

It might also make sense to take a look at Flink's Table & SQL API [1] 
which makes such operations easier.

Hope that helps.

Regards,
Timo



[0] 
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/index.html

