Posted to user@flink.apache.org by Mihail Vieru <vi...@informatik.hu-berlin.de> on 2015/05/13 20:02:27 UTC

write data set to a single file

Hi,

I need to write a data set to a single file without setting the 
parallelism to 1.
How can I achieve this?

Cheers,
Mihail

P.S.: It's for persisting intermediate results in loops and reading 
those in the next iteration.
The loops, btw, do work for higher iteration counts with explicit persistence.

Re: write data set to a single file

Posted by Mihail Vieru <vi...@informatik.hu-berlin.de>.
Awesome, it works. Thanks! :)

On 13.05.2015 20:05, Stephan Ewen wrote:
> If you want to write a single file, you need to write it with one 
> task. So, you can run a program with parallelism 100 and just set the 
> sink operator to parallelism 1.
>
> You can set the parallelism of each individual operator by calling 
> "setParallelism()" after the operation, for example 
> "result.writeAsText(path).setParallelism(1)".


Re: write data set to a single file

Posted by Stephan Ewen <se...@apache.org>.
If you want to write a single file, you need to write it with one task. So,
you can run a program with parallelism 100 and just set the sink operator
to parallelism 1.

You can set the parallelism of each individual operator by calling
"setParallelism()" after the operation, for example
"result.writeAsText(path).setParallelism(1)".
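
To illustrate the principle behind this (many parallel workers, but exactly
one task that writes), here is a minimal sketch in plain Java. It does not
use the Flink API; the class SingleWriterSketch and the helper
computeInParallelWriteOnce are hypothetical names made up for this example.
The thread pool stands in for the parallel part of the program, and the
single writer loop stands in for a sink with parallelism 1.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Flink code: the computation runs on
// several threads, but only one thread ever writes, so one file results.
class SingleWriterSketch {

    // Computes "i,i*i" for i in [0, count) using `parallelism` worker
    // threads, then writes every line to `out` from the calling thread.
    static void computeInParallelWriteOnce(int count, int parallelism, Path out)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int p = 0; p < parallelism; p++) {
                final int partition = p;
                // Each worker handles the indices partition, partition + parallelism, ...
                Callable<List<String>> task = () -> {
                    List<String> lines = new ArrayList<>();
                    for (int i = partition; i < count; i += parallelism) {
                        lines.add(i + "," + (long) i * i);
                    }
                    return lines;
                };
                parts.add(pool.submit(task));
            }
            // The "sink with parallelism 1": a single writer drains all partitions.
            try (BufferedWriter writer = Files.newBufferedWriter(out)) {
                for (Future<List<String>> part : parts) {
                    for (String line : part.get()) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("single-sink", ".txt");
        computeInParallelWriteOnce(1000, 8, out);
        System.out.println(Files.readAllLines(out).size() + " lines in " + out);
    }
}
```

The trade-off is the same as with "result.writeAsText(path).setParallelism(1)":
the upstream work stays parallel, but all records funnel through one
writer, so the write step itself is sequential.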

