Posted to user@flink.apache.org by "Paschek, Robert" <ro...@tu-berlin.de> on 2016/05/05 13:29:46 UTC

Writing Intermediates to disk

Hi Mailing List,

I want to write and read intermediates to/from disk.
The following pseudocode snippet may illustrate my intention:

public void mapPartition(Iterable<T> tuples, Collector<T> out) {

    for (T tuple : tuples) {
        if (condition)
            out.collect(tuple);
        else
            writeTupleToDisk(tuple);
    }

    while (tupleOnDisk())
        out.collect(readNextTupleFromDisk());
}

I am wondering if Flink provides an integrated class for this purpose. I also have to precisely identify the files holding the intermediates, due to the parallelism of mapPartition.


Thank you in advance!
Robert

Re: Writing Intermediates to disk

Posted by "Paschek, Robert" <ro...@tu-berlin.de>.
Hey,



Thank you for your answers, and sorry for my late response :-(



The intention was to store some of the data on disk when the main memory gets full / my temporary ArrayList<Tuple> reaches a pre-defined size.



I used com.opencsv.CSVReader and com.opencsv.CSVWriter for this task, and getRuntimeContext().getIndexOfThisSubtask() to distinguish my filenames from those of other tasks running on the same machine.

Fortunately, that is no longer necessary for my work.
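
For reference, here is a minimal sketch of that approach; the Tuple2<String, Integer> element type, the spill threshold, and the file naming are placeholders I picked, not something Flink prescribes:

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;

// Sketch: buffer tuples in memory up to a threshold, spill the rest to a
// per-subtask CSV file, and emit the spilled tuples again at the end.
public class SpillingMapPartition
        extends RichMapPartitionFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> {

    private static final int SPILL_THRESHOLD = 100_000; // assumed in-memory limit

    @Override
    public void mapPartition(Iterable<Tuple2<String, Integer>> tuples,
                             Collector<Tuple2<String, Integer>> out) throws Exception {

        // One file per parallel instance, so tasks on the same machine don't clash.
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        File spillFile = new File(System.getProperty("java.io.tmpdir"),
                "intermediates-" + subtask + ".csv");

        List<Tuple2<String, Integer>> buffer = new ArrayList<>();
        try (CSVWriter writer = new CSVWriter(new FileWriter(spillFile))) {
            for (Tuple2<String, Integer> t : tuples) {
                if (buffer.size() < SPILL_THRESHOLD) {
                    buffer.add(t);                                              // keep in memory
                } else {
                    writer.writeNext(new String[]{t.f0, String.valueOf(t.f1)}); // spill to disk
                }
            }
        }

        // Emit the buffered tuples first ...
        for (Tuple2<String, Integer> t : buffer) {
            out.collect(t);
        }

        // ... then read the spilled tuples back and emit them as well.
        try (CSVReader reader = new CSVReader(new FileReader(spillFile))) {
            String[] fields;
            while ((fields = reader.readNext()) != null) {
                out.collect(new Tuple2<>(fields[0], Integer.parseInt(fields[1])));
            }
        }
        spillFile.delete();
    }
}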



Best

Robert




________________________________
From: Vikram Saxena <vi...@gmail.com>
Sent: Monday, 9 May 2016 12:15
To: user@flink.apache.org
Subject: Re: Writing Intermediates to disk

I do not know if I understand completely, but I would create a new DataSet based on filtering the condition and then persist this DataSet.

So:

DataSet ds2 = dataSet1.filter(condition);

ds2.output(...);
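
For example, a minimal sketch of that idea; the Tuple2<String, Integer> input, the predicate, and the output path are placeholders:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class FilterAndPersist {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; in practice this is the real DataSet.
        DataSet<Tuple2<String, Integer>> ds1 = env.fromElements(
                Tuple2.of("keep", 1), Tuple2.of("spill", -1));

        // Keep only the tuples that match the condition ...
        DataSet<Tuple2<String, Integer>> ds2 = ds1.filter(t -> t.f1 > 0);

        // ... and persist them to disk.
        ds2.writeAsCsv("file:///tmp/filtered-intermediates");

        env.execute("filter and persist intermediates");
    }
}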




On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi <uc...@apache.org> wrote:
Flink has support for spillable intermediate results. Currently they
are only set if necessary to avoid pipeline deadlocks.

You can force this via

env.getConfig().setExecutionMode(ExecutionMode.BATCH);

This will write shuffles to disk, but you don't get the fine-grained
control you probably need for your use case.
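
For illustration, a minimal sketch of turning this on for a whole job (the job itself is a placeholder):

import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchExchangeJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Force batch data exchange: shuffles are materialized on disk
        // instead of being pipelined.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        env.fromElements(1, 2, 3, 4)
           .filter(i -> i % 2 == 0)
           .print(); // print() triggers execution of the job
    }
}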

- Ufuk
