Posted to user@flink.apache.org by Florin Dinu <fl...@epfl.ch> on 2017/09/19 16:00:16 UTC

the design of spilling to disk

Hello everyone,


In our group at EPFL we're doing research on understanding and potentially improving the performance of data-parallel frameworks that use secondary storage.

I was looking at the Flink code to understand how spilling to disk actually works.

So far I have found UnilateralSortMerger.java and its spilling and reading threads. I also saw that some spilling markers are used.

I am curious if there is any design document available on this topic.

I was not able to find much online.

If there is no such design document, I would appreciate it if someone could help me understand how these spilling markers are used.

At a higher level, I am trying to understand how much data Flink spills to disk once it has concluded that it needs to spill.


Thank you very much

Florin Dinu

RE: the design of spilling to disk

Posted by "Newport, Billy" <Bi...@gs.com>.
Don’t forget that there is also spilling/serialization between stages of the pipeline if operators cannot be chained.





Re: the design of spilling to disk

Posted by Kurt Young <yk...@gmail.com>.
Copied from my earlier response to a similar question:

"Here is a short description of how it works: there are three threads in total
working together, one for reading, one for sorting partial data in memory,
and the last one responsible for spilling. Flink will first figure out
how much memory it can use during the in-memory sort, and manage it as
MemorySegments. Once this memory runs out, the sorting thread takes
over the memory and does the in-memory sorting (for more details about
in-memory sorting, see NormalizedKeySorter). After this, the
spilling thread writes the sorted data to disk and makes the memory
available again for reading. This is repeated until all the data has been
processed.
Normally, the data will be read twice (once from the source, once from disk)
and written once, but if too many files were spilled, Flink will first merge
some of the files to make sure the last merge step does not exceed a
limit (128 files by default). Hope this can help you."

Best,
Kurt
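Kurt's read/sort/spill cycle can be sketched as a plain external merge sort. This is an illustrative sketch only, not Flink's actual UnilateralSortMerger: the buffer size stands in for the MemorySegment budget, and runs are spilled as one sorted temp file each, with intermediate merges keeping the fan-in under the default cap of 128 files.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Illustrative external merge sort (NOT Flink's actual implementation):
// fill a fixed-size buffer, sort it in memory, spill each sorted run to
// disk, then merge the runs with a capped fan-in.
public class ExternalSortSketch {
    static final int BUFFER_SIZE = 4;   // stand-in for the memory budget
    static final int MAX_FAN_IN = 128;  // mirrors Flink's default merge limit

    public static List<Integer> externalSort(List<Integer> input) throws IOException {
        // Phase 1: produce sorted runs, spilling each one to a temp file.
        List<Path> runs = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>(BUFFER_SIZE);
        for (int value : input) {
            buffer.add(value);
            if (buffer.size() == BUFFER_SIZE) {
                runs.add(spill(buffer));  // sort + write, then reuse the buffer
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) runs.add(spill(buffer));

        // Phase 2: if there are too many runs, merge in groups first so the
        // final merge step never exceeds MAX_FAN_IN files.
        while (runs.size() > MAX_FAN_IN) {
            List<Path> next = new ArrayList<>();
            for (int i = 0; i < runs.size(); i += MAX_FAN_IN)
                next.add(mergeToFile(runs.subList(i, Math.min(i + MAX_FAN_IN, runs.size()))));
            runs = next;
        }
        return mergeToList(runs);
    }

    // Sort one run in memory and write it to a temp file.
    static Path spill(List<Integer> buffer) throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("run", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (int v : buffer) { w.write(Integer.toString(v)); w.newLine(); }
        }
        return file;
    }

    // k-way merge of sorted run files via a min-heap of (value, reader index).
    static List<Integer> mergeToList(List<Path> runs) throws IOException {
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
        List<BufferedReader> readers = new ArrayList<>();
        for (Path run : runs) {
            BufferedReader r = Files.newBufferedReader(run);
            readers.add(r);
            String line = r.readLine();
            if (line != null) heap.add(new int[]{Integer.parseInt(line), readers.size() - 1});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            String line = readers.get(top[1]).readLine();
            if (line != null) heap.add(new int[]{Integer.parseInt(line), top[1]});
        }
        for (BufferedReader r : readers) r.close();
        return out;
    }

    // Intermediate merge step: merge a group of runs into one new run file.
    static Path mergeToFile(List<Path> runs) throws IOException {
        Path file = Files.createTempFile("merged", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (int v : mergeToList(runs)) { w.write(Integer.toString(v)); w.newLine(); }
        }
        return file;
    }
}
```

With BUFFER_SIZE = 4, sorting ten elements produces three spilled runs (4 + 4 + 2) that are merged in a single final pass, matching the "read twice, write once" pattern Kurt describes.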


Re: the design of spilling to disk

Posted by Florin Dinu <fl...@epfl.ch>.
Hi Kostas,


Thank you for the quick reply and the tips. I will check them out!


I would like to start by understanding the way secondary storage is used in batch processing.

If you guys have additional pointers on that, it would certainly help me a lot.


Thanks again,

Florin




Re: the design of spilling to disk

Posted by Kostas Kloudas <k....@data-artisans.com>.
Hi Florin,

Unfortunately, there is no design document.

The UnilateralSortMerger.java is used in the batch processing mode (not in streaming) and,
in fact, the code dates back some years. I also cc Fabian, as he may have more to say on this.

Now for the streaming side, Flink uses three state backends: in-memory (no spilling, mainly useful for testing),
filesystem, and RocksDB (the latter two eventually spill to disk, but in different ways). It also supports incremental
checkpoints, i.e. at each checkpoint it only stores the diff between checkpoint[i] and checkpoint[i-1].

For more information on Flink state and state backends, check out the latest talk from Stefan Richter at
Flink Forward Berlin 2017 (https://www.youtube.com/watch?v=dWQ24wERItM).

Cheers,
Kostas
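The incremental-checkpoint idea Kostas mentions, storing only the diff between checkpoint[i] and checkpoint[i-1], can be illustrated with a toy sketch. This is not Flink's implementation (the class name and structure are invented for illustration), and for simplicity it ignores key deletions:

```java
import java.util.*;

// Toy illustration of incremental checkpointing (NOT Flink's code):
// each checkpoint records only the entries that changed since the
// previous checkpoint; the full state is recovered by replaying the
// diffs in order. Deletions are ignored for simplicity.
public class IncrementalCheckpointSketch {
    private final Map<String, String> state = new HashMap<>();
    private Map<String, String> lastSnapshot = new HashMap<>();
    private final List<Map<String, String>> checkpoints = new ArrayList<>();

    public void put(String key, String value) { state.put(key, value); }

    // Store only the diff against the previous checkpoint.
    public void checkpoint() {
        Map<String, String> diff = new HashMap<>();
        for (Map.Entry<String, String> e : state.entrySet())
            if (!e.getValue().equals(lastSnapshot.get(e.getKey())))
                diff.put(e.getKey(), e.getValue());
        checkpoints.add(diff);
        lastSnapshot = new HashMap<>(state);
    }

    // Recover the full state by applying all diffs in order.
    public Map<String, String> restore() {
        Map<String, String> restored = new HashMap<>();
        for (Map<String, String> diff : checkpoints) restored.putAll(diff);
        return restored;
    }
}
```

The trade-off this sketch shows is the same one Flink's RocksDB incremental checkpoints make: each checkpoint is small (only changed entries), but recovery must reassemble the state from the chain of diffs.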
