Posted to user@flink.apache.org by deepthi Sridharan <de...@gmail.com> on 2021/03/30 18:37:04 UTC

IO benchmarking

Hi,

I am trying to benchmark a couple of IO options for saving checkpoints
and have a few questions:

1. Does Flink come with any IO benchmarking tools? I couldn't find any. I
was hoping to use them to derive some insights into storage performance
and extrapolate from there to the checkpoint use case.

2. Are there any metrics pertaining to restoring from checkpoints? The only
metric I can find is the last restore time; neither the time it took to
read the checkpoints nor the time it took to restore the operator/task
states seems to be covered. I am using RocksDB, but couldn't find any
metrics on how long it took to restore the RocksDB state backend either.

3. I am trying to find documentation on how state from multiple operators
and tasks is serialized into the checkpoint files, so that I can tailor
the test case, but can't seem to find any. Are there any blogs that go
into this level of detail, or would reading the code be the only option?
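Absent a dedicated tool, a rough storage baseline for question 1 could come from a small plain-Java write test run against the intended checkpoint location. This is a minimal sketch under stated assumptions, not a Flink utility: the class `CheckpointWriteBench`, the 256 MiB total, and the 4 MiB chunk size are hypothetical choices. It writes random bytes through a `FileChannel`, calls `force(true)` so the flush to the device is part of the timing, and reports MB/s. It only works against a locally mounted path, and real checkpoint traffic (many subtasks writing concurrently, remote file systems) may behave quite differently, so the number is a baseline rather than a prediction.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

// Hypothetical micro-benchmark; not part of Flink.
public class CheckpointWriteBench {

    /** Writes totalBytes to a new file in dir in chunkSize pieces, forces it to disk, returns MB/s. */
    static double measureWriteMBps(Path dir, long totalBytes, int chunkSize) throws IOException {
        byte[] payload = new byte[chunkSize];
        new Random(42).nextBytes(payload); // random bytes so storage-side compression doesn't flatter results
        Path file = Files.createTempFile(dir, "bench-", ".bin");
        try {
            long start = System.nanoTime();
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                long written = 0;
                while (written < totalBytes) {
                    int len = (int) Math.min(chunkSize, totalBytes - written);
                    ByteBuffer buf = ByteBuffer.wrap(payload, 0, len);
                    while (buf.hasRemaining()) {
                        written += ch.write(buf);
                    }
                }
                ch.force(true); // include the flush to the device in the measurement
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return (totalBytes / (1024.0 * 1024.0)) / seconds;
        } finally {
            Files.deleteIfExists(file); // leave no benchmark debris behind
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the mount you plan to use for checkpoints; defaults to the temp dir.
        Path dir = Path.of(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long total = 256L * 1024 * 1024; // 256 MiB
        int chunk = 4 * 1024 * 1024;     // 4 MiB writes
        System.out.printf("write throughput: %.1f MB/s%n", measureWriteMBps(dir, total, chunk));
    }
}
```

Running it several times, and with different chunk sizes, gives a feel for how sensitive the storage is to write granularity.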

--
Thanks,
Deepthi

Re: IO benchmarking

Posted by Matthias Pohl <ma...@ververica.com>.
For 2., there are also efforts to expose state and operator
initialization through the logs (see FLINK-17012 [1]).
For 3., TypeSerializer [2] might be another point of interest. It is
used to serialize values of specific types. Beyond that, how state is
serialized depends heavily on the state backend in use. Hence, if you
rely on RocksDB as your state backend, you will want to look into
RocksDB's SSTables.
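To make the serializer's role concrete: Flink's `TypeSerializer` converts values of one type to bytes and back through `serialize(T, DataOutputView)` and `deserialize(DataInputView)`. The following is a simplified, hypothetical analog that uses only `java.io.DataOutput`/`DataInput`; `SimpleSerializer`, `CountEvent`, and `CountEventSerializer` are invented for illustration and are not Flink classes. Measuring the encoded size of a typical state entry this way can give a rough estimate of how many bytes a checkpoint has to move, before any backend-specific compression.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class SerializerSketch {

    /** Hypothetical, stripped-down analog of a type serializer (not Flink's TypeSerializer). */
    interface SimpleSerializer<T> {
        void serialize(T value, DataOutput out) throws IOException;
        T deserialize(DataInput in) throws IOException;
    }

    /** Example state value: a key with a running count (invented for illustration). */
    static final class CountEvent {
        final String key;
        final long count;
        CountEvent(String key, long count) { this.key = key; this.count = count; }
    }

    static final class CountEventSerializer implements SimpleSerializer<CountEvent> {
        @Override
        public void serialize(CountEvent value, DataOutput out) throws IOException {
            out.writeUTF(value.key);    // 2-byte length prefix + modified UTF-8 bytes
            out.writeLong(value.count); // fixed 8 bytes
        }

        @Override
        public CountEvent deserialize(DataInput in) throws IOException {
            String key = in.readUTF();  // must read fields in the same order they were written
            long count = in.readLong();
            return new CountEvent(key, count);
        }
    }

    /** Serializes one value to a byte array, e.g. to estimate per-entry state size. */
    static <T> byte[] toBytes(SimpleSerializer<T> ser, T value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            ser.serialize(value, out);
        }
        return bytes.toByteArray();
    }

    static <T> T fromBytes(SimpleSerializer<T> ser, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return ser.deserialize(in);
        }
    }

    public static void main(String[] args) throws IOException {
        CountEventSerializer ser = new CountEventSerializer();
        byte[] data = toBytes(ser, new CountEvent("user-1", 7L));
        CountEvent back = fromBytes(ser, data);
        System.out.println(data.length + " bytes -> " + back.key + "=" + back.count); // prints "16 bytes -> user-1=7"
    }
}
```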

[1] https://issues.apache.org/jira/browse/FLINK-17012
[2]
https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/main/java/org/apache/flink/api/common/typeutils/TypeSerializer.java

In reply to deepthi Sridharan <deepthi.sridharan@gmail.com>, Thu, Apr 1, 2021 at 1:27 AM.

Re: IO benchmarking

Posted by deepthi Sridharan <de...@gmail.com>.
Thanks, Matthias. This is very helpful.

Regarding the checkpoint documentation, I was mostly looking for
information on how states from various tasks get serialized into one (or
more?) files on persistent storage. I'll check out the code pointers!
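As a rough mental model of "one (or more?) files": subtasks write their state out, and a metadata file ties the checkpoint together by recording, per operator and subtask, a handle pointing at that state. The sketch below encodes that idea in a made-up binary format; it is not Flink's actual metadata layout (MetadataV2V3SerializerBase shows the real one), and `ToyCheckpointMetadata`, `StateHandle`, the magic number, and the version field are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy format for illustration only; not Flink's checkpoint metadata.
public class ToyCheckpointMetadata {

    static final int MAGIC = 0x43484B50; // "CHKP", invented for this sketch
    static final int VERSION = 1;

    /** Hypothetical pointer from (operator, subtask) to a byte range in a state file. */
    static final class StateHandle {
        final String operatorId;
        final int subtaskIndex;
        final String filePath;
        final long offset;
        final long length;

        StateHandle(String operatorId, int subtaskIndex, String filePath, long offset, long length) {
            this.operatorId = operatorId;
            this.subtaskIndex = subtaskIndex;
            this.filePath = filePath;
            this.offset = offset;
            this.length = length;
        }
    }

    static byte[] write(long checkpointId, List<StateHandle> handles) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeInt(VERSION);        // lets a reader reject formats it doesn't understand
            out.writeLong(checkpointId);
            out.writeInt(handles.size()); // count prefix so the reader knows how many entries follow
            for (StateHandle h : handles) {
                out.writeUTF(h.operatorId);
                out.writeInt(h.subtaskIndex);
                out.writeUTF(h.filePath);
                out.writeLong(h.offset);
                out.writeLong(h.length);
            }
        }
        return bytes.toByteArray();
    }

    static List<StateHandle> read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readInt() != MAGIC) {
                throw new IOException("not a toy metadata file");
            }
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("unsupported version " + version);
            }
            in.readLong(); // checkpoint id, unused in this sketch
            int n = in.readInt();
            List<StateHandle> handles = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                // Java evaluates constructor arguments left to right, matching the write order.
                handles.add(new StateHandle(in.readUTF(), in.readInt(), in.readUTF(), in.readLong(), in.readLong()));
            }
            return handles;
        }
    }

    public static void main(String[] args) throws IOException {
        List<StateHandle> handles = List.of(
                new StateHandle("op-window", 0, "state-0", 0, 1024),
                new StateHandle("op-window", 1, "state-1", 0, 2048),
                new StateHandle("op-sink", 0, "state-0", 1024, 256)); // range inside a shared file
        for (StateHandle h : read(write(7L, handles))) {
            System.out.printf("%s[%d] -> %s @%d (+%d bytes)%n",
                    h.operatorId, h.subtaskIndex, h.filePath, h.offset, h.length);
        }
    }
}
```

In this toy format a handle may point to its own file or to a byte range inside a shared file. Whether a real Flink checkpoint ever packs several subtasks into one file is something to confirm in the code rather than assume.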

--
Regards,
Deepthi

In reply to Matthias Pohl <ma...@ververica.com>, Wed, Mar 31, 2021 at 7:07 AM.

Re: IO benchmarking

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Deepthi,
1. Have you had a look at flink-benchmarks [1]? I haven't used it, but it
might be helpful.
2. Unfortunately, Flink doesn't provide metrics like that, but you might
want to follow FLINK-21736 [2] for future developments.
3. Is there anything specific you are looking for? Unfortunately, I don't
know of any blogs with a more detailed summary. If you plan to look into
the code, CheckpointCoordinator [3] might be a starting point.
Alternatively, MetadataV2V3SerializerBase [4] offers insights into how
checkpoint metadata is serialized.
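For the missing restore metrics in point 2, one rough workaround is to time the file-read part of a restore yourself by reading back a copy of a retained checkpoint directory. The sketch below assumes the checkpoint files are reachable on a local or mounted path; `CheckpointReadBench`, `readTree`, and `ReadResult` are hypothetical names. It only covers reading bytes, not rebuilding the state backend (for RocksDB, opening and loading the restored files is a separate cost), and the OS page cache can make repeated runs look much faster than a cold restore.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical read-back timer; not part of Flink.
public class CheckpointReadBench {

    /** Result of reading every regular file under a directory once. */
    static final class ReadResult {
        final long bytes;
        final long files;
        final double seconds;

        ReadResult(long bytes, long files, double seconds) {
            this.bytes = bytes;
            this.files = files;
            this.seconds = seconds;
        }

        double mbPerSecond() {
            return seconds == 0 ? Double.POSITIVE_INFINITY : bytes / (1024.0 * 1024.0) / seconds;
        }
    }

    /** Sequentially reads all regular files under root, discarding the data, and times the reads. */
    static ReadResult readTree(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
        long bytes = 0;
        long start = System.nanoTime(); // directory listing is excluded from the timing
        for (Path f : files) {
            try (InputStream in = Files.newInputStream(f)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
            }
        }
        return new ReadResult(bytes, files.size(), (System.nanoTime() - start) / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is a copy of a retained checkpoint directory.
        ReadResult r = readTree(Path.of(args[0]));
        System.out.printf("read %d files, %d bytes in %.3f s (%.1f MB/s)%n",
                r.files, r.bytes, r.seconds, r.mbPerSecond());
    }
}
```

Reading sequentially keeps the sketch simple; a restore with many subtasks reads in parallel, so a multi-threaded variant may be closer to the real access pattern.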

Best,
Matthias

[1] https://github.com/apache/flink-benchmarks
[2] https://issues.apache.org/jira/browse/FLINK-21736
[3]
https://github.com/apache/flink/blob/11550edbd4e1874634ec441bde4fe4952fc1ec4e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1493
[4]
https://github.com/apache/flink/blob/adaaed426c2e637b8e5ffa3f0d051326038d30aa/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV2V3SerializerBase.java#L83
