Posted to user@flink.apache.org by Aeden Jameson <ae...@gmail.com> on 2022/05/18 17:47:48 UTC

Confusing S3 Entropy Injection Behavior

I have checkpoints set up against S3 using the hadoop plugin. (I'll
migrate to presto at some point.) I've set up entropy injection per the
documentation with:

state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
s3.entropy.key: _entropy_
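
For reference, the docs also mention a key-length setting, which I have
left alone (4 is the default, if I'm reading the docs right):

s3.entropy.length: 4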

I'm seeing some behavior that I don't quite understand.

1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
literally exists, meaning that "_entropy_" has not been replaced. At
the same time, there are also a bunch of folders where "_entropy_" has
been replaced. Is that to be expected? If so, would someone elaborate
on why this is happening?

2. Should the paths in the checkpoint history tab in the Flink UI
display the path with the key? With the current setup they do not.

Thanks,
Aeden

GitHub: https://github.com/aedenj
Linked In: http://www.linkedin.com/in/aedenjameson

Re: Confusing S3 Entropy Injection Behavior

Posted by Aeden Jameson <ae...@gmail.com>.
Thanks for the response, David. I'm using Flink 1.13.5.

>> For point 1 the behavior you are seeing is what is expected.

Great. That's what I concluded after digging into things a little
more. This helps me be sure I just didn't miss some other
configuration. Thank you.

>> For point 2, I'm not sure.

OK, it appears to be the path to the file named "metadata".

>> FWIW, I would urge you to use presto instead of hadoop for checkpointing on S3. The performance of the hadoop "filesystem" is problematic when it's used for checkpointing.

For sure, it's definitely on the list.
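
If I'm reading the docs right, the switch is mostly a matter of dropping
in the flink-s3-fs-presto plugin and then pointing checkpoints at the
s3p:// scheme to be explicit about which filesystem handles them. A
sketch of the config change I'm assuming (untested on my end):

state.checkpoints.dir: s3p://my-bucket/_entropy_/my-job/checkpoints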

On Thu, May 19, 2022 at 7:06 AM David Anderson <da...@apache.org> wrote:
>
> Aeden,
>
> I want to expand my answer after having re-read your question a bit more carefully.
>
> For point 1 the behavior you are seeing is what is expected. With hadoop the metadata written by the job manager will literally include "_entropy_" in its path, while this will be replaced in the paths of any and all checkpoint data files. With presto the metadata path won't include "_entropy_" at all (it will disappear, rather than being replaced by something specific).
>
> For point 2, I'm not sure.
>
> David
>
> On Thu, May 19, 2022 at 2:37 PM David Anderson <da...@nosredna.org> wrote:
>>
>> This sounds like it could be FLINK-17359 [1]. What version of Flink are you using?
>>
>> Another likely explanation arises from the fact that only the checkpoint data files (the ones created and written by the task managers) will have the _entropy_ replaced. The job manager does not inject entropy into the path of the checkpoint metadata, so that it remains at a predictable URI. Since Flink only writes keyed state larger than state.storage.fs.memory-threshold into the checkpoint data files, and only those files have entropy injected into their paths, if all of your state is small, it will all end up in the metadata file and you won't see any entropy injection happening. See the comments on [2] for more on this.
>>
>> FWIW, I would urge you to use presto instead of hadoop for checkpointing on S3. The performance of the hadoop "filesystem" is problematic when it's used for checkpointing.
>>
>> Regards,
>> David
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-17359
>> [2] https://issues.apache.org/jira/browse/FLINK-24878
>>
>> On Wed, May 18, 2022 at 7:48 PM Aeden Jameson <ae...@gmail.com> wrote:
>>>
>>> I have checkpoints set up against S3 using the hadoop plugin. (I'll
>>> migrate to presto at some point.) I've set up entropy injection per the
>>> documentation with:
>>>
>>> state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
>>> s3.entropy.key: _entropy_
>>>
>>> I'm seeing some behavior that I don't quite understand.
>>>
>>> 1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
>>> literally exists, meaning that "_entropy_" has not been replaced. At
>>> the same time, there are also a bunch of folders where "_entropy_" has
>>> been replaced. Is that to be expected? If so, would someone elaborate
>>> on why this is happening?
>>>
>>> 2. Should the paths in the checkpoint history tab in the Flink UI
>>> display the path with the key? With the current setup they do not.
>>>
>>> Thanks,
>>> Aeden
>>>
>>> GitHub: https://github.com/aedenj
>>> Linked In: http://www.linkedin.com/in/aedenjameson



-- 
Cheers,
Aeden

GitHub: https://github.com/aedenj
Linked In: http://www.linkedin.com/in/aedenjameson

Re: Confusing S3 Entropy Injection Behavior

Posted by David Anderson <da...@apache.org>.
Aeden,

I want to expand my answer after having re-read your question a bit more
carefully.

For point 1 the behavior you are seeing is what is expected. With hadoop
the metadata written by the job manager will literally include "_entropy_"
in its path, while this will be replaced in the paths of any and all
checkpoint data files. With presto the metadata path won't include
"_entropy_" at all (it will disappear, rather than being replaced by
something specific).
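
To make that concrete, here is a sketch of the kind of layout this
produces with the hadoop plugin (the job id, checkpoint number, entropy
characters, and data file name are made up for illustration; only the
shape matters):

metadata:  s3://my-bucket/_entropy_/my-job/checkpoints/<job-id>/chk-42/_metadata
data file: s3://my-bucket/a1b2/my-job/checkpoints/<job-id>/chk-42/<data-file>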

For point 2, I'm not sure.

David

On Thu, May 19, 2022 at 2:37 PM David Anderson <da...@nosredna.org> wrote:

> This sounds like it could be FLINK-17359 [1]. What version of Flink are
> you using?
>
> Another likely explanation arises from the fact that only the
> checkpoint data files (the ones created and written by the task managers)
> will have the _entropy_ replaced. The job manager does not inject entropy
> into the path of the checkpoint metadata, so that it remains at a
> predictable URI. Since Flink only writes keyed state larger than
> state.storage.fs.memory-threshold into the checkpoint data files, and only
> those files have entropy injected into their paths, if all of your state is
> small, it will all end up in the metadata file and you won't see any entropy
> injection happening. See the comments on [2] for more on this.
>
> FWIW, I would urge you to use presto instead of hadoop for checkpointing
> on S3. The performance of the hadoop "filesystem" is problematic when it's
> used for checkpointing.
>
> Regards,
> David
>
> [1] https://issues.apache.org/jira/browse/FLINK-17359
> [2] https://issues.apache.org/jira/browse/FLINK-24878
>
> On Wed, May 18, 2022 at 7:48 PM Aeden Jameson <ae...@gmail.com>
> wrote:
>
>> I have checkpoints set up against S3 using the hadoop plugin. (I'll
>> migrate to presto at some point.) I've set up entropy injection per the
>> documentation with:
>>
>> state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
>> s3.entropy.key: _entropy_
>>
>> I'm seeing some behavior that I don't quite understand.
>>
>> 1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
>> literally exists, meaning that "_entropy_" has not been replaced. At
>> the same time, there are also a bunch of folders where "_entropy_" has
>> been replaced. Is that to be expected? If so, would someone elaborate
>> on why this is happening?
>>
>> 2. Should the paths in the checkpoint history tab in the Flink UI
>> display the path with the key? With the current setup they do not.
>>
>> Thanks,
>> Aeden
>>
>> GitHub: https://github.com/aedenj
>> Linked In: http://www.linkedin.com/in/aedenjameson
>>
>

Re: Confusing S3 Entropy Injection Behavior

Posted by David Anderson <da...@nosredna.org>.
This sounds like it could be FLINK-17359 [1]. What version of Flink are you
using?

Another likely explanation arises from the fact that only the
checkpoint data files (the ones created and written by the task managers)
will have the _entropy_ replaced. The job manager does not inject entropy
into the path of the checkpoint metadata, so that it remains at a
predictable URI. Since Flink only writes keyed state larger than
state.storage.fs.memory-threshold into the checkpoint data files, and only
those files have entropy injected into their paths, if all of your state is
small, it will all end up in the metadata file and you won't see any entropy
injection happening. See the comments on [2] for more on this.
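
One way to check whether that is what you are seeing is to lower the
threshold so that even small state ends up in separate, entropy-injected
data files. A minimal sketch of the setting (on older releases the key
is state.backend.fs.memory-threshold rather than
state.storage.fs.memory-threshold):

state.storage.fs.memory-threshold: 1kb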

FWIW, I would urge you to use presto instead of hadoop for checkpointing on
S3. The performance of the hadoop "filesystem" is problematic when it's
used for checkpointing.

Regards,
David

[1] https://issues.apache.org/jira/browse/FLINK-17359
[2] https://issues.apache.org/jira/browse/FLINK-24878

On Wed, May 18, 2022 at 7:48 PM Aeden Jameson <ae...@gmail.com>
wrote:

> I have checkpoints set up against S3 using the hadoop plugin. (I'll
> migrate to presto at some point.) I've set up entropy injection per the
> documentation with:
>
> state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
> s3.entropy.key: _entropy_
>
> I'm seeing some behavior that I don't quite understand.
>
> 1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
> literally exists, meaning that "_entropy_" has not been replaced. At
> the same time, there are also a bunch of folders where "_entropy_" has
> been replaced. Is that to be expected? If so, would someone elaborate
> on why this is happening?
>
> 2. Should the paths in the checkpoint history tab in the Flink UI
> display the path with the key? With the current setup they do not.
>
> Thanks,
> Aeden
>
> GitHub: https://github.com/aedenj
> Linked In: http://www.linkedin.com/in/aedenjameson
>