Posted to issues@flink.apache.org by "Paul S (Jira)" <ji...@apache.org> on 2021/01/08 12:56:00 UTC

[jira] [Comment Edited] (FLINK-10841) Reduce the number of ListObjects calls when checkpointing to S3

    [ https://issues.apache.org/jira/browse/FLINK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261291#comment-17261291 ] 

Paul S edited comment on FLINK-10841 at 1/8/21, 12:55 PM:
----------------------------------------------------------

To add a comment on this:
We've noticed the same issue. It's less obvious on AWS S3, but on our PureStorage S3 backend ( [https://support.purestorage.com/FlashBlade/Purity_FB/PurityFB_REST_API/S3_Object_Store_REST_API/FlashBlade_S3_Object_Store_Documentation] ) it is causing significant performance problems, including for other clients of the backend.
On AWS, it significantly increases the usage cost of checkpointing alone.

On further debugging, we noticed that the S3 filesystem library used by Flink performs a LOT of ListObjectsV1 API calls to S3, instead of the newer and (we believe) computationally cheaper ListObjectsV2 calls.

Could you revisit this issue and possibly reclassify it as a bug, given that it causes performance problems on the backends as well as significant costs?

Are there any workarounds we could apply on our side to reduce the number of these calls?

Let me know if you need more details on this.



> Reduce the number of ListObjects calls when checkpointing to S3
> ---------------------------------------------------------------
>
>                 Key: FLINK-10841
>                 URL: https://issues.apache.org/jira/browse/FLINK-10841
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>    Affects Versions: 1.5.5, 1.6.2
>            Reporter: Pawel Bartoszek
>            Priority: Minor
>
> With S3 configured as the checkpoint store using the S3 AWS Hadoop filesystem, we see a very large number of ListObjects calls. For instance, a job with ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, including the cleanup of that checkpoint by Flink. With the checkpoint interval set to 5 minutes, this adds up to hundreds of dollars per month just for ListObjects calls. I am aware that the implementation details might be hidden in the Hadoop jar and may be difficult to change, but could a workaround at least be suggested?
>  
>  
>  
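The cost figure in the issue description can be sanity-checked with a back-of-the-envelope calculation. The sketch below uses the numbers from the report (about 23000 ListObjects calls per checkpoint, one checkpoint every 5 minutes). The price of USD 0.005 per 1,000 LIST requests and the 30-day month are assumptions, not values from the report; check the current pricing for your region and storage class. The function names are hypothetical.

```python
def monthly_list_requests(calls_per_checkpoint: int,
                          interval_minutes: int,
                          days: int = 30) -> int:
    """Estimate ListObjects calls per month for a fixed checkpoint interval."""
    # Number of checkpoints taken in a month of `days` days (assumed 30).
    checkpoints_per_month = days * 24 * 60 // interval_minutes
    return calls_per_checkpoint * checkpoints_per_month


def monthly_list_cost_usd(requests: int,
                          price_per_1000_usd: float = 0.005) -> float:
    """Estimate the monthly request cost.

    The default price is an ASSUMED list price for LIST requests,
    not a figure taken from the issue.
    """
    return requests / 1000 * price_per_1000_usd


if __name__ == "__main__":
    requests = monthly_list_requests(calls_per_checkpoint=23000,
                                     interval_minutes=5)
    cost = monthly_list_cost_usd(requests)
    print(f"{requests} ListObjects calls/month, about {cost:.0f} USD/month")
```

Under these assumptions the estimate is roughly 199 million calls per month, which puts the request cost in the same range as the figure reported above. Halving the checkpoint frequency halves the estimate, since the cost scales linearly with the number of checkpoints.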



--
This message was sent by Atlassian Jira
(v8.3.4#803005)