Posted to user@flink.apache.org by B....@dell.com on 2020/03/27 06:10:24 UTC

Ask for reason for choice of S3 plugins

Hi,

This document, https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/s3.html#hadooppresto-s3-file-systems-plugins, mentions that

  *   Presto is the recommended file system for checkpointing to S3.
Is there a reason for that? Is there a bottleneck in the Hadoop S3 plugin that keeps it from supporting checkpoint storage well?

And if I use the s3:// scheme with both plugins loaded, is there a defined class loading order, or is the choice of plugin random? Which plugin will handle S3 access?

Best Regards,
Brian


Re: Ask for reason for choice of S3 plugins

Posted by David Anderson <da...@ververica.com>.
If you are using both the Hadoop S3 and Presto S3 filesystems, you should
use s3p:// (Presto) and s3a:// (Hadoop) to distinguish between the two.
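
As a rough illustration of why the explicit schemes matter, the sketch below
maps a URI scheme to the plugins that claim it. The scheme sets follow the
Flink S3 documentation (flink-s3-fs-hadoop registers s3:// and s3a://,
flink-s3-fs-presto registers s3:// and s3p://). The function name and bucket
names are made up for illustration, and this is not Flink code.

```python
from urllib.parse import urlparse

# Schemes each plugin registers, per the Flink S3 documentation.
PLUGIN_SCHEMES = {
    "flink-s3-fs-hadoop": {"s3", "s3a"},
    "flink-s3-fs-presto": {"s3", "s3p"},
}


def candidate_plugins(uri):
    """Return the sorted names of the plugins that claim the URI's scheme."""
    scheme = urlparse(uri).scheme
    return sorted(
        name for name, schemes in PLUGIN_SCHEMES.items() if scheme in schemes
    )


# Hypothetical bucket and paths, for illustration only.
for uri in (
    "s3://bucket/checkpoints",
    "s3a://bucket/output",
    "s3p://bucket/checkpoints",
):
    print(uri, "->", candidate_plugins(uri))
```

Only s3:// is claimed by both plugins, so s3p:// and s3a:// are the
unambiguous choices when both are loaded.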

Presto is recommended for checkpointing because the Hadoop implementation
has very high latency when creating files, and because it hits request rate
limits very quickly. The Hadoop S3 filesystem tries to imitate a normal
filesystem on top of S3:

 - before writing a key it checks if the "parent directory" exists by
checking for a key with the prefix up to the last "/"
 - it creates empty marker files to mark the existence of such a parent
directory
 - all these existence requests are S3 HEAD requests, which have very low
request rate limits
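
The request pattern described in the list above can be sketched with a toy
in-memory model. This is only an illustration under simplifying assumptions:
ToyObjectStore and both write functions are hypothetical, and the real Hadoop
S3 filesystem issues a different (and version-dependent) set of requests.

```python
class ToyObjectStore:
    """In-memory stand-in for an S3 bucket that counts requests by type."""

    def __init__(self):
        self.keys = set()
        self.requests = {"HEAD": 0, "PUT": 0}

    def head(self, key):
        self.requests["HEAD"] += 1
        return key in self.keys

    def put(self, key):
        self.requests["PUT"] += 1
        self.keys.add(key)


def write_flat(store, key):
    """Write the object directly, with no directory emulation."""
    store.put(key)


def write_with_directory_emulation(store, key):
    """Check for the parent "directory" marker, create it if missing, then write."""
    parent = key.rsplit("/", 1)[0] + "/"
    if not store.head(parent):
        store.put(parent)  # empty marker object for the parent directory
    store.put(key)


def simulate(write, checkpoints=10, files_per_checkpoint=4):
    """Write a series of checkpoint files and return the request counts."""
    store = ToyObjectStore()
    for chk in range(checkpoints):
        for part in range(files_per_checkpoint):
            write(store, f"checkpoints/chk-{chk}/part-{part}")
    return store.requests


print("flat:    ", simulate(write_flat))
print("emulated:", simulate(write_with_directory_emulation))
```

In this model the emulated path adds one HEAD request per file written and one
extra PUT per new checkpoint directory, while the flat path issues no HEAD
requests at all.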


*David Anderson* | Training Coordinator


