Posted to issues@flink.apache.org by "David Anderson (Jira)" <ji...@apache.org> on 2022/07/13 12:07:00 UTC

[jira] [Updated] (FLINK-24392) Upgrade presto s3 fs implementation to Trino >= 348

     [ https://issues.apache.org/jira/browse/FLINK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Anderson updated FLINK-24392:
-----------------------------------
    Fix Version/s: 1.16.0

> Upgrade presto s3 fs implementation to Trino >= 348
> ---------------------------------------------------
>
>                 Key: FLINK-24392
>                 URL: https://issues.apache.org/jira/browse/FLINK-24392
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>    Affects Versions: 1.14.0
>            Reporter: Robert Metzger
>            Priority: Major
>             Fix For: 1.16.0
>
>
> The Presto S3 filesystem implementation currently shipped with Flink doesn't support streaming uploads: all data needs to be materialized into a single file on local disk before it can be uploaded.
> This can lead to situations where TaskManagers run out of disk space when creating a savepoint.
> The Hadoop S3 filesystem implementation does support streaming uploads (it buffers smaller parts locally, say 100 MB each, and ships them as S3 multipart uploads; see the configuration sketch below the quoted description), but it makes more API calls, which leads to other issues.
> Trino version >= 348 supports streaming uploads.
> During experiments, I also noticed that the current Presto S3 fs implementation seems to allocate a lot of memory outside the heap when shipping large amounts of data, for example when creating a savepoint. On a K8s pod with a memory limit of 4000Mi, I was not able to run Flink with a "taskmanager.memory.flink.size" above 3000m. This means that an additional 1 GB of memory needs to be reserved just for the peaks in memory allocation while Presto S3 writes a savepoint (see the memory sketch below the quoted description). It would be good to confirm this behavior and then adjust either the default memory configuration or the documentation.
> As part of this upgrade, we also need to make sure that the new Presto/Trino version does not make substantially more S3 API calls than the current version. After switching from Presto S3 to Hadoop S3, I noticed that disposing of an old checkpoint (~100 GB) can take up to 15 minutes. The upgraded Presto S3 fs should still be able to dispose of state quickly.
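
For reference, here is a minimal flink-conf.yaml sketch of how the two S3 filesystem plugins are typically selected and tuned. The s3p:// and s3a:// schemes and the plugins/ directories follow Flink's filesystem plugin setup; the bucket name, the 100m part size, and the exact key mirroring are illustrative assumptions rather than settings taken from this ticket.

    # flink-conf.yaml (sketch; bucket name and values are illustrative)

    # Route checkpoints/savepoints through the Presto S3 plugin
    # (jar placed under plugins/s3-fs-presto/). When both S3 plugins are
    # installed, the s3p:// scheme explicitly selects the Presto implementation.
    state.checkpoints.dir: s3p://my-bucket/checkpoints
    state.savepoints.dir: s3p://my-bucket/savepoints

    # Alternative: the Hadoop S3 plugin (plugins/s3-fs-hadoop/, scheme s3a://)
    # streams uploads as multipart parts instead of staging one large local file.
    # state.checkpoints.dir: s3a://my-bucket/checkpoints

    # Flink mirrors keys prefixed with "s3." into the underlying Hadoop
    # configuration (fs.s3a.*); this part size mirrors the ~100 MB parts
    # mentioned above and is an assumption to verify.
    s3.multipart.size: 100m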
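
Likewise, a minimal memory sketch for the off-heap observation above, assuming the 4000Mi pod limit mentioned in the description; the split is illustrative and not a recommended configuration.

    # flink-conf.yaml (sketch, assuming a Kubernetes pod memory limit of 4000Mi)

    # Cap Flink's own memory (heap, managed, network and off-heap segments,
    # excluding JVM metaspace and overhead) at 3000m, leaving roughly 1 GB of
    # the pod limit as headroom for the native-memory peaks observed while
    # the Presto S3 client writes a savepoint.
    taskmanager.memory.flink.size: 3000m

    # Alternatively, account for the native buffers explicitly by reserving
    # off-heap task memory (the value is an illustrative guess):
    # taskmanager.memory.task.off-heap.size: 512m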



--
This message was sent by Atlassian Jira
(v8.20.10#820010)