Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/21 18:04:00 UTC

[jira] [Resolved] (IMPALA-3578) S3: Consider allowing table-sink to stage in HDFS when writing to S3

     [ https://issues.apache.org/jira/browse/IMPALA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-3578.
-----------------------------------
    Resolution: Won't Fix

Having HDFS and S3 co-exist in the same deployment is an unusual architecture, so this is not worth doing.

> S3: Consider allowing table-sink to stage in HDFS when writing to S3
> --------------------------------------------------------------------
>
>                 Key: IMPALA-3578
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3578
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Perf Investigation
>    Affects Versions: Impala 2.6.0
>            Reporter: Sailesh Mukil
>            Assignee: Sailesh Mukil
>            Priority: Minor
>              Labels: performance, s3
>
> If users do not want to skip the staging step on INSERTs to S3, we could allow the table sink to stage the temporary files in HDFS (if available) and have the coordinator move the files to S3 in FinalizeSuccessfulInsert().
> This could improve INSERT performance on S3, since writes to HDFS are currently faster than writes to S3. Today, when the staging step is not skipped, the sinks write to a temporary location in S3 and the coordinator then copies these files to the final location in S3, because S3 does not support the rename() operation. Staging in HDFS would reduce the number of writes to S3 from 2 to 1 per file.
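The write-count argument in the description can be checked with a toy model. This is a hypothetical sketch, not Impala code: `Store`, `move`, and `insert` are invented names. The model assumes that a move which cannot be done as a rename costs one full copy (one write) on the destination store. That matches how the description characterizes S3's lack of rename(). The coordinator's finalize step is represented only loosely by the `move` call.

```python
class Store:
    """Hypothetical file store that counts write operations (not an Impala API)."""

    def __init__(self, name, supports_rename):
        self.name = name
        self.supports_rename = supports_rename
        self.files = {}
        self.writes = 0

    def write(self, path, data):
        self.files[path] = data
        self.writes += 1

    def delete(self, path):
        del self.files[path]


def move(src_store, src, dst_store, dst):
    """Move a file, assuming rename is free and any other move is copy + delete."""
    if src_store is dst_store and src_store.supports_rename:
        # Metadata-only rename within one store (HDFS-like): no data is rewritten.
        src_store.files[dst] = src_store.files.pop(src)
    else:
        # No rename support (S3-like) or a cross-store move: copy bytes, then delete.
        dst_store.write(dst, src_store.files[src])
        src_store.delete(src)


def insert(num_files, staging):
    """Simulate an INSERT; staging is "s3", "hdfs", or None (staging skipped).

    Returns (s3_writes, hdfs_writes).
    """
    hdfs = Store("hdfs", supports_rename=True)
    s3 = Store("s3", supports_rename=False)
    stage = {"s3": s3, "hdfs": hdfs}.get(staging)
    for i in range(num_files):
        final = f"table/part-{i}"
        if stage is None:
            # Staging skipped: the sink writes straight to the final S3 location.
            s3.write(final, b"rows")
        else:
            tmp = f"_tmp/part-{i}"
            stage.write(tmp, b"rows")    # table sink writes the temporary file
            move(stage, tmp, s3, final)  # coordinator moves it to the final location
    return s3.writes, hdfs.writes


print(insert(4, "s3"))    # (8, 0): staging in S3 costs 2 S3 writes per file
print(insert(4, "hdfs"))  # (4, 4): staging in HDFS costs 1 S3 write per file
print(insert(4, None))    # (4, 0): skipping staging also costs 1 S3 write per file
```

The model only counts operations. It says nothing about latency, and it ignores the extra HDFS traffic and the cross-cluster copy that the proposal would introduce.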



--
This message was sent by Atlassian Jira
(v8.3.4#803005)