Posted to issues@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/01/21 21:35:00 UTC

[jira] [Resolved] (IMPALA-3394) Writes to S3 require equivalent local disk space

     [ https://issues.apache.org/jira/browse/IMPALA-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-3394.
----------------------------------
    Resolution: Won't Fix

This has been significantly improved in S3A - https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#How_S3A_writes_data_to_S3

I don't think it makes sense to make any changes in Impala to address this. If we continue to see problems with this, we should work on improving the S3A implementation rather than writing our own.
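
For reference, here is a minimal sketch of what writing through S3A with block-based upload looks like (the bucket name, path, and tuning values below are made-up examples; the fs.s3a.* properties are the ones covered by the documentation linked above):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3ABlockUploadExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Buffer pending blocks on local disk (alternatives: "array", "bytebuffer").
        conf.set("fs.s3a.fast.upload.buffer", "disk");
        // Size of each uploaded block/part: 64MB here.
        conf.set("fs.s3a.multipart.size", "67108864");
        // How many blocks a single stream may have queued or in flight.
        conf.set("fs.s3a.fast.upload.active.blocks", "4");

        FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
        byte[] chunk = new byte[1 << 20];  // 1MB of data per write call
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.dat"))) {
          for (int i = 0; i < 256; i++) {
            out.write(chunk);  // blocks are uploaded as they fill up
          }
        }  // close() completes the multipart upload; no whole-file staging needed
      }
    }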

> Writes to S3 require equivalent local disk space
> ------------------------------------------------
>
>                 Key: IMPALA-3394
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3394
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Sailesh Mukil
>            Assignee: Sahil Takiar
>            Priority: Minor
>              Labels: s3, supportability
>
> How S3 writes work through Impala is the following:
> 1) PlanFragmentExecutor calls sink->Open() which initializes the table writer(s) (parquet, text, etc.)
> 2) PlanFragmentExecutor calls sink->Send() which goes through the writer for the corresponding file format (HdfsTextTableWriter, HdfsParquetTableWriter, etc.). These writers ultimately call HdfsTableWriter::Write().
> 3) HdfsTableWriter::Write() calls hdfsWrite() which is a libHDFS function.
> 4) libHDFS determines which filesystem it's writing to and calls the appropriate write() function. In the S3A case, it uses S3AFileSystem.java:
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> 5) S3AFileSystem uses S3AOutputStream, which buffers all writes to a file (or files) on the local disk:
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AOutputStream.java#L99
> 6) When our table writer calls Close(), it ultimately ends up calling S3AOutputStream::close(), *which only then uploads to S3*. S3AOutputStream::write() only writes to the local disk (see the simplified sketch after this list).
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AOutputStream.java#L120
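>
> A minimal, illustrative sketch of the buffer-then-upload-on-close pattern described in steps 5 and 6 (this is not the actual S3AOutputStream code; the class and helper names here are made up):
>
>     import java.io.BufferedOutputStream;
>     import java.io.File;
>     import java.io.FileOutputStream;
>     import java.io.IOException;
>     import java.io.OutputStream;
>
>     // Hypothetical simplification of the S3AOutputStream behavior: every byte
>     // written goes to a local temp file; nothing reaches S3 until close(), so
>     // the local disk must be able to hold the entire object.
>     class BufferingS3OutputStream extends OutputStream {
>       private final File backupFile;           // local staging file
>       private final OutputStream backupStream;
>
>       BufferingS3OutputStream(File tmpDir) throws IOException {
>         this.backupFile = File.createTempFile("output-", ".tmp", tmpDir);
>         this.backupStream = new BufferedOutputStream(new FileOutputStream(backupFile));
>       }
>
>       @Override
>       public void write(int b) throws IOException {
>         backupStream.write(b);                 // write() only touches local disk
>       }
>
>       @Override
>       public void close() throws IOException {
>         backupStream.close();
>         uploadWholeFileToS3(backupFile);       // the upload happens only here
>         backupFile.delete();
>       }
>
>       // Placeholder for the single PUT/multipart upload of the staged file.
>       private void uploadWholeFileToS3(File stagedFile) { }
>     }
>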
> The problem here is that the local disk might not have enough space to buffer all these writes, which causes the INSERT to fail. (When writing a 50GB file, we don't want to require that the node have 50GB of free local disk space.)
> Problem with the HdfsTextTableWriter:
>  - It buffers everything into one file, no matter how large.
> HdfsParquetTableWriter splits the output into multiple files with a default size of 256MB, so it is not as bad as HdfsTextTableWriter: each file is closed (and therefore uploaded) once it reaches 256MB (or whatever the parquet file size is set to).
> Solutions:
> 1) Patch libHDFS to modify the S3AOutputStream so that we can stream writes to S3 instead of uploading everything at once during Close() (see the rough streaming-upload sketch below).
> 2) Think of a longer-term, more permanent fix (such as not using libHDFS for S3 and using the AWS SDK directly).
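>
> A rough sketch of what the "streaming" direction in (1)/(2) could look like using the AWS SDK's multipart upload API directly (illustrative only; the bucket, key, part size, and part count are made-up values, and error handling / abort-on-failure is omitted):
>
>     import java.io.ByteArrayInputStream;
>     import java.util.ArrayList;
>     import java.util.List;
>     import com.amazonaws.services.s3.AmazonS3;
>     import com.amazonaws.services.s3.AmazonS3ClientBuilder;
>     import com.amazonaws.services.s3.model.*;
>
>     public class StreamingUploadSketch {
>       public static void main(String[] args) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>         String bucket = "example-bucket";
>         String key = "warehouse/tbl/part-00000";
>
>         // Start a multipart upload; parts can be sent as data is produced, so
>         // only one part at a time (not the whole file) is buffered locally.
>         String uploadId = s3.initiateMultipartUpload(
>             new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
>
>         List<PartETag> etags = new ArrayList<>();
>         byte[] part = new byte[8 * 1024 * 1024];  // 8MB part buffer (S3 minimum is 5MB)
>         for (int partNumber = 1; partNumber <= 4; partNumber++) {
>           UploadPartResult result = s3.uploadPart(new UploadPartRequest()
>               .withBucketName(bucket)
>               .withKey(key)
>               .withUploadId(uploadId)
>               .withPartNumber(partNumber)
>               .withInputStream(new ByteArrayInputStream(part))
>               .withPartSize(part.length));
>           etags.add(result.getPartETag());
>         }
>
>         // The object only becomes visible in S3 once the upload is completed.
>         s3.completeMultipartUpload(
>             new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
>       }
>     }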



--
This message was sent by Atlassian Jira
(v8.3.4#803005)