Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/04 18:01:00 UTC

[jira] [Commented] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2

    [ https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126118#comment-17126118 ] 

ASF subversion and git services commented on IMPALA-9723:
---------------------------------------------------------

Commit 3c715864674004011dd87e01187f6ed378506a91 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3c71586 ]

IMPALA-9723: Raise error when Hive Streaming side-file is found

Currently Impala cannot read a Hive Streaming file while it is being
appended to, i.e. when a side-file records the last committed file size.

With this commit Impala raises an error during table loading whenever
it encounters a side-file.

Testing:
 * added a new unit test to AcidUtilsTest

Change-Id: I8223411570ec5e31bbb98b907cf0e5c235817760
Reviewed-on: http://gerrit.cloudera.org:8080/16002
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
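The check described in the commit message can be sketched as follows. This is a minimal, hypothetical illustration, not Impala's implementation: the class and method names (`SideFileCheck`, `isSideFile`, `checkNoSideFiles`) and the example paths are made up, and plain file-name strings stand in for the Hadoop file listings that real table loading works with. Only the `_flush_length` suffix comes from the issue description.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Impala's actual code): reject a file listing that
 * contains a Hive Streaming side-file, mirroring the behaviour described in
 * the commit message. Real table loading works on Hadoop file listings;
 * plain file names are used here to keep the example self-contained.
 */
public class SideFileCheck {
  // Suffix that, per the issue description, forms the side-file name
  // from the data file name.
  public static final String SIDE_FILE_SUFFIX = "_flush_length";

  public static boolean isSideFile(String fileName) {
    return fileName.endsWith(SIDE_FILE_SUFFIX);
  }

  /** Throws if any listed file is a side-file, i.e. a data file is still being appended to. */
  public static void checkNoSideFiles(List<String> fileNames) {
    for (String name : fileNames) {
      if (isSideFile(name)) {
        throw new IllegalStateException(
            "Found Hive Streaming side-file: " + name
            + ". Reading files that are still being appended to is not supported.");
      }
    }
  }

  public static void main(String[] args) {
    // Illustrative listing without a side-file: the check passes.
    checkNoSideFiles(List.of("delta_0000001_0000010/bucket_00000"));
    try {
      // Illustrative listing where a data file has a side-file next to it: the check fails.
      checkNoSideFiles(List.of("delta_0000011_0000020/bucket_00000",
                               "delta_0000011_0000020/bucket_00000_flush_length"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast at load time, as the commit does, avoids silently returning rows from a partially written file.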


> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
>                 Key: IMPALA-9723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9723
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it writes a "side file" next to the data file that records the last committed file length; its name is the data file name plus the suffix _flush_length. Impala should take that into account when it reads such files: everything after the "flush length" must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine the committed file size.
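A reader that honours the side-file, as the issue description asks, could be sketched as below. This is a hypothetical illustration: `FlushLength`, `lastFlushLength` and `committedLength` are made-up names, and it assumes (based on our understanding of Hive's ORC writer, not on this issue) that the side-file holds one 8-byte big-endian length per flush, with the last complete value being the committed length. Real code should call `OrcAcidUtils.getLastFlushLength(fileSystem, filePath)` rather than parsing the file by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.OptionalLong;

/**
 * Hypothetical sketch of honouring a Hive Streaming side-file.
 * ASSUMPTION: the side-file is a sequence of 8-byte big-endian longs, one per
 * flush, and the last complete value is the committed data file length.
 */
public class FlushLength {
  /** Returns the last complete long in the side-file, or empty if there is none. */
  public static OptionalLong lastFlushLength(InputStream sideFile) throws IOException {
    DataInputStream in = new DataInputStream(sideFile);
    OptionalLong last = OptionalLong.empty();
    while (true) {
      try {
        last = OptionalLong.of(in.readLong());
      } catch (EOFException e) {
        // End of stream, or a trailing partial write: keep the last complete value.
        return last;
      }
    }
  }

  /** Number of bytes of the data file that may be read; the rest is uncommitted. */
  public static long committedLength(long fileSize, OptionalLong flushLength) {
    // ASSUMPTION: with no recorded flush length, treat the whole file as committed.
    return flushLength.isPresent() ? Math.min(fileSize, flushLength.getAsLong()) : fileSize;
  }

  public static void main(String[] args) throws IOException {
    // Simulate a side-file written after two flushes.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeLong(1024L);
    out.writeLong(4096L);
    out.flush();

    OptionalLong flush = lastFlushLength(new ByteArrayInputStream(buf.toByteArray()));
    // A 6000-byte data file whose tail is still being written by Hive Streaming.
    System.out.println("readable bytes: " + committedLength(6000L, flush));
  }
}
```

Clamping to the flush length rather than the current file size is what keeps a reader from seeing stripes that Hive Streaming has written but not yet committed.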



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
