Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2021/01/04 10:04:00 UTC

[jira] [Commented] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2

    [ https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258117#comment-17258117 ] 

Zoltán Borók-Nagy commented on IMPALA-9723:
-------------------------------------------

Lowered the priority because, AFAIK, the current engines don't append to existing files but create new ones, so the problem in the description is likely non-existent. Still, I'm keeping this jira open until this behavior becomes the standard.

> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
>                 Key: IMPALA-9723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9723
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain stripes with rows that Impala must not read, based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it writes a "side file" next to the data file that records the last committed file length (its name is the data file name + _flush_length). Impala should take this into account when reading such files: everything after the flush length must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine the committed file size.
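To make the two reader-side requirements in the description concrete, here is a minimal Java sketch under stated assumptions. It is not Impala's or Hive's code. lastFlushLength() is a hypothetical local-filesystem stand-in modelled on OrcAcidUtils.getLastFlushLength(). It assumes the side file holds one 8-byte big-endian long per flush, with the last complete value being the committed length, and that a missing side file means the whole file is committed. WriteIdSnapshot is a hypothetical, simplified stand-in for a validWriteIdList: a high watermark plus a set of open or aborted write IDs. The file name bucket_00000 is only illustrative.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

public class AcidReadSketch {

  /** Hypothetical, simplified stand-in for a validWriteIdList. */
  static final class WriteIdSnapshot {
    private final long highWatermark;
    private final Set<Long> invalid; // open or aborted write IDs

    WriteIdSnapshot(long highWatermark, Set<Long> invalid) {
      this.highWatermark = highWatermark;
      this.invalid = invalid;
    }

    /** A row is readable only if its write ID is committed in this snapshot. */
    boolean isValid(long writeId) {
      return writeId <= highWatermark && !invalid.contains(writeId);
    }
  }

  /**
   * Local-filesystem sketch modelled on OrcAcidUtils.getLastFlushLength().
   * Assumption: the side file holds one big-endian long per flush, and the
   * last complete value is the committed length.
   */
  static long lastFlushLength(Path dataFile) throws IOException {
    Path side = dataFile.resolveSibling(dataFile.getFileName() + "_flush_length");
    if (!Files.exists(side)) {
      return Long.MAX_VALUE; // no side file: treat the whole file as committed
    }
    long result = -1; // stays -1 if the side file is empty
    try (DataInputStream in = new DataInputStream(Files.newInputStream(side))) {
      while (true) {
        try {
          result = in.readLong();
        } catch (EOFException end) {
          break; // end of file or a torn trailing value: keep the last full one
        }
      }
    }
    return result;
  }

  /** Number of bytes a reader may consume; everything past it is ignored. */
  static long readableLength(Path dataFile) throws IOException {
    long flushed = lastFlushLength(dataFile);
    if (flushed < 0) {
      return 0; // this sketch's own choice: empty side file means nothing committed
    }
    return Math.min(Files.size(dataFile), flushed);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("acid");
    Path data = dir.resolve("bucket_00000");
    Files.write(data, new byte[300]); // 300 bytes on disk, not all committed
    try (DataOutputStream out = new DataOutputStream(
        Files.newOutputStream(dir.resolve("bucket_00000_flush_length")))) {
      out.writeLong(100); // first flush
      out.writeLong(250); // last committed flush
    }
    System.out.println(readableLength(data));

    WriteIdSnapshot snapshot = new WriteIdSnapshot(5, Set.of(3L));
    System.out.println(snapshot.isValid(2) + " " + snapshot.isValid(3) + " " + snapshot.isValid(6));
  }
}
```

Running main() prints 250, because the 300-byte file is cut at the last flush length, and then "true false false". A real reader would go through the Hadoop FileSystem API and Hive's own classes rather than these stand-ins.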



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
