Posted to dev@hive.apache.org by "Ádám Szita (Jira)" <ji...@apache.org> on 2020/10/12 13:17:00 UTC

[jira] [Created] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

Ádám Szita created HIVE-24266:
---------------------------------

             Summary: Committed rows in hflush'd ACID files may be missing from query result
                 Key: HIVE-24266
                 URL: https://issues.apache.org/jira/browse/HIVE-24266
             Project: Hive
          Issue Type: Bug
            Reporter: Ádám Szita
            Assignee: Ádám Szita


In an HDFS environment, if a writer uses hflush to write ORC ACID files during a transaction commit, the committed rows may appear to be missing when the table is read before the file is completely persisted to disk (that is, synced).

This happens because hflush does not persist the new buffers to disk; it only ensures that new readers can see the new content. As a result, the block information that BISplitStrategy relies on is incomplete. Although the side file (_flush_length) tracks the proper end of the file being written, this information is neglected in favour of the block information, and we may end up generating a very short split instead of one covering the larger, available length.
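
To illustrate the idea, here is a minimal sketch of how a split length could prefer the side file over the block information. It is not Hive code: the class FlushLengthSketch and both of its methods are hypothetical, and it assumes the side file is a sequence of 8-byte big-endian longs whose last complete entry is the most recently flushed length.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of Hive.
class FlushLengthSketch {

    // Assumption: the side file is a sequence of 8-byte big-endian longs and
    // the last complete entry is the most recently flushed length.
    // Returns -1 when no complete entry is present.
    static long lastFlushLength(byte[] sideFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sideFileBytes); // big-endian by default
        long result = -1L;
        while (buf.remaining() >= Long.BYTES) {
            result = buf.getLong();
        }
        return result;
    }

    // Use the side-file length when one is known; otherwise fall back to the
    // length derived from block information, which may be too short after hflush.
    static long splitLength(long blockReportedLength, long sideFileLength) {
        return sideFileLength >= 0 ? sideFileLength : blockReportedLength;
    }
}
```

With a block-derived length of 512 bytes and a side file whose last entry is 4096, splitLength returns 4096, so the split covers the full flushed content rather than only the part already reflected in the block information.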



--
This message was sent by Atlassian Jira
(v8.3.4#803005)