Posted to issues@hive.apache.org by "Hocheol Park (Jira)" <ji...@apache.org> on 2019/10/28 10:07:00 UTC

[jira] [Updated] (HIVE-22413) Avoid dirty read when reading the ACID table while compaction is running

     [ https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hocheol Park updated HIVE-22413:
--------------------------------
    Attachment: HIVE-22413.1.patch

> Avoid dirty read when reading the ACID table while compaction is running
> ------------------------------------------------------------------------
>
>                 Key: HIVE-22413
>                 URL: https://issues.apache.org/jira/browse/HIVE-22413
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Hocheol Park
>            Priority: Major
>         Attachments: HIVE-22413.1.patch
>
>
> A dirty read can occur when an ACID table is read while the compactor is creating base or delta directories. This is especially likely on S3 storage, because a “move” on S3 is implemented as “copy and delete”, and the copy takes a long time when the files are large or the bucket count is high.
> The proposed logic to avoid this problem is as follows. While listing the child directories of a partition directory when reading an ACID table, if “_tmp”-prefixed directories exist, compare each child directory name against the names of the directories inside the “_tmp” directory and skip any that match. The reader then reads the files from before the merge, so the results are unchanged.
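
The skip logic described above can be sketched roughly as follows. This is an illustration on a local filesystem in Python, not the code in HIVE-22413.1.patch: the helper name `list_readable_dirs`, the `TMP_PREFIX` constant, and the assumed layout (the compactor stages its output under a "_tmp"-prefixed directory inside the partition directory, then moves it out under the same name) are assumptions made for the example.

```python
from pathlib import Path

# Assumption: staging directories are recognised by this name prefix.
TMP_PREFIX = "_tmp"


def list_readable_dirs(partition_dir):
    """Return the names of child directories that are safe to read.

    Hypothetical helper: a child directory is skipped when a directory
    with the same name still exists inside a "_tmp"-prefixed sibling,
    because that means the move (copy and delete) has not finished.
    """
    children = [p for p in Path(partition_dir).iterdir() if p.is_dir()]
    tmp_dirs = [p for p in children if p.name.startswith(TMP_PREFIX)]

    # Names that are still present in a staging directory are in flight.
    in_flight = {c.name for t in tmp_dirs for c in t.iterdir() if c.is_dir()}

    return sorted(
        p.name
        for p in children
        if not p.name.startswith(TMP_PREFIX) and p.name not in in_flight
    )
```

With a partition that contains `delta_1_1`, `delta_2_2`, a half-copied `base_2`, and a staging directory that still holds `base_2`, the helper returns only the two delta directories, so the reader falls back to the pre-compaction files. Once the staging directory is gone, `base_2` is listed as well. Choosing which base and deltas are actually current is a separate step that this sketch does not cover.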



--
This message was sent by Atlassian Jira
(v8.3.4#803005)