Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2017/10/02 18:55:00 UTC

[jira] [Updated] (HIVE-17547) MoveTask for Acid tables race condition

     [ https://issues.apache.org/jira/browse/HIVE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-17547:
----------------------------------
    Description: 
Consider Hive.moveAcidFiles().
Its input is a staging directory laid out something like
{noformat}
└── -ext-10000
    ├── 000000_0
    │   ├── _orc_acid_version
    │   └── delta_0000019_0000019
    │       └── bucket_00000
    └── 000000_1
        ├── _orc_acid_version
        └── delta_0000019_0000019
            └── bucket_00001
{noformat}
for a write to a bucketed table.
The "move" handles each 000000_N directory separately.  The first one creates delta_0000019_0000019 under the table/partition dir; the others just add their bucket_0000N file there.
That means there is a small window in which someone may "ls table/part/delta_0000019_0000019" and not see all the buckets.
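
To make the window concrete, here is a minimal sketch on a local filesystem using java.nio. It is not the code in Hive.moveAcidFiles(); the class and method names (MoveRaceSketch, stage, moveAndObserve) are made up for illustration, the _orc_acid_version files are omitted, and the "reader" is simulated by listing the target directory after each per-task move.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MoveRaceSketch {
    static final String DELTA = "delta_0000019_0000019";

    /** Returns the sorted file names directly under dir. */
    static List<String> ls(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    /** Builds -ext-10000/000000_N/delta_.../bucket_0000N for each task (hypothetical layout helper). */
    static Path stage(Path root, int tasks) throws IOException {
        Path staging = root.resolve("-ext-10000");
        for (int n = 0; n < tasks; n++) {
            Path delta = staging.resolve("000000_" + n).resolve(DELTA);
            Files.createDirectories(delta);
            Files.createFile(delta.resolve(String.format("bucket_%05d", n)));
        }
        return staging;
    }

    /**
     * Moves one task directory at a time and records what a concurrent
     * "ls" of the target delta directory would return after each step.
     */
    static List<List<String>> moveAndObserve(Path staging, Path partition, int tasks)
            throws IOException {
        List<List<String>> snapshots = new ArrayList<>();
        Path target = partition.resolve(DELTA);
        for (int n = 0; n < tasks; n++) {
            Path src = staging.resolve("000000_" + n).resolve(DELTA);
            if (Files.notExists(target)) {
                // First task: the whole delta directory is renamed into place.
                Files.move(src, target);
            } else {
                // Later tasks: only their bucket files are added to the existing delta.
                for (String name : ls(src)) {
                    Files.move(src.resolve(name), target.resolve(name));
                }
            }
            snapshots.add(ls(target));
        }
        return snapshots;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hive17547");
        Path partition = Files.createDirectories(root.resolve("table").resolve("part"));
        Path staging = stage(root, 2);
        // Prints [[bucket_00000], [bucket_00000, bucket_00001]]
        System.out.println(moveAndObserve(staging, partition, 2));
    }
}
```

The first snapshot is the problematic state: the delta directory already exists in its final location but holds only one of the two buckets.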

Once Acid writes directly to the final location (a la MM tables), this issue resolves itself, since txn 19 remains uncommitted until everything is written.
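
The reasoning here is that a reader only considers deltas from committed transactions, so a partially written delta for an open txn is never read. A simplified sketch of that idea follows, assuming single-transaction delta names of the form delta_N_N and a plain set of committed txn ids; CommitVisibilitySketch, txnOf and visibleDeltas are hypothetical names, and this is not Hive's actual reader logic.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CommitVisibilitySketch {
    /** Extracts the txn id from a single-transaction delta name such as delta_0000019_0000019. */
    static long txnOf(String deltaDir) {
        String[] parts = deltaDir.split("_");
        return Long.parseLong(parts[1]);
    }

    /** Keeps only the deltas whose transaction is in the committed set. */
    static List<String> visibleDeltas(List<String> deltaDirs, Set<Long> committed) {
        return deltaDirs.stream()
                        .filter(d -> committed.contains(txnOf(d)))
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> onDisk = List.of("delta_0000018_0000018", "delta_0000019_0000019");
        // While txn 19 is open, its (possibly incomplete) delta is ignored.
        System.out.println(visibleDeltas(onDisk, Set.of(18L)));
        // After txn 19 commits, the delta becomes visible with all buckets in place.
        System.out.println(visibleDeltas(onDisk, Set.of(18L, 19L)));
    }
}
```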


> MoveTask for Acid tables race condition
> ---------------------------------------
>
>                 Key: HIVE-17547
>                 URL: https://issues.apache.org/jira/browse/HIVE-17547
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>


