
Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2017/10/13 18:25:00 UTC

[jira] [Comment Edited] (HIVE-17214) check/fix conversion of unbucketed non-acid to acid

    [ https://issues.apache.org/jira/browse/HIVE-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170954#comment-16170954 ] 

Eugene Koifman edited comment on HIVE-17214 at 10/13/17 6:24 PM:
-----------------------------------------------------------------

Non-acid to acid conversion works, except for TestAcidOnTez.testNonStandardConversion02, which tests data files found at different directory levels (root and subdirs).  For some reason it passes locally but not in ptest.

When converting unbucketed tables to acid, we assign the bucketId based on the file name.  For example,
if the original table has 0000_0 and 0000_0_copy1, both files will have the bucketId property set so that the bucket/writer id encoded in it is 0.
Still need to finish TestTxnNobuckets.testToAcidConversionMultiBucket() so that there is a test covering the case where we start with 0000_0 and 0001_0.  This should assign ROW__IDs as if there are 2 buckets.
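
A minimal sketch of the file-name-to-bucket mapping described above, for illustration only. It is not the Hive code path: the regular expression and the function names bucket_id_from_file_name and group_by_bucket are hypothetical, and it assumes file names have the form <digits>_<attempt> with an optional _copyN suffix, as in the examples.

```python
import re

# Assumed (hypothetical) naming scheme: leading digits are the bucket/writer
# id, then "_<attempt>", then an optional "_copy<N>" suffix.
_ORIGINAL_FILE = re.compile(r"^(\d+)_(\d+)(?:_copy(\d+))?$")


def bucket_id_from_file_name(name):
    """Return the bucket/writer id implied by an original file name."""
    match = _ORIGINAL_FILE.match(name)
    if match is None:
        raise ValueError("unrecognized original file name: %r" % name)
    return int(match.group(1))


def group_by_bucket(names):
    """Group file names by the bucket id derived from each name."""
    groups = {}
    for name in sorted(names):
        groups.setdefault(bucket_id_from_file_name(name), []).append(name)
    return groups
```

Under these assumptions, 0000_0 and 0000_0_copy1 both map to bucket 0, while a table starting with 0000_0 and 0001_0 yields two groups, which is the multi-bucket case the unfinished test is meant to cover.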



was (Author: ekoifman):
Non-acid to acid conversion works, except for TestAcidOnTez.testNonStandardConversion02, which tests data files found at different directory levels (root and subdirs).  For some reason it passes locally but not in ptest.

> check/fix conversion of unbucketed non-acid to acid
> ---------------------------------------------------
>
>                 Key: HIVE-17214
>                 URL: https://issues.apache.org/jira/browse/HIVE-17214
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Minor
>
> Bucketed tables have stricter rules for file layout on disk: bucket files are direct children of a partition directory.
> For unbucketed tables I'm not sure there are any rules.
> For example, CTAS with Tez + Union operator creates 1 directory for each leg of the union.
> Supposedly Hive can read a table by picking up all files recursively.
> Can it also write (other than the CTAS example above) arbitrarily?
> Does it mean an Acid write can also write anywhere?
> Figure out what can be supported and how the existing layout can be checked.  Examining a full "ls -l -R" for a large table could be expensive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)