Posted to dev@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2017/07/20 20:19:00 UTC

[jira] [Created] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

Eugene Koifman created HIVE-17138:
-------------------------------------

             Summary: FileSinkOperator doesn't create empty files for acid path
                 Key: HIVE-17138
                 URL: https://issues.apache.org/jira/browse/HIVE-17138
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 2.2.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman


For bucketed tables, FileSinkOperator is expected (in some cases) to produce a specific number of files, even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't work properly for the Acid path.  For Insert, the OrcRecordUpdater(s) are set up in createBucketForFileIdx(), which creates the actual bucketN file (as of HIVE-14007, it does so regardless of whether the RecordUpdater sees any rows).  This causes empty (i.e., ORC metadata only) bucket files to be created.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
{noformat}

For the Update/Delete path, OrcRecordWriter is created lazily when the first row that needs to land there is seen.  Thus it never creates empty buckets, no matter what the value of _skipFiles_ in closeOp(boolean) is.
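
The difference between the two paths can be illustrated with a small standalone sketch. This is not Hive code: the class BucketFileSketch, the methods eagerFiles and lazyFiles, and the simple modulo used to map a key to a bucket are all hypothetical stand-ins, chosen only to show why eager writer creation yields one file per bucket while lazy creation yields files only for buckets that receive rows.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Hive.
class BucketFileSketch {

    // Assumption: a plain modulo stands in for Hive's real bucketing function.
    static int bucketFor(int key, int numBuckets) {
        return Math.floorMod(key, numBuckets);
    }

    // Insert-like path: a writer (and therefore a file) is set up for every
    // bucket up front, before any row is seen. Rows only add content.
    static Set<Integer> eagerFiles(int numBuckets, int[] rowKeys) {
        Set<Integer> files = new TreeSet<>();
        for (int b = 0; b < numBuckets; b++) {
            files.add(b); // file exists even if no row ever lands here
        }
        return files;
    }

    // Update/Delete-like path: a writer is created only when the first row
    // for that bucket arrives, so buckets with no rows produce no file.
    static Set<Integer> lazyFiles(int numBuckets, int[] rowKeys) {
        Set<Integer> files = new TreeSet<>();
        for (int key : rowKeys) {
            files.add(bucketFor(key, numBuckets));
        }
        return files;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1}; // values of column "a" from the insert above
        System.out.println("eager: " + eagerFiles(4, keys)); // eager: [0, 1, 2, 3]
        System.out.println("lazy:  " + lazyFiles(4, keys));  // lazy:  [0, 1]
    }
}
```

Under these assumptions, the eager path produces four bucket files for the two-row insert above (two of them empty), while the lazy path produces only two.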

Once Split Update does the split early (in the operator pipeline), only the Insert path will matter, since base and delta are the only files that split computation, etc. look at.  delete_delta is only for Acid internals, so there is never any reason to create empty files there.
