
Posted to dev@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2019/02/27 04:08:00 UTC

[jira] [Created] (HIVE-21330) Bucketing id varies b/w data loaded through streaming apis and regular query

Vineet Garg created HIVE-21330:
----------------------------------

             Summary: Bucketing id varies b/w data loaded through streaming apis and regular query
                 Key: HIVE-21330
                 URL: https://issues.apache.org/jira/browse/HIVE-21330
             Project: Hive
          Issue Type: Bug
            Reporter: Vineet Garg


The test at [https://github.com/apache/hive/blob/master/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java#L439] covers this case. It currently passes, but for the wrong reason. The test checks for an empty result set, and the result sets are empty because the preceding INSERT fails to load data, not because the bucketing scheme differs.

This INSERT failure is fixed in https://github.com/apache/hive/pull/552. With that patch applied, the test fails because the generated bucketing ids differ.

These tests run on MR instead of TEZ, which could explain the different bucketing ids.
I don't know what the repercussions of having different bucketing ids are, or why they are expected to be the same, but since a test exists for this logic, the case is worth investigating.
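For context on what a "bucketing id" is: the Hive wiki's bucketed-tables page describes the bucket number as hash_function(bucketing_column) mod num_buckets, with hash_int(i) == i for int columns. The Java sketch that follows illustrates why two write paths that disagree on the hash function put the same key in different buckets. It does not show which hash the streaming API, MR, or TEZ actually uses. The class and method names are hypothetical, alternativeHash is an invented hash chosen only for illustration, and the sign-bit masking in bucketId is assumed to mirror Hive's own bucket-number computation.

```java
public class BucketIdSketch {
    // Maps a hash to a bucket in [0, numBuckets). Masking off the sign bit keeps
    // a negative hash from producing a negative bucket id.
    static int bucketId(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Identity hash for int keys (the Hive wiki describes hash_int(i) == i).
    static int identityHash(int key) {
        return key;
    }

    // HYPOTHETICAL alternative hash, used only to show that a different hash
    // function can send the same key to a different bucket.
    static int alternativeHash(int key) {
        return 31 * key + 17;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        for (int key = 0; key < 8; key++) {
            int a = bucketId(identityHash(key), numBuckets);
            int b = bucketId(alternativeHash(key), numBuckets);
            System.out.printf("key=%d identityBucket=%d alternativeBucket=%d%n", key, a, b);
        }
    }
}
```

With 4 buckets, every key from 0 to 7 gets a different bucket under the two hashes. A reader that computes bucket ids one way would then look for rows in files written the other way, which is one plausible reason the test expects both paths to agree.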



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)