Posted to issues@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2019/03/13 09:39:00 UTC

[jira] [Updated] (HIVE-21439) Provide an option to reduce lookup overhead for bucketed tables

     [ https://issues.apache.org/jira/browse/HIVE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-21439:
------------------------------------
    Description: 
If a table is bucketed, `OpTraitsRulesProcFactory::TableScanRule` ends up verifying whether each partition has the same number of files as the number of buckets in the table. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L185

In large tables, this turns out to be a very time-consuming operation (100s of seconds). It would be good to have an option to bypass this check when needed.

  was:
If a table is bucketed, `OpTraitsRulesProcFactory::TableScanRule` ends up verifying whether each partition has the same number of files as the number of buckets in the table. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L185

In large tables, this turns out to be a very time-consuming operation. It would be good to have an option to bypass this check when needed.


> Provide an option to reduce lookup overhead for bucketed tables
> ---------------------------------------------------------------
>
>                 Key: HIVE-21439
>                 URL: https://issues.apache.org/jira/browse/HIVE-21439
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Trivial
>
> If a table is bucketed, `OpTraitsRulesProcFactory::TableScanRule` ends up verifying whether each partition has the same number of files as the number of buckets in the table. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L185
> In large tables, this turns out to be a very time-consuming operation (100s of seconds). It would be good to have an option to bypass this check when needed.
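
For illustration only, the kind of check described above, together with a bypass switch, might look roughly like the sketch below. This is not the code in `OpTraitsRulesProcFactory`: the class name `BucketCheckSketch`, the method names, and the `skipFileCountCheck` flag are all hypothetical, and the sketch lists a local directory with `java.nio.file` where Hive would go through the Hadoop file system API. The point it tries to show is that the cost grows with the number of partitions, since each one needs its own directory listing, and that a flag could skip the listings entirely.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch; not Hive's actual implementation.
public class BucketCheckSketch {

    // Counts the regular files directly under one partition directory.
    public static long countFiles(Path partitionDir) throws IOException {
        try (Stream<Path> entries = Files.list(partitionDir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    // Returns true if every partition holds exactly numBuckets files.
    // With skipFileCountCheck set (an assumed option), no directory is
    // listed and the table's bucketing metadata is trusted as-is.
    public static boolean bucketingUsable(List<Path> partitionDirs,
                                          int numBuckets,
                                          boolean skipFileCountCheck)
            throws IOException {
        if (skipFileCountCheck) {
            return true;
        }
        for (Path dir : partitionDirs) {
            // One listing per partition: this is the part that adds up
            // on tables with many partitions.
            if (countFiles(dir) != numBuckets) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway "partition" containing four bucket files.
        Path partition = Files.createTempDirectory("partition");
        for (int i = 0; i < 4; i++) {
            Files.createFile(partition.resolve("00000" + i + "_0"));
        }
        List<Path> partitions = List.of(partition);

        System.out.println(bucketingUsable(partitions, 4, false)); // true
        System.out.println(bucketingUsable(partitions, 8, false)); // false
        System.out.println(bucketingUsable(partitions, 8, true));  // true
    }
}
```

The last call shows the trade-off of such an option: with the check skipped, a partition whose file count does not match the bucket count goes undetected, so it would only be safe where the data is known to be bucketed correctly.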



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)