Posted to dev@hive.apache.org by Ning Zhang <nz...@fb.com> on 2011/03/23 01:20:32 UTC

Review Request: HIVE-2050. batch processing partition pruning process

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
-----------------------------------------------------------

Review request for hive.


Summary
-------

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs
-----

  trunk/metastore/if/hive_metastore.thrift 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1084243 
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1084243 
  trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1084243 
  trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1084243 
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1084243 
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1084243 
  trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1084243 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1084243 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1084243 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1084243 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1084243 

Diff: https://reviews.apache.org/r/522/diff


Testing
-------


Thanks,

Ning


Re: Review Request: HIVE-2050. batch processing partition pruning process

Posted by namit jain <nj...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/#review355
-----------------------------------------------------------


Mostly minor issues. Can you update the patch? I will try to get it in today.


trunk/conf/hive-default.xml
<https://reviews.apache.org/r/522/#comment705>

    spelling: alsore



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/522/#comment706>

    remove commented code



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
<https://reviews.apache.org/r/522/#comment708>

    Are these parameters used?



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
<https://reviews.apache.org/r/522/#comment707>

    This check should be inside the loop where
    we are iterating over all the partitions.

    It may not matter, but we are marking all
    partitions as unknown even if only one partition is
    unknown.
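
    A minimal sketch of the per-partition check suggested above. It
    is not the PartitionPruner code under review: PruneSketch, Result
    and classify are hypothetical names, and the evaluator stands in
    for whatever evaluates the pruning expression against one
    partition (returning null when the result cannot be decided).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; names here are illustrative, not Hive's. */
final class PruneSketch {
  /** Buckets filled by classify(). */
  static final class Result {
    final List<String> confirmed = new ArrayList<>();
    final List<String> unknown = new ArrayList<>();
  }

  /**
   * Evaluates the predicate once per partition. The evaluator returns
   * TRUE, FALSE, or null, where null means "cannot be decided".
   */
  static Result classify(List<String> partNames,
                         Function<String, Boolean> evaluator) {
    Result result = new Result();
    for (String name : partNames) {
      Boolean verdict = evaluator.apply(name);
      if (verdict == null) {
        // Only this partition is marked unknown, not the whole batch.
        result.unknown.add(name);
      } else if (verdict) {
        result.confirmed.add(name);
      }
      // FALSE: the partition is pruned and lands in neither list.
    }
    return result;
  }
}
```

    With the null check inside the loop, one undecidable partition
    no longer forces every other partition into the unknown list.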


- namit


On 2011-03-27 22:59:19, Ning Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/522/
> -----------------------------------------------------------
> 
> (Updated 2011-03-27 22:59:19)
> 
> 
> Review request for hive.
> 
> 
> Summary
> -------
> 
> Introducing a new metastore API to retrieve a list of partitions in batch. 
> 
> 
> Diffs
> -----
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1085555 
>   trunk/conf/hive-default.xml 1085555 
>   trunk/metastore/if/hive_metastore.thrift 1085555 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1085555 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1085555 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1085555 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1085555 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1085555 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1085555 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1085555 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1085555 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1085555 
> 
> Diff: https://reviews.apache.org/r/522/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Ning
> 
>


Re: Review Request: HIVE-2050. batch processing partition pruning process

Posted by Ning Zhang <nz...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
-----------------------------------------------------------

(Updated 2011-03-29 09:20:49.657378)


Review request for hive.


Changes
-------

Fixed pcr.q and updated with a Java-only patch.


Summary
-------

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086471 
  trunk/conf/hive-default.xml 1086471 
  trunk/metastore/if/hive_metastore.thrift 1086471 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1086471 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1086471 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1086471 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1086471 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1086471 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1086471 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1086471 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1086471 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1086471 

Diff: https://reviews.apache.org/r/522/diff


Testing
-------


Thanks,

Ning


Re: Review Request: HIVE-2050. batch processing partition pruning process

Posted by Ning Zhang <nz...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
-----------------------------------------------------------

(Updated 2011-03-28 11:02:07.935934)


Review request for hive.


Changes
-------

Addressed Namit's comments.


Summary
-------

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1085555 
  trunk/conf/hive-default.xml 1085555 
  trunk/metastore/if/hive_metastore.thrift 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1085555 

Diff: https://reviews.apache.org/r/522/diff


Testing
-------


Thanks,

Ning


Re: Review Request: HIVE-2050. batch processing partition pruning process

Posted by Ning Zhang <nz...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
-----------------------------------------------------------

(Updated 2011-03-27 22:59:19.075996)


Review request for hive.


Changes
-------

There are 2 major changes from the last patch:
 - Added a parameter, hive.metastore.batch.retrieve.max, to control the maximum number of partitions that can be retrieved from the metastore in one batch (default 300). In Hive.getPartitionsByNames(), the input list of partition names is split into sublists, and the metastore API is called once for each sublist.
 - One of the most time-consuming DB operations is retrieving the sub-classes of MPartition. In particular, the list of FieldSchema is retrieved for each partition but never used (the table's field schema is used for all partitions). So one of the changes here is to omit the retrieval of FieldSchema and use the table's field schema as the partition's. If we later need the partition's own field schema for schema evaluation, we should add another function or flag for that.
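
The batching in the first item can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: BatchFetchSketch and fetchInBatches are hypothetical names, fetchBatch stands in for the new metastore call, and batchMax plays the role of hive.metastore.batch.retrieve.max.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; not Hive.getPartitionsByNames() itself. */
final class BatchFetchSketch {
  /**
   * Splits names into sublists of at most batchMax entries, calls
   * fetchBatch once per sublist, and concatenates the results in order.
   */
  static <P> List<P> fetchInBatches(List<String> names, int batchMax,
                                    Function<List<String>, List<P>> fetchBatch) {
    if (batchMax <= 0) {
      throw new IllegalArgumentException("batchMax must be positive");
    }
    List<P> all = new ArrayList<>(names.size());
    for (int from = 0; from < names.size(); from += batchMax) {
      int to = Math.min(from + batchMax, names.size());
      // subList is a view, so no partition names are copied here.
      all.addAll(fetchBatch.apply(names.subList(from, to)));
    }
    return all;
  }
}
```

Capping the sublist size bounds how many partitions one metastore call has to materialize at once, at the cost of more round trips for long name lists.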

These changes reduce memory by 50% and CPU by 20%. 
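
The field-schema change can be pictured with the sketch below. All names in it (SharedSchemaSketch, Column, PartitionInfo, withTableSchema) are hypothetical stand-ins rather than metastore classes; it only illustrates the idea of pointing every partition at the table's column list instead of loading a separate list per partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; these are not the metastore's own classes. */
final class SharedSchemaSketch {
  /** Stand-in for a column definition (name plus type). */
  static final class Column {
    final String name;
    final String type;
    Column(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Stand-in for a partition that carries a column list. */
  static final class PartitionInfo {
    final String name;
    final List<Column> columns;
    PartitionInfo(String name, List<Column> columns) {
      this.name = name;
      this.columns = columns;
    }
  }

  /** Builds partitions that all reference one shared, read-only column list. */
  static List<PartitionInfo> withTableSchema(List<Column> tableColumns,
                                             List<String> partNames) {
    List<Column> shared = Collections.unmodifiableList(tableColumns);
    List<PartitionInfo> parts = new ArrayList<>(partNames.size());
    for (String partName : partNames) {
      // No per-partition schema lookup: every partition reuses 'shared'.
      parts.add(new PartitionInfo(partName, shared));
    }
    return parts;
  }
}
```

The shared list is wrapped as unmodifiable in this sketch because a change made through one partition would otherwise be visible through all of them.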


Summary
-------

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1085555 
  trunk/conf/hive-default.xml 1085555 
  trunk/metastore/if/hive_metastore.thrift 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1085555 

Diff: https://reviews.apache.org/r/522/diff


Testing
-------


Thanks,

Ning


Re: Review Request: HIVE-2050. batch processing partition pruning process

Posted by Ning Zhang <nz...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
-----------------------------------------------------------

(Updated 2011-03-25 13:50:22.615065)


Review request for hive.


Changes
-------

The previous patch was too large because of the Thrift-generated files. This is a Java-only patch with all Thrift-generated files removed.


Summary
-------

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-----

  trunk/metastore/if/hive_metastore.thrift 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1085555 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1085555 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1085555 

Diff: https://reviews.apache.org/r/522/diff


Testing
-------


Thanks,

Ning