You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by Chao Sun <ch...@cloudera.com> on 2015/05/26 18:28:47 UTC

Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/
-----------------------------------------------------------

Review request for hive, chengxiang li and Xuefu Zhang.


Bugs: HIVE-9152
    https://issues.apache.org/jira/browse/HIVE-9152


Repository: hive-git


Description
-------

Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
  itests/src/test/resources/testconfiguration.properties 2a5f7e3 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
  ql/if/queryplan.thrift c8dfa35 
  ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
  ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
  ql/src/gen/thrift/gen-php/Types.php 7121ed4 
  ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
  ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
  ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
  ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
  ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
  ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
  ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
  ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
  ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
  ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
  ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
  ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
  ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
  ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
  ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
  ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
  ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
  ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
  ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
  serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
  serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
  serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
  serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
  serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
  service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
  service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
  service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
  service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
  service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
  service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
  service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
  service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 

Diff: https://reviews.apache.org/r/34666/diff/


Testing
-------

spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.


Thanks,

Chao Sun

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > itests/src/test/resources/testconfiguration.properties, line 894
> > <https://reviews.apache.org/r/34666/diff/1/?file=971683#file971683line894>
> >
> >     Are there more test cases that can be turned on?

will turn on vectorized_dynamic_partition_pruning.q


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java, line 74
> > <https://reviews.apache.org/r/34666/diff/1/?file=971705#file971705line74>
> >
> >     Is there anything specific to Spark? If not, we should probably reuse rather than copying.

The only difference is the type of pruning sink added - we use SparkPartitionPruningSinkOp while Tez uses AppMasterEventOp.
OK, I'll reuse the existing class.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 589
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line589>
> >
> >     It seems that an operator might be visited multiple times.

Yea, but I guess it doesn't matter here. We just use this to find the enclosing work for a op, and we just need to find at least one root op.
Duplicate doesn't matter here I think.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 107
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line107>
> >
> >     For the cloned tree, don't we need to remove the branches that's not connected to the pruning sink operator, i.e., RS->Join?

This is done before we clone the branch:

```
List<Operator<?>> savedChildOps = filterOp.getChildOperators();
filterOp.setChildOperators(Utilities.makeList(selOp));
```


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 111
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line111>
> >
> >     This is not cloned as part of cloneOperatorTree()?

no - because it is a transient field.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?

It shouldn't - I think work ID and Random#nextInt() should both be unique, right?


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java, line 68
> > <https://reviews.apache.org/r/34666/diff/1/?file=971701#file971701line68>
> >
> >     I think we should delegate the processing to the parent when processing one row from the batch. Refer to VectorReduceSinkOperator for an example.

Not much we can do here, since here the processing is more complicated. I changed part of the code to call the super.process().


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 98
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98>
> >
> >     Nit: Potential leak of BufferedOutputStream.

Can you explain a little under which situation this would happen? and what is the better way to do this? Thanks.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?
> 
> Xuefu Zhang wrote:
>     Random.nextint() doesn't gives uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.
> 
> Chao Sun wrote:
>     Yes targetWorkID/sourceWorkID should be unique, but it could have multiple tasks from a single work, and if we don't have the random number, their results may overwrite each other. We also did the same thing for the hash table sink in Spark, and we haven't seen any issue with that.

targetWorkID/sourceWorkID are unique. We need random number because we could have multiple tasks for a particular work, in which case they may overwrite each other's file.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?

Random.nextint() doesn't gives uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 98
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98>
> >
> >     Nit: Potential leak of BufferedOutputStream.
> 
> Chao Sun wrote:
>     Can you explain a little under which situation this would happen? and what is the better way to do this? Thanks.

fs.create() can be successful, while either "new BufferedOutputStream()" or "new ObjectOutputStream()" can fail (returning null). In that case, the file descriptor returned by fs.create() will leak.

There is a new notation in java 7 for automatic resource management. Refer to: http://radar.oreilly.com/2011/09/java7-features.html


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?
> 
> Xuefu Zhang wrote:
>     Random.nextint() doesn't gives uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.

Yes targetWorkID/sourceWorkID should be unique, but it could have multiple tasks from a single work, and if we don't have the random number, their results may overwrite each other. We also did the same thing for the hash table sink in Spark, and we haven't seen any issue with that.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------



itests/src/test/resources/testconfiguration.properties (line 894)
<https://reviews.apache.org/r/34666/#comment142628>

    Are there more test cases that can be turned on?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java (line 68)
<https://reviews.apache.org/r/34666/#comment142851>

    I think we should delegate the processing to the parent when processing one row from the batch. Refer to VectorReduceSinkOperator for an example.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java (line 74)
<https://reviews.apache.org/r/34666/#comment142852>

    Is there anything specific to Spark? If not, we should probably reuse rather than copying.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java (line 45)
<https://reviews.apache.org/r/34666/#comment142853>

    Same as above. We should probably reuse.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 375)
<https://reviews.apache.org/r/34666/#comment142860>

    Instead of throwing an AssertionError, should we do a condition assertion instead?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 589)
<https://reviews.apache.org/r/34666/#comment142870>

    It seems that an operator might be visited multiple times.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 219)
<https://reviews.apache.org/r/34666/#comment142758>

    The comment here is a little confusing. "break op tree" seems having already happened above.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 224)
<https://reviews.apache.org/r/34666/#comment142759>

    Nit: add comments here, like "regenerate task dependency".



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 262)
<https://reviews.apache.org/r/34666/#comment142756>

    Rename generateWorkTree() to generateTaskTreeHelper() or something like that.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 71)
<https://reviews.apache.org/r/34666/#comment142757>

    Rename the class to something like OperatorTreeSplitterForPPD().



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 81)
<https://reviews.apache.org/r/34666/#comment142760>

    Nit: Split this into two lines instead.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 107)
<https://reviews.apache.org/r/34666/#comment142764>

    For the cloned tree, don't we need to remove the branches that's not connected to the pruning sink operator, i.e., RS->Join?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 111)
<https://reviews.apache.org/r/34666/#comment142768>

    This is not cloned as part of cloneOperatorTree()?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 69)
<https://reviews.apache.org/r/34666/#comment142765>

    Nit: remove the blank line.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 92)
<https://reviews.apache.org/r/34666/#comment142766>

    Can we still get conflicts in the file name?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 98)
<https://reviews.apache.org/r/34666/#comment142767>

    Nit: Potential leak of BufferedOutputStream.


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On July 6, 2015, 10:26 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java, line 77
> > <https://reviews.apache.org/r/34666/diff/2/?file=999023#file999023line77>
> >
> >     I guess I don't know enough to comment on this, but looking at VectorReduceSinkOperator and VectorAppMasterEventOperator I can see some prominent differences:
> >     
> >     1. first batch detection and processing there
> >     2. VectorizedSerde logic here
> >     
> >     Probably a live review will help.

Chatted with Xuefu offline and we've cleared some doubts here.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90593
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90593
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java (line 69)
<https://reviews.apache.org/r/34666/#comment143691>

    Nit: remove the blank line.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java (line 77)
<https://reviews.apache.org/r/34666/#comment143693>

    I guess I don't know enough to comment on this, but looking at VectorReduceSinkOperator and VectorAppMasterEventOperator I can see some prominent differences:
    
    1. first batch detection and processing there
    2. VectorizedSerde logic here
    
    Probably a live review will help.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 69)
<https://reviews.apache.org/r/34666/#comment143690>

    Nit: remove the empty line.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 75)
<https://reviews.apache.org/r/34666/#comment143689>

    Isn't this is the last operator in the operator graph? If so, we don't need to call forward(), right?


- Xuefu Zhang


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review91427
-----------------------------------------------------------

Ship it!


Ship It!

- chengxiang li


On 七月 8, 2015, 6:04 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated 七月 8, 2015, 6:04 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df 
>   itests/src/test/resources/testconfiguration.properties 4f2de12 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f58a10b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java ca0ffb6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 2ff3951 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a7cf8b7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ad47547 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 7992c88 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 7f2c079 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9e9a2a2 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_vectorized_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/
-----------------------------------------------------------

(Updated July 8, 2015, 6:04 p.m.)


Review request for hive, chengxiang li and Xuefu Zhang.


Bugs: HIVE-9152
    https://issues.apache.org/jira/browse/HIVE-9152


Repository: hive-git


Description
-------

Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df 
  itests/src/test/resources/testconfiguration.properties 4f2de12 
  ql/if/queryplan.thrift c8dfa35 
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f58a10b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java ca0ffb6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 2ff3951 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a7cf8b7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ad47547 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 7992c88 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 7f2c079 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9e9a2a2 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_vectorized_dynamic_partition_pruning.q PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34666/diff/


Testing
-------

spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.


Thanks,

Chao Sun

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Gopal V <go...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90598
-----------------------------------------------------------



ql/if/queryplan.thrift (line 60)
<https://reviews.apache.org/r/34666/#comment143696>

    Enum ordering nit - this needs to move down to the end for b/c.


- Gopal V


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/
-----------------------------------------------------------

(Updated July 3, 2015, 10:45 p.m.)


Review request for hive, chengxiang li and Xuefu Zhang.


Bugs: HIVE-9152
    https://issues.apache.org/jira/browse/HIVE-9152


Repository: hive-git


Description
-------

Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
  itests/src/test/resources/testconfiguration.properties 2a5f7e3 
  ql/if/queryplan.thrift c8dfa35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
  ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
  ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
  ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
  ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
  ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
  ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
  ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
  ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
  ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
  ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
  ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
  ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
  ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
  ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
  ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
  ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
  ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 

Diff: https://reviews.apache.org/r/34666/diff/


Testing
-------

spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.


Thanks,

Chao Sun

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.


> On 七月 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java, line 59
> > <https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59>
> >
> >     The statistic data shoud be quite unaccurate after filter and group, as it's computered based on estimation during compile time. I think threshold verification on unaccurate data should be unacceptable as that means the threshold may not work at all.
> >     We may check this threshold in SparkPartitionPruningSinkOperator at runtime.
> 
> Chao Sun wrote:
>     Switching to runtime would be very different - here we want to check this threshold, and avoid generating the pruning task if possible.
>     How inaccurate the stats would be? I'm fine if it's always more conservative.

Take FilterOperator for example, the worst case is, it may just half the input rows as its statistic, you can find the rule for FilterOperator at FilterStatsRule, so it's a bad news that estimated statistics is not always conservative, this would make the threshold does not work as expected sometimes. You may create a followup work for this if it changes a lot.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------


On 七月 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated 七月 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java, line 59
> > <https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59>
> >
> >     The statistic data shoud be quite unaccurate after filter and group, as it's computered based on estimation during compile time. I think threshold verification on unaccurate data should be unacceptable as that means the threshold may not work at all.
> >     We may check this threshold in SparkPartitionPruningSinkOperator at runtime.

Switching to runtime would be very different - here we want to check this threshold, and avoid generating the pruning task if possible.
How inaccurate the stats would be? I'm fine if it's always more conservative.


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 396
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line396>
> >
> >     Why we need List for table/cloumnname/partkey here? do we support multi PartitionPruningSinkOperator inside single operator tree?

This is because a target work with a partitioned table could have multiple partition columns which could come from multiple table and/or partkeys.
You can check test output file for some examples.


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 61
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line61>
> >
> >     While append data size overwhelm its capability, DataOutputBuffer expand its byte array size by create a new byte array with 2x size and copy old one to new one. A estimated initial byte array size should be able to reduce most array copy.

Yes, this would be an improvement. Xuefu and me talked about adding an extra parameter to control the generated file size. We plan to do that as a follow-up task.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java, line 59
> > <https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59>
> >
> >     The statistic data shoud be quite unaccurate after filter and group, as it's computered based on estimation during compile time. I think threshold verification on unaccurate data should be unacceptable as that means the threshold may not work at all.
> >     We may check this threshold in SparkPartitionPruningSinkOperator at runtime.
> 
> Chao Sun wrote:
>     Switching to runtime would be very different - here we want to check this threshold, and avoid generating the pruning task if possible.
>     How inaccurate the stats would be? I'm fine if it's always more conservative.
> 
> chengxiang li wrote:
>     Take FilterOperator for example, the worst case is, it may just half the input rows as its statistic, you can find the rule for FilterOperator at FilterStatsRule, so it's a bad news that estimated statistics is not always conservative, this would make the threshold does not work as expected sometimes. You may create a followup work for this if it changes a lot.

OK, that makes sense. I think we can address this issue in the follow-up JIRA.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java (line 59)
<https://reviews.apache.org/r/34666/#comment143202>

    The statistic data shoud be quite unaccurate after filter and group, as it's computered based on estimation during compile time. I think threshold verification on unaccurate data should be unacceptable as that means the threshold may not work at all.
    We may check this threshold in SparkPartitionPruningSinkOperator at runtime.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 396)
<https://reviews.apache.org/r/34666/#comment143199>

    Why we need List for table/cloumnname/partkey here? do we support multi PartitionPruningSinkOperator inside single operator tree?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 61)
<https://reviews.apache.org/r/34666/#comment143203>

    While append data size overwhelm its capability, DataOutputBuffer expand its byte array size by create a new byte array with 2x size and copy old one to new one. A estimated initial byte array size should be able to reduce most array copy.


- chengxiang li


On 五月 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated 五月 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/if/queryplan.thrift, line 60
> > <https://reviews.apache.org/r/34666/diff/1/?file=971689#file971689line60>
> >
> >     I'm not sure if it matters, but it's probably better if we add it as the last.
> 
> Xuefu Zhang wrote:
>     It's still needed to move to the last as other also pointed out.

OK, fixed. Sorry I forgot last time.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 177
> > <https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177>
> >
> >     Any chance that an op might be visited multiple times?
> 
> Chao Sun wrote:
>     It shouldn't - it'a tree traversing and every operator should only be added once.
> 
> Xuefu Zhang wrote:
>     Actually there could be a diamond shape in the operator graph such as that formed by demux and mux operators. Join operator is another example. We should use graph traverse instead of tree traverse.

Yes you're right. However, the only usage for this is in SplitOpTreeForDPP, in which we pass a set as parameter. So it should only contain unique operators.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 177
> > <https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177>
> >
> >     Any chance that an op might be visited multiple times?

It shouldn't - it'a tree traversing and every operator should only be added once.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 519
> > <https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519>
> >
> >     numThread could be <= 0?

It could equal to 0, since getInputPaths() could return 0. This would result an IAE from newFixedThreadPool.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 164
> > <https://reviews.apache.org/r/34666/diff/1/?file=971704#file971704line164>
> >
> >     what's this change about?

This is to delay stats annotation until we've done the DPP optimization. The generated branches also need to be annotated with stats.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out, line 1914
> > <https://reviews.apache.org/r/34666/diff/1/?file=971731#file971731line1914>
> >
> >     why the stats are gone?

This is because we moved the stats annotation to SparkCompiler, and therefore in SimpleFetchOptimizer, the fetch task generated won't have the stats.
I don't see this is a big issue and Tez does the same thing as well.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 177
> > <https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177>
> >
> >     Any chance that an op might be visited multiple times?
> 
> Chao Sun wrote:
>     It shouldn't - it'a tree traversing and every operator should only be added once.

Actually there could be a diamond shape in the operator graph such as that formed by demux and mux operators. Join operator is another example. We should use graph traverse instead of tree traverse.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 519
> > <https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519>
> >
> >     numThread could be <= 0?
> 
> Chao Sun wrote:
>     It could equal to 0, since getInputPaths() could return 0. This would result an IAE from newFixedThreadPool.

Maybe there is a problem as you described, but I think that's irrelavent to the work here. Thus, we should create a separate JIRA to fix that instead of including it here.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/if/queryplan.thrift, line 60
> > <https://reviews.apache.org/r/34666/diff/1/?file=971689#file971689line60>
> >
> >     I'm not sure if it matters, but it's probably better if we add it as the last.

It's still needed to move to the last as other also pointed out.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


This a big patch, for a big feature. It's hard to review offline. Here I offered about things that are obvious. For better understanding, I think an in-person review would be more effective.


ql/if/queryplan.thrift
<https://reviews.apache.org/r/34666/#comment136752>

    I'm not sure if it matters, but it's probably better if we add it as the last.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/34666/#comment136753>

    Did you make any changes in this file? If not, let's leave it as it is.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
<https://reviews.apache.org/r/34666/#comment136942>

    File descriptor needs to be closed in final block. In addition, closing in is not sufficient, as in might be null while fs.open(fstatus.getPath() returns not null.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
<https://reviews.apache.org/r/34666/#comment136943>

    Any chance that an op might be visited multiple times?



ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
<https://reviews.apache.org/r/34666/#comment136946>

    numThread could be <= 0?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
<https://reviews.apache.org/r/34666/#comment136948>

    what's this change about?



ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out
<https://reviews.apache.org/r/34666/#comment136976>

    why the stats are gone?


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On July 2, 2015, 5:25 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java, line 251
> > <https://reviews.apache.org/r/34666/diff/1/?file=971699#file971699line251>
> >
> >     Log in error level should means some error happens,the process would be interrupted, if we really expect single field here, should we throw an exception while it has more? otherwise, we should downgrade the log level to WARN with more precise information.

OK, I changed this to assert, since it would be a bug if # of fields is not 1 here.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90191
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90191
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java (line 246)
<https://reviews.apache.org/r/34666/#comment143192>

    Should encapsulated with LOG.isDebugEnabled.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java (line 251)
<https://reviews.apache.org/r/34666/#comment143193>

    Log in error level should means some error happens,the process would be interrupted, if we really expect single field here, should we throw an exception while it has more? otherwise, we should downgrade the log level to WARN with more precise information.


- chengxiang li


On 五月 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated 五月 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.


> On July 1, 2015, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 142
> > <https://reviews.apache.org/r/34666/diff/1/?file=971707#file971707line142>
> >
> >     Why do we need this now?

This is to prevent a newly generated task to be processed again. In case the task contains localwork, it maybe overwritten. See HIVE-9424 for more details.
However, I just found out that this is also fixed as part of HIVE-9659, so I'll remove this code now.


> On July 1, 2015, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 227
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line227>
> >
> >     why putting the old work in the map.

This is because even though we are cloning the op tree, we are still retaining the old work. So, after we've created a new root op, we need to
update the rootToWorkMap, and map the cloned root op to the old work. This is later used in getEnclosingWork.
We also need to keep the old entry because in SparkPartitionPruningSink, it still stores the old TableScanOperator, and in processPartitionPruningSink it
will look up the op to get a corresponding target work.
Added more comments in the code.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89972
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89972
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java (line 142)
<https://reviews.apache.org/r/34666/#comment142879>

    Why do we need this now?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 227)
<https://reviews.apache.org/r/34666/#comment142880>

    why putting the old work in the map.


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>