Posted to dev@hive.apache.org by Xuefu Zhang <xz...@cloudera.com> on 2015/06/30 22:55:00 UTC

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------



itests/src/test/resources/testconfiguration.properties (line 894)
<https://reviews.apache.org/r/34666/#comment142628>

    Are there more test cases that can be turned on?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java (line 68)
<https://reviews.apache.org/r/34666/#comment142851>

    I think we should delegate the processing to the parent when processing one row from the batch. Refer to VectorReduceSinkOperator for an example.
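
    For illustration, the delegation pattern being suggested is roughly the following (self-contained toy classes, not the Hive operator API or the patch code): unpack each row from the batch and hand it to the row-mode parent.

```java
public class BatchDelegationExample {
  // Toy stand-in for the row-mode sink operator.
  static class RowModeSink {
    void process(Object row, int tag) {
      System.out.println("row-mode processing: " + row + " (tag " + tag + ")");
    }
  }

  // Toy stand-in for the vectorized sink: unpack the batch, delegate per row.
  static class VectorizedSink extends RowModeSink {
    void processBatch(Object[] batch, int size, int tag) {
      for (int i = 0; i < size; i++) {
        super.process(batch[i], tag);   // delegate each row to the parent logic
      }
    }
  }

  public static void main(String[] args) {
    new VectorizedSink().processBatch(new Object[] {"r1", "r2", "r3"}, 3, 0);
  }
}
```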



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java (line 74)
<https://reviews.apache.org/r/34666/#comment142852>

    Is there anything specific to Spark? If not, we should probably reuse rather than copying.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java (line 45)
<https://reviews.apache.org/r/34666/#comment142853>

    Same as above. We should probably reuse.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 375)
<https://reviews.apache.org/r/34666/#comment142860>

    Instead of throwing an AssertionError, should we use a condition assertion here?
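
    For illustration, a condition assertion here could be a Guava-style precondition check along the following lines (the Preconditions utility and the root-list variable are assumptions for the example, not taken from the patch):

```java
import com.google.common.base.Preconditions;

import java.util.List;

public class RootCheckExample {
  // Hypothetical check: fail fast with a descriptive message instead of
  // throwing a bare AssertionError when the invariant is violated.
  static void checkSingleRoot(List<?> rootOperators) {
    Preconditions.checkState(rootOperators.size() == 1,
        "Expected exactly one root operator, found %s", rootOperators.size());
  }
}
```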



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 589)
<https://reviews.apache.org/r/34666/#comment142870>

    It seems that an operator might be visited multiple times.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 219)
<https://reviews.apache.org/r/34666/#comment142758>

    The comment here is a little confusing. "break op tree" seems to have already happened above.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 224)
<https://reviews.apache.org/r/34666/#comment142759>

    Nit: add comments here, like "regenerate task dependency".



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 262)
<https://reviews.apache.org/r/34666/#comment142756>

    Rename generateWorkTree() to generateTaskTreeHelper() or something like that.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 71)
<https://reviews.apache.org/r/34666/#comment142757>

    Rename the class to something like OperatorTreeSplitterForPPD().



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 81)
<https://reviews.apache.org/r/34666/#comment142760>

    Nit: Split this into two lines instead.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 107)
<https://reviews.apache.org/r/34666/#comment142764>

    For the cloned tree, don't we need to remove the branches that are not connected to the pruning sink operator, i.e., RS->Join?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java (line 111)
<https://reviews.apache.org/r/34666/#comment142768>

    This is not cloned as part of cloneOperatorTree()?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 69)
<https://reviews.apache.org/r/34666/#comment142765>

    Nit: remove the blank line.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 92)
<https://reviews.apache.org/r/34666/#comment142766>

    Can we still get conflicts in the file name?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java (line 98)
<https://reviews.apache.org/r/34666/#comment142767>

    Nit: Potential leak of BufferedOutputStream.


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java 334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java 6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java 169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java 4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java 6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java 3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are cloned from Tez's tests.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>


Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.

> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > itests/src/test/resources/testconfiguration.properties, line 894
> > <https://reviews.apache.org/r/34666/diff/1/?file=971683#file971683line894>
> >
> >     Are there more test cases that can be turned on?

Will turn on vectorized_dynamic_partition_pruning.q.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java, line 74
> > <https://reviews.apache.org/r/34666/diff/1/?file=971705#file971705line74>
> >
> >     Is there anything specific to Spark? If not, we should probably reuse rather than copying.

The only difference is the type of pruning sink added - we use SparkPartitionPruningSinkOp while Tez uses AppMasterEventOp.
OK, I'll reuse the existing class.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 589
> > <https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line589>
> >
> >     It seems that an operator might be visited multiple times.

Yeah, but I guess it doesn't matter here. We just use this to find the enclosing work for an op, and we only need to find at least one root op, so duplicate visits don't matter.
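
For context, the lookup being discussed boils down to walking up the operator DAG until any root (an operator with no parents) is found. A minimal sketch of that idea, using Hive's Operator API but with a hypothetical root-to-work map, could look like:

```java
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.plan.BaseWork;

public class EnclosingWorkExample {
  // Walk up parent operators until a root is reached. Visiting an operator more
  // than once across calls is harmless, since any root is good enough here.
  static Operator<?> findAnyRoot(Operator<?> op) {
    Operator<?> current = op;
    while (current.getParentOperators() != null && !current.getParentOperators().isEmpty()) {
      current = current.getParentOperators().get(0);
    }
    return current;
  }

  // "workByRootOp" is a hypothetical map from root operator to its enclosing work.
  static BaseWork findEnclosingWork(Operator<?> op, Map<Operator<?>, BaseWork> workByRootOp) {
    return workByRootOp.get(findAnyRoot(op));
  }
}
```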


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 107
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line107>
> >
> >     For the cloned tree, don't we need to remove the branches that are not connected to the pruning sink operator, i.e., RS->Join?

This is done before we clone the branch:

```
List<Operator<?>> savedChildOps = filterOp.getChildOperators();
filterOp.setChildOperators(Utilities.makeList(selOp));
```
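
To make the surrounding save/trim/clone/restore pattern concrete, here is a self-contained toy version of it (plain classes standing in for Hive operators and for cloneOperatorTree(); not the patch's code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CloneBranchExample {
  // Toy stand-in for an operator with children.
  static class Node {
    final String name;
    List<Node> children;
    Node(String name, List<Node> children) { this.name = name; this.children = children; }
    Node deepCopy() {                       // stands in for cloneOperatorTree()
      List<Node> copied = new ArrayList<>();
      for (Node c : children) { copied.add(c.deepCopy()); }
      return new Node(name, copied);
    }
  }

  public static void main(String[] args) {
    Node selBranch = new Node("SEL", Collections.<Node>emptyList());
    Node rsBranch = new Node("RS", Collections.<Node>emptyList());
    Node filterOp = new Node("FIL", new ArrayList<>(Arrays.asList(selBranch, rsBranch)));

    // Save the original children, keep only the pruning-sink branch, clone,
    // then restore -- so the clone excludes the RS->Join branch.
    List<Node> savedChildOps = filterOp.children;
    filterOp.children = Collections.singletonList(selBranch);
    Node cloned = filterOp.deepCopy();
    filterOp.children = savedChildOps;

    System.out.println("cloned children:   " + cloned.children.size());    // 1
    System.out.println("original children: " + filterOp.children.size());  // 2
  }
}
```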


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 111
> > <https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line111>
> >
> >     This is not cloned as part of cloneOperatorTree()?

No - because it is a transient field.
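
A quick self-contained illustration of why that happens, assuming (as the discussion suggests) that cloneOperatorTree() works by serializing and deserializing the plan - transient fields simply do not survive the round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo implements Serializable {
  String kept = "kept";
  transient String dropped = "dropped";

  public static void main(String[] args) throws Exception {
    TransientDemo original = new TransientDemo();

    // Serialize, then deserialize -- a stand-in for a serialization-based clone.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(original);
    }

    TransientDemo copy;
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      copy = (TransientDemo) in.readObject();
    }

    System.out.println(copy.kept);     // "kept"
    System.out.println(copy.dropped);  // null - the transient field was not copied
  }
}
```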


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?

It shouldn't - I think work ID and Random#nextInt() should both be unique, right?


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java, line 68
> > <https://reviews.apache.org/r/34666/diff/1/?file=971701#file971701line68>
> >
> >     I think we should delegate the processing to the parent when processing one row from the batch. Refer to VectorReduceSinkOperator for an example.

Not much we can do here, since the processing here is more complicated. I changed part of the code to call super.process().


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 98
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98>
> >
> >     Nit: Potential leak of BufferedOutputStream.

Can you explain a little under which situation this would happen, and what would be a better way to do it? Thanks.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------




Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.

> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?
> 
> Xuefu Zhang wrote:
>     Random.nextInt() doesn't give uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.
> 
> Chao Sun wrote:
>     Yes, targetWorkID/sourceWorkID should be unique, but a single work could have multiple tasks, and if we don't have the random number, their results may overwrite each other. We also did the same thing for the hash table sink in Spark, and we haven't seen any issues with that.

targetWorkID/sourceWorkID are unique. We need the random number because we could have multiple tasks for a particular work, in which case they may overwrite each other's files.
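
For reference, the naming scheme being described amounts to something like the sketch below, where the work IDs separate different pruning sinks and the random suffix separates multiple tasks of the same work. The directory layout and format here are assumptions for illustration, not the literal patch code.

```java
import java.util.Random;

public class PruningOutputNameExample {
  private static final Random RANDOM = new Random();

  // Hypothetical layout: <tmpDir>/<targetWorkId>/<sourceWorkId>_<randomSuffix>
  static String outputFileName(String tmpDir, int sourceWorkId, int targetWorkId) {
    int suffix = RANDOM.nextInt(Integer.MAX_VALUE);
    return tmpDir + "/" + targetWorkId + "/" + sourceWorkId + "_" + suffix;
  }

  public static void main(String[] args) {
    System.out.println(outputFileName("/tmp/hive/dpp", 1, 2));
  }
}
```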


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab 
>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are cloned from Tez's tests.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>


Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?

Random.nextInt() doesn't give uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.


> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 98
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98>
> >
> >     Nit: Potential leak of BufferedOutputStream.
> 
> Chao Sun wrote:
>     Can you explain a little under which situation this would happen, and what would be a better way to do it? Thanks.

fs.create() can succeed, while either "new BufferedOutputStream()" or "new ObjectOutputStream()" can fail (by throwing an exception). In that case, the stream returned by fs.create() is never closed, and its file descriptor will leak.

There is new syntax in Java 7 for automatic resource management (try-with-resources). Refer to: http://radar.oreilly.com/2011/09/java7-features.html
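
A minimal sketch of what that could look like for this write path, assuming an already-obtained FileSystem and an arbitrary serializable payload (class, method, and variable names here are illustrative, not the patch's actual code):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PruningSinkWriteExample {
  // Each stream is declared as its own resource, so if a later constructor
  // throws, the streams created earlier (including the fs.create() handle)
  // are still closed and nothing leaks.
  static void writePayload(FileSystem fs, Path file, Object payload) throws IOException {
    try (FSDataOutputStream raw = fs.create(file);
         BufferedOutputStream buffered = new BufferedOutputStream(raw);
         ObjectOutputStream out = new ObjectOutputStream(buffered)) {
      out.writeObject(payload);
    }
  }
}
```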


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------




Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

Posted by Chao Sun <ch...@cloudera.com>.

> On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
> > <https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92>
> >
> >     Can we still get conflicts in the file name?
> 
> Chao Sun wrote:
>     It shouldn't - I think work ID and Random#nextInt() should both be unique, right?
> 
> Xuefu Zhang wrote:
>     Random.nextInt() doesn't give uniqueness. If targetWorkID/sourceWorkID gives you uniqueness, then you don't need a random number, right? If targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number doesn't help much.

Yes, targetWorkID/sourceWorkID should be unique, but a single work could have multiple tasks, and if we don't have the random number, their results may overwrite each other. We also did the same thing for the hash table sink in Spark, and we haven't seen any issues with that.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
-----------------------------------------------------------

