Posted to dev@hive.apache.org by Dong Chen <do...@intel.com> on 2015/03/03 09:28:35 UTC

Review Request 31671: HIVE-8128: Improve Parquet Vectorization

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31671/
-----------------------------------------------------------

Review request for hive, Brock Noland, cheng xu, and Sergio Pena.


Repository: hive-git


Description
-------

This is a POC based on the new vectorized Parquet API at https://github.com/zhenxiao/incubator-parquet-mr/pull/1

I checked out the Parquet API code, made a small change, and then added the Hive changes. Vectorized reads work locally, and I added a test to verify this.

This patch contains only the basic work. The remaining TODO items are listed in comments in the code.

Any feedback is welcome!


Diffs
-----

  data/files/testParquetFile PRE-CREATION 
  pom.xml 75a41a4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java 6a44c27 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f72 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52 
  ql/src/test/queries/clientpositive/vectorized_parquet_data_types.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_parquet_data_types.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/31671/diff/


Testing
-------

Added one test; the unit tests pass locally.


Thanks,

Dong Chen