Posted to dev@hive.apache.org by Brock Noland <br...@cloudera.com> on 2013/06/09 22:11:14 UTC
Review Request: HIVE-4113: Optimize select count(1) with RCFile and Orc
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11770/
-----------------------------------------------------------
Review request for hive.
Description
-------
Modifies ColumnProjectionUtils so that there are two flags: one for the column ids and one indicating whether all columns should be read. Additionally, the patch updates all locations that use the old convention of an empty string indicating that all columns should be read.
The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some unrelated import/whitespace cleanup.
This addresses bug HIVE-4113.
https://issues.apache.org/jira/browse/HIVE-4113
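For readers skimming the thread, the two-flag idea in the description can be sketched roughly as follows. This is only an illustration, not the code from the patch: a plain Map stands in for the job configuration, and the class name, key names, and method names are all hypothetical. The point it tries to show is that once "read all columns" has its own flag, an empty id list can mean "no columns needed", which is the case for select count(1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and keys are hypothetical, not Hive's. */
final class ProjectionSketch {
  // Hypothetical configuration keys chosen for this example.
  static final String READ_ALL_KEY = "sketch.read.all.columns";
  static final String READ_IDS_KEY = "sketch.read.column.ids";

  /** Requests every column; readers should then ignore the id list. */
  static void setReadAllColumns(Map<String, String> conf) {
    conf.put(READ_ALL_KEY, "true");
  }

  /** Requests only the given columns. An empty list means "no columns". */
  static void setReadColumnIds(Map<String, String> conf, List<Integer> ids) {
    StringBuilder sb = new StringBuilder();
    for (int id : ids) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(id);
    }
    conf.put(READ_IDS_KEY, sb.toString());
    conf.put(READ_ALL_KEY, "false");
  }

  /** Defaults to true so an unconfigured reader still reads everything. */
  static boolean isReadAllColumns(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault(READ_ALL_KEY, "true"));
  }

  /** Parses the comma-separated id list; an empty string yields no ids. */
  static List<Integer> getReadColumnIds(Map<String, String> conf) {
    List<Integer> ids = new ArrayList<>();
    String raw = conf.getOrDefault(READ_IDS_KEY, "");
    for (String part : raw.split(",")) {
      if (!part.isEmpty()) {
        ids.add(Integer.parseInt(part));
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // A count(1)-style query needs no column data at all.
    setReadColumnIds(conf, new ArrayList<>());
    System.out.println(isReadAllColumns(conf) + " " + getReadColumnIds(conf));
  }
}
```

With a single string setting, the empty value had to mean "all columns", so a reader could not tell "nothing requested" apart from "everything requested"; splitting the two is what lets a columnar reader skip the column data entirely.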
Diffs
-----
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java da85501
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatBaseInputFormat.java bc0e04c
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatRecordReader.java ac3753f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InitializeInput.java 02ec37f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InternalUtil.java 4167afa
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatMultiOutputFormat.java b5f22af
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java dd2ac10
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestHCatLoader.java e907c73
ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 6bbcb26
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a784b2
ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 49145b7
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java adf4923
ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d18d403
ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 96ac584
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java cbdc2db
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 9fc52fa
ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java 0df08e4
ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java e33a1ce
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 785f0b1
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 23180cf
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 11f5f07
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1335446
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java e1270cc
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java b717278
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java 0317024
serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java PRE-CREATION
serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 3ba2699
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java 99420ca
Diff: https://reviews.apache.org/r/11770/diff/
Testing
-------
All unit tests pass with the patch. ColumnProjectionUtils has new unit tests covering its functionality. Additionally, I manually verified that select count(1) on RCFile/Orc tables results in less IO after the change.
Before:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 17.75 sec HDFS Read: 28782851 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 23.72 sec HDFS Read: 825865962 HDFS Write: 9 SUCCESS
After:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 9.9 sec HDFS Read: 67325 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 16.96 sec HDFS Read: 96045618 HDFS Write: 9 SUCCESS
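To put the HDFS Read counters above in relative terms, the drop can be computed with a small helper like the one below. The class and method names are made up for this example; the byte counts are copied from the job output above. It works out to a reduction of roughly 99.8% for users_orc and roughly 88.4% for users_rc.

```java
/** Illustrative helper; the name is hypothetical and not part of the patch. */
final class IoReduction {
  /** Percentage by which the bytes read dropped between two runs. */
  static double percentReduction(long before, long after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    // HDFS Read byte counts taken from the Before/After job output.
    System.out.printf("users_orc: %.2f%%%n", percentReduction(28782851L, 67325L));
    System.out.printf("users_rc:  %.2f%%%n", percentReduction(825865962L, 96045618L));
  }
}
```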
Thanks,
Brock Noland
Re: Review Request 11770: HIVE-4113: Optimize select count(1) with RCFile and Orc
Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11770/
-----------------------------------------------------------
(Updated July 15, 2013, 7:51 p.m.)
Review request for hive.
Changes
-------
A test was missed in the previous upload; it is included now.
Bugs: HIVE-4113
https://issues.apache.org/jira/browse/HIVE-4113
Repository: hive-git
Description
-------
Modifies ColumnProjectionUtils so that there are two flags: one for the column ids and one indicating whether all columns should be read. Additionally, the patch updates all locations that use the old convention of an empty string indicating that all columns should be read.
The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some unrelated import/whitespace cleanup.
Diffs (updated)
-----
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java da85501
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatBaseInputFormat.java bc0e04c
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatRecordReader.java ac3753f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InitializeInput.java 02ec37f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InternalUtil.java 4167afa
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatMultiOutputFormat.java b5f22af
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java dd2ac10
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestHCatLoader.java e907c73
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a784b2
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java f72ecfb
ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 49145b7
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java adf4923
ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d18d403
ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 96ac584
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java cbdc2db
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 400abf3
ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java fb9fca1
ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java ae6a5ee
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 785f0b1
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 23180cf
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 11f5f07
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1335446
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java e1270cc
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java b717278
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java 0317024
serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java PRE-CREATION
serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 3ba2699
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java 99420ca
Diff: https://reviews.apache.org/r/11770/diff/
Testing
-------
All unit tests pass with the patch. ColumnProjectionUtils has new unit tests covering its functionality. Additionally, I manually verified that select count(1) on RCFile/Orc tables results in less IO after the change.
Before:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 17.75 sec HDFS Read: 28782851 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 23.72 sec HDFS Read: 825865962 HDFS Write: 9 SUCCESS
After:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 9.9 sec HDFS Read: 67325 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 16.96 sec HDFS Read: 96045618 HDFS Write: 9 SUCCESS
Thanks,
Brock Noland
Re: Review Request 11770: HIVE-4113: Optimize select count(1) with RCFile and Orc
Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11770/
-----------------------------------------------------------
(Updated July 15, 2013, 7:47 p.m.)
Review request for hive.
Changes
-------
Rebased patch, no real changes.
Bugs: HIVE-4113
https://issues.apache.org/jira/browse/HIVE-4113
Repository: hive-git
Description
-------
Modifies ColumnProjectionUtils so that there are two flags: one for the column ids and one indicating whether all columns should be read. Additionally, the patch updates all locations that use the old convention of an empty string indicating that all columns should be read.
The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some unrelated import/whitespace cleanup.
Diffs (updated)
-----
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java da85501
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatBaseInputFormat.java bc0e04c
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatRecordReader.java ac3753f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InitializeInput.java 02ec37f
hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/InternalUtil.java 4167afa
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatMultiOutputFormat.java b5f22af
hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java dd2ac10
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestHCatLoader.java e907c73
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a784b2
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java f72ecfb
ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 49145b7
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java adf4923
ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d18d403
ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 96ac584
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java cbdc2db
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 400abf3
ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java fb9fca1
ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java ae6a5ee
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 785f0b1
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 23180cf
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 11f5f07
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1335446
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java e1270cc
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java b717278
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java 0317024
serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 3ba2699
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java 99420ca
Diff: https://reviews.apache.org/r/11770/diff/
Testing
-------
All unit tests pass with the patch. ColumnProjectionUtils has new unit tests covering its functionality. Additionally, I manually verified that select count(1) on RCFile/Orc tables results in less IO after the change.
Before:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 17.75 sec HDFS Read: 28782851 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 23.72 sec HDFS Read: 825865962 HDFS Write: 9 SUCCESS
After:
hive> select count(1) from users_orc;
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 9.9 sec HDFS Read: 67325 HDFS Write: 9 SUCCESS
hive> select count(1) from users_rc;
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 16.96 sec HDFS Read: 96045618 HDFS Write: 9 SUCCESS
Thanks,
Brock Noland