Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/10/02 21:47:57 UTC

[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@557
PS2, Line 557: __builtin_popcount
This should call BitUtil::Popcount() instead, which will use hardware acceleration if appropriate.
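
As a rough illustration of why a single helper is preferable to calling the builtin at each call site, here is a minimal sketch. The namespace popcount_sketch and both functions are hypothetical and are not Impala's BitUtil implementation. The sketch only shows one entry point that takes a compiler-provided fast path when one exists and otherwise falls back to a portable routine.

```cpp
#include <cstdint>

namespace popcount_sketch {  // hypothetical namespace, for illustration only

// Portable fallback: clears the lowest set bit on each step (Kernighan's trick),
// so the recursion depth equals the number of set bits (at most 32).
constexpr int PopcountPortable(uint32_t v) {
  return v == 0 ? 0 : 1 + PopcountPortable(v & (v - 1));
}

// Single entry point for callers. With GCC or Clang this forwards to the
// compiler builtin, which the compiler may lower to a hardware instruction
// when the build target supports it. Other compilers get the fallback.
constexpr int Popcount(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_popcount(v);
#else
  return PopcountPortable(v);
#endif
}

}  // namespace popcount_sketch
```

With a wrapper like this, a call site only needs popcount_sketch::Popcount(bits), and any change to how the fast path is selected happens in one place.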


http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@579
PS2, Line 579: bit_map
We put an underscore at the end of private member names, i.e. this should be 'bit_map_'.
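
For reference, a minimal sketch of the convention. The class CompressionSetSketch and its methods are made up for this example and do not come from the patch; only the trailing underscore on the private member reflects the point above.

```cpp
#include <cstdint>

// Hypothetical class, for illustration only.
class CompressionSetSketch {
 public:
  constexpr explicit CompressionSetSketch(uint32_t bits = 0) : bit_map_(bits) {}

  // Returns a copy of this set with bit 'idx' turned on.
  constexpr CompressionSetSketch With(int idx) const {
    return CompressionSetSketch(bit_map_ | (1u << idx));
  }

  // Returns true if bit 'idx' is set.
  constexpr bool Has(int idx) const { return ((bit_map_ >> idx) & 1u) != 0; }

 private:
  // Private data member: note the trailing underscore.
  uint32_t bit_map_;
};
```

The trailing underscore makes it obvious inside method bodies which names are members and which are locals or parameters.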


http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@582
PS2, Line 582:   /// Mapping of file formats (file type, compression types set) to the number of
Not your change, but the comment should also mention the second entry in the tuple: whether the split was skipped.


http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql@1581
PS2, Line 1581: -- IMPALA-5448: parquet files with multiple compression types
In a lot of cases we have moved to loading "special" files as part of the tests rather than as part of data loading. I think that is more practical, because if you change this template then everyone has to reload their data.

I commented on an instance of the alternative approach, which we should switch to here.


http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@82
PS2, Line 82:   def test_hdfs_parquet_scan_node_profile(self, vector):
This only applies to Parquet, so it should go in TestParquet below (TestScannersAllTableFormats runs the test for all table formats).


http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@337
PS2, Line 337:   def test_corrupt_rle_counts(self, vector, unique_database):
This is an example of the alternative way of loading data files as part of the test.



-- 
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2017 21:47:57 +0000
Gerrit-HasComments: Yes