You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "anujphadke (Code Review)" <ge...@cloudera.org> on 2017/06/21 07:34:27 UTC

[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

anujphadke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7245

Change subject: IMPALA-4863: Correctly account the file type and compression codec
......................................................................

IMPALA-4863: Correctly account the file type and compression codec

If a scan range is filtered at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: PARQUET/SNAPPY:364 PARQUET(Filtered):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-text-scanner.cc
7 files changed, 27 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7245/9/testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
File testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test:

PS9, Line 13: 608
Is this number stable? Or does it depend on how scan ranges and assigned to backends?


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863: Correctly account the file type and compression codec
......................................................................


Patch Set 2:

Defaulting filtered to false in the function definition instead of passing false at every function invocation.

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7:

I tested this locally on my machine with a cluster size = 1.
I was able to recreate this locally on starting the impala minicluster with size=3.
I have a fix for this which I am testing right now.
Spoke to Tim about it. Sorry forgot to update it here.

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7245

to look at the new patch set (#8).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
M tests/query_test/test_scanners.py
7 files changed, 71 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863: Correctly account the file type and compression codec
......................................................................


Patch Set 2:

(6 comments)

We should add a regression test for this as well.

http://gerrit.cloudera.org:8080/#/c/7245/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4863: Correctly account the file type and compression codec
It would be easier to review if we included the IMPALA-5311 fix in this, since they will require related changes to the same code path.


http://gerrit.cloudera.org:8080/#/c/7245/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS1, Line 771: const
const is redundant. Should also have a space after ","


http://gerrit.cloudera.org:8080/#/c/7245/2/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS2, Line 553: true
It seems like this will remove useful information for file formats where we *do* know the compression codec ahead of time. I think we should rethink the parameters here. If I understand correctly there are three outcomes:

* File is not filtered
* File is filtered but is a file format where we know the compression
* File is filtered but is a file format where we don't know the compression without scanning it.


Line 771:     const THdfsCompression::type& compression_type,const bool filtered) {
missing space - maybe run it it through clang-format before posting?

Also remove the "const", it doesn't make much sense for arguments passed by value.


http://gerrit.cloudera.org:8080/#/c/7245/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS2, Line 250: =f
should have spaces around " = "


PS2, Line 462:  
don't need space between >> in C++11


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1094/

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7245/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS4, Line 259: 
             :   /// 2. scan range 
> This part isn't quite right, since we do know the compression type and runt
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4863: Correctly account the file type and compression codec
......................................................................

IMPALA-4863: Correctly account the file type and compression codec

If a scan range is filtered at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: PARQUET/SNAPPY:364 PARQUET(Filtered):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
4 files changed, 21 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7245/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 27: #include <tuple>
Shouldn't this be in the header instead of the .cc?


Line 776:     const THdfsCompression::type& compression_type, bool skipped) {
Parameter names don't match header


Line 883:           if (file_format == THdfsFileFormat::PARQUET) {
One line comment explaining why Parquet is a special case.


PS3, Line 886:     ss << file_format << "(Skipped)" << "/" << compression_type << ":"
             :               << it->second << " ";
AVRO/SNAPPY(Skipped) I think reads better


http://gerrit.cloudera.org:8080/#/c/7245/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 257:       const std::vector<THdfsCompression::type>& compression_type, bool filtered = false);
Parameter names are inconsistent - skipped vs filtered. We should also document when 'skipped' should be set to true.


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7245/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 27: #include <sstream>
> Shouldn't this be in the header instead of the .cc?
Done


Line 776:   vector<THdfsCompression::type> types;
> Parameter names don't match header
Done


Line 883:             // If a scan range stored as parquet is skipped, its compression type
> One line comment explaining why Parquet is a special case.
Done


PS3, Line 886:   } else {
             :             ss << file_format << "/
> AVRO/SNAPPY(Skipped) I think reads better
Done. Changed it everywhere.


http://gerrit.cloudera.org:8080/#/c/7245/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 257:   /// in the file. The metrics are incremented for each compression_type.
> Parameter names are inconsistent - skipped vs filtered. We should also docu
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7245/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 885:         THdfsCompression::type compression_type = std::get<2>(it->first);
> nit: missing space before <<
Done


PS5, Line 888: fi
> nit: align <<'s vertically (it's in the style guide somewhere).
Done


http://gerrit.cloudera.org:8080/#/c/7245/5/testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
File testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test:

Line 9: ====
> I missed this in an earlier pass but we don't have a regression test for IM
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has uploaded a new patch set (#5).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
6 files changed, 48 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7245/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 885:             ss << file_format << "/" << "Unknown"<< "(Skipped):" << it->second << " ";
nit: missing space before <<


PS5, Line 888: <<
nit: align <<'s vertically (it's in the style guide somewhere).


http://gerrit.cloudera.org:8080/#/c/7245/5/testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
File testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test:

Line 9: ---- QUERY
I missed this in an earlier pass but we don't have a regression test for IMPALA-4863. Can we add tests for the runtime filter partition case, maybe for both text and parquet. E.g. this query works for me:

set runtime_filter_wait_time_ms=100000;
select count(*) 
from tpcds.store_sales
join tpcds.date_dim on ss_sold_date_sk = d_date_sk
where d_qoy=1


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7245 )

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Reviewed-on: http://gerrit.cloudera.org:8080/7245
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Tim Armstrong <ta...@cloudera.com>
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
M tests/query_test/test_scanners.py
7 files changed, 71 insertions(+), 20 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-Change-Number: 7245
Gerrit-PatchSet: 12
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1094/

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 8:

Ran the core tests and the exhaustive tests with the change and they passed.

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7245/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
> It would be easier to review if we included the IMPALA-5311 fix in this, si
Done


http://gerrit.cloudera.org:8080/#/c/7245/2/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS2, Line 553: 
> It seems like this will remove useful information for file formats where we
Done


Line 771: 
> missing space - maybe run it it through clang-format before posting?
Done


http://gerrit.cloudera.org:8080/#/c/7245/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS2, Line 250: t 
> should have spaces around " = "
Done


PS2, Line 462: 
> don't need space between >> in C++11
Not needed anymore.
Moved std::pair to a tuple


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7:

Did you have a change to look at the test failures? They looked related to the patch, e.g. TestScannersAllTableFormats::test_hdfs_scanner_profile

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7245/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS4, Line 259: nd compression_type is not
             :   ///    determined.
This part isn't quite right, since we do know the compression type and runtime. Can probably just omit this clause.


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has uploaded a new patch set (#3).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: PARQUET/SNAPPY:364 PARQUET(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
6 files changed, 42 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7245 )

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 11: Verified+1

Build passed but gerrit was down when it tried to submit.


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-Change-Number: 7245
Gerrit-PatchSet: 11
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Sep 2017 17:38:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1244/

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................


Patch Set 7:

Ping?

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7245

to look at the new patch set (#10).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
M tests/query_test/test_scanners.py
7 files changed, 71 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has uploaded a new patch set (#4).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
6 files changed, 49 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has uploaded a new patch set (#6).

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
......................................................................

IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
M tests/query_test/test_scanners.py
7 files changed, 71 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke <ap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>