You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sahil Takiar (Code Review)" <ge...@cloudera.org> on 2019/01/28 16:40:16 UTC

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12282


Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
A fe/src/main/java/org/apache/impala/catalog/FsType.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 614 insertions(+), 85 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 07:28:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1905/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Mon, 28 Jan 2019 17:35:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12282/1/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/12282/1/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@344
PS1, Line 344:       Map<HdfsScanNode.SampledPartitionMetadata, List<FileDescriptor>> result = new HashMap<>();
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12282/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@799
PS1, Line 799:    * Initializes members with information about files and scan ranges, e.g. totalFilesPerFs_,
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1072
PS1, Line 1072:       // The extrapolated row count is based on the 'totalBytesPerFs_' which already accounts
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 28 Jan 2019 16:41:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12282

to look at the new patch set (#2).

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
A fe/src/main/java/org/apache/impala/catalog/FsType.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 614 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12282

to look at the new patch set (#3).

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
A fe/src/main/java/org/apache/impala/catalog/FsType.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 618 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java
File fe/src/main/java/org/apache/impala/catalog/FsType.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java@25
PS4, Line 25:  * Represents the type of filesystem being used. Typically associated with a
This is somewhat redundant with the is*Filesystem() logic in FileSystemUtil.java. Maybe we should move this enum there so that it's easier to find the relate code. Or cross-reference it? Or share some of the implementation?

Mainly just concerned that this is another place people have to update to add a filesystem. There's a more theoretical concern that maybe multiple URL schemes map to a single filesystem implementation, so the other code could recognise a filesystem that this doesn't. That seems far-fetched though.


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@89
PS4, Line 89:     return FsType.LOCAL;
See comment on LocalFsTable


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@276
PS4, Line 276:     return FsType.LOCAL;
Is this right? I think this is a "Local" table in the sense that it's the "Local" catalog implementation. But it could be any table.


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/test/java/org/apache/impala/planner/ExplainTest.java
File fe/src/test/java/org/apache/impala/planner/ExplainTest.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@81
PS4, Line 81:     partitions.add(createMockHdfsPartition("abfs://dummy-fs@dummy-account.dfs.core"
This is cool


http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
File testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test:

http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test@22
PS4, Line 22:    S3 partitions=1/0 files=0 size=0B
What's the deal with the 1/0?



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 18:19:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2701/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 16:37:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 6: Code-Review+2

(2 comments)

LGTM assuming tests pass.

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java
File fe/src/main/java/org/apache/impala/catalog/FsType.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java@25
PS4, Line 25: 
> Moved it to FileSystemUtil.java so all the code is in one place. The use of
Thanks for the explanation, yeah that makes a lot of sense.


http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
File testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test:

http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test@22
PS4, Line 22:    S3 partitions=1/0 files=0 size=0B
> It's due to the fact that s3_tbl isn't a real table, it's a "test table" cr
Oh ok, so that's pre-existing behaviour? I guess that's nbd, although I imagine it will confuse some other people.



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:40:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1906/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Mon, 28 Jan 2019 17:54:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12282

to look at the new patch set (#6).

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 624 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12282

to look at the new patch set (#5).

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 617 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 4:

(1 comment)

I re-ran the core tests and everything passed, so this should be good to merge.

http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
File testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test:

http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test@22
PS4, Line 22:    S3 partitions=1/0 files=0 size=0B
> Oh ok, so that's pre-existing behaviour? I guess that's nbd, although I ima
Yeah its pre-existing. I agree its a bit confusing though.



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Apr 2019 20:22:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4013/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 02:15:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 4:

(5 comments)

I'm re-running the full suite of tests to ensure there are no test failures introduced by rebasing this patch. Will post an update when the tests complete.

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java
File fe/src/main/java/org/apache/impala/catalog/FsType.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/FsType.java@25
PS4, Line 25:  * Represents the type of filesystem being used. Typically associated with a
> This is somewhat redundant with the is*Filesystem() logic in FileSystemUtil
Moved it to FileSystemUtil.java so all the code is in one place. The use of FsType is slightly different from the is*FileSystem methods. FsType allows grouping multiple FileSystems into a single logic type, which is useful for cloud stores like ADLS because it supports multiple file system connectors: abfs, abfss, adl. It also also mapping a filesystem scheme to a more user friendly string - e.g. s3a to S3.

I added some more Javadocs to make it clearer when to use FsType vs. when to use is*FileSystem.


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@89
PS4, Line 89:     return FsType.LOCAL;
> See comment on LocalFsTable
Done


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@276
PS4, Line 276:     return FsType.LOCAL;
> Is this right? I think this is a "Local" table in the sense that it's the "
Yeah, thanks for pointing that out. Updated the implementation so its pretty much the same as what HdfsTable / HdfsPartition does.


http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/test/java/org/apache/impala/planner/ExplainTest.java
File fe/src/test/java/org/apache/impala/planner/ExplainTest.java:

http://gerrit.cloudera.org:8080/#/c/12282/4/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@81
PS4, Line 81:     partitions.add(createMockHdfsPartition("abfs://dummy-fs@dummy-account.dfs.core"
> This is cool
Thanks :)


http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
File testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test:

http://gerrit.cloudera.org:8080/#/c/12282/4/testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test@22
PS4, Line 22:    S3 partitions=1/0 files=0 size=0B
> What's the deal with the 1/0?
It's due to the fact that s3_tbl isn't a real table, it's a "test table" created using FrontendTestBase#addTestTable. 't_nopart' in 'insert-sort-by.test' is another example of a table that has the '1/0' behavior.



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:15:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2711/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:44:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 02:15:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2712/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:45:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12282

to look at the new patch set (#4).

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
A fe/src/main/java/org/apache/impala/catalog/FsType.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 618 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12282/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................

IMPALA-6050: Query profiles should indicate storage layer(s) used

This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).

Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional.alltypes]
    partitions=24/24 files=24 size=478.45KB

Now the explain plan will look like this:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN S3 [functional.alltypes]
    ADLS partitions=4/24 files=4 size=478.45KB
    HDFS partitions=10/24 files=10 size=478.45KB
    S3 partitions=10/24 files=10 size=478.45KB

The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change affects explain plans that are stored on
an single storage layer as well: 'partitions=...' will become
'HDFS partitions-...'.

This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME which is the name of the storage layer the test is
being run against.

Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic

Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Reviewed-on: http://gerrit.cloudera.org:8080/12282
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A testdata/workloads/functional-planner/queries/PlannerTest/scan-node-fs-scheme.test
M testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
M tests/common/impala_test_suite.py
M tests/util/filesystem_utils.py
19 files changed, 624 insertions(+), 88 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6050: Query profiles should indicate storage layer(s) used

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12282 )

Change subject: IMPALA-6050: Query profiles should indicate storage layer(s) used
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@772
PS5, Line 772:     public SampledPartitionMetadata(long partitionId, FileSystemUtil.FsType partitionFsType) {
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1323
PS5, Line 1323:         for (Map.Entry<FileSystemUtil.FsType, Long> partsPerFs : numPartitionsPerFs_.entrySet()) {
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java
File fe/src/test/java/org/apache/impala/planner/ExplainTest.java:

http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@99
PS5, Line 99:     partitions.add(createMockHdfsPartition("s3a://dummy-bucket/dummy-part-7", FileSystemUtil.FsType.S3));
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@100
PS5, Line 100:     partitions.add(createMockHdfsPartition("s3a://dummy-bucket/dummy-part-8", FileSystemUtil.FsType.S3));
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@101
PS5, Line 101:     partitions.add(createMockHdfsPartition(dummyTblPath + "/dummy-part-9", FileSystemUtil.FsType.HDFS));
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@102
PS5, Line 102:     partitions.add(createMockHdfsPartition(dummyTblPath + "/dummy-part-10", FileSystemUtil.FsType.HDFS));
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/12282/5/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@148
PS5, Line 148:   private HdfsPartition createMockHdfsPartition(String path, FileSystemUtil.FsType fsType) {
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Gerrit-Change-Number: 12282
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 20:59:11 +0000
Gerrit-HasComments: Yes