You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Yida Wu (Code Review)" <ge...@cloudera.org> on 2021/07/22 14:17:41 UTC

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17711


Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................

IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

We support the HDFS scratch space, but only as a test-only feature
with a fixed HDFS default local path.

In this patch, we extend the HDFS scratch space to support the
customer's input. For supporting the function, we add a new
format for HDFS scratch space path. It forces the HDFS path
to have the port number to solve the contradiction to the
current format of the scratch space path.

For example, previously, the format for scratch space path is,
take s3 for example, s3a://bucketpath:#bytes:#priority. In this
case, the bucketpath doesn't have a port number.

In the patch, the new format of HDFS scratch path is
hdfs://ipaddr:#port:#bytes:#priority. The port number is required,
therefore, there must be at least one colon in the HDFS path, the
bytes and priority are optional as before. For other scratch
spaces, the path format doesn’t change.

The patch also adds a logic to try fetching the connection of a
remote scratch space in the initialization of TmpFileMgr, in
order to expose the case at the beginning if a wrong remote path
is configured.

Tests:
Added and passed TmpFileMgrTest::TestDirectoryLimitParsingRemotePath.
Ran the Core tests.

Change-Id: I26393922811137278046bf82d4988ab832944f02
---
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
6 files changed, 191 insertions(+), 74 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17711/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has abandoned this change. ( http://gerrit.cloudera.org:8080/17711 )

Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17711 )

Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9144/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Jul 2021 14:45:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17711 )

Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9155/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 12:55:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17711 )

Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................


Patch Set 2:

One general comment for better code structuring is that parsing and validation logic for temporary paths can be moved to `TmpDir`. We can have inherited HdfsTmpDir, S3TmpDir etc which can have custom logic for their filesystem like enforcing port number in Hdfs paths, not having port number in S3 paths etc. It would then be easier if we need to add more custom logic in future for object stores like Azure Blob Store, GCS, Minio etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 09:11:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17711 )

Change subject: IMPALA-10429 Add Support for Spilling to HDFS Path Parsing
......................................................................

IMPALA-10429 Add Support for Spilling to HDFS Path Parsing

We support the HDFS scratch space, but only as a test-only feature
with a fixed HDFS default local path.

In this patch, we extend the HDFS scratch space to support the
customer's input. For supporting the function, we add a new
format for HDFS scratch space path. It forces the HDFS path
to have the port number to solve the contradiction to the
current format of the scratch space path.

For example, previously, the format for scratch space path is,
take s3 for example, s3a://bucketpath:#bytes:#priority. In this
case, the bucketpath doesn't have a port number.

In the patch, the new format of HDFS scratch path is
hdfs://ipaddr:#port:#bytes:#priority. The port number is required,
therefore, there must be at least one colon in the HDFS path, the
bytes and priority are optional as before. For other scratch
spaces, the path format doesn’t change.

The patch also fixes a bug that if there is a space ahead of the
s3 scratch space path, the parsing is incorrect because using an
untrimmed string.

Tests:
Added and passed TmpFileMgrTest::TestDirectoryLimitParsingRemotePath.
Ran the Core tests.

Change-Id: I26393922811137278046bf82d4988ab832944f02
---
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
6 files changed, 210 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17711/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26393922811137278046bf82d4988ab832944f02
Gerrit-Change-Number: 17711
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>