You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Smith (Code Review)" <ge...@cloudera.org> on 2023/02/13 19:38:44 UTC

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service name

Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19496


Change subject: IMPALA-11920: Support spill to HDFS address by service name
......................................................................

IMPALA-11920: Support spill to HDFS address by service name

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service name that doesn't include a port number.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
2 files changed, 28 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 6: Code-Review+2

Carry +2 for rebase and comment/commit message update.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 00:38:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service name

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service name
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12365/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Feb 2023 20:01:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 4:

I think putting docs in the same patch as code changes is confusing the Jenkins auto-review jobs. Should I split them up?


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Feb 2023 23:17:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Hello Yida Wu, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19496

to look at the new patch set (#2).

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number.

Updates documentation to include examples with service identifier. Also
fixes inconsistent use of ASCII quotes for example text, highlighting
code and variable names, and normalizes descriptions between
S3/HDFS/Ozone. Removes "priority" from remote descriptions as it is
optional and does nothing.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M docs/topics/impala_disk_space.xml
3 files changed, 123 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service name

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service name
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19496/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/19496/1/be/src/runtime/tmp-file-mgr.cc@825
PS1, Line 825:         Substitute("The scratch URI must have a path or port number: '$0'", raw_path_));
There is one potentially ambiguous case here. If you wanted to put scratch-dirs in the HDFS root for some reason, then
    hdfs://hdfs1:100
could mean that you want to use port 100, or bytes_limit 100. Right now it will always be read as a port, as I think normal practice should be to put the scratch space in a directory; in Ozone you must use a volume. If you want to differentiate, add a trailing slash to the URI
    hdfs://hdfs1/:100



-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Feb 2023 19:44:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number. Examples
- "hdfs://hdfs1/:10G" uses the root directory of HDFS with a 10G limit
- "ofs://ozone1/tmp::" uses /tmp in Ozone with default limit/priority

Updates `scratch_dirs` parsing to allow whitespace after each specifier,
as in "hdfs://hdfs1/ , /tmp". This is unambiguous and avoids failures
for simple mistakes.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Reviewed-on: http://gerrit.cloudera.org:8080/19496
Reviewed-by: Michael Smith <mi...@cloudera.com>
Tested-by: Michael Smith <mi...@cloudera.com>
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
2 files changed, 59 insertions(+), 35 deletions(-)

Approvals:
  Michael Smith: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19496/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19496/5//COMMIT_MSG@15
PS5, Line 15: hfs
hdfs?


http://gerrit.cloudera.org:8080/#/c/19496/5/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/19496/5/be/src/runtime/tmp-file-mgr.cc@814
PS5, Line 814: // We enforce the HDFS scratch path to have the port number,
Comment probably should be updated.



-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 00:30:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Hello Yida Wu, Shajini Thayasingh, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19496

to look at the new patch set (#3).

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number. Examples
- "hdfs://hdfs1/:10G" uses the root directory of HDFS with a 10G limit
- "ofs://ozone1/tmp::" uses /tmp in Ozone with default limit/priority

Updates documentation to include examples with service identifier. Also
fixes inconsistent use of ASCII quotes for example text, highlighting
code and variable names, and normalizes descriptions between
S3/HDFS/Ozone. Removes "priority" from remote descriptions as it is
optional and does nothing.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M docs/topics/impala_disk_space.xml
3 files changed, 145 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12512/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 00:58:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19496/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19496/3//COMMIT_MSG@17
PS3, Line 17: priority
> Just to confirm that the priority is removed in the document for the remote
Yes, it'll still be parsed in the code.


http://gerrit.cloudera.org:8080/#/c/19496/3/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/19496/3/be/src/runtime/tmp-file-mgr-test.cc@2240
PS3, Line 2240: hdfs
> Maybe it is not related, but when I test with a space behind the url like -
Ok. That happens because https://github.com/apache/impala/blob/feb4a76ed4cb5b688143eb21370f78ec93133c56/be/src/runtime/tmp-file-mgr.cc#L290 only does trim left. trim_copy would be fine there.


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml
File docs/topics/impala_disk_space.xml:

http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@298
PS3, Line 298: /dir1::0
> The alignment looks weird compared to the previous version, does this part 
It fixes the indentation in https://impala.apache.org/docs/build/asf-site-html/topics/impala_disk_space.html. All whitespace is included in a codeblock, and having the examples indented looks wrong to me.


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@486
PS3, Line 486: ozone1
> Just a question that what is the difference between Ozone service identifie
Ozone Manager might be addressed via a hostname. https://ci-hadoop.apache.org/view/Hadoop%20Ozone/job/ozone-doc-master/lastSuccessfulBuild/artifact/hadoop-hdds/docs/public/feature/om-ha.html discusses service IDs for an HA deployment.


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@488
PS3, Line 488:  
> nit. looks good to have a comma here
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Feb 2023 17:50:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12383/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Feb 2023 23:39:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5: Code-Review+1

The patch looks good to me. I will include Abhishek to review, because he reviewed the Spill to HDFS.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Feb 2023 01:12:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Hello Yida Wu, Shajini Thayasingh, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19496

to look at the new patch set (#5).

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number. Examples
- "hdfs://hdfs1/:10G" uses the root directory of HDFS with a 10G limit
- "ofs://ozone1/tmp::" uses /tmp in Ozone with default limit/priority

Updates `scratch_dirs` parsing to allow whitespace after each specifier,
as in "hfs://hdfs1/ , /tmp". This is unambiguous and avoids failures for
simple mistakes.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
2 files changed, 57 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 6: Verified+1

> Patch Set 6: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9094/

Hit IMPALA-11816. Overriding it.


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 17:25:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has removed a vote on this change.

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 16 Feb 2023 04:46:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Hello Yida Wu, Shajini Thayasingh, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19496

to look at the new patch set (#4).

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number. Examples
- "hdfs://hdfs1/:10G" uses the root directory of HDFS with a 10G limit
- "ofs://ozone1/tmp::" uses /tmp in Ozone with default limit/priority

Updates `scratch_dirs` parsing to allow whitespace after each specifier,
as in "hfs://hdfs1/ , /tmp". This is unambiguous and avoids failures for
simple mistakes.

Updates documentation to include examples with service identifier. Also
fixes inconsistent use of ASCII quotes for example text, highlighting
code and variable names, and normalizes descriptions between
S3/HDFS/Ozone. Removes "priority" from remote descriptions as it is
optional and does nothing.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M docs/topics/impala_disk_space.xml
3 files changed, 153 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9052/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Feb 2023 23:34:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9094/


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 05:44:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9094/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 00:39:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Hello Yida Wu, Abhishek Rawat, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19496

to look at the new patch set (#6).

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................

IMPALA-11920: Support spill to HDFS address by service ID

Allows addressing HDFS (and Ozone) filesystems in `scratch_dirs` by a
service identifier that doesn't include a port number. Examples
- "hdfs://hdfs1/:10G" uses the root directory of HDFS with a 10G limit
- "ofs://ozone1/tmp::" uses /tmp in Ozone with default limit/priority

Updates `scratch_dirs` parsing to allow whitespace after each specifier,
as in "hdfs://hdfs1/ , /tmp". This is unambiguous and avoids failures
for simple mistakes.

Testing:
- new backend test cases run with HDFS and Ozone
- manually tested that Impala starts with
  --impalad_args=--scratch_dirs=ofs://localhost/tmp,/tmp
  creates impala-scratch in both locations

Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
2 files changed, 59 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/19496/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19496/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19496/5//COMMIT_MSG@15
PS5, Line 15: hfs
> hdfs?
Done


http://gerrit.cloudera.org:8080/#/c/19496/5/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/19496/5/be/src/runtime/tmp-file-mgr.cc@814
PS5, Line 814: // We enforce the HDFS scratch path to have the port number,
> Comment probably should be updated.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 00:38:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Yida Wu (Code Review)" <ge...@cloudera.org>.
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Patch Set 3:

(6 comments)

Thanks for the fix, it looks good, just some questions to understand the change.

http://gerrit.cloudera.org:8080/#/c/19496/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19496/3//COMMIT_MSG@17
PS3, Line 17: priority
Just to confirm that the priority is removed in the document for the remote scratch directory, however, can still be parsed in the code.


http://gerrit.cloudera.org:8080/#/c/19496/3/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/19496/3/be/src/runtime/tmp-file-mgr-test.cc@2240
PS3, Line 2240: hdfs
Maybe it is not related, but when I test with a space behind the url like --impalad_args='--scratch_dirs="hdfs://localhost:8020/ , /tmp"', it will try to create a path on " hdfs://localhost:8020/ /impala-scratch", could you please take a look on the rtrim of the path and add a testcase for it if it is an issue?


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml
File docs/topics/impala_disk_space.xml:

http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@298
PS3, Line 298: /dir1::0
The alignment looks weird compared to the previous version, does this part show normally?


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@303
PS3, Line 303: /dir1:200GB
             : /dir1:200GB:
Same as above


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@486
PS3, Line 486: ozone1
Just a question that what is the difference between Ozone service identifier and Ozone Manager? Is it no difference for the Impala to parse the Ozone scratch path?


http://gerrit.cloudera.org:8080/#/c/19496/3/docs/topics/impala_disk_space.xml@488
PS3, Line 488:  
nit. looks good to have a comma here



-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Shajini Thayasingh <st...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Feb 2023 10:53:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11920: Support spill to HDFS address by service ID

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has removed Shajini Thayasingh from this change.  ( http://gerrit.cloudera.org:8080/19496 )

Change subject: IMPALA-11920: Support spill to HDFS address by service ID
......................................................................


Removed reviewer Shajini Thayasingh.
-- 
To view, visit http://gerrit.cloudera.org:8080/19496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ie069cba211df85fe90d174900b20a26fcc1f7167
Gerrit-Change-Number: 19496
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>