You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2023/05/12 06:54:51 UTC

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19877


Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................

IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

When using HDFS caching, subrange reads are copying memory
from the HDFS cache buffer into the output buffer. When
the HDFS cache buffer starts at an offset into the file,
the subrange offsets need to be adjusted to point to the
relative place within the cache buffer. This fixes the
calculation to use the relative locations.

Testing:
 - Modified the caching tests in disk-io-mgr-test to also
   test with reads at an offset
 - In a local minicluster, creating a table based on
   alltypes_tiny_pages.parquet, added HDFS caching for the table,
   ran the SQLs in parquet-page-index-alltypes-tiny-pages.test,
   and checked the HDFS caching reads in the profile.

Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
---
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/scan-range.cc
2 files changed, 51 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/19877/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13013/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 07:15:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 2:

I changed this to also return Status rather than read past the end of the buffer. I think that gives me more peace of mind.


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 17:18:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19877

to look at the new patch set (#2).

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................

IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

When using HDFS caching, subrange reads are copying memory
from the HDFS cache buffer into the output buffer. When
the HDFS cache buffer starts at an offset into the file,
the subrange offsets need to be adjusted to point to the
relative place within the cache buffer. This fixes the
calculation to use the relative locations.

Testing:
 - Modified the caching tests in disk-io-mgr-test to also
   test with reads at an offset
 - In a local minicluster, creating a table based on
   alltypes_tiny_pages.parquet, added HDFS caching for the table,
   ran the SQLs in parquet-page-index-alltypes-tiny-pages.test,
   and checked the HDFS caching reads in the profile.

Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
---
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/scan-range.cc
2 files changed, 58 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/19877/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13023/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 17:20:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9314/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 17:19:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 22:45:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19877/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19877/1//COMMIT_MSG@19
PS1, Line 19:  - In a local minicluster, creating a table based on
            :    alltypes_tiny_pages.parquet, added HDFS caching for the table,
            :    ran the SQLs in parquet-page-index-alltypes-tiny-pages.test,
            :    and checked the HDFS caching reads in the profile.
> Would it be possible to add an e2e test for this?
e2e tests should be possible, but there are some interactions with OS security settings to get caching to work properly. I filed IMPALA-12139 for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 17:18:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 2: Code-Review+2

Going ahead with this, carrying +2


-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 23:10:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

Thanks for finding and fixing this bug!

http://gerrit.cloudera.org:8080/#/c/19877/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19877/1//COMMIT_MSG@19
PS1, Line 19:  - In a local minicluster, creating a table based on
            :    alltypes_tiny_pages.parquet, added HDFS caching for the table,
            :    ran the SQLs in parquet-page-index-alltypes-tiny-pages.test,
            :    and checked the HDFS caching reads in the profile.
Would it be possible to add an e2e test for this?



-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 12:33:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19877 )

Change subject: IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges
......................................................................

IMPALA-12123 (part 2): Fix offsets for HDFS caching with subranges

When using HDFS caching, subrange reads are copying memory
from the HDFS cache buffer into the output buffer. When
the HDFS cache buffer starts at an offset into the file,
the subrange offsets need to be adjusted to point to the
relative place within the cache buffer. This fixes the
calculation to use the relative locations.

Testing:
 - Modified the caching tests in disk-io-mgr-test to also
   test with reads at an offset
 - In a local minicluster, creating a table based on
   alltypes_tiny_pages.parquet, added HDFS caching for the table,
   ran the SQLs in parquet-page-index-alltypes-tiny-pages.test,
   and checked the HDFS caching reads in the profile.

Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Reviewed-on: http://gerrit.cloudera.org:8080/19877
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/scan-range.cc
2 files changed, 58 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/19877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icd0ccc677b090ef3a75047defb4d7bda4279dac6
Gerrit-Change-Number: 19877
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>