You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2018/09/26 16:01:57 UTC

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11520


Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within
the scan range. Sub-ranges can be added by ScanRange::AddSubRanges().
If done so, then the ScanRange class will only read the parts defined
by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reasds during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
14 files changed, 433 insertions(+), 139 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 1:

(3 comments)

I did a quick pass, I plan to return soon.

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/hdfs-file-reader.cc@213
PS1, Line 213:   }
Shouldn't the function return here?


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@210
PS1, Line 210:     } else {
             :       DCHECK(false);
             :     }
Why is this 'else' branch needed? One of the last two conditions should be always true.


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@281
PS1, Line 281: ReadSubRangesFromCache
This and ReadSubRanges look very similar - they could be merged to reduce code size.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:10:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(4 comments)

I'm making my way through this. First round of comments.

At a high level, I think another dimension of the design is how it interacts with future efforts to manage IO patterns (e.g. IMPALA-6673). I don't think that should hold up this review, and I don't see a problem with the SubRange abstraction.

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49
PS3, Line 49: 'eosr' is set to true when end of file has reached,
            :   /// or the file reader has read all the bytes needed by 'scan_range_'.
Depending on what we do with eosr, we may need to update this comment.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eosr = true;
eosr is a bit weird now. It can detect end of file, but not end of scan range. I'll be taking another look at eosr and how it gets set, but we might end up renaming this variable to be eof.

I might be missing something, but the eosr logic seems like it could be implemented here. We have access to the scan_range_ and can increment bytes_read_. The ScanRange has bytes_to_read_ which is similar to len() today.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416
PS3, Line 416:   /// They need to be ordered by offset and cannot overlap.
I think it would be useful to combine contiguous SubRanges into a single SubRange. (It doesn't necessarily need to happen in this code change.) Should we require that the SubRanges are non-contiguous (forcing the caller to merge them) or should we merge them ourselves?

(Is there a time when we wouldn't want to merge contiguous SubRanges?)


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> Yeah, I think it's not necessary to test eosr in the loop condition, but ot
About this DCHECK. We have a condition in ReadFromPos() that raises some questions for me:

https://github.com/apache/impala/blob/master/be/src/runtime/io/hdfs-file-reader.cc#L152

I'm reading into what would trigger that condition.

My current thought: If we issue an hdfsRead() call where the offset is near the end of the file and the length goes beyond the end of the file, does hdfsRead() produce an error or does it do a partial read and return less than the requested amount of data?



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Oct 2018 00:28:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 7: Code-Review+2

This looks good!


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 16:33:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/962/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Oct 2018 15:41:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#6).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within the scan
range. Sub-ranges can be added to a scan range when calling
ScanRange::Reset(). If done so, the ScanRange class will only read the
parts defined by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 511 insertions(+), 158 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(1 comment)

I looked at the interface changes and the tests and this seems good. I haven't done a detailed review of the code but you should feel free to move forward without me doing that (I don't want to hold this up).

http://gerrit.cloudera.org:8080/#/c/11520/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11520/3//COMMIT_MSG@28
PS3, Line 28:  * extended disk-io-mgr-test.cc with sub-ranges
I'd suggest running the standalone disk-io-mgr-stress-test with -duration_sec=0 for a while just to make sure that hasn't regressed.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Oct 2018 19:32:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11520/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11520/1//COMMIT_MSG@27
PS1, Line 27: reasds
type: reads


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc@429
PS1, Line 429:       if (cached_read_succeeded) {
Can you add some comments about this path?


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc@485
PS1, Line 485:       *needs_buffers = range->external_buffer_tag() ==
Same as line 429.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Sep 2018 12:34:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/810/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 16:34:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/837/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Sep 2018 13:42:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within the scan
range. Sub-ranges can be added to a scan range when calling
ScanRange::Reset(). If done so, the ScanRange class will only read the
parts defined by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Reviewed-on: http://gerrit.cloudera.org:8080/11520
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 511 insertions(+), 158 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@263
PS1, Line 263:         sub_range.offset + sub_range_pos_.bytes_read, buffer_desc->buffer_ + buffer_desc->len(),
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 16:02:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3328/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 16:39:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

One small comment. Otherwise, looks good!

http://gerrit.cloudera.org:8080/#/c/11520/6/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/6/be/src/runtime/io/scan-range.cc@466
PS6, Line 466: void ScanRange::Reset(hdfsFS fs, const char* file, int64_t len, int64_t offset,
             :     int disk_id, bool expected_local, const BufferOpts& buffer_opts,
             :     vector<SubRange>&& sub_ranges, void* meta_data) {
             :   Reset(fs, file, len, offset, disk_id, expected_local, buffer_opts, meta_data);
             :   AddSubRanges(move(sub_ranges));
             : }
Nit: Just as a style thing, my preference would be to have the base Reset() take sub_ranges as an argument and initialize sub_ranges_. The other Reset() would call in with an empty list.
i.e.

Reset(args) -> Reset(args, sub_ranges)

rather than

Reset(args, sub_ranges) -> Reset(args) then init sub_ranges

In this case, it might make sense for AddSubRanges() to be renamed to InitSubRanges().



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Oct 2018 23:34:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#3).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within
the scan range. Sub-ranges can be added by ScanRange::AddSubRanges().
If done so, then the ScanRange class will only read the parts defined
by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 462 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1022/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 10:07:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 4:

(4 comments)

Thanks for the comments.
Yeah, looking at IMPALA-6673 I don't think there should be a conflict with these changes.

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49
PS3, Line 49: 'eof' is set to true when end of file has reached.
            :   virtual Status ReadFromPos(int64_t file_offset, uint8_t* buffer,
> Depending on what we do with eosr, we may need to update this comment.
Updated comment, and renamed eosr to eof.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eof = true;
> eosr is a bit weird now. It can detect end of file, but not end of scan ran
I renamed eosr to eof.
Currently file readers doesn't see how many bytes have been read (copied) from cache when the file is available in the HDFS cache. This is the main reason why this logic is implemented there.

Putting that logic into the file readers doesn't seem wrong to me, we would need to change some parts of ScanRange, but nothing dramatic. What do you think?


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416
PS3, Line 416:   /// Validates the sub-ranges. All sub-range must be insid
> I think it would be useful to combine contiguous SubRanges into a single Su
Yeah, I was thinking about it after uploading the last PS.
Now I've added MergeSubRanges() which I call in AddSubRanges().


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> About this DCHECK. We have a condition in ReadFromPos() that raises some qu
My idea is to only issue such read operations (that go beyond the end of the file) when there are no sub-ranges.

We should only issue sub-ranges when we exactly know what do we want to read, e.g. Parquet pages in a column chunk.

Btw, hdfsRead() does a partial read in this scenario, but we only allow that for ScanRanges that don't have sub-ranges.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Oct 2018 15:14:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 20:33:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 16:39:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#5).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within the scan
range. Sub-ranges can be added to a scan range when calling
ScanRange::Reset(). If done so, the ScanRange class will only read the
parts defined by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 512 insertions(+), 158 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1059/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 14:08:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#2).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within
the scan range. Sub-ranges can be added by ScanRange::AddSubRanges().
If done so, then the ScanRange class will only read the parts defined
by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reasds during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 453 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eof = true;
> I renamed eosr to eof.
I'm thinking that eof is the right way to go. The FileReader is a new abstraction, but I think it is right to avoid having it track end of scan range. FileReader doesn't really have a reason to know that.


http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h@251
PS4, Line 251:   /// Add sub-ranges to this ScanRange. If sub_ranges is not empty, then ScanRange won't
             :   /// read everything from its range, but will only read these sub-ranges.
             :   /// Sub-ranges need to be ordered by 'offset' and cannot overlap with each other.
             :   /// Doesn't need to merge continuous sub-ranges in advance, this method will do.
             :   void AddSubRanges(std::vector<SubRange>&& sub_ranges);
Will we know the SubRanges when we call Reset()? (BaseScalarColumnReader::Reset()->HdfsScanNodeBase::AllocateScanRange()) I'm wondering whether it makes sense for Reset() to take in the SubRanges as an argument rather than having an AddSubRanges() function. I like Reset() behaving somewhat like a constructor and taking everything in one go.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> My idea is to only issue such read operations (that go beyond the end of th
I think it makes sense for us to view end-of-file or partial read as errors, but I also see it as something that is beyond our control. We don't have complete control on what the file is going to look like. It can be overwritten or modified in place. So, I would rather return an error Status (maybe SCANNER_INCOMPLETE_READ?) than have a DCHECK here.


http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc@236
PS4, Line 236: buffer_desc->eosr_ |= bytes_read_ == bytes_to_read_;
One thing we should try to do is make it clear what this calculation is doing (and why we are passing *eosr in as the eof arg). I'll think on this some more, but at the least, a comment explaining that eosr can be because we hit eof or because there are no more bytes to read (or for the subranges case, there are no more subranges to read).



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 05:18:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1023/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 10:15:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/hdfs-file-reader.cc@213
PS1, Line 213:     return;
> Shouldn't the function return here?
Yes, this was a bug. Fortunately the e2e tests also catched it. Done.


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc@429
PS1, Line 429:           needs_buffers));
> Can you add some comments about this path?
Done


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/request-context.cc@485
PS1, Line 485:   AddActiveScanRangeLocked(lock, range);
> Same as line 429.
Done


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@210
PS1, Line 210:     COUNTER_ADD_IF_NOT_NULL(reader_->bytes_read_counter_, buffer_desc->len_);
             :     COUNTER_ADD_IF_NOT_NULL(reader_->active_read_thread_counter_, -1L);
             :   }
> Why is this 'else' branch needed? One of the last two conditions should be 
Done


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@263
PS1, Line 263:           sub_range.offset + sub_range_pos_.bytes_read,
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11520/1/be/src/runtime/io/scan-range.cc@281
PS1, Line 281: K();
> This and ReadSubRanges look very similar - they could be merged to reduce c
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Sep 2018 13:10:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> optional: If this is true, then the loop seems more complex then necessary,
Yeah, I think it's not necessary to test eosr in the loop condition, but other than that I don't think it would significantly simplify the code structure.

We still have three exit conditions:
* buffer is full
* we've read all sub-ranges
* there was an error during the read

Adding more exit branches to the loop body don't simplify the code IMO. And putting them in the for-loop condition doesn't seem natural. 

Plus, when incrementing sub_range_pos_.index we also need to set sub_range_pos_.bytes_read to zero.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Sep 2018 12:00:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(1 comment)

Thanks Tim! I just don't know who else knows this part of the code well enough to give it a +2 confidently.

Anyway, I'm not blocked on it right now, I can make progress so it's not that urgent. But I'll be blocked by it in 2-3 weeks.

http://gerrit.cloudera.org:8080/#/c/11520/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11520/3//COMMIT_MSG@28
PS3, Line 28:  * extended disk-io-mgr-test.cc with sub-ranges
> I'd suggest running the standalone disk-io-mgr-stress-test with -duration_s
I've ran it for a while, seems like there is no regression.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Oct 2018 18:05:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eof = true;
> I'm thinking that eof is the right way to go. The FileReader is a new abstr
I agree, it is cleaner to keep FileReader simple and keep higher level logic in ScanRange.


http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h@254
PS5, Line 254:       bool expected_local, const BufferOpts& buffer_opts,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/request-ranges.h@251
PS4, Line 251:   /// Same as above, but it also adds sub-ranges. No need to merge contiguous sub-ranges
             :   /// in advance, as this method will do the merge.
             :   void Reset(hdfsFS fs, const char* file, int64_t len, int64_t offset, int disk_id,
             :       bool expected_local, const BufferOpts& buffer_opts,
             :       std::vector<SubRange>&& sub_ranges, void* meta_dat
> Will we know the SubRanges when we call Reset()? (BaseScalarColumnReader::R
Yes, I think we'll always know the SubRanges when we call Reset().

I overloaded Reset() to have another version that takes SubRanges. This way I didn't have to modify the current call sites that pass in metadata.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268:  (!read_status.ok()) return read_status;
> I think it makes sense for us to view end-of-file or partial read as errors
Seems valid, now it returns SCANNER_INCOMPLETE_READ in this case as suggested.


http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/4/be/src/runtime/io/scan-range.cc@236
PS4, Line 236: buffer_desc->eosr_ = eof || bytes_read_ == bytes_to_
> One thing we should try to do is make it clear what this calculation is doi
Introduced a new local boolean variable called 'eof'.
Also added a short comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 09:58:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/5/be/src/runtime/io/request-ranges.h@254
PS5, Line 254:       bool expected_local, const BufferOpts& buffer_opts, std::vector<SubRange>&& sub_ranges,
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 09:36:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11520/6/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/6/be/src/runtime/io/scan-range.cc@466
PS6, Line 466:   io_mgr_ = nullptr;
             :   reader_ = nullptr;
             :   sub_ranges_.clear();
             :   sub_range_pos_ = {};
             :   InitSubRanges(move(sub_ranges));
             : }
> Nit: Just as a style thing, my preference would be to have the base Reset()
Thanks, yeah it's more usual and clean that way. Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 14:23:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#7).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within the scan
range. Sub-ranges can be added to a scan range when calling
ScanRange::Reset(). If done so, the ScanRange class will only read the
parts defined by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 511 insertions(+), 158 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
optional: If this is true, then the loop seems more complex then necessary, it could be something like:

for (; sub_range_pos_.index < sub_ranges_.size(); sub_range_pos_.index++):
  end_of_buffer = rest_of_buffer < rest_of_sub_range
  bytes_to_read = end_of_buffer ? ... : ...
  ... do the read ...
  if end_of_buffer:
    ...
    return status.ok()
  else 
    ...
*eosr = true
return status.ok()



-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Sep 2018 15:26:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/836/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Sep 2018 13:19:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7543: Enhance scan ranges to support sub-ranges

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11520

to look at the new patch set (#4).

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................

IMPALA-7543: Enhance scan ranges to support sub-ranges

This commit enhances the ScanRange class to make it possible to only
read some smaller parts of the whole ScanRange. This functionality
is needed by IMPALA-5843.

A sub-range is an offset and length which is located within
the scan range. Sub-ranges can be added by ScanRange::AddSubRanges().
If done so, then the ScanRange class will only read the parts defined
by the sub-ranges.

If we have sub-ranges for a cache read then the ScanRange won't
enqueue the whole cache buffer (which contains the whole ScanRange),
but memcpy() the sub-ranges to IO/client buffers.

Smaller refactorings needed to do:
 * remove scan_range_offset_ from BufferDescriptor
 * number of bytes read are bookkeeped by ScanRange again

Testing:
 * introduced CacheReaderTestStub to fake cache reads during testing
 * extended disk-io-mgr-test.cc with sub-ranges

Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
---
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/cache-reader-test-stub.h
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
D be/src/runtime/io/file-reader.cc
M be/src/runtime/io/file-reader.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 499 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11520/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>