You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org> on 2018/02/01 01:28:21 UTC

[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 )

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
......................................................................


Patch Set 20:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.h@289
PS20, Line 289: AddBuffersToRange
nit: AllocateBuffersForRange


http://gerrit.cloudera.org:8080/#/c/8707/14/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/14/be/src/runtime/io/disk-io-mgr.h@259
PS14, Line 259: do
nit: does


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/disk-io-mgr.cc@826
PS20, Line 826: ScheduleMode::IMMEDIATELY
please see comment on request-context.cc::179


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc@174
PS20, Line 174: disk_state->ScheduleContext(lock, this, range->disk_id());
shouldn't this be called only for ScheduleMode::IMMEDIATELY?

for ScheduleMode::UPON_GETNEXT: GetNextRange() will be scheduling this context
and
for ScheduleMode::BY_CALLER : the caller will explicitly call ScheduleContext()


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-context.cc@179
PS20, Line 179: DCHECK(schedule_mode == ScheduleMode::IMMEDIATELY) << static_cast<int>(schedule_mode);
can you document this in the method comment, that schedule_mode should always be ScheduleMode::IMMEDIATELY for writes.

Also, I have some confusion over here too, initially we required schedule_immediately to be always false, yet we always called  ScheduleContext() for writes. What should be the desired behavior here?


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h@437
PS20, Line 437: unused_buffers_
should we call this unused_internal_buffers? to make the distinction clear from external buffers


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/request-ranges.h@444
PS20, Line 444: unused_buffer_bytes_returned_
this implicitly signifies the amount of bytes that have been read or are being read, which is why we are using this to check if we need to hold onto buffers.

this however is incremented on call to GetUnusedBuffer(), which is also being used in CleanUpUnusedBuffers().
Although this wont make any difference to the correctness of how this variable is being used, since CleanUpUnusedBuffers() is only called when eosr is true or scanner is cancelled, after which this variable will be redundant....still I would prefer this variable be incremented in GetNextBufferForRange() instead and renamed appropriately


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@139
PS20, Line 139: unused_buffer_bytes_ += buffer->buffer_len();
should we do a check for (unused_buffer_bytes_ >= len_ - unused_buffer_bytes_returned_) after absorbing each buffer.

This way we can hold on to only the buffers we need to read the remainder of the file and cleanup the rest of them


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@141
PS20, Line 141: if (blocked_on_buffer_) {
              :           // This scan range was blocked and is no longer, schedule it again.
              :           blocked_on_buffer_ = false;
              :           schedule_range = true;
              :         }
maybe do this outside the for loop?


http://gerrit.cloudera.org:8080/#/c/8707/20/be/src/runtime/io/scan-range.cc@212
PS20, Line 212: boost::
nit: can be removed



-- 
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 20
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Feb 2018 01:28:21 +0000
Gerrit-HasComments: Yes