You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Smith (Code Review)" <ge...@cloudera.org> on 2022/06/20 21:43:22 UTC

[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files

Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
......................................................................


Patch Set 12:

(19 comments)

A mix of questions and suggestions. I didn't see anything clearly wrong.

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55
PS12, Line 55: class MemBlock {
This seems like a fairly leaky abstraction. It mostly exists to handle the lifecycle around data_, but also relies on external operations to do lots of the actual management. Not sure I have specific improvements to suggest however.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@60
PS12, Line 60:     DCHECK_EQ(static_cast<int>(status_), static_cast<int>(MemBlockStatus::DISABLED));
Can we also verify no memory leaks by checking `data_`?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220
PS12, Line 220:     std::unique_ptr<int64_t[]> page_cnts_per_block_;
This seems a little funny, but I think I see why it's necessary. A const vector wouldn't let you update element values, and a non-const vector could potentially be resized. This is the only way to get a runtime fixed-size array.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@386
PS12, Line 386:   bool DoReadFromReadBuffer(
This function doesn't take any action, so I'm not really sure why it's called "DoRead". I'd opt for Can, Should, or Must (depending on expectations here).


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446
PS12, Line 446:   // Helper function to check the status of a read buffe block.
nit: buffe -> buffer


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458
PS12, Line 458:   // Helper function to delete the read buffe block.
nit: buffe -> buffer


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516
PS12, Line 516:   /// The lock also protects the memory blocks from destruction, if the disk file has.
"if the disk file has" seems like an incomplete sentence.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99
PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) {
Some unit tests around MemBlock transitions might be useful.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@116
PS12, Line 116:       SetStatusLocked(lock, MemBlockStatus::DISABLED);
It would be nice to have a DCHECK(data_ == nullptr) here. It would be true immediately after freeing ALLOC memory, or in other states (because no memory should have been allocated).


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc@463
PS12, Line 463:     // If there was an error during reading, keep the old status.
Might be helpful to log errors that we can't return.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152
PS12, Line 152:   RequestRange(RequestType::type request_type, int disk_id = -1, int64_t offset = -1)
How does an offset of -1 differ from 0?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156
PS12, Line 156:     if (!use_mem_buffer) {
Since we asserted above that use_local_buff implies !use_mem_buffer, this check is redundant.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291
PS12, Line 291:       // 1. If it is the local buffer file is not deleted(evicted) yet.
Change to "If the local buffer file ..."


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835
PS12, Line 1835:   int64_t file_size = 512 * 1024 * 1024;
Why was this changed?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@108
PS12, Line 108:     "files. Only valid when the remote_batch_read is true.");
"Only valid when remote_batch_read is true." seems more consistent with how we talk about other flags.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117
PS12, Line 117:     "Set if the system uses batch reading for the remote temporary files.");
This description could be more descriptive. Maybe "Set to prefetch additional blocks when reading from remote temporary files." Unless prefetch is the wrong term; it's not clear what's being batched without digging into the source code a lot.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@259
PS12, Line 259:       LOG(WARNING) << "Disable the read buffer for the remote temporary files "
"Disabled the read buffer for remote temporary files due to errors in read buffer parameters: "


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1116
PS12, Line 1116:     // ahead of the real allocation, we need to decrease it if the block is reversed
reversed -> reserved


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1132
PS12, Line 1132:   // Set a flag to notify other threads which are hold the file lock to release,
nit: hold -> holding



-- 
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 12
Gerrit-Owner: Yida Wu <wy...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Jun 2022 21:43:22 +0000
Gerrit-HasComments: Yes