You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2018/09/14 16:19:01 UTC

[Impala-ASF-CR] IMPALA-7556: part1: handle different file systems via polymorphism

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11444


Change subject: IMPALA-7556: part1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 765 insertions(+), 419 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/679/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Sep 2018 17:03:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Reviewed-on: http://gerrit.cloudera.org:8080/11444
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 736 insertions(+), 398 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#6).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 739 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

One final comment about comments.

http://gerrit.cloudera.org:8080/#/c/11444/7/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/7/be/src/runtime/io/hdfs-file-reader.h@69
PS7, Line 69:   /// Total number of bytes read remotely. This is necessary to maintain a count of
            :   /// the number of remote scan ranges. Since IO statistics can be collected multiple
            :   /// times for a scan range, it is necessary to keep some state about whether this
            :   /// scan range has already been counted as remote. There is also a requirement to
            :   /// log the number of unexpected remote bytes for a scan range. To solve both
            :   /// requirements, maintain num_remote_bytes_ on the ScanRange and push it to the
            :   /// reader_ once at the close of the scan range.
This comment is probably hard to understand for someone who doesn't know (yet) that there is a 1to1 mapping between scan ranges and file readers.

My suggestion is extend the class comments to stress this 1to1 mapping.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 09:42:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#4).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 742 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 10:24:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@81
PS2, Line 81: st scan
> totally nitpicking: I disagree in this case, because lock() can suggests to
For now I leave it as it is but I see your point.


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@118
PS5, Line 118:       DCHECK_GE(chunk_size, 0);
> I think that this should be DCHECK_GT - (bytes_to_read - *bytes_read) is po
Done


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@131
PS5, Line 131:         // Do not retry:
             :         // - if read was successful (current_bytes_read != -1)
             :         // - or if already retried once
             :         // - or if this not using a borrowed file handle
             :         DCHECK_LE(num_retries, 1);
             :         if (current_bytes_read != -1 || borrowed_hdfs_fh == nullptr
             :             || num_retries == 1) {
             :           break;
             :         }
> Now that the "logic that is retried" is much simpler I would consider to re
Done


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@150
PS5, Line 150:       if (!status.ok())
> A DCHECK could be added to ensure that current_bytes_read == -1 iff !status
Done


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@190
PS5, Line 190: seek_failed
> This variable is no longer used.
Done


http://gerrit.cloudera.org:8080/#/c/11444/4/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/4/be/src/runtime/io/local-file-reader.cc@55
PS4, Line 55:   unique_lock<SpinLock> fs_lock(lock_);
            :   DCHECK(scan_range_->read_in_flight());
            :   DCHECK_GE(bytes_to_read, 0);
            :   // Delay before acquiring the lock, to allow triggering IMPALA-6587 race.
            : #ifndef NDEBUG
            :   if (FLAGS_stress_disk_read_delay_ms > 0) {
            :     SleepForMs(FLAGS_stress_disk_read_delay_ms);
            :   }
            : #endif
> The comment contradicts the code because the lock is acquired before the sl
Nice catch!

Yeah, I first wanted to use the template method pattern, i.e. have non-virtual public member functions that call virtual private functions that are overridden by the children.

However, there's little common code so I'm not sure if it worth the trouble. For now I leave it as it is.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 14:26:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@132
PS1, Line 132:                 GetHdfsErrorMsg("Error reading from HDFS file: ", *scan_range_->file_string()));
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@142
PS1, Line 142:                       position_in_file, *scan_range_->file_string(), GetHdfsErrorMsg("")));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@207
PS1, Line 207:     if (scan_range_->external_buffer_tag_ == ScanRange::ExternalBufferTag::CACHED_BUFFER) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@265
PS1, Line 265:       hadoopReadZero(exclusive_hdfs_fh_->file(), io_mgr_->cached_read_options(), scan_range_->len());
line too long (101 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Sep 2018 17:10:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 1:

(11 comments)

This reply contains comments for PS1 that tell you where the code was copied from.

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.h@46
PS1, Line 46:   /// Hadoop filesystem that contains the file being read.
            :   hdfsFS hdfs_fs_;
Same as 'fs_' in scan_range_.


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.h@49
PS1, Line 49:   /// The hdfs file handle is stored here in three cases:
            :   /// 1. The file handle cache is off (max_cached_file_handles == 0).
            :   /// 2. The scan range is using hdfs caching.
            :   /// -OR-
            :   /// 3. The hdfs file is expected to be remote (expected_local_ == false)
            :   /// In each case, the scan range gets a new ExclusiveHdfsFileHandle at Open(),
            :   /// owns it exclusively, and destroys it in Close().
            :   ExclusiveHdfsFileHandle* exclusive_hdfs_fh_ = nullptr;
            : 
            :   /// If true, we expect the reads to be a local read. Note that if this is false,
            :   /// it does not necessarily mean we expect the read to be remote, and that we never
            :   /// create scan ranges where some of the range is expected to be remote and some of it
            :   /// local.
            :   /// TODO: we can do more with this
            :   const bool expected_local_;
            : 
            :   /// Total number of bytes read remotely. This is necessary to maintain a count of
            :   /// the number of remote scan ranges. Since IO statistics can be collected multiple
            :   /// times for a scan range, it is necessary to keep some state about whether this
            :   /// scan range has already been counted as remote. There is also a requirement to
            :   /// log the number of unexpected remote bytes for a scan range. To solve both
            :   /// requirements, maintain num_remote_bytes_ on the ScanRange and push it to the
            :   /// reader_ once at the close of the scan range.
            :   int64_t num_remote_bytes_ = 0;
Extracted from ScanRange


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@45
PS1, Line 45: Status HdfsFileReader::Open(bool use_file_handle_cache) {
Copied from ScanRange::Open()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@74
PS1, Line 74: Status HdfsFileReader::ReadFromPos(int64_t file_offset, uint8_t* buffer,
Copied from ScanRange::Read()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@201
PS1, Line 201: Status HdfsFileReader::Close() {
Copied from ScanRange::Close()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@249
PS1, Line 249: Status HdfsFileReader::ReadFromCache(const unique_lock<mutex>& reader_lock,
Copied from ScanRange::ReadFromCache()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/hdfs-file-reader.cc@310
PS1, Line 310: void HdfsFileReader::GetHdfsStatistics(hdfsFile hdfs_file) {
Copied from ScanRange::GetHdfsStatistics()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.h
File be/src/runtime/io/local-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.h@40
PS1, Line 40:   FILE* file_ = nullptr;
Extracted from ScanRange


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.cc@36
PS1, Line 36: Status LocalFileReader::Open(bool use_file_handle_cache) {
Copied from ScanRange::Open()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.cc@53
PS1, Line 53: Status LocalFileReader::ReadFromPos(int64_t file_offset, uint8_t* buffer,
Copied from ScanRange::Read()


http://gerrit.cloudera.org:8080/#/c/11444/1/be/src/runtime/io/local-file-reader.cc@107
PS1, Line 107: Status LocalFileReader::Close() {
Copied from ScanRange::Close()



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 11:46:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3194/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 12:20:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/732/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 11:11:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/709/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Sep 2018 16:19:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 2:

(8 comments)

I have minimal experience with IO manager so I may not understand the intent of the refactor in some places.

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@122
PS2, Line 122:       while (true) {
optional: If this part is already touched, it would be nice to extract some parts to functions to reduce function length/nestedness. At the first glance I would try to extract the part that is "retried" to a function like Status ReadFromPosInternal(...lot of args).


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@201
PS2, Line 201: Close
In the .h Close() is after ReadFromCache(). I would prefer this order in .cc to be consistent and also to group read functions together.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@203
PS2, Line 203: closed_file
Variable closed_file looks unnecessary.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@207
PS2, Line 207: if
optional: as this block mainly uses private ScanRange variables, I would extract it to a function in ScanRange like FreeCachedExternalBuffer().


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@249
PS2, Line 249: ReadFromCache
I would consider moving (most of) this back to ScanRange::ReadFromCache(). The local file version doesn't seem meaningful + this uses scan_range's private variables much more often than HDFSFileReader's.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@334
PS2, Line 334:  
nit: other members have no space after =.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc@100
PS2, Line 100: Status LocalFileReader::ReadFromCache(const unique_lock<mutex>& reader_lock,
             :       bool* read_succeeded) {
             :   DCHECK(reader_lock.mutex() == &request_context_->lock_ && reader_lock.owns_lock());
             :   DCHECK_EQ(bytes_read_, 0);
             :   return Status::OK();
             : }
This looks weird to me - is this ever called?
I checked the callsites of ReadFromCache() and read_succeeded is not initialized before the call, so function will also leave it in an uninitialized state.

If this should never be called, then I would replace add DCHECK(false).


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc@109
PS2, Line 109: closed_file
Variable closed_file looks unnecessary.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Sep 2018 14:32:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#7).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 734 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/737/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 16:51:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/743/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 10:39:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@150
PS5, Line 150:         break;
> Done
Actually the other combination (when status is ok but current_bytes_read is -1) is the one I wanted to check.


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/hdfs-file-reader.cc@131
PS6, Line 131:       // Retry if:
             :       // - first read was not successful
             :       // and
             :       // - used a borrowed file handle
I find this comment to be a bit redundant, but feel free to keep it if you disagree.


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc@22
PS6, Line 22: #include "runtime/io/request-context.h"
Is this header really needed?


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc@24
PS6, Line 24: #include "util/hdfs-util.h"
Is this header really needed?


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h@188
PS6, Line 188: LocalFileReader
Does LocalFileReader actually use RequestContext's private members?


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h@189
PS6, Line 189: HdfsFileReader
I would prefer to remove this friendship if it doesn't complicate code too much. My idea would be to move stats/counters pointers to a single public struct, and pass a pointer to this to HdfsFileReader.

Such a struct would be meaningful in my opinion, because it would contain those members that are manipulated from more than one threads and do not need locking (as they are atomics or pointers to atomics).


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h@70
PS6, Line 70: LocalFileReader
Is this friendship really needed?


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h@197
PS6, Line 197: LocalFileReader
Is this friendship really needed?



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 15:30:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 3:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@41
PS2, Line 41:   /// Returns number of bytes read by this file reader.
> I think we always call SetRequestContext() and SetDiskIoMgr() at the same t
I modifed the file readers to use the io mgr and request context objects through 'scan_range_'.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@50
PS2, Line 50: bytes act
> bytes_read() since it's just a trivial accessor
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@81
PS2, Line 81: n_range
> this is a plain accessor so prefer 'lock()'
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@93
PS2, Line 93: 
> const? I don't think this changes after initialization.
I changed it to a reference.

It can't be a const-ref since we are modifying scan_range_.external_buffer_tag_ in Close(). Also scan_range_.file_string() is a non-const member function. I could create a const version of it, but we pass the returned pointer to functions that take non-const pointers.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.h@47
PS2, Line 47:       bool is_borrowed_fh, uint8_t* buffer, int64_t chunk_size, int* bytes_read);
> const, since I don't think this is reassigned
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@108
PS2, Line 108:     hdfs_file = borrowed_hdfs_fh->file();
> This part was just a mechanical move, right? Would be helpful to flag the p
E.g. should I annotate PS1 with gerrit comments? It is the closest PS to the original version, then people could see the diffs relative to PS1.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@122
PS2, Line 122:       // bytes_read_ is only updated after the while loop
> optional: If this part is already touched, it would be nice to extract some
Yeah I was thinking about it, but didn't want to do it in the first PS.
Done.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@132
PS2, Line 132:         // - if read was successful (current_bytes_read != -1)
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@142
PS2, Line 142:         req_context_read_timer.Stop();
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@201
PS2, Line 201: sRead
> In the .h Close() is after ReadFromCache(). I would prefer this order in .c
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@203
PS2, Line 203: f (*bytes_r
> Variable closed_file looks unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@207
PS2, Line 207:       }
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@207
PS2, Line 207:   
> optional: as this block mainly uses private ScanRange variables, I would ex
I moved cached_buffer_ into HdfsFileReader since then.
For now I leave it as it is.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@249
PS2, Line 249: .
> I would consider moving (most of) this back to ScanRange::ReadFromCache(). 
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@265
PS2, Line 265:                    << PrettyPrinter::Print(num_remote_bytes_, TUnit::BYTES)
> line too long (101 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@334
PS2, Line 334: 
> nit: other members have no space after =.
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.h
File be/src/runtime/io/local-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.h@40
PS2, Line 40:   /// Points to a C FILE object between calls to Open() and Close(), otherwise nullptr.
> Maybe briefly document how this fits into the lifecycle, e.g. set in Open()
Added comment about it.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc@100
PS2, Line 100: void* LocalFileReader::CachedFile() {
             :   DCHECK(false);
             :   return nullptr;
             : }
             : 
             : S
> This looks weird to me - is this ever called?
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.cc@109
PS2, Line 109:  = nullptr;
> Variable closed_file looks unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h@274
PS2, Line 274:   int BytesRead() const;
> This needs a little more thought - we needed to hold lock_ or scan_range_lo
Moved cancel_status_ to the section of 'private members that are accessed by other io:: classes'.

The comment was almost already good IMO, only needed some minor changes.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h@482
PS2, Line 482: 
> Comments referencing hdfs_lock_ need updating.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Sep 2018 15:47:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 2:

(9 comments)

I think this makes sense. I think the documentation of invariants around some of the variables could be better and make it easier to work on the code in future.

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@41
PS2, Line 41:   void SetRequestContext(RequestContext* request_context) {
I think we always call SetRequestContext() and SetDiskIoMgr() at the same time as ResetState() so I think we can just combine them. Would make the intended use pattern of the API clearer.

Or we could just expose request_context_ and io_mgr_ in ScanRange and use those directly.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@50
PS2, Line 50: BytesRead
bytes_read() since it's just a trivial accessor


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@81
PS2, Line 81: GetLock
this is a plain accessor so prefer 'lock()'


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@93
PS2, Line 93:   ScanRange* scan_range_;
const? I don't think this changes after initialization.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.h@47
PS2, Line 47:   hdfsFS hdfs_fs_;
const, since I don't think this is reassigned


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@108
PS2, Line 108:   int64_t max_chunk_size = scan_range_->MaxReadChunkSize();
This part was just a mechanical move, right? Would be helpful to flag the parts that were just moved from one file to another (we can verify this ourselves but it's helpful to have a starting people).


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.h
File be/src/runtime/io/local-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/local-file-reader.h@40
PS2, Line 40:   FILE* file_ = nullptr;
Maybe briefly document how this fits into the lifecycle, e.g. set in Open() and set back to nullptr in Close().


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h@274
PS2, Line 274:   Status CancelStatus() const { return cancel_status_; }
This needs a little more thought - we needed to hold lock_ or scan_range_lock_ to read this member so don't want to expose this publicly here. Maybe could move it to private: and document that the FileReader classes access it directly while holding their lock.

trivial accessor, so cancel_status() if we keep it exposed.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/request-ranges.h@482
PS2, Line 482: hdfs_lock_
Comments referencing hdfs_lock_ need updating.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Sep 2018 17:03:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11444/7/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/7/be/src/runtime/io/hdfs-file-reader.h@69
PS7, Line 69:   /// Total number of bytes read remotely. This is necessary to maintain a count of
            :   /// the number of remote scan ranges. Since IO statistics can be collected multiple
            :   /// times for a scan range, it is necessary to keep some state about whether this
            :   /// scan range has already been counted as remote. There is also a requirement to
            :   /// log the number of unexpected remote bytes for a scan range. To solve both
            :   /// requirements, maintain num_remote_bytes_ on the ScanRange and push it to the
            :   /// reader_ once at the close of the scan range.
> This comment is probably hard to understand for someone who doesn't know (y
I extended the class comment of FileReader.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 10:03:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/file-reader.h@78
PS3, Line 78:   SpinLock lock_;
> Optional: switch to SpinLock to reduce overhead on uncontented acquisition 
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@93
PS2, Line 93: 
> I was thinking ScanRange* const scan_range_;. Using a reference is a bit in
Ok, switched to ScanRange* for consistency.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h@51
PS3, Line 51:   hdfsFS const hdfs_fs_;
> I should have been clearer - I meant "hdfsFS const hdfs_fs_" since we don't
Actually, since hdfsFS is a typedef it means the same:
https://stackoverflow.com/questions/8504411/typedef-pointer-const-weirdness

Anyway, I swapped the two words to cease confusion.

The hdfs api functions expect non-const pointed-to-memory as far as I can tell, so we can't really make the pointed-to memory const.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h@78
PS3, Line 78: Non-N
> Maybe "non-NULL"?
Done


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@108
PS2, Line 108:     hdfs_file = borrowed_hdfs_fh->file();
> Yeah if there's a large mechanical component to the changes you made I find
Yeah it is hard to do it in a diff-friendly way and I'm also not aware of any good tools that can show the moved code parts.

OK, I'll create another gerrit reply about PS1 and comment the origin of the copy-pasted parts so later reviewers might find it useful.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@171
PS3, Line 171:   if (bytes_read_ == scan_range_->len()) *eosr = true;
> Same comment about local-file-reader for simplifying *eosr logic.
I have the same concern. At L154 we set *eosr to true if the scan range went past the end of the file.

We might not want to set it to false here. Again, I'm not sure if it is a valid concern, but since I don't see DCHECKs against it I suppose it is valid.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@194
PS3, Line 194:             Substitute("Error seeking to $0 in file: $1: $2",
> I think we can just return the status here, no? We don't need the seek_fail
True, done!


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@204
PS3, Line 204:     }
> We can just return here I think
Done


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@210
PS3, Line 210:   {
> If we return early on the error paths, this just becomes return Status::OK(
Done


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc@87
PS3, Line 87:     }
> We returned on the first branch so we can avoid nested the else branch.
Done


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc@95
PS3, Line 95:   return Status::OK();
> Conditional needs to either be on one line or have parentheses. Or we could
We set *eosr to true at L90 when we reach eof, so probably we don't want to set it to false when bytes_read_ != scan_range_.len().

However, I don't really know if it's possible to have a scan range that runs over the end of the file. Maybe if we had open-ended scan ranges for reading everything from a given position until the end of file. Around L90 there are no DCHECKs like DCHECK_EQ(bytes_read_, scan_range_.len()), so I suppose it is possible.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/request-ranges.h@339
PS3, Line 339:   /// cancellation - CANCELLED if cancelled without error, or another status if an
> I also modified this comment in https://gerrit.cloudera.org/#/c/11464/, sor
It's OK, thanks for notifying I'll watch for this during the rebase.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 10:37:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 12:20:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#3).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 749 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 15:59:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/678/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Sep 2018 16:52:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#8).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 736 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@81
PS2, Line 81: st scan
> Done
totally nitpicking: I disagree in this case, because lock() can suggests to someone that the function actually locks something, while GetLock() is really unambiguous.


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@118
PS5, Line 118:       DCHECK_GE(chunk_size, 0);
I think that this should be DCHECK_GT - (bytes_to_read - *bytes_read) is positive because of the loop condition and max_chunk_size should be also positive.


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@131
PS5, Line 131:         // Do not retry:
             :         // - if read was successful (current_bytes_read != -1)
             :         // - or if already retried once
             :         // - or if this not using a borrowed file handle
             :         DCHECK_LE(num_retries, 1);
             :         if (current_bytes_read != -1 || borrowed_hdfs_fh == nullptr
             :             || num_retries == 1) {
             :           break;
             :         }
Now that the "logic that is retried" is much simpler I would consider to rewrite this "while(true)" loop to an "if" or "for":
status = ReadFromPosInternal(...)
if (!status.ok() &&  borrowed_hdfs_fh == nullptr) {
  ... retry logic
}


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@150
PS5, Line 150:       if (!status.ok())
A DCHECK could be added to ensure that current_bytes_read == -1 iff !status.ok(). This would make it more obvious that the rest of the loop does not have to deal wirh negative current_bytes_read.


http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@190
PS5, Line 190: seek_failed
This variable is no longer used.


http://gerrit.cloudera.org:8080/#/c/11444/4/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/4/be/src/runtime/io/local-file-reader.cc@55
PS4, Line 55:   unique_lock<SpinLock> fs_lock(lock_);
            :   DCHECK(scan_range_->read_in_flight());
            :   DCHECK_GE(bytes_to_read, 0);
            :   // Delay before acquiring the lock, to allow triggering IMPALA-6587 race.
            : #ifndef NDEBUG
            :   if (FLAGS_stress_disk_read_delay_ms > 0) {
            :     SleepForMs(FLAGS_stress_disk_read_delay_ms);
            :   }
            : #endif
The comment contradicts the code because the lock is acquired before the sleep. The original function and the HdfsFileReader version acquires the lock after the sleep.

It may also make sense to move this logic to a common function in FileReader that would call to something like protected ReadFromPosImpl().



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 13:38:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/733/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 12:09:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11444

to look at the new patch set (#5).

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 742 insertions(+), 398 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/5/be/src/runtime/io/hdfs-file-reader.cc@150
PS5, Line 150:       }
> Actually the other combination (when status is ok but current_bytes_read is
Oh that wasn't clear for me from the comment. That makes more sense. Done.


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/hdfs-file-reader.cc@131
PS6, Line 131:       // Retry if:
             :       // - first read was not successful
             :       // and
             :       // - used a borrowed file handle
> I find this comment to be a bit redundant, but feel free to keep it if you 
I'd rather keep it to emphasize the retry logic.


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc@22
PS6, Line 22: #include "runtime/io/request-ranges.h"
> Is this header really needed?
Removed


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/local-file-reader.cc@24
PS6, Line 24: 
> Is this header really needed?
Removed


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h@188
PS6, Line 188: HdfsFileReader;
> Does LocalFileReader actually use RequestContext's private members?
Removed


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-context.h@189
PS6, Line 189: 
> I would prefer to remove this friendship if it doesn't complicate code too 
I'd rather not touch that part of the code to keep this commit focused.


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h@70
PS6, Line 70: HdfsFileReader;
> Is this friendship really needed?
Removed


http://gerrit.cloudera.org:8080/#/c/11444/6/be/src/runtime/io/request-ranges.h@197
PS6, Line 197: 
> Is this friendship really needed?
Removed



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 16:18:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@132
PS2, Line 132:                 GetHdfsErrorMsg("Error reading from HDFS file: ", *scan_range_->file_string()));
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@142
PS2, Line 142:                       position_in_file, *scan_range_->file_string(), GetHdfsErrorMsg("")));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@207
PS2, Line 207:     if (scan_range_->external_buffer_tag_ == ScanRange::ExternalBufferTag::CACHED_BUFFER) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@265
PS2, Line 265:       hadoopReadZero(exclusive_hdfs_fh_->file(), io_mgr_->cached_read_options(), scan_range_->len());
line too long (101 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Sep 2018 18:00:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 15:00:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................

IMPALA-7556: part 1: handle different file systems via polymorphism

This commit reorganizes some parts of the ScanRange class.

File operations are handled through the abstract FileReader
class. The interface supports positional read that will
be needed by IMPALA-5843. The concrete file operations are
implemented in sub-classes LocalFileReader and
HdfsFileReader.

File reader classes are responsible for setting counters and
metrics related to file operations.

The core logic haven't been changed significantly, but quite
a lot code fragments were relocated.

Testing: Debug exhaustive tests passed

Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-reader.cc
A be/src/runtime/io/file-reader.h
A be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-reader.h
A be/src/runtime/io/local-file-reader.cc
A be/src/runtime/io/local-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
11 files changed, 765 insertions(+), 412 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11444/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 7: Code-Review+1

I'll let Csaba finish the review.


-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 18:14:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h@51
PS3, Line 51:   const hdfsFS hdfs_fs_;
> Actually, since hdfsFS is a typedef it means the same:
Oh I didn't know that, interesting!


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc@95
PS3, Line 95:   if (bytes_read_ == scan_range_.len())
> We set *eosr to true at L90 when we reach eof, so probably we don't want to
I missed that it was set upon a partial read, that makes sense. It is possible to have a scan range past the end of the file if the metadata is inconsistent with the actual file length.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 18:14:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7556: part 1: handle different file systems via polymorphism

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11444 )

Change subject: IMPALA-7556: part 1: handle different file systems via polymorphism
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/file-reader.h@78
PS3, Line 78:   // Debug string of this file reader.
Optional: switch to SpinLock to reduce overhead on uncontented acquisition path.


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/file-reader.h@93
PS2, Line 93:   ScanRange* scan_range_;
> I changed it to a reference.
I was thinking ScanRange* const scan_range_;. Using a reference is a bit inconsistent with some other parts of the code but I think captures the intent.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h@51
PS3, Line 51:   /// 2. The scan range is using hdfs caching.
I should have been clearer - I meant "hdfsFS const hdfs_fs_" since we don't expect the pointer to be reassigned. It's good we can make the pointed-to memory const too. Ok to ignore, this is fine.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.h@78
PS3, Line 78: 
Maybe "non-NULL"?


http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/2/be/src/runtime/io/hdfs-file-reader.cc@108
PS2, Line 108:   int64_t max_chunk_size = scan_range_->MaxReadChunkSize();
> E.g. should I annotate PS1 with gerrit comments? It is the closest PS to th
Yeah if there's a large mechanical component to the changes you made I find it helpful to understand what you did so I can focus on checking the correctness of the mechanical transformation instead of re-reviewing existing code. There's some subtle edge cases in this code so I was mainly wanting to make sure that we preserved them (we should have test cases for them all but I wouldn't bet my life on it).

Thomas did something nice with https://gerrit.cloudera.org/#/c/10394/, although that patch was way more unwieldy than this.

I sometimes review these things by manually putting the before and after moved code and diffing the two files (although I suspect there are some tools to better automate that that I don't know about yet).

I did that for this and I'm basically content now.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@171
PS3, Line 171:             io_mgr_->ReopenCachedHdfsFileHandle(hdfs_fs_, scan_range_->file_string(),
Same comment about local-file-reader for simplifying *eosr logic.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@194
PS3, Line 194:     return status;
I think we can just return the status here, no? We don't need the seek_failed stuff post-refactor.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@204
PS3, Line 204:   if (exclusive_hdfs_fh_ != nullptr) {
We can just return here I think


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/hdfs-file-reader.cc@210
PS3, Line 210:       scan_range_->external_buffer_tag_ = ScanRange::ExternalBufferTag::NO_BUFFER;
If we return early on the error paths, this just becomes return Status::OK().


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc
File be/src/runtime/io/local-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc@87
PS3, Line 87:     } else {
We returned on the first branch so we can avoid nested the else branch.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/local-file-reader.cc@95
PS3, Line 95:   if (bytes_read_ == scan_range_->len())
Conditional needs to either be on one line or have parentheses. Or we could actually just rewrite as 

  *eosr = bytes_read_ == scan_range_.len()

and avoid initialising above.


http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11444/3/be/src/runtime/io/request-ranges.h@339
PS3, Line 339:   /// END: private members that are accessed by other io:: classes
I also modified this comment in https://gerrit.cloudera.org/#/c/11464/, sorry for the rebase conflict in advance.



-- 
To view, visit http://gerrit.cloudera.org:8080/11444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3d3d2d774075008285230606b992603d5be1a82
Gerrit-Change-Number: 11444
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Sep 2018 22:22:46 +0000
Gerrit-HasComments: Yes