Posted to dev@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2016/06/06 11:23:12 UTC

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3313

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................

IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. This
results in re-issuing the whole set of scan ranges and hence a severe
performance hit. This patch just sets the file read offset position
to 0 if the ReadFromCache() call fails.
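
The failure mode above can be sketched as a toy model (all names here are illustrative, not Impala's actual classes or API): a failed cached read advances the file position, so without a reset the fallback read believes it has already reached the end of the scan range.

```python
# Toy model of the bug: a failed cached read advances the file offset,
# so the fallback read hits the end of the scan range (eosr) early.
# FakeScanRange and its methods are hypothetical stand-ins.

class FakeScanRange:
    def __init__(self, offset, length):
        self.offset = offset          # start of the scan range in the file
        self.length = length          # bytes this range should produce
        self.file_pos = offset        # hdfs-style internal file position

    def read_from_cache(self):
        # Simulate a failed zero-copy read that still moves file_pos.
        self.file_pos += self.length
        return None                   # no data: the cache read failed

    def fallback_read(self, reset_offset):
        if reset_offset:
            self.file_pos = self.offset   # the fix: seek back to offset_
        # Bytes left before eosr from the current position:
        remaining = self.offset + self.length - self.file_pos
        return max(remaining, 0)

buggy = FakeScanRange(offset=0, length=8 * 1024 * 1024)
buggy.read_from_cache()
print(buggy.fallback_read(reset_offset=False))  # 0: eosr hit immediately

fixed = FakeScanRange(offset=0, length=8 * 1024 * 1024)
fixed.read_from_cache()
print(fixed.fallback_read(reset_offset=True))   # the full 8 MB range
```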

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Testing: It's difficult to simulate failed cache reads in tests, so I
tested this manually by adding additional logging in Read() and
ReadFromCache(). Also, the queries mentioned above on the 1gb dataset
sped up with performance close to non-cached query runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 7 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3313/8/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS8, Line 101:     if self.exploration_strategy() != 'exhaustive': pytest.skip()
Unfortunately this means this test won't ever be run, because the workload is not set to functional-query, and for historical reasons not clear to me, the way we call impala-py.test only sets the workload exploration strategy of the functional-query workload (see bin/run-all-tests.sh, L89-90).


PS8, Line 121:       while not is_path_fully_cached(encrypted_table_dir): time.sleep(2)
What about a max timeout here so we don't loop indefinitely?
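
A bounded wait of the kind the comment suggests could look like the sketch below (the helper name and timeout values are hypothetical, not the actual test code):

```python
# Poll a predicate with a deadline instead of `while not cached: sleep(2)`,
# so a stuck HDFS caching operation fails the test instead of hanging it.
import time

def wait_until(predicate, timeout_s=60, interval_s=2):
    """Poll predicate() until it returns True or timeout_s elapses."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False

# Usage sketch (is_path_fully_cached/encrypted_table_dir are from the test):
# assert wait_until(lambda: is_path_fully_cached(encrypted_table_dir),
#                   timeout_s=300), "path was not cached within 5 minutes"
```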


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/6/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl:

PS6, Line 88:     <value>1073741824</value>
> Does this require us to still use ulimit -l or edit limits.conf? If yes, th
I figured out a way to rewrite the test without having to modify any of the existing configs. Please check PS8. Thanks for the inputs so far.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
> Thanks.  Can you incorporate this in the commit message or JIRA?
Done


Line 23: sped up with performance close to non-cached query runs.
> I think we need a way to exercise this path in testing.  For instance, the 
I think we can use IMPALA-3679 for this, where ZCRs are known to fail with caching in encryption zones. I wrote a simple test for this; let me know if you think it's ok.


http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:       hdfsSeek(fs_, hdfs_file_->file(), offset_);
> Regarding Open(), I was wondering why the call to Open() in DiskIoMgr::Read
I didn't call Close() initially as it is heavier than hdfsSeek() but yea I agree with your point that it should undo all the side effects of Open(). Changed it now.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 5: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/5/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS5, Line 96: @pytest.mark.execute_serially
does it need to run serially or would unique_database be enough?


PS5, Line 113: /test-warehouse/tpch.orders/*.tbl
this should also have get_fs_path (though in practice it won't matter)


PS5, Line 130: __cleanup()
why is this needed given line 132?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
what "set" of scan ranges? and where does this happen?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 3:

(2 comments)

Yes, the test caught the wrong-offset bug. Due to the wrong offset, the number of rows scanned from cached_orders was higher than the original.

http://gerrit.cloudera.org:8080/#/c/3313/3/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 535:         range->Close();
> i think it would be better to put this in the ReadFromCache() function so t
Actually I tried that first, but it creates a deadlock on hdfs_lock_: both Close() and ReadFromCache() try to acquire it. So I moved it to the callers.
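
The self-deadlock being described can be sketched with a non-reentrant lock. A real blocking acquire would hang the example, so the probe below is non-blocking; all names are illustrative stand-ins, not Impala's actual code:

```python
# Why putting Close() inside ReadFromCache() deadlocks: both take the same
# non-reentrant lock. A Python threading.Lock, like a plain mutex, cannot
# be re-acquired by the thread already holding it.
import threading

hdfs_lock = threading.Lock()   # stands in for ScanRange::hdfs_lock_

def close_range():
    # A real blocking acquire here would deadlock when called from
    # read_from_cache(); use a non-blocking probe to show why.
    got_it = hdfs_lock.acquire(blocking=False)
    if got_it:
        hdfs_lock.release()
    return got_it

def read_from_cache():
    with hdfs_lock:            # lock is now held by this thread
        return close_range()   # second acquire on the same lock

print(read_from_cache())  # False: the nested acquire can never succeed
```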


http://gerrit.cloudera.org:8080/#/c/3313/3/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

Line 118:       self.execute_query("ALTER TABLE tpch.cached_orders set cached in 'testPool'")
> we may need to skip this on S3/isilon/dssd.
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has submitted this change and it was merged.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we end up creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Reviewed-on: http://gerrit.cloudera.org:8080/3313
Reviewed-by: Michael Brown <mi...@cloudera.com>
Tested-by: Bharath Vissapragada <bh...@cloudera.com>
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 83 insertions(+), 5 deletions(-)

Approvals:
  Michael Brown: Looks good to me, approved
  Bharath Vissapragada: Verified



-- 

Gerrit-MessageType: merged
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 4:

(1 comment)

Sorry this got stuck in my "drafts".

http://gerrit.cloudera.org:8080/#/c/3313/4/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 431:       return Status::OK();
Can't this block be moved outside of the lock, and then you can put the Close() call here? cached_buffer_ is read on line 436, so it must not be necessary to hold the lock for this check, right?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 8:

Code changes still look fine. Please get Michael B.'s +1 for the test stuff, and then you can carry forward my +2.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:       hdfsSeek(fs_, hdfs_file_->file(), 0);
how did this work before, and how does this work?  Shouldn't we seek to file offset 'offset_'?  Actually, why is this needed since won't we call ScanRange::Open() before doing the normal file read anyway?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
> what "set" of scan ranges? and where does this happen?
When ReadFromCache() fails, it increments the internal file read offset but doesn't read any data. We then fall back to ReadRange(), which again fails because we try to read past the file_size (due to the incorrect offset). We then enter [1], where impalad thinks there is still data left to read, assumes we are reading past the scan range, and hence uses DEFAULT_READ_PAST_SIZE (=1KB). This repeatedly starts a large number of 1KB scan ranges until the whole range is read.

[1] https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/scanner-context.cc#L145
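
A back-of-the-envelope sketch of the cost of that cascade (the 8 MB scan-range size is an assumed illustrative figure, not from the thread; only the 1KB DEFAULT_READ_PAST_SIZE is):

```python
# Rough cost of the failure mode described above: once both reads fail,
# the remaining bytes are fetched in DEFAULT_READ_PAST_SIZE chunks.

DEFAULT_READ_PAST_SIZE = 1024          # 1KB, per the comment above
scan_range_len = 8 * 1024 * 1024       # assumed 8 MB scan range

extra_ranges = scan_range_len // DEFAULT_READ_PAST_SIZE
print(extra_ranges)  # 8192 tiny reads instead of one 8 MB read
```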


http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:       hdfsSeek(fs_, hdfs_file_->file(), 0);
> how did this work before, and how does this work?  Shouldn't we seek to fil
- Yea, sorry, this should be offset_. HDFS seems to increment its internal file offset even though the read fails, and when we call hdfsReadFile() after the fallback, it resumes from that offset because we are reusing the hdfs_file_->file() object. So, per my understanding, this is still required even if we are not calling ScanRange::Open().

- Regarding how this worked before, I replied to your comment on the commit message.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6:

I ran the tests locally. However, I couldn't run the exhaustive tests, as the Jenkins slaves have "ulimit -l" set very low (64KB); a higher value is required since we are trying to cache more than 100MB. So this change depends on an auxiliary fix to run properly. I'm looking into it.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3313/8/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS8, Line 101:     if self.exploration_strategy() != 'exhaustive': pytest.skip()
> Unfortunately this means this test won't ever be run, because the workload 
As discussed over chat, moved this to a separate class and added a TODO.


PS8, Line 121:       while not is_path_fully_cached(encrypted_table_dir): time.sleep(2)
> What about a max timeout here so we don't loop indefinitely?
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we endup creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
4 files changed, 50 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/3
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3313/6//COMMIT_MSG
Commit Message:

PS6, Line 13: endup
> Typo: "endup"
Done


http://gerrit.cloudera.org:8080/#/c/3313/6/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl:

PS6, Line 88:     <value>1073741824</value>
> Is this a reasonable default value for our minicluster, or is this needed o
Currently only test_hdfs_caching_fallback_path caches a larger table (~120MB); this is to test a specific code path that was untested before. So ideally I can reduce it to, say, 200MB and still get the work done. As such, I don't see any risk in changing this, as it is quite common in any deployment that intends to use hdfs caching.


http://gerrit.cloudera.org:8080/#/c/3313/6/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS6, Line 105: get_random_id(5))
> Instead of this, you could use the "testid_checksum" fixture, along with a 
Done


PS6, Line 114: execute_query
> execute_query_expect_success() here and in other places?
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 9: Code-Review+2

If you've verified this works in a private Jenkins build with "exhaustive" enabled, LGTM.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#4).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we endup creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
4 files changed, 67 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/4
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3313

to look at the new patch set (#6).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we endup creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 62 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/6
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3313

to look at the new patch set (#9).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we end up creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 83 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 4:

Dan, do I need to make any further changes here? Thanks for looking into this.

-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................

IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. This
results in re-issuing the whole set of scan ranges and hence a severe
performance hit. This patch just sets the file read offset position
to the beginning of the scan range if the ReadFromCache() call fails.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Testing: It's difficult to simulate failed cache reads in tests, so I
tested this manually by adding additional logging in Read() and
ReadFromCache(). The queries mentioned above on the 1gb dataset also
sped up, with performance close to non-cached query runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 7 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/6/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl:

PS6, Line 88:     <value>1073741824</value>
> Currently only test_hdfs_caching_fallback_path  caches a larger table (~120
Does this require us to still use ulimit -l or edit limits.conf? If yes, that'll be a burden on everybody to change their dev machines, and also on Jenkins infrastructure, both Cloudera and soon Apache. I think this merits bringing attention to members of dev@impala.incubator.apache.org . We could get a consensus on what people would live with; or others might have ideas how to get around the problem.
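
For context, the hdfs-site.xml.tmpl entry under discussion is presumably the HDFS centralized-cache memory limit; a hedged sketch of what such an entry looks like (the property name is an assumption, only the 1073741824 value appears in the comment above):

```xml
<!-- Assumed sketch of the template entry at PS6 line 88: the amount of
     datanode memory available for caching (mlock-ed) replicas.
     The datanode process's memlock limit (ulimit -l) must be at least
     this value, which is why raising it touches dev machines,
     limits.conf, and Jenkins infrastructure. -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>1073741824</value> <!-- 1 GB -->
</property>
```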


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/5/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS5, Line 130: __cleanup()
> why is this needed given line 132?
[Disclaimer: still figuring stuff out here, so apologies in advance if my comments don't make sense.]

My guess is that the intention is for __cleanup() to run whether or not an exception is thrown. Rather than calling it in two places, adding a finally: clause to the block would accomplish that:

  try:
      # do stuff
  except:
      # oh no!
  finally:
      __cleanup()

That said, if the unique_database fixture is used (as a previous comment suggested), that might obviate the need entirely to have a __cleanup() function, since the fixture cleans up after itself. In that case, you could simply move the single line that invokes 'hdfs dfs -rm...' to the finally clause.

Also as noted, if unique_database is used, then it's *probably* OK to remove the execute_serially marker.


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3313/4/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 431:       return Status::OK();
> can't this block be moved outside of the lock, and then you can put the Clo
Correct. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 9: Verified+1

Exhaustive build passed: http://sandbox.jenkins.cloudera.com/view/Impala/view/Private-Utility/job/impala-private-build-and-test/3565

(test_admission_control failed, but that's unrelated - IMPALA-3772)

-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3313

to look at the new patch set (#7).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we end up creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from the beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 62 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3313

to look at the new patch set (#8).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we end up creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from the beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 64 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/5/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS5, Line 96: @pytest.mark.execute_serially
> does it need to run serially or would unique_database be enough?
Sorry forgot about the unique_db fixture. Added it now. This isn't required anymore.


PS5, Line 113: /test-warehouse/tpch.orders/*.tbl
> this should also have get_fs_path (though in practice it won't matter)
Done


PS5, Line 130: __cleanup()
> [Disclaimer: still figuring stuff out here, so apologies in advance if my c
Oops, I forgot about the unique db fixture. Added it and refactored the cleanup; made it a try-except-finally. Thanks, David.


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
> When the ReadFromCache() fails, it increments the internal file read offset
Thanks.  Can you incorporate this in the commit message or JIRA?


Line 23: sped up with performance close to non-cached query runs.
I think we need a way to exercise this path in testing.  For instance, the bug where 0 should have been offset_ went unnoticed.  Is there no way to get things into a state where the metadata thinks the file is cached but it isn't cached (presumably causing hadoopReadZero() to fail)?


http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:       hdfsSeek(fs_, hdfs_file_->file(), 0);
> - Yea sorry, this should be offset_. hdfs seems to be incrementing its inte
Regarding Open(), I was wondering why the call to Open() in DiskIoMgr::ReadRange() doesn't do this seek for us, but now I see that Open() returns early if hdfs_file_ != NULL.

Rather than do the seek here, maybe we'd be better off calling Close().  That way, we're undoing all possible side effects of Open(), which is called in this function.


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6: Code-Review+2

Looks good to me. Please just make sure you've rerun the new test since making those changes.

-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 3:

(2 comments)

Did the test show the bug with always seeking to file offset 0?

http://gerrit.cloudera.org:8080/#/c/3313/3/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 535:         range->Close();
I think it would be better to put this in the ReadFromCache() function so that callers don't have to know how to undo the side-effects.


http://gerrit.cloudera.org:8080/#/c/3313/3/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

Line 118:       self.execute_query("ALTER TABLE tpch.cached_orders set cached in 'testPool'")
we may need to skip this on S3/isilon/dssd.


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3313/6//COMMIT_MSG
Commit Message:

PS6, Line 13: endup
Typo: "endup"


http://gerrit.cloudera.org:8080/#/c/3313/6/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl:

PS6, Line 88:     <value>1073741824</value>
Is this a reasonable default value for our minicluster, or is this needed only for test_hdfs_caching_fallback_path ? Note this is much larger. What are the risks in changing this?

Also, as you stated privately, this requires a ulimit -l change on Jenkins slaves and desktop machines. Let's be sure this is the right way to go.


http://gerrit.cloudera.org:8080/#/c/3313/6/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

PS6, Line 105: get_random_id(5))
Instead of this, you could use the "testid_checksum" fixture, along with a prefix. This is also how unique_database gets its name. Or you could even cheat and use the unique_database string directly.


PS6, Line 114: execute_query
execute_query_expect_success() here and in other places?


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#5).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if ZCR fails. Due to
this, when we switch to the normal read path, we hit the eosr of
the scan-range even before reading the expected data length. If both
the ReadFromCache() and ReadRange() calls fail without reading any
data, we endup creating a whole list of scan-ranges, each with size
1KB (DEFAULT_READ_PAST_SIZE) assuming we are reading past the scan
range. This gives a huge performance hit. This patch just calls
ScanRange::Close() after the failed cache reads to clean up the
file system state so that the re-reads start from the beginning of
the scan range.

This was hit as a part of debugging IMPALA-3679, where the queries
on 1gb cached data were running ~20x slower compared to non-cached
runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 65 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>