You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Lars Volker (Code Review)" <ge...@cloudera.org> on 2016/10/25 00:03:13 UTC

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4828

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................

IMPALA-4223: Handle truncated file read from HDFS cache

While overwriting files on HDFS via Hive it can happen that Impala sees
a partially written, cached file. In these cases we did not correctly
handle the partial cached read.

This change adds a check and triggers a fall back to disk reads for such
errors. If the file is partially written to disk, too, then the query
will report a file corruption warning through the disk read path.

Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 13 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4828/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

(1 comment)

The change makes sense and we should get it in regardless, but could this have caused the original crash we saw?

http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 445
This is definitely suspicious.

Sometimes I feel like some of the DCHECKS were expressions of hope rather than assertions.


-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

(1 comment)

Can we test this by having the test load metadata and then truncate a cached file?

http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

PS1, Line 448: disk
how about "uncached" instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has submitted this change and it was merged.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


IMPALA-4223: Handle truncated file read from HDFS cache

While overwriting files on HDFS via Hive it can happen that Impala sees
a partially written, cached file. In these cases we did not correctly
handle the partial cached read.

This change adds a check and triggers a fall back to disk reads for such
errors. If the file is partially written to disk, too, then the query
will report a file corruption warning through the disk read path.

Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Reviewed-on: http://gerrit.cloudera.org:8080/4828
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Marcel Kornacker <ma...@cloudera.com>
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 13 insertions(+), 5 deletions(-)

Approvals:
  Marcel Kornacker: Verified
  Tim Armstrong: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > > (1 comment)
 > > >
 > > > Can we test this by having the test load metadata and then
 > > truncate
 > > > a cached file?
 > >
 > > We can try to do this in a custom cluster test. It needs to
 > follow
 > > the steps outlined here: https://issues.cloudera.org/browse/IMPALA-4223?focusedCommentId=212776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-212776
 > >
 > > They require changes to the system limits to allow for larger
 > > cached files, and to the data nodes to increase the file cache as
 > > well, so they might be rather disruptive. Once these are changed
 > we
 > > can write a custom cluster test to download, truncate, and upload
 > > files and run queries over them, checking that the correct log
 > > messages appear.
 > >
 > > Should we break this out into several Jiras / changes? The limits
 > > should be change in impala-setup, too. The datanode settings
 > change
 > > will be required to integrate this into fuzz testing, too.
 > >
 > > I will also have a look at the scanner fuzz test and see if it is
 > > easy to inject these error there.
 > 
 > Yes, if it's a good amount of work to set up that kind of test,
 > let's save it for another step.  It's probably better to spend time
 > with getting the fuzzer working rather than this one particular
 > test case.

I Opened IMPALA-4394 and IMPALA-4395 to track both adding tests for this change and adding Hdfs caching to the fuzz tests.

http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

PS1, Line 448: disk
> Okay. I'm fine either leaving this one as "disk" or changing both.
Thanks. I'll leave it to keep any tooling that searches for this working.


-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > Can we test this by having the test load metadata and then
 > truncate
 > > a cached file?
 > 
 > We can try to do this in a custom cluster test. It needs to follow
 > the steps outlined here: https://issues.cloudera.org/browse/IMPALA-4223?focusedCommentId=212776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-212776
 > 
 > They require changes to the system limits to allow for larger
 > cached files, and to the data nodes to increase the file cache as
 > well, so they might be rather disruptive. Once these are changed we
 > can write a custom cluster test to download, truncate, and upload
 > files and run queries over them, checking that the correct log
 > messages appear.
 > 
 > Should we break this out into several Jiras / changes? The limits
 > should be change in impala-setup, too. The datanode settings change
 > will be required to integrate this into fuzz testing, too.
 > 
 > I will also have a look at the scanner fuzz test and see if it is
 > easy to inject these error there.

Yes, if it's a good amount of work to set up that kind of test, let's save it for another step.  It's probably better to spend time with getting the fuzzer working rather than this one particular test case.

http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

PS1, Line 448: disk
> I followed the error from L433, assuming that since we already report a sim
Okay. I'm fine either leaving this one as "disk" or changing both.


-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1: Code-Review+1

Change looks good to me, I'm not convinced this was the problem we hit in IMPALA-4223 though, because it would not have made it this far when there was an error seeking in Open().

I think there's some protection against opening the wrong version of a HDFS file because the mtime of the cached file is checked (this check is definitely a good idea though)

-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

> (1 comment)
 > 
 > The change makes sense and we should get it in regardless, but
 > could this have caused the original crash we saw?

I don't think this caused what we saw. I removed the DCHECK to see what would happen, since the original stack was from a Release build, but ran into another DCHECk. After removing that one (and a third one), too, the queries just started to "work", reporting a different warning about not being able to read from the file. The same also happened with a Release build, which I was unable to crash.

-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4223: Handle truncated file read from HDFS cache
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > Can we test this by having the test load metadata and then truncate
 > a cached file?

We can try to do this in a custom cluster test. It needs to follow the steps outlined here: https://issues.cloudera.org/browse/IMPALA-4223?focusedCommentId=212776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-212776

They require changes to the system limits to allow for larger cached files, and to the data nodes to increase the file cache as well, so they might be rather disruptive. Once these are changed we can write a custom cluster test to download, truncate, and upload files and run queries over them, checking that the correct log messages appear.

Should we break this out into several Jiras / changes? The limits should be change in impala-setup, too. The datanode settings change will be required to integrate this into fuzz testing, too.

I will also have a look at the scanner fuzz test and see if it is easy to inject these error there.

http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

PS1, Line 448: disk
> how about "uncached" instead?
I followed the error from L433, assuming that since we already report a similar case it made sense to be consistent. Should I change both?


-- 
To view, visit http://gerrit.cloudera.org:8080/4828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes