Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2016/12/15 23:47:57 UTC

[kudu-CR] spark: continue scanning after encountering empty batch

Hello Jean-Daniel Cryans,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/5531

to review the following change.

Change subject: spark: continue scanning after encountering empty batch
......................................................................

spark: continue scanning after encountering empty batch

The Spark connector would previously stop scanning after the first empty
batch returned by a tablet server. The tablet server will not return an
empty batch when there are rows remaining in the tablet unless the scan
hits an internal timeout of 500ms [1]. This can only realistically happen
on large scans with highly selective predicates on data not in the block
cache. As a result, this behavior only occurs with very large tables on
slow tablet servers, which makes it very hard to test. No unit tests are
included with this patch, but the fix has been verified on a real
cluster exhibiting the issue.

[1] https://github.com/apache/kudu/blob/2ed179a7a188b4748a43a829940764ab5dddbc1c/src/kudu/tserver/tablet_service.cc#L1670
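
For illustration only, here is a minimal sketch of the pattern described
above: a row iterator that keeps fetching batches while the scanner
reports more rows, instead of treating the first empty batch as the end
of the scan. It is not the patch itself and does not use the Kudu client.
BatchScanner, ListScanner, RowIterator and drain are hypothetical names
invented for this example, and it is written in Java rather than the
connector's Scala so that it runs with no dependencies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class EmptyBatchScan {

    /** Hypothetical stand-in for a scanner that returns rows one batch at a time. */
    interface BatchScanner<T> {
        boolean hasMoreRows();    // true while the server may still hold rows
        Iterator<T> nextBatch();  // may be empty even when hasMoreRows() is true
    }

    /** In-memory scanner used only for this illustration. */
    static final class ListScanner<T> implements BatchScanner<T> {
        private final Deque<List<T>> batches;

        ListScanner(List<List<T>> batches) {
            this.batches = new ArrayDeque<>(batches);
        }

        @Override public boolean hasMoreRows() { return !batches.isEmpty(); }

        @Override public Iterator<T> nextBatch() { return batches.removeFirst().iterator(); }
    }

    /** Flattens batches into rows, skipping empty batches instead of stopping. */
    static final class RowIterator<T> implements Iterator<T> {
        private final BatchScanner<T> scanner;
        private Iterator<T> current = Collections.emptyIterator();

        RowIterator(BatchScanner<T> scanner) { this.scanner = scanner; }

        @Override public boolean hasNext() {
            // Loop rather than fetch once: an empty batch is not end-of-scan.
            while (!current.hasNext() && scanner.hasMoreRows()) {
                current = scanner.nextBatch();
            }
            return current.hasNext();
        }

        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return current.next();
        }
    }

    /** Collects every row the scanner yields, in order. */
    static <T> List<T> drain(BatchScanner<T> scanner) {
        List<T> rows = new ArrayList<>();
        new RowIterator<>(scanner).forEachRemaining(rows::add);
        return rows;
    }

    public static void main(String[] args) {
        // The middle batch is empty, as it might be after a server-side timeout.
        List<List<Integer>> batches = Arrays.asList(
            Arrays.asList(1, 2),
            Collections.<Integer>emptyList(),
            Arrays.asList(3));
        System.out.println(drain(new ListScanner<>(batches)));  // prints [1, 2, 3]
    }
}
```

If the while loop in hasNext() were a single if, the sketch would stop
after the empty middle batch and return only [1, 2], which is the
symptom described above.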

Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/5531/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>

[kudu-CR] spark: continue scanning after encountering empty batch

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: spark: continue scanning after encountering empty batch
......................................................................


Patch Set 1: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] spark: continue scanning after encountering empty batch

Posted by "Chris George (Code Review)" <ge...@cloudera.org>.
Chris George has posted comments on this change.

Change subject: spark: continue scanning after encountering empty batch
......................................................................


Patch Set 1: Code-Review+1

I did a local build of kudu-spark and tested it against our cluster that was having this issue. It definitely fixed the problem, with no side effects.


Gerrit-MessageType: comment
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] spark: continue scanning after encountering empty batch

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: spark: continue scanning after encountering empty batch
......................................................................


spark: continue scanning after encountering empty batch

The Spark connector would previously stop scanning after the first empty
batch returned by a tablet server. The tablet server will not return an
empty batch when there are rows remaining in the tablet unless the scan
hits an internal timeout of 500ms [1]. This can only realistically happen
on large scans with highly selective predicates on data not in the block
cache. As a result, this behavior only occurs with very large tables on
slow tablet servers, which makes it very hard to test. No unit tests are
included with this patch, but the fix has been verified on a real
cluster exhibiting the issue.

[1] https://github.com/apache/kudu/blob/2ed179a7a188b4748a43a829940764ab5dddbc1c/src/kudu/tserver/tablet_service.cc#L1670

Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Reviewed-on: http://gerrit.cloudera.org:8080/5531
Tested-by: Kudu Jenkins
Reviewed-by: Chris George <ch...@rms.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Chris George: Looks good to me, but someone else must approve
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified


Gerrit-MessageType: merged
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>