Posted to reviews@kudu.apache.org by "Yao Xu (Code Review)" <ge...@cloudera.org> on 2019/10/31 02:49:37 UTC

[kudu-CR] [spark] Add prefetching option to kudu-spark

Yao Xu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14598 )


Change subject: [spark] Add prefetching option to kudu-spark
......................................................................

[spark] Add prefetching option to kudu-spark

Scanner prefetching is already supported as of earlier patches. With
prefetching enabled, the time a Spark task spends reading Kudu data
can be greatly reduced in some scenarios. Therefore, I added prefetching
option for kudu-spark in this patch.

Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M src/kudu/client/client.proto
9 files changed, 44 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/14598/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>

[kudu-CR] [spark] Add prefetching option to kudu-spark

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
......................................................................


Patch Set 1:

> (1 comment)
>
> Thanks for finding and fixing this bug. Could we break this into
> two patches for clarity? The first patch should fix exposing
> prefetching in the scan token, and the second should expose
> prefetching to spark.

OK, I will break this patch into two patches.
I think it's better to keep the default set to false for the time being: prefetching makes each Spark task use more memory, which could destabilize existing Spark jobs, for example by pushing them over their memory limits.
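
To make the memory trade-off concrete, here is a minimal sketch of client-side prefetching in Java. It is not the Kudu client's implementation: the class name `PrefetchingBatchIterator` and the `fetchNextBatch` supplier are hypothetical stand-ins for a scanner's batch fetch. The sketch requests the next batch while the caller is still processing the current one, which can hide fetch latency but also means up to two batches are held in memory at once.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical illustration of prefetching; not taken from the Kudu code base. */
final class PrefetchingBatchIterator<T> implements Iterator<List<T>> {
  // Assumed contract: returns the next batch, or null once the source is exhausted.
  private final Supplier<List<T>> fetchNextBatch;
  // The fetch currently running in the background (at most one at a time).
  private CompletableFuture<List<T>> inFlight;
  // A batch that has arrived but has not yet been handed to the caller.
  private List<T> current;
  private boolean done;

  PrefetchingBatchIterator(Supplier<List<T>> fetchNextBatch) {
    this.fetchNextBatch = fetchNextBatch;
    // Start fetching the first batch before the caller asks for it.
    this.inFlight = CompletableFuture.supplyAsync(fetchNextBatch);
  }

  @Override
  public boolean hasNext() {
    if (done) {
      return false;
    }
    if (current == null) {
      // Wait for the background fetch to finish.
      current = inFlight.join();
      if (current == null) {
        done = true;
        return false;
      }
      // Kick off the following fetch now, so it overlaps with the caller's
      // processing of `current`. This is where the extra memory comes from.
      inFlight = CompletableFuture.supplyAsync(fetchNextBatch);
    }
    return true;
  }

  @Override
  public List<T> next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    List<T> batch = current;
    current = null;
    return batch;
  }
}
```

A caller would wrap whatever produces batches in a `Supplier` and iterate as usual with `hasNext()` and `next()`. Without prefetching, the fetch for the next batch would start only after the caller finished with the current one, so only one batch would be resident at a time.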


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 07:59:01 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] Add prefetching option to kudu-spark

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
......................................................................


Patch Set 1:

> (1 comment)

Maybe we can use the existing test case to test prefetching. I will take a look.


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 08:01:22 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] Add prefetching option to kudu-spark

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching
> Out of curiosity, have you seen a performance increase when using pre-fetch
I share Grant's curiosity. I am also a little anxious about advertising it more widely given that it has no automated testing at all (see KUDU-1260). For more context, this client-side prefetching thing was inherited from asynchbase when we first built the Java client; it wasn't explicitly added to Kudu.



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 19:52:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add prefetching option to kudu-spark

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

Thanks for finding and fixing this bug. Could we break this into two patches for clarity? The first patch should fix exposing prefetching in the scan token, and the second should expose prefetching to spark.

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching
Out of curiosity, have you seen a performance increase when using pre-fetching? Do you have a quantified example? 

Should we consider setting the default to true? Why or why not?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 13:27:21 +0000
Gerrit-HasComments: Yes