You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2016/12/08 20:21:57 UTC

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Will Berkeley has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5425

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................

[spark] KUDU-1641 Pushdown SparkSQL In predicates

This patch adds support for pushing down "IN (values)" predicates to Kudu
using the in list predicate.

Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
2 files changed, 61 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/5425/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 2:

> Proceed with this patch and file an issue to implement an in list limit?

I think so.  Todd, is that OK with you?

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 3:

My worry is that at some point we'll cross over the max RPC size limit when describing the scan spec... I guess it would be better to handle this at the Java client layer, and again check it at the server, though. Mind filing a JIRA to add such a limit somewhere?

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 3:

See https://issues.apache.org/jira/browse/KUDU-1803

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5425

to look at the new patch set (#3).

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................

[spark] KUDU-1641 Pushdown SparkSQL In predicates

This patch adds support for pushing down "IN (values)" predicates to Kudu
using the in list predicate.

Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/5425/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5425

to look at the new patch set (#2).

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................

[spark] KUDU-1641 Pushdown SparkSQL In predicates

This patch adds support for pushing down "IN (values)" predicates to Kudu
using the in list predicate.

Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
2 files changed, 43 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/5425/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 2:

> Until we implement a max in the Kudu server I don't think it makes
 > sense to limit the spark connector.

I think I agree with this. I also think it would be nice to have some limit. What's the next step? Proceed with this patch and file an issue to implement an in list limit?

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 2:

Until we implement a max in the Kudu server I don't think it makes sense to limit the spark connector.

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5425/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 208:     assert(values.nonEmpty, "in list array must be nonempty")
Do we need to check this?  Kudu should do the right thing and return an empty result set.


Line 209:     values match {
I think this block will be cleaner if you do the `values.map` part inside the match, assign it to a variable, and then call KuduPredicate.newInListPredicate only once.


-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5425/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 208:     assert(values.nonEmpty, "in list array must be nonempty")
> Do we need to check this?  Kudu should do the right thing and return an emp
No longer applicable. See below.


Line 209:     values match {
> I think this block will be cleaner if you do the `values.map` part inside t
Actually the entire match block is unnecessary. The function can just create the in list predicate immediately.


-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


[spark] KUDU-1641 Pushdown SparkSQL In predicates

This patch adds support for pushing down "IN (values)" predicates to Kudu
using the in list predicate.

Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Reviewed-on: http://gerrit.cloudera.org:8080/5425
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
2 files changed, 45 insertions(+), 1 deletion(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 2:

> (1 comment)

There are no limits client- or server-side in Kudu, or in Spark. What would be reasonable? People do run queries with at least hundreds of things in the in clause. Any sense on a good default? 1024?

-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-1641 Pushdown SparkSQL In predicates

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [spark] KUDU-1641 Pushdown SparkSQL In predicates
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5425/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 213:     KuduPredicate.newInListPredicate(table.getSchema.getColumn(column), values.toList.asJava)
does Spark have any limit on the max number of elements in the IN list? do we? probably a good idea to put in some kind of limit here beyond which we don't try to push?


-- 
To view, visit http://gerrit.cloudera.org:8080/5425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia76dcbc0b5c5dcb2d28b468d8ed305b9f8eb8b28
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes