Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2017/01/07 00:31:52 UTC

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Hello Jean-Daniel Cryans, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/5636

to review the following change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................

KUDU-1824. KuduRDD.collect fails because of NoSerializableException

This also fixes a few style issues.

Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 46 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 3:

Any chance you can run a sum(l_linenumber) or sum(l_tax) as well? count() is kind of a special case.

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 3:

The count operation was pulling back all of the table's columns from Kudu. Is that what you wanted to test?

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#2).

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................

KUDU-1824. KuduRDD.collect fails because of NoSerializableException

This also fixes a few style issues.

Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 45 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/2
-- 
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5636

to look at the new patch set (#3).

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................

KUDU-1824. KuduRDD.collect fails because of NoSerializableException

The internal KuduRow class has been removed, and instead we copy into a
serializable Spark row format.

This also fixes a few style issues.

Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 45 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/3
-- 
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 3: Code-Review+2

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 4:

I think it would have if I'd used a SparkSQL "select count(*) ..." query, but I manually created an RDD including all of the columns, and then called count on that.

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5636/2//COMMIT_MSG
Commit Message:

Line 7: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
> would be good to explain the approach in the commit message
Done


http://gerrit.cloudera.org:8080/#/c/5636/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

Line 120:   override def next(): Row = {
> does this now introduce an extra allocation/copy in the non-RDD case (DataF
I'm not entirely sure.  I don't fully understand how these objects were previously being serialized.  I guess the RDD is able to reach into Kudu and serialize our internal row block, and is smart enough to do it only once (not once per row).  Honestly, I'm not sure how we would fix this while keeping that behavior for the RDD case, short of copying all this code again.
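
For illustration only, the trade-off discussed here can be sketched with plain JDK serialization. Every class name below (ScanBuffer, RowView, CopiedRow) is a hypothetical stand-in invented for this sketch; none of them are Kudu or Spark classes, and the patch itself may differ. The sketch assumes a row object that points into a shared, non-serializable buffer, and shows that copying the values into a self-contained row makes it serializable at the cost of one allocation and copy per row.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class RowCopySketch {

    // Hypothetical stand-in for a client-side scan buffer. Deliberately NOT Serializable.
    static final class ScanBuffer {
        final long[] cells;
        ScanBuffer(long[] cells) { this.cells = cells; }
    }

    // Hypothetical row view: holds a reference into the shared buffer
    // instead of owning its value.
    static final class RowView implements Serializable {
        final ScanBuffer buffer;
        final int index;
        RowView(ScanBuffer buffer, int index) {
            this.buffer = buffer;
            this.index = index;
        }
        long value() { return buffer.cells[index]; }
    }

    // Hypothetical copied row: owns its boxed values, so it serializes on its own.
    static final class CopiedRow implements Serializable {
        final Object[] values;
        CopiedRow(Object... values) { this.values = values; }
    }

    // One allocation and one copy per row: the cost raised in the review.
    static CopiedRow copy(RowView view) {
        return new CopiedRow(view.value());
    }

    // Returns false if Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ScanBuffer buffer = new ScanBuffer(new long[] {10L, 20L, 30L});
        RowView view = new RowView(buffer, 1);
        System.out.println(canSerialize(view));       // prints "false"
        System.out.println(canSerialize(copy(view))); // prints "true"
    }
}
```

The view fails because its non-transient `buffer` field is not serializable, even though the view class itself implements Serializable. The copied row succeeds because it holds only boxed values.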


-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5636/2//COMMIT_MSG
Commit Message:

Line 7: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
would be good to explain the approach in the commit message


http://gerrit.cloudera.org:8080/#/c/5636/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

Line 120:   override def next(): Row = {
Does this now introduce an extra allocation/copy in the non-RDD case (DataFrame) as well? It seems like we should avoid a performance regression on the SparkSQL/DataFrame use case if the bug didn't affect those.


-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


KUDU-1824. KuduRDD.collect fails because of NoSerializableException

The internal KuduRow class has been removed, and instead we copy into a
serializable Spark row format.

This also fixes a few style issues.

Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Reviewed-on: http://gerrit.cloudera.org:8080/5636
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
2 files changed, 44 insertions(+), 22 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
Gerrit-MessageType: merged
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 3:

Oh really? I thought count() was smart enough to issue a column-less scan... although the fact that it took 1097 seconds, now that you mention it, seems like evidence to the contrary.

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 3:

Just ran a big count job on a lineitem table, and this patch made it about 5.7% slower (1160 seconds vs 1097 seconds).
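
As a quick check of that figure, the relative slowdown is (1160 - 1097) / 1097, or roughly 5.7%. A minimal sketch of the arithmetic follows; the class and method names are made up for illustration.

```java
import java.util.Locale;

public class SlowdownSketch {

    // Relative slowdown of the patched run versus the baseline, in percent.
    static double slowdownPercent(double baselineSeconds, double patchedSeconds) {
        return (patchedSeconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Timings quoted in the comment: 1097 s before the patch, 1160 s after.
        double pct = slowdownPercent(1097, 1160);
        System.out.println(String.format(Locale.ROOT, "%.1f%% slower", pct)); // prints "5.7% slower"
    }
}
```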

-- 
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No