Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2017/01/07 00:31:52 UTC
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Hello Jean-Daniel Cryans, Todd Lipcon,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/5636
to review the following change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
KUDU-1824. KuduRDD.collect fails because of NoSerializableException
This also fixes a few style issues.
Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 46 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/1
--
To view, visit http://gerrit.cloudera.org:8080/5636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 3:
Any chance you can run a sum(l_linenumber) or sum(l_tax) as well? count() is kind of a special case.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 3:
The count operation was pulling back all columns in the table from Kudu. Is that what you wanted to test?
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#2).
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
KUDU-1824. KuduRDD.collect fails because of NoSerializableException
This also fixes a few style issues.
Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 45 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5636
to look at the new patch set (#3).
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
KUDU-1824. KuduRDD.collect fails because of NoSerializableException
The internal KuduRow class has been removed, and instead we copy into a
serializable Spark row format.
This also fixes a few style issues.
Change-Id: I42618188003d2eef66088f3101803d1750e4134b
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
3 files changed, 45 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5636/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 3: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 4:
I think it would have if I'd used a SparkSQL "select count(*) ..." query, but I manually created an RDD including all of the columns, and then called count on that.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 2:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/5636/2//COMMIT_MSG
Commit Message:
Line 7: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
> would be good to explain the approach in the commit message
Done
http://gerrit.cloudera.org:8080/#/c/5636/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:
Line 120: override def next(): Row = {
> does this now introduce an extra allocation/copy in the non-RDD case (DataF
I'm not entirely sure. I don't fully understand how these objects were previously being serialized. I guess the RDD is able to reach into Kudu and serialize our internal row block, and is smart enough to do it only once (not once per row). Honestly, I'm not sure how we would fix this while keeping that behavior for the RDD case without copying all this code again.
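For readers following the serialization point, here is a minimal, hedged sketch in plain Java of the general failure mode and of the copy-based fix the commit message describes. It is not Kudu's or Spark's code: RowBlock, RowView, CopiedRow, copyRow and canSerialize are hypothetical names invented for illustration, and the only real API used is JDK serialization (where the exception class is java.io.NotSerializableException). The assumption is that a row which merely points into a shared, non-serializable buffer cannot be serialized, while a row that owns a copy of its values can.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializableRowSketch {

  // Hypothetical stand-in for a client-internal buffer holding many rows.
  // It deliberately does NOT implement Serializable.
  static final class RowBlock {
    final Object[][] cells;
    RowBlock(Object[][] cells) { this.cells = cells; }
  }

  // Hypothetical row that is only a view into the shared block.
  // Declaring Serializable is not enough: its 'block' field is not serializable.
  static final class RowView implements Serializable {
    private static final long serialVersionUID = 1L;
    final RowBlock block;
    final int index;
    RowView(RowBlock block, int index) { this.block = block; this.index = index; }
  }

  // Hypothetical row that owns a copy of its values.
  static final class CopiedRow implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object[] values;
    CopiedRow(Object[] values) { this.values = values; }
  }

  // Copies one row out of the block: one extra array allocation per row.
  static CopiedRow copyRow(RowBlock block, int index) {
    return new CopiedRow(block.cells[index].clone());
  }

  // Returns true if JDK serialization accepts the object graph.
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    RowBlock block = new RowBlock(new Object[][] {{1, "a"}, {2, "b"}});
    System.out.println(canSerialize(new RowView(block, 0))); // false
    System.out.println(canSerialize(copyRow(block, 0)));     // true
  }
}
```

The trade-off is the one raised in this review: the copy makes each row self-contained, at the cost of an allocation per row.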
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 2:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/5636/2//COMMIT_MSG
Commit Message:
Line 7: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
would be good to explain the approach in the commit message
http://gerrit.cloudera.org:8080/#/c/5636/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:
Line 120: override def next(): Row = {
does this now introduce an extra allocation/copy in the non-RDD case (DataFrame) as well? It seems like we should avoid a performance regression on the SparkSQL/DataFrame use case if the bug didn't affect those.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
KUDU-1824. KuduRDD.collect fails because of NoSerializableException
The internal KuduRow class has been removed, and instead we copy into a
serializable Spark row format.
This also fixes a few style issues.
Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Reviewed-on: http://gerrit.cloudera.org:8080/5636
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
A java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
2 files changed, 44 insertions(+), 22 deletions(-)
Approvals:
Todd Lipcon: Looks good to me, approved
Kudu Jenkins: Verified
--
Gerrit-MessageType: merged
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 3:
Oh really? I thought count() was smart enough to issue a column-less scan... although, now that you mention it, the fact that it took 1097 seconds seems like evidence to the contrary.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1824. KuduRDD.collect fails because of NoSerializableException
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.
Change subject: KUDU-1824. KuduRDD.collect fails because of NoSerializableException
......................................................................
Patch Set 3:
Just ran a big count job on a lineitem table, and this patch made it about 5.7% slower (1160 seconds vs 1097 seconds).
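As a quick check of the figure above, the relative slowdown can be recomputed from the two timings. This is only a small sketch: SlowdownSketch and slowdownPercent are hypothetical names, and it assumes 1097 seconds is the run without the patch and 1160 seconds is the run with it.

```java
import java.util.Locale;

final class SlowdownSketch {

  // Relative slowdown of 'after' versus 'before', in percent.
  static double slowdownPercent(double beforeSeconds, double afterSeconds) {
    return (afterSeconds - beforeSeconds) / beforeSeconds * 100.0;
  }

  public static void main(String[] args) {
    // Timings quoted in the comment above: (1160 - 1097) / 1097.
    double pct = slowdownPercent(1097, 1160);
    System.out.println(String.format(Locale.ROOT, "%.1f%%", pct)); // prints 5.7%
  }
}
```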
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I42618188003d2eef66088f3101803d1750e4134b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No