You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/11/01 18:00:38 UTC
[kudu-CR] [examples] Add basic Spark example (scala)

Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(15 comments)

I tested this locally with a multimaster cluster and with a YARN cluster running Spark and a single-master Kudu cluster. It worked as expected in both cases.

However, I think we should remove the ability to run the jar standalone and require spark-submit to be used. This is, AFAICT, typical for Spark applications, and it simplifies the configuration of the job.

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@32
PS10, Line 32: To build and run, ensure maven is installed and from the spark-example directory
             : run
Can you specify what you need for this to work? Do you need Spark running somewhere? A Kudu cluster somewhere?


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@38
PS10, Line 38: $ java -jar target/kudu-spark-example-1.0-SNAPSHOT.jar
After poking around the internet for a bit, I think we shouldn't try to make this runnable standalone. It should require spark-submit. If we do that, we can remove the very awkward double specifying of the Spark master.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: Host
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: \
You can't break the line in the middle of a single-quotes literal, since every character inside them is treated literally by the shell.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@17
PS10, Line 17: org.apache.kudu.examples
I think this should be org.apache.kudu.spark.examples.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS10, Line 32:   val kuduMasters: String = System.getProperty("kuduMasters","localhost:7051")   // Kudu master address list.
Spaces after commas here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS10, Line 34: val sparkMaster: String = System.getProperty("sparkMaster","local")
If we only support running this example with spark-submit, which seems to be the only thing done in practice, we don't need this parameter, and invoking the example will be less awkward because the Spark master will only be specified once, and in the normal Spark way.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS10, Line 40: Defining
Define


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@44
PS10, Line 44:   case class User(name:String,id:Int)
nit: space here


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS10, Line 47: 
nit: remove extra line


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@51
PS10, Line 51: SparkExample
KuduSparkExample


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@54
PS10, Line 54: Importing
Import


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@67
PS10, Line 67: 1 replica per tablet
Let's do the default number of replicas (i.e. remove setting the number of replicas explicitly). That lowers the chance for funny business if somebody copies this and builds it into a production job.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@102
PS10, Line 102:       // Clean up
.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@103
PS10, Line 103: $tableName
nit: Here and elsewhere, surround tableName in single quotes.



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 +0000
Gerrit-HasComments: Yes