Posted to reviews@kudu.apache.org by "Mitch Barnett (Code Review)" <ge...@cloudera.org> on 2018/10/25 18:11:10 UTC

[kudu-CR] [examples] Add basic Spark example written in Scala

Mitch Barnett has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11788 )


Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs, as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source, then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 259 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 1
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#10).

Change subject: [examples] Add basic Spark example (scala)
......................................................................

[examples] Add basic Spark example (scala)

This patch adds a basic Kudu-Spark example that utilizes the
Kudu-Spark integration.
It will allow users to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 298 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/10

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(15 comments)

I tested this locally with a multimaster cluster and with a YARN cluster running Spark and a single-master Kudu cluster. It worked as expected in both cases.

However, I think we should remove the ability to run the jar standalone and require spark-submit to be used. This is, AFAICT, typical for Spark applications, and it simplifies the configuration of the job.

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@32
PS10, Line 32: To build and run, ensure maven is installed and from the spark-example directory
             : run
Can you specify what you need for this to work? Do you need Spark running somewhere? A Kudu cluster somewhere?


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@38
PS10, Line 38: $ java -jar target/kudu-spark-example-1.0-SNAPSHOT.jar
After poking around the internet for a bit, I think we shouldn't try to make this runnable standalone. It should require spark-submit. If we do that, we can remove the very awkward double specifying of the Spark master.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: Host
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: \
You can't break the line in the middle of a single-quoted literal, since every character inside it is treated literally by the shell.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@17
PS10, Line 17: org.apache.kudu.examples
I think this should be org.apache.kudu.spark.examples.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS10, Line 32:   val kuduMasters: String = System.getProperty("kuduMasters","localhost:7051")   // Kudu master address list.
Spaces after commas here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS10, Line 34: val sparkMaster: String = System.getProperty("sparkMaster","local")
If we only support running this example with spark-submit, which seems to be the only thing done in practice, we don't need this parameter, and invoking the example will be less awkward because the Spark master will only be specified once, and in the normal Spark way.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS10, Line 40: Defining
Define


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@44
PS10, Line 44:   case class User(name:String,id:Int)
nit: space here


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS10, Line 47: 
nit: remove extra line


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@51
PS10, Line 51: SparkExample
KuduSparkExample


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@54
PS10, Line 54: Importing
Import


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@67
PS10, Line 67: 1 replica per tablet
Let's do the default number of replicas (i.e. remove setting the number of replicas explicitly). That lowers the chance for funny business if somebody copies this and builds it into a production job.
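
A minimal sketch of what dropping the explicit replica count could look like, assuming the kudu-client jar is on the classpath. The object name DefaultReplicasSketch and the lowercase idCol are hypothetical; hash partitioning on the id column into 3 buckets is copied from the patch. With no call to setNumReplicas, the table should get the cluster's default replication factor.

```scala
import scala.collection.JavaConverters._

import org.apache.kudu.client.CreateTableOptions

// Hypothetical holder object, used only to keep the sketch self-contained.
object DefaultReplicasSketch {
  val idCol = "id"

  // No setNumReplicas(...) call: the replica count is left to the cluster's default.
  val options: CreateTableOptions =
    new CreateTableOptions()
      .addHashPartitions(List(idCol).asJava, 3)
}
```

The resulting options would be passed as the last argument of the createTable call on the KuduContext, as the patch already does.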


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@102
PS10, Line 102:       // Clean up
.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@103
PS10, Line 103: $tableName
nit: Here and elsewhere, surround tableName in single quotes.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed a vote on this change.

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Removed Code-Review+1 by Mitch Barnett <mb...@cloudera.com>

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> Could you provide a command-line example with proper SparkMaster URL that p
I'm starting to wonder if this is something we should even expose at this point, given that this example is supposed to be as simple as possible. If we leave this as local, there's really no point in exposing it and documenting how to change it.

I can work on an example of how to target a remote cluster, but it will require a bit of reworking of the code as we'll need to add some additional values in order to gather the necessary YARN configuration files, etc. which might be more than a simple example like this should need. Thoughts?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 18:26:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 9:

(44 comments)

Still need to try running it. Will respond again after I have.

http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@9
PS8, Line 9: Kudu client
You mean it adds a basic kudu-spark example?


http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@11
PS8, Line 11: It will allow customers to pull down the pom.xml and scala source,
nit: No paragraph break.


http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@11
PS8, Line 11: customers
Say "users" instead of "customers". Nobody pays Apache to use Apache Kudu. Or, if they do, I'm not getting my take and I'm pretty mad -.-


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@22
PS8, Line 22: Kudu client APIs
This is misleading. It doesn't use the KuduClient directly. Instead it uses the methods on KuduContext, which is the recommended way to do things. I would say instead that the program "uses the Kudu-Spark integration to:"


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@27
PS8, Line 27: Scan some rows
and then split this list item into two:

- Scan rows using RDD / DataFrame methods
- Scan rows using SparkSQL


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@37
PS8, Line 37: 
The above runs against a local cluster? What does it do?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@38
PS8, Line 38: configurable parameters
More precisely, these are Java system properties. Say that, since many users will understand what this means.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: Host
Remove


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: Master
nit: master


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: A String value consisting of
Remove this first part of the sentence. Parameters from the command line always start as strings.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@41
PS8, Line 41: This
Remove, so it's just "Defaults to..."


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@41
PS8, Line 41: This will need to be pointed to the Kudu cluster you wish to target.
I think this is obvious and should be omitted.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@43
PS8, Line 43: location
You mean the address, right?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@43
PS8, Line 43: A String value conatining
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@46
PS8, Line 46: A String value containing
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@47
PS8, Line 47: The default table name is
Replace with "Defaults to".


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@49
PS8, Line 49: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format.
Omit this sentence since it's a generality about Java system properties, and the example demonstrates the syntax anyway.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@51
PS8, Line 51: set the property `KuduMasters` to a CSV of the master addresses in the form
            : `host:port` and add a table name value, as shown:
Replace with just "use a command like the following:"


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@56
PS8, Line 56: KuduMasters
I don't like PascalCase for these parameters. We have the same kuduMasters parameter in the Java example, except there it's named in camelCase.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@56
PS8, Line 56: TableName
tableName


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@56
PS8, Line 56: target/kudu-spark-example-1.0-SNAPSHOT.jar
Split the command so it's 80 characters or fewer per line. Remember to use \ to break bash commands into multiple lines properly.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@59
PS8, Line 59: rk2 'spark-sub
Do we have to care whether Spark is running on YARN vs anything else? Seems like the answer should be no. What do you do if you are running on Mesos? Is that still a thing for Spark 2?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@60
PS8, Line 60: your '--master' parameter
What does this mean?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@19
PS8, Line 19: import collection.JavaConverters._
            : 
            : import org.apache.kudu.client._
            : import org.apache.kudu.spark.kudu._
            : 
            : import org.apache.spark.sql.SparkSession
            : import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
            : import org.apache.spark.sql.Row
Double-check that you are complying with the import rules at https://github.com/databricks/scala-style-guide#imports.

We don't have a chosen Scala style guide, so I'm choosing the Databricks one for this file because it's what's used for Spark itself.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: //Kudu
Space between the // and the first word

    // Kudu master address list.

Here and in most every other comment in the file.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: KuduMasters
As I commented elsewhere, can we name these parameters camelCase instead of PascalCase?

    kuduMasters
    tableName
    sparkMaster

This will be consistent with the Java example.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: KuduMasters
I know Grant said these should be capitalized like this if they are constants, but they aren't. They are immutable values retrieved from the environment, so I think they should also be camelCase.
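
For illustration, a small sketch of the camelCase naming using only the standard library. The object name ExampleSettings, the helper setting, and the default table name "spark_test" are hypothetical; the property names and the localhost:7051 default come from this review.

```scala
object ExampleSettings {
  // Read a JVM system property (set with -D<name>=<value>), or fall back to a default.
  def setting(name: String, default: String): String =
    System.getProperty(name, default)

  // camelCase: these are immutable values read at startup, not compile-time constants.
  val kuduMasters: String = setting("kuduMasters", "localhost:7051")
  val tableName: String = setting("tableName", "spark_test") // hypothetical default

  def main(args: Array[String]): Unit =
    println(s"kuduMasters=$kuduMasters tableName='$tableName'")
}
```

Something like -DkuduMasters=host1:7051,host2:7051 on the command line would then override the default.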


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS8, Line 40: //Defining a class that we'll use to insert data into the table.
It seems like there are benefits to using a case class: the RDD toDF() calls below don't need an explicit schema, for example. Maybe this comment should explain we're using a case class for this reason? This will help newer people make good decisions about data modeling with Spark.
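
A rough sketch of the point about toDF(), assuming Spark 2 is on the classpath and the job is launched with spark-submit (so no master is set in code). The object name CaseClassToDf and the sample rows are hypothetical; User(name, id) mirrors the case class in the patch.

```scala
import org.apache.spark.sql.SparkSession

// Defined at the top level so Spark can derive an encoder for it.
case class User(name: String, id: Int)

object CaseClassToDf {
  // Hypothetical sample rows.
  val users: Seq[User] = Seq(User("userA", 1234), User("userB", 5678))

  def main(args: Array[String]): Unit = {
    // The master is assumed to be supplied by spark-submit.
    val spark = SparkSession.builder().appName("case-class-sketch").getOrCreate()
    import spark.implicits._

    // Column names and types are derived from the fields of User,
    // so no explicit StructType is passed here.
    val df = spark.sparkContext.parallelize(users).toDF()
    df.printSchema()

    spark.stop()
  }
}
```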


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@45
PS8, Line 45: //Define our session and context variables for use throughout the program.
I think this comment is redundant and can be omitted, unless you are going to explain more about what a KuduContext is. Maybe you should say explicitly that all interaction with Kudu should be through the KuduContext, or in extremis use the KuduClient obtainable from KuduContext#kuduClient?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS8, Line 47: kc
Rename to kuduContext for maximum readability.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@49
PS8, Line 49: //Importing a class from the SparkSession we instantiated above, to allow for easier RDD -> DF conversions.
            :     import spark.implicits._
Can you import this at the top? I feel like this is an antipattern.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@52
PS8, Line 52: Define the schema of the table we're going to create.
nit: I'd just say "The schema of the table we're going to use."


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@53
PS8, Line 53: //This will be a basic schema, with only two columns: 'name' and 'id'
This is obvious from the code and doesn't need a comment.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@62
PS8, Line 62: xecute the command from the KuduContext to 
Remove, so it's just "Create the..."


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@64
PS8, Line 64: .setNumReplicas(1).addHashPartitions(List(IdCol).asJava, 3))
Please split into lines of 80 characters or less. Occasional overflows up to 100 characters are ok if splitting the line makes it harder to read.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@68
PS8, Line 68: //Create an array of User objects, put them into an RDD, convert the RDD to a DF and then insert the rows.
This restates the code. Remove.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@76
PS8, Line 76: //Specify the columns of the table we want to read, push those into a kuduRDD object for the current SparkContext and the table.
            :       //Map the values in the RDD to give us a tuple format, and then print the results.
Remove. Don't just restate the code in comments; explain it.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@78
PS8, Line 78:       val readCols = Seq(NameCol, IdCol)
Add a log message here: "Reading back the rows written" or something.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@81
PS8, Line 81: .collect().foreach(println(_))
Does .show() not work?


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@84
PS8, Line 84:  //Create another Array of User objects to upsert, pass them into an RDD, convert it to a DF and upsert it to the table specified.
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@93
PS8, Line 93: //Create a DF and map the values of the KuduMaster and TableName values.
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@96
PS8, Line 96: "select * from " + TableName + " where " + IdCol + " > 1000"
This would be nicer if you used https://docs.scala-lang.org/overviews/core/string-interpolation.html.
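
As a sketch of what that could look like, using the standard s-interpolator. The object name QuerySketch, the method selectAbove, and the sample arguments are hypothetical; the query shape is the one from the line above.

```scala
object QuerySketch {
  // Build the same query with the s-interpolator instead of string concatenation.
  def selectAbove(tableName: String, idCol: String, minId: Int): String =
    s"select * from $tableName where $idCol > $minId"

  def main(args: Array[String]): Unit =
    println(selectAbove("spark_test", "id", 1000))
}
```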


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@97
PS8, Line 97: 
Extra newline.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@101
PS8, Line 101: //Delete the table and shutdown the spark session.
Remove, or replace with "Clean up."




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 9
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 20:02:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> This should work without issue in a spark-submit job, I tested that previou
I ran the job locally and it worked fine. Then I tried running it against a Spark 1.6 standalone cluster (CDH 5.15.2) and got the error "java.io.StreamCorruptedException: invalid stream header: 01000C31" because of the mismatch between Spark 2 and Spark 1.6. The way I submitted the SparkMaster URL is exactly how your colleague described:

java -DKuduMasters=greg-kudu-5152-1.vpc.cloudera.com:7051 -DSparkMaster=spark://greg-kudu-5152-1.vpc.cloudera.com:7077 -jar target/kudu-spark-example-1.0-SNAPSHOT.jar

I haven't tried running it against Spark 2 on YARN yet; that will require rewriting the example code a bit.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 20:18:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2 'spark-submit' job, you will need to set this value to match the '--master' Spark
> To clarify, my point was not to make this example CDH-specific, but to make
Gotcha - misunderstanding on my part!




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 9
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 19:47:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#5).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs, as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source, then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 287 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/5

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 5
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@43
PS1, Line 43: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format. For example, to set a different set of masters for the Kudu cluster from the command line and use a custom table name, set the property `KuduMasters` to a CSV of the master addresses in the form `host:port` and add a table name value, as shown:
> I didn't see any default listed for this value - it's inclusion is only due
I meant the default you are using in your code. For example, by default you set SPARK_MASTER to local if it isn't provided.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@43
PS4, Line 43: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format. For example, to set a different set of masters for the Kudu cluster from the command line and use a custom table name, set the property `KuduMasters` to a CSV of the master addresses in the form `host:port` and add a table name value, as shown:
Nit: line breaks.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@16
PS4, Line 16:   val KuduMasters: String = System.getProperty("KuduMasters","kudu.master1:7051,kudu.master2:7051,kudu.master3:7051")   //kudu master address list
Can we use localhost:7051 as the default?
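
A minimal sketch of that default, assuming the property name `KuduMasters` from the quoted line and a hypothetical wrapper object `ConfigSketch`. `System.getProperty(key, default)` returns the fallback whenever no `-DKuduMasters=...` is passed to the JVM:

```scala
object ConfigSketch {
  // Falls back to a single local master when -DKuduMasters=... is not supplied.
  val KuduMasters: String = System.getProperty("KuduMasters", "localhost:7051")

  def main(args: Array[String]): Unit =
    println(s"Kudu masters: $KuduMasters")
}
```

With this default, the example can be pointed at a local Kudu master without any extra flags.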


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@22
PS4, Line 22:   //defining a class that we'll use to insert data into the table
Nit: periods at the end of comments.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@27
PS4, Line 27:     val logger = LoggerFactory.getLogger(SparkExample.getClass)
nit: this can be defined at the top of the object.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS4, Line 34:     import spark.implicits._
Can you try moving this? It should be fine to be with the other imports.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@39
PS4, Line 39:                        List(
nit: This indentation level looks like it is using 4 spaces.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS4, Line 40:                          StructField(IdCol,IntegerType,false),
nit: space after commas.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@48
PS4, Line 48:         kc.createTable(TableName, schema, Seq(IdCol), new CreateTableOptions().setNumReplicas(3).addHashPartitions(List(IdCol).asJava, 3))
This still has a replication factor of 3.
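
For reference, a sketch of the options with a replication factor of 1, which lets the example run against a cluster with a single tablet server. It assumes the kudu-client jar is on the classpath; `OptionsSketch` and the column name "id" are placeholders, not names taken from the patch:

```scala
import org.apache.kudu.client.CreateTableOptions
import scala.collection.JavaConverters._

object OptionsSketch {
  // Placeholder for the IdCol constant used in the patch.
  val IdCol = "id"

  // One replica, three hash buckets on the key column.
  val options: CreateTableOptions = new CreateTableOptions()
    .setNumReplicas(1)
    .addHashPartitions(List(IdCol).asJava, 3)
}
```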


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@71
PS4, Line 71:       val upsertUsers = Array(User("newUserA", 1234), User("userC", 7777))
To show the value of an upsert, should we re-use a key that was already inserted?


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@84
PS4, Line 84:     finally try {
just finally?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 13:59:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
Could you provide a command-line example with proper SparkMaster URL that points to a remote Spark Master?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 18:03:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> I ran the job locally - worked fine. Then, I tried running it against a Spa
We don't support Spark 1 anymore. That was dropped a few versions back.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@71
PS4, Line 71:       val upsertUsers = Array(User("newUserA", 1234), User("userC", 7777))
> Yep, I am repeating a value from above (the id=1234). This will upsert 1234
Oh, my bad. I was looking at the names.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 20:34:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@32
PS11, Line 32: compile and
remove


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@33
PS11, Line 33: spark-example/
`spark-example` instead.


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@33
PS11, Line 33: a new
             : java executable
a Spark application jar


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@34
PS11, Line 34: target/
`target` instead


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@34
PS11, Line 34: within
in


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@34
PS11, Line 34: spark-example/
remove


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@41
PS11, Line 41: There are a few Java system properties defined in SparkExample.scala:
             : 
             : - kuduMasters: A comma-separated list of Kudu master addresses. Defaults to
             :   'localhost:7051'.
             : - tableName: The name of the table you wish to create on the Kudu cluster.
             :   Defaults to 'spark_test'.
             : 
             : To run this as a spark2 'spark-submit' job, use the spark-submit command
             : as follows from the spark-example directory - it requires that you have built
             : and compiled the package using the 'mvn package' command above. You will also
             : need a Spark on YARN cluster and a Kudu cluster, both of which should be
             : resolvable and accessible from the host executing the command:
I think we should redo this section a little bit. How do you like:

To configure the kudu-spark example, there are two Java system properties available:

- kuduMasters: a comma-separated list of Kudu master addresses. Defaults to 'localhost:7051'.
- tableName: the name of the table to use for the example program. This table should not exist in Kudu. Defaults to 'spark_test'.

The application can be run using `spark-submit`. For example, to run the example against a Spark cluster running on YARN, use a command like the following:

(long command i ain't typin)

You will need the Kudu cluster to be up and running and Spark correctly configured for the example to work.


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@57
PS11, Line 57: <preferred deploy mode>
What's the default? We want client mode so people will see the log messages showing the app running vs Kudu. If the default is client, just remove this param.


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala@65
PS11, Line 65: defaul
default




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 11
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 19:05:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#7).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs,
as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 287 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/7

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 7
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@43
PS1, Line 43: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format. For example, to set a different set of masters for the Kudu cluster from the command line and use a custom table name, set the property `KuduMasters` to a CSV of the master addresses in the form `host:port` and add a table name value, as shown:
> I meant the default you are using in your code. For example, by default you
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@43
PS4, Line 43: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format. For example, to set a different set of masters for the Kudu cluster from the command line and use a custom table name, set the property `KuduMasters` to a CSV of the master addresses in the form `host:port` and add a table name value, as shown:
> Nit: line breaks.
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@16
PS4, Line 16:   val KuduMasters: String = System.getProperty("KuduMasters","kudu.master1:7051,kudu.master2:7051,kudu.master3:7051")   //kudu master address list
> Can we use localhost:7051 as the default?
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@22
PS4, Line 22:   //defining a class that we'll use to insert data into the table
> Nit: periods at the end of comments.
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@27
PS4, Line 27:     val logger = LoggerFactory.getLogger(SparkExample.getClass)
> nit: this can be defined at the top of the object.
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS4, Line 34:     import spark.implicits._
> Can you try moving this? It should be fine to be with the other imports.
It fails to build:

[ERROR] /Users/mbarnett/src/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:14: error: not found: object spark
[ERROR] import spark.implicits._
[ERROR]        ^
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

because the import goes through the 'spark' instance we declare above it, and the implicits are a member of that instance rather than of a package.
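
The same rule can be seen without Spark. In the sketch below, `FakeSession` is a hypothetical stand-in for SparkSession: its `implicits` object is a member of an instance, so the import needs a stable `val` to refer to and cannot sit with the package-level imports at the top of the file.

```scala
// Hypothetical stand-in for SparkSession; only the shape matters here.
class FakeSession {
  object implicits {
    implicit class RichName(val name: String) {
      def shout: String = name.toUpperCase + "!"
    }
  }
}

object ImplicitsSketch {
  val spark = new FakeSession

  // Legal only here, after `spark` exists: the import path goes through the instance.
  import spark.implicits._

  val greeting: String = "userA".shout
}
```

Moving the import above the `val spark` line reproduces the "not found" error from the build log, since nothing named `spark` is in scope yet.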


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@39
PS4, Line 39:                        List(
> nit: This indentation seams level looks like it is using 4 spaces.
Sorry, didn't quite understand this one. Should this be simply indented, or level?


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS4, Line 40:                          StructField(IdCol,IntegerType,false),
> nit: space after commas.
Done


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@48
PS4, Line 48:         kc.createTable(TableName, schema, Seq(IdCol), new CreateTableOptions().setNumReplicas(3).addHashPartitions(List(IdCol).asJava, 3))
> This still has a replication factor of 3.
Missed the change. Completed now.


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@71
PS4, Line 71:       val upsertUsers = Array(User("newUserA", 1234), User("userC", 7777))
> To show the value of an upsert, should we re-use a key that was already ins
Yep, I am repeating a value from above (the id=1234). This will upsert 1234 and change the name of 'userA' to 'newUserA'.

If you think the change should be more drastic to call more attention to what the upsert did, let me know. I can have it change to something like "BrandNewUser" instead.
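
To make the expected result concrete, here is a toy in-memory model of upsert semantics. It is not the Kudu API: the table is a plain `Map` keyed by id, and the row `User("userB", 5678)` is a made-up second insert added only for illustration.

```scala
final case class User(name: String, id: Int)

object UpsertSketch {
  // Toy table keyed by primary key: existing keys are overwritten, new keys are added.
  def upsert(table: Map[Int, User], rows: Seq[User]): Map[Int, User] =
    rows.foldLeft(table)((t, u) => t.updated(u.id, u))

  val inserted: Map[Int, User] =
    upsert(Map.empty, Seq(User("userA", 1234), User("userB", 5678)))

  // Same rows as the quoted line: id 1234 already exists, id 7777 does not.
  val afterUpsert: Map[Int, User] =
    upsert(inserted, Seq(User("newUserA", 1234), User("userC", 7777)))
}
```

In this model the row with id 1234 ends up named 'newUserA', id 7777 is added as a new row, and the untouched row is left alone.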


http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@84
PS4, Line 84:     finally try {
> just finally?
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 14:43:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 1:

(11 comments)

Thanks for the contribution. I did a quick first review pass.

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@18
PS1, Line 18: = Kudu-Spark  example README
nit: extra space


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@40
PS1, Line 40: - ID_COL: String value containing the name of the Primary Key column.
Is this needed? Given the code is creating the table, we have control over this. Same with NAME_COL. I think it complicates the example. 

Looking at the code, these aren't exposed as settable properties.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@43
PS1, Line 43: - SPARK_MASTER: String value which identifies the location of the Spark Master to be used. If running locally (standalone), this must be set to 'local'.
nit: Move up near KUDU_MASTERS

Can you specify if these have a default?


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@14
PS1, Line 14:   val KUDU_MASTERS : String = System.getProperty("KUDU_MASTERS","kudu.master1:7051,kudu.master2:7051,kudu.master3:7051")  //kudu master address list
nit: val KuduMasters: String = ...

In Scala, constants are usually named in UpperCamelCase, like objects. The others below should be changed too.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@19
PS1, Line 19:   val SPARK_MASTER = "local"  //location of spark master, specify 'local' if running on localhost
Should this be configurable like KUDU_MASTERS?


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS1, Line 32:     val schema : StructType = StructType {
Nit: I think the syntax below is easier:

StructType(
      List(
        StructField("ID_COL", DataTypes.IntegerType,NULLABLE),
        StructField("NAME_COL", DataTypes.StringType,NULLABLE),
      ))

ID columns can't be nullable. You can probably just hard-code whether each column is nullable.
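
Spelled out with hard-coded nullability, the suggestion might look like the sketch below. It assumes spark-sql is on the classpath; `SchemaSketch` and the column names "id" and "name" are placeholders for the constants in the patch.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaSketch {
  // Placeholder column names; the patch keeps these in constants.
  val IdCol = "id"
  val NameCol = "name"

  val schema: StructType = StructType(
    List(
      StructField(IdCol, IntegerType, nullable = false), // key column: never null
      StructField(NameCol, StringType, nullable = true)
    )
  )
}
```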


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@39
PS1, Line 39: ID_COL
nit: Replication factor of 1 might be easier for examples.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@43
PS1, Line 43:       import spark.implicits._
Can this be moved up by the other imports?


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS1, Line 47:       System.out.println("Writing to table " + TABLE_NAME)
Nit: Use slf4j logging.
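
A rough sketch of what that could look like, assuming slf4j-api is on the classpath. `LoggingSketch` and `describe` are hypothetical names, and 'spark_test' is just a sample table name:

```scala
import org.slf4j.{Logger, LoggerFactory}

object LoggingSketch {
  // Defined once at the top of the object and shared by every method.
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def describe(tableName: String): String = s"Writing to table $tableName"

  def main(args: Array[String]): Unit =
    logger.info(describe("spark_test"))
}
```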


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@77
PS1, Line 77:       case e: Exception => e.printStackTrace()
No need to catch if there is no handling. Just let the exception bubble up.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@87
PS1, Line 87:         case e: Exception => e.printStackTrace()
No need to catch here either.
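
As a sketch of the shape being suggested: a bare try/finally runs the cleanup and still lets the exception reach the caller. `Resource` here is a hypothetical stand-in for whatever the example needs to clean up.

```scala
object CleanupSketch {
  // Hypothetical resource standing in for whatever the example cleans up.
  final class Resource {
    var closed = false
    def close(): Unit = closed = true
  }

  // No catch block: a failure in `body` propagates, but close() still runs.
  def withResource[A](r: Resource)(body: => A): A =
    try body
    finally r.close()
}
```

Without the catch, the failure surfaces with its original stack trace instead of being printed and swallowed.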




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 1
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 19:46:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#4).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs, as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source, then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 259 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/4

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 8: Verified+1



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 17:49:12 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#8).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs,
as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 292 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/8

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: If running locally (standalone), this must be set to 'local'.
This wording gives the impression that this parameter must be set explicitly when running locally, but the way the code works (and what the example below shows) is that it defaults to "local" and does not need to be set.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 17:50:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#6).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs,
as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 287 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/6

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 6
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@34
PS11, Line 34: spark-example/
> remove
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 11
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 20:12:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4: Verified+1



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:16:36 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#2).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that utilizes both Kudu Java APIs, as well as Spark SQL APIs.
It will allow customers to pull down the pom.xml and scala source, then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 259 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 2
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> I tried running this on Spark2 on Yarn after replacing how parameters are p
I guess my point is that it would be nice if this example were applicable to CDH. Otherwise, it should probably just use local execution.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 22:03:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS4, Line 34:     import spark.implicits._
> It fails to build:
This link explains it a bit better: https://stackoverflow.com/questions/39968707/spark-2-0-missing-spark-implicits




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 14:52:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#12).

Change subject: [examples] Add basic Spark example (scala)
......................................................................

[examples] Add basic Spark example (scala)

This patch adds a basic Kudu-Spark example that utilizes the
Kudu-Spark integration.
It will allow users to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala
3 files changed, 287 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/12

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 12
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4: Verified+1

unrelated TSAN failure



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:16:47 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> That's one way to address it. I wonder if there is a way to refactor the 
This should work without issue in a spark-submit job, I tested that previously.

I talked with a colleague on the Spark team, and noted that the proper reference for a standalone Spark cluster is simply "spark://<master>:7077", so that should be very easy to implement after all.

Let me make those changes, and go back and attempt the same spark-submit test again to make sure this is still working. Could you provide me with the steps you took to submit it, as well as the submit command you used?
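
For illustration only, the master values discussed in this thread could be resolved as in the following sketch. The object `SparkMasterSetting` and its helpers are hypothetical, the property name `sparkMaster` is assumed, and the 'local' fallback is an assumption rather than the behavior of the patch.

```scala
object SparkMasterSetting {
  // Standalone master port, as quoted in the comment above.
  val StandalonePort = 7077

  // Hypothetical helper: build "spark://<host>:7077" for a standalone cluster.
  def standaloneUrl(host: String): String = s"spark://$host:$StandalonePort"

  // Read the assumed `sparkMaster` system property.
  // Falling back to "local" is an assumption made for this sketch.
  def resolve(): String = System.getProperty("sparkMaster", "local")
}
```

With `-DsparkMaster=yarn` on the JVM command line, `resolve()` would return "yarn"; with nothing set, it would return "local".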




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 19:45:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@32
PS11, Line 32: compile and
> remove
Done


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@33
PS11, Line 33: spark-example/
> `spark-example` instead.
Done


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@41
PS11, Line 41: There are a few Java system properties defined in SparkExample.scala:
             : 
             : - kuduMasters: A comma-separated list of Kudu master addresses. Defaults to
             :   'localhost:7051'.
             : - tableName: The name of the table you wish to create on the Kudu cluster.
             :   Defaults to 'spark_test'.
             : 
             : To run this as a spark2 'spark-submit' job, use the spark-submit command
             : as follows from the spark-example directory - it requires that you have built
             : and compiled the package using the 'mvn package' command above. You will also
             : need a Spark on YARN cluster and a Kudu cluster, both of which should be
             : resolvable and accessible from the host executing the command:
> I think we should redo this section a little bit. How do you like:
Done
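
As a rough sketch of how the two properties quoted above might be read, assuming plain JVM system properties with the stated defaults. The object `PropertyDefaults` and its members are hypothetical names, not the code in SparkExample.scala.

```scala
object PropertyDefaults {
  // Defaults as stated in the README text quoted above.
  val DefaultKuduMasters = "localhost:7051"
  val DefaultTableName = "spark_test"

  // Read each property, falling back to its default when it is not set.
  def kuduMasters: String = System.getProperty("kuduMasters", DefaultKuduMasters)
  def tableName: String = System.getProperty("tableName", DefaultTableName)

  // Split the comma-separated master list into individual host:port entries.
  def masterAddresses: List[String] =
    kuduMasters.split(",").map(_.trim).filter(_.nonEmpty).toList
}
```

Passing `-DkuduMasters=master.0:7051,master.1:7051` to the JVM would then make `masterAddresses` return two entries.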


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/README.adoc@57
PS11, Line 57: <preferred deploy mode>
> What's the default? We want client mode so people will see the log messages
Yeah, the default is client. I've omitted this parameter.


http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/11/examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala@65
PS11, Line 65: defaul
> default
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 11
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 20:11:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4: Code-Review+1

(11 comments)

Updated per Grant's feedback.

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@18
PS1, Line 18: = Kudu-Spark example README
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@40
PS1, Line 40: - KuduMasters: A String value consisting of a comma-separated list of Kudu Master Host addresses. This will need to be pointed to the Kudu cluster you wish to target.
> Is this needed? Given the code is creating the table, we have control over 
Good call - probably better to give less control, so as not to cause additional failures.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@43
PS1, Line 43: To specify a value at execution time, you'll specify the parameter name in the '-D<name>' format. For example, to set a different set of masters for the Kudu cluster from the command line and use a custom table name, set the property `KuduMasters` to a CSV of the master addresses in the form `host:port` and add a table name value, as shown:
> nit: Move up near KUDU_MASTERS
I didn't see any default listed for this value - its inclusion is only due to it being a requirement of the SparkSession create statement.

I suppose you could infer that 'local' is the default value, since any other value would require a spark master to be explicitly defined.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@14
PS1, Line 14: object SparkExample {
> nit: val KuduMasters: String = ...
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@19
PS1, Line 19:   val NameCol = "name"
> Should this be configurable like KUDU_MASTERS?
I went ahead and made it configurable, so that those who want to can change where it runs.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS1, Line 32: 
> Nit: I think the below syntax is easier
Went ahead and made that syntax change - bit easier to read.

Also went ahead and explicitly defined 'false' for the structFields instead of the NULLABLE constant.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@39
PS1, Line 39: 
> nit: Replication factor of 1 might be easier for examples.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@43
PS1, Line 43:                   )
> Can this be moved up by the other imports?
From what I've read in the Spark documentation and what I've seen in the code base, it can't be moved.

It's being called directly from the SparkSession we instantiated above.
I could move it higher in main(), but I can't take it up to the other imports unfortunately.
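
To illustrate why an import of this shape has to follow the value it names, here is a small sketch in plain Scala. `Session` is a hypothetical stand-in written for this example and is not the Spark API; it only mimics the pattern of a nested `implicits` object that belongs to an instance.

```scala
// Hypothetical stand-in for a session type; NOT org.apache.spark.sql.SparkSession.
class Session(val name: String) {
  // The implicits live inside each instance, so they cannot be imported
  // until an instance exists and is bound to a stable identifier (a val).
  object implicits {
    implicit class RichInts(xs: Seq[Int]) {
      def describe: String = s"$name:${xs.sum}"
    }
  }
}

object ImportDemo {
  def run(): String = {
    val session = new Session("demo")
    // This import refers to the value `session`, so it can only appear
    // after the val above. It cannot sit with the file-level imports.
    import session.implicits._
    Seq(1, 2, 3).describe
  }
}
```

The import can move anywhere below the `val session` line inside `run()`, but not above it.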


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS1, Line 47:       if (!kc.tableExists(TableName)) {
> Nit: Use slf4j logging.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@77
PS1, Line 77:       //create a DF and map the values of the KuduMaster and TableName values
> No need to catch if there is no handling. Just let the exception bubble up.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@87
PS1, Line 87:       kc.deleteTable(TableName)
> No need to catch here either.
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:11:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 12: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 12
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 18:21:46 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> I ran the job locally - worked fine. Then, I tried running it against a Spa
I tried running this on Spark2 on Yarn after replacing how parameters are passed (args instead of System.getProperty) and it almost worked. It broke with "java.lang.NoClassDefFoundError: org/apache/commons/dbcp/ConnectionFactory"
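
A minimal sketch of the args-based alternative mentioned above, assuming positional arguments in the order masters, then table name. The object `ArgsConfig`, the argument order, and the fallback values are assumptions made for illustration; this is not the code that was actually run.

```scala
object ArgsConfig {
  // Settings for the example, taken from positional program arguments.
  final case class Config(kuduMasters: String, tableName: String)

  // Return the argument at `index`, or `default` when it was not supplied.
  private def argOrElse(args: Array[String], index: Int, default: String): String =
    if (args.length > index) args(index) else default

  // Assumed order: args(0) = Kudu masters, args(1) = table name.
  def parse(args: Array[String]): Config =
    Config(
      argOrElse(args, 0, "localhost:7051"),
      argOrElse(args, 1, "spark_test")
    )
}
```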




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 22:00:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has removed a vote on this change.

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: KuduMasters
Java strings are case sensitive, so this should be kuduMasters, to correspond to the change you made in the code.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@45
PS10, Line 45: SparkMaster
same as above - sparkMaster not SparkMaster


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@56
PS10, Line 56: KuduMasters
Same as above. This should be kuduMasters with lowercase k


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@57
PS10, Line 57: TableName
same as above - tableName, not TableName


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@63
PS10, Line 63: SparkMaste
same here sparkMaster


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: -DKuduMasters=master.0:7051,master.1:7051,master.2:7051 -DTableName=test_table \
             : -DSparkMaster=yarn
same comment here, you need to change the names of sys properties to match the code - they are case sensitive.
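
The effect of the mismatch can be sketched as follows. The object `PropertyCaseDemo` is a hypothetical name, and the fallback value 'localhost:7051' is assumed for the sake of the example.

```scala
object PropertyCaseDemo {
  // What the code would see when it asks for the lowercase-k property.
  // The fallback value is an assumption made for this sketch.
  def effectiveMasters(): String =
    System.getProperty("kuduMasters", "localhost:7051")

  def main(args: Array[String]): Unit = {
    // Simulates passing -DKuduMasters=... (capital K) on the command line.
    System.setProperty("KuduMasters", "master.0:7051")
    // Property names are case sensitive, so the lookup misses and the
    // fallback is used without any error being reported.
    println(effectiveMasters())
  }
}
```

Because the lookup fails silently, a wrongly cased `-D` flag would leave the job pointed at the fallback address instead of the intended cluster.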




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 31 Oct 2018 19:25:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 11:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@32
PS10, Line 32: To compile and build the example, ensure maven is installed and execute
             : the
> Can you specify what you need for this to work? Do you need Spark running s
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@38
PS10, Line 38: $ mvn package
> After poking around the internet for a bit, I think we shouldn't try to mak
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: addr
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: 
> You can't break the line in the middle of a single-quotes literal, since ev
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@17
PS10, Line 17: 
> I think this should be org.apache.kudu.spark.examples.
Renamed, and created the additional directory structure.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS10, Line 32: 
> Spaces after commas here and elsewhere.
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS10, Line 40: 
> Define
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@44
PS10, Line 44: 
> nit: space here
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS10, Line 47: 
> nit: remove extra line
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@51
PS10, Line 51: 
> KuduSparkExample
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@54
PS10, Line 54: 
> Import
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@67
PS10, Line 67: 
> Let's do the default number of replicas (i.e. remove setting the number of 
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@102
PS10, Line 102: 
> .
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 11
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 19:37:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 7:

So the snag I hit previously ended up being KUDU-2259, which caused the job to fail since the token was reissued by the master and it attempted to use a different name (despite there being no auth configured).

It looks like this was supposed to be fixed in 1.7, but I had to update the pom to pull 1.7.1 or 1.8.0 to get it working. I don't know how changes in Kudu directly impact the kudu-spark lib, but it's resolved nonetheless.



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 7
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 27 Oct 2018 02:04:47 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: If running locally (standalone), this must be set to 'local'.
> Thanks for the feedback. Do you think this would be better?
Yes, I think this is better.



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 17:58:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#11).

Change subject: [examples] Add basic Spark example (scala)
......................................................................

[examples] Add basic Spark example (scala)

This patch adds a basic Kudu-Spark example that utilizes the
Kudu-Spark integration.
It will allow users to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala
3 files changed, 285 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/11
-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 11
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................

[examples] Add basic Spark example (scala)

This patch adds a basic Kudu-Spark example that utilizes the
Kudu-Spark integration.
It will allow users to pull down the pom.xml and scala source,
then build and execute from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Reviewed-on: http://gerrit.cloudera.org:8080/11788
Reviewed-by: Will Berkeley <wd...@gmail.com>
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@apache.org>
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/spark/examples/SparkExample.scala
3 files changed, 287 insertions(+), 0 deletions(-)

Approvals:
  Will Berkeley: Looks good to me, but someone else must approve
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 13
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: If running locally (standalone), this must be set to 'local'.
> This wording creates an impression that this parameter must be set  explici
Thanks for the feedback. Do you think this would be better?

"This defaults to 'local', which is the required value if running the application locally (standalone)"



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 17:53:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2-submit job, you will need to set this value to match the '--master' Spark 
> Yeah, the upstream is just 'spark-submit' but I thought we wanted this targ
To clarify, my point was not to make this example CDH-specific, but to make it easy to run on CDH or HDP, or really any other distro that makes it easy to install and configure Spark+Yarn+Hive+Kudu for a noob, especially for someone with little previous Kudu, Spark, Hive and Yarn experience. LGTM with "spark-submit".



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 19:39:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: KuduMasters
> Java strings are case sensitive, so this should be kuduMasters, to correspo
These are all fixed, but I was awaiting feedback from Will/Grant on their previous naming conventions for these different variables. Will push them once I receive that feedback.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@45
PS10, Line 45: SparkMaster
> same as above - sparkMaster not SparkMaster
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@56
PS10, Line 56: KuduMasters
> Same as above. This should be kuduMasters with lowercase k
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@57
PS10, Line 57: TableName
> same as above - tableName, not TableName
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@63
PS10, Line 63: SparkMaste
> same here sparkMaster
Done


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: -DKuduMasters=master.0:7051,master.1:7051,master.2:7051 -DTableName=test_table \
             : -DSparkMaster=yarn
> same comment here, you need to change the names of sys properties to match 
Done
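
For anyone following the naming comments above, here is a minimal sketch of why the case of a system property name matters. System.getProperty does an exact, case-sensitive key lookup, so a value set as -DKuduMasters=... is invisible to code that reads "kuduMasters". The object name, the helper, and the address below are hypothetical and only for illustration; this is not the code from the patch.

```scala
// Hypothetical sketch: names and values are illustrative, not from the patch.
object PropertyCaseDemo {
  // Wraps the lookup in Option so an unset key shows up as None, not null.
  def lookup(key: String): Option[String] = Option(System.getProperty(key))

  def main(args: Array[String]): Unit = {
    // Same effect as passing -DKuduMasters=master.0:7051 to the JVM.
    System.setProperty("KuduMasters", "master.0:7051")

    println(lookup("KuduMasters")) // the key exactly as it was set
    println(lookup("kuduMasters")) // different case, so a different key
  }
}
```

With only the capitalized key set, the second lookup finds nothing, which is why the README and the Scala source need to agree on one spelling.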



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 31 Oct 2018 20:42:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 12: Code-Review+1

LGTM


-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 12
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 20:32:14 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 8: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 18:16:54 +0000
Gerrit-HasComments: No

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> I guess, my point is that it would be nice if this example was applicable t
I've got this running with the System.getProperty() call on another spark2 cluster.

I'll write up some instructions and include them here so that customers don't have to do this same song and dance as we are :)



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 22:22:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@9
PS8, Line 9: Kudu-Spark 
> You mean it adds a basic kudu-spark example?
I think I copied this from the Kudu-Java example patch notes.... Fixed


http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@11
PS8, Line 11: It will allow users to pull down the pom.xml and scala source,
> nit: No paragraph break.
Pushing to gerrit throws a warning about line breaks, which is why I added this in. Is it safe to ignore the warning and leave this as a single line without width considerations, or is there a better way to meet the width limitation?


http://gerrit.cloudera.org:8080/#/c/11788/8//COMMIT_MSG@11
PS8, Line 11: users to 
> Say "users" instead of "customers". Nobody pays Apache to use Apache Kudu. 
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@22
PS8, Line 22: the Kudu-Spark i
> This is misleading. It doesn't use the KuduClient directly. Instead it uses
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@27
PS8, Line 27: Scan some rows
> and then split this list item into two:
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@37
PS8, Line 37: $ mvn package
> The above runs against a local cluster? What does it do?
Not quite - it will run as a local execution, and build the 'spark' architecture locally when executed. There's no cluster to run against here; it's effectively a standalone application running in its own JVM.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@38
PS8, Line 38: et/kudu-spark-example-1
> More precisely, these are Java system properties. Say that, since many user
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: 
> Remove
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: 
> Remove this first part of the sentence. Parameters from the command line al
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: 
> nit: master
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@41
PS8, Line 41: ere are a few Java system properties defined in SparkExample.scala:
> I think this is obvious and should be omitted.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@41
PS8, Line 41: 
> Remove, so it's just "Defaults to..."
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@43
PS8, Line 43: A comma-separated list of
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@43
PS8, Line 43:  master 
> You mean the address, right?
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@46
PS8, Line 46: park2 'spark-submit' job,
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@47
PS8, Line 47: the '--master' Spark para
> Replace with "Defaults to".
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@49
PS8, Line 49:   Defaults to 'spark_test'.
> Omit this sentence since it's a generality about Java system properties, an
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@51
PS8, Line 51:  different set of masters for the Kudu cluster from the
            : command line and use a custom table name, use a c
> Replace with just "use a command like the following:"
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@59
PS8, Line 59: 
> Do we have to care whether Spark is running on YARN vs anything else? Seems
Yep, this was corrected in a later revision. I misunderstood some of the previous comments and made this a bit too CDH-centric. This should now just read:

"To run this as a spark2 'spark-submit' job"


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@60
PS8, Line 60: ubmit command
> What does this mean?
With spark-submit, you supply the Spark master address as part of the command-line options. To make this work for both spark-submit and a standalone Java execution, the '--master' value from the Spark options also needs to be supplied to the Kudu-Spark example via '-DSparkMaster='.

Can explain in more detail in chat/call if needed.
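
To make the above a bit more concrete, here is a rough sketch of how the settings could be resolved from Java system properties with fallbacks. The two-argument System.getProperty returns the default when the key is unset. The property names follow the spelling used in this patch set's README (they were renamed to camelCase later in the review). The 'local' and 'spark_test' defaults come from the README text quoted in this thread; the Kudu master default and the object name are assumptions for illustration only.

```scala
// Hypothetical sketch, not the code from the patch.
object ExampleSettings {
  // Defined as methods so each call re-reads the current property value.

  // 'local' is the README default, used when no -DSparkMaster=... is passed.
  def sparkMaster: String = System.getProperty("SparkMaster", "local")

  // 'spark_test' is the README default table name.
  def tableName: String = System.getProperty("TableName", "spark_test")

  // Assumed default address, for illustration only.
  def kuduMasters: String = System.getProperty("KuduMasters", "localhost:7051")

  def main(args: Array[String]): Unit = {
    println(s"sparkMaster=$sparkMaster tableName=$tableName kuduMasters=$kuduMasters")
  }
}
```

Run standalone with no -D options, this falls back to the defaults. Under spark-submit, passing -DSparkMaster=yarn in the driver Java options makes sparkMaster match the '--master yarn' option.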


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@19
PS8, Line 19: import collection.JavaConverters._
            : 
            : import org.slf4j.LoggerFactory
            : 
            : import org.apache.kudu.client._
            : import org.apache.kudu.spark.kudu._
            : 
            : import org.apache.spark.sql.Spa
> Double-check that you are complying with the import rules at https://github
Looks correct to me, but please call out any misuse.

Placed the JavaConverters first, rearranged LoggerFactory to come before org.apache.kudu, and had org.apache.spark last. Only calling required definitions unless there are multiple required.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: kuduMasters
> As I commented elsewhere, can we name these parameters camelCase instead of
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: kuduMasters
> I know Grant said these should be capitalized like this if they are constan
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS8, Line 32: // Kud
> Space between the // and the first word
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS8, Line 40: // Defining a class that we'll use to insert data into the table
> It seems like there's benefits to using a case class-- the RDD toDF() calls
Changed this to be the following:

  // Defining a class that we'll use to insert data into the table.
  // Because we're defining a case class here, we circumvent the need to explicitly define a schema later
  // in the code, like during our RDD -> toDF() calls later on.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@45
PS8, Line 45: 
> I think this comment is redundant and can be omitted, unless you are going 
Added the following - let me know if this is insufficient:

    // Define our session and context variables for use throughout the program.
    // The kuduContext is a serializable container for Kudu client connections, while the 
    // SparkSession is the entry point to SparkSQL and the Dataset/DataFrame API.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS8, Line 47: 
> Rename to kuduContext for maximum readability.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@49
PS8, Line 49: // The kuduContext is a serializable container for Kudu client connections,
            :     // while the SparkSessio
> Can you import this at the top? I feel like this is an antipattern.
It is an anti-pattern, it's ugly - and there's no way to move it that I've been able to figure out.

It's even in spark's own documentation as such: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables

Moving that import fails the build:
[ERROR] /Users/mbarnett/src/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:14: error: not found: object spark
[ERROR] import spark.implicits._
[ERROR]        ^
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

because it's using the 'spark' instance we declare above it, which actually contains the lib.

This link explains it a bit better: https://stackoverflow.com/questions/39968707/spark-2-0-missing-spark-implicits
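
For reference, the same constraint can be reproduced without Spark. In the sketch below, FakeSession is a made-up stand-in for SparkSession: like the real class, it keeps its implicits in an object nested inside the instance, so the import path has to start from a value that already exists. Everything here (class, method, and object names) is hypothetical and only meant to illustrate the scoping rule.

```scala
// Hypothetical stand-in for SparkSession; not Spark code.
class FakeSession {
  // Nested inside the class, so every instance carries its own `implicits`.
  object implicits {
    // Adds a made-up `describe` method to any Seq via implicit conversion.
    implicit class SeqOps[A](val underlying: Seq[A]) {
      def describe: String = s"${underlying.size} row(s)"
    }
  }
}

object ImportFromInstanceDemo {
  def run(): String = {
    val spark = new FakeSession
    // The import path begins at the value `spark`, so it can only appear
    // after `spark` is defined. At the top of the file, no `spark` exists.
    import spark.implicits._
    Seq("a", "b").describe
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Moving the import above the val definition produces the same kind of "not found" error as in the build output above, since the compiler has no `spark` to resolve at that point.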


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@52
PS8, Line 52: l kuduContext = new KuduContext(kuduMasters, spark.sq
> nit: I'd just say "The schema of the table we're going to use."
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@53
PS8, Line 53: 
> This is obvious from the code and doesn't need to be comment.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@62
PS8, Line 62:                 StructField(nameCol, String
> Remove, so it's just "Create the..."
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@64
PS8, Line 64: 
> Please split into lines of 80 characters or less. Occasional overflows up t
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@68
PS8, Line 68: if (!kuduContext.tableExists(tableName)) {
> This restates the code. Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@76
PS8, Line 76: val userRDD = spark.sparkContext.parallelize(data)
            :       val userDF = userRDD.toDF()
> Remove. Don't just restate the code in comments-- explain it.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@78
PS8, Line 78:       kuduContext.insertRows(userDF, tableName)
> Add a log message here: "Reading back the rows written" or something.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@81
PS8, Line 81: fo("Reading back the rows just
> Does .show() not work?
I don't think there's a .show() call for RDDs, is there? readRDD.map returns an RDD - as far as I know, .show() only exists on a DataFrame or called directly through sqlContext (as done below).


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@84
PS8, Line 84:  val userTuple = readRDD.map { case Row(name: String, id: Int) => (name, id) }
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@93
PS8, Line 93: 
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@96
PS8, Line 96: > kuduMasters, "kudu.table" -> tableName)).kudu
> This would be nicer if you used https://docs.scala-lang.org/overviews/core/
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@97
PS8, Line 97:       sqlDF.createOrReplaceTempView(tableName)
> Extra newline.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@101
PS8, Line 101: nally {
> Remove, or replace with "Clean up."
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 22:03:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2 'spark-submit' job, you will need to set this value to match the '--master' Spark
> This is an Apache project and an Apache repository. Nothing about this chan
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@59
PS8, Line 59: To run this as a spark2 'spark-submit' job, you can use the spark-submit command as follows from the
> Same here. This should just be spark-submit.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@66
PS8, Line 66: $ spark-submit --class org.apache.kudu.examples.SparkExample --master yarn --deploy-mode <preferred deploy mode> --driver-java-options '-DKuduMasters=master.0:7051,master.1:7051,master.2:7051 -DTableName=test_table -DSparkMaster=yarn' target/kudu-spark-example-1.0-SNAPSHOT.jar
> Same here. This should just be spark-submit.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 9
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 19:23:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#3).

Change subject: [examples] Add basic Spark example written in Scala
......................................................................

[examples] Add basic Spark example written in Scala

This patch adds a basic Kudu client that uses both the Kudu Java APIs and the Spark SQL APIs.
It allows users to pull down the pom.xml and Scala source, then build and run the example from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 257 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/3
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 3
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: - KuduMasters: A String value consisting of a comma-separated list of Kudu Master Host addresses. 
nit: trailing white space here and below.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2-submit job, you will need to set this value to match the '--master' Spark 
I think this should say a Spark 2 `spark-submit` job. IIRC, spark2-submit is a CDH 5-only command that exists for compatibility reasons.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@59
PS8, Line 59: To run this against Spark2 On YARN, you can use the spark2-submit command as follows from the
Same here. This should just be spark-submit.


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@66
PS8, Line 66: $ spark2-submit --class org.apache.kudu.examples.SparkExample --master yarn --deploy-mode <preferred deploy mode> --driver-java-options '-DKuduMasters=master.0:7051,master.1:7051,master.2:7051 -DTableName=test_table -DSparkMaster=yarn' target/kudu-spark-example-1.0-SNAPSHOT.jar
Same here. This should just be spark-submit.



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 18:51:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Attila Bukor, Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11788

to look at the new patch set (#9).

Change subject: [examples] Add basic Spark example (scala)
......................................................................

[examples] Add basic Spark example (scala)

This patch adds a basic Kudu client that uses both the Kudu Java APIs
and the Spark SQL APIs.
It allows users to pull down the pom.xml and Scala source,
then build and run the example from their local machine.

Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
---
A examples/scala/spark-example/README.adoc
A examples/scala/spark-example/pom.xml
A examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
3 files changed, 292 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/11788/9
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 9
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@40
PS8, Line 40: - KuduMasters: A String value consisting of a comma-separated list of Kudu Master Host addresses. 
> nit: trailing white space here and below.
Done


http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2-submit job, you will need to set this value to match the '--master' Spark 
> I think this should be spark 2 `spark-submit` job. IIRC spark2-submit is a 
Yeah, upstream it is just 'spark-submit', but I thought we wanted this targeted at CDH, per Greg's update.

Should I leave it as is to be CDH-specific, or change it to match upstream Spark conventions?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 18:57:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Greg Solovyev (Code Review)" <ge...@cloudera.org>.
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/4/examples/scala/spark-example/README.adoc@41
PS4, Line 41: SparkMaster
> I'm starting to wonder if this is something we should even expose at this p
That's one way to address it. I wonder if there is a way to refactor the code so that it can be submitted to Spark with spark-submit as well as run as a Java application. What makes me uncomfortable with the current example is that while you can run it locally (without a Spark cluster), you cannot run it against Spark 2.x on YARN, and you cannot run it against Spark 1.x standalone (I tried). As a result, you cannot run this example against a Spark cluster deployed with CDH or HDP.
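
One possible shape for such a refactoring, as a rough sketch only and not the code in this patch: set the master on the SparkSession builder only when the -DSparkMaster property was actually passed, and otherwise leave it unset so that whatever spark-submit supplied via '--master' is used. The object name SparkMasterSketch, the helper explicitMaster, and the app name are hypothetical; the property name SparkMaster is taken from the README lines quoted in this review.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; not the SparkExample.scala from this change.
object SparkMasterSketch {

  // Returns the master URL only if -DSparkMaster was set to a non-empty value.
  def explicitMaster(props: scala.collection.Map[String, String]): Option[String] =
    props.get("SparkMaster").map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("KuduSparkSketch")

    val spark = explicitMaster(sys.props) match {
      // Standalone Java run, e.g. java -DSparkMaster=local ... : set it here.
      case Some(master) => builder.master(master).getOrCreate()
      // spark-submit run: do not override; the submitted '--master' applies.
      case None => builder.getOrCreate()
    }

    try {
      println(s"Running with master: ${spark.sparkContext.master}")
    } finally {
      // Stop the session explicitly so the application shuts down cleanly.
      spark.stop()
    }
  }
}
```

With this shape, a plain Java run still has to pass -DSparkMaster (for example 'local'), since Spark cannot create a session without a master; under spark-submit the property can simply be omitted.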



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 19:28:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/8/examples/scala/spark-example/README.adoc@44
PS8, Line 44:   If running a spark2-submit job, you will need to set this value to match the '--master' Spark 
> Yeah, the upstream is just 'spark-submit' but I thought we wanted this targ
This is an Apache project and an Apache repository. Nothing about this change should be related to or specifically for CDH. All changes should target the Apache integrations.



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 8
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 19:02:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Posted by "Mitch Barnett (Code Review)" <ge...@cloudera.org>.
Mitch Barnett has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )

Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 6:

Pushed the additional instructions for running this as a spark2-submit job; however, I hit a small snag that I'm still working through.

The job runs successfully and prints the expected output, but it fails when shutting down because of how I'm defining the SparkSession's 'master' setting. I'm working out how to mitigate that while keeping the example executable both as a standalone Java app and via spark2-submit.


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 6
Gerrit-Owner: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gs...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mb...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 27 Oct 2018 01:17:42 +0000
Gerrit-HasComments: No