Posted to reviews@spark.apache.org by aokolnychyi <gi...@git.apache.org> on 2016/07/09 21:18:04 UTC

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

GitHub user aokolnychyi opened a pull request:

    https://github.com/apache/spark/pull/14119

    [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples

    ## What changes were proposed in this pull request?
    
    - Moved hard-coded Spark SQL sample snippets into source files under the `examples` sub-project (see the marker sketch below)
    - Removed the inconsistency between the Scala and Java Spark SQL examples
    - Updated the Scala and Java Spark SQL examples
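
    The snippets are extracted from the example sources with marker comments and pulled into the guide by the `include_example` Jekyll tag used in `docs/sql-programming-guide.md`. A minimal sketch of the convention, reusing a label that appears in this PR:

        // $example on:create_df$
        val df = spark.read.json("examples/src/main/resources/people.json")
        df.show()
        // $example off:create_df$

    and, in the guide:

        {% include_example create_df scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}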
    
    ## How was this patch tested?
    
    The work is still in progress. All affected examples were tested manually; an additional round of testing will be done after the code review.
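
    For reference, a single example can be run locally with the `run-example` launcher (a sketch, using the example added in this PR):

        ./bin/run-example sql.SparkSqlExample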
    
    
    
    ![image](https://cloud.githubusercontent.com/assets/6235869/16710314/51851606-462a-11e6-9fbe-0818daef65e4.png)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aokolnychyi/spark spark_16303

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14119
    
----
commit 95f0f41fa12e1c6f0fb8ce6cd4222fb63842b495
Author: aokolnychyi <ok...@gmail.com>
Date:   2016-07-09T20:56:47Z

    [SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examples

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263553
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.io.Serializable;
    +// $example off:create_ds$
    +
    +// $example on:schema_inferring$
    +// $example on:programmatic_schema$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.function.Function;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import org.apache.spark.api.java.function.MapFunction;
    +// $example on:create_df$
    +// $example on:run_sql$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:programmatic_schema$
    +// $example off:create_df$
    +// $example off:run_sql$
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +// $example off:create_ds$
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.RowFactory;
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession;
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +// $example off:programmatic_schema$
    +
    +public class JavaSparkSqlExample {
    +  // $example on:create_ds$
    +  public static class Person implements Serializable {
    +    private String name;
    +    private int age;
    +
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public int getAge() {
    +      return age;
    +    }
    +
    +    public void setAge(int age) {
    +      this.age = age;
    +    }
    +  }
    +  // $example off:create_ds$
    +
    +  public static void main(String[] args) {
    +    // $example on:init_session$
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark);
    +    runDatasetCreationExample(spark);
    +    runInferSchemaExample(spark);
    +    runProgrammaticSchemaExample(spark);
    +
    +    spark.stop();
    +  }
    +
    +  private static void runBasicDataFrameExample(SparkSession spark) {
    +    // $example on:create_df$
    +    Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show();
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema();
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show();
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df.col("name"), df.col("age").plus(1)).show();
    +    // name    (age + 1)
    +    // Michael null
    +    // Andy    31
    +    // Justin  20
    +
    +    // Select people older than 21
    +    df.filter(df.col("age").gt(21)).show();
    +    // age name
    +    // 30  Andy
    +
    +    // Count people by age
    +    df.groupBy("age").count().show();
    +    // age  count
    +    // null 1
    +    // 19   1
    +    // 30   1
    +    // $example off:untyped_ops$
    +
    +    // $example on:run_sql$
    +    // Register the DataFrame as a SQL temporary view
    +    df.createOrReplaceTempView("people");
    +
    +    Dataset<Row> sqlDF = spark.sql("SELECT * FROM people");
    +    sqlDF.show();
    +    // $example off:run_sql$
    +  }
    +
    +  private static void runDatasetCreationExample(SparkSession spark) {
    +    // $example on:create_ds$
    +    // Create an instance of a Bean class
    +    Person person = new Person();
    +    person.setName("Andy");
    +    person.setAge(32);
    +
    +    // Encoders are created for Java beans
    +    Encoder<Person> personEncoder = Encoders.bean(Person.class);
    +    Dataset<Person> javaBeanDS = spark.createDataset(
    +        Collections.singletonList(person),
    +        personEncoder
    --- End diff --
    
    Nit: Use 2-space indentation here.


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    @liancheng could you please review this PR?


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263743
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +// $example on:schema_inferring$
    +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    +import org.apache.spark.sql.Encoder
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.Row
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.StringType
    +import org.apache.spark.sql.types.StructField
    +import org.apache.spark.sql.types.StructType
    +// $example off:programmatic_schema$
    +
    +object SparkSqlExample {
    +
    +  // $example on:create_ds$
    +  // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
    +  // you can use custom classes that implement the Product interface
    +  case class Person(name: String, age: Long)
    +  // $example off:create_ds$
    +
    +  def main(args: Array[String]) {
    +    // $example on:init_session$
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    +
    +    // For implicit conversions like converting RDDs to DataFrames
    +    import spark.implicits._
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark)
    +    runDatasetCreationExample(spark)
    +    runInferSchemaExample(spark)
    +    runProgrammaticSchemaExample(spark)
    +
    +    spark.stop()
    +  }
    +
    +  private def runBasicDataFrameExample(spark: SparkSession): Unit = {
    +    // $example on:create_df$
    +    val df = spark.read.json("examples/src/main/resources/people.json")
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show()
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema()
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show()
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df("name"), df("age") + 1).show()
    +    // name    (age + 1)
    +    // Michael null
    +    // Andy    31
    +    // Justin  20
    +
    +    // Select people older than 21
    +    df.filter(df("age") > 21).show()
    +    // age name
    +    // 30  Andy
    +
    +    // Count people by age
    +    df.groupBy("age").count().show()
    +    // age  count
    +    // null 1
    +    // 19   1
    +    // 30   1
    +    // $example off:untyped_ops$
    +
    +    // $example on:run_sql$
    +    // Register the DataFrame as a SQL temporary view
    +    df.createOrReplaceTempView("people")
    +
    +    val sqlDF = spark.sql("SELECT * FROM people")
    +    sqlDF.show()
    +    // $example off:run_sql$
    +  }
    +
    +  private def runDatasetCreationExample(spark: SparkSession): Unit = {
    +    import spark.implicits._
    +    // $example on:create_ds$
    +    // Encoders are created for case classes
    +    val caseClassDS = Seq(Person("Andy", 32)).toDS()
    +    caseClassDS.show()
    +
    +    // Encoders for most common types are automatically provided by importing spark.implicits._
    +    val primitiveDS = Seq(1, 2, 3).toDS()
    +    primitiveDS.map(_ + 1).collect() // Returns: Array(2, 3, 4)
    +
    +    // DataFrames can be converted to a Dataset by providing a class. Mapping will be done by name
    +    val path = "examples/src/main/resources/people.json"
    +    val peopleDS = spark.read.json(path).as[Person]
    +    peopleDS.show()
    +    // $example off:create_ds$
    +  }
    +
    +  private def runInferSchemaExample(spark: SparkSession): Unit = {
    +    // $example on:schema_inferring$
    +    // For implicit conversions from RDDs to DataFrames
    +    import spark.implicits._
    +
    +    // Create an RDD of Person objects from a text file, convert it to a DataFrame
    +    val peopleDF = spark.sparkContext
    +        .textFile("examples/src/main/resources/people.txt")
    +        .map(_.split(","))
    +        .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
    +        .toDF()
    --- End diff --
    
    Nit: Use 2-space indentation here.
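
    For reference, a sketch of the same chain with the requested 2-space continuation indentation:

        val peopleDF = spark.sparkContext
          .textFile("examples/src/main/resources/people.txt")
          .map(_.split(","))
          .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
          .toDF()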


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70256340
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
    @@ -41,43 +35,47 @@ object HiveFromSpark {
         // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
         // which defaults to the directory `spark-warehouse` in the current directory in which the
         // Spark application is started.
    -    val spark = SparkSession.builder
    -      .appName("HiveFromSpark")
    -      .enableHiveSupport()
    -      .getOrCreate()
    +
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    +
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate()
     
         import spark.implicits._
         import spark.sql
     
         sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    -    sql(s"LOAD DATA LOCAL INPATH '${kv1File.getAbsolutePath}' INTO TABLE src")
    +    sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
     
         // Queries are expressed in HiveQL
    -    println("Result of 'SELECT *': ")
    -    sql("SELECT * FROM src").collect().foreach(println)
    +    sql("SELECT * FROM src").show()
     
         // Aggregation queries are also supported.
    -    val count = sql("SELECT COUNT(*) FROM src").collect().head.getLong(0)
    -    println(s"COUNT(*): $count")
    +    sql("SELECT COUNT(*) FROM src").show()
     
    -    // The results of SQL queries are themselves RDDs and support all normal RDD functions.  The
    -    // items in the RDD are of type Row, which allows you to access each column by ordinal.
    -    val rddFromSql = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")
    +    // The results of SQL queries are themselves DataFrames and support all normal functions.
    +    val sqlDF = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")
     
    -    println("Result of RDD.map:")
    -    val rddAsStrings = rddFromSql.rdd.map {
    +    // The items in DataFrames are of type Row, which allows you to access each column by ordinal.
    +    val stringsDS = sqlDF.map {
           case Row(key: Int, value: String) => s"Key: $key, Value: $value"
         }
    +    stringsDS.show()
     
    -    // You can also use RDDs to create temporary views within a HiveContext.
    -    val rdd = spark.sparkContext.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
    -    rdd.toDF().createOrReplaceTempView("records")
    +    // You can also use DataFrames to create temporary views within a HiveContext.
    +    val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
    +    recordsDF.createOrReplaceTempView("records")
     
    -    // Queries can then join RDD data with data stored in Hive.
    -    println("Result of SELECT *:")
    -    sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)
    +    // Queries can then join DataFrame data with data stored in Hive.
    +    sql("SELECT * FROM records r JOIN src s ON r.key = s.key").show()
    +    // $example off:spark_hive$
     
         spark.stop()
       }
    -}
    -// scalastyle:on println
    +}
    --- End diff --
    
    Otherwise the scalastyle checker may complain.
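
    For context: Spark's scalastyle rules flag bare `println` calls, so a file that keeps one needs the paired directives around it. A sketch of that pattern:

        // scalastyle:off println
        println("Result of 'SELECT *': ")
        // scalastyle:on println

    Since the new example uses `show()` instead, both directives can be dropped together.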


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70245656
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +import org.apache.spark.sql.SparkSession
    +
    +object SqlDataSourceExample {
    +
    +  case class Person(name: String, age: Long)
    +
    +  def main(args: Array[String]) {
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Data Sources Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    +
    +    runBasicDataSourceExample(spark)
    +    runBasicParquetExample(spark)
    +    runParquetSchemaMergingExample(spark)
    +    runJsonDatasetExample(spark)
    +
    +    spark.stop()
    +  }
    +
    +  private def runBasicDataSourceExample(spark: SparkSession): Unit = {
    +    // $example on:generic_load_save_functions$
    +    val usersDF = spark.read.load("examples/src/main/resources/users.parquet")
    +    usersDF.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
    +    // $example off:generic_load_save_functions$
    +    // $example on:manual_load_options$
    +    val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
    +    peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
    +    // $example off:manual_load_options$
    +    // $example on:direct_sql$
    +    val sqlDF = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
    --- End diff --
    
    Seems that it doesn't? The limit is 100 characters.


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    **[Test build #62092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62092/consoleFull)** for PR 14119 at commit [`95f0f41`](https://github.com/apache/spark/commit/95f0f41fa12e1c6f0fb8ce6cd4222fb63842b495).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    **[Test build #62092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62092/consoleFull)** for PR 14119 at commit [`95f0f41`](https://github.com/apache/spark/commit/95f0f41fa12e1c6f0fb8ce6cd4222fb63842b495).


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70262964
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -732,62 +452,7 @@ a `Dataset<Row>` can be created programmatically with three steps.
     by `SparkSession`.
     
     For example:
    -{% highlight java %}
    -import org.apache.spark.api.java.function.Function;
    -// Import factory methods provided by DataTypes.
    -import org.apache.spark.sql.types.DataTypes;
    -// Import StructType and StructField
    -import org.apache.spark.sql.types.StructType;
    -import org.apache.spark.sql.types.StructField;
    -// Import Row.
    -import org.apache.spark.sql.Row;
    -// Import RowFactory.
    -import org.apache.spark.sql.RowFactory;
    -
    -SparkSession spark = ...; // An existing SparkSession.
    -JavaSparkContext sc = spark.sparkContext
    -
    -// Load a text file and convert each line to a JavaBean.
    -JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");
    -
    -// The schema is encoded in a string
    -String schemaString = "name age";
    -
    -// Generate the schema based on the string of schema
    -List<StructField> fields = new ArrayList<>();
    -for (String fieldName: schemaString.split(" ")) {
    -  fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
    -}
    -StructType schema = DataTypes.createStructType(fields);
    -
    -// Convert records of the RDD (people) to Rows.
    -JavaRDD<Row> rowRDD = people.map(
    -  new Function<String, Row>() {
    -    public Row call(String record) throws Exception {
    -      String[] fields = record.split(",");
    -      return RowFactory.create(fields[0], fields[1].trim());
    -    }
    -  });
    -
    -// Apply the schema to the RDD.
    -Dataset<Row> peopleDataFrame = spark.createDataFrame(rowRDD, schema);
    -
    -// Creates a temporary view using the DataFrame.
    -peopleDataFrame.createOrReplaceTempView("people");
    -
    -// SQL can be run over a temporary view created using DataFrames.
    -Dataset<Row> results = spark.sql("SELECT name FROM people");
    -
    -// The results of SQL queries are DataFrames and support all the normal RDD operations.
    -// The columns of a row in the result can be accessed by ordinal.
    -List<String> names = results.javaRDD().map(new Function<Row, String>() {
    -  public String call(Row row) {
    -    return "Name: " + row.getString(0);
    -  }
    -}).collect();
    -
    -{% endhighlight %}
    -
    +{% include_example programmatic_schema java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
    --- End diff --
    
    Same as above, add a newline before this line.


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    **[Test build #62192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62192/consoleFull)** for PR 14119 at commit [`7451fc7`](https://github.com/apache/spark/commit/7451fc784d5c8f87c37f7707c4323a280d52417b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70264424
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1380,17 +949,17 @@ metadata.
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// spark is an existing HiveContext
    -spark.refreshTable("my_table")
    +// spark is an existing SparkSession
    +spark.catalog.refreshTable("my_table")
    --- End diff --
    
    Yes.


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70245539
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
    @@ -41,43 +35,47 @@ object HiveFromSpark {
         // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
         // which defaults to the directory `spark-warehouse` in the current directory in which the
         // Spark application is started.
    -    val spark = SparkSession.builder
    -      .appName("HiveFromSpark")
    -      .enableHiveSupport()
    -      .getOrCreate()
    +
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    +
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate()
     
         import spark.implicits._
         import spark.sql
     
         sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    -    sql(s"LOAD DATA LOCAL INPATH '${kv1File.getAbsolutePath}' INTO TABLE src")
    +    sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
     
         // Queries are expressed in HiveQL
    -    println("Result of 'SELECT *': ")
    -    sql("SELECT * FROM src").collect().foreach(println)
    +    sql("SELECT * FROM src").show()
    --- End diff --
    
    Yea, this should be fine.
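
    Both forms print the table contents; a rough sketch of the difference (output shapes are illustrative):

        sql("SELECT * FROM src").collect().foreach(println) // one Row per line, e.g. [238,val_238]
        sql("SELECT * FROM src").show()                     // formatted table, 20 rows by default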


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    **Summary of the updates**
    
    - The `JavaSparkSQL.java` file was removed. I initially kept it because the file itself was quite old (2+ years) and was present in your original WIP branch alongside the new file. However, I can confirm that the new file covers the same functionality and more, so there is no need to keep the old one; I agree with you.
    - The Apache license header was added to `JavaSqlDataSourceExample.java`.
    - `$`-notation is now used instead of `df.col("...")` in the Scala examples (see the sketch after this list).
    - `col("...")` is now used instead of `df.col("...")` in the Java examples.
    - Blank lines were added before `{% include_example programmatic_schema ... }`. However, everything rendered fine locally even without them.
    - 2-space indentation is now used for chained method calls. My fault, sorry.
    - Actual outputs for all `show()` calls were added.
    - Tested manually and via `./dev/run-tests`.
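
    A quick sketch of the notation change in the Scala examples (`$` requires `import spark.implicits._`):

        // Before (as in the earlier diff)
        df.select(df("name"), df("age") + 1).show()
        // After
        df.select($"name", $"age" + 1).show()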
    
    **Open questions**
    - Shall I add blank lines before each `{% include_example ... }` or only before those two examples?
    - I pointed to the wrong location for the lines that exceed the length limit. It is exactly the same functionality, but in Java: lines 113 and 117 of `JavaSqlDataSourceExample.java`. In my view, it would make sense to keep them as they are for better-looking documentation.


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263573
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.io.Serializable;
    +// $example off:create_ds$
    +
    +// $example on:schema_inferring$
    +// $example on:programmatic_schema$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.function.Function;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import org.apache.spark.api.java.function.MapFunction;
    +// $example on:create_df$
    +// $example on:run_sql$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:programmatic_schema$
    +// $example off:create_df$
    +// $example off:run_sql$
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +// $example off:create_ds$
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.RowFactory;
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession;
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +// $example off:programmatic_schema$
    +
    +public class JavaSparkSqlExample {
    +  // $example on:create_ds$
    +  public static class Person implements Serializable {
    +    private String name;
    +    private int age;
    +
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public int getAge() {
    +      return age;
    +    }
    +
    +    public void setAge(int age) {
    +      this.age = age;
    +    }
    +  }
    +  // $example off:create_ds$
    +
    +  public static void main(String[] args) {
    +    // $example on:init_session$
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark);
    +    runDatasetCreationExample(spark);
    +    runInferSchemaExample(spark);
    +    runProgrammaticSchemaExample(spark);
    +
    +    spark.stop();
    +  }
    +
    +  private static void runBasicDataFrameExample(SparkSession spark) {
    +    // $example on:create_df$
    +    Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show();
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema();
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show();
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df.col("name"), df.col("age").plus(1)).show();
    +    // name    (age + 1)
    +    // Michael null
    +    // Andy    31
    +    // Justin  20
    +
    +    // Select people older than 21
    +    df.filter(df.col("age").gt(21)).show();
    +    // age name
    +    // 30  Andy
    +
    +    // Count people by age
    +    df.groupBy("age").count().show();
    +    // age  count
    +    // null 1
    +    // 19   1
    +    // 30   1
    +    // $example off:untyped_ops$
    +
    +    // $example on:run_sql$
    +    // Register the DataFrame as a SQL temporary view
    +    df.createOrReplaceTempView("people");
    +
    +    Dataset<Row> sqlDF = spark.sql("SELECT * FROM people");
    +    sqlDF.show();
    +    // $example off:run_sql$
    +  }
    +
    +  private static void runDatasetCreationExample(SparkSession spark) {
    +    // $example on:create_ds$
    +    // Create an instance of a Bean class
    +    Person person = new Person();
    +    person.setName("Andy");
    +    person.setAge(32);
    +
    +    // Encoders are created for Java beans
    +    Encoder<Person> personEncoder = Encoders.bean(Person.class);
    +    Dataset<Person> javaBeanDS = spark.createDataset(
    +        Collections.singletonList(person),
    +        personEncoder
    +    );
    +    javaBeanDS.show();
    +
    +    // Encoders for most common types are provided in class Encoders
    +    Encoder<Integer> integerEncoder = Encoders.INT();
    +    Dataset<Integer> primitiveDS = spark.createDataset(Arrays.asList(1, 2, 3), integerEncoder);
    +    Dataset<Integer> transformedDS = primitiveDS.map(new MapFunction<Integer, Integer>() {
    +      @Override
    +      public Integer call(Integer value) throws Exception {
    +        return value + 1;
    +      }
    +    }, integerEncoder);
    +    transformedDS.collect(); // Returns [2, 3, 4]
    +
    +    // DataFrames can be converted to a Dataset by providing a class. Mapping based on name
    +    String path = "examples/src/main/resources/people.json";
    +    Dataset<Person> peopleDS = spark.read().json(path).as(personEncoder);
    +    peopleDS.show();
    +    // $example off:create_ds$
    +  }
    +
    +  private static void runInferSchemaExample(SparkSession spark) {
    +    // $example on:schema_inferring$
    +    // Create an RDD of Person objects from a text file
    +    JavaRDD<Person> peopleRDD = spark.read()
    +        .textFile("examples/src/main/resources/people.txt")
    +        .javaRDD()
    +        .map(new Function<String, Person>() {
    --- End diff --
    
    Nit: Use 2-space indentation here.


[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70262914
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -679,43 +435,7 @@ a `DataFrame` can be created programmatically with three steps.
     by `SparkSession`.
     
     For example:
    -{% highlight scala %}
    -val spark: SparkSession // An existing SparkSession
    -
    -// Create an RDD
    -val people = sc.textFile("examples/src/main/resources/people.txt")
    -
    -// The schema is encoded in a string
    -val schemaString = "name age"
    -
    -// Import Row.
    -import org.apache.spark.sql.Row;
    -
    -// Import Spark SQL data types
    -import org.apache.spark.sql.types.{StructType, StructField, StringType};
    -
    -// Generate the schema based on the string of schema
    -val schema = StructType(schemaString.split(" ").map { fieldName =>
    -  StructField(fieldName, StringType, true)
    -})
    -
    -// Convert records of the RDD (people) to Rows.
    -val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
    -
    -// Apply the schema to the RDD.
    -val peopleDataFrame = spark.createDataFrame(rowRDD, schema)
    -
    -// Creates a temporary view using the DataFrame.
    -peopleDataFrame.createOrReplaceTempView("people")
    -
    -// SQL statements can be run by using the sql methods provided by spark.
    -val results = spark.sql("SELECT name FROM people")
    -
    -// The columns of a row in the result can be accessed by field index or by field name.
    -results.map(t => "Name: " + t(0)).collect().foreach(println)
    -{% endhighlight %}
    -
    -
    +{% include_example programmatic_schema scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
    --- End diff --
    
    It seems that you have to add a newline before the `{% include_example ... %}` tag; otherwise this section isn't rendered properly, not sure why:
    
    <img width="925" alt="screenshot at jul 11 21-59-30" src="https://cloud.githubusercontent.com/assets/230655/16733123/cd5a0d70-47b2-11e6-9689-621e9bdac73f.png">
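
    In the Markdown source that just means a blank line between the preceding paragraph and the tag, e.g.:

        For example:

        {% include_example programmatic_schema scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}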



[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263767
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +// $example on:schema_inferring$
    +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    +import org.apache.spark.sql.Encoder
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.Row
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.StringType
    +import org.apache.spark.sql.types.StructField
    +import org.apache.spark.sql.types.StructType
    +// $example off:programmatic_schema$
    +
    +object SparkSqlExample {
    +
    +  // $example on:create_ds$
    +  // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
    +  // you can use custom classes that implement the Product interface
    +  case class Person(name: String, age: Long)
    +  // $example off:create_ds$
    +
    +  def main(args: Array[String]) {
    +    // $example on:init_session$
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    +
    +    // For implicit conversions like converting RDDs to DataFrames
    +    import spark.implicits._
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark)
    +    runDatasetCreationExample(spark)
    +    runInferSchemaExample(spark)
    +    runProgrammaticSchemaExample(spark)
    +
    +    spark.stop()
    +  }
    +
    +  private def runBasicDataFrameExample(spark: SparkSession): Unit = {
    +    // $example on:create_df$
    +    val df = spark.read.json("examples/src/main/resources/people.json")
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show()
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema()
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show()
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df("name"), df("age") + 1).show()
    +    // name    (age + 1)
    +    // Michael null
    +    // Andy    31
    +    // Justin  20
    +
    +    // Select people older than 21
    +    df.filter(df("age") > 21).show()
    +    // age name
    +    // 30  Andy
    +
    +    // Count people by age
    +    df.groupBy("age").count().show()
    +    // age  count
    +    // null 1
    +    // 19   1
    +    // 30   1
    +    // $example off:untyped_ops$
    +
    +    // $example on:run_sql$
    +    // Register the DataFrame as a SQL temporary view
    +    df.createOrReplaceTempView("people")
    +
    +    val sqlDF = spark.sql("SELECT * FROM people")
    +    sqlDF.show()
    +    // $example off:run_sql$
    +  }
    +
    +  private def runDatasetCreationExample(spark: SparkSession): Unit = {
    +    import spark.implicits._
    +    // $example on:create_ds$
    +    // Encoders are created for case classes
    +    val caseClassDS = Seq(Person("Andy", 32)).toDS()
    +    caseClassDS.show()
    +
    +    // Encoders for most common types are automatically provided by importing spark.implicits._
    +    val primitiveDS = Seq(1, 2, 3).toDS()
    +    primitiveDS.map(_ + 1).collect() // Returns: Array(2, 3, 4)
    +
    +    // DataFrames can be converted to a Dataset by providing a class. Mapping will be done by name
    +    val path = "examples/src/main/resources/people.json"
    +    val peopleDS = spark.read.json(path).as[Person]
    +    peopleDS.show()
    +    // $example off:create_ds$
    +  }
    +
    +  private def runInferSchemaExample(spark: SparkSession): Unit = {
    +    // $example on:schema_inferring$
    +    // For implicit conversions from RDDs to DataFrames
    +    import spark.implicits._
    +
    +    // Create an RDD of Person objects from a text file, convert it to a DataFrame
    +    val peopleDF = spark.sparkContext
    +        .textFile("examples/src/main/resources/people.txt")
    +        .map(_.split(","))
    +        .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
    +        .toDF()
    +    // Register the DataFrame as a temporary view
    +    peopleDF.createOrReplaceTempView("people")
    +
    +    // SQL statements can be run by using the sql methods provided by Spark
    +    val teenagersDF = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")
    +
    +    // The columns of a row in the result can be accessed by field index
    +    teenagersDF.map(teenager => "Name: " + teenager(0)).show()
    +
    +    // or by field name
    +    teenagersDF.map(teenager => "Name: " + teenager.getAs[String]("name")).show()
    +
    +    // No pre-defined encoders for Dataset[Map[K,V]], define explicitly
    +    implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
    +    // Primitive types and case classes can be also defined as
    +    implicit val stringIntMapEncoder: Encoder[Map[String, Int]] = ExpressionEncoder()
    +
    +    // row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
    +    teenagersDF.map(teenager => teenager.getValuesMap[Any](List("name", "age"))).collect()
    +    // Array(Map("name" -> "Justin", "age" -> 19))
    +    // $example off:schema_inferring$
    +  }
    +
    +  private def runProgrammaticSchemaExample(spark: SparkSession): Unit = {
    +    import spark.implicits._
    +    // $example on:programmatic_schema$
    +    // Create an RDD
    +    val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")
    +
    +    // The schema is encoded in a string
    +    val schemaString = "name age"
    +
    +    // Generate the schema based on the string of schema
    +    val fields = schemaString.split(" ")
    +        .map(fieldName => StructField(fieldName, StringType, nullable = true))
    +    val schema = StructType(fields)
    +
    +    // Convert records of the RDD (people) to Rows
    +    val rowRDD = peopleRDD
    +        .map(_.split(","))
    +        .map(attributes => Row(attributes(0), attributes(1).trim))
    --- End diff --
    
    Nit: Use 2-space indentation here.
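
    As above, a sketch with 2-space continuation indentation:

        val rowRDD = peopleRDD
          .map(_.split(","))
          .map(attributes => Row(attributes(0), attributes(1).trim))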


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    This looks pretty good! I only found a few minor issues. Thanks for working on it!


[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    **[Test build #62192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62192/consoleFull)** for PR 14119 at commit [`7451fc7`](https://github.com/apache/spark/commit/7451fc784d5c8f87c37f7707c4323a280d52417b).




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Since you've added `JavaSparkSqlExample.java`, we can remove `JavaSparkSQL.java` now. (I guess that file was from my original WIP branch?)




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263783
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +import org.apache.spark.sql.SparkSession
    +
    +object SqlDataSourceExample {
    +
    +  case class Person(name: String, age: Long)
    +
    +  def main(args: Array[String]) {
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Data Sources Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70245522
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
    @@ -41,43 +35,47 @@ object HiveFromSpark {
         // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
     // which defaults to the directory `spark-warehouse` in the directory where the Spark
     // application is started.
    -    val spark = SparkSession.builder
    -      .appName("HiveFromSpark")
    -      .enableHiveSupport()
    -      .getOrCreate()
    +
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    +
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate()
     
         import spark.implicits._
         import spark.sql
     
         sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    -    sql(s"LOAD DATA LOCAL INPATH '${kv1File.getAbsolutePath}' INTO TABLE src")
    +    sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
     
         // Queries are expressed in HiveQL
    -    println("Result of 'SELECT *': ")
    -    sql("SELECT * FROM src").collect().foreach(println)
    +    sql("SELECT * FROM src").show()
     
         // Aggregation queries are also supported.
    -    val count = sql("SELECT COUNT(*) FROM src").collect().head.getLong(0)
    -    println(s"COUNT(*): $count")
    +    sql("SELECT COUNT(*) FROM src").show()
     
    -    // The results of SQL queries are themselves RDDs and support all normal RDD functions.  The
    -    // items in the RDD are of type Row, which allows you to access each column by ordinal.
    -    val rddFromSql = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")
    +    // The results of SQL queries are themselves DataFrames and support all normal functions.
    +    val sqlDF = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")
     
    -    println("Result of RDD.map:")
    -    val rddAsStrings = rddFromSql.rdd.map {
    +    // The items in DataFrames are of type Row, which allows you to access each column by ordinal.
    +    val stringsDS = sqlDF.map {
           case Row(key: Int, value: String) => s"Key: $key, Value: $value"
         }
    +    stringsDS.show()
     
    -    // You can also use RDDs to create temporary views within a HiveContext.
    -    val rdd = spark.sparkContext.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
    -    rdd.toDF().createOrReplaceTempView("records")
    +    // You can also use DataFrames to create temporary views within a SparkSession.
    +    val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
    +    recordsDF.createOrReplaceTempView("records")
     
    -    // Queries can then join RDD data with data stored in Hive.
    -    println("Result of SELECT *:")
    -    sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)
    +    // Queries can then join DataFrame data with data stored in Hive.
    +    sql("SELECT * FROM records r JOIN src s ON r.key = s.key").show()
    +    // $example off:spark_hive$
     
         spark.stop()
       }
    -}
    -// scalastyle:on println
    +}
    --- End diff --
    
    Nit: Please add newline at the end of the source file.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70173180
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    --- End diff --
    
    Here the imports do not follow alphabetical order, to avoid creating too many import groups in the documentation (there would be a blank line between each "example on/off" block).




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70257720
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +// $example on:schema_inferring$
    +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    +import org.apache.spark.sql.Encoder
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.Row
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.StringType
    +import org.apache.spark.sql.types.StructField
    +import org.apache.spark.sql.types.StructType
    +// $example off:programmatic_schema$
    +
    +object SparkSqlExample {
    +
    +  // $example on:create_ds$
    +  // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
    +  // you can use custom classes that implement the Product interface
    +  case class Person(name: String, age: Long)
    +  // $example off:create_ds$
    +
    +  def main(args: Array[String]) {
    +    // $example on:init_session$
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    +
    +    // For implicit conversions like converting RDDs to DataFrames
    +    import spark.implicits._
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark)
    +    runDatasetCreationExample(spark)
    +    runInferSchemaExample(spark)
    +    runProgrammaticSchemaExample(spark)
    +
    +    spark.stop()
    +  }
    +
    +  private def runBasicDataFrameExample(spark: SparkSession): Unit = {
    +    // $example on:create_df$
    +    val df = spark.read.json("examples/src/main/resources/people.json")
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show()
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema()
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show()
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df("name"), df("age") + 1).show()
    --- End diff --
    
    I'd prefer `$"name"` instead of `df("name")` through out all Scala examples. The latter is not recommended because it may introduce ambiguous query plans when dealing with self-joins. The `$`-notation needs `import spark.implicits._` though.




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    LGTM, I've merged this to master and branch-2.0. Thanks for working on this!
    
    I only observed one weird rendering issue, caused by the blank lines before `{% include_example %}`; maybe my local Jekyll version is too old. I think it's fine to leave the other lines as is. The lines that exceed the length limit should be OK.
    
    Could you please remove the WIP tag from the PR title? (I've removed it manually while merging this PR.)




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70173058
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
    @@ -41,43 +35,47 @@ object HiveFromSpark {
         // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
         // which defaults to the directory `spark-warehouse` in the current directory that the spark
         // application is started.
    -    val spark = SparkSession.builder
    -      .appName("HiveFromSpark")
    -      .enableHiveSupport()
    -      .getOrCreate()
    +
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    +
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate()
     
         import spark.implicits._
         import spark.sql
     
         sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    -    sql(s"LOAD DATA LOCAL INPATH '${kv1File.getAbsolutePath}' INTO TABLE src")
    +    sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
     
         // Queries are expressed in HiveQL
    -    println("Result of 'SELECT *': ")
    -    sql("SELECT * FROM src").collect().foreach(println)
    +    sql("SELECT * FROM src").show()
    --- End diff --
    
    I replaced `collect().foreach(println)` with `show()` in all examples. Is that OK?
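    For context, a sketch of the difference, assuming the `spark` session and `src` table from the example above (the printed rows are illustrative):

    ```scala
    import spark.sql

    // collect() pulls every row to the driver and prints raw Row objects
    sql("SELECT * FROM src").collect().foreach(println)
    // [238,val_238]
    // [86,val_86]
    // ...

    // show() prints a formatted table and only the first 20 rows by default,
    // which is safer for large tables and reads better in the docs
    sql("SELECT * FROM src").show()
    ```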




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14119




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263718
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +// $example on:schema_inferring$
    +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    +import org.apache.spark.sql.Encoder
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.Row
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.StringType
    +import org.apache.spark.sql.types.StructField
    +import org.apache.spark.sql.types.StructType
    +// $example off:programmatic_schema$
    +
    +object SparkSqlExample {
    +
    +  // $example on:create_ds$
    +  // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
    +  // you can use custom classes that implement the Product interface
    +  case class Person(name: String, age: Long)
    +  // $example off:create_ds$
    +
    +  def main(args: Array[String]) {
    +    // $example on:init_session$
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62092/
    Test FAILed.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263602
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.io.Serializable;
    +// $example off:create_ds$
    +
    +// $example on:schema_inferring$
    +// $example on:programmatic_schema$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.function.Function;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import org.apache.spark.api.java.function.MapFunction;
    +// $example on:create_df$
    +// $example on:run_sql$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:programmatic_schema$
    +// $example off:create_df$
    +// $example off:run_sql$
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +// $example off:create_ds$
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.RowFactory;
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession;
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +// $example off:programmatic_schema$
    +
    +public class JavaSparkSqlExample {
    +  // $example on:create_ds$
    +  public static class Person implements Serializable {
    +    private String name;
    +    private int age;
    +
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public int getAge() {
    +      return age;
    +    }
    +
    +    public void setAge(int age) {
    +      this.age = age;
    +    }
    +  }
    +  // $example off:create_ds$
    +
    +  public static void main(String[] args) {
    +    // $example on:init_session$
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark);
    +    runDatasetCreationExample(spark);
    +    runInferSchemaExample(spark);
    +    runProgrammaticSchemaExample(spark);
    +
    +    spark.stop();
    +  }
    +
    +  private static void runBasicDataFrameExample(SparkSession spark) {
    +    // $example on:create_df$
    +    Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show();
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema();
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show();
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df.col("name"), df.col("age").plus(1)).show();
    +    // name    (age + 1)
    +    // Michael null
    +    // Andy    31
    +    // Justin  20
    +
    +    // Select people older than 21
    +    df.filter(df.col("age").gt(21)).show();
    +    // age name
    +    // 30  Andy
    +
    +    // Count people by age
    +    df.groupBy("age").count().show();
    +    // age  count
    +    // null 1
    +    // 19   1
    +    // 30   1
    +    // $example off:untyped_ops$
    +
    +    // $example on:run_sql$
    +    // Register the DataFrame as a SQL temporary view
    +    df.createOrReplaceTempView("people");
    +
    +    Dataset<Row> sqlDF = spark.sql("SELECT * FROM people");
    +    sqlDF.show();
    +    // $example off:run_sql$
    +  }
    +
    +  private static void runDatasetCreationExample(SparkSession spark) {
    +    // $example on:create_ds$
    +    // Create an instance of a Bean class
    +    Person person = new Person();
    +    person.setName("Andy");
    +    person.setAge(32);
    +
    +    // Encoders are created for Java beans
    +    Encoder<Person> personEncoder = Encoders.bean(Person.class);
    +    Dataset<Person> javaBeanDS = spark.createDataset(
    +        Collections.singletonList(person),
    +        personEncoder
    +    );
    +    javaBeanDS.show();
    +
    +    // Encoders for most common types are provided in class Encoders
    +    Encoder<Integer> integerEncoder = Encoders.INT();
    +    Dataset<Integer> primitiveDS = spark.createDataset(Arrays.asList(1, 2, 3), integerEncoder);
    +    Dataset<Integer> transformedDS = primitiveDS.map(new MapFunction<Integer, Integer>() {
    +      @Override
    +      public Integer call(Integer value) throws Exception {
    +        return value + 1;
    +      }
    +    }, integerEncoder);
    +    transformedDS.collect(); // Returns [2, 3, 4]
    +
    +    // DataFrames can be converted to a Dataset by providing a class. Mapping based on name
    +    String path = "examples/src/main/resources/people.json";
    +    Dataset<Person> peopleDS = spark.read().json(path).as(personEncoder);
    +    peopleDS.show();
    +    // $example off:create_ds$
    +  }
    +
    +  private static void runInferSchemaExample(SparkSession spark) {
    +    // $example on:schema_inferring$
    +    // Create an RDD of Person objects from a text file
    +    JavaRDD<Person> peopleRDD = spark.read()
    +        .textFile("examples/src/main/resources/people.txt")
    +        .javaRDD()
    +        .map(new Function<String, Person>() {
    +          @Override
    +          public Person call(String line) throws Exception {
    +            String[] parts = line.split(",");
    +            Person person = new Person();
    +            person.setName(parts[0]);
    +            person.setAge(Integer.parseInt(parts[1].trim()));
    +            return person;
    +          }
    +        });
    +
    +    // Apply a schema to an RDD of JavaBeans to get a DataFrame
    +    Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);
    +    // Register the DataFrame as a temporary view
    +    peopleDF.createOrReplaceTempView("people");
    +
    +    // SQL statements can be run by using the sql methods provided by spark
    +    Dataset<Row> teenagersDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19");
    +
    +    // The columns of a row in the result can be accessed by field index
    +    Encoder<String> stringEncoder = Encoders.STRING();
    +    Dataset<String> teenagerNamesByIndexDF = teenagersDF.map(new MapFunction<Row, String>() {
    +      @Override
    +      public String call(Row row) throws Exception {
    +        return "Name: " + row.getString(0);
    +      }
    +    }, stringEncoder);
    +    teenagerNamesByIndexDF.show();
    +
    +    // or by field name
    +    Dataset<String> teenagerNamesByFieldDF = teenagersDF.map(new MapFunction<Row, String>() {
    +      @Override
    +      public String call(Row row) throws Exception {
    +        return "Name: " + row.<String>getAs("name");
    +      }
    +    }, stringEncoder);
    +    teenagerNamesByFieldDF.show();
    +    // $example off:schema_inferring$
    +  }
    +
    +  private static void runProgrammaticSchemaExample(SparkSession spark) {
    +    // $example on:programmatic_schema$
    +    // Create an RDD
    +    JavaRDD<String> peopleRDD = spark.sparkContext()
    +        .textFile("examples/src/main/resources/people.txt", 1)
    +        .toJavaRDD();
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70256850
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java ---
    @@ -0,0 +1,192 @@
    +package org.apache.spark.examples.sql;
    --- End diff --
    
    Please add Apache license header.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263832
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
    @@ -41,43 +35,47 @@ object HiveFromSpark {
         // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
     // which defaults to the directory `spark-warehouse` in the directory where the Spark
     // application is started.
    -    val spark = SparkSession.builder
    -      .appName("HiveFromSpark")
    -      .enableHiveSupport()
    -      .getOrCreate()
    +
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    +
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate()
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263693
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/hive/JavaSparkHiveExample.java ---
    @@ -0,0 +1,106 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.hive;
    +
    +// $example on:spark_hive$
    +
    +import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.spark.api.java.function.MapFunction;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Encoders;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.SparkSession;
    +// $example off:spark_hive$
    +
    +public class JavaSparkHiveExample {
    +
    +  // $example on:spark_hive$
    +  public static class Record implements Serializable {
    +    private int key;
    +    private String value;
    +
    +    public int getKey() {
    +      return key;
    +    }
    +
    +    public void setKey(int key) {
    +      this.key = key;
    +    }
    +
    +    public String getValue() {
    +      return value;
    +    }
    +
    +    public void setValue(String value) {
    +      this.value = value;
    +    }
    +  }
    +  // $example off:spark_hive$
    +
    +  public static void main(String[] args) {
    +    // $example on:spark_hive$
    +    // warehouseLocation points to the default location for managed databases and tables
    +    String warehouseLocation = "file:" + System.getProperty("user.dir") + "/spark-warehouse";
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark Hive Example")
    +        .config("spark.sql.warehouse.dir", warehouseLocation)
    +        .enableHiveSupport()
    +        .getOrCreate();
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70261011
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.io.Serializable;
    +// $example off:create_ds$
    +
    +// $example on:schema_inferring$
    +// $example on:programmatic_schema$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.function.Function;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import org.apache.spark.api.java.function.MapFunction;
    +// $example on:create_df$
    +// $example on:run_sql$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:programmatic_schema$
    +// $example off:create_df$
    +// $example off:run_sql$
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +// $example off:create_ds$
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.RowFactory;
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession;
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +// $example off:programmatic_schema$
    +
    +public class JavaSparkSqlExample {
    +  // $example on:create_ds$
    +  public static class Person implements Serializable {
    +    private String name;
    +    private int age;
    +
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public int getAge() {
    +      return age;
    +    }
    +
    +    public void setAge(int age) {
    +      this.age = age;
    +    }
    +  }
    +  // $example off:create_ds$
    +
    +  public static void main(String[] args) {
    +    // $example on:init_session$
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    +    // $example off:init_session$
    +
    +    runBasicDataFrameExample(spark);
    +    runDatasetCreationExample(spark);
    +    runInferSchemaExample(spark);
    +    runProgrammaticSchemaExample(spark);
    +
    +    spark.stop();
    +  }
    +
    +  private static void runBasicDataFrameExample(SparkSession spark) {
    +    // $example on:create_df$
    +    Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
    +
    +    // Displays the content of the DataFrame to stdout
    +    df.show();
    +    // age  name
    +    // null Michael
    +    // 30   Andy
    +    // 19   Justin
    +    // $example off:create_df$
    +
    +    // $example on:untyped_ops$
    +    // Print the schema in a tree format
    +    df.printSchema();
    +    // root
    +    // |-- age: long (nullable = true)
    +    // |-- name: string (nullable = true)
    +
    +    // Select only the "name" column
    +    df.select("name").show();
    +    // name
    +    // Michael
    +    // Andy
    +    // Justin
    +
    +    // Select everybody, but increment the age by 1
    +    df.select(df.col("name"), df.col("age").plus(1)).show();
    --- End diff --
    
    Let's use `col("...")` instead of `df.col("...")` throughout all the Java examples. This requires
    
    ```java
    import static org.apache.spark.sql.functions.col;
    ```
    
    The reason is the same as why we prefer `$"..."` to `df("...")` in Scala code.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263511
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:programmatic_schema$
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.io.Serializable;
    +// $example off:create_ds$
    +
    +// $example on:schema_inferring$
    +// $example on:programmatic_schema$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.function.Function;
    +// $example off:programmatic_schema$
    +// $example on:create_ds$
    +import org.apache.spark.api.java.function.MapFunction;
    +// $example on:create_df$
    +// $example on:run_sql$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:programmatic_schema$
    +// $example off:create_df$
    +// $example off:run_sql$
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +// $example off:create_ds$
    +// $example off:schema_inferring$
    +import org.apache.spark.sql.RowFactory;
    +// $example on:init_session$
    +import org.apache.spark.sql.SparkSession;
    +// $example off:init_session$
    +// $example on:programmatic_schema$
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +// $example off:programmatic_schema$
    +
    +public class JavaSparkSqlExample {
    +  // $example on:create_ds$
    +  public static class Person implements Serializable {
    +    private String name;
    +    private int age;
    +
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public int getAge() {
    +      return age;
    +    }
    +
    +    public void setAge(int age) {
    +      this.age = age;
    +    }
    +  }
    +  // $example off:create_ds$
    +
    +  public static void main(String[] args) {
    +    // $example on:init_session$
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    --- End diff --
    
    Nit: Use 2-space indentation here.




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Can we add actual stdout output after each `.show()` call?
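    For example, the `create_df` snippet could carry the output of `df.show()` for people.json as comments (the table borders and widths below are what `show()` is expected to print for this file, so treat them as illustrative):

    ```scala
    val df = spark.read.json("examples/src/main/resources/people.json")
    df.show()
    // +----+-------+
    // | age|   name|
    // +----+-------+
    // |null|Michael|
    // |  30|   Andy|
    // |  19| Justin|
    // +----+-------+
    ```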




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70173131
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql
    +
    +import org.apache.spark.sql.SparkSession
    +
    +object SqlDataSourceExample {
    +
    +  case class Person(name: String, age: Long)
    +
    +  def main(args: Array[String]) {
    +    val spark = SparkSession
    +        .builder()
    +        .appName("Spark SQL Data Sources Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate()
    +
    +    runBasicDataSourceExample(spark)
    +    runBasicParquetExample(spark)
    +    runParquetSchemaMergingExample(spark)
    +    runJsonDatasetExample(spark)
    +
    +    spark.stop()
    +  }
    +
    +  private def runBasicDataSourceExample(spark: SparkSession): Unit = {
    +    // $example on:generic_load_save_functions$
    +    val usersDF = spark.read.load("examples/src/main/resources/users.parquet")
    +    usersDF.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
    +    // $example off:generic_load_save_functions$
    +    // $example on:manual_load_options$
    +    val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
    +    peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
    +    // $example off:manual_load_options$
    +    // $example on:direct_sql$
    +    val sqlDF = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
    --- End diff --
    
    Here the line length slightly exceeds the limit so that the documentation renders better.




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70263657
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java ---
    @@ -0,0 +1,192 @@
    +package org.apache.spark.examples.sql;
    +
    +// $example on:schema_merging$
    +import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +// $example off:schema_merging$
    +
    +// $example on:basic_parquet_example$
    +import org.apache.spark.api.java.function.MapFunction;
    +import org.apache.spark.sql.Encoders;
    +// $example on:schema_merging$
    +// $example on:json_dataset$
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off:json_dataset$
    +// $example off:schema_merging$
    +// $example off:basic_parquet_example$
    +import org.apache.spark.sql.SparkSession;
    +
    +public class JavaSqlDataSourceExample {
    +
    +  // $example on:schema_merging$
    +  public static class Square implements Serializable {
    +    private int value;
    +    private int square;
    +
    +    // Getters and setters...
    +    // $example off:schema_merging$
    +    public int getValue() {
    +      return value;
    +    }
    +
    +    public void setValue(int value) {
    +      this.value = value;
    +    }
    +
    +    public int getSquare() {
    +      return square;
    +    }
    +
    +    public void setSquare(int square) {
    +      this.square = square;
    +    }
    +    // $example on:schema_merging$
    +  }
    +  // $example off:schema_merging$
    +
    +  // $example on:schema_merging$
    +  public static class Cube implements Serializable {
    +    private int value;
    +    private int cube;
    +
    +    // Getters and setters...
    +    // $example off:schema_merging$
    +    public int getValue() {
    +      return value;
    +    }
    +
    +    public void setValue(int value) {
    +      this.value = value;
    +    }
    +
    +    public int getCube() {
    +      return cube;
    +    }
    +
    +    public void setCube(int cube) {
    +      this.cube = cube;
    +    }
    +    // $example on:schema_merging$
    +  }
    +  // $example off:schema_merging$
    +
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +        .builder()
    +        .appName("Java Spark SQL Data Sources Example")
    +        .config("spark.some.config.option", "some-value")
    +        .getOrCreate();
    --- End diff --
    
    Nit: Use 2-space indentation here.
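    For reference, a 2-space version of the chain would look roughly like this (sketch only):

        SparkSession spark = SparkSession
          .builder()
          .appName("Java Spark SQL Data Sources Example")
          .config("spark.some.config.option", "some-value")
          .getOrCreate();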




[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14119#discussion_r70173035
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1380,17 +949,17 @@ metadata.
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// spark is an existing HiveContext
    -spark.refreshTable("my_table")
    +// spark is an existing SparkSession
    +spark.catalog.refreshTable("my_table")
    --- End diff --
    
    Is this the correct way to refresh the table?
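    For context, here is a minimal sketch of what I would expect to work with the new API (assuming a previously cached table named `my_table`):

        // spark is an existing SparkSession
        spark.catalog.refreshTable("my_table")
        // alternatively, via plain SQL:
        spark.sql("REFRESH TABLE my_table")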




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62192/
    Test PASSed.




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    test this please




[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14119
  
    add to whitelist

