Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2015/12/01 07:33:34 UTC

[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/10060

    [WIP][SPARK-12069][SQL] Update documentation with Datasets

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10060
    
----
commit 649541c805a2ae69c53eb85ec9ee05dfbf8abf65
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-12-01T06:25:01Z

    docs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163001886
  
    **[Test build #47356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47356/consoleFull)** for PR 10060 at commit [`3ff7a46`](https://github.com/apache/spark/commit/3ff7a463fdfc0e3e96aaf4546c5b57e538538d19).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-160891901
  
    **[Test build #46946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46946/consoleFull)** for PR 10060 at commit [`649541c`](https://github.com/apache/spark/commit/649541c805a2ae69c53eb85ec9ee05dfbf8abf65).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12069][SQL] Update documentation with D...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10060


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-161766475
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-161766236
  
    **[Test build #47151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47151/consoleFull)** for PR 10060 at commit [`3e53a4c`](https://github.com/apache/spark/commit/3e53a4ceb67f3b48c17c676843085cd431ac34c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46363607
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -19,6 +19,9 @@ package org.apache.spark.sql
     
     import java.lang.reflect.Modifier
     
    +import org.apache.spark.annotation.Experimental
    --- End diff --
    
    import order


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163031777
  
    Thanks for the comments!


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163002009
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47356/
    Test FAILed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47006878
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -9,18 +9,51 @@ title: Spark SQL and DataFrames
     
     # Overview
     
    -Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
    +Spark SQL is a Spark module for structured data processing.  Unlike the basic Spark RDD API, the interfaces provided
    --- End diff --
    
    nit: 2 whitespaces


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47000161
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
    @@ -73,7 +73,25 @@ class TypedColumn[-T, U](
     
     /**
      * :: Experimental ::
    - * A column in a [[DataFrame]].
    + * A column that will be computed based on the data in a [[DataFrame]].
    + *
    + * A new column is constructed based on the input columns present in a dataframe:
    --- End diff --
    
    ah, good idea.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163007946
  
    I made a few comments, but otherwise it's clear.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46624363
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -9,18 +9,51 @@ title: Spark SQL and DataFrames
     
     # Overview
     
    -Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
    +Spark SQL is a Spark module for structured data processing.  Unlike the basic Spark RDD API, the interfaces provided
    +by Spark SQL provide Spark with more about the structure of both the data and the computation being performed.  Internally,
    +Spark SQL uses this extra information to perform extra optimizations.  There are several ways to
    +interact with Spark SQL including SQL, the DataFrames API and the Datasets API.  When computing a result
    +the same execution engine is used, independent of which API/language you are using to express the
    +computation.  This unification means that developers can easily switch back and forth between the
    +various APIs based on which provides the most natural way to express a given transformation.
     
    -Spark SQL can also be used to read data from an existing Hive installation.  For more on how to configure this feature, please refer to the [Hive Tables](#hive-tables) section.
    +All of the examples on this page use sample data included in the Spark distribution and can be run in
    +the `spark-shell`, `pyspark` shell, or `sparkR` shell.
     
    -# DataFrames
    +## SQL
     
    -A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs.
    +One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL.
    +Spark SQL can also be used to read data from an existing Hive installation.  For more on how to
    +configure this feature, please refer to the [Hive Tables](#hive-tables) section.  When running
    +SQL from within another programming language the results will be returned as a [DataFrame](#DataFrames).
    +You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli)
    +or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
     
    -The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame), [Java](api/java/index.html?org/apache/spark/sql/DataFrame.html), [Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
    +## DataFrames
     
    -All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`, `pyspark` shell, or `sparkR` shell.
    +A DataFrame is a distributed collection of data organized into named columns. It is conceptually
    +equivalent to a table in a relational database or a data frame in R/Python, but with richer
    +optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such
    +as: structured data files, tables in Hive, external databases, or existing RDDs.
     
    +The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
    +[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
    +[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
    +
    +## Datasets
    +
    +A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of
    +RDDs (strong typing, ability to use powerful lambda functions) with the benifits of Spark SQL's
    --- End diff --
    
    benifits -> benefits 
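
A minimal Scala sketch of the functional-transformation style the quoted overview
describes, runnable in a 1.6 `spark-shell` (assumes a `SQLContext` named
`sqlContext`; `Person` is the example class used elsewhere in the guide):

    import sqlContext.implicits._

    case class Person(name: String, age: Long)
    val people = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()

    // Typed lambdas are checked at compile time, unlike string-based
    // DataFrame expressions
    val names = people.filter(_.age > 20).map(_.name)
    names.collect() // Array("Andy")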


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163059528
  
    **[Test build #47366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47366/consoleFull)** for PR 10060 at commit [`4b51ad7`](https://github.com/apache/spark/commit/4b51ad781da69c73b12e92673ac0f4cf57bea370).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47007033
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -9,18 +9,51 @@ title: Spark SQL and DataFrames
     
     # Overview
     
    -Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
    +Spark SQL is a Spark module for structured data processing.  Unlike the basic Spark RDD API, the interfaces provided
    +by Spark SQL provide Spark with more about the structure of both the data and the computation being performed.  Internally,
    --- End diff --
    
    Is there a word missing between "more" and "about", like "information"?


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46283588
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -26,13 +29,51 @@ import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundRe
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically though implicits from a `SQLContext`.
    --- End diff --
    
    I might be mistaken but I think you meant to write "through" and not "though".


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47020550
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -19,20 +19,60 @@ package org.apache.spark.sql
     
     import java.lang.reflect.Modifier
     
    +import scala.annotation.implicitNotFound
     import scala.reflect.{ClassTag, classTag}
     
    +import org.apache.spark.annotation.Experimental
     import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, encoderFor}
     import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundReference, EncodeUsingSerializer}
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically through implicits from a `SQLContext`.
    + *
    + * {{{
    + *   import sqlContext.implicits._
    + *
    + *   val ds = Seq(1, 2, 3).toDS() // implicitly provided (sqlContext.implicits.newIntEncoder)
    + * }}}
    + *
    + * == Java ==
    + * Encoders are specified by calling static methods on [[Encoders]].
    + *
    + * {{{
    + *   List<String> data = Arrays.asList("abc", "abc", "xyz");
    + *   Dataset<String> ds = context.createDataset(data, Encoders.STRING());
    + * }}}
    + *
    + * Encoders can be composed into tuples:
    + *
    + * {{{
    + *   Encoder<Tuple2<Integer, String>> encoder2 = Encoders.tuple(Encoders.INT(), Encoders.STRING());
    + *   List<Tuple2<Integer, String>> data2 = Arrays.asList(new scala.Tuple2(1, "a");
    + *   Dataset<Tuple2<Integer, String>> ds2 = context.createDataset(data2, encoder2);
    + * }}}
    + *
    + * Or constructed from Java Beans:
    + *
    + * {{{
    + *   Encoders.bean(MyClass.class);
    + * }}}
    + *
    + * == Implementation ==
    + *  - Encoders are not intended to be thread-safe and thus they are allowed to avoid internal
    --- End diff --
    
    updated, is that better?
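
The quoted Java tuple example is missing a closing parenthesis after
`new scala.Tuple2(1, "a")`. For comparison, the Scala analogue needs no explicit
composition, since the `SQLContext` implicits derive tuple encoders (a sketch,
assuming `sqlContext.implicits._` is in scope):

    import sqlContext.implicits._

    // Encoder[(Int, String)] is derived implicitly from the element encoders
    val ds2 = Seq((1, "a"), (2, "b")).toDS()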


[GitHub] spark pull request: [SPARK-12069][SQL] Update documentation with D...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47056802
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -19,20 +19,60 @@ package org.apache.spark.sql
     
     import java.lang.reflect.Modifier
     
    +import scala.annotation.implicitNotFound
     import scala.reflect.{ClassTag, classTag}
     
    +import org.apache.spark.annotation.Experimental
     import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, encoderFor}
     import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundReference, EncodeUsingSerializer}
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically through implicits from a `SQLContext`.
    + *
    + * {{{
    + *   import sqlContext.implicits._
    + *
    + *   val ds = Seq(1, 2, 3).toDS() // implicitly provided (sqlContext.implicits.newIntEncoder)
    + * }}}
    + *
    + * == Java ==
    + * Encoders are specified by calling static methods on [[Encoders]].
    + *
    + * {{{
    + *   List<String> data = Arrays.asList("abc", "abc", "xyz");
    + *   Dataset<String> ds = context.createDataset(data, Encoders.STRING());
    + * }}}
    + *
    + * Encoders can be composed into tuples:
    + *
    + * {{{
    + *   Encoder<Tuple2<Integer, String>> encoder2 = Encoders.tuple(Encoders.INT(), Encoders.STRING());
    + *   List<Tuple2<Integer, String>> data2 = Arrays.asList(new scala.Tuple2(1, "a");
    + *   Dataset<Tuple2<Integer, String>> ds2 = context.createDataset(data2, encoder2);
    + * }}}
    + *
    + * Or constructed from Java Beans:
    + *
    + * {{{
    + *   Encoders.bean(MyClass.class);
    + * }}}
    + *
    + * == Implementation ==
    + *  - Encoders are not intended to be thread-safe and thus they are allowed to avoid internal
    --- End diff --
    
    It's way clearer, yup.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46624388
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -9,18 +9,51 @@ title: Spark SQL and DataFrames
     
     # Overview
     
    -Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
    +Spark SQL is a Spark module for structured data processing.  Unlike the basic Spark RDD API, the interfaces provided
    +by Spark SQL provide Spark with more about the structure of both the data and the computation being performed.  Internally,
    +Spark SQL uses this extra information to perform extra optimizations.  There are several ways to
    +interact with Spark SQL including SQL, the DataFrames API and the Datasets API.  When computing a result
    +the same execution engine is used, independent of which API/language you are using to express the
    +computation.  This unification means that developers can easily switch back and forth between the
    +various APIs based on which provides the most natural way to express a given transformation.
     
    -Spark SQL can also be used to read data from an existing Hive installation.  For more on how to configure this feature, please refer to the [Hive Tables](#hive-tables) section.
    +All of the examples on this page use sample data included in the Spark distribution and can be run in
    +the `spark-shell`, `pyspark` shell, or `sparkR` shell.
     
    -# DataFrames
    +## SQL
     
    -A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs.
    +One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL.
    +Spark SQL can also be used to read data from an existing Hive installation.  For more on how to
    +configure this feature, please refer to the [Hive Tables](#hive-tables) section.  When running
    +SQL from within another programming language the results will be returned as a [DataFrame](#DataFrames).
    +You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli)
    +or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
     
    -The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame), [Java](api/java/index.html?org/apache/spark/sql/DataFrame.html), [Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
    +## DataFrames
     
    -All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`, `pyspark` shell, or `sparkR` shell.
    +A DataFrame is a distributed collection of data organized into named columns. It is conceptually
    +equivalent to a table in a relational database or a data frame in R/Python, but with richer
    +optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such
    +as: structured data files, tables in Hive, external databases, or existing RDDs.
     
    +The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
    +[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
    +[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
    +
    +## Datasets
    +
    +A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of
    +RDDs (strong typing, ability to use powerful lambda functions) with the benifits of Spark SQL's
    +optimized execution engine.  A Dataset can be [constructed](#creating-datasets) from JVM objects and then manipulated
    +using functional transformations (map, flatMap, filter, etc.).
    +
    +The unified Dataset API can be used both in [Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
    +[Java](api/java/index.html?org/apache/spark/sql/Dataset.html).  Python does not yet have support for
    +the Dataset API, but due to its dynamic nature many of the benifits are already available (i.e. you can
    --- End diff --
    
    benifits -> benefits 
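
As a sketch of the point that SQL run from another language comes back as a
DataFrame (assumes a `SQLContext` named `sqlContext` and a temporary table
`people` registered beforehand; both names are illustrative):

    // SQL issued from Scala; the result is an ordinary DataFrame,
    // executed by the same engine as the DataFrame and Dataset APIs
    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
    teenagers.show()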


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163033295
  
    **[Test build #47366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47366/consoleFull)** for PR 10060 at commit [`4b51ad7`](https://github.com/apache/spark/commit/4b51ad781da69c73b12e92673ac0f4cf57bea370).


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-161766478
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47151/
    Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163059722
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47366/
    Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163002005
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-161739656
  
    **[Test build #47151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47151/consoleFull)** for PR 10060 at commit [`3e53a4c`](https://github.com/apache/spark/commit/3e53a4ceb67f3b48c17c676843085cd431ac34c0).


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47007923
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -428,6 +461,45 @@ df <- sql(sqlContext, "SELECT * FROM table")
     </div>
     
     
    +## Creating Datasets
    +
    +Datasets are similar to RDDs, however, instead of using Java Serialization or Kryo they use
    +a specialized [Encoder](api/scala/index.html#org.apache.spark.sql.Encoder) to serialize the objects
    +for processing or transmitting over the network. While both encoders and standard serialization are
    +responsible for during an object into bytes, encoders are code generated dynamically and use a format
    +that allows Spark to perform many operations like filtering, sorting and hashing without deserialzing
    +the back into an object.
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    --- End diff --
    
    nit: 2 whitespaces


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47007845
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -428,6 +461,45 @@ df <- sql(sqlContext, "SELECT * FROM table")
     </div>
     
     
    +## Creating Datasets
    +
    +Datasets are similar to RDDs, however, instead of using Java Serialization or Kryo they use
    +a specialized [Encoder](api/scala/index.html#org.apache.spark.sql.Encoder) to serialize the objects
    +for processing or transmitting over the network. While both encoders and standard serialization are
    +responsible for during an object into bytes, encoders are code generated dynamically and use a format
    +that allows Spark to perform many operations like filtering, sorting and hashing without deserialzing
    +the back into an object.
    --- End diff --
    
    the **bytes** back into an object?


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47007753
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -428,6 +461,45 @@ df <- sql(sqlContext, "SELECT * FROM table")
     </div>
     
     
    +## Creating Datasets
    +
    +Datasets are similar to RDDs, however, instead of using Java Serialization or Kryo they use
    +a specialized [Encoder](api/scala/index.html#org.apache.spark.sql.Encoder) to serialize the objects
    +for processing or transmitting over the network. While both encoders and standard serialization are
    +responsible for during an object into bytes, encoders are code generated dynamically and use a format
    --- End diff --
    
    during -> turning?
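
Wording aside, a sketch of the contrast this paragraph is driving at (assumes
Spark 1.6, where `Encoders.kryo` is the generic serialization fallback):

    import org.apache.spark.sql.Encoders
    import sqlContext.implicits._

    // Code-generated encoder: Spark keeps a format it can filter,
    // sort, and hash without deserializing
    val fast = Seq(1, 2, 3).toDS()

    // Kryo fallback: each row becomes an opaque byte blob, so those same
    // operations must first deserialize the object
    val opaque = sqlContext.createDataset(Seq("a", "b"))(Encoders.kryo[String])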


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-160892064
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46946/
    Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46244815
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -26,13 +29,51 @@ import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundRe
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically though implicits from a `SQLContext`.
    + *
    + * {{{
    + *   import sqlContext.implicits._
    + *
    + *   val ds = Seq(1, 2, 3).toDS() // implicitly provided (sqlContext.implicits.newIntEncoder)
    + * }}}
    + *
    + * == Java ==
    + * Encoders are specified by calling static methods on [[Encoders]].
    + *
    + * {{{
    + *   List<String> data = Arrays.asList("abc", "abc", "xyz");
    + *   Dataset<String> ds = context.createDataset(data, Encoders.STRING());
    + * }}}
    + *
    + * Encoders can be composed into tuples:
    + *
    + * {{{
    + *   Encoder<Tuple2<Integer, String>> encoder2 = Encoders.tuple(Encoders.INT(), Encoders.STRING());
    + *   List<Tuple2<Integer, String>> data2 = Arrays.asList(new scala.Tuple2(1, "a");
    + *   Dataset<Tuple2<Integer, String>> ds2 = context.createDataset(data2, encoder2);
    + * }}}
    + *
    + * Or constructed from Java Beans:
    + *
    + * {{{
    + *   Encoders.bean(MyClass.class);
    + * }}}
    + *
    + * == Implementation ==
    + *  - Encoders are not intended to be thread-safe and thus they are allowed to avoid internal
    + *  locking and reuse internal buffers to improve performance.
      *
      * @since 1.6.0
      */
    +@Experimental
    +@implicitNotFound("Unable to find encoder for type stored in a Dataset.  Primitive types " +
    +  "(Int, String, etc) and Products (case classes) and primitive types are supported by " +
    +  "importing sqlContext.implicits._  Support for serializing other types will be added in future " +
    --- End diff --
    
    @marmbrus Primitive types are mentioned twice. Is that OK?
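
On the annotation itself, a sketch of what the message guards against (the
commented line is the failure case; `java.lang.Object` has no encoder):

    import sqlContext.implicits._

    Seq(1, 2, 3).toDS()       // Int is covered by the primitive encoders
    Seq("a", "b").toDS()      // so is String

    // A type with no encoder fails at compile time with the
    // @implicitNotFound text above:
    // Seq(new Object).toDS()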


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46364663
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
    @@ -73,7 +73,25 @@ class TypedColumn[-T, U](
     
     /**
      * :: Experimental ::
    - * A column in a [[DataFrame]].
    + * A column that will be computed based on the data in a [[DataFrame]].
    + *
    + * A new column is constructed based on the input columns present in a dataframe:
    --- End diff --
    
    also mention literal here
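
A sketch of how that might read with the literal included (assumes a DataFrame
`df` with `name` and `age` columns; the names are illustrative):

    import org.apache.spark.sql.functions.{col, lit}

    df("age")           // a column resolved against an existing DataFrame
    col("age") + 1      // an unresolved column combined with arithmetic
    lit(21)             // a literal value lifted into a Column
    df.select(df("name"), df("age") >= lit(21)).show()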


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47008598
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -19,20 +19,60 @@ package org.apache.spark.sql
     
     import java.lang.reflect.Modifier
     
    +import scala.annotation.implicitNotFound
     import scala.reflect.{ClassTag, classTag}
     
    +import org.apache.spark.annotation.Experimental
     import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, encoderFor}
     import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundReference, EncodeUsingSerializer}
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically through implicits from a `SQLContext`.
    + *
    + * {{{
    + *   import sqlContext.implicits._
    + *
    + *   val ds = Seq(1, 2, 3).toDS() // implicitly provided (sqlContext.implicits.newIntEncoder)
    + * }}}
    + *
    + * == Java ==
    + * Encoders are specified by calling static methods on [[Encoders]].
    + *
    + * {{{
    + *   List<String> data = Arrays.asList("abc", "abc", "xyz");
    + *   Dataset<String> ds = context.createDataset(data, Encoders.STRING());
    + * }}}
    + *
    + * Encoders can be composed into tuples:
    + *
    + * {{{
    + *   Encoder<Tuple2<Integer, String>> encoder2 = Encoders.tuple(Encoders.INT(), Encoders.STRING());
    + *   List<Tuple2<Integer, String>> data2 = Arrays.asList(new scala.Tuple2(1, "a");
    + *   Dataset<Tuple2<Integer, String>> ds2 = context.createDataset(data2, encoder2);
    + * }}}
    + *
    + * Or constructed from Java Beans:
    + *
    + * {{{
    + *   Encoders.bean(MyClass.class);
    + * }}}
    + *
    + * == Implementation ==
    + *  - Encoders are not intended to be thread-safe and thus they are allowed to avoid internal
    --- End diff --
    
    I'm not sure I understand this sentence: "allowed to avoid" is troubling me.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-162989557
  
    **[Test build #47356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47356/consoleFull)** for PR 10060 at commit [`3ff7a46`](https://github.com/apache/spark/commit/3ff7a463fdfc0e3e96aaf4546c5b57e538538d19).


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r47008015
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -428,6 +461,45 @@ df <- sql(sqlContext, "SELECT * FROM table")
     </div>
     
     
    +## Creating Datasets
    +
    +Datasets are similar to RDDs, however, instead of using Java Serialization or Kryo they use
    +a specialized [Encoder](api/scala/index.html#org.apache.spark.sql.Encoder) to serialize the objects
    +for processing or transmitting over the network. While both encoders and standard serialization are
    +responsible for during an object into bytes, encoders are code generated dynamically and use a format
    +that allows Spark to perform many operations like filtering, sorting and hashing without deserialzing
    +the back into an object.
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +
    +{% highlight scala %}
    +// Encoders for most common types are automatically provided by importing sqlContext.implicits._
    +val ds = Seq(1, 2, 3).toDS()
    +ds.map(_ + 1).collect() // Returns: Array(2, 3, 4)
    +
    +// Encoders are also created for case classes.
    +case class Person(name: String, age: Long)
    +val ds = Seq(Person("Andy", 32)).toDS()
    +
    +// DataFrames can be converted to a Dataset by providing a class.  Mapping will be done by name.
    --- End diff --
    
    2 whitespaces here too


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10060#discussion_r46363689
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala ---
    @@ -26,13 +29,51 @@ import org.apache.spark.sql.catalyst.expressions.{DecodeUsingSerializer, BoundRe
     import org.apache.spark.sql.types._
     
     /**
    + * :: Experimental ::
      * Used to convert a JVM object of type `T` to and from the internal Spark SQL representation.
      *
    - * Encoders are not intended to be thread-safe and thus they are allow to avoid internal locking
    - * and reuse internal buffers to improve performance.
    + * == Scala ==
    + * Encoders are generally created automatically though implicits from a `SQLContext`.
    --- End diff --
    
    It would also be great to expand this slightly and explain what can be inferred automatically right now.
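
A sketch of what the implicits can currently derive, per the quoted scaladoc
and the 1.6 programming guide (assumes `sqlContext.implicits._` is in scope):

    import sqlContext.implicits._

    Seq(1L, 2L).toDS()                 // primitives (Int, Long, String, ...)
    Seq(("a", 1), ("b", 2)).toDS()     // tuples of supported types
    case class Point(x: Double, y: Double)
    Seq(Point(0.0, 1.0)).toDS()        // Products, i.e. case classes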


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-160892063
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-163059720
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10060#issuecomment-160871534
  
    **[Test build #46946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46946/consoleFull)** for PR 10060 at commit [`649541c`](https://github.com/apache/spark/commit/649541c805a2ae69c53eb85ec9ee05dfbf8abf65).

