Posted to reviews@spark.apache.org by pjfanning <gi...@git.apache.org> on 2017/02/11 19:05:40 UTC

[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

GitHub user pjfanning opened a pull request:

    https://github.com/apache/spark/pull/16895

    [SPARK-15615][SQL] Add an API to load DataFrame from Dataset[String] storing JSON

    ## What changes were proposed in this pull request?
    
    SPARK-15615 proposes replacing `sqlContext.read.json(rdd)` with a Dataset-based equivalent.
    SPARK-15463 adds a CSV API for reading from a Dataset[String], so this change keeps the two APIs consistent.
    I am deprecating the existing RDD-based APIs.
    
    ## How was this patch tested?
    
    Covered by existing tests. I left most tests on the existing RDD-based APIs, since those now delegate to the new json API.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
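A minimal sketch of the Dataset-based reader API described in this pull request. It assumes Spark 2.2.0 or later, where both `json(Dataset[String])` and `csv(Dataset[String])` exist on `DataFrameReader`, and a caller-supplied `SparkSession`. The object and method names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DatasetReaderSketch {
  // JSON: each element of the Dataset[String] holds one JSON object
  // (the json(Dataset[String]) overload proposed here; assumes Spark >= 2.2.0).
  def parseJson(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._ // supplies Encoder[String] and Seq#toDS()
    spark.read.json(lines.toDS())
  }

  // CSV: the matching csv(Dataset[String]) overload from SPARK-15463.
  def parseCsv(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.option("header", "true").csv(lines.toDS())
  }
}
```

Because the deprecated `json(RDD[String])` overload delegates to the Dataset version in this patch, converting an existing RDD with `toDS()` and calling `parseJson`-style code should give the same result.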


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pjfanning/spark SPARK-15615

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16895
    
----
commit 3c477c1ad95569a3b58d46cb39df516e284bd706
Author: pj.fanning <pj...@workday.com>
Date:   2017-02-11T19:00:54Z

    [SPARK-15615][SQL] Add an API to load DataFrame from Dataset[String] storing JSON, deprecating existing RDD APIs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73086/testReport)** for PR 16895 at commit [`82561c0`](https://github.com/apache/spark/commit/82561c072c668e062e9d854074bb3ef50320dd5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class LSHParams(Params):`
      * `class LSHModel(JavaModel):`
      * `class BucketedRandomProjectionLSH(JavaEstimator, LSHParams, HasInputCol, HasOutputCol, HasSeed,`
      * `class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable):`
      * `class MinHashLSH(JavaEstimator, LSHParams, HasInputCol, HasOutputCol, HasSeed,`
      * `class MinHashLSHModel(LSHModel, JavaMLReadable, JavaMLWritable):`
      * `case class StreamingExplainCommand(`
      * `case class SaveIntoDataSourceCommand(`
      * `abstract class JsonDataSource[T] extends Serializable `


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73086/


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73087/testReport)** for PR 16895 at commit [`cdf53bf`](https://github.com/apache/spark/commit/cdf53bf517a2f9dd6bbe347455cfb1be1f15ca45).


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72749/testReport)** for PR 16895 at commit [`bb304de`](https://github.com/apache/spark/commit/bb304de56e43c8e9e49dd77ca45d883b8c907fc3).


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r101591685
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1364,10 +1364,11 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         })
       }
     
    -  test("SPARK-6245 JsonRDD.inferSchema on empty RDD") {
    +  test("SPARK-6245 JsonRDD.inferSchema on empty Dataset") {
         // This is really a test that it doesn't throw an exception
    +    val emptyDataset = spark.createDataset(empty)(Encoders.STRING)
    --- End diff --
    
    I think the implicits are already imported at the beginning of this test suite
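A standalone sketch of the empty-input case from this test. It assumes a plain `SparkSession` rather than the suite's shared test fixtures, and `EmptyJsonSketch` is a hypothetical name. With the session's implicits in scope, `empty.toDS()` compiles without passing `Encoders.STRING` explicitly:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyJsonSketch {
  // Mirrors the SPARK-6245 test: schema inference over an empty input must not throw.
  def readEmpty(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables RDD[String]#toDS() via an implicit conversion
    val empty = spark.sparkContext.parallelize(Seq.empty[String])
    spark.read.json(empty.toDS())
  }
}
```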


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    LGTM. I'll merge it in 1 or 2 days if no one objects to this API change.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Can one of the admins verify this patch?


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73086/testReport)** for PR 16895 at commit [`82561c0`](https://github.com/apache/spark/commit/82561c072c668e062e9d854074bb3ef50320dd5c).


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16895


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72747/testReport)** for PR 16895 at commit [`3c477c1`](https://github.com/apache/spark/commit/3c477c1ad95569a3b58d46cb39df516e284bd706).


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72747/


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r101454773
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1364,10 +1364,11 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         })
       }
     
    -  test("SPARK-6245 JsonRDD.inferSchema on empty RDD") {
    +  test("SPARK-6245 JsonRDD.inferSchema on empty Dataset") {
         // This is really a test that it doesn't throw an exception
    +    val emptyDataset = spark.createDataset(empty)(Encoders.STRING)
    --- End diff --
    
    doesn't `empty.toDS` work?


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73015/testReport)** for PR 16895 at commit [`731951a`](https://github.com/apache/spark/commit/731951aa779334bae9250f6b969cc2e99f41896b).


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r100681568
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala ---
    @@ -231,4 +231,10 @@ private[json] trait TestJsonData {
       lazy val singleRow: RDD[String] = spark.sparkContext.parallelize("""{"a":123}""" :: Nil)
     
       def empty: RDD[String] = spark.sparkContext.parallelize(Seq[String]())
    +  
    +  def dataset(rdd: RDD[String]): Dataset[String] = {
    +    val sqlContext = spark.sqlContext
    +    import sqlContext.implicits._
    +    spark.createDataset(rdd)
    --- End diff --
    
    same here


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by pjfanning <gi...@git.apache.org>.
Github user pjfanning commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r101456955
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1364,10 +1364,11 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         })
       }
     
    -  test("SPARK-6245 JsonRDD.inferSchema on empty RDD") {
    +  test("SPARK-6245 JsonRDD.inferSchema on empty Dataset") {
         // This is really a test that it doesn't throw an exception
    +    val emptyDataset = spark.createDataset(empty)(Encoders.STRING)
    --- End diff --
    
    I can double-check, but the `toDS` call appears to require the Spark implicits import.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72960/testReport)** for PR 16895 at commit [`580b4e4`](https://github.com/apache/spark/commit/580b4e473b8df98cf763975fc5cdd1dd229163c0).


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73015/testReport)** for PR 16895 at commit [`731951a`](https://github.com/apache/spark/commit/731951aa779334bae9250f6b969cc2e99f41896b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73015/


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73087/


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r100681578
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1364,9 +1364,9 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         })
       }
     
    -  test("SPARK-6245 JsonRDD.inferSchema on empty RDD") {
    +  test("SPARK-6245 JsonRDD.inferSchema on empty Dataset") {
         // This is really a test that it doesn't throw an exception
    -    val emptySchema = JsonInferSchema.infer(empty, "", new JSONOptions(Map.empty[String, String]))
    +    val emptySchema = JsonInferSchema.infer(dataset(empty), "", new JSONOptions(Map.empty[String, String]))
    --- End diff --
    
    I think we can just write `empty.toDS`


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by pjfanning <gi...@git.apache.org>.
Github user pjfanning commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r101460138
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1364,10 +1364,11 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         })
       }
     
    -  test("SPARK-6245 JsonRDD.inferSchema on empty RDD") {
    +  test("SPARK-6245 JsonRDD.inferSchema on empty Dataset") {
         // This is really a test that it doesn't throw an exception
    +    val emptyDataset = spark.createDataset(empty)(Encoders.STRING)
    --- End diff --
    
    `RDD[_]` only gets a `toDS()` method when `SQLImplicits` applies an implicit conversion that wraps the RDD in a `DatasetHolder`, which requires this import:
    import sparkSession.sqlContext.implicits._
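To illustrate the mechanism described in this comment, here is a self-contained toy model in plain Scala with no Spark dependency. The `Fake*` types are hypothetical stand-ins, not Spark classes, and the real conversion in `SQLImplicits` additionally requires an implicit `Encoder[T]`, which this sketch omits:

```scala
import scala.language.implicitConversions

object ImplicitToDsModel {
  // Hypothetical stand-ins for RDD[T] and Dataset[T]; FakeRdd has no toDS() of its own.
  final case class FakeRdd[T](data: Seq[T])
  final case class FakeDataset[T](data: Seq[T])

  // Plays the role of DatasetHolder: the only type that actually defines toDS().
  final class FakeDatasetHolder[T](rdd: FakeRdd[T]) {
    def toDS(): FakeDataset[T] = FakeDataset(rdd.data)
  }

  // Plays the role of SQLImplicits: the conversion lives in a nested object,
  // so it only applies where `implicits._` has been imported.
  object implicits {
    implicit def rddToHolder[T](rdd: FakeRdd[T]): FakeDatasetHolder[T] =
      new FakeDatasetHolder(rdd)
  }
}
```

Without `import ImplicitToDsModel.implicits._` in scope, `FakeRdd(Seq("x")).toDS()` does not compile, matching the behavior described for `RDD[_]`.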



[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r100681550
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -328,18 +329,34 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @param jsonRDD input RDD with one JSON object per record
        * @since 1.4.0
        */
    +  @deprecated("Use json(Dataset[String]) instead.", "2.2.0")
       def json(jsonRDD: RDD[String]): DataFrame = {
    +    import sparkSession.sqlContext.implicits._
    +    json(sparkSession.createDataset(jsonRDD))
    --- End diff --
    
    nit `sparkSession.createDataset(jsonRDD)(Encoders.STRING)`
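A sketch of the reviewer's suggestion, written as a standalone helper because `DataFrameReader`'s constructor is `private[sql]`. `DeprecatedJsonShim` and `jsonFromRdd` are hypothetical names. Passing `Encoders.STRING` explicitly means no implicits import is needed:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

object DeprecatedJsonShim {
  // Mirrors the patched json(RDD[String]) overload: wrap the RDD in a
  // Dataset[String], then delegate to the json(Dataset[String]) overload.
  def jsonFromRdd(spark: SparkSession, jsonRDD: RDD[String]): DataFrame = {
    // The explicit encoder replaces `import sparkSession.sqlContext.implicits._`.
    val ds: Dataset[String] = spark.createDataset(jsonRDD)(Encoders.STRING)
    spark.read.json(ds)
  }
}
```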


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    ok to test


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #16895: [SPARK-15615][SQL] Add an API to load DataFrame f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16895#discussion_r100681579
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala ---
    @@ -231,4 +231,10 @@ private[json] trait TestJsonData {
       lazy val singleRow: RDD[String] = spark.sparkContext.parallelize("""{"a":123}""" :: Nil)
     
       def empty: RDD[String] = spark.sparkContext.parallelize(Seq[String]())
    +  
    +  def dataset(rdd: RDD[String]): Dataset[String] = {
    --- End diff --
    
    actually we don't need this, see https://github.com/apache/spark/pull/16895/files#r100681578
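
One way the `dataset(rdd)` test helper can be avoided is to convert the `RDD[String]` fixture inline. This is a sketch only. It assumes Spark 2.2 or later, where the `Dataset[String]` overload of `spark.read.json` is available, and a local `SparkSession`. It does not reproduce the linked review comment, and the object and method names are hypothetical. `toDS()` comes from `spark.implicits._`, which also supplies the `String` encoder.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names; a sketch, not the code merged in this PR.
object JsonFromDatasetSketch {

  // Reads the same one-row fixture as TestJsonData.singleRow through the
  // Dataset[String] overload of DataFrameReader.json (Spark 2.2+).
  def singleRowDf(spark: SparkSession): DataFrame = {
    import spark.implicits._ // supplies the String encoder and rdd.toDS()

    val singleRow: RDD[String] =
      spark.sparkContext.parallelize("""{"a":123}""" :: Nil)

    // Inline RDD[String] -> Dataset[String] conversion; no helper method needed.
    spark.read.json(singleRow.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("json-from-dataset-sketch")
      .getOrCreate()
    try singleRowDf(spark).show()
    finally spark.stop()
  }
}
```

`spark.createDataset(singleRow)` is an equivalent spelling that relies on the same implicit `String` encoder.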


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #73087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73087/testReport)** for PR 16895 at commit [`cdf53bf`](https://github.com/apache/spark/commit/cdf53bf517a2f9dd6bbe347455cfb1be1f15ca45).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72747/testReport)** for PR 16895 at commit [`3c477c1`](https://github.com/apache/spark/commit/3c477c1ad95569a3b58d46cb39df516e284bd706).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72749/


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72960/


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    thanks, merging to master!


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72749/testReport)** for PR 16895 at commit [`bb304de`](https://github.com/apache/spark/commit/bb304de56e43c8e9e49dd77ca45d883b8c907fc3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16895: [SPARK-15615][SQL] Add an API to load DataFrame from Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16895
  
    **[Test build #72960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72960/testReport)** for PR 16895 at commit [`580b4e4`](https://github.com/apache/spark/commit/580b4e473b8df98cf763975fc5cdd1dd229163c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class TransportChannelHandler extends ChannelInboundHandlerAdapter `
      * `  class LinearSVCWrapperWriter(instance: LinearSVCWrapper) extends MLWriter `
      * `  class LinearSVCWrapperReader extends MLReader[LinearSVCWrapper] `
      * `class NoSuchDatabaseException(val db: String) extends AnalysisException(s"Database '$db' not found")`
      * `  class ResolveBroadcastHints(conf: CatalystConf) extends Rule[LogicalPlan] `
      * `case class JsonToStruct(`
      * `case class StructToJson(`
      * `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode `
      * `case class InnerOuterEstimation(conf: CatalystConf, join: Join) extends Logging `
      * `case class LeftSemiAntiEstimation(conf: CatalystConf, join: Join) `
      * `case class NumericRange(min: JDecimal, max: JDecimal) extends Range`
      * `class FileStreamOptions(parameters: CaseInsensitiveMap[String]) extends Logging `

