Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2016/06/17 03:39:28 UTC

[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/13727

    [SPARK-15982][SPARK-16009] Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc

    ## What changes were proposed in this pull request?
    
    Issues with the current reader behavior:
    - `text()` without args returns an empty DF with no columns -> inconsistent; it's expected that `text` always returns a DF with a `value` string field.
    - `textFile()` without args fails with an exception for the above reason: it expects the DF returned by `text()` to have a `value` field.
    - `orc()` does not take varargs, which is inconsistent with the other formats.
    - `json(single-arg)` was removed, but that caused source compatibility issues - SPARK-16009.
    
    The solution I am implementing is the following (see the sketch after this list):
    - For each format, there will be a single-argument method and a varargs method. For json, parquet, csv, and text, this means adding `json(String)`, etc. For orc, this means adding `orc(varargs)`.
    - Remove the special handling of `text()`, `csv()`, etc. that returns an empty DataFrame with no fields. Instead, pass the empty sequence of paths on to the data source and let each data source handle it correctly. For example, the text data source should return an empty DF with schema (`value: string`).
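
    A minimal sketch of the intended method shape (simplified; the real `DataFrameReader` methods carry the full option docs):

    ```scala
    // A dedicated single-argument overload forwards to the varargs one:
    // explicit single-argument calls keep compiling (SPARK-16009), while
    // the varargs method remains the single implementation.
    def json(path: String): DataFrame = json(Seq(path): _*)

    @scala.annotation.varargs
    def json(paths: String*): DataFrame = format("json").load(paths: _*)
    ```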
    
    ## How was this patch tested?
    Added new unit tests covering both the Scala and Java APIs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-15982

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13727
    
----
commit dcc4655225b27a4bc544ce38580949fb3fe60121
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-06-17T03:37:44Z

    Fixed and added tests

----




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @marmbrus @rxin




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67573034
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    Should we `checkAnswer` on these where possible? A lot of errors only manifest on `collect()`.
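
    For instance, a sketch of such a round-trip check (using the suite's `output` path and the `checkAnswer` helper from `QueryTest`):

    ```scala
    // Write, read back, and force execution: checkAnswer collects both
    // sides, so errors that only surface on collect() are caught here.
    val df = spark.range(10).toDF()
    df.write.mode("overwrite").parquet(output)
    checkAnswer(spark.read.parquet(output), df)
    ```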




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60847/
    Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68672691
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    It will also break users' code in an upgrade.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60679/consoleFull)** for PR 13727 at commit [`960048d`](https://github.com/apache/spark/commit/960048d846ba258aab3a35027e11c377dba0830d).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60850/consoleFull)** for PR 13727 at commit [`2539a94`](https://github.com/apache/spark/commit/2539a947d382d8d6c21c59fe8a9420a15aad9b9a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67582678
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -146,18 +140,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        */
       @scala.annotation.varargs
       def load(paths: String*): DataFrame = {
    -    if (paths.isEmpty) {
    --- End diff --
    
    In my PR, I will add test cases to verify all the possible inputs after this code change. Thanks!




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60679/consoleFull)** for PR 13727 at commit [`960048d`](https://github.com/apache/spark/commit/960048d846ba258aab3a35027e11c377dba0830d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60747/consoleFull)** for PR 13727 at commit [`29524b1`](https://github.com/apache/spark/commit/29524b1201fb3e028a2875397feba5c0e577365f).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68624316
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    With this change, `path` is no longer available in the options. This makes it hard (impossible?) for non-file-based DataSources (those not implementing FileFormat) to use load(...).

    For example, for elasticsearch we use:
    ```
    sqlContext.read.format("org.elasticsearch.spark.sql").load(resource)
    ```
    I do not think this can be implemented anymore?




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575723
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,45 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    json(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a JSON file (one object per line) and returns the result as a [[DataFrame]].
    +   *
    +   * This function goes through the input once to determine the input schema. If you know the
    +   * schema in advance, use the version that specifies the schema to avoid the extra scan.
    +   *
    +   * You can set the following JSON-specific options to deal with non-standard JSON files:
    +   * <li>`primitivesAsString` (default `false`): infers all primitive values as a string type</li>
    +   * <li>`prefersDecimal` (default `false`): infers all floating-point values as a decimal
    +   * type. If the values do not fit in decimal, then it infers them as doubles.</li>
    +   * <li>`allowComments` (default `false`): ignores Java/C++ style comment in JSON records</li>
    +   * <li>`allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names</li>
    +   * <li>`allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
    +   * </li>
    +   * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
    +   * (e.g. 00012)</li>
    +   * <li>`allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all
    +   * character using backslash quoting mechanism</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   *  <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts the
    --- End diff --
    
    It might be better to just use the indented `-` notation for lists.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68644592
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    ```Scala
    sqlContext.read.option("path", resource).format("org.elasticsearch.spark.sql").load()
    ```
    
    Can you try this?





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67578285
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    Yeah, only where it's easy.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60850/consoleFull)** for PR 13727 at commit [`2539a94`](https://github.com/apache/spark/commit/2539a947d382d8d6c21c59fe8a9420a15aad9b9a).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457293
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -119,13 +119,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(): DataFrame = {
    -    val dataSource =
    --- End diff --
    
    deduped.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575684
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,45 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    json(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a JSON file (one object per line) and returns the result as a [[DataFrame]].
    +   *
    +   * This function goes through the input once to determine the input schema. If you know the
    +   * schema in advance, use the version that specifies the schema to avoid the extra scan.
    +   *
    +   * You can set the following JSON-specific options to deal with non-standard JSON files:
    +   * <li>`primitivesAsString` (default `false`): infers all primitive values as a string type</li>
    +   * <li>`prefersDecimal` (default `false`): infers all floating-point values as a decimal
    +   * type. If the values do not fit in decimal, then it infers them as doubles.</li>
    +   * <li>`allowComments` (default `false`): ignores Java/C++ style comment in JSON records</li>
    +   * <li>`allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names</li>
    +   * <li>`allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
    +   * </li>
    +   * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
    +   * (e.g. 00012)</li>
    +   * <li>`allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all
    +   * character using backslash quoting mechanism</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   *  <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts the
    --- End diff --
    
    this does not indent correctly:
    
    ![screen shot 2016-06-17 at 2 07 18 pm](https://cloud.githubusercontent.com/assets/527/16164772/d64f8710-3494-11e6-8e90-29b4d75fe27b.png)





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572068
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -368,6 +397,63 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * <li>`maxCharsPerColumn` (default `1000000`): defines the maximum number of characters allowed
        * for any given value being read.</li>
        * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   * <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When
    +   * a schema is set by user, it sets `null` for extra fields.</li>
    +   * <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
    +   * <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
    +   * </ul>
    +   *
    +   * @since 2.0.0
    +   */
    +  def csv(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    csv(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a CSV file and returns the result as a [[DataFrame]].
    +   *
    +   * This function will go through the input once to determine the input schema if `inferSchema`
    +   * is enabled. To avoid going through the entire data once, disable `inferSchema` option or
    +   * specify the schema explicitly using [[schema]].
    +   *
    +   * You can set the following CSV-specific options to deal with CSV files:
    +   * <li>`sep` (default `,`): sets the single character as a separator for each
    +   * field and value.</li>
    +   * <li>`encoding` (default `UTF-8`): decodes the CSV files by the given encoding
    +   * type.</li>
    +   * <li>`quote` (default `"`): sets the single character used for escaping quoted values where
    +   * the separator can be part of the value. If you would like to turn off quotations, you need to
    +   * set not `null` but an empty string. This behaviour is different from
    +   * `com.databricks.spark.csv`.</li>
    +   * <li>`escape` (default `\`): sets the single character used for escaping quotes inside
    +   * an already quoted value.</li>
    +   * <li>`comment` (default empty string): sets the single character used for skipping lines
    +   * beginning with this character. By default, it is disabled.</li>
    +   * <li>`header` (default `false`): uses the first line as names of columns.</li>
    +   * <li>`inferSchema` (default `false`): infers the input schema automatically from data. It
    +   * requires one extra pass over the data.</li>
    +   * <li>`ignoreLeadingWhiteSpace` (default `false`): defines whether or not leading whitespaces
    +   * from values being read should be skipped.</li>
    +   * <li>`ignoreTrailingWhiteSpace` (default `false`): defines whether or not trailing
    +   * whitespaces from values being read should be skipped.</li>
    +   * <li>`nullValue` (default empty string): sets the string representation of a null value.</li>
    +   * <li>`nanValue` (default `NaN`): sets the string representation of a non-number value.</li>
    +   * <li>`positiveInf` (default `Inf`): sets the string representation of a positive infinity
    +   * value.</li>
    +   * <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
    +   * value.</li>
    +   * <li>`dateFormat` (default `null`): sets the string that indicates a date format. Custom date
    +   * formats follow the formats at `java.text.SimpleDateFormat`. This applies to both date type
    +   * and timestamp type. By default, it is `null` which means trying to parse times and date by
    +   * `java.sql.Timestamp.valueOf()` and `java.sql.Date.valueOf()`.</li>
    +   * <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
    +   * a record can have.</li>
    +   * <li>`maxCharsPerColumn` (default `1000000`): defines the maximum number of characters allowed
    +   * for any given value being read.</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    --- End diff --
    
    This can be addressed in a follow-up, but I don't think we should duplicate the docs because they are going to get out of sync. I'd have one canonical copy and have the other link to it.
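
    For instance, a sketch of the linking style (the scaladoc wording here is illustrative, not the final text):

    ```scala
    /**
     * Loads a CSV file and returns the result as a [[DataFrame]].
     * See the varargs overload of `csv()` for the full list of
     * CSV-specific options; they are documented only there.
     *
     * @since 2.0.0
     */
    def csv(path: String): DataFrame = csv(Seq(path): _*)
    ```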




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457308
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -146,18 +140,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        */
       @scala.annotation.varargs
       def load(paths: String*): DataFrame = {
    -    if (paths.isEmpty) {
    --- End diff --
    
    Removed the special handling of empty paths.
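
    Roughly the resulting shape (a sketch; the `DataSource` constructor arguments follow the surrounding class and may differ in detail):

    ```scala
    @scala.annotation.varargs
    def load(paths: String*): DataFrame = {
      // No paths.isEmpty special case: the (possibly empty) path sequence
      // goes straight to the data source, which decides what an empty read
      // means (e.g. the text source returns an empty DF with schema `value: string`).
      sparkSession.baseRelationToDataFrame(
        DataSource(
          sparkSession,
          paths = paths,
          userSpecifiedSchema = userSpecifiedSchema,
          className = source,
          options = extraOptions.toMap).resolveRelation())
    }
    ```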




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @tdas Sure, will do it soon. I might submit it after this is merged. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67460691
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    We should add an inline comment on why this exists; ditto for similar functions.
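
    For reference, the shape that addresses this, as it landed later in the PR (comment wording illustrative):

    ```scala
    def json(path: String): DataFrame = {
      // This overload exists so that explicit single-argument calls keep
      // resolving after the varargs change; see SPARK-16009.
      json(Seq(path): _*)
    }
    ```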





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68675928
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    Sure, will do it soon. Thanks!




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60846/consoleFull)** for PR 13727 at commit [`24174f0`](https://github.com/apache/spark/commit/24174f08587d0fa680c9050f5648a0b090507af6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60696/consoleFull)** for PR 13727 at commit [`bb52410`](https://github.com/apache/spark/commit/bb52410a3df6f16fd51534cad77fb9366c8d2712).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Tonight, I will submit a PR for the test cases. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13727




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457343
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    made this method depend on `json(varargs)` to prevent code duplication.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68645998
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    I believe that works as expected (I am running into some other issues now, but they seem unrelated).
    However, from a DSL perspective this is not very pretty.
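
    The two call styles at issue, side by side (paths and identifiers as in the earlier comments):

    ```scala
    // The old single-arg load() put the path into the options map, so this
    // worked for non-file sources such as elasticsearch:
    sqlContext.read.format("org.elasticsearch.spark.sql").load(resource)

    // After the change, the equivalent is to pass the path as an explicit option:
    sqlContext.read.option("path", resource).format("org.elasticsearch.spark.sql").load()
    ```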




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575918
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    --- End diff --
    
    What happens if you specify a schema here?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    LGTM. Merging to master and 2.0. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68674754
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    yea this is a bad breaking change.





[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60695/consoleFull)** for PR 13727 at commit [`3150b01`](https://github.com/apache/spark/commit/3150b013dd67e98dfa80ad50ee0fa7cbcc2a7486).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60695/consoleFull)** for PR 13727 at commit [`3150b01`](https://github.com/apache/spark/commit/3150b013dd67e98dfa80ad50ee0fa7cbcc2a7486).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987384
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    @tdas ORC behaves differently. When the user-specified schema does not match the physical schema, it simply stops and reports an exception. Do you think that behavior is better than returning `null` for all the rows?
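
The hunk above uses a `testRead` helper whose body is not shown; a plausible minimal shape, consistent with its call sites and assuming it lives in `DataFrameReaderWriterSuite` (so `spark`, `checkAnswer`, and `import testImplicits._` for the String encoder are in scope):

```scala
// Hypothetical reconstruction; the actual helper in the PR may differ.
private def testRead(
    df: => DataFrame,             // by-name: evaluated when the assertions run
    expectedResult: Seq[String],
    expectedSchema: StructType): Unit = {
  checkAnswer(df, spark.createDataset(expectedResult).toDF())
  assert(df.schema === expectedSchema)
}
```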




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60846/consoleFull)** for PR 13727 at commit [`24174f0`](https://github.com/apache/spark/commit/24174f08587d0fa680c9050f5648a0b090507af6).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @marmbrus I deduped the docs. Linking to a specific method via scaladoc was hard to get right and did not work in the Java docs, so I just wrote "See docs on the other overloaded method". I also fixed the doc formatting.
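
A sketch of the resulting doc pattern (wording illustrative, not the PR's exact text):

```scala
/**
 * Loads text files and returns a `DataFrame` whose schema starts with a
 * string column named "value".
 * See the documentation on the other overloaded `text()` method for more details.
 */
def text(path: String): DataFrame = {
  // Single-String overload kept for source compatibility (SPARK-16009);
  // it just delegates to the varargs overload.
  text(Seq(path): _*)
}
```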




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60695/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60846/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    A few comments. Overall LGTM.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @gatorsmile I updated my tests to cover this rigorously. However, since this PR is about testing the common behavior (e.g. whether all of the sources respect the user schema or not), I have not added tests for the case where there are no paths AND no user schema. That behavior is source-specific: `parquet/json/csv` should throw an error, whereas `text` should not. Could you make a PR that tests these in CSVSuite, etc., if they are not already tested?
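
A runnable sketch of the source-specific split described above, with the exception types taken from the assertions in this PR's suite rather than any broader guarantee:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object NoPathNoSchemaDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

    // text: no paths and no user schema is fine; the result is an empty
    // DataFrame that still carries the fixed (value: string) schema.
    val df = spark.read.text()
    println(df.schema.simpleString) // struct<value:string>
    println(df.count())             // 0

    // json/parquet: nothing to infer a schema from, so they fail in analysis.
    try spark.read.json() catch {
      case e: AnalysisException => println(s"json failed as expected: ${e.getMessage}")
    }
    try spark.read.parquet() catch {
      case e: AnalysisException => println(s"parquet failed as expected: ${e.getMessage}")
    }
    // csv throws IllegalArgumentException per the suite above.
    try spark.read.csv() catch {
      case e: IllegalArgumentException => println(s"csv failed as expected: ${e.getMessage}")
    }
    spark.stop()
  }
}
```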




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60747/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    The following test case is for SPARK-16007. I can confirm this PR already fixes the issue.
    ```scala
      test("schema checking") {
        val schema: StructType = new StructType().add("s", "string")
        assert(spark.read.schema(schema).csv().schema === schema)
        assert(spark.read.schema(schema).json().schema === schema)
        assert(spark.read.schema(schema).parquet().schema === schema)
        assert(spark.read.schema(schema).text().schema === schema)
        assert(spark.read.schema(schema).orc().schema === schema)
      }
    ```
    
    Since the ORC data source must be used with Hive support enabled, you can comment the last line out, or move it to another test case in a Hive suite (see the sketch below).
    
    Please let me know if anything is needed. Thanks!
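
For the last line, a sketch of how it could be hosted in a Hive-backed suite instead (placement and trait name are assumptions):

```scala
// Assumes a suite mixing in org.apache.spark.sql.hive.test.TestHiveSingleton,
// so the ORC data source has the Hive support it requires.
test("SPARK-16007: orc() with user specified schema and no paths") {
  val schema = new StructType().add("s", "string")
  assert(spark.read.schema(schema).orc().schema === schema)
}
```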




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60847/consoleFull)** for PR 13727 at commit [`3498bd0`](https://github.com/apache/spark/commit/3498bd06dda12f3bf8788b787195cd8293f6ebde).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572585
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    --- End diff --
    
    I'm not sure what it means to read text with a user-specified schema. In fact, this can't actually work if there is data and you call `collect()`.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987629
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    Let's document that as a test for now.





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572462
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    --- End diff --
    
    Nit: it's great to include a super short description with the JIRA.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60847/consoleFull)** for PR 13727 at commit [`3498bd0`](https://github.com/apache/spark/commit/3498bd06dda12f3bf8788b787195cd8293f6ebde).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60747/consoleFull)** for PR 13727 at commit [`29524b1`](https://github.com/apache/spark/commit/29524b1201fb3e028a2875397feba5c0e577365f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67576337
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    --- End diff --
    
    In that case, it's probably better to fail earlier; that is, TextFileFormat should fail if there is a userSpecifiedSchema.
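
A small sketch of the resulting user-facing behavior, matching the `intercept[AnalysisException]` assertions in the suite above (the input path is hypothetical):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.types.StructType

object TextFileSchemaGuardDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()
    val userSchema = new StructType().add("s", "string")

    // textFile rejects a user-specified schema up front instead of
    // failing later when the data is actually collected.
    try spark.read.schema(userSchema).textFile("/tmp/hypothetical-input") catch {
      case e: AnalysisException =>
        println(e.getMessage) // mentions "user specified schema not supported"
    }
    spark.stop()
  }
}
```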




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60679/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @gatorsmile Actually, I already added tests in this PR that cover the scenario where the schema is not provided. The only one that is not really tested is orc, as it cannot be run in DataFrameReaderWriterSuite. So could you add that test to the ORC-related test suites, if it is needed at all?




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572121
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java ---
    @@ -0,0 +1,158 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to You under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +*    http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +
    +package test.org.apache.spark.sql;
    +
    +import java.io.File;
    +import java.util.HashMap;
    +
    +import org.apache.spark.sql.SaveMode;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.test.TestSparkSession;
    +import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.util.Utils;
    +import org.junit.After;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +public class JavaDataFrameReaderWriterSuite {
    +  private SparkSession spark = new TestSparkSession();
    +  private StructType schema = new StructType().add("s", "string");
    +  private transient String input;
    +  private transient String output;
    +
    +  @Before
    +  public void setUp() {
    +    input = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "input").toString();
    +    File f = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "output");
    +    f.delete();
    +    output = f.toString();
    +  }
    +
    +  @After
    +  public void tearDown() {
    +    spark.stop();
    +    spark = null;
    +  }
    +
    +  @Test
    +  public void testFormatAPI() {
    +    spark
    +        .read()
    +        .format("org.apache.spark.sql.test")
    +        .load()
    +        .write()
    +        .format("org.apache.spark.sql.test")
    +        .save();
    +  }
    +
    +  @Test
    +  public void testOptionsAPI() {
    +    HashMap<String, String> map = new HashMap<String, String>();
    +    map.put("e", "1");
    +    spark
    +        .read()
    +        .option("a", "1")
    +        .option("b", 1)
    +        .option("c", 1.0)
    +        .option("d", true)
    +        .options(map)
    +        .text()
    +        .write()
    +        .option("a", "1")
    +        .option("b", 1)
    +        .option("c", 1.0)
    +        .option("d", true)
    +        .options(map)
    +        .format("org.apache.spark.sql.test")
    +        .save();
    +  }
    +
    +  @Test
    +  public void testSaveModeAPI() {
    +    spark
    +        .range(10)
    +        .write()
    +        .format("org.apache.spark.sql.test")
    +        .mode(SaveMode.ErrorIfExists)
    +        .save();
    +  }
    +
    +  @Test
    +  public void testLoadAPI() {
    +    spark.read().format("org.apache.spark.sql.test").load();
    +    spark.read().format("org.apache.spark.sql.test").load(input);
    +    spark.read().format("org.apache.spark.sql.test").load(input, input, input);
    +    spark.read().format("org.apache.spark.sql.test").load(new String[]{input, input});
    +  }
    +
    +  @Test
    +  public void testTextAPI() {
    +    spark.read().text();
    +    spark.read().text(input);
    +    spark.read().text(input, input, input);
    +    spark.read().text(new String[]{input, input})
    +        .write().text(output);
    +  }
    +
    +  @Test
    +  public void testTextFileAPI() {
    +    spark.read().textFile();     // Disabled because of SPARK-XXXXX
    --- End diff --
    
    SPARK-XXXX?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60683/
    Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68674971
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    Do you want me to fix it?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60696/consoleFull)** for PR 13727 at commit [`bb52410`](https://github.com/apache/spark/commit/bb52410a3df6f16fd51534cad77fb9366c8d2712).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572740
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    +  }
    +
    +  /**
    +   * This only tests whether API compiles, but does not run it as orc()
    +   * cannot be run with Hive classes.
    --- End diff --
    
    nit: `without`




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @tdas Sure, I have multiple related test cases that I wrote a few days ago; they might also be useful. You can judge whether they are needed or not. Let me merge your latest changes. : )




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60683/consoleFull)** for PR 13727 at commit [`3384473`](https://github.com/apache/spark/commit/3384473ec91d61405e2d6a51c94f46726bfcdbd9).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67576047
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    My goal in these tests was to test the API; individual suites like CsvSuite, ParquetSuite, etc. should take care of correctness. But I see your point, and I could add a few basic tests.
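    
    A sketch of what such a basic test could look like (assuming the suite's
    `spark`, `dir`, and `checkAnswer` helpers and `import testImplicits._`;
    purely illustrative, not the PR's code):
    
        import org.apache.spark.sql.SaveMode
        
        test("json - basic round trip") {
          val data = Seq("a", "b", "c")
          // Write the data out, read it back, and check it survives the trip.
          spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
          checkAnswer(spark.read.json(dir), spark.createDataset(data).toDF("str"))
        }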




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60683/consoleFull)** for PR 13727 at commit [`3384473`](https://github.com/apache/spark/commit/3384473ec91d61405e2d6a51c94f46726bfcdbd9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67463353
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    Yes, agreed. Will add these as inline comments, not as Scala docs.
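    
    A minimal, self-contained sketch of the compatibility issue (SPARK-16009)
    that these inline comments would document; the names here are
    illustrative, not from the PR:
    
        object EtaExpansionDemo {
          def readVarargs(paths: String*): String = paths.mkString(",")
          // Without a single-String overload, `Option("p").map(readVarargs)`
          // fails to compile: eta-expanding a varargs method yields a
          // `Seq[String] => String`, not the `String => String` that `map`
          // expects.
          def readSingle(path: String): String = readVarargs(Seq(path): _*)
        
          def main(args: Array[String]): Unit =
            println(Option("p").map(readSingle))  // prints Some(p)
        }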




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60678/
    Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987779
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    Sure, will do it. Thanks!
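    
    For readers of this diff: `testRead` is a suite-local helper that is not
    shown here. A plausible sketch of what it checks (illustrative only, not
    necessarily the PR's actual helper; assumes the suite's `spark`,
    `checkAnswer`, and `testImplicits`):
    
        import org.apache.spark.sql.DataFrame
        import org.apache.spark.sql.types.StructType
        
        private def testRead(
            df: => DataFrame,
            expectedData: Seq[String],
            expectedSchema: StructType): Unit = {
          // Verify both the schema and the data that come back.
          assert(df.schema === expectedSchema)
          checkAnswer(df, spark.createDataset(expectedData).toDF())
        }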




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60696/
    Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68675752
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    If you can!
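    
    A standalone illustration of why the `Seq(path): _*` expansion is needed
    (illustrative names, not Spark code):
    
        object DispatchDemo {
          def load(paths: String*): String = s"varargs(${paths.mkString(",")})"
          // `load(path)` alone would resolve back to this single-arg method
          // and recurse forever; spreading a Seq forces the varargs overload.
          def load(path: String): String = load(Seq(path): _*)
        
          def main(args: Array[String]): Unit =
            println(load("p"))  // prints varargs(p)
        }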





[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60850/
    Test PASSed.

