Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2016/06/17 03:39:28 UTC

[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/13727

    [SPARK-15982][SPARK-16009] Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc

    ## What changes were proposed in this pull request?
    
    Issues with the current reader behavior:
    - `text()` without args returns an empty DF with no columns -> inconsistent; it's expected that `text` always returns a DF with a `value` string field.
    - `textFile()` without args fails with an exception for the above reason: it expects the DF returned by `text()` to have a `value` field.
    - `orc()` does not take varargs, which is inconsistent with the other formats.
    - `json(single-arg)` was removed, but that caused source compatibility issues - SPARK-16009.
    
    The solution I am implementing is the following (see the sketch after this list):
    - For each format, there will be a single-argument method and a varargs method. For json, parquet, csv, and text, this means adding `json(String)`, etc. For orc, this means adding `orc(varargs)`.
    - Remove the special handling of `text()`, `csv()`, etc. that returns an empty DataFrame with no fields. Instead, pass the empty sequence of paths on to the data source and let each data source handle it correctly. For example, the text data source should return an empty DF with schema (`value: string`).
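
    A minimal sketch of the intended method shape (simplified; the real `DataFrameReader` methods carry the full option docs):

    ```scala
    // A dedicated single-argument overload forwards to the varargs one:
    // explicit single-argument calls keep compiling (SPARK-16009), while
    // the varargs method remains the single implementation.
    def json(path: String): DataFrame = json(Seq(path): _*)

    @scala.annotation.varargs
    def json(paths: String*): DataFrame = format("json").load(paths: _*)
    ```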
    
    ## How was this patch tested?
    Added new unit tests covering both the Scala and Java APIs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-15982

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13727
    
----
commit dcc4655225b27a4bc544ce38580949fb3fe60121
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-06-17T03:37:44Z

    Fixed and added tests

----




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @marmbrus @rxin




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67573034
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    Should we `checkAnswer` on these where possible? A lot of errors only manifest on `collect()`.
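
    For instance, a sketch of such a round-trip check (using the suite's `output` path and the `checkAnswer` helper from `QueryTest`):

    ```scala
    // Write, read back, and force execution: checkAnswer collects both
    // sides, so errors that only surface on collect() are caught here.
    val df = spark.range(10).toDF()
    df.write.mode("overwrite").parquet(output)
    checkAnswer(spark.read.parquet(output), df)
    ```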




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60847/
    Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68672691
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    It will also break users' code in an upgrade.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60679/consoleFull)** for PR 13727 at commit [`960048d`](https://github.com/apache/spark/commit/960048d846ba258aab3a35027e11c377dba0830d).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60850/consoleFull)** for PR 13727 at commit [`2539a94`](https://github.com/apache/spark/commit/2539a947d382d8d6c21c59fe8a9420a15aad9b9a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67582678
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -146,18 +140,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        */
       @scala.annotation.varargs
       def load(paths: String*): DataFrame = {
    -    if (paths.isEmpty) {
    --- End diff --
    
    In my PR, I will add test cases to verify all the possible inputs after this code change. Thanks!




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60679/consoleFull)** for PR 13727 at commit [`960048d`](https://github.com/apache/spark/commit/960048d846ba258aab3a35027e11c377dba0830d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60747/consoleFull)** for PR 13727 at commit [`29524b1`](https://github.com/apache/spark/commit/29524b1201fb3e028a2875397feba5c0e577365f).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68624316
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    With this change, `path` is no longer available in the options. This makes it hard (impossible?) for non-file-based DataSources (those not implementing FileFormat) to use load(...).

    For example, for elasticsearch we use:
    ```
    sqlContext.read.format("org.elasticsearch.spark.sql").load(resource)
    ```
    I do not think this can be implemented anymore?




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575723
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,45 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    json(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a JSON file (one object per line) and returns the result as a [[DataFrame]].
    +   *
    +   * This function goes through the input once to determine the input schema. If you know the
    +   * schema in advance, use the version that specifies the schema to avoid the extra scan.
    +   *
    +   * You can set the following JSON-specific options to deal with non-standard JSON files:
    +   * <li>`primitivesAsString` (default `false`): infers all primitive values as a string type</li>
    +   * <li>`prefersDecimal` (default `false`): infers all floating-point values as a decimal
    +   * type. If the values do not fit in decimal, then it infers them as doubles.</li>
    +   * <li>`allowComments` (default `false`): ignores Java/C++ style comment in JSON records</li>
    +   * <li>`allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names</li>
    +   * <li>`allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
    +   * </li>
    +   * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
    +   * (e.g. 00012)</li>
    +   * <li>`allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all
    +   * character using backslash quoting mechanism</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   *  <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts the
    --- End diff --
    
    It might be better to just use the indented `-` notation for lists.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68644592
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    ```Scala
    sqlContext.read.option("path", resource).format("org.elasticsearch.spark.sql").load()
    ```
    
    Can you try this?





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67578285
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    Yeah, only where it's easy.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60850/consoleFull)** for PR 13727 at commit [`2539a94`](https://github.com/apache/spark/commit/2539a947d382d8d6c21c59fe8a9420a15aad9b9a).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457293
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -119,13 +119,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(): DataFrame = {
    -    val dataSource =
    --- End diff --
    
    deduped.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575684
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,45 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    json(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a JSON file (one object per line) and returns the result as a [[DataFrame]].
    +   *
    +   * This function goes through the input once to determine the input schema. If you know the
    +   * schema in advance, use the version that specifies the schema to avoid the extra scan.
    +   *
    +   * You can set the following JSON-specific options to deal with non-standard JSON files:
    +   * <li>`primitivesAsString` (default `false`): infers all primitive values as a string type</li>
    +   * <li>`prefersDecimal` (default `false`): infers all floating-point values as a decimal
    +   * type. If the values do not fit in decimal, then it infers them as doubles.</li>
    +   * <li>`allowComments` (default `false`): ignores Java/C++ style comment in JSON records</li>
    +   * <li>`allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names</li>
    +   * <li>`allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
    +   * </li>
    +   * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
    +   * (e.g. 00012)</li>
    +   * <li>`allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all
    +   * character using backslash quoting mechanism</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   *  <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts the
    --- End diff --
    
    this does not indent correctly:
    
    ![screen shot 2016-06-17 at 2 07 18 pm](https://cloud.githubusercontent.com/assets/527/16164772/d64f8710-3494-11e6-8e90-29b4d75fe27b.png)





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572068
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -368,6 +397,63 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * <li>`maxCharsPerColumn` (default `1000000`): defines the maximum number of characters allowed
        * for any given value being read.</li>
        * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing.</li>
    +   * <ul>
    +   * <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When
    +   * a schema is set by user, it sets `null` for extra fields.</li>
    +   * <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
    +   * <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
    +   * </ul>
    +   *
    +   * @since 2.0.0
    +   */
    +  def csv(path: String): DataFrame = {
    +    // This method ensures that calls that explicitly need a single argument work, see SPARK-16009
    +    csv(Seq(path): _*)
    +  }
    +
    +  /**
    +   * Loads a CSV file and returns the result as a [[DataFrame]].
    +   *
    +   * This function will go through the input once to determine the input schema if `inferSchema`
    +   * is enabled. To avoid going through the entire data once, disable `inferSchema` option or
    +   * specify the schema explicitly using [[schema]].
    +   *
    +   * You can set the following CSV-specific options to deal with CSV files:
    +   * <li>`sep` (default `,`): sets the single character as a separator for each
    +   * field and value.</li>
    +   * <li>`encoding` (default `UTF-8`): decodes the CSV files by the given encoding
    +   * type.</li>
    +   * <li>`quote` (default `"`): sets the single character used for escaping quoted values where
    +   * the separator can be part of the value. If you would like to turn off quotations, you need to
    +   * set not `null` but an empty string. This behaviour is different from
    +   * `com.databricks.spark.csv`.</li>
    +   * <li>`escape` (default `\`): sets the single character used for escaping quotes inside
    +   * an already quoted value.</li>
    +   * <li>`comment` (default empty string): sets the single character used for skipping lines
    +   * beginning with this character. By default, it is disabled.</li>
    +   * <li>`header` (default `false`): uses the first line as names of columns.</li>
    +   * <li>`inferSchema` (default `false`): infers the input schema automatically from data. It
    +   * requires one extra pass over the data.</li>
    +   * <li>`ignoreLeadingWhiteSpace` (default `false`): defines whether or not leading whitespaces
    +   * from values being read should be skipped.</li>
    +   * <li>`ignoreTrailingWhiteSpace` (default `false`): defines whether or not trailing
    +   * whitespaces from values being read should be skipped.</li>
    +   * <li>`nullValue` (default empty string): sets the string representation of a null value.</li>
    +   * <li>`nanValue` (default `NaN`): sets the string representation of a non-number value.</li>
    +   * <li>`positiveInf` (default `Inf`): sets the string representation of a positive infinity
    +   * value.</li>
    +   * <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
    +   * value.</li>
    +   * <li>`dateFormat` (default `null`): sets the string that indicates a date format. Custom date
    +   * formats follow the formats at `java.text.SimpleDateFormat`. This applies to both date type
    +   * and timestamp type. By default, it is `null` which means trying to parse times and date by
    +   * `java.sql.Timestamp.valueOf()` and `java.sql.Date.valueOf()`.</li>
    +   * <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
    +   * a record can have.</li>
    +   * <li>`maxCharsPerColumn` (default `1000000`): defines the maximum number of characters allowed
    +   * for any given value being read.</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    --- End diff --
    
    This can be addressed in a follow-up, but I don't think we should duplicate the docs because they are going to get out of sync. I'd have one canonical copy and have the other link to it.
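
    For instance, a sketch of the linking style (the scaladoc wording here is illustrative, not the final text):

    ```scala
    /**
     * Loads a CSV file and returns the result as a [[DataFrame]].
     * See the varargs overload of `csv()` for the full list of
     * CSV-specific options; they are documented only there.
     *
     * @since 2.0.0
     */
    def csv(path: String): DataFrame = csv(Seq(path): _*)
    ```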




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457308
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -146,18 +140,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        */
       @scala.annotation.varargs
       def load(paths: String*): DataFrame = {
    -    if (paths.isEmpty) {
    --- End diff --
    
    Removed the special handling of empty paths.
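
    Roughly the resulting shape (a sketch; the `DataSource` constructor arguments follow the surrounding class and may differ in detail):

    ```scala
    @scala.annotation.varargs
    def load(paths: String*): DataFrame = {
      // No paths.isEmpty special case: the (possibly empty) path sequence
      // goes straight to the data source, which decides what an empty read
      // means (e.g. the text source returns an empty DF with schema `value: string`).
      sparkSession.baseRelationToDataFrame(
        DataSource(
          sparkSession,
          paths = paths,
          userSpecifiedSchema = userSpecifiedSchema,
          className = source,
          options = extraOptions.toMap).resolveRelation())
    }
    ```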




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @tdas Sure, will do it soon. I might submit it after this is merged. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67460691
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    We should add an inline comment on why this exists; ditto for similar functions.
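
    For reference, the shape that addresses this, as it landed later in the PR (comment wording illustrative):

    ```scala
    def json(path: String): DataFrame = {
      // This overload exists so that explicit single-argument calls keep
      // resolving after the varargs change; see SPARK-16009.
      json(Seq(path): _*)
    }
    ```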





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68675928
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    Sure, will do it soon. Thanks!




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60846/consoleFull)** for PR 13727 at commit [`24174f0`](https://github.com/apache/spark/commit/24174f08587d0fa680c9050f5648a0b090507af6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60696/consoleFull)** for PR 13727 at commit [`bb52410`](https://github.com/apache/spark/commit/bb52410a3df6f16fd51534cad77fb9366c8d2712).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Tonight, I will submit a PR for the test cases. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13727




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67457343
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    made this method depend on `json(varargs)` to prevent code duplication.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68645998
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    I believe that works as expected (I am running into some other issues now, but they seem unrelated).
    However, from a DSL perspective this is not very pretty.
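
    The two call styles at issue, side by side (paths and identifiers as in the earlier comments):

    ```scala
    // The old single-arg load() put the path into the options map, so this
    // worked for non-file sources such as elasticsearch:
    sqlContext.read.format("org.elasticsearch.spark.sql").load(resource)

    // After the change, the equivalent is to pass the path as an explicit option:
    sqlContext.read.option("path", resource).format("org.elasticsearch.spark.sql").load()
    ```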




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67575918
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    --- End diff --
    
    What happens if you specify a schema here?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    LGTM. Merging to master and 2.0. Thanks!




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68674754
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    yea this is a bad breaking change.





[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60695/consoleFull)** for PR 13727 at commit [`3150b01`](https://github.com/apache/spark/commit/3150b013dd67e98dfa80ad50ee0fa7cbcc2a7486).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60695/consoleFull)** for PR 13727 at commit [`3150b01`](https://github.com/apache/spark/commit/3150b013dd67e98dfa80ad50ee0fa7cbcc2a7486).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987384
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    @tdas ORC behaves differently. When the user-specified schema does not match the physical schema, it simply stops and reports an exception. Do you think that behavior is better than returning `null` for all the rows?
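
The hunk above uses a `testRead` helper whose body is not shown; a plausible minimal shape, consistent with its call sites and assuming it lives in `DataFrameReaderWriterSuite` (so `spark`, `checkAnswer`, and `import testImplicits._` for the String encoder are in scope):

```scala
// Hypothetical reconstruction; the actual helper in the PR may differ.
private def testRead(
    df: => DataFrame,             // by-name: evaluated when the assertions run
    expectedResult: Seq[String],
    expectedSchema: StructType): Unit = {
  checkAnswer(df, spark.createDataset(expectedResult).toDF())
  assert(df.schema === expectedSchema)
}
```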




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60846/consoleFull)** for PR 13727 at commit [`24174f0`](https://github.com/apache/spark/commit/24174f08587d0fa680c9050f5648a0b090507af6).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @marmbrus I deduped the docs. Linking to a specific method via scaladoc was hard to get right and did not work in the Java docs, so I just wrote "See docs on the other overloaded method". I also fixed the doc formatting.
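
A sketch of the resulting doc pattern (wording illustrative, not the PR's exact text):

```scala
/**
 * Loads text files and returns a `DataFrame` whose schema starts with a
 * string column named "value".
 * See the documentation on the other overloaded `text()` method for more details.
 */
def text(path: String): DataFrame = {
  // Single-String overload kept for source compatibility (SPARK-16009);
  // it just delegates to the varargs overload.
  text(Seq(path): _*)
}
```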




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60695/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60846/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    A few comments. Overall LGTM.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @gatorsmile I updated my tests to cover this rigorously. However, since this PR is about testing the common behavior (e.g. whether all of the sources respect the user schema or not), I have not added tests for the case where there are no paths AND no user schema. That behavior is source-specific: `parquet/json/csv` should throw an error, whereas `text` should not. Could you make a PR that tests these in CSVSuite, etc., if they are not already tested?
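
A runnable sketch of the source-specific split described above, with the exception types taken from the assertions in this PR's suite rather than any broader guarantee:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object NoPathNoSchemaDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

    // text: no paths and no user schema is fine; the result is an empty
    // DataFrame that still carries the fixed (value: string) schema.
    val df = spark.read.text()
    println(df.schema.simpleString) // struct<value:string>
    println(df.count())             // 0

    // json/parquet: nothing to infer a schema from, so they fail in analysis.
    try spark.read.json() catch {
      case e: AnalysisException => println(s"json failed as expected: ${e.getMessage}")
    }
    try spark.read.parquet() catch {
      case e: AnalysisException => println(s"parquet failed as expected: ${e.getMessage}")
    }
    // csv throws IllegalArgumentException per the suite above.
    try spark.read.csv() catch {
      case e: IllegalArgumentException => println(s"csv failed as expected: ${e.getMessage}")
    }
    spark.stop()
  }
}
```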




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60747/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    The following test case is for SPARK-16007. I can confirm this PR already fixes the issue.
    ```scala
      test("schema checking") {
        val schema: StructType = new StructType().add("s", "string")
        assert(spark.read.schema(schema).csv().schema === schema)
        assert(spark.read.schema(schema).json().schema === schema)
        assert(spark.read.schema(schema).parquet().schema === schema)
        assert(spark.read.schema(schema).text().schema === schema)
        assert(spark.read.schema(schema).orc().schema === schema)
      }
    ```
    
    Since the ORC data source must be used with Hive support enabled, you can comment the last line out, or move it to another test case in a Hive suite (see the sketch below).
    
    Please let me know if anything is needed. Thanks!
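
For the last line, a sketch of how it could be hosted in a Hive-backed suite instead (placement and trait name are assumptions):

```scala
// Assumes a suite mixing in org.apache.spark.sql.hive.test.TestHiveSingleton,
// so the ORC data source has the Hive support it requires.
test("SPARK-16007: orc() with user specified schema and no paths") {
  val schema = new StructType().add("s", "string")
  assert(spark.read.schema(schema).orc().schema === schema)
}
```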




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60847/consoleFull)** for PR 13727 at commit [`3498bd0`](https://github.com/apache/spark/commit/3498bd06dda12f3bf8788b787195cd8293f6ebde).




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572585
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    --- End diff --
    
    I'm not sure what it means to read text with a user-specified schema. In fact, this can't actually work if there is data and you call `collect()`.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987629
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    Let's document that as a test for now.





[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572462
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    --- End diff --
    
    Nit: it's great to include a super short description with the JIRA.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60847/consoleFull)** for PR 13727 at commit [`3498bd0`](https://github.com/apache/spark/commit/3498bd06dda12f3bf8788b787195cd8293f6ebde).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60747/consoleFull)** for PR 13727 at commit [`29524b1`](https://github.com/apache/spark/commit/29524b1201fb3e028a2875397feba5c0e577365f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67576337
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    --- End diff --
    
    In that case, it's probably better to fail earlier; that is, TextFileFormat should fail if there is a userSpecifiedSchema.
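
A small sketch of the resulting user-facing behavior, matching the `intercept[AnalysisException]` assertions in the suite above (the input path is hypothetical):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.types.StructType

object TextFileSchemaGuardDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()
    val userSchema = new StructType().add("s", "string")

    // textFile rejects a user-specified schema up front instead of
    // failing later when the data is actually collected.
    try spark.read.schema(userSchema).textFile("/tmp/hypothetical-input") catch {
      case e: AnalysisException =>
        println(e.getMessage) // mentions "user specified schema not supported"
    }
    spark.stop()
  }
}
```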




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60679/
    Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @gatorsmile Actually, I already added tests in this PR that cover the scenario where the schema is not provided. The only one that is not really tested is orc, as it cannot be run in DataFrameReaderWriterSuite. So could you add that test to the ORC-related test suites, if it is needed at all?




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572121
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java ---
    @@ -0,0 +1,158 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to You under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +*    http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +
    +package test.org.apache.spark.sql;
    +
    +import java.io.File;
    +import java.util.HashMap;
    +
    +import org.apache.spark.sql.SaveMode;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.test.TestSparkSession;
    +import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.util.Utils;
    +import org.junit.After;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +public class JavaDataFrameReaderWriterSuite {
    +  private SparkSession spark = new TestSparkSession();
    +  private StructType schema = new StructType().add("s", "string");
    +  private transient String input;
    +  private transient String output;
    +
    +  @Before
    +  public void setUp() {
    +    input = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "input").toString();
    +    File f = Utils.createTempDir(System.getProperty("java.io.tmpdir"), "output");
    +    f.delete();
    +    output = f.toString();
    +  }
    +
    +  @After
    +  public void tearDown() {
    +    spark.stop();
    +    spark = null;
    +  }
    +
    +  @Test
    +  public void testFormatAPI() {
    +    spark
    +        .read()
    +        .format("org.apache.spark.sql.test")
    +        .load()
    +        .write()
    +        .format("org.apache.spark.sql.test")
    +        .save();
    +  }
    +
    +  @Test
    +  public void testOptionsAPI() {
    +    HashMap<String, String> map = new HashMap<String, String>();
    +    map.put("e", "1");
    +    spark
    +        .read()
    +        .option("a", "1")
    +        .option("b", 1)
    +        .option("c", 1.0)
    +        .option("d", true)
    +        .options(map)
    +        .text()
    +        .write()
    +        .option("a", "1")
    +        .option("b", 1)
    +        .option("c", 1.0)
    +        .option("d", true)
    +        .options(map)
    +        .format("org.apache.spark.sql.test")
    +        .save();
    +  }
    +
    +  @Test
    +  public void testSaveModeAPI() {
    +    spark
    +        .range(10)
    +        .write()
    +        .format("org.apache.spark.sql.test")
    +        .mode(SaveMode.ErrorIfExists)
    +        .save();
    +  }
    +
    +  @Test
    +  public void testLoadAPI() {
    +    spark.read().format("org.apache.spark.sql.test").load();
    +    spark.read().format("org.apache.spark.sql.test").load(input);
    +    spark.read().format("org.apache.spark.sql.test").load(input, input, input);
    +    spark.read().format("org.apache.spark.sql.test").load(new String[]{input, input});
    +  }
    +
    +  @Test
    +  public void testTextAPI() {
    +    spark.read().text();
    +    spark.read().text(input);
    +    spark.read().text(input, input, input);
    +    spark.read().text(new String[]{input, input})
    +        .write().text(output);
    +  }
    +
    +  @Test
    +  public void testTextFileAPI() {
    +    spark.read().textFile();     // Disabled because of SPARK-XXXXX
    --- End diff --
    
    SPARK-XXXX?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60683/
    Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68674971
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    Do you want me to fix it?




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60696/consoleFull)** for PR 13727 at commit [`bb52410`](https://github.com/apache/spark/commit/bb52410a3df6f16fd51534cad77fb9366c8d2712).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67572740
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    +  }
    +
    +  /**
    +   * This only tests whether API compiles, but does not run it as orc()
    +   * cannot be run with Hive classes.
    --- End diff --
    
    nit: `without`




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    @tdas Sure, I have multiple related test cases that I wrote a few days ago; they might also be useful. You can judge whether they are needed or not. Let me merge your latest changes. : )




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60683/consoleFull)** for PR 13727 at commit [`3384473`](https://github.com/apache/spark/commit/3384473ec91d61405e2d6a51c94f46726bfcdbd9).




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67576047
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +220,101 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(input)
    +    spark.read.format("org.apache.spark.sql.test").load(input, input, input)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(input, input): _*)
    +    Option(input).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.text().schema === textSchema)
    +    assert(spark.read.text(input).schema === textSchema)
    +    assert(spark.read.text(input, input, input).schema === textSchema)
    +    assert(spark.read.text(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.text).get.schema === textSchema) // SPARK-16009
    +
    +    // Reader, with user specified schema
    +    assert(spark.read.schema(userSchema).text().schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).text(Seq(input, input): _*).schema === userSchema)
    +
    +    // Writer
    +    spark.read.text().write.text(output)
    +  }
    +
    +  test("textFile - API and common behavior") {
    +    // Reader, without user specified schema
    +    assert(spark.read.textFile().schema === textSchema)
    +    assert(spark.read.textFile(input).schema === textSchema)
    +    assert(spark.read.textFile(input, input, input).schema === textSchema)
    +    assert(spark.read.textFile(Seq(input, input): _*).schema === textSchema)
    +    assert(Option(input).map(spark.read.textFile).get.schema === textSchema) // SPARK-16009
    +  }
    +
    +  test("csv - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).csv().schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).csv(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).csv).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.csv(output)
    +  }
    +
    +  test("json - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).json().schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).json(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).json).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.json(output)
    +  }
    +
    +  test("parquet - API and common behavior") {
    +    // Reader, with user specified schema
    +    // Refer to csv-specific test suites for behavior without user specified schema
    +    assert(spark.read.schema(userSchema).parquet().schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(input, input, input).schema === userSchema)
    +    assert(spark.read.schema(userSchema).parquet(Seq(input, input): _*).schema === userSchema)
    +
    +    // Test explicit calls to single arg method - SPARK-16009
    +    assert(Option(input).map(spark.read.schema(userSchema).parquet).get.schema === userSchema)
    +
    +    // Writer
    +    spark.range(10).write.parquet(output)
    --- End diff --
    
    My goal in these tests was to test the API; individual suites like CsvSuite, ParquetSuite, etc. should take care of correctness. But I see your point, and I could add a few basic tests.
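    
    A sketch of what such a basic test could look like (assuming the suite's
    `spark`, `dir`, and `checkAnswer` helpers and `import testImplicits._`;
    purely illustrative, not the PR's code):
    
        import org.apache.spark.sql.SaveMode
        
        test("json - basic round trip") {
          val data = Seq("a", "b", "c")
          // Write the data out, read it back, and check it survives the trip.
          spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
          checkAnswer(spark.read.json(dir), spark.createDataset(data).toDF("str"))
        }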




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    **[Test build #60683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60683/consoleFull)** for PR 13727 at commit [`3384473`](https://github.com/apache/spark/commit/3384473ec91d61405e2d6a51c94f46726bfcdbd9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67463353
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -276,7 +267,42 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
        * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
        *
    -   * @since 1.6.0
    +   * @since 1.4.0
    +   */
    +  def json(path: String): DataFrame = json(Seq(path): _*)
    --- End diff --
    
    Yes, agreed. Will add these as inline comments, not as Scala docs.
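    
    A minimal, self-contained sketch of the compatibility issue (SPARK-16009)
    that these inline comments would document; the names here are
    illustrative, not from the PR:
    
        object EtaExpansionDemo {
          def readVarargs(paths: String*): String = paths.mkString(",")
          // Without a single-String overload, `Option("p").map(readVarargs)`
          // fails to compile: eta-expanding a varargs method yields a
          // `Seq[String] => String`, not the `String => String` that `map`
          // expects.
          def readSingle(path: String): String = readVarargs(Seq(path): _*)
        
          def main(args: Array[String]): Unit =
            println(Option("p").map(readSingle))  // prints Some(p)
        }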




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009] Harmonize the behavior of Dat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60678/
    Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r67987779
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -228,4 +222,152 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  test("load API") {
    +    spark.read.format("org.apache.spark.sql.test").load()
    +    spark.read.format("org.apache.spark.sql.test").load(dir)
    +    spark.read.format("org.apache.spark.sql.test").load(dir, dir, dir)
    +    spark.read.format("org.apache.spark.sql.test").load(Seq(dir, dir): _*)
    +    Option(dir).map(spark.read.format("org.apache.spark.sql.test").load)
    +  }
    +
    +  test("text - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +    testRead(spark.read.text(dir), data, textSchema)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.text(), Seq.empty, textSchema)
    +    testRead(spark.read.text(dir, dir, dir), data ++ data ++ data, textSchema)
    +    testRead(spark.read.text(Seq(dir, dir): _*), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).text(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).text(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("textFile - API and behavior regarding schema") {
    +    spark.createDataset(data).write.mode(SaveMode.Overwrite).text(dir)
    +
    +    // Reader, without user specified schema
    +    testRead(spark.read.textFile().toDF(), Seq.empty, textSchema)
    +    testRead(spark.read.textFile(dir).toDF(), data, textSchema)
    +    testRead(spark.read.textFile(dir, dir).toDF(), data ++ data, textSchema)
    +    testRead(spark.read.textFile(Seq(dir, dir): _*).toDF(), data ++ data, textSchema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.text).get, data, textSchema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    val e = intercept[AnalysisException] { spark.read.schema(userSchema).textFile() }
    +    assert(e.getMessage.toLowerCase.contains("user specified schema not supported"))
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(dir, dir) }
    +    intercept[AnalysisException] { spark.read.schema(userSchema).textFile(Seq(dir, dir): _*) }
    +  }
    +
    +  test("csv - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).csv(dir)
    +    val df = spark.read.csv(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[IllegalArgumentException] {
    +      testRead(spark.read.csv(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.csv(dir), data, schema)
    +    testRead(spark.read.csv(dir, dir), data ++ data, schema)
    +    testRead(spark.read.csv(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.csv).get, data, schema)
    +
    +    // Reader, with user specified schema, should just apply user schema on the file data
    +    testRead(spark.read.schema(userSchema).csv(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir), data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(dir, dir), data ++ data, userSchema)
    +    testRead(spark.read.schema(userSchema).csv(Seq(dir, dir): _*), data ++ data, userSchema)
    +  }
    +
    +  test("json - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).json(dir)
    +    val df = spark.read.json(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.json(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.json(dir), data, schema)
    +    testRead(spark.read.json(dir, dir), data ++ data, schema)
    +    testRead(spark.read.json(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.json).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).json(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir), expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(dir, dir), expData ++ expData, userSchema)
    +    testRead(spark.read.schema(userSchema).json(Seq(dir, dir): _*), expData ++ expData, userSchema)
    +  }
    +
    +  test("parquet - API and behavior regarding schema") {
    +    // Writer
    +    spark.createDataset(data).toDF("str").write.mode(SaveMode.Overwrite).parquet(dir)
    +    val df = spark.read.parquet(dir)
    +    checkAnswer(df, spark.createDataset(data).toDF())
    +    val schema = df.schema
    +
    +    // Reader, without user specified schema
    +    intercept[AnalysisException] {
    +      testRead(spark.read.parquet(), Seq.empty, schema)
    +    }
    +    testRead(spark.read.parquet(dir), data, schema)
    +    testRead(spark.read.parquet(dir, dir), data ++ data, schema)
    +    testRead(spark.read.parquet(Seq(dir, dir): _*), data ++ data, schema)
    +    // Test explicit calls to single arg method - SPARK-16009
    +    testRead(Option(dir).map(spark.read.parquet).get, data, schema)
    +
    +    // Reader, with user specified schema, data should be nulls as schema in file different
    +    // from user schema
    +    val expData = Seq[String](null, null, null)
    +    testRead(spark.read.schema(userSchema).parquet(), Seq.empty, userSchema)
    +    testRead(spark.read.schema(userSchema).parquet(dir), expData, userSchema)
    --- End diff --
    
    Sure, will do it. Thanks!
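    
    For readers of this diff: `testRead` is a suite-local helper that is not
    shown here. A plausible sketch of what it checks (illustrative only, not
    necessarily the PR's actual helper; assumes the suite's `spark`,
    `checkAnswer`, and `testImplicits`):
    
        import org.apache.spark.sql.DataFrame
        import org.apache.spark.sql.types.StructType
        
        private def testRead(
            df: => DataFrame,
            expectedData: Seq[String],
            expectedSchema: StructType): Unit = {
          // Verify both the schema and the data that come back.
          assert(df.schema === expectedSchema)
          checkAnswer(df, spark.createDataset(expectedData).toDF())
        }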




[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60696/
    Test FAILed.




[GitHub] spark pull request #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harm...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13727#discussion_r68675752
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * @since 1.4.0
        */
       def load(path: String): DataFrame = {
    -    option("path", path).load()
    +    load(Seq(path): _*) // force invocation of `load(...varargs...)`
    --- End diff --
    
    If you can!
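    
    A standalone illustration of why the `Seq(path): _*` expansion is needed
    (illustrative names, not Spark code):
    
        object DispatchDemo {
          def load(paths: String*): String = s"varargs(${paths.mkString(",")})"
          // `load(path)` alone would resolve back to this single-arg method
          // and recurse forever; spreading a Seq forces the varargs overload.
          def load(path: String): String = load(Seq(path): _*)
        
          def main(args: Array[String]): Unit =
            println(load("p"))  // prints varargs(p)
        }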





[GitHub] spark issue #13727: [SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60850/
    Test PASSed.

