Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/02/11 22:51:00 UTC

[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/20579

    [SPARK-23372][SQL] Writing empty struct in parquet fails during execution. It should fail earlier in the processing.

    ## What changes were proposed in this pull request?
    Running
    spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path)
    Results in
    ``` SQL
    org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
     }
    
    at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
     at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
     at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
     at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
     at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
     at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
     at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
     at org.apache.spark.scheduler.Task.run(Task.scala:109)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.
    ```
    
    This PR addresses a couple of things.
    1) The above case now fails earlier, during the write-preparation phase, instead of at task execution.
    2) Writing an empty data frame in ORC succeeds but fails during read while inferring the schema.
        This issue is also addressed in this PR.
    
    ## How was this patch tested?
    
    Unit tests added in FileBasedDatasourceSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark spark-23372

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20579
    
----
commit 9f7a1705960250cf6a828787f0f12a9f28b608c5
Author: Dilip Biswal <db...@...>
Date:   2018-02-11T17:09:07Z

    [SPARK-23372] Writing empty struct in parquet fails during execution. It should fail earlier in the processing

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167659522
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow wrting data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    +      }
    +    }
    +  }
    +
    +  // Formats excluding text and parquet allow writing empty data frames to files.
    +  allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe and reading from it - $format") {
    +      withTempPath { outputPath =>
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +          intercept[AnalysisException] {
    +            val df = spark.read.format(format).load(outputPath.toString)
    --- End diff --
    
    @hvanhovell Actually, that's my question as well. Please see my [comment/question](https://github.com/apache/spark/pull/20579#issuecomment-364994881).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1618/
    Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175950061
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -719,4 +720,27 @@ object DataSource extends Logging {
         }
         globPath
       }
    +
    +  /**
    +   * Called before writing into a FileFormat based data source to make sure the
    +   * supplied schema is not empty.
    +   * @param schema
    +   */
    +  private def verifySchema(schema: StructType): Unit = {
    +    def verifyInternal(schema: StructType): Boolean = {
    --- End diff --
    
    better to call it `hasEmptySchema`
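
The renamed helper could look roughly like the sketch below. This is only an illustration: the types are simplified stand-ins rather than Spark's `org.apache.spark.sql.types` classes, and it assumes the check recurses into nested structs, which the quoted diff does not show.

```scala
// Stand-in types for illustration only; Spark's real classes live in
// org.apache.spark.sql.types and have a richer API.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

// True if the schema, or any struct nested inside it, has no fields.
def hasEmptySchema(schema: StructType): Boolean =
  schema.fields.isEmpty || schema.fields.exists { field =>
    field.dataType match {
      case nested: StructType => hasEmptySchema(nested)
      case _ => false
    }
  }

// Hypothetical wrapper: reject the write before any task is launched.
def verifySchema(schema: StructType): Unit =
  if (hasEmptySchema(schema)) {
    throw new IllegalArgumentException("Cannot write a schema with an empty group")
  }
```

Naming the inner function after the condition it tests makes the Boolean result self-explanatory at the call site, which seems to be the point of the suggestion.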


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88455/testReport)** for PR 20579 at commit [`7ecb44b`](https://github.com/apache/spark/commit/7ecb44b28eddbf7a07b65844cfd9cc98a33928e9).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88450/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175118143
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -542,6 +542,11 @@ case class DataSource(
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
     
    +    if (data.schema.size == 0) {
    --- End diff --
    
    Currently, we are not blocking this. I do not think we should introduce this behavior change. It is risky to block all cases.
    
    Previously, I tried to block CREATE TABLE with an empty schema. Later, I hit a regression because some data sources are using options/table properties to specify the schema...
    
    A general guideline here is to avoid behavior changes if possible. When we have to introduce a behavior change, we should make it configurable, so that users can at least revert to the old behavior with a flag.
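
A flag-guarded check along these lines might look like the sketch below. It is a minimal illustration under stated assumptions: the configuration key is invented for this example and is not a real Spark setting, and the configuration is modelled as a plain `Map` rather than Spark's `SQLConf`.

```scala
// Hypothetical flag name, invented for this sketch; it is not a real Spark conf key.
val AllowEmptySchemaKey = "example.write.allowEmptySchema"

// Returns an error message when the write should be rejected, None otherwise.
// The flag defaults to false, so the new (stricter) behavior applies unless
// the user explicitly opts back into the old one.
def checkColumnCount(columnCount: Int, conf: Map[String, String]): Option[String] = {
  val allowEmpty = conf.get(AllowEmptySchemaKey).exists(_.toBoolean)
  if (columnCount == 0 && !allowEmpty) Some("Cannot write a dataset with an empty schema")
  else None
}
```

For example, `checkColumnCount(0, Map(AllowEmptySchemaKey -> "true"))` yields `None`, so a user who depends on the old behavior is not blocked.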


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87317/testReport)** for PR 20579 at commit [`6f76bcc`](https://github.com/apache/spark/commit/6f76bcc9206189c3e7f78367cc7e0374af5bb2be).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1537/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/816/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88485/testReport)** for PR 20579 at commit [`7920b29`](https://github.com/apache/spark/commit/7920b29f667de38ca5222e4de8f62eea688d67e4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    I agree, we should probably add a check for storing a DataFrame with no columns for now. This is normally caught by the pre-insert rules, but since the table is getting "created" in this case there is nothing to check. 
    
    In the future, I think that the create and insert should be logically separate so that the create will fail and complain that you can't create a table without at least one column. I think it will be cleaner to separate concerns like that.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175019613
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -542,6 +542,11 @@ case class DataSource(
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
     
    +    if (data.schema.size == 0) {
    --- End diff --
    
    @gatorsmile May I request that you quickly go through Wenchen's and Ryan's comments above? My understanding is that we want to consistently reject writing an empty schema for all data sources. Please let me know.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175952408
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -719,4 +720,27 @@ object DataSource extends Logging {
         }
         globPath
       }
    +
    +  /**
    +   * Called before writing into a FileFormat based data source to make sure the
    +   * supplied schema is not empty.
    +   * @param schema
    +   */
    +  private def verifySchema(schema: StructType): Unit = {
    +    def verifyInternal(schema: StructType): Boolean = {
    --- End diff --
    
    @cloud-fan will make the change.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88388/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1670/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/801/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167465371
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow wrting data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    +      }
    +    }
    +  }
    +
    +  // Formats excluding text and parquet allow writing empty data frames to files.
    --- End diff --
    
    I have added this test to show the current behaviour when reading back an empty data frame. For formats like orc, json, and csv, writing an empty data frame succeeds, but reading it back fails while inferring the schema. Is this the right behaviour?


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175854733
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -546,6 +546,10 @@ case class DataSource(
           case dataSource: CreatableRelationProvider =>
             SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
           case format: FileFormat =>
    +        if (DataSource.isBuiltInFileBasedDataSource(format) && data.schema.size == 0) {
    --- End diff --
    
    We don't need this check. `FileFormat` is internal, we don't need to distinguish between built-in file format or external ones.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88436/testReport)** for PR 20579 at commit [`ecf0865`](https://github.com/apache/spark/commit/ecf08654d4c7b50eb498481011d3c6f856419207).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Retest this please.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88327/testReport)** for PR 20579 at commit [`3a85c35`](https://github.com/apache/spark/commit/3a85c35d9a22ec106bfd97d7771629877d7cccd7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88450/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1652/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88327/testReport)** for PR 20579 at commit [`3a85c35`](https://github.com/apache/spark/commit/3a85c35d9a22ec106bfd97d7771629877d7cccd7).


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175952404
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -77,7 +77,6 @@ class ParquetFileFormat
           job: Job,
           options: Map[String, String],
           dataSchema: StructType): OutputWriterFactory = {
    -
    --- End diff --
    
    @cloud-fan will remove.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88279/testReport)** for PR 20579 at commit [`3588af3`](https://github.com/apache/spark/commit/3588af39eb889ab72f6800b546ba9f2107f15dc0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @cloud-fan OK. I was thinking of adding this check to each built-in data source (Text, CSV, Parquet, ORC, JSON, etc.), just as this PR does for Parquet. Would you have any concerns with that approach? It would duplicate some code under each format, but it gives us the flexibility to change the behaviour for an individual data source should we need to. What do you think?


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Could we check whether the format is a file-based data source? Then we would not need to repeat the same check for each file source.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    I think this should be applied to all data sources, not only Parquet. I can't think of any case where a data source needs to write data with an empty schema. cc @rdblue for confirmation.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @cloud-fan Thank you. I assumed (wrongly) that we did not want to change the behaviour for external file-based data sources and wanted to scope the check to Spark's built-in data sources only. I have made the change based on your suggestion. I have parked the verifySchema method in DataSource for now. Please let me know whether that's the right place or whether we should move it to a utility class.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88273/testReport)** for PR 20579 at commit [`ad15411`](https://github.com/apache/spark/commit/ad154115e520b82eec0b252fa19e66abdc1da832).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1572/
    Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175013773
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -542,6 +542,11 @@ case class DataSource(
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
     
    +    if (data.schema.size == 0) {
    --- End diff --
    
    Is it required? This is a behavior change. Can we exclude it from this PR? 


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88436/testReport)** for PR 20579 at commit [`ecf0865`](https://github.com/apache/spark/commit/ecf08654d4c7b50eb498481011d3c6f856419207).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88440/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88450/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by mannharleen <gi...@git.apache.org>.
Github user mannharleen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167463659
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -68,6 +68,16 @@ class ParquetFileFormat
     
       override def toString: String = "Parquet"
     
    +  private def verifySchema(schema: StructType): Unit = {
    +    if (schema.size < 1) {
    --- End diff --
    
    Any particular reason for using "< 1" rather than "== 0"? Just curious.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88327/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1621/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1668/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167661193
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow writing data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    --- End diff --
    
    Can we also check the error message, to be sure?
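
One way to act on this suggestion is to keep the exception that the intercept call returns and assert on its message. The following is only a sketch in plain Scala, not the test that was added to the PR: `InterceptSketch`, its `intercept` helper (a minimal analogue of ScalaTest's `intercept`, which also returns the caught exception), the `write` function, and the message text are all hypothetical stand-ins.

```scala
import scala.reflect.ClassTag

object InterceptSketch {
  // Hypothetical analogue of ScalaTest's `intercept`: runs `body` and
  // returns the thrown exception if it has the expected type T.
  def intercept[T <: Throwable](body: => Any)(implicit tag: ClassTag[T]): T = {
    val caught: Option[Throwable] =
      try { body; None } catch { case e: Throwable => Some(e) }
    caught match {
      case Some(e) if tag.runtimeClass.isInstance(e) => e.asInstanceOf[T]
      case Some(e) => throw e // unexpected exception type: let it propagate
      case None =>
        throw new AssertionError(
          s"expected ${tag.runtimeClass.getName}, but nothing was thrown")
    }
  }

  // Hypothetical writer that rejects an empty column list up front.
  def write(columns: Seq[String]): Unit =
    if (columns.isEmpty) {
      throw new IllegalArgumentException(
        "Data source does not support writing empty schemas.")
    }
}
```

Asserting on a stable fragment of the message, for example `e.getMessage.contains("empty schemas")`, rather than on the full text, keeps such a test from breaking when the wording is polished later.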


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87314/testReport)** for PR 20579 at commit [`9f7a170`](https://github.com/apache/spark/commit/9f7a1705960250cf6a828787f0f12a9f28b608c5).


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167660095
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -68,6 +68,16 @@ class ParquetFileFormat
     
       override def toString: String = "Parquet"
     
    +  private def verifySchema(schema: StructType): Unit = {
    +    if (schema.size == 0) {
    +      throw new AnalysisException(
    +        s"""
    +           |Parquet data source does not support writing empty groups.
    --- End diff --
    
    @hvanhovell Thank you. I will change it to use "schema". I will check nested schema as well.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    After some more thought, we should try our best not to introduce behavior changes to existing data sources. How about we only add this check for file-based data sources (all of them are built-in)? We can have a follow-up JIRA to add this check for data source v2.
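
The scoping proposed here can be pictured as a pattern match in which only the file-based branch rejects an empty schema, while other providers keep their existing behavior. This is a simplified sketch, not Spark's code: `Provider`, `FileBased`, `CreatableRelation`, `WritePlanner`, and `planWrite` are hypothetical names that loosely mirror the `case format: FileFormat` and `CreatableRelationProvider` branches quoted elsewhere in this thread.

```scala
// Hypothetical, simplified model of the write dispatch; not Spark's real API.
sealed trait Provider
case object FileBased extends Provider         // stands in for `case format: FileFormat`
case object CreatableRelation extends Provider // stands in for CreatableRelationProvider

object WritePlanner {
  // Left(error) when the write is rejected up front, Right(description) otherwise.
  def planWrite(provider: Provider, columnCount: Int): Either[String, String] =
    provider match {
      case FileBased if columnCount == 0 =>
        Left("cannot write an empty schema to a file-based data source")
      case FileBased =>
        Right("file-based write")
      case CreatableRelation =>
        // No new check here, so existing behavior is left unchanged.
        Right("provider-defined write")
    }
}
```

Placing the check inside the file-based branch is what keeps the behavior change away from other data sources.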


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r173625828
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow writing data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    +      }
    +    }
    +  }
    +
    +  // Formats excluding text and parquet allow writing empty data frames to files.
    +  allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe and reading from it - $format") {
    +      withTempPath { outputPath =>
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +          intercept[AnalysisException] {
    +            val df = spark.read.format(format).load(outputPath.toString)
    --- End diff --
    
    Sorry if I misunderstood. The link is https://github.com/apache/spark/pull/20579#issuecomment-364994881. Is that the right link?


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    retest this please


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87341/testReport)** for PR 20579 at commit [`6f76bcc`](https://github.com/apache/spark/commit/6f76bcc9206189c3e7f78367cc7e0374af5bb2be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87314/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88273/
    Test FAILed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r176232893
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -719,4 +720,27 @@ object DataSource extends Logging {
         }
         globPath
       }
    +
    +  /**
    +   * Called before writing into a FileFormat based data source to make sure the
    +   * supplied schema is not empty.
    +   * @param schema
    +   */
    +  private def hasEmptySchema(schema: StructType): Unit = {
    +    def hasEmptySchemaInternal(schema: StructType): Boolean = {
    --- End diff --
    
    @cloud-fan I have gone ahead and renamed the top-level function to validateSchema, and kept hasEmptySchema as the name of the internal function. Hopefully it makes sense now.
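
The naming split described here, a Unit-returning validator wrapping a Boolean-returning predicate that also recurses into nested structs, can be sketched as follows. This is an illustration only, not the code in the PR: the `DataType`, `StructField`, and `StructType` definitions are hypothetical stand-ins for Spark's classes of the same names, `SchemaValidation` is an invented holder object, `IllegalArgumentException` stands in for the `AnalysisException` used in the diff, and the message text is assumed.

```scala
// Hypothetical stand-ins for Spark's schema types; not the real classes.
sealed trait DataType
case object StringType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object SchemaValidation {
  // Predicate (returns Boolean): true if this struct, or any struct
  // nested directly inside it, has no fields.
  def hasEmptySchema(schema: StructType): Boolean =
    schema.fields.isEmpty || schema.fields.exists {
      case StructField(_, nested: StructType) => hasEmptySchema(nested)
      case _ => false
    }

  // Validator (returns Unit): throws instead of handing back a flag.
  def validateSchema(schema: StructType): Unit =
    if (hasEmptySchema(schema)) {
      throw new IllegalArgumentException(
        "Data source does not support writing empty or nested empty schemas.")
    }
}
```

In this sketch only structs nested directly in fields are inspected; structs inside arrays or maps would need additional cases.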


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167664375
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow writing data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    --- End diff --
    
    Sure @dongjoon-hyun 


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r176163585
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -719,4 +720,27 @@ object DataSource extends Logging {
         }
         globPath
       }
    +
    +  /**
    +   * Called before writing into a FileFormat based data source to make sure the
    +   * supplied schema is not empty.
    +   * @param schema
    +   */
    +  private def hasEmptySchema(schema: StructType): Unit = {
    +    def hasEmptySchemaInternal(schema: StructType): Boolean = {
    --- End diff --
    
    They should be named `verifySchema` and `hasEmptySchema` respectively, to match their return types.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88458/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    retest this please


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    thanks, merging to master!


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88396/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1665/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1541/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88485/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    retest this please


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88485/testReport)** for PR 20579 at commit [`7920b29`](https://github.com/apache/spark/commit/7920b29f667de38ca5222e4de8f62eea688d67e4).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88279/testReport)** for PR 20579 at commit [`3588af3`](https://github.com/apache/spark/commit/3588af39eb889ab72f6800b546ba9f2107f15dc0).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    LGTM, can we add something to the migration guide?


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175858198
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -546,6 +546,10 @@ case class DataSource(
           case dataSource: CreatableRelationProvider =>
             SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
           case format: FileFormat =>
    +        if (DataSource.isBuiltInFileBasedDataSource(format) && data.schema.size == 0) {
    --- End diff --
    
    actually we just need [this check](https://github.com/apache/spark/pull/20579/files#diff-ee26d4c4be21e92e92a02e9f16dbc285R71) here.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88455/testReport)** for PR 20579 at commit [`7ecb44b`](https://github.com/apache/spark/commit/7ecb44b28eddbf7a07b65844cfd9cc98a33928e9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1691/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/798/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87341/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @dilipbiswal this is a nice improvement. I left a few comments.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87314/testReport)** for PR 20579 at commit [`9f7a170`](https://github.com/apache/spark/commit/9f7a1705960250cf6a828787f0f12a9f28b608c5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167463979
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -68,6 +68,16 @@ class ParquetFileFormat
     
       override def toString: String = "Parquet"
     
    +  private def verifySchema(schema: StructType): Unit = {
    +    if (schema.size < 1) {
    --- End diff --
    
    No particular reason :-). I can change that to "== 0".


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @cloud-fan @rdblue Thank you for the clarification. I am sorry, I hadn't seen your comments before I pushed the last change, which targets only parquet. I will adjust the fix to target all formats in a future commit. Thanks again!


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87317/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @gatorsmile Thank you Sean. I will follow your suggestion.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r176187422
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -719,4 +720,27 @@ object DataSource extends Logging {
         }
         globPath
       }
    +
    +  /**
    +   * Called before writing into a FileFormat based data source to make sure the
    +   * supplied schema is not empty.
    +   * @param schema
    +   */
    +  private def hasEmptySchema(schema: StructType): Unit = {
    +    def hasEmptySchemaInternal(schema: StructType): Boolean = {
    --- End diff --
    
    You are right, @cloud-fan. Given that we are raising the error from the function itself, should I rename it to validateSchema?
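
The naming question above can be illustrated with a minimal sketch. It assumes spark-sql is on the classpath; the object name `SchemaValidation` is hypothetical, and `IllegalArgumentException` is only a stand-in for the `AnalysisException` used in the diffs in this thread. Only the top-level emptiness check is shown, to keep the focus on naming rather than on the full check.

```scala
import org.apache.spark.sql.types.{IntegerType, StructType}

// Hypothetical holder object, used only for this illustration.
object SchemaValidation {

  // Predicate: answers a yes/no question and has no side effects,
  // so a "has..." name matches what callers get back.
  def hasEmptySchema(schema: StructType): Boolean =
    schema.fields.isEmpty

  // Validator: returns Unit and throws on failure, so a verb-style
  // name such as validateSchema describes it better than "has...".
  // IllegalArgumentException is a stand-in chosen for this sketch.
  def validateSchema(schema: StructType): Unit = {
    if (hasEmptySchema(schema)) {
      throw new IllegalArgumentException(
        "Data source does not support writing an empty schema.")
    }
  }
}
```

With this split, `SchemaValidation.validateSchema(new StructType().add("id", IntegerType))` returns normally, while `SchemaValidation.validateSchema(new StructType())` throws.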


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88440/testReport)** for PR 20579 at commit [`4fe4eb6`](https://github.com/apache/spark/commit/4fe4eb6dee62b85523cd937c97076285836350a9).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88273/testReport)** for PR 20579 at commit [`ad15411`](https://github.com/apache/spark/commit/ad154115e520b82eec0b252fa19e66abdc1da832).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87341/testReport)** for PR 20579 at commit [`6f76bcc`](https://github.com/apache/spark/commit/6f76bcc9206189c3e7f78367cc7e0374af5bb2be).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88279/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88458/testReport)** for PR 20579 at commit [`7ecb44b`](https://github.com/apache/spark/commit/7ecb44b28eddbf7a07b65844cfd9cc98a33928e9).


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167623896
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -68,6 +68,16 @@ class ParquetFileFormat
     
       override def toString: String = "Parquet"
     
    +  private def verifySchema(schema: StructType): Unit = {
    +    if (schema.size == 0) {
    +      throw new AnalysisException(
    +        s"""
    +           |Parquet data source does not support writing empty groups.
    --- End diff --
    
    `group` is a parquet term. Let's use `schema` instead?
    
    We should also check for nested empty schemas.
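
A minimal sketch of a check that also covers nested empty schemas, assuming spark-sql is on the classpath. The object name `EmptySchemaCheck` and the helper `containsEmptyStruct` are hypothetical, and descending into array and map element types is an assumption of this sketch rather than something the thread specifies.

```scala
import org.apache.spark.sql.types._

// Hypothetical holder object, used only for this illustration.
object EmptySchemaCheck {

  // True when the struct itself has no fields, or when any struct
  // reachable from one of its fields has no fields.
  def hasEmptySchema(schema: StructType): Boolean =
    schema.fields.isEmpty ||
      schema.fields.exists(field => containsEmptyStruct(field.dataType))

  // Walks a single data type looking for an empty struct.
  // Looking inside arrays and maps is an assumption of this sketch.
  private def containsEmptyStruct(dataType: DataType): Boolean = dataType match {
    case struct: StructType => hasEmptySchema(struct)
    case ArrayType(elementType, _) => containsEmptyStruct(elementType)
    case MapType(keyType, valueType, _) =>
      containsEmptyStruct(keyType) || containsEmptyStruct(valueType)
    case _ => false
  }
}
```

For example, `new StructType().add("id", IntegerType)` is accepted, while `new StructType().add("id", IntegerType).add("s", new StructType())` is flagged because the nested struct `s` has no fields.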


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88436/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175154988
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -542,6 +542,11 @@ case class DataSource(
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
     
    +    if (data.schema.size == 0) {
    --- End diff --
    
    @gatorsmile @cloud-fan OK, sounds reasonable to me. I will roll back the latest change in this PR, and we can discuss whether we want to introduce the behaviour change in a future JIRA/PR.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r175950101
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -77,7 +77,6 @@ class ParquetFileFormat
           job: Job,
           options: Map[String, String],
           dataSchema: StructType): OutputWriterFactory = {
    -
    --- End diff --
    
    unnecessary change


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1656/
    Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20579


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #87317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87317/testReport)** for PR 20579 at commit [`6f76bcc`](https://github.com/apache/spark/commit/6f76bcc9206189c3e7f78367cc7e0374af5bb2be).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    **[Test build #88458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88458/testReport)** for PR 20579 at commit [`7ecb44b`](https://github.com/apache/spark/commit/7ecb44b28eddbf7a07b65844cfd9cc98a33928e9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Thanks a lot @cloud-fan @gatorsmile 


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @cloud-fan ok.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/815/
    Test PASSed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    @gatorsmile When you get a chance, could you please see if the check for internal datasource looks reasonable?


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88455/
    Test FAILed.


---



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20579
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20579#discussion_r167625448
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  // Text and Parquet format does not allow writing data frame with empty schema.
    +  Seq("parquet", "text").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") {
    +      withTempPath { outputPath =>
    +        intercept[AnalysisException] {
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +        }
    +      }
    +    }
    +  }
    +
    +  // Formats excluding text and parquet allow writing empty data frames to files.
    +  allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format =>
    +    test(s"SPARK-23372 writing empty dataframe and reading from it - $format") {
    +      withTempPath { outputPath =>
    +          spark.emptyDataFrame.write.format(format).save(outputPath.toString)
    +          intercept[AnalysisException] {
    +            val df = spark.read.format(format).load(outputPath.toString)
    --- End diff --
    
    This should pass, right?
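
One way to settle a question like this empirically is a small probe that records, per format, whether writing `spark.emptyDataFrame` and reading the result back succeed. This is a sketch only: it assumes a local Spark installation on the classpath, the object name `EmptyFrameProbe`, the format list, and the temp-directory prefix are hypothetical choices, and the outcome depends on the Spark version, so no expected output is claimed.

```scala
import java.nio.file.Files

import scala.util.{Failure, Success, Try}

import org.apache.spark.sql.SparkSession

// Hypothetical probe, not part of the PR under review.
object EmptyFrameProbe {

  // Summarizes a Try as "ok" or the simple name of the exception class.
  def outcome(result: Try[_]): String = result match {
    case Success(_) => "ok"
    case Failure(e) => s"failed (${e.getClass.getSimpleName})"
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-frame-probe")
      .getOrCreate()
    try {
      // The format list is an assumption; adjust it to the formats available.
      Seq("parquet", "text", "json", "csv", "orc").foreach { format =>
        // A fresh, not-yet-existing output path for each format.
        val dir = Files.createTempDirectory("spark-23372-").resolve(format).toString
        val write = Try(spark.emptyDataFrame.write.format(format).save(dir))
        val read = Try(spark.read.format(format).load(dir).schema)
        println(s"$format: write=${outcome(write)}, read=${outcome(read)}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Running the probe against the branch under review and against master gives a side-by-side view of which formats reject the write and which only fail on the read.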


---
