You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/05/22 04:21:42 UTC

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/21389

    [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFileFormat

    ## What changes were proposed in this pull request?
    This pr added code to verify a schema in Json/Orc/ParquetFileFormat along with CSVFileFormat.
    
    ## How was this patch tested?
    Added tests in `OrcSourceSuite`, `ParquetQuerySuite`, and `JsonSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-24204

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21389
    
----
commit 0d88bcb58f9298bed433b8febc4c9cfb5d92f6a9
Author: Takeshi Yamamuro <ya...@...>
Date:   2018-05-08T01:55:26Z

    Fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91011/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91208/testReport)** for PR 21389 at commit [`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91955/testReport)** for PR 21389 at commit [`f78334c`](https://github.com/apache/spark/commit/f78334c8cd0e4438e1abf102fae72d090c4388be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92359/testReport)** for PR 21389 at commit [`a3c400d`](https://github.com/apache/spark/commit/a3c400d83398443bbf4abdb3764f52c6c4a8893d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190136535
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
    @@ -89,6 +89,8 @@ class OrcFileFormat
           job: Job,
           options: Map[String, String],
           dataSchema: StructType): OutputWriterFactory = {
    +    DataSourceUtils.verifySchema("ORC", dataSchema)
    --- End diff --
    
    yea, that's smart. I will update.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/227/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92390/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198341823
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    --- End diff --
    
    This is duplicated with `CSVSuite`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92368/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198340297
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,101 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: CalendarIntervalType | _: StructType | _: ArrayType | _: MapType
    +          if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      // JSON and ORC don't support an Interval type, but we pass it in read pass
    +      // for back-compatibility.
    +      case _: CalendarIntervalType if isReadPath &&
    --- End diff --
    
    If `isReadPath` is `false`, we should always throw exception.
    So we can simplify all the handling of `CalendarIntervalType` as following:
    ```
    case _: CalendarIntervalType if !isReadPath => throwUnsupportedException(dataType)
    case _: CalendarIntervalType if format.isInstanceOf[JsonFileFormat] | format.isInstanceOf[OrcFileFormat] => throwUnsupportedException(dataType)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    I checked manually and I noticed some incorrect behaviour changes (detailed below). So, I'll update in following prs. Sorry to bother you.
    
     - CSV
    
    `// Struct`
    
    `// Write Path`
    Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(dir)
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support struct<a:int,b:string> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support struct<a:int,b:string> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType.fromDDL("a struct<b: Int>")
    spark.range(1).write.mode("overwrite").csv(dir)
    spark.read.schema(schema).csv(dir).collect()
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support struct<b:int> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- this pr`
    java.lang.UnsupportedOperationException: CSV data source does not support struct<b:int> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Map`
    
    `// Write Path`
    Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.csv(dir)
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support map<string,int> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support map<string,int> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType.fromDDL("a map<int, int>")
    spark.range(1).write.mode("overwrite").csv(dir)
    spark.read.schema(schema).csv(dir).collect()
    `master and this pr has the same behaviour`
    `- master`
    java.lang.UnsupportedOperationException: CSV data source does not support map<int,int> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `- this pr`
    java.lang.UnsupportedOperationException: CSV data source does not support map<int,int> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Array`
    
    `// Write Path`
    Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands").write.csv(dir)
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support array<string> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support array<string> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType.fromDDL("a array<int>")
    spark.range(1).write.mode("overwrite").csv(dir)
    spark.read.schema(schema).csv(dir).collect()
    `master and this pr has the same behaviour`
    `--master`
    java.lang.UnsupportedOperationException: CSV data source does not support array<int> data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- this pr`
    java.lang.UnsupportedOperationException: CSV data source does not support array<int> data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Interval`
    
    `// Write Path`
    sql("select interval 1 days").write.csv(dir)
    `master and this pr has the same behaviour`
    `-- master`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    `-- thir pr`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    
    `// Read Path`
    val schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    spark.range(1).write.mode("overwrite").csv(dir)
    spark.read.schema(schema).csv(dir).collect()
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support calendarinterval data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support calendarinterval data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Null`
    
    `// Write Path`
    sql("select null").write.mode("overwrite").csv(dir)
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType(StructField("a", NullType, true) :: Nil)
    spark.range(1).write.mode("overwrite").csv(dir)
    spark.read.schema(schema).csv(dir).collect()
    `master and this pr has the same behaviour`
    `-- master`
    java.lang.UnsupportedOperationException: CSV data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:132)
    `-- thir pr`
    java.lang.UnsupportedOperationException: CSV data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
     - JSON
    
    `// Null`
    `// Write Path`
    sql("select null").write.mode("overwrite").json(dir)
    `// master and this pr has the same behaviour`
    `-- master`
    Passed
    `-- thir pr`
    Passed
    
    `// Read Path`
    val schema = StructType(StructField("a", NullType, true) :: Nil)
    spark.range(1).write.mode("overwrite").json(dir)
    spark.read.schema(schema).json(dir).collect()
    `// master and this pr has the same behaviour`
    `-- master`
    Passed
    `-- thir pr`
    Passed
    
    `// Interval`
    
    `// Write Path`
    sql("select interval 1 days").write.json(dir)
    `// master and this pr has the same behaviour`
    `-- master`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    `-- thir pr`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    
    `// Read Path`
    val schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    spark.range(1).write.mode("overwrite").json(dir)
    spark.read.schema(schema).json(dir).collect()
    **// We should keep the same behaviour**
    `-- master`
    Passed
    `-- thir pr`
    java.lang.UnsupportedOperationException: JSON data source does not support calendarinterval data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
     - parquet
    
    `// Null`
    
    `// Write Path`
    sql("select null").write.mode("overwrite").parquet(dir)
    // master hit the exception in a executor side
    `-- master`
    Caused by: java.lang.RuntimeException: Unsupported data type NullType.
      at scala.sys.package$.error(package.scala:27)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter(ParquetWriteSupport.scala:206)
    `-- thir pr`
    java.lang.UnsupportedOperationException: Parquet data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType(StructField("a", NullType, true) :: Nil)
    spark.range(1).write.mode("overwrite").parquet(dir)
    spark.read.schema(schema).parquet(dir).collect()
    `// master hit the exception in a executor side`
    `-- master`
    Caused by: org.apache.spark.sql.AnalysisException: Unsupported data type StructField(a,NullType,true).dataType;
      at org.apache.spark.sql.execution.datasources.parquet.SparkToParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:558)
    `-- thir pr`
    java.lang.UnsupportedOperationException: Parquet data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Interval`
    
    `// Write Path`
    sql("select interval 1 days").write.parquet(dir)
    // master and this pr has the same behaviour
    `-- master`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    `-- thir pr`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    
    `// Read Path`
    val schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    spark.range(1).write.mode("overwrite").parquet(dir)
    spark.read.schema(schema).parquet(dir).collect()
    `// master hit the exception in a executor side`
    `-- master`
    Caused by: org.apache.spark.sql.AnalysisException: Unsupported data type StructField(a,CalendarIntervalType,true).dataType;
      at org.apache.spark.sql.execution.datasources.parquet.SparkToParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:558)
    `-- thir pr`
    java.lang.UnsupportedOperationException: Parquet data source does not support calendarinterval data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
     - orc
    
    `// Null`
    
    `// Write Path`
    sql("select null").write.mode("overwrite").orc(dir)
    // master hit the exception in a executor side
    `-- master`
    Caused by: java.lang.IllegalArgumentException: Can't parse category at 'struct<NULL:null^>'
      at org.apache.orc.TypeDescription.parseCategory(TypeDescription.java:223)
    `-- thir pr`
    java.lang.UnsupportedOperationException: ORC data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Read Path`
    val schema = StructType(StructField("a", NullType, true) :: Nil)
    spark.range(1).write.mode("overwrite").orc(dir)
    spark.read.schema(schema).orc(dir).collect()
    **// We should keep the same behaviour**
    `-- master`
    Passed
    `-- thir pr`
    java.lang.UnsupportedOperationException: ORC data source does not support null data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    
    `// Interval`
    
    `// Write Path`
    sql("select interval 1 days").write.orc(dir)
    // master and this pr has the same behaviour
    `-- master`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    `-- thir pr`
    org.apache.spark.sql.AnalysisException: Cannot save interval data type into external storage.;
      at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:511)
    
    `// Read Path`
    val schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    spark.range(1).write.mode("overwrite").orc(dir)
    spark.read.schema(schema).orc(dir).collect()
    **// We should keep the same behaviour**
    `-- master`
    Passed
    `-- thir pr`
    java.lang.UnsupportedOperationException: ORC data source does not support calendarinterval data type.
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.throwUnsupportedException$1(DataSourceUtils.scala:40)
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92382/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3450/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92363/testReport)** for PR 21389 at commit [`c306953`](https://github.com/apache/spark/commit/c30695302a2790b458c09c478b19c40ec53243c1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92382/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91010/testReport)** for PR 21389 at commit [`c7fe24f`](https://github.com/apache/spark/commit/c7fe24fa8eb4fc49e2fef732d903e70edcd867af).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198350214
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    --- End diff --
    
    fixed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92390/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92364/testReport)** for PR 21389 at commit [`6dfcc68`](https://github.com/apache/spark/commit/6dfcc682989cdd8b378ad94a5ea3c888b396181f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/496/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626122
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    --- End diff --
    
    simplify the condition?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92368/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90944/testReport)** for PR 21389 at commit [`00208be`](https://github.com/apache/spark/commit/00208bea2ee174798705283fc78ed73e337e4a42).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91035/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92366/testReport)** for PR 21389 at commit [`eda4ca0`](https://github.com/apache/spark/commit/eda4ca075072c309becb7087e605a9a773b1be95).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92390/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3514/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92263/testReport)** for PR 21389 at commit [`a3c400d`](https://github.com/apache/spark/commit/a3c400d83398443bbf4abdb3764f52c6c4a8893d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Just a sec, I'm currently double checking the behaviour changes manually.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190461628
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    +   */
    +  def verifySchema(format: String, schema: StructType): Unit = {
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For backward-compatibility
    --- End diff --
    
    ok, I will.
    Also, we need to merge this function with `CSVUtils.verifySchema` in this pr?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4070/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4083/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91208/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3486/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91048/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190461402
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    +   */
    +  def verifySchema(format: String, schema: StructType): Unit = {
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For backward-compatibility
    --- End diff --
    
    Do we have any test case for this?
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91059/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91806/testReport)** for PR 21389 at commit [`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91955/testReport)** for PR 21389 at commit [`f78334c`](https://github.com/apache/spark/commit/f78334c8cd0e4438e1abf102fae72d090c4388be).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    @maropu Just want to double check whether all the data types are not supported before this PR? Have you ran these test cases without the code changes? After this PR, the error messages are more readable and captured earlier?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198342819
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    --- End diff --
    
    Also there is a space indent here, which should be removed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91035/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3443/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198355457
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    --- End diff --
    
    Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626501
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,222 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91059/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3441/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3520/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189913422
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala ---
    @@ -48,4 +49,33 @@ object JsonUtils {
           json.sample(withReplacement = false, options.samplingRatio, 1)
         }
       }
    +
    +  /**
    +   * Verify if the schema is supported in JSON datasource.
    +   */
    +  def verifySchema(schema: StructType): Unit = {
    --- End diff --
    
    Hmm .. but wouldn't the supported types be very specific to data source?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/180/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90937/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91806/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r195472992
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,61 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    --- End diff --
    
    Please improve the description? and document which built-in file formats are covered by this function. Also document which data types are not supported for each data source? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91915/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91011/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198351604
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    --- End diff --
    
    No, you are right. I misunderstood.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91806/testReport)** for PR 21389 at commit [`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/502/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190461517
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    +   */
    +  def verifySchema(format: String, schema: StructType): Unit = {
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For backward-compatibility
    +      case NullType if format == "JSON" =>
    +
    +      case _ =>
    +        throw new UnsupportedOperationException(
    --- End diff --
    
    Basically, for such a PR, we need to check all the data types that we block and ensure no behavior change is introduced by this PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92366/testReport)** for PR 21389 at commit [`eda4ca0`](https://github.com/apache/spark/commit/eda4ca075072c309becb7087e605a9a773b1be95).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198350225
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    +        // write path
    +        var msg = intercept[AnalysisException] {
    +          sql("select interval 1 days").write.format(format).mode("overwrite").save(tempDir)
    +        }.getMessage
    +        assert(msg.contains("Cannot save interval data type into external storage."))
    +
    +        msg = intercept[UnsupportedOperationException] {
    +          spark.udf.register("testType", () => new IntervalData())
    +          sql("select testType()").write.format(format).mode("overwrite").save(tempDir)
    +        }.getMessage
    +        assert(msg.toLowerCase(Locale.ROOT)
    +          .contains(s"$format data source does not support calendarinterval data type."))
    +
    +        // read path
    +        // We expect the types below should be passed for backward-compatibility
    +
    +        // Interval type
    +        var schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    +        spark.range(1).write.format(format).mode("overwrite").save(tempDir)
    +        spark.read.schema(schema).format(format).load(tempDir).collect()
    +
    +        // UDT having interval data
    +        schema = StructType(StructField("a", new IntervalUDT(), true) :: Nil)
    +        spark.range(1).write.format(format).mode("overwrite").save(tempDir)
    +        spark.read.schema(schema).format(format).load(tempDir).collect()
    +      }
    +    }
    +
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +      Seq("parquet", "csv").foreach { format =>
    --- End diff --
    
    fixed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/499/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3505/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    cc @gengliangwang  Please take another look


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91915/testReport)** for PR 21389 at commit [`92d3553`](https://github.com/apache/spark/commit/92d35539955e5ff5b1710169d65be0b08090581e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198348474
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    --- End diff --
    
    This tests are moved from `CSVSuite`: https://github.com/apache/spark/pull/21389/files#diff-219ac8201e443435499123f96e94d29fL743
    
    Do I misunderstand your comment?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190130347
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
    @@ -89,6 +89,8 @@ class OrcFileFormat
           job: Job,
           options: Map[String, String],
           dataSchema: StructType): OutputWriterFactory = {
    +    DataSourceUtils.verifySchema("ORC", dataSchema)
    --- End diff --
    
    Thank you for refactoring the PR, @maropu ! What about using `shortName` instead of string literal "ORC" here? Then, we can have the same line like the following.
    ```
    DataSourceUtils.verifySchema(shortName, dataSchema)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91011/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91955/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92366/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92363/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90934/testReport)** for PR 21389 at commit [`0d88bcb`](https://github.com/apache/spark/commit/0d88bcb58f9298bed433b8febc4c9cfb5d92f6a9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91059/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21389


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91208/testReport)** for PR 21389 at commit [`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4122/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189924257
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala ---
    @@ -48,4 +49,33 @@ object JsonUtils {
           json.sample(withReplacement = false, options.samplingRatio, 1)
         }
       }
    +
    +  /**
    +   * Verify if the schema is supported in JSON datasource.
    +   */
    +  def verifySchema(schema: StructType): Unit = {
    --- End diff --
    
    Since supported types are specific to data sources, I think we need to verify a schema in each file format implementations. But, yes.... these built-in format (orc and parquet) has the same supported types, so it might be better to move the code `veryfySchema` into somewhere (e.g., `DataSourceUtils` or something) for avoiding code duplication....


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90937/testReport)** for PR 21389 at commit [`00208be`](https://github.com/apache/spark/commit/00208bea2ee174798705283fc78ed73e337e4a42).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/504/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92359/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190461717
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    +   */
    +  def verifySchema(format: String, schema: StructType): Unit = {
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For backward-compatibility
    +      case NullType if format == "JSON" =>
    +
    +      case _ =>
    +        throw new UnsupportedOperationException(
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189872259
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala ---
    @@ -48,4 +49,33 @@ object JsonUtils {
           json.sample(withReplacement = false, options.samplingRatio, 1)
         }
       }
    +
    +  /**
    +   * Verify if the schema is supported in JSON datasource.
    +   */
    +  def verifySchema(schema: StructType): Unit = {
    --- End diff --
    
    The function `verifySchema` is very similar with the one in Orc/Parquet except the exception message. Should we put it into a util object?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91035/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91048/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91895/testReport)** for PR 21389 at commit [`92d3553`](https://github.com/apache/spark/commit/92d35539955e5ff5b1710169d65be0b08090581e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92364/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90944/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91010/testReport)** for PR 21389 at commit [`c7fe24f`](https://github.com/apache/spark/commit/c7fe24fa8eb4fc49e2fef732d903e70edcd867af).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92364/testReport)** for PR 21389 at commit [`6dfcc68`](https://github.com/apache/spark/commit/6dfcc682989cdd8b378ad94a5ea3c888b396181f).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3491/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198341944
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    +        // write path
    +        var msg = intercept[AnalysisException] {
    +          sql("select interval 1 days").write.format(format).mode("overwrite").save(tempDir)
    +        }.getMessage
    +        assert(msg.contains("Cannot save interval data type into external storage."))
    +
    +        msg = intercept[UnsupportedOperationException] {
    +          spark.udf.register("testType", () => new IntervalData())
    +          sql("select testType()").write.format(format).mode("overwrite").save(tempDir)
    +        }.getMessage
    +        assert(msg.toLowerCase(Locale.ROOT)
    +          .contains(s"$format data source does not support calendarinterval data type."))
    +
    +        // read path
    +        // We expect the types below should be passed for backward-compatibility
    +
    +        // Interval type
    +        var schema = StructType(StructField("a", CalendarIntervalType, true) :: Nil)
    +        spark.range(1).write.format(format).mode("overwrite").save(tempDir)
    +        spark.read.schema(schema).format(format).load(tempDir).collect()
    +
    +        // UDT having interval data
    +        schema = StructType(StructField("a", new IntervalUDT(), true) :: Nil)
    +        spark.range(1).write.format(format).mode("overwrite").save(tempDir)
    +        spark.read.schema(schema).format(format).load(tempDir).collect()
    +      }
    +    }
    +
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +      Seq("parquet", "csv").foreach { format =>
    --- End diff --
    
    Nit: we can put all the write path together to reduce duplicated code


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/500/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92263/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/519/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198355370
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,101 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: CalendarIntervalType | _: StructType | _: ArrayType | _: MapType
    +          if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      // JSON and ORC don't support an Interval type, but we pass it in read pass
    +      // for back-compatibility.
    +      case _: CalendarIntervalType if isReadPath &&
    --- End diff --
    
    Thanks for the comment. Based on the suggestion, I modified code. Could you check again?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92382/testReport)** for PR 21389 at commit [`1cfc7b0`](https://github.com/apache/spark/commit/1cfc7b02089401eca2f17db55b113e6620f398be).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3487/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90934/testReport)** for PR 21389 at commit [`0d88bcb`](https://github.com/apache/spark/commit/0d88bcb58f9298bed433b8febc4c9cfb5d92f6a9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90944/testReport)** for PR 21389 at commit [`00208be`](https://github.com/apache/spark/commit/00208bea2ee174798705283fc78ed73e337e4a42).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r190791115
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    +   */
    +  def verifySchema(format: String, schema: StructType): Unit = {
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For backward-compatibility
    --- End diff --
    
    Yes, as long as it does not break anything. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91016/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189964737
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2408,4 +2409,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           spark.read.option("mode", "PERMISSIVE").option("encoding", "UTF-8").json(Seq(badJson).toDS()),
           Row(badJson))
       }
    +
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    ok, I'll brush up the tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90934/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626168
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For JSON backward-compatibility
    +      case NullType if format.isInstanceOf[JsonFileFormat] ||
    +        (isReadPath && format.isInstanceOf[OrcFileFormat]) =>
    +
    +      case _ => throwUnsupportedException(dataType)
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91016/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198341922
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    --- End diff --
    
    Nit: we can put all the write path together to reduce duplicated code


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626232
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,222 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    Can we split this test case to multiple smaller ones? It will be easy to understand what happens if any of them fail by a meaningful test case name.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626497
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    --- End diff --
    
    yea, I'll recheck and list up these in the end.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92263/testReport)** for PR 21389 at commit [`a3c400d`](https://github.com/apache/spark/commit/a3c400d83398443bbf4abdb3764f52c6c4a8893d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    ping


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #90937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90937/testReport)** for PR 21389 at commit [`00208be`](https://github.com/apache/spark/commit/00208bea2ee174798705283fc78ed73e337e4a42).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91048/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92363/testReport)** for PR 21389 at commit [`c306953`](https://github.com/apache/spark/commit/c30695302a2790b458c09c478b19c40ec53243c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626154
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    +
    +      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
    +
    +      // For JSON backward-compatibility
    +      case NullType if format.isInstanceOf[JsonFileFormat] ||
    +        (isReadPath && format.isInstanceOf[OrcFileFormat]) =>
    +
    +      case _ => throwUnsupportedException(dataType)
    --- End diff --
    
    Write a comment above this? 
    > // Actually we won't pass in unsupported data types, this is a safety check.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #92359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92359/testReport)** for PR 21389 at commit [`a3c400d`](https://github.com/apache/spark/commit/a3c400d83398443bbf4abdb3764f52c6c4a8893d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/440/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4004/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626306
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    --- End diff --
    
    Let us try to enumerate all the ones we want to block? Instead of relying on the last case to throw the exceptions?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92368/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    cc: @dongjoon-hyun 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189961194
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2408,4 +2409,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           spark.read.option("mode", "PERMISSIVE").option("encoding", "UTF-8").json(Seq(badJson).toDS()),
           Row(badJson))
       }
    +
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    (Probably, the suggestion is related to the one above) Essentially, the supported types are specific to datasource implementations, so I'm not 100% sure that it's the best to put these tests in `FileBasedDataSourceSuite `.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3625/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/114/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189952391
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2408,4 +2409,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           spark.read.option("mode", "PERMISSIVE").option("encoding", "UTF-8").json(Seq(badJson).toDS()),
           Row(badJson))
       }
    +
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    Thank you for pinging me, @maropu .
    Since this is all about file-based data sources, can we have all these test cases in `FileBasedDataSourceSuite`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/192/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r198341248
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  // Unsupported data types of csv, json, orc, and parquet are as follows;
    +  //  csv -> R/W: Interval, Null, Array, Map, Struct
    +  //  json -> W: Interval
    +  //  orc -> W: Interval, Null
    +  //  parquet -> R/W: Interval, Null
    +  test("SPARK-24204 error handling for unsupported Array/Map/Struct types - csv") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      var msg = intercept[UnsupportedOperationException] {
    +        Seq((1, "Tesla")).toDF("a", "b").selectExpr("struct(a, b)").write.csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<a:int,b:string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a struct<b: Int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support struct<b:int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<string,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType.fromDDL("a map<int, int>")
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support map<int,int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, Array("Tesla", "Chevy", "Ford"))).toDF("id", "brands")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<string> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +         val schema = StructType.fromDDL("a array<int>")
    +         spark.range(1).write.mode("overwrite").csv(csvDir)
    +         spark.read.schema(schema).csv(csvDir).collect()
    +       }.getMessage
    +      assert(msg.contains("CSV data source does not support array<int> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25)))).toDF("id", "vectors")
    +          .write.mode("overwrite").csv(csvDir)
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type"))
    +
    +      msg = intercept[UnsupportedOperationException] {
    +        val schema = StructType(StructField("a", new UDT.MyDenseVectorUDT(), true) :: Nil)
    +        spark.range(1).write.mode("overwrite").csv(csvDir)
    +        spark.read.schema(schema).csv(csvDir).collect()
    +      }.getMessage
    +      assert(msg.contains("CSV data source does not support array<double> data type."))
    +    }
    +  }
    +
    +  test("SPARK-24204 error handling for unsupported Interval data types - csv, json, parquet, orc") {
    +    withTempDir { dir =>
    +      val tempDir = new File(dir, "files").getCanonicalPath
    +
    +       Seq("orc", "json").foreach { format =>
    +        // write path
    --- End diff --
    
    space indent


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/514/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91915/testReport)** for PR 21389 at commit [`92d3553`](https://github.com/apache/spark/commit/92d35539955e5ff5b1710169d65be0b08090581e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r189964186
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2408,4 +2409,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           spark.read.option("mode", "PERMISSIVE").option("encoding", "UTF-8").json(Seq(badJson).toDS()),
           Row(badJson))
       }
    +
    +  test("SPARK-24204 error handling for unsupported data types") {
    --- End diff --
    
    The suite doesn't assume that all file-based data source has the same capability. In this PR, the test codes are almost the same and the only difference are the mapping tables. For example,
    - `json` -> Interval
    - `orc` -> Interval, Null
    - `parquet` -> Interval, Null


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91016/testReport)** for PR 21389 at commit [`e4ebac2`](https://github.com/apache/spark/commit/e4ebac25afcb0ab0451de4802aa8b78b8fe6c21f).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    **[Test build #91895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91895/testReport)** for PR 21389 at commit [`92d3553`](https://github.com/apache/spark/commit/92d35539955e5ff5b1710169d65be0b08090581e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r195579915
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,61 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource.
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91895/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21389#discussion_r197626164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
    +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
    +import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
    +import org.apache.spark.sql.types._
    +
    +
    +object DataSourceUtils {
    +
    +  /**
    +   * Verify if the schema is supported in datasource in write path.
    +   */
    +  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = false)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource in read path.
    +   */
    +  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
    +    verifySchema(format, schema, isReadPath = true)
    +  }
    +
    +  /**
    +   * Verify if the schema is supported in datasource. This verification should be done
    +   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
    +   * in `FileFormat`.
    +   *
    +   * Unsupported data types of csv, json, orc, and parquet are as follows;
    +   *  csv -> R/W: Interval, Null, Array, Map, Struct
    +   *  json -> W: Interval
    +   *  orc -> W: Interval, Null
    +   *  parquet -> R/W: Interval, Null
    +   */
    +  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
    +    def throwUnsupportedException(dataType: DataType): Unit = {
    +      throw new UnsupportedOperationException(
    +        s"$format data source does not support ${dataType.simpleString} data type.")
    +    }
    +
    +    def verifyType(dataType: DataType): Unit = dataType match {
    +      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
    +           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
    +
    +      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
    +        throwUnsupportedException(dataType)
    +
    +      case st: StructType => st.foreach { f => verifyType(f.dataType) }
    +
    +      case ArrayType(elementType, _) => verifyType(elementType)
    +
    +      case MapType(keyType, valueType, _) =>
    +        verifyType(keyType)
    +        verifyType(valueType)
    +
    +      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
    +        isReadPath && format.isInstanceOf[OrcFileFormat] =>
    --- End diff --
    
    ok, will do


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91010/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org