Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/08/02 00:50:20 UTC

[jira] [Created] (SPARK-16842) Concern about disallowing user-given schema for Parquet and ORC

Hyukjin Kwon created SPARK-16842:
------------------------------------

             Summary: Concern about disallowing user-given schema for Parquet and ORC
                 Key: SPARK-16842
                 URL: https://issues.apache.org/jira/browse/SPARK-16842
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Hyukjin Kwon


If my understanding is correct, when the user-given schema differs from the inferred schema, each data source handles it differently.

- For JSON and CSV
  it is generally permissive (for example, it allows compatibility among numeric types).

- For ORC and Parquet
  they are generally strict about types, so they do not allow such compatibility (except in very few cases, e.g. for Parquet, https://github.com/apache/spark/pull/14272 and https://github.com/apache/spark/pull/14278).

- For Text
  it only supports `StringType`.

- For JDBC
  it does not accept a user-given schema, since it does not implement `SchemaRelationProvider`.
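
The list above can be summarised as a small toy model. This is only a sketch of the behaviour as described in this issue, not Spark's actual code: the function name `check_user_schema`, the plain-dict schemas and the lowercase type names are all hypothetical, and the few Parquet exceptions from the linked pull requests are deliberately ignored.

```python
# Toy model only: the rules paraphrase the issue text, not Spark's source.
# Schemas are plain dicts mapping column name -> type name (hypothetical).

def check_user_schema(source, user_schema, inferred_schema):
    """Return the schema a read would use, or raise ValueError if rejected."""
    if source == "jdbc":
        # No SchemaRelationProvider, so a user-given schema is not taken at all.
        raise ValueError("jdbc does not take a user-given schema")
    if source == "text":
        # Text only supports string columns.
        if set(user_schema.values()) != {"string"}:
            raise ValueError("text only supports string columns")
        return user_schema
    if source in ("json", "csv"):
        # Permissive: the data carries no complete schema, so the user's wins.
        return user_schema
    if source in ("parquet", "orc"):
        # Strict: every requested type must match the type stored in the data.
        for name, dtype in user_schema.items():
            if inferred_schema.get(name) != dtype:
                raise ValueError(
                    "%s: column %r is %r in the data, not %r"
                    % (source, name, inferred_schema.get(name), dtype)
                )
        return user_schema
    raise ValueError("unknown source: %s" % source)
```

Under this model, asking for `{"a": "double"}` when the data holds `{"a": "int"}` is accepted for JSON and CSV but rejected for ORC and Parquet, which is the asymmetry this issue is concerned with.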

By allowing a user-given schema, we can use types such as {{DateType}} and {{TimestampType}} for JSON and CSV. CSV and JSON arguably allow a permissive schema.

In short, JSON and CSV do not have complete schema information written in the data, whereas ORC and Parquet do.

So we might have to simply disallow user-given schemas for ORC and Parquet. In fact, if my understanding is correct, we can almost never supply our own schema for ORC and Parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)