Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2016/07/13 06:21:20 UTC

[jira] [Created] (SPARK-16518) Schema Compatibility of Parquet Data Source

Xiao Li created SPARK-16518:
-------------------------------

             Summary: Schema Compatibility of Parquet Data Source
                 Key: SPARK-16518
                 URL: https://issues.apache.org/jira/browse/SPARK-16518
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Xiao Li


Currently, we do not check schema compatibility when appending data to an existing table. Different file formats behave differently. This JIRA summarizes what I observed for Parquet data source tables.

*Scenario 1*
The existing schema is {{(col1 int, col2 string)}}
The schema of append data is {{(col1 int, col2 int)}}

*Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the read fails at scan time with:
{noformat}
Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost): java.lang.NullPointerException
	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:231)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(generated.java:62)
{noformat}
*Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the read fails during schema merging with:
{noformat}
Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost): org.apache.spark.SparkException: Failed merging schema of file file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/spark-4c2f0b69-ee05-4be1-91f0-0e54f89f2308/part-r-00000-6b76638c-a624-444c-9479-3c8e894cb65e.snappy.parquet:
root
 |-- a: integer (nullable = false)
 |-- b: string (nullable = true)
{noformat}
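
A minimal spark-shell sketch that reproduces this scenario (the output path and column names are illustrative, not taken from the original report):
{noformat}
// Write initial data with schema (col1 int, col2 string)
Seq((1, "a")).toDF("col1", "col2").write.parquet("/tmp/t")

// Append data with an incompatible schema (col1 int, col2 int);
// the append itself succeeds because no compatibility check is performed
Seq((2, 3)).toDF("col1", "col2").write.mode("append").parquet("/tmp/t")

// Case 1: mergeSchema=false -> NullPointerException at scan time
spark.read.parquet("/tmp/t").show()

// Case 2: mergeSchema=true -> SparkException: Failed merging schema
spark.read.option("mergeSchema", "true").parquet("/tmp/t").show()
{noformat}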

*Scenario 2*




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org