Posted to user@spark.apache.org by sim <si...@swoop.com> on 2015/07/26 18:31:36 UTC

Schema evolution in tables

The schema merging
<http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging>
section of the Spark SQL documentation shows an example of schema evolution
in a partitioned table.

Is this functionality only available when creating a Spark SQL table?

dataFrameWithEvolvedSchema.saveAsTable("my_table", SaveMode.Append) fails
with

java.lang.RuntimeException: Relation[ ... ]
org.apache.spark.sql.parquet.ParquetRelation2@83a73a05 requires that the
query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates
the same number of columns as its schema.

What is the Spark SQL idiom for appending data to a table while managing
schema evolution?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Schema-evolution-in-tables-tp23999.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Schema evolution in tables

Posted by Brandon White <bw...@gmail.com>.

Sim did you find anything? :)


Re: Schema evolution in tables

Posted by sim <si...@swoop.com>.

There is no automated solution right now. You have to issue manual ALTER
TABLE commands, which work for adding top-level columns but get tricky if
you are adding a field in a deeply nested struct.
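
For the top-level case, the manual step could be sketched roughly as below.
This is only an illustration, not an established Spark idiom: the names
AlterTableSketch, Column and addColumnsStatement are made up for this
example, the column types are passed as plain SQL type strings, and column
names are assumed to compare case-insensitively. It only builds the
statement text; which SQL entry point accepts ALTER TABLE ... ADD COLUMNS
(Hive, or a Spark SQL version that supports it) depends on your deployment.

```scala
// Sketch only: builds the ALTER TABLE statement for new top-level columns.
// AlterTableSketch, Column and addColumnsStatement are hypothetical names.
object AlterTableSketch {
  // A top-level column as a name plus a SQL type string, e.g. Column("score", "DOUBLE").
  final case class Column(name: String, sqlType: String)

  // Returns None when the incoming schema adds no new top-level columns;
  // otherwise returns the statement that adds the missing ones.
  // Names are compared case-insensitively (an assumption, not a Spark guarantee).
  // Changed types and fields nested inside structs are NOT detected.
  def addColumnsStatement(
      table: String,
      existing: Seq[Column],
      incoming: Seq[Column]): Option[String] = {
    val known = existing.map(_.name.toLowerCase).toSet
    val missing = incoming.filterNot(c => known.contains(c.name.toLowerCase))
    if (missing.isEmpty) None
    else {
      // Backtick-quote each column name and keep the incoming column order.
      val cols = missing.map(c => s"`${c.name}` ${c.sqlType}").mkString(", ")
      Some(s"ALTER TABLE $table ADD COLUMNS ($cols)")
    }
  }
}
```

The returned statement, if any, would be run before the append so that the
table and the DataFrame have the same number of columns again. Nested struct
fields are deliberately left out, since that is the part that gets tricky.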

Hopefully, the issue will be fixed in Spark 2.2, because work has started on
https://issues.apache.org/jira/browse/SPARK-18727



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Schema-evolution-in-tables-tp23999p28305.html
