Posted to user@spark.apache.org by sim <si...@swoop.com> on 2015/08/09 05:58:25 UTC
Re: Spark inserting into parquet files with different schema
Adam, did you find a solution for this?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-inserting-into-parquet-files-with-different-schema-tp20706p24181.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark inserting into parquet files with different schema
Posted by Simeon Simeonov <si...@swoop.com>.
Michael, please see http://apache-spark-user-list.1001560.n3.nabble.com/Schema-evolution-in-tables-tt23999.html
The exception is:
java.lang.RuntimeException: Relation[ ... ] org.apache.spark.sql.parquet.ParquetRelation2@83a73a05
requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
Is this behavior expected? Shall I create a JIRA issue if it is not?
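For readers following the thread, the check behind this exception can be pictured roughly as below. This is a plain-Python sketch, not Spark's code: `check_insert` and `align_to_table` are hypothetical names, and the column lists are made up for illustration. It assumes the query is missing columns that the table has; one possible workaround in that case is to build the SELECT list in table order and fill the gaps with NULL.

```python
def check_insert(table_columns, query_columns):
    """Raise if the query does not yield as many columns as the table has."""
    if len(query_columns) != len(table_columns):
        raise RuntimeError(
            "query generates %d columns but the table schema has %d"
            % (len(query_columns), len(table_columns)))


def align_to_table(table_columns, query_columns):
    """Build a select list in table order; missing columns become NULL."""
    extra = [c for c in query_columns if c not in table_columns]
    if extra:
        # Columns unknown to the table cannot be fixed by padding.
        raise ValueError("columns not in table: %s" % ", ".join(extra))
    return [c if c in query_columns else "NULL AS " + c for c in table_columns]


# Hypothetical table and query schemas.
table_columns = ["id", "name", "country"]
query_columns = ["id", "name"]

# The unaligned query has too few columns, so the check fails.
try:
    check_insert(table_columns, query_columns)
    mismatch = False
except RuntimeError:
    mismatch = True

# After alignment the column counts agree and the check passes.
select_list = align_to_table(table_columns, query_columns)
check_insert(table_columns, select_list)
```

If the new data has more columns than the table, padding does not help and the table schema itself would have to change, which is the open question in this thread.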
Re: Spark inserting into parquet files with different schema
Posted by Michael Armbrust <mi...@databricks.com>.
What is the error you are getting? It would also be awesome if you could
try with Spark 1.5 when the first preview comes out (hopefully early next
week).
Re: Spark inserting into parquet files with different schema
Posted by Simeon Simeonov <si...@swoop.com>.
Michael, is there an example anywhere that demonstrates how this works with the schema changing over time?
Must the Hive tables be set up as external tables outside of saveAsTable? In my experience, in 1.4.1, writing to a table with SaveMode.Append fails if the schemas don't match.
Thanks,
Sim
Re: Spark inserting into parquet files with different schema
Posted by Michael Armbrust <mi...@databricks.com>.
Older versions of Spark (i.e. when it was still called SchemaRDD instead of
DataFrame) did not support merging different Parquet schemas. However,
Spark 1.4+ should.
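As a rough illustration of what merging Parquet schemas means, here is a plain-Python sketch. It is not Spark's implementation: `merge_schemas` and `conform` are hypothetical helpers, and the schemas are invented for the example. The idea is that the merged schema is the union of the columns seen across files, and rows from files that lack a column read as null there. In Spark itself, this behavior is tied to the Parquet data source's `mergeSchema` option.

```python
from collections import OrderedDict


def merge_schemas(*schemas):
    """Union the columns of several schemas, keeping first-seen order.

    Each schema is a list of (column_name, type_name) pairs.
    Raises ValueError if a column appears with two different types.
    """
    merged = OrderedDict()
    for schema in schemas:
        for name, type_name in schema:
            if name in merged and merged[name] != type_name:
                raise ValueError("conflicting types for column %r" % name)
            merged.setdefault(name, type_name)
    return list(merged.items())


def conform(row, merged_schema):
    """Project a dict row onto the merged schema, using None for missing columns."""
    return tuple(row.get(name) for name, _ in merged_schema)


# Hypothetical schemas: the newer files add a "country" column.
old_schema = [("id", "long"), ("name", "string")]
new_schema = [("id", "long"), ("name", "string"), ("country", "string")]

merged = merge_schemas(old_schema, new_schema)
old_row = conform({"id": 1, "name": "a"}, merged)
new_row = conform({"id": 2, "name": "b", "country": "US"}, merged)
```

The sketch only handles flat schemas and treats any type difference as a conflict; nested types and type widening are left out.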