Posted to user@spark.apache.org by sim <si...@swoop.com> on 2015/08/09 05:58:25 UTC

Re: Spark inserting into parquet files with different schema

Adam, did you find a solution for this?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-inserting-into-parquet-files-with-different-schema-tp20706p24181.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Spark inserting into parquet files with different schema

Posted by Simeon Simeonov <si...@swoop.com>.

Michael, please see http://apache-spark-user-list.1001560.n3.nabble.com/Schema-evolution-in-tables-tt23999.html

The exception is

java.lang.RuntimeException: Relation[ ... ] org.apache.spark.sql.parquet.ParquetRelation2@83a73a05
 requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

Is this behavior expected? Shall I create a JIRA issue if it is not?
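
For readers following the thread: the exception says the inserted rows must supply exactly as many columns as the target relation's schema. The snippet below is only a plain-Python sketch of that constraint and of one possible workaround (padding missing columns with nulls before inserting). It does not use Spark's API; the names align_row, table_columns and new_row are hypothetical and exist only for illustration.

```python
def align_row(row, table_columns):
    """Return one value per table column, using None where the row has no value."""
    unknown = set(row) - set(table_columns)
    if unknown:
        # A column the table does not know about cannot be padded away.
        raise ValueError("columns not in table schema: %s" % sorted(unknown))
    return tuple(row.get(col) for col in table_columns)


table_columns = ["id", "name", "country"]  # hypothetical schema of the existing table
new_row = {"id": 1, "name": "a"}           # incoming data lacks "country"

aligned = align_row(new_row, table_columns)
print(aligned)  # (1, 'a', None)
```

With the row padded this way, the column count matches the table schema, which is the condition the error message describes. Whether padding is acceptable depends on the table; it is an assumption here, not something confirmed in this thread.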

From: Michael Armbrust <mi...@databricks.com>
Date: Monday, August 10, 2015 at 3:44 PM
To: Simeon Simeonov <si...@swoop.com>
Cc: user <us...@spark.apache.org>
Subject: Re: Spark inserting into parquet files with different schema

What is the error you are getting?  It would also be awesome if you could try with Spark 1.5 when the first preview comes out (hopefully early next week).

On Mon, Aug 10, 2015 at 11:41 AM, Simeon Simeonov <si...@swoop.com> wrote:
Michael, is there an example anywhere that demonstrates how this works with the schema changing over time?

Must the Hive tables be set up as external tables outside of saveAsTable? In my experience, in 1.4.1, writing to a table with SaveMode.Append fails if the schemas don't match.

Thanks,
Sim

From: Michael Armbrust <mi...@databricks.com>
Date: Monday, August 10, 2015 at 2:36 PM
To: Simeon Simeonov <si...@swoop.com>
Cc: user <us...@spark.apache.org>
Subject: Re: Spark inserting into parquet files with different schema

Older versions of Spark (i.e., when DataFrame was still called SchemaRDD) did not support merging different Parquet schemas. However, Spark 1.4+ should.

On Sat, Aug 8, 2015 at 8:58 PM, sim <si...@swoop.com> wrote:
Adam, did you find a solution for this?

Re: Spark inserting into parquet files with different schema

Posted by Michael Armbrust <mi...@databricks.com>.

What is the error you are getting?  It would also be awesome if you could
try with Spark 1.5 when the first preview comes out (hopefully early next
week).

On Mon, Aug 10, 2015 at 11:41 AM, Simeon Simeonov <si...@swoop.com> wrote:

> Michael, is there an example anywhere that demonstrates how this works
> with the schema changing over time?
>
> Must the Hive tables be set up as external tables outside of saveAsTable?
> In my experience, in 1.4.1, writing to a table with SaveMode.Append fails
> if the schemas don't match.
>
> Thanks,
> Sim
>
> From: Michael Armbrust <mi...@databricks.com>
> Date: Monday, August 10, 2015 at 2:36 PM
> To: Simeon Simeonov <si...@swoop.com>
> Cc: user <us...@spark.apache.org>
> Subject: Re: Spark inserting into parquet files with different schema
>
> Older versions of Spark (i.e., when DataFrame was still called SchemaRDD)
> did not support merging different Parquet schemas. However, Spark 1.4+
> should.
>
> On Sat, Aug 8, 2015 at 8:58 PM, sim <si...@swoop.com> wrote:
>
>> Adam, did you find a solution for this?

Re: Spark inserting into parquet files with different schema

Posted by Simeon Simeonov <si...@swoop.com>.

Michael, is there an example anywhere that demonstrates how this works with the schema changing over time?

Must the Hive tables be set up as external tables outside of saveAsTable? In my experience, in 1.4.1, writing to a table with SaveMode.Append fails if the schemas don't match.

Thanks,
Sim

From: Michael Armbrust <mi...@databricks.com>
Date: Monday, August 10, 2015 at 2:36 PM
To: Simeon Simeonov <si...@swoop.com>
Cc: user <us...@spark.apache.org>
Subject: Re: Spark inserting into parquet files with different schema

Older versions of Spark (i.e., when DataFrame was still called SchemaRDD) did not support merging different Parquet schemas. However, Spark 1.4+ should.

On Sat, Aug 8, 2015 at 8:58 PM, sim <si...@swoop.com> wrote:
Adam, did you find a solution for this?

Re: Spark inserting into parquet files with different schema

Posted by Michael Armbrust <mi...@databricks.com>.

Older versions of Spark (i.e., when DataFrame was still called SchemaRDD)
did not support merging different Parquet schemas. However, Spark 1.4+
should.
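
To make "merging different Parquet schemas" concrete, here is a small plain-Python sketch of the general idea: take the union of the fields found across files and reject a field that appears with two different types. This is a conceptual illustration only, not Spark's implementation; merge_schemas and the example schemas are hypothetical.

```python
from collections import OrderedDict


def merge_schemas(schemas):
    """Union the (name, type) fields of several schemas, keeping first-seen order."""
    merged = OrderedDict()
    for schema in schemas:
        for name, dtype in schema:
            if name in merged and merged[name] != dtype:
                # The same field with two different types cannot be reconciled here.
                raise ValueError("conflicting types for field %r" % name)
            merged.setdefault(name, dtype)
    return list(merged.items())


old_files = [("id", "int"), ("name", "string")]     # hypothetical schema of older files
new_files = [("id", "int"), ("country", "string")]  # hypothetical schema of newer files

merged = merge_schemas([old_files, new_files])
print(merged)  # [('id', 'int'), ('name', 'string'), ('country', 'string')]
```

Under a merged schema like this, rows from older files would read the added field as null, which is the behavior one would generally expect from schema merging on read.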

On Sat, Aug 8, 2015 at 8:58 PM, sim <si...@swoop.com> wrote:

> Adam, did you find a solution for this?