Posted to user@spark.apache.org by Jorge Machado <jo...@me.com.INVALID> on 2020/05/09 11:28:07 UTC
How to deal with Schema Evolution with the Dataset API
Hello everyone,
One question to the community.
Imagine I have this:
case class Person(age: Int)
spark.read.parquet("inputPath").as[Person]
After a few weeks of coding, I change the class to:
case class Person(age: Int, name: Option[String] = None)
Then, when I run the new code on the same input, it fails, saying that it cannot find `name` in the schema of the Parquet file.
Spark version 2.3.3
What is the best way to guard against or fix this? Regenerating all the data is not an option for us.
Thx
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: How to deal with Schema Evolution with the Dataset API
Posted by Edgardo Szrajber <sz...@yahoo.com.INVALID>.
If you want to keep the dataset, maybe you can try to add a constructor to the case class (through the companion object) that receives only the age.
Bentzi
Sent from Yahoo Mail on Android
On Sat, May 9, 2020 at 17:50, Jorge Machado <jo...@me.com.INVALID> wrote:
Ok, I found a way to solve it. Just pass the schema like this:
val schema = Encoders.product[Person].schema
spark.read.schema(schema).parquet("input")….
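Edgardo's companion-object suggestion above can be sketched as follows. This is a hypothetical illustration, not code from the thread; note that because `name` already has a default value, the extra overload is arguably redundant for plain Scala call sites, and it does not by itself change how Spark infers the schema from the Parquet footer:

```scala
// Evolved case class: `name` was added later with a default of None.
case class Person(age: Int, name: Option[String] = None)

object Person {
  // Explicit overload for the pre-evolution shape, so old call sites
  // that only pass an age keep working unchanged.
  def apply(age: Int): Person = new Person(age, None)
}

// Both construction styles produce the same value:
val old = Person(30)               // age-only overload
val neu = Person(30, Some("Ana"))  // "Ana" is just illustrative test data
```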
Re: How to deal with Schema Evolution with the Dataset API
Posted by Jorge Machado <jo...@me.com.INVALID>.
Ok, I found a way to solve it. Just pass the schema like this:
val schema = Encoders.product[Person].schema
spark.read.schema(schema).parquet("input")….
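Put end to end, Jorge's fix looks roughly like the sketch below. The SparkSession setup, the `local[*]` master, and the app name are assumptions added for a self-contained example (a Spark runtime is required to actually run it); the key step is deriving the schema from the evolved case class instead of letting Spark infer it from the older Parquet footer, so the missing `name` column comes back as null and decodes to `None`:

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

case class Person(age: Int, name: Option[String] = None)

val spark = SparkSession.builder()
  .master("local[*]")            // assumption: local run for illustration
  .appName("schema-evolution")
  .getOrCreate()
import spark.implicits._

// Schema of the *evolved* class: age (int, non-null per encoder rules),
// name (string, nullable because it is an Option).
val schema = Encoders.product[Person].schema

val people = spark.read
  .schema(schema)                // override Parquet's on-disk schema
  .parquet("inputPath")          // path from the original mail
  .as[Person]                    // old rows decode with name = None
```

The trade-off is that the explicit schema silently masks genuine mismatches (e.g. a renamed column would also read back as null), so it is worth validating the expected columns separately if that matters.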