Posted to user@spark.apache.org by Jorge Machado <jo...@me.com.INVALID> on 2020/05/09 14:50:27 UTC
Re: How to deal Schema Evolution with Dataset API
Ok, I found a way to solve it.
Just pass the schema like this:
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Person].schema
spark.read.schema(schema).parquet("input")….
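For anyone finding this thread later, here is a fuller sketch of the same idea. It assumes a local SparkSession and a Parquet directory called "input" written with the old one-field class; the object name, app name and path are made up for illustration. The point is that the read schema comes from the case class rather than from the Parquet files, so a column missing from the files should come back as null and decode to None.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.types.StructType

// Evolved case class: `name` was added after the Parquet files were written.
// It is declared at the top level so that Spark can derive an encoder for it.
case class Person(age: Int, name: Option[String] = None)

object ReadWithEvolvedSchema {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the thread).
    val spark = SparkSession.builder()
      .appName("schema-evolution-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Derive the read schema from the case class instead of the file footer.
    val schema: StructType = Encoders.product[Person].schema

    // Pass the schema explicitly, then convert to a typed Dataset.
    val people: Dataset[Person] = spark.read
      .schema(schema)
      .parquet("input") // hypothetical path
      .as[Person]

    people.show()
    spark.stop()
  }
}
```

This only covers added optional fields. A new field that is not an Option (or otherwise nullable) would presumably still fail when old rows are decoded.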
> On 9. May 2020, at 13:28, Jorge Machado <jo...@me.com.INVALID> wrote:
>
> Hello everyone,
>
> One question to the community.
>
> Imagine I have this
>
> case class Person(age: Int)
>
> spark.read.parquet("inputPath").as[Person]
>
>
> After a few weeks of coding I change the class to:
> case class Person(age: Int, name: Option[String] = None)
>
>
> When I then run the new code on the same input, it fails, saying that it cannot find name in the schema of the Parquet file.
>
> Spark version 2.3.3
>
> What is the best way to guard against or fix this? Regenerating all the data does not seem to be an option for us.
>
> Thx
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
Re: How to deal Schema Evolution with Dataset API
Posted by Edgardo Szrajber <sz...@yahoo.com.INVALID>.
If you want to keep the Dataset, maybe you can try adding a constructor to the case class (through the companion object) that receives only the age.
Bentzi
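A minimal sketch of what such a companion-object constructor could look like, in plain Scala without Spark. The demo object and the sample values are invented for illustration. Note that this only makes it convenient to build a Person from an age alone in your own code; as far as I know it does not by itself change how .as[Person] matches the case class fields against the columns in the Parquet file.

```scala
// Evolved record: `name` did not exist when the old data was written.
case class Person(age: Int, name: Option[String])

object Person {
  // Extra factory that takes only the age; the name falls back to None.
  def apply(age: Int): Person = Person(age, None)
}

object CompanionDemo {
  def main(args: Array[String]): Unit = {
    // Old-style construction, as if the record came from legacy data.
    val legacy = Person(42)
    // New-style construction with the added field.
    val current = Person(42, Some("Ana"))
    println(legacy)
    println(current)
  }
}
```

The case class deliberately has no default argument here, so the one-argument apply is the only way to leave the name out and there is no ambiguity between overloads.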