Posted to user@spark.apache.org by Jorge Machado <jo...@me.com.INVALID> on 2020/05/09 14:50:27 UTC

Re: How to deal Schema Evolution with Dataset API

Ok, I found a way to solve it. 

Just pass the schema like this: 

val schema = Encoders.product[Person].schema

spark.read.schema(schema).parquet("input")...
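Spelled out, the trick is to derive the schema the Dataset expects from the case class encoder and force the reader to use it, so a column that is missing in older Parquet files comes back as null and the Option field decodes to None. A minimal sketch of that pattern (the SparkSession setup, app name, and the "input" path are illustrative assumptions, not from the thread):

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Hypothetical entry point; any existing SparkSession works the same way.
val spark = SparkSession.builder().appName("schema-evolution").getOrCreate()
import spark.implicits._

// The evolved class: the new field is optional with a default.
case class Person(age: Int, name: Option[String] = None)

// Derive the expected schema from the case class itself.
val schema = Encoders.product[Person].schema

// With an explicit schema, columns absent from old Parquet files are
// filled with null, which the encoder then maps to None.
val people = spark.read.schema(schema).parquet("input").as[Person]
```

Without the explicit schema, the reader infers the schema from the files, and the encoder fails when it cannot find the new column there.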

> On 9. May 2020, at 13:28, Jorge Machado <jo...@me.com.INVALID> wrote:
> 
> Hello everyone, 
> 
> One question to the community. 
> 
> Imagine I have this
> 
> 	case class Person(age: Int)
> 
> 	spark.read.parquet("inputPath").as[Person]
> 
> 
> After a few weeks of coding I change the class to: 
> 	case class Person(age: Int, name: Option[String] = None)
> 
> 
> Then when I run the new code on the same input, it fails, saying that it cannot find the name column in the schema of the Parquet file. 
> 
> Spark version 2.3.3
> 
> What is the best way to guard against or fix this? Regenerating all the data is not an option for us. 
> 
> Thx
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 




Re: How to deal Schema Evolution with Dataset API

Posted by Edgardo Szrajber <sz...@yahoo.com.INVALID>.
If you want to keep the Dataset, maybe you can try adding a constructor to the case class (through the companion object) that receives only the age.

Bentzi
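One way to read that suggestion: give the companion object an apply overload that takes only the age, so call sites written against the old one-field shape keep working after the new field is added. A pure-Scala sketch (names are illustrative; note this helps code that constructs Person directly, whereas the Parquet read still needs the explicit-schema approach above):

```scala
case class Person(age: Int, name: Option[String] = None)

object Person {
  // Extra constructor for the old one-field shape. The default value on
  // `name` already covers most call sites; an explicit overload just makes
  // the backward-compatible entry point visible.
  def apply(age: Int): Person = Person(age, None)
}

val legacy  = Person(30)                 // old call site still compiles
val updated = Person(30, Some("Jorge"))  // new field populated explicitly
```

Scala's overload resolution prefers the exact one-argument apply over the synthesized apply with a default argument, so both forms coexist without ambiguity.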
