Posted to user@spark.apache.org by Jorge Machado <jo...@me.com.INVALID> on 2020/05/09 11:28:07 UTC

How to deal with Schema Evolution with the Dataset API

Hello everyone, 

One question to the community. 

Imagine I have this

	case class Person(age: Int)

	spark.read.parquet("inputPath").as[Person]


After a few weeks of coding I change the class to: 
	case class Person(age: Int, name: Option[String] = None)


Then when I run the new code on the same input it fails, saying that it cannot find the `name` column in the schema of the Parquet file. 

Spark version 2.3.3

What is the best way to guard against or fix this? Regenerating all the data is not an option for us. 

Thx
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: How to deal with Schema Evolution with the Dataset API

Posted by Edgardo Szrajber <sz...@yahoo.com.INVALID>.
If you want to keep the Dataset, maybe you can try adding a constructor to the case class (through the companion object) that receives only the age.
Bentzi
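
The companion-object idea could be sketched like this (a hypothetical extra `apply` for pre-evolution records; this is only a sketch of the suggestion, not a verified fix for the Parquet read itself, since Spark's encoders work from the schema rather than from constructors):

```scala
// Sketch: evolved case class plus a convenience constructor that
// takes only the fields the old data had.
case class Person(age: Int, name: Option[String] = None)

object Person {
  // Hypothetical constructor for pre-evolution records: only the age.
  def apply(age: Int): Person = Person(age, None)
}
```

With the default value on `name`, `Person(30)` already yields `Person(30, None)`; the explicit overload just makes the pre-evolution shape of the data visible in code.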

 

Re: How to deal with Schema Evolution with the Dataset API

Posted by Jorge Machado <jo...@me.com.INVALID>.
Ok, I found a way to solve it. 

Just pass the schema like this: 

val schema = Encoders.product[Person].schema

spark.read.schema(schema).parquet("input")….
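
Put together, the pattern looks roughly like this (a minimal sketch; the input path, app name, and local master are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Evolved case class: `name` was added after the old Parquet files
// were written, so it is optional with a default of None.
case class Person(age: Int, name: Option[String] = None)

object ReadWithEvolvedSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-evolution-sketch") // illustrative
      .master("local[*]")                 // illustrative; use your real master
      .getOrCreate()
    import spark.implicits._

    // Derive the read schema from the current case class. Columns
    // missing from the old files (here: name) come back as null,
    // which maps to None for Option fields.
    val schema = Encoders.product[Person].schema

    val people = spark.read.schema(schema).parquet("inputPath").as[Person]
    people.show()

    spark.stop()
  }
}
```

The key point is that the schema is imposed by the reader, so old files that lack the new column still load instead of failing the `as[Person]` conversion.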

