Posted to users@nifi.apache.org by Jorge Machado <jo...@me.com> on 2020/05/09 07:07:09 UTC

How to deal with Schema Evolution with the Dataset API

Hello everyone, 

One question to the community. 

Imagine I have this

	case class Person(age: Int)

	spark.read.parquet("inputPath").as[Person]


After a few weeks of coding I changed the class to:
	case class Person(age: Int, name: Option[String] = None)


Then, when I run the new code on the same input, it fails, saying that it cannot find the name column in the schema read from the Parquet file.

Spark version 2.3.3

What is the best way to guard against or fix this? Regenerating all the data does not seem to be an option for us.
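
One workaround that seems to fit (a minimal sketch, untested against our data, and assuming Spark fills columns that are absent from the files with null when an explicit schema is supplied to the reader) would be to derive the expected schema from the case class encoder and pass it in up front:

	import org.apache.spark.sql.Encoders
	import spark.implicits._  // needed for .as[Person]

	case class Person(age: Int, name: Option[String] = None)

	// Hand the full target schema to the reader; a column missing from
	// older Parquet files should then come back as null, which the encoder
	// maps onto the Option field as None.
	val personSchema = Encoders.product[Person].schema
	val people = spark.read.schema(personSchema).parquet("inputPath").as[Person]

Is something like this the recommended approach, or is there a better way?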

Thx 




Re: How to deal with Schema Evolution with the Dataset API

Posted by Mike Thomsen <mi...@gmail.com>.
This should be posted on the Spark user list, not the NiFi one.
