Posted to dev@hudi.apache.org by Rahul Narayanan <ra...@gmail.com> on 2020/12/03 18:58:16 UTC

Fwd: Schema evolution in hudi

---------- Forwarded message ---------
From: Rahul Narayanan <ra...@gmail.com>
Date: Thu, Dec 3, 2020 at 11:46 AM
Subject: Schema evolution in hudi
To: users@hudi.apache.org <us...@hudi.apache.org>

Hi Team,

We are interested in adding new columns, and possibly removing some
columns, in our dataset in the future. I have read that Hudi supports
schema evolution as long as it is backward compatible. As a PoC, I tried
writing a Spark DataFrame to Hudi with an explicit schema, but the write
fails. How can I write a Spark DataFrame to Hudi while specifying the
schema explicitly?

Thanks in advance

Re: Schema evolution in hudi

Posted by Vinoth Chandar <vi...@apache.org>.

Hi Rahul,

On the specific scenario, if you could raise a GitHub support issue with
the steps to reproduce and the stack trace, we can certainly help out.

On the first part, we have relied on Avro schema evolution/compatibility
thus far, where you null out the old columns rather than removing them
(which is very cheap for Parquet storage anyway).
For tools like DeltaStreamer, this is enforced by the external schema
registries. However, you are right that the Spark DataFrame path may need
some more work.
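
To make the Avro rule above concrete, here is a minimal sketch in plain
Python. It does not use Hudi or an Avro library. The helper names
(missing_defaults, null_out) and the "trip" schemas are made up for
illustration, and the check is a simplified version of Avro's reader/writer
field resolution, not what Hudi runs internally.

```python
# Illustrative only: a simplified view of Avro reader/writer field
# resolution. All names and schemas here are hypothetical.

def missing_defaults(writer_schema, reader_schema):
    """Return names of reader fields that old data cannot satisfy.

    A field that exists only in the reader (new) schema needs a default,
    otherwise records written with the writer (old) schema cannot be read.
    Fields that exist only in the writer schema are simply ignored.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    return [
        f["name"]
        for f in reader_schema["fields"]
        if f["name"] not in writer_fields and "default" not in f
    ]


def null_out(record, schema):
    """Align a record to the schema; columns no longer supplied fall back
    to the field default (None for the nullable fields used here)."""
    return {
        f["name"]: record.get(f["name"], f.get("default"))
        for f in schema["fields"]
    }


old_schema = {
    "type": "record",
    "name": "trip",
    "fields": [
        {"name": "uuid", "type": "string"},
        {"name": "fare", "type": ["null", "double"], "default": None},
    ],
}

# Evolved schema: "fare" stays (nullable), "tip" is added with a null default.
new_schema = {
    "type": "record",
    "name": "trip",
    "fields": old_schema["fields"] + [
        {"name": "tip", "type": ["null", "double"], "default": None},
    ],
}

# Incompatible variant: the added field has no default.
bad_schema = {
    "type": "record",
    "name": "trip",
    "fields": old_schema["fields"] + [{"name": "tip", "type": "double"}],
}

compatible = missing_defaults(old_schema, new_schema)
incompatible = missing_defaults(old_schema, bad_schema)

# An incoming row that no longer carries "fare": the column is nulled out.
row = null_out({"uuid": "a1", "tip": 2.5}, new_schema)
```

An empty list from missing_defaults means every added field has a default,
so data written with the old schema can still be read with the new one.
"Removing" a column in this model means keeping it in the schema as a
nullable field and writing null for it going forward.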

Happy to work through this with you on a ticket as well.

thanks
vinoth

On Mon, Dec 7, 2020 at 12:50 PM Rahul Narayanan <ra...@gmail.com>
wrote:

> ---------- Forwarded message ---------
> From: Rahul Narayanan <ra...@gmail.com>
> Date: Thu, Dec 3, 2020 at 11:46 AM
> Subject: Schema evolution in hudi
> To: users@hudi.apache.org <us...@hudi.apache.org>
>
>
> Hi Team,
>
> We are interested in writing new columns and maybe removing some columns in
> the future in our dataset. I have read hudi supports schema evolution if it
> is backward compatible. To do a poc I tried writing a spark data frame to
> hudi using schema but it’s failing. How to write a spark data frame to hudi
> specifying the schema explicitly
>
> Thanks in advance
>
