Posted to dev@hudi.apache.org by Vinay Patil <vi...@gmail.com> on 2021/07/17 14:57:53 UTC

[DISCUSS] Create Spark and Flink utilities module

Hi Team,

As part of https://issues.apache.org/jira/browse/HUDI-1872, we are creating
a separate flink-utilities module. Based on our discussion on the PR,
should we also create a spark-utilities module? The layout would look like
this:

hudi-utilities
├── hudi-flink-utilities
└── hudi-spark-utilities

This would also mean creating separate utilities bundles for Flink and
Spark:

hudi-utilities-bundle
 ├── hudi-flink-utilities-bundle
 └── hudi-spark-utilities-bundle

This is not a backward-compatible change, as users will have to provide an
engine-specific bundle. IMO, since Hudi supports both Flink and Spark, it
would be good to have engine-specific bundles.

What do you think?

Regards,
Vinay Patil

Re: [DISCUSS] Create Spark and Flink utilities module

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Vinay,

I am not sure why we are bundling Parquet with Flink. If we are, could we
try to resolve that?
IMO, that's the route we should take first.

Our bundles don't bundle Spark, Flink, Hadoop, or Parquet, so I think a
single bundle is doable.

Happy to help with specific issues as they come up.

On Tue, Jul 20, 2021 at 9:35 AM Vinay Patil <vi...@gmail.com> wrote:

> Hi Vinoth,
>
> > I wonder if it's possible to structure the code in separate modules, but
> > have a single bundle
>
> Yes, this is possible, initially I started doing the same in this PR -
> https://github.com/apache/hudi/pull/3162 , hence wanted to discuss this
> here, if we create a single bundle, we have to make sure there are no
> dependency conflicts. For example: Flink-Hudi is using a different version
> of parquet as compared to Spark-Hudi because of which the tests started to
> fail with `java.lang.NoSuchMethodError:
> org.apache.parquet.column.ParquetProperties.getColumnIndexTruncateLength()`
>
>
> Regards,
> Vinay Patil

Re: [DISCUSS] Create Spark and Flink utilities module

Posted by Vinay Patil <vi...@gmail.com>.
Hi Vinoth,

> I wonder if it's possible to structure the code in separate modules, but
> have a single bundle

Yes, this is possible. I initially started doing the same in this PR:
https://github.com/apache/hudi/pull/3162 , which is why I wanted to discuss
it here. If we create a single bundle, we have to make sure there are no
dependency conflicts. For example, Flink-Hudi uses a different version of
Parquet than Spark-Hudi, because of which the tests started to fail with
`java.lang.NoSuchMethodError:
org.apache.parquet.column.ParquetProperties.getColumnIndexTruncateLength()`
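
A minimal sketch (not taken from the PR) of how one might check which
Parquet jar is actually resolved on the classpath when this error shows up.
ClasspathProbe is a hypothetical helper that uses only JDK reflection; the
class and method names are the ones from the error above.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Hudi.
final class ClasspathProbe {

    // Returns the jar or directory a class was loaded from,
    // or "bootstrap/unknown" when the JVM does not expose one.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "bootstrap/unknown";
        }
        return src.getLocation().toString();
    }

    // Returns true if the named class is loadable and exposes a public
    // method with the given name (any parameter list).
    static boolean hasMethod(String className, String methodName) {
        try {
            // 'false' avoids running the class's static initializers.
            Class<?> cls = Class.forName(
                className, false, ClasspathProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String className = "org.apache.parquet.column.ParquetProperties";
        String methodName = "getColumnIndexTruncateLength";
        try {
            Class<?> cls = Class.forName(
                className, false, ClasspathProbe.class.getClassLoader());
            System.out.println("Loaded from: " + locationOf(cls));
            System.out.println(
                "Has " + methodName + ": " + hasMethod(className, methodName));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Run with the same classpath as the failing tests, this should print the jar
that supplies ParquetProperties. If the method is reported missing, a Parquet
version that lacks it is probably being resolved ahead of the one the calling
code was compiled against.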


Regards,
Vinay Patil


On Tue, Jul 20, 2021 at 9:46 PM Vinoth Chandar <vi...@apache.org> wrote:

> Hi Vinay.
>
> Thanks for kicking this off.
>
> I wonder if it's possible to structure the code in separate modules, but
> have a single bundle.
> Or is that a painful experience? (if so, can you share what issues we are
> running into?)
>
> We have rarely done backwards incompatible changes and users appreciate
> that.
> So love to understand why this is warranted here.
>
> Thanks
> Vinoth

Re: [DISCUSS] Create Spark and Flink utilities module

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Vinay.

Thanks for kicking this off.

I wonder if it's possible to structure the code in separate modules, but
have a single bundle.
Or is that a painful experience? (if so, can you share what issues we are
running into?)

We have rarely made backwards-incompatible changes, and users appreciate
that.
So I'd love to understand why this is warranted here.

Thanks
Vinoth
