Posted to dev@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2020/02/03 19:16:22 UTC
Re: [DISCUSS] Remove HoodieWriteClient
Yes. What I am trying to get across is that we will continue to invest in both.
I agree 100% that the Spark datasource is a primary entry point for users.
On Mon, Jan 27, 2020 at 1:59 AM hmatu <hm...@foxmail.com> wrote:
> Thanks.
>
>
>
> IMO, we should focus more on the Spark datasource level, not on
> compatibility at the HoodieClient level.
>
>
> Thanks,
> Hmatu
>
>
>
>
>
> ------------------ Original ------------------
> From: "Vinoth Chandar"<vinoth@apache.org>;
> Date: Mon, Jan 27, 2020 02:45 AM
> To: "dev"<dev@hudi.apache.org>;
>
> Subject: Re: [DISCUSS] Remove HoodieWriteClient
>
>
>
> The datasource and deltastreamer are both built on top of the
> HoodieWriteClient, so we cannot remove it. Plus, the RDD-level API is
> actually more efficient for ingesting data from, say, Kafka. We can go from
> avro to parquet or avro to avro directly (as opposed to avro -> row ->
> parquet, or avro -> row -> avro). This is even one of the reasons for
> Hudi's design. RFC-13 will change a bunch of things here.
>
> But we do need the RDD API, IMO.
>
> On Sun, Jan 26, 2020 at 8:13 AM hmatu <hmantu@foxmail.com> wrote:
>
> > Hi guys,
> >
> >
> > As we know, the Hudi project contains both HoodieWriteClient and
> > HoodieSparkSource level frameworks. But maybe 99% of users just use
> > HoodieSparkSource, except for Uber. So I suggest removing
> > HoodieWriteClient. WDYT?
> >
> >
> > Thanks
> > Hmatu