Posted to dev@hudi.apache.org by hmatu <hm...@foxmail.com> on 2020/01/26 18:13:06 UTC

[DISCUSS] Remove HoodieWriteClient

Hi guys,


As we know, the Hudi project contains both the HoodieWriteClient and the HoodieSparkSource level frameworks. But maybe 99% of users just use HoodieSparkSource, except for Uber. So I suggest removing HoodieWriteClient. WDYT?


Thanks
Hmatu

Re: [DISCUSS] Remove HoodieWriteClient

Posted by Vinoth Chandar <vi...@apache.org>.

Yes. What I am trying to get across is that we will continue to invest in both.
Agree 100% that the Spark datasource is a primary entry point for users.

On Mon, Jan 27, 2020 at 1:59 AM hmatu <hm...@foxmail.com> wrote:

> Thanks.
>
>
>
> IMO, we should focus more on the SparkDatasource level, not on compatibility
> with the HoodieClient level.
>
>
> Thanks,
> Hmatu
>
>
>
>
>
> ------------------ Original ------------------
> From: "Vinoth Chandar"<vinoth@apache.org>;
> Date: Mon, Jan 27, 2020 02:45 AM
> To: "dev"<dev@hudi.apache.org>;
>
> Subject: Re: [DISCUSS] Remove HoodieWriteClient
>
>
>
> The datasource and deltastreamer are all built on top of the
> HoodieWriteClient.. So, we cannot remove it. Plus, the RDD level API is
> actually more efficient for ingesting data from say Kafka. We can go from
> avro to parquet or avro to avro directly (as opposed to avro -> row ->
> parquet, or avro -> row -> avro). This is one of the reasons for Hudi's
> design even.. RFC-13 will change a bunch of things here..
>
> But we do need the RDD api IMO
>
> On Sun, Jan 26, 2020 at 8:13 AM hmatu <hmantu@foxmail.com> wrote:
>
> > Hi guys,
> >
> >
> > As we know, the Hudi project contains both the HoodieWriteClient and the
> > HoodieSparkSource level frameworks. But maybe 99% of users just use
> > HoodieSparkSource, except for Uber. So I suggest removing
> > HoodieWriteClient. WDYT?
> >
> >
> > Thanks
> > Hmatu

Re: [DISCUSS] Remove HoodieWriteClient

Posted by hmatu <hm...@foxmail.com>.

Thanks. 



IMO, we should focus more on the SparkDatasource level, not on compatibility with the HoodieClient level.


Thanks,
Hmatu





------------------ Original ------------------
From: "Vinoth Chandar"<vinoth@apache.org>;
Date: Mon, Jan 27, 2020 02:45 AM
To: "dev"<dev@hudi.apache.org>;

Subject: Re: [DISCUSS] Remove HoodieWriteClient



The datasource and deltastreamer are all built on top of the
HoodieWriteClient.. So, we cannot remove it. Plus, the RDD level API is
actually more efficient for ingesting data from say Kafka. We can go from
avro to parquet or avro to avro directly (as opposed to avro -> row ->
parquet, or avro -> row -> avro). This is one of the reasons for Hudi's
design even.. RFC-13 will change a bunch of things here..

But we do need the RDD api IMO

On Sun, Jan 26, 2020 at 8:13 AM hmatu <hmantu@foxmail.com> wrote:

> Hi guys,
>
>
> As we know, the Hudi project contains both the HoodieWriteClient and the
> HoodieSparkSource level frameworks. But maybe 99% of users just use
> HoodieSparkSource, except for Uber. So I suggest removing
> HoodieWriteClient. WDYT?
>
>
> Thanks
> Hmatu

Re: [DISCUSS] Remove HoodieWriteClient

Posted by Vinoth Chandar <vi...@apache.org>.

The datasource and deltastreamer are all built on top of the
HoodieWriteClient.. So, we cannot remove it. Plus, the RDD level API is
actually more efficient for ingesting data from say Kafka. We can go from
avro to parquet or avro to avro directly (as opposed to avro -> row ->
parquet, or avro -> row -> avro). This is one of the reasons for Hudi's
design even.. RFC-13 will change a bunch of things here..

But we do need the RDD api IMO

On Sun, Jan 26, 2020 at 8:13 AM hmatu <hm...@foxmail.com> wrote:

> Hi guys,
>
>
> As we know, the Hudi project contains both the HoodieWriteClient and the
> HoodieSparkSource level frameworks. But maybe 99% of users just use
> HoodieSparkSource, except for Uber. So I suggest removing
> HoodieWriteClient. WDYT?
>
>
> Thanks
> Hmatu