
Posted to dev@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2020/02/03 19:16:22 UTC

Re: [DISCUSS] Remove HoodieWriteClient

Yes. What I am trying to get across is that we will continue to invest in both.
Agree 100% that the Spark datasource is the primary entry point for users.

On Mon, Jan 27, 2020 at 1:59 AM hmatu <hm...@foxmail.com> wrote:

> Thanks.
>
> IMO, we should focus more on the Spark datasource level rather than on
> compatibility with the HoodieClient level.
>
> Thanks,
> Hmatu
>
> ------------------ Original ------------------
> From: "Vinoth Chandar" <vinoth@apache.org>;
> Date: Mon, Jan 27, 2020 02:45 AM
> To: "dev" <dev@hudi.apache.org>;
> Subject: Re: [DISCUSS] Remove HoodieWriteClient
>
> The datasource and DeltaStreamer are both built on top of
> HoodieWriteClient, so we cannot remove it. Plus, the RDD-level API is
> actually more efficient for ingesting data from, say, Kafka. We can go
> from Avro to Parquet or from Avro to Avro directly (as opposed to
> Avro -> Row -> Parquet, or Avro -> Row -> Avro). This is even one of the
> reasons for Hudi's design. RFC-13 will change a bunch of things here.
>
> But we do need the RDD API, IMO.
>
> On Sun, Jan 26, 2020 at 8:13 AM hmatu <hmantu@foxmail.com> wrote:
>
> > Hi guys,
> >
> > As we know, the Hudi project contains both the HoodieWriteClient and the
> > HoodieSparkSource-level frameworks. But maybe 99% of users just use
> > HoodieSparkSource, with the exception of Uber. So I suggest removing
> > HoodieWriteClient. WDYT?
> >
> > Thanks
> > Hmatu