Posted to dev@hudi.apache.org by Rui Li <li...@gmail.com> on 2020/10/10 13:16:15 UTC

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Thanks for pointing me to the RFC! When using Spark to write a table, we
need to launch several Spark jobs, e.g. to search index and tag locations,
workload profiling, etc. Now RFC-13 aims to encapsulate all these in a
single Flink DAG, right? Do we have plans about how to achieve this?

On Tue, Sep 29, 2020 at 9:40 AM 王** <wx...@126.com> wrote:

> Hi Rui,
> Thanks for asking. The design for Flink integration can be found here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520
> Please ping me if you have any questions.
>
>
> At 2020-09-28 20:43:22, "Rui Li" <li...@apache.org> wrote:
> >Hello,
> >
> >Very excited to see the on-going efforts for Flink integration. I wonder
> >whether there's a design doc for this feature? I would like to learn more
> >and hopefully to make some contributions.
> >
> >On Fri, Sep 25, 2020 at 6:27 AM nishith agarwal <n3...@gmail.com> wrote:
> >
> >> Yes, we have some ideas around schema evolution and have discussed with
> >> Balaji before as well. I'm going to put these thoughts down and share them on
> >> the cWiki for all of us to jam. Realistically, I don't think we can hit it in
> >> 0.7.0. We already have a pretty strong list of items for 0.7.0.
> >>
> >> Spark 3 SQL syntax like MERGE will definitely boost usability!
> >>
> >> Thanks,
> >> Nishith
> >>
> >> On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar <vi...@apache.org> wrote:
> >>
> >> > On schema evolution, Nishith and Balaji were both thinking about this.
> >> > Maybe there is a proposal in the works?
> >> > I would guess we will not be able to hit it in 0.7.0 though. Maybe by the
> >> > end of year/0.8.0?
> >> >
> >> > Tanu, thanks for the kind words! Definitely, if we pull together, we will
> >> > get there sooner. Looking forward to more contributions! :)
> >> >
> >> > >We were actually thinking of moving to Spark 3.0 but thought it’s too
> >> > >early with 0.6 release. Is 0.6 not fully tested with Spark 3.0?
> >> > That's correct. There is a PR already open for this. We expect this to be
> >> > fixed in 0.6.1 shortly and we will unlock Spark 3.0 support.
> >> >
> >> > 0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
> >> > have had this either had an unfair head start or built ahead with Spark 3
> >> > in mind. :))
> >> > We will close this gap.
> >> >
> >> > On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <xu.shiyan.raymond@gmail.com> wrote:
> >> >
> >> > > +1 on the full schema evolution support. May I know which ticket this is
> >> > > related to? thanks.
> >> > >
> >> > > On Wed, Sep 23, 2020 at 5:20 AM leesf <le...@gmail.com> wrote:
> >> > >
> >> > > > Thanks Vinoth. We would also consider supporting full schema evolution
> >> > > > (such as dropping some fields) in Hudi in 0.7.0, since right now Hudi
> >> > > > follows Avro schema compatibility.
> >> > > >
> >> > > > tanu dua <ta...@gmail.com> wrote on Wed, Sep 23, 2020 at 12:38 PM:
> >> > > >
> >> > > > > Thanks Vinoth. These are really exciting items, and hats off to you and
> >> > > > > the team for pushing the releases swiftly and improving the framework all
> >> > > > > the time. I hope someday I will start contributing once I am free from my
> >> > > > > major deliverables and have a better understanding of the nitty-gritty
> >> > > > > details of Hudi.
> >> > > > >
> >> > > > > You have mentioned Spark 3.0 support in the next release. We were actually
> >> > > > > thinking of moving to Spark 3.0 but thought it’s too early with 0.6
> >> > > > > release. Is 0.6 not fully tested with Spark 3.0?
> >> > > > >
> >> > > > > On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <vinoth@apache.org> wrote:
> >> > > > >
> >> > > > > > Hello all,
> >> > > > > >
> >> > > > > > Pursuant to our conversation around release planning, I am happy to share
> >> > > > > > the initial set of proposals for the next minor/major releases (minor
> >> > > > > > release ofc can go out based on time)
> >> > > > > >
> >> > > > > > *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..)*
> >> > > > > >
> >> > > > > > Flink/Writer common refactoring for Flink
> >> > > > > > Small file handling support w/o caching
> >> > > > > > Spark3 Support
> >> > > > > > Remaining bootstrap items
> >> > > > > > Completing bulk_insertV2 (sort mode, de-dup etc)
> >> > > > > >
> >> > > > > > Full list here :
> >> > > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348168
> >> > > > > >
> >> > > > > > *0.7.0 with major new features*
> >> > > > > >
> >> > > > > > RFC-15: metadata, range index (w/ spark support), bloom index (eliminate
> >> > > > > > file listing, query pruning, improve bloom index perf)
> >> > > > > > RFC-08: Record Index (to solve global index scalability/perf)
> >> > > > > > RFC-18/19: Clustering/Insert overwrite
> >> > > > > > Spark 3 based datasource rewrite (structured streaming sink/source,
> >> > > > > > DELETE/MERGE)
> >> > > > > > Incremental Query on logs (Hive, Spark)
> >> > > > > > Parallel writing support
> >> > > > > > Redesign of marker files for S3
> >> > > > > > Stretch: ORC, PrestoSQL Support
> >> > > > > >
> >> > > > > > Full list here :
> >> > > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348721
> >> > > > > >
> >> > > > > > Please chime in with your thoughts. If you would like to commit to
> >> > > > > > contributing a feature towards a release, please do so by marking
> >> > > > > > *`Fix Version/s`* field with that release number.
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > > Vinoth
> >
> >--
> >Cheers,
> >Rui Li
>


-- 
Best regards!
Rui Li

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by wangxianghu <wx...@126.com>.

Hi Rui,
This article may answer your question: https://docs.google.com/document/d/1bYYPg3OJvAivTCVf9-2hBq1BT6jS7s64lyuMuM4ATV4/edit#heading=h.qn6yq5t0ot50
Chinese version: https://mp.weixin.qq.com/s/LvKaj5ytk6imEU5Dc1Sr5Q
