Posted to users@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2020/09/23 02:55:02 UTC

[DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Hello all,

Pursuant to our conversation around release planning, I am happy to share
the initial set of proposals for the next minor/major releases (minor
release ofc can go out based on time)

*Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..) *
Flink/Writer common refactoring for Flink
Small file handling support w/o caching
Spark3 Support
Remaining bootstrap items
Completing bulk_insertV2 (sort mode, de-dup etc)
Full list here :
https://issues.apache.org/jira/projects/HUDI/versions/12348168

*0.7.0 with major new features *
RFC-15: metadata, range index (w/ spark support), bloom index (eliminate
file listing, query pruning, improve bloom index perf)
RFC-08: Record Index (to solve global index scalability/perf)
RFC-18/19: Clustering/Insert overwrite
Spark 3 based datasource rewrite (structured streaming sink/source,
DELETE/MERGE)
Incremental Query on logs (Hive, Spark)
Parallel writing support
Redesign of marker files for S3
Stretch: ORC, PrestoSQL Support

Full list here :
https://issues.apache.org/jira/projects/HUDI/versions/12348721

Please chime in with your thoughts. If you would like to commit to
contributing a feature towards a release, please do so by marking the *`Fix
Version/s`* field with that release number.

Thanks
Vinoth

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by wangxianghu <wx...@126.com>.

Hi Rui
This article may answer your question: https://docs.google.com/document/d/1bYYPg3OJvAivTCVf9-2hBq1BT6jS7s64lyuMuM4ATV4/edit#heading=h.qn6yq5t0ot50
Chinese version: https://mp.weixin.qq.com/s/LvKaj5ytk6imEU5Dc1Sr5Q


Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by Rui Li <li...@gmail.com>.

Thanks for pointing me to the RFC! When using Spark to write a table, we
need to launch several Spark jobs, e.g. to search index and tag locations,
workload profiling, etc. Now RFC-13 aims to encapsulate all these in a
single Flink DAG, right? Do we have plans about how to achieve this?

-- 
Best regards!
Rui Li

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by 王** <wx...@126.com>.

Hi Rui
Thanks for asking, the design for Flink integration can be found here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520
Please ping me if you have any questions.



Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by Rui Li <li...@apache.org>.

Hello,

Very excited to see the on-going efforts for Flink integration. I wonder
whether there's a design doc for this feature? I would like to learn more
and hopefully to make some contributions.


-- 
Cheers,
Rui Li

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by nishith agarwal <n3...@gmail.com>.

Yes, we have some ideas around schema evolution and have discussed them with
Balaji before as well. I'm going to put these thoughts down and share them on
the cWiki for all of us to jam on. Realistically, I don't think we can hit it in
0.7.0. We already have a pretty strong list of items for 0.7.0.

Spark 3 SQL syntax like MERGE will definitely boost usability!

Thanks,
Nishith

On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar <vi...@apache.org> wrote:

> On schema evolution, Nishith and Balaji were both thinking about this.
> Maybe there is a proposal in the works?
> I would guess we will not be able to hit it in 0.7.0 though. Maybe by the
> end of year/0.8.0?
>
> Tanu, thanks for the kind words! Definitely, if we pull together, we will
> get there sooner. Looking forward to more contributions! :)
>
> > We were actually thinking of moving to Spark 3.0 but thought it’s too
> > early with 0.6 release. Is 0.6 not fully tested with Spark 3.0?
> That's correct. There is a PR already open for this. We expect this to be
> fixed in 0.6.1 shortly and we will unlock Spark 3.0 support.
>
> 0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
> have had this either had an unfair head start or built ahead with Spark 3
> in mind. :))
> We will close this gap.
>
> On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <xu...@gmail.com> wrote:
>
> > +1 on the full schema evolution support. May I know which ticket this is
> > related to? Thanks.
> >
> > On Wed, Sep 23, 2020 at 5:20 AM leesf <le...@gmail.com> wrote:
> >
> > > Thanks Vinoth, also we would consider supporting full schema evolution
> > > (such as dropping some fields) in Hudi in 0.7.0, since right now Hudi
> > > follows Avro schema compatibility.
> > >
> > > On Wed, Sep 23, 2020 at 12:38 PM tanu dua <ta...@gmail.com> wrote:
> > >
> > > > Thanks Vinoth. These are really exciting items and hats off to you and
> > > > team in pushing the releases swiftly and improving the framework all the
> > > > time. I hope someday I will start contributing once I get free from my
> > > > major deliverables and have more understanding of the nitty-gritty
> > > > details of Hudi.
> > > >
> > > > You have mentioned Spark 3.0 support in next release. We were actually
> > > > thinking of moving to Spark 3.0 but thought it’s too early with 0.6
> > > > release. Is 0.6 not fully tested with Spark 3.0?
> > > >
> > > > On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <vi...@apache.org> wrote:
> > > >
> > >
> > > > > Hello all,
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > Pursuant to our conversation around release planning, I am happy to
> > > share
> > >
> > > > >
> > >
> > > > > the initial set of proposals for the next minor/major releases
> (minor
> > >
> > > > >
> > >
> > > > > release ofc can go out based on time)
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > *Next Minor version 0.6.1 (with stuff that did not make it to
> > 0.6.0..)
> > > *
> > >
> > > > >
> > >
> > > > > Flink/Writer common refactoring for Flink
> > >
> > > > >
> > >
> > > > > Small file handling support w/o caching
> > >
> > > > >
> > >
> > > > > Spark3 Support
> > >
> > > > >
> > >
> > > > > Remaining bootstrap items
> > >
> > > > >
> > >
> > > > > Completing bulk_insertV2 (sort mode, de-dup etc)
> > >
> > > > >
> > >
> > > > > Full list here :
> > >
> > > > >
> > >
> > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348168
> > >
> > > > >
> > >
> > > > > <https://issues.apache.org/jira/projects/HUDI/versions/12348168>
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > *0.7.0 with major new features *
> > >
> > > > >
> > >
> > > > > RFC-15: metadata, range index (w/ spark support), bloom index
> > > (eliminate
> > >
> > > > >
> > >
> > > > > file listing, query pruning, improve bloom index perf)
> > >
> > > > >
> > >
> > > > > RFC-08: Record Index (to solve global index scalability/perf)
> > >
> > > > >
> > >
> > > > > RFC-18/19: Clustering/Insert overwrite
> > >
> > > > >
> > >
> > > > > Spark 3 based datasource rewrite (structured streaming sink/source,
> > >
> > > > >
> > >
> > > > > DELETE/MERGE)
> > >
> > > > >
> > >
> > > > > Incremental Query on logs (Hive, Spark)
> > >
> > > > >
> > >
> > > > > Parallel writing support
> > >
> > > > >
> > >
> > > > > Redesign of marker files for S3
> > >
> > > > >
> > >
> > > > > Stretch: ORC, PrestoSQL Support
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > Full list here :
> > >
> > > > >
> > >
> > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348721
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > Please chime in with your thoughts. If you would like to commit to
> > >
> > > > >
> > >
> > > > > contributing a feature towards a release, please do so by marking
> > *`Fix
> > >
> > > > >
> > >
> > > > > Version/s`* field with that release number.
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > > Thanks
> > >
> > > > >
> > >
> > > > > Vinoth
> > >
> > > > >
> > >
> > > > >
> > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by Vinoth Chandar <vi...@apache.org>.

On schema evolution, Nishith and Balaji were both thinking about this. Maybe
there is a proposal in the works?
I would guess we will not be able to land it in 0.7.0, though. Maybe by the
end of the year/0.8.0?

Tanu, thanks for the kind words! Definitely, if we pull together, we will get
there sooner. Looking forward to more contributions! :)

> We were actually thinking of moving to Spark 3.0 but thought it’s too
> early with 0.6 release. Is 0.6 not fully tested with Spark 3.0?

That's correct. There is a PR already open for this. We expect it to be
fixed in 0.6.1 shortly, which will unlock Spark 3.0 support.

0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
have had this either had an unfair head start or were built with Spark 3
in mind. :))
We will close this gap.

On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <xu...@gmail.com>
wrote:

> +1 on the full schema evolution support. May I know which ticket this is
> related to? thanks.

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by Raymond Xu <xu...@gmail.com>.

+1 on the full schema evolution support. May I know which ticket this is
related to? Thanks.

On Wed, Sep 23, 2020 at 5:20 AM leesf <le...@gmail.com> wrote:

> Thanks Vinoth, also we would consider support full schema evolution(such as
> drop some fields) of hudi in 0.7.0, since right now hudi follows avro
> schema compatibility

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by leesf <le...@gmail.com>.

Thanks Vinoth. We would also consider supporting full schema evolution (such as
dropping some fields) in Hudi in 0.7.0, since right now Hudi follows Avro
schema compatibility.

On Wed, Sep 23, 2020 at 12:38 PM tanu dua <ta...@gmail.com> wrote:

> Thanks Vinoth. These are really exciting items and hats off to you and team
> in pushing the releases swiftly and improving the framework all the time. I
> hope someday I will start contributing once I will get free from my major
> deliverables and have more understanding the nitty gritty details of Hudi.
>
> You have mentioned Spark3.0 support in next release. We were actually
> thinking of moving to Spark 3.0 but thought it’s too early with 0.6
> release. Is 0.6 not fully tested with Spark 3.0 ?

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

Posted by tanu dua <ta...@gmail.com>.

Thanks Vinoth. These are really exciting items, and hats off to you and the team
for pushing out releases swiftly and improving the framework all the time. I
hope to start contributing someday, once I am free of my major
deliverables and have a better understanding of the nitty-gritty details of Hudi.

You mentioned Spark 3.0 support in the next release. We were actually
thinking of moving to Spark 3.0 but thought it was too early with the 0.6
release. Is 0.6 not fully tested with Spark 3.0?


On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <vi...@apache.org> wrote:

> Hello all,
>
> Pursuant to our conversation around release planning, I am happy to share
> the initial set of proposals for the next minor/major releases (minor
> release ofc can go out based on time)
>
> *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..) *
> Flink/Writer common refactoring for Flink
> Small file handling support w/o caching
> Spark3 Support
> Remaining bootstrap items
> Completing bulk_insertV2 (sort mode, de-dup etc)
> Full list here :
> https://issues.apache.org/jira/projects/HUDI/versions/12348168
> <https://issues.apache.org/jira/projects/HUDI/versions/12348168>
>
> *0.7.0 with major new features *
> RFC-15: metadata, range index (w/ spark support), bloom index (eliminate
> file listing, query pruning, improve bloom index perf)
> RFC-08: Record Index (to solve global index scalability/perf)
> RFC-18/19: Clustering/Insert overwrite
> Spark 3 based datasource rewrite (structured streaming sink/source,
> DELETE/MERGE)
> Incremental Query on logs (Hive, Spark)
> Parallel writing support
> Redesign of marker files for S3
> Stretch: ORC, PrestoSQL Support
>
> Full list here :
> https://issues.apache.org/jira/projects/HUDI/versions/12348721
>
> Please chime in with your thoughts. If you would like to commit to
> contributing a feature towards a release, please do so by marking *`Fix
> Version/s`* field with that release number.
>
> Thanks
> Vinoth