Posted to users@hudi.apache.org by Bhavani Sudha <bh...@apache.org> on 2020/08/25 03:36:46 UTC

[ANNOUNCE] Apache Hudi 0.6.0 released

The Apache Hudi team is pleased to announce the release of Apache Hudi
0.6.0.

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
Incrementals. Apache Hudi manages storage of large analytical datasets on
DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
provides the ability to query them.

This release comes 2 months after 0.5.3. It includes more than 200 resolved
issues, comprising new features, performance improvements, general
improvements and bug fixes. Hudi 0.6.0 introduces mechanisms to efficiently
bootstrap large datasets into Hudi without having to copy the data
(experimental feature), via both the Spark datasource writer and the
DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be
faster than the bloom index in cases where updates/deletes are spread across
a large portion of the table. With this version, rollbacks are done using
marker files, and a supporting upgrade and downgrade infrastructure is
provided to users for a smooth transition. The HoodieMultiDeltaStreamer tool
(experimental feature) is added in this version to support ingesting
multiple Kafka streams in a single DeltaStreamer deployment, improving the
operational experience. Bulk inserts are further improved by avoiding any
dataframe-to-RDD conversions, accompanied by configurable sorting modes.
While this conversion of dataframe to RDD is not a bottleneck for
upserts/deletes, subsequent releases will expand this to other write
operations. Other performance improvements include support for async
compaction for Spark streaming writes.
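
As a rough illustration of how the new index and the bulk insert sorting
modes might be selected from the Spark datasource writer, here is a minimal
sketch in Python. The helper name hudi_bulk_insert_options is hypothetical,
and the option keys and values are assumptions recalled from the Hudi
configuration reference, so please check them against the 0.6.0 docs before
relying on them.

```python
# Hypothetical helper: assembles a dict of Hudi write options.
# The option keys below are assumptions based on the Hudi configuration
# reference; they are not taken from this announcement.
ALLOWED_SORT_MODES = {"GLOBAL_SORT", "PARTITION_SORT", "NONE"}


def hudi_bulk_insert_options(table_name, record_key, partition_path,
                             precombine, index_type="SIMPLE",
                             sort_mode="GLOBAL_SORT"):
    """Return write options for a bulk insert using the simple index."""
    if sort_mode not in ALLOWED_SORT_MODES:
        raise ValueError("unsupported sort mode: %s" % sort_mode)
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_path,
        "hoodie.datasource.write.precombine.field": precombine,
        # Bulk insert through the row-based writer, which is assumed to be
        # the path that avoids the dataframe-to-RDD conversion.
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.row.writer.enable": "true",
        # Select the simple index instead of the bloom index.
        "hoodie.index.type": index_type,
        # Configurable sorting mode for bulk inserts.
        "hoodie.bulkinsert.sort.mode": sort_mode,
    }


opts = hudi_bulk_insert_options("trips", "uuid", "region", "ts")
for key in sorted(opts):
    print(key, "=", opts[key])
```

With a Spark session and the Hudi bundle on the classpath, a dict like this
would typically be passed along the lines of
df.write.format("hudi").options(**opts).mode("append").save(base_path),
where df and base_path are placeholders for your own dataframe and table
location.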

For details on how to use Hudi, please look at the quick start page located
at:
https://hudi.apache.org/docs/quick-start-guide.html

If you'd like to download the source release, you can find it here:
https://github.com/apache/hudi/releases/tag/release-0.6.0

You can read more about the release (including release notes) here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663

We would like to thank all contributors, the community, and the Apache
Software Foundation for enabling this release and we look forward to
continued collaboration. We welcome your help and feedback. For more
information on how to report problems, and to get involved, visit the
project website at:
http://hudi.apache.org/

Thanks to everyone involved!
- Bhavani Sudha

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by Bhavani Sudha <bh...@apache.org>.
Moving announce@ to bcc to avoid disruptions.

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by leesf <le...@gmail.com>.
Great, thanks Sudha and all involved.

On Tue, Aug 25, 2020 at 1:17 PM Pratyaksh Sharma <pr...@gmail.com> wrote:

> Great news! :)
>
> On Tue, Aug 25, 2020 at 10:09 AM Vinoth Chandar <vi...@apache.org> wrote:
>
> > - announce
> >
> > Folks, please keep the follow ups to dev@ and users@
> >
> >
> >
> > On Mon, Aug 24, 2020 at 9:26 PM vino yang <ya...@gmail.com> wrote:
> >
> > > Great news!
> > >
> > > Thanks to Bhavani Sudha for driving the release! And thanks to every one of
> > > the whole community!
> > >
> > > Best,
> > > Vino
> > >

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Great news! :)

On Tue, Aug 25, 2020 at 10:09 AM Vinoth Chandar <vi...@apache.org> wrote:

> - announce
>
> Folks, please keep the follow ups to dev@ and users@
>
>
>
> On Mon, Aug 24, 2020 at 9:26 PM vino yang <ya...@gmail.com> wrote:
>
> > Great news!
> >
> > Thanks to Bhavani Sudha for driving the release! And thanks to every one of
> > the whole community!
> >
> > Best,
> > Vino
> >

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by Vinoth Chandar <vi...@apache.org>.
- announce

Folks, please keep the follow ups to dev@ and users@



On Mon, Aug 24, 2020 at 9:26 PM vino yang <ya...@gmail.com> wrote:

> Great news!
>
> Thanks to Bhavani Sudha for driving the release! And thanks to every one of
> the whole community!
>
> Best,
> Vino
>

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by vino yang <ya...@gmail.com>.
Great news!

Thanks to Bhavani Sudha for driving the release! And thanks to every one of
the whole community!

Best,
Vino

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by Sivabalan <n....@gmail.com>.
Great, excited for the release! Thanks Sudha for driving the release.


On Tue, Aug 25, 2020 at 12:06 AM Bhavani Sudha <bh...@gmail.com>
wrote:

> Moving announce@ to bcc to avoid disruptions.


-- 
Regards,
-Sivabalan

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Posted by Bhavani Sudha <bh...@gmail.com>.
Moving announce@ to bcc to avoid disruptions.

On Mon, Aug 24, 2020 at 8:36 PM Bhavani Sudha <bh...@apache.org>
wrote:

> The Apache Hudi team is pleased to announce the release of Apache Hudi
> 0.6.0.
>
> Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
> Incrementals. Apache Hudi manages storage of large analytical datasets on
> DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
> provides the ability to query them.
>
> This release comes 2 months after 0.5.3. It includes more than 200
> resolved issues, comprising new features, perf improvements, as well as
> general improvements and bug-fixes. Hudi 0.6.0 introduces mechanisms to
> efficiently bootstrap large datasets into Hudi without having to copy the
> data (experimental feature), via both Spark datasource writer and
> DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be
> faster than bloom index for cases where updates/deletes spread across a
> large portion of the table. With this version, rollbacks are done using
> marker files and a supporting upgrade and downgrade infrastructure is
> provided to users for smooth transition. HoodieMultiDeltaStreamer tool
> (experimental feature) is added in this version to support ingesting
> multiple Kafka streams in a single DeltaStreamer deployment for enhancing
> operational experience. Bulk inserts are further improved by avoiding any
> dataframe-rdd conversions, accompanied by configurable sorting modes.
> While this conversion of dataframe to rdd is not a bottleneck for
> upsert/deletes, subsequent releases will expand this to other write
> operations. Other performance improvements include supporting async
> compaction for Spark streaming writes.
>
> For details on how to use Hudi, please look at the quick start page
> located at:
> https://hudi.apache.org/docs/quick-start-guide.html
>
> If you'd like to download the source release, you can find it here:
> https://github.com/apache/hudi/releases/tag/release-0.6.0
>
> You can read more about the release (including release notes) here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
>
> We would like to thank all contributors, the community, and the Apache
> Software Foundation for enabling this release and we look forward to
> continued collaboration. We welcome your help and feedback. For more
> information on how to report problems, and to get involved, visit the
> project website at:
> http://hudi.apache.org/
>
> Thanks to everyone involved!
> - Bhavani Sudha
>
