Posted to user@flink.apache.org by Konstantin Knauf <kn...@apache.org> on 2020/08/25 20:22:41 UTC

[ANNOUNCE] Weekly Community Update 2020/31-34

Dear community,

The "weekly" community update is back after a short summer break! This time
I've tried to cover most of what happened during the last four weeks, but I
might pick up some older topics in the next weeks' updates, too.

Activity on the dev@ mailing list has picked up quite a bit as feature
development & design for the next releases of Apache Flink and Apache Flink
Stateful Functions is going at full steam. In detail:

Flink Development
==============

* [releases] [Flink 1.12] The work on Flink 1.12 is well underway with
feature freeze planned for end of October [1]. Our release managers Robert
& Dian are periodically reminding the developer community of current
blockers to reduce time during release testing for this release [2].

* [releases] [Stateful Functions 2.2] Igal has started a discussion on
releasing Stateful Functions 2.2 soon (proposed feature freeze:
September 10). The most notable feature is probably the option to embed a
stateful functions module in a DataStream program via DataStream
Ingress/Egress. Check out [3] for a full list of the planned features.

* [releases] [Flink 1.10] Flink 1.10.2 was released. [4]

* [apis] Besides the Stateful Functions API, Flink currently has three
top-level APIs: DataStream (streaming), DataSet (batch) and Table API/SQL
(unified). A major step towards the goal of a truly unified batch and
stream processing engine is the unification of the DataStream/DataSet APIs.
This is one of the main topics of the upcoming release(s), specifically:
    * Aljoscha has published FLIP-131 [5] proposing to deprecate and
eventually drop the DataSet API. In order to still support the same breadth
of use cases, we need to make sure that all its use cases are covered by
the two remaining APIs: a unified DataStream API and the Table API. These
changes are not part of FLIP-131 itself, but are covered in other FLIPs,
which already exist (like FLIP-27 [6] or FLIP-129 [7]) or will be published
over the next few weeks like FLIP-134 (see below). [8]
    * Most importantly, FLIP-134 [9] discusses how the DataStream API could
be used to efficiently execute batch workloads in the future. In essence
the FLIP proposes to introduce a BATCH and a STREAMING execution mode for
DataStream programs. The STREAMING mode corresponds to the current
behavior, while the BATCH mode adjusts the behavior in various areas to fit
the requirements of batch processing, e.g. pipelined scheduling with region
failover, blocking shuffles, no checkpointing, no watermarks, ... [10]

* [apis] Timo proposes FLIP-136 to improve the interoperability between the
DataStream and Table API. The FLIP covers the conversion between
DataStream <-> Table (incl. changelog streams, watermarks, etc.) as well
as additional support for working with the Row type in the DataStream
API. [11]

* [datastream api] Dawid proposes to remove a set of deprecated methods
from the DataStream API. [12]

* [runtime] Yuan Mei has started a discussion on FLIP-135 to introduce
approximate task-local recovery. The FLIP is about the introduction of a new
failover/recovery strategy for Flink jobs that trades consistency for
availability. Specifically, in the case of approximate task-local recovery
the failure of some tasks would not trigger a restart of the rest of the
job, but in turn you can expect data loss or duplication. [13]

* [python] Xingbo Huang proposes to extend the support of Pandas/vectorized
functions from scalar functions to aggregate functions. For more details on
Pandas support on PyFlink see the blog post linked below. [14]

* [connectors] Aljoscha has started a discussion on dropping support for
Kafka 0.10/0.11 in Flink 1.12+. [15]

* [connectors] Robert has revived the discussion on adding support for
HBase 2.3.x. There is a consensus to add the HBase 2.x connector to Apache
Flink, but no consensus yet on whether to move the existing HBase 1.x
connector from the Flink project to Apache Bahir, too. [16]
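
To make the "blocking shuffles" item from the FLIP-134 bullet above a bit more
tangible, here is a toy sketch in plain Python. It does not use any Flink API;
the function names and the two-stage "map then sink" job are made up purely
for illustration. The pipelined variant hands each record to the next stage
immediately, while the blocking variant finishes the first stage before the
second one starts.

```python
def run_pipelined(records):
    """Hand each record to the next stage as soon as it is produced."""
    log = []

    def map_stage():
        for record in records:
            log.append(f"map {record}")
            yield record * 2

    for result in map_stage():
        log.append(f"sink {result}")
    return log


def run_blocking(records):
    """Finish the first stage completely before the next stage starts."""
    log = []
    intermediate = []
    for record in records:
        log.append(f"map {record}")
        intermediate.append(record * 2)  # materialized intermediate result
    for result in intermediate:
        log.append(f"sink {result}")
    return log


pipelined_log = run_pipelined([1, 2])
blocking_log = run_blocking([1, 2])
print(pipelined_log)  # map and sink entries interleave
print(blocking_log)   # all map entries come before any sink entry
```

The blocking variant only works because the input is bounded: with an
unbounded stream the first stage would never finish.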
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Planning-Flink-1-12-tp43348.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-1-12-Stale-blockers-and-build-instabilities-tp43477.html
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Next-Stateful-Functions-Release-tp44063.html
[4] https://flink.apache.org/news/2020/08/25/release-1.10.2.html
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface?src=contextnavpagetreemode
[7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API
[8] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-131-Consolidate-the-user-facing-Dataflow-SDKs-APIs-and-deprecate-the-DataSet-API-tp43521.html
[9] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158871522
[10] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-134-DataStream-Semantics-for-Bounded-Input-tp43839p43965.html
[11] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-136-Improve-interoperability-between-DataStream-and-Table-API-tp43993.html
[12] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Removing-deprecated-methods-from-DataStream-API-tp43938.html
[13] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
[14] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-137-Support-Pandas-UDAF-in-PyFlink-tp44060.html
[15] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-Kafka-0-10-x-connector-and-possibly-0-11-x-tp44087.html
[16] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Upgrade-HBase-connector-to-2-2-x-tp42657.html
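
Coming back to FLIP-135 [13]: the consistency-for-availability trade-off can
be illustrated with a small simulation in plain Python. This is only a toy
model under my own assumptions, not the design from the FLIP: the function
names, the `fail_at` and `downtime` parameters, and the idea that the
"checkpoint" sits at the very start of the input are all hypothetical.

```python
def run_with_global_restart(records, fail_at):
    """Restart the whole job on failure: state is rolled back and the
    source rewinds (here: to the start of the input), so nothing is lost."""
    received = []
    failed_once = False
    position = 0
    while position < len(records):
        if position == fail_at and not failed_once:
            failed_once = True
            received = []  # roll back to the (empty) checkpointed state
            position = 0   # source replays from the checkpoint
            continue
        received.append(records[position])
        position += 1
    return received


def run_with_approximate_local_recovery(records, fail_at, downtime):
    """Restart only the failed task: the rest of the job keeps running,
    so records sent while the task is down are dropped."""
    received = []
    for position, record in enumerate(records):
        if fail_at <= position < fail_at + downtime:
            continue  # task is restarting, this record is lost
        received.append(record)
    return received


records = [1, 2, 3, 4, 5, 6]
exact = run_with_global_restart(records, fail_at=2)
approximate = run_with_approximate_local_recovery(records, fail_at=2, downtime=2)
print(exact)        # every record arrives, at the cost of a full replay
print(approximate)  # records 3 and 4 are missing, but nothing was restarted upstream
```

The sketch only models data loss; duplication, which the FLIP discussion also
mentions as a possible outcome, is left out to keep it short.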

flink-packages.org
==============

Jark has recently published a set of Flink connectors (DataStream & Table
API/SQL) that can ingest the changelog of MySQL and Postgres without
additional tools like Kafka or Debezium. [17]

[17] https://flink-packages.org/packages/cdc-connectors
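
For readers who have not worked with changelogs before, the following plain
Python sketch shows the general idea of turning a stream of change events
into the current state of a table. It is not the connectors' API or record
format: the `(operation, key, row)` tuples and the `apply_changelog` helper
are assumptions made up for this example.

```python
def apply_changelog(events):
    """Materialize a table (dict keyed by primary key) from change events."""
    table = {}
    for operation, key, row in events:
        if operation in ("insert", "update"):
            table[key] = row      # upsert the latest version of the row
        elif operation == "delete":
            table.pop(key, None)  # ignore deletes for unknown keys
        else:
            raise ValueError(f"unknown operation: {operation}")
    return table


events = [
    ("insert", 1, {"name": "alice"}),
    ("insert", 2, {"name": "bob"}),
    ("update", 1, {"name": "alicia"}),
    ("delete", 2, None),
]
current_table = apply_changelog(events)
print(current_table)
```

After replaying the four events, only the updated row with key 1 remains.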

Notable Bugs
==========

To be honest, I did not search through every bug ticket created over the
last four weeks, only the last seven days, and I did not find anything
particularly notable. So, I'll leave you without any bug reports this time.

Events, Blog Posts, Misc
===================

* David Anderson is now an Apache Flink committer. Congrats! [18]

* There have been a couple of blog posts on the Flink blog recently that
highlight some of the features added in the latest release:
    * PyFlink: The Integration of Pandas into PyFlink [19]
    * Accelerating your workload with GPU and other external resources [20]
    * Monitoring and Controlling Networks of IoT Devices with Flink
Stateful Functions [21]
    * The State of Flink on Docker [22]

* The schedule for Flink Forward Global is live [23]. The event is free and
you can already register at [24].

[18] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Flink-Committer-David-Anderson-tp43814p43847.html
[19] https://flink.apache.org/2020/08/04/pyflink-pandas-udf-support-flink.html
[20] https://flink.apache.org/news/2020/08/06/external-resource.html
[21] https://flink.apache.org/2020/08/19/statefun.html
[22] https://flink.apache.org/news/2020/08/20/flink-docker.html
[23] https://www.flink-forward.org/global-2020/conference-program
[24] https://www.eventbrite.com/e/flink-forward-global-virtual-2020-tickets-113775477516#tickets

Cheers,

Konstantin


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [ANNOUNCE] Weekly Community Update 2020/31-34

Posted by Robert Metzger <rm...@apache.org>.
Thanks a lot for doing these updates!
