Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/11 09:41:50 UTC
[GitHub] [flink-web] lindong28 commented on a diff in pull request #556: Release Flink ML 2.1.0

lindong28 commented on code in PR #556:
URL: https://github.com/apache/flink-web/pull/556#discussion_r917735347


##########
_posts/2022-07-07-release-ml-2.1.0.md:
##########
@@ -0,0 +1,159 @@
+---
+layout: post 
+title:  "Apache Flink ML 2.1.0 Release Announcement"
+date: 2022-07-07T08:00:00.000Z
+categories: news
+authors:
+- zhangzhipeng:
+  name: "Zhipeng Zhang"
+- lindong:
+  name: "Dong Lin"
+
+excerpt: The Apache Flink community is excited to announce the release of Flink ML 2.1.0! This release focuses on improving Flink ML's infrastructure, such as Python SDK, memory management, and benchmark framework, to facilitate the development of performant, memory-safe, and easy-to-use algorithm libraries. We validated the enhanced infrastructure via benchmarks and confirmed that Flink ML can meet or exceed the performance of selected algorithms from alternative popular ML libraries. In addition, this release added example Python and Java programs to help users learn and use Flink ML.
+
+---
+
+The Apache Flink community is excited to announce the release of Flink ML 2.1.0!
+This release focuses on improving Flink ML's infrastructure, such as Python SDK,
+memory management, and benchmark framework, to facilitate the development of 
+performant, memory-safe, and easy-to-use algorithm libraries. We validated the 
+enhanced infrastructure by implementing, benchmarking, and optimizing 10 new 
+algorithms in Flink ML, and confirmed that Flink ML can meet or exceed the 
+performance of selected algorithms from alternative popular ML libraries.
+In addition, this release added example Python and Java programs for each 
+algorithm in the library to help users learn and use Flink ML.
+
+With the improvements and performance benchmarks made in this release, we 
+believe Flink ML's infrastructure is ready for interested developers in the 
+community to build performant, Pythonic machine learning libraries.
+
+We encourage you to [download the release](https://flink.apache.org/downloads.html) 
+and share your feedback with the community through the Flink 
+[mailing lists](https://flink.apache.org/community.html#mailing-lists) or
+[JIRA](https://issues.apache.org/jira/browse/flink)! We hope you like the new
+release and we’d be eager to learn about your experience with it.
+
+{% toc %}
+
+# Notable Features
+
+## API and Infrastructure
+
+### Supporting fine-grained per-operator memory management
+
+Before this release, algorithm operators with internal states (e.g. the training
+data to be replayed for each round of iteration) store state data using the 
+state-backend API (e.g. ListState). Such an operator either needs to store all 
+data in memory, which risks OOM, or it needs to always store data on disk. 
+In the latter case, it needs to read and de-serialize all data from disks 
+repeatedly in each round of iteration even if the data can fit in RAM, leading 
+to sub-optimal performance when the training data size is small. This makes it 
+hard for developers to write performant and memory-safe operators.
+
+This release enhances the Flink ML infrastructure with a mechanism to specify 
+the amount of managed memory that an operator can consume. This allows algorithm
+operators to write and read data from managed memory when the data size is below
+the quota, and automatically spill data that exceeds the memory quota to 
+disk to avoid OOM. Algorithm developers can use this mechanism to achieve 
+optimal algorithm performance as input data size varies. Please feel free to 
+check out the implementation of the KMeans operator for an example.
+
+### Improved infrastructure for developing online learning algorithms
+
+A key objective of Flink ML is to facilitate the development of online learning 
+applications. In the last release, we enhanced the Flink ML API with 
+setModelData() and getModelData(), which allow users of online learning 
+algorithms to transmit and persist model data as unbounded data streams. 
+This release continues the effort by improving and validating the infrastructure
+needed to develop online learning algorithms.
+
+Specifically, this release added two online learning algorithm prototypes (i.e. 
+OnlineKMeans and OnlineLogisticRegression) with tests covering the entire 
+lifecycle of using these algorithms. These two algorithms introduce concepts 
+such as global batch size and model version, together with metrics and APIs to 
+set and get those values. While the online algorithm prototypes have not been 
+optimized for prediction accuracy yet, this line of work is an important step 
+toward setting up best practices for building online learning algorithms in 
+Flink ML. We hope more contributors from the community can join this effort.
+
+### Tools for benchmarking algorithm performance
+
+An easy-to-use benchmark framework is critical to developing and maintaining 
+performant algorithm libraries in Flink ML. This release added a benchmark 
+framework that provides APIs to write pluggable and reusable data generators, 
+takes benchmark configuration in JSON format, and outputs benchmark results in 
+JSON format to enable custom analysis. An off-the-shelf script is provided to 
+visualize benchmark results using Matplotlib. Feel free to check out this 
+[README](https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/README.md)

Review Comment:
   Change the link to https://github.com/apache/flink-ml/blob/release-2.1/flink-ml-benchmark/README.md?
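
The quoted section describes a benchmark framework that reads its configuration from JSON and writes its results as JSON to enable custom analysis. As a rough illustration of what such analysis could look like, the Python sketch below computes throughput from a results document. The document layout, the field names (`results`, `totalTimeMs`, `inputRecordNum`), and the sample values are assumptions made for this sketch and are not taken from the PR; the README referenced above defines the actual schema.

```python
import json

# Hypothetical results document. The real field names and layout are defined
# by the benchmark framework's README and may differ from this sketch.
SAMPLE_RESULTS = """
{
  "KMeans-1": {"results": {"totalTimeMs": 2000.0, "inputRecordNum": 10000}},
  "KMeans-2": {"results": {"totalTimeMs": 500.0, "inputRecordNum": 10000}}
}
"""


def throughput_per_benchmark(results_json: str) -> dict:
    """Return input records per second for each named benchmark."""
    parsed = json.loads(results_json)
    throughput = {}
    for name, entry in parsed.items():
        stats = entry["results"]
        seconds = stats["totalTimeMs"] / 1000.0
        throughput[name] = stats["inputRecordNum"] / seconds
    return throughput


if __name__ == "__main__":
    summary = throughput_per_benchmark(SAMPLE_RESULTS)
    for name in sorted(summary):
        print(f"{name}: {summary[name]:.1f} records/s")
```

Because the results are plain JSON, the same dictionary could be passed to Matplotlib or any other tool, which is presumably why the announcement stresses JSON output.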



##########
_posts/2022-07-07-release-ml-2.1.0.md:
##########
@@ -0,0 +1,159 @@
+### Tools for benchmarking algorithm performance
+
+An easy-to-use benchmark framework is critical to developing and maintaining 
+performant algorithm libraries in Flink ML. This release added a benchmark 
+framework that provides APIs to write pluggable and reusable data generators, 
+takes benchmark configuration in JSON format, and outputs benchmark results in 
+JSON format to enable custom analysis. An off-the-shelf script is provided to 
+visualize benchmark results using Matplotlib. Feel free to check out this 
+[README](https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/README.md)
+for instructions on how to use this benchmark framework.
+
+The benchmark framework currently supports evaluating algorithm throughput. 
+In a future release, we plan to support evaluating algorithm latency and 
+accuracy.
+
+## Python SDK
+
+This release enhances the Python SDK so that operators in the Flink ML Python 
+library can invoke the corresponding operators in the Java library. The Python 
+operator is a thin wrapper around the Java operator and delivers the same 
+performance as the Java operator during execution. This capability significantly
+improves developer velocity by allowing algorithm developers to maintain both 
+the Python and the Java libraries of algorithms without having to implement 
+those algorithms twice. 
+
+## Algorithm Library
+This release continues to extend the algorithm library in Flink ML, with the 
+focus on validating the functionalities and the performance of Flink ML 
+infrastructure using representative algorithms in different categories.
+
+Below are the lists of algorithms newly supported in this release, grouped by 
+their categories:
+
+- feature engineering (MinMaxScaler, StringIndexer, VectorAssembler, 
+StandardScaler, Bucketizer)
+- online learning (OnlineKMeans, OnlineLogisticRegression)
+- regression (LinearRegression)
+- classification (LinearSVC)
+- evaluation (BinaryClassificationEvaluator)
+
+Example Python and Java programs for these algorithms are provided on the Apache
+Flink ML [website](https://nightlies.apache.org/flink/flink-ml-docs-stable/) to 

Review Comment:
   Change the link to https://nightlies.apache.org/flink/flink-ml-docs-release-2.1/?



##########
_posts/2022-07-07-release-ml-2.1.0.md:
##########
@@ -0,0 +1,159 @@
+Example Python and Java programs for these algorithms are provided on the Apache
+Flink ML [website](https://nightlies.apache.org/flink/flink-ml-docs-stable/) to 
+help users learn and try out Flink ML. We also provide example benchmark 
+[configuration files](https://github.com/apache/flink-ml/tree/master/flink-ml-benchmark/src/main/resources)
+in the repo for users to validate Flink ML performance. Feel free to check out 
+this [README](https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/README.md)

Review Comment:
   Change the link to https://github.com/apache/flink-ml/blob/release-2.1/flink-ml-benchmark/README.md?



##########
_posts/2022-07-07-release-ml-2.1.0.md:
##########
@@ -0,0 +1,159 @@
+Example Python and Java programs for these algorithms are provided on the Apache
+Flink ML [website](https://nightlies.apache.org/flink/flink-ml-docs-stable/) to 
+help users learn and try out Flink ML. We also provide example benchmark 
+[configuration files](https://github.com/apache/flink-ml/tree/master/flink-ml-benchmark/src/main/resources)
+in the repo for users to validate Flink ML performance. Feel free to check out 
+this [README](https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/README.md)
+for instructions on how to run those benchmarks.
+
+# Upgrade Notes
+
+Please review this note for a list of adjustments to make and issues to check
+if you plan to upgrade to Flink ML 2.1.0.
+
+This note discusses any critical information about incompatibilities and
+breaking changes, performance changes, and any other changes that might impact
+your production deployment of Flink ML.
+
+
+* **Flink dependency is changed from 1.14 to 1.15**.
+
+  This change introduces all the breaking changes listed in the Flink 1.15
+  [release notes](https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/).
+
+# Release Notes and Resources
+
+Please take a look at the [release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351141)
+for a detailed list of changes and new features.
+
+The binary distribution and source artifacts are now available on the updated

Review Comment:
   Can you remove the `binary distribution` here?



##########
_posts/2022-07-07-release-ml-2.1.0.md:
##########
@@ -0,0 +1,159 @@
+Below are the lists of algorithms newly supported in this release, grouped by 
+their categories:
+
+- feature engineering (MinMaxScaler, StringIndexer, VectorAssembler, 

Review Comment:
   Would it be better to start the sentence with an upper-case letter?


