Posted to user@beam.apache.org by Danny McCormick via user <us...@beam.apache.org> on 2023/11/18 00:21:05 UTC

Beam 2.52.0 Release

I am happy to announce that the 2.52.0 release of Beam has been finalized.
This release includes both improvements and new functionality.

For more information on changes in 2.52.0, check out the detailed release
notes - https://github.com/apache/beam/milestone/16. Here is an overview of
the changes in the release.

Highlights

* Avro-dependent code that was deprecated in Beam 2.46.0 has now been removed
from the Java SDK "core" package. Please use `beam-sdks-java-extensions-avro`
instead. This allows the Avro version to be updated in user code without
potential breaking changes in Beam "core", since the Beam Avro extension
already supports the latest Avro versions.
(https://github.com/apache/beam/issues/25252)
* Publishing Java 21 SDK container images is now supported as part of the
Apache Beam release process. (https://github.com/apache/beam/issues/28120)
  * Direct Runner and Dataflow Runner support running pipelines on Java 21
(experimental until tests are fully set up). For other runners (Flink,
Spark, Samza, etc.), support status depends on the runner projects.

New Features / Improvements

* Added the `UseDataStreamForBatch` pipeline option to the Flink runner. When
it is set to true, the Flink runner will run batch jobs using the DataStream
API. By default the option is set to false, so batch jobs are still executed
using the DataSet API.
* The `upload_graph` experiment option for DataflowRunner is no longer
required when the graph is larger than 10 MB for the Java SDK (
https://github.com/apache/beam/pull/28621).
* The state and side input cache is now enabled with a default size of 100
MB. Use `--max_cache_memory_usage_mb=X` to set the cache size for the user
state API and side inputs. A minimal sketch of setting this option follows
after this list. (Python) (https://github.com/apache/beam/issues/28770).
* Beam YAML stable release. Beam pipelines can now be written using YAML and
leverage the Beam YAML framework, which includes a preliminary set of IOs
and turnkey transforms. More information can be found in the YAML root
folder and in the README (
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md
).
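
As a quick illustration of the cache option above, here is a minimal Python
sketch. The `--max_cache_memory_usage_mb` flag name comes from the note; the
200 MB value and the trivial pipeline are hypothetical placeholders.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Raise the user state / side input cache from the 100 MB default to
    # 200 MB (hypothetical value) using the documented flag.
    options = PipelineOptions(['--max_cache_memory_usage_mb=200'])

    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | beam.Create([1, 2, 3])
            | beam.Map(lambda x: x * 2))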

Breaking Changes

* `org.apache.beam.sdk.io.CountingSource.CounterMark` now uses a custom
`CounterMarkCoder` as its default coder, since all Avro-dependent classes
have moved to `extensions/avro`. If `AvroCoder` is still required for
`CounterMark`, then, as a workaround, a copy of the old `CountingSource`
class can be placed in your project code and used directly
(https://github.com/apache/beam/issues/25252).
* Renamed `host` to `firestoreHost` in `FirestoreOptions` to avoid a
potential command line argument conflict (Java) (
https://github.com/apache/beam/pull/29201).

Bugfixes

* Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's
BigtableIO.BigtableSource when you have more cores than bytes to read
(Java) (https://github.com/apache/beam/issues/28793).
* The `watch_file_pattern` arg of RunInference had no effect prior to 2.52.0.
To get the intended behavior of `watch_file_pattern`, follow the
documentation at
https://beam.apache.org/documentation/ml/side-input-updates/ and use the
`WatchFilePattern` PTransform as a side input; see the first sketch after
this list. (https://github.com/apache/beam/pulls/28948)
* `MLTransform` no longer outputs artifacts such as min, max, and quantiles.
Instead, `MLTransform` will add a feature to output these artifacts in a
human-readable format (https://github.com/apache/beam/issues/29017). For
now, to use artifacts such as min and max that were produced by an earlier
`MLTransform`, use `read_artifact_location` of `MLTransform`, which reads
artifacts produced earlier by a different `MLTransform`; see the second
sketch after this list. (https://github.com/apache/beam/pull/29016/)
* Fixed a memory leak that affected some long-running Python pipelines
(https://github.com/apache/beam/issues/28246).
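
For the `watch_file_pattern` note above, the side-input-updates documentation
wires `WatchFilePattern` into `RunInference` through its
`model_metadata_pcoll` parameter. A rough Python sketch, assuming a streaming
pipeline and a scikit-learn model handler; the bucket paths, interval, and
inputs are hypothetical:

    import apache_beam as beam
    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
    from apache_beam.ml.inference.utils import WatchFilePattern
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical model locations; substitute your own.
    model_handler = SklearnModelHandlerNumpy(
        model_uri='gs://my-bucket/models/initial.pkl')
    file_pattern = 'gs://my-bucket/models/*.pkl'

    options = PipelineOptions(['--streaming'])
    with beam.Pipeline(options=options) as p:
        # Emits updated model metadata whenever a new file matches the pattern.
        model_updates = p | 'WatchModels' >> WatchFilePattern(
            file_pattern=file_pattern, interval=360)

        _ = (
            p
            | 'Inputs' >> beam.Create([[1.0, 2.0]])
            | 'RunInference' >> RunInference(
                model_handler=model_handler,
                model_metadata_pcoll=model_updates))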
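
And for the `MLTransform` note, a rough sketch of producing artifacts with one
`MLTransform` and reading them back later with `read_artifact_location`. The
artifact location and data are hypothetical, and `ScaleTo01` assumes the
tensorflow_transform dependency is installed:

    import apache_beam as beam
    from apache_beam.ml.transforms.base import MLTransform
    from apache_beam.ml.transforms.tft import ScaleTo01

    artifact_location = '/tmp/ml_artifacts'  # hypothetical location

    # Pipeline 1: compute artifacts (e.g. the min/max used for scaling) and
    # write them to artifact_location while transforming the data.
    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create([{'x': 1.0}, {'x': 5.0}, {'x': 10.0}])
            | MLTransform(
                write_artifact_location=artifact_location).with_transform(
                    ScaleTo01(columns=['x'])))

    # Pipeline 2: reuse the previously produced artifacts to transform new
    # data consistently, without recomputing min/max.
    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create([{'x': 7.0}])
            | MLTransform(read_artifact_location=artifact_location))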

Security Fixes

* Fixed CVE-2023-39325 - (https://www.cve.org/CVERecord?id=CVE-2023-39325)
(Java/Python/Go) (https://github.com/apache/beam/issues/29118).
* Mitigated CVE-2023-47248 - (
https://nvd.nist.gov/vuln/detail/CVE-2023-47248)  (Python) (
https://github.com/apache/beam/issues/29392).

Thanks,
Danny

Re: Beam 2.52.0 Release

Posted by XQ Hu via user <us...@beam.apache.org>.
Thanks a lot! Great job, Team!

Re: Beam 2.52.0 Release

Posted by XQ Hu via dev <de...@beam.apache.org>.
Thanks a lot! Great job, Team!
