Posted to dev@beam.apache.org by Rose Nguyen <rt...@google.com> on 2019/01/15 22:53:53 UTC
Apache Beam Newsletter - January 2019
January 2019 | Newsletter
What’s been done
------------------------------
Apache Beam 2.9.0 released (by: many contributors)
-
Download the release here.
<https://beam.apache.org/get-started/downloads/>
-
See the blog post
<https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html> for more
details.
NemoRunner (by: Won Wook Song)
-
A new Beam runner on Apache Nemo.
<https://github.com/apache/incubator-nemo>
-
See BEAM-6097 <https://github.com/apache/beam/pull/7236> for more
details.
Code Generation for Beam SQL (by: Andrew Pilloud, SQL team)
-
Now using Apache Calcite to generate Java for simple SQL transforms,
deleting 11,000 lines of code from Beam.
-
See BEAM-5112 <https://github.com/apache/beam/pull/6417> for more
details.
Vendor Guava (by: Kenneth Knowles, Lukasz Cwik)
-
Beam now uses a vendored version of Guava. This prevents Guava from
leaking in most modules and makes the artifacts smaller.
-
See BEAM-3608 <https://issues.apache.org/jira/browse/BEAM-3608> for more
details.
ClickHouseIO (by: Gleb Kanterov)
-
A new connector for ClickHouse <https://clickhouse.yandex/>, an
open-source columnar DBMS for OLAP.
-
See BEAM-5964 <https://issues.apache.org/jira/browse/BEAM-5964> for more
details.
KafkaIO (by: Alexey Romanenko)
-
Added support for writing to multiple topics with KafkaIO.
-
See BEAM-5798 <https://issues.apache.org/jira/browse/BEAM-5798> for more
details.
Updated I/O connector development guides (by: Melissa Pashniak, Chamikara
Jayalath)
-
Updated the existing Overview: Developing I/O connectors
<https://beam.apache.org/documentation/io/developing-io-overview/> with
current implementation options.
-
Updated the Python-specific guide
<https://beam.apache.org/documentation/io/developing-io-python/> to
cover Source and FileBasedSink.
-
Added a new Java-specific guide
<https://beam.apache.org/documentation/io/developing-io-java/> that
covers Source and FileBasedSink.
Learning Resources page (by: David Cavazos)
-
Added a new page that collects Beam learning resources from the site and
community.
-
See Learning Resources
<https://beam.apache.org/documentation/resources/learning-resources/>
for more details.
What we’re working on...
------------------------------
Spark Runner (by: Marek Simunek, Vaclav Plajt, David Moravek, Ismaël Mejía)
-
Performance tuning and optimization
-
See BEAM-5987 <https://issues.apache.org/jira/browse/BEAM-5987>,
BEAM-6332 <https://issues.apache.org/jira/browse/BEAM-6332>, BEAM-5392
<https://issues.apache.org/jira/browse/BEAM-5392>
Apache Beam 2.10.0 release (by: many contributors)
Apache Beam Colabs (by: David Cavazos)
-
Notebook-style tutorial on getting started with Beam
-
Will run on Colaboratory in a sandbox environment
Resources
------------------------------
Apache Beam events tracker (by: many contributors)
-
A community-owned spreadsheet to help the Apache Beam community organize
Meetups and events.
-
See the spreadsheet
<https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/edit?usp=sharing>
for more details.
Transpose a BigQuery table in Dataflow (by: Sameer Abhyankar)
-
Transpose or Pivot a BigQuery table using a Dataflow job.
-
See the code and README here
<https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/dataflow-bigquery-transpose>
.
BigQuery Utilities for Apache Beam (by: Cory Tucker)
-
A small library of utilities for making it simpler to read from, write
to, and generally interact with BigQuery within your Apache Beam pipeline.
-
See the project repo
<https://github.com/WindfallData/beam-bigquery-utils> for more details.
Building a Data Warehouse using Apache Beam and Dataflow (by: Ahmed
El.Hussaini)
-
“In this series of posts, I’m going to go through the process of
building a modern data warehouse using Apache Beam’s Java SDK and Google
Dataflow… I’ll be using Java 8, Gradle 4.10.2, and Apache Beam’s SDK 2.9.0.”
-
See article
<https://medium.com/@sandboxws/building-a-data-warehouse-using-apache-beam-and-dataflow-part-i-building-your-first-pipeline-b63d22c86662>
on Medium.
Meet the Apache Software Foundation’s Top 5 Code Committers (by: Ed Targett)
-
“The Apache Software Foundation (ASF)—which this year celebrates its
20th anniversary—is the meritocratic heart of arguably the world’s most
vibrant open source community.”
-
See article <https://www.cbronline.com/feature/apache-top-5> on Computer
Business Review.
Alibaba Blinks: Building an open source, data-driven cloud empire in
real-time (by: George Anadiotis)
-
“Acquiring data Artisans, the vendor leading development of open source
Apache Flink framework for real-time data processing, is the latest move
from Alibaba.”
-
See article
<https://www.zdnet.com/article/alibaba-blinks-building-an-open-source-data-driven-cloud-empire-in-real-time/>
on ZDNet.
Until Next Time!
*This edition was curated by our community of contributors, committers and
PMCs. It contains work done in January 2019 and ongoing efforts. We hope to
provide visibility to what's going on in the community, so if you have
questions, feel free to ask in this thread.*
--
Rose Thị Nguyễn
Fwd: Apache Beam Newsletter - January 2019
Posted by Byung-Gon Chun <bg...@gmail.com>.
NemoRunner appeared in the Apache Beam Newsletter (January 2019).
Cheers,
Gon
---------- Forwarded message ---------
From: Rose Nguyen <rt...@google.com>
Date: Wed, Jan 16, 2019 at 7:54 AM
Subject: Apache Beam Newsletter - January 2019
To: <us...@beam.apache.org>, <de...@beam.apache.org>
--
Byung-Gon Chun