Posted to dev@beam.apache.org by Rose Nguyen <rt...@google.com> on 2019/01/15 22:53:53 UTC

Apache Beam Newsletter - January 2019

January 2019 | Newsletter

What’s been done

------------------------------

Apache Beam 2.9.0 released (by: many contributors)

   -

   Download the release here.
   <https://beam.apache.org/get-started/downloads/>
   -

   See the blog post
   <https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html> for more
   details.


NemoRunner (by: Won Wook Song)

   -

   A new Beam runner on Apache Nemo.
   <https://github.com/apache/incubator-nemo>
   -

   See BEAM-6097 <https://github.com/apache/beam/pull/7236> for more
   details.


Code Generation for Beam SQL (by: Andrew Pilloud, SQL team)

   -

   Now using Apache Calcite to generate Java for simple SQL transforms,
   deleting 11,000 lines of code from Beam.
   -

   See BEAM-5112 <https://github.com/apache/beam/pull/6417> for more
   details.


Vendor Guava (by: Kenneth Knowles, Lukasz Cwik)

   -

   Beam now uses a vendored version of Guava. This keeps Guava from
   leaking into most modules and makes the artifacts smaller.
   -

   See BEAM-3608 <https://issues.apache.org/jira/browse/BEAM-3608> for more
   details.


ClickHouseIO (by: Gleb Kanterov)

   -

   A new connector for ClickHouse <https://clickhouse.yandex/>, an
   open-source columnar DBMS for OLAP.
   -

   See BEAM-5964 <https://issues.apache.org/jira/browse/BEAM-5964> for more
   details.


KafkaIO (by: Alexey Romanenko)

   -

   Added support for writing to multiple topics with KafkaIO.
   -

   See BEAM-5798 <https://issues.apache.org/jira/browse/BEAM-5798> for more
   details.


Updated I/O connector development guides (by: Melissa Pashniak, Chamikara
Jayalath)

   -

   Updated the existing Overview: Developing I/O connectors
   <https://beam.apache.org/documentation/io/developing-io-overview/> with
   current implementation options.
   -

   Updated the Python-specific guide
   <https://beam.apache.org/documentation/io/developing-io-python/> to
   cover Source and FileBasedSink.
   -

   Added a new Java-specific guide
   <https://beam.apache.org/documentation/io/developing-io-java/> that
   covers Source and FileBasedSink.


Learning Resources page (by: David Cavazos)

   -

   Added a new page that collects Beam learning resources from the site and
   community.
   -

   See Learning Resources
   <https://beam.apache.org/documentation/resources/learning-resources/>
   for more details.



What we’re working on...

------------------------------

Spark Runner (by: Marek Simunek, Vaclav Plajt, David Moravek, Ismaël Mejía)

   -

   Performance tuning and optimization
   -

   See BEAM-5987 <https://issues.apache.org/jira/browse/BEAM-5987>,
   BEAM-6332 <https://issues.apache.org/jira/browse/BEAM-6332>, BEAM-5392
   <https://issues.apache.org/jira/browse/BEAM-5392>


Apache Beam 2.10.0 release (by: many contributors)

Apache Beam Colabs (by: David Cavazos)

   -

   Notebook-style tutorial on getting started with Beam
   -

   Will run on Colaboratory in a sandbox environment



Resources

------------------------------

Apache Beam events tracker (by: many contributors)

   -

   A community-owned spreadsheet to help the Apache Beam community organize
   Meetups and events.
   -

   See the spreadsheet
   <https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/edit?usp=sharing>
   for more details.


Transpose a BigQuery table in Dataflow (by: Sameer Abhyankar)

   -

   Transpose or Pivot a BigQuery table using a Dataflow job.
   -

   See the code and README here
   <https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/dataflow-bigquery-transpose>
   .


BigQuery Utilities for Apache Beam (by: Cory Tucker)

   -

   A small library of utilities for making it simpler to read from, write
   to, and generally interact with BigQuery within your Apache Beam pipeline.
   -

   See the project repo
   <https://github.com/WindfallData/beam-bigquery-utils> for more details.


Building a Data Warehouse using Apache Beam and Dataflow (by: Ahmed
El.Hussaini)

   -

   “In this series of posts, I’m going to go through the process of
   building a modern data warehouse using Apache Beam’s Java SDK and Google
   Dataflow… I’ll be using Java 8, Gradle 4.10.2, and Apache Beam’s SDK 2.9.0.”
   -

   See article
   <https://medium.com/@sandboxws/building-a-data-warehouse-using-apache-beam-and-dataflow-part-i-building-your-first-pipeline-b63d22c86662>
   on Medium.


Meet the Apache Software Foundation’s Top 5 Code Committers (by: Ed Targett)

   -

   “The Apache Software Foundation (ASF)—which this year celebrates its
   20th anniversary—is the meritocratic heart of arguably the world’s most
   vibrant open source community.”
   -

   See article <https://www.cbronline.com/feature/apache-top-5> on Computer
   Business Review.


Alibaba Blinks: Building an open source, data-driven cloud empire in
real-time (by: George Anadiotis)

   -

   “Acquiring data Artisans, the vendor leading development of open source
   Apache Flink framework for real-time data processing, is the latest move
   from Alibaba.”
   -

   See article
   <https://www.zdnet.com/article/alibaba-blinks-building-an-open-source-data-driven-cloud-empire-in-real-time/>
   on ZDNet.



Until Next Time!

*This edition was curated by our community of contributors, committers and
PMCs. It contains work done in January 2019 and ongoing efforts. We hope to
provide visibility to what's going on in the community, so if you have
questions, feel free to ask in this thread.*
-- 
Rose Thị Nguyễn

Fwd: Apache Beam Newsletter - January 2019

Posted by Byung-Gon Chun <bg...@gmail.com>.
NemoRunner appeared in the Apache Beam Newsletter (January 2019).

Cheers,
Gon

---------- Forwarded message ---------
From: Rose Nguyen <rt...@google.com>
Date: Wed, Jan 16, 2019 at 7:54 AM
Subject: Apache Beam Newsletter - January 2019
To: <us...@beam.apache.org>, <de...@beam.apache.org>





-- 
Byung-Gon Chun