Posted to commits@beam.apache.org by me...@apache.org on 2018/01/24 18:53:52 UTC

[beam-site] branch mergebot updated (9eac3bb -> db51ee5)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


    from 9eac3bb  This closes #378
     add e6f055e  Prepare repository for deployment.
     new 64b8b9d  Add 2017 look back blog post
     new db51ee5  This closes #370

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/programming-guide/index.html |  12 +-
 src/_data/authors.yml                              |   8 ++
 src/_posts/2018-01-09-beam-a-look-back.md          | 132 +++++++++++++++++++++
 src/images/blog/2017-look-back/timeline.png        | Bin 0 -> 12454 bytes
 4 files changed, 146 insertions(+), 6 deletions(-)
 create mode 100644 src/_posts/2018-01-09-beam-a-look-back.md
 create mode 100644 src/images/blog/2017-look-back/timeline.png

-- 
To stop receiving notification emails like this one, please contact
mergebot-role@apache.org.

[beam-site] 02/02: This closes #370

Posted by me...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit db51ee5167a1092b3a0a6719de0ff4bd4cbb4ce6
Merge: e6f055e 64b8b9d
Author: Mergebot <me...@apache.org>
AuthorDate: Wed Jan 24 10:53:36 2018 -0800

    This closes #370

 src/_data/authors.yml                       |   8 ++
 src/_posts/2018-01-09-beam-a-look-back.md   | 132 ++++++++++++++++++++++++++++
 src/images/blog/2017-look-back/timeline.png | Bin 0 -> 12454 bytes
 3 files changed, 140 insertions(+)


[beam-site] 01/02: Add 2017 look back blog post

Posted by me...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 64b8b9d60cbe4f95c1a5bbf7f66b0499b286f5bd
Author: Jean-Baptiste Onofré <jb...@apache.org>
AuthorDate: Wed Jan 10 16:39:49 2018 +0100

    Add 2017 look back blog post
---
 src/_data/authors.yml                       |   8 ++
 src/_posts/2018-01-09-beam-a-look-back.md   | 132 ++++++++++++++++++++++++++++
 src/images/blog/2017-look-back/timeline.png | Bin 0 -> 12454 bytes
 3 files changed, 140 insertions(+)

diff --git a/src/_data/authors.yml b/src/_data/authors.yml
index d8cd836..0980cb5 100644
--- a/src/_data/authors.yml
+++ b/src/_data/authors.yml
@@ -47,3 +47,11 @@ jkff:
     name: Eugene Kirpichov
     email: ekirpichov@gmail.com
     twitter:
+jbonofre:
+    name: Jean-Baptiste Onofré
+    email: jbonofre@apache.org
+    twitter: jbonofre
+ianand:
+    name: Anand Iyer
+    email: ianand@google.com
+    twitter:
diff --git a/src/_posts/2018-01-09-beam-a-look-back.md b/src/_posts/2018-01-09-beam-a-look-back.md
new file mode 100644
index 0000000..d7643dc
--- /dev/null
+++ b/src/_posts/2018-01-09-beam-a-look-back.md
@@ -0,0 +1,132 @@
+---
+layout: post
+title:  "Apache Beam: A Look Back at 2017"
+date:   2018-01-09 00:00:01 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+  - ianand
+  - jbonofre
+---
+
+On January 10, 2017, Apache Beam was [promoted]({{ site.baseurl }}/blog/2017/01/10/beam-graduates.html)
+to a Top-Level Apache Software Foundation project. It was an important milestone
+that validated the value of the project and the legitimacy of its community, and
+heralded its growing adoption. In the past year, Apache Beam has been on a
+phenomenal growth trajectory, with significant expansion of its community and
+feature set. Let us walk you through some of the notable achievements.
+
+<!--more-->
+
+## Use cases
+
+First, let's take a glimpse at how Beam was used in 2017. As a unified framework
+for batch and stream processing, Apache Beam enables a very wide spectrum of
+diverse use cases. Here are some use cases that exemplify the versatility of
+Beam.
+
+<img class="center-block"
+     src="{{ site.baseurl }}/images/blog/2017-look-back/timeline.png"
+     alt="Use Cases"
+     width="600">
+
+## Community growth
+
+In 2017, Apache Beam had 174 contributors worldwide, from many different
+organizations. As an Apache project, we are proud to count 18 PMC members and
+31 committers. The community had 7 releases in 2017, each bringing a rich set of
+new features and fixes.
+
+The most obvious and encouraging sign of the growth of Apache Beam’s community,
+and validation of its core value proposition of portability, is the addition of
+significant new [runners]({{ site.baseurl }}/documentation/runners/capability-matrix/)
+(i.e. execution engines). We entered 2017 with Apache Flink, Apache Spark 1.x,
+Google Cloud Dataflow, Apache Apex, and Apache Gearpump. In 2017, the following
+new and updated runners were developed:
+
+ - Apache Spark 2.x update
+ - [IBM Streams runner](https://www.ibm.com/blogs/bluemix/2017/10/streaming-analytics-updates-ibm-streams-runner-apache-beam-2-0/)
+ - MapReduce runner
+ - [JStorm runner](http://jstorm.io/)
+
+In addition to runners, Beam added new IO connectors, some notable ones being
+the Cassandra, MQTT, AMQP, HBase/HCatalog, JDBC, Solr, Tika, Redis, and
+Elasticsearch connectors. Beam’s IO connectors make it possible to read from or
+write to data sources/sinks even when they are not natively supported by the
+underlying execution engine. Beam also provides fully pluggable filesystem
+support, allowing us to support and extend our coverage to HDFS, S3, Azure
+Storage, and Google Storage. We continue to add new IO connectors and
+filesystems to extend the Beam use cases.
+
+A particularly telling sign of the maturity of an open source community is when
+it is able to collaborate with multiple other open source communities, and
+mutually improve the state of the art. Over the past few months, the Beam,
+Calcite, and Flink communities have come together to define a robust [spec](https://docs.google.com/document/d/1wrla8mF_mmq-NW9sdJHYVgMyZsgCmHumJJ5f5WUzTiM/edit)
+for Streaming SQL, with engineers from over four organizations contributing to
+it. If, like us, you are excited by the prospect of improving the state of
+streaming SQL, please join us!
+
+In addition to SQL, new XML- and JSON-based declarative DSLs are at the proof-of-concept stage.
+
+## Continued innovation
+
+Innovation is important to the success of any open source project, and Beam has
+a rich history of bringing innovative new ideas to the open source community.
+Apache Beam was the first to introduce some seminal concepts in the world of
+big-data processing:
+
+ - Unified batch and streaming SDK that enables users to author big-data jobs
+   without having to learn multiple disparate SDKs/APIs.
+ - Cross-Engine Portability: Giving enterprises the confidence that workloads
+   authored today will not have to be re-written when open source engines become
+   outdated and are supplanted by newer ones.
+ - [Semantics](https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101)
+   essential for reasoning about unbounded unordered data, and achieving
+   consistent and correct output from a streaming job.
+
+In 2017, the pace of innovation continued. The following capabilities were
+introduced:
+
+ - Cross-Language Portability framework, and a [Go](https://golang.org/) SDK
+   developed with it.
+ - Dynamically Shardable IO (SplittableDoFn)
+ - Support for schemas in PCollection, allowing us to extend the runner
+   capabilities.
+ - Extensions addressing new use cases such as machine learning, and new data
+   formats.
+
+## Areas of improvement
+
+Any retrospective view of a project is incomplete without an honest assessment
+of areas of improvement. Two aspects stand out:
+
+ - Helping runners showcase their individual strengths. After all, portability
+   does not imply homogeneity. Different runners have different areas in which
+   they excel, and we need to do a better job of helping them highlight their
+   strengths.
+ - Building on the previous point, helping customers make a more informed decision
+   when they select a runner or migrate from one to another.
+
+In 2018, we aim to take proactive steps to improve the above aspects.
+
+## Ethos of the project and its community
+
+The world of batch and stream big-data processing today is reminiscent of the
+[Tower of Babel](https://en.wikipedia.org/wiki/Tower_of_Babel) parable: a
+slowdown of progress because different communities spoke different languages.
+Similarly, today there are multiple disparate big-data SDKs/APIs, each with
+its own distinct terminology to describe similar concepts. The side effect is
+user confusion and slower adoption.
+
+The Apache Beam project aims to provide an industry standard portable SDK that
+will:
+
+ - Benefit users by providing ***innovation with stability***: The separation of
+   SDK and engine enables healthy competition between runners, without requiring
+   users to constantly learn new SDKs/APIs and rewrite their workloads to
+   benefit from new innovation.
+ - Benefit big-data engines by ***growing the pie for everyone***: Making it
+   easier for users to author, maintain, upgrade and migrate their big-data
+   workloads will lead to significant growth in the number of production
+   big-data deployments.
+
diff --git a/src/images/blog/2017-look-back/timeline.png b/src/images/blog/2017-look-back/timeline.png
new file mode 100644
index 0000000..0394cd8
Binary files /dev/null and b/src/images/blog/2017-look-back/timeline.png differ
