Posted to commits@beam.apache.org by al...@apache.org on 2017/05/08 22:04:54 UTC
[1/2] beam git commit: Remove readme.md files that are covered in web site
Repository: beam
Updated Branches:
refs/heads/master 962e8796d -> 152c5bcc7
Remove readme.md files that are covered in web site
Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fba4c010
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fba4c010
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fba4c010
Branch: refs/heads/master
Commit: fba4c010ce90db8b5cdac489d47126ca3ee8db65
Parents: 962e879
Author: Ahmet Altay <al...@google.com>
Authored: Mon May 8 15:00:33 2017 -0700
Committer: Ahmet Altay <al...@google.com>
Committed: Mon May 8 15:01:09 2017 -0700
----------------------------------------------------------------------
runners/spark/README.md | 124 ----------------------
sdks/java/javadoc/README.md | 23 ----
sdks/python/apache_beam/tests/data/README.md | 20 ----
3 files changed, 167 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/beam/blob/fba4c010/runners/spark/README.md
----------------------------------------------------------------------
diff --git a/runners/spark/README.md b/runners/spark/README.md
deleted file mode 100644
index 15f7081..0000000
--- a/runners/spark/README.md
+++ /dev/null
@@ -1,124 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-Spark Beam Runner (Spark-Runner)
-================================
-
-## Intro
-
-The Spark-Runner allows users to execute data pipelines written against the Apache Beam API
-with Apache Spark. This runner can execute both batch and streaming pipelines on top of the Spark engine.
-
-## Overview
-
-### Features
-
-- ParDo
-- GroupByKey
-- Combine
-- Windowing
-- Flatten
-- View
-- Side inputs/outputs
-- Encoding
-
-### Fault-Tolerance
-
-The Spark runner provides the same fault-tolerance guarantees as [Apache Spark](http://spark.apache.org/).
-
-### Monitoring
-
-The Spark runner supports user-defined counters via Beam Aggregators implemented on top of Spark's [Accumulators](http://spark.apache.org/docs/1.6.3/programming-guide.html#accumulators).
-The Aggregators (defined by the pipeline author) and Spark's internal metrics are reported using Spark's [metrics system](http://spark.apache.org/docs/1.6.3/monitoring.html#metrics).
-Spark also provides a web UI for monitoring; see the [Spark monitoring guide](http://spark.apache.org/docs/1.6.3/monitoring.html) for details.
-
-## Beam Model support
-
-### Batch
-
-The Spark runner provides full support for the Beam Model in batch processing via Spark [RDD](http://spark.apache.org/docs/1.6.3/programming-guide.html#resilient-distributed-datasets-rdds)s.
-
-### Streaming
-
-Full support for the Beam Model in streaming pipelines is under development. To follow its progress, subscribe to our [mailing list](http://beam.apache.org/get-started/support/).
-
-### Issue tracking
-
-See [Beam JIRA](https://issues.apache.org/jira/browse/BEAM) (component: runner-spark).
-
-
-## Getting Started
-
-To get the latest version of the Spark Runner, first clone the Beam repository:
-
- git clone https://github.com/apache/beam
-
-
-Then switch to the newly created directory and run Maven to build Apache Beam:
-
- cd beam
- mvn clean install -DskipTests
-
-Now Apache Beam and the Spark Runner are installed in your local Maven repository.
-
-If we wanted to run a Beam pipeline with the default options of a Spark instance in local mode,
-we would do the following:
-
- Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()); // launch with --runner=SparkRunner, then apply transforms
- PipelineResult result = p.run();
- result.waitUntilFinish();
-
-To run against a different Spark cluster with a custom master URL, we
-would do the following:
-
- SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
- options.setRunner(SparkRunner.class);
- options.setSparkMaster("spark://host:port");
- Pipeline p = Pipeline.create(options); // then apply the pipeline's transforms
- p.run().waitUntilFinish();
-
-## Word Count Example
-
-First download a text document to use as input:
-
- curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt
-
-Switch to the Spark runner directory:
-
- cd runners/spark
-
-Then run the [word count example][wc] from the SDK using a Spark instance in local mode:
-
- mvn exec:exec -DmainClass=org.apache.beam.runners.spark.examples.WordCount \
- -Dinput=/tmp/kinglear.txt -Doutput=/tmp/out -Drunner=SparkRunner \
- -DsparkMaster=local
-
-Check the output by running:
-
- head /tmp/out-00000-of-00001
-
-__Note: running examples using `mvn exec:exec` only works for Spark local mode at the
-moment. See the next section for how to run on a cluster.__
-
-[wc]: https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/examples/WordCount.java
-## Running on a Cluster
-
-Spark Beam pipelines can be run on a cluster using the `spark-submit` command.
-
-TBD pending native HDFS support (currently blocked by [BEAM-59](https://issues.apache.org/jira/browse/BEAM-59)).
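The deleted README points to a WordCount example that splits input lines into words and counts how often each word occurs. The following plain-Java sketch illustrates that computation without Beam or Spark. It is not the example's code, which is built from Beam transforms. The class `WordCountSketch`, the method `countWords`, and the non-letter tokenizer regex are all hypothetical choices made for this illustration, and the real example's tokenizer may differ.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Beam-free sketch of a word count: tokenize each line,
// then count occurrences of each word. Not the Beam example's implementation.
public class WordCountSketch {

  // Assumed tokenizer: any run of non-letter characters separates words.
  private static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

  // Counts how many times each word appears across all lines (case-sensitive).
  static Map<String, Long> countWords(Iterable<String> lines) {
    Map<String, Long> counts = new TreeMap<>(); // sorted keys give stable output
    for (String line : lines) {
      for (String word : line.split(TOKENIZER_PATTERN)) {
        // split() yields a leading empty token when a line starts with a non-letter
        if (!word.isEmpty()) {
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Long> counts =
        countWords(Arrays.asList("The king is dead.", "Long live the king!"));
    // Print one "word: count" line per distinct word (format chosen for this sketch).
    counts.forEach((word, count) -> System.out.println(word + ": " + count));
  }
}
```

In the Beam version, the per-line tokenizing step and the per-word counting step are separate transforms. That separation lets a runner such as Spark distribute them across workers.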
http://git-wip-us.apache.org/repos/asf/beam/blob/fba4c010/sdks/java/javadoc/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/javadoc/README.md b/sdks/java/javadoc/README.md
deleted file mode 100644
index bb17c3f..0000000
--- a/sdks/java/javadoc/README.md
+++ /dev/null
@@ -1,23 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-# SDK Javadoc
-
-This directory contains package-info files for external javadoc we would like
-our javadoc to link to using `-linkoffline`.
http://git-wip-us.apache.org/repos/asf/beam/blob/fba4c010/sdks/python/apache_beam/tests/data/README.md
----------------------------------------------------------------------
diff --git a/sdks/python/apache_beam/tests/data/README.md b/sdks/python/apache_beam/tests/data/README.md
deleted file mode 100644
index 5856342..0000000
--- a/sdks/python/apache_beam/tests/data/README.md
+++ /dev/null
@@ -1,20 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-# Generating Test Data Files
-* [privatekey.p12](https://github.com/google/oauth2client/blob/master/tests/data/privatekey.p12)
[2/2] beam git commit: This closes #2966
Posted by al...@apache.org.
This closes #2966
Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/152c5bcc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/152c5bcc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/152c5bcc
Branch: refs/heads/master
Commit: 152c5bcc7f6fb53565ed132653384f4244542b59
Parents: 962e879 fba4c01
Author: Ahmet Altay <al...@google.com>
Authored: Mon May 8 15:04:46 2017 -0700
Committer: Ahmet Altay <al...@google.com>
Committed: Mon May 8 15:04:46 2017 -0700
----------------------------------------------------------------------
runners/spark/README.md | 124 ----------------------
sdks/java/javadoc/README.md | 23 ----
sdks/python/apache_beam/tests/data/README.md | 20 ----
3 files changed, 167 deletions(-)
----------------------------------------------------------------------