You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by me...@apache.org on 2018/04/26 05:01:37 UTC

[beam-site] branch mergebot updated (7cc5dfd -> 75cae4f)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard 7cc5dfd  This closes #418
 discard 523bbd0  Update Nexmark launch instructions for Gradle
     new d41a892  Update Nexmark launch instructions for Gradle
     new 75cae4f  This closes #418

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (7cc5dfd)
            \
             N -- N -- N   refs/heads/mergebot (75cae4f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-- 
To stop receiving notification emails like this one, please contact
mergebot-role@apache.org.

[beam-site] 01/02: Update Nexmark launch instructions for Gradle

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit d41a8927c4d2ca640f9fdd2abc5fe5ab90e94bf5
Author: Kenneth Knowles <kl...@google.com>
AuthorDate: Wed Apr 11 11:58:59 2018 -0700

    Update Nexmark launch instructions for Gradle
---
 src/documentation/sdks/nexmark.md | 270 ++++++++++++++++++++++++--------------
 1 file changed, 175 insertions(+), 95 deletions(-)

diff --git a/src/documentation/sdks/nexmark.md b/src/documentation/sdks/nexmark.md
index 5e86cd7..d73943a 100644
--- a/src/documentation/sdks/nexmark.md
+++ b/src/documentation/sdks/nexmark.md
@@ -9,12 +9,14 @@ permalink: /documentation/sdks/java/nexmark/
 ## What it is
 
 Nexmark is a suite of pipelines inspired by the 'continuous data stream'
-queries in [Nexmark research paper](http://datalab.cs.pdx.edu/niagaraST/NEXMark/)
+queries in [Nexmark research
+paper](http://datalab.cs.pdx.edu/niagaraST/NEXMark/)
 
-These are multiple queries over a three entities model representing on online auction system:
+These are multiple queries over a three entities model representing on online
+auction system:
 
- - **Person** represents a person submitting an item for auction and/or making a bid
-    on an auction.
+ - **Person** represents a person submitting an item for auction and/or making
+   a bid on an auction.
  - **Auction** represents an item under auction.
  - **Bid** represents a bid for an item under auction.
 
@@ -62,11 +64,14 @@ We have augmented the original queries with five more:
   queries.
 
 ## Benchmark workload configuration
-Here are some of the knobs of the benchmark workload (see [NexmarkConfiguration.java](https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java)).
+
+Here are some of the knobs of the benchmark workload (see
+[NexmarkConfiguration.java](https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java)).
 
 These configuration items can be passed to the launch command line.
 
 ### Events generation (defaults)
+
 * 100 000 events generated
 * 100 generator threads
 * Event rate in SIN curve
@@ -76,21 +81,26 @@ These configuration items can be passed to the launch command line.
 * 1000 concurrent persons bidding / creating auctions
 
 ### Windows (defaults)
+
 * size 10s
 * sliding period 5s
 * watermark hold for 0s
 
 ### Events Proportions (defaults)
+
 * Hot Auctions = ½
 * Hot Bidders =¼
 * Hot Sellers=¼
 
 ### Technical
+
 * Artificial CPU load
 * Artificial IO load
 
 ## Nexmark output
-Here is an example output of the Nexmark benchmark run in streaming mode with the SMOKE suite on the (local) direct runner:
+
+Here is an example output of the Nexmark benchmark run in streaming mode with
+the SMOKE suite on the (local) direct runner:
 
 <pre>
 Performance:
@@ -112,17 +122,19 @@ Performance:
 
 ## Benchmark launch configuration
 
-We can specify the Beam runner to use with maven profiles, available profiles are:
-
-    direct-runner
-    spark-runner
-    flink-runner
-    apex-runner
+The Nexmark launcher accepts the `--runner` argument as usual for programs that
+use Beam PipelineOptions to manage their command line arguments. In addition
+to this, the necessary dependencies must be configured.
 
-The runner must also be specified like in any other Beam pipeline using:
+When running via Gradle, the following two parameters control the execution:
 
-    --runner
+    -P nexmark.args
+        The command line to pass to the Nexmark main program.
 
+    -P nexmark.runner
+	The Gradle project name of the runner, such as ":beam-runners-direct-java" or
+	":beam-runners-flink. The project names can be found in the root
+        `settings.gradle`.
 
 Test data is deterministically synthesized on demand. The test
 data may be synthesized in the same pipeline as the query itself,
@@ -417,124 +429,192 @@ Yet to come
 
 ### Running SMOKE suite on the DirectRunner (local)
 
+The DirectRunner is default, so it is not required to pass `-Pnexmark.runner`.
+Here we do it for maximum clarity.
+
+The direct runner does not have separate batch and streaming modes, but the
+Nexmark launch does.
+
+These parameters leave on many of the DirectRunner's extra safety checks so the
+SMOKE suite can make sure there is nothing broken in the Nexmark suite.
+
 Batch Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=false --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-direct-java" \
+        -Pnexmark.args="
+            --runner=DirectRunner
+            --streaming=false
+            --suite=SMOKE
+            --manageResources=false
+            --monitorJobs=true
+            --enforceEncodability=true
+            --enforceImmutability=true"
 
 Streaming Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=true --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-direct-java" \
+        -Pnexmark.args="
+            --runner=DirectRunner
+            --streaming=true
+            --suite=SMOKE
+            --manageResources=false
+            --monitorJobs=true
+            --enforceEncodability=true
+            --enforceImmutability=true"
 
 ### Running SMOKE suite on the SparkRunner (local)
 
+The SparkRunner is special-cased in the Nexmark gradle launch. The task will
+provide the version of Spark that the SparkRunner is built against, and
+configure logging.
+
 Batch Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-spark" \
+        -Pnexmark.args="
+            --runner=SparkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true"
 
 Streaming Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-spark" \
+        -Pnexmark.args="
+            --runner=SparkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true"
 
 ### Running SMOKE suite on the FlinkRunner (local)
 
 Batch Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true  --flinkMaster=local"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-flink_2.11" \
+        -Pnexmark.args="
+            --runner=FlinkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true
+            --flinkMaster=local"
 
 Streaming Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true  --flinkMaster=local"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-flink_2.11" \
+        -Pnexmark.args="
+            --runner=FlinkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true
+            --flinkMaster=local"
 
 ### Running SMOKE suite on the ApexRunner (local)
 
 Batch Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=false"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-apex" \
+        -Pnexmark.args="
+            --runner=ApexRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true"
 
 Streaming Mode:
 
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=false"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-apex" \
+        -Pnexmark.args="
+            --runner=ApexRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true"
 
 ### Running SMOKE suite on Google Cloud Dataflow
 
-Building package:
-
-    mvn clean package -Pdataflow-runner
-
-Submit to Google Dataflow service:
-
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=gs://<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=PUBLISH_ONLY \
-  --pubsubTopic=<an existing Pubsub topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numEventGenerators=64 \
-  --numWorkers=16 \
-  --maxNumWorkers=16 \
-  --suite=SMOKE \
-  --firstEventRate=100000 \
-  --nextEventRate=100000 \
-  --ratePeriodSec=3600 \
-  --isRateLimited=true \
-  --avgPersonByteSize=500 \
-  --avgAuctionByteSize=500 \
-  --avgBidByteSize=500 \
-  --probDelayedEvent=0.000001 \
-  --occasionalDelaySec=3600 \
-  --numEvents=0 \
-  --useWallclockEventTime=true \
-  --usePubsubPublishTime=true \
-  --experiments=enable_custom_pubsub_sink
-```
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=gs://<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=SUBSCRIBE_ONLY \
-  --pubsubSubscription=<an existing Pubsub subscription to above topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numWorkers=64 \
-  --maxNumWorkers=64 \
-  --suite=SMOKE \
-  --usePubsubPublishTime=true \
-  --outputPath=gs://<a gs path under which log files will be written> \
-  --windowSizeSec=600 \
-  --occasionalDelaySec=3600 \
-  --maxLogEvents=10000 \
-  --experiments=enable_custom_pubsub_source
-```
+Set these up first so the below command is valid
+
+    PROJECT=<your project>
+    ZONE=<your zone>
+    STAGING_LOCATION=gs://<a GCS path for staging>
+    PUBSUB_TOPCI=<existing pubsub topic>
+
+Launch:
+
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-google-cloud-dataflow" \
+        -Pnexmark.args="
+            --runner=DataflowRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true
+            --project=${PROJECT}
+            --zone=${ZONE}
+            --workerMachineType=n1-highmem-8
+            --stagingLocation=${STAGING_LOCATION}
+            --streaming=true
+            --sourceType=PUBSUB
+            --pubSubMode=PUBLISH_ONLY
+            --pubsubTopic=${PUBSUB_TOPIC}
+            --resourceNameMode=VERBATIM
+            --manageResources=false
+            --monitorJobs=false
+            --numEventGenerators=64
+            --numWorkers=16
+            --maxNumWorkers=16
+            --suite=SMOKE
+            --firstEventRate=100000
+            --nextEventRate=100000
+            --ratePeriodSec=3600
+            --isRateLimited=true
+            --avgPersonByteSize=500
+            --avgAuctionByteSize=500
+            --avgBidByteSize=500
+            --probDelayedEvent=0.000001
+            --occasionalDelaySec=3600
+            --numEvents=0
+            --useWallclockEventTime=true
+            --usePubsubPublishTime=true
+            --experiments=enable_custom_pubsub_sink"
 
 ### Running query 0 on a Spark cluster with Apache Hadoop YARN
 
-
 Building package:
 
-    mvn clean package -Pspark-runner
+    ./gradlew :beam-sdks-java-nexmark:assemble
 
 Submit to the cluster:
 
-    spark-submit --master yarn-client --class org.apache.beam.sdk.nexmark.Main --driver-memory 512m --executor-memory 512m --executor-cores 1 beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar --runner=SparkRunner --query=0 --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true
+    spark-submit \
+        --class org.apache.beam.sdk.nexmark.Main \
+        --master yarn-client \
+        --driver-memory 512m \
+        --executor-memory 512m \
+        --executor-cores 1 \
+        sdks/java/nexmark/build/libs/beam-sdks-java-nexmark-{{ site.release_latest }}-spark.jar \
+            --runner=SparkRunner \
+            --query=0 \
+            --streamTimeout=60 \
+            --streaming=false \
+            --manageResources=false \
+            --monitorJobs=true"

-- 
To stop receiving notification emails like this one, please contact
mergebot-role@apache.org.

[beam-site] 02/02: This closes #418

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 75cae4fb3d55de2bb508ed53db27e75bbd858e08
Merge: 9193579 d41a892
Author: Mergebot <me...@apache.org>
AuthorDate: Wed Apr 25 22:01:15 2018 -0700

    This closes #418

 src/documentation/sdks/nexmark.md | 270 ++++++++++++++++++++++++--------------
 1 file changed, 175 insertions(+), 95 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-role@apache.org.