You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by ie...@apache.org on 2017/11/08 14:29:06 UTC
[1/2] beam git commit: [BEAM-2853] remove Nexmark README.md now that
the doc is in the website
Repository: beam
Updated Branches:
refs/heads/master e0166ceb6 -> 9c5454287
[BEAM-2853] remove Nexmark README.md now that the doc is in the website
Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f78f4559
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f78f4559
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f78f4559
Branch: refs/heads/master
Commit: f78f45596bf5f3a5c3fd0b12b74688e1beeb3c7a
Parents: e0166ce
Author: Etienne Chauchot <ec...@gmail.com>
Authored: Wed Nov 8 11:23:13 2017 +0100
Committer: Etienne Chauchot <ec...@gmail.com>
Committed: Wed Nov 8 11:23:13 2017 +0100
----------------------------------------------------------------------
sdks/java/nexmark/README.md | 340 ---------------------------------------
1 file changed, 340 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/beam/blob/f78f4559/sdks/java/nexmark/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/nexmark/README.md b/sdks/java/nexmark/README.md
deleted file mode 100644
index f252943..0000000
--- a/sdks/java/nexmark/README.md
+++ /dev/null
@@ -1,340 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-# NEXMark test suite
-
-This is a suite of pipelines inspired by the 'continuous data stream'
-queries in [http://datalab.cs.pdx.edu/niagaraST/NEXMark/]
-(http://datalab.cs.pdx.edu/niagaraST/NEXMark/).
-
-These are multiple queries over a three entities model representing on online auction system:
-
- - **Person** represents a person submitting an item for auction and/or making a bid
- on an auction.
- - **Auction** represents an item under auction.
- - **Bid** represents a bid for an item under auction.
-
-The queries exercise many aspects of Beam model:
-
-* **Query1**: What are the bid values in Euro's?
- Illustrates a simple map.
-* **Query2**: What are the auctions with particular auction numbers?
- Illustrates a simple filter.
-* **Query3**: Who is selling in particular US states?
- Illustrates an incremental join (using per-key state and timer) and filter.
-* **Query4**: What is the average selling price for each auction
- category?
- Illustrates complex join (using custom window functions) and
- aggregation.
-* **Query5**: Which auctions have seen the most bids in the last period?
- Illustrates sliding windows and combiners.
-* **Query6**: What is the average selling price per seller for their
- last 10 closed auctions.
- Shares the same 'winning bids' core as for **Query4**, and
- illustrates a specialized combiner.
-* **Query7**: What are the highest bids per period?
- Deliberately implemented using a side input to illustrate fanout.
-* **Query8**: Who has entered the system and created an auction in
- the last period?
- Illustrates a simple join.
-
-We have augmented the original queries with five more:
-
-* **Query0**: Pass-through.
- Allows us to measure the monitoring overhead.
-* **Query9**: Winning-bids.
- A common sub-query shared by **Query4** and **Query6**.
-* **Query10**: Log all events to GCS files.
- Illustrates windows with large side effects on firing.
-* **Query11**: How many bids did a user make in each session they
- were active?
- Illustrates session windows.
-* **Query12**: How many bids does a user make within a fixed
- processing time limit?
- Illustrates working in processing time in the Global window, as
- compared with event time in non-Global windows for all the other
- queries.
-
-We can specify the Beam runner to use with maven profiles, available profiles are:
-
-* direct-runner
-* spark-runner
-* flink-runner
-* apex-runner
-
-The runner must also be specified like in any other Beam pipeline using
-
- --runner
-
-
-Test data is deterministically synthesized on demand. The test
-data may be synthesized in the same pipeline as the query itself,
-or may be published to Pubsub.
-
-The query results may be:
-
-* Published to Pubsub.
-* Written to text files as plain text.
-* Written to text files using an Avro encoding.
-* Send to BigQuery.
-* Discarded.
-
-# Configuration
-
-## Common configuration parameters
-
-Decide if batch or streaming:
-
- --streaming=true
-
-Number of events generators
-
- --numEventGenerators=4
-
-Run query N
-
- --query=N
-
-## Available Suites
-The suite to run can be chosen using this configuration parameter:
-
- --suite=SUITE
-
-Available suites are:
-* DEFAULT: Test default configuration with query 0.
-* SMOKE: Run the 12 default configurations.
-* STRESS: Like smoke but for 1m events.
-* FULL_THROTTLE: Like SMOKE but 100m events.
-
-
-
-## Apex specific configuration
-
- --manageResources=false --monitorJobs=false
-
-## Dataflow specific configuration
-
- --manageResources=false --monitorJobs=true \
- --enforceEncodability=false --enforceImmutability=false
- --project=<your project> \
- --zone=<your zone> \
- --workerMachineType=n1-highmem-8 \
- --stagingLocation=<a gs path for staging> \
- --runner=DataflowRunner \
- --tempLocation=gs://talend-imejia/nexmark/temp/ \
- --stagingLocation=gs://talend-imejia/nexmark/temp/staging/ \
- --filesToStage=target/beam-sdks-java-nexmark-2.1.0-SNAPSHOT.jar
-
-## Direct specific configuration
-
- --manageResources=false --monitorJobs=true \
- --enforceEncodability=false --enforceImmutability=false
-
-## Flink specific configuration
-
- --manageResources=false --monitorJobs=true \
- --flinkMaster=local --parallelism=#numcores
-
-## Spark specific configuration
-
- --manageResources=false --monitorJobs=true \
- --sparkMaster=local \
- -Dspark.ui.enabled=false -DSPARK_LOCAL_IP=localhost -Dsun.io.serialization.extendedDebugInfo=true
-
-# Current Status
-
-Open issues are tracked [here](https://github.com../../../../../issues):
-
-## Batch / Synthetic / Local
-
-| Query | Direct | Spark | Flink | Apex |
-| ----: | ------ | ------------------------------------------------------------ | ---------------------------------------------------------- | ------------------------------------------------------------ |
-| 0 | ok | ok | ok | ok |
-| 1 | ok | ok | ok | ok |
-| 2 | ok | ok | ok | ok |
-| 3 | ok | [BEAM-1035](https://issues.apache.org/jira/browse/BEAM-1035) | ok | [BEAM-1037](https://issues.apache.org/jira/browse/BEAM-1037) |
-| 4 | ok | ok | ok | ok |
-| 5 | ok | ok | ok | ok |
-| 6 | ok | ok | ok | ok |
-| 7 | ok | ok | ok | ok |
-| 8 | ok | ok | ok | ok |
-| 9 | ok | ok | ok | ok |
-| 10 | ok | ok | ok | ok |
-| 11 | ok | ok | ok | ok |
-| 12 | ok | ok | ok | ok |
-
-## Streaming / Synthetic / Local
-
-| Query | Direct | Spark | Flink | Apex |
-| ----: | ------ | ------------------------------------------------------------ | ---------------------------------------------------------- | ------------------------------------------------------------ |
-| 0 | ok | ok | ok | ok |
-| 1 | ok | ok | ok | ok |
-| 2 | ok | ok | ok | ok |
-| 3 | ok | [BEAM-1035](https://issues.apache.org/jira/browse/BEAM-1035) | ok | [BEAM-1037](https://issues.apache.org/jira/browse/BEAM-1037) |
-| 4 | ok | ok | ok | ok |
-| 5 | ok | ok | ok | ok |
-| 6 | ok | ok | ok | ok |
-| 7 | ok | [BEAM-2112](https://issues.apache.org/jira/browse/BEAM-2112) | ok | ok |
-| 8 | ok | ok | ok | ok |
-| 9 | ok | ok | ok | ok |
-| 10 | ok | ok | ok | ok |
-| 11 | ok | ok | ok | ok |
-| 12 | ok | ok | ok | ok |
-
-## Batch / Synthetic / Cluster
-
-TODO
-
-| Query | Dataflow | Spark | Flink | Apex |
-| ----: | ------------------------------ | ------------------------------ | ------------------------------ | ------------------------------ |
-| 0 | | | | |
-
-## Streaming / Synthetic / Cluster
-
-TODO
-
-| Query | Dataflow | Spark | Flink | Apex |
-| ----: | ------------------------------ | ------------------------------ | ------------------------------ | ------------------------------ |
-| 0 | | | | |
-
-# Running Nexmark
-
-## Running SMOKE suite on the DirectRunner (local)
-
-Batch Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=false --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
-
-Streaming Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=true --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
-
-
-## Running SMOKE suite on the SparkRunner (local)
-
-Batch Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true"
-
-Streaming Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true"
-
-
-## Running SMOKE suite on the FlinkRunner (local)
-
-Batch Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true --flinkMaster=local"
-
-Streaming Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true --flinkMaster=local"
-
-
-## Running SMOKE suite on the ApexRunner (local)
-
-Batch Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=false"
-
-Streaming Mode
-
- mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=false"
-
-
-## Running SMOKE suite on Google Cloud Dataflow
-
-Building package
-
- mvn clean package -Pdataflow-runner
-
-Submit to Google Dataflow service
-
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar \
- org.apache.beam.sdk.nexmark.Main \
- --runner=DataflowRunner
- --project=<your project> \
- --zone=<your zone> \
- --workerMachineType=n1-highmem-8 \
- --stagingLocation=<a gs path for staging> \
- --streaming=true \
- --sourceType=PUBSUB \
- --pubSubMode=PUBLISH_ONLY \
- --pubsubTopic=<an existing Pubsub topic> \
- --resourceNameMode=VERBATIM \
- --manageResources=false \
- --monitorJobs=false \
- --numEventGenerators=64 \
- --numWorkers=16 \
- --maxNumWorkers=16 \
- --suite=SMOKE \
- --firstEventRate=100000 \
- --nextEventRate=100000 \
- --ratePeriodSec=3600 \
- --isRateLimited=true \
- --avgPersonByteSize=500 \
- --avgAuctionByteSize=500 \
- --avgBidByteSize=500 \
- --probDelayedEvent=0.000001 \
- --occasionalDelaySec=3600 \
- --numEvents=0 \
- --useWallclockEventTime=true \
- --usePubsubPublishTime=true \
- --experiments=enable_custom_pubsub_sink
-```
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar \
- org.apache.beam.sdk.nexmark.Main \
- --runner=DataflowRunner
- --project=<your project> \
- --zone=<your zone> \
- --workerMachineType=n1-highmem-8 \
- --stagingLocation=<a gs path for staging> \
- --streaming=true \
- --sourceType=PUBSUB \
- --pubSubMode=SUBSCRIBE_ONLY \
- --pubsubSubscription=<an existing Pubsub subscription to above topic> \
- --resourceNameMode=VERBATIM \
- --manageResources=false \
- --monitorJobs=false \
- --numWorkers=64 \
- --maxNumWorkers=64 \
- --suite=SMOKE \
- --usePubsubPublishTime=true \
- --outputPath=<a gs path under which log files will be written> \
- --windowSizeSec=600 \
- --occasionalDelaySec=3600 \
- --maxLogEvents=10000 \
- --experiments=enable_custom_pubsub_source
-```
-
-## Running query 0 on a Spark cluster with yarn
-
-Building package
-
- mvn clean package -Pspark-runner
-
-Submit to the cluster
-
- spark-submit --master yarn-client --class org.apache.beam.sdk.nexmark.Main --driver-memory 512m --executor-memory 512m --executor-cores 1 beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar --runner=SparkRunner --query=0 --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true
-
[2/2] beam git commit: This closes #4097
Posted by ie...@apache.org.
This closes #4097
Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9c545428
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9c545428
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9c545428
Branch: refs/heads/master
Commit: 9c54542875cd7180e225cbe6837074539e505934
Parents: e0166ce f78f455
Author: Ismaël Mejía <ie...@gmail.com>
Authored: Wed Nov 8 15:23:18 2017 +0100
Committer: Ismaël Mejía <ie...@gmail.com>
Committed: Wed Nov 8 15:23:18 2017 +0100
----------------------------------------------------------------------
sdks/java/nexmark/README.md | 340 ---------------------------------------
1 file changed, 340 deletions(-)
----------------------------------------------------------------------