Posted to commits@samza.apache.org by xi...@apache.org on 2017/10/26 17:22:41 UTC

samza-hello-samza git commit: Add instructions to README.md

Repository: samza-hello-samza
Updated Branches:
  refs/heads/master 1a34a83d1 -> 498063c11


Add instructions to README.md


Project: http://git-wip-us.apache.org/repos/asf/samza-hello-samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza-hello-samza/commit/498063c1
Tree: http://git-wip-us.apache.org/repos/asf/samza-hello-samza/tree/498063c1
Diff: http://git-wip-us.apache.org/repos/asf/samza-hello-samza/diff/498063c1

Branch: refs/heads/master
Commit: 498063c11b21a72ba42ab5de40bdf8d83e7161ee
Parents: 1a34a83
Author: xiliu <xi...@xiliu-ld1.linkedin.biz>
Authored: Tue Oct 24 15:50:47 2017 -0700
Committer: xiliu <xi...@xiliu-ld1.linkedin.biz>
Committed: Tue Oct 24 15:50:47 2017 -0700

----------------------------------------------------------------------
 README.md    | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 build.gradle |  1 +
 2 files changed, 86 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza-hello-samza/blob/498063c1/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 0f80e9e..27140eb 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@ hello-samza
 
 Hello Samza is a starter project for [Apache Samza](http://samza.apache.org/) jobs.
 
-Please see [Hello Samza](http://samza.apache.org/startup/hello-samza/0.13/) to get started.
+Please see [Hello Samza](http://samza.apache.org/startup/hello-samza/0.13/) and [Hello Samza High Level API](http://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-yarn.html) to get started.
 
 ### Pull requests and questions
 
@@ -12,3 +12,87 @@ Please see [Hello Samza](http://samza.apache.org/startup/hello-samza/0.13/) to g
 ### Contribution
 
 To start contributing on [Hello Samza](http://samza.apache.org/startup/hello-samza/0.13/) first read [Rules](http://samza.apache.org/contribute/rules.html) and [Contributor Corner](https://cwiki.apache.org/confluence/display/SAMZA/Contributor%27s+Corner). Notice that [Hello Samza](http://samza.apache.org/startup/hello-samza/0.13/) git repository does not support git pull request.
+
+### Instructions
+
+The **Hello Samza** project contains example Samza applications that use the high-level API as well as the low-level API. The following instructions show how to install the binaries and run the applications in a local YARN cluster.
+
+#### 1. Get the Code
+
+Check out the hello-samza project:
+
+{% highlight bash %}
+git clone https://git.apache.org/samza-hello-samza.git hello-samza
+cd hello-samza
+git checkout latest
+{% endhighlight %}
+
+This project contains everything you'll need to run your first Samza application.
+
+#### 2. Start a Grid
+
+A Samza grid usually comprises three different systems: [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html), [Kafka](http://kafka.apache.org/), and [ZooKeeper](http://zookeeper.apache.org/). The hello-samza project comes with a script called "grid" to help you set up these systems. Start by running:
+
+{% highlight bash %}
+./bin/grid bootstrap
+{% endhighlight %}
+
+This command will download, install, and start ZooKeeper, Kafka, and YARN. It will also check out the latest version of Samza and build it. All package files will be put in a sub-directory called "deploy" inside hello-samza's root folder.
+
+If you get a complaint that JAVA_HOME is not set, then you'll need to set it to the path where Java is installed on your system.
+
+Once the grid command completes, you can verify that YARN is up and running by going to [http://localhost:8088](http://localhost:8088). This is the YARN UI.
+
+#### 3. Build a Samza Application Package
+
+Before you can run a Samza application, you need to build a package for it. This package is what YARN uses to deploy your apps on the grid.
+
+NOTE: if you are building from the latest branch of the hello-samza project, make sure that you first run the following steps to publish Samza from a local checkout:
+
+{% highlight bash %}
+git clone https://github.com/apache/samza.git
+cd samza
+./gradlew publishToMavenLocal
+{% endhighlight %}
+
+Then continue with the following commands in the hello-samza project:
+
+{% highlight bash %}
+mvn clean package
+mkdir -p deploy/samza
+tar -xvf ./target/hello-samza-0.14.0-SNAPSHOT-dist.tar.gz -C deploy/samza
+{% endhighlight %}
+
+#### 4. Run a Samza Application
+
+After you've built your Samza package, you can start the example applications on the grid.
+
+##### - High-level API Examples
+
+Package **samza.examples.cookbook** contains various examples of high-level API operator usage, such as map, partitionBy, window and join. Each example is a runnable Samza application, and the steps to run it are in the class Javadoc, e.g. [PageViewAdClickJoiner](https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/cookbook/PageViewAdClickJoiner.java).
+
+Package **samza.examples.wikipedia.application** contains a small Samza application which consumes the real-time feeds from Wikipedia, extracts the metadata of the events, and calculates statistics of all edits in a 10-second window. You can start the app on the grid using the run-app.sh script:
+
+{% highlight bash %}
+./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-application.properties
+{% endhighlight %}
+
+Once the job has started, you can tail the Kafka topic with:
+
+{% highlight bash %}
+./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-stats
+{% endhighlight %}
+
+A code walkthrough of this application can be found [here](http://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-code.html).
+
+##### - Low-level API Examples
+
+Package **samza.examples.wikipedia.task** contains the low-level API Samza code for the Wikipedia example. To run it, use the following commands:
+
+{% highlight bash %}
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties
+{% endhighlight %}
+
+Once the jobs have started, you can use the same _kafka-console-consumer.sh_ command as in the high-level API Wikipedia example to check the statistics output.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza-hello-samza/blob/498063c1/build.gradle
----------------------------------------------------------------------
diff --git a/build.gradle b/build.gradle
index ec451d5..9d1f543 100644
--- a/build.gradle
+++ b/build.gradle
@@ -18,6 +18,7 @@
  */
 
 apply plugin: 'eclipse'
+apply plugin: 'idea'
 apply plugin: 'java'
 
 defaultTasks 'distTar'
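
Note on the README instructions in this commit: the three run-app.sh invocations in the low-level API section differ only in the name of the properties file. A minimal sketch of how they could be generated from one template follows. The helper name samza_run_cmd is hypothetical and not part of hello-samza; it only prints the command lines and does not start any job. The paths and the config factory class are copied from the README text.

```shell
#!/usr/bin/env bash
# Sketch only: samza_run_cmd is a hypothetical helper, not part of hello-samza.
# It prints (does not execute) the run-app.sh command line from the README
# for one properties file under deploy/samza/config.
samza_run_cmd() {
  local job="$1"            # properties file name without extension, e.g. wikipedia-feed
  local root="${2:-$PWD}"   # hello-samza checkout; defaults to the current directory
  printf '%s/deploy/samza/bin/run-app.sh' "$root"
  printf ' --config-factory=%s' 'org.apache.samza.config.factories.PropertiesConfigFactory'
  printf ' --config-path=file://%s/deploy/samza/config/%s.properties\n' "$root" "$job"
}

# Print the three low-level API commands in the order the README lists them.
for job in wikipedia-feed wikipedia-parser wikipedia-stats; do
  samza_run_cmd "$job"
done
```

Printing rather than executing keeps the sketch safe to run outside a bootstrapped grid; on a grid prepared as the README describes, each printed line can be pasted into the shell from the hello-samza root.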