Posted to commits@beam.apache.org by ie...@apache.org on 2017/06/27 10:00:10 UTC

[1/3] beam-site git commit: Add Amazon DynamoDB example using HadoopInputFormatIO

Repository: beam-site
Updated Branches:
  refs/heads/asf-site 3ab9c27eb -> 855364b8b


Add Amazon DynamoDB example using HadoopInputFormatIO


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/920a0be8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/920a0be8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/920a0be8

Branch: refs/heads/asf-site
Commit: 920a0be82cc96d621435e3d320552d0799804e3d
Parents: 3ab9c27
Author: Seshadri Chakkravarthy <se...@gmail.com>
Authored: Fri Jun 23 09:37:39 2017 -0700
Committer: Ismaël Mejía <ie...@gmail.com>
Committed: Tue Jun 27 11:56:25 2017 +0200

----------------------------------------------------------------------
 src/documentation/io/built-in-hadoop.md | 43 ++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/920a0be8/src/documentation/io/built-in-hadoop.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in-hadoop.md b/src/documentation/io/built-in-hadoop.md
index 722facb..240d919 100644
--- a/src/documentation/io/built-in-hadoop.md
+++ b/src/documentation/io/built-in-hadoop.md
@@ -225,4 +225,47 @@ PCollection<KV<Long, HCatRecord>> hcatData =
 
 ```py
   # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+### Amazon DynamoDB - DynamoDBInputFormat
+
+To read data from Amazon DynamoDB, use `org.apache.hadoop.dynamodb.read.DynamoDBInputFormat`.
+DynamoDBInputFormat implements the older `org.apache.hadoop.mapred.InputFormat` interface, whereas HadoopInputFormatIO expects the newer abstract class `org.apache.hadoop.mapreduce.InputFormat`.
+To bridge the two, you need a wrapper that adapts DynamoDBInputFormat (or, in general, any InputFormat implementing `org.apache.hadoop.mapred.InputFormat`) to the API that HadoopInputFormatIO expects.
+The example below uses one such wrapper, elephant-bird's `MapReduceInputFormatWrapper`: <https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java>
+
+
+```java
+Configuration dynamoDBConf = new Configuration();
+Job job = Job.getInstance(dynamoDBConf);
+com.twitter.elephantbird.mapreduce.input.MapReduceInputFormatWrapper.setInputFormat(org.apache.hadoop.dynamodb.read.DynamoDBInputFormat.class, job);
+dynamoDBConf = job.getConfiguration();
+dynamoDBConf.setClass("key.class", Text.class, WritableComparable.class);
+dynamoDBConf.setClass("value.class", org.apache.hadoop.dynamodb.DynamoDBItemWritable.class, Writable.class);
+dynamoDBConf.set("dynamodb.servicename", "dynamodb");
+dynamoDBConf.set("dynamodb.input.tableName", "table_name");
+dynamoDBConf.set("dynamodb.endpoint", "dynamodb.us-west-1.amazonaws.com");
+dynamoDBConf.set("dynamodb.regionid", "us-west-1");
+dynamoDBConf.set("dynamodb.throughput.read", "1");
+dynamoDBConf.set("dynamodb.throughput.read.percent", "1");
+dynamoDBConf.set("dynamodb.version", "2011-12-05");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_ACCESS_KEY_CONF, "aws_access_key");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_SECRET_KEY_CONF, "aws_secret_key");
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+Apply the Read transform as follows:
+
+```java
+PCollection<KV<Text, DynamoDBItemWritable>> dynamoDBData =
+  p.apply("read",
+    HadoopInputFormatIO.<Text, DynamoDBItemWritable>read()
+      .withConfiguration(dynamoDBConf));
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
 ```
\ No newline at end of file
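The adapter that the DynamoDB paragraph above describes can be illustrated without Hadoop on the classpath. The following is a minimal, self-contained sketch of the adapter pattern only. `LegacyInputFormat`, `ModernInputFormat`, `LegacyToModernAdapter`, `ToyTableInputFormat`, and `AdapterDemo` are hypothetical stand-ins, not the real Hadoop, elephant-bird, or DynamoDB classes, and their methods are heavily simplified compared with the real `getSplits`/`getRecordReader` (old API) and `getSplits`/`createRecordReader` (new API). One real difference the sketch does mirror is that the old API's `getSplits` takes a requested number of splits, so the adapter has to supply one; how the real wrapper chooses that value is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an old-style, interface-based input format (cf. org.apache.hadoop.mapred.InputFormat). */
interface LegacyInputFormat {
    String[] getSplits(int numSplitsHint);
    List<String> readRecords(String split);
}

/** Hypothetical stand-in for a new-style, abstract-class input format (cf. org.apache.hadoop.mapreduce.InputFormat). */
abstract class ModernInputFormat {
    public abstract List<String> getSplits();
    public abstract List<String> readRecords(String split);
}

/** Adapter: exposes any LegacyInputFormat through the ModernInputFormat API. */
class LegacyToModernAdapter extends ModernInputFormat {
    private final LegacyInputFormat delegate;
    private final int numSplitsHint; // hypothetical fixed hint passed to the legacy API

    LegacyToModernAdapter(LegacyInputFormat delegate, int numSplitsHint) {
        this.delegate = delegate;
        this.numSplitsHint = numSplitsHint;
    }

    @Override
    public List<String> getSplits() {
        // Translate the legacy array-based result into the list the new-style API returns.
        return Arrays.asList(delegate.getSplits(numSplitsHint));
    }

    @Override
    public List<String> readRecords(String split) {
        // Reading is delegated unchanged to the wrapped legacy format.
        return delegate.readRecords(split);
    }
}

/** Toy legacy format playing the role that DynamoDBInputFormat plays in the real example. */
class ToyTableInputFormat implements LegacyInputFormat {
    @Override
    public String[] getSplits(int numSplitsHint) {
        String[] splits = new String[numSplitsHint];
        for (int i = 0; i < numSplitsHint; i++) {
            splits[i] = "segment-" + i;
        }
        return splits;
    }

    @Override
    public List<String> readRecords(String split) {
        return Collections.singletonList(split + ":item");
    }
}

public class AdapterDemo {
    /** A consumer that only understands the new-style API, as HadoopInputFormatIO does. */
    static List<String> readAll(ModernInputFormat format) {
        List<String> out = new ArrayList<>();
        for (String split : format.getSplits()) {
            out.addAll(format.readRecords(split));
        }
        return out;
    }

    public static void main(String[] args) {
        ModernInputFormat adapted = new LegacyToModernAdapter(new ToyTableInputFormat(), 2);
        System.out.println(readAll(adapted));
    }
}
```

Running `main` prints `[segment-0:item, segment-1:item]`. The consumer never sees the legacy type; only the adapter knows about both APIs, which is the role the wrapper plays between HadoopInputFormatIO and DynamoDBInputFormat.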


[2/3] beam-site git commit: Regenerate website

Posted by ie...@apache.org.
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c66525cc
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c66525cc
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c66525cc

Branch: refs/heads/asf-site
Commit: c66525cc62c38bfd10b3a295ed97f036aa3b856a
Parents: 920a0be
Author: Ismaël Mejía <ie...@gmail.com>
Authored: Tue Jun 27 11:57:06 2017 +0200
Committer: Ismaël Mejía <ie...@gmail.com>
Committed: Tue Jun 27 11:57:06 2017 +0200

----------------------------------------------------------------------
 .../documentation/io/built-in/hadoop/index.html | 42 ++++++++++++++++++++
 1 file changed, 42 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/c66525cc/content/documentation/io/built-in/hadoop/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html
index a18c9b9..ce66332 100644
--- a/content/documentation/io/built-in/hadoop/index.html
+++ b/content/documentation/io/built-in/hadoop/index.html
@@ -362,6 +362,48 @@
 </code></pre>
 </div>
 
+<h3 id="amazon-dynamodb---dynamodbinputformat">Amazon DynamoDB - DynamoDBInputFormat</h3>
+
+<p>To read data from Amazon DynamoDB, use <code class="highlighter-rouge">org.apache.hadoop.dynamodb.read.DynamoDBInputFormat</code>.
+DynamoDBInputFormat implements the older <code class="highlighter-rouge">org.apache.hadoop.mapred.InputFormat</code> interface, whereas HadoopInputFormatIO expects the newer abstract class <code class="highlighter-rouge">org.apache.hadoop.mapreduce.InputFormat</code>.
+To bridge the two, you need a wrapper that adapts DynamoDBInputFormat (or, in general, any InputFormat implementing <code class="highlighter-rouge">org.apache.hadoop.mapred.InputFormat</code>) to the API that HadoopInputFormatIO expects.
+The example below uses one such wrapper, elephant-bird's <code class="highlighter-rouge">MapReduceInputFormatWrapper</code>: <a href="https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java">https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java</a></p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Configuration</span> <span class="n">dynamoDBConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Configuration</span><span class="o">();</span>
+<span class="n">Job</span> <span class="n">job</span> <span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span class="na">getInstance</span><span class="o">(</span><span class="n">dynamoDBConf</span><span class="o">);</span>
+<span class="n">com</span><span class="o">.</span><span class="na">twitter</span><span class="o">.</span><span class="na">elephantbird</span><span class="o">.</span><span class="na">mapreduce</span><span class="o">.</span><span class="na">input</span><span class="o">.</span><span class="na">MapReduceInputFormatWrapper</span><span class="o">.</span><span class="na">setInputFormat</span><span class="o">(</span><span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">hadoop</span><span class="o">.</span><span class="na">dynamodb</span><span class="o">.</span><span class="na">read</span><span class="o">.</span><span class="na">DynamoDBInputFormat</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">job</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span> <span class="o">=</span> <span class="n">job</span><span class="o">.</span><span class="na">getConfiguration</span><span class="o">();</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">setClass</span><span class="o">(</span><span class="s">"key.class"</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">WritableComparable</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">setClass</span><span class="o">(</span><span class="s">"value.class"</span><span class="o">,</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">hadoop</span><span class="o">.</span><span class="na">dynamodb</span><span class="o">.</span><span class="na">DynamoDBItemWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Writable</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.servicename"</span><span class="o">,</span> <span class="s">"dynamodb"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.input.tableName"</span><span class="o">,</span> <span class="s">"table_name"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.endpoint"</span><span class="o">,</span> <span class="s">"dynamodb.us-west-1.amazonaws.com"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.regionid"</span><span class="o">,</span> <span class="s">"us-west-1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.throughput.read"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.throughput.read.percent"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.version"</span><span class="o">,</span> <span class="s">"2011-12-05"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="n">DynamoDBConstants</span><span class="o">.</span><span class="na">DYNAMODB_ACCESS_KEY_CONF</span><span class="o">,</span> <span class="s">"aws_access_key"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="n">DynamoDBConstants</span><span class="o">.</span><span class="na">DYNAMODB_SECRET_KEY_CONF</span><span class="o">,</span> <span class="s">"aws_secret_key"</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code>  <span class="c"># The Beam SDK for Python does not support Hadoop InputFormat IO.</span>
+</code></pre>
+</div>
+
+<p>Apply the Read transform as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">DynamoDBItemWritable</span><span class="o">&gt;&gt;</span> <span class="n">dynamoDBData</span> <span class="o">=</span>
+  <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="s">"read"</span><span class="o">,</span>
+    <span class="n">HadoopInputFormatIO</span><span class="o">.&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">DynamoDBItemWritable</span><span class="o">&gt;</span><span class="n">read</span><span class="o">()</span>
+      <span class="o">.</span><span class="na">withConfiguration</span><span class="o">(</span><span class="n">dynamoDBConf</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code>  <span class="c"># The Beam SDK for Python does not support Hadoop InputFormat IO.</span>
+</code></pre>
+</div>
+
     </div>
     <footer class="footer">
   <div class="footer__contained">


[3/3] beam-site git commit: This closes #258

Posted by ie...@apache.org.
This closes #258


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/855364b8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/855364b8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/855364b8

Branch: refs/heads/asf-site
Commit: 855364b8b777db1b22a1c4acfab1afb9b8d42d47
Parents: 3ab9c27 c66525c
Author: Ismaël Mejía <ie...@gmail.com>
Authored: Tue Jun 27 11:57:06 2017 +0200
Committer: Ismaël Mejía <ie...@gmail.com>
Committed: Tue Jun 27 11:57:06 2017 +0200

----------------------------------------------------------------------
 .../documentation/io/built-in/hadoop/index.html | 42 +++++++++++++++++++
 src/documentation/io/built-in-hadoop.md         | 43 ++++++++++++++++++++
 2 files changed, 85 insertions(+)
----------------------------------------------------------------------