Posted to commits@flink.apache.org by fh...@apache.org on 2014/11/18 16:34:43 UTC

svn commit: r1640370 - in /incubator/flink: _posts/ img/blog/ site/ site/blog/ site/blog/page2/ site/blog/page3/ site/img/blog/ site/news/2014/11/18/

Author: fhueske
Date: Tue Nov 18 15:34:42 2014
New Revision: 1640370

URL: http://svn.apache.org/r1640370
Log:
Added Hadoop Compat Blog Post

Added:
    incubator/flink/_posts/2014-11-18-hadoop-compatibility.md
    incubator/flink/img/blog/hcompat-flow.png   (with props)
    incubator/flink/img/blog/hcompat-logos.png   (with props)
    incubator/flink/site/blog/page3/
    incubator/flink/site/blog/page3/index.html
    incubator/flink/site/img/blog/hcompat-flow.png   (with props)
    incubator/flink/site/img/blog/hcompat-logos.png   (with props)
    incubator/flink/site/news/2014/11/18/
    incubator/flink/site/news/2014/11/18/hadoop-compatibility.html
Modified:
    incubator/flink/site/blog/index.html
    incubator/flink/site/blog/page2/index.html
    incubator/flink/site/index.html

Added: incubator/flink/_posts/2014-11-18-hadoop-compatibility.md
URL: http://svn.apache.org/viewvc/incubator/flink/_posts/2014-11-18-hadoop-compatibility.md?rev=1640370&view=auto
==============================================================================
--- incubator/flink/_posts/2014-11-18-hadoop-compatibility.md (added)
+++ incubator/flink/_posts/2014-11-18-hadoop-compatibility.md Tue Nov 18 15:34:42 2014
@@ -0,0 +1,91 @@
+---
+layout: post
+title:  'Hadoop Compatibility in Flink'
+date:   2014-11-18 10:00:00
+categories: news
+---
+
+[Apache Hadoop](http://hadoop.apache.org) is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves on it in many dimensions. Among other features, Flink provides much better performance and offers APIs in Java and Scala, which are very easy to use. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats, along with many more operators. While conceptually equivalent, Hadoop’s MapReduce and Flink’s interfaces for these functions are unfortunately not source compatible.
+
+##Flink’s Hadoop Compatibility Package
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/hcompat-logos.png" style="width:30%;margin:15px">
+</center>
+
+To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a [Google Summer of Code](https://developers.google.com/open-source/soc/) 2014 project. 
+
+With the Hadoop Compatibility package, you can reuse all your Hadoop
+
+* ``InputFormats`` (mapred and mapreduce APIs)
+* ``OutputFormats`` (mapred and mapreduce APIs)
+* ``Mappers`` (mapred API)
+* ``Reducers`` (mapred API)
+
+in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (``Writables`` and ``WritableComparables``).
+
+The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions. 
+
+```java
+
+// Definition of Hadoop Mapper function
+public class Tokenizer implements Mapper<LongWritable, Text, Text, LongWritable> { ... }
+// Definition of Hadoop Reducer function
+public class Counter implements Reducer<Text, LongWritable, Text, LongWritable> { ... }
+
+public static void main(String[] args) throws Exception {
+  final String inputPath = args[0];
+  final String outputPath = args[1];
+
+  final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+        
+  // Setup Hadoop’s TextInputFormat
+  HadoopInputFormat<LongWritable, Text> hadoopInputFormat = 
+      new HadoopInputFormat<LongWritable, Text>(
+        new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
+  TextInputFormat.addInputPath(hadoopInputFormat.getJobConf(), new Path(inputPath));
+  
+  // Read a DataSet with the Hadoop InputFormat
+  DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopInputFormat);
+  DataSet<Tuple2<Text, LongWritable>> words = text
+    // Wrap Tokenizer Mapper function
+    .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer()))
+    .groupBy(0)
+    // Wrap Counter Reducer function (used as Reducer and Combiner)
+    .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
+      new Counter(), new Counter()));
+        
+  // Setup Hadoop’s TextOutputFormat
+  HadoopOutputFormat<Text, LongWritable> hadoopOutputFormat = 
+    new HadoopOutputFormat<Text, LongWritable>(
+      new TextOutputFormat<Text, LongWritable>(), new JobConf());
+  hadoopOutputFormat.getJobConf().set("mapred.textoutputformat.separator", " ");
+  TextOutputFormat.setOutputPath(hadoopOutputFormat.getJobConf(), new Path(outputPath));
+        
+  // Output & Execute
+  words.output(hadoopOutputFormat);
+  env.execute("Hadoop Compat WordCount");
+}
+
+```
+
+As you can see, Flink represents Hadoop key-value pairs as `Tuple2<key, value>` tuples. Note that the program uses Flink’s `groupBy()` transformation to group data on the key field (field 0 of the `Tuple2<key, value>`) before it is given to the Reducer function. At the moment, the compatibility package does not evaluate custom Hadoop partitioners, sorting comparators, or grouping comparators.
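+
+For illustration, here is a minimal sketch (not part of the WordCount program above, and assuming the same types and imports) of how such `Tuple2<key, value>` records can be consumed by a native Flink function. The key is field `f0` of the tuple, the value field `f1`:
+
+```java
+// Hedged sketch: a plain Flink MapFunction reading the Tuple2 records
+// produced by the wrapped Hadoop functions of the WordCount example.
+DataSet<String> lines = words.map(
+  new MapFunction<Tuple2<Text, LongWritable>, String>() {
+    @Override
+    public String map(Tuple2<Text, LongWritable> pair) {
+      // f0 holds the Hadoop key (the word), f1 the value (its count)
+      return pair.f0.toString() + ": " + pair.f1.get();
+    }
+  });
+```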
+
+Hadoop functions can be used at any position within a Flink program and can, of course, be mixed with native Flink functions. This means that instead of assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler such as [Apache Oozie](http://oozie.apache.org), you can implement an arbitrarily complex Flink program consisting of multiple Hadoop Input- and OutputFormats, Mapper and Reducer functions. When executing such a Flink program, data will be pipelined between your Hadoop functions and will not be written to HDFS just for the purpose of data exchange.
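+
+As a sketch of what such mixing can look like (again reusing the classes of the WordCount example; `FilterFunction` is a native Flink interface), a Flink function can be applied directly to the output of a wrapped Hadoop Mapper:
+
+```java
+// Hedged sketch: chain a wrapped Hadoop Mapper with a native Flink filter.
+// Records flow directly from one function to the next; no HDFS round trip
+// happens in between.
+DataSet<Tuple2<Text, LongWritable>> longTokens = text
+  .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer()))
+  .filter(new FilterFunction<Tuple2<Text, LongWritable>>() {
+    @Override
+    public boolean filter(Tuple2<Text, LongWritable> token) {
+      // keep only words longer than three characters (byte length)
+      return token.f0.getLength() > 3;
+    }
+  });
+```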
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/hcompat-flow.png" style="width:100%;margin:15px">
+</center>
+
+##What comes next?
+
+While the Hadoop compatibility package is already very useful, we are currently working on a dedicated Hadoop Job operation to embed and execute Hadoop jobs as a whole in Flink programs, including their custom partitioning, sorting, and grouping code. With this feature, you will be able to chain multiple Hadoop jobs, mix them with Flink functions, and combine them with other operations such as [Spargel]({{ site.baseurl }}/docs/0.7-incubating/spargel_guide.html) operations (Pregel/Giraph-style jobs).
+
+##Summary
+
+Flink lets you reuse a lot of the code you wrote for Hadoop MapReduce, including all data types, all Input- and OutputFormats, and the Mappers and Reducers of the mapred API. Hadoop functions can be used within Flink programs and mixed with all other Flink functions. Due to Flink’s pipelined execution, Hadoop functions can be assembled arbitrarily without data exchange via HDFS. Moreover, the Flink community is currently working on a dedicated Hadoop Job operation to support the execution of Hadoop jobs as a whole.
+
+If you want to use Flink’s Hadoop compatibility package, check out our [documentation]({{ site.baseurl }}/docs/0.7-incubating/hadoop_compatibility.html).
+
+<br>
+<small>Written by Fabian Hueske ([@fhueske](https://twitter.com/fhueske)).</small>
\ No newline at end of file

Added: incubator/flink/img/blog/hcompat-flow.png
URL: http://svn.apache.org/viewvc/incubator/flink/img/blog/hcompat-flow.png?rev=1640370&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/flink/img/blog/hcompat-flow.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/flink/img/blog/hcompat-logos.png
URL: http://svn.apache.org/viewvc/incubator/flink/img/blog/hcompat-logos.png?rev=1640370&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/flink/img/blog/hcompat-logos.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/flink/site/blog/index.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/blog/index.html?rev=1640370&r1=1640369&r2=1640370&view=diff
==============================================================================
--- incubator/flink/site/blog/index.html (original)
+++ incubator/flink/site/blog/index.html Tue Nov 18 15:34:42 2014
@@ -116,6 +116,95 @@
 		<div class="col-md-8">
 			
 			<article>
+				<h2><a href="/news/2014/11/18/hadoop-compatibility.html">Hadoop Compatibility in Flink</a></h2>
+				<p class="meta">18 Nov 2014</p>
+				
+				<div><p><a href="http://hadoop.apache.org">Apache Hadoop</a> is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves on it in many dimensions. Among other features, Flink provides much better performance and offers APIs in Java and Scala, which are very easy to use. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats, along with many more operators. While conceptually equivalent, Hadoop’s MapReduce and Flink’s interfaces for these functions are unfortunately not source compatible.</p>
+
+<h2 id="flink’s-hadoop-compatibility-package">Flink’s Hadoop Compatibility Package</h2>
+
+<p><center>
+<img src="/img/blog/hcompat-logos.png" style="width:30%;margin:15px">
+</center></p>
+
+<p>To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a <a href="https://developers.google.com/open-source/soc/">Google Summer of Code</a> 2014 project. </p>
+
+<p>With the Hadoop Compatibility package, you can reuse all your Hadoop</p>
+
+<ul>
+<li><code>InputFormats</code> (mapred and mapreduce APIs)</li>
+<li><code>OutputFormats</code> (mapred and mapreduce APIs)</li>
+<li><code>Mappers</code> (mapred API)</li>
+<li><code>Reducers</code> (mapred API)</li>
+</ul>
+
+<p>in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (<code>Writables</code> and <code>WritableComparables</code>).</p>
+
+<p>The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions. </p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Definition of Hadoop Mapper function</span>
+<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Tokenizer</span> <span class="kd">implements</span> <span class="n">Mapper</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="c1">// Definition of Hadoop Reducer function</span>
+<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Counter</span> <span class="kd">implements</span> <span class="n">Reducer</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+
+<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+  <span class="kd">final</span> <span class="n">String</span> <span class="n">inputPath</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
+  <span class="kd">final</span> <span class="n">String</span> <span class="n">outputPath</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
+
+  <span class="kd">final</span> <span class="n">ExecutionEnvironment</span> <span class="n">env</span> <span class="o">=</span> <span class="n">ExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span><span class="o">();</span>
+
+  <span class="c1">// Setup Hadoop’s TextInputFormat</span>
+  <span class="n">HadoopInputFormat</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;</span> <span class="n">hadoopInputFormat</span> <span class="o">=</span> 
+      <span class="k">new</span> <span class="n">HadoopInputFormat</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;(</span>
+        <span class="k">new</span> <span class="nf">TextInputFormat</span><span class="o">(),</span> <span class="n">LongWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span>
+  <span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">inputPath</span><span class="o">));</span>
+
+  <span class="c1">// Read a DataSet with the Hadoop InputFormat</span>
+  <span class="n">DataSet</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;&gt;</span> <span class="n">text</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">createInput</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">);</span>
+  <span class="n">DataSet</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">text</span>
+    <span class="c1">// Wrap Tokenizer Mapper function</span>
+    <span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopMapFunction</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span><span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">()))</span>
+    <span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
+    <span class="c1">// Wrap Counter Reducer function (used as Reducer and Combiner)</span>
+    <span class="o">.</span><span class="na">reduceGroup</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopReduceCombineFunction</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span>
+      <span class="k">new</span> <span class="nf">Counter</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Counter</span><span class="o">()));</span>
+
+  <span class="c1">// Setup Hadoop’s TextOutputFormat</span>
+  <span class="n">HadoopOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="n">hadoopOutputFormat</span> <span class="o">=</span> 
+    <span class="k">new</span> <span class="n">HadoopOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span>
+      <span class="k">new</span> <span class="n">TextOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(),</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span>
+  <span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">().</span><span class="na">set</span><span class="o">(</span><span class="s">&quot;mapred.textoutputformat.separator&quot;</span><span class="o">,</span> <span class="s">&quot; &quot;</span><span class="o">);</span>
+  <span class="n">TextOutputFormat</span><span class="o">.</span><span class="na">setOutputPath</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">outputPath</span><span class="o">));</span>
+
+  <span class="c1">// Output &amp; Execute</span>
+  <span class="n">words</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">);</span>
+  <span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="s">&quot;Hadoop Compat WordCount&quot;</span><span class="o">);</span>
+<span class="o">}</span>
+</code></pre></div>
+<p>As you can see, Flink represents Hadoop key-value pairs as <code>Tuple2&lt;key, value&gt;</code> tuples. Note that the program uses Flink’s <code>groupBy()</code> transformation to group data on the key field (field 0 of the <code>Tuple2&lt;key, value&gt;</code>) before it is given to the Reducer function. At the moment, the compatibility package does not evaluate custom Hadoop partitioners, sorting comparators, or grouping comparators.</p>
+
+<p>Hadoop functions can be used at any position within a Flink program and can, of course, be mixed with native Flink functions. This means that instead of assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler such as <a href="http://oozie.apache.org">Apache Oozie</a>, you can implement an arbitrarily complex Flink program consisting of multiple Hadoop Input- and OutputFormats, Mapper and Reducer functions. When executing such a Flink program, data will be pipelined between your Hadoop functions and will not be written to HDFS just for the purpose of data exchange.</p>
+
+<p><center>
+<img src="/img/blog/hcompat-flow.png" style="width:100%;margin:15px">
+</center></p>
+
+<h2 id="what-comes-next?">What comes next?</h2>
+
+<p>While the Hadoop compatibility package is already very useful, we are currently working on a dedicated Hadoop Job operation to embed and execute Hadoop jobs as a whole in Flink programs, including their custom partitioning, sorting, and grouping code. With this feature, you will be able to chain multiple Hadoop jobs, mix them with Flink functions, and combine them with other operations such as <a href="/docs/0.7-incubating/spargel_guide.html">Spargel</a> operations (Pregel/Giraph-style jobs).</p>
+
+<h2 id="summary">Summary</h2>
+
+<p>Flink lets you reuse a lot of the code you wrote for Hadoop MapReduce, including all data types, all Input- and OutputFormats, and the Mappers and Reducers of the mapred API. Hadoop functions can be used within Flink programs and mixed with all other Flink functions. Due to Flink’s pipelined execution, Hadoop functions can be assembled arbitrarily without data exchange via HDFS. Moreover, the Flink community is currently working on a dedicated Hadoop Job operation to support the execution of Hadoop jobs as a whole.</p>
+
+<p>If you want to use Flink’s Hadoop compatibility package, check out our <a href="/docs/0.7-incubating/hadoop_compatibility.html">documentation</a>.</p>
+
+<p><br>
+<small>Written by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>).</small></p>
+</div>
+				<a href="/news/2014/11/18/hadoop-compatibility.html#disqus_thread">Hadoop Compatibility in Flink</a>
+			</article>
+			
+			<article>
 				<h2><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a></h2>
 				<p class="meta">04 Nov 2014</p>
 				
@@ -774,30 +863,6 @@ You can now press the &quot;Run&quot; bu
 				<a href="/news/2014/01/28/querying_mongodb.html#disqus_thread">Accessing Data Stored in MongoDB with Stratosphere</a>
 			</article>
 			
-			<article>
-				<h2><a href="/news/2014/01/26/optimizer_plan_visualization_tool.html">Optimizer Plan Visualization Tool</a></h2>
-				<p class="meta">26 Jan 2014</p>
-				
-				<div><p>Stratosphere&#39;s hybrid approach combines <strong>MapReduce</strong> and <strong>MPP database</strong> techniques. One central part of this approach is to have a <strong>separation between the programming (API) and the way programs are executed</strong> <em>(execution plans)</em>. The <strong>compiler/optimizer</strong> decides the details concerning caching or when to partition/broadcast with a holistic view of the program. The same program may actually be executed differently in different scenarios (input data of different sizes, different number of machines).</p>
-
-<p><strong>If you want to know how exactly the system executes your program, you can find it out in two ways</strong>:</p>
-
-<ol>
-<li><p>The <strong>browser-based webclient UI</strong>, which takes programs packaged into JARs and draws the execution plan as a visual data flow (check out the <a href="http://stratosphere.eu/docs/0.4/program_execution/web_interface.html">documentation</a> for details).</p></li>
-<li><p>For <strong>programs using the <a href="http://stratosphere.eu/docs/0.4/program_execution/local_executor.html">Local- </a> or <a href="http://stratosphere.eu/docs/0.4/program_execution/remote_executor.html">Remote Executor</a></strong>, you can get the optimizer plan using the method <code>LocalExecutor.optimizerPlanAsJSON(plan)</code>. The <strong>resulting JSON</strong> string describes the execution strategies chosen by the optimizer. Naturally, you do not want to parse that yourself, especially for longer programs.</p></li>
-</ol>
-
-<p>The builds <em>0.5-SNAPSHOT</em> and later come with a <strong>tool that visualizes the JSON</strong> string. It is a standalone version of the webclient&#39;s visualization, packed as an html document <code>tools/planVisualizer.html</code>.</p>
-
-<p>If you open it in a browser (for example <code>chromium-browser tools/planVisualizer.html</code>) it shows a text area where you can paste the JSON string and it renders that string as a dataflow plan (assuming it was a valid JSON string and plan). The pictures below show how that looks for the <a href="https://github.com/stratosphere/stratosphere/blob/release-0.4/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/record/connectedcomponents/WorksetConnectedComponents.java?source=cc">included sample program</a> that uses delta iterations to compute the connected components of a graph.</p>
-
-<p><img src="/img/blog/plan_visualizer1.png" style="width:100%;"></p>
-
-<p><img src="/img/blog/plan_visualizer2.png" style="width:100%;"></p>
-</div>
-				<a href="/news/2014/01/26/optimizer_plan_visualization_tool.html#disqus_thread">Optimizer Plan Visualization Tool</a>
-			</article>
-			
 		</div>
 		<div class="col-md-2"></div>
 	</div>
@@ -827,7 +892,7 @@ var disqus_shortname = 'stratosphere-eu'
 	
 	</li>
 	<li>
-		<span class="page_number ">Page: 1 of 2</span>
+		<span class="page_number ">Page: 1 of 3</span>
 	</li>
 	<li>
 	

Modified: incubator/flink/site/blog/page2/index.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/blog/page2/index.html?rev=1640370&r1=1640369&r2=1640370&view=diff
==============================================================================
--- incubator/flink/site/blog/page2/index.html (original)
+++ incubator/flink/site/blog/page2/index.html Tue Nov 18 15:34:42 2014
@@ -116,6 +116,30 @@
 		<div class="col-md-8">
 			
 			<article>
+				<h2><a href="/news/2014/01/26/optimizer_plan_visualization_tool.html">Optimizer Plan Visualization Tool</a></h2>
+				<p class="meta">26 Jan 2014</p>
+				
+				<div><p>Stratosphere&#39;s hybrid approach combines <strong>MapReduce</strong> and <strong>MPP database</strong> techniques. One central part of this approach is to have a <strong>separation between the programming (API) and the way programs are executed</strong> <em>(execution plans)</em>. The <strong>compiler/optimizer</strong> decides the details concerning caching or when to partition/broadcast with a holistic view of the program. The same program may actually be executed differently in different scenarios (input data of different sizes, different number of machines).</p>
+
+<p><strong>If you want to know how exactly the system executes your program, you can find it out in two ways</strong>:</p>
+
+<ol>
+<li><p>The <strong>browser-based webclient UI</strong>, which takes programs packaged into JARs and draws the execution plan as a visual data flow (check out the <a href="http://stratosphere.eu/docs/0.4/program_execution/web_interface.html">documentation</a> for details).</p></li>
+<li><p>For <strong>programs using the <a href="http://stratosphere.eu/docs/0.4/program_execution/local_executor.html">Local- </a> or <a href="http://stratosphere.eu/docs/0.4/program_execution/remote_executor.html">Remote Executor</a></strong>, you can get the optimizer plan using the method <code>LocalExecutor.optimizerPlanAsJSON(plan)</code>. The <strong>resulting JSON</strong> string describes the execution strategies chosen by the optimizer. Naturally, you do not want to parse that yourself, especially for longer programs.</p></li>
+</ol>
+
+<p>The builds <em>0.5-SNAPSHOT</em> and later come with a <strong>tool that visualizes the JSON</strong> string. It is a standalone version of the webclient&#39;s visualization, packed as an html document <code>tools/planVisualizer.html</code>.</p>
+
+<p>If you open it in a browser (for example <code>chromium-browser tools/planVisualizer.html</code>) it shows a text area where you can paste the JSON string and it renders that string as a dataflow plan (assuming it was a valid JSON string and plan). The pictures below show how that looks for the <a href="https://github.com/stratosphere/stratosphere/blob/release-0.4/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/record/connectedcomponents/WorksetConnectedComponents.java?source=cc">included sample program</a> that uses delta iterations to compute the connected components of a graph.</p>
+
+<p><img src="/img/blog/plan_visualizer1.png" style="width:100%;"></p>
+
+<p><img src="/img/blog/plan_visualizer2.png" style="width:100%;"></p>
+</div>
+				<a href="/news/2014/01/26/optimizer_plan_visualization_tool.html#disqus_thread">Optimizer Plan Visualization Tool</a>
+			</article>
+			
+			<article>
 				<h2><a href="/news/2014/01/13/stratosphere-release-0.4.html">Stratosphere 0.4 Released</a></h2>
 				<p class="meta">13 Jan 2014</p>
 				
@@ -427,23 +451,6 @@ We demonstrate our optimizer and a job s
 				<a href="/news/2012/10/15/icde2013.html#disqus_thread">Stratosphere Demo Accepted for ICDE 2013</a>
 			</article>
 			
-			<article>
-				<h2><a href="/news/2012/08/21/release02.html">Version 0.2 Released</a></h2>
-				<p class="meta">21 Aug 2012</p>
-				
-				<div><p>We are happy to announce that version 0.2 of the Stratosphere System has been released. It has a lot of performance improvements as well as a bunch of exciting new features like:</p>
-<ul>
-<li>The new Sopremo Algebra Layer and the Meteor Scripting Language</li>
-<li>The whole new tuple data model for the PACT API</li>
-<li>Fault tolerance through local checkpoints</li>
-<li>A ton of performance improvements on all layers</li>
-<li>Support for plug-ins on the data flow channel layer</li>
-<li>Many new library classes (for example new Input-/Output-Formats)</li>
-</ul>
-<p>For a complete list of new features, check out the <a href="https://stratosphere.eu/wiki/doku.php/wiki:changesrelease0.2">change log</a>.</p></div>
-				<a href="/news/2012/08/21/release02.html#disqus_thread">Version 0.2 Released</a>
-			</article>
-			
 		</div>
 		<div class="col-md-2"></div>
 	</div>
@@ -473,11 +480,11 @@ var disqus_shortname = 'stratosphere-eu'
 	
 	</li>
 	<li>
-		<span class="page_number ">Page: 2 of 2</span>
+		<span class="page_number ">Page: 2 of 3</span>
 	</li>
 	<li>
 	
-		<span>Next</span>
+		<a href="/blog/page3" class="next">Next</a>
 	
 	</li>
 </ul>

Added: incubator/flink/site/blog/page3/index.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/blog/page3/index.html?rev=1640370&view=auto
==============================================================================
--- incubator/flink/site/blog/page3/index.html (added)
+++ incubator/flink/site/blog/page3/index.html Tue Nov 18 15:34:42 2014
@@ -0,0 +1,208 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <title>Apache Flink (incubating): Blog</title>
+    <link rel="stylesheet" href="/css/bootstrap.css">
+    <link rel="stylesheet" href="/css/bootstrap-lumen-custom.css">
+    <link rel="stylesheet" href="/css/syntax.css">
+    <link rel="stylesheet" href="/css/custom.css">
+    <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css" rel="stylesheet">
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
+    <script src="/js/bootstrap.min.js"></script>
+    <link rel="icon" type="image/png" href="/favicon.png" />
+
+  </head>
+  <body>
+
+<nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+  <div class="container">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <div class="logo-container">
+        <img src="/img/logo/png/50/color_50.png" id="logo-element"/>
+        <a class="navbar-brand" href="/index.html">Apache Flink</a>
+      </div>
+    </div>
+
+    <div class="collapse navbar-collapse" id="navbar-collapse-1">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Quickstart <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/0.7-incubating/setup_quickstart.html">Setup Flink</a></li>
+            <li><a href="/docs/0.7-incubating/java_api_quickstart.html">Java API</a></li>
+            <li><a href="/docs/0.7-incubating/scala_api_quickstart.html">Scala API</a></li>
+          </ul>
+        </li>
+
+        <li>
+          <a href="/downloads.html" class="">Downloads</a>
+        </li>
+
+        <li>
+          <a href="/docs/0.6-incubating/faq.html" class="">FAQ</a>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li role="presentation" class="dropdown-header">Current Stable:</li>
+            <li><a href="/docs/0.7-incubating/">0.7.0-incubating</a></li>
+            <li><a href="/docs/0.7-incubating/api/java">0.7.0-incubating Javadocs</a></li>
+            <li><a href="/docs/0.7-incubating/api/scala/index.html#org.apache.flink.api.scala.package">0.7.0-incubating Scaladocs</a></li>
+            <li class="divider"></li>
+            <li role="presentation" class="dropdown-header">Previous:</li>
+            <li><a href="/docs/0.6-incubating/">0.6-incubating</a></li>
+            <li><a href="/docs/0.6-incubating/api/java">0.6-incubating Javadocs</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Community <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a href="/community.html#mailing-lists">Mailing Lists</a></li>
+            <li><a href="/community.html#issues">Issues</a></li>
+            <li><a href="/community.html#team">Team</a></li>
+            <li class="divider"></li>
+            <li><a href="/how-to-contribute.html">How To Contribute</a></li>
+            <li><a href="/coding_guidelines.html">Coding Guidelines</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a class="extLink" href="http://www.apache.org/">Apache Software Foundation</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/how-it-works.html">How it works</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/thanks.html">Thanks</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://incubator.apache.org/projects/flink.html">Incubation Status page</a><i class="small-font-awesome fa fa-external-link"></i></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Project <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a class="extLink" href="/material.html">Material</a></li>
+            <li><a class="extLink" href="https://cwiki.apache.org/confluence/display/FLINK">Wiki</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="https://wiki.apache.org/incubator/StratosphereProposal">Incubator Proposal</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/licenses/LICENSE-2.0">License</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="https://github.com/apache/incubator-flink">Source Code</a><i class="small-font-awesome fa fa-external-link"></i></li>
+          </ul>
+        </li>
+
+        <li>
+          <a href="/blog/index.html" class="active">Blog</a>
+        </li>
+
+      </ul>
+    </div>
+  </div>
+</nav>
+
+    <div style="padding-top:70px" class="container">
+
+<div class="container">
+	<div class="row">
+		<div class="col-md-2"></div>
+		<div class="col-md-8">
+			
+			<article>
+				<h2><a href="/news/2012/08/21/release02.html">Version 0.2 Released</a></h2>
+				<p class="meta">21 Aug 2012</p>
+				
+				<div><p>We are happy to announce that version 0.2 of the Stratosphere System has been released. It has a lot of performance improvements as well as a bunch of exciting new features like:</p>
+<ul>
+<li>The new Sopremo Algebra Layer and the Meteor Scripting Language</li>
+<li>The whole new tuple data model for the PACT API</li>
+<li>Fault tolerance through local checkpoints</li>
+<li>A ton of performance improvements on all layers</li>
+<li>Support for plug-ins on the data flow channel layer</li>
+<li>Many new library classes (for example new Input-/Output-Formats)</li>
+</ul>
+<p>For a complete list of new features, check out the <a href="https://stratosphere.eu/wiki/doku.php/wiki:changesrelease0.2">change log</a>.</p></div>
+				<a href="/news/2012/08/21/release02.html#disqus_thread">Version 0.2 Released</a>
+			</article>
+			
+		</div>
+		<div class="col-md-2"></div>
+	</div>
+</div>
+
+<script type="text/javascript">
+/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+/* * * DON'T EDIT BELOW THIS LINE * * */
+(function () {
+    var s = document.createElement('script'); s.async = true;
+    s.type = 'text/javascript';
+    s.src = '//' + disqus_shortname + '.disqus.com/count.js';
+    (document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
+}());
+</script>
+    
+
+
+
+<!-- Pagination links -->
+<ul class="pager">
+	<li>
+	
+		<a href="/blog/page2" class="previous">Previous</a>
+	
+	</li>
+	<li>
+		<span class="page_number ">Page: 3 of 3</span>
+	</li>
+	<li>
+	
+		<span>Next</span>
+	
+	</li>
+</ul>
+
+
+     <div class="footer">
+
+<hr class="divider">
+
+<p><small>Apache Flink is an effort undergoing incubation at The Apache Software
+Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is
+required of all newly accepted projects until a further review indicates that
+the infrastructure, communications, and decision making process have
+stabilized in a manner consistent with other successful ASF projects. While
+incubation status is not necessarily a reflection of the completeness or
+stability of the code, it does indicate that the project has yet to be fully
+endorsed by the ASF.</small></p>
+
+<p><a href="http://incubator.apache.org/"><img src="/img/apache-incubator-logo.png" alt="Incubator Logo"></a></p>
+
+<p class="text-center"><a href="/privacy-policy.html">Privacy Policy<a></p>
+
+      </div>
+    </div>
+
+    
+
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+
+    </script>
+
+  </body>
+</html>

Added: incubator/flink/site/img/blog/hcompat-flow.png
URL: http://svn.apache.org/viewvc/incubator/flink/site/img/blog/hcompat-flow.png?rev=1640370&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/flink/site/img/blog/hcompat-flow.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/flink/site/img/blog/hcompat-logos.png
URL: http://svn.apache.org/viewvc/incubator/flink/site/img/blog/hcompat-logos.png?rev=1640370&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/flink/site/img/blog/hcompat-logos.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/flink/site/index.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/index.html?rev=1640370&r1=1640369&r2=1640370&view=diff
==============================================================================
--- incubator/flink/site/index.html (original)
+++ incubator/flink/site/index.html Tue Nov 18 15:34:42 2014
@@ -164,6 +164,8 @@ $( document ).ready(function() {
         <h5 style="margin:0px;">Recent Blog Posts</h5>
         <ul style="list-style-position:inside;margin:0;padding:0;">
           
+                    <li style="list-style-type: none; list-style-position:inside;"><small><a href="/news/2014/11/18/hadoop-compatibility.html">Hadoop Compatibility in Flink</a> (18 Nov 2014)</small></li>
+          
                     <li style="list-style-type: none; list-style-position:inside;"><small><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a> (04 Nov 2014)</small></li>
           
                     <li style="list-style-type: none; list-style-position:inside;"><small><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a> (03 Oct 2014)</small></li>
@@ -174,8 +176,6 @@ $( document ).ready(function() {
           
                     <li style="list-style-type: none; list-style-position:inside;"><small><a href="/news/2014/05/31/release-0.5.html">Stratosphere version 0.5 available</a> (31 May 2014)</small></li>
           
-                    <li style="list-style-type: none; list-style-position:inside;"><small><a href="/news/2014/04/16/stratosphere-goes-apache-incubator.html">Stratosphere accepted as Apache Incubator Project</a> (16 Apr 2014)</small></li>
-          
         </ul>
       </div>
     </div>

Added: incubator/flink/site/news/2014/11/18/hadoop-compatibility.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/news/2014/11/18/hadoop-compatibility.html?rev=1640370&view=auto
==============================================================================
--- incubator/flink/site/news/2014/11/18/hadoop-compatibility.html (added)
+++ incubator/flink/site/news/2014/11/18/hadoop-compatibility.html Tue Nov 18 15:34:42 2014
@@ -0,0 +1,271 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <title>Apache Flink (incubating): Hadoop Compatibility in Flink</title>
+    <link rel="stylesheet" href="/css/bootstrap.css">
+    <link rel="stylesheet" href="/css/bootstrap-lumen-custom.css">
+    <link rel="stylesheet" href="/css/syntax.css">
+    <link rel="stylesheet" href="/css/custom.css">
+    <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css" rel="stylesheet">
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
+    <script src="/js/bootstrap.min.js"></script>
+    <link rel="icon" type="image/png" href="/favicon.png" />
+
+  </head>
+  <body>
+
+<nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+  <div class="container">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <div class="logo-container">
+        <img src="/img/logo/png/50/color_50.png" id="logo-element"/>
+        <a class="navbar-brand" href="/index.html">Apache Flink</a>
+      </div>
+    </div>
+
+    <div class="collapse navbar-collapse" id="navbar-collapse-1">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Quickstart <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/0.7-incubating/setup_quickstart.html">Setup Flink</a></li>
+            <li><a href="/docs/0.7-incubating/java_api_quickstart.html">Java API</a></li>
+            <li><a href="/docs/0.7-incubating/scala_api_quickstart.html">Scala API</a></li>
+          </ul>
+        </li>
+
+        <li>
+          <a href="/downloads.html" class="">Downloads</a>
+        </li>
+
+        <li>
+          <a href="/docs/0.6-incubating/faq.html" class="">FAQ</a>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li role="presentation" class="dropdown-header">Current Stable:</li>
+            <li><a href="/docs/0.7-incubating/">0.7.0-incubating</a></li>
+            <li><a href="/docs/0.7-incubating/api/java">0.7.0-incubating Javadocs</a></li>
+            <li><a href="/docs/0.7-incubating/api/scala/index.html#org.apache.flink.api.scala.package">0.7.0-incubating Scaladocs</a></li>
+            <li class="divider"></li>
+            <li role="presentation" class="dropdown-header">Previous:</li>
+            <li><a href="/docs/0.6-incubating/">0.6-incubating</a></li>
+            <li><a href="/docs/0.6-incubating/api/java">0.6-incubating Javadocs</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Community <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a href="/community.html#mailing-lists">Mailing Lists</a></li>
+            <li><a href="/community.html#issues">Issues</a></li>
+            <li><a href="/community.html#team">Team</a></li>
+            <li class="divider"></li>
+            <li><a href="/how-to-contribute.html">How To Contribute</a></li>
+            <li><a href="/coding_guidelines.html">Coding Guidelines</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a class="extLink" href="http://www.apache.org/">Apache Software Foundation</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/how-it-works.html">How it works</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/thanks.html">Thanks</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://incubator.apache.org/projects/flink.html">Incubation Status page</a><i class="small-font-awesome fa fa-external-link"></i></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Project <b class="caret"></b></a>
+          <ul class="dropdown-menu">
+            <li><a class="extLink" href="/material.html">Material</a></li>
+            <li><a class="extLink" href="https://cwiki.apache.org/confluence/display/FLINK">Wiki</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="https://wiki.apache.org/incubator/StratosphereProposal">Incubator Proposal</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="http://www.apache.org/licenses/LICENSE-2.0">License</a><i class="small-font-awesome fa fa-external-link"></i></li>
+            <li><a class="extLink" href="https://github.com/apache/incubator-flink">Source Code</a><i class="small-font-awesome fa fa-external-link"></i></li>
+          </ul>
+        </li>
+
+        <li>
+          <a href="/blog/index.html" class="">Blog</a>
+        </li>
+
+      </ul>
+    </div>
+  </div>
+</nav>
+
+    <div style="padding-top:70px" class="container">
+
+<div class="container">
+	<div class="row">
+		<div class="col-md-2"></div>
+		<div class="col-md-8">
+			<article>
+			  <h2>Hadoop Compatibility in Flink</h2>
+			  <p class="meta">18 Nov 2014</p>
+			    
+			  <div>
+				<p><a href="http://hadoop.apache.org">Apache Hadoop</a> is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves on it in many dimensions. Among other features, Flink provides much better performance and offers APIs in Java and Scala, which are very easy to use. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats, along with many more operators. While conceptually equivalent, Hadoop’s MapReduce and Flink’s interfaces for these functions are unfortunately not source compatible.</p>
+
+<h2 id="flink’s-hadoop-compatibility-package">Flink’s Hadoop Compatibility Package</h2>
+
+<p><center>
+<img src="/img/blog/hcompat-logos.png" style="width:30%;margin:15px">
+</center></p>
+
+<p>To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a <a href="https://developers.google.com/open-source/soc/">Google Summer of Code</a> 2014 project. </p>
+
+<p>With the Hadoop Compatibility package, you can reuse all your Hadoop</p>
+
+<ul>
+<li><code>InputFormats</code> (mapred and mapreduce APIs)</li>
+<li><code>OutputFormats</code> (mapred and mapreduce APIs)</li>
+<li><code>Mappers</code> (mapred API)</li>
+<li><code>Reducers</code> (mapred API)</li>
+</ul>
+
+<p>in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (<code>Writables</code> and <code>WritableComparables</code>).</p>
+
+<p>The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions. </p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Definition of Hadoop Mapper function</span>
+<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Tokenizer</span> <span class="kd">implements</span> <span class="n">Mapper</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="c1">// Definition of Hadoop Reducer function</span>
+<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Counter</span> <span class="kd">implements</span> <span class="n">Reducer</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+
+<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+  <span class="kd">final</span> <span class="n">String</span> <span class="n">inputPath</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
+  <span class="kd">final</span> <span class="n">String</span> <span class="n">outputPath</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
+
+  <span class="kd">final</span> <span class="n">ExecutionEnvironment</span> <span class="n">env</span> <span class="o">=</span> <span class="n">ExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span><span class="o">();</span>
+
+  <span class="c1">// Setup Hadoop’s TextInputFormat</span>
+  <span class="n">HadoopInputFormat</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;</span> <span class="n">hadoopInputFormat</span> <span class="o">=</span> 
+      <span class="k">new</span> <span class="n">HadoopInputFormat</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;(</span>
+        <span class="k">new</span> <span class="nf">TextInputFormat</span><span class="o">(),</span> <span class="n">LongWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span>
+  <span class="n">TextInputFormat</span><span class="o">.</span><span class="na">addInputPath</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">inputPath</span><span class="o">));</span>
+
+  <span class="c1">// Read a DataSet with the Hadoop InputFormat</span>
+  <span class="n">DataSet</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">&gt;&gt;</span> <span class="n">text</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">createInput</span><span class="o">(</span><span class="n">hadoopInputFormat</span><span class="o">);</span>
+  <span class="n">DataSet</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">text</span>
+    <span class="c1">// Wrap Tokenizer Mapper function</span>
+    <span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopMapFunction</span><span class="o">&lt;</span><span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span><span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">()))</span>
+    <span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
+    <span class="c1">// Wrap Counter Reducer function (used as Reducer and Combiner)</span>
+    <span class="o">.</span><span class="na">reduceGroup</span><span class="o">(</span><span class="k">new</span> <span class="n">HadoopReduceCombineFunction</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">,</span> <span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span>
+      <span class="k">new</span> <span class="nf">Counter</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Counter</span><span class="o">()));</span>
+
+  <span class="c1">// Setup Hadoop’s TextOutputFormat</span>
+  <span class="n">HadoopOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;</span> <span class="n">hadoopOutputFormat</span> <span class="o">=</span> 
+    <span class="k">new</span> <span class="n">HadoopOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(</span>
+      <span class="k">new</span> <span class="n">TextOutputFormat</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">LongWritable</span><span class="o">&gt;(),</span> <span class="k">new</span> <span class="nf">JobConf</span><span class="o">());</span>
+  <span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">().</span><span class="na">set</span><span class="o">(</span><span class="s">&quot;mapred.textoutputformat.separator&quot;</span><span class="o">,</span> <span class="s">&quot; &quot;</span><span class="o">);</span>
+  <span class="n">TextOutputFormat</span><span class="o">.</span><span class="na">setOutputPath</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">.</span><span class="na">getJobConf</span><span class="o">(),</span> <span class="k">new</span> <span class="nf">Path</span><span class="o">(</span><span class="n">outputPath</span><span class="o">));</span>
+
+  <span class="c1">// Output &amp; Execute</span>
+  <span class="n">words</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">hadoopOutputFormat</span><span class="o">);</span>
+  <span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="s">&quot;Hadoop Compat WordCount&quot;</span><span class="o">);</span>
+<span class="o">}</span>
+</code></pre></div>
+<p>As you can see, Flink represents Hadoop key-value pairs as <code>Tuple2&lt;key, value&gt;</code> tuples. Note that the program uses Flink’s <code>groupBy()</code> transformation to group data on the key field (field 0 of the <code>Tuple2&lt;key, value&gt;</code>) before it is given to the Reducer function. At the moment, the compatibility package does not evaluate custom Hadoop partitioners, sorting comparators, or grouping comparators.</p>
+
+<p>Hadoop functions can be used at any position within a Flink program and can, of course, be mixed with native Flink functions. This means that instead of assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler such as <a href="http://oozie.apache.org">Apache Oozie</a>, you can implement an arbitrarily complex Flink program consisting of multiple Hadoop Input- and OutputFormats, Mapper and Reducer functions. When executing such a Flink program, data will be pipelined between your Hadoop functions and will not be written to HDFS just for the purpose of data exchange.</p>
+
+<p><center>
+<img src="/img/blog/hcompat-flow.png" style="width:100%;margin:15px">
+</center></p>
+
+<h2 id="what-comes-next?">What comes next?</h2>
+
+<p>While the Hadoop compatibility package is already very useful, we are currently working on a dedicated Hadoop Job operation to embed and execute Hadoop jobs as a whole in Flink programs, including their custom partitioning, sorting, and grouping code. With this feature, you will be able to chain multiple Hadoop jobs, mix them with Flink functions, and combine them with other operations such as <a href="/docs/0.7-incubating/spargel_guide.html">Spargel</a> operations (Pregel/Giraph-style jobs).</p>
+
+<h2 id="summary">Summary</h2>
+
+<p>Flink lets you reuse a lot of the code you wrote for Hadoop MapReduce, including all data types, all Input- and OutputFormats, and the Mappers and Reducers of the mapred API. Hadoop functions can be used within Flink programs and mixed with all other Flink functions. Due to Flink’s pipelined execution, Hadoop functions can be assembled arbitrarily without data exchange via HDFS. Moreover, the Flink community is currently working on a dedicated Hadoop Job operation to support the execution of Hadoop jobs as a whole.</p>
+
+<p>If you want to use Flink’s Hadoop compatibility package, check out our <a href="/docs/0.7-incubating/hadoop_compatibility.html">documentation</a>.</p>
+
+<p><br>
+<small>Written by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>).</small></p>
+
+
+			  </div>
+			</article>
+		</div>
+		<div class="col-md-2"></div>
+	</div>
+	<div class="row" style="padding-top:30px">
+		<div class="col-md-2"></div>
+		<div class="col-md-8">
+			    <div id="disqus_thread"></div>
+			    <script type="text/javascript">
+			        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+			        var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+			        /* * * DON'T EDIT BELOW THIS LINE * * */
+			        (function() {
+			            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+			            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+			            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+			        })();
+			    </script>
+			    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+			    <a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
+			    
+		</div>
+		<div class="col-md-2"></div>
+	</div>
+
+
+</div>
+
+
+
+     <div class="footer">
+
+<hr class="divider">
+
+<p><small>Apache Flink is an effort undergoing incubation at The Apache Software
+Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is
+required of all newly accepted projects until a further review indicates that
+the infrastructure, communications, and decision making process have
+stabilized in a manner consistent with other successful ASF projects. While
+incubation status is not necessarily a reflection of the completeness or
+stability of the code, it does indicate that the project has yet to be fully
+endorsed by the ASF.</small></p>
+
+<p><a href="http://incubator.apache.org/"><img src="/img/apache-incubator-logo.png" alt="Incubator Logo"></a></p>
+
+<p class="text-center"><a href="/privacy-policy.html">Privacy Policy<a></p>
+
+      </div>
+    </div>
+
+    
+
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+
+    </script>
+
+  </body>
+</html>