Posted to commits@flink.apache.org by rm...@apache.org on 2014/07/14 16:10:07 UTC

svn commit: r1610413 [4/4] - in /incubator/flink: ./ _layouts/ site/ site/docs/0.6-SNAPSHOT/ site/docs/0.6-SNAPSHOT/img/

Modified: incubator/flink/site/docs/0.6-SNAPSHOT/yarn_setup.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/docs/0.6-SNAPSHOT/yarn_setup.html?rev=1610413&r1=1610412&r2=1610413&view=diff
==============================================================================
--- incubator/flink/site/docs/0.6-SNAPSHOT/yarn_setup.html (original)
+++ incubator/flink/site/docs/0.6-SNAPSHOT/yarn_setup.html Mon Jul 14 14:10:06 2014
@@ -109,6 +109,7 @@
       <li>Setup &amp; Configuration
         <ul>
           <li><a href="local_setup.html">Local Setup</a></li>
+          <li><a href="building.html">Build Flink</a></li>
           <li><a href="cluster_setup.html">Cluster Setup</a></li>
           <li><a href="yarn_setup.html">YARN Setup</a></li>
           <li><a href="config.html">Configuration</a></li>
@@ -159,10 +160,10 @@
 <a href="#introducing-yarn">Introducing YARN</a>
 <ul>
 <li>
-<a href="#start-stratosphere-session">Start Stratosphere Session</a>
+<a href="#start-flink-session">Start Flink Session</a>
 <ul>
 <li>
-<a href="#download-stratosphere-for-yarn">Download Stratosphere for YARN</a>
+<a href="#download-flink-for-yarn">Download Flink for YARN</a>
 </li>
 <li>
 <a href="#start-a-session">Start a Session</a>
@@ -172,10 +173,10 @@
 </ul>
 </li>
 <li>
-<a href="#submit-job-to-stratosphere">Submit Job to Stratosphere</a>
+<a href="#submit-job-to-flink">Submit Job to Flink</a>
 </li>
 <li>
-<a href="#build-stratosphere-for-a-specific-hadoop-version">Build Stratosphere for a specific Hadoop Version</a>
+<a href="#build-yarn-client-for-a-specific-hadoop-version">Build YARN client for a specific Hadoop version</a>
 </li>
 <li>
 <a href="#background">Background</a>
@@ -187,13 +188,13 @@
 
 <p>Start YARN session with 4 Taskmanagers (each with 4 GB of Heapspace):</p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash">wget https://github.com/stratosphere/stratosphere/releases/download/release-0.5.1/stratosphere-bin-0.5.1-yarn.tar.gz
-tar xvzf stratosphere-dist-0.5.1-yarn.tar.gz
-<span class="nb">cd </span>stratosphere-yarn-0.5.1/
+tar xvzf stratosphere-bin-0.5.1-yarn.tar.gz
+<span class="nb">cd </span>stratosphere-yarn-0.5.1/
 ./bin/yarn-session.sh -n <span class="m">4</span> -jm <span class="m">1024</span> -tm 4096
 </code></pre></div>
 <h1 id="introducing-yarn">Introducing YARN</h1>
 
-<p>Apache <a href="http://hadoop.apache.org/">Hadoop YARN</a> is a cluster resource management framework. It allows to run various distributed applications on top of a cluster. Stratosphere runs on YARN next to other applications. Users do not have to setup or install anything if there is already a YARN setup.</p>
+<p>Apache <a href="http://hadoop.apache.org/">Hadoop YARN</a> is a cluster resource management framework. It allows various distributed applications to run on top of a cluster. Flink runs on YARN alongside other applications. Users do not have to set up or install anything if a YARN setup already exists.</p>
 
 <p><strong>Requirements</strong></p>
 
@@ -202,23 +203,23 @@ tar xvzf stratosphere-dist-0.5.1-yarn.ta
 <li>HDFS</li>
 </ul>
 
-<p>If you have troubles using the Stratosphere YARN client, have a look in the <a href="/docs/0.5/general/faq.html">FAQ section</a>.</p>
+<p>If you have trouble using the Flink YARN client, have a look at the <a href="/docs/0.5/general/faq.html">FAQ section</a>.</p>
 
-<h2 id="start-stratosphere-session">Start Stratosphere Session</h2>
+<h2 id="start-flink-session">Start Flink Session</h2>
 
-<p>Follow these instructions to learn how to launch a Stratosphere Session within your YARN cluster.</p>
+<p>Follow these instructions to learn how to launch a Flink Session within your YARN cluster.</p>
 
-<p>A session will start all required Stratosphere services (JobManager and TaskManagers) so that you can submit programs to the cluster. Note that you can run multiple programs per session.</p>
+<p>A session will start all required Flink services (JobManager and TaskManagers) so that you can submit programs to the cluster. Note that you can run multiple programs per session.</p>
 
-<h3 id="download-stratosphere-for-yarn">Download Stratosphere for YARN</h3>
+<h3 id="download-flink-for-yarn">Download Flink for YARN</h3>
 
 <p>Download the YARN tgz package on the <a href="/downloads/#nightly">download page</a>. It contains the required files.</p>
 
-<p>If you want to build the YARN .tgz file from sources, follow the build instructions. Make sure to use the <code>-Dhadoop.profile=2</code> profile. You can find the file in <code>stratosphere-dist/target/stratosphere-dist--yarn.tar.gz</code> (<em>Note: The version might be different for you</em> ).</p>
+<p>If you want to build the YARN .tgz file from source, follow the <a href="building.html">build instructions</a>. Make sure to use the <code>-Dhadoop.profile=2</code> profile. You can find the file in <code>flink-dist/target/flink-dist--yarn.tar.gz</code> (<em>Note: The version might be different for you</em>).</p>
 
 <p>Extract the package using:</p>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">tar xvzf stratosphere-dist-0.5.1-yarn.tar.gz
-<span class="nb">cd </span>stratosphere-yarn-0.5.1/
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">tar xvzf flink-dist-0.5.1-yarn.tar.gz
+<span class="nb">cd </span>flink-yarn-0.5.1/
 </code></pre></div>
 <h3 id="start-a-session">Start a Session</h3>
 
@@ -242,35 +243,35 @@ tar xvzf stratosphere-dist-0.5.1-yarn.ta
 <p><strong>Example:</strong> Issue the following command to allocate 10 TaskTrackers, with 8 GB of memory each:</p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/yarn-session.sh -n <span class="m">10</span> -tm 8192
 </code></pre></div>
-<p>The system will use the configuration in <code>conf/stratosphere-config.yaml</code>. Please follow our <a href="config.html">configuration guide</a> if you want to change something. Stratosphere on YARN will overwrite the following configuration parameters <code>jobmanager.rpc.address</code> (because the JobManager is always allocated at different machines) and <code>taskmanager.tmp.dirs</code> (we are using the tmp directories given by YARN).</p>
+<p>The system will use the configuration in <code>conf/flink-conf.yaml</code>. Please follow our <a href="config.html">configuration guide</a> if you want to change something. Flink on YARN overwrites the following configuration parameters: <code>jobmanager.rpc.address</code> (because YARN decides which machine hosts the JobManager, so it can differ between sessions) and <code>taskmanager.tmp.dirs</code> (because Flink uses the temporary directories provided by YARN).</p>
 
 <p>The example invocation starts 11 containers, since there is one additional container for the ApplicationMaster and JobTracker.</p>
 
-<p>Once Stratosphere is deployed in your YARN cluster, it will show you the connection details of the JobTracker.</p>
+<p>Once Flink is deployed in your YARN cluster, it will show you the connection details of the JobManager.</p>
 
 <p>The client has to remain open to keep the deployment running. We suggest to use <code>screen</code>, which will start a detachable shell:</p>
 
 <ol>
 <li>Open <code>screen</code>,</li>
-<li>Start Stratosphere on YARN,</li>
+<li>Start Flink on YARN,</li>
 <li>Use <code>CTRL+a</code>, then press <code>d</code> to detach the screen session,</li>
 <li>Use <code>screen -r</code> to resume again.</li>
 </ol>
 
-<h1 id="submit-job-to-stratosphere">Submit Job to Stratosphere</h1>
+<h1 id="submit-job-to-flink">Submit Job to Flink</h1>
 
-<p>Use the following command to submit a Stratosphere program to the YARN cluster:</p>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/stratosphere
+<p>Use the following command to submit a Flink program to the YARN cluster:</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/flink
 </code></pre></div>
 <p>Please refer to the documentation of the <a href="cli.html">commandline client</a>.</p>
 
 <p>The command will show you a help menu like this:</p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="o">[</span>...<span class="o">]</span>
-Action <span class="s2">&quot;run&quot;</span> compiles and submits a Stratosphere program.
+Action <span class="s2">&quot;run&quot;</span> compiles and submits a Flink program.
   <span class="s2">&quot;run&quot;</span> action arguments:
      -a,--arguments &lt;programArgs&gt;   Program arguments
      -c,--class &lt;classname&gt;         Program class
-     -j,--jarfile &lt;jarfile&gt;         Stratosphere program JAR file
+     -j,--jarfile &lt;jarfile&gt;         Flink program JAR file
      -m,--jobmanager &lt;host:port&gt;    Jobmanager to which the program is submitted
      -w,--wait                      Wait <span class="k">for</span> program to finish
 <span class="o">[</span>...<span class="o">]</span>
@@ -280,49 +281,26 @@ Action <span class="s2">&quot;run&quot;<
 <p><strong>Example</strong></p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash">wget -O apache-license-v2.txt http://www.apache.org/licenses/LICENSE-2.0.txt
 
-./bin/stratosphere run -j ./examples/stratosphere-java-examples-0.5.1-WordCount.jar <span class="se">\</span>
+./bin/flink run -j ./examples/flink-java-examples-0.5.1-WordCount.jar <span class="se">\</span>
                        -a <span class="m">1</span> file://<span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/apache-license-v2.txt file://<span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/wordcount-result.txt 
 </code></pre></div>
 <p>If there is the following error, make sure that all TaskManagers started:</p>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">Exception in thread <span class="s2">&quot;main&quot;</span> eu.stratosphere.compiler.CompilerException:
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">Exception in thread <span class="s2">&quot;main&quot;</span> org.apache.flink.compiler.CompilerException:
     Available instances could not be determined from job manager: Connection timed out.
 </code></pre></div>
 <p>You can check the number of TaskManagers in the JobManager web interface. The address of this interface is printed in the YARN session console.</p>
 
 <p>If the TaskManagers do not show up after a minute, you should investigate the issue using the log files.</p>
 
-<h1 id="build-stratosphere-for-a-specific-hadoop-version">Build Stratosphere for a specific Hadoop Version</h1>
+<h1 id="build-yarn-client-for-a-specific-hadoop-version">Build YARN client for a specific Hadoop version</h1>
 
-<p>This section covers building Stratosphere for a specific Hadoop version. Most users do not need to do this manually.
-The problem is that Stratosphere uses HDFS and YARN which are both from Apache Hadoop. There exist many different builds of Hadoop (from both the upstream project and the different Hadoop distributions). Typically errors arise with the RPC services. An error could look like this:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ERROR: The job was not successfully submitted to the nephele job manager:
-    eu.stratosphere.nephele.executiongraph.GraphConversionException: Cannot compute input splits for TSV:
-    java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:
-    Protocol message contained an invalid tag (zero).; Host Details :
-</code></pre></div>
-<p><strong>Example</strong></p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">mvn -Dhadoop.profile=2 -Pcdh-repo -Dhadoop.version=2.2.0-cdh5.0.0-beta-2 -DskipTests package
-</code></pre></div>
-<p>The commands in detail:</p>
-
-<ul>
-<li> <code>-Dhadoop.profile=2</code> activates the Hadoop YARN profile of Stratosphere. This will enable all components of Stratosphere that are compatible with Hadoop 2.2</li>
-<li> <code>-Pcdh-repo</code> activates the Cloudera Hadoop dependencies. If you want other vendor&#39;s Hadoop dependencies (not in maven central) add the repository to your local maven configuration in <code>~/.m2/</code>.</li>
-<li><code>-Dhadoop.version=2.2.0-cdh5.0.0-beta-2</code> sets a special version of the Hadoop dependencies. Make sure that the specified Hadoop version is compatible with the profile you activated.</li>
-</ul>
-
-<p>If you want to build HDFS for Hadoop 2 without YARN, use the following parameter:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">-P!include-yarn
-</code></pre></div>
-<p>Some Cloudera versions (such as <code>2.0.0-cdh4.2.0</code>) require this, since they have a new HDFS version with the old YARN API.</p>
-
-<p>Please post to the <em>Stratosphere mailinglist</em>(<a href="mailto:dev@flink.incubator.apache.org">dev@flink.incubator.apache.org</a>) or create an issue on <a href="https://issues.apache.org/jira/browse/FLINK">Jira</a>, if you have issues with your YARN setup and Stratosphere.</p>
+<p>Users of Hadoop distributions from vendors such as Hortonworks, Cloudera, or MapR might have to build Flink against their specific versions of Hadoop (HDFS) and YARN. Please read the <a href="building.html">build instructions</a> for more details.</p>
 
 <h1 id="background">Background</h1>
 
-<p>This section briefly describes how Stratosphere and YARN interact. </p>
+<p>This section briefly describes how Flink and YARN interact. </p>
 
-<p><img src="img/StratosphereOnYarn.svg" class="img-responsive"></p>
+<p><img src="img/FlinkOnYarn.svg" class="img-responsive"></p>
 
 <p>The YARN client needs to access the Hadoop configuration to connect to the YARN resource manager and to HDFS. It determines the Hadoop configuration using the following strategy:</p>
 
@@ -331,13 +309,13 @@ The problem is that Stratosphere uses HD
 <li>If the above strategy fails (this should not be the case in a correct YARN setup), the client is using the <code>HADOOP_HOME</code> environment variable. If it is set, the client tries to access <code>$HADOOP_HOME/etc/hadoop</code> (Hadoop 2) and <code>$HADOOP_HOME/conf</code> (Hadoop 1).</li>
 </ul>
 
-<p>When starting a new Stratosphere YARN session, the client first checks if the requested resources (containers and memory) are available. After that, it uploads a jar that contains Stratosphere and the configuration to HDFS (step 1).</p>
+<p>When starting a new Flink YARN session, the client first checks if the requested resources (containers and memory) are available. After that, it uploads a jar containing Flink, together with the configuration, to HDFS (step 1).</p>
 
 <p>The next step of the client is to request (step 2) a YARN container to start the <em>ApplicationMaster</em> (step 3). Since the client registered the configuration and jar-file as a resource for the container, the NodeManager of YARN running on that particular machine will take care of preparing the container (e.g. downloading the files). Once that has finished, the <em>ApplicationMaster</em> (AM) is started.</p>
 
-<p>The <em>JobManager</em> and AM are running in the same container. Once they successfully started, the AM knows the address of the JobManager (its own host). It is generating a new Stratosphere configuration file for the TaskManagers (so that they can connect to the JobManager). The file is also uploaded to HDFS. Additionally, the <em>AM</em> container is also serving Stratosphere&#39;s web interface.</p>
+<p>The <em>JobManager</em> and AM run in the same container. Once they have started successfully, the AM knows the address of the JobManager (its own host). It generates a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager) and uploads this file to HDFS as well. Additionally, the <em>AM</em> container serves Flink&#39;s web interface.</p>
 
-<p>After that, the AM starts allocating the containers for Stratosphere&#39;s TaskManagers, which will download the jar file and the modified configuration from the HDFS. Once these steps are completed, Stratosphere is set up and ready to accept Jobs.</p>
+<p>After that, the AM starts allocating the containers for Flink&#39;s TaskManagers, which download the jar file and the modified configuration from HDFS. Once these steps are completed, Flink is set up and ready to accept jobs.</p>
 
 
       <div style="padding-top:30px" id="disqus_thread"></div>

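The Hadoop configuration lookup described in the Background section of yarn_setup.html can be sketched in shell as follows. Only the <code>HADOOP_HOME</code> fallback is visible in this diff; the first step of the strategy is cut off by the hunk boundary, so the <code>YARN_CONF_DIR</code> and <code>HADOOP_CONF_DIR</code> checks below are an assumption. <code>resolve_hadoop_conf_dir</code> is a hypothetical helper for illustration, not part of Flink or its YARN client.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink): mirrors the documented lookup
# order the YARN client uses to find the Hadoop configuration directory.
resolve_hadoop_conf_dir() {
  # Assumed first step (not shown in this diff): explicit variables win.
  if [ -n "${YARN_CONF_DIR:-}" ]; then
    echo "$YARN_CONF_DIR"
    return 0
  fi
  if [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo "$HADOOP_CONF_DIR"
    return 0
  fi
  # Documented fallback: derive the directory from HADOOP_HOME.
  if [ -n "${HADOOP_HOME:-}" ]; then
    if [ -d "$HADOOP_HOME/etc/hadoop" ]; then
      echo "$HADOOP_HOME/etc/hadoop"   # Hadoop 2 layout
      return 0
    fi
    if [ -d "$HADOOP_HOME/conf" ]; then
      echo "$HADOOP_HOME/conf"         # Hadoop 1 layout
      return 0
    fi
  fi
  # Nothing found: no Hadoop configuration could be located.
  return 1
}
```

Calling <code>resolve_hadoop_conf_dir || echo "No Hadoop configuration found" >&2</code> before starting a session is one way to check which directory such a lookup would pick up.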
Modified: incubator/flink/site/how-to-contribute.html
URL: http://svn.apache.org/viewvc/incubator/flink/site/how-to-contribute.html?rev=1610413&r1=1610412&r2=1610413&view=diff
==============================================================================
--- incubator/flink/site/how-to-contribute.html (original)
+++ incubator/flink/site/how-to-contribute.html Mon Jul 14 14:10:06 2014
@@ -149,7 +149,35 @@
 <li><p>It is typically helpful to switch to a <em>topic branch</em> for the changes. To create a dedicated branch based on the current master, use the following command:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">git checkout -b myBranch master
 </code></pre></div></li>
-<li><p>Now you can create your changes, compile the code, and validate the changes. Here are some pointers on how to <a href="https://github.com/apache/incubator-flink/#eclipse-setup-and-debugging">set up the Eclipse IDE for development</a>, and how to <a href="https://github.com/apache/incubator-flink/#build-stratosphere">build the code</a>.</p></li>
+<li><p>Now you can make your changes, compile the code, and validate the changes. Here are some pointers on how to <a href="https://github.com/apache/incubator-flink/#build-apache-flink">build the code</a>.
+In addition, we recommend setting up Eclipse (or IntelliJ) using the &quot;Import Maven Project&quot; feature. If you want to work on the Scala code, you will need the following Eclipse plugins:</p>
+
+<p>Eclipse 4.x:</p>
+
+<ul>
+<li>scala-ide: <a href="http://download.scala-ide.org/sdk/e38/scala210/stable/site">http://download.scala-ide.org/sdk/e38/scala210/stable/site</a></li>
+<li>m2eclipse-scala: <a href="http://alchim31.free.fr/m2e-scala/update-site">http://alchim31.free.fr/m2e-scala/update-site</a></li>
+<li>build-helper-maven-plugin: <a href="https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/">https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/</a></li>
+</ul>
+
+<p>Eclipse 3.7:</p>
+
+<ul>
+<li>scala-ide: <a href="http://download.scala-ide.org/sdk/e37/scala210/stable/site">http://download.scala-ide.org/sdk/e37/scala210/stable/site</a></li>
+<li>m2eclipse-scala: <a href="http://alchim31.free.fr/m2e-scala/update-site">http://alchim31.free.fr/m2e-scala/update-site</a></li>
+<li>build-helper-maven-plugin: <a href="https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/">https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/</a></li>
+</ul>
+
+<p>Without these plugins, the Scala projects will have build errors. In that case, you can simply close the Scala projects and ignore them.</p>
+
+<p>Import the Flink source code using Maven&#39;s Import tool:</p>
+
+<ul>
+<li>Select &quot;Import&quot; from the &quot;File&quot; menu.</li>
+<li>Expand the &quot;Maven&quot; node, select &quot;Existing Maven Projects&quot;, and click the &quot;Next&quot; button.</li>
+<li>Select the root directory by clicking the &quot;Browse&quot; button and navigating to the top folder of the cloned Flink git repository.</li>
+<li>Ensure that all projects are selected and click the &quot;Finish&quot; button.</li>
+</ul></li>
 <li><p>After you have finalized your contribution, verify the compliance with the contribution guidelines (see below), and commit them. To make the changes easily mergeable, please rebase them to the latest version of the main repositories master branch. Assuming you created a topic branch (step 3), you can follow this sequence of commands to do that:
 Switch to the master branch, update it to the latest revision, switch back to your topic branch, and rebase it on top of the master branch.</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">git checkout master
@@ -198,6 +226,8 @@ git rebase master
 
 <p><strong>ASF git web interface</strong>: <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-flink.git;a=summary">https://git-wip-us.apache.org/repos/asf?p=incubator-flink.git;a=summary</a></p>
 
+<p><strong>ASF svn for the website</strong>: <a href="https://svn.apache.org/repos/asf/incubator/flink/">https://svn.apache.org/repos/asf/incubator/flink/</a>.</p>
+
 <p>Details on how to set the credentials for the ASF git repostiory are <a href="https://git-wip-us.apache.org/">linked here</a>.
 To merge pull requests from our GitHub mirror, there is a script in the source <code>./tools/merge_pull_request.sh.template</code>. Rename it to <code>merge_pull_request.sh</code> with the appropriate settings and use it for merging.</p>