Posted to commits@flink.apache.org by uc...@apache.org on 2015/04/22 16:17:17 UTC

[20/30] flink git commit: [docs] Change doc layout

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/examples.md
----------------------------------------------------------------------
diff --git a/docs/examples.md b/docs/examples.md
deleted file mode 100644
index c19a2fa..0000000
--- a/docs/examples.md
+++ /dev/null
@@ -1,490 +0,0 @@
----
-title:  "Bundled Examples"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-The following example programs showcase different applications of Flink 
-from simple word counting to graph algorithms. The code samples illustrate the 
-use of [Flink's API](programming_guide.html). 
-
-The full source code of the following and more examples can be found in the __flink-java-examples__
-or __flink-scala-examples__ module.
-
-## Word Count
-WordCount is the "Hello World" of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the texts are split into individual words. Second, the words are grouped and counted.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-DataSet<String> text = env.readTextFile("/path/to/file"); 
-
-DataSet<Tuple2<String, Integer>> counts = 
-        // split up the lines in pairs (2-tuples) containing: (word,1)
-        text.flatMap(new Tokenizer())
-        // group by the tuple field "0" and sum up tuple field "1"
-        .groupBy(0)
-        .sum(1);
-
-counts.writeAsCsv(outputPath, "\n", " ");
-
-// User-defined functions
-public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
-
-    @Override
-    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
-        // normalize and split the line
-        String[] tokens = value.toLowerCase().split("\\W+");
-        
-        // emit the pairs
-        for (String token : tokens) {
-            if (token.length() > 0) {
-                out.collect(new Tuple2<String, Integer>(token, 1));
-            }   
-        }
-    }
-}
-~~~
-
-The {% gh_link /flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java  "WordCount example" %} implements the above described algorithm with input parameters: `<text input path>, <output path>`. As test data, any text file will do.
-
-</div>
-<div data-lang="scala" markdown="1">
-
-~~~scala
-val env = ExecutionEnvironment.getExecutionEnvironment
-
-// get input data
-val text = env.readTextFile("/path/to/file")
-
-val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
-  .map { (_, 1) }
-  .groupBy(0)
-  .sum(1)
-
-counts.writeAsCsv(outputPath, "\n", " ")
-~~~
-
-The {% gh_link /flink-examples/flink-scala-examples/src/main/scala/org/apache/flink/examples/scala/wordcount/WordCount.scala  "WordCount example" %} implements the above described algorithm with input parameters: `<text input path>, <output path>`. As test data, any text file will do.
-
-
-</div>
-</div>
-
-## Page Rank
-
-The PageRank algorithm computes the "importance" of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors, and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries.
-
-In this simple example, PageRank is implemented with a [bulk iteration](iterations.html) and a fixed number of iterations.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-// read the pages and initial ranks by parsing a CSV file
-DataSet<Tuple2<Long, Double>> pagesWithRanks = env.readCsvFile(pagesInputPath)
-                                                  .types(Long.class, Double.class);
-
-// the links are encoded as an adjacency list: (page-id, Array(neighbor-ids))
-DataSet<Tuple2<Long, Long[]>> pageLinkLists = getLinksDataSet(env);
-
-// set iterative data set
-IterativeDataSet<Tuple2<Long, Double>> iteration = pagesWithRanks.iterate(maxIterations);
-
-DataSet<Tuple2<Long, Double>> newRanks = iteration
-        // join pages with outgoing edges and distribute rank
-        .join(pageLinkLists).where(0).equalTo(0).flatMap(new JoinVertexWithEdgesMatch())
-        // collect and sum ranks
-        .groupBy(0).sum(1)
-        // apply dampening factor
-        .map(new Dampener(DAMPENING_FACTOR, numPages));
-
-DataSet<Tuple2<Long, Double>> finalPageRanks = iteration.closeWith(
-        newRanks, 
-        newRanks.join(iteration).where(0).equalTo(0)
-        // termination condition
-        .filter(new EpsilonFilter()));
-
-finalPageRanks.writeAsCsv(outputPath, "\n", " ");
-
-// User-defined functions
-
-public static final class JoinVertexWithEdgesMatch 
-                    implements FlatJoinFunction<Tuple2<Long, Double>, Tuple2<Long, Long[]>, 
-                                            Tuple2<Long, Double>> {
-
-    @Override
-    public void join(Tuple2<Long, Double> page, Tuple2<Long, Long[]> adj, 
-                        Collector<Tuple2<Long, Double>> out) {
-        Long[] neighbors = adj.f1;
-        double rank = page.f1;
-        double rankToDistribute = rank / ((double) neighbors.length);
-            
-        for (int i = 0; i < neighbors.length; i++) {
-            out.collect(new Tuple2<Long, Double>(neighbors[i], rankToDistribute));
-        }
-    }
-}
-
-public static final class Dampener implements MapFunction<Tuple2<Long,Double>, Tuple2<Long,Double>> {
-    private final double dampening, randomJump;
-
-    public Dampener(double dampening, double numVertices) {
-        this.dampening = dampening;
-        this.randomJump = (1 - dampening) / numVertices;
-    }
-
-    @Override
-    public Tuple2<Long, Double> map(Tuple2<Long, Double> value) {
-        value.f1 = (value.f1 * dampening) + randomJump;
-        return value;
-    }
-}
-
-public static final class EpsilonFilter 
-                implements FilterFunction<Tuple2<Tuple2<Long, Double>, Tuple2<Long, Double>>> {
-
-    @Override
-    public boolean filter(Tuple2<Tuple2<Long, Double>, Tuple2<Long, Double>> value) {
-        return Math.abs(value.f0.f1 - value.f1.f1) > EPSILON;
-    }
-}
-~~~
-
-The {% gh_link /flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph/PageRankBasic.java "PageRank program" %} implements the above example.
-It requires the following parameters to run: `<pages input path>, <links input path>, <output path>, <num pages>, <num iterations>`.
-
-</div>
-<div data-lang="scala" markdown="1">
-
-~~~scala
-// User-defined types
-case class Link(sourceId: Long, targetId: Long)
-case class Page(pageId: Long, rank: Double)
-case class AdjacencyList(sourceId: Long, targetIds: Array[Long])
-
-// set up execution environment
-val env = ExecutionEnvironment.getExecutionEnvironment
-
-// read the pages and initial ranks by parsing a CSV file
-val pages = env.readCsvFile[Page](pagesInputPath)
-
-// the links are encoded as an adjacency list: (page-id, Array(neighbor-ids))
-val links = env.readCsvFile[Link](linksInputPath)
-
-// assign initial ranks to pages
-val pagesWithRanks = pages.map(p => Page(p.pageId, 1.0 / numPages))
-
-// build adjacency list from link input
-val adjacencyLists = links
-  // initialize lists
-  .map(e => AdjacencyList(e.sourceId, Array(e.targetId)))
-  // concatenate lists
-  .groupBy("sourceId").reduce {
-  (l1, l2) => AdjacencyList(l1.sourceId, l1.targetIds ++ l2.targetIds)
-  }
-
-// start iteration
-val finalRanks = pagesWithRanks.iterateWithTermination(maxIterations) {
-  currentRanks =>
-    val newRanks = currentRanks
-      // distribute ranks to target pages
-      .join(adjacencyLists).where("pageId").equalTo("sourceId") {
-        (page, adjacent, out: Collector[Page]) =>
-        for (targetId <- adjacent.targetIds) {
-          out.collect(Page(targetId, page.rank / adjacent.targetIds.length))
-        }
-      }
-      // collect ranks and sum them up
-      .groupBy("pageId").aggregate(SUM, "rank")
-      // apply dampening factor
-      .map { p =>
-        Page(p.pageId, (p.rank * DAMPENING_FACTOR) + ((1 - DAMPENING_FACTOR) / numPages))
-      }
-
-    // terminate if no rank update was significant
-    val termination = currentRanks.join(newRanks).where("pageId").equalTo("pageId") {
-      (current, next, out: Collector[Int]) =>
-        // check for significant update
-        if (math.abs(current.rank - next.rank) > EPSILON) out.collect(1)
-    }
-
-    (newRanks, termination)
-}
-
-val result = finalRanks
-
-// emit result
-result.writeAsCsv(outputPath, "\n", " ")
-~~~
-
-The {% gh_link /flink-examples/flink-scala-examples/src/main/scala/org/apache/flink/examples/scala/graph/PageRankBasic.scala "PageRank program" %} implements the above example.
-It requires the following parameters to run: `<pages input path>, <links input path>, <output path>, <num pages>, <num iterations>`.
-</div>
-</div>
-
-Input files are plain text files and must be formatted as follows:
-- Pages are represented by a (long) ID, separated by new-line characters.
-    * For example `"1\n2\n12\n42\n63\n"` gives five pages with IDs 1, 2, 12, 42, and 63.
-- Links are represented as pairs of page IDs which are separated by space characters. Links are separated by new-line characters:
-    * For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (directed) links (1)->(2), (2)->(12), (1)->(12), and (42)->(63).
-
-For this simple implementation it is required that each page has at least one incoming and one outgoing link (a page can point to itself).
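-
-The Java snippet above obtains the adjacency lists through a `getLinksDataSet()` helper that is not shown. A minimal sketch of such a helper for the link format described above could look as follows (the method name, the `linksInputPath` variable, and the use of a `GroupReduceFunction` are illustrative assumptions, not code taken from the bundled example):
-
-~~~java
-// read (source, target) pairs and collect all targets of one source into an array
-private static DataSet<Tuple2<Long, Long[]>> getLinksDataSet(ExecutionEnvironment env) {
-    return env.readCsvFile(linksInputPath)
-              // fields are separated by a single space
-              .fieldDelimiter(' ')
-              .types(Long.class, Long.class)
-              .groupBy(0)
-              .reduceGroup(new GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long[]>>() {
-                  @Override
-                  public void reduce(Iterable<Tuple2<Long, Long>> links, Collector<Tuple2<Long, Long[]>> out) {
-                      Long sourceId = null;
-                      List<Long> targets = new ArrayList<Long>();
-                      for (Tuple2<Long, Long> link : links) {
-                          sourceId = link.f0;
-                          targets.add(link.f1);
-                      }
-                      out.collect(new Tuple2<Long, Long[]>(sourceId, targets.toArray(new Long[targets.size()])));
-                  }
-              });
-}
-~~~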
-
-## Connected Components
-
-The Connected Components algorithm identifies the connected parts of a larger graph by assigning the same component ID to all vertices in the same connected part. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor if it is smaller than its own component ID.
-
-This implementation uses a [delta iteration](iterations.html): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-// read vertex and edge data
-DataSet<Long> vertices = getVertexDataSet(env);
-DataSet<Tuple2<Long, Long>> edges = getEdgeDataSet(env).flatMap(new UndirectEdge());
-
-// assign the initial component IDs (equal to the vertex ID)
-DataSet<Tuple2<Long, Long>> verticesWithInitialId = vertices.map(new DuplicateValue<Long>());
-        
-// open a delta iteration
-DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
-        verticesWithInitialId.iterateDelta(verticesWithInitialId, maxIterations, 0);
-
-// apply the step logic: 
-DataSet<Tuple2<Long, Long>> changes = iteration.getWorkset()
-        // join with the edges
-        .join(edges).where(0).equalTo(0).with(new NeighborWithComponentIDJoin())
-        // select the minimum neighbor component ID
-        .groupBy(0).aggregate(Aggregations.MIN, 1)
-        // update if the component ID of the candidate is smaller
-        .join(iteration.getSolutionSet()).where(0).equalTo(0)
-        .flatMap(new ComponentIdFilter());
-
-// close the delta iteration (delta and new workset are identical)
-DataSet<Tuple2<Long, Long>> result = iteration.closeWith(changes, changes);
-
-// emit result
-result.writeAsCsv(outputPath, "\n", " ");
-
-// User-defined functions
-
-public static final class DuplicateValue<T> implements MapFunction<T, Tuple2<T, T>> {
-    
-    @Override
-    public Tuple2<T, T> map(T vertex) {
-        return new Tuple2<T, T>(vertex, vertex);
-    }
-}
-
-public static final class UndirectEdge 
-                    implements FlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {
-    Tuple2<Long, Long> invertedEdge = new Tuple2<Long, Long>();
-    
-    @Override
-    public void flatMap(Tuple2<Long, Long> edge, Collector<Tuple2<Long, Long>> out) {
-        invertedEdge.f0 = edge.f1;
-        invertedEdge.f1 = edge.f0;
-        out.collect(edge);
-        out.collect(invertedEdge);
-    }
-}
-
-public static final class NeighborWithComponentIDJoin 
-                implements JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
-
-    @Override
-    public Tuple2<Long, Long> join(Tuple2<Long, Long> vertexWithComponent, Tuple2<Long, Long> edge) {
-        return new Tuple2<Long, Long>(edge.f1, vertexWithComponent.f1);
-    }
-}
-
-public static final class ComponentIdFilter 
-                    implements FlatMapFunction<Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>, 
-                                            Tuple2<Long, Long>> {
-
-    @Override
-    public void flatMap(Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>> value, 
-                        Collector<Tuple2<Long, Long>> out) {
-        if (value.f0.f1 < value.f1.f1) {
-            out.collect(value.f0);
-        }
-    }
-}
-~~~
-
-The {% gh_link /flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph/ConnectedComponents.java "ConnectedComponents program" %} implements the above example. It requires the following parameters to run: `<vertex input path>, <edge input path>, <output path>, <max num iterations>`.
-
-</div>
-<div data-lang="scala" markdown="1">
-
-~~~scala
-// set up execution environment
-val env = ExecutionEnvironment.getExecutionEnvironment
-
-// read vertex and edge data
-// assign the initial components (equal to the vertex id)
-val vertices = getVerticesDataSet(env).map { id => (id, id) }
-
-// undirected edges by emitting for each input edge the input edges itself and an inverted
-// version
-val edges = getEdgesDataSet(env).flatMap { edge => Seq(edge, (edge._2, edge._1)) }
-
-// open a delta iteration
-val verticesWithComponents = vertices.iterateDelta(vertices, maxIterations, Array(0)) {
-  (s, ws) =>
-
-    // apply the step logic: join with the edges
-    val allNeighbors = ws.join(edges).where(0).equalTo(0) { (vertex, edge) =>
-      (edge._2, vertex._2)
-    }
-
-    // select the minimum neighbor
-    val minNeighbors = allNeighbors.groupBy(0).min(1)
-
-    // update if the component of the candidate is smaller
-    val updatedComponents = minNeighbors.join(s).where(0).equalTo(0) {
-      (newVertex, oldVertex, out: Collector[(Long, Long)]) =>
-        if (newVertex._2 < oldVertex._2) out.collect(newVertex)
-    }
-
-    // delta and new workset are identical
-    (updatedComponents, updatedComponents)
-}
-
-verticesWithComponents.writeAsCsv(outputPath, "\n", " ")
-    
-~~~
-
-The {% gh_link /flink-examples/flink-scala-examples/src/main/scala/org/apache/flink/examples/scala/graph/ConnectedComponents.scala "ConnectedComponents program" %} implements the above example. It requires the following parameters to run: `<vertex input path>, <edge input path>, <output path>, <max num iterations>`.
-</div>
-</div>
-
-Input files are plain text files and must be formatted as follows:
-- Vertices are represented by IDs, separated by new-line characters.
-    * For example `"1\n2\n12\n42\n63\n"` gives five vertices (1), (2), (12), (42), and (63).
-- Edges are represented as pairs of vertex IDs separated by space characters. Edges are separated by new-line characters:
-    * For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (undirected) edges (1)-(2), (2)-(12), (1)-(12), and (42)-(63).
-
-## Relational Query
-
-The Relational Query example assumes two tables, one with `orders` and the other with `lineitems` as specified by the [TPC-H decision support benchmark](http://www.tpc.org/tpch/). TPC-H is a standard benchmark in the database industry. See below for instructions on how to generate the input data.
-
-The example implements the following SQL query.
-
-~~~sql
-SELECT l_orderkey, o_shippriority, sum(l_extendedprice) as revenue
-    FROM orders, lineitem
-WHERE l_orderkey = o_orderkey
-    AND o_orderstatus = "F" 
-    AND YEAR(o_orderdate) > 1993
-    AND o_orderpriority LIKE "5%"
-GROUP BY l_orderkey, o_shippriority;
-~~~
-
-The Flink program that implements the above query looks as follows.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-// get orders data set: (orderkey, orderstatus, orderdate, orderpriority, shippriority)
-DataSet<Tuple5<Integer, String, String, String, Integer>> orders = getOrdersDataSet(env);
-// get lineitem data set: (orderkey, extendedprice)
-DataSet<Tuple2<Integer, Double>> lineitems = getLineitemDataSet(env);
-
-// orders filtered by year: (orderkey, shippriority)
-DataSet<Tuple2<Integer, Integer>> ordersFilteredByYear =
-        // filter orders
-        orders.filter(
-            new FilterFunction<Tuple5<Integer, String, String, String, Integer>>() {
-                @Override
-                public boolean filter(Tuple5<Integer, String, String, String, Integer> t) {
-                    // status filter
-                    if(!t.f1.equals(STATUS_FILTER)) {
-                        return false;
-                    // year filter
-                    } else if(Integer.parseInt(t.f2.substring(0, 4)) <= YEAR_FILTER) {
-                        return false;
-                    // order priority filter
-                    } else if(!t.f3.startsWith(OPRIO_FILTER)) {
-                        return false;
-                    }
-                    return true;
-                }
-            })
-        // project fields out that are no longer required
-        .project(0,4).types(Integer.class, Integer.class);
-
-// join orders with lineitems: (orderkey, shippriority, extendedprice)
-DataSet<Tuple3<Integer, Integer, Double>> lineitemsOfOrders = 
-        ordersFilteredByYear.joinWithHuge(lineitems)
-                            .where(0).equalTo(0)
-                            .projectFirst(0,1).projectSecond(1)
-                            .types(Integer.class, Integer.class, Double.class);
-
-// extendedprice sums: (orderkey, shippriority, sum(extendedprice))
-DataSet<Tuple3<Integer, Integer, Double>> priceSums = 
-        // group by order and sum extendedprice
-        lineitemsOfOrders.groupBy(0,1).aggregate(Aggregations.SUM, 2);
-
-// emit result
-priceSums.writeAsCsv(outputPath);
-~~~
-
-The {% gh_link /flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/RelationalQuery.java "Relational Query program" %} implements the above query. It requires the following parameters to run: `<orders input path>, <lineitem input path>, <output path>`.
-
-</div>
-<div data-lang="scala" markdown="1">
-Coming soon...
-
-The {% gh_link /flink-examples/flink-scala-examples/src/main/scala/org/apache/flink/examples/scala/relational/RelationalQuery.scala "Relational Query program" %} implements the above query. It requires the following parameters to run: `<orders input path>, <lineitem input path>, <output path>`.
-
-</div>
-</div>
-
-The orders and lineitem files can be generated using the [TPC-H benchmark](http://www.tpc.org/tpch/) suite's data generator tool (DBGEN). 
-Take the following steps to generate arbitrarily large input files for the provided Flink programs:
-
-1.  Download and unpack DBGEN
-2.  Make a copy of *makefile.suite* called *Makefile* and perform the following changes:
-
-~~~bash
-DATABASE = DB2
-MACHINE  = LINUX
-WORKLOAD = TPCH
-CC       = gcc
-~~~
-
-3.  Build DBGEN using *make*
-4.  Generate lineitem and orders relations using dbgen. A scale factor
-    (-s) of 1 results in a generated data set of about 1 GB in size.
-
-~~~bash
-./dbgen -T o -s 1
-~~~

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/favicon.ico
----------------------------------------------------------------------
diff --git a/docs/favicon.ico b/docs/favicon.ico
deleted file mode 100644
index 41f40ed..0000000
Binary files a/docs/favicon.ico and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/favicon.png
----------------------------------------------------------------------
diff --git a/docs/favicon.png b/docs/favicon.png
deleted file mode 100644
index 54bbfd5..0000000
Binary files a/docs/favicon.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/fig/LICENSE.txt
----------------------------------------------------------------------
diff --git a/docs/fig/LICENSE.txt b/docs/fig/LICENSE.txt
new file mode 100644
index 0000000..35b8673
--- /dev/null
+++ b/docs/fig/LICENSE.txt
@@ -0,0 +1,17 @@
+All image files in the folder and its subfolders are
+licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/fig/overview-stack-0.9.png
----------------------------------------------------------------------
diff --git a/docs/fig/overview-stack-0.9.png b/docs/fig/overview-stack-0.9.png
new file mode 100644
index 0000000..70b775a
Binary files /dev/null and b/docs/fig/overview-stack-0.9.png differ

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/flink_on_tez_guide.md
----------------------------------------------------------------------
diff --git a/docs/flink_on_tez_guide.md b/docs/flink_on_tez_guide.md
deleted file mode 100644
index 5cc1e31..0000000
--- a/docs/flink_on_tez_guide.md
+++ /dev/null
@@ -1,293 +0,0 @@
----
-title: "Running Flink on YARN leveraging Tez"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-
-<a href="#top"></a>
-
-## Introduction
-
-You can run Flink using Tez as an execution environment. Flink on Tez
-is currently included in *flink-staging* as an alpha feature. All classes are
-located in the *org.apache.flink.tez* package.
-
-## Why Flink on Tez
-
-[Apache Tez](http://tez.apache.org) is a scalable data processing
-platform. Tez provides an API for specifying a directed acyclic
-graph (DAG), and functionality for placing the DAG vertices in YARN
-containers, as well as data shuffling.  In Flink's architecture,
-Tez is at about the same level as Flink's network stack. While Flink's
-network stack focuses heavily on low latency in order to support 
-pipelining, data streaming, and iterative algorithms, Tez
-focuses on scalability and elastic resource usage.
-
-Thus, by replacing Flink's network stack with Tez, users can get scalability
-and elastic resource usage in shared clusters while retaining Flink's 
-APIs, optimizer, and runtime algorithms (local sorts, hash tables, etc).
-
-Flink programs can run almost unmodified using Tez as an execution
-environment. Tez supports local execution (e.g., for debugging), and 
-remote execution on YARN.
-
-
-## Local execution
-
-The `LocalTezEnvironment` can be used to run programs using the local
-mode provided by Tez. This example shows how WordCount can be run using the Tez local mode.
-It is identical to a normal Flink WordCount, except that the `LocalTezEnvironment` is used.
-To run in local Tez mode, you can simply run a Flink on Tez program
-from your IDE (e.g., right click and run).
-  
-{% highlight java %}
-public class WordCountExample {
-    public static void main(String[] args) throws Exception {
-        final LocalTezEnvironment env = LocalTezEnvironment.create();
-
-        DataSet<String> text = env.fromElements(
-            "Who's there?",
-            "I think I hear them. Stand, ho! Who's there?");
-
-        DataSet<Tuple2<String, Integer>> wordCounts = text
-            .flatMap(new LineSplitter())
-            .groupBy(0)
-            .sum(1);
-
-        wordCounts.print();
-
-        env.execute("Word Count Example");
-    }
-
-    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
-        @Override
-        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
-            for (String word : line.split(" ")) {
-                out.collect(new Tuple2<String, Integer>(word, 1));
-            }
-        }
-    }
-}
-{% endhighlight %}
-
-## YARN execution
-
-### Setup
-
-- Install Tez on your Hadoop 2 cluster following the instructions from the
-  [Apache Tez website](http://tez.apache.org/install.html). If you are able to run 
-  the examples that ship with Tez, then Tez has been successfully installed.
-  
-- Currently, you need to build Flink yourself to obtain Flink on Tez
-  (the reason is Hadoop version compatibility: Tez releases artifacts
-  on Maven Central with a Hadoop 2.6.0 dependency). Build Flink
-  using `mvn -DskipTests clean package -Pinclude-tez -Dhadoop.version=X.X.X -Dtez.version=X.X.X`.
-  Make sure that the Hadoop version matches the version that Tez uses.
-  Obtain the jar file contained in the Flink distribution under
-  `flink-staging/flink-tez/target/flink-tez-x.y.z-flink-fat-jar.jar` 
-  and upload it to some directory in HDFS. E.g., to upload the file
-  to the directory `/apps`, execute
-  {% highlight bash %}
-  $ hadoop fs -put /path/to/flink-tez-x.y.z-flink-fat-jar.jar /apps
-  {% endhighlight %}  
- 
-- Edit the tez-site.xml configuration file, adding an entry that points to the
-  location of the file. E.g., assuming that the file is in the directory `/apps/`, 
-  add the following entry to tez-site.xml:
-    {% highlight xml %}
-<property>
-  <name>tez.aux.uris</name>
-  <value>${fs.default.name}/apps/flink-tez-x.y.z-flink-fat-jar.jar</value>
-</property>
-    {% endhighlight %}  
-    
-- At this point, you should be able to run the pre-packaged examples, e.g., run WordCount:
-  {% highlight bash %}
-  $ hadoop jar /path/to/flink-tez-x.y.z-flink-fat-jar.jar wc hdfs:/path/to/text hdfs:/path/to/output
-  {% endhighlight %}  
-
-
-### Packaging your program
-
-Application packaging is currently a bit different than in Flink standalone mode.
-  Flink programs that run on Tez need to be packaged in a "fat jar"
-  file that contains the Flink client. This jar can then be executed via the `hadoop jar` command.
-  An easy way to do that is to use the provided `flink-tez-quickstart` maven archetype.
-  Create a new project as
-  
-  {% highlight bash %}
-  $ mvn archetype:generate                             \
-    -DarchetypeGroupId=org.apache.flink              \
-    -DarchetypeArtifactId=flink-tez-quickstart           \
-    -DarchetypeVersion={{site.FLINK_VERSION_SHORT}}
-  {% endhighlight %}
-  
-  and specify the group id, artifact id, version, and package of your project. For example,
-  let us assume the following options: `org.myorganization`, `flink-on-tez`, `0.1`, and `org.myorganization`.
-  You should see the following output on your terminal:
-  
-  {% highlight bash %}
-  $ mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-tez-quickstart
-  [INFO] Scanning for projects...
-  [INFO]
-  [INFO] ------------------------------------------------------------------------
-  [INFO] Building Maven Stub Project (No POM) 1
-  [INFO] ------------------------------------------------------------------------
-  [INFO]
-  [INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) > generate-sources @ standalone-pom >>>
-  [INFO]
-  [INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) < generate-sources @ standalone-pom <<<
-  [INFO]
-  [INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
-  [INFO] Generating project in Interactive mode
-  [INFO] Archetype [org.apache.flink:flink-tez-quickstart:0.9-SNAPSHOT] found in catalog local
-  Define value for property 'groupId': : org.myorganization
-  Define value for property 'artifactId': : flink-on-tez
-  Define value for property 'version':  1.0-SNAPSHOT: : 0.1
-  Define value for property 'package':  org.myorganization: :
-  Confirm properties configuration:
-  groupId: org.myorganization
-  artifactId: flink-on-tez
-  version: 0.1
-  package: org.myorganization
-   Y: : Y
-  [INFO] ----------------------------------------------------------------------------
-  [INFO] Using following parameters for creating project from Archetype: flink-tez-quickstart:0.9-SNAPSHOT
-  [INFO] ----------------------------------------------------------------------------
-  [INFO] Parameter: groupId, Value: org.myorganization
-  [INFO] Parameter: artifactId, Value: flink-on-tez
-  [INFO] Parameter: version, Value: 0.1
-  [INFO] Parameter: package, Value: org.myorganization
-  [INFO] Parameter: packageInPathFormat, Value: org/myorganization
-  [INFO] Parameter: package, Value: org.myorganization
-  [INFO] Parameter: version, Value: 0.1
-  [INFO] Parameter: groupId, Value: org.myorganization
-  [INFO] Parameter: artifactId, Value: flink-on-tez
-  [INFO] project created from Archetype in dir: /Users/kostas/Dropbox/flink-tez-quickstart-test/flink-on-tez
-  [INFO] ------------------------------------------------------------------------
-  [INFO] BUILD SUCCESS
-  [INFO] ------------------------------------------------------------------------
-  [INFO] Total time: 44.130 s
-  [INFO] Finished at: 2015-02-26T17:59:45+01:00
-  [INFO] Final Memory: 15M/309M
-  [INFO] ------------------------------------------------------------------------
-  {% endhighlight %}
-  
-  The project contains an example called `YarnJob.java` that provides the skeleton 
-  for a Flink-on-Tez job. Program execution is currently done using Hadoop's `ProgramDriver`;
-  see the `Driver.java` class for an example. Create the fat jar using 
-  `mvn -DskipTests clean package`. The resulting jar will be located in the `target/` directory. 
-  You can now execute a job as follows:
-  
-  {% highlight bash %}
-$ mvn -DskipTests clean package
-$ hadoop jar flink-on-tez/target/flink-on-tez-0.1-flink-fat-jar.jar yarnjob [command-line parameters]
-  {% endhighlight %}
-  
-  Flink programs that run on YARN using Tez as an execution engine need to use the `RemoteTezEnvironment` and 
-  register the class that contains the `main` method with that environment:
-  {% highlight java %}
-  public class WordCountExample {
-      public static void main(String[] args) throws Exception {
-          final RemoteTezEnvironment env = RemoteTezEnvironment.create();
-  
-          DataSet<String> text = env.fromElements(
-              "Who's there?",
-              "I think I hear them. Stand, ho! Who's there?");
-  
-          DataSet<Tuple2<String, Integer>> wordCounts = text
-              .flatMap(new LineSplitter())
-              .groupBy(0)
-              .sum(1);
-  
-          wordCounts.print();
-      
-          env.registerMainClass(WordCountExample.class);
-          env.execute("Word Count Example");
-      }
-  
-      public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
-          @Override
-          public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
-              for (String word : line.split(" ")) {
-                  out.collect(new Tuple2<String, Integer>(word, 1));
-              }
-          }
-      }
-  }
-  {% endhighlight %}
-
-
-## How it works
-
-Flink on Tez reuses the Flink APIs, the Flink optimizer,
-and the Flink local runtime, including Flink's hash table and sort implementations. Tez
-replaces Flink's network stack and control plane, and is responsible for scheduling and
-network shuffles.
-
-The figure below shows how a Flink program passes through the Flink stack and generates
-a Tez DAG (instead of a JobGraph that would be created using normal Flink execution).
-
-<div style="text-align: center;">
-<img src="img/flink_on_tez_translation.png" alt="Translation of a Flink program to a Tez DAG." height="600px" vspace="20px" style="text-align: center;"/>
-</div>
-
-All local processing, including memory management, sorting, and hashing, is performed by
-Flink as usual. Local processing is encapsulated in Tez vertices, as seen in the figure
-below. Tez vertices are connected by edges. Tez is currently based on a key-value data
-model. In the current implementation, the elements that are processed by Flink operators
-are wrapped inside Tez values, and the Tez key field is used to indicate the index of the target task
-for which the elements are destined.
-
-<div style="text-align: center;">
-<img src="img/flink_tez_vertex.png" alt="Encapsulation of Flink runtime inside Tez vertices." height="200px" vspace="20px" style="text-align: center;"/>
-</div>
-
-## Limitations
-
-Currently, Flink on Tez does not support all features of the Flink API. We are working
-to enable all of the missing features listed below. In the meantime, if your project depends on these features, we suggest
-using [Flink on YARN]({{site.baseurl}}/yarn_setup.html) or [Flink standalone]({{site.baseurl}}/setup_quickstart.html).
-
-The following features are currently missing.
-
-- Dedicated client: jobs need to be submitted via Hadoop's command-line client
-
-- Self-joins: currently binary operators that receive the same input are not supported due to 
-  [TEZ-1190](https://issues.apache.org/jira/browse/TEZ-1190).
-
-- Iterative programs are currently not supported.
-
-- Broadcast variables are currently not supported.
-
-- Accumulators and counters are currently not supported.
-
-- Performance: The current implementation has not been heavily tested for performance, and misses several optimizations,
-  including task chaining.
-
-- Streaming API: Streaming programs will not currently compile to Tez DAGs.
-
-- Scala API: The current implementation has only been tested with the Java API.
-
-
-

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/gce_setup.md
----------------------------------------------------------------------
diff --git a/docs/gce_setup.md b/docs/gce_setup.md
deleted file mode 100644
index 9b8c441..0000000
--- a/docs/gce_setup.md
+++ /dev/null
@@ -1,89 +0,0 @@
----
-title:  "Google Compute Engine Setup"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-
-This documentation provides instructions on how to set up Flink fully
-automatically with Hadoop 1 or Hadoop 2 on top of a
-[Google Compute Engine](https://cloud.google.com/compute/) cluster. This is made
-possible by Google's [bdutil](https://cloud.google.com/hadoop/bdutil) which
-starts a cluster and deploys Flink with Hadoop. To get started, just follow the
-steps below:
-
-# Prerequisites
-
-## Install Google Cloud SDK
-
-Please follow the instructions on how to set up the
-[Google Cloud SDK](https://cloud.google.com/sdk/).
-
-## Install bdutil
-
-At the moment, there is no bdutil release which includes the Flink
-extension. However, you can get the latest version of bdutil with Flink support
-from [GitHub](https://github.com/GoogleCloudPlatform/bdutil):
-
-    git clone https://github.com/GoogleCloudPlatform/bdutil.git
-
-After you have downloaded the source, change into the newly created `bdutil`
-directory and continue with the next steps.
-
-# Deploying Flink on Google Compute Engine
-
-## Set up a bucket
-
-If you have not done so, create a bucket for the bdutil config and
-staging files. A new bucket can be created with gsutil:
-
-    gsutil mb gs://<bucket_name>
-
-
-## Adapt the bdutil config
-
-To deploy Flink with bdutil, adapt at least the following variables in
-bdutil_env.sh.
-
-    CONFIGBUCKET="<bucket_name>"
-    PROJECT="<compute_engine_project_name>"
-    NUM_WORKERS=<number_of_workers>
-
-## Adapt the Flink config
-
-bdutil's Flink extension handles the configuration for you. You may additionally
-adjust configuration variables in `extensions/flink/flink_env.sh`. If you want
-to make further configuration, please take a look at
-[configuring Flink](config.md). You will have to restart Flink after changing
-its configuration using `bin/stop-cluster` and `bin/start-cluster`.
-
-## Bring up a cluster with Flink
-
-To bring up the Flink cluster on Google Compute Engine, execute:
-
-    ./bdutil -e extensions/flink/flink_env.sh deploy
-
-## Run a Flink example job:
-
-    ./bdutil shell
-    cd /home/hadoop/flink-install/bin
-    ./flink run ../examples/flink-java-examples-*-WordCount.jar gs://dataflow-samples/shakespeare/othello.txt gs://<bucket_name>/output

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/gelly_guide.md
----------------------------------------------------------------------
diff --git a/docs/gelly_guide.md b/docs/gelly_guide.md
deleted file mode 100644
index cc85296..0000000
--- a/docs/gelly_guide.md
+++ /dev/null
@@ -1,487 +0,0 @@
----
-title: "Gelly: Flink Graph API"
-is_beta: true
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-<a href="#top"></a>
-
-Introduction
-------------
-
-Gelly is a Java Graph API for Flink. It contains a set of methods and utilities which aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform and modify graphs, as well as a library of graph algorithms.
-
-Using Gelly
------------
-
-Gelly is currently part of the *staging* Maven project. All relevant classes are located in the *org.apache.flink.graph* package.
-
-Add the following dependency to your `pom.xml` to use Gelly.
-
-~~~xml
-<dependency>
-    <groupId>org.apache.flink</groupId>
-    <artifactId>flink-gelly</artifactId>
-    <version>{{site.FLINK_VERSION_SHORT}}</version>
-</dependency>
-~~~
-
-The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink Java API. After reading this guide, you might also want to check the {% gh_link /flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ "Gelly examples" %}.
-
-Graph Representation
------------
-
-In Gelly, a `Graph` is represented by a `DataSet` of vertices and a `DataSet` of edges.
-
-The `Graph` nodes are represented by the `Vertex` type. A `Vertex` is defined by a unique ID and a value. `Vertex` IDs should implement the `Comparable` interface. Vertices without a value can be represented by setting the value type to `NullValue`.
-
-{% highlight java %}
-// create a new vertex with a Long ID and a String value
-Vertex<Long, String> v = new Vertex<Long, String>(1L, "foo");
-
-// create a new vertex with a Long ID and no value
-Vertex<Long, NullValue> v = new Vertex<Long, NullValue>(1L, NullValue.getInstance());
-{% endhighlight %}
-
-The graph edges are represented by the `Edge` type. An `Edge` is defined by a source ID (the ID of the source `Vertex`), a target ID (the ID of the target `Vertex`) and an optional value. The source and target IDs should be of the same type as the `Vertex` IDs. Edges with no value have a `NullValue` value type.
-
-{% highlight java %}
-Edge<Long, Double> e = new Edge<Long, Double>(1L, 2L, 0.5);
-
-// reverse the source and target of this edge
-Edge<Long, Double> reversed = e.reverse();
-
-Double weight = e.getValue(); // weight = 0.5
-{% endhighlight %}
-
-[Back to top](#top)
-
-Graph Creation
------------
-
-You can create a `Graph` in the following ways:
-
-* from a `DataSet` of edges and an optional `DataSet` of vertices:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-DataSet<Vertex<String, Long>> vertices = ...
-
-DataSet<Edge<String, Double>> edges = ...
-
-Graph<String, Long, Double> graph = Graph.fromDataSet(vertices, edges, env);
-{% endhighlight %}
-
-* from a `DataSet` of `Tuple3` and an optional `DataSet` of `Tuple2`. In this case, Gelly will convert each `Tuple3` to an `Edge`, where the first field will be the source ID, the second field will be the target ID and the third field will be the edge value. Equivalently, each `Tuple2` will be converted to a `Vertex`, where the first field will be the vertex ID and the second field will be the vertex value:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-DataSet<Tuple2<String, Long>> vertexTuples = env.readCsvFile("path/to/vertex/input");
-
-DataSet<Tuple3<String, String, Double>> edgeTuples = env.readCsvFile("path/to/edge/input");
-
-Graph<String, Long, Double> graph = Graph.fromTupleDataSet(vertexTuples, edgeTuples, env);
-{% endhighlight %}
-
-* from a `Collection` of edges and an optional `Collection` of vertices:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-List<Vertex<Long, Long>> vertexList = new ArrayList...
-
-List<Edge<Long, String>> edgeList = new ArrayList...
-
-Graph<Long, Long, String> graph = Graph.fromCollection(vertexList, edgeList, env);
-{% endhighlight %}
-
-If no vertex input is provided during Graph creation, Gelly will automatically produce the `Vertex` `DataSet` from the edge input. In this case, the created vertices will have no values. Alternatively, you can provide a `MapFunction` as an argument to the creation method, in order to initialize the `Vertex` values:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-// initialize the vertex value to be equal to the vertex ID
-Graph<Long, Long, String> graph = Graph.fromCollection(edges, 
-				new MapFunction<Long, Long>() {
-					public Long map(Long value) { 
-						return value; 
-					} 
-				}, env);
-{% endhighlight %}
-
-[Back to top](#top)
-
-Graph Properties
-------------
-
-Gelly includes the following methods for retrieving various Graph properties and metrics:
-
-{% highlight java %}
-// get the Vertex DataSet
-DataSet<Vertex<K, VV>> getVertices()
-
-// get the Edge DataSet
-DataSet<Edge<K, EV>> getEdges()
-
-// get the IDs of the vertices as a DataSet
-DataSet<K> getVertexIds()
-
-// get the source-target pairs of the edge IDs as a DataSet
-DataSet<Tuple2<K, K>> getEdgeIds() 
-
-// get a DataSet of <vertex ID, in-degree> pairs for all vertices
-DataSet<Tuple2<K, Long>> inDegrees() 
-
-// get a DataSet of <vertex ID, out-degree> pairs for all vertices
-DataSet<Tuple2<K, Long>> outDegrees()
-
-// get a DataSet of <vertex ID, degree> pairs for all vertices, where degree is the sum of in- and out- degrees
-DataSet<Tuple2<K, Long>> getDegrees()
-
-// get the number of vertices
-long numberOfVertices()
-
-// get the number of edges
-long numberOfEdges()
-
-// get a DataSet of Triplets<srcVertex, trgVertex, edge>
-DataSet<Triplet<K, VV, EV>> getTriplets()
-
-{% endhighlight %}
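-
-As a brief usage example, some of the methods listed above could be called on an existing graph like this (the graph itself is assumed):
-
-{% highlight java %}
-Graph<Long, Long, Double> graph = ...
-
-// total number of vertices and edges
-long numVertices = graph.numberOfVertices();
-long numEdges = graph.numberOfEdges();
-
-// (vertex ID, degree) pairs, where degree is the sum of in- and out-degrees
-DataSet<Tuple2<Long, Long>> degrees = graph.getDegrees();
-{% endhighlight %}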
-
-[Back to top](#top)
-
-Graph Transformations
------------------
-
-* <strong>Map</strong>: Gelly provides specialized methods for applying a map transformation on the vertex values or edge values. `mapVertices` and `mapEdges` return a new `Graph`, where the IDs of the vertices (or edges) remain unchanged, while the values are transformed according to the provided user-defined map function. The map functions also allow changing the type of the vertex or edge values.
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-Graph<Long, Long, Long> graph = Graph.fromDataSet(vertices, edges, env);
-
-// increment each vertex value by one
-Graph<Long, Long, Long> updatedGraph = graph.mapVertices(
-				new MapFunction<Vertex<Long, Long>, Long>() {
-					public Long map(Vertex<Long, Long> value) {
-						return value.getValue() + 1;
-					}
-				});
-{% endhighlight %}
-
-* <strong>Filter</strong>: A filter transformation applies a user-defined filter function on the vertices or edges of the `Graph`. `filterOnEdges` will create a sub-graph of the original graph, keeping only the edges that satisfy the provided predicate. Note that the vertex dataset will not be modified. Similarly, `filterOnVertices` applies a filter on the vertices of the graph. Edges whose source and/or target do not satisfy the vertex predicate are removed from the resulting edge dataset. The `subgraph` method can be used to apply a filter function to the vertices and the edges at the same time.
-
-{% highlight java %}
-Graph<Long, Long, Long> graph = ...
-
-graph.subgraph(
-		new FilterFunction<Vertex<Long, Long>>() {
-			   	public boolean filter(Vertex<Long, Long> vertex) {
-					// keep only vertices with positive values
-					return (vertex.getValue() > 0);
-			   }
-		   },
-		new FilterFunction<Edge<Long, Long>>() {
-				public boolean filter(Edge<Long, Long> edge) {
-					// keep only edges with negative values
-					return (edge.getValue() < 0);
-				}
-		})
-{% endhighlight %}
-
-<p class="text-center">
-    <img alt="Filter Transformations" width="80%" src="img/gelly-filter.png"/>
-</p>
-
-* <strong>Join</strong>: Gelly provides specialized methods for joining the vertex and edge datasets with other input datasets. `joinWithVertices` joins the vertices with a `Tuple2` input data set. The join is performed using the vertex ID and the first field of the `Tuple2` input as the join keys. The method returns a new `Graph` where the vertex values have been updated according to a provided user-defined map function.
-Similarly, an input dataset can be joined with the edges, using one of three methods. `joinWithEdges` expects an input `DataSet` of `Tuple3` and joins on the composite key of both source and target vertex IDs. `joinWithEdgesOnSource` expects a `DataSet` of `Tuple2` and joins on the source key of the edges and the first attribute of the input dataset and `joinWithEdgesOnTarget` expects a `DataSet` of `Tuple2` and joins on the target key of the edges and the first attribute of the input dataset. All three methods apply a map function on the edge and the input data set values.
-Note that if the input dataset contains a key multiple times, all Gelly join methods will only consider the first value encountered.
-
-{% highlight java %}
-Graph<Long, Double, Double> network = ...
-
-DataSet<Tuple2<Long, Long>> vertexOutDegrees = network.outDegrees();
-
-// assign the transition probabilities as the edge weights
-Graph<Long, Double, Double> networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees,
-				new MapFunction<Tuple2<Double, Long>, Double>() {
-					public Double map(Tuple2<Double, Long> value) {
-						return value.f0 / value.f1;
-					}
-				});
-{% endhighlight %}
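-
-Analogously, a vertex join could look like the following sketch. The input data set is hypothetical, and the assumption that the map function receives the current vertex value as the first field and the joined value as the second field mirrors the edge example above:
-
-{% highlight java %}
-Graph<Long, Double, Double> network = ...
-
-// a hypothetical data set of (vertex ID, new score) pairs
-DataSet<Tuple2<Long, Double>> vertexScores = ...
-
-// replace each matched vertex value by the joined score
-Graph<Long, Double, Double> scoredNetwork = network.joinWithVertices(vertexScores,
-				new MapFunction<Tuple2<Double, Double>, Double>() {
-					public Double map(Tuple2<Double, Double> value) {
-						return value.f1;
-					}
-				});
-{% endhighlight %}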
-
-* <strong>Reverse</strong>: the `reverse()` method returns a new `Graph` where the direction of all edges has been reversed.
-
-* <strong>Undirected</strong>: In Gelly, a `Graph` is always directed. Undirected graphs can be represented by adding all opposite-direction edges to a graph. For this purpose, Gelly provides the `getUndirected()` method.
-
-* <strong>Union</strong>: Gelly's `union()` method performs a union on the vertex and edge sets of the input graphs. Duplicate vertices are removed from the resulting `Graph`, while duplicate edges are preserved. A short sketch of these transformations follows the figure below.
-
-<p class="text-center">
-    <img alt="Union Transformation" width="50%" src="img/gelly-union.png"/>
-</p>
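-
-A minimal sketch combining the three transformations above, assuming two existing graphs `graph1` and `graph2` with matching types:
-
-{% highlight java %}
-Graph<Long, Long, Double> graph1 = ...
-Graph<Long, Long, Double> graph2 = ...
-
-// flip the direction of every edge
-Graph<Long, Long, Double> reversed = graph1.reverse();
-
-// add the opposite-direction edge for every edge
-Graph<Long, Long, Double> undirected = graph1.getUndirected();
-
-// merge the vertex and edge sets of both graphs
-Graph<Long, Long, Double> merged = graph1.union(graph2);
-{% endhighlight %}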
-
-[Back to top](#top)
-
-Graph Mutations
------------
-
-Gelly includes the following methods for adding and removing vertices and edges from an input `Graph`:
-
-{% highlight java %}
-// adds a Vertex and the given edges to the Graph. If the Vertex already exists, it will not be added again, but the given edges will.
-Graph<K, VV, EV> addVertex(final Vertex<K, VV> vertex, List<Edge<K, EV>> edges)
-
-// adds an Edge to the Graph. If the source and target vertices do not exist in the graph, they will also be added.
-Graph<K, VV, EV> addEdge(Vertex<K, VV> source, Vertex<K, VV> target, EV edgeValue)
-
-// removes the given Vertex and its edges from the Graph.
-Graph<K, VV, EV> removeVertex(Vertex<K, VV> vertex)
-
-// removes *all* edges that match the given Edge from the Graph.
-Graph<K, VV, EV> removeEdge(Edge<K, EV> edge)
-{% endhighlight %}
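-
-As a brief, hypothetical usage example of these mutation methods (vertex and edge values are made up):
-
-{% highlight java %}
-Graph<Long, Long, Double> graph = ...
-
-// add a new vertex together with one edge connecting it to vertex 1
-Vertex<Long, Long> newVertex = new Vertex<Long, Long>(6L, 6L);
-List<Edge<Long, Double>> newEdges = new ArrayList<Edge<Long, Double>>();
-newEdges.add(new Edge<Long, Double>(6L, 1L, 0.3));
-Graph<Long, Long, Double> extended = graph.addVertex(newVertex, newEdges);
-
-// remove the vertex again, together with its edges
-Graph<Long, Long, Double> reduced = extended.removeVertex(newVertex);
-{% endhighlight %}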
-
-Neighborhood Methods
------------
-
-Neighborhood methods allow vertices to perform an aggregation on their first-hop neighborhood.
-
-`reduceOnEdges()` can be used to compute an aggregation on the neighboring edges of a vertex, while `reduceOnNeighbors()` has access to both the neighboring edges and vertices. The neighborhood scope is defined by the `EdgeDirection` parameter, which takes the values `IN`, `OUT` or `ALL`. `IN` will gather all in-coming edges (neighbors) of a vertex, `OUT` will gather all out-going edges (neighbors), while `ALL` will gather all edges (neighbors).
-
-For example, assume that you want to select the minimum weight of all out-edges for each vertex in the following graph:
-
-<p class="text-center">
-    <img alt="reduceOnEdges Example" width="50%" src="img/gelly-example-graph.png"/>
-</p>
-
-The following code will collect the out-edges for each vertex and apply the `SelectMinWeight()` user-defined function on each of the resulting neighborhoods:
-
-{% highlight java %}
-Graph<Long, Long, Double> graph = ...
-
-DataSet<Tuple2<Long, Double>> minWeights = graph.reduceOnEdges(
-				new SelectMinWeight(), EdgeDirection.OUT);
-
-// user-defined function to select the minimum weight
-static final class SelectMinWeight implements EdgesFunction<Long, Double, Tuple2<Long, Double>> {
-
-    public Tuple2<Long, Double> iterateEdges(Iterable<Tuple2<Long, Edge<Long, Double>>> edges) {
-
-        double minWeight = Double.MAX_VALUE;
-        long vertexId = -1;
-
-        for (Tuple2<Long, Edge<Long, Double>> edge: edges) {
-            // keep the smallest edge weight seen so far for this vertex
-            if (edge.f1.getValue() < minWeight) {
-                minWeight = edge.f1.getValue();
-                vertexId = edge.f0;
-            }
-        }
-        return new Tuple2<Long, Double>(vertexId, minWeight);
-    }
-}
-{% endhighlight %}
-
-<p class="text-center">
-    <img alt="reduceOnEdges Example" width="50%" src="img/gelly-reduceOnEdges.png"/>
-</p>
-
-Similarly, assume that you would like to compute the sum of the values of all in-coming neighbors, for every vertex. The following code will collect the in-coming neighbors for each vertex and apply the `SumValues()` user-defined function on each neighborhood:
-
-{% highlight java %}
-Graph<Long, Long, Double> graph = ...
-
-DataSet<Tuple2<Long, Long>> verticesWithSum = graph.reduceOnNeighbors(
-				new SumValues(), EdgeDirection.IN);
-
-// user-defined function to sum the neighbor values
-static final class SumValues implements NeighborsFunction<Long, Long, Double, Tuple2<Long, Long>> {
-		
-	public Tuple2<Long, Long> iterateNeighbors(Iterable<Tuple3<Long, Edge<Long, Double>, 
-		Vertex<Long, Long>>> neighbors) {
-		
-		long sum = 0;
-		long vertexId = -1;
-
-		for (Tuple3<Long, Edge<Long, Double>, Vertex<Long, Long>> neighbor : neighbors) {
-			vertexId = neighbor.f0;
-			sum += neighbor.f2.getValue();
-		}
-		return new Tuple2<Long, Long>(vertexId, sum);
-	}
-}
-{% endhighlight %}
-
-<p class="text-center">
-    <img alt="reduseOnNeighbors Example" width="70%" src="img/gelly-reduceOnNeighbors.png"/>
-</p>
-
-When the aggregation computation does not require access to the vertex value (for which the aggregation is performed), it is advised to use the more efficient `EdgesFunction` and `NeighborsFunction` for the user-defined functions. When access to the vertex value is required, one should use `EdgesFunctionWithVertexValue` and `NeighborsFunctionWithVertexValue` instead. 
-
-[Back to top](#top)
-
-Vertex-centric Iterations
------------
-
-Gelly wraps Flink's [Spargel API](spargel_guide.html) to provide methods for vertex-centric iterations.
-Like in Spargel, the user only needs to implement two functions: a `VertexUpdateFunction`, which defines how a vertex will update its value
-based on the received messages, and a `MessagingFunction`, which allows a vertex to send out messages for the next superstep.
-These functions and the maximum number of iterations to run are given as parameters to Gelly's `runVertexCentricIteration`.
-This method will execute the vertex-centric iteration on the input Graph and return a new Graph, with updated vertex values:
-
-{% highlight java %}
-Graph<Long, Double, Double> graph = ...
-
-// run Single-Source-Shortest-Paths vertex-centric iteration
-Graph<Long, Double, Double> result = 
-			graph.runVertexCentricIteration(
-			new VertexDistanceUpdater(), new MinDistanceMessenger(), maxIterations);
-
-// user-defined functions
-public static final class VertexDistanceUpdater {...}
-public static final class MinDistanceMessenger {...}
-
-{% endhighlight %}
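-
-To illustrate the shape of these two user-defined functions, the following sketch propagates the minimum vertex value through the graph. It is a simplified, hypothetical example: the generic parameters and helper methods such as `setNewVertexValue()` and `sendMessageToAllNeighbors()` follow the underlying Spargel API and are not taken from the shortest-paths program above.
-
-{% highlight java %}
-public static final class MinValueUpdater extends VertexUpdateFunction<Long, Long, Long> {
-
-	public void updateVertex(Long vertexKey, Long vertexValue, MessageIterator<Long> inMessages) {
-		long minValue = vertexValue;
-
-		// keep the smallest value received from any neighbor
-		for (long msg : inMessages) {
-			if (msg < minValue) {
-				minValue = msg;
-			}
-		}
-
-		// only update (and thereby stay active) if the value improved
-		if (minValue < vertexValue) {
-			setNewVertexValue(minValue);
-		}
-	}
-}
-
-public static final class MinValueMessenger extends MessagingFunction<Long, Long, Long, Double> {
-
-	public void sendMessages(Long vertexKey, Long vertexValue) {
-		// offer the current value to all neighbors for the next superstep
-		sendMessageToAllNeighbors(vertexValue);
-	}
-}
-{% endhighlight %}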
-
-### Configuring a Vertex-Centric Iteration
-A vertex-centric iteration can be configured using an `IterationConfiguration` object.
-Currently, the following parameters can be specified:
-
-* <strong>Name</strong>: The name for the vertex-centric iteration. The name is displayed in logs and messages 
-and can be specified using the `setName()` method.
-
-* <strong>Parallelism</strong>: The parallelism for the iteration. It can be set using the `setParallelism()` method.	
-
-* <strong>Solution set in unmanaged memory</strong>: Defines whether the solution set is kept in managed memory (Flink's internal way of keeping objects in serialized form) or as a simple object map. By default, the solution set is kept in managed memory. This property can be set using the `setSolutionSetUnmanagedMemory()` method.
-
-* <strong>Aggregators</strong>: Iteration aggregators can be registered using the `registerAggregator()` method. An iteration aggregator combines
-all aggregates globally once per superstep and makes them available in the next superstep. Registered aggregators can be accessed inside the user-defined `VertexUpdateFunction` and `MessagingFunction`.
-
-* <strong>Broadcast Variables</strong>: DataSets can be added as [Broadcast Variables](programming_guide.html#broadcast-variables) to the `VertexUpdateFunction` and `MessagingFunction`, using the `addBroadcastSetForUpdateFunction()` and `addBroadcastSetForMessagingFunction()` methods, respectively.
-
-{% highlight java %}
-
-Graph<Long, Double, Double> graph = ...
-
-// configure the iteration
-IterationConfiguration parameters = new IterationConfiguration();
-
-// set the iteration name
-parameters.setName("Gelly Iteration");
-
-// set the parallelism
-parameters.setParallelism(16);
-
-// register an aggregator
-parameters.registerAggregator("sumAggregator", new LongSumAggregator());
-
-// run the vertex-centric iteration, also passing the configuration parameters
-Graph<Long, Double, Double> result = 
-			graph.runVertexCentricIteration(
-			new VertexUpdater(), new Messenger(), maxIterations, parameters);
-
-// user-defined functions
-public static final class VertexUpdater extends VertexUpdateFunction {
-
-	LongSumAggregator aggregator = new LongSumAggregator();
-
-	public void preSuperstep() {
-	
-		// retrieve the Aggregator
-		aggregator = getIterationAggregator("sumAggregator");
-	}
-
-
-	public void updateVertex(Long vertexKey, Double vertexValue, MessageIterator inMessages) {
-		
-		//do some computation
-		Long partialValue = ...
-
-		// aggregate the partial value
-		aggregator.aggregate(partialValue);
-
-		// update the vertex value
-		setNewVertexValue(...);
-	}
-}
-
-public static final class Messenger extends MessagingFunction {...}
-
-{% endhighlight %}
-
-[Back to top](#top)
-
-Graph Validation
------------
-
-Gelly provides a simple utility for performing validation checks on input graphs. Depending on the application context, a graph may or may not be valid according to certain criteria. For example, a user might need to validate whether their graph contains duplicate edges or whether its structure is bipartite. In order to validate a graph, one can define a custom `GraphValidator` and implement its `validate()` method. `InvalidVertexIdsValidator` is Gelly's pre-defined validator. It checks that the edge set contains valid vertex IDs, i.e. that all source and target vertex IDs of the edges
-also exist in the vertex ID set.
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-// create a list of vertices with IDs = {1, 2, 3, 4, 5}
-List<Vertex<Long, Long>> vertices = ...
-
-// create a list of edges connecting the IDs: {(1, 2), (1, 3), (2, 4), (5, 6)}
-List<Edge<Long, Long>> edges = ...
-
-Graph<Long, Long, Long> graph = Graph.fromCollection(vertices, edges, env);
-
-// will return false: 6 is an invalid ID
-graph.validate(new InvalidVertexIdsValidator<Long, Long, Long>()); 
-
-{% endhighlight %}
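-
-For a custom check, one can extend `GraphValidator` directly. The following is only a sketch, assuming that a validator implements a single `validate(Graph)` method returning a boolean (as the predefined validator above suggests); the self-loop check itself is a hypothetical example:
-
-{% highlight java %}
-// hypothetical validator that rejects graphs containing self-loops
-public static class NoSelfLoopsValidator extends GraphValidator<Long, Long, Long> {
-
-	public boolean validate(Graph<Long, Long, Long> graph) throws Exception {
-		// count edges whose source and target vertex coincide
-		long selfLoops = graph.getEdges()
-			.filter(new FilterFunction<Edge<Long, Long>>() {
-				public boolean filter(Edge<Long, Long> edge) {
-					return edge.getSource().equals(edge.getTarget());
-				}
-			})
-			.count();
-		// the graph is considered valid if no self-loop was found
-		return selfLoops == 0;
-	}
-}
-
-// usage, analogous to the predefined validator:
-// graph.validate(new NoSelfLoopsValidator());
-{% endhighlight %}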
-
-[Back to top](#top)
-
-Library Methods
------------
-Gelly has a growing collection of graph algorithms for easily analyzing large-scale graphs. So far, the following library methods are implemented:
-
-* PageRank
-* Single-Source Shortest Paths
-* Label Propagation
-* Simple Community Detection
-
-Gelly's library methods can be used by simply calling the `run()` method on the input graph:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-Graph<Long, Long, NullValue> graph = ...
-
-// run Label Propagation for 30 iterations to detect communities on the input graph
-DataSet<Vertex<Long, Long>> verticesWithCommunity = graph.run(
-				new LabelPropagation<Long>(30)).getVertices();
-
-// print the result
-verticesWithCommunity.print();
-
-env.execute();
-{% endhighlight %}
-
-[Back to top](#top)
-
-

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/hadoop_compatibility.md
----------------------------------------------------------------------
diff --git a/docs/hadoop_compatibility.md b/docs/hadoop_compatibility.md
deleted file mode 100644
index 92981b7..0000000
--- a/docs/hadoop_compatibility.md
+++ /dev/null
@@ -1,247 +0,0 @@
----
-title: "Hadoop Compatibility"
-is_beta: true
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-
-* This will be replaced by the TOC
-{:toc}
-
-Flink is compatible with Apache Hadoop MapReduce interfaces and therefore allows
-reusing code that was implemented for Hadoop MapReduce.
-
-You can:
-
-- use Hadoop's `Writable` [data types](programming_guide.html#data-types) in Flink programs.
-- use any Hadoop `InputFormat` as a [DataSource](programming_guide.html#data-sources).
-- use any Hadoop `OutputFormat` as a [DataSink](programming_guide.html#data-sinks).
-- use a Hadoop `Mapper` as [FlatMapFunction](dataset_transformations.html#flatmap).
-- use a Hadoop `Reducer` as [GroupReduceFunction](dataset_transformations.html#groupreduce-on-grouped-dataset).
-
-This document shows how to use existing Hadoop MapReduce code with Flink. Please refer to the
-[Connecting to other systems](example_connectors.html) guide for reading from Hadoop supported file systems.
-
-### Project Configuration
-
-Support for Hadoop input/output formats is part of the `flink-java` and
-`flink-scala` Maven modules that are always required when writing Flink jobs.
-The code is located in `org.apache.flink.api.java.hadoop` and
-`org.apache.flink.api.scala.hadoop` in additional sub-packages for the
-`mapred` and `mapreduce` APIs.
-
-Support for Hadoop Mappers and Reducers is contained in the `flink-staging`
-Maven module.
-This code resides in the `org.apache.flink.hadoopcompatibility`
-package.
-
-Add the following dependency to your `pom.xml` if you want to reuse Mappers
-and Reducers.
-
-~~~xml
-<dependency>
-	<groupId>org.apache.flink</groupId>
-	<artifactId>flink-hadoop-compatibility</artifactId>
-	<version>{{site.FLINK_VERSION_SHORT}}</version>
-</dependency>
-~~~
-
-### Using Hadoop Data Types
-
-Flink supports all Hadoop `Writable` and `WritableComparable` data types
-out of the box. You do not need to include the Hadoop Compatibility dependency
-if you only want to use your Hadoop data types. See the
-[Programming Guide](programming_guide.html#data-types) for more details.
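-
-As a minimal sketch (not taken from the bundled examples), `Writable` types can be used directly as fields of a `DataSet`:
-
-~~~java
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-// Hadoop Writable types used directly as tuple fields; no extra dependency required
-DataSet<Tuple2<Text, LongWritable>> data = env.fromElements(
-    new Tuple2<Text, LongWritable>(new Text("flink"), new LongWritable(1L)),
-    new Tuple2<Text, LongWritable>(new Text("hadoop"), new LongWritable(2L)));
-
-// Do something with the data.
-[...]
-~~~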
-
-### Using Hadoop InputFormats
-
-Hadoop input formats can be used to create a data source by using
-one of the methods `readHadoopFile` or `createHadoopInput` of the
-`ExecutionEnvironment`. The former is used for input formats derived
-from `FileInputFormat` while the latter has to be used for general purpose
-input formats.
-
-The resulting `DataSet` contains 2-tuples where the first field
-is the key and the second field is the value retrieved from the Hadoop
-InputFormat.
-
-The following example shows how to use Hadoop's `TextInputFormat`.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-DataSet<Tuple2<LongWritable, Text>> input =
-    env.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, textPath);
-
-// Do something with the data.
-[...]
-~~~
-
-</div>
-<div data-lang="scala" markdown="1">
-
-~~~scala
-val env = ExecutionEnvironment.getExecutionEnvironment
-		
-val input: DataSet[(LongWritable, Text)] =
-  env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)
-
-// Do something with the data.
-[...]
-~~~
-
-</div>
-
-</div>
-
-### Using Hadoop OutputFormats
-
-Flink provides a compatibility wrapper for Hadoop `OutputFormats`. Any class
-that implements `org.apache.hadoop.mapred.OutputFormat` or extends
-`org.apache.hadoop.mapreduce.OutputFormat` is supported.
-The OutputFormat wrapper expects its input data to be a DataSet containing
-2-tuples of key and value, which are then processed by the Hadoop OutputFormat.
-
-The following example shows how to use Hadoop's `TextOutputFormat`.
-
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-
-~~~java
-// Obtain the result we want to emit
-DataSet<Tuple2<Text, IntWritable>> hadoopResult = [...]
-		
-// Set up the Hadoop TextOutputFormat.
-HadoopOutputFormat<Text, IntWritable> hadoopOF = 
-  // create the Flink wrapper.
-  new HadoopOutputFormat<Text, IntWritable>(
-    // set the Hadoop OutputFormat and specify the job.
-    new TextOutputFormat<Text, IntWritable>(), job
-  );
-hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");
-TextOutputFormat.setOutputPath(job, new Path(outputPath));
-		
-// Emit data using the Hadoop TextOutputFormat.
-hadoopResult.output(hadoopOF);
-~~~
-
-</div>
-<div data-lang="scala" markdown="1">
-
-~~~scala
-// Obtain your result to emit.
-val hadoopResult: DataSet[(Text, IntWritable)] = [...]
-
-val hadoopOF = new HadoopOutputFormat[Text,IntWritable](
-  new TextOutputFormat[Text, IntWritable],
-  new JobConf)
-
-hadoopOF.getJobConf.set("mapred.textoutputformat.separator", " ")
-FileOutputFormat.setOutputPath(hadoopOF.getJobConf, new Path(resultPath))
-
-hadoopResult.output(hadoopOF)
-
-		
-~~~
-
-</div>
-
-</div>
-
-### Using Hadoop Mappers and Reducers
-
-Hadoop Mappers are semantically equivalent to Flink's [FlatMapFunctions](dataset_transformations.html#flatmap) and Hadoop Reducers are equivalent to Flink's [GroupReduceFunctions](dataset_transformations.html#groupreduce-on-grouped-dataset). Flink provides wrappers for implementations of Hadoop MapReduce's `Mapper` and `Reducer` interfaces, i.e., you can reuse your Hadoop Mappers and Reducers in regular Flink programs. At the moment, only the Mapper and Reducer interfaces of Hadoop's mapred API (`org.apache.hadoop.mapred`) are supported.
-
-The wrappers take a `DataSet<Tuple2<KEYIN,VALUEIN>>` as input and produce a `DataSet<Tuple2<KEYOUT,VALUEOUT>>` as output, where `KEYIN` and `KEYOUT` are the keys and `VALUEIN` and `VALUEOUT` are the values of the Hadoop key-value pairs that are processed by the Hadoop functions. For Reducers, Flink offers a wrapper for a GroupReduceFunction with a Combiner (`HadoopReduceCombineFunction`) and one without a Combiner (`HadoopReduceFunction`). The wrappers accept an optional `JobConf` object to configure the Hadoop Mapper or Reducer.
-
-Flink's function wrappers
-
-- `org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction`,
-- `org.apache.flink.hadoopcompatibility.mapred.HadoopReduceFunction`, and
-- `org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction`
-
-can be used as regular Flink [FlatMapFunctions](dataset_transformations.html#flatmap) or [GroupReduceFunctions](dataset_transformations.html#groupreduce-on-grouped-dataset).
-
-The following example shows how to use Hadoop `Mapper` and `Reducer` functions.
-
-~~~java
-// Obtain data to process somehow.
-DataSet<Tuple2<Text, LongWritable>> text = [...]
-
-DataSet<Tuple2<Text, LongWritable>> result = text
-  // use Hadoop Mapper (Tokenizer) as MapFunction
-  .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(
-    new Tokenizer()
-  ))
-  .groupBy(0)
-  // use Hadoop Reducer (Counter) as Reduce- and CombineFunction
-  .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
-    new Counter(), new Counter()
-  ));
-~~~
-
-**Please note:** The Reducer wrapper works on groups as defined by Flink's [groupBy()](dataset_transformations.html#transformations-on-grouped-dataset) operation. It does not consider any custom partitioners, sort or grouping comparators you might have set in the `JobConf`. 
-
-### Complete Hadoop WordCount Example
-
-The following example shows a complete WordCount implementation using Hadoop data types, Input- and OutputFormats, and Mapper and Reducer implementations.
-
-~~~java
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-		
-// Set up the Hadoop TextInputFormat.
-Job job = Job.getInstance();
-HadoopInputFormat<LongWritable, Text> hadoopIF = 
-  new HadoopInputFormat<LongWritable, Text>(
-    new TextInputFormat(), LongWritable.class, Text.class, job
-  );
-TextInputFormat.addInputPath(job, new Path(inputPath));
-		
-// Read data using the Hadoop TextInputFormat.
-DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopIF);
-
-DataSet<Tuple2<Text, LongWritable>> result = text
-  // use Hadoop Mapper (Tokenizer) as MapFunction
-  .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(
-    new Tokenizer()
-  ))
-  .groupBy(0)
-  // use Hadoop Reducer (Counter) as Reduce- and CombineFunction
-  .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
-    new Counter(), new Counter()
-  ));
-
-// Set up the Hadoop TextOutputFormat.
-HadoopOutputFormat<Text, LongWritable> hadoopOF = 
-  new HadoopOutputFormat<Text, LongWritable>(
-    new TextOutputFormat<Text, LongWritable>(), job
-  );
-hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");
-TextOutputFormat.setOutputPath(job, new Path(outputPath));
-		
-// Emit data using the Hadoop TextOutputFormat.
-result.output(hadoopOF);
-
-// Execute Program
-env.execute("Hadoop WordCount");
-~~~

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/how_to_contribute.md
----------------------------------------------------------------------
diff --git a/docs/how_to_contribute.md b/docs/how_to_contribute.md
deleted file mode 100644
index 86ef780..0000000
--- a/docs/how_to_contribute.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title:  "How to contribute"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-The "How to contribute"-guide is now located [on the project website](http://flink.apache.org/how-to-contribute.html).