Posted to commits@flink.apache.org by se...@apache.org on 2014/08/20 13:47:29 UTC

[1/2] git commit: [FLINK-1050] Windows local setup documentation: added note for java being in %PATH% variable, headers, and missing link label

Repository: incubator-flink
Updated Branches:
  refs/heads/master e8f2e9d0e -> 6c48063ad


[FLINK-1050] Windows local setup documentation: added note for java being in %PATH% variable, headers, and missing link label

This closes #95


Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/91a52a14
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/91a52a14
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/91a52a14

Branch: refs/heads/master
Commit: 91a52a140a11f8a917dd735b384b61387bbf47e6
Parents: e8f2e9d
Author: Fabian Hueske <fh...@apache.org>
Authored: Thu Aug 14 18:00:57 2014 +0200
Committer: Stephan Ewen <se...@apache.org>
Committed: Wed Aug 20 13:32:53 2014 +0200

----------------------------------------------------------------------
 docs/local_setup.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/91a52a14/docs/local_setup.md
----------------------------------------------------------------------
diff --git a/docs/local_setup.md b/docs/local_setup.md
index dd8ffe5..d12401b 100644
--- a/docs/local_setup.md
+++ b/docs/local_setup.md
@@ -56,12 +56,17 @@ INFO ... - Starting web info server for JobManager on port 8081
 
 The JobManager will also start a web frontend on port 8081, which you can check with your browser at `http://localhost:8081`.
 
+<section id="windows">
 # Flink on Windows
 
 If you want to run Flink on Windows you need to download, unpack and configure the Flink archive as mentioned above. After that you can either use the **Windows Batch** file (`.bat`) or use **Cygwin**  to run the Flink Jobmanager.
 
+### Starting with Windows Batch Files
+
 To start Flink in local mode from the *Windows Batch*, open the command window, navigate to the `bin/` directory of Flink and run `start-local.bat`.
 
+Note: The ``bin`` folder of your Java Runtime Environment must be included in Windows' ``%PATH%`` variable. Follow this [guide](http://www.java.com/en/download/help/path.xml) to add Java to the ``%PATH%`` variable.
+
 ```bash
 $ cd flink
 $ cd bin
@@ -72,6 +77,8 @@ Do not close this batch window. Stop job manager by pressing Ctrl+C.
 
 After that, you need to open a second terminal to run jobs using `flink.bat`.
 
+### Starting with Cygwin and Unix Scripts
+
 With *Cygwin* you need to start the Cygwin Terminal, navigate to your Flink directory and run the `start-local.sh` script:
 
 ```bash
@@ -80,6 +87,8 @@ $ bin/start-local.sh
 Starting Nephele job manager
 ```
 
+### Installing Flink from Git
+
 If you are installing Flink from the git repository and you are using the Windows git shell, Cygwin can produce a failure similar to this one:
 
 ```bash


[2/2] git commit: [Documentation] Add documentation on accessing Microsoft Azure Storage Tables [ci skip]

Posted by se...@apache.org.
[Documentation] Add documentation on accessing Microsoft Azure Storage Tables
[ci skip]

This closes #100


Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/6c48063a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/6c48063a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/6c48063a

Branch: refs/heads/master
Commit: 6c48063adf2caddd5041d373fd7c25629df1b245
Parents: 91a52a1
Author: Robert Metzger <rm...@apache.org>
Authored: Wed Aug 20 11:49:37 2014 +0200
Committer: Stephan Ewen <se...@apache.org>
Committed: Wed Aug 20 13:43:37 2014 +0200

----------------------------------------------------------------------
 docs/_config.yml           |   1 +
 docs/_layouts/docs.html    |   1 +
 docs/example_connectors.md | 111 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/6c48063a/docs/_config.yml
----------------------------------------------------------------------
diff --git a/docs/_config.yml b/docs/_config.yml
index 3f2e410..fbb44a6 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -6,6 +6,7 @@
 #------------------------------------------------------------------------------
 
 FLINK_VERSION_STABLE: 0.6-incubating-SNAPSHOT # this variable can point to a SNAPSHOT version in the git source.
+FLINK_VERSION_HADOOP_2_STABLE: 0.6-hadoop2-incubating-SNAPSHOT
 FLINK_VERSION_SHORT: 0.6
 FLINK_ISSUES_URL: https://issues.apache.org/jira/browse/FLINK
 FLINK_GITHUB_URL:  https://github.com/apache/incubator-flink

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/6c48063a/docs/_layouts/docs.html
----------------------------------------------------------------------
diff --git a/docs/_layouts/docs.html b/docs/_layouts/docs.html
index f69b26a..a9b94be 100644
--- a/docs/_layouts/docs.html
+++ b/docs/_layouts/docs.html
@@ -58,6 +58,7 @@
                         <ul>
                             <li><a href="java_api_examples.html">Java API</a></li>
                             <li><a href="scala_api_examples.html">Scala API</a></li>
+                            <li><a href="example_connectors.html">Connecting to other systems</a></li>
                         </ul>
                     </li>
 

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/6c48063a/docs/example_connectors.md
----------------------------------------------------------------------
diff --git a/docs/example_connectors.md b/docs/example_connectors.md
new file mode 100644
index 0000000..9b55d91
--- /dev/null
+++ b/docs/example_connectors.md
@@ -0,0 +1,111 @@
+---
+title:  "Example: Connectors"
+---
+
+Apache Flink allows users to access many different systems as data sources or sinks. The system is designed to be easy to extend. Similar to Apache Hadoop, Flink has the concept of so-called `InputFormat`s and `OutputFormat`s.
+
+One implementation of these `InputFormat`s is the `HadoopInputFormat`. This is a wrapper that allows users to use all existing Hadoop input formats with Flink.
+
+This page shows some examples for connecting Flink to other systems.
+
+
+## Access Microsoft Azure Table Storage
+
+_Note: This example works starting from Flink 0.6-incubating_
+
+This example uses the `HadoopInputFormat` wrapper to reuse an existing Hadoop input format implementation for accessing [Azure's Table Storage](https://azure.microsoft.com/en-us/documentation/articles/storage-introduction/).
+
+1. Download and compile the `azure-tables-hadoop` project. The input format developed by the project is not yet available in Maven Central, so we have to build the project ourselves.
+Execute the following commands:
+
+    ```bash
+    git clone https://github.com/mooso/azure-tables-hadoop.git
+    cd azure-tables-hadoop
+    mvn clean install
+    ```
+
+2. Set up a new Flink project using the quickstarts:
+
+    ```bash
+    curl https://raw.githubusercontent.com/apache/incubator-flink/master/flink-quickstart/quickstart.sh | bash
+    ```
+
+3. Set the version of Flink to `{{site.FLINK_VERSION_HADOOP_2_STABLE}}` in the `pom.xml` file. The `quickstart.sh` script sets the version to the `hadoop1` version of Flink. Since `microsoft-hadoop-azure` was written against Hadoop 2.2 (the `mapreduce` API), we need to use the matching Flink version.
+
+    Replace all occurrences of `<version>{{site.FLINK_VERSION_STABLE}}</version>` with `<version>{{site.FLINK_VERSION_HADOOP_2_STABLE}}</version>`.
+4. Add the following dependencies (in the `<dependencies>` section) to your `pom.xml` file:
+
+    ```xml
+    <dependency>
+    	<groupId>org.apache.flink</groupId>
+    	<artifactId>flink-hadoop-compatibility</artifactId>
+    	<version>{{site.FLINK_VERSION_HADOOP_2_STABLE}}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.microsoft.hadoop</groupId>
+      <artifactId>microsoft-hadoop-azure</artifactId>
+      <version>0.0.4</version>
+    </dependency>
+    ```
+    - `flink-hadoop-compatibility` is a Flink package that provides the Hadoop input format wrappers.
+    - `microsoft-hadoop-azure` adds the project we built in step 1 to our project.
+
+The project is now ready for coding. We recommend importing the project into an IDE, such as Eclipse or IntelliJ (import it as a Maven project).
+Browse to the code of the `Job.java` file. It is an empty skeleton for a Flink job.
+
+Paste the following code into it. Because the public class is named `AzureTableExample`, the file must be renamed to `AzureTableExample.java` for it to compile:
+```java
+import java.util.Map;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+import com.microsoft.hadoop.azure.AzureTableConfiguration;
+import com.microsoft.hadoop.azure.AzureTableInputFormat;
+import com.microsoft.hadoop.azure.WritableEntity;
+import com.microsoft.windowsazure.storage.table.EntityProperty;
+
+public class AzureTableExample {
+  public static void main(String[] args) throws Exception {
+    // set up the execution environment
+    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+    // create an AzureTableInputFormat, using a Hadoop input format wrapper
+    HadoopInputFormat<Text, WritableEntity> hdIf = new HadoopInputFormat<Text, WritableEntity>(new AzureTableInputFormat(), Text.class, WritableEntity.class, new Job());
+    // set the Account URI, something like: https://apacheflink.table.core.windows.net
+    hdIf.getConfiguration().set(AzureTableConfiguration.Keys.ACCOUNT_URI.getKey(), "TODO"); 
+    // set the secret storage key here
+    hdIf.getConfiguration().set(AzureTableConfiguration.Keys.STORAGE_KEY.getKey(), "TODO");
+    // set the table name here
+    hdIf.getConfiguration().set(AzureTableConfiguration.Keys.TABLE_NAME.getKey(), "TODO");
+    
+    DataSet<Tuple2<Text, WritableEntity>> input = env.createInput(hdIf);
+    // a small example of how to use the data in a mapper
+    DataSet<String> fin = input.map(new MapFunction<Tuple2<Text,WritableEntity>, String>() {
+      @Override
+      public String map(Tuple2<Text, WritableEntity> arg0) throws Exception {
+        System.err.println("--------------------------------\nKey = "+arg0.f0);
+        WritableEntity we = arg0.f1;
+        for(Map.Entry<String, EntityProperty> prop : we.getProperties().entrySet()) {
+          System.err.println("key="+prop.getKey() + " ; value (asString)="+prop.getValue().getValueAsString());
+        }
+        return arg0.f0.toString();
+      }
+    });
+    // emit result (this works only locally)
+    fin.print();
+    // execute program
+    env.execute("Azure Example");
+  }
+}
+```
+The example shows how to access an Azure table and turn its data into a Flink `DataSet` (more specifically, the type of the set is `DataSet<Tuple2<Text, WritableEntity>>`). You can then apply any of Flink's transformations to this `DataSet`.
+
+## Access MongoDB
+
+_Note: This example works starting from Flink 0.5 (then called Stratosphere)_
+
+Please see this (slightly outdated) blog post on [How to access MongoDB with Apache Flink](http://flink.incubator.apache.org/news/2014/01/28/querying_mongodb.html).
+
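
The `AzureTableExample` in the diff above leaves the account URI, storage key, and table name as `"TODO"` strings. One possible way to fill them without committing secrets to source control is to read them from environment variables. The following is a minimal sketch under that assumption, not part of the commit: the class name `AzureSettings`, the method `require`, and the three variable names are hypothetical, and only the Java standard library is used.

```java
import java.util.Map;

public class AzureSettings {
  // Hypothetical environment variable names; the commit does not define any.
  public static final String ACCOUNT_URI_VAR = "AZURE_ACCOUNT_URI";
  public static final String STORAGE_KEY_VAR = "AZURE_STORAGE_KEY";
  public static final String TABLE_NAME_VAR = "AZURE_TABLE_NAME";

  /** Returns the trimmed value stored under name, or throws if it is missing or blank. */
  public static String require(Map<String, String> env, String name) {
    String value = env.get(name);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalStateException("Missing required setting: " + name);
    }
    return value.trim();
  }

  public static void main(String[] args) {
    // System.getenv() returns an unmodifiable view of the process environment.
    Map<String, String> env = System.getenv();
    String accountUri = require(env, ACCOUNT_URI_VAR);
    String storageKey = require(env, STORAGE_KEY_VAR);
    String tableName = require(env, TABLE_NAME_VAR);

    // In AzureTableExample these values would replace the "TODO" strings, e.g.
    //   hdIf.getConfiguration().set(
    //       AzureTableConfiguration.Keys.ACCOUNT_URI.getKey(), accountUri);
    // The storage key is deliberately not printed; only its presence is reported.
    System.out.println("Account URI: " + accountUri);
    System.out.println("Table name: " + tableName);
    System.out.println("Storage key loaded: " + !storageKey.isEmpty());
  }
}
```

Passing the environment as a `Map` keeps `require` easy to exercise with test data. Failing fast on a missing value avoids submitting a job that would otherwise only fail once the input format tries to connect.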