Posted to commits@flink.apache.org by uc...@apache.org on 2017/01/24 10:12:50 UTC

flink git commit: [FLINK-5404] [docs] Consolidate and update AWS setup

Repository: flink
Updated Branches:
  refs/heads/master 3c4a4b2fe -> f0d96e30d


[FLINK-5404] [docs] Consolidate and update AWS setup

This closes #3054.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/f0d96e30
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/f0d96e30
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/f0d96e30

Branch: refs/heads/master
Commit: f0d96e30d33b6c0c307b8c14dc450feb99e62bab
Parents: 3c4a4b2
Author: medale <me...@yahoo.com>
Authored: Mon Jan 2 22:13:56 2017 -0500
Committer: Ufuk Celebi <uc...@apache.org>
Committed: Tue Jan 24 11:12:33 2017 +0100

----------------------------------------------------------------------
 docs/dev/batch/connectors.md | 28 ++++------------------------
 docs/setup/aws.md            | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 33 insertions(+), 29 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/f0d96e30/docs/dev/batch/connectors.md
----------------------------------------------------------------------
diff --git a/docs/dev/batch/connectors.md b/docs/dev/batch/connectors.md
index 4e5b009..912d0af 100644
--- a/docs/dev/batch/connectors.md
+++ b/docs/dev/batch/connectors.md
@@ -52,33 +52,13 @@ interface. There are Hadoop `FileSystem` implementations for
 
 In order to use a Hadoop file system with Flink, make sure that
 
-- the `flink-conf.yaml` has set the `fs.hdfs.hadoopconf` property set to the Hadoop configuration directory.
-- the Hadoop configuration (in that directory) has an entry for the required file system. Examples for S3 and Alluxio are shown below.
-- the required classes for using the file system are available in the `lib/` folder of the Flink installation (on all machines running Flink). If putting the files into the directory is not possible, Flink is also respecting the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath.
+- the `fs.hdfs.hadoopconf` property in `flink-conf.yaml` is set to the Hadoop configuration directory. For automated testing or when running from an IDE, the directory containing `flink-conf.yaml` can be set via the `FLINK_CONF_DIR` environment variable.
+- the Hadoop configuration (in that directory) has an entry for the required file system in its `core-site.xml` file. Examples for S3 and Alluxio are linked and shown below, respectively.
+- the required classes for using the file system are available in the `lib/` folder of the Flink installation (on all machines running Flink). If putting the files into the directory is not possible, Flink also respects the `HADOOP_CLASSPATH` environment variable to add Hadoop jar files to the classpath (see the combined sketch below).
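+
+For example, a combined setup could look as follows (all paths are illustrative):
+
+~~~
+# flink-conf.yaml: point Flink to the Hadoop configuration directory
+fs.hdfs.hadoopconf: /etc/hadoop/conf
+
+# /etc/hadoop/conf/core-site.xml declares the required file system
+# (see the S3 and Alluxio sections below)
+
+# if the jars cannot be placed in Flink's lib/ folder, export:
+export HADOOP_CLASSPATH=/path/to/hadoop/jars/*
+~~~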
 
 #### Amazon S3
 
-For Amazon S3 support add the following entries into the `core-site.xml` file:
-
-~~~xml
-<!-- configure the file system implementation -->
-<property>
-  <name>fs.s3.impl</name>
-  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
-</property>
-
-<!-- set your AWS ID -->
-<property>
-  <name>fs.s3.awsAccessKeyId</name>
-  <value>putKeyHere</value>
-</property>
-
-<!-- set your AWS access key -->
-<property>
-  <name>fs.s3.awsSecretAccessKey</name>
-  <value>putSecretHere</value>
-</property>
-~~~
+See [Deployment & Operations - Deployment - AWS - S3: Simple Storage Service]({{ site.baseurl }}/setup/aws.html) for available S3 file system implementations, their configuration and required libraries.
 
 #### Alluxio
 

http://git-wip-us.apache.org/repos/asf/flink/blob/f0d96e30/docs/setup/aws.md
----------------------------------------------------------------------
diff --git a/docs/setup/aws.md b/docs/setup/aws.md
index 8d04d59..9ebef61 100644
--- a/docs/setup/aws.md
+++ b/docs/setup/aws.md
@@ -98,6 +98,8 @@ This is the recommended S3 FileSystem implementation to use. It uses Amazon's SD
 You need to point Flink to a valid Hadoop configuration, which contains the following properties in `core-site.xml`:
 
 ```xml
+<configuration>
+
 <property>
   <name>fs.s3.impl</name>
   <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
@@ -109,6 +111,8 @@ You need to point Flink to a valid Hadoop configuration, which contains the foll
   <name>fs.s3.buffer.dir</name>
   <value>/tmp</value>
 </property>
+
+</configuration>
 ```
 
 This registers `S3AFileSystem` as the default FileSystem for URIs with the `s3://` scheme.
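+
+For example, once this is in place, `s3://` URIs can be used directly in a Flink program (bucket and path names below are purely illustrative):
+
+```java
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+
+public class S3ReadWriteExample {
+
+    public static void main(String[] args) throws Exception {
+        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+        // S3 paths work anywhere Flink accepts a file system URI.
+        DataSet<String> lines = env.readTextFile("s3://my-bucket/input/data.txt");
+        lines.writeAsText("s3://my-bucket/output/");
+
+        env.execute("S3 read/write example");
+    }
+}
+```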
@@ -130,13 +134,13 @@ This registers `NativeS3FileSystem` as the default FileSystem for URIs with the
 
 #### Hadoop Configuration
 
-You can specify the [Hadoop configuration]({{ site.baseurl }}/setup/config.html#hdfs) in various ways, for examples by configuring the path to the Hadoop configuration directory in `flink-conf.yaml`:
+You can specify the [Hadoop configuration]({{ site.baseurl }}/setup/config.html#hdfs) in various ways, for example by configuring the path to the Hadoop configuration directory in `flink-conf.yaml`:
 
 ```
 fs.hdfs.hadoopconf: /path/to/etc/hadoop
 ```
 
-This registers `path/to/etc/hadoop` as Hadoop's configuration directory with Flink.
+This registers `/path/to/etc/hadoop` as Hadoop's configuration directory with Flink. Flink will look for the `core-site.xml` and `hdfs-site.xml` files in the specified directory.
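+
+The expected layout of that directory is then, for example:
+
+```
+/path/to/etc/hadoop/
+├── core-site.xml   <- file system implementation and credentials
+└── hdfs-site.xml   <- HDFS-specific settings
+```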
 
 {% top %}
 
@@ -148,7 +152,7 @@ After setting up the S3 FileSystem, you need to make sure that Flink is allowed
 
 #### Identity and Access Management (IAM) (Recommended)
 
-The recommended way of setting up credentials on AWS is via [Identity and Access Management (IAM)](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html). You can use IAM features to securely give Flink instances the credentials that they need in order to access S3 buckets. Details about how to do this are beyond the scope of this documentation. Please refer to the AWS user guide. What you are looking for are [IAM Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
+When using `S3AFileSystem`, the recommended way of setting up credentials on AWS is via [Identity and Access Management (IAM)](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html). You can use IAM features to securely give Flink instances the credentials that they need in order to access S3 buckets. Details about how to do this are beyond the scope of this documentation; please refer to the AWS user guide. What you are looking for are [IAM Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
 
 If you set this up correctly, you can manage access to S3 within AWS and don't need to distribute any access keys to Flink.
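+
+As a purely illustrative starting point (refer to the AWS documentation for authoritative guidance), a minimal IAM policy for a single bucket might look like this (the bucket name is a placeholder):
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": ["s3:ListBucket"],
+      "Resource": ["arn:aws:s3:::my-flink-bucket"]
+    },
+    {
+      "Effect": "Allow",
+      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
+      "Resource": ["arn:aws:s3:::my-flink-bucket/*"]
+    }
+  ]
+}
+```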
 
@@ -156,11 +160,31 @@ Note that this only works with `S3AFileSystem` and not `NativeS3FileSystem`.
 
 {% top %}
 
-#### Access Keys (Discouraged)
+#### Access Keys with S3AFileSystem (Discouraged)
 
 Access to S3 can be granted via your **access and secret key pair**. Please note that this is discouraged since the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2).
 
-You need to configure both `fs.s3.awsAccessKeyId` and `fs.s3.awsSecretAccessKey`  in Hadoop's  `core-site.xml`:
+For `S3AFileSystem`, you need to configure both `fs.s3a.access.key` and `fs.s3a.secret.key` in Hadoop's `core-site.xml`:
+
+```xml
+<property>
+  <name>fs.s3a.access.key</name>
+  <value></value>
+</property>
+
+<property>
+  <name>fs.s3a.secret.key</name>
+  <value></value>
+</property>
+```
+
+{% top %}
+
+#### Access Keys with NativeS3FileSystem (Discouraged)
+
+Access to S3 can be granted via your **access and secret key pair**. However, this is discouraged; you should instead use `S3AFileSystem` [with the required IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2).
+
+For `NativeS3FileSystem`, you need to configure both `fs.s3.awsAccessKeyId` and `fs.s3.awsSecretAccessKey` in Hadoop's `core-site.xml`:
 
 ```xml
 <property>