Posted to commits@flink.apache.org by ch...@apache.org on 2018/07/09 10:36:44 UTC
flink git commit: [FLINK-9772][docs] Update Hadoop compatibility docs regarding HadoopInputs

Repository: flink
Updated Branches:
  refs/heads/master 8655d6db5 -> b5b4fb9bc


[FLINK-9772][docs] Update Hadoop compatibility docs regarding HadoopInputs

This closes #6278.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/b5b4fb9b
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/b5b4fb9b
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/b5b4fb9b

Branch: refs/heads/master
Commit: b5b4fb9bc8ab71953135674d099c10d5bfa4be9b
Parents: 8655d6d
Author: yanghua <ya...@gmail.com>
Authored: Sat Jul 7 11:49:21 2018 +0800
Committer: zentol <ch...@apache.org>
Committed: Mon Jul 9 12:35:42 2018 +0200

----------------------------------------------------------------------
 docs/dev/batch/hadoop_compatibility.md | 15 ++++++++++-----
 docs/dev/batch/index.md                | 15 ---------------
 2 files changed, 10 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/b5b4fb9b/docs/dev/batch/hadoop_compatibility.md
----------------------------------------------------------------------
diff --git a/docs/dev/batch/hadoop_compatibility.md b/docs/dev/batch/hadoop_compatibility.md
index 4eba2a8..c665b5d 100644
--- a/docs/dev/batch/hadoop_compatibility.md
+++ b/docs/dev/batch/hadoop_compatibility.md
@@ -73,11 +73,14 @@ if you only want to use your Hadoop data types. See the
 
 ### Using Hadoop InputFormats
 
-Hadoop input formats can be used to create a data source by using
-one of the methods `readHadoopFile` or `createHadoopInput` of the
-`ExecutionEnvironment`. The former is used for input formats derived
+To use Hadoop `InputFormats` with Flink the format must first be wrapped
+using either `readHadoopFile` or `createHadoopInput` of the
+`HadoopInputs` utility class.
+The former is used for input formats derived
 from `FileInputFormat` while the latter has to be used for general purpose
 input formats.
+The resulting `InputFormat` can be used to create a data source by using
+`ExecutionEnvironment#createInput`.
 
 The resulting `DataSet` contains 2-tuples where the first field
 is the key and the second field is the value retrieved from the Hadoop
@@ -92,7 +95,8 @@ The following example shows how to use Hadoop's `TextInputFormat`.
 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 DataSet<Tuple2<LongWritable, Text>> input =
-    env.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, textPath);
+    env.createInput(HadoopInputs.readHadoopFile(new TextInputFormat(),
+                        LongWritable.class, Text.class, textPath));
 
 // Do something with the data.
 [...]
@@ -105,7 +109,8 @@ DataSet<Tuple2<LongWritable, Text>> input =
 val env = ExecutionEnvironment.getExecutionEnvironment
 
 val input: DataSet[(LongWritable, Text)] =
-  env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)
+  env.createInput(HadoopInputs.readHadoopFile(
+                    new TextInputFormat, classOf[LongWritable], classOf[Text], textPath))
 
 // Do something with the data.
 [...]

http://git-wip-us.apache.org/repos/asf/flink/blob/b5b4fb9b/docs/dev/batch/index.md
----------------------------------------------------------------------
diff --git a/docs/dev/batch/index.md b/docs/dev/batch/index.md
index 24b3390..c624fce 100644
--- a/docs/dev/batch/index.md
+++ b/docs/dev/batch/index.md
@@ -824,9 +824,6 @@ File-based:
 - `readFileOfPrimitives(path, delimiter, Class)` / `PrimitiveInputFormat` - Parses files of new-line (or another char sequence)
    delimited primitive data types such as `String` or `Integer` using the given delimiter.
 
-- `readHadoopFile(FileInputFormat, Key, Value, path)` / `FileInputFormat` - Creates a JobConf and reads file from the specified
-   path with the specified FileInputFormat, Key class and Value class and returns them as Tuple2<Key, Value>.
-
 - `readSequenceFile(Key, Value, path)` / `SequenceFileInputFormat` - Creates a JobConf and reads file from the specified path with
    type SequenceFileInputFormat, Key class and Value class and returns them as Tuple2<Key, Value>.
 
@@ -878,11 +875,6 @@ DataSet<Tuple2<String, Double>> csvInput = env.readCsvFile("hdfs:///the/CSV/file
 DataSet<Person> csvInput = env.readCsvFile("hdfs:///the/CSV/file")
                          .pojoType(Person.class, "name", "age", "zipcode");
 
-
-// read a file from the specified path of type TextInputFormat
-DataSet<Tuple2<LongWritable, Text>> tuples =
- env.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, "hdfs://nnHost:nnPort/path/to/file");
-
 // read a file from the specified path of type SequenceFileInputFormat
 DataSet<Tuple2<IntWritable, Text>> tuples =
  env.readSequenceFile(IntWritable.class, Text.class, "hdfs://nnHost:nnPort/path/to/file");
@@ -974,9 +966,6 @@ File-based:
 - `readFileOfPrimitives(path, delimiter)` / `PrimitiveInputFormat` - Parses files of new-line (or another char sequence)
   delimited primitive data types such as `String` or `Integer` using the given delimiter.
 
-- `readHadoopFile(FileInputFormat, Key, Value, path)` / `FileInputFormat` - Creates a JobConf and reads file from the specified
-   path with the specified FileInputFormat, Key class and Value class and returns them as Tuple2<Key, Value>.
-
 - `readSequenceFile(Key, Value, path)` / `SequenceFileInputFormat` - Creates a JobConf and reads file from the specified path with
    type SequenceFileInputFormat, Key class and Value class and returns them as Tuple2<Key, Value>.
 
@@ -1039,10 +1028,6 @@ val values = env.fromElements("Foo", "bar", "foobar", "fubar")
 // generate a number sequence
 val numbers = env.generateSequence(1, 10000000)
 
-// read a file from the specified path of type TextInputFormat
-val tuples = env.readHadoopFile(new TextInputFormat, classOf[LongWritable],
- classOf[Text], "hdfs://nnHost:nnPort/path/to/file")
-
 // read a file from the specified path of type SequenceFileInputFormat
 val tuples = env.readSequenceFile(classOf[IntWritable], classOf[Text],
  "hdfs://nnHost:nnPort/path/to/file")