You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by st...@apache.org on 2014/11/26 12:44:32 UTC

[1/2] hadoop git commit: HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading in concurrent context. (stack via stevel)

Repository: hadoop
Updated Branches:
  refs/heads/branch-2 0c8a8ba86 -> 6a01497d7
  refs/heads/trunk 4a3161182 -> aa7dac335


HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading  in concurrent context. (stack via stevel)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/6a01497d
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/6a01497d
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/6a01497d

Branch: refs/heads/branch-2
Commit: 6a01497d7c410f6e6ecc094c3dddc9d68e184585
Parents: 0c8a8ba
Author: Steve Loughran <st...@apache.org>
Authored: Wed Nov 26 11:43:46 2014 +0000
Committer: Steve Loughran <st...@apache.org>
Committed: Wed Nov 26 11:43:46 2014 +0000

----------------------------------------------------------------------
 .../markdown/filesystem/fsdatainputstream.md    | 62 +++++++++++++++-----
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt     |  3 +
 2 files changed, 49 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/6a01497d/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
index 0b0469b..4b4318e 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@@ -31,8 +31,8 @@ with extensions that add key assumptions to the system.
 reads starting at this offset.
 1. The cost of forward and backward seeks is low.
 1. There is no requirement for the stream implementation to be thread-safe.
- Callers MUST assume that instances are not thread-safe.
-
+1. BUT, if a stream implements [PositionedReadable](#PositionedReadable),
+ "positioned reads" MUST be thread-safe.
 
 Files are opened via `FileSystem.open(p)`, which, if successful, returns:
 
@@ -84,7 +84,7 @@ unexpectedly.
     FSDIS' = ((undefined), (undefined), False)
 
 
-### `Seekable.getPos()`
+### <a name="Seekable.getPos"></a>`Seekable.getPos()`
 
 Return the current position. The outcome when a stream is closed is undefined.
 
@@ -97,7 +97,7 @@ Return the current position. The outcome when a stream is closed is undefined.
     result = pos(FSDIS)
 
 
-### `InputStream.read()`
+### <a name="InputStream.read"></a> `InputStream.read()`
 
 Return the data at the current position.
 
@@ -117,7 +117,7 @@ Return the data at the current position.
         result = -1
 
 
-### `InputStream.read(buffer[], offset, length)`
+### <a name="InputStream.read.buffer[]"></a> `InputStream.read(buffer[], offset, length)`
 
 Read `length` bytes of data into the destination buffer, starting at offset
 `offset`
@@ -151,7 +151,7 @@ Exceptions that may be raised on precondition failure are
           FSDIS' = (pos+l, data, true)
           result = l
 
-### `Seekable.seek(s)`
+### <a name="Seekable.seek"></a>`Seekable.seek(s)`
 
 
 #### Preconditions
@@ -237,14 +237,47 @@ class, which can react to a checksum error in a read by attempting to source
 the data elsewhere. It a new source can be found it attempts to reread and
 recheck that portion of the file.
 
-## interface `PositionedReadable`
+## <a name="PositionedReadable"></a> interface `PositionedReadable`
+
+The `PositionedReadable` operations supply "positioned reads" ("pread").
+They provide the ability to read data into a buffer from a specific
+position in the data stream. Positioned reads equate to a
+[`Seekable.seek`](#Seekable.seek) at a particular offset followed by a
+[`InputStream.read(buffer[], offset, length)`](#InputStream.read.buffer[]),
+only there is a single method invocation, rather than `seek` then
+`read`, and two positioned reads can *optionally* run concurrently
+over a single instance of a `FSDataInputStream` stream.
+
+The interface declares positioned reads thread-safe (some of the
+implementations do not follow this guarantee).
+
+Any positional read run concurrent with a stream operation &mdash; e.g.
+[`Seekable.seek`](#Seekable.seek), [`Seekable.getPos()`](#Seekable.getPos),
+and [`InputStream.read()`](#InputStream.read) &mdash; MUST run in
+isolation; there must not be  mutual interference.
+
+Concurrent positional reads and stream operations MUST be serializable;
+one may block the other so they run in series but, for better throughput
+and 'liveness', they SHOULD run concurrently.
 
-The `PositionedReadable` operations provide the ability to
-read data into a buffer from a specific position in
-the data stream.
+Given two parallel positional reads, one at `pos1` for `len1` into buffer
+`dest1`, and another at `pos2` for `len2` into buffer `dest2`, AND given
+a concurrent, stream read run after a seek to `pos3`, the resultant
+buffers MUST be filled as follows, even if the reads happen to overlap
+on the underlying stream:
 
-Although the interface declares that it must be thread safe,
-some of the implementations do not follow this guarantee.
+    // Positioned read #1
+    read(pos1, dest1, ... len1) -> dest1[0..len1 - 1] =
+      [data(FS, path, pos1), data(FS, path, pos1 + 1) ... data(FS, path, pos1 + len1 - 1]
+
+    // Positioned read #2
+    read(pos2, dest2, ... len2) -> dest2[0..len2 - 1] =
+      [data(FS, path, pos2), data(FS, path, pos2 + 1) ... data(FS, path, pos2 + len2 - 1]
+
+    // Stream read
+    seek(pos3);
+    read(dest3, ... len3) -> dest3[0..len3 - 1] =
+      [data(FS, path, pos3), data(FS, path, pos3 + 1) ... data(FS, path, pos3 + len3 - 1]
 
 #### Implementation preconditions
 
@@ -265,9 +298,6 @@ of `pos` is unchanged at the end of the operation
     pos(FSDIS') == pos(FSDIS)
 
 
-There are no guarantees that this holds *during* the operation.
-
-
 #### Failure states
 
 For any operations that fail, the contents of the destination
@@ -326,7 +356,7 @@ The semantics of this are exactly equivalent to
 are expected to receive access to the data of `FS.Files[p]` at the time of opening.
 * If the underlying data is changed during the read process, these changes MAY or
 MAY NOT be visible.
-* Such changes are visible MAY be partially visible.
+* Such changes that are visible MAY be partially visible.
 
 
 At time t0

http://git-wip-us.apache.org/repos/asf/hadoop/blob/6a01497d/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 85c855b..e01a560 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -141,6 +141,9 @@ Release 2.7.0 - UNRELEASED
     HDFS-7440. Consolidate snapshot related operations in a single class.
     (wheat9)
 
+    HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading
+    in concurrent context. (stack via stevel)
+
   OPTIMIZATIONS
 
   BUG FIXES


[2/2] hadoop git commit: HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading in concurrent context. (stack via stevel)

Posted by st...@apache.org.
HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading  in concurrent context. (stack via stevel)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/aa7dac33
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/aa7dac33
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/aa7dac33

Branch: refs/heads/trunk
Commit: aa7dac335960950d2254a5a78bd1f0786a290538
Parents: 4a31611
Author: Steve Loughran <st...@apache.org>
Authored: Wed Nov 26 11:43:46 2014 +0000
Committer: Steve Loughran <st...@apache.org>
Committed: Wed Nov 26 11:44:03 2014 +0000

----------------------------------------------------------------------
 .../markdown/filesystem/fsdatainputstream.md    | 62 +++++++++++++++-----
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt     |  3 +
 2 files changed, 49 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/aa7dac33/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
index 0b0469b..4b4318e 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@@ -31,8 +31,8 @@ with extensions that add key assumptions to the system.
 reads starting at this offset.
 1. The cost of forward and backward seeks is low.
 1. There is no requirement for the stream implementation to be thread-safe.
- Callers MUST assume that instances are not thread-safe.
-
+1. BUT, if a stream implements [PositionedReadable](#PositionedReadable),
+ "positioned reads" MUST be thread-safe.
 
 Files are opened via `FileSystem.open(p)`, which, if successful, returns:
 
@@ -84,7 +84,7 @@ unexpectedly.
     FSDIS' = ((undefined), (undefined), False)
 
 
-### `Seekable.getPos()`
+### <a name="Seekable.getPos"></a>`Seekable.getPos()`
 
 Return the current position. The outcome when a stream is closed is undefined.
 
@@ -97,7 +97,7 @@ Return the current position. The outcome when a stream is closed is undefined.
     result = pos(FSDIS)
 
 
-### `InputStream.read()`
+### <a name="InputStream.read"></a> `InputStream.read()`
 
 Return the data at the current position.
 
@@ -117,7 +117,7 @@ Return the data at the current position.
         result = -1
 
 
-### `InputStream.read(buffer[], offset, length)`
+### <a name="InputStream.read.buffer[]"></a> `InputStream.read(buffer[], offset, length)`
 
 Read `length` bytes of data into the destination buffer, starting at offset
 `offset`
@@ -151,7 +151,7 @@ Exceptions that may be raised on precondition failure are
           FSDIS' = (pos+l, data, true)
           result = l
 
-### `Seekable.seek(s)`
+### <a name="Seekable.seek"></a>`Seekable.seek(s)`
 
 
 #### Preconditions
@@ -237,14 +237,47 @@ class, which can react to a checksum error in a read by attempting to source
 the data elsewhere. It a new source can be found it attempts to reread and
 recheck that portion of the file.
 
-## interface `PositionedReadable`
+## <a name="PositionedReadable"></a> interface `PositionedReadable`
+
+The `PositionedReadable` operations supply "positioned reads" ("pread").
+They provide the ability to read data into a buffer from a specific
+position in the data stream. Positioned reads equate to a
+[`Seekable.seek`](#Seekable.seek) at a particular offset followed by a
+[`InputStream.read(buffer[], offset, length)`](#InputStream.read.buffer[]),
+only there is a single method invocation, rather than `seek` then
+`read`, and two positioned reads can *optionally* run concurrently
+over a single instance of a `FSDataInputStream` stream.
+
+The interface declares positioned reads thread-safe (some of the
+implementations do not follow this guarantee).
+
+Any positional read run concurrent with a stream operation &mdash; e.g.
+[`Seekable.seek`](#Seekable.seek), [`Seekable.getPos()`](#Seekable.getPos),
+and [`InputStream.read()`](#InputStream.read) &mdash; MUST run in
+isolation; there must not be  mutual interference.
+
+Concurrent positional reads and stream operations MUST be serializable;
+one may block the other so they run in series but, for better throughput
+and 'liveness', they SHOULD run concurrently.
 
-The `PositionedReadable` operations provide the ability to
-read data into a buffer from a specific position in
-the data stream.
+Given two parallel positional reads, one at `pos1` for `len1` into buffer
+`dest1`, and another at `pos2` for `len2` into buffer `dest2`, AND given
+a concurrent, stream read run after a seek to `pos3`, the resultant
+buffers MUST be filled as follows, even if the reads happen to overlap
+on the underlying stream:
 
-Although the interface declares that it must be thread safe,
-some of the implementations do not follow this guarantee.
+    // Positioned read #1
+    read(pos1, dest1, ... len1) -> dest1[0..len1 - 1] =
+      [data(FS, path, pos1), data(FS, path, pos1 + 1) ... data(FS, path, pos1 + len1 - 1]
+
+    // Positioned read #2
+    read(pos2, dest2, ... len2) -> dest2[0..len2 - 1] =
+      [data(FS, path, pos2), data(FS, path, pos2 + 1) ... data(FS, path, pos2 + len2 - 1]
+
+    // Stream read
+    seek(pos3);
+    read(dest3, ... len3) -> dest3[0..len3 - 1] =
+      [data(FS, path, pos3), data(FS, path, pos3 + 1) ... data(FS, path, pos3 + len3 - 1]
 
 #### Implementation preconditions
 
@@ -265,9 +298,6 @@ of `pos` is unchanged at the end of the operation
     pos(FSDIS') == pos(FSDIS)
 
 
-There are no guarantees that this holds *during* the operation.
-
-
 #### Failure states
 
 For any operations that fail, the contents of the destination
@@ -326,7 +356,7 @@ The semantics of this are exactly equivalent to
 are expected to receive access to the data of `FS.Files[p]` at the time of opening.
 * If the underlying data is changed during the read process, these changes MAY or
 MAY NOT be visible.
-* Such changes are visible MAY be partially visible.
+* Such changes that are visible MAY be partially visible.
 
 
 At time t0

http://git-wip-us.apache.org/repos/asf/hadoop/blob/aa7dac33/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 8cc08e2..89e67eb 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -398,6 +398,9 @@ Release 2.7.0 - UNRELEASED
     HDFS-7440. Consolidate snapshot related operations in a single class.
     (wheat9)
 
+    HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading
+    in concurrent context. (stack via stevel)
+
   OPTIMIZATIONS
 
   BUG FIXES