Posted to common-issues@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/10/14 19:20:31 UTC

[jira] Created: (HADOOP-6313) Expose flush APIs to application users

Expose flush APIs to application users
--------------------------------------

                 Key: HADOOP-6313
                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.21.0


Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users; a brief usage sketch follows the list below.
1. Three flush APIs
* API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
* API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
* API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).

2. Support flush APIs in FS
* FSDataOutputStream#flush supports API1.
* FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
{noformat}
  public interface Syncable {
    public void hflush() throws IOException;  // support API2
    public void hsync() throws IOException;   // support API3
  }
{noformat}
* In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().
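To make the proposal concrete from an application user's point of view, here is a minimal usage sketch. It assumes only what is proposed above (hflush()/hsync() exposed on FSDataOutputStream); the file path and the omission of error handling are illustrative, not part of the proposal.
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Illustrative path; any writable path would do.
    FSDataOutputStream out = fs.create(new Path("/tmp/hflush-demo.txt"));
    try {
      out.writeBytes("record 1\n");
      out.hflush();   // API2: per the proposal, new readers see "record 1" once this returns
      out.writeBytes("record 2\n");
      out.hsync();    // API3: replicas have done the fsync equivalent
    } finally {
      out.close();
    }
  }
}
{noformat}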

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Attachment:     (was: hflushCommon.patch)

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Attachment: hflushCommon2.patch

Thanks, Sanjay and Stack, for your reviews. Here is a patch that adds a @Deprecated annotation to FSDataOutputStream#sync() to remove the javac warning.
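For illustration only, here is a minimal sketch of what the annotated method could look like, under the assumption that the deprecated sync() is kept as a thin alias that forwards to hflush(); the actual patch may wire it differently.
{noformat}
  /**
   * @deprecated As of 0.21, use {@link #hflush()} instead.
   */
  @Deprecated
  public void sync() throws IOException {
    hflush();  // assumption: the old method simply forwards to the new API2 call
  }
{noformat}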

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Attachment: hflushCommon2.patch

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Attachment: hflushCommon1.patch

This patch removes the Syncable implementation in RawLocalFileSystem and makes the default implementations of hflush() and hsync() in FSDataOutputStream fall back to flush(), as sketched below.
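A minimal sketch of that default behavior, assuming an FSDataOutputStream-like wrapper around the file system's own stream. Syncable here is the two-method interface from the issue description; the class and field names are illustrative, not copied from the patch.
{noformat}
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: hflush()/hsync() delegate to the wrapped stream when it is
// Syncable and otherwise fall back to an ordinary flush().
class SyncableWrapperSketch extends OutputStream implements Syncable {
  private final OutputStream wrappedStream;  // the file system's own stream

  SyncableWrapperSketch(OutputStream wrapped) {
    this.wrappedStream = wrapped;
  }

  @Override
  public void write(int b) throws IOException {
    wrappedStream.write(b);
  }

  @Override
  public void hflush() throws IOException {
    if (wrappedStream instanceof Syncable) {
      ((Syncable) wrappedStream).hflush();
    } else {
      wrappedStream.flush();   // default: fall back to flush()
    }
  }

  @Override
  public void hsync() throws IOException {
    if (wrappedStream instanceof Syncable) {
      ((Syncable) wrappedStream).hsync();
    } else {
      wrappedStream.flush();   // default: fall back to flush()
    }
  }
}
{noformat}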

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771650#action_12771650 ] 

Hairong Kuang commented on HADOOP-6313:
---------------------------------------

Here are the new ant test-patch results:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772042#action_12772042 ] 

Hudson commented on HADOOP-6313:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #76 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/76/])
    HADOOP-6313. Implement the Syncable interface in FSDataOutputStream to expose flush APIs to application users. Contributed by Hairong Kuang.


> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771205#action_12771205 ] 

Sanjay Radia commented on HADOOP-6313:
--------------------------------------

+1

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6313:
----------------------------------

    Attachment: hflushCommon.patch

This patch
1. defines the Syncable interface;
2. makes FSDataOutputStream implement the Syncable interface;
3. makes LocalFSFileOutputStream of RawLocalFileSystem implement the Syncable interface and also makes it a BufferedOutputStream (a rough sketch follows below);
4. implements a unit test covering items 2 and 3.
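Here is that rough sketch of item 3, under simplifying assumptions: it extends FileOutputStream directly instead of reproducing the LocalFSFileOutputStream/BufferedOutputStream arrangement in the actual patch, maps hflush() onto flush(), and maps hsync() onto FileDescriptor.sync(). Syncable is the two-method interface from the issue description; the class name is illustrative.
{noformat}
import java.io.FileDescriptor;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative only: a local output stream that maps the proposed
// hflush/hsync calls onto FileOutputStream primitives.
class LocalSyncableStream extends FileOutputStream implements Syncable {
  LocalSyncableStream(String path) throws FileNotFoundException {
    super(path);
  }

  @Override
  public void hflush() throws IOException {
    flush();          // API2 degenerates to flush() for an unbuffered local file
  }

  @Override
  public void hsync() throws IOException {
    flush();
    getFD().sync();   // API3: ask the OS to push the data to the disk device
  }
}
{noformat}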

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770225#action_12770225 ] 

Hairong Kuang commented on HADOOP-6313:
---------------------------------------

> 3. makes LocalFSFileOutputStream of RawLocalFileSystem implement the Syncable interface and also makes it a BufferedOutputStream
The reason I made this change is that LocalFSFileOutputStream of RawLocalFileSystem already implements Syncable in trunk, although with a bug; my patch fixes that bug. Another option is to remove the Syncable implementation from RawLocalFileSystem, since I would guess most users use LocalFileSystem anyway.

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771235#action_12771235 ] 

Hadoop QA commented on HADOOP-6313:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423266/hflushCommon1.patch
  against trunk revision 829289.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 174 javac compiler warnings (more than the trunk's current 172 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/110/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/110/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/110/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/110/console

This message is automatically generated.

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-6313:
------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I committed the patch to both trunk and the 0.21 branch. Thank you, Hairong.

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6313) Expose flush APIs to application users

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-6313:
------------------------------------

    Affects Version/s: 0.22.0
                       0.21.0
        Fix Version/s: 0.22.0
         Release Note: FSDataOutputStream implements the Syncable interface to provide hflush and hsync APIs to application users.

> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and HBase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, API2, and API3); the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes data out of the client's address space into the sockets to the data nodes. When the call returns, there is no guarantee that the data has left the client node, and no guarantee that it has reached a DN. New readers will eventually see this data if there are no failures.
> * API2: flushes data out to all replicas of the block. The data is in the DNs' buffers but not necessarily in the DNs' OS buffers. New readers will see the data after the call has returned.
> * API3: flushes data out to all replicas, and all replicas have done the POSIX fsync equivalent, i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1.
> * FSDataOutputStream implements the Syncable interface defined below. If its wrapped output stream (i.e., each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call the wrapped stream's hflush() and hsync().
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772235#action_12772235 ] 

Hudson commented on HADOOP-6313:
--------------------------------

Integrated in Hadoop-Common-trunk #144 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/144/])
    HADOOP-6313. Implement the Syncable interface in FSDataOutputStream to expose flush APIs to application users. Contributed by Hairong Kuang.


> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hflushCommon1.patch, hflushCommon2.patch, hflushCommon2.patch
>
>
> Earlier this year, Yahoo, Facebook, and Hbase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, 2, and 3) and the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes out from the address space of client into the socket to the data nodes.   On the return of the call there is no guarantee that that data is out of the underlying node and no guarantee of having reached a DN.  New readers will eventually see this data if there are no failures.
> * API2: flushes out to all replicas of the block. The data is in the buffers of the DNs but not on the DN's OS buffers.  New readers will see the data after the call has returned. 
> * API3: flushes out to all replicas and all replicas have done posix fsync equivalent - ie the OS has flushed it to the disk device (but the disk may have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1
> * FSDataOutputStream implements Syncable interface defined below. If its wrapped output stream (i.e. each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call its wrapped output stream's hflush & hsync.
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6313) Expose flush APIs to application users

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771285#action_12771285 ] 

stack commented on HADOOP-6313:
-------------------------------

+1 on patch.


> Expose flush APIs to application users
> --------------------------------------
>
>                 Key: HADOOP-6313
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6313
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hflushCommon.patch, hflushCommon1.patch
>
>
> Earlier this year, Yahoo, Facebook, and Hbase developers had a roundtable discussion where we agreed to support three types of flush in HDFS (API1, 2, and 3) and the append project aims to implement API2. Here is a proposal to expose these APIs to application users.
> 1. Three flush APIs
> * API1: flushes out from the address space of client into the socket to the data nodes.   On the return of the call there is no guarantee that that data is out of the underlying node and no guarantee of having reached a DN.  New readers will eventually see this data if there are no failures.
> * API2: flushes out to all replicas of the block. The data is in the buffers of the DNs but not on the DN's OS buffers.  New readers will see the data after the call has returned. 
> * API3: flushes out to all replicas and all replicas have done posix fsync equivalent - ie the OS has flushed it to the disk device (but the disk may have it in its cache).
> 2. Support flush APIs in FS
> * FSDataOutputStream#flush supports API1
> * FSDataOutputStream implements Syncable interface defined below. If its wrapped output stream (i.e. each file system's stream) is Syncable, FSDataOutputStream#hflush() and hsync() call its wrapped output stream's hflush & hsync.
> {noformat}
>   public interface Syncable {
>     public void hflush() throws IOException;  // support API2
>     public void hsync() throws IOException;   // support API3
>   }
> {noformat}
> * In each file system, if only hflush() is implemented, hsync() by default calls hflush().  If only hsync() is implemented, hflush() by default calls flush().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.