Posted to dev@flume.apache.org by "Yongkun Wang (JIRA)" <ji...@apache.org> on 2012/07/23 10:37:36 UTC

[jira] [Created] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Yongkun Wang created FLUME-1391:
-----------------------------------

             Summary: Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
                 Key: FLUME-1391
                 URL: https://issues.apache.org/jira/browse/FLUME-1391
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.3.0
            Reporter: Yongkun Wang


In the HDFS sink, HDFSSequenceFile calls syncFs(). However, syncFs() is not available in legacy Hadoop 0.20.2, which may still be a widely used version, whereas sync() is available in all Hadoop versions. Moreover, in Hadoop's SequenceFile, syncFs() is itself implemented by calling sync() on the underlying output stream (out.sync()):
{code}
    /** create a sync point */
    public void sync() throws IOException {
      if (sync != null && lastSyncPos != out.getPos()) {
        out.writeInt(SYNC_ESCAPE);                // mark the start of the sync
        out.write(sync);                          // write sync
        lastSyncPos = out.getPos();               // update lastSyncPos
      }
    }

    /** flush all currently written data to the file system */
    public void syncFs() throws IOException {
      if (out != null) {
        out.sync();                               // flush contents to file system
      }
    }
{code}

Therefore, using sync() in HDFSSequenceFile may be better.
{code}
  @Override
  public void sync() throws IOException {
    //writer.syncFs(); //for hadoop 0.20.205.0+
    writer.sync(); //support hadoop 0.20.2+
  }
{code}
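To make the difference between the two quoted SequenceFile methods concrete, here is a toy in-memory model. It is a sketch, not Hadoop code: the class name SyncModel, the fixed marker bytes, the escape value and the durableBytes counter are all assumptions made for illustration. It follows the quoted logic. sync() appends an escape int plus a 16-byte marker, but only when the position has moved since the last marker. syncFs() is the step that hands data to the file system.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Toy model (hypothetical, not Hadoop code) of SequenceFile.Writer's
// sync() vs. syncFs(). An in-memory buffer stands in for the HDFS stream.
final class SyncModel {
  // Stand-in for the escape int written before each sync marker.
  private static final int SYNC_ESCAPE = -1;
  // Stand-in for the 16-byte per-file sync marker (real files use a random one).
  private final byte[] sync = new byte[16];

  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buffer);
  private long lastSyncPos = 0;   // position right after the last marker
  private long durableBytes = 0;  // bytes the simulated file system has persisted

  SyncModel() {
    Arrays.fill(sync, (byte) 0x5A);
  }

  void append(byte[] record) throws IOException {
    out.write(record);
  }

  // Like Writer.sync(): inserts a marker (at most once per position) but
  // does not push anything to the file system.
  void sync() throws IOException {
    if (lastSyncPos != out.size()) {
      out.writeInt(SYNC_ESCAPE); // 4 bytes
      out.write(sync);           // 16 bytes
      lastSyncPos = out.size();
    }
  }

  // Like Writer.syncFs(): delegates to the stream, simulated here by marking
  // everything written so far as persisted.
  void syncFs() throws IOException {
    out.flush();
    durableBytes = out.size();
  }

  long writtenBytes() { return out.size(); }
  long durableBytes() { return durableBytes; }
}
```

Under this model, replacing syncFs() with Writer.sync() still produces marker-delimited records, but nothing is flushed. This is the durability concern Mike raises further down in this thread.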

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Yongkun Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongkun Wang updated FLUME-1391:
--------------------------------

    Issue Type: Bug  (was: Improvement)
    
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426037#comment-13426037 ] 

Jarek Jarcec Cecho commented on FLUME-1391:
-------------------------------------------

I've checked the source code of Hadoop 0.20, 1.0.3 and 2.0.0. syncFs() basically just delegates to sync(), which is why I assumed the change was fine and committed it.

Jarcec
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426181#comment-13426181 ] 

Jarek Jarcec Cecho commented on FLUME-1391:
-------------------------------------------

I agree with you, Mike. Flume can't be reliable on Hadoop 0.20.2 without a working sync. I would "sell it" this way: Flume is not supported on Hadoop 0.20.2, but it can run there without any reliability guarantees.

My reasoning is that if a user understands the sync issue and still wants to use Hadoop 0.20.2, that's his responsibility. But we shouldn't make it needlessly hard for him to get it running.

Jarcec
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Assigned] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Hari Shreedharan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan reassigned FLUME-1391:
---------------------------------------

    Assignee: Yongkun Wang

Added Yongkun to the contributors list and assigned the issue to him.
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425977#comment-13425977 ] 

Mike Percy commented on FLUME-1391:
-----------------------------------

So, does sync() in the latest versions call hsync()? Have we verified that this change is actually durable?

Note that Hadoop 0.20.2 does not support hsync, so writes there are not durable; therefore Flume NG cannot really support that version.
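The patch hard-codes writer.sync(). One way to keep a single Flume build working across Hadoop versions is to look up the sync method at runtime. Compiled against a newer Hadoop and run on 0.20.2, a direct writer.syncFs() call would otherwise fail with NoSuchMethodError. The sketch below is hypothetical: SyncSelector and invokeFirstAvailable are illustrative names, not Flume or Hadoop APIs. The preference order "hsync", "syncFs", "sync" is likewise an assumption. Whether a given Hadoop release exposes each of these methods on SequenceFile.Writer should be checked against that release.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper (not part of Flume or Hadoop): calls the first public
// no-argument method from a preference list that the target actually has.
final class SyncSelector {
  private SyncSelector() {}

  static String invokeFirstAvailable(Object target, String... candidates)
      throws IOException {
    for (String name : candidates) {
      Method method;
      try {
        method = target.getClass().getMethod(name);
      } catch (NoSuchMethodException e) {
        continue; // this version lacks the method; try the next one
      }
      try {
        method.invoke(target);
        return name; // report which method was actually used
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("cannot call " + name, e);
      } catch (InvocationTargetException e) {
        // Unwrap so callers see the IOException the sync call itself threw.
        Throwable cause = e.getCause();
        if (cause instanceof IOException) {
          throw (IOException) cause;
        }
        throw new IllegalStateException(name + " failed", cause);
      }
    }
    throw new UnsupportedOperationException(
        "none of these methods exist: " + String.join(", ", candidates));
  }
}
```

A call would look like SyncSelector.invokeFirstAvailable(writer, "hsync", "syncFs", "sync"). This only avoids the linkage error. On 0.20.2 it still falls back to sync(), which, as noted above, does not make the data durable.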
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426152#comment-13426152 ] 

Mike Percy commented on FLUME-1391:
-----------------------------------

Thanks for checking Jarcec, just wanted to make sure.

Still, because Hadoop 0.20.2 does not support a durable sync() operation in the way that Hadoop 1.x does, that platform cannot be fully supported by Flume with its current design assumptions. As far as I know, Flume cannot be reliable on Hadoop 0.20.2.

Regards,
Mike
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Updated] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Yongkun Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongkun Wang updated FLUME-1391:
--------------------------------

        Fix Version/s: v1.3.0
    Affects Version/s:     (was: v1.3.0)
                       v1.1.0
    
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>

[jira] [Updated] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Yongkun Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongkun Wang updated FLUME-1391:
--------------------------------

    Attachment: HDFSSink-for-hadoop-0.20.2.patch
    
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425869#comment-13425869 ] 

Jarek Jarcec Cecho commented on FLUME-1391:
-------------------------------------------

Committed and pushed to our new Git repository. I'll close this ticket once I'm able to assign it to Yongkun.

Jarcec
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>

[jira] [Commented] (FLUME-1391) Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2

Posted by "Yongkun Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426372#comment-13426372 ] 

Yongkun Wang commented on FLUME-1391:
-------------------------------------

Hi guys, thanks for the reviews and comments. 

I tested this patch with Flume release 1.2.0. It works well with Hadoop 0.20.205.0 and with the latest Hadoop release, 1.0.3.

However, this patch alone is not enough to make Flume 1.2.0 work with Hadoop 0.20.2. The HDFS sink cannot be started because Hadoop's security code changed substantially between 0.20.2 and 0.20.205.0+, so more patches are needed.
I will keep those patches internal for our current Hadoop cluster, and wait for our Hadoop upgrade before moving to the latest Flume.
                
> Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-1391
>                 URL: https://issues.apache.org/jira/browse/FLUME-1391
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Yongkun Wang
>            Assignee: Yongkun Wang
>              Labels: hadoop
>             Fix For: v1.3.0
>
>         Attachments: HDFSSink-for-hadoop-0.20.2.patch
>