Posted to common-issues@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/12/17 00:59:18 UTC

[jira] Created: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block
----------------------------------------------------------------------------------------------

                 Key: HADOOP-6450
                 URL: https://issues.apache.org/jira/browse/HADOOP-6450
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The current HDFS implementation has a limitation: it does not replicate the last partial block of a file that is being written until the file is closed. Some long-running applications (e.g. HBase) write transaction logs into HDFS. If datanodes in the write pipeline die, the application has no knowledge of it until all of the datanodes have failed and the application gets an IO error.

These applications would benefit a lot if they could determine the number of live replicas of the block they are currently writing. For example, an application could decide that when one of the datanodes in the write pipeline fails, it will close the file and start writing to a new file.
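
As a rough illustration of the close-and-roll policy described above, here is a minimal Java sketch. It does not use Hadoop classes and is not the API from the attached patch: ReplicaAwareStream, StreamFactory, RollingLogWriter and FakeStream are hypothetical names invented for this example, and the liveReplicas() query is only an assumption about what such a call might look like. The fake stream simulates a pipeline that loses one datanode after a fixed number of writes.

```java
// Hypothetical stand-in for an output stream that can report how many
// replicas of the block being written are still alive. Not a Hadoop type.
interface ReplicaAwareStream {
    void write(String record);
    int liveReplicas(); // assumed query, for illustration only
    void close();
}

// Hypothetical factory that opens the log file with the given index.
interface StreamFactory {
    ReplicaAwareStream open(int fileIndex);
}

// Appends records and rolls to a new file as soon as the current
// stream reports fewer live replicas than expected.
final class RollingLogWriter {
    private final StreamFactory factory;
    private final int expectedReplicas;
    private ReplicaAwareStream current;
    private int fileIndex = 0;

    RollingLogWriter(StreamFactory factory, int expectedReplicas) {
        this.factory = factory;
        this.expectedReplicas = expectedReplicas;
        this.current = factory.open(fileIndex);
    }

    void append(String record) {
        current.write(record);
        if (current.liveReplicas() < expectedReplicas) {
            roll();
        }
    }

    // Close the degraded file and continue in a fresh one.
    private void roll() {
        current.close();
        fileIndex++;
        current = factory.open(fileIndex);
    }

    int filesOpened() {
        return fileIndex + 1;
    }
}

// In-memory fake: reports one replica fewer once failAfterWrites
// records have been written. A negative value means it never fails.
final class FakeStream implements ReplicaAwareStream {
    private final int initialReplicas;
    private final int failAfterWrites;
    private int writes = 0;
    private boolean closed = false;

    FakeStream(int initialReplicas, int failAfterWrites) {
        this.initialReplicas = initialReplicas;
        this.failAfterWrites = failAfterWrites;
    }

    public void write(String record) {
        if (closed) {
            throw new IllegalStateException("stream closed");
        }
        writes++;
    }

    public int liveReplicas() {
        boolean degraded = failAfterWrites >= 0 && writes >= failAfterWrites;
        return degraded ? initialReplicas - 1 : initialReplicas;
    }

    public void close() {
        closed = true;
    }
}

final class LogRollSketch {
    public static void main(String[] args) {
        // The first file loses a datanode after its second write;
        // every later file stays healthy.
        StreamFactory factory = index -> new FakeStream(3, index == 0 ? 2 : -1);
        RollingLogWriter writer = new RollingLogWriter(factory, 3);
        for (int i = 0; i < 5; i++) {
            writer.append("txn-" + i);
        }
        System.out.println("files opened: " + writer.filesOpened()); // files opened: 2
    }
}
```

The replica check runs after each write, so the record that exposed the failure stays in the old file and later records go to the new one. A real writer would also have to decide how often to poll, since the query may not be free.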

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869947#action_12869947 ] 

Chris Douglas commented on HADOOP-6450:
---------------------------------------

This is marked as a blocker for HDFS-826, but a solution without the {{Replicable}} interface was used. Is this issue still valid?



[jira] Updated: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-6450:
-------------------------------------

    Attachment: Replicable.txt



[jira] Updated: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-6450:
----------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Won't Fix

Marking as wontfix. Please reopen if required.



[jira] Updated: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-6450:
-------------------------------------

    Status: Patch Available  (was: Open)

Can somebody please review this patch?



[jira] Commented: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792511#action_12792511 ] 

Hadoop QA commented on HADOOP-6450:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428446/Replicable.txt
  against trunk revision 892113.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/225/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/225/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/225/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/225/console

This message is automatically generated.



[jira] Updated: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-6450:
-------------------------------------

    Attachment: Replicable.txt

Added a Replicable interface to retrieve the number of currently valid replicas.
