Posted to hdfs-dev@hadoop.apache.org by "Dmytro Molkov (JIRA)" <ji...@apache.org> on 2010/11/05 19:34:42 UTC

[jira] Created: (HDFS-1490) TransferFSImage should timeout

TransferFSImage should timeout
------------------------------

                 Key: HDFS-1490
                 URL: https://issues.apache.org/jira/browse/HDFS-1490
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Dmytro Molkov
            Assignee: Dmytro Molkov
            Priority: Minor


Sometimes, when the primary namenode crashes during an image transfer, the secondary namenode hangs forever trying to read the image from the HTTP connection.
It would be great to set timeouts on the connection so that, if something like that happens, there is no need to restart the secondary itself.
In our case, restarting components is handled by a set of scripts, and since the Secondary process is still running, it just stays hung until we get an alarm saying that checkpointing is not happening.
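
For illustration only, here is a minimal sketch of the kind of timeout being asked for. It is not the attached patch and not the actual TransferFSImage code. The class name, the method name, and the single timeoutMs parameter are hypothetical. The only real API it relies on is the standard java.net.HttpURLConnection, whose setConnectTimeout and setReadTimeout make a stalled connection fail with a SocketTimeoutException instead of blocking forever.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; not part of Hadoop.
class TimedImageFetch {

    // Copies the HTTP response body at `url` into `out` and returns the byte count.
    // `timeoutMs` (an assumed parameter) bounds both the connect phase and each
    // blocking read, so a peer that dies mid-transfer produces an exception.
    static long copyWithTimeout(URL url, OutputStream out, int timeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMs); // give up if the TCP connect stalls
        conn.setReadTimeout(timeoutMs);    // give up if a read blocks this long
        long total = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } finally {
            conn.disconnect();
        }
        return total;
    }
}
```

With something along these lines, the caller could catch java.net.SocketTimeoutException (a subclass of IOException), treat the checkpoint attempt as failed, and retry on the next cycle rather than needing an external restart.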

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Resolved] (HDFS-1490) TransferFSImage should timeout

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1490.
-------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
    
> TransferFSImage should timeout
> ------------------------------
>
>                 Key: HDFS-1490
>                 URL: https://issues.apache.org/jira/browse/HDFS-1490
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>             Fix For: 3.0.0, 2.2.0-alpha
>
>         Attachments: HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch
>
>
> Sometimes, when the primary namenode crashes during an image transfer, the secondary namenode hangs forever trying to read the image from the HTTP connection.
> It would be great to set timeouts on the connection so that, if something like that happens, there is no need to restart the secondary itself.
> In our case, restarting components is handled by a set of scripts, and since the Secondary process is still running, it just stays hung until we get an alarm saying that checkpointing is not happening.