Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2010/04/07 22:10:33 UTC

[jira] Created: (HBASE-2421) Put hangs for 10 retries on failed region servers

Put hangs for 10 retries on failed region servers
-------------------------------------------------

                 Key: HBASE-2421
                 URL: https://issues.apache.org/jira/browse/HBASE-2421
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
            Assignee: ryan rawson
            Priority: Critical
             Fix For: 0.20.5, 0.21.0


Since MultiPut went in, we call getRegionServerWithRetries instead of getRegionLocationForRowWithRetries to send an array list of Puts. The problem is that if the region server has failed, we still retry all 10 times with backoff even though every attempt gets connection refused. This is also true for a single Put, since it follows the same code path.

Marking as critical since it almost disables our responsiveness to machine failures in certain cases where we are already sending a batch of edits when the server fails. Assigning to Ryan since he's been there recently.
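
To illustrate the behavior described above, here is a minimal sketch of a retry loop that backs off on ordinary failures but stops immediately on a refused connection. It is not the HBase client code and not the attached patch: the class FailFastRetrySketch, the method callWithRetries, and the linear backoff schedule are all hypothetical names and assumptions chosen for illustration.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; not the HBase client implementation.
final class FailFastRetrySketch {

    private FailFastRetrySketch() {}

    /**
     * Runs the call up to maxRetries times, sleeping between attempts.
     * A ConnectException (for example "Connection refused") is rethrown at once,
     * on the assumption that a dead server will not come back at the same address
     * and the caller should look up the region location again instead.
     */
    static <T> T callWithRetries(Callable<T> call, int maxRetries, long basePauseMs)
            throws Exception {
        if (maxRetries < 1) {
            throw new IllegalArgumentException("maxRetries must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // Fail fast: do not burn the remaining retries on a refused connection.
                throw e;
            } catch (Exception e) {
                // Any other failure is treated as possibly transient.
                last = e;
            }
            if (attempt + 1 < maxRetries) {
                // Assumed linear backoff: pause grows with the attempt number.
                Thread.sleep(basePauseMs * (attempt + 1));
            }
        }
        throw last;
    }
}
```

With this shape, a call against a failed server surfaces the error after one attempt rather than after the full backoff sequence, while other exceptions still get the configured number of retries.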

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-2421) Put hangs for 10 retries on failed region servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2421.
--------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Forward-ported to TRUNK.

> Put hangs for 10 retries on failed region servers
> -------------------------------------------------
>
>                 Key: HBASE-2421
>                 URL: https://issues.apache.org/jira/browse/HBASE-2421
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: ryan rawson
>            Priority: Critical
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2421-2.txt, HBASE-2421-trunk.patch, hbase-2421.txt, HBASE-2421.txt
