Posted to issues@hbase.apache.org by "Mikhail Bautin (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 23:25:13 UTC

[jira] [Created] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Retry immediately after a NotServingRegionException in a multiput
-----------------------------------------------------------------

                 Key: HBASE-5813
                 URL: https://issues.apache.org/jira/browse/HBASE-5813
             Project: HBase
          Issue Type: Improvement
            Reporter: Mikhail Bautin
            Assignee: Mikhail Bautin


After we get some errors in a multiput, we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait; we can retry immediately.
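A minimal sketch of the decision described above, assuming a hypothetical `pauseBeforeRetry` helper and a made-up backoff table. The real HBase client has its own backoff constants, which are not reproduced here. `NotServingRegionException` is a local stand-in for HBase's class so that the example compiles without HBase on the classpath.

```java
import java.io.IOException;
import java.util.List;

public class MultiPutRetryPolicy {
  /** Local stand-in for HBase's NotServingRegionException (hypothetical here). */
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  /** Hypothetical backoff multipliers, indexed by retry attempt. */
  static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

  /** Backoff pause in ms for the given attempt, capped at the last multiplier. */
  static long backoffMs(long basePauseMs, int attempt) {
    int idx = Math.min(attempt, BACKOFF.length - 1);
    return basePauseMs * BACKOFF[idx];
  }

  /**
   * Returns how long to sleep before the next multiput retry: zero if every
   * failure was a NotServingRegionException (the region locations can simply
   * be looked up again), otherwise the normal backoff pause.
   */
  static long pauseBeforeRetry(List<? extends Throwable> failures, long basePauseMs, int attempt) {
    boolean allNsre = !failures.isEmpty();
    for (Throwable t : failures) {
      if (!(t instanceof NotServingRegionException)) {
        allNsre = false;
        break;
      }
    }
    return allNsre ? 0L : backoffMs(basePauseMs, attempt);
  }

  public static void main(String[] args) {
    List<Throwable> onlyNsre = List.of(
        new NotServingRegionException("r1"), new NotServingRegionException("r2"));
    List<Throwable> mixed = List.of(
        new NotServingRegionException("r1"), new IOException("timeout"));
    System.out.println(pauseBeforeRetry(onlyNsre, 1000, 3)); // 0
    System.out.println(pauseBeforeRetry(mixed, 1000, 3));    // 2000
  }
}
```

Any single non-NSRE failure falls back to the full backoff, since such errors (timeouts, overloaded servers) are the ones that waiting is meant to help with.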

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.9.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Not allocating a TreeMap in the case of a "singleton put". Apologies for the spam.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256904#comment-13256904 ] 

Phabricator commented on HBASE-5813:
------------------------------------

aaiyer has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

  Looks good.

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v14

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256900#comment-13256900 ] 

Phabricator commented on HBASE-5813:
------------------------------------

khemani has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1733 It is probably better to wrap the InterruptedException in an InterruptedIOException and throw it immediately.
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1665 Is a TreeMap needed, or would a HashMap have worked?
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1271 Similar logic ought to be present here, in getRegionServerWithRetries().
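The first inline comment suggests surfacing an interrupt during the retry sleep instead of swallowing it. A minimal sketch of that pattern, using a hypothetical `sleepBeforeRetry` helper (not the actual patch code); `java.io.InterruptedIOException` is the standard JDK class.

```java
import java.io.InterruptedIOException;

public class RetrySleep {
  /**
   * Sleeps for the retry pause. If the thread is interrupted, restores the
   * interrupt flag and rethrows immediately as an InterruptedIOException,
   * so the caller stops retrying instead of silently continuing.
   */
  static void sleepBeforeRetry(long pauseMs) throws InterruptedIOException {
    try {
      Thread.sleep(pauseMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for callers
      InterruptedIOException iioe =
          new InterruptedIOException("Interrupted while waiting to retry");
      iioe.initCause(e); // keep the original exception for diagnostics
      throw iioe;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt before sleeping
    try {
      sleepBeforeRetry(10_000);
      System.out.println("slept");
    } catch (InterruptedIOException e) {
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
```

Because `InterruptedIOException` is an `IOException`, it propagates through client methods that already declare `throws IOException` without changing their signatures.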

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v14

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.1.patch

mbautin requested code review of "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA

  After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.


TEST PLAN
  Run unit tests

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256947#comment-13256947 ] 

Phabricator commented on HBASE-5813:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
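  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1665 A HashMap cannot usefully be keyed by byte[]: arrays use identity-based equals() and hashCode(), so a lookup with an equal but distinct array misses (see previous comments on this diff).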
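The point about byte[] keys can be demonstrated in isolation. In this sketch, `Arrays::compareUnsigned` (JDK 9+) stands in for HBase's `Bytes.BYTES_COMPARATOR`, which the review suggests for the TreeMap; both order arrays lexicographically by unsigned byte value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ByteArrayKeys {
  /** Stand-in for HBase's Bytes.BYTES_COMPARATOR (unsigned lexicographic order). */
  static final Comparator<byte[]> BYTES_COMPARATOR = Arrays::compareUnsigned;

  public static void main(String[] args) {
    byte[] stored = "region-A".getBytes(StandardCharsets.UTF_8);
    byte[] lookup = "region-A".getBytes(StandardCharsets.UTF_8); // same contents, new array

    Map<byte[], String> hashMap = new HashMap<>();
    hashMap.put(stored, "rs1:60020");
    // Arrays inherit identity-based equals()/hashCode(), so this lookup misses.
    System.out.println(hashMap.get(lookup)); // null

    Map<byte[], String> treeMap = new TreeMap<>(BYTES_COMPARATOR);
    treeMap.put(stored, "rs1:60020");
    // The comparator compares contents, so an equal array finds the entry.
    System.out.println(treeMap.get(lookup)); // rs1:60020
  }
}
```

A HashMap would also work if region names were wrapped, for example in `ByteBuffer.wrap(...)`, whose `equals` compares contents. The TreeMap with a comparator avoids the extra wrapper objects.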

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v16

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256205#comment-13256205 ] 

Phabricator commented on HBASE-5813:
------------------------------------

Kannan has added reviewers to the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Added Reviewers: khemani, aaiyer

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v8

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256990#comment-13256990 ] 

Phabricator commented on HBASE-5813:
------------------------------------

khemani has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

  The ideal behavior would be the following:

  look up cached location
  retry:
  make the call
  if the call fails with NSRE
    look up cache for location with reload equal to true
    if you get the same location again then
      wait
      lookup cache for location with reload equal to <not sure what>
      goto retry
    else
      goto retry

  The difference is that you do not make an extra round trip to the region server once you have discovered that prevAddress is the same as newAddress.
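The retry loop outlined in the comment above can be sketched in Java as follows. `LocationCache`, `ServerCall`, and `callWithRetries` are hypothetical names, not HBase APIs, and `NotServingRegionException` is a local stand-in. The comment leaves the reload flag for the second lookup open ("<not sure what>"); this sketch assumes `true`.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class NsreRetryLoop {
  /** Local stand-in for HBase's NotServingRegionException. */
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  /** Hypothetical location lookup; reload=true bypasses the cached entry. */
  interface LocationCache {
    String locate(byte[] row, boolean reload) throws IOException;
  }

  /** Hypothetical RPC made against a given region server address. */
  interface ServerCall<T> {
    T call(String serverAddress) throws IOException;
  }

  static <T> T callWithRetries(LocationCache cache, ServerCall<T> call, byte[] row,
      int maxAttempts, long pauseMs) throws IOException {
    String location = cache.locate(row, false);           // look up cached location
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call(location);                        // make the call
      } catch (NotServingRegionException nsre) {
        if (attempt >= maxAttempts) {
          throw nsre;                                      // retries exhausted
        }
        String newLocation = cache.locate(row, true);      // reload the location
        if (newLocation.equals(location)) {
          // Same server as before: the region is likely still in transition,
          // so wait and look it up again rather than calling that server now.
          try {
            Thread.sleep(pauseMs);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted while waiting to retry");
          }
          newLocation = cache.locate(row, true);           // assumption: reload again
        }
        location = newLocation;                            // goto retry
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // The cached lookup and the first reload both say rs1; the second reload says rs2.
    String[] answers = {"rs1", "rs1", "rs2"};
    int[] lookups = {0};
    LocationCache cache = (row, reload) -> answers[Math.min(lookups[0]++, answers.length - 1)];
    ServerCall<String> put = addr -> {
      if (addr.equals("rs1")) {
        throw new NotServingRegionException("region moved");
      }
      return "ok@" + addr;
    };
    System.out.println(callWithRetries(cache, put, "row1".getBytes(), 5, 10)); // ok@rs2
  }
}
```

In the demo, the server is called only twice (once on rs1, once on rs2). When the reload returns rs1 again, the loop waits and re-reads the location instead of issuing a second call that would likely fail with the same NSRE.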

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v17

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.6.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Fixing the bug Ted pointed out. Also, contrary to my previous comment, there is no transition between the single-put and multi-put cases between retries, so there is no need to handle detection of the same region location consistently in these two cases.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268780#comment-13268780 ] 

Phabricator commented on HBASE-5813:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1707 For efficiency, let's populate this map only if msWaitOnSameRegionLoc > 0.

  That way, in the most common case (a single attempt), we avoid building this map.
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1765 Could you explain this change: why don't we add these failed requests to the "failed" list?
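Kannan's first suggestion, building the previous-location map only when `msWaitOnSameRegionLoc > 0`, could look roughly like this. `PrevLocationTracker` and its methods are hypothetical, and `Arrays::compareUnsigned` stands in for HBase's `Bytes.BYTES_COMPARATOR`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class PrevLocationTracker {
  /** Stand-in for HBase's Bytes.BYTES_COMPARATOR (unsigned lexicographic order). */
  static final Comparator<byte[]> BYTES_COMPARATOR = Arrays::compareUnsigned;

  private final long msWaitOnSameRegionLoc;
  /** Region name -> server the last attempt was sent to; null until first needed. */
  private Map<byte[], String> prevLocations;

  PrevLocationTracker(long msWaitOnSameRegionLoc) {
    this.msWaitOnSameRegionLoc = msWaitOnSameRegionLoc;
  }

  /** Remembers a region's location, but only when same-location waiting is enabled. */
  void record(byte[] regionName, String serverAddress) {
    if (msWaitOnSameRegionLoc <= 0) {
      return; // feature disabled: never allocate or populate the map
    }
    if (prevLocations == null) {
      prevLocations = new TreeMap<>(BYTES_COMPARATOR); // allocate lazily
    }
    prevLocations.put(regionName, serverAddress);
  }

  /** Previous location of a region, or null if none was recorded. */
  String previous(byte[] regionName) {
    return prevLocations == null ? null : prevLocations.get(regionName);
  }

  boolean mapAllocated() {
    return prevLocations != null;
  }

  public static void main(String[] args) {
    PrevLocationTracker off = new PrevLocationTracker(0);
    off.record(new byte[] {1}, "rs1");
    System.out.println(off.mapAllocated()); // false

    PrevLocationTracker on = new PrevLocationTracker(500);
    on.record(new byte[] {1}, "rs1");
    System.out.println(on.previous(new byte[] {1})); // rs1
  }
}
```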

REVISION DETAIL
  https://reviews.facebook.net/D2847

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.12.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.5.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA

  Adding more logic to handle the case when we get an NSRE but the region location is the same for one of the regions on a regionserver. It is possible that that region is being reassigned and the new location has not yet been written to META. In order to avoid exhausting retries too quickly in that case, we wait for the appropriate amount of time before the retry.

  This will need one more iteration: a request that started as a multi-put might become a single put during the course of retries, and the previous region location needs to be correctly passed from the multi-put to the single put.
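The per-region check described in this revision (wait only if some region that got an NSRE is still mapped to the same server) could be sketched as below. `shouldWaitBeforeRetry` is a hypothetical name. Treating a region with no new location as "not the same" is an assumption of this sketch; it also guards against `newLocations.get(...)` returning null.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class SameLocationCheck {
  /** Stand-in for HBase's Bytes.BYTES_COMPARATOR (unsigned lexicographic order). */
  static final Comparator<byte[]> BYTES_COMPARATOR = Arrays::compareUnsigned;

  /**
   * Returns true if any region that failed with an NSRE is still mapped to the
   * same server after re-reading locations. That suggests META has not been
   * updated yet, so an immediate retry would likely hit the same NSRE.
   */
  static boolean shouldWaitBeforeRetry(Map<byte[], String> prevLocations,
                                       Map<byte[], String> newLocations) {
    for (Map.Entry<byte[], String> e : prevLocations.entrySet()) {
      String newLoc = newLocations.get(e.getKey());
      // Assumption: a region with no new location is skipped, not treated as unchanged.
      if (newLoc != null && newLoc.equals(e.getValue())) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<byte[], String> prev = new TreeMap<>(BYTES_COMPARATOR);
    prev.put(new byte[] {1}, "rs1");
    prev.put(new byte[] {2}, "rs2");

    Map<byte[], String> next = new TreeMap<>(BYTES_COMPARATOR);
    next.put(new byte[] {1}, "rs3"); // region 1 moved
    next.put(new byte[] {2}, "rs2"); // region 2 still reported on rs2
    System.out.println(shouldWaitBeforeRetry(prev, next)); // true
  }
}
```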

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.3.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA

  Addressing Ted's comment.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256172#comment-13256172 ] 

Phabricator commented on HBASE-5813:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1736 newRegionLocations.get(regionName) may return null, right?
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1663 Should 'new TreeMap<byte[], HServerAddress>(Bytes.BYTES_COMPARATOR)' be used here?

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v8

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.11.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Updating retry logic in getRegionServerWithRetries slightly after discussing offline with Liyin.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256986#comment-13256986 ] 

Phabricator commented on HBASE-5813:
------------------------------------

khemani has requested changes to the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

  Requesting changes so that it can be accepted again.

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v17

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.12.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Prakash: I think this implements your comments. I moved the logic that decides whether to wait if region locations are the same as before to the point when we already know the new region locations. Re-testing.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.12.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.10.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Addressing Prakash's comments.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256988#comment-13256988 ] 

Phabricator commented on HBASE-5813:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

  All unit tests passed. I also ran the Test*ServerCmdLine tests from D2811 and checked the logs for retries after a NotServingRegionException. This fix saved two five-second pauses in those tests.

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v17

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256768#comment-13256768 ] 

Phabricator commented on HBASE-5813:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

  I cannot accept a second time :-)

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v14

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.8.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

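  Replacing another occurrence of HashMap<byte[], ...> with a TreeMap that uses a byte-array comparator. Byte arrays cannot safely be used as HashMap keys: byte[] inherits identity-based equals() and hashCode() from Object, so two arrays with the same contents are treated as different keys (http://stackoverflow.com/questions/1058149/using-a-byte-array-as-hashmap-key-java). Also improving comments.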
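  A small self-contained demonstration of the byte[] key problem follows. The class ByteArrayKeys and its helpers are hypothetical; inside HBase the usual choice is a TreeMap built with Bytes.BYTES_COMPARATOR, and the hand-written comparator below is a stand-in with the same unsigned lexicographic ordering so the sketch compiles without HBase on the classpath.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of the HBase patch.
final class ByteArrayKeys {
    private ByteArrayKeys() {}

    // Unsigned lexicographic order; a shorter array sorts before any longer array it prefixes.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static boolean hashMapFinds(byte[] stored, byte[] probe) {
        Map<byte[], String> m = new HashMap<>();
        m.put(stored, "value");
        // byte[] uses Object's identity equals/hashCode, so equal contents do not match.
        return m.containsKey(probe);
    }

    static boolean treeMapFinds(byte[] stored, byte[] probe) {
        Map<byte[], String> m = new TreeMap<>(UNSIGNED_LEXICOGRAPHIC);
        m.put(stored, "value");
        // TreeMap looks keys up via the comparator, i.e. by array contents.
        return m.containsKey(probe);
    }
}
```

  With two distinct arrays holding the same row key, hashMapFinds returns false and treeMapFinds returns true. A TreeMap without a comparator would not work either, because byte[] does not implement Comparable.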

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.4.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA

  Using the previous logic (enabling the wait) in case of InterruptedException.
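  One way to read this change is that the "skip the wait" path applies only when every collected failure is a NotServingRegionException, and anything else, including an InterruptedException, falls back to the old waiting behavior. The sketch below assumes failures arrive either directly or wrapped in an ExecutionException (as from Future.get()); the class MultiputErrorClassifier and the local NotServingRegionException stand-in are hypothetical and exist only so the example compiles on its own.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch; not the code from D2847.
final class MultiputErrorClassifier {
    private MultiputErrorClassifier() {}

    // Local stand-in for org.apache.hadoop.hbase.NotServingRegionException.
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String msg) {
            super(msg);
        }
    }

    /**
     * Returns true only if every failure is a NotServingRegionException
     * (possibly wrapped in an ExecutionException). An empty list returns true,
     * but in that case there is nothing to retry anyway.
     */
    static boolean onlyNotServingRegion(List<Throwable> failures) {
        for (Throwable t : failures) {
            Throwable cause = (t instanceof ExecutionException && t.getCause() != null)
                    ? t.getCause()
                    : t;
            if (!(cause instanceof NotServingRegionException)) {
                // Any other error, InterruptedException included, keeps the normal wait.
                return false;
            }
        }
        return true;
    }
}
```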

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.2.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA

  Addressing Liyin's offline feedback: not waiting at the last retry either.
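  Skipping the pause after the final attempt makes sense because no retry follows it, so sleeping only delays the error reported to the caller. A minimal sketch of a pause calculation with that rule is below; the class BackoffSchedule, the method pauseMillis, and the multiplier table are assumptions for illustration and are not taken from the patch or from HBase's configuration.

```java
// Hypothetical sketch; the multiplier table is an assumed example, not HBase's actual values.
final class BackoffSchedule {
    private BackoffSchedule() {}

    private static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    /**
     * Milliseconds to sleep after failed attempt {@code attempt} (0-based) out of
     * {@code maxAttempts}. Returns 0 when {@code wait} is false (for example, all
     * errors were NotServingRegionExceptions) and after the final attempt.
     */
    static long pauseMillis(long basePauseMs, int attempt, int maxAttempts, boolean wait) {
        if (!wait || attempt >= maxAttempts - 1) {
            return 0L;
        }
        int idx = Math.min(attempt, BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }
}
```

  For example, with a 1000 ms base pause and 10 attempts, attempt 5 sleeps 4000 ms, while attempt 9 (the last one) does not sleep at all.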

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256735#comment-13256735 ] 

Phabricator commented on HBASE-5813:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1736 We add locations for all regions participating in the query to newRegionLocations in the initial loop of this function.
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1663 Good catch!

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v8

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255978#comment-13255978 ] 

Phabricator commented on HBASE-5813:
------------------------------------

tedyu has accepted the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1629 Please remove 'after' or change it to 'after failure'.

REVISION DETAIL
  https://reviews.facebook.net/D2847

BRANCH
  retry_immediately_after_a_HBASE-5813_v2

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.

[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5813:
-------------------------------

    Attachment: D2847.7.patch

mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer

  Removing unused import.

REVISION DETAIL
  https://reviews.facebook.net/D2847

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

                
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
>                 Key: HBASE-5813
>                 URL: https://issues.apache.org/jira/browse/HBASE-5813
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
