Posted to issues@hbase.apache.org by "Mikhail Bautin (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 23:25:13 UTC
[jira] [Created] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Retry immediately after a NotServingRegionException in a multiput
-----------------------------------------------------------------
Key: HBASE-5813
URL: https://issues.apache.org/jira/browse/HBASE-5813
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
After we get some errors in a multiput, we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
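A minimal Java sketch of the proposed decision, for illustration only. The class and method names (MultiPutRetryPolicy, pauseBeforeRetry) are hypothetical, and a local stand-in replaces HBase's NotServingRegionException so the example compiles on its own. It returns a zero pause when every failure in a multiput was an NSRE and the configured backoff otherwise.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch: skip the backoff pause when a multiput failed only with NSREs. */
public class MultiPutRetryPolicy {

  /** Local stand-in for org.apache.hadoop.hbase.NotServingRegionException. */
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) {
      super(msg);
    }
  }

  /**
   * Returns the pause before the next attempt: 0 if every error is an NSRE
   * (the stale cache entries were already invalidated, so a fresh lookup
   * should find the new server), otherwise the configured backoff.
   */
  static long pauseBeforeRetry(List<? extends Throwable> errors, long backoffMs) {
    for (Throwable t : errors) {
      if (!(t instanceof NotServingRegionException)) {
        return backoffMs; // some other failure: keep the normal backoff
      }
    }
    return 0L; // only NSREs: retry immediately
  }

  public static void main(String[] args) {
    List<Throwable> onlyNsre = List.of(new NotServingRegionException("region moved"));
    List<Throwable> mixed =
        List.of(new NotServingRegionException("region moved"), new IOException("timeout"));
    System.out.println(pauseBeforeRetry(onlyNsre, 1000L)); // 0
    System.out.println(pauseBeforeRetry(mixed, 1000L)); // 1000
  }
}
```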
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.9.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Not allocating a TreeMap in the case of a "singleton put". Apologies for the spam.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256904#comment-13256904 ]
Phabricator commented on HBASE-5813:
------------------------------------
aaiyer has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Looks good.
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v14
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256900#comment-13256900 ]
Phabricator commented on HBASE-5813:
------------------------------------
khemani has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1733 It is probably better to wrap the InterruptedException in an InterruptedIOException and throw it immediately.
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1665 Is a TreeMap needed, or would a HashMap have worked?
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1271 Similar logic ought to be present here, in getRegionServerWithRetries().
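A sketch of the first suggestion, assuming the backoff is a plain Thread.sleep; the class and helper names (RetrySleep, sleepBeforeRetry) are hypothetical. Restoring the interrupt flag before rethrowing is a common Java convention added here, not something the comment asks for.

```java
import java.io.InterruptedIOException;

public class RetrySleep {

  /**
   * Sleeps for the backoff period. If interrupted, restores the thread's
   * interrupt flag and rethrows as an InterruptedIOException, so callers that
   * only handle IOException stop retrying immediately.
   */
  static void sleepBeforeRetry(long ms) throws InterruptedIOException {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      InterruptedIOException iie =
          new InterruptedIOException("Interrupted while waiting to retry");
      iie.initCause(e); // InterruptedIOException has no (String, Throwable) constructor
      throw iie;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt before sleeping
    try {
      sleepBeforeRetry(1000L);
      System.out.println("slept");
    } catch (InterruptedIOException e) {
      System.out.println("gave up: " + e.getMessage());
    }
  }
}
```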
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v14
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.1.patch
mbautin requested code review of "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA
After we get some errors in a multiput, we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
TEST PLAN
Run unit tests
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
MANAGE HERALD DIFFERENTIAL RULES
https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/6453/
Tip: use the X-Herald-Rules header to filter Herald messages in your client.
> Attachments: D2847.1.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256947#comment-13256947 ]
Phabricator commented on HBASE-5813:
------------------------------------
mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1665 A HashMap cannot be keyed by byte[], because arrays use identity-based equals() and hashCode() (see previous comments on this diff).
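A small self-contained demonstration of why byte[] keys need a comparator-based map. Arrays.compareUnsigned (Java 9+) is used as a stand-in with the same unsigned lexicographic ordering as HBase's Bytes.BYTES_COMPARATOR; the class name, region name, and server address are made-up values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ByteArrayKeys {

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // byte[] uses Object's identity-based equals()/hashCode(), so an equal but
    // distinct array is a different HashMap key and the lookup misses.
    Map<byte[], String> hash = new HashMap<>();
    hash.put(bytes("region-a"), "rs1:60020");
    System.out.println(hash.get(bytes("region-a"))); // null

    // A TreeMap compares keys by content through its comparator, so the
    // lookup succeeds. Arrays::compareUnsigned stands in for Bytes.BYTES_COMPARATOR.
    Map<byte[], String> tree = new TreeMap<byte[], String>(Arrays::compareUnsigned);
    tree.put(bytes("region-a"), "rs1:60020");
    System.out.println(tree.get(bytes("region-a"))); // rs1:60020
  }
}
```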
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v16
> Attachments: D2847.1.patch, D2847.10.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256205#comment-13256205 ]
Phabricator commented on HBASE-5813:
------------------------------------
Kannan has added reviewers to the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Added Reviewers: khemani, aaiyer
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v8
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256990#comment-13256990 ]
Phabricator commented on HBASE-5813:
------------------------------------
khemani has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
The ideal behavior would be the following:
look up cached location
retry:
  make the call
  if the call fails with NSRE
    look up cache for location with reload equal to true
    if you get the same location again then
      wait
      look up cache for location with reload equal to <not sure what>
      goto retry
    else
      goto retry
The difference is that you are not making an extra round trip to the region server even when you have discovered that prevAddress is the same as newAddress.
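A hedged Java sketch of the loop described above. LocationCache, ServerCall, and callWithRetries are hypothetical names, not the HConnectionManager API, and a local class stands in for NotServingRegionException. The comment leaves the reload flag of the second lookup open; this sketch assumes true.

```java
import java.io.IOException;

/** Hypothetical sketch of the retry loop proposed in the review comment. */
public class NsreRetryLoop {

  /** Local stand-in for org.apache.hadoop.hbase.NotServingRegionException. */
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical location lookup; reload == true bypasses the client-side cache. */
  interface LocationCache {
    String locate(byte[] row, boolean reload) throws IOException;
  }

  /** Hypothetical RPC against the region server at the given address. */
  interface ServerCall<T> {
    T call(String location) throws IOException;
  }

  static <T> T callWithRetries(LocationCache cache, byte[] row, ServerCall<T> rpc,
      int maxAttempts, long pauseMs) throws IOException, InterruptedException {
    String location = cache.locate(row, false); // look up cached location
    for (int attempt = 1; ; attempt++) { // retry:
      try {
        return rpc.call(location); // make the call
      } catch (NotServingRegionException e) { // the call fails with NSRE
        if (attempt >= maxAttempts) {
          throw e;
        }
        String newLocation = cache.locate(row, true); // reload == true
        if (newLocation.equals(location)) {
          // Same server again: the region is probably still in transition, so
          // wait and look it up again instead of calling a server expected to fail.
          Thread.sleep(pauseMs);
          newLocation = cache.locate(row, true); // flag left open in the comment; true assumed
        }
        location = newLocation; // goto retry
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // The cached entry points at rs1; a reload finds the region on rs2.
    LocationCache cache = (row, reload) -> reload ? "rs2:60020" : "rs1:60020";
    ServerCall<String> rpc = loc -> {
      if (loc.startsWith("rs1")) {
        throw new NotServingRegionException("region moved");
      }
      return "ok@" + loc;
    };
    System.out.println(callWithRetries(cache, new byte[] {1}, rpc, 3, 100L)); // ok@rs2:60020
  }
}
```

When the reloaded location differs, the retry goes out immediately; only an unchanged location costs a pause.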
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v17
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.6.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Fixing the bug Ted pointed out. Also, contrary to my previous comment, there is no transition between the single-put and multi-put cases across retries, so there is no need to handle detection of the same region location consistently in these two cases.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268780#comment-13268780 ]
Phabricator commented on HBASE-5813:
------------------------------------
Kannan has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1707 For efficiency, let's populate this map only if msWaitOnSameRegionLoc > 0, so that in the most common case, i.e. the single-attempt case, we avoid building this map.
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1765 Could you explain this change? Why don't we add these failed requests to the "failed" list?
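One way to read the first suggestion, sketched in Java: build the previous-location snapshot only when msWaitOnSameRegionLoc > 0, and return null otherwise. The class and method names (PrevLocationSnapshot, snapshotIfNeeded) and the String server addresses are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class PrevLocationSnapshot {

  /**
   * Copies the current region-to-server mapping so it can be compared after
   * the next location lookup. Returns null when msWaitOnSameRegionLoc <= 0,
   * because the comparison is then never used and the common path should not
   * pay for building the map.
   */
  static Map<byte[], String> snapshotIfNeeded(Map<byte[], String> current,
      long msWaitOnSameRegionLoc) {
    if (msWaitOnSameRegionLoc <= 0) {
      return null;
    }
    Map<byte[], String> copy = new TreeMap<byte[], String>(Arrays::compareUnsigned);
    copy.putAll(current);
    return copy;
  }

  public static void main(String[] args) {
    Map<byte[], String> current = new TreeMap<byte[], String>(Arrays::compareUnsigned);
    current.put(new byte[] {'a'}, "rs1:60020");
    System.out.println(snapshotIfNeeded(current, 0L)); // null
    System.out.println(snapshotIfNeeded(current, 1000L).size()); // 1
  }
}
```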
REVISION DETAIL
https://reviews.facebook.net/D2847
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.12.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.5.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA
Adding more logic to handle the case when we get an NSRE but the region location is the same for one of the regions on a regionserver. It is possible that that region is being reassigned and the new location has not yet been written to META. In order to avoid exhausting retries too quickly in that case, we wait for the appropriate amount of time before the retry.
This will need one more iteration: a request that started as a multi-put might become a single put during the course of retries, and the previous region location needs to be correctly passed from the multi-put to the single put.
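A sketch of the wait decision described in this revision, under stated assumptions: locations are plain strings, the maps are keyed by region name with an unsigned byte comparator, and a region missing from the reloaded map (the null case raised in review) is treated as unchanged, so the retry still waits. SameLocationCheck and shouldWait are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class SameLocationCheck {

  /**
   * Returns true if at least one region that failed with an NSRE still maps
   * to the same server after the location cache was reloaded, i.e. it may
   * still be in transition and the new location may not be in META yet.
   * A missing new location (get() returns null) is assumed to mean "wait".
   */
  static boolean shouldWait(Map<byte[], String> prevLocations,
      Map<byte[], String> newLocations) {
    for (Map.Entry<byte[], String> e : prevLocations.entrySet()) {
      String now = newLocations.get(e.getKey());
      if (now == null || now.equals(e.getValue())) {
        return true;
      }
    }
    return false;
  }

  static Map<byte[], String> newRegionMap() {
    return new TreeMap<byte[], String>(Arrays::compareUnsigned);
  }

  public static void main(String[] args) {
    Map<byte[], String> prev = newRegionMap();
    prev.put(new byte[] {'a'}, "rs1:60020");
    prev.put(new byte[] {'b'}, "rs1:60020");
    Map<byte[], String> now = newRegionMap();
    now.put(new byte[] {'a'}, "rs2:60020"); // region a moved
    now.put(new byte[] {'b'}, "rs1:60020"); // region b's new location not visible yet
    System.out.println(shouldWait(prev, now)); // true
  }
}
```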
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.3.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA
Addressing Ted's comment.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256172#comment-13256172 ]
Phabricator commented on HBASE-5813:
------------------------------------
tedyu has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1736 newRegionLocations.get(regionName) may return null, right?
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1663 Should 'new TreeMap<byte[], HServerAddress>(Bytes.BYTES_COMPARATOR)' be used here?
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v8
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.11.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Updating retry logic in getRegionServerWithRetries slightly after discussing offline with Liyin.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256986#comment-13256986 ]
Phabricator commented on HBASE-5813:
------------------------------------
khemani has requested changes to the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Requesting changes so that it can be accepted again.
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v17
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.12.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Prakash: I think this implements your comments. I moved the logic that decides whether to wait if region locations are the same as before to the point when we already know the new region locations. Re-testing.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
src/main/java/org/apache/hadoop/hbase/util/Threads.java
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.12.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Updated] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.10.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Addressing Prakash's comments.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Attachments: D2847.1.patch, D2847.10.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
[jira] [Commented] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256988#comment-13256988 ]
Phabricator commented on HBASE-5813:
------------------------------------
mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
All unit tests passed. I also ran Test*ServerCmdLine tests from D2811 and checked the logs for retries after a NotServingRegionException. This fix did save two five-second pauses in those tests.
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v17
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.10.patch, D2847.11.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Commented] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256768#comment-13256768 ]
Phabricator commented on HBASE-5813:
------------------------------------
tedyu has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
I cannot accept a second time :-)
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v14
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch, D2847.9.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Updated] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.8.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Replacing another occurrence of HashMap<byte[], ...> with a TreeMap. It is incorrect to use byte arrays as keys in a HashMap (http://stackoverflow.com/questions/1058149/using-a-byte-array-as-hashmap-key-java). Also improving comments.
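The HashMap problem mentioned above comes from Java arrays inheriting identity-based `equals()` and `hashCode()` from `Object`: two arrays holding the same bytes are different keys. A `TreeMap` built with a byte-wise comparator compares contents instead. The sketch below assumes unsigned lexicographic ordering, in the spirit of HBase's `Bytes.BYTES_COMPARATOR`, but spells the comparator out so it runs without HBase; the class and field names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ByteArrayKeySketch {

  // Unsigned lexicographic comparison of byte arrays; a shorter array that is
  // a prefix of a longer one sorts first.
  static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  };

  public static void main(String[] args) {
    byte[] k1 = "region-1".getBytes(StandardCharsets.UTF_8);
    byte[] k2 = "region-1".getBytes(StandardCharsets.UTF_8); // same bytes, different array

    Map<byte[], String> hash = new HashMap<>();
    hash.put(k1, "server-a");
    // Identity-based equals/hashCode: an equal-content key does not match.
    System.out.println(hash.get(k2)); // null

    Map<byte[], String> tree = new TreeMap<>(UNSIGNED_LEXICOGRAPHIC);
    tree.put(k1, "server-a");
    // TreeMap finds keys with the comparator, so equal contents are enough.
    System.out.println(tree.get(k2)); // server-a
  }
}
```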
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch, D2847.8.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Updated] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.4.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA
Using the previous logic (enabling the wait) in case of InterruptedException.
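One way to read this change: if the client is interrupted while collecting per-server results, it cannot know that every failure was a NotServingRegionException, so it keeps the backoff as before. A minimal sketch under that assumption; `needsBackoff`, the per-server `Future` list, and the local exception stand-in are hypothetical and not HConnectionManager's actual structure.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class InterruptAwareBackoffSketch {

  // Local stand-in for org.apache.hadoop.hbase.NotServingRegionException.
  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) {
      super(msg);
    }
  }

  /**
   * Hypothetical helper: waits for per-server multiput results and reports
   * whether the next retry should be preceded by the usual backoff pause.
   */
  static boolean needsBackoff(List<? extends Future<?>> perServerResults) {
    boolean sawOtherFailure = false;
    for (Future<?> f : perServerResults) {
      try {
        f.get();
      } catch (InterruptedException e) {
        // We never learned how the remaining calls ended, so we cannot claim
        // that all failures were NotServingRegionExceptions. Fall back to the
        // previous behavior (wait) and restore the interrupt flag.
        Thread.currentThread().interrupt();
        return true;
      } catch (ExecutionException e) {
        if (!(e.getCause() instanceof NotServingRegionException)) {
          sawOtherFailure = true; // a "real" error: keep the backoff
        }
      }
    }
    return sawOtherFailure;
  }

  public static void main(String[] args) {
    CompletableFuture<Object> ok = CompletableFuture.completedFuture(null);
    CompletableFuture<Object> moved = new CompletableFuture<>();
    moved.completeExceptionally(new NotServingRegionException("region moved"));
    System.out.println(needsBackoff(Arrays.asList(ok, moved))); // false
  }
}
```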
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Updated] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.2.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA
Addressing Liyin's offline feedback: not waiting at the last retry either.
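Liyin's point is that a pause after the final attempt only delays the error reported to the caller, since no further attempt follows. The sketch below combines that with the NotServingRegionException rule. The method `sleepMillis`, its parameters, and the multiplier table are illustrative assumptions, not HBase's actual backoff code.

```java
public class LastRetrySketch {

  // Illustrative backoff multipliers; not necessarily HBase's real table.
  static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

  /**
   * Hypothetical helper: how long to sleep after attempt 'tries' (0-based)
   * fails, given 'numRetries' total attempts and a base pause in ms.
   */
  static long sleepMillis(int tries, int numRetries, boolean onlyNsreFailures, long pauseMs) {
    boolean lastAttempt = tries >= numRetries - 1;
    if (lastAttempt || onlyNsreFailures) {
      // Nothing follows the last attempt, and NSRE-only rounds retry at once.
      return 0L;
    }
    int idx = Math.min(tries, BACKOFF.length - 1);
    return pauseMs * BACKOFF[idx];
  }

  public static void main(String[] args) {
    int numRetries = 4;
    long pause = 1000;
    for (int tries = 0; tries < numRetries; tries++) {
      System.out.println("attempt " + tries + ": sleep "
          + sleepMillis(tries, numRetries, false, pause) + " ms");
    }
  }
}
```

With four attempts and a 1000 ms base pause, this prints sleeps of 1000, 1000, 1000, and 0 ms: the last failed attempt returns immediately instead of waiting.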
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Commented] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256735#comment-13256735 ]
Phabricator commented on HBASE-5813:
------------------------------------
mbautin has commented on the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1736 We add locations for all regions participating in the query to newRegionLocations in the initial loop of this function.
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1663 Good catch!
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v8
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Commented] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255978#comment-13255978 ]
Phabricator commented on HBASE-5813:
------------------------------------
tedyu has accepted the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:1629 Please remove 'after' or change it to 'after failure'.
REVISION DETAIL
https://reviews.facebook.net/D2847
BRANCH
retry_immediately_after_a_HBASE-5813_v2
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.
[jira] [Updated] (HBASE-5813) Retry immediately after a
NotServingRegionException in a multiput
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5813:
-------------------------------
Attachment: D2847.7.patch
mbautin updated the revision "[jira] [HBASE-5813] [89-fb] Retry immediately after a NotServingRegionException in a multiput".
Reviewers: Liyin, Kannan, khemani, todd, tedyu, stack, JIRA, aaiyer
Removing unused import.
REVISION DETAIL
https://reviews.facebook.net/D2847
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> Retry immediately after a NotServingRegionException in a multiput
> -----------------------------------------------------------------
>
> Key: HBASE-5813
> URL: https://issues.apache.org/jira/browse/HBASE-5813
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: D2847.1.patch, D2847.2.patch, D2847.3.patch, D2847.4.patch, D2847.5.patch, D2847.6.patch, D2847.7.patch
>
>
> After we get some errors in a multiput we invalidate the region location cache and wait for the configured time interval according to the backoff policy. However, if all "errors" in multiput processing were NotServingRegionExceptions, we don't really need to wait. We can retry immediately.