Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2010/02/07 07:58:27 UTC

[jira] Created: (HBASE-2189) HCM trashes meta cache even when not needed

HCM trashes meta cache even when not needed
-------------------------------------------

Key: HBASE-2189
URL: https://issues.apache.org/jira/browse/HBASE-2189
Project: Hadoop HBase
Issue Type: Improvement
Affects Versions: 0.20.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.20.4, 0.21.0

I was investigating HBASE-2175 when I saw that we are doing a lot more ROOT lookups than needed. For example, typical output of PE seqWrite during split:

{code}
client.HConnectionManager$TableServers: Removed TestTable,,1265524229864 for tableName=TestTable from cache because of 0000380292
client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because:
No server address listed in .META. for region TestTable,0000086976,1265524283534
client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0000380292,99999999999999
client.HConnectionManager$TableServers: Cached location for .META.,,1 is 192.168.1.103:56279
client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because:
No server address listed in .META. for region TestTable,0000086976,1265524283534
client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0000380292,99999999999999
client.HConnectionManager$TableServers: Cached location for .META.,,1 is 192.168.1.103:56279
client.HConnectionManager$TableServers: Cached location for TestTable,0000086976,1265524283534 is 192.168.1.103:56279
{code}

So why exactly are we removing .META.,,1 from the cache? Because a row didn't have the right address? That means we did contact .META., but the information we got is still stale because the split isn't finished yet... so why should that result in trashing the cache?

Because we don't differentiate NSRE / WRE (NotServingRegionException / WrongRegionException) from other exceptions, such as an empty server address. This happens a lot more often now that the Master clears that cell when a region is closed instead of keeping the old value.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2189.
---------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk. Thanks for trying it out, Stack.

[jira] Commented: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832108#action_12832108 ] 

Jean-Daniel Cryans commented on HBASE-2189:
-------------------------------------------

Seems I forgot to handle an exception, let's fix that.

[jira] Commented: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831872#action_12831872 ] 

stack commented on HBASE-2189:
------------------------------

I tried shutting down the server hosting .META. in a cluster with this patch in place.  I get connection refused.
{code}
100 row(s) in 2.1810 seconds
hbase(main):002:0> scan 'TestTable', {LIMIT => 100}
NativeException: java.net.ConnectException: Connection refused

hbase(main):003:0> scan 'TestTable', {LIMIT => 100}
NativeException: java.net.ConnectException: Connection refused

hbase(main):004:0> scan 'TestTable', {LIMIT => 100}
NativeException: java.net.ConnectException: Connection refused

hbase(main):005:0> scan 'TestTable', {LIMIT => 100}
NativeException: java.net.ConnectException: Connection refused
{code}

Without it, shell works fine as .META. moves servers.

[jira] Commented: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832189#action_12832189 ] 

stack commented on HBASE-2189:
------------------------------

+1 on patch.  I tried various combinations of shutdown of servers carrying root and meta and client kept-on keeping-on.  I'd say go commit it.

[jira] Updated: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2189:
--------------------------------------

    Attachment: HBASE-2189.patch

Patch that adds the proper exception handling, made against branch.

[jira] Commented: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831566#action_12831566 ] 

stack commented on HBASE-2189:
------------------------------

Can you test killing the server hosting .META. during an upload? The relocation of .META. does seem gratuitous.

Should comment be about regions and not tables?

{code}
+          // Only relocate the parent table if necessary
{code}

[jira] Updated: (HBASE-2189) HCM trashes meta cache even when not needed

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2189:
--------------------------------------

    Attachment: HBASE-2189-v2.patch

I forgot to check for ConnectionError, so I think the best approach is to do the reverse of what the other patch did, i.e. relocate only if we don't get RegionOfflineException or NoServerForRegionException.
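
For readers who don't have the patch open, the decision described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the class name RelocationPolicySketch, the method shouldRelocateParent and the two nested exception classes are hypothetical stand-ins so that the snippet compiles without HBase on the classpath. The real client exceptions and the surrounding retry loop in HConnectionManager may look different.

```java
import java.io.IOException;
import java.net.ConnectException;

class RelocationPolicySketch {

    // Hypothetical stand-ins for the HBase client exceptions named in the comment.
    static class RegionOfflineException extends IOException {
        RegionOfflineException(String msg) { super(msg); }
    }

    static class NoServerForRegionException extends IOException {
        NoServerForRegionException(String msg) { super(msg); }
    }

    // Assumed policy: these two exceptions mean we did reach the parent region
    // (.META. or ROOT) and only the row we read is not usable yet, so the cached
    // parent location is kept. Anything else (for example a refused connection)
    // may mean the parent itself moved, so its cached location is dropped.
    static boolean shouldRelocateParent(Throwable cause) {
        return !(cause instanceof RegionOfflineException
                || cause instanceof NoServerForRegionException);
    }

    public static void main(String[] args) {
        // prints false: the parent was reachable, the row just has no address yet
        System.out.println(shouldRelocateParent(
                new NoServerForRegionException("No server address listed in .META.")));
        // prints false
        System.out.println(shouldRelocateParent(
                new RegionOfflineException("region offline")));
        // prints true: the server hosting the parent may be gone
        System.out.println(shouldRelocateParent(
                new ConnectException("Connection refused")));
    }
}
```

Listing the exceptions that keep the cache, rather than the ones that clear it, means an unanticipated failure such as a refused connection still falls back to a fresh lookup.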
