Posted to dev@hbase.apache.org by "Izaak Rubin (JIRA)" <ji...@apache.org> on 2008/07/18 00:31:31 UTC
[jira] Commented: (HBASE-679) Regionserver addresses are still not right in the new tables page

    [ https://issues.apache.org/jira/browse/HBASE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614551#action_12614551 ] 

Izaak Rubin commented on HBASE-679:
-----------------------------------

I've been going through this issue for a little while now, and I think I have an idea of what ultimately needs to be done.

First, I was wrong in my last comment when I said that the problem had changed from the original issue description.  I've observed the problem in the HBase instance that Jim is running from the aa0-00... machines.  And, after a bit more fiddling with the UI and some data on my own computer, I think I'm seeing trace bits of the same problem as well.

In my opinion, this issue boils down to the exact same problem we saw in HBASE-727: we are caching information about regionserver location that will be wrong most of the time.  In HBASE-727 we were able to work around this: if the regionserver location was wrong and the page was unable to load after several tries, we threw an IOException into the system that caused some gears to churn, and usually this would stir things up enough to get the right location out.  But, being the one who made the patch for HBASE-727, I have to admit that this is a bit hacky.
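
The HBASE-727 workaround described above amounts to "retry, and drop the cached location when it proves wrong."  Here is a minimal Java sketch of that pattern, not the actual patch.  Every name in it (LocationCacheSketch, Lookup, Probe, the host strings) is a hypothetical stand-in rather than an HBase API, and it assumes there is some way to tell that a cached address failed.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are HBase APIs.
class LocationCacheSketch {
    /** Stand-in for an authoritative lookup (for example, a fresh catalog read). */
    interface Lookup {
        String locate(String region) throws IOException;
    }

    /** Stand-in for "did the page manage to reach this address?". */
    interface Probe {
        boolean reachable(String address);
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Lookup lookup;
    private final Probe probe;

    LocationCacheSketch(Lookup lookup, Probe probe) {
        this.lookup = lookup;
        this.probe = probe;
    }

    /** Pre-populates the cache, e.g. with a location learned before a restart. */
    void seed(String region, String address) {
        cache.put(region, address);
    }

    /** Tries the cached address first; evicts it and looks it up again on failure. */
    String resolve(String region, int maxTries) throws IOException {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            String address = cache.get(region);
            if (address == null) {
                address = lookup.locate(region);
                if (address == null) {
                    throw new IOException("No server address for " + region);
                }
                cache.put(region, address);
            }
            if (probe.reachable(address)) {
                return address;
            }
            cache.remove(region); // stale entry: force a fresh lookup next time
        }
        throw new IOException("No reachable address for " + region
            + " after " + maxTries + " tries");
    }

    public static void main(String[] args) throws IOException {
        // The cache holds a stale address; only host-b answers.
        LocationCacheSketch sketch = new LocationCacheSketch(
            region -> "host-b:60020",
            address -> address.equals("host-b:60020"));
        sketch.seed("TestTable,,1213074650399", "host-a:60020");
        System.out.println(sketch.resolve("TestTable,,1213074650399", 3));
    }
}
```

The eviction is explicit here instead of being a side effect of an exception, but the sketch still depends on noticing that an address is wrong, which is the hard part in the UI case.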

Now, in this issue, the cached info coming back from HRegionInfo.getValue().getBindAddress() is also incorrect.  As with HBASE-727, the data isn't consistently incorrect either - sometimes the address that comes back is correct, and sometimes it isn't.  Sometimes after a few page refreshes it gets it right, and sometimes it continues to be wrong indefinitely.  

There are a number of things that can be done, both for this particular issue and for the larger problem in general.  

For HBASE-679:
 * I can modify table.jsp to fail a bit more gracefully when the wrong address comes back.  We could even put up a picture of the twitter fail whale: http://www.pestaola.gr/img1/twitter-whale.png (just kidding!)
 * We can leave everything as is and just let the user get an anonymous error code 500 in certain cases.  Maybe it's better to tell them nothing if it isn't working?
 * I can try to make some kind of a hacky fix.  I'm not sure this will work though, since throwing an exception from inside a .jsp page won't really do anything.  The fix would have to be in the Java code, but then there is still the question of how to determine whether the address is right or not.
 * We can punt on the issue until the overall problem is fixed (see below).

For the greater problem of incorrect caching:
 * I would ask why we are caching this information in the first place.  Information like regionserver host:port is likely to change if the user is shutting down HBase and restarting.  There needs to be some way for every node to be informed of fresh location data when HBase is started.  Admittedly, I am still incredibly naive when it comes to the inner workings and practicalities of HBase, and this is probably a lot easier said than done.  I'd imagine that this is something that would get pushed to 0.3.  Still, I maintain that this is something worth doing.  This problem continues to manifest itself in interesting ways all over the UI, and if we really want the UI to be a reliable reflection of the data in HBase, we need to make this fix.
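
One way to picture "fresh location data on start" is to stamp each cached entry with a generation number and bump it whenever a restart is observed, so that older entries stop being trusted.  This is only a sketch under assumptions: GenerationCacheSketch and onClusterRestart() are hypothetical names, not HBase APIs, and how a node would actually learn of a restart is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names are HBase APIs.
class GenerationCacheSketch {
    /** A cached address together with the generation in which it was recorded. */
    private static final class Entry {
        final String address;
        final long generation;

        Entry(String address, long generation) {
            this.address = address;
            this.generation = generation;
        }
    }

    private final AtomicLong generation = new AtomicLong();
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Records an address under the current generation. */
    void put(String region, String address) {
        entries.put(region, new Entry(address, generation.get()));
    }

    /** Returns the cached address, or null if absent or recorded before the last restart. */
    String get(String region) {
        Entry entry = entries.get(region);
        if (entry == null || entry.generation != generation.get()) {
            return null;
        }
        return entry.address;
    }

    /** Assumed restart signal: invalidates every entry recorded so far. */
    void onClusterRestart() {
        generation.incrementAndGet();
    }
}
```

A caller that gets null back would fall through to an authoritative lookup and then put() the result, so a stale address is never served after a restart has been seen.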

> Regionserver addresses are still not right in the new tables page
> -----------------------------------------------------------------
>
>                 Key: HBASE-679
>                 URL: https://issues.apache.org/jira/browse/HBASE-679
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Izaak Rubin
>         Attachments: ms.patch, ms_revised.patch
>
>
> They are mostly right.
> I'm guessing it's a stale cache of regions in the client hosted by the UI.  If the webserver ran a scan, it'd probably fix it all up but that's a bit messy.  I tried using the address that is in the .META. table directly but that doesn't work... we don't seem to deploy the table properly and the UI complains "No server address for row TestTable,,1213074650399".  I'll attach my patch.
