Posted to user@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2012/12/01 08:54:51 UTC

Re: Recovery from cluster wide failure

HTablePool has to go away...

To survive cluster failures and get the best performance, do this:
1. Create the HConnection yourself with HConnectionManager.createConnection(...). This gives you an "unmanaged" connection, which represents a cluster. You need to remember to close it eventually.
2. Create a single ExecutorService.
3. Now, when you need to perform an operation, create an HTable using the HTable(byte[], HConnection, ExecutorService) constructor, perform the operation, then close it and throw it away (close is actually a no-op in this case, but it should still be called). This constructor is extremely cheap (assuming your HConnection has already cached the region locations for the table).
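
The lifecycle in the three steps above can be sketched roughly as follows. This is only a model of the pattern, not HBase client code: ClusterConnection and TableHandle are hypothetical stand-ins for HConnection and HTable, the in-memory map stands in for the cluster, and the pool size of 4 is an arbitrary assumption, so that the example runs without an HBase dependency.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UnmanagedConnectionSketch {

    // Hypothetical stand-in for an unmanaged HConnection: long-lived,
    // represents one cluster, and must be closed by whoever created it.
    static final class ClusterConnection implements AutoCloseable {
        private final Map<String, String> cells = new ConcurrentHashMap<>();
        private volatile boolean closed;

        boolean isClosed() { return closed; }

        @Override
        public void close() { closed = true; }
    }

    // Hypothetical stand-in for an HTable created with the
    // HTable(byte[], HConnection, ExecutorService) constructor.
    static final class TableHandle implements AutoCloseable {
        private final String tableName;
        private final ClusterConnection connection;
        private final ExecutorService pool;

        // Cheap to construct: it only stores references and owns
        // neither the connection nor the executor.
        TableHandle(String tableName, ClusterConnection connection, ExecutorService pool) {
            this.tableName = tableName;
            this.connection = connection;
            this.pool = pool;
        }

        void put(String row, String value) throws Exception {
            call(() -> connection.cells.put(tableName + "/" + row, value));
        }

        String get(String row) throws Exception {
            return call(() -> connection.cells.get(tableName + "/" + row));
        }

        // Runs the work on the shared executor; fails fast if the
        // shared connection has already been closed.
        private String call(Callable<String> work) throws Exception {
            if (connection.isClosed()) {
                throw new IllegalStateException("connection is closed");
            }
            return pool.submit(work).get();
        }

        // No-op: the shared connection and executor stay open.
        @Override
        public void close() { }
    }

    static String demo() throws Exception {
        ClusterConnection connection = new ClusterConnection();  // step 1
        ExecutorService pool = Executors.newFixedThreadPool(4);  // step 2
        try {
            // Step 3: a fresh, throwaway handle per operation.
            try (TableHandle table = new TableHandle("t1", connection, pool)) {
                table.put("row1", "v1");
            }
            try (TableHandle table = new TableHandle("t1", connection, pool)) {
                return table.get("row1");
            }
        } finally {
            pool.shutdown();
            connection.close();  // the creator closes the connection, eventually
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints "v1"
    }
}
```

The point of the shape is that only the connection and the executor are long-lived and owned by the application; the per-operation handle holds no state of its own, which is why it can be created and discarded freely.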


I added this code for three reasons: a client can outlive the cluster it connects to; creating a new HTable this way each time is actually faster than retrieving one from, and returning it to, the HTablePool; and the caller controls things, rather than relying on a Byzantine caching of HTables in HTablePool.


See HBASE-4805.
In the end HConnectionImplementation should maintain the ExecutorService and have a getTable method, but that would just be syntactic sugar.


-- Lars



________________________________
 From: Bryan Baugher <bj...@gmail.com>
To: user <us...@hbase.apache.org> 
Sent: Friday, November 30, 2012 12:51 PM
Subject: Re: Recovery from cluster wide failure
 
I am honestly a little confused by what Igor's factory/table gets you as it
only seems to be checking if the table is closed and affecting the close()
logic.

The way I see it there are 3 things that need to be fixed in order to get
this to work for HTablePool.

1. The pool needs a way to determine if a table is invalid and not add it
back in when consumers call close()
2. HTable/HTablePool needs a way to proactively remove closed/stale
HConnections from HConnectionManager
3. The constructors for HTable that do not take in an HConnection need to
delete the connection when an exception occurs

If the improvements described above are implemented, then 2 is already taken
care of, since htable.close() deletes the connection, and similarly for 3.

1 is the hardest as HTablePool has no concept of an HTable or HConnection.
My original idea was to add a method to HTableInterfaceFactory for checking
if the table is valid and an additional method added to HTable for checking
if the connection is closed or aborted, but even that seems awkward.
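
To make the idea for 1 concrete, here is a rough sketch of a pool that asks
a validity check before taking an object back. Everything in it is
hypothetical: ValidatingPool, FakeTable and the connectionAborted flag are
illustrative names, not existing HTablePool or HTableInterfaceFactory API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

class ValidatingPoolSketch {

    // Hypothetical pool: the validity check is supplied from outside, the
    // way a factory method could supply it for tables.
    static final class ValidatingPool<T> {
        private final Deque<T> idle = new ArrayDeque<>();
        private final Supplier<T> factory;
        private final Predicate<T> isValid;
        private final int maxIdle;

        ValidatingPool(Supplier<T> factory, Predicate<T> isValid, int maxIdle) {
            this.factory = factory;
            this.isValid = isValid;
            this.maxIdle = maxIdle;
        }

        // Hands out an idle object if a valid one exists, otherwise a new one.
        // Objects that went stale while idle are silently dropped.
        synchronized T borrow() {
            T candidate;
            while ((candidate = idle.poll()) != null) {
                if (isValid.test(candidate)) {
                    return candidate;
                }
            }
            return factory.get();
        }

        // Returns true if the object was pooled again, false if it was
        // discarded because it is invalid or the pool is full.
        synchronized boolean giveBack(T object) {
            if (!isValid.test(object) || idle.size() >= maxIdle) {
                return false;
            }
            idle.push(object);
            return true;
        }

        synchronized int idleCount() {
            return idle.size();
        }
    }

    // Hypothetical table whose underlying connection may have been aborted.
    static final class FakeTable {
        volatile boolean connectionAborted;
    }

    public static void main(String[] args) {
        ValidatingPool<FakeTable> pool =
            new ValidatingPool<>(FakeTable::new, t -> !t.connectionAborted, 2);

        FakeTable stale = pool.borrow();
        stale.connectionAborted = true;              // cluster went away
        System.out.println(pool.giveBack(stale));    // prints "false"
        System.out.println(pool.idleCount());        // prints "0"
    }
}
```

The awkward part remains: the pool still needs someone to tell it what
"valid" means for a given table.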


On Fri, Nov 30, 2012 at 2:16 PM, Stack <st...@duboce.net> wrote:

> On Fri, Nov 30, 2012 at 8:56 AM, Bryan Baugher <bj...@gmail.com> wrote:
>
> > Unfortunately it does not seem like HTable or HTablePool have any logic to
> > tell the HConnectionManager the connection is stale and I don't believe you
> > can rely on all of the clients giving back the connection at the same time
> > in order to solve this issue.
> >
> > So I have a couple questions,
> >
> > 1. Since HConnectionImplementation understands if it is being managed or
> > not, would it make sense for it to remove itself from the
> > HConnectionManager cache when abort(String, Throwable) is called via
> > deleteStaleConnection(..)? Notice that the close() method currently does
> > something similar.
> >
> >
> Sounds right, yes.
>
>
>
> > 2. Should HConnectionManager delete connections that are closed/aborted and
> > have been passed back to it via the deleteConnection methods?
> >
> >
> Also sounds like the right thing to do.
>
>
>
> > Although I wish I had a junit that could show this, I also believe that a
> > HConnectionImplementation can become aborted during construction. We saw
> > this happening while the cluster services were down, HConnectionManager
> > would retrieve a new HConnection but it would come to us already
> > closed/aborted.
> >
> > There are a couple other issues with HTablePool[1] and dealing with this
> > issue but these behaviors seem like they would need to be addressed first.
> >
> > [1] - https://issues.apache.org/jira/browse/HBASE-6956
> >
> >
> What do you think of what Igor pasted into the issue?
>
> St.Ack
>



-- 
-Bryan