Posted to user@hbase.apache.org by Rodrick Megraw <re...@hotmail.com> on 2010/03/11 23:10:26 UTC

Live table switching



Hi,
 
I am building a web service that looks up data
from several HBase tables and returns a result. There will be an hourly
batch process that generates new versions of the tables, and, when
they’re available, the web service should switch over to using them. The
switchover cannot introduce any interruption or latency into the web
service.
 
I have considered that if the new version of a table has the exact same
column families as the old table, the batch process could simply update
the old table while it is still in use by the service, and then do a
delete scan (to delete rows that are in the old version of the table but
not in the new version). But I believe I have a case where column
families could be added or deleted in the new table version, which is
why I’m looking at creating a new table and switching over to it.
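
To make the in-place option concrete, here is a rough sketch of the delete-scan step as a set difference over row keys. It uses plain Java sorted sets as stand-ins for the old and new table versions. The class and method names are hypothetical, and a real job would presumably scan HBase instead of holding all keys in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: finds row keys that exist in the
// old table version but not in the new one, i.e. the rows a delete scan
// would have to remove after the in-place update.
class StaleRowFinder {

    // Walks the old keys in sorted order and keeps those missing from newKeys.
    static List<String> staleKeys(SortedSet<String> oldKeys, SortedSet<String> newKeys) {
        List<String> stale = new ArrayList<>();
        for (String key : oldKeys) {
            if (!newKeys.contains(key)) {
                stale.add(key);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        SortedSet<String> oldKeys = new TreeSet<>(Arrays.asList("a", "b", "c", "d"));
        SortedSet<String> newKeys = new TreeSet<>(Arrays.asList("a", "c", "e"));
        // Rows "b" and "d" are only in the old version, so they get deleted.
        System.out.println(staleKeys(oldKeys, newKeys));
    }
}
```

This only covers row-level cleanup. It does nothing for the case where column families are added or removed, which is the reason for considering a separate table at all.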
 
What’s the preferred strategy for implementing
this type of table switching? Is a non-blocking table rename possible?
If not, what’s better?
 
Thanks much.

 		 	   		  

RE: Live table switching

Posted by Rodrick Megraw <re...@hotmail.com>.
Makes sense. No magic, but not hard to implement. For now I'm only dealing with a single client, so I dodge the issue you point out about many simultaneous client hits. Thanks much.

> Date: Thu, 11 Mar 2010 14:23:04 -0800
> Subject: Re: Live table switching
> From: stack@duboce.net
> To: hbase-user@hadoop.apache.org
> 
 		 	   		  

Re: Live table switching

Posted by Stack <st...@duboce.net>.
On Thu, Mar 11, 2010 at 2:10 PM, Rodrick Megraw <re...@hotmail.com> wrote:
>
> I am building a web service that looks up data
> from several HBase tables and returns a result. There will be an hourly
> batch process that generates new versions of the tables, and, when
> they’re available, the web service should switch over to using them. The
> switchover cannot introduce any interruption or latency into the web
> service.
>

This would be a sweet feature for hbase to have.


> I have considered that if the new version of a
> table has the exact same column families as the old table, the batch
> process could simply update the old table while it is still in use by the service, and then do a delete scan (to delete rows
> that are in the old version of the table but not in the new version). But I believe I have a case
> where column families could be added or deleted in the new table
> version, which is why I’m looking at creating a new table and switching
> over to it.
>
> What’s the preferred strategy for implementing
> this type of table switching? Is a non-blocking table rename possible?
> If not, what’s better?
>
> Thanks much.
>

Here are a few comments.

You can't switch out the data from under a running table, not easily
at least, because the regionservers have all of the current data files
open.  There is also the content held in memory: in your case the cache,
and probably not the memstore, since it seems you are serving reads
only.  That in-memory content would need to be cleared whenever data is
swapped out from under the table.

You could add a completely new table by putting the files into place
in HDFS and then using a script like bin/add_table.rb to add the new
table's regions to the catalog table.  This can be made to run quickly,
in seconds.  Then you'd kick clients to start using the new table
instead.  We should talk about getting the clients to quickly fill
their local cache of the new table's region locations, both to keep
latencies to a minimum and to avoid all clients rushing to the catalog
table at the same time to find the location of every region in the new
table.  Once all clients have been transitioned, you could remove the
old table, and so on every hour.
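
The client side of that switch could look roughly like the sketch below. It is plain Java with no HBase calls. The class name, the warm-up callback and the table names are all made up for illustration, and the warm-up step is only assumed to be something like touching each region of the new table so that its location gets cached locally.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical client-side holder for the name of the table being served.
class ActiveTableRef {
    private final AtomicReference<String> active;

    ActiveTableRef(String initialTable) {
        active = new AtomicReference<>(initialTable);
    }

    // Request threads call this on every lookup; it never blocks.
    String current() {
        return active.get();
    }

    // Runs the warm-up against the new table first, then flips the reference
    // in one atomic step. Returns the previous table name so the caller can
    // remove that table once every client has moved over.
    String switchTo(String newTable, Consumer<String> warmUp) {
        warmUp.accept(newTable);
        return active.getAndSet(newTable);
    }

    // Random delay in [0, spreadMillis) so that many clients do not all go
    // to the catalog table at the same moment.
    static long jitterMillis(long spreadMillis) {
        if (spreadMillis <= 0) {
            return 0L;
        }
        return ThreadLocalRandom.current().nextLong(spreadMillis);
    }

    public static void main(String[] args) {
        ActiveTableRef ref = new ActiveTableRef("lookup_v1");
        String old = ref.switchTo("lookup_v2",
                table -> System.out.println("warming " + table));
        System.out.println("now serving " + ref.current() + ", can drop " + old);
    }
}
```

Readers only ever see the old name or the new name, never a half-switched state, and the warm-up cost is paid before the flip and not on a live request. With several clients, each one would sleep for its own jitterMillis value before warming up.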

St.Ack