Posted to dev@hbase.apache.org by Schubert Zhang <zs...@gmail.com> on 2011/01/20 17:56:26 UTC

Online modify table schema (e.g. AddColumnFamily, DeleteColumnFamily, etc) and other features

I have the following questions about some features that my applications
need.

1. The current HBase releases (0.89, 0.90) do not support modifying a table
schema online.
     To add or delete a ColumnFamily, I must first disable (offline) the
table.
     Is there a plan to support online modification?
     I think this feature becomes possible once the table schema info is
moved from the RegionServer into ZooKeeper.

2. Different memstore (memtable) size and region size for different tables.
    This feature is useful when different tables serve different
applications.
     Is there a plan to support it?

3. Periodic flush.
    Current HBase only flushes a memstore to an HFile based on memory size
thresholds.
    If a Region-Store is quiet for a long time (such as 2 hours) in a
write-dense application, a periodic flush could free more memory.
    Is there a plan to support it?

Schubert Zhang

RE: Online modify table schema (e.g. AddColumnFamily, DeleteColumnFamily, etc) and other features

Posted by Jonathan Gray <jg...@fb.com>.
> I have following questions about some features which are needed in my
> applications.
> 
> 1. The current HBase release(0.89, 0.90 ~) does not support to modify table
> schema online.
>      If I want add or delete a ColumnFamily, I must disable/offline the table
> firstly.
>      Is there a plan to support online modification?
>      I think after the table schema info is moved into ZooKeeper from
> RegionServer, this feature is possible to be supported.

This is in the plans and exactly as you describe.  We are planning to move table schema information into ZooKeeper and then support online schema modifications.

I'm unsure if this is currently targeted at 0.92 or not, but I think the plan is to be aggressive on releasing 0.92 soon.  This would represent a reasonably large change so might get punted into 0.94 but hopefully no later than that.

Of course, this is an open source project, so if there are developers who need this then all contributions are welcome.  There are some open JIRAs related to this, and I think a good bit of code is already written (but based on the old master).

See HBASE-1730 for more info...

https://issues.apache.org/jira/browse/HBASE-1730


> 2. Different memstore/memtable-size and region-size for different table.
>     This feature is useful for different tables for different applications.
>      Is there a plan to support it?

I agree that these can be useful as per-table settings (and, where applicable, per-family settings).

I'm not aware of anyone currently looking into these specific settings, but again, it just needs a developer to go after it and implement it :)


> 
> 3. Periodic flush.
>     Current HBase only flush memstore to HFile by memory size thresholds.
>     If a Region-Store is quiet for a long time (such as 2 hours) for a write-dense
> application, the periodic flush can free more memory.
>     Is there a plan to support it?

I'm unsure what you're describing.  If there is a write-dense application, then MemStores will be flushed as they fill up, or as the total MemStore usage across all regions goes above the maximum capacity allowed for the available heap.  With a write-heavy workload, MemStores should constantly be filling and constantly be flushed.
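
The two triggers described above can be modeled roughly as follows. This is a minimal sketch, not HBase code: the names `Region`, `regions_to_flush`, `flush_size` and `global_limit` are hypothetical, and picking the largest MemStore first under global pressure is a simplifying assumption of the sketch.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Region:
    """Hypothetical stand-in for a region and its current MemStore usage."""
    name: str
    memstore_bytes: int


def regions_to_flush(regions, flush_size, global_limit):
    """Return region names to flush, in the order this model picks them."""
    # Trigger 1: a MemStore that reached the per-region flush size.
    to_flush = [r.name for r in regions if r.memstore_bytes >= flush_size]
    remaining = [r for r in regions if r.memstore_bytes < flush_size]
    total = sum(r.memstore_bytes for r in remaining)
    # Trigger 2: aggregate usage still above the global limit.
    # Assumption of this sketch: flush the largest MemStores first.
    for r in sorted(remaining, key=lambda r: r.memstore_bytes, reverse=True):
        if total <= global_limit:
            break
        to_flush.append(r.name)
        total -= r.memstore_bytes
    return to_flush


regions = [
    Region("a", 70 * MB),
    Region("b", 30 * MB),
    Region("c", 20 * MB),
    Region("d", 5 * MB),
]
picked = regions_to_flush(regions, flush_size=64 * MB, global_limit=40 * MB)
```

In this model an idle region with a small MemStore is only flushed when aggregate pressure reaches it, which is the case the periodic-flush question is about.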

What would be the benefit of flushing when there have been no writes for a certain period?  Is it to free aggregate MemStore space for frequently written regions when less frequently written regions are idle but still taking up MemStore space?

JG

Re: Online modify table schema (e.g. AddColumnFamily, DeleteColumnFamily, etc) and other features

Posted by Schubert Zhang <zs...@gmail.com>.
Thank you so much, Jonathan Gray and Jean-Daniel Cryans,

1. We will do more research and try to do something about the
AddColumn/DeleteColumn issue.

2. My mistake: I failed to check the HTableDescriptor metadata.
    It's great that we can use MAX_FILESIZE and MEMSTORE_FLUSHSIZE to tune
each table separately.

3. Yes, the total memstore heap threshold and HLog.cleanOldLogs can
flush the old memstores.
    My scenario is:
    The columnfamilies/stores are defined by time ranges (such as days).
So, yesterday's columnfamilies/stores will be quiet today.
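
To make the scenario concrete, here is a minimal sketch of what an idle-time flush check could look like. It is not an existing HBase feature or API: the function `idle_stores`, its parameters, and the day-based family names are all hypothetical.

```python
def idle_stores(last_write_time, memstore_bytes, now, max_idle_seconds):
    """Return stores holding unflushed data that have seen no write
    for at least max_idle_seconds (all times in seconds)."""
    return sorted(
        name
        for name, written_at in last_write_time.items()
        if memstore_bytes.get(name, 0) > 0
        and now - written_at >= max_idle_seconds
    )


# Hypothetical day-based families: yesterday's went quiet, today's is active.
last_write = {"d20110119": 0, "d20110120": 9000, "d20110118": 0}
unflushed = {"d20110119": 10, "d20110120": 20, "d20110118": 0}

candidates = idle_stores(last_write, unflushed, now=10000, max_idle_seconds=7200)
```

Only yesterday's family is selected: today's family is still being written, and the older family has nothing left in memory to flush.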

Thank you again.
Schubert Zhang
On Fri, Jan 21, 2011 at 2:48 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> To complement what jgray answered...
>
> >
> > 2. Different memstore/memtable-size and region-size for different table.
> >    This feature is useful for different tables for different
> applications.
> >     Is there a plan to support it?
>
> Have you taken a look at the table properties? In the shell you can
> tune MAX_FILESIZE and MEMSTORE_FLUSHSIZE to do exactly what you
> described.
>
> >
> > 3. Periodic flush.
> >    Current HBase only flush memstore to HFile by memory size thresholds.
> >    If a Region-Store is quiet for a long time (such as 2 hours) for a
> > write-dense application, the periodic flush can free more memory.
> >    Is there a plan to support it?
>
> If a region isn't written to, and there's a write-heavy job going on,
> then as the HLogs get created it's easy to reach the max number of
> HLogs a RS can get (32 by default) and after that point the memstores
> for the oldest edits will be flushed. If it's really inserting a lot
> of data, I expect that it can take less than an hour to force flush
> unused memstores. Check out HLog.cleanOldLogs and
> HLog.findMemstoresWithEditsEqualOrOlderThan
>
> It's not really periodic or easily configurable tho.
>
> J-D
>

Re: Online modify table schema (e.g. AddColumnFamily, DeleteColumnFamily, etc) and other features

Posted by Jean-Daniel Cryans <jd...@apache.org>.
To complement what jgray answered...

>
> 2. Different memstore/memtable-size and region-size for different table.
>    This feature is useful for different tables for different applications.
>     Is there a plan to support it?

Have you taken a look at the table properties? In the shell you can
tune MAX_FILESIZE and MEMSTORE_FLUSHSIZE to do exactly what you
described.
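
As a rough model of how such per-table attributes relate to the cluster-wide settings, consider the sketch below. It is not HBase code: `effective_setting` is a hypothetical helper, and the numeric values in `CLUSTER_CONF` are illustrative rather than asserted defaults. The attribute names come from this thread, and the two configuration keys are the cluster-wide settings they correspond to.

```python
MB = 1024 * 1024

# Cluster-wide values; the numbers are illustrative only.
CLUSTER_CONF = {
    "hbase.hregion.memstore.flush.size": 64 * MB,
    "hbase.hregion.max.filesize": 256 * MB,
}

# Which cluster-wide key each table attribute overrides.
TABLE_ATTR_TO_CONF = {
    "MEMSTORE_FLUSHSIZE": "hbase.hregion.memstore.flush.size",
    "MAX_FILESIZE": "hbase.hregion.max.filesize",
}


def effective_setting(table_attrs, attr, conf=CLUSTER_CONF):
    """Use the table-level value when set, else the cluster-wide one."""
    if attr in table_attrs:
        return int(table_attrs[attr])
    return conf[TABLE_ATTR_TO_CONF[attr]]


# A table tuned for heavy writes next to one left on cluster settings.
write_heavy = {"MEMSTORE_FLUSHSIZE": "134217728"}
untuned = {}

flush_size = effective_setting(write_heavy, "MEMSTORE_FLUSHSIZE")
max_file = effective_setting(untuned, "MAX_FILESIZE")
```

The point of the model is only that each table can carry its own value while untouched tables keep following the cluster configuration.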

>
> 3. Periodic flush.
>    Current HBase only flush memstore to HFile by memory size thresholds.
>    If a Region-Store is quiet for a long time (such as 2 hours) for a
> write-dense application, the periodic flush can free more memory.
>    Is there a plan to support it?

If a region isn't written to, and there's a write-heavy job going on,
then as the HLogs get created it's easy to reach the max number of
HLogs an RS can have (32 by default), and after that point the memstores
holding the oldest edits will be flushed. If it's really inserting a lot
of data, I expect that it can take less than an hour to force-flush
unused memstores. Check out HLog.cleanOldLogs and
HLog.findMemstoresWithEditsEqualOrOlderThan.

It's not really periodic or easily configurable tho.
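
The log-count mechanism can be modeled roughly as below. This is a simplified sketch of the idea, not the HLog implementation: the function name `regions_blocking_oldest_log` and its data layout (one highest sequence id per log file, one oldest unflushed sequence id per region) are assumptions made for illustration.

```python
def regions_blocking_oldest_log(log_max_seq_ids, oldest_unflushed_seq, max_logs=32):
    """Return regions that must flush before the oldest log can be dropped.

    log_max_seq_ids: highest sequence id in each log file, oldest file first.
    oldest_unflushed_seq: region name -> sequence id of its oldest edit
    that is still only in the memstore.
    """
    # Below the log limit nothing is forced.
    if len(log_max_seq_ids) <= max_logs:
        return []
    # Any region with an unflushed edit at or before the end of the oldest
    # log still needs that log for recovery, so it has to flush.
    cutoff = log_max_seq_ids[0]
    return sorted(
        region
        for region, seq in oldest_unflushed_seq.items()
        if seq <= cutoff
    )


logs = [100, 200, 300, 400]
unflushed = {"idle": 50, "also_idle": 100, "busy": 350}

forced = regions_blocking_oldest_log(logs, unflushed, max_logs=3)
```

Here the two idle regions are forced to flush because their old edits pin the oldest log, while the busy region is left alone. How soon that happens depends on how fast the write-heavy job rolls logs, which is why it is not periodic.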

J-D