Posted to user@hbase.apache.org by y_...@tsmc.com on 2009/06/23 02:46:47 UTC

Katta for secondary index?

Hi there,

HBase accesses data only by key, right?
Does anybody use HBase + Katta (for a secondary index)? Does it work?
We just want to transfer part of our Oracle table data to HBase
for multi parallel computing.
Any suggestions would be appreciated!
Thank you

Fleming
 --------------------------------------------------------------------------- 
                                                         TSMC PROPERTY       
 This email communication (and any attachments) is proprietary information   
 for the sole use of its                                                     
 intended recipient. Any unauthorized review, use or distribution by anyone  
 other than the intended                                                     
 recipient is strictly prohibited.  If you are not the intended recipient,   
 please notify the sender by                                                 
 replying to this email, and then delete this email and any copies of it     
 immediately. Thank you.                                                     
 ---------------------------------------------------------------------------

Re: Katta for secondary index?

Posted by stack <st...@duboce.net>.

On Mon, Jun 22, 2009 at 5:46 PM, <y_...@tsmc.com> wrote:

> Hi there,
>
> HBase accesses data only by key, right?
> Does anybody use HBase + Katta (for a secondary index)? Does it work?

Katta works, but it's just a means of distributing Lucene indices.  You need
to build the indices first.  Have you checked out the BuildTableIndex MapReduce
job in HBase?  It indexes table contents.  The index is sharded by the
number of reducers you run.  Perhaps you can have Katta deploy its output
for you?  The indices it makes may not be what you want for secondary
lookups, but you could adapt BuildTableIndex.
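
To illustrate "sharded by the number of reducers", here is a minimal sketch
of how rows could end up in N index shards.  It uses the same arithmetic as
Hadoop's default HashPartitioner (clear the sign bit, then take the remainder
by the reducer count).  Whether BuildTableIndex relies on that default
partitioner is an assumption here, and ShardSketch, shardFor and shard are
hypothetical names; plain String keys stand in for the real row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count.
    static int shardFor(String rowKey, int numReducers) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Group row keys into one list per reducer; each list stands in for
    // the index shard that reducer would write.
    static List<List<String>> shard(List<String> rowKeys, int numReducers) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            shards.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            shards.get(shardFor(key, numReducers)).add(key);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c", "row-d", "row-e");
        List<List<String>> shards = shard(keys, 3);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i));
        }
    }
}
```

The point for Katta is that each reducer yields one self-contained index
directory, so running more reducers gives more, smaller shards to distribute.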

Does the table change frequently?  Is a batch job to redo the index OK with
you?  In TRUNK you can run a scan that only finds records created after a
certain date, so you could add incremental indices and then do the full build
of the index at some lesser frequency.
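
A rough sketch of the incremental idea, using an in-memory list instead of a
real table: remember when the last index build ran, and on the next run pick
up only the records written since then.  In the HBase client I believe the
equivalent is a time range set on the scan (Scan.setTimeRange), but this
sketch does not call HBase at all; IncrementalIndexSketch, Record and
changedSince are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IncrementalIndexSketch {
    // A row key plus the time (ms since epoch) the record was written.
    static final class Record {
        final String rowKey;
        final long createdAt;

        Record(String rowKey, long createdAt) {
            this.rowKey = rowKey;
            this.createdAt = createdAt;
        }
    }

    // Return the row keys of records written at or after 'since'.
    // These are the only rows an incremental index build needs to visit.
    static List<String> changedSince(List<Record> table, long since) {
        List<String> changed = new ArrayList<>();
        for (Record r : table) {
            if (r.createdAt >= since) {
                changed.add(r.rowKey);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Record> table = Arrays.asList(
            new Record("row-1", 1000L),
            new Record("row-2", 2000L),
            new Record("row-3", 3000L));
        long lastBuild = 2000L; // time of the previous index build
        System.out.println("to index: " + changedSince(table, lastBuild));
    }
}
```

Each run produces a small extra index covering only the new rows; the less
frequent full build then replaces the accumulated increments.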

There is also the experimental tableindexed subclass of HBase that keeps
up a secondary table as an index using transactional HBase, so the insert into
the primary and secondary tables is done as a single transaction (it's not yet
in trunk but should be there soon).
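
The shape of that approach, as a toy sketch only: two in-memory maps stand in
for the primary and secondary tables, and a synchronized method stands in for
the transaction that keeps them in step.  This is not the tableindexed API;
SecondaryIndexSketch, put and lookup are hypothetical names, and a single
indexed column value per row is assumed.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SecondaryIndexSketch {
    // Primary "table": row key -> indexed column value.
    private final Map<String, String> primary = new TreeMap<>();
    // Secondary "table": column value -> row keys holding that value.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Update both maps together; 'synchronized' stands in for the
    // transaction that would span the primary and secondary tables.
    synchronized void put(String rowKey, String value) {
        String old = primary.put(rowKey, value);
        if (old != null) {
            // Drop the stale index entry left by the previous value.
            Set<String> rows = index.get(old);
            if (rows != null) {
                rows.remove(rowKey);
                if (rows.isEmpty()) {
                    index.remove(old);
                }
            }
        }
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    // Secondary lookup: find row keys by column value rather than by key.
    synchronized Set<String> lookup(String value) {
        Set<String> rows = index.get(value);
        return rows == null ? Collections.<String>emptySet() : new TreeSet<>(rows);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        table.put("r1", "red");
        table.put("r2", "red");
        table.put("r1", "blue"); // r1 moves from "red" to "blue" in the index
        System.out.println("red:  " + table.lookup("red"));
        System.out.println("blue: " + table.lookup("blue"));
    }
}
```

The trade-off against the batch route is that the index is always current,
at the cost of an extra write on every insert.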

St.Ack

> We just want to transfer part of our Oracle table data to HBase
> for multi parallel computing.
> Any suggestions would be appreciated!
> Thank you
>
> Fleming