Posted to user@hbase.apache.org by Michael Segel <mi...@hotmail.com> on 2010/03/08 21:13:04 UTC

Are transactional indexes and indexed indexes incompatible?

Hi,

There are two ways one can add secondary indexing to HBase.

One is from $HBASE_HOME/contrib/transactional/*.jar and the other is from $HBASE_HOME/contrib/indexed/*.jar

To set up transactional, you need the following in hbase-site.xml:
<!-- Indexing Setup -->
  <property>
    <name>hbase.regionserver.class</name>
    <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    <description> Required for secondary indexing. </description>
  </property>
  <property>
    <name>hbase.regionserver.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
    <description> Required for secondary indexing. </description>
  </property>

and for Indexing we set the following:

  <property>
    <name>hbase.hregion.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.IdxRegion</value>
  </property>

So it looks like they can co-exist. Is this not the case?
Thx
-Mike



 		 	   		  

Re: Are transactional indexes and indexed indexes incompatible?

Posted by Dan Washusen <da...@reactive.org>.
They are two fairly different beasts.  In a nutshell...

The "transactional" contrib (THBase) creates a secondary table and uses the
source table's column value as the key and the source table's row key as the
value.  Each time you write to the source table, additional writes are
performed to the secondary tables.  To perform a query, it scans the secondary
table and then performs a series of gets on the source table.  IMO, this 1+n
approach would only be suitable if you are sure each indexed scan returns a
smallish result set.  Unlike core HBase, rows returned from an indexed scan
are in key order of the secondary "index" table (i.e. ordered by the source
table's column value).  It's my understanding that you can't apply filters
to the source table in an indexed scan and that you can only use one index
at a time.  Another thing to consider is that the indexed functionality of
THBase is built on top of the transactional extension of HBase...
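
To make the 1+n read pattern concrete, here is a minimal in-memory sketch.
It is not THBase code and uses no HBase classes: the class
SecondaryTableIndexSketch, its methods, and the choice to append the row key
to the index key (so that equal column values do not collide) are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a source table plus one secondary "index" table.
// Nothing here is a real HBase/THBase API; sorted maps play the role of tables.
final class SecondaryTableIndexSketch {
    private static final char SEP = '\u0000';

    // Source table: row key -> value of the single indexed column.
    private final NavigableMap<String, String> source = new TreeMap<>();
    // Index table: (column value + SEP + row key) -> source row key.
    private final NavigableMap<String, String> index = new TreeMap<>();

    // Counts the point lookups made against the source table (the "n" in 1+n).
    int sourceGets = 0;

    // Every write to the source table triggers extra writes to the index table.
    void put(String rowKey, String columnValue) {
        String old = source.put(rowKey, columnValue);
        if (old != null) {
            index.remove(old + SEP + rowKey); // drop the stale index entry
        }
        index.put(columnValue + SEP + rowKey, rowKey);
    }

    // Range query on the indexed column, [fromValue, toValue):
    // one scan of the index table, then one get per hit on the source table.
    List<String> indexedScan(String fromValue, String toValue) {
        List<String> result = new ArrayList<>();
        for (String rowKey : index.subMap(fromValue, true, toValue, false).values()) {
            sourceGets++;
            result.add(rowKey + "=" + source.get(rowKey));
        }
        return result;
    }

    public static void main(String[] args) {
        SecondaryTableIndexSketch t = new SecondaryTableIndexSketch();
        t.put("row1", "cherry");
        t.put("row2", "apple");
        t.put("row3", "banana");
        t.put("row4", "date");
        // Results come back ordered by column value, not by source row key.
        System.out.println(t.indexedScan("b", "d"));
        System.out.println("gets on source table: " + t.sourceGets);
    }
}
```

In this sketch the cost of a query grows with the number of index hits,
which is why the approach suits scans that return a smallish result set.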

The "indexed" contrib (IHBase) honors the core HBase scan functionality.
Rows are returned in key order, Scan.startRow and Scan.stopRow are honored,
filters are applied, etc.  IHBase creates indices *in memory* for each
region (each region server will have many regions) and uses the core
scan functionality to apply filters etc.  The scan can then be executed with
index hints (that can include a complex graph of ands and ors).  If an index
hint is specified, then a hook in IHBase is called after each row is
processed by the scan to fast forward to the next potential match, skipping
all the rows that are known not to match.  Where possible, it's still up to
you as the client to specify start and/or stop rows with your scan to avoid
contacting every region.
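
The fast-forward idea can be sketched in memory as follows. This is not
IHBase code: the class InMemoryIndexScanSketch and its methods are
hypothetical, the "hint" is reduced to a simple OR of equality matches, and
the real contrib still runs the core scanner and filters, which this sketch
leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of one region with a per-region in-memory index.
// Nothing here is a real HBase/IHBase API.
final class InMemoryIndexScanSketch {
    // The region's rows: row key -> column value, sorted by row key.
    private final NavigableMap<String, String> rows = new TreeMap<>();
    // In-memory index: column value -> sorted set of row keys holding that value.
    private final Map<String, NavigableSet<String>> index = new TreeMap<>();

    // Counts rows the scan actually touched.
    int rowsVisited = 0;

    void put(String rowKey, String value) {
        String old = rows.put(rowKey, value);
        if (old != null) {
            index.get(old).remove(rowKey); // keep the index in step with the data
        }
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    // Scan [startRow, stopRow) with a hint "value is any of wantedValues".
    // Instead of walking every row, jump straight to the next candidate row key.
    List<String> scan(String startRow, String stopRow, List<String> wantedValues) {
        NavigableSet<String> candidates = new TreeSet<>();
        for (String v : wantedValues) {
            candidates.addAll(index.getOrDefault(v, new TreeSet<>()));
        }
        List<String> out = new ArrayList<>();
        String next = candidates.ceiling(startRow);   // honor the start row
        while (next != null && next.compareTo(stopRow) < 0) { // honor the stop row
            rowsVisited++;
            out.add(next + "=" + rows.get(next));
            next = candidates.higher(next);           // fast forward past non-matches
        }
        return out;
    }

    public static void main(String[] args) {
        InMemoryIndexScanSketch region = new InMemoryIndexScanSketch();
        String[] values = {"red", "blue", "red", "green", "blue", "red"};
        for (int i = 0; i < values.length; i++) {
            region.put("row" + (i + 1), values[i]);
        }
        // Results stay in row key order, unlike the secondary-table approach.
        System.out.println(region.scan("row2", "row6", List.of("red", "green")));
        System.out.println("rows visited: " + region.rowsVisited);
    }
}
```

Because the index lives with each region, the start and stop rows still
decide which regions get contacted; the hint only reduces the work done
inside each one.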

Another option is to use core HBase and add lookup tables for the use cases
that require it.

The options all have good and bad points and I'd say the answer to your
question is, it depends on your use case...

Cheers,
Dan

On 9 March 2010 07:53, Michael Segel <mi...@hotmail.com> wrote:

>
> So which is the better way to go?
>
> The transactional or indexed package?
>
> Our first use case is where we want to create a secondary index based on
> the values of a couple of fields and to store it in a column of a column
> family.
>
> Thx

RE: Are transactional indexes and indexed indexes incompatible?

Posted by Michael Segel <mi...@hotmail.com>.
So which is the better way to go?

The transactional or indexed package?

Our first use case is where we want to create a secondary index based on the values of a couple of fields and to store it in a column of a column family.

Thx


> Date: Mon, 8 Mar 2010 12:28:25 -0800
> Subject: Re: Are transactional indexes and indexed indexes incompatible?
> From: jdcryans@apache.org
> To: hbase-user@hadoop.apache.org
> 
> Yeah this is the downside of the way we currently do contribs, they
> are all exclusive.
> 
> J-D

 		 	   		  

Re: Are transactional indexes and indexed indexes incompatible?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yeah this is the downside of the way we currently do contribs, they
are all exclusive.

J-D
