Posted to user@hbase.apache.org by Marcus Herou <ma...@tailsweep.com> on 2008/07/14 21:36:36 UTC

Sorted columns

Hi guys.

A simple question: is only the row key sorted in HBase?

What if you would like to obtain a scanner based on another column? I
thought the automatic sorting was one of the reasons you would store,
for example, URLs in reversed form.

Have I misunderstood something?

We chose HBase as our database for storing a billion URLs, but not being
able to search efficiently makes the choice harder...

Kindly

//Marcus

-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Re: Sorted columns

Posted by Jean-Daniel Cryans <jd...@gmail.com>.
Well in fact it's a "sparse, distributed, persistent multidimensional sorted
map".
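
To make that phrase concrete, here is a toy in-memory sketch in Python. It is
not HBase code: the class name SortedCellMap, its methods, and the sample rows
are all made up for illustration. It only assumes that cells are addressed by
(row key, column, timestamp) and kept sorted on that key, newest version first.

```python
from bisect import bisect_left, insort


class SortedCellMap:
    """Toy model of a sorted map keyed by (row, column, timestamp).

    Hypothetical illustration only; this is not an HBase API.
    """

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> cell value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so the newest version sorts first.
        key = (row, column, -timestamp)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        # A 1-tuple sorts before every full key that shares its row.
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


cells = SortedCellMap()
cells.put("com.example.www/b", "content:html", 1, "b")
cells.put("com.example.www/a", "content:html", 1, "a v1")
cells.put("com.example.www/a", "content:html", 2, "a v2")
cells.put("org.apache.hbase/", "content:html", 1, "hbase")

# Everything whose row key falls in ["com", "con") comes back in key order.
rows = list(cells.scan("com", "con"))
```

The scan only walks a contiguous slice of the sorted keys, which is why the
choice of row key decides what you can look up cheaply.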

J-D

On Mon, Jul 14, 2008 at 4:09 PM, Marcus Herou <ma...@tailsweep.com>
wrote:

> Thanks, I guessed as much.
>
> I guess I need to treat HBase as a distributed sorted map then.
>
> Kindly
>
> //Marcus

Re: Sorted columns

Posted by Marcus Herou <ma...@tailsweep.com>.
Thanks, I guessed as much.

I guess I need to treat HBase as a distributed sorted map then.

Kindly

//Marcus

On Mon, Jul 14, 2008 at 9:57 PM, Jean-Daniel Cryans <jd...@gmail.com>
wrote:

> Marcus,
>
> The one thing you misunderstood is that the row key is not a column, and I
> guess this is caused by an RDBMS background ;) The reason you want to
> store reversed URLs is that you want a fast scanner: with reversed URLs,
> pages from the same domain sort next to each other, so a scan reads one
> contiguous range of rows. If you fetch 30 rows and they are distributed
> across 30 different machines, performance will suffer. To search on
> columns, you have to build search tables using MapReduce or use external
> indexes, which I guess are familiar to you.
>
> Hope it helps,
>
> J-D

Re: Sorted columns

Posted by Jean-Daniel Cryans <jd...@gmail.com>.
Marcus,

The one thing you misunderstood is that the row key is not a column, and I
guess this is caused by an RDBMS background ;) The reason you want to
store reversed URLs is that you want a fast scanner: with reversed URLs,
pages from the same domain sort next to each other, so a scan reads one
contiguous range of rows. If you fetch 30 rows and they are distributed
across 30 different machines, performance will suffer. To search on
columns, you have to build search tables using MapReduce or use external
indexes, which I guess are familiar to you.
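
A rough Python sketch of both ideas, using plain sorted lists instead of HBase
tables. The helper names (reversed_url_key, prefix_scan, build_index, lookup),
the "meta:lang" column, and the sample URLs are all invented for illustration,
and an in-memory sort stands in for the MapReduce job that would build a real
search table.

```python
from bisect import bisect_left
from urllib.parse import urlsplit


def reversed_url_key(url):
    """Turn 'http://www.example.com/a' into 'com.example.www/a'.

    Simplification: the scheme, port and query string are dropped.
    """
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + (parts.path or "/")


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of keys that start with prefix."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


def build_index(rows, column):
    """Build a 'search table': sorted (column value, row key) pairs."""
    return sorted((cols[column], key) for key, cols in rows.items() if column in cols)


def lookup(index, value):
    """Return the row keys whose indexed column equals value."""
    i = bisect_left(index, (value,))
    found = []
    while i < len(index) and index[i][0] == value:
        found.append(index[i][1])
        i += 1
    return found


urls = [
    "http://www.example.com/a",
    "http://hbase.apache.org/",
    "http://blog.example.com/post",
    "http://www.example.com/b",
]
keys = sorted(reversed_url_key(u) for u in urls)

# All example.com pages are adjacent, so one range scan finds them.
same_domain = prefix_scan(keys, "com.example.")

# Hypothetical column "meta:lang"; the index is keyed by its value.
rows = {
    "com.example.www/a": {"meta:lang": "en"},
    "com.example.www/b": {"meta:lang": "sv"},
    "org.apache.hbase/": {"meta:lang": "en"},
}
index = build_index(rows, "meta:lang")
english = lookup(index, "en")
```

The index is just a second sorted map whose key is the value you want to
search on, so the same range-scan trick applies to it.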

Hope it helps,

J-D

On Mon, Jul 14, 2008 at 3:36 PM, Marcus Herou <ma...@tailsweep.com>
wrote:

> Hi guys.
>
> A simple question: is only the row key sorted in HBase?
>
> What if you would like to obtain a scanner based on another column? I
> thought the automatic sorting was one of the reasons you would store,
> for example, URLs in reversed form.
>
> Have I misunderstood something?
>
> We chose HBase as our database for storing a billion URLs, but not being
> able to search efficiently makes the choice harder...
>
> Kindly
>
> //Marcus