Posted to user@hbase.apache.org by Josh <n3...@gmail.com> on 2008/05/09 00:35:22 UTC

Feedback on my implementation.

Greetings,

I am looking for some feedback on my use of HBase.

To allow matching on column values, I have put data into the column
family attribute name, for example:

colum-fam:attribute1:value1
colum-fam:attribute2:value2

This allows one to match values in the following way:

select colum-fam:attribute1:value1,colum-fam:attribute2:value2 from MyTable;

Programmatically, when I use:

table.obtainScanner(new Text[] {
    new Text("colum-fam:attribute1:value1"),
    new Text("colum-fam:attribute2:value2") }, new Text(""));

I get rows matching either value1 || value2, so I have logic that
looks for both columns in each row to ensure an exact match.
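
For illustration, the client-side check described here can be sketched without any HBase dependency. This is only a minimal sketch, assuming each scanned row is represented as the set of column names the scanner returned for it; the class and method names are hypothetical and not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not an HBase class: models scanner output as
// row key -> set of column names that came back for that row.
class AllColumnsMatch {

    // Keeps only the rows that contain every required column (AND semantics).
    static List<String> rowsWithAllColumns(SortedMap<String, Set<String>> scanned,
                                           Collection<String> required) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : scanned.entrySet()) {
            if (row.getValue().containsAll(required)) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedMap<String, Set<String>> scanned = new TreeMap<>();
        scanned.put("row1", new HashSet<>(Arrays.asList(
                "colum-fam:attribute1:value1", "colum-fam:attribute2:value2")));
        scanned.put("row2", new HashSet<>(Arrays.asList(
                "colum-fam:attribute1:value1")));

        List<String> required = Arrays.asList(
                "colum-fam:attribute1:value1", "colum-fam:attribute2:value2");
        // Only row1 carries both columns, so only row1 survives the check.
        System.out.println(rowsWithAllColumns(scanned, required));
    }
}
```

The check itself is cheap; the cost is that every row matching either column still has to be shipped to the client before it can be discarded.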

I am thinking this isn't an ideal implementation, as the scanner above
must walk every row in the table.

Any idea how this might scale?  Would adding RegionServers cut down on
the time it takes to walk the whole table?

Thanks for your input!

Re: Feedback on my implementation.

Posted by Clint Morgan <cl...@gmail.com>.

On Thu, May 8, 2008 at 3:35 PM, Josh <n3...@gmail.com> wrote:

>  To allow matching on column values, I have put data into the column
>  family attribute name, for example:

Have a look at the filter interface; RegExpRowFilter will let you
match rows whose columns have certain values.  Sounds like what you
want.
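
To make the idea concrete, here is a rough sketch of matching on cell values instead of column names. It does not use the HBase filter classes; it is a plain-Java stand-in that assumes the values are stored as cell contents under columns such as colum-fam:attribute1, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical stand-in for a value filter: the table is modelled as
// row key -> (column name -> cell value). Not the HBase filter API.
class CellValueMatch {

    // Returns the row keys where every criterion column exists and its
    // value fully matches the given regular expression.
    static List<String> rowsMatching(SortedMap<String, Map<String, String>> table,
                                     Map<String, Pattern> criteria) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            boolean ok = true;
            for (Map.Entry<String, Pattern> c : criteria.entrySet()) {
                String value = row.getValue().get(c.getKey());
                if (value == null || !c.getValue().matcher(value).matches()) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedMap<String, Map<String, String>> table = new TreeMap<>();
        Map<String, String> row1 = new HashMap<>();
        row1.put("colum-fam:attribute1", "value1");
        row1.put("colum-fam:attribute2", "value2");
        table.put("row1", row1);
        Map<String, String> row2 = new HashMap<>();
        row2.put("colum-fam:attribute1", "value1");
        row2.put("colum-fam:attribute2", "other");
        table.put("row2", row2);

        Map<String, Pattern> criteria = new HashMap<>();
        criteria.put("colum-fam:attribute1", Pattern.compile("value1"));
        criteria.put("colum-fam:attribute2", Pattern.compile("value2"));
        // row2 fails the second criterion, so only row1 is reported.
        System.out.println(rowsMatching(table, criteria));
    }
}
```

With this layout the column names stay fixed and contain a single colon, and the matching is expressed against the values themselves.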

Re: Feedback on my implementation.

Posted by Bryan Duxbury <br...@rapleaf.com>.

First, I don't think it's safe to use multiple colons in a column  
name. That will probably mess up some of the internals of HBase and  
get you inconsistent results.

Second, you talk about how the scanner will have to walk every row in  
the table. That's true no matter what. Scanners always traverse an  
entire row range, which by default is the entire table, but of course  
it can be constrained to a specific range.
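
As a rough illustration of constraining the range, the sketch below uses a sorted in-memory map as a stand-in for a table ordered by row key. It is not the HBase scanner API; the class name, the method, and the stop-row parameter are assumptions made for the example. An empty start row is treated as "begin at the first row".

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of a range-limited scan over rows sorted by key.
class RowRangeScan {

    // Returns the rows with startRow <= key < stopRow.
    // An empty startRow starts at the first row; a null stopRow runs to the end.
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        SortedMap<String, String> fromStart = table.tailMap(startRow, true);
        return stopRow == null ? fromStart : fromStart.headMap(stopRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("user100", "a");
        table.put("user200", "b");
        table.put("user300", "c");
        table.put("user400", "d");

        // Whole table: every row is visited.
        System.out.println(scan(table, "", null).size());
        // Constrained range: only the rows between the two keys are visited.
        System.out.println(scan(table, "user200", "user400").keySet());
    }
}
```

The work done is proportional to the number of rows in the chosen range, so a row key design that groups the rows of interest together keeps the range small.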

Finally, I'm not exactly sure what you're trying to accomplish here.
Are you trying to select rows by cell value? If not, then what?
Can you describe your use case a little more clearly, perhaps with a
concrete example?

-Bryan