Posted to user@hbase.apache.org by anil gupta <an...@gmail.com> on 2012/08/17 22:03:07 UTC

Range Based Filtering with FuzzyRowFilter

Hi All,

I have a question about FuzzyRowFilter. I have a similar
filtering requirement that might be an extension of FuzzyRowFilter.
Suppose I have rowkeys with the structure userid_actionid, where
userid is 6 digits and actionid is 5 digits. I would like to get all
the rows with an actionid between 00200 and 00350. With the current
FuzzyRowFilter I can search for all the rows with a particular actionid.
Instead of searching for a particular actionid, I would like to search
for a range of actionids.
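
To make the requirement concrete, here is a minimal Java sketch of the key layout and of the range test such a filter would have to apply to each rowkey. It is not HBase code: the class ActionRangePredicate, its method names, the literal '_' separator, and the inclusive bounds are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of HBase. It only illustrates the rowkey
// layout (6-digit userid, '_', 5-digit actionid) and the per-row range test.
class ActionRangePredicate {
    static final int USER_ID_DIGITS = 6;

    // Builds a zero-padded "userid_actionid" key (the '_' separator is assumed).
    static String rowKey(int userId, int actionId) {
        return String.format(Locale.ROOT, "%06d_%05d", userId, actionId);
    }

    // True when the actionid part of the key lies in [low, high], both inclusive.
    static boolean actionInRange(String rowKey, int low, int high) {
        // Skip the userid digits and the separator to reach the actionid.
        int actionId = Integer.parseInt(rowKey.substring(USER_ID_DIGITS + 1));
        return actionId >= low && actionId <= high;
    }

    public static void main(String[] args) {
        String key = rowKey(42, 275);
        System.out.println(key + " in range: " + actionInRange(key, 200, 350));
    }
}
```

Because the userid comes first in the key, this test has to be evaluated against every row; the key order alone does not narrow down which rows can match.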

Does this use case sound like an extension of the current FuzzyRowFilter? Can
I run this kind of filter on HBase 0.92 without any significant update
to the cluster? I am willing to put in the effort to make the necessary
changes to FuzzyRowFilter for my requirement.
If you know of any other easier and equally optimized way to do the same,
please share it.

-- 
Thanks & Regards,
Anil Gupta

Re: Range Based Filtering with FuzzyRowFilter

Posted by lars hofhansl <lh...@yahoo.com>.
You might want to rethink your key schema or denormalize your data at write time.
If the key leads with userid, then searching for a range of action ids necessarily means a full scan through your table, which is not what you want (unless you run these rarely, as Map/Reduce type jobs).

I assume you also have other scans that go by userid, so I'd suggest just storing the same data again, but with actionid_userid as the key.
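
As a rough illustration of why the reversed key helps, the sketch below uses a java.util.TreeMap as a stand-in for a table sorted by rowkey (ASCII keys sort the same way as their bytes). With actionid leading, the range 00200 to 00350 becomes one contiguous slice bounded by a start row and an exclusive stop row. The class and method names, the '_' separator, and the inclusive range are assumptions for the example, not HBase API.

```java
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo, not HBase code: a TreeMap stands in for a sorted table.
class ActionLeadingKeyDemo {
    static final int MAX_ACTION_ID = 99999; // largest 5-digit actionid

    // Builds a zero-padded "actionid_userid" key (the '_' separator is assumed).
    static String actionLeadingKey(int actionId, int userId) {
        return String.format(Locale.ROOT, "%05d_%06d", actionId, userId);
    }

    // Returns the rows whose actionid lies in [low, high], both inclusive.
    static SortedMap<String, String> scanActionRange(
            TreeMap<String, String> table, int low, int high) {
        String startRow = String.format(Locale.ROOT, "%05d", low);
        if (high >= MAX_ACTION_ID) {
            // No larger 5-digit prefix exists, so read to the end of the table.
            return table.tailMap(startRow);
        }
        // The stop row is exclusive, so use the prefix of the next actionid.
        String stopRow = String.format(Locale.ROOT, "%05d", high + 1);
        return table.subMap(startRow, stopRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(actionLeadingKey(199, 1), "before range");
        table.put(actionLeadingKey(200, 2), "first in range");
        table.put(actionLeadingKey(350, 999999), "last in range");
        table.put(actionLeadingKey(351, 3), "after range");
        System.out.println(scanActionRange(table, 200, 350).keySet());
    }
}
```

In a real table the same two boundary keys would presumably be passed as the start and stop rows of a scan, so only the rows inside the range are read.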

If the values of your cells are large, store only a mapping of actionid_userid -> userid_actionid in the second table (i.e. a secondary index). In that case mind the previous discussions we had here about consistency, though.
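
A minimal sketch of that secondary-index layout, again with TreeMaps standing in for the two tables. All names here (SecondaryIndexDemo, put, valuesForActionRange) are hypothetical, and the '_' separator and inclusive range are assumptions. The point is only the two-step read: range-scan the small index, then fetch each value from the primary table by its key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical demo, not HBase code: two TreeMaps stand in for two tables.
class SecondaryIndexDemo {
    // Primary table: userid_actionid -> (possibly large) value.
    final TreeMap<String, String> primary = new TreeMap<>();
    // Index table: actionid_userid -> userid_actionid.
    final TreeMap<String, String> index = new TreeMap<>();

    void put(int userId, int actionId, String value) {
        String primaryKey = String.format(Locale.ROOT, "%06d_%05d", userId, actionId);
        String indexKey = String.format(Locale.ROOT, "%05d_%06d", actionId, userId);
        // In a real cluster these would be two separate writes. If one fails,
        // the tables disagree, which is the consistency concern noted above.
        primary.put(primaryKey, value);
        index.put(indexKey, primaryKey);
    }

    // Returns values for actionids in [low, high], inclusive; assumes high < 99999.
    List<String> valuesForActionRange(int low, int high) {
        String startRow = String.format(Locale.ROOT, "%05d", low);
        String stopRow = String.format(Locale.ROOT, "%05d", high + 1); // exclusive
        List<String> values = new ArrayList<>();
        for (String primaryKey : index.subMap(startRow, stopRow).values()) {
            String value = primary.get(primaryKey); // point lookup in the primary table
            if (value != null) { // skip index entries with no matching primary row
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        SecondaryIndexDemo demo = new SecondaryIndexDemo();
        demo.put(1, 100, "a");
        demo.put(2, 200, "b");
        demo.put(1, 350, "c");
        System.out.println(demo.valuesForActionRange(200, 350));
    }
}
```

The trade-off is one extra lookup per matching row in exchange for not storing the large values twice.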

-- Lars
