Posted to user@hbase.apache.org by Eugeny Morozov <em...@griddynamics.com> on 2013/01/19 00:28:19 UTC

Custom Filter and SEEK_NEXT_USING_HINT issue

Hi, folks!

HBase, Hadoop, etc. are from CDH-4.1.2.

I'm using a custom FuzzyRowFilter, which I took from
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
and after quite some time we suddenly found that it is losing data.

Basically, the idea of FuzzyRowFilter is that it tries to match the key
pattern that has been provided, and if the current row does not match but
matching keys may still exist further on in the table, it returns
SEEK_NEXT_USING_HINT. Then in getNextKeyHint(...) it builds the next key that
could match. As I understand it, HBase then fast-forwards to that key, which
should be similar or identical to starting a Scan with setStartRow.
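
The seek-hint mechanism described above can be modelled roughly as follows. This is a minimal Python sketch, not HBase code: `scan_with_hints`, the two return-code strings and the sample keys are stand-ins invented for illustration, and a real scanner works on KeyValues inside the region server rather than on a Python list.

```python
import bisect

# Stand-ins for the filter's return codes (plain strings, for illustration only).
INCLUDE = "INCLUDE"
SEEK_NEXT_USING_HINT = "SEEK_NEXT_USING_HINT"


def scan_with_hints(sorted_keys, filter_row, next_key_hint):
    """Walk sorted_keys the way a scanner with a seek-hinting filter might.

    filter_row(key) returns INCLUDE or SEEK_NEXT_USING_HINT.
    next_key_hint(key) returns the smallest key that could still match,
    or None when nothing further can match.
    Returns (matched_keys, visited_keys).
    """
    matched, visited = [], []
    i = 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        visited.append(key)
        if filter_row(key) == INCLUDE:
            matched.append(key)
            i += 1
            continue
        hint = next_key_hint(key)
        if hint is None:
            break
        # Jump to the first key >= hint, always advancing by at least one row.
        i = max(i + 1, bisect.bisect_left(sorted_keys, hint))
    return matched, visited


# Hypothetical data: look for the exact key b"k5" among ten sorted keys.
keys = [b"k%d" % n for n in range(10)]
target = b"k5"
matched, visited = scan_with_hints(
    keys,
    filter_row=lambda k: INCLUDE if k == target else SEEK_NEXT_USING_HINT,
    next_key_hint=lambda k: target if k < target else None,
)
print(matched)  # keys accepted by the filter
print(visited)  # keys the scanner actually looked at
```

In this model the filter only ever sees the first key, the key the hint lands on, and the first key past it, which is the same shape as the single-region trace later in the thread.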

I'm trying to find the key F7dt8QWPSIDw. It is definitely in HBase: I'm able
to get it using Scan.setStartRow.
For FuzzyRowFilter I'm using an empty Scan: I didn't specify a start row, stop
row or anything related.
Here is what happens:

Fzzy: AAAA1Q7iQ9JA
Next fzzy: F7dtxwqVQ_Pw
Fzzy: AQAAnA96rxTg
Next fzzy: F7dtxwqVQ_Pw
Fzzy: AgAADQWPSIDw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: AwAA-Q33Zb9Q
Next fzzy: F7dtxwqVQ_Pw
Fzzy: BAAAOg8oyu7A
Next fzzy: F7dtxwqVQ_Pw
Fzzy: BQAA9gqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: BgABZQ7iQ9JA
Next fzzy: F7dtxwqVQ_Pw
Fzzy: BwAAbgrpAojg
Next fzzy: F7dtxwqVQ_Pw
Fzzy: CAAAUQWPSIDw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: CQABVgqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: CgAAOQ7iQ9JA
Next fzzy: F7dtxwqVQ_Pw
Fzzy: CwAALwqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: DAAAMwWPSIDw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: DQAADgjqzsIQ
Next fzzy: F7dtxwqVQ_Pw
Fzzy: DgAAOgCcWv9g
Next fzzy: F7dtxwqVQ_Pw
Fzzy: DwAAKg7iQ9JA
Next fzzy: F7dtxwqVQ_Pw
Fzzy: EAAAugqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: EQAAJAqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: EgAABgIOMBgg
Next fzzy: F7dtxwqVQ_Pw
Fzzy: EwAAEwqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: FAAACQqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: FQAAIAqVQrTw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: FgAAeAWPSIDw
Next fzzy: F7dtxwqVQ_Pw
Fzzy: FwAAAw33Zb9Q
Next fzzy: F7dtxwqVQ_Pw
Fzzy: F7dt8QWPSIDw

It's obvious that my FuzzyRowFilter knows what to search for, and every time
it repeats the same hint.
The very first key, I suppose, is just the first key of the region where my
key is located.
The very last key is already bigger than the one I'm trying to find, which
is why FuzzyRowFilter stopped there.

Do you know of any issue with SEEK_NEXT_USING_HINT? I've searched, but
without success.
Do you have any idea how to explain these many attempts?

Thanks in advance.
-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Eugeny Morozov <em...@griddynamics.com>.
Ted, thanks for the question.
Here are the results of my investigation.

It seems I was mistaken. I thought that a scanner is assigned to each
region (and that they run in parallel), which would mean each scanner
starts from the beginning of its region and then works its way down to the
required record.

But currently we have 256 splits in the table by the first byte of values:
start - end
NA  - \x01
\x01 - \x02
\x02 - \x03
...
\xFE - \xFF
\xFF - NA

And it turns out that the values I've seen come from different regions,
except for the last two values, which both reside in the same region. The
number after each key is its first byte, which identifies the region:
AAAA1Q7iQ9JA : [0
AQAAnA96rxTg : [1
AgAADQWPSIDw : [2
...
EwAAEwqVQrTw : [19
FAAACQqVQrTw : [20
FQAAIAqVQrTw : [21
FgAAeAWPSIDw : [22
FwAAAw33Zb9Q : [23
F7dt8QWPSIDw : [23
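
The mapping from logged key to region can be reproduced with a few lines of Python. This is only a sketch, and it assumes the logged keys are URL-safe Base64 renderings of the raw row keys (they contain '-' and '_'); `first_byte` is a helper name made up here.

```python
import base64


def first_byte(b64_key):
    """Decode a URL-safe Base64 key (assumed encoding) and return its first raw byte."""
    return base64.urlsafe_b64decode(b64_key)[0]


# Keys copied from the log output above.
logged_keys = [
    "AAAA1Q7iQ9JA", "AQAAnA96rxTg", "AgAADQWPSIDw",
    "EwAAEwqVQrTw", "FAAACQqVQrTw", "FQAAIAqVQrTw",
    "FgAAeAWPSIDw", "FwAAAw33Zb9Q", "F7dt8QWPSIDw",
]
region_bytes = [first_byte(k) for k in logged_keys]
for key, b in zip(logged_keys, region_bytes):
    print(key, ":", b)
```

With one split per first byte, two keys with the same first byte fall into the same region, which is the case for the last two keys.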

1. I still don't get why it skips the required value.
2. The only explanation I've found for such output is that the scan
searches the regions one by one until it finds the value. Should it be so?
Shouldn't it start from the beginning (if there is no setStartRow), in
parallel for all regions at once, and then in the second step (after the
filter's getNextKeyHint method) know exactly where to go?


On Sat, Jan 19, 2013 at 5:16 PM, Ted <yu...@gmail.com> wrote:

> In your original email you said the first key looked like start key of a
> region, can you verify that ?
>
> Thanks
>

-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Ted <yu...@gmail.com>.
In your original email you said the first key looked like the start key of a region. Can you verify that?

Thanks

On Jan 19, 2013, at 1:36 AM, Eugeny Morozov <em...@griddynamics.com> wrote:

> Ted,
> 
> that is correct.
> HBase 0.92.x and we use part of the patch 6509.
> 
> I use the filter as a custom filter; it lives in a separate jar file that
> goes on HBase's classpath. I did not patch HBase.
> Moreover, I do not use the protobuf descriptions that come with the filter
> in the patch. I have only two classes: FuzzyRowFilter itself and its test
> class.
> 
> And it works perfectly on a small dataset of about 100 rows (1 region). But
> when my dataset has more than 10 million rows (260 regions), it somehow
> loses rows. I'm not sure, but it seems to me it is not the filter's fault.
> 
> 
> On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <yu...@gmail.com> wrote:
> 
>> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
>> 
>> Looks like you were using patch from HBASE-6509 which was integrated to
>> trunk only.
>> Please confirm.
>> 
>> Copying Alex who wrote the patch.
>> 
>> Cheers
>> 

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Eugeny Morozov <em...@griddynamics.com>.
Anoop, Ramkrishna

Thank you for explanation! I've got it.


-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

RE: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Anoop Sam John <an...@huawei.com>.
> I suppose if scanning process has started at once on
> all regions, then I would find in log files at least one value per region,
> but I have found one value per region only for those regions, that resides
> before the particular one.

@Eugeny - FuzzyFilter, like any other filter, works on the server side. A scan
issued from the client is sequential: it starts from the first region (the
region with the empty start key, or the region that contains the start row you
set on your Scan). The client sends a request to the region server (RS) to scan
that region. Once that region is exhausted, the client contacts the next region,
and so on. There is no parallel scanning of multiple regions from the client
side. [This is the case when using the HTable scan APIs.]

When MapReduce is used for scanning, the regions are scanned in parallel, with
a mapper per region. But a normal scan from the client side goes through the
regions sequentially, not in parallel.
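
A toy model of this region-by-region behaviour, combined with a seek hint, might look like the following. It is a hedged Python sketch under simplifying assumptions: each region is just a sorted list of keys, the filter looks for one exact key, and `client_scan` and the sample data are invented for illustration rather than taken from HBase.

```python
import bisect


def client_scan(regions, target):
    """Scan regions one after another, as a plain client-side scan does.

    regions: list of sorted key lists, ordered by region start key.
    target: the exact key a (hypothetical) hinting filter is looking for.
    Returns (found, visited), where visited lists every key the filter saw.
    """
    visited = []
    for region_keys in regions:  # one region at a time, never in parallel
        i = 0
        while i < len(region_keys):
            key = region_keys[i]
            visited.append(key)
            if key == target:
                return True, visited
            if key > target:
                return False, visited  # already past the target: stop
            # The filter hints at the target; the seek stays inside this region.
            i = max(i + 1, bisect.bisect_left(region_keys, target))
    return False, visited


# Four made-up regions split by first byte, three rows each.
regions = [[bytes([r, c]) for c in (1, 5, 9)] for r in range(4)]
target = bytes([2, 5])
found, visited = client_scan(regions, target)
print(found)
print(visited)
```

In each region before the target's region the filter sees only the first row, because the hint points past the end of that region. That matches the log in the original message: one logged key per region up to the region that holds the hinted key.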

-Anoop-

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by ramkrishna vasudevan <ra...@gmail.com>.
On Mon, Jan 21, 2013 at 1:46 PM, Eugeny Morozov
<em...@griddynamics.com>wrote:

> I do not
> understand why with FuzzyFilter it goes from one region to another until it
> stops at the value. I suppose if scanning process has started at once on
> all regions
>

The scanning process does not start in parallel on all regions. Once a start
row is specified for the scan, the region server hosting it is picked, and on
that region server the scan starts from the region which holds the start row
and proceeds until it reaches the stop row. The stop row can be in any later
region; regions are visited in strictly increasing byte order.

Regards
Ram

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Eugeny Morozov <em...@griddynamics.com>.
Finally, the mystery has been solved.

A small remark before I explain everything.

The situation with only one region is exactly the same:
Fzzy: AAAA1Q7iQ9JA
Next fzzy: F7dtxwqVQ_Pw  <-- the value I'm trying to find.
Fzzy: F7dt8QWPSIDw
Somehow FuzzyRowFilter just skipped my value here.


So, the explanation.
In the javadoc for FuzzyRowFilter a question mark is used as the placeholder
for an unknown byte. Of course it's possible to use anything there, including
zero, instead of the question mark.
For quite some time we used text literals to encode our keys, literals like
the ones you've already seen: AAAA1Q7iQ9JA or F7dt8QWPSIDw. But that's the
Base64 form of just 8 bytes, which requires 1.5 times more space. So we
decided to store the raw version, just byte[8]. Unfortunately the symbol '?'
(0x3F, decimal 63) sits in the middle of the ASCII table
(http://www.asciitable.com/), which means that with FuzzyRowFilter we skip
half of the values in some cases. At the same time, the question mark comes
just before any letter that could be used in a text key.

Although we have integration tests, it's just a coincidence that none of them
covers such a case.

So, my advice: always use zero instead of the question mark for
FuzzyRowFilter.
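
The effect of the placeholder byte can be illustrated with a simplified model. The sketch below is Python, not the FuzzyRowFilter source, and it assumes that the seek hint ends up containing whatever bytes were put in the unknown positions of the fuzzy key; the prefix, rows and `seek` helper are made up for the example.

```python
import base64
import bisect

QUESTION_MARK = ord("?")  # 0x3F, decimal 63


def seek(sorted_rows, hint):
    """Return the rows a scanner would still see after seeking to hint."""
    return sorted_rows[bisect.bisect_left(sorted_rows, hint):]


prefix = b"\x17\xb7"  # fixed part of the fuzzy key (made-up value)
rows = sorted([
    b"\x01\x00\x00\x00",   # different prefix, sorts before the matching range
    prefix + b"\x10\x22",  # matches the prefix; unknown bytes start below 0x3F
    prefix + b"\x80\x01",  # matches the prefix; unknown bytes start above 0x3F
])

hint_with_question_marks = prefix + b"??"  # placeholders left as '?'
hint_with_zeros = prefix + b"\x00\x00"     # placeholders set to zero

after_q = seek(rows, hint_with_question_marks)
after_zero = seek(rows, hint_with_zeros)
print(len(after_q), len(after_zero))

# Size of the Base64 form of an 8-byte key, padding included.
encoded = base64.b64encode(bytes(8))
print(len(encoded))
```

Under this assumption a hint padded with '?' sorts after every matching row whose unknown bytes start below 0x3F, so such rows are never visited, while a zero-padded hint is the smallest key with the fixed prefix. With Base64 text keys the same position never holds a letter smaller than '?', which would hide the problem.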

Thanks to everyone!

P.S. But the question about the region scanning order is still open. I do not
understand why, with FuzzyRowFilter, the scan goes from one region to another
until it stops at the value. I suppose that if the scanning process had
started on all regions at once, I would find at least one value per region in
the log files, but I have found one value per region only for those regions
that come before the particular one.


On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel <mi...@hotmail.com> wrote:

> If it's the same class and it's not a patch, then the first class loaded
> wins.
>
> So if you have a class Foo and HBase has a class Foo, your code will never
> see the light of day.
>
> Perhaps I'm stating the obvious, but it's something to think about when
> working with Hadoop.


-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Michael Segel <mi...@hotmail.com>.
If it's the same class and it's not a patch, then the first class loaded wins.

So if you have a class Foo and HBase has a class Foo, your code will never see the light of day.

Perhaps I'm stating the obvious, but it's something to think about when working with Hadoop.
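
One way to check which copy of a class actually won is to ask the JVM where it loaded the class from. The following is only a sketch: ClassOriginDemo and originOf() are hypothetical names, and the result is meaningful only when it runs with the same classpath as the process you care about (for example, the region server).

```java
import java.security.CodeSource;

class ClassOriginDemo {
    // Returns the jar or directory a class was loaded from, or a placeholder
    // for classes that have no CodeSource (e.g. JDK bootstrap classes).
    static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "bootstrap or unknown";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // without arguments the demo inspects itself.
        String name = args.length > 0 ? args[0] : ClassOriginDemo.class.getName();
        Class<?> cls = Class.forName(name);
        System.out.println(name + " loaded from: " + originOf(cls));
    }
}
```

If the printed location is not your own jar, another copy of the class was found first on the classpath.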

On Jan 19, 2013, at 3:36 AM, Eugeny Morozov <em...@griddynamics.com> wrote:

> Ted,
> 
> that is correct.
> HBase 0.92.x and we use part of the patch 6509.
> 
> I use the filter as a custom filter, it lives in separate jar file and goes
> to HBase's classpath. I did not patch HBase.
> Moreover I do not use protobuf's descriptions that comes with the filter in
> patch. Only two classes I have - FuzzyRowFilter itself and its test class.
> 
> And it works perfectly on small dataset like 100 rows (1 region). But when
> my dataset is more than 10mln (260 regions), it somehow loosing rows. I'm
> not sure, but it seems to me it is not fault of the filter.

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Eugeny Morozov <em...@griddynamics.com>.
Ted,

that is correct.
HBase 0.92.x, and we use part of the HBASE-6509 patch.

I use the filter as a custom filter; it lives in a separate jar file that
goes on HBase's classpath. I did not patch HBase.
Moreover, I do not use the protobuf descriptions that come with the filter
in the patch. I have only two classes: FuzzyRowFilter itself and its test
class.

And it works perfectly on a small dataset of about 100 rows (1 region). But
when my dataset has more than 10 million rows (260 regions), it somehow
loses rows. I'm not sure, but it seems to me that it is not the filter's
fault.


On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <yu...@gmail.com> wrote:

> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
>
> Looks like you were using patch from HBASE-6509 which was integrated to
> trunk only.
> Please confirm.
>
> Copying Alex who wrote the patch.
>
> Cheers


-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

Posted by Ted Yu <yu...@gmail.com>.
To my knowledge, CDH-4.1.2 is based on HBase 0.92.x.

Looks like you were using the patch from HBASE-6509, which was integrated
into trunk only.
Please confirm.

Copying Alex who wrote the patch.

Cheers
