You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Jozef Vilcek <jo...@gmail.com> on 2016/03/29 14:42:36 UTC

Coprocessors causes data access delay?

Hi,

I need a help to shed some light on following observation:

Through hbase shell I do PUT to a non existing cell (create) and immediate
GET.
When doing this against hbase table without a coprocessor, every GET shows
the value immediately. When I enable a coprocessor on the table and perform
the same check, I do not get a value for every GET. Sometimes GET returns
nothing, like cell is not there. If I delay GET few minutes after PUT, then
data are correctly retrieved.

I want to understand what is going on.
Is coprocessor is doing something wrong? Is this somehow normal? or ... ?

Coprocessor is a RegionObserver doing data filtering / expiration based on
predefined rules.
HBase version is 1.0.0 from cloudera cdh5.5.1 distibution.

Thanks,
Jozo

Re: Coprocessors causes data access delay?

Posted by Jozef Vilcek <jo...@gmail.com>.
When changed

   store.getSmallestReadPoint()

to

   ((HStore)store).getHRegion().getReadpoint(IsolationLevel.READ_COMMITTED)

coprocessor behavior is correct. This is taken from coprocessor example
code. We have recently upgraded hbase version (from 0.94.x to 1.0.0) and
missed out that coprocessor example has changed too. From the API
perspective it is a bit unexpected :)

Thanks a lot for the support!

On Mon, Apr 18, 2016 at 12:36 PM, Anoop John <an...@gmail.com> wrote:

> >>store.getSmallestReadPoint()
>
> This is the problem.   At the time of write some other readers (opened
> in past) were present, then all those are present readPnts and u will
> get min of all such read points.
>
> Its a pity that we are not passing the read point to the CP pre hook.
> That is obtained in RegionScannerImpl.
>
> If u can change the HBase code and build,  can just try passing the
> readPnt also to CP and use that to open this Scanner and test once?
>
> I refer below code where we call CP pre hook
>
> public KeyValueScanner getScanner(Scan scan,
>       final NavigableSet<byte []> targetCols, long readPt) throws
> IOException {
>     lock.readLock().lock();
>     try {
>       KeyValueScanner scanner = null;
>       if (this.getCoprocessorHost() != null) {
>         scanner = this.getCoprocessorHost().preStoreScannerOpen(this,
> scan, targetCols);
>       }
>       scanner = createScanner(scan, targetCols, readPt, scanner);
>       return scanner;
>     } finally {
>       lock.readLock().unlock();
>     }
>   }
>
>
>
> -Anoop-
>
> On Mon, Apr 18, 2016 at 12:56 PM, Jozef Vilcek <jo...@gmail.com>
> wrote:
> > I have new information after some more debugging:
> >
> > I am sure coprocessor is not filtering out values in get/put hooks.
> > The observed behavior is:
> >   We create non-existing cell via PUT and immediately after the call
> >   returns, we are trying to get the created cell value via GET. The GET
> >   is done every second in a loop until the cell is returned (the result
> >   is not empty).
> >   For most of the time the first GET request succeeds, but there are
> >   some periods of time when the obtained results are empty and the cell
> >   appears just after some time ranging from few seconds to something
> >   like 20 minutes.
> >   We do such test every minute in a separate process and log the results.
> >   Excerpt from such log:
> >
> >     [put-time 20:19:03] [successful-get-time 20:21:03 = after-120.0-secs]
> >     [put-time 20:20:03] [successful-get-time 20:21:03 = after-59.6-secs]
> >     [put-time 23:01:02] [successful-get-time 23:05:50 = after-288.2-secs]
> >     [put-time 23:02:02] [successful-get-time 23:05:50 = after-228.1-secs]
> >     [put-time 23:03:02] [successful-get-time 23:05:50 = after-167.9-secs]
> >     [put-time 23:04:02] [successful-get-time 23:05:50 = after-107.7-secs]
> >     [put-time 23:05:02] [successful-get-time 23:05:50 = after-47.5-secs]
> >     [put-time 23:08:02] [successful-get-time 23:31:59 =
> after-1436.9-secs]
> >     [put-time 23:09:02] [successful-get-time 23:31:59 =
> after-1376.8-secs]
> >
> >   Note that the cells seem to appear in "waves", multiple at the same
> time.
> >   Unsuccessful GETs (those between the PUT and the final successful GET)
> >   don't timeout and return just empty result without any cells.
> >
> >   Coprocessor is logging "trafic" in pre/post get/put hooks for tested
> > keys.
> >   PUTs and GETs are always observed immediately client send requests, for
> >   for problematic cases, coprocessor does not see any data, yet, for GETs
> >
> >
> > Problem is somehow related to coprocessor itself. When I switch to
> > coprocessor which does only logging, observed data delays disappear.
> > However, observed delays are happening only sporadically for small period
> > of time. Like it is triggered by some other action (compaction, hbase
> being
> > under load, ...?).
> >
> > I can not share the full code, but "simplified" coprocessor, which also
> > produced this behavior looks like this:
> >
> > RegionObserver coprocessor, which just creates a new scanner with dummy
> > filter in its preStoreScannerOpen, preFlushScannerOpen and
> > preCompactScannerOpen:
> >
> >   preStoreScannerOpen:
> >     return new StoreScanner(store, store.getScanInfo(),
> >       scan.setFilter(new DummyFilter()), targetCols,
> >  store.getSmallestReadPoint());
> >
> >   preFlushScannerOpen:
> >     return new StoreScanner(store, store.getScanInfo(), new
> > Scan().setFilter(new DummyFilter()),
> >       Collections.singletonList(memstoreScanner),
> >       ScanType.COMPACT_RETAIN_DELETES, store.getSmallestReadPoint(),
> >       HConstants.OLDEST_TIMESTAMP);
> >
> >   preCompactScannerOpen:
> >     return new StoreScanner(store, store.getScanInfo(),
> >       new Scan().setFilter(new DummyFilter()), scanners, scanType,
> >       store.getSmallestReadPoint(), earliestPutTs);
> >
> > Besides some logging, the DummyFilter efectively just returns
> > ReturnCode.INCLUDE
> > in its filterKeyValue() method.
> >
> > Shooting blind. Can it be that key under some circumstances can not be
> seen
> > by StoreScanner until it is flushed somewhere?
> >
> > Many thanks in advance,
> > Jozo
> >
> > On Fri, Apr 1, 2016 at 10:47 AM, Anoop John <an...@gmail.com>
> wrote:
> >
> >> Can you provide us the code u have written for the pre hooks + the
> >> custom filter code.  You sure the filter is not filtering out this
> >> recent cell?  Add some logs there and check?
> >>
> >> -Anoop-
> >>
> >> On Thu, Mar 31, 2016 at 12:57 PM, Jozef Vilcek <jo...@gmail.com>
> >> wrote:
> >> > Test was done via hbase shell prompt. I am not sure on it's behavior.
> Can
> >> > it hide some timeouts and return nothing? I hope not.
> >> > I will try to reproduce it with java API.
> >> >
> >> > Coprocessor does not do anything asynchronous in it's hooks. It is
> hooked
> >> > only on preStoreScannerOpen, preFlushScannerOpen,
> preCompactScannerOpen.
> >> > What it does is, create a StoreScanner with custom filter, which skips
> >> > certain cells based on fixed rules line qualifier name pattern or how
> old
> >> > cell is.
> >> >
> >> >
> >> >> Do you wait for the Put to return before you issue the Get?  Or have
> you
> >> >> disabled auto flushing on your table instance --
> >> HTable.setAutoFlush(false)?
> >> >
> >> >> Coprocessor hooks executed on the Put and Get paths are blocking, so
> >> unless
> >> >> you're overriding normal processing in the prePut() hook and doing
> >> >> something asynchronous, the Put should be there by the time it
> returns
> >> to
> >> >> the client.  But it all depends on what is your coprocessor actually
> >> >> doing.  Can you tell us more about that?
> >> >
> >> >> On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:
> >> >
> >> >> > Is it possible to come up with unit test showing this behavior
> using>
> >> > simplified coprocessor ?> >> > Thanks> >> > > On Mar 29, 2016, at 5:42
> >> AM, Jozef Vilcek <jo...@gmail.com> wrote:> > >> > > Hi,> > >> >
> > I
> >> need a help to shed some light on following observation:> > >> > >
> Through
> >> hbase shell I do PUT to a non existing cell (create) and> > immediate>
> > >
> >> GET.> > > When doing this against hbase table without a coprocessor,
> every
> >> GET> > shows> > > the value immediately. When I enable a coprocessor on
> the
> >> table and> > > perform> > > the same check, I do not get a value for
> every
> >> GET. Sometimes GET returns> > > nothing, like cell is not there. If I
> delay
> >> GET few minutes after PUT,> > then> > > data are correctly retrieved.>
> > >>
> >> > > I want to understand what is going on.> > > Is coprocessor is doing
> >> something wrong? Is this somehow normal? or ... ?> > >> > > Coprocessor
> is
> >> a RegionObserver doing data filtering / expiration based> > on> > >
> >> predefined rules.> > > HBase version is 1.0.0 from cloudera cdh5.5.1
> >> distibution.> > >> > > Thanks,> > > Jozo
> >> >>
> >>
>

Re: Coprocessors causes data access delay?

Posted by Anoop John <an...@gmail.com>.
>>store.getSmallestReadPoint()

This is the problem.   At the time of write some other readers (opened
in past) were present, then all those are present readPnts and u will
get min of all such read points.

Its a pity that we are not passing the read point to the CP pre hook.
That is obtained in RegionScannerImpl.

If u can change the HBase code and build,  can just try passing the
readPnt also to CP and use that to open this Scanner and test once?

I refer below code where we call CP pre hook

public KeyValueScanner getScanner(Scan scan,
      final NavigableSet<byte []> targetCols, long readPt) throws IOException {
    lock.readLock().lock();
    try {
      KeyValueScanner scanner = null;
      if (this.getCoprocessorHost() != null) {
        scanner = this.getCoprocessorHost().preStoreScannerOpen(this,
scan, targetCols);
      }
      scanner = createScanner(scan, targetCols, readPt, scanner);
      return scanner;
    } finally {
      lock.readLock().unlock();
    }
  }



-Anoop-

On Mon, Apr 18, 2016 at 12:56 PM, Jozef Vilcek <jo...@gmail.com> wrote:
> I have new information after some more debugging:
>
> I am sure coprocessor is not filtering out values in get/put hooks.
> The observed behavior is:
>   We create non-existing cell via PUT and immediately after the call
>   returns, we are trying to get the created cell value via GET. The GET
>   is done every second in a loop until the cell is returned (the result
>   is not empty).
>   For most of the time the first GET request succeeds, but there are
>   some periods of time when the obtained results are empty and the cell
>   appears just after some time ranging from few seconds to something
>   like 20 minutes.
>   We do such test every minute in a separate process and log the results.
>   Excerpt from such log:
>
>     [put-time 20:19:03] [successful-get-time 20:21:03 = after-120.0-secs]
>     [put-time 20:20:03] [successful-get-time 20:21:03 = after-59.6-secs]
>     [put-time 23:01:02] [successful-get-time 23:05:50 = after-288.2-secs]
>     [put-time 23:02:02] [successful-get-time 23:05:50 = after-228.1-secs]
>     [put-time 23:03:02] [successful-get-time 23:05:50 = after-167.9-secs]
>     [put-time 23:04:02] [successful-get-time 23:05:50 = after-107.7-secs]
>     [put-time 23:05:02] [successful-get-time 23:05:50 = after-47.5-secs]
>     [put-time 23:08:02] [successful-get-time 23:31:59 = after-1436.9-secs]
>     [put-time 23:09:02] [successful-get-time 23:31:59 = after-1376.8-secs]
>
>   Note that the cells seem to appear in "waves", multiple at the same time.
>   Unsuccessful GETs (those between the PUT and the final successful GET)
>   don't timeout and return just empty result without any cells.
>
>   Coprocessor is logging "trafic" in pre/post get/put hooks for tested
> keys.
>   PUTs and GETs are always observed immediately client send requests, for
>   for problematic cases, coprocessor does not see any data, yet, for GETs
>
>
> Problem is somehow related to coprocessor itself. When I switch to
> coprocessor which does only logging, observed data delays disappear.
> However, observed delays are happening only sporadically for small period
> of time. Like it is triggered by some other action (compaction, hbase being
> under load, ...?).
>
> I can not share the full code, but "simplified" coprocessor, which also
> produced this behavior looks like this:
>
> RegionObserver coprocessor, which just creates a new scanner with dummy
> filter in its preStoreScannerOpen, preFlushScannerOpen and
> preCompactScannerOpen:
>
>   preStoreScannerOpen:
>     return new StoreScanner(store, store.getScanInfo(),
>       scan.setFilter(new DummyFilter()), targetCols,
>  store.getSmallestReadPoint());
>
>   preFlushScannerOpen:
>     return new StoreScanner(store, store.getScanInfo(), new
> Scan().setFilter(new DummyFilter()),
>       Collections.singletonList(memstoreScanner),
>       ScanType.COMPACT_RETAIN_DELETES, store.getSmallestReadPoint(),
>       HConstants.OLDEST_TIMESTAMP);
>
>   preCompactScannerOpen:
>     return new StoreScanner(store, store.getScanInfo(),
>       new Scan().setFilter(new DummyFilter()), scanners, scanType,
>       store.getSmallestReadPoint(), earliestPutTs);
>
> Besides some logging, the DummyFilter efectively just returns
> ReturnCode.INCLUDE
> in its filterKeyValue() method.
>
> Shooting blind. Can it be that key under some circumstances can not be seen
> by StoreScanner until it is flushed somewhere?
>
> Many thanks in advance,
> Jozo
>
> On Fri, Apr 1, 2016 at 10:47 AM, Anoop John <an...@gmail.com> wrote:
>
>> Can you provide us the code u have written for the pre hooks + the
>> custom filter code.  You sure the filter is not filtering out this
>> recent cell?  Add some logs there and check?
>>
>> -Anoop-
>>
>> On Thu, Mar 31, 2016 at 12:57 PM, Jozef Vilcek <jo...@gmail.com>
>> wrote:
>> > Test was done via hbase shell prompt. I am not sure on it's behavior. Can
>> > it hide some timeouts and return nothing? I hope not.
>> > I will try to reproduce it with java API.
>> >
>> > Coprocessor does not do anything asynchronous in it's hooks. It is hooked
>> > only on preStoreScannerOpen, preFlushScannerOpen, preCompactScannerOpen.
>> > What it does is, create a StoreScanner with custom filter, which skips
>> > certain cells based on fixed rules line qualifier name pattern or how old
>> > cell is.
>> >
>> >
>> >> Do you wait for the Put to return before you issue the Get?  Or have you
>> >> disabled auto flushing on your table instance --
>> HTable.setAutoFlush(false)?
>> >
>> >> Coprocessor hooks executed on the Put and Get paths are blocking, so
>> unless
>> >> you're overriding normal processing in the prePut() hook and doing
>> >> something asynchronous, the Put should be there by the time it returns
>> to
>> >> the client.  But it all depends on what is your coprocessor actually
>> >> doing.  Can you tell us more about that?
>> >
>> >> On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:
>> >
>> >> > Is it possible to come up with unit test showing this behavior using>
>> > simplified coprocessor ?> >> > Thanks> >> > > On Mar 29, 2016, at 5:42
>> AM, Jozef Vilcek <jo...@gmail.com> wrote:> > >> > > Hi,> > >> > > I
>> need a help to shed some light on following observation:> > >> > > Through
>> hbase shell I do PUT to a non existing cell (create) and> > immediate> > >
>> GET.> > > When doing this against hbase table without a coprocessor, every
>> GET> > shows> > > the value immediately. When I enable a coprocessor on the
>> table and> > > perform> > > the same check, I do not get a value for every
>> GET. Sometimes GET returns> > > nothing, like cell is not there. If I delay
>> GET few minutes after PUT,> > then> > > data are correctly retrieved.> > >>
>> > > I want to understand what is going on.> > > Is coprocessor is doing
>> something wrong? Is this somehow normal? or ... ?> > >> > > Coprocessor is
>> a RegionObserver doing data filtering / expiration based> > on> > >
>> predefined rules.> > > HBase version is 1.0.0 from cloudera cdh5.5.1
>> distibution.> > >> > > Thanks,> > > Jozo
>> >>
>>

Re: Coprocessors causes data access delay?

Posted by Jozef Vilcek <jo...@gmail.com>.
I have new information after some more debugging:

I am sure coprocessor is not filtering out values in get/put hooks.
The observed behavior is:
  We create non-existing cell via PUT and immediately after the call
  returns, we are trying to get the created cell value via GET. The GET
  is done every second in a loop until the cell is returned (the result
  is not empty).
  For most of the time the first GET request succeeds, but there are
  some periods of time when the obtained results are empty and the cell
  appears just after some time ranging from few seconds to something
  like 20 minutes.
  We do such test every minute in a separate process and log the results.
  Excerpt from such log:

    [put-time 20:19:03] [successful-get-time 20:21:03 = after-120.0-secs]
    [put-time 20:20:03] [successful-get-time 20:21:03 = after-59.6-secs]
    [put-time 23:01:02] [successful-get-time 23:05:50 = after-288.2-secs]
    [put-time 23:02:02] [successful-get-time 23:05:50 = after-228.1-secs]
    [put-time 23:03:02] [successful-get-time 23:05:50 = after-167.9-secs]
    [put-time 23:04:02] [successful-get-time 23:05:50 = after-107.7-secs]
    [put-time 23:05:02] [successful-get-time 23:05:50 = after-47.5-secs]
    [put-time 23:08:02] [successful-get-time 23:31:59 = after-1436.9-secs]
    [put-time 23:09:02] [successful-get-time 23:31:59 = after-1376.8-secs]

  Note that the cells seem to appear in "waves", multiple at the same time.
  Unsuccessful GETs (those between the PUT and the final successful GET)
  don't timeout and return just empty result without any cells.

  Coprocessor is logging "trafic" in pre/post get/put hooks for tested
keys.
  PUTs and GETs are always observed immediately client send requests, for
  for problematic cases, coprocessor does not see any data, yet, for GETs


Problem is somehow related to coprocessor itself. When I switch to
coprocessor which does only logging, observed data delays disappear.
However, observed delays are happening only sporadically for small period
of time. Like it is triggered by some other action (compaction, hbase being
under load, ...?).

I can not share the full code, but "simplified" coprocessor, which also
produced this behavior looks like this:

RegionObserver coprocessor, which just creates a new scanner with dummy
filter in its preStoreScannerOpen, preFlushScannerOpen and
preCompactScannerOpen:

  preStoreScannerOpen:
    return new StoreScanner(store, store.getScanInfo(),
      scan.setFilter(new DummyFilter()), targetCols,
 store.getSmallestReadPoint());

  preFlushScannerOpen:
    return new StoreScanner(store, store.getScanInfo(), new
Scan().setFilter(new DummyFilter()),
      Collections.singletonList(memstoreScanner),
      ScanType.COMPACT_RETAIN_DELETES, store.getSmallestReadPoint(),
      HConstants.OLDEST_TIMESTAMP);

  preCompactScannerOpen:
    return new StoreScanner(store, store.getScanInfo(),
      new Scan().setFilter(new DummyFilter()), scanners, scanType,
      store.getSmallestReadPoint(), earliestPutTs);

Besides some logging, the DummyFilter efectively just returns
ReturnCode.INCLUDE
in its filterKeyValue() method.

Shooting blind. Can it be that key under some circumstances can not be seen
by StoreScanner until it is flushed somewhere?

Many thanks in advance,
Jozo

On Fri, Apr 1, 2016 at 10:47 AM, Anoop John <an...@gmail.com> wrote:

> Can you provide us the code u have written for the pre hooks + the
> custom filter code.  You sure the filter is not filtering out this
> recent cell?  Add some logs there and check?
>
> -Anoop-
>
> On Thu, Mar 31, 2016 at 12:57 PM, Jozef Vilcek <jo...@gmail.com>
> wrote:
> > Test was done via hbase shell prompt. I am not sure on it's behavior. Can
> > it hide some timeouts and return nothing? I hope not.
> > I will try to reproduce it with java API.
> >
> > Coprocessor does not do anything asynchronous in it's hooks. It is hooked
> > only on preStoreScannerOpen, preFlushScannerOpen, preCompactScannerOpen.
> > What it does is, create a StoreScanner with custom filter, which skips
> > certain cells based on fixed rules line qualifier name pattern or how old
> > cell is.
> >
> >
> >> Do you wait for the Put to return before you issue the Get?  Or have you
> >> disabled auto flushing on your table instance --
> HTable.setAutoFlush(false)?
> >
> >> Coprocessor hooks executed on the Put and Get paths are blocking, so
> unless
> >> you're overriding normal processing in the prePut() hook and doing
> >> something asynchronous, the Put should be there by the time it returns
> to
> >> the client.  But it all depends on what is your coprocessor actually
> >> doing.  Can you tell us more about that?
> >
> >> On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:
> >
> >> > Is it possible to come up with unit test showing this behavior using>
> > simplified coprocessor ?> >> > Thanks> >> > > On Mar 29, 2016, at 5:42
> AM, Jozef Vilcek <jo...@gmail.com> wrote:> > >> > > Hi,> > >> > > I
> need a help to shed some light on following observation:> > >> > > Through
> hbase shell I do PUT to a non existing cell (create) and> > immediate> > >
> GET.> > > When doing this against hbase table without a coprocessor, every
> GET> > shows> > > the value immediately. When I enable a coprocessor on the
> table and> > > perform> > > the same check, I do not get a value for every
> GET. Sometimes GET returns> > > nothing, like cell is not there. If I delay
> GET few minutes after PUT,> > then> > > data are correctly retrieved.> > >>
> > > I want to understand what is going on.> > > Is coprocessor is doing
> something wrong? Is this somehow normal? or ... ?> > >> > > Coprocessor is
> a RegionObserver doing data filtering / expiration based> > on> > >
> predefined rules.> > > HBase version is 1.0.0 from cloudera cdh5.5.1
> distibution.> > >> > > Thanks,> > > Jozo
> >>
>

Re: Coprocessors causes data access delay?

Posted by Anoop John <an...@gmail.com>.
Can you provide us the code u have written for the pre hooks + the
custom filter code.  You sure the filter is not filtering out this
recent cell?  Add some logs there and check?

-Anoop-

On Thu, Mar 31, 2016 at 12:57 PM, Jozef Vilcek <jo...@gmail.com> wrote:
> Test was done via hbase shell prompt. I am not sure on it's behavior. Can
> it hide some timeouts and return nothing? I hope not.
> I will try to reproduce it with java API.
>
> Coprocessor does not do anything asynchronous in it's hooks. It is hooked
> only on preStoreScannerOpen, preFlushScannerOpen, preCompactScannerOpen.
> What it does is, create a StoreScanner with custom filter, which skips
> certain cells based on fixed rules line qualifier name pattern or how old
> cell is.
>
>
>> Do you wait for the Put to return before you issue the Get?  Or have you
>> disabled auto flushing on your table instance -- HTable.setAutoFlush(false)?
>
>> Coprocessor hooks executed on the Put and Get paths are blocking, so unless
>> you're overriding normal processing in the prePut() hook and doing
>> something asynchronous, the Put should be there by the time it returns to
>> the client.  But it all depends on what is your coprocessor actually
>> doing.  Can you tell us more about that?
>
>> On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:
>
>> > Is it possible to come up with unit test showing this behavior using> > simplified coprocessor ?> >> > Thanks> >> > > On Mar 29, 2016, at 5:42 AM, Jozef Vilcek <jo...@gmail.com> wrote:> > >> > > Hi,> > >> > > I need a help to shed some light on following observation:> > >> > > Through hbase shell I do PUT to a non existing cell (create) and> > immediate> > > GET.> > > When doing this against hbase table without a coprocessor, every GET> > shows> > > the value immediately. When I enable a coprocessor on the table and> > > perform> > > the same check, I do not get a value for every GET. Sometimes GET returns> > > nothing, like cell is not there. If I delay GET few minutes after PUT,> > then> > > data are correctly retrieved.> > >> > > I want to understand what is going on.> > > Is coprocessor is doing something wrong? Is this somehow normal? or ... ?> > >> > > Coprocessor is a RegionObserver doing data filtering / expiration based> > on> > > predefined rules.> > > HBase version is 1.0.0 from cloudera cdh5.5.1 distibution.> > >> > > Thanks,> > > Jozo
>>

Re: Coprocessors causes data access delay?

Posted by Jozef Vilcek <jo...@gmail.com>.
Test was done via hbase shell prompt. I am not sure on it's behavior. Can
it hide some timeouts and return nothing? I hope not.
I will try to reproduce it with java API.

Coprocessor does not do anything asynchronous in it's hooks. It is hooked
only on preStoreScannerOpen, preFlushScannerOpen, preCompactScannerOpen.
What it does is, create a StoreScanner with custom filter, which skips
certain cells based on fixed rules line qualifier name pattern or how old
cell is.


> Do you wait for the Put to return before you issue the Get?  Or have you
> disabled auto flushing on your table instance -- HTable.setAutoFlush(false)?

> Coprocessor hooks executed on the Put and Get paths are blocking, so unless
> you're overriding normal processing in the prePut() hook and doing
> something asynchronous, the Put should be there by the time it returns to
> the client.  But it all depends on what is your coprocessor actually
> doing.  Can you tell us more about that?

> On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:

> > Is it possible to come up with unit test showing this behavior using> > simplified coprocessor ?> >> > Thanks> >> > > On Mar 29, 2016, at 5:42 AM, Jozef Vilcek <jo...@gmail.com> wrote:> > >> > > Hi,> > >> > > I need a help to shed some light on following observation:> > >> > > Through hbase shell I do PUT to a non existing cell (create) and> > immediate> > > GET.> > > When doing this against hbase table without a coprocessor, every GET> > shows> > > the value immediately. When I enable a coprocessor on the table and> > > perform> > > the same check, I do not get a value for every GET. Sometimes GET returns> > > nothing, like cell is not there. If I delay GET few minutes after PUT,> > then> > > data are correctly retrieved.> > >> > > I want to understand what is going on.> > > Is coprocessor is doing something wrong? Is this somehow normal? or ... ?> > >> > > Coprocessor is a RegionObserver doing data filtering / expiration based> > on> > > predefined rules.> > > HBase version is 1.0.0 from cloudera cdh5.5.1 distibution.> > >> > > Thanks,> > > Jozo
>

Re: Coprocessors causes data access delay?

Posted by Gary Helmling <gh...@gmail.com>.
Do you wait for the Put to return before you issue the Get?  Or have you
disabled auto flushing on your table instance -- HTable.setAutoFlush(false)?

Coprocessor hooks executed on the Put and Get paths are blocking, so unless
you're overriding normal processing in the prePut() hook and doing
something asynchronous, the Put should be there by the time it returns to
the client.  But it all depends on what is your coprocessor actually
doing.  Can you tell us more about that?

On Tue, Mar 29, 2016 at 6:22 AM Ted Yu <yu...@gmail.com> wrote:

> Is it possible to come up with unit test showing this behavior using
> simplified coprocessor ?
>
> Thanks
>
> > On Mar 29, 2016, at 5:42 AM, Jozef Vilcek <jo...@gmail.com> wrote:
> >
> > Hi,
> >
> > I need a help to shed some light on following observation:
> >
> > Through hbase shell I do PUT to a non existing cell (create) and
> immediate
> > GET.
> > When doing this against hbase table without a coprocessor, every GET
> shows
> > the value immediately. When I enable a coprocessor on the table and
> perform
> > the same check, I do not get a value for every GET. Sometimes GET returns
> > nothing, like cell is not there. If I delay GET few minutes after PUT,
> then
> > data are correctly retrieved.
> >
> > I want to understand what is going on.
> > Is coprocessor is doing something wrong? Is this somehow normal? or ... ?
> >
> > Coprocessor is a RegionObserver doing data filtering / expiration based
> on
> > predefined rules.
> > HBase version is 1.0.0 from cloudera cdh5.5.1 distibution.
> >
> > Thanks,
> > Jozo
>

Re: Coprocessors causes data access delay?

Posted by Ted Yu <yu...@gmail.com>.
Is it possible to come up with unit test showing this behavior using simplified coprocessor ?

Thanks

> On Mar 29, 2016, at 5:42 AM, Jozef Vilcek <jo...@gmail.com> wrote:
> 
> Hi,
> 
> I need a help to shed some light on following observation:
> 
> Through hbase shell I do PUT to a non existing cell (create) and immediate
> GET.
> When doing this against hbase table without a coprocessor, every GET shows
> the value immediately. When I enable a coprocessor on the table and perform
> the same check, I do not get a value for every GET. Sometimes GET returns
> nothing, like cell is not there. If I delay GET few minutes after PUT, then
> data are correctly retrieved.
> 
> I want to understand what is going on.
> Is coprocessor is doing something wrong? Is this somehow normal? or ... ?
> 
> Coprocessor is a RegionObserver doing data filtering / expiration based on
> predefined rules.
> HBase version is 1.0.0 from cloudera cdh5.5.1 distibution.
> 
> Thanks,
> Jozo