Posted to user@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com> on 2016/03/02 23:55:27 UTC

(CDH5.5) AssertionError: Key followed by smaller key

Recently we upgraded to CDH5.3.8, and needed to use CDAP Readless
Increments (https://github.com/caskdata/cdap) to overcome the
recently-fixed performance regression around Increment.

We are now looking to upgrade to CDH5.5.x, and have attempted to upgrade 1
slave in our 5.3.8 cluster to 5.5.0.  Unfortunately, on this slave we are
seeing errors like the one below, for use cases that work perfectly fine on
the rest of the cluster:

https://gist.github.com/bbeaudreault/5214b28319981c18379f

We typically see this within minutes of the server starting. The exception
originally came from CDAP's IncrementSummingHandler invocation, but it
eventually falls through to the same underlying HBase libraries as the
stacktrace above. (That stacktrace is from my test code, described below,
which I wrote in an attempt to exonerate CDAP.)

In debugging, I've done the following to rule out CDAP's wrappers as a
possible problem:

try {
  // cdap code
} catch (AssertionError e) {
  // Do a normal HRegion.get(get, false) and HRegion.getScanner()
}

Both the follow-up raw get and the raw scan throw the same AssertionError.
HOWEVER, if I first do a HRegion.flushcache(true), the same raw get/scan
then succeeds, although it returns 0 results.
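
The sequence described above can be sketched roughly as follows. This is a
minimal, self-contained illustration, not the actual coprocessor code:
RegionOps is a hypothetical stand-in for the HRegion calls named above, and
the class and method names are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class FallbackProbe {
    /** Hypothetical stand-in for the region calls named above; not an HBase type. */
    interface RegionOps {
        List<String> rawGet(); // stands in for HRegion.get(get, false)
        void flush();          // stands in for HRegion.flushcache(true)
    }

    /**
     * Try the wrapped (CDAP) read first. If it throws an AssertionError,
     * retry with a raw get; if that also throws, flush and retry once more.
     * Each step is recorded in the supplied log.
     */
    static List<String> probe(Supplier<List<String>> wrappedRead,
                              RegionOps region,
                              List<String> log) {
        try {
            return wrappedRead.get();
        } catch (AssertionError first) {
            log.add("wrapped read failed: " + first.getMessage());
        }
        try {
            return region.rawGet();
        } catch (AssertionError second) {
            log.add("raw get failed: " + second.getMessage());
        }
        region.flush();
        List<String> afterFlush = region.rawGet();
        log.add("raw get after flush returned " + afterFlush.size() + " results");
        return afterFlush;
    }
}
```

Catching AssertionError separately at each step records which read path
failed, which is the point of the experiment: it separates the wrapper from
the underlying read.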

I understand that at this point we are using a modified version of an
unofficial coprocessor, and that it will be hard for you guys to provide a
clear solution without knowing the full environment.  I'm hoping to just
get some guidance, as I'm in the midst of debugging this and have a few
questions you might be able to answer:

- Where does OLDEST_TIMESTAMP/Minimum come from? I see it in the code, but
don't really see any KeyValues instantiated with it in any code paths that
I am familiar with.

- Any ideas on how to inspect the full result that is causing this issue?
I'm in the unfortunate position of not knowing how to reproduce the issue
(the table in question gets a wide range of usage patterns from multiple
clients). Since a flushcache() call seems to fix the issue (though, oddly,
reads return 0 results afterward), it seems like I can't just read a raw
hfile. Could the problem be in the memstore?

- Any thoughts/hints on how this AssertionError could possibly be triggered
which I could use to inform my investigation?

Any thoughts would be greatly appreciated.

Thanks!