You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Daniel Vimont <da...@commonvox.org> on 2016/10/14 00:06:14 UTC

[DISCUSSION] Bugs in some of the OrderedBytes decode methods?

I'm currently looking into the various implementations of `DataType` in
hbase-common, and I just wanted to get confirmation of something before I
open up a JIRA and fix what **appear** to be bugs in the underlying
OrderedBytes
code.

All `DataType` implementations have their own overrides of the `#decode`
method. Some of these throw an appropriate exception when an invalid
byte-array is passed to them; for example:

*Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*

(This throws an IllegalArgumentException: "unexpected value in first byte:
0x78".)

But for other implementations, *no* validation is done; for example:

*Short bogusShort = OrderedInt16.ASCENDING.decode(new
SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*

(This returns a short value of 1669, without complaint -- by ignoring the
first invalid "header" byte and constructing the value 1669 from the two
bytes that follow.)

In those implementations which lack validation, there are "assert"
statements in the place where I would expect an exception to be explicitly
thrown (or, in the context of OrderedBytes, one would expect the
#unexpectedHeader
method to be invoked, which in turn throws the exception). I just wanted to
check to make sure whether (perhaps for the sake of extreme efficiency?)
some validations in HBase low-level processing are intentionally being done
via bypassable "assert" statements instead of the throwing of exceptions.

Thanks!

Dan


[image: --]

Daniel Vimont
[image: https://]about.me/dvimont
<https://about.me/dvimont?promo=email_sig&utm_source=email_sig&utm_medium=external_link&utm_campaign=chrome_ext>

Re: [DISCUSSION] Bugs in some of the OrderedBytes decode methods?

Posted by Nick Dimiduk <nd...@apache.org>.
Hi Daniel,

I'd say any inconsistencies between implementations is a bug. It would be
great to normalize the implementations and improve the coverage of our
correctness suite. Following those improvements, some attention should be
paid to the performance of encoding and decoding implementations. A couple
of the encodings are a bit complicated and their translation from the C
inspiration didn't always allow for the same short-cuts. In that mind,
any of the variable length encodings could benefit from some scrutiny.

(I'm not *that* concerned about the perf of the encoders compared to other
aspects of the HBase client API, but these things do tend to accumulate in
aggregate.)

As for Phoenix, it implements its own encodings and does not use
OrderedBytes. Phoenix does use the DataType interface, however, and could
be replumbed to use OrderedBytes, though don't underestimate the amount of
work this would require.

-n

On Thursday, October 13, 2016, Daniel Vimont <da...@commonvox.org> wrote:

> Thanks for checking into that, Ted. As I said, hopefully Nick D. can give
> us the final word on this question.
>
> On Fri, Oct 14, 2016 at 10:53 AM, Ted Yu <yuzhihong@gmail.com
> <javascript:;>> wrote:
>
> > I searched Phoenix code base - there is no reference to OrderedBytes.
> >
> > On Thu, Oct 13, 2016 at 6:45 PM, Daniel Vimont <daniel@commonvox.org
> <javascript:;>>
> > wrote:
> >
> > > Agreed, Ted. But I wanted to be sure there wasn't some hidden reason
> for
> > > the current "assert"-only code. The only other possibility that came to
> > > mind is that there may be some interoperability issue(s) with external
> > > consumers of OrderedBytes (such as Phoenix) which requires that no
> > > exception be thrown by some of the #decode methods. Hoping that Nick D.
> > can
> > > perhaps provide the final word on this.
> > >
> > >
> > > On Fri, Oct 14, 2016 at 9:29 AM, Ted Yu <yuzhihong@gmail.com
> <javascript:;>> wrote:
> > >
> > > > I think throwing exception should be the action to take.
> > > >
> > > > Relying on assertion is not robust.
> > > >
> > > > > On Oct 13, 2016, at 5:06 PM, Daniel Vimont <daniel@commonvox.org
> <javascript:;>>
> > > wrote:
> > > > >
> > > > > I'm currently looking into the various implementations of
> `DataType`
> > in
> > > > > hbase-common, and I just wanted to get confirmation of something
> > > before I
> > > > > open up a JIRA and fix what **appear** to be bugs in the underlying
> > > > > OrderedBytes
> > > > > code.
> > > > >
> > > > > All `DataType` implementations have their own overrides of the
> > > `#decode`
> > > > > method. Some of these throw an appropriate exception when an
> invalid
> > > > > byte-array is passed to them; for example:
> > > > >
> > > > > *Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
> > > > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > > > >
> > > > > (This throws an IllegalArgumentException: "unexpected value in
> first
> > > > byte:
> > > > > 0x78".)
> > > > >
> > > > > But for other implementations, *no* validation is done; for
> example:
> > > > >
> > > > > *Short bogusShort = OrderedInt16.ASCENDING.decode(new
> > > > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > > > >
> > > > > (This returns a short value of 1669, without complaint -- by
> ignoring
> > > the
> > > > > first invalid "header" byte and constructing the value 1669 from
> the
> > > two
> > > > > bytes that follow.)
> > > > >
> > > > > In those implementations which lack validation, there are "assert"
> > > > > statements in the place where I would expect an exception to be
> > > > explicitly
> > > > > thrown (or, in the context of OrderedBytes, one would expect the
> > > > > #unexpectedHeader
> > > > > method to be invoked, which in turn throws the exception). I just
> > > wanted
> > > > to
> > > > > check to make sure whether (perhaps for the sake of extreme
> > > efficiency?)
> > > > > some validations in HBase low-level processing are intentionally
> > being
> > > > done
> > > > > via bypassable "assert" statements instead of the throwing of
> > > exceptions.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Dan
> > > > >
> > > > >
> > > > > [image: --]
> > > > >
> > > > > Daniel Vimont
> > > > > [image: https://]about.me/dvimont
> > > > > <https://about.me/dvimont?promo=email_sig&utm_source=
> > > > email_sig&utm_medium=external_link&utm_campaign=chrome_ext>
> > > >
> > >
> >
>

Re: [DISCUSSION] Bugs in some of the OrderedBytes decode methods?

Posted by Daniel Vimont <da...@commonvox.org>.
Thanks for checking into that, Ted. As I said, hopefully Nick D. can give
us the final word on this question.

On Fri, Oct 14, 2016 at 10:53 AM, Ted Yu <yu...@gmail.com> wrote:

> I searched Phoenix code base - there is no reference to OrderedBytes.
>
> On Thu, Oct 13, 2016 at 6:45 PM, Daniel Vimont <da...@commonvox.org>
> wrote:
>
> > Agreed, Ted. But I wanted to be sure there wasn't some hidden reason for
> > the current "assert"-only code. The only other possibility that came to
> > mind is that there may be some interoperability issue(s) with external
> > consumers of OrderedBytes (such as Phoenix) which requires that no
> > exception be thrown by some of the #decode methods. Hoping that Nick D.
> can
> > perhaps provide the final word on this.
> >
> >
> > On Fri, Oct 14, 2016 at 9:29 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > I think throwing exception should be the action to take.
> > >
> > > Relying on assertion is not robust.
> > >
> > > > On Oct 13, 2016, at 5:06 PM, Daniel Vimont <da...@commonvox.org>
> > wrote:
> > > >
> > > > I'm currently looking into the various implementations of `DataType`
> in
> > > > hbase-common, and I just wanted to get confirmation of something
> > before I
> > > > open up a JIRA and fix what **appear** to be bugs in the underlying
> > > > OrderedBytes
> > > > code.
> > > >
> > > > All `DataType` implementations have their own overrides of the
> > `#decode`
> > > > method. Some of these throw an appropriate exception when an invalid
> > > > byte-array is passed to them; for example:
> > > >
> > > > *Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
> > > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > > >
> > > > (This throws an IllegalArgumentException: "unexpected value in first
> > > byte:
> > > > 0x78".)
> > > >
> > > > But for other implementations, *no* validation is done; for example:
> > > >
> > > > *Short bogusShort = OrderedInt16.ASCENDING.decode(new
> > > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > > >
> > > > (This returns a short value of 1669, without complaint -- by ignoring
> > the
> > > > first invalid "header" byte and constructing the value 1669 from the
> > two
> > > > bytes that follow.)
> > > >
> > > > In those implementations which lack validation, there are "assert"
> > > > statements in the place where I would expect an exception to be
> > > explicitly
> > > > thrown (or, in the context of OrderedBytes, one would expect the
> > > > #unexpectedHeader
> > > > method to be invoked, which in turn throws the exception). I just
> > wanted
> > > to
> > > > check to make sure whether (perhaps for the sake of extreme
> > efficiency?)
> > > > some validations in HBase low-level processing are intentionally
> being
> > > done
> > > > via bypassable "assert" statements instead of the throwing of
> > exceptions.
> > > >
> > > > Thanks!
> > > >
> > > > Dan
> > > >
> > > >
> > > > [image: --]
> > > >
> > > > Daniel Vimont
> > > > [image: https://]about.me/dvimont
> > > > <https://about.me/dvimont?promo=email_sig&utm_source=
> > > email_sig&utm_medium=external_link&utm_campaign=chrome_ext>
> > >
> >
>

Re: [DISCUSSION] Bugs in some of the OrderedBytes decode methods?

Posted by Ted Yu <yu...@gmail.com>.
I searched Phoenix code base - there is no reference to OrderedBytes.

On Thu, Oct 13, 2016 at 6:45 PM, Daniel Vimont <da...@commonvox.org> wrote:

> Agreed, Ted. But I wanted to be sure there wasn't some hidden reason for
> the current "assert"-only code. The only other possibility that came to
> mind is that there may be some interoperability issue(s) with external
> consumers of OrderedBytes (such as Phoenix) which requires that no
> exception be thrown by some of the #decode methods. Hoping that Nick D. can
> perhaps provide the final word on this.
>
>
> On Fri, Oct 14, 2016 at 9:29 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > I think throwing exception should be the action to take.
> >
> > Relying on assertion is not robust.
> >
> > > On Oct 13, 2016, at 5:06 PM, Daniel Vimont <da...@commonvox.org>
> wrote:
> > >
> > > I'm currently looking into the various implementations of `DataType` in
> > > hbase-common, and I just wanted to get confirmation of something
> before I
> > > open up a JIRA and fix what **appear** to be bugs in the underlying
> > > OrderedBytes
> > > code.
> > >
> > > All `DataType` implementations have their own overrides of the
> `#decode`
> > > method. Some of these throw an appropriate exception when an invalid
> > > byte-array is passed to them; for example:
> > >
> > > *Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
> > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > >
> > > (This throws an IllegalArgumentException: "unexpected value in first
> > byte:
> > > 0x78".)
> > >
> > > But for other implementations, *no* validation is done; for example:
> > >
> > > *Short bogusShort = OrderedInt16.ASCENDING.decode(new
> > > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> > >
> > > (This returns a short value of 1669, without complaint -- by ignoring
> the
> > > first invalid "header" byte and constructing the value 1669 from the
> two
> > > bytes that follow.)
> > >
> > > In those implementations which lack validation, there are "assert"
> > > statements in the place where I would expect an exception to be
> > explicitly
> > > thrown (or, in the context of OrderedBytes, one would expect the
> > > #unexpectedHeader
> > > method to be invoked, which in turn throws the exception). I just
> wanted
> > to
> > > check to make sure whether (perhaps for the sake of extreme
> efficiency?)
> > > some validations in HBase low-level processing are intentionally being
> > done
> > > via bypassable "assert" statements instead of the throwing of
> exceptions.
> > >
> > > Thanks!
> > >
> > > Dan
> > >
> > >
> > > [image: --]
> > >
> > > Daniel Vimont
> > > [image: https://]about.me/dvimont
> > > <https://about.me/dvimont?promo=email_sig&utm_source=
> > email_sig&utm_medium=external_link&utm_campaign=chrome_ext>
> >
>

Re: [DISCUSSION] Bugs in some of the OrderedBytes decode methods?

Posted by Daniel Vimont <da...@commonvox.org>.
Agreed, Ted. But I wanted to be sure there wasn't some hidden reason for
the current "assert"-only code. The only other possibility that came to
mind is that there may be some interoperability issue(s) with external
consumers of OrderedBytes (such as Phoenix) which requires that no
exception be thrown by some of the #decode methods. Hoping that Nick D. can
perhaps provide the final word on this.


On Fri, Oct 14, 2016 at 9:29 AM, Ted Yu <yu...@gmail.com> wrote:

> I think throwing exception should be the action to take.
>
> Relying on assertion is not robust.
>
> > On Oct 13, 2016, at 5:06 PM, Daniel Vimont <da...@commonvox.org> wrote:
> >
> > I'm currently looking into the various implementations of `DataType` in
> > hbase-common, and I just wanted to get confirmation of something before I
> > open up a JIRA and fix what **appear** to be bugs in the underlying
> > OrderedBytes
> > code.
> >
> > All `DataType` implementations have their own overrides of the `#decode`
> > method. Some of these throw an appropriate exception when an invalid
> > byte-array is passed to them; for example:
> >
> > *Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
> > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> >
> > (This throws an IllegalArgumentException: "unexpected value in first
> byte:
> > 0x78".)
> >
> > But for other implementations, *no* validation is done; for example:
> >
> > *Short bogusShort = OrderedInt16.ASCENDING.decode(new
> > SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> >
> > (This returns a short value of 1669, without complaint -- by ignoring the
> > first invalid "header" byte and constructing the value 1669 from the two
> > bytes that follow.)
> >
> > In those implementations which lack validation, there are "assert"
> > statements in the place where I would expect an exception to be
> explicitly
> > thrown (or, in the context of OrderedBytes, one would expect the
> > #unexpectedHeader
> > method to be invoked, which in turn throws the exception). I just wanted
> to
> > check to make sure whether (perhaps for the sake of extreme efficiency?)
> > some validations in HBase low-level processing are intentionally being
> done
> > via bypassable "assert" statements instead of the throwing of exceptions.
> >
> > Thanks!
> >
> > Dan
> >
> >
> > [image: --]
> >
> > Daniel Vimont
> > [image: https://]about.me/dvimont
> > <https://about.me/dvimont?promo=email_sig&utm_source=
> email_sig&utm_medium=external_link&utm_campaign=chrome_ext>
>

Re: [DISCUSSION] Bugs in some of the OrderedBytes decode methods?

Posted by Ted Yu <yu...@gmail.com>.
I think throwing exception should be the action to take. 

Relying on assertion is not robust. 

> On Oct 13, 2016, at 5:06 PM, Daniel Vimont <da...@commonvox.org> wrote:
> 
> I'm currently looking into the various implementations of `DataType` in
> hbase-common, and I just wanted to get confirmation of something before I
> open up a JIRA and fix what **appear** to be bugs in the underlying
> OrderedBytes
> code.
> 
> All `DataType` implementations have their own overrides of the `#decode`
> method. Some of these throw an appropriate exception when an invalid
> byte-array is passed to them; for example:
> 
> *Number bogusNumeric = OrderedNumeric.ASCENDING.decode(new
> SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> 
> (This throws an IllegalArgumentException: "unexpected value in first byte:
> 0x78".)
> 
> But for other implementations, *no* validation is done; for example:
> 
> *Short bogusShort = OrderedInt16.ASCENDING.decode(new
> SimplePositionedMutableByteRange(Bytes.toBytes("xyzpdq")));*
> 
> (This returns a short value of 1669, without complaint -- by ignoring the
> first invalid "header" byte and constructing the value 1669 from the two
> bytes that follow.)
> 
> In those implementations which lack validation, there are "assert"
> statements in the place where I would expect an exception to be explicitly
> thrown (or, in the context of OrderedBytes, one would expect the
> #unexpectedHeader
> method to be invoked, which in turn throws the exception). I just wanted to
> check to make sure whether (perhaps for the sake of extreme efficiency?)
> some validations in HBase low-level processing are intentionally being done
> via bypassable "assert" statements instead of the throwing of exceptions.
> 
> Thanks!
> 
> Dan
> 
> 
> [image: --]
> 
> Daniel Vimont
> [image: https://]about.me/dvimont
> <https://about.me/dvimont?promo=email_sig&utm_source=email_sig&utm_medium=external_link&utm_campaign=chrome_ext>