Posted to user@accumulo.apache.org by "Cornish, Duane C." <Du...@jhuapl.edu> on 2014/01/08 17:25:23 UTC

Accumulo lexicographical order

Accumulo Users,

I know Accumulo keys are sorted in lexicographical order in the tables.  Where can I find the specification defining that order?  For example, what lexicographical order are symbols in?  Is the order the same as the numerical order of UTF-8 encoding, ASCII encoding, or some other encoding scheme?

Thanks in advance,
Duane Cornish

Re: Accumulo lexicographical order

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jan 8, 2014 at 11:25 AM, Cornish, Duane C. <Duane.Cornish@jhuapl.edu> wrote:

> Accumulo Users,
>
>
>
> I know Accumulo keys are sorted in lexicographical order in the tables.
> Where can I find the specification defining that order?  For example, what
> lexicographical order are symbols in?  Is the order the same as the
> numerical order of UTF-8 encoding, ASCII encoding, or some other encoding
> scheme?
>

Bytes are compared as unsigned integers, 0 to 255.
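To make that rule concrete, here is a small Python sketch (plain Python, not Accumulo code) that compares byte sequences byte by byte, reading each byte as an unsigned value from 0 to 255. It assumes the usual lexicographic convention that a sequence sorts before any longer sequence it is a prefix of; that detail is not stated above. The signed comparison is included only for contrast: Java's byte type is signed, so a naive comparison there would put 0x80 before 0x7F. Both function names are hypothetical.

```python
from functools import cmp_to_key


def compare_unsigned(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1, comparing a and b byte by byte as unsigned ints."""
    for x, y in zip(a, b):  # iterating over bytes yields ints in 0..255
        if x != y:
            return -1 if x < y else 1
    # Every shared position matched, so the shorter sequence sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_signed(a: bytes, b: bytes) -> int:
    """For contrast only: treat each byte as signed (-128..127), like Java's byte."""
    for x, y in zip(a, b):
        sx = x - 256 if x > 127 else x
        sy = y - 256 if y > 127 else y
        if sx != sy:
            return -1 if sx < sy else 1
    return (len(a) > len(b)) - (len(a) < len(b))


print(compare_unsigned(b"\x7f", b"\x80"))  # -1: 127 < 128
print(compare_signed(b"\x7f", b"\x80"))    # 1: 0x80 would be -128 if signed
print(compare_unsigned(b"ab", b"abc"))     # -1: a prefix sorts first

keys = [b"\xff", b"a", b"\x00", b"\x80", b"A"]
print(sorted(keys, key=cmp_to_key(compare_unsigned)))
# [b'\x00', b'A', b'a', b'\x80', b'\xff']
```

Python's built-in bytes comparison already follows this unsigned rule, so sorted(keys) gives the same result; the explicit loop only makes each step visible.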



RE: Accumulo lexicographical order

Posted by "Cornish, Duane C." <Du...@jhuapl.edu>.
Great!  Thanks for all the help!

From: Keith Turner [mailto:keith@deenlo.com]
Sent: Wednesday, January 08, 2014 12:04 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo lexicographical order



On Wed, Jan 8, 2014 at 11:50 AM, Mike Drob <md...@mdrob.com> wrote:

Duane,

Most API methods for inserting values take byte arrays or byte sequences directly. The lexicographic order is based on the natural ordering of the bytes, e.g. \x00 sorts before \x01. The methods that take strings will assume UTF-8 encoding and convert for you. If you find a situation where this is not the case, please let us know!

The one exception to all of this is the timestamp part of the key, which is stored in numeric order. I want to say that they are kept in reverse order, but don't remember the exact details offhand.

That's correct. The most recent timestamps are sorted first. They are sorted as signed longs.



I'm on my phone, so finding the exact place where this is documented will be a challenge, but I would expect it to be part of our user manual on accumulo.apache.org

Mike


Re: Accumulo lexicographical order

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jan 8, 2014 at 11:50 AM, Mike Drob <md...@mdrob.com> wrote:

> Duane,
>
> Most API methods for inserting values take byte arrays or byte sequences
> directly. The lexicographic order is based on the natural ordering of the
> bytes, e.g. \x00 sorts before \x01. The methods that take strings will
> assume UTF-8 encoding and convert for you. If you find a situation where
> this is not the case, please let us know!
>
> The one exception to all of this is the timestamp part of the key, which
> is stored in numeric order. I want to say that they are kept in reverse
> order, but don't remember the exact details offhand.
>
That's correct. The most recent timestamps are sorted first. They are sorted as
signed longs.
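The timestamp ordering can be sketched by modelling a key as a tuple. This is a hypothetical Python model, not Accumulo's Key class: it assumes the fields compare in the order row, column family, column qualifier, column visibility, timestamp, with the byte fields compared as unsigned bytes and the timestamp as a signed 64-bit value in descending order, so the newest version of a cell comes first. Any further tie-breakers Accumulo may apply are left out.

```python
from typing import NamedTuple


class SimpleKey(NamedTuple):
    """Hypothetical stand-in for a key; not Accumulo's Key class."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int  # treated as a signed 64-bit long


def sort_key(k: SimpleKey):
    # bytes fields compare as unsigned bytes; negating the timestamp turns
    # Python's ascending sort into newest-first. Python ints do not overflow,
    # so negation is safe even at -2**63.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = [
    SimpleKey(b"row1", b"cf", b"cq", b"", 100),
    SimpleKey(b"row1", b"cf", b"cq", b"", 300),
    SimpleKey(b"row1", b"cf", b"cq", b"", -5),
    SimpleKey(b"row0", b"cf", b"cq", b"", 1),
]
for k in sorted(entries, key=sort_key):
    print(k.row, k.timestamp)
# b'row0' 1
# b'row1' 300
# b'row1' 100
# b'row1' -5
```

Because the timestamp is compared as signed, a negative timestamp sorts after every non-negative one in this newest-first order; compared as unsigned, it would look like a very large value and come first instead.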



> I'm on my phone, so finding the exact place where this is documented will
> be a challenge, but I would expect it to be part of our user manual on
> accumulo.apache.org
>
> Mike

Re: Accumulo lexicographical order

Posted by Mike Drob <md...@mdrob.com>.
Duane,

Most API methods for inserting values take byte arrays or byte sequences
directly. The lexicographic order is based on the natural ordering of the
bytes, e.g. \x00 sorts before \x01. The methods that take strings will
assume UTF-8 encoding and convert for you. If you find a situation where
this is not the case, please let us know!
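A rough Python illustration of this, assuming the string-accepting methods encode to UTF-8 and the stored bytes are then compared as unsigned values. For ASCII characters UTF-8 produces the same single byte as ASCII, so symbols, digits and letters fall in ASCII table order: "!" through "/" sort before the digits, digits before uppercase, "_" between uppercase and lowercase, and "~" after lowercase. Non-ASCII characters encode to a first byte of 0xC2 or above, so they sort after all of ASCII. utf8_sort is a hypothetical helper, not an Accumulo method.

```python
def utf8_sort(strings):
    """Hypothetical helper: sort strings by their UTF-8 encoded bytes."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


samples = ["a", "~", "Z", "0", "!", "_", "é", " ", "A"]
for s in utf8_sort(samples):
    # Print each string with its UTF-8 bytes in hex to check the order by hand.
    print(repr(s), s.encode("utf-8").hex())
# Printed one per line in this order:
# ' ' 20, '!' 21, '0' 30, 'A' 41, 'Z' 5a, '_' 5f, 'a' 61, '~' 7e, 'é' c3a9

# UTF-8 byte order agrees with Unicode code point order, and Python compares
# str values by code point, so both sorts give the same result.
mixed = ["\U0001F600", "\u4e2d", "z", "\u00e9"]
print(utf8_sort(mixed) == sorted(mixed))  # True
```

That last property of UTF-8 is why sorting on the encoded bytes never reorders characters relative to their code points.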

The one exception to all of this is the timestamp part of the key, which is
stored in numeric order. I want to say that they are kept in reverse order,
but don't remember the exact details offhand.

I'm on my phone, so finding the exact place where this is documented will
be a challenge, but I would expect it to be part of our user manual on
accumulo.apache.org

Mike