Posted to user@hbase.apache.org by Nasron Cheong <na...@kontagent.com> on 2013/11/01 02:19:49 UTC

Column qualifiers with hierarchy and filters

Hi,

I'm trying to determine the best way to serialize a sequence of
integers/strings that represents a hierarchy in a column qualifier, in a
form that works with ColumnPrefixFilter and BinaryComparator.

However, because qualifiers are sorted lexicographically by their bytes,
it's awkward to serialize the values so that they come out in the right
order.

What are the typical solutions to this? Do people just zero-pad integers
to make sure they sort correctly? Or do I have to implement my own
QualifierFilter, which seems expensive since I'd be deserializing every
byte array just to compare?

Thanks

- Nasron

Re: Column qualifiers with hierarchy and filters

Posted by Asaf Mesika <as...@gmail.com>.
Can you give an example of your query?

On Friday, November 1, 2013, Nasron Cheong wrote:

> Hi,
>
> I'm trying to determine the best way to serialize a sequence of
> integers/strings that represent a hierarchy for a column qualifier, which
> would be compatible with the ColumnPrefixFilters, and BinaryComparators.
>
> However, due to the lexicographical sorting, it's awkward to serialize the
> sequence of values needed to get it to work.
>
> What are the typical solutions to this? Do people just zero pad integers to
> make sure they sort correctly? Or do I have to implement my own
> QualifierFilter - which seems expensive since I'd be deserializing every
> byte array just to compare.
>
> Thanks
>
> - Nasron
>

Re: Column qualifiers with hierarchy and filters

Posted by Asaf Mesika <as...@gmail.com>.
Both are created when you declare the table, not at runtime, so it
shouldn't matter to you anyway.

On Thursday, November 7, 2013, Nasron Cheong wrote:

> Why is that? Afaik everything is just a byte sequence, what prevents
> non-printable chars from being used in CF/table names?
>
> - Nasron

Re: Column qualifiers with hierarchy and filters

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java (0.94):

  public static final String VALID_USER_TABLE_REGEX = "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";

Cheers
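A minimal sketch, assuming Java 8 or later and only the JDK, that applies the 0.94 VALID_USER_TABLE_REGEX quoted above with java.util.regex to show which names it accepts. The class and method names are hypothetical, not HBase API, and this does not reproduce how HBase itself invokes the pattern.

```java
import java.util.regex.Pattern;

public class TableNameCheck {
    // Regex copied verbatim from the 0.94 HTableDescriptor constant quoted above.
    static final String VALID_USER_TABLE_REGEX = "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";
    private static final Pattern VALID = Pattern.compile(VALID_USER_TABLE_REGEX);

    // Hypothetical helper: true only if the WHOLE name matches the pattern
    // (matches() is anchored at both ends, unlike find()).
    static boolean isValidUserTableName(String name) {
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUserTableName("events_2013-11.v1")); // true
        System.out.println(isValidUserTableName(".hidden"));           // false: cannot start with '.'
        System.out.println(isValidUserTableName("a:b"));               // false: ':' is not allowed
        System.out.println(isValidUserTableName("bucket" + (char) 1)); // false: control character
    }
}
```

Every character the pattern admits is printable ASCII, so a binary big-endian encoding like the one discussed in this thread cannot appear in a table name. Column family names are validated separately and are not covered by this sketch.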


On Thu, Nov 7, 2013 at 9:47 AM, Nasron Cheong <na...@kontagent.com> wrote:

> Why is that? Afaik everything is just a byte sequence, what prevents
> non-printable chars from being used in CF/table names?
>
> - Nasron

Re: Column qualifiers with hierarchy and filters

Posted by Nasron Cheong <na...@kontagent.com>.
Why is that? AFAIK everything is just a byte sequence, so what prevents
non-printable characters from being used in CF/table names?

- Nasron


On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> This is fine for the key. Just so you are aware, you can not use this for
> table name and CF name since they need to be printable characters only.
>
> JM

Re: Column qualifiers with hierarchy and filters

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
This is fine for the key. Just so you are aware, you cannot use this for
table names and CF names, since those must consist of printable characters only.

JM


2013/11/6 Nasron Cheong <na...@kontagent.com>

> Yes, after some digging around, the key is to store integers as byte
> representation, but more importantly to store them as big-endian so that
> the lexicographical sequence is maintained.
>
> Thanks!
>
> - Nasron

Re: Column qualifiers with hierarchy and filters

Posted by Nasron Cheong <na...@kontagent.com>.
Yes, after some digging around, the key is to store integers in their
binary (byte) representation and, more importantly, in big-endian byte order
so that lexicographic order matches numeric order.

Thanks!

- Nasron
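Putting the thread's conclusion together for the <bucket #>:<msg type> layout, here is a minimal JDK-only sketch (Java 9 or later). The byte layout (a 4-byte big-endian, non-negative bucket followed by the UTF-8 message type, with no separator) and all names are assumptions for illustration. The bytes returned by bucketPrefix are what you would pass to ColumnPrefixFilter to select one bucket; hasPrefix only emulates that prefix check locally so the example runs without HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BucketQualifier {
    // Hypothetical layout: 4-byte big-endian bucket (assumed non-negative),
    // then the UTF-8 msg type. The bucket is fixed width, so no ':' is needed.
    static byte[] encode(int bucket, String msgType) {
        byte[] type = msgType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + type.length)
                .putInt(bucket)
                .put(type)
                .array();
    }

    // The prefix that selects every qualifier in one bucket.
    static byte[] bucketPrefix(int bucket) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(bucket).array();
    }

    // Local stand-in for a qualifier prefix match: leading bytes equal the prefix.
    static boolean hasPrefix(byte[] qualifier, byte[] prefix) {
        return qualifier.length >= prefix.length
                && Arrays.equals(qualifier, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Turns a qualifier back into the readable "<bucket>:<msg type>" form.
    static String decode(byte[] qualifier) {
        int bucket = ByteBuffer.wrap(qualifier).getInt();
        String type = new String(qualifier, Integer.BYTES,
                qualifier.length - Integer.BYTES, StandardCharsets.UTF_8);
        return bucket + ":" + type;
    }

    // Sorts qualifiers in unsigned byte order (as HBase stores them), then decodes.
    static List<String> sortedDecoded(List<byte[]> qualifiers) {
        List<byte[]> copy = new ArrayList<>(qualifiers);
        copy.sort(Arrays::compareUnsigned);
        List<String> out = new ArrayList<>();
        for (byte[] q : copy) {
            out.add(decode(q));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = List.of(
                encode(1000, "abc"), encode(2, "xyz"), encode(1, "abc"), encode(2, "abc"));
        // prints [1:abc, 2:abc, 2:xyz, 1000:abc]
        System.out.println(sortedDecoded(qualifiers));

        byte[] prefix = bucketPrefix(2);
        for (byte[] q : qualifiers) {
            if (hasPrefix(q, prefix)) {
                System.out.println("bucket 2: " + decode(q));
            }
        }
    }
}
```

Because every bucket occupies exactly four bytes, the prefix for bucket 2 cannot accidentally match bucket 2000 the way the decimal prefix "2" would match "2000:abc".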


On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <pr...@gmail.com> wrote:

> you can store the byte representation of the integer (fixed length) instead
> of the integer (which will be stored as strings of variable length) and
> will also be sorted.

Re: Column qualifiers with hierarchy and filters

Posted by Premal Shah <pr...@gmail.com>.
You can store the fixed-length byte representation of the integer instead
of its decimal string (which is variable length); that representation will
also sort correctly.


On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong <na...@kontagent.com> wrote:

> Yes, its limited in the sense that we have to precalculate the number of
> digits required so we don't run out, and if we overestimate, then our row
> keys end up taking up more space than we'd care to.
>
> We can probably live with this approach for now, but I wonder if there's a
> better way.
>
> - Nasron

-- 
Regards,
Premal Shah.
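A minimal JDK-only sketch (Java 9 or later; all names hypothetical) of the fixed-width approach: a 4-byte big-endian int sorts numerically under unsigned byte comparison, while a little-endian encoding of the same values does not. ByteBuffer writes big-endian by default, the same layout HBase's Bytes.toBytes(int) produces. One caveat not raised in the thread: plain two's-complement bytes put negative values after positive ones, so the sketch also shows the common fix of flipping the sign bit before encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class FixedWidthIntDemo {
    // 4-byte big-endian encoding (ByteBuffer's default byte order).
    static byte[] bigEndian(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // 4-byte little-endian encoding, included only as a counterexample.
    static byte[] littleEndian(int v) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(v)
                .array();
    }

    // Big-endian after flipping the sign bit: maps Integer.MIN_VALUE..MAX_VALUE
    // onto 0x00000000..0xFFFFFFFF, so negatives sort before positives.
    static byte[] signFlipped(int v) {
        return bigEndian(v ^ Integer.MIN_VALUE);
    }

    // Orders values by the unsigned lexicographic order of their encodings,
    // i.e. the order in which HBase would store them as qualifiers.
    static List<Integer> orderBy(List<Integer> values, IntFunction<byte[]> encoder) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort((a, b) -> Arrays.compareUnsigned(encoder.apply(a), encoder.apply(b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> buckets = List.of(2, 1000, 1, 256);
        System.out.println(orderBy(buckets, FixedWidthIntDemo::bigEndian));    // [1, 2, 256, 1000]
        System.out.println(orderBy(buckets, FixedWidthIntDemo::littleEndian)); // [256, 1, 2, 1000]
        List<Integer> signed = List.of(5, -1, 0, -300);
        System.out.println(orderBy(signed, FixedWidthIntDemo::bigEndian));     // [0, 5, -300, -1]
        System.out.println(orderBy(signed, FixedWidthIntDemo::signFlipped));   // [-300, -1, 0, 5]
    }
}
```

If all bucket numbers are non-negative, plain big-endian is enough; the sign-bit flip only matters once negative values can appear.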

Re: Column qualifiers with hierarchy and filters

Posted by Nasron Cheong <na...@kontagent.com>.
Yes, it's limited in the sense that we have to precalculate the number of
digits required so we don't run out, and if we overestimate, our row keys
end up taking more space than we'd like.

We can probably live with this approach for now, but I wonder if there's a
better way.

- Nasron
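On the question of a better way: one common alternative, not raised in this thread, is a length-prefixed decimal encoding, which stays readable and variable-width without choosing a padding width up front. The sketch below (JDK only, Java 9 or later; the letter scheme and all names are hypothetical) prefixes each non-negative int with a letter giving its digit count, so shorter numbers sort first and equal-length numbers compare digit by digit, which is numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixedDecimal {
    // Hypothetical encoding for non-negative ints: one letter giving the digit
    // count ('a' = 1 digit ... 'j' = 10 digits, enough for Integer.MAX_VALUE),
    // followed by the plain decimal digits.
    static String encode(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("non-negative values only: " + n);
        }
        String digits = Integer.toString(n);
        return (char) ('a' + digits.length() - 1) + digits;
    }

    // "<encoded bucket>:<msg type>", e.g. bucket 25 becomes "b25:abc".
    static String qualifier(int bucket, String msgType) {
        return encode(bucket) + ":" + msgType;
    }

    // Sorts strings by the unsigned byte order of their UTF-8 encoding.
    static List<String> sortAsBytes(List<String> qualifiers) {
        List<byte[]> encoded = new ArrayList<>();
        for (String q : qualifiers) {
            encoded.add(q.getBytes(StandardCharsets.UTF_8));
        }
        encoded.sort(Arrays::compareUnsigned);
        List<String> sorted = new ArrayList<>();
        for (byte[] b : encoded) {
            sorted.add(new String(b, StandardCharsets.UTF_8));
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> qs = List.of(
                qualifier(2, "abc"), qualifier(1000, "abc"),
                qualifier(1, "abc"), qualifier(25, "abc"));
        // prints [a1:abc, a2:abc, b25:abc, d1000:abc]
        System.out.println(sortAsBytes(qs));
    }
}
```

Small buckets cost only one extra byte, and a prefix such as "a2:" selects bucket 2 without matching bucket 20 or 2000, since those carry a different length letter.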


On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Nasron,
>
> Why are you saying that it's a limited way? Does it achieve your needs?
>

Re: Column qualifiers with hierarchy and filters

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Nasron,

Why do you say it's a limited approach? Does it meet your needs?


2013/11/4 Nasron Cheong <na...@kontagent.com>

> An example query would be the following, say the column qualifier was of
> the form
>
> <bucket #>:<msg type>
>
> where <bucket #> should be an integer value, and msg type is a string. E.g.
>
> 1:abc
> 1000:abc
> 2: abc
>
> would appear in the above sequence, which is out of order when doing prefix
> filtering. Zero padding could fix this:
>
> 0001:abc
> 0002:abc
> 1000: abc
>
> But is a limited way of ensuring the sequence of CQ (column qualifiers) is
> correct, in order for prefix filtering to work. Are there other options?
>
> - Nasron

Re: Column qualifiers with hierarchy and filters

Posted by Nasron Cheong <na...@kontagent.com>.
Here is an example. Say the column qualifier has the form

<bucket #>:<msg type>

where <bucket #> is an integer and <msg type> is a string. For example,

1:abc
1000:abc
2:abc

would sort with bucket 1000 ahead of bucket 2, which is out of order when
doing prefix filtering. Zero padding could fix this:

0001:abc
0002:abc
1000:abc

But this is a limited way of ensuring the column qualifiers (CQs) are in the
correct order for prefix filtering to work. Are there other options?

- Nasron
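To make the ordering problem concrete, here is a minimal JDK-only sketch (Java 9 or later; the class and method names are hypothetical) that sorts the example qualifiers by unsigned byte-by-byte comparison, the ordering HBase applies to qualifiers. One detail it surfaces: ':' (0x3A) sorts after '0' (0x30), so in strict byte order 1000:abc lands even before 1:abc. Either way bucket 1000 sorts ahead of bucket 2, and zero-padding to a fixed width restores numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QualifierOrderDemo {
    // Sorts strings by the unsigned, byte-by-byte order of their UTF-8 bytes,
    // which is how HBase orders column qualifiers.
    static List<String> sortAsBytes(List<String> qualifiers) {
        List<byte[]> encoded = new ArrayList<>();
        for (String q : qualifiers) {
            encoded.add(q.getBytes(StandardCharsets.UTF_8));
        }
        encoded.sort(Arrays::compareUnsigned);
        List<String> sorted = new ArrayList<>();
        for (byte[] b : encoded) {
            sorted.add(new String(b, StandardCharsets.UTF_8));
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded buckets: prints [1000:abc, 1:abc, 2:abc]
        System.out.println(sortAsBytes(List.of("2:abc", "1000:abc", "1:abc")));
        // Zero-padded to 4 digits: prints [0001:abc, 0002:abc, 1000:abc]
        System.out.println(sortAsBytes(List.of(
                String.format("%04d:abc", 2),
                String.format("%04d:abc", 1000),
                String.format("%04d:abc", 1))));
    }
}
```

The padding width (4 here) has to be fixed in advance for the largest bucket you expect, which is exactly the limitation raised in this message.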


On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong <na...@kontagent.com> wrote:

> Hi,
>
> I'm trying to determine the best way to serialize a sequence of
> integers/strings that represent a hierarchy for a column qualifier, which
> would be compatible with the ColumnPrefixFilters, and BinaryComparators.
>
> However, due to the lexicographical sorting, it's awkward to serialize the
> sequence of values needed to get it to work.
>
> What are the typical solutions to this? Do people just zero pad integers
> to make sure they sort correctly? Or do I have to implement my own
> QualifierFilter - which seems expensive since I'd be deserializing every
> byte array just to compare.
>
> Thanks
>
> - Nasron
>