Posted to user@hbase.apache.org by rob mancuso <rc...@gmail.com> on 2013/06/08 19:54:50 UTC

observer coprocessor question regarding puts

Hi,

I'm looking to write a post-put observer coprocessor to maintain a
secondary index.  Basically, my current rowkey design is a composite of
A,B,C and I want to be able to also access data by C.  So all i'm looking
to do is invert the rowkey and apply it for all cf:cq values that come in.

My problem (i think), is that in all the good examples i've seen, they all
deconstruct the Put by calling put.get(<cf>,<cq>)...implying they know the
qualifier ahead of time.  I'm looking to specify the family and generate a
put to the secondary index table for all qualifiers ...not knowing or
caring what the qualifier is.

Any pointers would be appreciated,
Thx - Rob

Is there a way

Re: observer coprocessor question regarding puts

Posted by rob mancuso <rc...@gmail.com>.

Thx Mike, makes perfect sense.  I'm using OpenTSDB, so my schema is fixed.
Metric is at the front of my key (my [A]) and dataserver is at the end (my
[C]).  I need to be able to query by either one, and simply inverting the
rowkey lets me keep using the OpenTSDB APIs, since the cf:cq and value stay
as is.

My initial attempt works, but I'm getting socket timeouts when I increase
volume.  I have some more debugging to do.

Thx
On Jun 14, 2013 10:46 AM, "Michael Segel" <mi...@hotmail.com> wrote:

> Not to beat a dead horse...
>
> I did want to touch a bit more on the schema design issues and
> considerations.
>
> If you have a really wide composite key and you're only storing a single
> cell, you will end up with a very long (tall) table.
>
> Does this make sense?
>
> Would it make more sense to use a smaller key and then store multiple
> cells with part of the rowkey as a column qualifier?
>
> Using your example... you have [A,B,C] as your rowkey and then Column1
> with a value.
>
> You could make the row key [A, B] with the column qualifier [C] storing
> the value there.
>
> Does that make sense?
>
> -Mike

Re: observer coprocessor question regarding puts

Posted by Michael Segel <mi...@hotmail.com>.

Not to beat a dead horse... 

I did want to touch a bit more on the schema design issues and considerations. 

If you have a really wide composite key and you're only storing a single cell, you will end up with a very long (tall) table. 

Does this make sense? 

Would it make more sense to use a smaller key and then store multiple cells with part of the rowkey as a column qualifier?

Using your example... you have [A,B,C] as your rowkey and then Column1 with a value. 

You could make the row key [A, B] with the column qualifier [C] storing the value there. 

Does that make sense? 
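
A minimal sketch of the [A, B] row key with [C] as the column qualifier, in plain Java with no HBase dependency. The class name `TallToWide`, the holder `RowAndQualifier`, and the assumption that C occupies a known number of trailing bytes are all hypothetical and not something stated in this thread.

```java
import java.util.Arrays;

final class TallToWide {
    private TallToWide() {}

    /** Holder for the two pieces: the shorter row key [A, B] and the qualifier [C]. */
    static final class RowAndQualifier {
        final byte[] row;
        final byte[] qualifier;

        RowAndQualifier(byte[] row, byte[] qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /**
     * Splits a composite key [A, B, C] into row [A, B] and qualifier [C].
     * cLen is the assumed byte width of C at the end of the key.
     */
    static RowAndQualifier split(byte[] compositeKey, int cLen) {
        if (cLen < 0 || cLen > compositeKey.length) {
            throw new IllegalArgumentException("cLen does not fit the key");
        }
        int rowLen = compositeKey.length - cLen;
        byte[] row = Arrays.copyOfRange(compositeKey, 0, rowLen);
        byte[] qualifier = Arrays.copyOfRange(compositeKey, rowLen, compositeKey.length);
        return new RowAndQualifier(row, qualifier);
    }
}
```

With this layout every distinct C for the same [A, B] becomes another cell in one row, so the table gets wider rather than taller.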

-Mike

On Jun 13, 2013, at 9:51 PM, Michel Segel <mi...@hotmail.com> wrote:

> Ok...
> 
> But then you are duplicating the data, so you will have to reconcile the two sets and there is a possibility that the data sets are out of sync.
> 
> I don't know your entire Schema, but if the row key is larger than the value, you may want to think about changing the Schema.
> 
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel

Re: observer coprocessor question regarding puts

Posted by Michel Segel <mi...@hotmail.com>.

Ok...

But then you are duplicating the data, so you will have to reconcile the two sets and there is a possibility that the data sets are out of sync.

I don't know your entire Schema, but if the row key is larger than the value, you may want to think about changing the Schema.


Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 13, 2013, at 9:34 PM, rob mancuso <rc...@gmail.com> wrote:

> Thx Mike, for the most part.
> 
> My key is substantially larger than my value, so I was thinking of leaving
> the cq->value stuff as is and just inverting the rowkey.
> 
> So the original table would have
> 
> [A, B, C] cf1:cq1 val1
> 
> And the secondary table would have
> 
> [C, B, A] cf1:cq1 val1

Re: observer coprocessor question regarding puts

Posted by rob mancuso <rc...@gmail.com>.

Thx Mike, for the most part.

My key is substantially larger than my value, so I was thinking of leaving
the cq->value stuff as is and just inverting the rowkey.

So the original table would have

[A, B, C] cf1:cq1 val1

And the secondary table would have

[C, B, A] cf1:cq1 val1
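
A rough sketch of the rowkey inversion described above, in plain Java with no HBase dependency. It assumes A and C have known byte widths (`aLen` and `cLen`) and that B is whatever sits between them; the class name `RowKeyInverter` and those parameters are hypothetical, and the real field widths depend on the actual key layout.

```java
final class RowKeyInverter {
    private RowKeyInverter() {}

    /**
     * Rearranges a composite key [A, B, C] into [C, B, A].
     * aLen and cLen are the assumed byte widths of A and C.
     */
    static byte[] invert(byte[] key, int aLen, int cLen) {
        if (aLen < 0 || cLen < 0 || aLen + cLen > key.length) {
            throw new IllegalArgumentException("field widths do not fit the key");
        }
        int bLen = key.length - aLen - cLen;
        byte[] out = new byte[key.length];
        System.arraycopy(key, aLen + bLen, out, 0, cLen);   // C moves to the front
        System.arraycopy(key, aLen, out, cLen, bLen);       // B stays in the middle
        System.arraycopy(key, 0, out, cLen + bLen, aLen);   // A moves to the end
        return out;
    }
}
```

Calling `invert` again with the two widths swapped gives back the original key, which is handy when mapping an index row back to the primary table.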
On Jun 10, 2013 3:42 PM, "Michael Segel" <mi...@hotmail.com> wrote:

>
> If I understand you ...
>
> You have the row key = [A,B,C]
> You want to create an inverted mapping of key [C] => {[A,B,C]}
>
> That is to say, your inverted index would be all of the rows where the
> value of C = x, where x is some value.
>
> You shouldn't have to worry about column qualifiers, just the values of A, B,
> and C.
>
> In this case, the columns in your index will also be the values of the
> tuples.
> You really don't need C because you already have it, but then you'd need
> to remember to add it to the pair (A, B) that you are storing.
> I'd say waste the space and store (A,B,C) but that's just me.
>
>
> Is that what you want to do?
>
> -Mike

Re: observer coprocessor question regarding puts

Posted by Michael Segel <mi...@hotmail.com>.

If I understand you ... 

You have the row key = [A,B,C] 
You want to create an inverted mapping of key [C] => {[A,B,C]}

That is to say, your inverted index would be all of the rows where the value of C = x, where x is some value.

You shouldn't have to worry about column qualifiers, just the values of A, B, and C.

In this case, the columns in your index will also be the values of the tuples. 
You really don't need C because you already have it, but then you'd need to remember to add it to the pair (A, B) that you are storing. 
I'd say waste the space and store (A,B,C) but that's just me. 


Is that what you want to do? 

-Mike

On Jun 9, 2013, at 12:16 PM, rob mancuso <rc...@gmail.com> wrote:

> Thx Anoop, I believe this is what I'm looking for.
> 
> Regarding my use case,  my rowkey is [A,B,C], but i also have a requirement
> to access data by [C] only.  So I'm looking to use a post-put coprocessor
> to maintain one secondary index table where the rowkey starts with [C]. My
> cqs are numerics representing time and can be any number btw 1 and 3600 (ie
> seconds within an hour). Because I won't know the cq value for each
> incoming put (just the cf), I need something to deconstruct the put into a
> list of cqs ...which I believe you've provided with getFamilyMap.
> 
> Thx again!

Re: observer coprocessor question regarding puts

Posted by rob mancuso <rc...@gmail.com>.

Thx Anoop, I believe this is what I'm looking for.

Regarding my use case,  my rowkey is [A,B,C], but i also have a requirement
to access data by [C] only.  So I'm looking to use a post-put coprocessor
to maintain one secondary index table where the rowkey starts with [C]. My
cqs are numerics representing time and can be any number btw 1 and 3600 (ie
seconds within an hour). Because I won't know the cq value for each
incoming put (just the cf), I need something to deconstruct the put into a
list of cqs ...which I believe you've provided with getFamilyMap.

Thx again!
On Jun 9, 2013 12:47 AM, "Anoop John" <an...@gmail.com> wrote:

> You want to have an index for every CF+CQ, right?  You want to maintain
> different tables for different columns?
>
> Put has a getFamilyMap method that returns a map of CF to list of KVs.  From
> this list of KVs you can get all the CQ names, values, etc.
>
> -Anoop-

Re: observer coprocessor question regarding puts

Posted by Anoop John <an...@gmail.com>.

You want to have an index for every CF+CQ, right?  You want to maintain
different tables for different columns?

Put has a getFamilyMap method that returns a map of CF to list of KVs.  From
this list of KVs you can get all the CQ names, values, etc.
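
To illustrate the idea without pulling in HBase, here is a small plain-Java sketch that walks a family-to-KVs map and copies every qualifier and value for one family onto a new put under a different row key. `PutLike`, `Kv`, and `toIndexPut` are hypothetical stand-ins, not HBase classes, and strings are used in place of the byte arrays HBase actually works with; in a real coprocessor the same loop would run over whatever `put.getFamilyMap()` returns in your HBase version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexPutSketch {
    private IndexPutSketch() {}

    /** Stand-in for one KeyValue: a qualifier and its value. */
    static final class Kv {
        final String qualifier;
        final String value;

        Kv(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Stand-in for a Put: a row key plus a map of family to list of KVs. */
    static final class PutLike {
        final String row;
        final Map<String, List<Kv>> familyMap = new LinkedHashMap<>();

        PutLike(String row) {
            this.row = row;
        }

        PutLike add(String family, String qualifier, String value) {
            familyMap.computeIfAbsent(family, f -> new ArrayList<>())
                     .add(new Kv(qualifier, value));
            return this;
        }
    }

    /**
     * Builds the index put: same family, same qualifiers and values,
     * different row key. The qualifiers are never named ahead of time.
     */
    static PutLike toIndexPut(PutLike original, String indexRow, String family) {
        PutLike index = new PutLike(indexRow);
        List<Kv> kvs = original.familyMap.get(family);
        if (kvs != null) {
            for (Kv kv : kvs) {
                index.add(family, kv.qualifier, kv.value);
            }
        }
        return index;
    }
}
```

Cells from other families are ignored, so only the family you ask for ends up in the index put.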

-Anoop-

On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso <rc...@gmail.com> wrote:

> Hi,
>
> I'm looking to write a post-put observer coprocessor to maintain a
> secondary index.  Basically, my current rowkey design is a composite of
> A,B,C and I want to be able to also access data by C.  So all i'm looking
> to do is invert the rowkey and apply it for all cf:cq values that come in.
>
> My problem (i think), is that in all the good examples i've seen, they all
> deconstruct the Put by calling put.get(<cf>,<cq>)...implying they know the
> qualifier ahead of time.  I'm looking to specify the family and generate a
> put to the secondary index table for all qualifiers ...not knowing or
> caring what the qualifier is.
>
> Any pointers would be appreciated,
> Thx - Rob
>
> Is there a way
>