Posted to user@hbase.apache.org by be...@thomsonreuters.com on 2016/04/13 17:12:54 UTC

Append Visibility Labels?

We sell data. A product can be defined as a permission to access data (at a cell level). Visibility Labels look like a very good candidate for implementing this model.

The implementation works well until we create a new product over old data. We can set the visibility label for the new product but, whoops, by applying it to the relevant cells we've overwritten all the existing labels on those cells, destroying the permissioning of our older products. What to do?

One answer would be to append the new visibility label to the existing label expressions on the cells with an 'OR'. But I'm not sure that's possible .. yet?

Thanks,

Ben

________________________________

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website.<http://site.thomsonreuters.com/site/disclosures/>

RE: Append Visibility Labels?

Posted by be...@thomsonreuters.com.
> > Good to know that we haven't killed off the old products! But I'm not sure
> > the archaeological approach would scale.

> I'm curious if it would get you over your current hump. 

Yes, I think this will see us through the proof-of-concept work and early performance testing. 

In fact, from the shell at least, the scan does not need to explicitly reference versions (once they have been set in the schema). All labels, present and past, appear to have been linked with 'OR' - which is the behavior we wanted.
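
The effect described above can be pictured with a small toy model. This is a sketch under stated assumptions, not HBase code: `CellVersion` and `visible_versions` are hypothetical names, each version is simplified to a flat set of labels (any one of which grants access), and the real order in which HBase applies version limits and visibility filtering is not modelled.

```python
# Toy model only: none of these names are HBase APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class CellVersion:
    timestamp: int
    value: str
    labels: frozenset  # simplification: holding ANY one of these labels grants access


def visible_versions(versions, authorizations, max_versions):
    """Return the retained versions, newest first, that the caller may see."""
    auths = set(authorizations)
    newest_first = sorted(versions, key=lambda v: v.timestamp, reverse=True)
    retained = newest_first[:max_versions]  # stands in for the schema's VERSIONS setting
    return [v for v in retained if v.labels & auths]


history = [
    CellVersion(1, "price=10", frozenset({"PRODUCT_A"})),  # original write
    CellVersion(2, "price=10", frozenset({"PRODUCT_B"})),  # relabelled for the new product
]

# One retained version: the relabel hides the cell from PRODUCT_A holders.
print(visible_versions(history, {"PRODUCT_A"}, max_versions=1))
# Two retained versions: the older version still carries the PRODUCT_A label.
print(visible_versions(history, {"PRODUCT_A"}, max_versions=2))
```

In this model, keeping more than one version behaves like an 'OR' over every label the cell has ever carried, at the cost of storing each relabelled version.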

Thank you!

One more comment inline below ...

> -----Original Message-----
> From: Andrew Purtell [mailto:andrew.purtell@gmail.com]
> Sent: 14 April 2016 19:12
> To: Whittam Smith, Benedict (TR Technology & Ops)
> Cc: user@hbase.apache.org
> Subject: Re: Append Visibility Labels?
> 
> > > Actually the old product data doesn't have to die, Benedict. Set
> > > VERSIONS > 1 in your schema. The old cell version(s) carrying the
> > > old label set will still be there, accessible with a Scan that asks
> > > for N versions instead of just the latest. You'll get back a Result
> > > with up to N cells to iterate over and figure out how to process and
> > > display the information. If you only want the latest, use a Get
> > > instead.
> >
> > Good to know that we haven't killed off the old products! But I'm not
> > sure the archaeological approach would scale.
> 
> I'm curious if it would get you over your current hump.
> 
> Something we could consider is providing an operation attribute that
> tells core to do what Append and Increment already do, which is for all
> tags on the old value, grab them and add them to the tag set of the
> current value. No plug in combiners. Tags are "combined" by core as in
> piled up all in the latest cell. However this has a bunch of problems:
> - Mutations carrying that attribute would now have to read and possibly
> go to disk to find any relevant old value
> - Coprocessors like the AccessController and VisibilityController must
> be taught to handle cases where when enumerating over tags on a cell
> they'll find more than one. They should handle this anyway though. I
> need to check the code to see what they do (or don't do)
> - Tags themselves don't have timestamps. We can try to keep them sorted
> (by time) when building lists of them in memory and serializing them.
> - Unlikely that one-size-fits-all semantics will satisfy everyone, or
> anyone
> 
> > The generic facility you describe, caveats noted, certainly seems to
> > fit our use case - especially if we are talking of combining label
> > expressions.
> > I guess we'd always use an 'OR' operator to add them. But what if we
> > wanted to remove a product/visibility label?
> 
> That's a problem with a generic approach. The default 'combiner' for
> the visibility label tag type would do one general thing - probably,
> OR. So we'd want to allow users to supply their own, configurable in CF
> schema, and I imagine having just one will not be flexible enough, so
> supply a stack of them, and probably in implementation combination
> should happen at compaction time because that's when we are iterating
> over cells anyway and when expired cells or cells lying under a
> tombstone or newer version would otherwise be lost - and hey! now we've
> implemented Accumulo's iterators in HBase. Why not just do that?
> 

From my, possibly naïve, perspective a cell holds a single Boolean expression of visibility label tags.

We can overwrite this expression or, for convenience, we can append an operator and a new expression to create an expanded one.

To be useful, the former requires 'getters' as well as 'setters', which is quite heavy-weight. The latter does not - hence the convenience.

Which is good, because only our edge-cases seem to require the more heavy-weight approach.
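
The "append an operator and a new expression" idea can be sketched as plain string composition. This is a minimal illustration, not an HBase call: `append_label` is a hypothetical helper, and the only thing assumed about HBase is its documented expression syntax, in which `|` means OR, `&` means AND, and parentheses group.

```python
# Hypothetical helper; per this thread, HBase offers no append operation for labels.
def append_label(existing: str, operator: str, addition: str) -> str:
    """Build an expanded expression without parsing the existing one."""
    if operator not in ("|", "&"):
        raise ValueError("operator must be '|' (OR) or '&' (AND)")
    if not existing:
        return addition
    # Parenthesise both sides so operator precedence cannot change the meaning.
    return f"({existing}){operator}({addition})"


expression = "PRODUCT_A&REGION_EU"
expression = append_label(expression, "|", "PRODUCT_B")
print(expression)  # (PRODUCT_A&REGION_EU)|(PRODUCT_B)
```

Because the existing expression is wrapped rather than inspected, the caller never needs a 'getter' for it. Removing a label, by contrast, would require reading and rewriting the expression.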

> 
> On Wed, Apr 13, 2016 at 11:05 AM,
> <be...@thomsonreuters.com> wrote:
> Yes - it's a capability we would need to efficiently support
> permissioning.
> 
> Good to know that we haven't killed off the old products! But I'm not
> sure the archaeological approach would scale.
>
> The generic facility you describe, caveats noted, certainly seems to
> fit our use case - especially if we are talking of combining label
> expressions.
> 
> I guess we'd always use an 'OR' operator to add them. But what if we
> wanted to remove a product/visibility label?
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:andrew.purtell@gmail.com]
> Sent: 13 April 2016 17:23
> To: user@hbase.apache.org
> Subject: Re: Append Visibility Labels?
> 
> I think Benedict was asking if it would be possible to add the
> capability.
>
> Actually the old product data doesn't have to die, Benedict. Set
> VERSIONS > 1 in your schema. The old cell version(s) carrying the old
> label set will still be there, accessible with a Scan that asks for N
> versions instead of just the latest. You'll get back a Result with up
> to N cells to iterate over and figure out how to process and display
> the information. If you only want the latest, use a Get instead.
> 
> I think it could be possible to introduce a generic facility for
> handling the case where you have an existing value on the server, that
> value has tags attached, now a new mutation op has arrived with a tag
> attached _and_ another op attribute set by the client is asking for any
> tags on an earlier cell version be brought forward. For each tag type
> there would be a registered "combiner" that does what makes sense for
> its particulars. We do this in core for Append and Increment already,
> but without the notion of combination. This is an off the cuff remark,
> caveat: I haven't spent time thinking through implications.
> 
> > On Apr 13, 2016, at 8:58 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > There is currently no API for appending Visibility Labels.
> >
> > checkAndPut() only allows you to compare value, not labels.
> >
> > On Wed, Apr 13, 2016 at 8:12 AM,
> > <be...@thomsonreuters.com>
> > wrote:
> >
> >> We sell data. A product can be defined as a permission to access
> >> data (at a cell level). Visibility Labels look like a very good
> >> candidate for implementing this model.
> >>
> >> The implementation works well until we create a new product over old
> >> data. We can set the visibility label for the new product but, whoops,
> >> by applying it to the relevant cells we've overwritten all the
> >> existing labels on those cells, destroying the permissioning of our
> >> older products. What to do?
> >>
> >> One answer would be to append the new visibility label to the
> >> existing label expressions on the cells with an 'OR'. But I'm not
> >> sure that's possible .. yet?
> >>
> >> Thanks,
> >>
> >> Ben


Re: Append Visibility Labels?

Posted by Andrew Purtell <an...@gmail.com>.
> > Actually the old product data doesn't have to die, Benedict. Set
> > VERSIONS > 1 in your schema. The old cell version(s) carrying the old
> > label set will still be there, accessible with a Scan that asks for N
> > versions instead of just the latest. You'll get back a Result with up
> > to N cells to iterate over and figure out how to process and display
> > the information. If you only want the latest, use a Get instead.
>
> Good to know that we haven't killed off the old products! But I'm not sure
> the archaeological approach would scale.

I'm curious if it would get you over your current hump.

Something we could consider is providing an operation attribute that tells
core to do what Append and Increment already do, which is for all tags on
the old value, grab them and add them to the tag set of the current value.
No plug in combiners. Tags are "combined" by core as in piled up all in the
latest cell. However this has a bunch of problems:
- Mutations carrying that attribute would now have to read and possibly go
to disk to find any relevant old value
- Coprocessors like the AccessController and VisibilityController must be
taught to handle cases where, when enumerating over tags on a cell, they
find more than one. They should handle this anyway though. I need to check
the code to see what they do (or don't do)
- Tags themselves don't have timestamps. We can try to keep them sorted (by
time) when building lists of them in memory and serializing them.
- Unlikely that one-size-fits-all semantics will satisfy everyone, or anyone

> The generic facility you describe, caveats noted, certainly seems to fit
> our use case - especially if we are talking of combining label
> expressions.
> I guess we'd always use an 'OR' operator to add them. But what if we
> wanted to remove a product/visibility label?

That's a problem with a generic approach. The default 'combiner' for the
visibility label tag type would do one general thing - probably, OR. So
we'd want to allow users to supply their own, configurable in CF schema,
and I imagine having just one will not be flexible enough, so supply a
stack of them, and probably in implementation combination should happen at
compaction time because that's when we are iterating over cells anyway and
when expired cells or cells lying under a tombstone or newer version would
otherwise be lost - and hey! now we've implemented Accumulo's iterators in
HBase. Why not just do that?
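
To make the "stack of combiners applied at compaction time" idea concrete, here is a toy model. Everything in it is hypothetical: HBase has no such extension point today, the names `union_labels`, `without` and `compact` are invented for illustration, and label expressions are reduced to flat sets of OR'ed labels.

```python
# Toy model of the idea only; "combiner" is not an existing HBase extension point.

def union_labels(survivor, discarded):
    """Default suggested in the thread: OR the old labels into the surviving cell."""
    return survivor | discarded


def without(retired):
    """Build a combiner that strips one retired label from the surviving cell."""
    def combiner(survivor, discarded):
        return survivor - {retired}
    return combiner


def compact(versions, combiners):
    """Keep only the newest version, folding in labels from the discarded ones.

    versions: non-empty list of (timestamp, value, frozenset_of_labels).
    combiners: run in order, once per discarded version.
    """
    newest, *older = sorted(versions, key=lambda v: v[0], reverse=True)
    labels = newest[2]
    for _, _, old_labels in older:
        for combine in combiners:
            labels = combine(labels, old_labels)
    return (newest[0], newest[1], labels)


versions = [
    (1, "price=10", frozenset({"PRODUCT_A", "PRODUCT_OLD"})),
    (2, "price=10", frozenset({"PRODUCT_B"})),
]

# OR-only stack: every label ever applied survives on the newest cell.
print(sorted(compact(versions, [union_labels])[2]))
# Two-stage stack: OR first, then drop a product that has been withdrawn.
print(sorted(compact(versions, [union_labels, without("PRODUCT_OLD")])[2]))
```

The second stack shows why a single default combiner may not be enough: removing a product's label needs a stage of its own, which is the flexibility argument made above.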


On Wed, Apr 13, 2016 at 11:05 AM, <be...@thomsonreuters.com>
wrote:

> Yes - it's a capability we would need to efficiently support permissioning.
>
> Good to know that we haven't killed off the old products! But I'm not sure
> the archaeological approach would scale.
>
> The generic facility you describe, caveats noted, certainly seems to fit
> our use case - especially if we are talking of combining label expressions.
>
> I guess we'd always use an 'OR' operator to add them. But what if we
> wanted to remove a product/visibility label?
>
> -----Original Message-----
> From: Andrew Purtell [mailto:andrew.purtell@gmail.com]
> Sent: 13 April 2016 17:23
> To: user@hbase.apache.org
> Subject: Re: Append Visibility Labels?
>
> I think Benedict was asking if it would be possible to add the capability.
>
> Actually the old product data doesn't have to die, Benedict. Set
> VERSIONS > 1 in your schema. The old cell version(s) carrying the old
> label set will still be there, accessible with a Scan that asks for N
> versions instead of just the latest. You'll get back a Result with up to
> N cells to iterate over and figure out how to process and display the
> information. If you only want the latest, use a Get instead.
>
> I think it could be possible to introduce a generic facility for handling
> the case where you have an existing value on the server, that value has
> tags attached, now a new mutation op has arrived with a tag attached _and_
> another op attribute set by the client is asking for any tags on an earlier
> cell version be brought forward. For each tag type there would be a
> registered "combiner" that does what makes sense for its particulars. We do
> this in core for Append and Increment already, but without the notion of
> combination. This is an off the cuff remark, caveat: I haven't spent time
> thinking through implications.
>
> > On Apr 13, 2016, at 8:58 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > There is currently no API for appending Visibility Labels.
> >
> > checkAndPut() only allows you to compare value, not labels.
> >
> > On Wed, Apr 13, 2016 at 8:12 AM,
> > <be...@thomsonreuters.com>
> > wrote:
> >
> >> We sell data. A product can be defined as a permission to access data
> >> (at a cell level). Visibility Labels look like a very good candidate
> >> for implementing this model.
> >>
> >> The implementation works well until we create a new product over old
> >> data. We can set the visibility label for the new product but, whoops,
> >> by applying it to the relevant cells we've overwritten all the
> >> existing labels on those cells, destroying the permissioning of our
> >> older products. What to do?
> >>
> >> One answer would be to append the new visibility label to the
> >> existing label expressions on the cells with an 'OR'. But I'm not
> >> sure that's possible .. yet?
> >>
> >> Thanks,
> >>
> >> Ben
>

RE: Append Visibility Labels?

Posted by be...@thomsonreuters.com.
Yes - it's a capability we would need to efficiently support permissioning.

Good to know that we haven't killed off the old products! But I'm not sure the archaeological approach would scale.

The generic facility you describe, caveats noted, certainly seems to fit our use case - especially if we are talking of combining label expressions.

I guess we'd always use an 'OR' operator to add them. But what if we wanted to remove a product/visibility label?

-----Original Message-----
From: Andrew Purtell [mailto:andrew.purtell@gmail.com] 
Sent: 13 April 2016 17:23
To: user@hbase.apache.org
Subject: Re: Append Visibility Labels?

I think Benedict was asking if it would be possible to add the capability. 

Actually the old product data doesn't have to die, Benedict. Set VERSIONS > 1 in your schema. The old cell version(s) carrying the old label set will still be there, accessible with a Scan that asks for N versions instead of just the latest. You'll get back a Result with up to N cells to iterate over and figure out how to process and display the information. If you only want the latest, use a Get instead. 

I think it could be possible to introduce a generic facility for handling the case where you have an existing value on the server, that value has tags attached, now a new mutation op has arrived with a tag attached _and_ another op attribute set by the client is asking for any tags on an earlier cell version be brought forward. For each tag type there would be a registered "combiner" that does what makes sense for its particulars. We do this in core for Append and Increment already, but without the notion of combination. This is an off the cuff remark, caveat: I haven't spent time thinking through implications. 

> On Apr 13, 2016, at 8:58 AM, Ted Yu <yu...@gmail.com> wrote:
> 
> There is currently no API for appending Visibility Labels.
> 
> checkAndPut() only allows you to compare value, not labels.
> 
> On Wed, Apr 13, 2016 at 8:12 AM, 
> <be...@thomsonreuters.com>
> wrote:
> 
>> We sell data. A product can be defined as a permission to access data 
>> (at a cell level). Visibility Labels look like a very good candidate 
>> for implementing this model.
>> 
>> The implementation works well until we create a new product over old data.
>> We can set the visibility label for the new product but, whoops, by 
>> applying it to the relevant cells we've overwritten all the existing 
>> labels on those cells, destroying the permissioning of our older 
>> products. What to do?
>> 
>> One answer would be to append the new visibility label to the 
>> existing label expressions on the cells with an 'OR'. But I'm not 
>> sure that's possible .. yet?
>> 
>> Thanks,
>> 
>> Ben

Re: Append Visibility Labels?

Posted by Andrew Purtell <an...@gmail.com>.
I think Benedict was asking if it would be possible to add the capability. 

Actually the old product data doesn't have to die, Benedict. Set VERSIONS > 1 in your schema. The old cell version(s) carrying the old label set will still be there, accessible with a Scan that asks for N versions instead of just the latest. You'll get back a Result with up to N cells to iterate over and figure out how to process and display the information. If you only want the latest, use a Get instead. 

I think it could be possible to introduce a generic facility for handling the case where you have an existing value on the server, that value has tags attached, now a new mutation op has arrived with a tag attached _and_ another op attribute set by the client is asking for any tags on an earlier cell version be brought forward. For each tag type there would be a registered "combiner" that does what makes sense for its particulars. We do this in core for Append and Increment already, but without the notion of combination. This is an off the cuff remark, caveat: I haven't spent time thinking through implications. 

> On Apr 13, 2016, at 8:58 AM, Ted Yu <yu...@gmail.com> wrote:
> 
> There is currently no API for appending Visibility Labels.
> 
> checkAndPut() only allows you to compare value, not labels.
> 
> On Wed, Apr 13, 2016 at 8:12 AM, <be...@thomsonreuters.com>
> wrote:
> 
>> We sell data. A product can be defined as a permission to access data (at
>> a cell level). Visibility Labels look like a very good candidate for
>> implementing this model.
>> 
>> The implementation works well until we create a new product over old data.
>> We can set the visibility label for the new product but, whoops, by
>> applying it to the relevant cells we've overwritten all the existing labels
>> on those cells, destroying the permissioning of our older products. What to
>> do?
>> 
>> One answer would be to append the new visibility label to the existing
>> label expressions on the cells with an 'OR'. But I'm not sure that's
>> possible .. yet?
>> 
>> Thanks,
>> 
>> Ben

Re: Append Visibility Labels?

Posted by Ted Yu <yu...@gmail.com>.
There is currently no API for appending Visibility Labels.

checkAndPut() only allows you to compare value, not labels.

On Wed, Apr 13, 2016 at 8:12 AM, <be...@thomsonreuters.com>
wrote:

> We sell data. A product can be defined as a permission to access data (at
> a cell level). Visibility Labels look like a very good candidate for
> implementing this model.
>
> The implementation works well until we create a new product over old data.
> We can set the visibility label for the new product but, whoops, by
> applying it to the relevant cells we've overwritten all the existing labels
> on those cells, destroying the permissioning of our older products. What to
> do?
>
> One answer would be to append the new visibility label to the existing
> label expressions on the cells with an 'OR'. But I'm not sure that's
> possible .. yet?
>
> Thanks,
>
> Ben