Posted to user@accumulo.apache.org by Rob Verkuylen <ro...@verkuylen.net> on 2018/09/10 13:40:16 UTC

Dynamic set of visibility labels / Trusted query

Hi,

I'm designing a table where the ColumnVisibility is based off of labels
within the data as it comes in. I know the formatting, but not the exact
label names and the labels change over time.

Based on the labels of the data which I can use for ColumnVisibility and
the user's access to certain labels (determined via LDAP group membership) I
want to use the labels as a filtering mechanism.

Questions are: Is this the best approach? Obviously this works client side,
but I want to push the filtering to the server. It seems that I need to
assign every possible label to a user, to get access to it. Ideally I would
have a trusted user with all access, and the query engine extracts the
labels from the current user and uses them as authorisation labels.

I guess this is also possible with (custom?) filtering on the server, but
this would seem the ideal use case for visibility labels with Accumulo to
me.

Thnx, Rob
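A minimal sketch of the trusted-user idea above, assuming the query service scans as one account that holds every label and narrows it for each request. The class name, the group names and the group-to-label map (a stand-in for the LDAP lookup) are hypothetical. The requested labels are intersected with the account's own labels because, as far as I know, Accumulo rejects a scan that asks for authorizations the scanning user was never granted.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: narrows a trusted account's labels to one end user's labels. */
final class TrustedQueryAuths {

    private TrustedQueryAuths() {}

    /**
     * @param userGroups          groups the end user belongs to (e.g. from LDAP)
     * @param groupToLabels       hypothetical mapping from group to visibility labels
     * @param trustedAccountAuths labels actually granted to the trusted scanning account
     * @return sorted labels to use as the scan authorizations for this request
     */
    static Set<String> effectiveAuths(Set<String> userGroups,
                                      Map<String, Set<String>> groupToLabels,
                                      Set<String> trustedAccountAuths) {
        Set<String> requested = new TreeSet<>();
        for (String group : userGroups) {
            // Groups with no mapped labels contribute nothing.
            requested.addAll(groupToLabels.getOrDefault(group, Set.of()));
        }
        // Keep only labels the trusted account holds; asking for others would be rejected.
        requested.retainAll(trustedAccountAuths);
        return requested;
    }
}
```

In a real client the returned set would be wrapped in an Authorizations object and passed to createScanner. Silently dropping labels the account lacks is a design choice; failing the request instead would surface a stale trusted account sooner.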

Re: Dynamic set of visibility labels / Trusted query

Posted by Rob Verkuylen <ro...@verkuylen.net>.
Alright, given your feedback I think a slightly modified filtering iterator
can do the job as well. In reality I need to give access based not only on
source/sensor, but also on other labels like legal, enrichments,
flags, etc. Some of these are more static, so maybe a combination of both
Auth/ColViz and the custom filter. Thnx
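A minimal sketch of the "static labels in the visibility" half of that split, assuming labels such as legal or enrichment flags arrive as plain strings and a reader must hold every one of them. The class name is hypothetical. The quoting rule follows my understanding of ColumnVisibility.quote (characters other than letters, digits and _ - : . / require double quotes); real code should call that method rather than reimplement it.

```java
import java.util.Collection;
import java.util.StringJoiner;
import java.util.TreeSet;

/** Sketch only: builds an "all of these labels" visibility expression. */
final class StaticVisibility {

    private StaticVisibility() {}

    /** Characters that may appear in a label without quoting (my reading of Accumulo's rules). */
    static boolean isPlainChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }

    /** Wraps a label in double quotes, escaping \ and ", if it has any non-plain character. */
    static String quoteIfNeeded(String label) {
        if (label.isEmpty()) {
            throw new IllegalArgumentException("empty label");
        }
        for (int i = 0; i < label.length(); i++) {
            if (!isPlainChar(label.charAt(i))) {
                String escaped = label.replace("\\", "\\\\").replace("\"", "\\\"");
                return "\"" + escaped + "\"";
            }
        }
        return label;
    }

    /** Joins labels with '&' (all required), sorted so the same set always gives the same string. */
    static String allOf(Collection<String> labels) {
        StringJoiner expr = new StringJoiner("&");
        for (String label : new TreeSet<>(labels)) {
            expr.add(quoteIfNeeded(label));
        }
        return expr.toString(); // empty input gives "", i.e. no restriction
    }
}
```

For example, allOf(List.of("legal", "enriched")) yields enriched&legal. An empty label set yields an empty expression, which Accumulo treats as readable by everyone, so a record without static labels would be protected only by whatever sensor filter is applied on top.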

On Tue, Sep 11, 2018 at 4:40 PM, Josh Elser <el...@apache.org> wrote:

> Remember that visibility labels are effectively post-filters. If you want
> to support access patterns of data based on specific data sources, you
> would need to structure your data in such a way that you can find all of
> the data from that sensor fast.
>
> If the kind of data you get from one sensor is static (e.g. always of a
> specific type), I'd suggest you build labels based on that. Labels
> describing the data itself, instead of the sensor which that data came from.
>
> If you need to filter data purely based on the sensor the data came from,
> I can't think of a better approach than tagging each sensor. This will have
> limitations when it comes to the built-in ZooKeeper-backed visibility
> labels implementation which you will have to eventually consider replacing
> (the ZK impl can probably only scale to the 10000s-100000s of labels)

Re: Dynamic set of visibility labels / Trusted query

Posted by Josh Elser <el...@apache.org>.
Remember that visibility labels are effectively post-filters. If you 
want to support access patterns of data based on specific data sources, 
you would need to structure your data in such a way that you can find 
all of the data from that sensor fast.

If the kind of data you get from one sensor is static (e.g. always of a 
specific type), I'd suggest you build labels based on that. Labels 
describing the data itself, instead of the sensor which that data came from.

If you need to filter data purely based on the sensor the data came 
from, I can't think of a better approach than tagging each sensor. This 
will have limitations when it comes to the built-in ZooKeeper-backed 
visibility labels implementation which you will have to eventually 
consider replacing (the ZK impl can probably only scale to the 
10000s-100000s of labels)
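A minimal sketch of structuring the data so all of one sensor's data can be found fast, assuming a hypothetical row layout of <sensor>\0<zero-padded timestamp>. A sorted map stands in for the table, since Accumulo also keeps rows in lexicographic order; in a real scan the same start and end rows would form a Range. The NUL separator keeps sensor1 from matching sensor10, and the sketch assumes sensor names never contain NUL and timestamps are non-negative.

```java
import java.util.NavigableMap;

/** Sketch only: sensor-first row keys and a per-sensor range lookup. */
final class SensorRows {

    private static final char SEP = '\0'; // assumes sensor names never contain NUL

    private SensorRows() {}

    /** Row key "<sensor>\0<timestamp>", timestamp zero-padded so text order equals time order. */
    static String rowFor(String sensor, long epochMillis) {
        return sensor + SEP + String.format("%020d", epochMillis);
    }

    /** All rows of one sensor: start row "<sensor>\0" inclusive, end row "<sensor>\1" exclusive. */
    static NavigableMap<String, String> scanSensor(NavigableMap<String, String> table, String sensor) {
        String start = sensor + SEP;
        String end = sensor + (char) (SEP + 1);
        return table.subMap(start, true, end, false);
    }
}
```

With rows for sensor1, sensor10 and sensor2 in a TreeMap, scanSensor(table, "sensor1") returns only the sensor1 rows, in time order. Visibility labels then only have to post-filter that one contiguous range instead of the whole table.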

On 9/11/18 10:01 AM, Rob Verkuylen wrote:
> Thnx Josh,
>
> You're right, I won't change the labels or their meaning. In this use
> case I have sensor data coming in from X sensors, all labeled
> SensorX. New sensor data comes in every day from existing sources, but
> also from newly installed sensors with a previously unknown but unique
> name. I want to label the data with the sensor name, so I can give people
> access to the sensor data belonging to them specifically or their team.
>
> The number of sensors can run into the thousands, meaning that many
> labels added to the user. I get that adding auth labels makes sense when
> the labels are pretty much static, but what about use cases like these?
> Would a filtering iterator be a better match?
>
> Thnx, Rob

Re: Dynamic set of visibility labels / Trusted query

Posted by Rob Verkuylen <ro...@verkuylen.net>.
Thnx Josh,

You're right, I won't change the labels or their meaning. In this use case I
have sensor data coming in from X sensors, all labeled SensorX.
New sensor data comes in every day from existing sources, but also from
newly installed sensors with a previously unknown but unique name. I want
to label the data with the sensor name, so I can give people access to the
sensor data belonging to them specifically or their team.

The number of sensors can run into the thousands, meaning that many
labels added to the user. I get that adding auth labels makes sense when
the labels are pretty much static, but what about use cases like these? Would
a filtering iterator be a better match?

Thnx, Rob
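To make the filtering-iterator option concrete, here is a plain-Java model of the per-entry decision such a filter would make; it deliberately does not depend on Accumulo. It assumes a hypothetical row layout of <sensor>\0<timestamp> and a hypothetical allowedSensors option holding a comma-separated list. In Accumulo the same logic would sit in the accept(Key, Value) method of a Filter subclass, with the option read in init.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch only: plain-Java model of a sensor allow-list filter (no Accumulo dependency). */
final class SensorAllowListFilter {

    static final String ALLOWED_OPTION = "allowedSensors"; // hypothetical option name

    private final Set<String> allowed = new HashSet<>();

    /** Reads the comma-separated allow-list; a missing option means nothing is allowed. */
    void init(Map<String, String> options) {
        allowed.clear();
        for (String name : options.getOrDefault(ALLOWED_OPTION, "").split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                allowed.add(trimmed);
            }
        }
    }

    /** Keeps an entry only if the sensor at the start of its row ("<sensor>\0...") is allowed. */
    boolean accept(String row) {
        int sep = row.indexOf('\0');
        String sensor = sep < 0 ? row : row.substring(0, sep);
        return allowed.contains(sensor);
    }
}
```

One caveat worth weighing: scan-time iterator options are set by the client, so unlike visibility labels this check is only as trustworthy as the query service that configures it; a user with direct scan access could simply omit the filter. If rows are keyed by sensor, restricting the scan to the allowed sensors' ranges avoids reading other sensors' data at all.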

On Mon, Sep 10, 2018 at 6:04 PM, Josh Elser <el...@apache.org> wrote:

> I think you know this already, but I'm not 100% sure based on your
> message: trying to change the labels on the data is a bad idea. If you need
> to handle a case where a label means one thing on day1 and another thing on
> day2, you would need to build the logic to handle that.
>
> The only other thing I thought of is that you will have to write code that
> updates your "trusted user" to have all of the authorization labels that
> you might use. Even the Accumulo "superuser" must have the proper
> authorizations set that you want to use.

Re: Dynamic set of visibility labels / Trusted query

Posted by Josh Elser <el...@apache.org>.
I think you know this already, but I'm not 100% sure based on your 
message: trying to change the labels on the data is a bad idea. If you 
need to handle a case where a label means one thing on day1 and another 
thing on day2, you would need to build the logic to handle that.

The only other thing I thought of is that you will have to write code 
that updates your "trusted user" to have all of the authorization labels 
that you might use. Even the Accumulo "superuser" must have the proper 
authorizations set that you want to use.
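A minimal sketch of the code that keeps the trusted user's authorizations current, assuming ingest reports each label it sees. AuthStore is a hypothetical stand-in for the getUserAuthorizations/changeUserAuthorizations pair on Accumulo's securityOperations(); the in-memory store exists only to exercise the logic. Because the change call replaces the whole set rather than appending, the sketch reads, merges and then writes.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for getUserAuthorizations / changeUserAuthorizations. */
interface AuthStore {
    Set<String> getAuths(String principal);

    /** Replaces the principal's whole label set (it does not append). */
    void setAuths(String principal, Set<String> auths);
}

/** In-memory AuthStore, only for exercising the sketch. */
final class InMemoryAuthStore implements AuthStore {
    private final Map<String, Set<String>> auths = new HashMap<>();

    @Override
    public Set<String> getAuths(String principal) {
        return new TreeSet<>(auths.getOrDefault(principal, Set.of()));
    }

    @Override
    public void setAuths(String principal, Set<String> newAuths) {
        auths.put(principal, new TreeSet<>(newAuths));
    }
}

/** Sketch only: grants newly seen labels to the trusted user without dropping old ones. */
final class TrustedUserSync {

    private TrustedUserSync() {}

    /** Read, merge, write; returns true only if the store actually changed. */
    static synchronized boolean ensureLabels(AuthStore store, String trustedUser,
                                             Collection<String> seenLabels) {
        Set<String> current = store.getAuths(trustedUser);
        Set<String> merged = new TreeSet<>(current);
        merged.addAll(seenLabels);
        if (merged.equals(current)) {
            return false; // nothing new, skip the write
        }
        store.setAuths(trustedUser, merged);
        return true;
    }
}
```

The synchronized keyword only serializes updates within one JVM; if several ingest processes can discover labels concurrently, routing all grants through a single writer avoids one update overwriting another.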

On 9/10/18 9:40 AM, Rob Verkuylen wrote:
> Hi,
> 
> I'm designing a table where the ColumnVisibility is based off of labels 
> within the data as it comes in. I know the formatting, but not the exact 
> label names and the labels change over time.
> 
> Based on the labels of the data which I can use for ColumnVisibility and 
> the user's access to certain labels (determined via LDAP group
> membership) I want to use the labels as a filtering mechanism.
> 
> Questions are: Is this the best approach? Obviously this works client 
> side, but I want to push the filtering to the server. It seems that I 
> need to assign every possible label to a user, to get access to it. 
> Ideally I would have a trusted user with all access, and the query engine
> extracts the labels from the current user and uses them as authorisation 
> labels.
> 
> I guess this is also possible with (custom?) filtering on the server, 
> but this would seem the ideal use case for visibility labels with 
> Accumulo to me.
> 
> Thnx, Rob