
Posted to users@jena.apache.org by Vilnis Termanis <vi...@iotics.com> on 2022/01/21 15:12:39 UTC

Dynamically restricting graph access at SPARQL query time

Hi,

For a SPARQL query via Fuseki, we are trying to restrict visibility of
groups of triples (each with multiple subjects) dynamically, in order
to allow for generic queries to be executed by users (instead of
providing tinned ones).

Looking at the available ACL mechanisms in Jena/Fuseki, I assume
storing each of these groups as a distinct graph might be the way
forward. (The expectation is to be able to support 10^5 or higher
number of these.)

I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
LDAP/other), only consider triples from the set of graphs 1..N during
the query. (Where the allowed list of 1..N graphs is to be looked up
at the point of the query.)

From my limited understanding, some potential routes are:

a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
Filter" support in TDB.
However, the configuration of allowed graphs per user is static at runtime.

b) jena-permissions - Extends the SPARQL query engine with an Op
rewriter which allows a user-defined evaluator implementation to
allow/deny access to a graph/triple, given a specific user/principal.
(The specific yes/no evaluation responses are cached for the duration
of a query/operation.)
However, this can only be applied to a single graph as it stands.

c) Parse & re-write the query to e.g. scope it using a fixed set of
"FROM" clauses. From some minimal testing (with ~200 FROM clauses)
this does not appear to perform well (compared to a tinned query which
explicitly restricts access via knowledge of the ontologies involved).
I appreciate that maybe having a large list of FROM clauses is an
anti-pattern.
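
As a rough illustration of route (c), the sketch below scopes a query
by inserting one FROM clause per allowed graph. It is plain string
handling in Python rather than anything from Jena: the function name
scope_query and the example IRIs are made up for illustration, and a
real rewriter would more likely parse the query properly instead of
searching for keywords.

```python
import re
from typing import Iterable


def scope_query(query: str, allowed_graphs: Iterable[str]) -> str:
    """Insert one FROM clause per allowed graph before the first WHERE."""
    # Refuse queries that already describe their own dataset, so a user
    # cannot widen the scope. This keyword check is deliberately
    # over-cautious (it also rejects "from" inside a literal).
    if re.search(r"\bFROM\b", query, flags=re.IGNORECASE):
        raise ValueError("query must not contain its own FROM clauses")
    where = re.search(r"\bWHERE\b", query, flags=re.IGNORECASE)
    if where is None:
        raise ValueError("expected an explicit WHERE keyword")
    # Hypothetical input: the graph IRIs this user is allowed to read.
    from_clauses = "".join(f"FROM <{iri}>\n" for iri in allowed_graphs)
    return query[:where.start()] + from_clauses + query[where.start():]


scoped = scope_query(
    "SELECT ?s WHERE { ?s ?p ?o }",
    ["http://example.org/g1", "http://example.org/g2"],
)
```

With only FROM clauses (no FROM NAMED), the default graph of the query
becomes the merge of the listed graphs, so clients do not need to know
the graph names; the cost is that the clause list grows with the
number of visible graphs.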

My questions are:

1) Does filtering to a subset of graphs (from a large set of
graphs) to restrict access sound like a sensible thing to do? (Note
that each of these graphs would contain a set of multiple subjects -
i.e. we are not trying to filter by specific predicate/object values.)

2) Would extending either jena-fuseki-access to support the
user-graph-list lookup dynamically OR extending jena-permissions to work
at dataset level be sensible things to do?

3) If the answer to either of (2) is yes - I'd be interested in
getting a better understanding of what would be involved to gauge the
size/effort of such an extension. I have had a look at the codebases for the
aforementioned projects, but my knowledge of TDB/ARQ/etc is very
limited. (We'd potentially be interested in taking this on, time &
priorities permitting.)

I didn't know which mailing list to send this to but I thought the
users list would probably be a better starting point.

Regards,
Vilnis

--
Vilnis Termanis
Senior Software Developer

e | vilnis.termanis@iotics.com
www.iotics.com

Re: Dynamically restricting graph access at SPARQL query time

Posted by Martynas Jusevičius <ma...@atomgraph.com>.

You're more than welcome :)

On Mon, Jan 24, 2022 at 3:41 PM Vilnis Termanis
<vi...@iotics.com> wrote:
>
> Hi Martynas,
>
> Thank you very much for the suggestion (and additional information out-of-band).
> I've been having a look at LinkedDataHub and will come back to you
> with some questions, if you don't mind.
>
> Regards,
> Vilnis
>
> On Fri, 21 Jan 2022 at 15:26, Martynas Jusevičius
> <ma...@atomgraph.com> wrote:
> >
> > WebAccessControl ontology might be relevant here:
> > https://www.w3.org/wiki/WebAccessControl
> > We're using a request filter that controls access against
> > authorizations using SPARQL.

Re: Dynamically restricting graph access at SPARQL query time

Posted by Vilnis Termanis <vi...@iotics.com>.

Hi Martynas,

Thank you very much for the suggestion (and additional information out-of-band).
I've been having a look at LinkedDataHub and will come back to you
with some questions, if you don't mind.

Regards,
Vilnis

On Fri, 21 Jan 2022 at 15:26, Martynas Jusevičius
<ma...@atomgraph.com> wrote:
>
> WebAccessControl ontology might be relevant here:
> https://www.w3.org/wiki/WebAccessControl
> We're using a request filter that controls access against
> authorizations using SPARQL.



-- 
Vilnis Termanis
Senior Software Developer

m | +44 (0) 7521 012309
e | vilnis.termanis@iotics.com
www.iotics.com

The information contained in this email is strictly confidential and
intended only for the parties noted. If this email was not intended
for your use, please contact Iotics. For more on our Privacy Policy
please visit https://www.iotics.com/legal/

Re: Dynamically restricting graph access at SPARQL query time

Posted by Andy Seaborne <an...@apache.org>.

Hi Vilnis,

For all performance issues, it is hard to say without all the details 
because we are talking about overheads deep down the processing stack. 
It makes it sensitive to the query, the data and concurrent load.

As does setup - for UI response, latency matters as well as throughput.

The only certainty is that fine-grained security does incur costs, and 
that isn't a comment specific to RDF. c.f. Row level ACLs in SQL.

Inline ...

On 24/01/2022 14:42, Vilnis Termanis wrote:
> Hi Andy,
> 
> Hope you're well - nice to hear from you. (responses inline)
> 
> On Sat, 22 Jan 2022 at 13:57, Andy Seaborne <an...@apache.org> wrote:
>>
>>
>>
>> On 21/01/2022 15:26, Martynas Jusevičius wrote:
>>> WebAccessControl ontology might be relevant here:
>>> https://www.w3.org/wiki/WebAccessControl
>>> We're using a request filter that controls access against
>>> authorizations using SPARQL.
>>
>>>
>>> On Fri, Jan 21, 2022 at 4:13 PM Vilnis Termanis
>>> <vi...@iotics.com> wrote:
>>>>
>>>> Hi,
>>
>> Hi Vilnis,
>>
>>>> For a SPARQL query via Fuseki, we are trying to restrict visibility of
>>>> groups of triples (each with multiple subjects) dynamically, in order
>>>> to allow for generic queries to be executed by users (instead of
>>>> providing tinned ones).
>>
>>>> Looking at the available ACL mechanisms in Jena/Fuseki, I assume
>>>> storing each of these groups as a distinct graph might be the way
>>>> forward. (The expectation is to be able to support 10^5 or higher
>>>> number of these.)
>>
>> If each graph is in the same TDB dataset, graph numbers are not much
>> different from any other node frequency. Millions of graphs are
>> possible. It's all quads. 4 Node/NodeIds.
> 
> Great, that's what I was hoping.
> 
>>
>> So it might be a way forward (details matter...)
>>
>> Managing said dataset is another matter.
>>
>> The description sounds a bit SOLID-like - see Martynas's comment
>> and -> https://inrupt.com.
>>
>>>> I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
>>>> LDAP/other), only consider triples from the set of graphs 1..N during
>>>> the query. (Where the allowed list of 1..N graphs is to be looked up
>>>> at the point of the query.)
>>
>> How often is LDAP being accessed per query execution? Going off machine
>> is a significant cost compared to triple access.  (From experience of
>> others, LDAP servers can be "unhelpful" - e.g. big spread in the latency
>> of requests based on environmental factors).
> 
> I think I could have worded that better: Given a provided (at query
> time) user/Principal (which Fuseki/Jena does not have to
> authenticate), only consider graphs 1..N (determined based on the
> principal.)

Another factor: where the mapping of user to graphs is managed.

And whether the application-logic layer is trusted to make security 
decisions.

If multi-tenant, then where the trust boundaries are is going to be a factor.

details ...

>> (shiro is only integrated for Fuseki/webapp - it does work with
>> Fuseki/main but you have to add it. Current WIP should, eventually,
>> improve this.)
>>
>>>>   From my limited understanding, some potential routes are:
>>>>
>>>> a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
>>>> Filter" support in TDB.
>>
>> Yes. Filtering is a hook to use. Sounds like your UC might need its own
>> filter code (in Java) for the policy.
>>
>>>> However, the configuration of allowed graphs per user is static at runtime.
>>
>> jena-fuseki-access is a layer on top of the filtering mechanism for the
>> common case of ACLs on named graphs. That layer isn't compulsory for
>> quad filtering. The code may be inspiration for setup of custom code.
> 
>  From your perspective, if the UC is indeed only about graph-level
> filtering (and not more granular), are there specific pros/cons of
> implementing such a filter using the TDB quad hook VS query engine op
> rewriting?
> Is the former more efficient (due to being lower-level maybe?) or do
> they both in the end have a very similar job - exclude matched quads
> if their graphs are not in the allowed list?

Probably faster as a TDB2 quad hook, but "it depends".
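
To make the graph-level filtering idea concrete, here is a minimal,
storage-agnostic sketch in Python. It is not the TDB hook itself: the
names make_quad_filter and visible_triples are hypothetical, and quads
are modelled as plain string tuples, whereas (as far as I understand
it) the TDB filter sees tuples of internal node ids, so the allowed
graph names would be converted to ids once per request.

```python
from typing import Callable, Iterable, Iterator, Tuple

Quad = Tuple[str, str, str, str]   # (graph, subject, predicate, object)
Triple = Tuple[str, str, str]


def make_quad_filter(allowed_graphs: Iterable[str]) -> Callable[[Quad], bool]:
    """Build a predicate that accepts only quads from allowed graphs."""
    allowed = frozenset(allowed_graphs)   # resolved once, before the query runs

    def accept(quad: Quad) -> bool:
        return quad[0] in allowed         # constant-time check per quad

    return accept


def visible_triples(quads: Iterable[Quad],
                    accept: Callable[[Quad], bool]) -> Iterator[Triple]:
    """Union-graph style view: drop the graph slot of every accepted quad."""
    for quad in quads:
        if accept(quad):
            yield quad[1:]


quads = [
    ("g1", "s1", "p", "o"),
    ("g2", "s2", "p", "o"),
    ("g3", "s3", "p", "o"),
]
accept = make_quad_filter(["g1", "g3"])
result = list(visible_triples(quads, accept))
```

The per-quad cost here is a single set lookup, independent of how many
graphs the user may see. A real union graph would also suppress
duplicate triples that occur in more than one graph, which this sketch
does not do.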

>>>> b) jena-permissions - Extends the SPARQL query engine with an Op
...

>>>> c) Parse & re-write the query to e.g. scope it using a fixed set of
>>>> "FROM" clauses. From some minimal testing (with ~200 FROM clauses)
>>>> this does not appear to perform well (compared to a tinned query which
>>>> explicitly restricts access via knowledge of the ontologies involved).
>>>> I appreciate that maybe having a large list of FROM clauses is an
>>>> anti-pattern.
>>
>> Quite likely - depends on the query complexity and numbers. There's a
>> hook in Fuseki query evaluation for this - did you try that or did you
>> do it client-side?
> 
> I've only tried it directly in the client. I presume this hook you
> mention would have roughly the same perf as specifying it at the
> client end?

Better, and probably still better when running as the general case of 
dataset adding features over a TDB dataset.

But if the information about which 200+ graphs apply is in the business
logic layer (system-trusted), as in your prototyping, it would have to
be available on the server.

In our (£job) platform, we do fine-grained security on RDF in an ABAC 
style pushed as close to the data as possible. Caching setup and security
decisions during execution will help.
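
One way to picture the caching point is a small time-limited cache in
front of the user-to-graphs lookup, so that the external system (LDAP,
an ACL store, etc.) is consulted at most once per user per interval
rather than during query execution. This is only a sketch: the class
name GraphAccessCache, the lookup callable and the 60 second default
are assumptions, not anything in Jena or Fuseki.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class GraphAccessCache:
    """Cache each user's allowed graph set for a limited time."""

    def __init__(self,
                 lookup: Callable[[str], Iterable[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # hypothetical call to the external ACL source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def allowed_graphs(self, user: str) -> FrozenSet[str]:
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no external round trip
        graphs = frozenset(self._lookup(user))
        self._entries[user] = (now + self._ttl, graphs)
        return graphs


# Usage with a fake clock and a lookup that records how often it is called.
calls = []
fake_now = [0.0]


def fake_lookup(user: str) -> Iterable[str]:
    calls.append(user)
    return ["g1", "g2"]


cache = GraphAccessCache(fake_lookup, ttl_seconds=10.0, clock=lambda: fake_now[0])
first = cache.allowed_graphs("alice")
second = cache.allowed_graphs("alice")    # served from the cache
calls_while_fresh = len(calls)
fake_now[0] = 11.0                         # entry has expired
third = cache.allowed_graphs("alice")      # triggers a second lookup
```

The trade-off is that a revoked permission can stay visible for up to
the chosen interval, and the sketch is not thread-safe as written.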

>> If the query is a small amount of work, the setup overhead will be
>> significant but it is (roughly) a fixed overhead so a longer running
>> query is less impacted.
>>
>>>> My questions are:
>>>>
>>>> 1) Does filtering to a subset of graphs (from a large set of
>>>> graphs) to restrict access sound like a sensible thing to do? (Note
>>>> that each of these graphs would contain a set of multiple subjects -
>>>> i.e. we are not trying to filter by specific predicate/object values.)
>>
>> Sounds possible - "sensible" depends on the details of the intended usage.
>>
>>>> 2) Would extending either jena-fuseki-access to support the
>>>> user-graph-list lookup dynamically OR extending jena-permissions to work
>>>> at dataset level be sensible things to do?
>>
>> Functionally - yes, but lots of details matter.
>>
>> And user-graph-list sounds SOLID-like.
>>
>> In the SOLID approach the access path is known and it's the path that
>> decides access or not.  Very different to filtering.
> 
> Our ACL definitions (for the "this user can read these graphs") are
> similar to SOLID's WACL draft spec (in that, when we have more time,
> we should migrate to using it). I.e. we can answer the "Can user X see
> resource/graph Y" question using SPARQL - and we use this for
> tinned/parameterised queries. However, if we want to allow end-users
> to write their own queries but ensure the ACL still applies, then
> rewriting the query (using approaches other than (c)) seems complex.
> Although an implementation (e.g. Jena/Fuseki) independent generic
> SPARQL solution would be great - we're considering all approaches for
> performance reasons.

The data access patterns/assumptions are rather different in SOLID and I 
may be behind the curve here and query over pods (or security within 
pods) may have had some work done on it.

>>>> 3) If the answer to either of (2) is yes - I'd be interested in
>>>> getting a better understanding of what would be involved to gauge the
>>>> size/effort of such an extension. I have had a look at the codebases for the
>>>> aforementioned projects, but my knowledge of TDB/ARQ/etc is very
>>>> limited. (We'd potentially be interested in taking this on, time &
>>>> priorities permitting.)
>>
>> Great!
>>
>>>> I didn't know which mailing list to send this to but I thought the
>>>> users list would probably be a better starting point.
>>
>> Here is OK.
>>
>>       Andy
>>
>>>>
>>>> Regards,
>>>> Vilnis

Re: Dynamically restricting graph access at SPARQL query time

Posted by Vilnis Termanis <vi...@iotics.com>.

Hi Andy,

Hope you're well - nice to hear from you. (responses inline)

On Sat, 22 Jan 2022 at 13:57, Andy Seaborne <an...@apache.org> wrote:
>
>
>
> On 21/01/2022 15:26, Martynas Jusevičius wrote:
> > WebAccessControl ontology might be relevant here:
> > https://www.w3.org/wiki/WebAccessControl
> > We're using a request filter that controls access against
> > authorizations using SPARQL.
>
> >
> > On Fri, Jan 21, 2022 at 4:13 PM Vilnis Termanis
> > <vi...@iotics.com> wrote:
> >>
> >> Hi,
>
> Hi Vilnis,
>
> >> For a SPARQL query via Fuseki, we are trying to restrict visibility of
> >> groups of triples (each with multiple subjects) dynamically, in order
> >> to allow for generic queries to be executed by users (instead of
> >> providing tinned ones).
>
> >> Looking at the available ACL mechanisms in Jena/Fuseki, I assume
> >> storing each of these groups as a distinct graph might be the way
> >> forward. (The expectation is to be able to support 10^5 or higher
> >> number of these.)
>
> If each graph is in the same TDB dataset, graph numbers are not much
> different from any other node frequency. Millions of graphs are
> possible. It's all quads. 4 Node/NodeIds.

Great, that's what I was hoping.

>
> So it might be a way forward (details matter...)
>
> Managing said dataset is another matter.
>
> The description sounds a bit SOLID-like - see Martynas's comment
> and -> https://inrupt.com.
>
> >> I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
> >> LDAP/other), only consider triples from the set of graphs 1..N during
> >> the query. (Where the allowed list of 1..N graphs is to be looked up
> >> at the point of the query.)
>
> How often is LDAP being accessed per query execution? Going off machine
> is a significant cost compared to triple access.  (From experience of
> others, LDAP servers can be "unhelpful" - e.g. big spread in the latency
> of requests based on environmental factors).

I think I could have worded that better: Given a provided (at query
time) user/Principal (which Fuseki/Jena does not have to
authenticate), only consider graphs 1..N (determined based on the
principal.)

> (shiro is only integrated for Fuseki/webapp - it does work with
> Fuseki/main but you have to add it. Current WIP should, eventually,
> improve this.)
>
> >>  From my limited understanding, some potential routes are:
> >>
> >> a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
> >> Filter" support in TDB.
>
> Yes. Filtering is a hook to use. Sounds like your UC might need its own
> filter code (in Java) for the policy.
>
> >> However, the configuration of allowed graphs per user is static at runtime.
>
> jena-fuseki-access is a layer on top of the filtering mechanism for the
> common case of ACLs on named graphs. That layer isn't compulsory for
> quad filtering. The code may be inspiration for setup of custom code.

From your perspective, if the UC is indeed only about graph-level
filtering (and not more granular), are there specific pros/cons of
implementing such a filter using the TDB quad hook VS query engine op
rewriting?
Is the former more efficient (due to being lower-level maybe?) or do
they both in the end have a very similar job - exclude matched quads
if their graphs are not in the allowed list?

>
> >> b) jena-permissions - Extends the SPARQL query engine with an Op
> >> rewriter which allows a user-defined evaluator implementation to
> >> allow/deny access to a graph/triple, given a specific user/principal.
> >> (The specific yes/no evaluation responses are cached for the duration
> >> of a query/operation.)
>
>  From what I know, should work. Claude may be able to say more.
>
> >> However, this can only be applied to a single graph as it stands.
>
> A dataset is a collection of named graphs. Each graph can have
> jena-permissions wrapped around it.

Indeed - what I wasn't sure about is how that wrapping would work in
the "any-number-of-graphs" (as opposed to fixed via config) case so
assumed there might be some extra work to move it up a level, to
dataset. (I found a thread from 2018 where this is suggested:
https://markmail.org/message/z5tsgblnqivpdqvy )

>
> Your UC description was about groups of triples, and then it slid into
> named graph. Are the NG points above and below about implementation
> possibilities, or is the incoming data already using named graphs?
>

I only wrote "groups of triples" because I wanted to make sure that
representing them each in their own graph was a sensible thing to do.
You're right - my thoughts on how it could be solved were all about
NGs.

The data isn't currently segregated into many graphs, but it'd be
relatively trivial to change this in our UC, so for my questions in
this thread we can assume they are already each in their own NG. What
I forgot to mention was that, in the case of (a) or (b) from above, I
am imagining the querying would happen against the union graph such
that clients don't need to know what the graphs are which they are
allowed to see. (Hence attempt (c) using FROM rather than FROM NAMED.)

> This approach will not be using the low level filtering but it may
> not show. What matters is the number of visible graphs, not the total
> number.

Ah right - so I presume you're alluding to the fact that the TDB1/2
quad filter is a lower-level tool than the query op rewrite. Regarding
your note on visible graphs - is that an implication about performance
impact for both (a) & (b) when a large percentage of graphs is
visible, or have I misinterpreted your statement?

> Can possibly be combined with (c).

Interesting - I hadn't thought of that. But I'm not 100% clear on how
the responsibilities might be split between (a)/(b) and (c) for this.

> >> c) Parse & re-write the query to e.g. scope it using a fixed set of
> >> "FROM" clauses. From some minimal testing (with ~200 FROM clauses)
> >> this does not appear to perform well (compared to a tinned query which
> >> explicitly restricts access via knowledge of the ontologies involved).
> >> I appreciate that maybe having a large list of FROM clauses is an
> >> anti-pattern.
>
> Quite likely - depends on the query complexity and numbers. There's a
> hook in Fuseki query evaluation for this - did you try that or did you
> do it client-side?

I've only tried it directly in the client. I presume this hook you
mention would have roughly the same perf as specifying it at the
client end?

> If the query is a small amount of work, the setup overhead will be
> significant but it is (roughly) a fixed overhead so a longer running
> query is less impacted.
>
> >> My questions are:
> >>
> >> 1) Does filtering to a subset of graphs (from a large set of
> >> graphs) to restrict access sound like a sensible thing to do? (Note
> >> that each of these graphs would contain a set of multiple subjects -
> >> i.e. we are not trying to filter by specific predicate/object values.)
>
> Sounds possible - "sensible" depends on the details of the intended usage.
>
> >> 2) Would extending either jena-fuseki-access to support the
> >> user-graph-list lookup dynamically OR extending jena-permissions to work
> >> at dataset level be sensible things to do?
>
> Functionally - yes, but lots of details matter.
>
> And user-graph-list sounds SOLID-like.
>
> In the SOLID approach the access path is known and it's the path that
> decides access or not.  Very different to filtering.

Our ACL definitions (for the "this user can read these graphs") are
similar to SOLID's WACL draft spec (in that, when we have more time,
we should migrate to using it). I.e. we can answer the "Can user X see
resource/graph Y" question using SPARQL - and we use this for
tinned/parameterised queries. However, if we want to allow end-users
to write their own queries but ensure the ACL still applies, then
rewriting the query (using approaches other than (c)) seems complex.
Although an implementation (e.g. Jena/Fuseki) independent generic
SPARQL solution would be great - we're considering all approaches for
performance reasons.
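
For readers unfamiliar with this style of ACL check, the "Can user X
see resource/graph Y" question could look roughly like the ASK query
built below. This is a sketch under assumptions: it uses the W3C Web
Access Control vocabulary (acl:Authorization, acl:agent, acl:accessTo,
acl:mode, acl:Read) rather than our actual definitions, assumes the
authorizations sit in the default graph, and the function name
can_read_query and the example IRIs are invented for illustration.

```python
ACL_NS = "http://www.w3.org/ns/auth/acl#"

# Characters that are not allowed inside a SPARQL IRI reference; rejecting
# them prevents a crafted IRI from breaking out of the <...> delimiters.
_FORBIDDEN = set('<>"{}|^`\\ \n\t\r')


def can_read_query(agent_iri: str, graph_iri: str) -> str:
    """Build an ASK query: does some authorization grant Read access?"""
    for iri in (agent_iri, graph_iri):
        if any(ch in _FORBIDDEN for ch in iri):
            raise ValueError("unsafe character in IRI: " + repr(iri))
    return (
        f"PREFIX acl: <{ACL_NS}>\n"
        "ASK {\n"
        "  ?auth a acl:Authorization ;\n"
        f"        acl:agent <{agent_iri}> ;\n"
        f"        acl:accessTo <{graph_iri}> ;\n"
        "        acl:mode acl:Read .\n"
        "}"
    )


ask = can_read_query(
    "http://example.org/users/alice",
    "http://example.org/graphs/g1",
)
```

A check like this answers one user/graph pair at a time, which suits
tinned queries; for free-form queries the same pattern would need to
be turned into a SELECT that lists every graph the user may read.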

> >> 3) If the answer to either of (2) is yes - I'd be interested in
> >> getting a better understanding of what would be involved to gauge the
> >> size/effort of such an extension. I have had a look at the codebases for the
> >> aforementioned projects, but my knowledge of TDB/ARQ/etc is very
> >> limited. (We'd potentially be interested in taking this on, time &
> >> priorities permitting.)
>
> Great!
>
> >> I didn't know which mailing list to send this to but I thought the
> >> users list would probably be a better starting point.
>
> Here is OK.
>
>      Andy
>
> >>
> >> Regards,
> >> Vilnis
> >>
> >> --
> >> Vilnis Termanis
> >> Senior Software Developer
> >>
> >> e | vilnis.termanis@iotics.com
> >> www.iotics.com

-- 
Vilnis Termanis
Senior Software Developer

e | vilnis.termanis@iotics.com
www.iotics.com

Re: Dynamically restricting graph access at SPARQL query time

Posted by Andy Seaborne <an...@apache.org>.

On 21/01/2022 15:26, Martynas Jusevičius wrote:
> WebAccessControl ontology might be relevant here:
> https://www.w3.org/wiki/WebAccessControl
> We're using a request filter that controls access against
> authorizations using SPARQL.

> 
> On Fri, Jan 21, 2022 at 4:13 PM Vilnis Termanis
> <vi...@iotics.com> wrote:
>>
>> Hi,

Hi Vilnis,

>> For a SPARQL query via Fuseki, we are trying to restrict visibility of
>> groups of triples (each with multiple subjects) dynamically, in order
>> to allow for generic queries to be executed by users (instead of
>> providing tinned ones).

>> Looking at the available ACL mechanisms in Jena/Fuseki, I assume
>> storing each of these groups as a distinct graph might be the way
>> forward. (The expectation is to be able to support 10^5 or higher
>> number of these.)

If each graph is in the same TDB dataset, graph numbers are not much 
different from any other node frequency. Millions of graphs are 
possible. It's all quads. 4 Node/NodeIds.

So it might be a way forward (details matter...)

Managing said dataset is another matter.

The description sounds a bit SOLID-like - see Martynas's comment
and -> https://inrupt.com.

>> I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
>> LDAP/other), only consider triples from the set of graphs 1..N during
>> the query. (Where the allowed list of 1..N graphs is to be looked up
>> at the point of the query.)

How often is LDAP being accessed per query execution? Going off machine 
is a significant cost compared to triple access.  (From experience of 
others, LDAP servers can be "unhelpful" - e.g. big spread in the latency 
of requests based on environmental factors).

(shiro is only integrated for Fuseki/webapp - it does work with 
Fuseki/main but you have to add it. Current WIP should, eventually, 
improve this.)

>>  From my limited understanding, some potential routes are:
>>
>> a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
>> Filter" support in TDB.

Yes. Filtering is a hook to use. Sounds like your UC might need its own 
filter code (in Java) for the policy.

>> However, the configuration of allowed graphs per user is static at runtime.

jena-fuseki-access is a layer on top of the filtering mechanism for the 
common case of ACLs on named graphs. That layer isn't compulsory for 
quad filtering. The code may be inspiration for setup of custom code.

>> b) jena-permissions - Extends the SPARQL query engine with an Op
>> rewriter which allows a user-defined evaluator implementation to
>> allow/deny access to a graph/triple, given a specific user/principal.
>> (The specific yes/no evaluation responses are cached for the duration
>> of a query/operation.)

From what I know, should work. Claude may be able to say more.

>> However, this can only be applied to a single graph as it stands.

A dataset is a collection of named graphs. Each graph can have 
jena-permissions wrapped around it.

Your UC description was about groups of triples, and then it slid into 
named graph. Are the NG points above and below about implementation
possibilities, or is the incoming data already using named graphs?

This approach will not be using the low level filtering but it may
not show. What matters is the number of visible graphs, not the total 
number.

Can possibly be combined with (c).

>> c) Parse & re-write the query to e.g. scope it using a fixed set of
>> "FROM" clauses. From some minimal testing (with ~200 FROM clauses)
>> this does not appear to perform well (compared to a tinned query which
>> explicitly restricts access via knowledge of the ontologies involved).
>> I appreciate that maybe having a large list of FROM clauses is an
>> anti-pattern.

Quite likely - depends on the query complexity and numbers. There's a 
hook in Fuseki query evaluation for this - did you try that or did you 
do it client-side?

If the query is a small amount of work, the setup overhead will be 
significant but it is (roughly) a fixed overhead so a longer running 
query is less impacted.
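
If the scoping is done outside the query text, one alternative to splicing
FROM clauses into the query string is to pass the dataset description in
the request itself: the SPARQL 1.1 Protocol defines "default-graph-uri"
(and "named-graph-uri") parameters for this. The sketch below only builds
the form body for such a POST. ScopedQueryRequest and formBody are
hypothetical names, and whether this performs any better than a long FROM
list on a given setup is an assumption that would need measuring. It only
makes sense in a trusted intermediary, since a client could otherwise send
any graph list it likes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: builds a SPARQL Protocol form body, nothing more. */
final class ScopedQueryRequest {

    /**
     * Returns an application/x-www-form-urlencoded body in which every
     * visible graph is listed as a default-graph-uri parameter, so the
     * query's default graph is the merge of those graphs.
     */
    static String formBody(String query, List<String> visibleGraphs) {
        StringBuilder body = new StringBuilder("query=").append(encode(query));
        for (String graph : visibleGraphs) {
            body.append("&default-graph-uri=").append(encode(graph));
        }
        return body.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The user's query text is left untouched, which avoids having to parse and
re-serialise it just to restrict the dataset.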

>> My questions are:
>>
>> 1) Does filtering to a set of subset of graphs (from a large set of
>> graphs) to restrict access sounds like a sensible thing to do? (Note
>> that each of these graphs would contain a set of multiple subjects -
>> i.e. we are not trying filter by specific predicate/object values.)

Sounds possible - "sensible" depends on the details of the intended usage.

>> 2) Would extending either jena-fuseki-access to support the
>> user-graph-list lookup dynamically OR extend jena-permissions to work
>> at dataset level be sensible things to do?

Functionally - yes, but lots of details matter.

And user-graph-list sounds SOLID-like.

In the SOLID approach the access path is known and it's the path that 
decides access or not. Very different to filtering.

>> 3) If the answer to either of (2) is yes - I'd be interested in
>> getting a better understanding of what would be involved to gauge the
>> size/effort of such an extension. I have had a look codebases for the
>> aforementioned projects, but my knowledge of TDB/ARQ/etc is very
>> limited. (We'd potentially be interested in taking this on, time &
>> priorities permitting.)

Great!

>> I didn't know which mailing list to send this to but I thought the
>> users list would probably be a better starting point.

Here is OK.

     Andy

>>
>> Regards,
>> Vilnis
>>
>> --
>> Vilnis Termanis
>> Senior Software Developer
>>
>> e | vilnis.termanis@iotics.com
>> www.iotics.com

Re: Dynamically restricting graph access at SPARQL query time

Posted by Martynas Jusevičius <ma...@atomgraph.com>.

WebAccessControl ontology might be relevant here:
https://www.w3.org/wiki/WebAccessControl
We're using a request filter that controls access against
authorizations using SPARQL.
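
To give an idea of what checking an authorization "using SPARQL" can look
like, here is a minimal sketch that builds an ASK query over the
WebAccessControl vocabulary (acl:Authorization, acl:agent, acl:accessTo,
acl:mode). It is not the actual filter described above: WacCheck and
readAccessAsk are hypothetical names, and where the authorizations are
stored and how the agent is identified are assumptions.

```java
/** Hypothetical helper: builds a WAC read-access check as a SPARQL ASK. */
final class WacCheck {

    /** ASK whether some authorization grants the agent Read on the graph. */
    static String readAccessAsk(String agentIri, String graphIri) {
        requireSafeIri(agentIri);
        requireSafeIri(graphIri);
        return "PREFIX acl: <http://www.w3.org/ns/auth/acl#>\n"
             + "ASK {\n"
             + "  ?auth a acl:Authorization ;\n"
             + "        acl:agent <" + agentIri + "> ;\n"
             + "        acl:accessTo <" + graphIri + "> ;\n"
             + "        acl:mode acl:Read .\n"
             + "}";
    }

    // Reject characters that may not appear inside a SPARQL <...> IRI
    // reference, so a caller-supplied value cannot break out of it.
    private static void requireSafeIri(String iri) {
        for (char c : iri.toCharArray()) {
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Unsafe character in IRI: " + iri);
            }
        }
    }
}
```

A request filter could run such a query against the authorization data
and reject the request when the answer is false.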
