Posted to users@jena.apache.org by Vilnis Termanis <vi...@iotics.com.INVALID> on 2022/04/07 10:10:25 UTC

Interaction between Text indexing, Fuseki services & Data Access Control

Hi,

In brief: Can Fuseki Data ACL be applied to text indexing? And is it
possible to selectively expose text index access per service for a
shared dataset?

In detail:

We're using a single TDB dataset (in unionDefaultGraph mode) with
multiple services, wrapped with both ACL (AccessControlledDataset) as
well as text indexing (TextDataset) and are hoping to provide the
following Fuseki services:

1. "full access" - a) Read/write everything b) including text index
2. "selected graphs only" - a) Read only from selected graphs b) no index access
3. "read all" - a) Read everything b) no index access

In the assembler configuration, datasets for the above services are
respectively defined as (where all use the same underlying dataset):
1. TextDataset(DatasetTDB)
2. AccessControlledDataset(DatasetTDB)
3. DatasetTDB

1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
access to text indexing, despite not being explicitly configured as
such in their respective services.
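
For illustration, a minimal assembler sketch of the layout described above
could look like the following. It is not the actual configuration from this
report: it assumes TDB2 and a Lucene text index, and every node name (:tdb,
:textDS, :aclDS, ...), path, service name, the user "reader" and the graph
URI are hypothetical. Authentication setup for the access-controlled service
is omitted.

```turtle
PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>
PREFIX text:   <http://jena.apache.org/text#>
PREFIX access: <http://jena.apache.org/access#>

# Shared underlying storage (hypothetical location).
:tdb a tdb2:DatasetTDB2 ;
    tdb2:location "DB" ;
    tdb2:unionDefaultGraph true .

# 1. TextDataset(DatasetTDB)
:textDS a text:TextDataset ;
    text:dataset :tdb ;
    text:index   :luceneIndex .

:luceneIndex a text:TextIndexLucene ;
    text:directory <file:lucene> ;
    text:entityMap :entMap .

:entMap a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "label" ;
    text:map ( [ text:field "label" ; text:predicate rdfs:label ] ) .

# 2. AccessControlledDataset(DatasetTDB)
:aclDS a access:AccessControlledDataset ;
    access:registry :registry ;
    access:dataset  :tdb .

:registry a access:SecurityRegistry ;
    access:entry ( "reader" <http://example/graph1> ) .

# Service 1: full access, including the text index.
:full a fuseki:Service ;
    fuseki:name "full" ;
    fuseki:dataset :textDS ;
    fuseki:endpoint [ fuseki:operation fuseki:query ;  fuseki:name "sparql" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name "update" ] .

# Service 2: selected graphs only.
:selected a fuseki:Service ;
    fuseki:name "selected" ;
    fuseki:dataset :aclDS ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] .

# Service 3: read everything, 3. DatasetTDB used directly.
:readAll a fuseki:Service ;
    fuseki:name "read-all" ;
    fuseki:dataset :tdb ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] .
```

All three services resolve to the same :tdb node, which is the situation in
which 2b and 3b were observed to still reach the text index.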

From looking at code, I can see that index availability is based on
the TextQuery.textIndex symbol in the execution context
(TextQueryPF.java). This means that, as long as at least one service
enabled text indexing on a dataset, any other services referencing the
same underlying store will also use it.
(Judging by comments in the code, the "instanceof DatasetGraphText"
check is deprecated, even if the logic for now remains in
chooseTextIndex()).

So our questions are:

I) Is it currently possible to disallow access to the text index for
some services but not others (using the same underlying dataset)?
II) If not, what might be best approach to implement such a
restriction? (Would traversal of DatasetGraphWrapper to explicitly
find a DatasetGraphText instance make sense?)
III) Or: Is there a different/better approach to solve the index
visibility need described above?

In addition, regarding spatial lookups:
IV) Would GeoSPARQL querying (and its online caching) respect
AccessControlledDataset restrictions (when querying is performed over
multiple services with different levels of ACL)?

Regards,
Vilnis

-- 
Vilnis Termanis
Senior Software Developer

e | vilnis.termanis@iotics.com
www.iotics.com

Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Vilnis Termanis <vi...@iotics.com.INVALID>.

(apologies for the previous quote mess - forgot to switch to plain text mode)

On Tue, 3 May 2022 at 19:06, Andy Seaborne <an...@apache.org> wrote:
>
>
>
> On 03/05/2022 15:56, Vilnis Termanis wrote:
>
> (fixing up the quote):
>
> > My understanding is that, for updates, the stack of datasets is respected
> > when deciding whether to access the text index whereas for querying the
> > dataset context is used, which is not stacked.
>
> I'm confused. "understanding of what is supposed to happen" or
> "understanding of what does happen"? What is "stacked" - do you mean the
> context settings should not be pushed down? That seems like a safer
> approach to me and removes the burden on the user knowing to set the
> endpoint context.

(I didn't word that well.) What I meant is: From my (limited)
understanding of the code:
1) Updates to the text index happen because, at runtime, the base
dataset instance is wrapped as a Text dataset (with method overrides)
- this mirrors the assembly/config. Only updates targeting the wrapped
DS will result in text index updates.
2) Queries on the other hand hit the text index based on presence of
the "text:index" value in the merged context. Since there appears to
be only one dataset context (regardless of the number of DS
wrappings), as long as somewhere in the configuration a dataset has
been wrapped as a Text dataset, even the base dataset (with its
updated context) will also end up allowing use of the text index.
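
As a rough illustration of the three context levels that get merged (server,
dataset, endpoint), the sketch below sets the same symbol at each level. It
uses the query timeout symbol "arq:queryTimeout" purely as an example setting;
the node names, location and timeout values are hypothetical, and which level
wins when several set the same symbol should be checked against the Fuseki
documentation rather than inferred from this sketch.

```turtle
PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>

# Server-level context.
[] a fuseki:Server ;
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "30000" ] ;
    fuseki:services ( :service ) .

# Dataset-level context. As described above, there appears to be a single
# context per underlying dataset, regardless of how many wrappers share it.
:tdb a tdb2:DatasetTDB2 ;
    tdb2:location "DB" ;
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] .

:service a fuseki:Service ;
    fuseki:name "ds" ;
    fuseki:dataset :tdb ;
    fuseki:endpoint [
        fuseki:operation fuseki:query ;
        fuseki:name "sparql" ;
        # Endpoint-level context.
        ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "5000" ]
    ] .
```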

>
> > (The context merging is for
> > server + dataset + endpoint but there is only one of each.)
> > I think the proposed fix would cause issues only if there is an expectation
> > for AccessControlledDataset to behave differently when it comes to handling
> > endpoint context: Right now it is ignored.
>
> Yes, if it is secure - that has to be determined. As per comment on the
> PR, requiring the user to set endpoint context is making the user's life
> harder.
>
> But we have two issues here - whether the access control is correctly
> picking up the context (a general issue c.f. timeouts) if it is safe to
> do so, and whether the text dataset should push down the context setting
> (specific issues of locating the text index).
>
> There is a subsidiary issue: cleanly unsetting a context entry.
>
> > But hopefully this is a good starting point to throw rocks at.
>
> So should the PR be a draft?

Yes, I should have set it as such before. (Done now.)

>
>      Andy



-- 
Vilnis Termanis
Technical Specialist

e | vilnis.termanis@iotics.com
www.iotics.com

The information contained in this email is strictly confidential and
intended only for the parties noted. If this email was not intended
for your use, please contact Iotics. For more on our Privacy Policy
please visit https://www.iotics.com/legal/

Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Andy Seaborne <an...@apache.org>.

On 03/05/2022 15:56, Vilnis Termanis wrote:

(fixing up the quote):

> My understanding is that, for updates, the stack of datasets is respected
> when deciding whether to access the text index whereas for querying the
> dataset context is used, which is not stacked.

I'm confused. "understanding of what is supposed to happen" or
"understanding of what does happen"? What is "stacked" - do you mean the
context settings should not be pushed down? That seems like a safer
approach to me and removes the burden on the user knowing to set the
endpoint context.

> (The context merging is for
> server + dataset + endpoint but there is only one of each.)
> I think the proposed fix would cause issues only if there is an expectation
> for AccessControlledDataset to behave differently when it comes to handling
> endpoint context: Right now it is ignored.

Yes, if it is secure - that has to be determined. As per comment on the
PR, requiring the user to set endpoint context is making the user's life
harder.

But we have two issues here - whether the access control is correctly 
picking up the context (a general issue c.f. timeouts) if it is safe to 
do so, and whether the text dataset should push down the context setting 
(specific issues of locating the text index).

There is a subsidiary issue: cleanly unsetting a context entry.

> But hopefully this is a good starting point to throw rocks at.

So should the PR be a draft?

     Andy

Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Vilnis Termanis <vi...@iotics.com.INVALID>.

On Sat, 16 Apr 2022 at 21:29, Andy Seaborne <an...@apache.org> wrote:

>
>
> On 14/04/2022 20:57, Vilnis Termanis wrote:
> > Hi,
> >
> > Specifying the following in each service (which is to ignore text
> > indexing) now works (tested against 4.4.0):
> >
> > ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
> > ja:cxtValue false ] ;
>
> Doesn't that cause warnings in the Fuseki log?
>

Yes it does - three (expected) warnings from TextQueryPF:
1) Context setting 'symbol:http://jena.apache.org/text#index'is not a
TextIndex
2) Failed to find the text index : tried context and as a text-enabled
dataset
3) No text index - no text search performed

(Note that I also tried specifying the override as "undef" - but that
doesn't work because it basically unsets the existing value and the dataset
one then rightly is used.)

> Is the stack of datasets still the same as earlier?

Yes it is: One as TextDataset(DatasetTDB) and the other as
AccessControlledDataset(DatasetTDB) (where DatasetTDB is the same dataset
for both).


> If you are saying that is necessary, it looks like the
> text context contaminates the base dataset but the fix may break the
> reverse case (if anyone uses it) of direct access to the storage DB. The
> fix isn't a quick one, but it so happens the code (a version of Context
> that keeps changes) does exist, albeit not in that codebase.

My understanding is that, for updates, the stack of datasets is respected
when deciding whether to access the text index whereas for querying the
dataset context is used, which is not stacked. (The context merging is for
server + dataset + endpoint but there is only one of each.)
I think the proposed fix would cause issues only if there is an expectation
for AccessControlledDataset to behave differently when it comes to handling
endpoint context: Right now it is ignored.


> >
> > ... but not if the associated dataset is an AccessControlledDataset.
> >
> >  From my understanding, the issue is to do with the fact that
> > fuseki-access uses QueryExecutionFactory whilst fuseki-core uses
> > QueryExecDatasetBuilder. The latter takes the HttpAction's context
> > into account (which presumably leads to the inclusion of the service
> > context values) while the former does not. (In addition, it would
> > appear that fuseki-access does not honour query-specific timeouts due
> > to a similar reason.)
>
> Timeouts ignored because the context is skipped?

The timeout is not stored in the context, but instead on the
SPARQLQueryProcessor instance (via its setTimeouts method).


> > This patch seems to fix the issue:
> > https://github.com/vtermanis/jena/commit/e5cb112f829f305c1f76c8f5305f4394d8e9b04f
>
> Would you be able to turn that into a PR?

https://github.com/apache/jena/pull/1291

As I said, I'm aware there's probably a better way to expose the common
"merge endpoint context with DS + server ones" + "honour timeout
parameter". But hopefully this is a good starting point to throw rocks at.

> > I know that this most likely is not the right way to address it.
> > (Should fuseki-access re-use some common QueryExecution building code
> > from fuseki-core?) I also wasn't sure how to add an automated (to
> > jena-integration-tests or with mocking in fuseki-access?) test-case
> > for this, but I can provide minimal manual steps.
> >
> > Should I create a Jira ticket for this?
>
> There are several "this" here.
>
> > Regards,
> > Vilnis
>
>      Andy



Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Andy Seaborne <an...@apache.org>.


On 14/04/2022 20:57, Vilnis Termanis wrote:
> Hi,
> 
> Specifying the following in each service (which is to ignore text
> indexing) now works (tested against 4.4.0):
> 
> ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
> ja:cxtValue false ] ;

Doesn't that cause warnings in the Fuseki log?

Is the stack of datasets still the same as earlier?

If you are saying that is necessary, it looks like the
text context contaminates the base dataset but the fix may break the
reverse case (if anyone uses it) of direct access to the storage DB. The
fix isn't a quick one, but it so happens the code (a version of Context
that keeps changes) does exist, albeit not in that codebase.

> 
> ... but not if the associated dataset is an AccessControlledDataset.
> 
>  From my understanding, the issue is to do with the fact that
> fuseki-access uses QueryExecutionFactory whilst fuseki-core uses
> QueryExecDatasetBuilder. The latter takes the HttpAction's context
> into account (which presumably leads to the inclusion of the service
> context values) while the former does not. (In addition, it would
> appear that fuseki-access does not honour query-specific timeouts due
> to a similar reason.)

Timeouts ignored because the context is skipped?

> This patch seems to fix the issue:
> https://github.com/vtermanis/jena/commit/e5cb112f829f305c1f76c8f5305f4394d8e9b04f

Would you be able to turn that into a PR?

> I know that this most likely is not the right way to address it.
> (Should fuseki-access re-use some common QueryExecution building code
> from fuseki-core?) I also wasn't sure how to add an automated (to
> jena-integration-tests or with mocking in fuseki-access?) test-case
> for this, but I can provide minimal manual steps.
> 
> Should I create a Jira ticket for this?

There are several "this" here.

> Regards,
> Vilnis

     Andy


Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Vilnis Termanis <vi...@iotics.com.INVALID>.

Hi,

Specifying the following in each service (which is to ignore text
indexing) now works (tested against 4.4.0):

ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
ja:cxtValue false ] ;

... but not if the associated dataset is an AccessControlledDataset.
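
As an illustration of where such an override sits, the sketch below places the
context entry at service level on a service that uses the base dataset
directly. The node names, service name and location are hypothetical, and TDB2
is assumed; only the ja:context entry itself is taken from the snippet above.

```turtle
PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>

# Base dataset shared with a text-indexed service defined elsewhere.
:tdb a tdb2:DatasetTDB2 ;
    tdb2:location "DB" .

:readAll a fuseki:Service ;
    fuseki:name "read-all" ;
    fuseki:dataset :tdb ;
    # Replace the text index entry for this service with a non-index value.
    ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
                 ja:cxtValue false ] ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] .
```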

From my understanding, the issue is to do with the fact that
fuseki-access uses QueryExecutionFactory whilst fuseki-core uses
QueryExecDatasetBuilder. The latter takes the HttpAction's context
into account (which presumably leads to the inclusion of the service
context values) while the former does not. (In addition, it would
appear that fuseki-access does not honour query-specific timeouts due
to a similar reason.)

This patch seems to fix the issue:
https://github.com/vtermanis/jena/commit/e5cb112f829f305c1f76c8f5305f4394d8e9b04f

I know that this most likely is not the right way to address it.
(Should fuseki-access re-use some common QueryExecution building code
from fuseki-core?) I also wasn't sure how to add an automated (to
jena-integration-tests or with mocking in fuseki-access?) test-case
for this, but I can provide minimal manual steps.

Should I create a Jira ticket for this?

Regards,
Vilnis






Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Vilnis Termanis <vi...@iotics.com.INVALID>.

Hi Andy,

Thank you for the suggestion of in-config context overrides - I had
not realised that was possible (with the newer style of defining
Fuseki services) - that's really useful.
We'll re-test the aforementioned 2b & 3b cases.

Regards,
Vilnis

On Fri, 8 Apr 2022 at 11:51, Andy Seaborne <an...@apache.org> wrote:
>
> Hi Vilnis,
>
> On 07/04/2022 11:10, Vilnis Termanis wrote:
> > Hi,
> >
> > In brief: Can Fuseki Data ACL be applied to text indexing?
>
> As a general point - a text index itself is not ACL aware. It is set up
> ahead of time and does not index triples directly. The GeoSPARQL cache
> is probably similar (I'm less familiar with the GeoSPARQL code).
>
> When the query is under the control of a trusted client, the pattern:
>
> WHERE {
>      ?s a ex:Product ;
>         text:query (rdfs:label 'printer') ;
>         rdfs:label ?lbl
> }
>
> can act as a check of the triple.
>
> If the query isn't controlled, then that won't work.
>
> (Has your usage style changed in the last year?)
>
> > And is it
> > possible to selectively expose text index access per service for a
> > shared dataset?
>
> Yes.
>
> The context setting can be set per dataset, per service or per endpoint
> with ja:context [ ja:cxtName "NAME" ;  ja:cxtValue "VALUE" ] ;
>
> E.g.
>      fuseki:endpoint [
>          fuseki:operation fuseki:query ;
>          fuseki:name "sparql" ;
>          ja:context [
>             ja:cxtName "NAME" ;  ja:cxtValue "VALUE"
>          ] ;
>      ] ;
>
> >
> > In detail:
> >
> > We're using a single TDB dataset (in unionDefaultGraph mode) with
> > multiple services, wrapped with both ACL (AccessControlledDataset) as
> > well as text indexing (TextDataset) and are hoping to provide the
> > following Fuseki services:
> >
> > 1. "full access" - a) Read/write everything b) including text index
> > 2. "selected graphs only" - a) Read only from selected graphs b) no index access
> > 3. "read all" - a) Read everything b) no index access
> >
> > In the assembler configuration, datasets for the above services are
> > respectively defined as (where all use the same underlying dataset):
> > 1. TextDataset(DatasetTDB)
> > 2. AccessControlledDataset(DatasetTDB)
> > 3. DatasetTDB
> >
> > 1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
> > access to text indexing, despite not being explicitly configured as
> > such in their respective services.
>
> re: 2b/3b: That could be a bug or a configuration error.
>
> The context value is set on the text dataset. So if the server
> configuration has a service that does not go through the text dataset,
> the index should not be visible. There will be an entry in the server log.
>
> You don't actually need the DatasetGraphText if the index is only read
> (i.e. preloaded and no runtime updates).
>
> > From looking at code, I can see that index availability is based on
> > the TextQuery.textIndex symbol in the execution context
> > (TextQueryPF.java). This means that, as long as at least one service
> > enabled text indexing on a dataset, any other services referencing the
> > same underlying store will also use it.
> > (Judging by comments in the code, the "instanceof DatasetGraphText"
> > check is deprecated, even if the logic for now remains in
> > chooseTextIndex()).
> >
> > So our questions are:
> >
> > I) Is it currently possible to disallow access to the text index for
> > some services but not others (using the same underlying dataset)?
>
> Should be - see above.
>
> > II) If not, what might be best approach to implement such a
> > restriction? (Would traversal of DatasetGraphWrapper to explicitly
> > find a DatasetGraphText instance make sense?)
> > III) Or: Is there a different/better approach to solve the index
> > visibility need described above?
> >
> > In addition, regarding spatial lookups:
> > IV) Would GeoSPARQL querying (and its online caching) respect
> > AccessControlledDataset restrictions (when querying is performed over
> > multiple services with different levels of ACL)?
>
> The GeoSPARQL cache is like the text index - it is not sensitive to the
> request principal. (See the caveat above: I'm less familiar with the
> GeoSPARQL code.)
>
> > Regards,
> > Vilnis
>
>      Andy



-- 
Vilnis Termanis
Senior Software Developer

m | +44 (0) 7521 012309
e | vilnis.termanis@iotics.com
www.iotics.com

Re: Interaction between Text indexing, Fuseki services & Data Access Control

Posted by Andy Seaborne <an...@apache.org>.

Hi Vilnis,

On 07/04/2022 11:10, Vilnis Termanis wrote:
> Hi,
> 
> In brief: Can Fuseki Data ACL be applied to text indexing?

As a general point - a text index itself is not ACL aware. It is set up
ahead of time and does not index triples directly. The GeoSPARQL cache 
is probably similar (I'm less familiar with the GeoSPARQL code).

When the query is under the control of a trusted client, the pattern:

WHERE {
     ?s a ex:Product ;
        text:query (rdfs:label 'printer') ;
        rdfs:label ?lbl
}

can serve as a check: the rdfs:label triple pattern confirms each text
index hit against the data the query can see.

If the query isn't controlled, then that won't work.
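
As an illustration, the pattern above could be written out as a complete
query along these lines. This is only a sketch: the ex: namespace IRI and
the choice of SELECT variables are assumptions made for the example, and
the text: prefix is taken to be the usual Jena text namespace.

```sparql
PREFIX ex:   <http://example.org/>                      # hypothetical namespace
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>

SELECT ?s ?lbl
WHERE {
     # Candidate subjects come from the text index, which is not ACL aware.
     ?s text:query (rdfs:label 'printer') ;
        # These patterns are matched against the dataset itself, so a
        # candidate with no visible label or type triple is dropped.
        rdfs:label ?lbl ;
        a ex:Product .
}
```

The check only holds while the client keeps the triple patterns next to
text:query. A client that sends text:query on its own still receives
whatever the index returns.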

(Has your usage style changed in the last year?)

> And is it
> possible to selectively expose text index access per service for a
> shared dataset?

Yes.

The context setting can be set per dataset, per service or per endpoint 
with ja:context [ ja:cxtName "NAME" ;  ja:cxtValue "VALUE" ] ;

E.g.
     fuseki:endpoint [
         fuseki:operation fuseki:query ;
         fuseki:name "sparql" ;
         ja:context [
            ja:cxtName "NAME" ;  ja:cxtValue "VALUE"
         ] ;
     ] ;

> 
> In detail:
> 
> We're using a single TDB dataset (in unionDefaultGraph mode) with
> multiple services, wrapped with both ACL (AccessControlledDataset) as
> well as text indexing (TextDataset) and are hoping to provide the
> following Fuseki services:
> 
> 1. "full access" - a) Read/write everything b) including text index
> 2. "selected graphs only" - a) Read only from selected graphs b) no index access
> 3. "read all" - a) Read everything b) no index access
> 
> In the assembler configuration, datasets for the above services are
> respectively defined as (where all use the same underlying dataset):
> 1. TextDataset(DatasetTDB)
> 2. AccessControlledDataset(DatasetTDB)
> 3. DatasetTDB
> 
> 1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
> access to text indexing, despite not being explicitly configured as
> such in their respective services.

re: 2b/3b: That could be a bug or a configuration error.

The context value is set on the text dataset. So if the server 
configuration has a service that does not go through the text dataset, 
the index should not be visible. There will be an entry in the server log.

You don't actually need the DatasetGraphText if the index is only read 
(i.e. preloaded and no runtime updates).

> From looking at code, I can see that index availability is based on
> the TextQuery.textIndex symbol in the execution context
> (TextQueryPF.java). This means that, as long as at least one service
> enabled text indexing on a dataset, any other services referencing the
> same underlying store will also use it.
> (Judging by comments in the code, the "instanceof DatasetGraphText"
> check is deprecated, even if the logic for now remains in
> chooseTextIndex()).
> 
> So our questions are:
> 
> I) Is it currently possible to disallow access to the text index for
> some services but not others (using the same underlying dataset)?

Should be - see above.

> II) If not, what might be best approach to implement such a
> restriction? (Would traversal of DatasetGraphWrapper to explicitly
> find a DatasetGraphText instance make sense?)
> III) Or: Is there a different/better approach to solve the index
> visibility need described above?
> 
> In addition, regarding spatial lookups:
> IV) Would GeoSPARQL querying (and its online caching) respect
> AccessControlledDataset restrictions (when querying is performed over
> multiple services with different levels of ACL)?

The GeoSPARQL cache is like the text index - it is not sensitive to the
request principal. (See the caveat above: I'm less familiar with the
GeoSPARQL code.)

> Regards,
> Vilnis

     Andy