Posted to dev@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2015/07/01 06:27:28 UTC
Re: Query parameterization.
Hi Andy,
this looks great, and is just in time for the ongoing discussions in the
SHACL group. I apologize in advance for not having the bandwidth yet to
try this out from your branch, but this topic will definitely bubble up
in the priorities soon...
I have not fully understood how the semantics of this are different from
the setInitialBinding feature that we currently use in SPIN, and which
seems to do a pretty good job. However, having a facility to do further
pre-processing in advance may improve performance and provide a more
formal definition of what setInitialBinding is doing. I am personally
not enthusiastic about approaches based on text-substitution, so working
on the parsed syntax tree looks good to me. There are some (rare) cases
where text-substitution would be more powerful, e.g. dynamic path
properties and some solution modifiers, but as you say no approach is
perfect.
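To make the text-substitution concern concrete, here is a minimal sketch in plain Java (standard library only, not Jena code). The class and method names are invented for illustration. It contrasts a naive textual replace of ?x with a scanner that at least skips IRIs and double-quoted strings. It deliberately ignores escapes inside strings, single-quoted and long strings, comments and the '<' comparison operator, so it illustrates the problem rather than offering a safe substitution routine.

```java
class TextSubstitutionSketch {

    // Naive text substitution: replaces every textual occurrence of ?name.
    public static String naive(String query, String name, String value) {
        return query.replace("?" + name, value);
    }

    // Slightly more careful: copies <...> and "..." spans unchanged and only
    // replaces ?name when it is a whole variable token.
    public static String tokenAware(String query, String name, String value) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '<' || c == '"') {
                char close = (c == '<') ? '>' : '"';
                int end = query.indexOf(close, i + 1);
                if (end < 0) end = query.length() - 1;   // unterminated: copy the rest
                out.append(query, i, end + 1);
                i = end + 1;
            } else if (c == '?' && isVariable(query, i, name)) {
                out.append(value);
                i += 1 + name.length();
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the variable ?name starts at position i of the query text.
    private static boolean isVariable(String query, int i, String name) {
        if (i > 0 && query.charAt(i - 1) == '\\') return false;   // e.g. ns:local\?x
        if (!query.startsWith(name, i + 1)) return false;
        int after = i + 1 + name.length();
        if (after >= query.length()) return true;
        char next = query.charAt(after);
        return !(Character.isLetterOrDigit(next) || next == '_');
    }

    public static void main(String[] args) {
        String template = "SELECT * { <http://example/foo?x=123> ?p ?x }";
        // The naive version also rewrites the ?x inside the IRI.
        System.out.println(naive(template, "x", "\"Bristol\""));
        // The token-aware version leaves the IRI alone.
        System.out.println(tokenAware(template, "x", "\"Bristol\""));
    }
}
```

Every case the scanner has to learn about is a case a parser already handles, which is the attraction of substituting after parsing.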
Questions:
- would this also pre-bind variables inside of nested SELECTs?
- I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?
- What about bound(?var) and ?var is pre-bound?
Thanks
Holger
On 6/28/15 8:08 PM, Andy Seaborne wrote:
> (info / discussion / ...)
>
> In working on JENA-963 (OpAsQuery; reworked handling of SPARQL
> modifiers for GROUP BY), it was easier/better to add the code I had
> for rewriting syntax by transformation, much like the algebra is
> rewritten by the optimizer. The use case is rewriting the output of
> OpAsQuery to remove unnecessary nesting of levels of "{}" which arise
> during translation for the safety of the translation.
>
> Hence putting in package oaj.sparql.syntax.syntaxtransform, a general
> framework for rewriting syntax, like we have for the SPARQL+ algebra.
>
> It is also capable of being a parameterized query system (PQ). We
> already have ParameterizedSparqlString (PSS), so how do they compare?
>
> Work-in-progress:
>
> https://github.com/afs/jena-workspace/blob/master/src/main/java/syntaxtransform/ParameterizedQuery.java
>
>
> PQ is a rewrite of a Query object (the template) with a map of
> variables to constants. That is, it works on the syntax tree after
> parsing and produces a syntax tree.
>
> PSS is a builder with substitution. It builds a string, carefully
> (injection attacks) and is neutral as to what it is working with -
> query or update or something weird.
> http://jena.apache.org/documentation/query/parameterized-sparql-strings.html
>
>
> Summary:
>
> PQ is only for replacement of a variable in a template.
> PSS is a builder that can do that as part of building.
>
> PQ covers cases PSS doesn't - neither is perfect.
>
> PSS works with INSERT DATA.
> PQ would use the INSERT { ... } WHERE {} form.
>
> Details:
>
> PSS:
> Can build query, update strings and fragments
> Supports JDBC style positional parameters (a '?')
> These must be bound to get a valid query.
> Can generate illegal syntax.
> Tests the type of the injected value (string, iri, double etc).
> Has corner cases
> Looks for ?x as a string so ...
> "This is not a ?x as a variable"
> <http://example/foo?x=123>
> "SELECT ?x"
> ns:local\?x (a legal local part)
> Protects against injection by checking.
> Works on INSERT DATA.
>
> PQ:
> Replaces SPARQL variables where identified as variables.
> (no extra-syntax positional '?')
> Legal query to legal syntax query.
> The query may violate scope rules (example below).
> Not a query builder.
> Post parser, so no reparsing to use the query
> (for large updates and queries)
> Injection is meaningless - can only inject values, not syntax.
> Can rewrite structurally: "SELECT ?x" => "SELECT (:value AS ?x)"
> which is useful to record the injection variables.
> Works with "INSERT {?s ?p ?o } WHERE { }"
>
> PQ example:
>
> Query template = QueryFactory.create(.. valid query ..) ;
> Map<String, RDFNode> map = new HashMap<>() ;
> map.put("y", ResourceFactory.createPlainLiteral("Bristol")) ;
> Query query = ParameterizedQuery.setVariables(template, map) ;
>
>
> A perfect system probably needs a "template language", which is SPARQL
> extended with a new "template variable" that is only allowed in
> certain places in the query and must be bound before use.
>
> Some examples of hard templates:
>
> (1) Not variables:
> <http://example/foo?x=123>
> "This is not a ?x as a variable"
> ns:local\?x
>
> (2) Some places ?x can not be replaced with a value directly.
> SELECT ?x { ?s ?p ?x }
>
>
>
> A possible output is:
> SELECT (:X AS ?x) { ?s ?p :X }
> which is nice as it records the substitution, but it fails when nested
> again.
>
> SELECT ?x { {SELECT ?x { ?s ?p ?x } } ?s ?p ?o }
>
> This is a bad query:
> SELECT (:X AS ?x) { {SELECT (:X AS ?x) { ...
>
> (3) Other places:
> SELECT ?x { BIND(1 AS ?x) }
> SELECT ?x { VALUES ?x { 123 } }
>
> Andy
Re: Query parameterization.
Posted by Andy Seaborne <an...@apache.org>.
On 01/07/15 05:27, Holger Knublauch wrote:
> Hi Andy,
>
> this looks great, and is just in time for the ongoing discussions in the
> SHACL group. I apologize in advance for not having the bandwidth yet to
> try this out from your branch, but this topic will definitely bubble up
> in the priorities soon...
>
> I have not fully understood how the semantics of this are different from
> the setInitialBinding feature that we currently use in SPIN, and which
> seems to do a pretty good job. However, having a facility to do further
> pre-processing in advance may improve performance and provide a more
> formal definition of what setInitialBinding is doing. I am personally
> not enthusiastic about approaches based on text-substitution, so working
> on the parsed syntax tree looks good to me. There are some (rare) cases
> where text-substitution would be more powerful, e.g. dynamic path
> properties
If you can insert compound syntax, then injection attacks need to be
considered.
> and some solution modifiers, but as you say no approach is
> perfect.
Better done on the algebra? Especially around the SELECT clause, as it is
several modifiers in a tangle.
(See recent OpAsQuery discussion and changes)
>
> Questions:
>
> - would this also pre-bind variables inside of nested SELECTs?
Yes (it's a choice - it could not do it with some analysis of the inner
projection as it passes through).
> - I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?
Probably! (it's tricky and needs more testing)
...
Yes - the replacement with bnodes-are-variables in SPARQL is done during
parsing and this is post parse (different to all string based approaches).
If the substituted query is turned into a string, it will become a bnode
in SPARQL, which then reparses as a variable. The printing code
(specifically NodeToLabelMapBNode.asString) handles it and would need a
tweak.
The <_:label> form would be better but needs implementing.
> - What about bound(?var) and ?var is pre-bound?
?var in bound(?var) is replaced (as ?var in all expressions). This is
syntax.
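As a rough illustration of "this is syntax", here is a toy expression tree in plain Java. The Node, Var, Const and Call types are invented for this sketch and are not Jena's expression classes. Substitution walks the tree and replaces variable nodes wherever they occur, including inside bound(...), while a "?var" that is merely text inside a string constant is never touched because it is not a variable node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SyntaxSubstitutionSketch {

    // A node of a toy expression tree (hypothetical; not Jena's API).
    interface Node {
        Node substitute(Map<String, String> values);
        String render();
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        // A variable is replaced only if the map has a value for it.
        public Node substitute(Map<String, String> values) {
            String value = values.get(name);
            return value == null ? this : new Const(value);
        }
        public String render() { return "?" + name; }
    }

    static final class Const implements Node {
        final String text;
        Const(String text) { this.text = text; }
        // Constants are never rewritten, whatever their text looks like.
        public Node substitute(Map<String, String> values) { return this; }
        public String render() { return text; }
    }

    static final class Call implements Node {
        final String function;
        final List<Node> args;
        Call(String function, Node... args) {
            this.function = function;
            this.args = Arrays.asList(args);
        }
        // Substitution recurses into every argument.
        public Node substitute(Map<String, String> values) {
            Node[] replaced = new Node[args.size()];
            for (int i = 0; i < replaced.length; i++)
                replaced[i] = args.get(i).substitute(values);
            return new Call(function, replaced);
        }
        public String render() {
            List<String> parts = new ArrayList<>();
            for (Node arg : args) parts.add(arg.render());
            return function + "(" + String.join(", ", parts) + ")";
        }
    }

    public static void main(String[] argv) {
        Map<String, String> values = new HashMap<>();
        values.put("var", "<http://example/v>");
        Node bound = new Call("bound", new Var("var"));
        Node regex = new Call("regex", new Var("s"), new Const("\"not ?var\""));
        System.out.println(bound.substitute(values).render());
        System.out.println(regex.substitute(values).render());
    }
}
```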
Andy
Re: Query parameterization.
Posted by Andy Seaborne <an...@apache.org>.
On 03/07/15 09:35, Andy Seaborne wrote:
> On 01/07/15 07:17, Claude Warren wrote:
>> SelectBuilder sb = new SelectBuilder()
>>     .addVar( "*" )
>>     .addWhere( "?s", "?p", "?o" );
>> sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
>>     "http://xmlns.com/foaf/0.1/Person" ) ) ;
>> Query q = sb.build();
>
> Hi Claude,
>
> Should that be one of
> Var.alloc( "o" )
> Var.alloc(Var.canonical("?o"))
>
> How does it compare to the corner cases in my first message?
>
>
> There is at least one injection attack:
>
> NodeFactory.createURI of
>
> "http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"
>
> because it is string inclusion, jena-querybuilder needs to do the same
> checks that ParameterizedSparqlString does for URIs. A check is needed on
> literals too, but a different kind of test.
>
> BTW:
>
> and how do I add
>
> OPTIONAL {
> ?s <q> 123 .
> ?s <v> ?x .
> FILTER(?x>56)
> }
> ?
>
> And for UNION, there seems to be a confusion because it takes a
> SelectBuilder (a subquery) but that's an SQL-ism, not SPARQL.
>
> It seems to cause problems:
>
> SelectBuilder sb = new SelectBuilder().addVar("*") ;
> sb.addWhere("?s", "?p", "?o") ;
> SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
> sb1.addWhere("?s", "?p", "?o") ;
> sb1.addUnion(sb1) ;
> Query q1 = sb1.build() ;
> String s1 = q1.toString() ;
> System.out.println(s1) ;
>
> I get stack overflow.
Silly mistake on my part.
SelectBuilder sb = new SelectBuilder().addVar("*") ;
sb.addWhere("?s", "?p", "?o") ;
SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
sb1.addWhere("?s1", "?p1", "?o1") ;
sb.addUnion(sb1) ;
Query q1 = sb.build() ;
String s1 = q1.toString() ;
System.out.println(s1) ;
>
> UNION and OPTIONAL are similar - they take graph patterns.
>
But I now get this illegal query:
SELECT *
WHERE
  { ?s ?p ?o
    UNION
      { SELECT ?s ?p ?o
        WHERE
          { ?s ?p ?o }
      }
  }
which should be:
SELECT *
WHERE
  { { ?s ?p ?o }
    UNION
      { SELECT ?s ?p ?o
        WHERE
          { ?s ?p ?o }
      }
  }
Each side of the UNION is an ElementGroup.
> Andy
>
Re: Query parameterization.
Posted by Andy Seaborne <an...@apache.org>.
On 01/07/15 07:17, Claude Warren wrote:
> SelectBuilder sb = new SelectBuilder()
>     .addVar( "*" )
>     .addWhere( "?s", "?p", "?o" );
> sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
>     "http://xmlns.com/foaf/0.1/Person" ) ) ;
> Query q = sb.build();
Hi Claude,
Should that be one of
Var.alloc( "o" )
Var.alloc(Var.canonical("?o"))
How does it compare to the corner cases in my first message?
There is at least one injection attack:
NodeFactory.createURI of
"http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"
because it is string inclusion, jena-querybuilder needs to do the same
checks that ParameterizedSparqlString does for URIs. A check is needed on
literals too, but a different kind of test.
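One possible shape for the URI check, sketched in plain Java under stated assumptions: it rejects any string containing a character that the SPARQL 1.1 grammar excludes from the IRIREF token (the characters < > " { } | ^ ` \ and anything at or below U+0020). The class and method names are made up for this example, and it is not a copy of what ParameterizedSparqlString does. It only guards the token boundary and does not validate the IRI otherwise.

```java
class IriCheckSketch {

    // Characters excluded by the IRIREF production, besides U+0000..U+0020.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    // True if the string can be written between '<' and '>' as one IRIREF token.
    public static boolean isSafeIriRef(String iri) {
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) return false;
        }
        return true;
    }

    // Renders the IRI for inclusion in a query string, refusing unsafe input.
    public static String toIriRef(String iri) {
        if (!isSafeIriRef(iri))
            throw new IllegalArgumentException("Not a legal IRIREF: " + iri);
        return "<" + iri + ">";
    }

    public static void main(String[] args) {
        String ok = "http://xmlns.com/foaf/0.1/Person";
        String attack = "http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns";
        System.out.println(isSafeIriRef(ok));      // no forbidden characters
        System.out.println(isSafeIriRef(attack));  // contains '>', ' ' and '<'
    }
}
```

The attack string above fails on its first '>' character, so it can never close the IRI early and smuggle in extra triple patterns.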
BTW:
and how do I add
OPTIONAL {
?s <q> 123 .
?s <v> ?x .
FILTER(?x>56)
}
?
And for UNION, there seems to be a confusion because it takes a
SelectBuilder (a subquery) but that's an SQL-ism, not SPARQL.
It seems to cause problems:
SelectBuilder sb = new SelectBuilder().addVar("*") ;
sb.addWhere("?s", "?p", "?o") ;
SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
sb1.addWhere("?s", "?p", "?o") ;
sb1.addUnion(sb1) ;
Query q1 = sb1.build() ;
String s1 = q1.toString() ;
System.out.println(s1) ;
I get stack overflow.
UNION and OPTIONAL are similar - they take graph patterns.
Andy
Re: Query parameterization.
Posted by Claude Warren <cl...@xenei.com>.
The QueryBuilder also has parameterized variables of a type.
Basically you can construct the query with variables and then replace the
variable with a value by calling setVar() just before calling build.
SelectBuilder sb = new SelectBuilder()
    .addVar( "*" )
    .addWhere( "?s", "?p", "?o" );
sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
    "http://xmlns.com/foaf/0.1/Person" ) ) ;
Query q = sb.build();
produces
SELECT * WHERE
{ ?s ?p <http://xmlns.com/foaf/0.1/Person> }
--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren