Posted to dev@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2015/07/01 06:27:28 UTC

Re: Query parameterization.

Hi Andy,

this looks great, and is just in time for the ongoing discussions in the 
SHACL group. I apologize in advance for not having the bandwidth yet to 
try this out from your branch, but this topic will definitely bubble up 
in the priorities soon...

I have not fully understood how the semantics of this are different from 
the setInitialBinding feature that we currently use in SPIN, and which 
seems to do a pretty good job. However, having a facility to do further 
pre-processing in advance may improve performance and provide a more 
formal definition of what setInitialBinding is doing. I am personally 
not enthusiastic about approaches based on text-substitution, so working 
on the parsed syntax tree looks good to me. There are some (rare) cases 
where text-substitution would be more powerful, e.g. dynamic path 
properties and some solution modifiers, but as you say no approach is 
perfect.

Questions:

- would this also pre-bind variables inside of nested SELECTs?
- I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?
- What about bound(?var) and ?var is pre-bound?

Thanks
Holger


On 6/28/15 8:08 PM, Andy Seaborne wrote:
> (info / discussion / ...)
>
> In working on JENA-963 (OpAsQuery; reworked handling of SPARQL 
> modifiers for GROUP BY), it was easier/better to add the code I had 
> for rewriting syntax by transformation, much like the algebra is 
> rewritten by the optimizer.  The use case is rewriting the output of 
> OpAsQuery to remove unnecessary nesting of levels of "{}" which arise 
> during translation for the safety of the translation.
>
> Hence putting in package oaj.sparql.syntax.syntaxtransform, a general 
> framework for rewriting syntax, like we have for the SPARQL+ algebra.
>
> It is also capable of being a parameterized query system (PQ).  We 
> already have ParameterizedSparqlString (PSS) so how do they compare?
>
> Work-in-progress:
>
> https://github.com/afs/jena-workspace/blob/master/src/main/java/syntaxtransform/ParameterizedQuery.java 
>
>
> PQ is a rewrite of a Query object (the template) with a map of 
> variables to constants. That is, it works on the syntax tree after 
> parsing and produces a syntax tree.
>
> PSS is a builder with substitution. It builds a string, carefully 
> (injection attacks) and is neutral as to what it is working with - 
> query or update or something weird.
> http://jena.apache.org/documentation/query/parameterized-sparql-strings.html 
>
>
> Summary:
>
> PQ is only for replacement of a variable in a template.
> PSS is a builder that can do that as part of building.
>
> PQ covers cases PSS doesn't - neither is perfect.
>
> PSS works with INSERT DATA.
> PQ would use the INSERT { ... } WHERE {} form.
>
> Details:
>
> PSS:
>   Can build query, update strings and fragments
>   Supports JDBC style positional parameters (a '?')
>     These must be bound to get a valid query.
>     Can generate illegal syntax.
>   Tests the type of the injected value (string, iri, double etc).
>   Has corner cases
>      Looks for ?x as a string so ...
>        "This is not a ?x as a variable"
>        <http://example/foo?x=123>
>        "SELECT ?x"
>        ns:local\?x (a legal local part)
>   Protects against injection by checking.
>   Works on INSERT DATA.
>
> PQ:
>   Replaces SPARQL variables where identified as variables.
>     (no extra-syntax positional '?')
>   Legal query to legal syntax query.
>     The query may violate scope rules (example below).
>     Not a query builder.
>   Post parser, so no reparsing to use the query
>     (for large updates and queries)
>   Injection is meaningless - can only inject values, not syntax.
>   Can rewrite structurally: "SELECT ?x" => "SELECT  (:value AS ?x)"
>     which is useful to record the injection variables.
>   Works with "INSERT {?s ?p ?o } WHERE { }"
>
> PQ example:
>
>   Query template = QueryFactory.create(.. valid query ..) ;
>   Map<String, RDFNode> map = new HashMap<>() ;
>   map.put("y", ResourceFactory.createPlainLiteral("Bristol")) ;
>   Query query = ParameterizedQuery.setVariables(template, map) ;
>
>
> A perfect system probably needs a "template language" which is SPARQL 
> extended with a new "template variable" which is only allowed in 
> certain places in the query and must be bound before use.
>
> Some examples of hard templates:
>
> (1) Not variables:
> <http://example/foo?x=123>
> "This is not a ?x as a variable"
> ns:local\?x
>
> (2) Some places ?x can not be replaced with a value directly.
>    SELECT ?x { ?s ?p ?x }
>
>
>
> A possible output is:
>   SELECT  (:X AS ?x) { ?s ?p :X }
> which is nice as it records the substitution but it fails when nested 
> again.
>
> SELECT ?x { {SELECT ?x { ?s ?p ?x } } ?s ?p ?o }
>
> This is a bad query:
> SELECT (:X AS ?x) { {SELECT (:X AS ?x) { ...
>
> (3) Other places:
> SELECT ?x { BIND(1 AS ?x) }
> SELECT ?x { VALUES ?x { 123 } }
>
>     Andy
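
For illustration only, the "(1) Not variables" corner cases can be reproduced with a few lines of plain Java. This is a sketch, not PSS or PQ code: the class VarSubst and its methods are hypothetical names, it handles only short (single-line) string literals, and its "<" handling is a heuristic that a real parser would not need. It contrasts a raw text search for ?x with a scan that replaces ?x only where it appears as a variable token.

```java
import java.util.Map;

/** Hypothetical helper, not part of Jena. */
final class VarSubst {

    /** Naive approach: plain text replacement of "?name" wherever it occurs. */
    static String naive(String query, String name, String value) {
        return query.replace("?" + name, value);
    }

    /** Token-aware approach: copies string literals, IRIs and backslash escapes untouched. */
    static String tokenAware(String query, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '"' || c == '\'') {                    // string literal
                int end = endOfString(query, i);
                out.append(query, i, end);
                i = end;
            } else if (c == '<' && endOfIri(query, i) > 0) { // <...> IRI
                int end = endOfIri(query, i);
                out.append(query, i, end);
                i = end;
            } else if (c == '\\' && i + 1 < query.length()) { // e.g. ns:local\?x
                out.append(query, i, i + 2);
                i += 2;
            } else if (c == '?') {                          // candidate variable
                int end = i + 1;
                while (end < query.length() && isVarChar(query.charAt(end))) end++;
                String value = values.get(query.substring(i + 1, end));
                out.append(value != null ? value : query.substring(i, end));
                i = end;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Index just past the closing quote, or end of input if unterminated. */
    private static int endOfString(String q, int start) {
        char quote = q.charAt(start);
        int i = start + 1;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (c == '\\') { i += 2; continue; }
            if (c == quote) return i + 1;
            i++;
        }
        return q.length();
    }

    /** Index just past '>' if the text at start looks like an IRI, else -1. */
    private static int endOfIri(String q, int start) {
        for (int i = start + 1; i < q.length(); i++) {
            char c = q.charAt(i);
            if (c == '>') return i + 1;
            if (c <= ' ' || "<\"{}|^`\\".indexOf(c) >= 0) return -1;
        }
        return -1;
    }

    private static boolean isVarChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        String q = "SELECT ?s { ?s <http://example/foo?x=123> ?x . ?s ns:local\\?x \"not a ?x\" }";
        System.out.println(naive(q, "x", "\"Bristol\""));
        System.out.println(tokenAware(q, Map.of("x", "\"Bristol\"")));
    }
}
```

The naive version rewrites the IRI, the local name and the string literal as well as the variable; the token-aware version touches only the object position. It still knows nothing about scope, so cases (2) and (3) need the parsed syntax tree.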


Re: Query parameterization.

Posted by Andy Seaborne <an...@apache.org>.
On 01/07/15 05:27, Holger Knublauch wrote:
> Hi Andy,
>
> this looks great, and is just in time for the ongoing discussions in the
> SHACL group. I apologize in advance for not having the bandwidth yet to
> try this out from your branch, but this topic will definitely bubble up
> in the priorities soon...
>
> I have not fully understood how the semantics of this are different from
> the setInitialBinding feature that we currently use in SPIN, and which
> seems to do a pretty good job. However, having a facility to do further
> pre-processing in advance may improve performance and provide a more
> formal definition of what setInitialBinding is doing. I am personally
> not enthusiastic about approaches based on text-substitution, so working
> on the parsed syntax tree looks good to me. There are some (rare) cases
> where text-substitution would be more powerful, e.g. dynamic path
> properties

If you can insert compound syntax, then injection attacks need to be 
considered.

> and some solution modifiers, but as you say no approach is
> perfect.

Better done on the algebra?  Especially around the SELECT clause, as it is 
several modifiers in a tangle.

(See recent OpAsQuery discussion and changes)

>
> Questions:
>
> - would this also pre-bind variables inside of nested SELECTs?

Yes (it's a choice - it could avoid doing so, given some analysis of the 
inner projection as it passes through).

> - I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?

Probably! (it's tricky and needs more testing)
...
Yes - the replacement with bnodes-are-variables in SPARQL is done during 
parsing and this is post parse (different to all string based approaches).

If the substituted query is turned into a string, it will become a bnode 
in SPARQL, which then reparses as a variable.  The printing code 
(specifically NodeToLabelMapBNode.asString) handles it and would need a 
tweak.

The <_:label> form would be better but needs implementing.

> - What about bound(?var) and ?var is pre-bound?

?var in bound(?var) is replaced (as ?var in all expressions).  This is 
syntax.

	Andy



Re: Query parameterization.

Posted by Andy Seaborne <an...@apache.org>.
On 03/07/15 09:35, Andy Seaborne wrote:
> On 01/07/15 07:17, Claude Warren wrote:
>> SelectBuilder sb = new SelectBuilder()
>>      .addVar( "*" )
>>      .addWhere( "?s", "?p", "?o" );
>> sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
>>      "http://xmlns.com/foaf/0.1/Person" ) ) ;
>> Query q = sb.build();
>
> Hi Claude,
>
> Should that be one of
>    Var.alloc( "o" )
>    Var.alloc(Var.canonical("?o"))
>
> How does it compare to the corner cases in my first message?
>
>
> There is at least one injection attack:
>
> NodeFactory.createURI of
>
> "http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"
>
> Because it is string inclusion, jena-querybuilder needs to do the same
> checks that ParameterizedSparqlString does for URIs.  A check is also
> needed on literals, but it is a different kind of test.
>
> BTW:
>
> and how do I add
>
> OPTIONAL {
>     ?s <q> 123 .
>     ?s <v> ?x .
>     FILTER(?x>56)
> }
> ?
>
> And for UNION, there seems to be a confusion because it takes a
> SelectBuilder (a subquery) but that's an SQL-ism, not SPARQL.
>
> It seems to cause problems:
>
>          SelectBuilder sb = new SelectBuilder().addVar("*") ;
>          sb.addWhere("?s", "?p", "?o") ;
>          SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
>          sb1.addWhere("?s", "?p", "?o") ;
>          sb1.addUnion(sb1) ;
>          Query q1 = sb1.build() ;
>          String s1 = q1.toString() ;
>          System.out.println(s1) ;
>
> I get stack overflow.

Silly mistake on my part.

         SelectBuilder sb = new SelectBuilder().addVar("*") ;
         sb.addWhere("?s", "?p", "?o") ;
         SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
         sb1.addWhere("?s1", "?p1", "?o1") ;
         sb.addUnion(sb1) ;
         Query q1 = sb.build() ;
         String s1 = q1.toString() ;
         System.out.println(s1) ;


>
> UNION and OPTIONAL are similar - they take graph patterns.
>
But I now get this illegal query:

SELECT  *
WHERE
   { ?s  ?p  ?o
     UNION
       { SELECT  ?s ?p ?o
         WHERE
           { ?s  ?p  ?o }
       }
   }

which should be:

SELECT  *
WHERE
   { { ?s  ?p  ?o }
     UNION
       { SELECT  ?s ?p ?o
         WHERE
           { ?s  ?p  ?o }
       }
   }

Each side of the UNION is an ElementGroup.

>      Andy
>


Re: Query parameterization.

Posted by Andy Seaborne <an...@apache.org>.
On 01/07/15 07:17, Claude Warren wrote:
> SelectBuilder sb = new SelectBuilder()
>      .addVar( "*" )
>      .addWhere( "?s", "?p", "?o" );
> sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
>      "http://xmlns.com/foaf/0.1/Person" ) ) ;
> Query q = sb.build();

Hi Claude,

Should that be one of
   Var.alloc( "o" )
   Var.alloc(Var.canonical("?o"))

How does it compare to the corner cases in my first message?


There is at least one injection attack:

NodeFactory.createURI of

"http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns"

Because it is string inclusion, jena-querybuilder needs to do the same 
checks that ParameterizedSparqlString does for URIs.  A check is also 
needed on literals, but it is a different kind of test.
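
As a rough illustration of the two kinds of test (a sketch only: IriCheck and its methods are hypothetical names, not the checks ParameterizedSparqlString or jena-querybuilder actually perform), an IRI can be rejected if it contains any character that the SPARQL 1.1 IRIREF grammar rule forbids between "<" and ">", while a literal is made safe by escaping rather than by rejecting:

```java
/** Hypothetical checks, not Jena code. */
final class IriCheck {

    // Characters the SPARQL 1.1 IRIREF production does not allow inside <...>
    // (in addition to the control characters and space, #x00-#x20).
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** True if the string can be written between '<' and '>' without ending the IRI early. */
    static boolean isSafeIriRef(String iri) {
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) return false;
        }
        return true;
    }

    /** Writes the IRI in SPARQL syntax, refusing anything that fails the check. */
    static String toIriRef(String iri) {
        if (!isSafeIriRef(iri)) {
            throw new IllegalArgumentException("Illegal character in IRI: " + iri);
        }
        return "<" + iri + ">";
    }

    /** Writes a plain string literal, escaping the characters that could end it early. */
    static String toStringLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

With this, the string "http://xmlns.com/foaf/0.1/Person> . ?s ?q <http://example/ns" is refused because it contains ">", spaces and "<", whereas the plain FOAF Person URI passes.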

BTW:

and how do I add

OPTIONAL {
    ?s <q> 123 .
    ?s <v> ?x .
    FILTER(?x>56)
}
?

And for UNION, there seems to be a confusion because it takes a 
SelectBuilder (a subquery) but that's an SQL-ism, not SPARQL.

It seems to cause problems:

         SelectBuilder sb = new SelectBuilder().addVar("*") ;
         sb.addWhere("?s", "?p", "?o") ;
         SelectBuilder sb1 = new SelectBuilder().addVar("*") ;
         sb1.addWhere("?s", "?p", "?o") ;
         sb1.addUnion(sb1) ;
         Query q1 = sb1.build() ;
         String s1 = q1.toString() ;
         System.out.println(s1) ;

I get stack overflow.

UNION and OPTIONAL are similar - they take graph patterns.

	Andy


Re: Query parameterization.

Posted by Claude Warren <cl...@xenei.com>.
The QueryBuilder also has parameterized variables of a sort.

Basically you can construct the query with variables and then replace the
variable with a value by calling setVar()  just before calling build.

SelectBuilder sb = new SelectBuilder()
    .addVar( "*" )
    .addWhere( "?s", "?p", "?o" );
sb.setVar( Var.alloc( "?o" ), NodeFactory.createURI(
    "http://xmlns.com/foaf/0.1/Person" ) ) ;
Query q = sb.build();

produces

SELECT * WHERE
  { ?s ?p <http://xmlns.com/foaf/0.1/Person> }




