You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@jena.apache.org by Stephen Allen <sa...@apache.org> on 2012/11/29 17:07:55 UTC

SPARQL Update 1.1 Grammar

All,

Some thoughts on Update grammar.  Would love to get other's opinions
before posting a comment to the SPARQL WG.

I have been attempting to implement streaming updates in Jena.  Part
of that effort has been rewriting recursive grammar definitions to
have no recursion (thus avoiding a stack overflow when there are a
large number of Update operations in a single Update Request).  This
in some cases may mean an LL(2) parser.

The grammar for Update in the SPARQL 1.1 PR is as follows:

   [29] Update ::= Prologue ( Update1 ( ';' Update )? )?

This is currently implemented in our JavaCC parser as:

   Prologue() (Update1() ( <SEMICOLON> Update() )? )?

Unfortunately, the best I non-recursive solution was able to come up with was:

   Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
Update1() )* ( <SEMICOLON> )? )?

This is *almost* equivalent to the grammar in the spec, except for one
detail: it does not allow a trailing Prologue(), which the recursive
definition allows.  I can't seem to get any closer, mainly due to that
optional semicolon and optional trailing prologue (although you cannot
have a trailing semicolon if you have a lone trailing prologue).

The more I look at the problem, the more I tend to think that maybe
the spec's Update grammar is faulty.  I believe it should not allow
trailing prologues.  It also should not allow just a prologue and
nothing else (Query forbids this).  Examples of queries that I think
should be invalid (but are not currently):

==========
PREFIX : <http://example.org/>
==========
PREFIX : <http://example.org/>
insert data { } ;
PREFIX : <http://example.org/>
==========

Additionally, I would argue that the text of the Update spec [1]
contradicts the existing grammar.  Specifically the definition in
section 3:
    "A request is a sequence of operations and is terminated by
    EOF (End of File). Multiple operations are separated by a ';'
    (semicolon) character. A semicolon after the last operation
    in a request is optional."

A prologue by itself is not an operation as defined in section 4.3 [2].

I would propose to the working group that we instead adopt the
following grammar:

   [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?

This could be easily represented in JavaCC as:

   Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
Update1() )* ( <SEMICOLON> )?

The trailing semicolon seems to force us into using an LL(2) parser.
I cannot see a way to write this grammar in LL(1).

I have three questions that would be nice to have answered before I
post a comment to the WG:

1) Is there a non-recursive way to write the existing rule 29 that
exactly matches the semantics of the spec?
2) Is there a way to write my proposed rule 29 as LL(1) (even if has
to use recursion)?
3) Would the RDF WG be open to changing the grammar at this point?  I
know it is in PR stage, but this would be feedback from attempting
implementation.

-Stephen

[1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
[2] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate

Re: SPARQL Update 1.1 Grammar

Posted by Andy Seaborne <an...@apache.org>.

On 01/12/12 07:39, Stephen Allen wrote:
> On Fri, Nov 30, 2012 at 6:43 AM, Andy Seaborne <an...@apache.org> wrote:
>> On 29/11/12 23:13, Stephen Allen wrote:
>>>
>>> Andy,
>>>
>>> Thanks for the comments, very helpful.  It is unfortunate that I
>>> didn't look at this earlier in the year, but indeed my proposed change
>>> would be relatively minor to the end user, and I've figured out a
>>> solution staying within the current spec!
>>>
>>> More comments inline:
>>>
>>>
>>> On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne <an...@apache.org> wrote:
>>>>
>>>> Hi Stephen,
>>>>
>>>> Adding requirements to the official SPARQL grammar by parsable in
>>>> particular
>>>> ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is it''s LL(1)
>>>> AKA
>>>> simple technically and that communicates.  People can implement the
>>>> language
>>>> in different ways and the grammar defines the language but it does not
>>>> prescribe the implementation.
>>>
>>>
>>> In hindsight I would have argued that the optional terminating
>>> semicolon should be eliminated to stay LL(1) without recursion.
>>
>>
>> It's there to make producing SPARQL Update syntax by (streaming!) program
>> easier - always put a SEMICOLON after an operation.
>>
>> Actually, I didn't want a separator at all - it's not necessary.  Oh well.
>>
>>
>>> I had been trying that to no avail because of the unlimited amount of
>>> lookahead required (which defeat the streaming effort).  Syntactic
>>> Lookahead to the rescue however, and I was able to write it as:
>>>
>>>     Prologue()
>>>     (
>>>       Update1()
>>>       (
>>>         // This syntactic lookahead is necessitated by the optional
>>> trailing semicolon and prologue
>>>         LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> |
>>> <ADD> |
>>>                      <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> |
>>> <INSERT> |
>>>                      <USING> | <INSERT_DATA> | <DELETE_DATA> |
>>> <DELETE_WHERE> ) )
>>
>>
>> Does it have to be the tokens for the LOOKAHEAD?
>>
>> ?? refactor Update1 into the control part and the
>> ( <LOAD> | ... | <DELETE_WHERE> ) list then share the list?
>>
>> (My DRYistic tendencies!)
>>
>
> I agree about DRY, but in this case I believe it is unavoidable.

Doh! yes, you're right - it is.  Via Update1 each token is tied to the 
action for the token which is side-effecting.  Lookahead will get weird.

	Andy

>  The
> paths are kind of separate.  If the grammar were written with:
>     LOOKAHEAD( <SEMICOLON> Prologue() Update1() )
>
> It would probably work, but it would require evaluating Update1()
> completely before making a decision at the choice point.  Since
> Update1() could possibly contain a very long INSERT_DATA or
> DELETE_DATA, that would defeat the streaming.  Instead, since we just
> list the options for the first token that could possibly appear in the
> Update1, then we can stop the lookahead there without evaluating the
> contents of the update operation.
>
> See the "SYNTACTIC LOOKAHEAD" section of the JavaCC mini-tutorial [1]
> if you're interested in a better explanation than I can give.
>
> -Stephen
>
> [1] http://javacc.java.net/doc/lookahead.html
>

Re: SPARQL Update 1.1 Grammar

Posted by Stephen Allen <sa...@apache.org>.

On Fri, Nov 30, 2012 at 6:43 AM, Andy Seaborne <an...@apache.org> wrote:
> On 29/11/12 23:13, Stephen Allen wrote:
>>
>> Andy,
>>
>> Thanks for the comments, very helpful.  It is unfortunate that I
>> didn't look at this earlier in the year, but indeed my proposed change
>> would be relatively minor to the end user, and I've figured out a
>> solution staying within the current spec!
>>
>> More comments inline:
>>
>>
>> On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>> Hi Stephen,
>>>
>>> Adding requirements to the official SPARQL grammar by parsable in
>>> particular
>>> ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is it''s LL(1)
>>> AKA
>>> simple technically and that communicates.  People can implement the
>>> language
>>> in different ways and the grammar defines the language but it does not
>>> prescribe the implementation.
>>
>>
>> In hindsight I would have argued that the optional terminating
>> semicolon should be eliminated to stay LL(1) without recursion.
>
>
> It's there to make producing SPARQL Update syntax by (streaming!) program
> easier - always put a SEMICOLON after an operation.
>
> Actually, I didn't want a separator at all - it's not necessary.  Oh well.
>
>
>> I had been trying that to no avail because of the unlimited amount of
>> lookahead required (which defeat the streaming effort).  Syntactic
>> Lookahead to the rescue however, and I was able to write it as:
>>
>>    Prologue()
>>    (
>>      Update1()
>>      (
>>        // This syntactic lookahead is necessitated by the optional
>> trailing semicolon and prologue
>>        LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> |
>> <ADD> |
>>                     <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> |
>> <INSERT> |
>>                     <USING> | <INSERT_DATA> | <DELETE_DATA> |
>> <DELETE_WHERE> ) )
>
>
> Does it have to be the tokens for the LOOKAHEAD?
>
> ?? refactor Update1 into the control part and the
> ( <LOAD> | ... | <DELETE_WHERE> ) list then share the list?
>
> (My DRYistic tendencies!)
>

I agree about DRY, but in this case I believe it is unavoidable.  The
paths are kind of separate.  If the grammar were written with:
   LOOKAHEAD( <SEMICOLON> Prologue() Update1() )

It would probably work, but it would require evaluating Update1()
completely before making a decision at the choice point.  Since
Update1() could possibly contain a very long INSERT_DATA or
DELETE_DATA, that would defeat the streaming.  Instead, since we just
list the options for the first token that could possibly appear in the
Update1, then we can stop the lookahead there without evaluating the
contents of the update operation.

See the "SYNTACTIC LOOKAHEAD" section of the JavaCC mini-tutorial [1]
if you're interested in a better explanation than I can give.

-Stephen

[1] http://javacc.java.net/doc/lookahead.html

Re: SPARQL Update 1.1 Grammar

Posted by Andy Seaborne <an...@apache.org>.

On 29/11/12 23:13, Stephen Allen wrote:
> Andy,
>
> Thanks for the comments, very helpful.  It is unfortunate that I
> didn't look at this earlier in the year, but indeed my proposed change
> would be relatively minor to the end user, and I've figured out a
> solution staying within the current spec!
>
> More comments inline:
>
>
> On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne <an...@apache.org> wrote:
>> Hi Stephen,
>>
>> Adding requirements to the official SPARQL grammar by parsable in particular
>> ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is it''s LL(1) AKA
>> simple technically and that communicates.  People can implement the language
>> in different ways and the grammar defines the language but it does not
>> prescribe the implementation.
>
> In hindsight I would have argued that the optional terminating
> semicolon should be eliminated to stay LL(1) without recursion.

It's there to make producing SPARQL Update syntax by (streaming!) 
program easier - always put a SEMICOLON after an operation.

Actually, I didn't want a separator at all - it's not necessary.  Oh well.

> I had been trying that to no avail because of the unlimited amount of
> lookahead required (which defeat the streaming effort).  Syntactic
> Lookahead to the rescue however, and I was able to write it as:
>
>    Prologue()
>    (
>      Update1()
>      (
>        // This syntactic lookahead is necessitated by the optional
> trailing semicolon and prologue
>        LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> | <ADD> |
>                     <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> | <INSERT> |
>                     <USING> | <INSERT_DATA> | <DELETE_DATA> | <DELETE_WHERE> ) )

Does it have to be the tokens for the LOOKAHEAD?

?? refactor Update1 into the control part and the
( <LOAD> | ... | <DELETE_WHERE> ) list then share the list?

(My DRYistic tendencies!)


>        <SEMICOLON>
>        Prologue()
>        Update1()
>      )*
>      (
>        <SEMICOLON>
>        Prologue()
>      )?
>    )?

	Andy

Re: SPARQL Update 1.1 Grammar

Posted by Stephen Allen <sa...@apache.org>.

Andy,

Thanks for the comments, very helpful.  It is unfortunate that I
didn't look at this earlier in the year, but indeed my proposed change
would be relatively minor to the end user, and I've figured out a
solution staying within the current spec!

More comments inline:


On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne <an...@apache.org> wrote:
> Hi Stephen,
>
> Adding requirements to the official SPARQL grammar by parsable in particular
> ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is it''s LL(1) AKA
> simple technically and that communicates.  People can implement the language
> in different ways and the grammar defines the language but it does not
> prescribe the implementation.

In hindsight I would have argued that the optional terminating
semicolon should be eliminated to stay LL(1) without recursion.

> There may be other ways to do streaming ... at the moment the parser builds
> a syntax tree (good for printing out request) but it could be event
> generating without a syntax builder built in - non-trivial change.
>
> So what is important to stream?
>
> 1/ INSERT DATA (and DELETE DATA)
>
> Maybe the data should not end up in the parser tree but a data bag?  It cn
> be pulled back in for small requests and printing.
>
> 2/ Per operation?  It could have a mode to emit each operation as it goes
> along.
>

All of this is implemented now and checked into a branch
(streaming-update).  I've spent a couple months on this, and it very
close to being finished.  Just point 1) in my comment on JENA-330 is
left to address!

> Possible grammar idea below ...
>
>
>> The grammar for Update in the SPARQL 1.1 PR is as follows:
>>
>>     [29] Update ::= Prologue ( Update1 ( ';' Update )? )?
>
>
> W3C process foo:
>
> The working group ends Dec 31.  There is no chance of an extension - we're
> under pressure to finish (not unreasonable ...)  Revisions, errata etc will
> noted afterwards.
>
> If the language changes, the spec would have to go back to another Last
> Call.  Implementations, of which there are many, would be affected.
>
> Just changing the grammar, not the language, could be argued to not affect
> implementations because it's the language that matters.  But that argument
> also argues for no change (because the grammar isn't so important to need a
> Last Call).  Morally, there would be a strong case for another Last Call on
> a grammar change.
>

Totally reasonable!

>
>> This is currently implemented in our JavaCC parser as:
>>
>>     Prologue() (Update1() ( <SEMICOLON> Update() )? )?
>>
>> Unfortunately, the best I non-recursive solution was able to come up with
>> was:
>>
>>     Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>> Update1() )* ( <SEMICOLON> )? )?
>
>
> Why not add a final optional prologue if the last SEMI is seen?
>
>     .... ( <SEMICOLON> (Prologue())? )? )?
>
> [untested]
>

I had been trying that to no avail because of the unlimited amount of
lookahead required (which defeat the streaming effort).  Syntactic
Lookahead to the rescue however, and I was able to write it as:

  Prologue()
  (
    Update1()
    (
      // This syntactic lookahead is necessitated by the optional
trailing semicolon and prologue
      LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> | <ADD> |
                   <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> | <INSERT> |
                   <USING> | <INSERT_DATA> | <DELETE_DATA> | <DELETE_WHERE> ) )
      <SEMICOLON>
      Prologue()
      Update1()
    )*
    (
      <SEMICOLON>
      Prologue()
    )?
  )?

>
>> This is *almost* equivalent to the grammar in the spec, except for one
>> detail: it does not allow a trailing Prologue(), which the recursive
>> definition allows.  I can't seem to get any closer, mainly due to that
>> optional semicolon and optional trailing prologue (although you cannot
>> have a trailing semicolon if you have a lone trailing prologue).
>>
>> The more I look at the problem, the more I tend to think that maybe
>> the spec's Update grammar is faulty.  I believe it should not allow
>> trailing prologues.  It also should not allow just a prologue and
>> nothing else (Query forbids this).  Examples of queries that I think
>> should be invalid (but are not currently):
>>
>> ==========
>> PREFIX : <http://example.org/>
>> ==========
>> PREFIX : <http://example.org/>
>> insert data { } ;
>> PREFIX : <http://example.org/>
>> ==========
>>
>> Additionally, I would argue that the text of the Update spec [1]
>> contradicts the existing grammar.  Specifically the definition in
>> section 3:
>>      "A request is a sequence of operations and is terminated by
>>      EOF (End of File). Multiple operations are separated by a ';'
>>      (semicolon) character. A semicolon after the last operation
>>      in a request is optional."
>
>
> Sequences can be zero length :-)
>

Hah, I don't like it still :)



>
>>
>> A prologue by itself is not an operation as defined in section 4.3 [2].
>>
>> I would propose to the working group that we instead adopt the
>> following grammar:
>>
>>     [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?
>>
>> This could be easily represented in JavaCC as:
>>
>>     Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>> Update1() )* ( <SEMICOLON> )?
>>
>> The trailing semicolon seems to force us into using an LL(2) parser.
>> I cannot see a way to write this grammar in LL(1).
>>
>> I have three questions that would be nice to have answered before I
>> post a comment to the WG:
>>
>> 1) Is there a non-recursive way to write the existing rule 29 that
>> exactly matches the semantics of the spec?
>> 2) Is there a way to write my proposed rule 29 as LL(1) (even if has
>> to use recursion)?
>> 3) Would the RDF WG be open to changing the grammar at this point?  I
>> know it is in PR stage, but this would be feedback from attempting
>> implementation.
>>
>> -Stephen
>>
>> [1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
>> [2]
>> http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate
>>
>

Re: SPARQL Update 1.1 Grammar

Posted by Andy Seaborne <an...@apache.org>.

Hi Stephen,

Adding requirements to the official SPARQL grammar by parsable in 
particular ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is 
it''s LL(1) AKA simple technically and that communicates.  People can 
implement the language in different ways and the grammar defines the 
language but it does not prescribe the implementation.

There may be other ways to do streaming ... at the moment the parser 
builds a syntax tree (good for printing out request) but it could be 
event generating without a syntax builder built in - non-trivial change.

So what is important to stream?

1/ INSERT DATA (and DELETE DATA)

Maybe the data should not end up in the parser tree but a data bag?  It 
cn be pulled back in for small requests and printing.

2/ Per operation?  It could have a mode to emit each operation as it 
goes along.

Possible grammar idea below ...

> The grammar for Update in the SPARQL 1.1 PR is as follows:
>
>     [29] Update ::= Prologue ( Update1 ( ';' Update )? )?

W3C process foo:

The working group ends Dec 31.  There is no chance of an extension - 
we're under pressure to finish (not unreasonable ...)  Revisions, errata 
etc will noted afterwards.

If the language changes, the spec would have to go back to another Last 
Call.  Implementations, of which there are many, would be affected.

Just changing the grammar, not the language, could be argued to not 
affect implementations because it's the language that matters.  But that 
argument also argues for no change (because the grammar isn't so 
important to need a Last Call).  Morally, there would be a strong case 
for another Last Call on a grammar change.

> This is currently implemented in our JavaCC parser as:
>
>     Prologue() (Update1() ( <SEMICOLON> Update() )? )?
>
> Unfortunately, the best I non-recursive solution was able to come up with was:
>
>     Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
> Update1() )* ( <SEMICOLON> )? )?

Why not add a final optional prologue if the last SEMI is seen?

     .... ( <SEMICOLON> (Prologue())? )? )?

[untested]

> This is *almost* equivalent to the grammar in the spec, except for one
> detail: it does not allow a trailing Prologue(), which the recursive
> definition allows.  I can't seem to get any closer, mainly due to that
> optional semicolon and optional trailing prologue (although you cannot
> have a trailing semicolon if you have a lone trailing prologue).
>
> The more I look at the problem, the more I tend to think that maybe
> the spec's Update grammar is faulty.  I believe it should not allow
> trailing prologues.  It also should not allow just a prologue and
> nothing else (Query forbids this).  Examples of queries that I think
> should be invalid (but are not currently):
>
> ==========
> PREFIX : <http://example.org/>
> ==========
> PREFIX : <http://example.org/>
> insert data { } ;
> PREFIX : <http://example.org/>
> ==========
>
> Additionally, I would argue that the text of the Update spec [1]
> contradicts the existing grammar.  Specifically the definition in
> section 3:
>      "A request is a sequence of operations and is terminated by
>      EOF (End of File). Multiple operations are separated by a ';'
>      (semicolon) character. A semicolon after the last operation
>      in a request is optional."

Sequences can be zero length :-)

>
> A prologue by itself is not an operation as defined in section 4.3 [2].
>
> I would propose to the working group that we instead adopt the
> following grammar:
>
>     [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?
>
> This could be easily represented in JavaCC as:
>
>     Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
> Update1() )* ( <SEMICOLON> )?
>
> The trailing semicolon seems to force us into using an LL(2) parser.
> I cannot see a way to write this grammar in LL(1).
>
> I have three questions that would be nice to have answered before I
> post a comment to the WG:
>
> 1) Is there a non-recursive way to write the existing rule 29 that
> exactly matches the semantics of the spec?
> 2) Is there a way to write my proposed rule 29 as LL(1) (even if has
> to use recursion)?
> 3) Would the RDF WG be open to changing the grammar at this point?  I
> know it is in PR stage, but this would be feedback from attempting
> implementation.
>
> -Stephen
>
> [1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
> [2] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate
>