Posted to dev@jena.apache.org by "Stephen Allen (JIRA)" <ji...@apache.org> on 2012/09/19 21:51:08 UTC

[jira] [Created] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Stephen Allen created JENA-330:
----------------------------------

             Summary: Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries
                 Key: JENA-330
                 URL: https://issues.apache.org/jira/browse/JENA-330
             Project: Apache Jena
          Issue Type: Improvement
          Components: ARQ
            Reporter: Stephen Allen
            Assignee: Stephen Allen
            Priority: Minor


The SPARQL Update parser currently parses all update queries into a single UpdateRequest object which holds them in memory.  Instead the parser should insert queries into something like a Sink<Update>.  Additionally it should put the quads from INSERT_DATA and DELETE_DATA into a Sink<Quad> instead of an ArrayList.

This should allow the creation of a streaming update parser, which could be combined with JENA-309 to have full streaming into an underlying transactional store and the ability to handle arbitrarily large INSERT_DATA or DELETE_DATA queries (to the limits of the transaction system).
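Here is a minimal Java sketch of the difference. Everything in it is hypothetical. `Sink` is loosely modeled on the send/flush/close shape of the Atlas `Sink<T>` used in ARQ, but it is not its exact API. `Quad` is a plain record, not Jena's class, and the "store" is a counter standing in for a DatasetGraph. The only point is this: the list version keeps every quad alive until parsing finishes, while the sink version hands each quad on as soon as it is parsed.

```java
import java.util.ArrayList;
import java.util.List;

public class SinkSketch {
    // Simplified quad (graph, subject, predicate, object as strings); not Jena's Quad class.
    public record Quad(String g, String s, String p, String o) {}

    // Hypothetical sink: items are pushed one at a time instead of being returned in a list.
    public interface Sink<T> extends AutoCloseable {
        void send(T item);
        void flush();
        @Override void close();
    }

    // Old style: every quad is accumulated before anything is applied,
    // so memory grows with the size of the INSERT DATA block.
    public static List<Quad> parseIntoList(int n) {
        List<Quad> quads = new ArrayList<>();
        for (int i = 0; i < n; i++) quads.add(quad(i));
        return quads;
    }

    // Streaming style: each quad goes to the sink as soon as it is "parsed",
    // so the parser itself retains nothing.
    public static void parseIntoSink(int n, Sink<Quad> sink) {
        for (int i = 0; i < n; i++) sink.send(quad(i));
        sink.flush();
    }

    static Quad quad(int i) {
        return new Quad("<http://example.org/g>", "_:b" + i, "<http://example.org/p>", Integer.toString(i));
    }

    // Stand-in for a store that applies each quad immediately (here it only counts them).
    public static final class CountingStoreSink implements Sink<Quad> {
        public long added = 0;
        public void send(Quad q) { added++; }
        public void flush() {}
        public void close() {}
    }

    public static void main(String[] args) {
        System.out.println("list size: " + parseIntoList(3).size());
        try (CountingStoreSink sink = new CountingStoreSink()) {
            parseIntoSink(1_000_000, sink);
            System.out.println("quads applied: " + sink.added);
        }
    }
}
```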

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506936#comment-13506936 ] 

Stephen Allen edited comment on JENA-330 at 11/29/12 11:33 PM:
---------------------------------------------------------------

My work is now checked in to the streaming-update branch.

I have addressed all of the points in my comment except for point 1 (the out-of-order blank nodes and lists produced by the syntactic shortcuts).  Some comments about this issue are on the [mailing list|http://markmail.org/message/3aw72tcwdmoxa46b].  I would argue that this is an OK-ish situation, as order shouldn't matter in BGPs.  However, it is causing a unit test to fail:

   Running com.hp.hpl.jena.sparql.TC_Scripted
   **** Test: syntax-forms-01.rq
   ** reparsed query hashCode does not equal parsed input query
   (com.hp.hpl.jena.sparql.junit.TestSerialization)

This unit test compares a parsed query with a serialized-and-reparsed version of it.  When the query is serialized, it is written out in expanded form, without the syntactic shortcuts.  The test fails because the blank nodes get different internal ids in the two queries (for example, the blank node for [ ?x ?y ] is ??1 in the parsed query but ??0 in the reparsed one), and this makes Query.hashCode() and Query.equals() differ.  For the example query in syntax-forms-01.rq, here is the internal structure of both the originally parsed query and the reparsed version.

Parsed Query:
============

Original Serialization:
------
PREFIX : <http://example.org/ns#>
SELECT * WHERE { ( [ ?x ?y ] ) :p ( [ ?pa ?b ] 57 ) }

Internal Rep:
------
{ ??1 ?x ?y .
  ??0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??1 .
  ??0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??3 ?pa ?b .
  ??2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??3 .
  ??2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ??4 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> 57 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??0 <http://example.org/ns#p> ??2
}


Reparsed Query:
============

Original Serialization:
------
PREFIX  :     <http://example.org/ns#>

SELECT  *
WHERE
  { _:b0 ?x ?y .
    _:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b0 .
    _:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
    _:b2 ?pa ?b .
    _:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b2 .
    _:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b4 .
    _:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> 57 .
    _:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
    _:b1 :p _:b3
  }


Internal Rep:
------

{ ??0 ?x ?y .
  ??1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??0 .
  ??1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??2 ?pa ?b .
  ??3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??2 .
  ??3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ??4 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> 57 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??1 <http://example.org/ns#p> ??3
}
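Both internal representations list the triples in the same order and differ only in their blank node labels. One way to confirm they describe the same pattern is to rename blank nodes by order of first appearance and then compare. The sketch below uses a hypothetical helper, with IRIs abbreviated to rdf:first, rdf:rest, rdf:nil and :p. It is not what TestSerialization or Query.equals() does. It is also not a general isomorphism check, because it only works when the triples appear in the same order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BNodeRelabel {
    public static final List<String> PARSED = List.of(
        "??1 ?x ?y", "??0 rdf:first ??1", "??0 rdf:rest rdf:nil",
        "??3 ?pa ?b", "??2 rdf:first ??3", "??2 rdf:rest ??4",
        "??4 rdf:first 57", "??4 rdf:rest rdf:nil", "??0 :p ??2");

    public static final List<String> REPARSED = List.of(
        "??0 ?x ?y", "??1 rdf:first ??0", "??1 rdf:rest rdf:nil",
        "??2 ?pa ?b", "??3 rdf:first ??2", "??3 rdf:rest ??4",
        "??4 rdf:first 57", "??4 rdf:rest rdf:nil", "??1 :p ??3");

    // Rename each "??N" token to "_:cK", where K is the order in which that
    // blank node is first seen when scanning the triples top to bottom.
    public static List<String> canonical(List<String> triples) {
        Map<String, String> names = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String triple : triples) {
            StringBuilder sb = new StringBuilder();
            for (String tok : triple.trim().split("\\s+")) {
                if (tok.startsWith("??")) {
                    String name = names.get(tok);
                    if (name == null) {
                        name = "_:c" + names.size();
                        names.put(tok, name);
                    }
                    tok = name;
                }
                if (sb.length() > 0) sb.append(' ');
                sb.append(tok);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("raw equal:       " + PARSED.equals(REPARSED));
        System.out.println("canonical equal: " + canonical(PARSED).equals(canonical(REPARSED)));
    }
}
```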



                
> Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-330
>                 URL: https://issues.apache.org/jira/browse/JENA-330
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Stephen Allen
>            Assignee: Stephen Allen
>            Priority: Minor
>         Attachments: config-null.ttl, JENA-330_20121016.patch, TestLargeUpdates.java
>
>
> The SPARQL Update parser currently parses all update queries into a single UpdateRequest object which holds them in memory.  Instead the parser should insert queries into something like a Sink<Update>.  Additionally it should put the quads from INSERT_DATA and DELETE_DATA into a Sink<Quad> instead of an ArrayList.
> This should allow the creation of a streaming update parser, which could be combined with JENA-309 to have full streaming into an underlying transactional store and the ability to handle arbitrarily large INSERT_DATA or DELETE_DATA queries (to the limits of the transaction system).


[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477991#comment-13477991 ] 

Stephen Allen commented on JENA-330:
------------------------------------

I was aware that 2.7.4 was coming up and I didn't want to check this in right before that, as it is a pretty big change.

Re: 3)
  I agree with removing the Update Submission code after 2.7.4.  It also makes sense to keep the SPARQL spec grammar ifdef'ed in (just not used in the standard code path).

Re: transactional property
  That property is used to tell the assembler whether to construct a GraphStoreNull or GraphStoreNullTransactional object.  This was for testing purposes, so I could verify the code paths were different in SPARQL_Update.execute().

Re: AbstractUpdateSink
  I wasn't aware which packages were considered the public API.  They certainly don't have to be there, and I can move them to .modify.


Other notes on the patch:
  4) The main changes to the grammar were to use the new UpdateSink instead of returning Update objects.

  5) Also, there were places where the parser held on to Token objects for too long.  Each Token contains a link to the next token, so this led to a memory leak: the INSERT_DATA token, for example, kept the entire linked list of tokens after it reachable.
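Here is a simplified illustration of the leak in 5). The `Token` class below only mimics the `image`, `beginLine`, `beginColumn` and `next` fields of a JavaCC-generated token. `TokenInfo` is a hypothetical replacement value. Copying out only the fields an action needs is one way to avoid pinning the rest of the token stream. It is not necessarily what the patch does.

```java
public class TokenRetentionSketch {
    // Simplified stand-in for a JavaCC-generated Token: text, position, and the link to the next token.
    public static final class Token {
        public final String image;
        public final int beginLine, beginColumn;
        public Token next;

        public Token(String image, int beginLine, int beginColumn) {
            this.image = image;
            this.beginLine = beginLine;
            this.beginColumn = beginColumn;
        }
    }

    // Hypothetical value holding only what a grammar action needs; it has no link onward.
    public record TokenInfo(String image, int line, int column) {
        public static TokenInfo of(Token t) {
            return new TokenInfo(t.image, t.beginLine, t.beginColumn);
        }
    }

    // Build a chain: an INSERT DATA token followed by n more tokens.
    public static Token lex(int n) {
        Token head = new Token("INSERT DATA", 1, 1);
        Token cur = head;
        for (int i = 0; i < n; i++) {
            Token t = new Token("tok" + i, 1, 13 + i);
            cur.next = t;
            cur = t;
        }
        return head;
    }

    // Number of tokens that stay reachable (so cannot be garbage collected) while t is referenced.
    public static int reachable(Token t) {
        int count = 0;
        for (Token cur = t; cur != null; cur = cur.next) count++;
        return count;
    }

    public static void main(String[] args) {
        Token head = lex(1_000_000);
        // Holding `head` while the rest of INSERT DATA is parsed pins every later token.
        System.out.println("reachable from head: " + reachable(head));
        // Holding only the copied info pins a string and two ints.
        System.out.println(TokenInfo.of(head));
    }
}
```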


                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Andy Seaborne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478082#comment-13478082 ] 

Andy Seaborne commented on JENA-330:
------------------------------------

Specific packages are called out in the javadoc e.g.

http://jena.apache.org/documentation/javadoc/arq/index.html

I hope that JenaClient will nudge us into being more formal about "the API".

                

[jira] [Updated] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Allen updated JENA-330:
-------------------------------

    Attachment: TestLargeUpdates.java
                config-null.ttl
                JENA-330_20121016.patch

The attached patch provides streaming SPARQL Update all the way through Fuseki and into the underlying GraphStore.

When testing against GraphStoreNull, with jena-client generating a never-ending INSERT DATA request, the system runs indefinitely.  Use config-null.ttl as your Fuseki configuration, and run TestLargeUpdates to test this (TestLargeUpdates uses jena-client, which is available in the Experimental branch).  On my machine, connecting to localhost, I get a steady state of about 32k triples per second.
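TestLargeUpdates relies on jena-client, whose API is not shown in this thread. As a hedged stand-in, the sketch below writes an INSERT DATA request incrementally to a `java.io.Writer`. A client could use this to send an arbitrarily large (or, with a huge count, effectively never-ending) body without building it in memory. The class name, triple shape and `http://example.org/` IRIs are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class InsertDataWriter {
    // Writes "INSERT DATA { ... }" one triple at a time; nothing is buffered here
    // beyond whatever buffering the supplied Writer itself does.
    public static void writeInsertData(Writer out, long triples) throws IOException {
        out.write("PREFIX ex: <http://example.org/>\n");
        out.write("INSERT DATA {\n");
        for (long i = 0; i < triples; i++) {
            out.write("  ex:s" + i + " ex:p " + i + " .\n");
        }
        out.write("}\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeInsertData(sw, 2);
        System.out.print(sw);
    }
}
```

In a real client, `out` would presumably wrap the output stream of an HTTP request sent with chunked transfer encoding (e.g. `HttpURLConnection.setChunkedStreamingMode`) and content type `application/sparql-update`, so the server can start parsing before the request ends.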

Note that there are still some limits in this patch:

1) Triples produced by nested blank nodes come out in inverted order, and triples from RDF lists are mostly in order except for the initial statement that points to the head of the list.  Examples:

   :s :p [ :q [ :q :r ] ] .
becomes:
   _:b0 :q :r . _:b1 :q _:b0 . :s :p _:b1 .
   

  :s :p (1 2 3 4)
becomes:
  _:b2 rdf:first 1 .
  _:b2 rdf:rest _:b3 .
  _:b3 rdf:first 2 .
  _:b3 rdf:rest _:b4 .
  _:b4 rdf:first 3 .
  _:b4 rdf:rest _:b5 .
  _:b5 rdf:first 4 .
  _:b5 rdf:rest rdf:nil .
  :s :p _:b2 .


2) DatasetUpdateSink bypasses UpdateEngineFactory for INSERT DATA / DELETE DATA, and calls .add(Quad) .remove(Quad) directly on the DatasetGraph.

3) There is still a limit on the number of update operations that can appear in an update request (this is because Update() in the grammar is recursive, and will hit a StackOverflowError quickly).  Uncomment the delete line in TestLargeUpdates to see this.
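The inversion in 1) appears to fall out of pushing triples to a sink as soon as they are complete. Below is a simplified, hypothetical model of `:s :p [ :q [ :q :r ] ]`: strings instead of Jena nodes, a toy `Term` type, and one predicate per blank node. The blank node for a `[ ... ]` is allocated when the bracket opens. The triple that references it, however, can only be sent after the nested contents have been parsed and sent.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingOrderSketch {
    // Toy term: either a plain IRI/literal, or a blank-node property list "[ pred obj ]".
    public interface Term {}
    public record Iri(String text) implements Term {}
    public record BNodeList(String pred, Term obj) implements Term {}

    private int nextId = 0;
    public final List<String> sink = new ArrayList<>(); // stands in for a streaming Sink<Triple>

    // Returns the node for a term; any nested triples are sent to the sink
    // before the caller gets a chance to send the triple that refers to this node.
    public String node(Term t) {
        if (t instanceof Iri iri) return iri.text();
        BNodeList b = (BNodeList) t;
        String bnode = "_:n" + nextId++;                 // allocated when '[' is seen
        String obj = node(b.obj());                      // inner triples are sent first...
        sink.add(bnode + " " + b.pred() + " " + obj);    // ...then this one
        return bnode;
    }

    public void triple(String s, String p, Term o) {
        String obj = node(o);
        sink.add(s + " " + p + " " + obj);               // the outermost triple is sent last
    }

    public static void main(String[] args) {
        StreamingOrderSketch parser = new StreamingOrderSketch();
        // :s :p [ :q [ :q :r ] ]
        parser.triple(":s", ":p", new BNodeList(":q", new BNodeList(":q", new Iri(":r"))));
        parser.sink.forEach(System.out::println);
    }
}
```

This prints `_:n1 :q :r`, then `_:n0 :q _:n1`, then `:s :p _:n0`. Relabeling by first appearance gives the `_:b0 :q :r . _:b1 :q _:b0 . :s :p _:b1` order shown in the example above. The model also hints at why syntax-forms-01.rq fails: the outer node is allocated first but appears last, so its internal id differs from what a reparse of the expanded output allocates.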


As this is a pretty large patch, I didn't want to commit it without some review.  If someone could take a look, that would be great!  I'd also welcome opinions on whether 1) is important (I'm thinking it's not too critical) and on how to solve 2).

                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492397#comment-13492397 ] 

Hudson commented on JENA-330:
-----------------------------

Integrated in Jena_Development_Test_Windows #6 (See [https://builds.apache.org/job/Jena_Development_Test_Windows/6/])
    Add some support code from patch JENA-330_20121016.patch). (Revision 1405597)

     Result = UNSTABLE
andy : 
Files : 
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/assembler/DatasetAssemblerVocab.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/assembler/DatasetNullAssembler.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/modify/GraphStoreNullTransactional.java
* /jena/trunk/jena-arq/src/main/java/org/openjena/atlas/lib/SinkToCollection.java

                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490247#comment-13490247 ] 

Hudson commented on JENA-330:
-----------------------------

Integrated in Jena__Development_Test #255 (See [https://builds.apache.org/job/Jena__Development_Test/255/])
    Add some support code from patch JENA-330_20121016.patch). (Revision 1405597)

     Result = SUCCESS
andy : 
Files : 
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/assembler/DatasetAssemblerVocab.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/assembler/DatasetNullAssembler.java
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/modify/GraphStoreNullTransactional.java
* /jena/trunk/jena-arq/src/main/java/org/openjena/atlas/lib/SinkToCollection.java

                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507894#comment-13507894 ] 

Stephen Allen commented on JENA-330:
------------------------------------

Yeah, I was afraid they'd have to be in the same order, for exactly the reason you mention: storage systems with no optimizer.  One approach I might take is to make these two cases non-streaming.  At least that gets us a lot closer.
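One hedged reading of "make these two cases non-streaming" works in three steps. First, buffer only the triples from a single statement that uses a `[ ... ]` shortcut. Second, insert each referencing triple at the position remembered before its nested content was parsed (a "mark"). Third, flush the buffer to the streaming sink only after that. The sketch uses a toy model with hypothetical names and strings instead of Jena nodes. It handles only blank-node property lists, not `( ... )` collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BufferedOrderSketch {
    public interface Term {}
    public record Iri(String text) implements Term {}
    public record BNodeList(String pred, Term obj) implements Term {}

    private int nextId = 0;
    private final List<String> buffer = new ArrayList<>(); // holds one statement's triples only
    private final Consumer<String> sink;                   // downstream streaming sink

    public BufferedOrderSketch(Consumer<String> sink) { this.sink = sink; }

    String node(Term t) {
        if (t instanceof Iri iri) return iri.text();
        BNodeList b = (BNodeList) t;
        String bnode = "_:n" + nextId++;
        int mark = buffer.size();                                // where this node's triple belongs
        String obj = node(b.obj());                              // nested triples are appended
        buffer.add(mark, bnode + " " + b.pred() + " " + obj);    // insert ahead of them
        return bnode;
    }

    // Parse one statement, then flush its triples downstream in source order.
    public void statement(String s, String p, Term o) {
        int mark = buffer.size();
        String obj = node(o);
        buffer.add(mark, s + " " + p + " " + obj);
        buffer.forEach(sink);
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        new BufferedOrderSketch(out::add)
            .statement(":s", ":p", new BNodeList(":q", new BNodeList(":q", new Iri(":r"))));
        out.forEach(System.out::println);
    }
}
```

Memory is then bounded by the size of a single statement rather than the whole request. RDF collections would need the same mark-and-insert treatment.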
                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506936#comment-13506936 ] 

Stephen Allen commented on JENA-330:
------------------------------------

My work is now checked in to the streaming-update branch.

I have addressed all of the points in my comment except for point 1 (the out-of-order blank nodes and lists produced by the syntactic shortcuts).  I would argue that this is an OK-ish situation, as order shouldn't matter in BGPs.  However, it is causing a unit test to fail:

   Running com.hp.hpl.jena.sparql.TC_Scripted
   **** Test: syntax-forms-01.rq
   ** reparsed query hashCode does not equal parsed input query
   (com.hp.hpl.jena.sparql.junit.TestSerialization)

This unit test compares a parsed query with a serialized-and-reparsed version of it.  When the query is serialized, it is written out in expanded form, without the syntactic shortcuts.  The test fails because the blank nodes get different internal ids in the two queries (for example, the blank node for [ ?x ?y ] is ??1 in the parsed query but ??0 in the reparsed one), and this makes Query.hashCode() and Query.equals() differ.  For the example query in syntax-forms-01.rq, here is the internal structure of both the originally parsed query and the reparsed version.

Parsed Query:
============

Original:
------
PREFIX : <http://example.org/ns#>
SELECT * WHERE { ( [ ?x ?y ] ) :p ( [ ?pa ?b ] 57 ) }

Internal Rep:
------
{ ??1 ?x ?y .
  ??0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??1 .
  ??0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??3 ?pa ?b .
  ??2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??3 .
  ??2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ??4 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> 57 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??0 <http://example.org/ns#p> ??2
}

Reparsed Query:
============


{ ??0 ?x ?y .
  ??1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??0 .
  ??1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??2 ?pa ?b .
  ??3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ??2 .
  ??3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ??4 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> 57 .
  ??4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
  ??1 <http://example.org/ns#p> ??3
}


[1] http://markmail.org/message/3aw72tcwdmoxa46b


                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477407#comment-13477407 ] 

Stephen Allen commented on JENA-330:
------------------------------------

I was able to address point 3) for the strict SPARQL 1.1 case by changing the grammar from:
     Prologue()
     (Update1() ( <SEMICOLON> Update() )? )?
to
     Prologue()
     ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue() ( Update1() )? )* ( <SEMICOLON> )? )?


I ran into issues with the "ARQ_UPDATE" variant, which allows optional semicolons.  I tried changing:
     Prologue()
     ( Update1() (<SEMICOLON>)*  Update() )?
to:
     Prologue()
     ( Update1() ( (<SEMICOLON>)* Prologue() ( Update1() )? )* )?

But this causes a JavaCC error: "Expansion within "(...)*" can be matched by empty string.".  Basically, JavaCC rejects the loop because every part of it can be empty: the semicolons, the Prologue, and the Update1.  I don't know what to do here.  Maybe get rid of the optional semicolon and go with just the SPARQL 1.1 syntax?

Additionally, my change to SPARQL 1.1 avoids recursion, but it means the parser is no longer LL(1).  Is there some important reason to keep it LL(1)?  I ask because TriplesTemplate() also uses recursion in the SPARQL 1.1 case to avoid LOOKAHEAD(2).  It seems like we could just live with an LL(2) parser for SPARQL 1.1 to avoid recursion.
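The recursion problem and the loop-based fix can be illustrated outside JavaCC with a hand-written parser over a toy token list. Here "U" stands for one Update1 and ";" for the separator; prologues are omitted. This is a hypothetical sketch, not the generated parser. The recursive method mirrors `Update1() ( <SEMICOLON> Update() )?` and uses one Java stack frame per operation. The loop mirrors the revised `( LOOKAHEAD(2) <SEMICOLON> ... )* ( <SEMICOLON> )?` shape. It must peek at two tokens (";" and what follows) to choose between another iteration and the optional trailing ";", which is where the LL(2) comes from.

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateSequenceSketch {
    // Build "U ; U ; ... ; U" with n updates.
    public static List<String> tokens(int n) {
        List<String> toks = new ArrayList<>(2 * n);
        for (int i = 0; i < n; i++) {
            if (i > 0) toks.add(";");
            toks.add("U");
        }
        return toks;
    }

    // Old shape:  Update := ( Update1 ( ';' Update )? )?
    // Recurses once per operation, so stack depth grows with the request.
    public static int parseRecursive(List<String> toks, int pos) {
        if (pos == toks.size()) return 0;                 // empty Update (e.g. after a trailing ';')
        expect(toks, pos, "U");
        if (pos + 1 < toks.size() && toks.get(pos + 1).equals(";")) {
            return 1 + parseRecursive(toks, pos + 2);
        }
        if (pos + 1 != toks.size()) throw new IllegalArgumentException("unexpected token at " + (pos + 1));
        return 1;
    }

    // New shape:  ( Update1 ( LOOKAHEAD(2) ';' Update1 )* ( ';' )? )?
    // Constant stack depth; peeks two tokens to decide whether to loop again.
    public static int parseIterative(List<String> toks) {
        if (toks.isEmpty()) return 0;
        expect(toks, 0, "U");
        int pos = 1;
        int count = 1;
        while (pos + 1 < toks.size() && toks.get(pos).equals(";") && toks.get(pos + 1).equals("U")) {
            pos += 2;
            count++;
        }
        if (pos < toks.size() && toks.get(pos).equals(";")) pos++;   // optional trailing ';'
        if (pos != toks.size()) throw new IllegalArgumentException("unexpected token at " + pos);
        return count;
    }

    static void expect(List<String> toks, int pos, String want) {
        if (pos >= toks.size() || !toks.get(pos).equals(want)) {
            throw new IllegalArgumentException("expected " + want + " at " + pos);
        }
    }

    public static void main(String[] args) {
        List<String> big = tokens(1_000_000);
        System.out.println("iterative: " + parseIterative(big));
        try {
            System.out.println("recursive: " + parseRecursive(big, 0));
        } catch (StackOverflowError e) {
            System.out.println("recursive: StackOverflowError");
        }
    }
}
```

With default JVM stack settings, the recursive version typically fails with StackOverflowError well before a million operations, consistent with point 3) in the patch notes. The exact depth depends on -Xss. The loop version handles the same input at constant stack depth.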


                

[jira] [Updated] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Allen updated JENA-330:
-------------------------------

    Attachment:     (was: JENA-330_20121016.patch)
    

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Andy Seaborne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477914#comment-13477914 ] 

Andy Seaborne commented on JENA-330:
------------------------------------

(only a quick scan for the moment)

Thanks for giving a guide to the patch.

This is quite large, although much of that is due to including autogenerated files.  Should we wait until after the Jena 2.7.4 release to integrate?

Re: 3)

ARQ_UPDATE is partial support for old-style SPARQL/Update (the submission, not SPARQL 1.1) as a migration aid.  It's been there a while, so it's time to start removing it.

SPARQL as spec'ed is LL(1) to make sure it's easy to implement: it means it's LALR(1) and also friendly to a hand-written parser, but sometimes it has to recurse where a loop would be better.  TriplesTemplate is an example of this.

SPARQL 1.1 is finished and wrapping up; after the Jena 2.7.4 release we can start using better versions of the parser rules.  It would help me if this is done with ifdefs in the grammar so that it can easily be reverted if need be (e.g. to produce the SPARQL spec grammar again).  I'm happy to do that work.

Re: transactional property 

Would it be better to introspect on the dataset via supportsTransactions (or, for a DatasetGraph, check whether it supports the Transactional interface)?
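The introspection suggestion can be sketched as: check what the target store supports at run time instead of reading a configuration property, and wrap the update in a transaction only when one is available. QuadStore, Transactional, UpdateRunner and RecordingTxnStore below are hypothetical stand-ins that mirror the shape of the idea; they are not Jena's actual interfaces or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Jena's real types.
interface QuadStore {
    void add(String quad);
}

interface Transactional {
    void begin();
    void commit();
    void abort();
}

class UpdateRunner {
    // Decide by introspection: use a transaction only if the store offers one.
    static void apply(QuadStore store, List<String> quads) {
        if (!(store instanceof Transactional)) {
            for (String q : quads) store.add(q);
            return;
        }
        Transactional txn = (Transactional) store;
        txn.begin();
        try {
            for (String q : quads) store.add(q);
            txn.commit();
        } catch (RuntimeException e) {
            txn.abort(); // roll back on failure, then rethrow
            throw e;
        }
    }
}

// Example transactional store that records the calls it receives.
class RecordingTxnStore implements QuadStore, Transactional {
    final List<String> log = new ArrayList<>();
    public void add(String quad) { log.add("add " + quad); }
    public void begin() { log.add("begin"); }
    public void commit() { log.add("commit"); }
    public void abort() { log.add("abort"); }
}
```

The design point is that the capability travels with the store, so callers cannot configure transactions on for a store that does not support them.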

Re: AbstractUpdateSink etc

com.hp.hpl.jena.update is one of the API packages.  Do all these classes need to be in the public API?  .modify is the implementation package.



                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Andy Seaborne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490241#comment-13490241 ] 

Andy Seaborne commented on JENA-330:
------------------------------------

The patch doesn't apply cleanly to the parser because the SPARQL Submission transition support has since been deprecated and then ripped out.  I didn't feel I understood it well enough to apply it manually.

I have added the null graph store/ transaction additions.


I wonder if the best approach is to focus on the SPARQL Graph Store Protocol for bulk addition of data.  We could have a POST to a graph stream triples, and a POST to the dataset itself stream quads.
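A streaming POST handler only needs to pass statements on as they are read, never holding the whole request body. The sketch below assumes a line-based body (one N-Triples or N-Quads statement per line) purely to stay self-contained; a real implementation would hand the stream to a proper RDF parser. StreamingPostSketch and streamStatements are hypothetical names, and the same loop would serve a graph (triples) or the dataset (quads).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Assumption: line-based body, one statement per line (illustration only).
class StreamingPostSketch {
    static long streamStatements(InputStream body, Consumer<String> sink) {
        long count = 0;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String s = line.trim();
                if (s.isEmpty() || s.startsWith("#")) continue; // skip blank lines and comments
                sink.accept(s); // only the current line is held in memory
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```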

                

[jira] [Commented] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Andy Seaborne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507303#comment-13507303 ] 

Andy Seaborne commented on JENA-330:
------------------------------------

While it's not semantically necessary to preserve the order, not all storage systems have an optimizer, so preserving the order is, to me, desirable.  Yes, it can be a nuisance in the code.

In your unequal queries, you seem to be generating the bNodes/variables at different points in the parse process, because you end up with " ??1 ?x ?y . " and the ??1 suggests it was allocated after ??0.
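The diagnosis above rests on generated names depending on allocation order. A minimal sketch, assuming a counter-based allocator (VarAllocSketch is a hypothetical name, not ARQ's allocator), shows how the same pattern renders as "??0 ?x ?y ." or "??1 ?x ?y ." depending only on whether another blank node was allocated first, which is enough to make two otherwise identical queries compare unequal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical allocator: hands out "??0", "??1", ... in request order,
// reusing the same variable for a repeated blank node label.
class VarAllocSketch {
    private int next = 0;
    private final Map<String, String> byLabel = new HashMap<>();

    String varFor(String bnodeLabel) {
        return byLabel.computeIfAbsent(bnodeLabel, k -> "??" + (next++));
    }

    // Render one triple pattern, replacing blank nodes ("_:...") with allocated variables.
    String render(String s, String p, String o) {
        return term(s) + " " + term(p) + " " + term(o) + " .";
    }

    private String term(String t) {
        return t.startsWith("_:") ? varFor(t) : t;
    }
}
```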
                

[jira] [Updated] (JENA-330) Streaming support for SPARQL Update queries and streaming support for quads in INSERT DATA / DELETE DATA queries

Posted by "Stephen Allen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Allen updated JENA-330:
-------------------------------

    Attachment: JENA-330_20121016.patch
    