Posted to dev@jena.apache.org by Stephen Allen <sa...@apache.org> on 2012/09/22 16:41:05 UTC
SPARQL Query Parsing Questions
I am working on JENA-330 (converting the Update parser to streaming)
and I had a couple of questions:
1) What version of cpp do you use to generate arq.jj and sparql_11.jj?
My version inserts a bunch of extra newline characters. cpp (GCC)
3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
2) How important is the TripleCollector "mark" functionality? It
appears to be in use in the Collection and PropertyList parsing stages
to ensure that statements are added to the QuadAcc in the same order
that they appear in the query. However, RDF is unordered, so it
doesn't seem strictly necessary. In a streaming situation, its
presence complicates things. Can I simply eliminate this
functionality? Or is it important for some reason I can't see?
Thanks!
-Stephen
Re: SPARQL Query Parsing Questions
Posted by Andy Seaborne <an...@apache.org>.
On 22/09/12 15:41, Stephen Allen wrote:
> I am working on JENA-330 (converting the Update parser to streaming)
> and I had a couple of questions:
>
> 1) What version of cpp do you use to generate arq.jj and sparql_11.jj?
> My version inserts a bunch of extra newline characters. cpp (GCC)
> 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
cpp (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
but I used to use cpp under cygwin.
The cygwin output might need feeding through dos2unix.
> 2) How important is the TripleCollector "mark" functionality? It
> appears to be in use in the Collection and PropertyList parsing stages
> to ensure that statements are added to the QuadAcc in the same order
> that they appear in the query. However, RDF is unordered, so it
> doesn't seem strictly necessary. In a streaming situation, its
> presence complicates things. Can I simply eliminate this
> functionality? Or is it important for some reason I can't see?
The mark is for RDF lists and nested structures.
:s :p [ :q :r ] .
==>
:s :p _:b0 .
_:b0 :q :r
:s :p (1 2)
==>
:s :p _:b0 .
_:b0 rdf:first 1 .
_:b0 rdf:rest _:b1 .
_:b1 rdf:first 2 .
_:b1 rdf:rest rdf:nil
It keeps the generated triples in the order in which they are
encountered in the AST. A list element refers to the next element, so you
can't generate its rdf:rest until you know what to refer to. The aim is to
keep the rdf:first and rdf:rest together (for appearance's sake, such as
when printing the query or update).
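A rough model of the mark idea, as a sketch only: this is Python rather than the Java/JavaCC code in ARQ, and the names Collector, emit_list, emit_node and bnode_factory are made up for illustration. The mark is taken before an element is parsed, and the cell's rdf:first triple is inserted at that position afterwards, so it lands ahead of any triples a nested element produced.

```python
import itertools


def bnode_factory():
    """Return a function producing fresh blank node labels (_:b0, _:b1, ...)."""
    counter = itertools.count()
    return lambda: f"_:b{next(counter)}"


class Collector:
    """Hypothetical stand-in for a triple collector that supports marks."""

    def __init__(self):
        self.triples = []

    def mark(self):
        # A mark is simply the current insertion position.
        return len(self.triples)

    def add(self, triple):
        self.triples.append(triple)

    def insert(self, index, triple):
        self.triples.insert(index, triple)


def emit_node(item, acc, fresh):
    """Return the node for an item; nested Python lists become RDF lists."""
    if isinstance(item, list):
        return emit_list(item, acc, fresh)
    return item


def emit_list(items, acc, fresh):
    """Add the triples for an RDF list to acc and return the head node."""
    head = None
    last = None
    for item in items:
        cell = fresh()
        if head is None:
            head = cell
        if last is not None:
            acc.add((last, "rdf:rest", cell))
        m = acc.mark()                      # remember where this cell starts
        node = emit_node(item, acc, fresh)  # may add nested triples
        acc.insert(m, (cell, "rdf:first", node))
        last = cell
    if last is None:
        return "rdf:nil"                    # the empty list has no cells
    acc.add((last, "rdf:rest", "rdf:nil"))
    return head
```

For the list (1 2) this yields the rdf:first / rdf:rest triples in the same order as the expansion written out above. The insert at the mark is what makes the collector awkward to stream: a triple cannot be handed on while an earlier mark might still receive an insert.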
It's probably not necessary to do it with a mark. It might be possible
to do it as a sliding window of two elements; I have done this on an
experimental data structure project so as to operate in the forward
direction. Working forwards is a tail recursion and can be loopified.
Working on the way back out isn't streaming (it needs stack depth).
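One way the two-element window could look for a flat list, again as an illustrative Python sketch and not ARQ code (stream_list and bnode_factory are hypothetical names). It holds the current element and looks one element ahead, so each cell's rdf:rest can be emitted immediately and nothing is buffered beyond the window.

```python
import itertools


def bnode_factory():
    """Return a function producing fresh blank node labels (_:b0, _:b1, ...)."""
    counter = itertools.count()
    return lambda: f"_:b{next(counter)}"


def stream_list(subject, predicate, items, fresh):
    """Yield the triples for `subject predicate ( items... )` in forward order.

    `items` may be any iterable, including a lazy one; only the current
    element and the next one are held at any time.
    """
    it = iter(items)
    try:
        current = next(it)
    except StopIteration:
        # Empty list: the object is rdf:nil and there are no cells.
        yield (subject, predicate, "rdf:nil")
        return
    cell = fresh()
    yield (subject, predicate, cell)
    for upcoming in it:
        next_cell = fresh()
        yield (cell, "rdf:first", current)
        yield (cell, "rdf:rest", next_cell)
        cell, current = next_cell, upcoming
    # The last element closes the list.
    yield (cell, "rdf:first", current)
    yield (cell, "rdf:rest", "rdf:nil")
```

This only covers elements that are plain terms. A nested list or [ ] as an element would need either recursion (stack depth) or falling back to a non-streaming path.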
It gets messy with nested structures:
:s :p (1 ("a" "b") 2 )
keeping the rdf:firsts in order of 1, "a", "b", 2 is nice albeit not
necessary.
One approach is that it's streaming except for compound structures. You
have to ask how you get a compound structure in the first place.
I think the important cases are
:s :p (
"item 1"
"item 2"
) .
:s :p [
:q 1 ;
:q 2 ;
] .
where it's easy to generate a huge item worth streaming. If these can
be handled, but the more complex ones don't stream, it's still a big win
IMO.
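For the second case, a large [ ] with only simple objects, a forward pass needs no look-ahead at all, since every triple shares the one blank node subject. A minimal Python sketch (stream_property_list and bnode_factory are hypothetical names, not anything in ARQ):

```python
import itertools


def bnode_factory():
    """Return a function producing fresh blank node labels (_:b0, _:b1, ...)."""
    counter = itertools.count()
    return lambda: f"_:b{next(counter)}"


def stream_property_list(subject, predicate, pairs, fresh):
    """Yield triples for `subject predicate [ p1 o1 ; p2 o2 ; ... ]`.

    `pairs` is any iterable of (predicate, object) tuples and may be lazy.
    """
    node = fresh()
    yield (subject, predicate, node)
    for p, o in pairs:
        yield (node, p, o)
```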
Andy
>
> Thanks!
>
> -Stephen
>