Posted to dev@jena.apache.org by mschroeder-github <gi...@git.apache.org> on 2017/10/28 12:07:53 UTC

[GitHub] jena pull request #299: Turtle Star

GitHub user mschroeder-github opened a pull request:

    https://github.com/apache/jena/pull/299

    Turtle Star

    Turtle parser extension for Turtle* as suggested in [Foundations of an Alternative Approach to Reification in RDF](https://arxiv.org/pdf/1406.3399.pdf) (Section 3.3).
    I copied the JavaCC grammar definition from `turtle.jj` and added the changes to `turtle-star.jj`.
    JavaCC generates all the classes in package `org.apache.jena.n3.turtlestar.parser`.
    The `RDFReaderFImpl` is extended with the `TurtleStarReader` reader, so one can read Turtle* with the following code:
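    ```java
    import java.io.File;
    import java.io.StringWriter;
    import java.net.MalformedURLException;

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class TurtleStarExample {
        public static void parse() throws MalformedURLException {
            Model m = ModelFactory.createDefaultModel();
            // "TTL*" is the reader name added by this PR's TurtleStarReader
            m.read(new File("test.ttl").toURI().toURL().toString(), "TTL*");

            // Write the resulting (reified) model back out as plain Turtle
            StringWriter sw = new StringWriter();
            m.write(sw, "TTL");
            System.out.println(sw.toString());
        }
    }
    ```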
    In short, the Turtle* syntax 
    ```
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix : <http://example.com/> .
    
    << :bob foaf:age 23 >> dct:creator <http://example.com/crawlers#c1> ;
                           dct:source  <http://example.net/homepage-listing.html> .
    ```
    results in the following RDF model:
    ```
    :bob  foaf:age 23 .
    
    []    a       <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> ;
          <http://www.w3.org/1999/02/22-rdf-syntax-ns#object>
                  23 ;
          <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate>
                  foaf:age ;
          <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>
                  :bob ;
          dct:creator <http://example.com/crawlers#c1> ;
          dct:source <http://example.net/homepage-listing.html> .
    ```
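    A minimal sketch, in plain Java without any Jena classes, of the mapping shown above: the embedded triple is asserted, a blank node is described with the `rdf:Statement` vocabulary, and every predicate-object pair attached to `<< ... >>` moves onto that blank node. The `Triple`/`Annotation` records, the `reify` method and the blank node label are hypothetical; terms are plain strings already written in Turtle notation.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, not the PR's code: expands "<< s p o >> p1 o1 ; p2 o2 ."
    // into standard RDF reification, mirroring the output shown above.
    public class ReifyEmbeddedTriple {

        static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

        // Terms are strings already written in Turtle syntax (e.g. ":bob", "23").
        record Triple(String s, String p, String o) {
            @Override
            public String toString() {
                return s + " " + p + " " + o + " .";
            }
        }

        // One predicate-object pair attached to the embedded triple.
        record Annotation(String p, String o) {}

        static List<Triple> reify(Triple embedded, List<Annotation> annotations, String bnodeLabel) {
            List<Triple> out = new ArrayList<>();
            // As in the output above, the embedded triple itself is also asserted.
            out.add(embedded);
            String r = "_:" + bnodeLabel;
            out.add(new Triple(r, "<" + RDF + "type>", "<" + RDF + "Statement>"));
            out.add(new Triple(r, "<" + RDF + "subject>", embedded.s()));
            out.add(new Triple(r, "<" + RDF + "predicate>", embedded.p()));
            out.add(new Triple(r, "<" + RDF + "object>", embedded.o()));
            // Statements made about << s p o >> are made about the blank node instead.
            for (Annotation a : annotations) {
                out.add(new Triple(r, a.p(), a.o()));
            }
            return out;
        }

        public static void main(String[] args) {
            Triple bobAge = new Triple(":bob", "foaf:age", "23");
            List<Annotation> annotations = List.of(
                    new Annotation("dct:creator", "<http://example.com/crawlers#c1>"),
                    new Annotation("dct:source", "<http://example.net/homepage-listing.html>"));
            reify(bobAge, annotations, "r1").forEach(System.out::println);
        }
    }
    ```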
    A more complex example is:
    ```
    :sven :claims << :markus :says << :sven a :Person >> >> .
    ```
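    The nested example can be handled by recursion: embedded triples are converted inside-out and every `<< ... >>` gets its own fresh blank node, so the outer reification's `rdf:object` is the inner reification node. This is a hypothetical plain-Java sketch (the `Term`, `Name` and `Embedded` types and the `_:bN` labels are made up, and `a` is written out as `rdf:type`); it only emits the `rdf:Statement` descriptions and does not also assert the embedded triples.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: recursive reification of nested embedded triples.
    public class NestedReification {

        // A term is either a plain Turtle name or an embedded << s p o >> triple.
        sealed interface Term permits Name, Embedded {}
        record Name(String text) implements Term {}
        record Embedded(Term s, Term p, Term o) implements Term {}

        private final List<String> out = new ArrayList<>();
        private int counter = 0;

        // Returns the RDF node standing for t, emitting reification triples for embedded ones.
        String node(Term t) {
            if (t instanceof Name n) {
                return n.text();
            }
            Embedded e = (Embedded) t;
            // Convert the inner terms first, so the innermost triple gets the first label.
            String s = node(e.s());
            String p = node(e.p());
            String o = node(e.o());
            String r = "_:b" + (counter++);
            out.add(r + " rdf:type rdf:Statement .");
            out.add(r + " rdf:subject " + s + " .");
            out.add(r + " rdf:predicate " + p + " .");
            out.add(r + " rdf:object " + o + " .");
            return r;
        }

        // Converts one top-level Turtle* statement into plain RDF lines.
        static List<String> convert(Term s, Term p, Term o) {
            NestedReification c = new NestedReification();
            String top = c.node(s) + " " + c.node(p) + " " + c.node(o) + " .";
            c.out.add(top);
            return c.out;
        }

        public static void main(String[] args) {
            // :sven :claims << :markus :says << :sven a :Person >> >> .
            Term inner = new Embedded(new Name(":sven"), new Name("rdf:type"), new Name(":Person"));
            Term outer = new Embedded(new Name(":markus"), new Name(":says"), inner);
            convert(new Name(":sven"), new Name(":claims"), outer).forEach(System.out::println);
        }
    }
    ```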


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mschroeder-github/jena turtle_star

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #299
    
----
commit e63811a74da613f5d66232c8ad8a78caa2db6d7e
Author: Markus Schroeder <ma...@dfki.de>
Date:   2017-10-27T07:32:40Z

    turtle parser generation by javacc fixed

commit 01520ea3d63744ed962402629bc2dea98807b1aa
Author: Markus Schroeder <ma...@dfki.de>
Date:   2017-10-27T08:24:32Z

    turtle star parser copied from turtle parser

commit 6a72f45be68adf522e39c10bfd6373a3800656ca
Author: Markus Schroeder <ma...@dfki.de>
Date:   2017-10-27T14:49:29Z

    turtle star syntax

commit 8e63b4784181eea93383cece7be0f6919e760eea
Author: Markus Schroeder <ma...@dfki.de>
Date:   2017-10-28T05:20:16Z

    undo changes on the pom.xml

----


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Looks like we're done here for now with parsers at https://github.com/RDFstar/RDFstarTools/ .



---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    @hartig - I'm ambivalent on changing `ParserProfileStd` because that guarantee on `checkTriple` (and `checkQuad`) checks a condition that a lot of code assumes is valid and does not check again. Hence the "Std".
    
    How about changing Jena so that `checkTriple` is publicly accessible, and also adding a `ParserProfileWrapper` to Jena so that operations can be intercepted and changed?
    
    (Incidentally, this would be good to have anyway for "generalized RDF" as defined in RDF 1.1.)
    
    ```
            TurtleStarReaderRIOT(Lang lang, ParserProfile parserProfile) {
                this.lang = lang;
                this.parserProfile = new ParserProfileRDFStar(parserProfile);
            }
    ```
    ```
    public class  ParserProfileRDFStar extends ParserProfileWrapper {
    
        public ParserProfileRDFStar(ParserProfile parserProfile) { super(parserProfile); }
        @Override public void checkTriple(...
        @Override public void checkQuad(...
    }
    ```
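    The wrapper idea can be sketched in plain Java. Everything below (`Node`, `Profile`, `StdProfile`, `ProfileWrapper`, `RDFStarProfile`) is a hypothetical stand-in, not Jena's `ParserProfile` API: the standard check stays untouched, a forwarding wrapper is built around an already-created profile, and the RDF* subclass overrides only `checkTriple`, checking embedded triples recursively and handing the remaining positions to the wrapped check.

    ```java
    import java.util.Objects;

    // Hypothetical sketch of the wrapper idea; none of these types are Jena classes.
    public class ProfileWrapperSketch {

        // Minimal stand-ins for RDF terms (blank nodes omitted for brevity).
        sealed interface Node permits Iri, Literal, TripleNode {}
        record Iri(String iri) implements Node {}
        record Literal(String lex) implements Node {}
        record TripleNode(Node s, Node p, Node o) implements Node {}
        record Triple(Node s, Node p, Node o) {}

        // Stand-in for a parser profile: creating a triple always runs the check first.
        interface Profile {
            void checkTriple(Node s, Node p, Node o);

            default Triple createTriple(Node s, Node p, Node o) {
                checkTriple(s, p, o); // dispatches to the most specific override
                return new Triple(s, p, o);
            }
        }

        // "Std" behaviour: plain RDF only, so triple terms are rejected.
        static class StdProfile implements Profile {
            @Override
            public void checkTriple(Node s, Node p, Node o) {
                if (!(s instanceof Iri)) throw new IllegalArgumentException("bad subject: " + s);
                if (!(p instanceof Iri)) throw new IllegalArgumentException("bad predicate: " + p);
                if (o instanceof TripleNode) throw new IllegalArgumentException("bad object: " + o);
            }
        }

        // Forwarding proxy around an already-created profile.
        static class ProfileWrapper implements Profile {
            private final Profile other;

            ProfileWrapper(Profile other) { this.other = Objects.requireNonNull(other); }

            protected Profile get() { return other; }

            @Override
            public void checkTriple(Node s, Node p, Node o) { other.checkTriple(s, p, o); }
        }

        // RDF* variant: additionally accepts triple terms as subject or object.
        static class RDFStarProfile extends ProfileWrapper {
            private static final Iri PLACEHOLDER = new Iri("urn:x-placeholder");

            RDFStarProfile(Profile other) { super(other); }

            @Override
            public void checkTriple(Node s, Node p, Node o) {
                // Each embedded triple must itself be valid under the same RDF* rules.
                if (s instanceof TripleNode ts) checkTriple(ts.s(), ts.p(), ts.o());
                if (o instanceof TripleNode to) checkTriple(to.s(), to.p(), to.o());
                // Let the wrapped check validate the rest, with triple terms masked out.
                get().checkTriple(s instanceof TripleNode ? PLACEHOLDER : s, p,
                                  o instanceof TripleNode ? PLACEHOLDER : o);
            }
        }

        public static void main(String[] args) {
            Node embedded = new TripleNode(new Iri("http://example.com/bob"),
                    new Iri("http://xmlns.com/foaf/0.1/age"), new Literal("23"));
            Node creator = new Iri("http://purl.org/dc/terms/creator");
            Node crawler = new Iri("http://example.com/crawlers#c1");

            Profile std = new StdProfile();
            try {
                std.createTriple(embedded, creator, crawler);
            } catch (IllegalArgumentException e) {
                System.out.println("std rejects: " + e.getMessage());
            }
            Profile star = new RDFStarProfile(std);
            System.out.println("star accepts: " + star.createTriple(embedded, creator, crawler));
        }
    }
    ```

    Because `createTriple` calls `checkTriple` on `this`, overriding the check in the wrapper is enough; the wrapped profile's own behaviour is never modified.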


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    I agree. Making `checkTriple` protected in `ParserProfileStd` would be an even better idea. Please do it.
    
    While we are at it, can you add another constructor to `TurtleShell` that allows me to pass my own `NodeFormatter` (in addition to the other arguments that are passed to the existing constructor)?
    This extension would allow me to develop a `TurtleStarWriter` simply by using `TurtleShell` instead of having to create a copy of it.
    I mean, in terms of source code, my proposal is to add the following new constructor:
    ```
    protected TurtleShell(IndentedWriter out, PrefixMap pmap, String baseURI, NodeFormatter nodeFmt, Context context) {
        this.out = out ;
        if ( pmap == null )
            pmap = PrefixMapFactory.emptyPrefixMap() ;
        this.prefixMap = pmap ;
        this.baseURI = baseURI ;
        this.nodeFmt = nodeFmt ;
    }
    ```
    ...and then modify the existing constructor as follows:
    ```
    protected TurtleShell(IndentedWriter out, PrefixMap pmap, String baseURI, Context context) {
        this(out, pmap, baseURI, createNodeFormatter(pmap,baseURI,context), context) ;
    }
    
    static public NodeFormatter createNodeFormatter(PrefixMap pmap, String baseURI, Context context) {
        if ( context != null && context.isTrue(RIOT.multilineLiterals) )
            return new NodeFormatterTTL_MultiLine(baseURI, pmap, NodeToLabel.createScopeByDocument()) ;
        else
            return new NodeFormatterTTL(baseURI, pmap, NodeToLabel.createScopeByDocument()) ;
    }
    ```
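    With a constructor that accepts a `NodeFormatter`, a Turtle* writer only needs a formatter that knows how to print triple terms. A hypothetical plain-Java sketch of that design (the `Shell`, `NodeFormatter`, `TtlFormatter` and `StarFormatter` types here are stand-ins, not RIOT's `TurtleShell` or `NodeFormatter`): `StarFormatter` prints `<< s p o >>` recursively and delegates every other term to the default formatter, and the short constructor chains to the long one as proposed above.

    ```java
    // Hypothetical sketch of the constructor-injection idea; not RIOT's classes.
    public class StarFormatterSketch {

        sealed interface Node permits Iri, Literal, TripleNode {}
        record Iri(String iri) implements Node {}
        record Literal(String lex) implements Node {}
        record TripleNode(Node s, Node p, Node o) implements Node {}

        // Stand-in for a node formatter: turns one RDF term into Turtle text.
        interface NodeFormatter {
            String format(Node n);
        }

        // Plain Turtle output (no escaping, for brevity); it does not know triple terms.
        static class TtlFormatter implements NodeFormatter {
            @Override
            public String format(Node n) {
                if (n instanceof Iri i) return "<" + i.iri() + ">";
                if (n instanceof Literal l) return "\"" + l.lex() + "\"";
                throw new IllegalArgumentException("not a plain RDF term: " + n);
            }
        }

        // Adds "<< s p o >>" output and delegates every other term to a base formatter.
        static class StarFormatter implements NodeFormatter {
            private final NodeFormatter base;

            StarFormatter(NodeFormatter base) { this.base = base; }

            @Override
            public String format(Node n) {
                if (n instanceof TripleNode t) {
                    return "<< " + format(t.s()) + " " + format(t.p()) + " " + format(t.o()) + " >>";
                }
                return base.format(n);
            }
        }

        // Mirrors the proposed constructor chain: the short form picks the default formatter.
        static class Shell {
            private final NodeFormatter nodeFmt;

            Shell(NodeFormatter nodeFmt) { this.nodeFmt = nodeFmt; }

            Shell() { this(createNodeFormatter()); }

            static NodeFormatter createNodeFormatter() { return new TtlFormatter(); }

            String triple(Node s, Node p, Node o) {
                return nodeFmt.format(s) + " " + nodeFmt.format(p) + " " + nodeFmt.format(o) + " .";
            }
        }

        public static void main(String[] args) {
            Node embedded = new TripleNode(new Iri("http://example.com/s"),
                    new Iri("http://example.com/p"), new Iri("http://example.com/o"));
            Node addedBy = new Iri("http://example.com/addedBy");
            // A writer would pass its own formatter through the new constructor.
            Shell starShell = new Shell(new StarFormatter(Shell.createNodeFormatter()));
            System.out.println(starShell.triple(embedded, addedBy, new Literal("Alice")));
        }
    }
    ```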


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Thanks @mschroeder-github for taking the initiative of writing an RDF* parser for Jena!
    
    @afs Before responding to some of your comments, I should mention that I am still actively working on RDF* and SPARQL*, even though I hardly have any time for writing code in my new position. Anyway, I have recently published [a paper with formal results about RDF*/SPARQL*](http://olafhartig.de/files/Hartig_AMW2017_RDFStar.pdf). Additionally, two weeks ago I presented the approach as a poster at ISWC 2017 and won the "people's choice best poster award" for it. Now, to your comments:
    
    You are right: an RDF* graph is not an RDF graph, but it can be transformed into an RDF graph (by applying the RDF reification vocabulary or some other pure-RDF approach such as singleton properties or single-triple named graphs). The aforementioned paper provides the formal mappings for such transformations and shows that these mappings have desirable properties (they are information preserving and query result preserving).
    
    However, you are not right when you write that "RDF* has the notion of a triple id." There is no such notion in RDF* (unless you consider the triples themselves as triple identifiers).
    
    Regarding your example that you introduce when you write about merging, you are right: these are two RDF* triples that have the same subject and this subject happens to be the triple (`:s`,`:p`,`:o`).
    
    In your comment related to SPARQL you write that the `<< >>` syntax has been discussed in the SPARQL WG. Was this discussion related to reification or was the idea of the `<< >>` syntax in this discussion for something else?
    
    In general, I understand your response to this PR as a positive attitude towards supporting RDF* syntax (and SPARQL* syntax?) in Jena, plus a suggestion to implement this support as part of RIOT instead of the jena-core Turtle parser. Correct?


---

[GitHub] jena issue #299: Turtle Star

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/299
  
    +1 on keeping the "default" check as-is, but making the choice of calling it / changing it available via a wrapper.


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Added PR #418.


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    @mschroeder-github, @hartig 
    
    Apache Jena 3.6.0 is out and has `Node_Triple` for you. Feedback etc. welcome; this is an experimental feature and can be revised. Note: this does not mean that parsers or writers handle it natively.


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Ah, one more thing: @afs, would it be possible to extend the `checkTriple` method of `ParserProfileStd` to permit the subject and the object of a triple to be `Node_Triple`?
    This is the only thing that breaks my Turtle* parser extension. The only workaround for me at the moment is to use the parser framework with "checking" disabled.


---

[GitHub] jena pull request #299: Turtle Star

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/299


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Every chance if the Jena committers and PMC agree.
    
    Developing as a separate module, outside Jena, to start with helps the work by not getting entangled in the timing and release issues of Jena releases. It also helps create a community focused on the work.
    
    The Jena project releases all modules in a single release process every 3 or 4 months. Users tend to expect continuity across releases. While getting started, a module `rdfstar` may want to release as and when contributors and code are ready, have the flexibility to release new versions rapidly, change designs, and put an API around RDF* graphs (RDF*-merge, for example). Relying on PRs to Jena is going to be a burden for that community initially.
    
    If there are Jena changes needed (adding a new RDF term type `Node_Triple`, for example), PRs focused on those needs can be made to get them in.
    
    (Do note - the Jena project is not a source of maintenance and support for contributions. Depending on complexity, that might be something to discuss nearer the time).


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    > That does not work because the `ParserProfileStd` supplied to `TurtleStarReaderRIOT` has already been created, which is why the suggestion is to use a forwarding proxy object (the wrapper) with public access to the "check" operations.
    
    In my implementation it would have worked because it is wrapping the earlier-created `ParserProfileStd` passed to the constructor of my `LangTurtle` specialization; see line 59 in https://github.com/RDFstar/RDFstarTools/blob/master/src/main/java/se/liu/ida/rdfstar/tools/parser/lang/LangTurtleStar.java#L59
    
    Anyway, I understand that you want a more flexible solution.
    
    > Could you send in a pull request please?
    
    Will do.


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Thanks Andy! I agree with what you write.
    
    Do you think there is a chance for such a separate maven module to become part of the official family of Apache Jena maven modules?


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Sorry for the delay getting round to this - Jena had a release which
    takes time.  It would be interesting to see some support for RDF*.
    
    Comments in two areas : the parser and about RDF*.
    
    **Turtle parser**
    
    The Turtle parser in jena-core only exists for legacy reasons and to support
    some tests. It isn't the normal Turtle parser, and isn't spec-compliant.
    
    Parsing happens in RIOT, in `org.apache.jena.riot.lang.LangTurtle`. It is a
    custom (hand-written) parser with its own tokenizer for speed.
    
    If you are interested, we can add the tokens for RDF*. It takes some care
    because this is the same tokenizer as used for N-Triples and so is
    speed-critical. We may need to split off an N-Triples/N-Quads tokenizer,
    but that wouldn't be such a bad thing anyway.
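    As a rough illustration of the tokenizer change, the two new tokens can be recognized with one character of lookahead. This is a hypothetical sketch (the `Type`/`Token` names and the simplified word rule are assumptions, not RIOT's tokenizer); it ignores literals, comments and the many other Turtle token kinds, and expects a statement-ending `.` to be separated by whitespace.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, far simpler than RIOT's tokenizer: it only shows how
    // "<<" and ">>" can be told apart from IRIs with one character of lookahead.
    public class StarTokenizerSketch {

        enum Type { LT2, GT2, IRI, WORD, DOT }

        record Token(Type type, String text) {}

        static List<Token> tokenize(String in) {
            List<Token> out = new ArrayList<>();
            int i = 0;
            while (i < in.length()) {
                char c = in.charAt(i);
                if (Character.isWhitespace(c)) {
                    i++;
                } else if (c == '<') {
                    // An IRIREF cannot contain '<', so "<<" can only open a triple term.
                    if (i + 1 < in.length() && in.charAt(i + 1) == '<') {
                        out.add(new Token(Type.LT2, "<<"));
                        i += 2;
                    } else {
                        int end = in.indexOf('>', i + 1);
                        if (end < 0) throw new IllegalArgumentException("unterminated IRI at " + i);
                        out.add(new Token(Type.IRI, in.substring(i + 1, end)));
                        i = end + 1;
                    }
                } else if (c == '>') {
                    // Outside an IRI, '>' is only legal as half of ">>".
                    if (i + 1 < in.length() && in.charAt(i + 1) == '>') {
                        out.add(new Token(Type.GT2, ">>"));
                        i += 2;
                    } else {
                        throw new IllegalArgumentException("stray '>' at " + i);
                    }
                } else if (c == '.') {
                    out.add(new Token(Type.DOT, "."));
                    i++;
                } else {
                    // Prefixed names, numbers, etc. run until whitespace or an angle bracket.
                    int start = i;
                    while (i < in.length() && !Character.isWhitespace(in.charAt(i))
                            && in.charAt(i) != '<' && in.charAt(i) != '>') {
                        i++;
                    }
                    out.add(new Token(Type.WORD, in.substring(start, i)));
                }
            }
            return out;
        }

        public static void main(String[] args) {
            tokenize("<< :bob foaf:age 23 >> dct:source <http://example.net/homepage-listing.html> .")
                    .forEach(System.out::println);
        }
    }
    ```

    Note that `<http://example.com/o>>>` still splits correctly into an IRI followed by `>>`, because the IRI ends at its first `>`.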
    
    There is an example of plumbing in a new language in [ExRIOT_5.java](https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/riot/ExRIOT_5.java).
    
    **RDF\* and RDF**
    
    RDF* has the notion of a triple id: each triple has a unique id, which makes it different from reification. An RDF* graph can be written in RDF, and the result read back in again, by using exactly one reification for each triple; the process can be reversed. This equates the triple id of RDF* with information in the strict RDF graph. It's not a general RDF graph.
    
    It is not possible to RDF-merge two such graphs - that would end up with two bnode reifications, and statements in one graph about the triple will not RDF*-merge with the statements in the other graph (which is the whole point of RDF reification).
    
    Merging needs to respect the additional RDF* data model.
    Example:
    ```
    << :s :p :o >> :addedBy "Alice" .
    << :s :p :o >> :addedBy "Bob" .
    ```
    My understanding of RDF* is that this is one triple `td = :s :p :o` with two
    triples with the same subject: `td :addedBy "Alice" . td :addedBy "Bob" .`
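    To make the merge point concrete, here is a hypothetical plain-Java sketch (all types are made up for illustration): if the triple `:s :p :o` is a term compared by value, merging the two one-statement graphs leaves a single subject `td` carrying both `:addedBy` values; if each graph is first written out with its own blank-node reification, the blank nodes stay distinct after the merge and the two statements end up on different nodes.

    ```java
    import java.util.LinkedHashSet;
    import java.util.Set;

    // Hypothetical sketch: contrasts merging RDF* graphs (triple terms compared by value)
    // with merging their bnode-reified RDF forms (blank nodes kept apart per graph).
    public class StarMergeSketch {

        interface Term {}
        record Name(String text) implements Term {}
        record TripleTerm(Term s, Term p, Term o) implements Term {}
        // A blank node is scoped to the graph it came from, as in an RDF merge.
        record Blank(String graph, String label) implements Term {}
        record Stmt(Term s, Term p, Term o) {}

        static final Term ADDED_BY = new Name(":addedBy");

        static Set<Stmt> merge(Set<Stmt> a, Set<Stmt> b) {
            Set<Stmt> merged = new LinkedHashSet<>(a);
            merged.addAll(b);
            return merged;
        }

        static long subjectsOfAddedBy(Set<Stmt> g) {
            return g.stream().filter(st -> st.p().equals(ADDED_BY)).map(Stmt::s).distinct().count();
        }

        // << :s :p :o >> :addedBy who .   as an RDF* graph
        static Set<Stmt> starGraph(String who) {
            Term td = new TripleTerm(new Name(":s"), new Name(":p"), new Name(":o"));
            return Set.of(new Stmt(td, ADDED_BY, new Name(who)));
        }

        // The same statement written out with a reification bnode local to graph g.
        static Set<Stmt> reifiedGraph(String g, String who) {
            Term r = new Blank(g, "r");
            return Set.of(
                    new Stmt(r, new Name("rdf:subject"), new Name(":s")),
                    new Stmt(r, new Name("rdf:predicate"), new Name(":p")),
                    new Stmt(r, new Name("rdf:object"), new Name(":o")),
                    new Stmt(r, ADDED_BY, new Name(who)));
        }

        public static void main(String[] args) {
            Set<Stmt> star = merge(starGraph("\"Alice\""), starGraph("\"Bob\""));
            Set<Stmt> rdf = merge(reifiedGraph("g1", "\"Alice\""), reifiedGraph("g2", "\"Bob\""));
            System.out.println("RDF*-merge, subjects of :addedBy: " + subjectsOfAddedBy(star));
            System.out.println("RDF-merge of reified forms, subjects of :addedBy: " + subjectsOfAddedBy(rdf));
        }
    }
    ```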
    
    **SPARQL**
    
    The `<< >>` syntax has a long history - it was first used in early SPARQL discussions but never made it to the final stages. It may only have existed on the mailing list; the WG decided not to have reification syntax. However, there are some remains in the SPARQL grammar in ARQ.
    
    
    



---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Agreed. 


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Executing SPARQL over the id-specific standard RDF representation of RDF* could be done without needing to modify the core algebra, by expanding `<<:s :p :o>>` in BGPs.
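    A hypothetical sketch of that BGP expansion (the `Pattern` record, `expand` and the fresh variable name are made up, not ARQ's algebra classes): the embedded triple is replaced by a fresh variable constrained by `rdf:subject`, `rdf:predicate` and `rdf:object` patterns, which is only sound under the assumption that each triple has exactly one reification. The result is ordinary triple patterns, so existing BGP evaluation applies unchanged.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: rewrites one triple pattern whose subject is << s p o >>
    // into plain BGP patterns over the (one-reification-per-triple) RDF form.
    public class BgpExpansionSketch {

        // Each position holds a constant (":s") or a variable ("?x").
        record Pattern(String s, String p, String o) {
            @Override
            public String toString() {
                return s + " " + p + " " + o + " .";
            }
        }

        static List<Pattern> expand(Pattern embedded, String p, String o, String freshVar) {
            List<Pattern> bgp = new ArrayList<>();
            // freshVar stands for the single reification node of the embedded triple.
            bgp.add(new Pattern(freshVar, "rdf:subject", embedded.s()));
            bgp.add(new Pattern(freshVar, "rdf:predicate", embedded.p()));
            bgp.add(new Pattern(freshVar, "rdf:object", embedded.o()));
            // The outer pattern's predicate and object now hang off that node.
            bgp.add(new Pattern(freshVar, p, o));
            return bgp;
        }

        public static void main(String[] args) {
            // << :s :p :o >> :addedBy ?who
            expand(new Pattern(":s", ":p", ":o"), ":addedBy", "?who", "?_t0")
                    .forEach(System.out::println);
        }
    }
    ```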
    
    For `BIND(<<:s :p :o>> AS ?t)` (from earlier work - has it gone now?) it can be done with a custom function `:findTriple(:s :p :o)` that returns the reification bnode, but it must be a single match (a good reason for not having it, or for restricting it to ground s/p/o).
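    A hypothetical sketch of such a `:findTriple` function, written over a plain list of statements rather than a Jena graph (the `Stmt` record and the method are assumptions, not an existing ARQ function): it collects the reification nodes whose `rdf:subject`, `rdf:predicate` and `rdf:object` match the ground arguments, and fails unless there is exactly one.

    ```java
    import java.util.List;
    import java.util.stream.Collectors;

    // Hypothetical sketch of the proposed :findTriple(:s :p :o) lookup over an
    // in-memory list of statements; not an existing ARQ function.
    public class FindTripleSketch {

        record Stmt(String s, String p, String o) {}

        static String findTriple(List<Stmt> graph, String s, String p, String o) {
            List<String> matches = graph.stream()
                    // candidate reification nodes: anything with rdf:subject s ...
                    .filter(st -> st.p().equals("rdf:subject") && st.o().equals(s))
                    .map(Stmt::s)
                    // ... that also has the matching rdf:predicate and rdf:object
                    .filter(r -> graph.contains(new Stmt(r, "rdf:predicate", p)))
                    .filter(r -> graph.contains(new Stmt(r, "rdf:object", o)))
                    .distinct()
                    .collect(Collectors.toList());
            // BIND needs one value, so zero or several reifications are an error.
            if (matches.size() != 1) {
                throw new IllegalStateException("expected exactly one reification of ("
                        + s + " " + p + " " + o + "), found " + matches.size());
            }
            return matches.get(0);
        }

        public static void main(String[] args) {
            List<Stmt> g = List.of(
                    new Stmt("_:r1", "rdf:subject", ":s"),
                    new Stmt("_:r1", "rdf:predicate", ":p"),
                    new Stmt("_:r1", "rdf:object", ":o"),
                    new Stmt("_:r1", ":addedBy", "\"Alice\""));
            System.out.println(findTriple(g, ":s", ":p", ":o"));
        }
    }
    ```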
    
    With that, I think an extension can go in a separate maven module as a pure extension.
    
    (This is by thinking about it, not coding it.)



---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    > Making `checkTriple` protected in `ParserProfileStd` would be an even better idea.
    
    That does not work because the `ParserProfileStd` supplied to `TurtleStarReaderRIOT` has already been created, which is why the suggestion is to use a forwarding proxy object (the wrapper) with public access to the "check" operations.
    
    > another constructor to `TurtleShell`
    
    Could you send in a pull request please?


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    I finally have implemented a Turtle* parser that hooks into Jena's RIOT parser framework and uses the `Node_Triple` class that @afs had added to Jena in PR #327. Find the code at https://github.com/RDFstar/RDFstarTools 
    Thanks again all for your efforts! I think we now can close this PR here.


---

[GitHub] jena issue #299: Turtle Star

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/299
  
    PR #327 adds an RDFTerm for a triple.


---

[GitHub] jena issue #299: Turtle Star

Posted by hartig <gi...@git.apache.org>.
Github user hartig commented on the issue:

    https://github.com/apache/jena/pull/299
  
    Great Andy, thanks! I will check it out after the holidays. (I'm a bit overloaded here at the moment) 


---