You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by stain <gi...@git.apache.org> on 2017/01/27 17:30:03 UTC

[GitHub] commons-rdf pull request #32: COMMONSRDF-55: Handle Jena's urn:x-arq:Default...

GitHub user stain opened a pull request:

    https://github.com/apache/commons-rdf/pull/32

    COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph and friends

    This fixes [COMMONSRDF-55](https://issues.apache.org/jira/browse/COMMONSRDF-55) by adding special testing of Jena's `urn:x-arq:DefaultGraph` and friends when creating a Quad.
    
    It also fixes a NullPointerException as previously we wrongly  did `Quad.create(null, s,p,o)` with Jena, but `null` is not a valid graph name - `Quad.defaultGraphIRI` should be used instead.
    
    Note that I did not hard-code `urn:x-arq:DefaultGraph` but use `IRI.equals()` against `Quad.defaultGraphIRI` and `Quad.defaultGraphNodeGenerated`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/commons-rdf COMMONSRDF-55

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/commons-rdf/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #32
    
----
commit bb264738aa0e3c3a8e9ae8bba0d7109cfb59adaa
Author: Stian Soiland-Reyes <st...@apache.org>
Date:   2017-01-27T17:26:59Z

    COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph and friends

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    I've added javadoc to this effect (commit 7bbff25) and will merge this now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    The Commons RDF marker for the default graph (as of today) is `Optional.empty()` and not an `RDFNode` (and thus can only be used in the `Quad.getGraphName()` position).
    
    There are four boundaries that can create Commons RDF Quad instances from Jena:
    
    * **Retrieving:** [Dataset.stream()](https://commons.apache.org/proper/commons-rdf/apidocs/org/apache/commons/rdf/api/Dataset.html#stream--), `.iterate()` and friends
    * **Adapting:** [JenaRDF.asQuad(jenaQuad)](https://commons.apache.org/proper/commons-rdf/apidocs/org/apache/commons/rdf/jena/JenaRDF.html#asQuad-org.apache.jena.sparql.core.Quad-)
    * **Creating:** [JenaRDF](https://commons.apache.org/proper/commons-rdf/apidocs/org/apache/commons/rdf/jena/JenaRDF.html#createQuad-org.apache.commons.rdf.api.BlankNodeOrIRI-org.apache.commons.rdf.api.BlankNodeOrIRI-org.apache.commons.rdf.api.IRI-org.apache.commons.rdf.api.RDFTerm-)  with a graph `IRI` adapted from a Node using [JenaRDF.asRDFTerm(Quad.defaultGraph)](https://commons.apache.org/proper/commons-rdf/apidocs/org/apache/commons/rdf/jena/JenaRDF.html#asRDFTerm-org.apache.jena.graph.Node-)
      *  ..in worst case, `IRI` from `RDF.createIRI("urn:x-arq:DefaultGraph")
    * **Parsing:** JenaRDFParser - which uses JenaRDF.asQuad (experimental)
    
    I think we MUST do the _Retrieving_ - triples in the default graph according to the RDF 1.1 should still be in the default graph according to [org.apache.commons.rdf.api.Quad](https://commons.apache.org/proper/commons-rdf/apidocs/org/apache/commons/rdf/api/Quad.html) - importantly it must be `.equals()` an equivalent quad statement in a different implementation. I think we all agree on this.  
    
    Similarly _Parsing_ we MUST handle - at least when it's no longer experimental.
    
    The question is how helpful we should be in the other cases. If someone is _adapting_ a Jena Quad, then I think it also MUST convert to `Optional.empty()` although it keeps the original Jena quad underneath, which can distinguish between the two default variants, if needed.
    
    Now that leaves _creating_ which is a bit more speculative - we don't know where the graph `IRI` comes from, and other `RDF` implementations do of course not do anything special with `urn:x-arq:DefaultGraph`. However it is unlikely that such an IRI would be used in a non-Jena context. 
    
    So here are three options for what `RDF.createQuad(g,s,p,o) should do if g is `urn:x-arq:DefaultGraph` or `urn:x-arq:DefaultGraphNode`:
    
    1) Always recognize `urn:x-arq:DefaultGraph` and friends (either by comparing of IRI string against the `Quad.defaultGraph` and `Quad.defaultGraphNode` constants  (this PR), or adapting/unwrapping to the equivalent Jena `Node` and then checking with `Quad.isDefaultGraph(Node)`  (might be less efficient)
    2) Only recognize if it is a `org.apache.commons.rdf.jena.JenaIRI` instance by using `Quad.isDefaultGraph(iri.asJenaNode())`
    3) Never recognize it, IRI will be left as-is. Insertion of such a `Quad` to a Jena-backed Dataset will then have different semantics then in other `Dataset`s
    
    I think it's an edge-case with people creating it by hand from the IRI string with `RDF.createIRI()` - particularly with a non-Jena `RDF` instance, although that could be the case in queries against the union graph. (but that would only be in SPARQL world, right?).
    
    However it might be conceivable that `Node` from `Quad.defaultGraph` is used in Jena-centric code that adds Commons RDF at the edge.
    
    
    Boundaries from Commons RDF to Jena are simpler, as we just always use `urn:x-arq:DefaultGraph` (not `null`!) except where there's a pre-existing Jena `Quad` which is left as-is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    `defaultGraph` is explicit name of the default graph in Jena, `defaultGraphNode` is for when the system is generating it, not for data.  There are methods for "is this the default graph?" that abstract away from this detail. 
    
    The mark in the graph slot of the quad is anything to distinguish it - Jena uses a URI although it is not properly "U"niversal - it is relative is the dataset. A quad is a triple from a graph in a dataset.  Triples are universal.
    
    I guess I don't see the issue here: why is it not "Commons RDF marker for Quad/dft graph" <=> "System X marker for Quad/dft graph" at the adapter boundary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    Any comments from perhaps @afs or @ajs6f? I'm not sure exactly at what boundary we should replace `<urn:x-arq:DefaultGraph>` and friends with `Optional.empty()` -- in the solution of this PR it happens for any Quad created with `JenaRDF` - but of course that would not happen in other `RDF` instances. This suggestion would also convert any "foreign" `IRI` instances of `<urn:x-arq:DefaultGraph>` when given as a graph name.
    
    I'll admit there's potential for information loss if the alternate  `<urn:x-arq:DefaultGraphNode>` is used as a string IRI or Node when making a Commons RDF Quad  - this PR will adapt it into `<urn:x-arq:DefaultGraph> when later making the Jena `Quad` instance from the field of Optional.empty(), however Jena's `DatasetGraph` also does that conversion.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    The thing is a Commons RDF `JenaQuad` has in a way two feet into Jena - the underlying Jena `Quad` as well as the underlying `Node` instances (which would be wrapped as `RDFTerm` instances).
    
    So the challenge is that it's also possible to make a Commons RDF `JenaQuad` using both adapting a quad from Jena (as when being retrieved from the Dataset), as well as composed from `RDFTerm` instances with the `JenaRDF.createQuad(g,s,p,o)` method -- such constructing will currently keep fields for the Commons RDF wrappers of the `RDFTerm`s optional(g),s,p,o, and only make the underlying Jena-backed `Quad` on-demand on first call to [asJenaQuad](https://github.com/apache/commons-rdf/blob/0.3.0-incubating/jena/src/main/java/org/apache/commons/rdf/jena/impl/AbstractQuadLike.java#L92).
    
    Commons RDF methods like `.equals()` and `.hashCode()` then use the individual graph/subject/predicate/object `RDFTerm` fields which are always initialized - thus from the Commons RDF side in a way the Quad object is ready-already, while it's Jena-nature might be incomplete until needed (e.g. being added to a Dataset).
    
    Currently the constructor from a JenaQuad will unwrap to create the [RDFTerm g/s/p/o](https://github.com/apache/commons-rdf/blob/0.3.0-incubating/jena/src/main/java/org/apache/commons/rdf/jena/impl/AbstractQuadLike.java#L74) -- with this PR this would check Quad.isDefaultGraph() before unwrapping the graph name.
    
    It could in theory be done the other way, to keep the Jena-backed org.apache.jena.sparql.core.Quad as the master backend field, and rather generate the `RDFTerm` wrappers on demand, in which case first call to `getGraphName()` would simply check `isDefaultGraph()` -- but it would make it trickier to keep `AbstractQuadLike` common for both `Quad` and `Triple`, as Jena's `Quad` does not have a common superclass with Jena's `Triple`, there would be two alternate fields for the master (or a more complicated abstract class hierarchy).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf pull request #32: COMMONSRDF-55: Handle Jena's urn:x-arq:Default...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/commons-rdf/pull/32


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    I haven't looked at the code in enough detail but I was assuming there would be one place with two functions: Quad->JenaQuad and JenaQuad->Quad. This would the strategy (policy) for all of the adapter. That is what I mean by the boundary.
    
    "urn:x-arq:DefaultGraph" only exists in Jena. Quad has `Optional.empty()`
    
    `Optional.empty()` => `urn:x-arq:DefaultGraph`
    
    Jena `Quad.isDefaultGraph` => `Optional.empty()`
    
    `Quad.equals` works, which is very important.
    
    or maybe I've been doing too much policy abstraction recently.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    I guess it's a question of where we put the "inconsistency" barrier. We can probably assume that in the odd case that `urn:x-arq:DefaultGraph` appear literally in a non-Jena `IRI` or a non-Jena `Quad` then it must have leaked out of Jena somehow, and will be treated as a real IRI. It  would magically become the default graph only if such a quad is added to a Jena dataset.
    
    That would mean we let Commons RDF construction by component of a Jena-based quad preserve `g` just as in other implementations. 
    
    With option **(2)** above we would add JenaRDF-specific recognition of the magic IRI if it happens to be backed by a Jena `Node` (which might even be because it was made from a string). 
    
    It would probably cleaner in Commons RDF for a Quad to magically change only on insertion to a Jena-backed Dataset, than when making the Quad with a particular back-end - e.g. you add one quad, but a slightly different one comes back out, which will not be `.equals()` the inserted one.  This is not very different from stores with inferred rules or blank-node adaptions.  (Commons RDF Graph/Dataset contracts do not require the exact triple/quad to be returned back again)
    
    So I think that would be the semantically cleanest solution, where each `RDF` implementation behaves the same, but each `Dataset` have slight variation.
    
    However, it is not given that a `Quad` made with `JenaRDF` will be added to a Jena-based `Dataset`, but that is probably most likely. It is not given that a `Node` that is `urn:x-arq:DefaultGraph` was picked from the constant `Node.defaultNode`, but it is likely. It is not given that a literal Graph IRI `urn:x-arq:DefaultGraph` has leaked from Jena's `Node.defaultNode, but it is likely.
    
    
    Therefore the most pragmatic for Commons RDF users, if semantically slightly unclean, would be the option (2) as @ajs6f says. It means there would be only this inconsistency barrier:
    
    ```java
    RDF simple = new SimpleRDF();
    RDF jena = new JenaRDF();
    
    IRI defaultS = simple.createIRI("urn:x-arq:DefaultGraph")
    IRI defaultJ = jena.createIRI("urn:x-arq:DefaultGraph") // or jena.asRDFTerm(Node.defaultGraph)
    assertEquals(defaultS, defaultJ);
    
    IRI ex = jena.createIRI("http://example.com/");
    
    Quad q1 = jena.createQuad(defaultS, ex, ex, ex);
    assertFalse(q1.getGraphName().isPresent());
    assertEquals(defaultS, q1.getGraphName().get()); // as-s
    Quad q2 = jena.createQuad(defaultJ, ex, ex, ex);
    assertFalse(q2.getGraphName().isPresent()); // INCONSISTENT with q1
    assertFalse(q1.equals(q2)); // INCONSISTENT
    ```
    
    (Adding either `q1` or `q2` to a Jena-backed Dataset would both be transferred to q2-form with `Optional.empty()` on retrieving -- adding them to any non-Jena Dataset implementation would look like two different quads).
    
    This will technically break the [SHOULD contract](https://github.com/apache/commons-rdf/blob/0.3.0-incubating/api/src/main/java/org/apache/commons/rdf/api/RDF.java#L234) of `RDF.createQuad()` which says the parameters should be preserved. 
    
    >      * The returned Quad SHOULD have a {@link Quad#getGraphName()} that is equal
    >     * to the provided graphName, a {@link Quad#getSubject()} that is equal to
    >     * the provided subject, a {@link Quad#getPredicate()} that is equal to the
    >     * provided predicate, and a {@link Quad#getObject()} that is equal to the
    >     * provided object.
    
    but I think this is a valid breaking of SHOULD, particularly if we do it only on "our own" Jena-backed IRIs.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    I suppose I incline to (2) right now. The "magic" URIs are entirely Jena-specific, so ideally they appear as few places outside of Jena code itself as possible. On the other hand, `Quad.isDefaultGraph()` (the instance member, not the static function that takes a `Node`) seems like the most-encapsulated, best-segregated test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[GitHub] commons-rdf issue #32: COMMONSRDF-55: Handle Jena's urn:x-arq:DefaultGraph a...

Posted by stain <gi...@git.apache.org>.
Github user stain commented on the issue:

    https://github.com/apache/commons-rdf/pull/32
  
    Note that option (2) would also add potential inconsistency between Commons RDF `.getGraphName().isPresent()` and `asJenaQuad().isDefaultGraph()`, which is unfortunate, but only in that from-foreign `IRI` case.
    
    I've added option (2) now to the implementation and the [new tests](https://github.com/apache/commons-rdf/blob/COMMONSRDF-55/jena/src/test/java/org/apache/commons/rdf/jena/DefaultGraphInQuadTest.java) in this branch. OK to merge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org