You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by ajs6f <gi...@git.apache.org> on 2017/11/20 18:51:42 UTC

[GitHub] jena pull request #314: JENA-1430

GitHub user ajs6f opened a pull request:

    https://github.com/apache/jena/pull/314

    JENA-1430

    Includes #313, plus:
    
    - Extend testing to `DatasetAssembler`
    - Ensure that `DatasetAssembler` can also load quads
    - Correct `ja:DatasetTxnMem` => `ja:MemoryDataset`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajs6f/jena JENA-1430p

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #314
    
----
commit d174ec04dccb205de96e63c775e01f948380f8cc
Author: Andy Seaborne <an...@apache.org>
Date:   2017-11-20T10:57:01Z

    JENA-1430: Read quads for ja:data by filename

commit 3e13dc64f4047eb589d9da46e50561a25290a230
Author: ajs6f <aj...@apache.org>
Date:   2017-11-20T18:47:42Z

    JENA-1430: Quad loading for in-memory assemblers

----


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153755081
  
    --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/assembler/TestDatasetAssembler.java ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.sparql.core.assembler;
    +
    +import static java.nio.file.Files.createTempFile;
    +import static org.apache.jena.assembler.JA.data;
    +import static org.apache.jena.assembler.Mode.DEFAULT;
    +import static org.apache.jena.rdf.model.ModelFactory.createDefaultModel;
    +import static org.apache.jena.riot.Lang.NQUADS;
    +import static org.apache.jena.riot.RDFDataMgr.write;
    +import static org.apache.jena.vocabulary.RDF.type;
    +
    +import java.io.*;
    +import java.nio.file.Path;
    +
    +import org.apache.jena.assembler.JA;
    +import org.apache.jena.assembler.exceptions.CannotConstructException;
    +import org.apache.jena.query.Dataset;
    +import org.apache.jena.rdf.model.*;
    +import org.apache.jena.sparql.core.*;
    +import org.apache.jena.sparql.sse.SSE;
    +import org.apache.jena.sparql.util.IsoMatcher;
    +import org.apache.jena.system.Txn;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +public class TestDatasetAssembler extends Assert {
    --- End diff --
    
    Having one or two tests that works from a Turtle string or would make for clear tests of expected use.
    
    With test for bad cases of mixed usage (general and TIM vocabulary together)  text written test cases will help generating tests.


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r155532650
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssemblerVocab.java ---
    @@ -31,7 +31,7 @@
         // General dataset
         public static final Resource tDataset            = ResourceFactory.createResource(NS+"RDFDataset") ;
         // In-memory dataset
    -    public static final Resource tDatasetTxnMem      = ResourceFactory.createResource(NS+"DatasetTxnMem") ;
    +    public static final Resource tMemoryDataset      = ResourceFactory.createResource(NS+"MemoryDataset") ;
    --- End diff --
    
    I've put back `tDatasetTxnMem`.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152311436
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
             Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    -
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    +            // Load data into the default graph or quads into the dataset.
    +            multiValueAsString(root, data)
    +                .forEach(dataURI -> read(ds, dataURI));
    +    
    --- End diff --
    
    Okay, if I get what you are saying, it's:
    
    1. Check to see if quads are being loaded, if so, TIM. 
    2. Otherwise, check the named graphs. If they are all `ja:data` guys, then TIM again.
    3. Otherwise, general dataset.
    
    I'll get on this later today.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152397704
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
             Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    -
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    +            // Load data into the default graph or quads into the dataset.
    +            multiValueAsString(root, data)
    +                .forEach(dataURI -> read(ds, dataURI));
    +    
    --- End diff --
    
    @afs What's a good idiom for switching to a new assembler? In other words, let's say the code tests for the presence of quads and finds them and it's time to pivot to TIM. Obviously, I could just `new InMemDatasetAssembler()`, but I'm thinking there must be a more elegant way. I looked at `AssemblerUtils` but only saw ways to register `Assembler`s, not retrieve them…


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153816757
  
    --- Diff: jena-core/src/main/java/org/apache/jena/assembler/assemblers/AssemblerBase.java ---
    @@ -63,11 +63,16 @@ protected static RDFNode getUnique( Resource root, Property property )
             throw new NotUniqueException( root, property );
             }
     
    -    protected void checkType( Resource root, Resource type )
    -        {
    -        if (!root.hasProperty( RDF.type, type ))
    -            throw new CannotConstructException( this.getClass(), root, type );
    -        }
    +    /**
    +     * Throws {@link CannotConstructException} if the offered resource isn't any of the offered types.
    +     * 
    +     * @param root resource to check
    +     * @param types types for which to check
    +     */
    +    protected void checkType( Resource root, Resource... types ) {
    --- End diff --
    
    See the changes I made to [InMemDatasetAssembler](https://github.com/apache/jena/pull/314/files#diff-296ca0303990cf5f435982192c1f2563R45). That's where.
    
    This happens because if you pass assembler RDF with `RDFDataset`, it could become a general-purpose dataset or it could become a TIM dataset, and so the TIM assembler has to be able to accept both types. IOW, "An assembler resource must have one type", yes, but the more important point is that the assembler code has to be able to _accept_ more than one type.
    
    We can change the semantics, instead, so that there is a one-to-one between types and assembler classes. and I am increasingly convinced we should, because if _we_ are getting a bit confused about this, it's not going to be easy for users!


---

[GitHub] jena issue #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/314
  
    Ouch! I realize I wasn't clear enough at all in my last comment: I was actually -1 on this. I think the confusion that `ja:data` now has in its semantics is too much, and I completely disagree with:
    > We can't change the predicates due to existing deployed assemblers
    
    We change the public API when we must, we just do it carefully and after clear depreciation with a clear migration path. What's more, the docs have been wrong about the assembler type for TIM for a year or more, so I don't think there will be users waiting with pitchforks and torches!
    
    I'd like to either revert this merge or add a further commit adding the use of `ja:graph` to TIM's assembler (just like the general dataset assembler) and depreciating the use of `ja:data` for anything but quads with the expectation that we will disallow it in a near-future version. I can do that after a release, if that is easier.


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153745134
  
    --- Diff: jena-core/src/main/java/org/apache/jena/assembler/assemblers/AssemblerBase.java ---
    @@ -63,11 +63,16 @@ protected static RDFNode getUnique( Resource root, Property property )
             throw new NotUniqueException( root, property );
             }
     
    -    protected void checkType( Resource root, Resource type )
    -        {
    -        if (!root.hasProperty( RDF.type, type ))
    -            throw new CannotConstructException( this.getClass(), root, type );
    -        }
    +    /**
    +     * Throws {@link CannotConstructException} if the offered resource isn't any of the offered types.
    +     * 
    +     * @param root resource to check
    +     * @param types types for which to check
    +     */
    +    protected void checkType( Resource root, Resource... types ) {
    --- End diff --
    
    I don't understand this.  An assembler resource must have one type so that there is a unique mapping to the implementation to call.  I don't see in the code anywhere calling with 2+ arguments.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153813914
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -26,25 +30,29 @@
     import org.apache.jena.atlas.logging.Log ;
     import org.apache.jena.query.Dataset ;
     import org.apache.jena.query.DatasetFactory ;
    -import org.apache.jena.rdf.model.Model ;
    -import org.apache.jena.rdf.model.RDFNode ;
    -import org.apache.jena.rdf.model.Resource ;
    +import org.apache.jena.rdf.model.*;
     import org.apache.jena.sparql.graph.GraphFactory ;
     import org.apache.jena.sparql.util.FmtUtils ;
     import org.apache.jena.sparql.util.graph.GraphUtils ;
    +import org.apache.jena.system.Txn;
     
     public class DatasetAssembler extends AssemblerBase implements Assembler {
         public static Resource getType() {
             return DatasetAssemblerVocab.tDataset ;
         }
     
         @Override
    -    public Object open(Assembler a, Resource root, Mode mode) {
    +    public Dataset open(Assembler a, Resource root, Mode mode) {
             Dataset ds = createDataset(a, root, mode) ;
             return ds ;
         }
     
         public Dataset createDataset(Assembler a, Resource root, Mode mode) {
    +        checkType(root, DatasetAssemblerVocab.tDataset);
    +        // use TIM if quads are loaded or if all named Graphs are loaded via data property
    +        final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
    --- End diff --
    
    i don't understand the comment-- are you suggesting to use SPARQL instead of the API? If we are missing a test here, it means that I did not understand your plan for how to "distribute" the predicates amongst the types of dataset. In fact, I'm starting to wonder if that's what we should do after all, or whether it would be better to deprecate some of them, since they overlap, or confine the meanings of those that overlap so that they don't...


---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    @ajs6f I do not understand your point - it is to abstract talking about "predicates". Some examples maybe? 
    
    How do you deprecate in Turtle assembler files?
    
    `ja:graph` isn't unused by TIM.  Did you mean `namedGraph`?
    
    I don't understand about `ja:data` and quads, triples - that seems natural. It would be simple to check the source if triples for a TIM graph.  Note: loading quads should accept triples - a single graph in the dataset is the most common use case.
    
    My original PR #313 was just to make {{ja:data "filename"}} work, based on experience insisting on `<file:>` - that has relative name confusion.



---

[GitHub] jena issue #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/314
  
    Okay, then let's break this thing up, as you describe above. Your number 2 and 3 are uncontroversial. As for number 1, sure, I want that fix too, but this PR (#314) didn't do just that. It also conflated the assembler RDF for these two types of dataset and I now see that TIM's use of `ja:data` for two purposes is a big part of that. Let's revert this and merge #313 instead and we can take up JENA-1445 after 3.6.0. 



---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153814226
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssemblerVocab.java ---
    @@ -31,7 +31,7 @@
         // General dataset
         public static final Resource tDataset            = ResourceFactory.createResource(NS+"RDFDataset") ;
         // In-memory dataset
    -    public static final Resource tDatasetTxnMem      = ResourceFactory.createResource(NS+"DatasetTxnMem") ;
    +    public static final Resource tMemoryDataset      = ResourceFactory.createResource(NS+"MemoryDataset") ;
    --- End diff --
    
    Yes, we can do that.


---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    Can we separate the different parts of this and unblock 3.6.0? 
    
    Is the following acceptable for 3.6.0:
    
    1. #314 fixed a specific problem - `<file:filename>` and "filename" behave differently for resolution when the assembler file is not in the current directory of the JVM as happens in Fuseki.
    1. Make `ja:MemoryDataset` the class for the TIM assembler, with `ja:DatasetTxnMem` registered for backwards compatibility.
    1. Add the Fuseki example for using `ja:MemoryDataset`.



---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152246555
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
             Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    -
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    +            // Load data into the default graph or quads into the dataset.
    +            multiValueAsString(root, data)
    +                .forEach(dataURI -> read(ds, dataURI));
    +    
    --- End diff --
    
    This interacts with the setting up of the dft graph/named graphs. It's going to add dft graph just made (which might have content via `ja:content) but also the named graphs.  It's a dataset by links to the models where adding a model replaces the model.
    
    I'm not sure what he best balance is here -- options include (1) warning and no action on seeing `ja:data` (with a note to use TIM) (2) do after all the named models are added because parsing is adding quads, not replacing models (3) allow this or specified models, which would be better done with TIM (4) it's `ja:data` or specify graphs, not a combination; if `ja:data` return a TIM.
    
    (4) looks like a good way to encourage TIM usage. If (4) then a plain, empty `ja:RDFDataset` could be TIM as well.



---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152284128
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
             Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    -
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    +            // Load data into the default graph or quads into the dataset.
    +            multiValueAsString(root, data)
    +                .forEach(dataURI -> read(ds, dataURI));
    +    
    --- End diff --
    
    I'm trying to think of a use case for "load quads into non-TIM" and the one that occurs to me is in an embedded or integrated situation where you have a lot of quads, like so many that you prefer the memory-parsimonious-ness of the general IM dataset, maybe because you have other processes running in the system. Sound likely enough to merit (2)? 


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153754571
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssemblerVocab.java ---
    @@ -31,7 +31,7 @@
         // General dataset
         public static final Resource tDataset            = ResourceFactory.createResource(NS+"RDFDataset") ;
         // In-memory dataset
    -    public static final Resource tDatasetTxnMem      = ResourceFactory.createResource(NS+"DatasetTxnMem") ;
    +    public static final Resource tMemoryDataset      = ResourceFactory.createResource(NS+"MemoryDataset") ;
    --- End diff --
    
    @ajs6f correctly pointed out that `DatasetTxnMem` was in the documentation so migration support would be good such as register `InMemDatasetAssembler` under both names and log a warning if called as `ja:DatasetTxnMem`
    
    (And text for the release notes)


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r154233681
  
    --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/assembler/TestDatasetAssembler.java ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.sparql.core.assembler;
    +
    +import static java.nio.file.Files.createTempFile;
    +import static org.apache.jena.assembler.JA.data;
    +import static org.apache.jena.assembler.Mode.DEFAULT;
    +import static org.apache.jena.rdf.model.ModelFactory.createDefaultModel;
    +import static org.apache.jena.riot.Lang.NQUADS;
    +import static org.apache.jena.riot.RDFDataMgr.write;
    +import static org.apache.jena.vocabulary.RDF.type;
    +
    +import java.io.*;
    +import java.nio.file.Path;
    +
    +import org.apache.jena.assembler.JA;
    +import org.apache.jena.assembler.exceptions.CannotConstructException;
    +import org.apache.jena.query.Dataset;
    +import org.apache.jena.rdf.model.*;
    +import org.apache.jena.sparql.core.*;
    +import org.apache.jena.sparql.sse.SSE;
    +import org.apache.jena.sparql.util.IsoMatcher;
    +import org.apache.jena.system.Txn;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +public class TestDatasetAssembler extends Assert {
    --- End diff --
    
    I was lookign for tests that do both `ja:data` and `ja:defaultGraph`, which should be an error. I saw someone asking recently trying with `ja:defaultGraph` and TIM assemble, which just ignores unknown properties.  Easy confusion to make - so deliberately testing making an error seems sensible.


---

[GitHub] jena issue #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/314
  
    I'm saying that `ja:data` is now used for loading quads into two different kinds of datasets, but loading triples into only one. That's confusing at best. I want to make it clearer by getting rid of the second use, because we already have `ja:graph`, and then what about `ja:content`? But I'm happy to hear other ways to clarify the semantics-- I just think that we've made things more powerful but more confusing with this PR. I want to keep the power (load quads easily, which we just did) while not making it harder to understand how to build assembler RDF (which I think we just did).
    
    Re: deprecation: we can deprecate with comments and by throwing warnings from inside the assembler code. And again, we have good reason (the mistake in the docs that no one noticed enough to complain about) to think that it won't be too disruptive.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153815022
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/InMemDatasetAssembler.java ---
    @@ -20,54 +20,45 @@
     
     import static org.apache.jena.assembler.JA.data;
     import static org.apache.jena.query.DatasetFactory.createTxnMem;
    -import static org.apache.jena.query.ReadWrite.WRITE;
     import static org.apache.jena.riot.RDFDataMgr.read;
     import static org.apache.jena.sparql.core.assembler.AssemblerUtils.setContext;
    -import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.pGraphName;
    -import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.pNamedGraph;
    -import static org.apache.jena.sparql.util.graph.GraphUtils.getAsStringValue;
    -import static org.apache.jena.sparql.util.graph.GraphUtils.multiValueResource;
    +import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.*;
    +import static org.apache.jena.sparql.util.graph.GraphUtils.*;
     
     import org.apache.jena.assembler.Assembler;
     import org.apache.jena.assembler.Mode;
    -import org.apache.jena.assembler.assemblers.AssemblerBase;
     import org.apache.jena.query.Dataset;
     import org.apache.jena.rdf.model.Resource;
    +import org.apache.jena.system.Txn;
     
     /**
      * An {@link Assembler} that creates in-memory {@link Dataset}s.
      */
    -public class InMemDatasetAssembler extends AssemblerBase implements Assembler {
    +public class InMemDatasetAssembler extends DatasetAssembler {
     
         public static Resource getType() {
    -        return DatasetAssemblerVocab.tDatasetTxnMem ;
    +        return DatasetAssemblerVocab.tMemoryDataset ;
         }
         
     	@Override
     	public Dataset open(final Assembler assembler, final Resource root, final Mode mode) {
    -		checkType(root, DatasetAssemblerVocab.tDatasetTxnMem);
    --- End diff --
    
    One of my frustrations with this is that there were no tests of these assemblers at all when I started, so I'm not very sure about the current expected/guaranteed behavior. WRT to our comments above about the use of predicates, should we start migrating people away from `ja:graph`, or (probably better) instead restricting `ja:data` to _only_ loading quads into a dataset?


---

[GitHub] jena issue #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/314
  
    1. The two different datasets at TIM and general. The fact that the code for the general dataset assembler doesn't mention `ja:data` is fine, but because TIM's assembler does understand `ja:data` and some assembler RDF with the general dataset type _can_ get shunted to TIM, `ja:data` can end up being successfully used with well-defined semantics in assembler RDF with the general dataset type.
    
    This is what I mean by the whole thing having gotten a bit confusing! :grin: 
    
    I agree fully that we need to be able to do what you clearly intend to do with your example. I just want the the example to look like:
    ```
    <#dataset> a ja:MemoryDataset ;
      ja:data "File for the default graph - also data for dataset" ;
      ja:namedGraph [ ja:graphName "http://example/graph1"; ja:graph "File for graph1" ] ;
      ja:namedGraph [ ja:graphName "http://example/graph2"; ja:graph "File for graph2" ] ;
    .
    ```
    instead so that TIM and general look the same from the assembler POV.
    
    I don't want to block 3.6.0 on this, so what do you say to your three points above, plus (after 3.6.0)
    
    4. We gracefully deprecate the use of `ja:data` for anything other than loading quads into datasets and make TIM's assembler RDF look like the general dataset assembler RDF modulo the dataset type.
    
    Sound okay?


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153813903
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -26,25 +30,29 @@
     import org.apache.jena.atlas.logging.Log ;
     import org.apache.jena.query.Dataset ;
     import org.apache.jena.query.DatasetFactory ;
    -import org.apache.jena.rdf.model.Model ;
    -import org.apache.jena.rdf.model.RDFNode ;
    -import org.apache.jena.rdf.model.Resource ;
    +import org.apache.jena.rdf.model.*;
     import org.apache.jena.sparql.graph.GraphFactory ;
     import org.apache.jena.sparql.util.FmtUtils ;
     import org.apache.jena.sparql.util.graph.GraphUtils ;
    +import org.apache.jena.system.Txn;
     
     public class DatasetAssembler extends AssemblerBase implements Assembler {
         public static Resource getType() {
             return DatasetAssemblerVocab.tDataset ;
         }
     
         @Override
    -    public Object open(Assembler a, Resource root, Mode mode) {
    +    public Dataset open(Assembler a, Resource root, Mode mode) {
             Dataset ds = createDataset(a, root, mode) ;
             return ds ;
         }
     
         public Dataset createDataset(Assembler a, Resource root, Mode mode) {
    +        checkType(root, DatasetAssemblerVocab.tDataset);
    +        // use TIM if quads are loaded or if all named Graphs are loaded via data property
    +        final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
    +        if (root.hasProperty(data) || allNamedGraphsLoadViaData) return new InMemDatasetAssembler().open(a, root, mode);
    --- End diff --
    
    Much clearer than otherwise, to my eye. This is style, and I'm happy to change this to be clearer for you, but it's not an objective question.


---

[GitHub] jena issue #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/314
  
    @afs What do you think of that? It's clearer, I think, along the lines [you suggested](https://github.com/apache/jena/pull/314#discussion_r152289270).


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153751942
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/InMemDatasetAssembler.java ---
    @@ -20,54 +20,45 @@
     
     import static org.apache.jena.assembler.JA.data;
     import static org.apache.jena.query.DatasetFactory.createTxnMem;
    -import static org.apache.jena.query.ReadWrite.WRITE;
     import static org.apache.jena.riot.RDFDataMgr.read;
     import static org.apache.jena.sparql.core.assembler.AssemblerUtils.setContext;
    -import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.pGraphName;
    -import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.pNamedGraph;
    -import static org.apache.jena.sparql.util.graph.GraphUtils.getAsStringValue;
    -import static org.apache.jena.sparql.util.graph.GraphUtils.multiValueResource;
    +import static org.apache.jena.sparql.core.assembler.DatasetAssemblerVocab.*;
    +import static org.apache.jena.sparql.util.graph.GraphUtils.*;
     
     import org.apache.jena.assembler.Assembler;
     import org.apache.jena.assembler.Mode;
    -import org.apache.jena.assembler.assemblers.AssemblerBase;
     import org.apache.jena.query.Dataset;
     import org.apache.jena.rdf.model.Resource;
    +import org.apache.jena.system.Txn;
     
     /**
      * An {@link Assembler} that creates in-memory {@link Dataset}s.
      */
    -public class InMemDatasetAssembler extends AssemblerBase implements Assembler {
    +public class InMemDatasetAssembler extends DatasetAssembler {
     
         public static Resource getType() {
    -        return DatasetAssemblerVocab.tDatasetTxnMem ;
    +        return DatasetAssemblerVocab.tMemoryDataset ;
         }
         
     	@Override
     	public Dataset open(final Assembler assembler, final Resource root, final Mode mode) {
    -		checkType(root, DatasetAssemblerVocab.tDatasetTxnMem);
    --- End diff --
    
    We ought to also check for use of the RDFDataset vocabulary on a MemoryDataset. e.g. `ja:defaultGraph` and then `ja:namedGraph` pointing to `ja:graph` and a `ja:MemoryModel`.
    
    The reverse tests ought to be in general assembler.


---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    `ja:data` isn't mentioned in `DatasetAssembler` except to redirect to TIM.  Which two different kinds of dataset?
    
    Would you like to move  `public static final Property data = property( "data" );` out of JA.java  into the assembler vocab definitions file?
    
    We need a way to say:
    ```
    <#dataset> a ja:MemoryDataset ;
      ja:data "File for the default graph - also data for dataset" ;
      ja:namedGraph [ ja:graphName "http://example/graph1"; ja:data "File for graph1" ] ;
      ja:namedGraph [ ja:graphName "http://example/graph2"; ja:data "File for graph2" ] ;
    .
    ```
    because that gets a fully transactional dataset and I'd hope our preferred way to do have plain data in a memory dataset.  Having data-for-triples, data-for-quads gets weird when its an unknown URL to load from for the default graph.
    
    `ja:Content` is for `ja:Model`s in `ja:RDFDataset` only.  It is too old (= documented and written about outside the project) to drop. It is needed for inference setups.



---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    Merged in order to progress Jena 3.6.0.


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153753683
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -26,25 +30,29 @@
     import org.apache.jena.atlas.logging.Log ;
     import org.apache.jena.query.Dataset ;
     import org.apache.jena.query.DatasetFactory ;
    -import org.apache.jena.rdf.model.Model ;
    -import org.apache.jena.rdf.model.RDFNode ;
    -import org.apache.jena.rdf.model.Resource ;
    +import org.apache.jena.rdf.model.*;
     import org.apache.jena.sparql.graph.GraphFactory ;
     import org.apache.jena.sparql.util.FmtUtils ;
     import org.apache.jena.sparql.util.graph.GraphUtils ;
    +import org.apache.jena.system.Txn;
     
     public class DatasetAssembler extends AssemblerBase implements Assembler {
         public static Resource getType() {
             return DatasetAssemblerVocab.tDataset ;
         }
     
         @Override
    -    public Object open(Assembler a, Resource root, Mode mode) {
    +    public Dataset open(Assembler a, Resource root, Mode mode) {
             Dataset ds = createDataset(a, root, mode) ;
             return ds ;
         }
     
         public Dataset createDataset(Assembler a, Resource root, Mode mode) {
    +        checkType(root, DatasetAssemblerVocab.tDataset);
    +        // use TIM if quads are loaded or if all named Graphs are loaded via data property
    +        final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
    --- End diff --
    
    A SPARQL query would be quite nicely here!
     
    The test needs to include checking and rejecting mixed cases like `ja:defaultGraph` and `ja:data` on the dataset resource and both `ja:graph` `ja:data` from the object of `ja:namedGraph`.
    
    Similarly, `tMemoryDataset` needs to check for general vocab.
    
    That is, two tests: "hasGeneralDatasetVocab" and "isMemoryDatasetVocab" then check not true/true or false/false.


---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    Required reverts done, #313 applied and `ja:MemoryDataset` made the primary type name for TIM assembler.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152283404
  
    --- Diff: jena-fuseki2/examples/fuseki-in-mem-txn.ttl ---
    @@ -0,0 +1,24 @@
    +## Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
    +
    +@prefix :        <#> .
    +@prefix fuseki:  <http://jena.apache.org/fuseki#> .
    +@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    +@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
    +@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
    +@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
    +
    +<#serviceInMemory> rdf:type fuseki:Service;
    +    rdfs:label                   "In-memory, trasnactioal dataset.";
    +    fuseki:name                  "ds";
    +    fuseki:serviceQuery          "query";
    +    fuseki:serviceQuery          "sparql";
    +    fuseki:serviceUpdate         "update";
    +    fuseki:serviceUpload         "upload" ;
    +    fuseki:serviceReadGraphStore "data" ;
    +    fuseki:serviceReadGraphStore "get" ;
    +    fuseki:dataset <#dataset> ;
    +.
    +
    +<#dataset> rdf:type ja:DatasetTxnMem;
    --- End diff --
    
    Nice catch, thanks, fixed.


---

Re: [GitHub] jena pull request #314: JENA-1430

Posted by Andy Seaborne <an...@apache.org>.
Claude - it isn't subclassed.

Some tests are of assemblers themselves reading real, normal RDF data 
files because that is what the assembler is supplied to do.  It isn't a 
case of putting in a jar.

     Andy

On 07/12/17 15:25, Claude Warren wrote:
> Q:  What about when the tests are used by subclasses in a different
> package.  The test files would then need to be loaded from the class path.
> I ran into this issue trying to run tests on graph implementations before.
> However, it my not apply in this specific case with assemblers.  So I am
> asking, is there an issue here if the test is subclassed in a different
> project and includes these tests via the test-jar packaging?
> 
> Claude
> 
> 
> On Thu, Nov 30, 2017 at 11:38 PM, afs <gi...@git.apache.org> wrote:
> 
>> Github user afs commented on a diff in the pull request:
>>
>>      https://github.com/apache/jena/pull/314#discussion_r154233274
>>
>>      --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/
>> assembler/TestDatasetAssembler.java ---
>>      @@ -0,0 +1,122 @@
>>      +/*
>>      + * Licensed to the Apache Software Foundation (ASF) under one
>>      + * or more contributor license agreements.  See the NOTICE file
>>      + * distributed with this work for additional information
>>      + * regarding copyright ownership.  The ASF licenses this file
>>      + * to you under the Apache License, Version 2.0 (the
>>      + * "License"); you may not use this file except in compliance
>>      + * with the License.  You may obtain a copy of the License at
>>      + *
>>      + *     http://www.apache.org/licenses/LICENSE-2.0
>>      + *
>>      + * Unless required by applicable law or agreed to in writing, software
>>      + * distributed under the License is distributed on an "AS IS" BASIS,
>>      + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> implied.
>>      + * See the License for the specific language governing permissions and
>>      + * limitations under the License.
>>      + */
>>      +package org.apache.jena.sparql.core.assembler;
>>      +
>>      +import static java.nio.file.Files.createTempFile;
>>      +import static org.apache.jena.assembler.JA.data;
>>      +import static org.apache.jena.assembler.Mode.DEFAULT;
>>      +import static org.apache.jena.rdf.model.ModelFactory.
>> createDefaultModel;
>>      +import static org.apache.jena.riot.Lang.NQUADS;
>>      +import static org.apache.jena.riot.RDFDataMgr.write;
>>      +import static org.apache.jena.vocabulary.RDF.type;
>>      +
>>      +import java.io.*;
>>      +import java.nio.file.Path;
>>      +
>>      +import org.apache.jena.assembler.JA;
>>      +import org.apache.jena.assembler.exceptions.CannotConstructException;
>>      +import org.apache.jena.query.Dataset;
>>      +import org.apache.jena.rdf.model.*;
>>      +import org.apache.jena.sparql.core.*;
>>      +import org.apache.jena.sparql.sse.SSE;
>>      +import org.apache.jena.sparql.util.IsoMatcher;
>>      +import org.apache.jena.system.Txn;
>>      +import org.junit.Assert;
>>      +import org.junit.Test;
>>      +
>>      +public class TestDatasetAssembler extends Assert {
>>      --- End diff --
>>
>>      Generally, tests are loaded from a `testing/` directory, not the
>> classpath.  Some tests need to read real, normal files and also the W3C
>> SPARQL tests run from on-disk files and manifests.  Once some tests do need
>> files, there isn't much value in classpath loading as its extra, not
>> instead of.  Assemblers need to read from files, it being the common
>> usecase.
>>
>>
>>
>>
>> ---
>>
> 
> 
> 

Re: [GitHub] jena pull request #314: JENA-1430

Posted by Claude Warren <cl...@xenei.com>.
Q:  What about when the tests are used by subclasses in a different
package.  The test files would then need to be loaded from the class path.
I ran into this issue trying to run tests on graph implementations before.
However, it my not apply in this specific case with assemblers.  So I am
asking, is there an issue here if the test is subclassed in a different
project and includes these tests via the test-jar packaging?

Claude


On Thu, Nov 30, 2017 at 11:38 PM, afs <gi...@git.apache.org> wrote:

> Github user afs commented on a diff in the pull request:
>
>     https://github.com/apache/jena/pull/314#discussion_r154233274
>
>     --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/
> assembler/TestDatasetAssembler.java ---
>     @@ -0,0 +1,122 @@
>     +/*
>     + * Licensed to the Apache Software Foundation (ASF) under one
>     + * or more contributor license agreements.  See the NOTICE file
>     + * distributed with this work for additional information
>     + * regarding copyright ownership.  The ASF licenses this file
>     + * to you under the Apache License, Version 2.0 (the
>     + * "License"); you may not use this file except in compliance
>     + * with the License.  You may obtain a copy of the License at
>     + *
>     + *     http://www.apache.org/licenses/LICENSE-2.0
>     + *
>     + * Unless required by applicable law or agreed to in writing, software
>     + * distributed under the License is distributed on an "AS IS" BASIS,
>     + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
>     + * See the License for the specific language governing permissions and
>     + * limitations under the License.
>     + */
>     +package org.apache.jena.sparql.core.assembler;
>     +
>     +import static java.nio.file.Files.createTempFile;
>     +import static org.apache.jena.assembler.JA.data;
>     +import static org.apache.jena.assembler.Mode.DEFAULT;
>     +import static org.apache.jena.rdf.model.ModelFactory.
> createDefaultModel;
>     +import static org.apache.jena.riot.Lang.NQUADS;
>     +import static org.apache.jena.riot.RDFDataMgr.write;
>     +import static org.apache.jena.vocabulary.RDF.type;
>     +
>     +import java.io.*;
>     +import java.nio.file.Path;
>     +
>     +import org.apache.jena.assembler.JA;
>     +import org.apache.jena.assembler.exceptions.CannotConstructException;
>     +import org.apache.jena.query.Dataset;
>     +import org.apache.jena.rdf.model.*;
>     +import org.apache.jena.sparql.core.*;
>     +import org.apache.jena.sparql.sse.SSE;
>     +import org.apache.jena.sparql.util.IsoMatcher;
>     +import org.apache.jena.system.Txn;
>     +import org.junit.Assert;
>     +import org.junit.Test;
>     +
>     +public class TestDatasetAssembler extends Assert {
>     --- End diff --
>
>     Generally, tests are loaded from a `testing/` directory, not the
> classpath.  Some tests need to read real, normal files and also the W3C
> SPARQL tests run from on-disk files and manifests.  Once some tests do need
> files, there isn't much value in classpath loading as its extra, not
> instead of.  Assemblers need to read from files, it being the common
> usecase.
>
>
>
>
> ---
>



-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r154233274
  
    --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/assembler/TestDatasetAssembler.java ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.sparql.core.assembler;
    +
    +import static java.nio.file.Files.createTempFile;
    +import static org.apache.jena.assembler.JA.data;
    +import static org.apache.jena.assembler.Mode.DEFAULT;
    +import static org.apache.jena.rdf.model.ModelFactory.createDefaultModel;
    +import static org.apache.jena.riot.Lang.NQUADS;
    +import static org.apache.jena.riot.RDFDataMgr.write;
    +import static org.apache.jena.vocabulary.RDF.type;
    +
    +import java.io.*;
    +import java.nio.file.Path;
    +
    +import org.apache.jena.assembler.JA;
    +import org.apache.jena.assembler.exceptions.CannotConstructException;
    +import org.apache.jena.query.Dataset;
    +import org.apache.jena.rdf.model.*;
    +import org.apache.jena.sparql.core.*;
    +import org.apache.jena.sparql.sse.SSE;
    +import org.apache.jena.sparql.util.IsoMatcher;
    +import org.apache.jena.system.Txn;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +public class TestDatasetAssembler extends Assert {
    --- End diff --
    
    Generally, tests are loaded from a `testing/` directory, not the classpath.  Some tests need to read real, normal files and also the W3C SPARQL tests run from on-disk files and manifests.  Once some tests do need files, there isn't much value in classpath loading as its extra, not instead of.  Assemblers need to read from files, it being the common usecase.
    



---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152289270
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
             Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    -
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    +            // Load data into the default graph or quads into the dataset.
    +            multiValueAsString(root, data)
    +                .forEach(dataURI -> read(ds, dataURI));
    +    
    --- End diff --
    
    That is to load specialised graphs e.g. inference? In which case the graphs themselves need to be pre-created before reading data. We already have a mechanism for graphs with `ja:content` on the graph description. Having two ways seems bad. Strange things will happen if a graph is in the quads and not explicitly created (it will be a plain graph when inference expected). It could be done be splitting the quads into separate graphs and generating the assembler text. 
    
    It might make sense to have some plain graphs and specialist graphs, that is a disjoint set, but to enforce this to stop accidents is not so easy without a lot of additional machinery to enforce disjoint.
    
    How about doing this progressively - we make ja:RDFDataset create either TIM, or linked datasets. And for now, it is one or the other, not a combination. (We can have per graph data loading into TIM as it does at the moment with some care - look into the namedGhraph property to see if it has `ja:data`+name, and only those, if not, it's a general model).
    
    We leave more exotic combinations until we get a request because that may help choose what to do with the corner cases like whether they overlap or not. If we provide now, the corner cases are fixed by compatibility.
    



---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153758968
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -57,28 +65,31 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
             else
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
    -        Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    +        Dataset ds = DatasetFactory.create(dftModel);
     
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    --- End diff --
    
    Side note: Because the general dataset can't support abort, a `Txn` is nice but of little help; if anything goes wrong assembling, it's broken.


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r152234360
  
    --- Diff: jena-fuseki2/examples/fuseki-in-mem-txn.ttl ---
    @@ -0,0 +1,24 @@
    +## Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
    +
    +@prefix :        <#> .
    +@prefix fuseki:  <http://jena.apache.org/fuseki#> .
    +@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    +@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
    +@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
    +@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
    +
    +<#serviceInMemory> rdf:type fuseki:Service;
    +    rdfs:label                   "In-memory, trasnactioal dataset.";
    +    fuseki:name                  "ds";
    +    fuseki:serviceQuery          "query";
    +    fuseki:serviceQuery          "sparql";
    +    fuseki:serviceUpdate         "update";
    +    fuseki:serviceUpload         "upload" ;
    +    fuseki:serviceReadGraphStore "data" ;
    +    fuseki:serviceReadGraphStore "get" ;
    +    fuseki:dataset <#dataset> ;
    +.
    +
    +<#dataset> rdf:type ja:DatasetTxnMem;
    --- End diff --
    
    This needs to change to `ja:MemoryDataset`.



---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153814026
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -57,28 +65,31 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) {
             else
                 // Assembler description did not define one.
                 dftModel = GraphFactory.makeDefaultModel() ;
    -        Dataset ds = DatasetFactory.create(dftModel) ;
    -        // -------- Named graphs
    -        List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ;
    -        for ( RDFNode n : nodes ) {
    -            if ( !(n instanceof Resource) )
    -                throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n));
    -            Resource r = (Resource)n;
    +        Dataset ds = DatasetFactory.create(dftModel);
     
    -            String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName);
    -            Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph);
    -            if ( g == null ) {
    -                g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt);
    -                if ( g != null ) {
    -                    Log.warn(this, "Use of old vocabulary: use :graph not :graphData");
    -                } else {
    -                    throw new DatasetAssemblerException(root, "no graph for: " + gName);
    +        Txn.executeWrite(ds, () -> {
    --- End diff --
    
    Yes, I just use it more or less for uniformity. I want our code to model best practices.


---

[GitHub] jena issue #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/314
  
    I'm certainly willing to discuss it, that's what JENA-1445 is about and we can rename it to be more accurate.
    
    I'm not able to agree to the details at this stage. I have several outstanding questions and also outstanding points raised with both its name and semantics. There is also the status of `ja:externalContent` playing into the mix which hasn't been explained.
    
    



---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r155532515
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -26,25 +30,29 @@
     import org.apache.jena.atlas.logging.Log ;
     import org.apache.jena.query.Dataset ;
     import org.apache.jena.query.DatasetFactory ;
    -import org.apache.jena.rdf.model.Model ;
    -import org.apache.jena.rdf.model.RDFNode ;
    -import org.apache.jena.rdf.model.Resource ;
    +import org.apache.jena.rdf.model.*;
     import org.apache.jena.sparql.graph.GraphFactory ;
     import org.apache.jena.sparql.util.FmtUtils ;
     import org.apache.jena.sparql.util.graph.GraphUtils ;
    +import org.apache.jena.system.Txn;
     
     public class DatasetAssembler extends AssemblerBase implements Assembler {
         public static Resource getType() {
             return DatasetAssemblerVocab.tDataset ;
         }
     
         @Override
    -    public Object open(Assembler a, Resource root, Mode mode) {
    +    public Dataset open(Assembler a, Resource root, Mode mode) {
             Dataset ds = createDataset(a, root, mode) ;
             return ds ;
         }
     
         public Dataset createDataset(Assembler a, Resource root, Mode mode) {
    +        checkType(root, DatasetAssemblerVocab.tDataset);
    +        // use TIM if quads are loaded or if all named Graphs are loaded via data property
    +        final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
    --- End diff --
    
    We can't change the predicates due to existing deployed assemblers.  We can check for consistency e.g. currently, this works:
    ```
    <#dataset> rdf:type ja:MemoryDataset ;
       ja:data <file:D.trig>;
       ja:namedGraph [ ];
    .
    ```
    (loops on properties of the object of `namedGraph` so if there are none ...)
    as does this:
    ```
    <#dataset> rdf:type ja:MemoryDataset ;
       ja:data <file:D.trig>;
       ja:defaultGraph [ a ja:MemoryModel ];
    .
    
    ```
    (ignores `ja:defaultGraph`)


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153753744
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java ---
    @@ -26,25 +30,29 @@
     import org.apache.jena.atlas.logging.Log ;
     import org.apache.jena.query.Dataset ;
     import org.apache.jena.query.DatasetFactory ;
    -import org.apache.jena.rdf.model.Model ;
    -import org.apache.jena.rdf.model.RDFNode ;
    -import org.apache.jena.rdf.model.Resource ;
    +import org.apache.jena.rdf.model.*;
     import org.apache.jena.sparql.graph.GraphFactory ;
     import org.apache.jena.sparql.util.FmtUtils ;
     import org.apache.jena.sparql.util.graph.GraphUtils ;
    +import org.apache.jena.system.Txn;
     
     public class DatasetAssembler extends AssemblerBase implements Assembler {
         public static Resource getType() {
             return DatasetAssemblerVocab.tDataset ;
         }
     
         @Override
    -    public Object open(Assembler a, Resource root, Mode mode) {
    +    public Dataset open(Assembler a, Resource root, Mode mode) {
             Dataset ds = createDataset(a, root, mode) ;
             return ds ;
         }
     
         public Dataset createDataset(Assembler a, Resource root, Mode mode) {
    +        checkType(root, DatasetAssemblerVocab.tDataset);
    +        // use TIM if quads are loaded or if all named Graphs are loaded via data property
    +        final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
    +        if (root.hasProperty(data) || allNamedGraphsLoadViaData) return new InMemDatasetAssembler().open(a, root, mode);
    --- End diff --
    
    All on one line is really quite unclear. Have to live with Java.


---

[GitHub] jena pull request #314: JENA-1430

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153815330
  
    --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/assembler/TestDatasetAssembler.java ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.sparql.core.assembler;
    +
    +import static java.nio.file.Files.createTempFile;
    +import static org.apache.jena.assembler.JA.data;
    +import static org.apache.jena.assembler.Mode.DEFAULT;
    +import static org.apache.jena.rdf.model.ModelFactory.createDefaultModel;
    +import static org.apache.jena.riot.Lang.NQUADS;
    +import static org.apache.jena.riot.RDFDataMgr.write;
    +import static org.apache.jena.vocabulary.RDF.type;
    +
    +import java.io.*;
    +import java.nio.file.Path;
    +
    +import org.apache.jena.assembler.JA;
    +import org.apache.jena.assembler.exceptions.CannotConstructException;
    +import org.apache.jena.query.Dataset;
    +import org.apache.jena.rdf.model.*;
    +import org.apache.jena.sparql.core.*;
    +import org.apache.jena.sparql.sse.SSE;
    +import org.apache.jena.sparql.util.IsoMatcher;
    +import org.apache.jena.system.Txn;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +public class TestDatasetAssembler extends Assert {
    --- End diff --
    
    Sure, a couple of Turtle assembler files would make for nice examples. Do we have a standard practice about how to load test files from the classpath during a build?
    
    I don't understand the second sentence. Can you restate that?


---

[GitHub] jena pull request #314: JENA-1430

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/314#discussion_r153745192
  
    --- Diff: jena-core/src/main/java/org/apache/jena/assembler/assemblers/AssemblerBase.java ---
    @@ -63,11 +63,16 @@ protected static RDFNode getUnique( Resource root, Property property )
             throw new NotUniqueException( root, property );
             }
     
    -    protected void checkType( Resource root, Resource type )
    -        {
    -        if (!root.hasProperty( RDF.type, type ))
    -            throw new CannotConstructException( this.getClass(), root, type );
    -        }
    +    /**
    +     * Throws {@link CannotConstructException} if the offered resource isn't any of the offered types.
    +     * 
    +     * @param root resource to check
    +     * @param types types for which to check
    +     */
    +    protected void checkType( Resource root, Resource... types ) {
    +        for (Resource type : types) if (root.hasProperty( RDF.type, type )) return;
    --- End diff --
    
    Everything on one line is a bit confusing!


---

[GitHub] jena pull request #314: JENA-1430

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/314


---