Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/05/07 13:34:00 UTC

[jira] [Commented] (JENA-1894) Insert-order preserving dataset

    [ https://issues.apache.org/jira/browse/JENA-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101681#comment-17101681 ] 

Andy Seaborne commented on JENA-1894:
-------------------------------------

{quote}Note that DatasetGraphQuadsImpl at present falsely claims that it is transaction aware - because otherwise any SPARQL insert caused an exception
{quote}

Which bit of code causes the exception?

All datasets should be "transactional", but a basic level of support can be provided with a generic implementation pattern: transactions implemented with a lock, assuming the dataset is not persistent (not ideal, but in the absence of anything else, MRSW, i.e. multiple-reader/single-writer, is the best that can be done).

See
[https://github.com/apache/jena/blob/ab7882a73445c7a75e811eb58d06211c410891b0/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetGraphMap.java#L73]
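The lock-based pattern can be illustrated with a minimal, standard-library-only sketch. This is not the code in DatasetGraphMap; the names MrswSketch, LockTransactional and Mode are hypothetical and chosen only for illustration. It assumes an in-memory store, so "commit" simply releases the lock and there is no rollback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of MRSW "transactions" built on a read/write lock.
// Not Jena code: all names here are assumptions made for illustration.
class MrswSketch {
    enum Mode { READ, WRITE }

    static class LockTransactional {
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        // The lock held by the current thread's transaction, if any.
        private final ThreadLocal<Lock> held = new ThreadLocal<>();

        void begin(Mode mode) {
            if (held.get() != null)
                throw new IllegalStateException("Already in a transaction");
            // Many readers may proceed together; a writer is exclusive.
            Lock lock = (mode == Mode.WRITE) ? rw.writeLock() : rw.readLock();
            lock.lock();
            held.set(lock);
        }

        // Nothing to flush for a non-persistent store: commit just ends.
        void commit() { end(); }

        // Safe to call more than once; releases the lock if still held.
        void end() {
            Lock lock = held.get();
            if (lock != null) {
                held.remove();
                lock.unlock();
            }
        }

        boolean isInTransaction() { return held.get() != null; }
    }

    public static void main(String[] args) {
        LockTransactional txn = new LockTransactional();
        List<String> store = new ArrayList<>();

        txn.begin(Mode.WRITE);
        try {
            store.add("quad-1");
            txn.commit();
        } finally {
            txn.end();
        }

        txn.begin(Mode.READ);
        try {
            System.out.println("size = " + store.size());
        } finally {
            txn.end();
        }
    }
}
```

Because changes are applied directly to the store, a failed write cannot be undone in this sketch; that is the "not ideal" part of the trade-off.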

Due to the hierarchy, history (pre-Java default methods) and Java's single inheritance, it's a case of copying the code. (I can see how to clear it up.)
 
There's more structure to DatasetGraph implementations in the newish DBOE module jena-dboe-storage; see, for example, [DatasetGraphStorage|https://github.com/apache/jena/blob/master/jena-db/jena-dboe-storage/src/main/java/org/apache/jena/dboe/storage/system/DatasetGraphStorage.java].
 

> Insert-order preserving dataset
> -------------------------------
>
>                 Key: JENA-1894
>                 URL: https://issues.apache.org/jira/browse/JENA-1894
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.14.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> To the best of my knowledge, there is no backend for datasets that retains insert order.
>  This feature is particularly useful when changing RDF files in a git repository, as it makes for nice commits. An insert-order preserving Triple/QuadTable implementation enables:
>  * Writing (subject-grouped) RDF files or events from an RDF stream out in nearly the same way they were read in - this makes it easier to compare outputs of data transformations
>  * Combining ORDER BY with CONSTRUCT queries:
> {code:java}
> Dataset ds = DatasetFactory.createOrderPreservingDataset();
> QueryExecutionFactory.create("CONSTRUCT WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o", ds);
> RDFDataMgr.write(System.out, ds, RDFFormat.TURTLE_BLOCKS);
> {code}
> I created an implementation of this some time ago; the main classes of the machinery are:
>  * [QuadTableFromNestedMaps.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/QuadTableFromNestedMaps.java#L26]
>  * In addition, I created a lazy (but adequate?) wrapper for re-using a quad table as a triple table:
>  [TripleTableFromQuadTable.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/TripleTableFromQuadTable.java#L30]
>  * The DatasetGraph wrapper:
>  [DatasetGraphQuadsImpl.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/DatasetGraphQuadsImpl.java#L32]
> The actual factory code then uses:
> {code:java}
>     public static DatasetGraph createOrderPreservingDatasetGraph() {
>         QuadTable quadTable = new QuadTableFromNestedMaps();
>         TripleTable tripleTable = new TripleTableFromQuadTable(quadTable);
>         DatasetGraph result = new DatasetGraphInMemory(quadTable, tripleTable);
>         return result;
>     }
> {code}
> Note that DatasetGraphQuadsImpl at present falsely claims that it is transaction aware - because otherwise any SPARQL insert caused an exception (I have not tried with the latest fixes for 3.15.0-SNAPSHOT yet). In any case, for the use case of writing out RDF, transactions may not even be necessary, but if there is an easy way to add them, then it should be done.
> An example of the above code in action is here: [Git Diff based on ordered turtle-blocks output |https://github.com/SmartDataAnalytics/lodservatory/commit/ec50cd33230a771c557c1ed2751799401ea3fd89]
> The downside of using this kind of order preserving dataset is that it essentially only features a gspo index. Hence, the performance characteristics of this kind of order preserving dataset - which is intended mostly for serialization or presentation - vary greatly from the query-optimized implementations.
> In any case, order preserving datasets are a highly useful feature for Jena and I'd gladly contribute a PR for that. My main questions are:
>  * How to call the factory methods in DatasetFactory, DatasetGraphFactory etc - createOrderPreservingDataset?
>  * Is the approach using QuadTableFromNestedMaps needed - or can a different implementation of QuadTable be repurposed?
>  * It seems that the abstract class DatasetGraphQuads does not have any implementation at least in ARQ and the jena modules I use (according to eclipse) - so my custom implementation of DatasetGraphQuadsImpl seems to be needed, or is there a similar class lying around in another jena package?
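
For readers unfamiliar with the nested-maps idea in the issue description, here is a rough sketch of an insert-order preserving gspo index. It is not the code from QuadTableFromNestedMaps; the class name OrderedQuadIndexSketch is hypothetical, and plain strings stand in for RDF nodes. It assumes one LinkedHashMap per level, which keeps first-insertion order within each level.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: graph -> subject -> predicate -> objects,
// with insertion order retained at every level. Strings stand in for nodes.
class OrderedQuadIndexSketch {
    private final Map<String, Map<String, Map<String, Set<String>>>> gspo =
            new LinkedHashMap<>();

    // Returns true if the quad was not already present.
    boolean add(String g, String s, String p, String o) {
        return gspo.computeIfAbsent(g, k -> new LinkedHashMap<>())
                   .computeIfAbsent(s, k -> new LinkedHashMap<>())
                   .computeIfAbsent(p, k -> new LinkedHashSet<>())
                   .add(o);
    }

    // Lists quads grouped by graph, then subject, then predicate,
    // each group in order of first insertion.
    List<String> list() {
        List<String> out = new ArrayList<>();
        gspo.forEach((g, sMap) ->
            sMap.forEach((s, pMap) ->
                pMap.forEach((p, objects) ->
                    objects.forEach(o -> out.add(g + " " + s + " " + p + " " + o)))));
        return out;
    }

    public static void main(String[] args) {
        OrderedQuadIndexSketch index = new OrderedQuadIndexSketch();
        index.add("g", "s1", "p", "o1");
        index.add("g", "s2", "p", "o2");
        index.add("g", "s1", "p", "o3");
        index.list().forEach(System.out::println);
    }
}
```

Note that the output is subject-grouped rather than strictly in insert order: a quad added later for an earlier subject is listed with that subject. Any lookup that does not start from the graph has to scan, which is the single-index limitation mentioned in the description.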


