Posted to dev@jena.apache.org by "Code Ferret (JIRA)" <ji...@apache.org> on 2018/02/18 21:06:00 UTC
[jira] [Closed] (JENA-1489) models written twice on RDFConnection
[ https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Code Ferret closed JENA-1489.
-----------------------------
Resolution: Fixed
User error
> models written twice on RDFConnection
> -------------------------------------
>
> Key: JENA-1489
> URL: https://issues.apache.org/jira/browse/JENA-1489
> Project: Apache Jena
> Issue Type: Bug
> Components: Fuseki, Jena, TDB
> Affects Versions: Jena 3.7.0
> Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3, Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
> Reporter: Code Ferret
> Priority: Major
>
> *Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and seeing doubling of blank nodes in _some_ graphs as though the same model is written a second time *after* a commit during the transfer. I apologize in advance for the length of this report.
> *Details*: We have a collection of entity types: Persons, Items, Works and so on. Each entity is a graph in a ttl file in a per type git repo. For each type, the ttl files are read from the corresponding repo into models and the models are added to a {{Dataset}} until the number of triples in the dataset exceeds a threshold, e.g., 50,000 triples. When the threshold is exceeded then the dataset is loaded to Fuseki via an RDFConnection:
> {code:java}
> fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query", baseUrl+"/update", baseUrl+"/data");
> {code}
> which is opened once at the beginning of loading all entity types. The kernel of loading is performed via:
> {code:java}
> private static void loadDatasetSimple(final Dataset ds) {
>     if (!fuConn.isInTransaction()) {
>         fuConn.begin(ReadWrite.WRITE);
>     }
>     fuConn.loadDataset(ds);
>     fuConn.commit();
> }
> {code}
> {{loadDatasetSimple}} is called until all of the entities of a given type have been loaded from the corresponding repo. Because some models may still be untransferred once all of the entities of a given type have been read in, a finish method is then called:
> {code:java}
> static void finishDatasetTransfers() {
>     // if map is not empty, transfer the last one
>     if (currentDataset != null) {
>         loadDatasetSimple(currentDataset);
>     }
> }
> {code}
> After loading a given type of entity the next type in a list of types to transfer is processed as described above and this is when the problem is noticed.
> Once enough models of the next type have been added to the transfer dataset and that dataset is transferred via {{loadDatasetSimple}} then _some_ of the previously transferred graphs exhibit doubled blank nodes. Here is {{describe bdr:P58}} to illustrate the doubling:
> {code:java}
> @prefix : <http://purl.bdrc.io/ontology/core/> .
> @prefix bdr: <http://purl.bdrc.io/resource/> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix adm: <http://purl.bdrc.io/ontology/admin/> .
> bdr:P58 a :Person ;
>     adm:gitRevision "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
>     adm:status bdr:StatusReleased ;
>     :hasFather bdr:P4342 ;
>     :hasMother bdr:P4343 ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G227
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G227
>     ] ;
>     :personEvent [ a :PersonBirth ;
>         :onOrAbout "1402" ;
>         :personEventPlace bdr:G547
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G235
>     ] ;
>     :personEvent [ a :PersonOccupiesSeat ;
>         :personEventPlace bdr:G235
>     ] ;
>     :personEvent [ a :PersonDeath ;
>         :onOrAbout "1472"
>     ] ;
>     :personEvent [ a :PersonDeath ;
>         :onOrAbout "1472"
>     ] ;
>     :personEvent [ a :PersonBirth ;
>         :onOrAbout "1402" ;
>         :personEventPlace bdr:G547
>     ] ;
>     :personGender bdr:GenderMale ;
>     :personName [ a :PersonPrimaryTitle ;
>         rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryTitle ;
>         rdfs:label "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonChineseName ;
>         rdfs:label "金厄·洛卓坚赞"@zh
>     ] ;
>     :personName [ a :PersonTitle ;
>         rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonTitle ;
>         rdfs:label "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonPrimaryName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonFirstOrdinationName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     :personName [ a :PersonChineseName ;
>         rdfs:label "金厄·洛卓坚赞"@zh
>     ] ;
>     :personName [ a :PersonFirstOrdinationName ;
>         rdfs:label "blo gros rgyal mtshan/"@bo-x-ewts
>     ] ;
>     skos:prefLabel "blo gros rgyal mtshan/"@bo-x-ewts .
> {code}
> This doubling is completely reproducible and the same graphs exhibit doubling on each trial.
> Varying the threshold changes which graphs and how many graphs exhibit doubling. If the threshold is set higher, e.g., to 100,000 triples per call to {{loadDatasetSimple}}, then many more graphs exhibit doubling. If the threshold is set lower, say to 20,000 triples, then fewer graphs exhibit doubling. If only a single model at a time is transferred then there is no doubling.
> Also if each type of entity is transferred separately - opening the connection, transferring all models of the type, then closing down via:
> {code:java}
> public static void closeConnections() {
>     TransferHelpers.logger.info("closeConnections fuConn.commit, end, close");
>     FusekiHelpers.fuConn.commit();
>     FusekiHelpers.fuConn.end();
>     FusekiHelpers.fuConn.close();
> }
> {code}
> then there is no doubling.
> It appears that models that have already been transferred and committed are being written a second time when switching to a new type and upon the first transfer via {{loadDatasetSimple}} of the new type.
> I'm hoping there's enough information in this report to identify what sort of error in usage of {{RDFConnection}} and/or {{TDB}} would account for this behavior. If this appears to be a bug in Jena then I will have to expend more effort to create a relatively self-contained test case.
> Here is the relevant portion of the Fuseki configuration:
> {code:java}
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix : <http://base/#> .
> @prefix text: <http://jena.apache.org/text#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> [] rdf:type fuseki:Server ;
>     fuseki:services (
>         :bdrcrw
>     ) .
> :bdrcrw rdf:type fuseki:Service ;
>     fuseki:name "bdrcrw" ; # name of the dataset in the url
>     fuseki:serviceQuery "query" ; # SPARQL query service
>     fuseki:serviceUpdate "update" ; # SPARQL update service
>     fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
>     fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
>     fuseki:dataset :bdrc_text_dataset ;
>     .
> :bdrc_text_dataset rdf:type text:TextDataset ;
>     text:dataset :dataset_bdrc ;
>     text:index :bdrc_lucene_index ;
>     .
> :dataset_bdrc rdf:type tdb:DatasetTDB ;
>     tdb:location "/etc/fuseki/databases/bdrc" ;
>     tdb:unionDefaultGraph true ;
>     .
> {code}
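A note on the symptom in the quoted report: in the describe output for bdr:P58 only the blank-node structures (:personEvent, :personName) appear twice, while ground triples such as adm:gitRevision and :personGender appear once. That pattern is what one would expect if the same model reached the store a second time. An RDF graph is a set of triples, so re-adding an identical ground triple changes nothing, but blank node labels are scoped to a single load and therefore arrive as fresh nodes each time. The following is a minimal sketch of that behaviour in plain Java with no Jena dependency. The class BlankNodeDoubling, its methods, and the label-renaming scheme are hypothetical and exist only for illustration; it does not claim to show what actually happened in this issue.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a graph store; not part of Jena.
class BlankNodeDoubling {
    // An RDF graph is a set of triples, so duplicates collapse automatically.
    private final Set<String> store = new LinkedHashSet<>();
    private int loadCount = 0;

    // Simulates one transfer of a serialized model to the store.
    void load(List<String[]> triples) {
        loadCount++;
        for (String[] t : triples) {
            store.add(scope(t[0]) + " " + t[1] + " " + scope(t[2]));
        }
    }

    // Blank node labels ("_:x") are only meaningful within one load,
    // so each load maps them to fresh nodes. Other terms are unchanged.
    private String scope(String term) {
        return term.startsWith("_:") ? term + "_load" + loadCount : term;
    }

    // Counts stored triples that use the given predicate.
    long count(String predicate) {
        return store.stream().filter(t -> t.contains(" " + predicate + " ")).count();
    }

    // A tiny model shaped like the report's example: one ground triple
    // and one blank-node structure.
    static List<String[]> sampleModel() {
        return Arrays.asList(
            new String[] {"bdr:P58", ":personGender", "bdr:GenderMale"},
            new String[] {"bdr:P58", ":personEvent", "_:b0"},
            new String[] {"_:b0", ":onOrAbout", "\"1402\""});
    }

    public static void main(String[] args) {
        BlankNodeDoubling graph = new BlankNodeDoubling();
        graph.load(sampleModel());
        graph.load(sampleModel()); // the same model sent a second time

        // Ground triple is still present once; the blank-node triple is doubled.
        System.out.println(":personGender triples: " + graph.count(":personGender")); // 1
        System.out.println(":personEvent triples: " + graph.count(":personEvent"));   // 2
    }
}
```

If this is the mechanism, checking whether the batch dataset still holds already-transferred models when the next entity type starts would be a reasonable first step.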