Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2018/02/17 20:45:06 UTC

[jira] [Comment Edited] (JENA-1489) models written twice on RDFConnection

    [ https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366942#comment-16366942 ] 

Andy Seaborne edited comment on JENA-1489 at 2/17/18 8:44 PM:
--------------------------------------------------------------

Sending to Fuseki with "loadDataset" writes (at the client) and reads (at Fuseki) the RDF triples in an RDF syntax, which creates new blank nodes.

The read end is ready to go for round-tripping blank nodes (which includes sending them twice).

The write end isn't.

Is this
{noformat}
loadDataset(data)
loadDataset(data)  // Same data{noformat}
a complete example of the problem?
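
The effect described above can be sketched without Jena. What follows is a toy model in plain Java, not the RDFConnection or Fuseki implementation: the names BlankNodeRoundTrip, Store, load and sizeAfterLoads are hypothetical, and the only assumption modelled is that a blank node label is scoped to a single parse, so every upload yields fresh blank nodes while IRIs and literals keep their identity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (not Jena code) of blank node relabelling across a write/parse round trip. */
final class BlankNodeRoundTrip {

    /** Hypothetical stand-in for the server-side store: a set of (s, p, o) triples. */
    static final class Store {
        private final Set<List<String>> triples = new HashSet<>();
        private int nextBlankId = 0;

        /** Models one upload: labels are scoped to this parse, so each label gets a fresh node. */
        void load(List<List<String>> serialized) {
            Map<String, String> labelToNode = new HashMap<>();
            for (List<String> t : serialized) {
                triples.add(Arrays.asList(
                        resolve(t.get(0), labelToNode),
                        t.get(1),
                        resolve(t.get(2), labelToNode)));
            }
        }

        private String resolve(String term, Map<String, String> labelToNode) {
            if (!term.startsWith("_:")) {
                return term; // IRIs and literals keep their identity across loads
            }
            // Same label within one parse maps to the same new node; a later parse starts over.
            return labelToNode.computeIfAbsent(term, label -> "_:gen" + (nextBlankId++));
        }

        int size() {
            return triples.size();
        }
    }

    /** One ground triple plus three triples that hang off a single blank node. */
    static List<List<String>> sampleData() {
        return Arrays.asList(
                Arrays.asList("bdr:P58", ":personGender", "bdr:GenderMale"),
                Arrays.asList("bdr:P58", ":personEvent", "_:b0"),
                Arrays.asList("_:b0", "rdf:type", ":PersonDeath"),
                Arrays.asList("_:b0", ":onOrAbout", "\"1472\""));
    }

    /** Loads the same data n times into a fresh store and returns the resulting triple count. */
    static int sizeAfterLoads(int n) {
        Store store = new Store();
        for (int i = 0; i < n; i++) {
            store.load(sampleData());
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("after 1 load:  " + sizeAfterLoads(1));
        System.out.println("after 2 loads: " + sizeAfterLoads(2));
    }
}
```

In this sketch the ground triple is stored once however often it is sent, because a set ignores the repeat, whereas the blank node structure is added again on every load. That matches the shape of the doubled {{:personEvent}} and {{:personName}} entries in the report.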

 



> models written twice on RDFConnection
> -------------------------------------
>
>                 Key: JENA-1489
>                 URL: https://issues.apache.org/jira/browse/JENA-1489
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki, Jena, TDB
>    Affects Versions: Jena 3.7.0
>         Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3, Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
>            Reporter: Code Ferret
>            Priority: Major
>
> *Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and seeing doubling of blank nodes in _some_ graphs as though the same model is written a second time *after* a commit during the transfer. I apologize in advance for the length of this report.
> *Details*: We have a collection of entity types: Persons, Items, Works and so on. Each entity is a graph in a ttl file in a per-type git repo. For each type, the ttl files are read from the corresponding repo into models, and the models are added to a {{Dataset}} until the number of triples in the dataset exceeds a threshold, e.g., 50,000 triples. When the threshold is exceeded, the dataset is loaded to Fuseki via an {{RDFConnection}}:
> {code:java}
> fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query", baseUrl+"/update", baseUrl+"/data");
> {code}
> which is opened once at the beginning of loading all entity types. The kernel of loading is performed via:
> {code:java}
>     private static void loadDatasetSimple(final Dataset ds) {
>         if (!fuConn.isInTransaction()) {
>             fuConn.begin(ReadWrite.WRITE);
>         }
>         fuConn.loadDataset(ds);
>         fuConn.commit();
>     }
> {code}
> {{loadDatasetSimple}} is called until all of the entities of a given type have been loaded from the corresponding repo. Because some models may not yet have been transferred once all of the entities of a given type have been read, a finish method is then called:
> {code:java}
>     static void finishDatasetTransfers() {
>         // if a partially filled dataset remains, transfer it
>         if (currentDataset != null) {
>             loadDatasetSimple(currentDataset);
>         }
>     }
> {code}
> After loading a given type of entity, the next type in the list of types to transfer is processed as described above, and this is when the problem is noticed.
> Once enough models of the next type have been added to the transfer dataset and that dataset is transferred via {{loadDatasetSimple}}, _some_ of the previously transferred graphs exhibit doubled blank nodes. Here is the output of {{describe bdr:P58}} to illustrate the doubling:
> {code:java}
> @prefix :      <http://purl.bdrc.io/ontology/core/> .
> @prefix bdr:   <http://purl.bdrc.io/resource/> .
> @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
> @prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
> @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix adm:   <http://purl.bdrc.io/ontology/admin/> .
> bdr:P58  a                :Person ;
>         adm:gitRevision   "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
>         adm:status        bdr:StatusReleased ;
>         :hasFather        bdr:P4342 ;
>         :hasMother        bdr:P4343 ;
>         :personEvent      [ a                  :PersonOccupiesSeat ;
>                             :personEventPlace  bdr:G227
>                           ] ;
>         :personEvent      [ a                  :PersonOccupiesSeat ;
>                             :personEventPlace  bdr:G227
>                           ] ;
>         :personEvent      [ a                  :PersonBirth ;
>                             :onOrAbout         "1402" ;
>                             :personEventPlace  bdr:G547
>                           ] ;
>         :personEvent      [ a                  :PersonOccupiesSeat ;
>                             :personEventPlace  bdr:G235
>                           ] ;
>         :personEvent      [ a                  :PersonOccupiesSeat ;
>                             :personEventPlace  bdr:G235
>                           ] ;
>         :personEvent      [ a           :PersonDeath ;
>                             :onOrAbout  "1472"
>                           ] ;
>         :personEvent      [ a           :PersonDeath ;
>                             :onOrAbout  "1472"
>                           ] ;
>         :personEvent      [ a                  :PersonBirth ;
>                             :onOrAbout         "1402" ;
>                             :personEventPlace  bdr:G547
>                           ] ;
>         :personGender     bdr:GenderMale ;
>         :personName       [ a           :PersonPrimaryTitle ;
>                             rdfs:label  "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonPrimaryTitle ;
>                             rdfs:label  "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonChineseName ;
>                             rdfs:label  "金厄·洛卓坚赞"@zh
>                           ] ;
>         :personName       [ a           :PersonTitle ;
>                             rdfs:label  "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonPrimaryName ;
>                             rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonTitle ;
>                             rdfs:label  "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonPrimaryName ;
>                             rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonFirstOrdinationName ;
>                             rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         :personName       [ a           :PersonChineseName ;
>                             rdfs:label  "金厄·洛卓坚赞"@zh
>                           ] ;
>         :personName       [ a           :PersonFirstOrdinationName ;
>                             rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
>                           ] ;
>         skos:prefLabel    "blo gros rgyal mtshan/"@bo-x-ewts .
> {code}
> This doubling is completely reproducible and the same graphs exhibit doubling on each trial.
> Varying the threshold changes which graphs and how many graphs exhibit doubling. If the threshold is set higher, e.g., to 100,000 triples per call to {{loadDatasetSimple}}, then many more graphs exhibit doubling. If the threshold is set lower, say to 20,000 triples, then fewer graphs exhibit doubling. If only a single model at a time is transferred then there is no doubling.
> Also, if each type of entity is transferred separately - opening the connection, transferring all models of the type, then closing down via:
> {code:java}
>     public static void closeConnections() {
>         TransferHelpers.logger.info("closeConnections fuConn.commit, end, close");
>         FusekiHelpers.fuConn.commit();
>         FusekiHelpers.fuConn.end();
>         FusekiHelpers.fuConn.close();
>     }
> {code}
> There is no doubling.
> It appears that models that have already been transferred and committed are written a second time when switching to a new type, on the first transfer of the new type via {{loadDatasetSimple}}.
> I'm hoping there's enough information in this report to identify what sort of error in usage of {{RDFConnection}} and/or {{TDB}} would account for this behavior. If this appears to be a bug in Jena then I will have to expend more effort to create a relatively self-contained test case.
> Here is the relevant portion of the Fuseki configuration:
> {code:java}
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix :        <http://base/#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
> [] rdf:type fuseki:Server ;
>    fuseki:services (
>      :bdrcrw
>    ) .
> :bdrcrw rdf:type fuseki:Service ;
>     fuseki:name                       "bdrcrw" ;   # name of the dataset in the url
>     fuseki:serviceQuery               "query" ;    # SPARQL query service
>     fuseki:serviceUpdate              "update" ;   # SPARQL update service
>     fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
>     fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
>     fuseki:dataset                    :bdrc_text_dataset ;
>     .
> :bdrc_text_dataset rdf:type     text:TextDataset ;
>     text:dataset   :dataset_bdrc ;
>     text:index     :bdrc_lucene_index ;
>     .
> :dataset_bdrc rdf:type      tdb:DatasetTDB ;
>      tdb:location "/etc/fuseki/databases/bdrc" ;
>      tdb:unionDefaultGraph true ;
>      .
> {code}


