Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2012/07/12 20:45:34 UTC

[jira] [Closed] (JENA-275) different query results for tdbloader and tdbloader3

     [ https://issues.apache.org/jira/browse/JENA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-275.
------------------------------

    
> different query results for tdbloader and tdbloader3
> ----------------------------------------------------
>
>                 Key: JENA-275
>                 URL: https://issues.apache.org/jira/browse/JENA-275
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB
>    Affects Versions: TDB 0.9.2
>            Reporter: Jon Phillips
>            Assignee: Andy Seaborne
>
> I had intended to use tdbloader3 rather than tdbloader for loading some large data sets (> 100 million triples) because I was seeing higher sustained triples-per-second load rates.  However, I am running into some immediate issues running basic queries on the resulting models, even on small toy test sets.  In one simple case, a SPARQL query with a fixed predicate but unbound subject and object (excuse my novice grasp of terminology) fails to return any results for the model loaded with tdbloader3.
> Here is the sequence of steps that I ran:
> cat dbpedia.nt  (list of 9 triples from dbpedia)
> <http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
> <http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
> <http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
> <http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
> <http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
> <http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
> <http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
> <http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
> <http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
> build the model with tdbloader
> tdbloader --loc=dbpedia_tdbl1 dbpedia.nt 
> 23:18:29 INFO  loader               :: -- Start triples data phase
> 23:18:29 INFO  loader               :: ** Load empty triples table
> 23:18:29 INFO  loader               :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
> 23:18:29 INFO  loader               :: -- Finish triples data phase
> 23:18:29 INFO  loader               :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
> 23:18:29 INFO  loader               :: -- Start triples index phase
> 23:18:29 INFO  loader               :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO  loader               :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO  loader               :: -- Finish triples index phase
> 23:18:29 INFO  loader               :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
> 23:18:29 INFO  loader               :: -- Finish triples load
> 23:18:29 INFO  loader               :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]
> now build the same model with tdbloader3
> tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt 
> 23:18:38 INFO  tdbloader3           :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
> 23:18:38 INFO  tdbloader3           :: Node Table (1/3): building nodes.dat and sorting hash|id ...
> 23:18:38 INFO  tdbloader3           :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO  tdbloader3           :: Node Table (2/3): generating input data using node ids...
> 23:18:38 INFO  tdbloader3           :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO  tdbloader3           :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
> 23:18:39 INFO  tdbloader3           :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating SPO index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GSPO index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for POS index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating POS index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for OSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating OSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for GPOS index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GPOS index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for GOSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GOSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for POSG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating POSG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for OSPG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating OSPG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for SPOG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating SPOG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]
> two simple queries that return the entire result set return the same set of triples:
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> same result for the model built with tdbloader3
> ./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
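
The two listings above contain the same rows in a different order. SPARQL does not guarantee row order without ORDER BY, so a comparison of the two stores should ignore ordering. A minimal sketch of such a check, assuming the text tables printed by tdbquery have been captured into strings; the function names and the two short sample tables are hypothetical, and the cell splitting assumes no "|" inside a value:

```python
def parse_rows(table_text):
    """Return (header, set_of_rows) for a text table whose rows look like '| a | b |'."""
    parsed = []
    for line in table_text.splitlines():
        line = line.strip()
        # Keep only lines framed by pipes; rule lines made of '-' or '=' are skipped.
        if line.startswith("|") and line.endswith("|"):
            parsed.append(tuple(cell.strip() for cell in line[1:-1].split("|")))
    if not parsed:
        return (), set()
    # The first framed line is the column header, the rest are data rows.
    return parsed[0], set(parsed[1:])


def same_results(table_a, table_b):
    """True when both tables have the same columns and the same rows, in any order."""
    return parse_rows(table_a) == parse_rows(table_b)


# Hypothetical two-row tables, same rows in a different order.
TABLE_A = """
---------------------
| x     | z         |
=====================
| <u:a> | "A"@en    |
| <u:b> | "B"@en    |
---------------------
"""

TABLE_B = """
---------------------
| x     | z         |
=====================
| <u:b> | "B"@en    |
| <u:a> | "A"@en    |
---------------------
"""

EMPTY = """
---------
| x | z |
=========
---------
"""

if __name__ == "__main__":
    print(same_results(TABLE_A, TABLE_B))
    print(same_results(TABLE_A, EMPTY))
```

Comparing sets drops duplicate rows, which is acceptable here because each row is a distinct triple; a multiset comparison would be needed for queries that can return duplicates.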
> a different query, matching on a fixed predicate, run on the model built with tdbloader:
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
> ----------------------------------------------------------------------------------------------------------
> | x                                                            | y | z                                   |
> ==========================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
> ----------------------------------------------------------------------------------------------------------
> I expected the data loaded with tdbloader3 to return the same result, but the query returned an empty result:
> tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
> -------------
> | x | y | z |
> =============
> -------------
> Any help would be much appreciated.
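
To establish the expected answer independently of either loader, the fixed-predicate pattern can be evaluated directly against the N-Triples text. The following is a minimal sketch, not part of Jena: the regular expression only covers the simple "<s> <p> object ." lines shown in this report (IRI subjects, no blank nodes or comments), and the function name is hypothetical.

```python
import re

# Matches '<subject> <predicate> object .' for IRI subjects and predicates only.
# This is an assumption that fits the data in this report, not a full N-Triples parser.
TRIPLE = re.compile(r'^\s*(<[^>]*>)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def match_predicate(ntriples_text, predicate):
    """Return (subject, object) pairs for every triple with the given predicate."""
    results = []
    for line in ntriples_text.splitlines():
        m = TRIPLE.match(line)
        if m and m.group(2) == predicate:
            results.append((m.group(1), m.group(3)))
    return results


# Two lines taken from the report's dbpedia.nt sample.
SAMPLE = (
    '<http://dbpedia.org/resource/AccessibleComputing> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .\n'
    '<http://dbpedia.org/resource/AmoeboidTaxa> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .\n'
)

if __name__ == "__main__":
    for subject, obj in match_predicate(SAMPLE, RDFS_LABEL):
        print(subject, obj)
```

Every triple in the sample file uses the rdfs:label predicate, so a correctly built store should return one row per loaded triple for this pattern.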
