Posted to users@jena.apache.org by Daniel Hernandez <da...@degu.cl> on 2021/02/24 19:40:43 UTC

possibly not completely loaded dataset

Hi,

I have loaded Wikidata in Jena using tdbloader2. I noticed that some
queries do not produce the expected result.

Query 1:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT * WHERE {
  <http://www.wikidata.org/entity/Q31> wdt:P1344 ?o .
}

Query 2:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT * WHERE {
  ?s wdt:P1344 ?o .
}

Query 1 returns solutions, but query 2 returns an empty table. This is
contradictory: query 1 is more selective than query 2, so every solution
of query 1 should also appear in the result of query 2.
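
One way such a mismatch could arise can be sketched with a toy model. It
assumes (this is not stated in the thread, and it is not Jena's actual
code) that a store keeping SPO, POS and OSP indexes answers a triple
pattern from the index whose key order starts with the most bound
positions. The name choose_index is hypothetical.

```python
# Toy model only: pick the index whose key order begins with the longest
# run of bound positions. This is an illustrative assumption, not Jena code.
INDEXES = ("SPO", "POS", "OSP")


def choose_index(s_bound, p_bound, o_bound):
    """Return the name of the index with the longest bound key prefix."""
    bound = {"S": s_bound, "P": p_bound, "O": o_bound}

    def bound_prefix_length(index_name):
        length = 0
        for position in index_name:
            if not bound[position]:
                break
            length += 1
        return length

    # max() returns the first index among ties, so SPO wins a full scan.
    return max(INDEXES, key=bound_prefix_length)


# Query 1 binds subject and predicate; query 2 binds only the predicate.
print(choose_index(True, True, False))   # SPO
print(choose_index(False, True, False))  # POS
```

Under this assumption, query 1 would be answered from SPO and query 2
from POS, so an empty POS index would give exactly the behaviour
described here.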

I guess that this is because tdbloader2 did not finish the index phase
properly. However, the loading log showed no errors.

My question is whether I can repeat the indexing phase with the data I
currently have:

-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GOSP.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GOSP.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GPOS.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GPOS.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GSPO.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GSPO.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSP.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSP.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSPG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSPG.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 17:24 POS-txt
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POS.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POS.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POSG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POSG.idn
-rw-r--r-- 1 ubuntu ubuntu 276379467776 Feb 23 17:47 SPO.dat
-rw-r--r-- 1 ubuntu ubuntu    956301312 Feb 23 17:47 SPO.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:47 SPOG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:48 SPOG.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 17:48 data-quads.tmp
-rw-r--r-- 1 ubuntu ubuntu 592136511840 Feb 23 18:41 data-triples.tmp
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 18:41 journal.jrnl
-rw-r--r-- 1 ubuntu ubuntu  67679289344 Feb 23 18:48 node2id.dat
-rw-r--r-- 1 ubuntu ubuntu    293601280 Feb 23 18:48 node2id.idn
-rw-r--r-- 1 ubuntu ubuntu 136298477605 Feb 23 19:00 nodes.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefix2id.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefix2id.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefixIdx.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefixIdx.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 19:00 prefixes.dat
-rw-r--r-- 1 ubuntu ubuntu      1793582 Feb 23 19:00 stats.opt

Ideally, I would rerun only the indexing phase over this database. Is
that possible? If not, can you recommend another way to fix this database
without loading the data again?
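
The listing can also be checked mechanically. The sketch below flags
triple indexes whose .dat file is no larger than 8388608 bytes, on the
assumption (taken only from the listing, where every index except SPO has
exactly that size) that this is the size of an index file holding no
data. The names unbuilt_indexes and dat_sizes are hypothetical helpers.

```python
from pathlib import Path

# Assumption: 8388608 bytes (8 MiB) is the size of an index file with no
# data in it, as seen for every index except SPO in the listing.
EMPTY_SIZE = 8 * 1024 * 1024
TRIPLE_INDEXES = ("SPO", "POS", "OSP")


def unbuilt_indexes(sizes, names=TRIPLE_INDEXES, empty_size=EMPTY_SIZE):
    """Return index names whose .dat file is missing or not above empty_size."""
    return [name for name in names if sizes.get(f"{name}.dat", 0) <= empty_size]


def dat_sizes(db_dir):
    """Map each .dat file name in db_dir to its size in bytes."""
    return {path.name: path.stat().st_size for path in Path(db_dir).glob("*.dat")}


# Sizes copied from the listing in this message.
listing = {"SPO.dat": 276379467776, "POS.dat": 8388608, "OSP.dat": 8388608}
print(unbuilt_indexes(listing))  # ['POS', 'OSP']
```

On a real database directory, unbuilt_indexes(dat_sizes(db_dir)) would
run the same check, with db_dir set to the TDB location.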

Best,
Daniel

Re: possibly not completely loaded dataset

Posted by Andy Seaborne <an...@apache.org>.
 > In the best case I can run the indexing phase over this database.
 > Is it possible?

Yes.

     Andy
