Posted to users@jena.apache.org by Wolfgang Fahl <wf...@bitplan.com> on 2020/07/27 13:54:00 UTC

Another successful WikiData Import

Dear Jena Users,

at

http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData

I have documented several attempts to import WikiData into Jena. After
Jonas Sourlier reported a success using a 4 TB SSD, I tried
the same approach. It is documented at:

http://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15

The attempt was limited to the truthy statements but still gives
access to the full content of WikiData. This is especially important for
longer-running queries. One of the follow-up questions would be how
queries can be sped up in this environment.

E.g. I tried:

SELECT (COUNT(*) as ?Triples) WHERE { ?s ?p ?o}

which took 5516 secs to answer that 5.250.681.892 triples are in the
data set.
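
For scale, the two figures above can be turned into an average scan rate.
The short Python sketch below only does that arithmetic; the variable names
are illustrative, and the result is an average over the whole run, not a
statement about how the time was actually distributed.

```python
# Figures quoted in the message above.
triples = 5_250_681_892   # result reported by the COUNT(*) query
seconds = 5516            # time reported for the query to answer

rate = triples / seconds  # average number of triples counted per second
minutes = seconds / 60    # the same duration expressed in minutes

print(f"{rate:,.0f} triples/s over {minutes:.1f} minutes")
```

That works out to a little under one million triples per second, sustained
for roughly an hour and a half.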

Yours

  Wolfgang


-- 

Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de


Re: Another successful WikiData Import

Posted by Andy Seaborne <an...@apache.org>.
Good to hear!

On 27/07/2020 14:54, Wolfgang Fahl wrote:
> Dear Jena Users,
> 
> at
> 
> http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData
> 
> i have documented several attempts to import WikiData into Jena. After 
> Jonas Sourlier reported a success using a 4 TB SSD i tried
> the same approach. It is documented at:
> 
> http://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15
> 
> The attempt was limited to the truthy statements but still gives full 
> access to the full content of WikiData. This is especially important for 
> longer running
> queries.  One of the followup questions would be how queries can be sped 
> up in this environment.
> 
> E.g. i tried:
> 
> SELECT  (COUNT(*)  as  ?Triples)  WHERE  {  ?s  ?p  ?o}
> 
> which took 5516 secs to answer that 5.250.681.892 triples are in the 
> data set.

Counting is a bit of a tradeoff: TDB does not maintain the count separately; 
when it needs one, it actually counts the triple table. And, 
especially when cold, that's expensive.

The upside is that the count is right and includes any changes in the current 
transaction.
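
The tradeoff can be illustrated with a toy sketch in Python. This is not
how TDB is implemented; both classes and all their names are hypothetical,
and an in-memory set stands in for the triple table. The first store keeps
no counter and scans on every count; the second pays a little bookkeeping
on every write so that counting is cheap.

```python
class ScanCountStore:
    """Toy store with no separate counter: counting walks every triple."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def count(self):
        # Cost grows with the number of triples, but the answer always
        # reflects exactly what is in the table right now.
        return sum(1 for _ in self.triples)


class CachedCountStore:
    """Toy store that maintains a counter alongside the data."""

    def __init__(self):
        self.triples = set()
        self.n = 0

    def add(self, s, p, o):
        triple = (s, p, o)
        # Extra work on every write: the counter must stay in step with
        # the table, including for duplicates (and, in a real system,
        # deletes and aborted transactions).
        if triple not in self.triples:
            self.triples.add(triple)
            self.n += 1

    def count(self):
        # Constant-time read of the maintained counter.
        return self.n


scan, cached = ScanCountStore(), CachedCountStore()
for store in (scan, cached):
    store.add("s1", "p", "o1")
    store.add("s1", "p", "o1")  # duplicate, must not be counted twice
    store.add("s2", "p", "o2")

print(scan.count(), cached.count())
```

Both stores report the same count here; the difference is where the cost
is paid, on each read or on each write.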

     Andy


Re: Another successful WikiData Import

Posted by Dan Brickley <da...@danbri.org>.
On Mon, 27 Jul 2020 at 14:54, Wolfgang Fahl <wf...@bitplan.com> wrote:

> Dear Jena Users,
>
> at
>
> http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData
>
> i have documented several attempts to import WikiData into Jena. After
> Jonas Sourlier reported a success using a 4 TB SSD i tried
> the same approach. It is documented at:
>
> http://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15
>
> The attempt was limited to the truthy statements but still gives full
> access to the full content of WikiData. This is especially important for
> longer running
> queries.  One of the followup questions would be how queries can be sped
> up in this environment.
>
> E.g. i tried:
>
> SELECT (COUNT(*) as ?Triples) WHERE { ?s ?p ?o}
>
>
> which took 5516 secs to answer that 5.250.681.892 triples are in the data
> set.
>
Thanks for sharing this work!

You might also be interested in https://yago-knowledge.org/downloads/yago-4
which maps Wikidata to Schema.org and then offers a couple of subsets.
There are a bunch of other discussions around Wikidata subsetting that
might be relevant too, see
https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas/Subsetting
https://docs.google.com/document/d/1MmrpEQ9O7xA6frNk6gceu_IbQrUiEYGI9vcQjDvTL9c/edit#heading=h.7xg3cywpkgfq
and also
https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/

cheers,

Dan




