Posted to users@jena.apache.org by Dragan Lesic <dr...@gmail.com> on 2021/10/29 20:00:47 UTC

Fwd: RDF Star Slow performance

Hello,
I'm trying to migrate an application with content from themoviedb.org and
other sources.
The dataset in Fuseki holds about 55 million triples.
To preserve the order of connected data, reification is used; the data was
inserted with SPARQL and RDF*.

Query performance is horrible. Example query:

PREFIX sub: <https://myexample.com/movie/>
PREFIX shema: <https://schema.org/>
SELECT ?o
WHERE { <<sub:123 shema:genres ?o>> shema:order ?order . }

This simple query, which returns 10 triples, takes about 195 seconds.
On Blazegraph it takes 50 ms!
A test with a small dataset is fast.
I'm using Jena 4.2.0, Fuseki2 and TDB2 in a cloud Docker environment with
32 GB of RAM for the instance and fast storage.

Any suggestions? Is there any configuration I am missing?
Thank you very much, best regards.

Re: Fwd: RDF Star Slow performance

Posted by Dragan Lesic <dr...@gmail.com>.

Yes, it works fast now indeed. I didn't know that!
Thank you very very much for the quick answer, you saved my day!
Best regards,
Dragan

On Fri, Oct 29, 2021 at 10:18 PM Andy Seaborne <an...@apache.org> wrote:

>
>
> On 29/10/2021 21:00, Dragan Lesic wrote:
> > Hello,
> > I'm trying to migrate an application with content from themoviedb.org and
> > other sources.
> > The dataset in Fuseki holds about 55 million triples.
> > To preserve the order of connected data, reification is used; the data was
> > inserted with SPARQL and RDF*.
>
> RDF* and RDF-star are different.
>
> RDF* is the name for the original work by Olaf Hartig in collaboration
> with Bryan Thompson/Blazegraph.
>
> RDF-star is the community work based on RDF*.
> https://w3c.github.io/rdf-star/cg-spec/2021-07-01.html
>
> In particular, in RDF-star the quoted triple may, or may not, be in the
> graph. This changes the indexing and hence query performance. Currently,
> Jena does not maintain an additional index because that would require
> everyone else to reload data (whether using RDF-star or not).
>
> In RDF*, <<>> means quote the triple and assert it.
> In RDF-star, it means only quote the triple.
>
> Annotation syntax bridges the gap.
>
> WHERE { sub:123 shema:genres ?o {| shema:order ?order |} }
>
>
> which is equivalent to writing
>
> WHERE { sub:123 shema:genres ?o .
>          << sub:123 shema:genres ?o  >> shema:order ?order .
> }
>
> That should be faster - please let us know.
>
>      Andy
>

Re: Fwd: RDF Star Slow performance

Posted by Andy Seaborne <an...@apache.org>.

On 29/10/2021 21:00, Dragan Lesic wrote:
> Hello,
> I'm trying to migrate an application with content from themoviedb.org and
> other sources.
> The dataset in Fuseki holds about 55 million triples.
> To preserve the order of connected data, reification is used; the data was
> inserted with SPARQL and RDF*.

RDF* and RDF-star are different.

RDF* is the name for the original work by Olaf Hartig in collaboration 
with Bryan Thompson/Blazegraph.

RDF-star is the community work based on RDF*.
https://w3c.github.io/rdf-star/cg-spec/2021-07-01.html

In particular, in RDF-star the quoted triple may, or may not, be in the 
graph. This changes the indexing and hence query performance. Currently, 
Jena does not maintain an additional index because that would require 
everyone else to reload data (whether using RDF-star or not).

In RDF*, <<>> means quote the triple and assert it.
In RDF-star, it means only quote the triple.
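
To make the difference concrete, here is a small sketch of two SPARQL-star updates under the RDF-star reading. The genre IRI and the integer value for shema:order are made up for illustration; they are not taken from the original data.

```sparql
PREFIX sub: <https://myexample.com/movie/>
PREFIX shema: <https://schema.org/>

# Quoted only: the annotation triple is added, but the triple
# "sub:123 shema:genres ..." itself is NOT asserted in the graph.
# (Hypothetical genre IRI and order value.)
INSERT DATA {
  << sub:123 shema:genres <https://myexample.com/genre/drama> >> shema:order 1 .
} ;

# Quoted and asserted: the triple is stated in its own right,
# and the same triple is quoted as the subject of the annotation.
INSERT DATA {
  sub:123 shema:genres <https://myexample.com/genre/drama> .
  << sub:123 shema:genres <https://myexample.com/genre/drama> >> shema:order 1 .
}
```

Under the RDF* reading described above, the first update alone would already have the effect of the second.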

Annotation syntax bridges the gap.

WHERE { sub:123 shema:genres ?o {| shema:order ?order |} }

which is equivalent to writing

WHERE { sub:123 shema:genres ?o .
         << sub:123 shema:genres ?o  >> shema:order ?order .
}

That should be faster - please let us know.
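
Put together with the prefixes from the original message, the annotation form might look like the sketch below. The ORDER BY clause is an assumption, added only because the stated goal was to preserve order; drop it if ordering is not needed.

```sparql
PREFIX sub: <https://myexample.com/movie/>
PREFIX shema: <https://schema.org/>

SELECT ?o
WHERE {
  # Matches the asserted triple and, for the same ?o, its annotation.
  sub:123 shema:genres ?o {| shema:order ?order |} .
}
ORDER BY ?order   # assumption: results are wanted in annotation order
```

This only returns results if the sub:123 shema:genres triples are actually asserted in the graph, not merely quoted.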

     Andy
