Posted to dev@rya.apache.org by Johannes Pfeffer <jo...@consensys.net> on 2017/03/29 10:23:20 UTC

Speed of queries using ORDER BY

Hi Rya folks!

Rya looks very nice, from what I've seen so far (especially the Accumulo
summit recordings). We're currently evaluating different triple stores for
our use case.

We have a use case in which we need to receive ordered SPARQL results. Take
this as an example:

ex:Message_1 a ex:Message ;
    ex:number 1 .

There are 1 billion of these statements in the db. We need to get the
Messages with the highest number in batches of 100.

Now if we run:

SELECT ?message ?number
WHERE {
    ?message a ex:Message ;
        ex:number ?number .
}
ORDER BY DESC(?number)
LIMIT 100
OFFSET 0

It takes quite a long time on most triple stores.

How would Rya perform on something like this? To be honest I don't understand yet
how Rya handles datatype properties that are strongly typed, e.g. all
xsd:integer. Can it make use of the natural order of integers?

Almost the same question would be asked for max() aggregations...

Thank you so much for your help,
Johannes

Re: Speed of queries using ORDER BY

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.

Bad news: this query probably won't be very fast on Rya, but please don't
let that discourage you from trying it out.  Let us know if you need
assistance.

Good news: your query is fairly simple and it could be fast.

Rya isn't going to do well on this query for three reasons: (1) Rya
doesn't do a merge join, (2) Rya doesn't push down the ORDER BY into the
scan (and relies on the openrdf orderby operator[1]), and (3) Rya doesn't
maintain a descending index.

(1) has been on our plate for a while and we've always been excited about
trying out new merge strategies.  We currently have an entity-centric index
that does something similar.

(2) It would require a lot of thought about the query optimization logic to
make this happen, but perhaps the later versions of openrdf (now rdf4j)
have put some thought into this (or even just made the query optimizer more
flexible).

(3) We've put a lot of thought into maintaining various indexes over the RDF
stored in Rya (including freetext, geo, and temporal), but we've never had
descending indices on our radar.  Still, I think we have enough of a
framework in place to support a descending index if (2) is completed.
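
As a rough illustration of what a descending index in (3) could look like on
a store that keeps keys in ascending byte order (as Accumulo does), here is a
minimal Python sketch. It is not Rya's actual encoding: the function names,
the fixed 8-byte width, and the restriction to non-negative integers are all
assumptions made for the example.

```python
import struct

MAX_U64 = 2**64 - 1  # assumed key width: unsigned 64-bit, big-endian


def descending_key(n):
    """Encode n so that ascending byte order means descending numeric order."""
    if not 0 <= n <= MAX_U64:
        raise ValueError("sketch only handles 0 <= n < 2**64")
    # Big-endian bytes sort like the numbers they encode; subtracting from
    # the maximum flips that order.
    return struct.pack(">Q", MAX_U64 - n)


def decode_key(key):
    """Recover the original number from a descending key."""
    return MAX_U64 - struct.unpack(">Q", key)[0]


# A sorted key-value store would hand these keys back in ascending byte order.
keys = sorted(descending_key(n) for n in [7, 1000, 42])
ordered = [decode_key(k) for k in keys]
```

With keys laid out this way, the rows with the highest ex:number values sit at
the start of the key range, so a plain forward scan would meet them first.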

Again, let us know if you need any help getting up and running on Rya.

--Aaron

[1] The OpenRDF orderby uses an in-memory tree map for sorting:
https://github.com/ansell/openrdf-sesame/blob/2.7.6/core/queryalgebra/evaluation/src/main/java/org/openrdf/query/algebra/evaluation/iterator/OrderIterator.java#L194
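
To make the cost in (2) concrete, the Python sketch below compares sorting
after the scan with reading from a source that is already in descending
order. It is only an illustration, not OpenRDF or Rya code: the `counting`
helper is hypothetical, and `heapq.nlargest` stands in for the tree-map sort
referenced in [1].

```python
import heapq
from itertools import islice


def counting(rows, counter):
    """Yield rows unchanged while counting how many were actually read."""
    for row in rows:
        counter[0] += 1
        yield row


numbers = list(range(1, 1001))  # stand-in for the ex:number values

# Sort after the scan: every row has to be read before the top 3 are known.
read_unordered = [0]
top_sorted = heapq.nlargest(3, counting(iter(numbers), read_unordered))

# Ordered scan (ORDER BY pushed down onto a descending index): stop after 3.
read_ordered = [0]
top_pushed = list(islice(counting(reversed(numbers), read_ordered), 3))
```

Both approaches return the same rows, but the first reads all 1000 values
while the second reads only 3. With a billion statements and LIMIT 100, that
gap is what makes the pushdown worthwhile.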
