Posted to users@jena.apache.org by Paul Tyson <ph...@sbcglobal.net> on 2023/02/28 03:11:42 UTC

sparql query performance jena v2, 3, 4

I maintain an old jena/fuseki application that has happily been using 
jena v2.13 and tdb v1.1.2 for several years. It loads 1b+ triples into a 
tdb database, and runs a couple dozen queries, some not so trivial, on 
the tdb.

Now it is time to update things. I first went to 3.17, to stay on java8. 
Many of the queries work fine, but a few have abysmal performance. A 
query that took maybe 10 minutes with v2.13 now runs for hours without 
finishing.

I am now trying v4.7 with java11. Testing is still in progress, but it 
doesn't look promising.

The troublesome queries have several FILTER EXISTS and FILTER NOT EXISTS 
clauses, some of which have UNION patterns. It is rather complicated, 
but also a fairly literal translation of the applicable business rules. 
I took a closer look at them, and adjusted the order of patterns to put 
the more-specific ones earlier, but that didn't help. I discovered that 
eliminating the UNION alternatives would let the query return some 
results, but obviously not what is wanted.
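
For illustration only, a query of the general shape described above might 
look like the sketch below. The ex: namespace and every class and property 
name in it are hypothetical placeholders, not the actual business rules or 
the actual query.

```sparql
PREFIX ex: <http://example.org/ns#>   # hypothetical namespace

SELECT ?item
WHERE {
  ?item a ex:Part .

  # Keep items that satisfy at least one of two alternative rules.
  FILTER EXISTS {
    { ?item ex:approvedBy ?approver }
    UNION
    { ?item ex:inheritsApprovalFrom ?parent .
      ?parent ex:approvedBy ?approver }
  }

  # Drop items that match either exclusion rule.
  FILTER NOT EXISTS {
    { ?item ex:supersededBy ?newer }
    UNION
    { ?item ex:status ex:Withdrawn }
  }
}
```

Each FILTER is evaluated once per candidate ?item, so the cost of the UNION 
inside it is paid for every solution of the outer pattern.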

Did anything in particular change in the query processing since v2 that 
would cause this performance degradation?

Should I expect any difference in tdb vs tdb2? I've tried both, and 
neither gives satisfaction.

Thanks in advance,
--Paul

Re: sparql query performance jena v2, 3, 4

Posted by Andy Seaborne <an...@apache.org>.


On 28/02/2023 03:11, Paul Tyson wrote:
> I maintain an old jena/fuseki application that has happily been using 
> jena v2.13 and tdb v1.1.2 for several years. It loads 1b+ triples into a 
> tdb database, and runs a couple dozen queries, some not so trivial, on 
> the tdb.
> 
> Now it is time to update things. I first went to 3.17, to stay on java8. 
> Many of the queries work fine, but a few have abysmal performance. A 
> query that took maybe 10 minutes with v2.13 now runs for hours without 
> finishing.
> 
> I am now trying v4.7 with java11. Testing is still in progress, but it 
> doesn't look promising.
> 
> The troublesome queries have several FILTER EXISTS and FILTER NOT EXISTS 
> clauses, some of which have UNION patterns. It is rather complicated, 
> but also a fairly literal translation of the applicable business rules. 
> I took a closer look at them, and adjusted the order of patterns to put 
> the more-specific ones earlier, but that didn't help. I discovered that 
> eliminating the UNION alternatives would let the query return some 
> results, but obviously not what is wanted.
> 
> Did anything in particular change in the query processing since v2 that 
> would cause this performance degradation?

v2.13 was released in March 2015. A lot has changed since then, including 
fixes for cases where optimization would produce wrong answers. Some of 
those fixes are directly about EXISTS; others aren't, but complex EXISTS 
patterns can still be affected by them. The changes are not a matter of 
pattern order.

(Most of them will be recorded in JIRA.)

> Should I expect any difference in tdb vs tdb2? I've tried both, and 
> neither gives satisfaction.

Unlikely. TDB2 is preferred.
