Posted to users@jena.apache.org by Paul Tyson <ph...@sbcglobal.net> on 2023/02/28 03:11:42 UTC
sparql query performance jena v2, 3, 4
I maintain an old Jena/Fuseki application that has happily been using
Jena v2.13 and TDB v1.1.2 for several years. It loads more than a billion
triples into a TDB database and runs a couple dozen queries, some not so
trivial, against it.
Now it is time to update things. I first went to v3.17, to stay on Java 8.
Many of the queries work fine, but a few have abysmal performance. A
query that took maybe 10 minutes with v2.13 now runs for hours without
finishing.
I am now trying v4.7 with Java 11. Testing is still in progress, but it
doesn't look promising.
The troublesome queries have several FILTER EXISTS and FILTER NOT EXISTS
clauses, some of which have UNION patterns. It is rather complicated,
but also a fairly literal translation of the applicable business rules.
I took a closer look at them, and adjusted the order of patterns to put
the more-specific ones earlier, but that didn't help. I discovered that
eliminating the UNION alternatives would let the query return some
results, but obviously not what is wanted.
Did anything in particular change in the query processing since v2 that
would cause this performance degradation?
Should I expect any difference between TDB and TDB2? I've tried both, and
neither gives satisfaction.
Thanks in advance,
--Paul
Re: sparql query performance jena v2, 3, 4
Posted by Andy Seaborne <an...@apache.org>.
On 28/02/2023 03:11, Paul Tyson wrote:
> Did anything in particular change in the query processing since v2 that
> would cause this performance degradation?
v2.13 dates from March 2015. A lot has changed since then, including fixes
for cases where optimization would produce the wrong answers. Some of those
changes concern EXISTS directly and some don't, but complex EXISTS patterns
can be affected either way.
The changes are not about pattern order.
(Most of them will be recorded in JIRA.)
> Should I expect any difference between TDB and TDB2? I've tried both, and
> neither gives satisfaction.
Unlikely. TDB2 is preferred.