Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2022/10/17 11:56:52 UTC

Re: Weird sparql problem

?s a ?t .
   ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?


On 21/09/2022 9.15, Lorenz Buehmann wrote:
> Weird: only 10M triples, and each triple pattern returns only 1
> binding, so the size is tiny. Honestly, I can't think of anything
> except open connections, but as you mentioned, running the queries
> with only one triple pattern works as expected, so too many open
> connections are most likely not the issue.
>
> Can you reproduce this behavior with newer Jena versions like 4.6.1?
>
> Or can you reproduce this on different servers as well?
>
> Is it also stuck if you run the query directly after you restart Fuseki?
>
>
> On 19.09.22 13:49, Mikael Pesonen wrote:
>>
>>
>> On 15/09/2022 17.48, Lorenz Buehmann wrote:
>>> Forgot:
>>>
>>> - size of result for each triple pattern? Might affect whether a hash
>>> join can be used.
>> It's one row for each.
>>>
>>> - your hardware?
>> Normal server with 16gigs mem.
>>>
>>> - is it just the first query after starting Fuseki? Connections have 
>>> been closed? Note, there was also a bug in a recent Jena version, 
>>> but only with TDB and too many open connections. It has been 
>>> resolved with release 4.6.1.
>> Jena has been running quite a while.
>>>
>>> Might not be related, but I'm mentioning all things here nevertheless.
>>>
>>>
>>> On 15.09.22 11:16, Mikael Pesonen wrote:
>>>>
>>>> This returns one row fast, say :C1
>>>>
>>>> SELECT *
>>>> FROM <https://a.b.c>
>>>> WHERE {
>>>>   <https://x.y.z> a ?t .
>>>>   #?t skos:prefLabel ?l
>>>> }
>>>>
>>>>
>>>> and this too:
>>>>
>>>> SELECT *
>>>> FROM <https://a.b.c>
>>>> WHERE {
>>>>   #<https://x.y.z> a ?t .
>>>>   :C1 skos:prefLabel ?l
>>>> }
>>>>
>>>>
>>>> But this always hangs until timeout
>>>>
>>>> SELECT *
>>>> FROM <https://a.b.c>
>>>> WHERE {
>>>>   <https://x.y.z> a ?t .
>>>>   ?t skos:prefLabel ?l
>>>> }
>>>>
>>>> What am I missing here? I'm using Fuseki web GUI. Thanks!
>>

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Where do I get this qparse?

On 02/11/2022 12.32, rvesse@dotnetrdf.org wrote:
> For these kinds of performance issues it is useful to see the SPARQL algebra for the whole query, not just fragments of the query.  You can use the qparse command for the version of Jena you are using to see how it is optimising your queries, e.g.
>
> qparse --explain --query example.rq
>
> As Lorenz suggests, this may be the optimiser making a bad guess at the appropriate order in which to evaluate the triple patterns within the BGP, but without the larger query context or the algebra all we can do is guess.
>
> Rob
>
> From: Mikael Pesonen <mi...@lingsoft.fi>
> Date: Tuesday, 1 November 2022 at 12:53
> To: users@jena.apache.org <us...@jena.apache.org>
> Subject: Re: Weird sparql problem
> Different case, but again, hanging makes no sense to the user, whatever
> the technical reasons are.
>
>    VALUES ?sct_code { "298314008" }
>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>
> returns one row immediately, but
>
>    VALUES ?sct_code { "298314008" }
>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>
> hangs forever
>
>
>    skos:inScheme lsu:SNOMEDCT_US;
>
> On 18/10/2022 9.08, Lorenz Buehmann wrote:
>> Hi,
>>
>> comments inline
>>
>> On 17.10.22 14:35, Mikael Pesonen wrote:
>>> This works as a separate query, but not in the middle, since ?s
>>> gets new values instead of binding to the previous ?s.
>>>
>>> { select ?t where {
>>> ?s a ?t .
>>>   } limit 10}
>>>    ?t skos:prefLabel ?l
>>
>> In the middle of what? Subqueries will be evaluated first. If you
>> really want labels for classes, you should use DISTINCT in the
>> subquery so that the intermediate result is small: there shouldn't
>> be that many classes, but there are many instances of the same class,
>> so the join would be more expensive than necessary.
>>
>>
>>> On 17/10/2022 14.56, Mikael Pesonen wrote:
>>>> ?s a ?t .
>>>>    ?t skos:prefLabel ?l
>>>>
>>>> returns 3 million triples. Maybe it's related to this?
>> I don't see how this should be related to your initial query, where the
>> subject was bound, which in my opinion should be an easy join. Is it
>> possible for you to share the dataset somehow? Also, you could compute
>> statistics for the TDB database with the tdbstats tool [1] from the
>> command line and put the file into the TDB folder. But even without
>> statistics, the query plan should take the first triple pattern, use the
>> SPO index as s and p are bound, then pass the bindings of the object
>> (?t) to the evaluation of the second triple pattern.
>>
>> [1]
>> https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>>
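Lorenz's point about DISTINCT can be illustrated with a toy Python model. The data (3,000 instances spread evenly over 3 classes) and the function `labels_for_classes` are hypothetical; the function mimics joining the rows of `SELECT [DISTINCT] ?t { ?s a ?t }` with the class labels, doing one label lookup per subquery row.

```python
from typing import Dict, List, Tuple

def labels_for_classes(type_triples: List[Tuple[str, str]],
                       labels: Dict[str, str],
                       distinct: bool) -> List[Tuple[str, str]]:
    """Join the subquery rows (?t) with the labels (?t skos:prefLabel ?l)."""
    classes = [t for (_s, t) in type_triples]   # one row per instance
    if distinct:
        classes = list(dict.fromkeys(classes))   # one row per class, order kept
    # Each subquery row is looked up once in the label table.
    return [(t, labels[t]) for t in classes if t in labels]

# Hypothetical data: 3000 instances, 3 classes.
type_triples = [(f"ex:i{i}", f"ex:C{i % 3}") for i in range(3000)]
labels = {"ex:C0": "Class zero", "ex:C1": "Class one", "ex:C2": "Class two"}

print(len(labels_for_classes(type_triples, labels, distinct=False)))
print(len(labels_for_classes(type_triples, labels, distinct=True)))
```

Without DISTINCT the join handles one row per instance (3000 rows, each label repeated many times); with DISTINCT it handles one row per class (3 rows).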


Re: Re: Weird sparql problem

Posted by Simon Bin <sb...@informatik.uni-leipzig.de>.
On Tue, 2022-11-15 at 13:13 +0200, Mikael Pesonen wrote:
> For example this also hangs (or is slow)
> 
> (?lit) text:query (skosxl:literalForm  "\"fever\"" "lang:en" ) .
> ?c skosxl:prefLabel|skosxl:altLabel [skosxl:literalForm ?lit]

You can check whether https://github.com/apache/jena/pull/1616 helps in
this case.

Cheers,

Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I'm now finding many similar cases where basic queries just won't work.
Maybe we have reached the maximum size of the DB (77 GB on disk)? Or should
every serious SPARQL/triple store user just learn how to optimize queries?
That is different from MySQL, for example.

For example, this also hangs (or is slow):

(?lit) text:query (skosxl:literalForm  "\"fever\"" "lang:en" ) .
?c skosxl:prefLabel|skosxl:altLabel [skosxl:literalForm ?lit]

and this works:

(?lit) text:query (skosxl:literalForm  "\"fever\"" "lang:en" ) .
{ ?c skosxl:prefLabel [skosxl:literalForm ?lit] }
UNION
{ ?c skosxl:altLabel [skosxl:literalForm ?lit] }

On 09/11/2022 11.43, rvesse@dotnetrdf.org wrote:
> TL;DR
>
> It’s a workaround for this issue because it can force the optimiser to behave differently, however it should be used sparingly as overuse may prevent other optimisations that may yield more benefit than you lose elsewhere.
>
> See Andy’s recent email [1]: the offending optimisation will be disabled by default in future releases, so the workaround will not be needed in the longer term.
>
> ----
>
> The long-winded details for those interested (although I’m still glossing over lots of low level details)…
>
> There are two levels of query optimisation in ARQ:
>
>
>    1.  Logical optimisation (sometimes referred to as algebra optimisation)
>    2.  Execution optimisation
>
> The logical optimiser works at the SPARQL Algebra level and looks to make transformations to the algebra that are known to improve performance based on experience, past research etc.  In doing so the optimiser has to ensure that those transformations are semantically safe, i.e., they MUST NOT change the overall semantics of the query: the transformed query must return the same answers as the original query.  Therefore, many of these optimisations are applied quite conservatively, so if ARQ cannot determine that a given transformation would be semantically safe it won’t apply it.
>
> Additionally in some cases, these optimisations are specifically intended to be chained, i.e., doing one optimisation may enable further optimisations, thus the ARQ optimiser applies the various transformations in a specific order.  See https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/OptimizerStd.java if you want to see the raw details of this and some explanatory comments around the ordering.  Some of these logical transformations are also done to enable execution optimiser behaviour later in query evaluation e.g., join linearisation.
>
> The downside of the logical optimiser is that it works purely by static analysis of the algebra, i.e., without reference to the dataset against which the query will ultimately be evaluated.  This means that it can sometimes make decisions that are good for the general case BUT bad for some datasets.
>
> The execution optimiser is a whole bunch of things done during actual execution to improve performance.  This includes everything from Jena’s streaming-based iterator implementation of query execution (effectively a Volcano based evaluation model [2]), ARQ’s join linearisation operators, TDB’s low level Node IDs and direct expression evaluation over those, Node ID to RDF Term caching (and vice-versa), memory-mapping of database indices etc.
>
> Execution also includes BGP reordering: when the query evaluator gets a BGP to evaluate, it can choose to reorder the triple patterns within that BGP.  For TDB this is controlled by the presence, or lack thereof, of a stats.opt, fixed.opt or none.opt file in the database directory.  Having a relevant file present should apply further execution-time BGP reordering that can be statistics aware and thus avoids the issue of the logical BGP reordering.  Since, during execution of a single BGP, bindings from earlier triple patterns are used to restrict the searches made for subsequent triple patterns, the order of execution of the triple patterns can be important, especially if one triple pattern has many matches.
>
> However, if you are querying an in-memory dataset instead of TDB, then you may not be getting any execution time BGP reordering so you’re left evaluating the triple patterns according to the logical BGP reordering that may turn out to be sub-optimal depending on your dataset.
>
> ---
>
> The specific problem discussed in this thread is due to a new optimisation introduced in Jena 4.5.0 (BGP reordering during logical optimisation).  This optimisation was shown to improve performance on some benchmarks because it enables more aggressive application of another optimisation (filter placement).
>
> The reason it did the opposite on some users’ datasets is that it is done without any knowledge of the data (as it’s a logical optimisation) and can break up a BGP into separate BGPs, causing less specific triple patterns to be evaluated before more specific ones.  Or, where no execution-time BGP reordering occurs, it can leave the triple patterns in a sub-optimal order for evaluation even if BGPs are not split.
>
> The short-term fix (again, see Andy’s email [1]) is to disable this optimisation by default; users can opt back into it if they find it benefits their usage of Jena on their datasets.
>
> The long-term fix is probably to rearchitect the logical optimiser in some way to allow more data context to be visible to it, i.e., making the logical BGP reordering statistics aware and ARQ’s overall optimisation strategy more hybrid.  If anyone is interested, I’d imagine there’ll be a thread on this on the dev list soon.
>
> Hope this helps,
>
> Rob
>
> [1]: https://lists.apache.org/thread/37cloogcb3wzmkl0s33ttnxyg0kvq69p
> [2]: http://daslab.seas.harvard.edu/reading-group/papers/volcano.pdf
>
>
> From: Mikael Pesonen <mi...@lingsoft.fi>
> Date: Tuesday, 8 November 2022 at 11:04
> To: users@jena.apache.org <us...@jena.apache.org>
> Subject: Re: Weird sparql problem
> Both your suggestions for rewriting the query worked. I'm lost as to the
> reasons, but for future cases, is breaking problematic queries up with {}
> the way to go?
>
> On 04/11/2022 11.25, rvesse@dotnetrdf.org wrote:
>> So yes as suspected the triple patterns are being reordered badly in the BGP:
>>
>>     (sequence
>>       (table (vars ?sct_code)
>>         (row [?sct_code "298314008"])
>>       )
>>       (bgp
>>         (triple ?c skos:inScheme lsu:SNOMEDCT_US)
>>         (triple ?c skosxl:prefLabel ??0)
>>         (triple ??0 lsu:code ?sct_code)
>>       )))
>>
>> The optimizer doesn’t take into account the fact that the ?sct_code variable is going to be bound by the VALUES clause (table in the algebra), so it treats ??0 lsu:code ?sct_code as one of the least specific triple patterns (as it has two variables) and evaluates the much less specific ?c skos:inScheme lsu:SNOMEDCT_US pattern first.
>>
>> Lorenz’s suggestion of generating statistics for your dataset is a good one; statistics would likely tell the optimiser that the ?c skos:inScheme lsu:SNOMEDCT_US triple pattern is actually very non-specific for your dataset.
>>
>> You could also try Andy’s suggestion from elsewhere in the thread, i.e. passing --set arq:optReorderBGP=false to the CLI command in question or, if this is being called from code, ARQ.getContext().set(ARQ.optReorderBGP, false);
>>
>> The other thing you can do is explicitly break up your query further i.e.
>>
>> { VALUES ?sct_code { "298314008" }
>>     {  _:b0  lsu:code          ?sct_code .
>>       ?c    skosxl:prefLabel  _:b0 . }
>>     {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
>>     }
>>
>> Essentially, this forces the engine to evaluate that very unspecific triple pattern last.
>>
>> Another possibility would be to change that triple pattern to be in a FILTER EXISTS condition, so it’d only be evaluated for matches to your other triple patterns i.e.
>>
>> { VALUES ?sct_code { "298314008" }
>>       _:b0  lsu:code          ?sct_code .
>>       ?c    skosxl:prefLabel  _:b0 .
>>      FILTER EXISTS {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
>>     }
>>
>> Hope this helps,
>>
>> Rob
>>
>> From: Lorenz Buehmann <bu...@informatik.uni-leipzig.de>
>> Date: Thursday, 3 November 2022 at 11:12
>> To: users@jena.apache.org <us...@jena.apache.org>
>> Subject: Re: Re: Weird sparql problem
>> tdbquery --explain --loc  $TDB_LOC  "query here"
>>
>> would also work to see the plan - maybe also increase log level to see
>> more: https://jena.apache.org/documentation/tdb/optimizer.html
>>
>> Another question, did you generate the TDB stats such those could be
>> used by the optimizer?
>>
>> For debugging purposes, you could also disable query optimization (put an
>> empty none.opt file into the $TDB_LOC/Data-0001 dir) and reorder your query
>> manually, i.e.
>>
>>> WHERE
>>>     { VALUES ?sct_code { "298314008" }
>>>     _:b0  lsu:code          ?sct_code .
>>>       ?c    skosxl:prefLabel  _:b0 .
>>>       ?c    skos:inScheme     lsu:SNOMEDCT_US
>>>     }
>> Without stats, the reordering is based on heuristics (e.g. the number of
>> variables in a triple pattern), which is why the last triple pattern might
>> otherwise always be evaluated first.
>>
>>
>> On 03.11.22 11:11, Mikael Pesonen wrote:
>>> Here's the parse, hope it helps:
>>>
>>> WHERE
>>>     { VALUES ?sct_code { "298314008" }
>>>       ?c    skosxl:prefLabel  _:b0 .
>>>       _:b0  lsu:code          ?sct_code .
>>>       ?c    skos:inScheme     lsu:SNOMEDCT_US
>>>     }
>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> (prefix ((owl: <http://www.w3.org/2002/07/owl#>)
>>>            (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
>>>            (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
>>>            (skos: <http://www.w3.org/2004/02/skos/core#>)
>>>            (dcterms: <http://purl.org/dc/terms/>)
>>>            (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
>>>            (lsr: <https://resource.lingsoft.fi/>)
>>>            (id: <http://snomed.info/id/>)
>>>            (dcat: <http://www.w3.org/ns/dcat#>)
>>>            (dc: <http://purl.org/dc/elements/1.1/>)
>>>            (lsu: <https://www.lingsoft.fi/ns/umls/>))
>>>     (sequence
>>>       (table (vars ?sct_code)
>>>         (row [?sct_code "298314008"])
>>>       )
>>>       (bgp
>>>         (triple ?c skos:inScheme lsu:SNOMEDCT_US)
>>>         (triple ?c skosxl:prefLabel ??0)
>>>         (triple ??0 lsu:code ?sct_code)
>>>       )))


Re: Weird sparql problem

Posted by "rvesse@dotnetrdf.org" <rv...@dotnetrdf.org>.
TL;DR

It’s a workaround for this issue because it can force the optimiser to behave differently, however it should be used sparingly as overuse may prevent other optimisations that may yield more benefit than you lose elsewhere.

See Andy’s recent email [1]: the offending optimisation will be disabled by default in future releases, so the workaround will not be needed in the longer term.

----

The long-winded details for those interested (although I’m still glossing over lots of low level details)…

There are two levels of query optimisation in ARQ:


  1.  Logical optimisation (sometimes referred to as algebra optimisation)
  2.  Execution optimisation

The logical optimiser works at the SPARQL Algebra level and looks to make transformations to the algebra that are known to improve performance based on experience, past research etc.  In doing so the optimiser has to ensure that those transformations are semantically safe, i.e., they MUST NOT change the overall semantics of the query: the transformed query must return the same answers as the original query.  Therefore, many of these optimisations are applied quite conservatively, so if ARQ cannot determine that a given transformation would be semantically safe it won’t apply it.

Additionally in some cases, these optimisations are specifically intended to be chained, i.e., doing one optimisation may enable further optimisations, thus the ARQ optimiser applies the various transformations in a specific order.  See https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/OptimizerStd.java if you want to see the raw details of this and some explanatory comments around the ordering.  Some of these logical transformations are also done to enable execution optimiser behaviour later in query evaluation e.g., join linearisation.

The downside of the logical optimiser is that it works purely by static analysis of the algebra, i.e., without reference to the dataset against which the query will ultimately be evaluated.  This means that it can sometimes make decisions that are good for the general case BUT bad for some datasets.

The execution optimiser is a whole bunch of things done during actual execution to improve performance.  This includes everything from Jena’s streaming-based iterator implementation of query execution (effectively a Volcano based evaluation model [2]), ARQ’s join linearisation operators, TDB’s low level Node IDs and direct expression evaluation over those, Node ID to RDF Term caching (and vice-versa), memory-mapping of database indices etc.
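A toy Python sketch of the pull-based (Volcano-style) model mentioned above may help. It is not Jena code: `scan`, `select` and `limit` are hypothetical operators written as generators, and the million-row input is made up. Each operator pulls a row from its child only when its own consumer asks for one, so a LIMIT can stop the pipeline long before the input is exhausted.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]

class PullCounter:
    """Counts how many rows the leaf operator has handed out."""
    def __init__(self) -> None:
        self.pulled = 0

def scan(rows: Iterable[Row], counter: PullCounter) -> Iterator[Row]:
    # Leaf operator: produces one row at a time, only when asked.
    for row in rows:
        counter.pulled += 1
        yield row

def select(child: Iterator[Row], keep: Callable[[Row], bool]) -> Iterator[Row]:
    # Filter operator: keeps pulling from its child until a row passes.
    for row in child:
        if keep(row):
            yield row

def limit(child: Iterator[Row], n: int) -> Iterator[Row]:
    # Stops pulling from its child as soon as n rows have been returned.
    if n <= 0:
        return
    for count, row in enumerate(child, start=1):
        yield row
        if count >= n:
            return

def demo() -> Tuple[List[Row], int]:
    counter = PullCounter()
    # A lazily generated input of one million (?s, ?t) rows.
    rows = ({"?s": f"ex:s{i}", "?t": "ex:C1" if i % 2 else "ex:C2"}
            for i in range(1_000_000))
    plan = limit(select(scan(rows, counter), lambda r: r["?t"] == "ex:C1"), 1)
    return list(plan), counter.pulled

results, pulled = demo()
print(results, pulled)
```

Only the first two input rows are ever pulled, because `limit` stops asking for rows once it has returned one.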

Execution also includes BGP reordering: when the query evaluator gets a BGP to evaluate, it can choose to reorder the triple patterns within that BGP.  For TDB this is controlled by the presence, or lack thereof, of a stats.opt, fixed.opt or none.opt file in the database directory.  Having a relevant file present should apply further execution-time BGP reordering that can be statistics aware and thus avoids the issue of the logical BGP reordering.  Since, during execution of a single BGP, bindings from earlier triple patterns are used to restrict the searches made for subsequent triple patterns, the order of execution of the triple patterns can be important, especially if one triple pattern has many matches.
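To illustrate why the order of triple patterns matters, here is a minimal Python sketch of left-to-right BGP evaluation in which each pattern is matched under the bindings produced by the previous ones. It is a toy, not TDB: it scans a list of tuples instead of using indexes, counts intermediate bindings as a rough proxy for work, and its data (`ex:c0`, `ex:label0`, ...) is hypothetical, loosely modelled on the SNOMED query in this thread (`?b0` stands in for the blank node).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended

def eval_bgp(triples: List[Triple], patterns: List[Triple],
             initial: Binding) -> Tuple[List[Binding], int]:
    """Evaluate patterns in the given order; each one sees earlier bindings."""
    bindings = [initial]
    work = 0
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        work += len(bindings)  # intermediate results produced by this step
    return bindings, work

def make_data(n: int) -> List[Triple]:
    """Hypothetical data: n concepts, each with one label node carrying a code."""
    triples: List[Triple] = []
    for i in range(n):
        concept, label = f"ex:c{i}", f"ex:label{i}"
        code = "298314008" if i == 0 else f"code{i}"
        triples += [(concept, "skos:inScheme", "lsu:SNOMEDCT_US"),
                    (concept, "skosxl:prefLabel", label),
                    (label, "lsu:code", code)]
    return triples

data = make_data(200)
start = {"?sct_code": "298314008"}  # the binding supplied by VALUES
bad = [("?c", "skos:inScheme", "lsu:SNOMEDCT_US"),
       ("?c", "skosxl:prefLabel", "?b0"),
       ("?b0", "lsu:code", "?sct_code")]
good = [bad[2], bad[1], bad[0]]

bad_rows, bad_work = eval_bgp(data, bad, start)
good_rows, good_work = eval_bgp(data, good, start)
print(bad_work, good_work)
```

Both orders return the same single solution, but starting from the unselective skos:inScheme pattern produces 401 intermediate bindings on this toy data, versus 3 when starting from the pattern whose ?sct_code is already bound.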

However, if you are querying an in-memory dataset instead of TDB, then you may not be getting any execution time BGP reordering so you’re left evaluating the triple patterns according to the logical BGP reordering that may turn out to be sub-optimal depending on your dataset.

---

The specific problem discussed in this thread is due to a new optimisation introduced in Jena 4.5.0 (BGP reordering during logical optimisation).  This optimisation was shown to improve performance on some benchmarks because it enables more aggressive application of another optimisation (filter placement).

The reason it did the opposite on some users’ datasets is that it is done without any knowledge of the data (as it’s a logical optimisation) and can break up a BGP into separate BGPs, causing less specific triple patterns to be evaluated before more specific ones.  Or, where no execution-time BGP reordering occurs, it can leave the triple patterns in a sub-optimal order for evaluation even if BGPs are not split.
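The data-blind failure mode can be reproduced with a deliberately simplified heuristic. This Python sketch is not ARQ's actual reordering code, which weighs patterns more carefully; it greedily picks the pattern with the fewest unbound variables and breaks ties by original position. Told nothing about the VALUES clause, it puts ?c skos:inScheme lsu:SNOMEDCT_US first, the same order as the bgp in the algebra posted in this thread; once ?sct_code is treated as bound, the lsu:code pattern comes first.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def unbound_vars(pattern: Triple, bound: Set[str]) -> Set[str]:
    """Variables in the pattern that no earlier step has bound."""
    return {t for t in pattern if t.startswith("?") and t not in bound}

def greedy_reorder(patterns: List[Triple], bound: Iterable[str] = ()) -> List[Triple]:
    """Order patterns by fewest unbound variables; ties keep the original order."""
    known = set(bound)
    remaining = list(patterns)
    ordered: List[Triple] = []
    while remaining:
        # min() returns the first of several equal minima, so ties are stable.
        best = min(remaining, key=lambda p: len(unbound_vars(p, known)))
        remaining.remove(best)
        ordered.append(best)
        known |= unbound_vars(best, known)  # its variables are bound from now on
    return ordered

# Triple patterns in the order they appear in the query from this thread.
query = [("?c", "skosxl:prefLabel", "?b0"),
         ("?b0", "lsu:code", "?sct_code"),
         ("?c", "skos:inScheme", "lsu:SNOMEDCT_US")]

print(greedy_reorder(query))                       # VALUES binding ignored
print(greedy_reorder(query, bound={"?sct_code"}))  # VALUES binding taken into account
```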

The short-term fix (again, see Andy’s email [1]) is to disable this optimisation by default; users can opt back into it if they find it benefits their usage of Jena on their datasets.

The long-term fix is probably to rearchitect the logical optimiser in some way to allow more data context to be visible to it, i.e., making the logical BGP reordering statistics aware and ARQ’s overall optimisation strategy more hybrid.  If anyone is interested, I’d imagine there’ll be a thread on this on the dev list soon.
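As a rough illustration of what statistics-aware ordering could look like, here is a minimal Python sketch. The counts in `STATS` are hypothetical (they are not taken from this thread and are not the format of TDB's stats.opt), and the estimator simply assumes values are spread evenly, dividing a predicate's triple count by its number of distinct subjects or objects when that position is bound.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statistics: one million concepts, all in the same scheme.
STATS: Dict[str, Dict[str, int]] = {
    "skos:inScheme":    {"count": 1_000_000, "subjects": 1_000_000, "objects": 1},
    "skosxl:prefLabel": {"count": 1_000_000, "subjects": 1_000_000, "objects": 1_000_000},
    "lsu:code":         {"count": 1_000_000, "subjects": 1_000_000, "objects": 1_000_000},
}

def is_bound(term: str, bound: Set[str]) -> bool:
    """Constants and already-bound variables restrict the search."""
    return not term.startswith("?") or term in bound

def estimate(pattern: Triple, bound: Set[str]) -> float:
    """Estimated matches; assumes the predicate is a constant listed in STATS."""
    s, p, o = pattern
    stats = STATS[p]
    est = float(stats["count"])
    if is_bound(s, bound):
        est /= stats["subjects"]
    if is_bound(o, bound):
        est /= stats["objects"]
    return est

def stats_reorder(patterns: List[Triple], bound: Iterable[str] = ()) -> List[Triple]:
    """Greedily pick the pattern with the smallest estimate; ties keep query order."""
    known = set(bound)
    remaining = list(patterns)
    ordered: List[Triple] = []
    while remaining:
        best = min(remaining, key=lambda pat: estimate(pat, known))
        remaining.remove(best)
        ordered.append(best)
        known |= {t for t in best if t.startswith("?")}
    return ordered

query = [("?c", "skosxl:prefLabel", "?b0"),
         ("?b0", "lsu:code", "?sct_code"),
         ("?c", "skos:inScheme", "lsu:SNOMEDCT_US")]

print(estimate(query[2], set()))                  # the inScheme pattern on its own
print(stats_reorder(query, bound={"?sct_code"}))
```

With these figures the skos:inScheme pattern is estimated to match every concept, so it ends up last whether or not the VALUES binding is taken into account.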

Hope this helps,

Rob

[1]: https://lists.apache.org/thread/37cloogcb3wzmkl0s33ttnxyg0kvq69p
[2]: http://daslab.seas.harvard.edu/reading-group/papers/volcano.pdf


From: Mikael Pesonen <mi...@lingsoft.fi>
Date: Tuesday, 8 November 2022 at 11:04
To: users@jena.apache.org <us...@jena.apache.org>
Subject: Re: Weird sparql problem
Both your suggestions for rewriting the query worked. I'm lost as to the
reasons, but for future cases, is breaking problematic queries up with {}
the way to go?

On 04/11/2022 11.25, rvesse@dotnetrdf.org wrote:
> So yes as suspected the triple patterns are being reordered badly in the BGP:
>
>    (sequence
>      (table (vars ?sct_code)
>        (row [?sct_code "298314008"])
>      )
>      (bgp
>        (triple ?c skos:inScheme lsu:SNOMEDCT_US)
>        (triple ?c skosxl:prefLabel ??0)
>        (triple ??0 lsu:code ?sct_code)
>      )))
>
> The optimizer doesn’t take into account the fact that the ?sct_code variable is going to be bound by the VALUES clause (table in the algebra), so it treats ??0 lsu:code ?sct_code as one of the least specific triple patterns (as it has two variables) and evaluates the much less specific ?c skos:inScheme lsu:SNOMEDCT_US pattern first.
>
> Lorenz’s suggestion of generating statistics for your dataset is a good one; statistics would likely tell the optimiser that the ?c skos:inScheme lsu:SNOMEDCT_US triple pattern is actually very non-specific for your dataset.
>
> You could also try Andy’s suggestion from elsewhere in the thread, i.e. passing --set arq:optReorderBGP=false to the CLI command in question or, if this is being called from code, ARQ.getContext().set(ARQ.optReorderBGP, false);
>
> The other thing you can do is explicitly break up your query further i.e.
>
> { VALUES ?sct_code { "298314008" }
>    {  _:b0  lsu:code          ?sct_code .
>      ?c    skosxl:prefLabel  _:b0 . }
>    {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
>    }
>
> Essentially, this forces the engine to evaluate that very unspecific triple pattern last.
>
> Another possibility would be to change that triple pattern to be in a FILTER EXISTS condition, so it’d only be evaluated for matches to your other triple patterns i.e.
>
> { VALUES ?sct_code { "298314008" }
>      _:b0  lsu:code          ?sct_code .
>      ?c    skosxl:prefLabel  _:b0 .
>     FILTER EXISTS {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
>    }
>
> Hope this helps,
>
> Rob
>
> From: Lorenz Buehmann <bu...@informatik.uni-leipzig.de>
> Date: Thursday, 3 November 2022 at 11:12
> To: users@jena.apache.org <us...@jena.apache.org>
> Subject: Re: Re: Weird sparql problem
> tdbquery --explain --loc  $TDB_LOC  "query here"
>
> would also work to see the plan - maybe also increase log level to see
> more: https://jena.apache.org/documentation/tdb/optimizer.html
>
> Another question: did you generate the TDB stats so that they can be
> used by the optimizer?
>
> For debugging purposes, you could also disable query optimization (put an
> empty none.opt file into the $TDB_LOC/Data-0001 dir) and reorder your query
> manually, i.e.
>
>> WHERE
>>    { VALUES ?sct_code { "298314008" }
>>      _:b0  lsu:code          ?sct_code .
>>      ?c    skosxl:prefLabel  _:b0 .
>>      ?c    skos:inScheme     lsu:SNOMEDCT_US
>>    }
> Otherwise, without stats and based only on heuristics (e.g. the number of
> variables in a triple pattern), the last triple pattern might always be
> evaluated first.
>
>
> On 03.11.22 11:11, Mikael Pesonen wrote:
>> Here's the parse, hope it helps:
>>
>> WHERE
>>    { VALUES ?sct_code { "298314008" }
>>      ?c    skosxl:prefLabel  _:b0 .
>>      _:b0  lsu:code          ?sct_code .
>>      ?c    skos:inScheme     lsu:SNOMEDCT_US
>>    }
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> (prefix ((owl: <http://www.w3.org/2002/07/owl#>)
>>           (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
>>           (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
>>           (skos: <http://www.w3.org/2004/02/skos/core#>)
>>           (dcterms: <http://purl.org/dc/terms/>)
>>           (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
>>           (lsr: <https://resource.lingsoft.fi/>)
>>           (id: <http://snomed.info/id/>)
>>           (dcat: <http://www.w3.org/ns/dcat#>)
>>           (dc: <http://purl.org/dc/elements/1.1/>)
>>           (lsu: <https://www.lingsoft.fi/ns/umls/>))
>>    (sequence
>>      (table (vars ?sct_code)
>>        (row [?sct_code "298314008"])
>>      )
>>      (bgp
>>        (triple ?c skos:inScheme lsu:SNOMEDCT_US)
>>        (triple ?c skosxl:prefLabel ??0)
>>        (triple ??0 lsu:code ?sct_code)
>>      )))
>>
>>
>> On 02/11/2022 12.32, rvesse@dotnetrdf.org wrote:
>>> For these kinds of performance issues it is useful to see the SPARQL
>>> algebra for the whole query, not just fragments of the query.  You
>>> can use the qparse command for the version of Jena you are using to
>>> see how it is optimising your queries e.g.
>>>
>>> qparse --explain --query example.rq
>>>
>>> As Lorenz suggests this may be the optimiser making a bad guess at
>>> the appropriate order in which to evaluate the triple patterns within
>>> the BGP but without the larger query context or the algebra all we
>>> can do is guess.
>>>
>>> Rob
>>>
>>> From: Mikael Pesonen <mi...@lingsoft.fi>
>>> Date: Tuesday, 1 November 2022 at 12:53
>>> To: users@jena.apache.org <us...@jena.apache.org>
>>> Subject: Re: Weird sparql problem
>>> Different case, but again, hanging makes no sense to the user, whatever
>>> the technical reasons are.
>>>
>>>     VALUES ?sct_code { "298314008" }
>>>       ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>>>
>>> returns one row immediately, but
>>>
>>>     VALUES ?sct_code { "298314008" }
>>>       ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>>>
>>> hangs forever
>>>
>>>
>>> On 18/10/2022 9.08, Lorenz Buehmann wrote:
>>>> Hi,
>>>>
>>>> comments inline
>>>>
>>>> On 17.10.22 14:35, Mikael Pesonen wrote:
>>>>> This works as a separate query, but not in the middle, since ?s
>>>>> gets new values instead of binding to the previous ?s.
>>>>>
>>>>> { select ?t where {
>>>>> ?s a ?t .
>>>>>    } limit 10}
>>>>>     ?t skos:prefLabel ?l
>>>> In the middle of what? Subqueries will be evaluated first. If you
>>>> really want labels for classes, you should use DISTINCT in the
>>>> subquery so that the intermediate result is small: there shouldn't
>>>> be that many classes, but there are many instances of the same
>>>> class, so the join would otherwise be more expensive than necessary.
>>>>
>>>>
>>>>> On 17/10/2022 14.56, Mikael Pesonen wrote:
>>>>>> ?s a ?t .
>>>>>>     ?t skos:prefLabel ?l
>>>>>>
>>>>>> returns 3 million triples. Maybe it's related to this?
>>>> I don't see how this should be related to your initial query where ?s
>>>> was bound, which in my opinion should be an easy join. Is it possible
>>>> for you to share the dataset somehow? Also, what you can do is to
>>>> compute statistics for the TDB database with the tdbstats tool [1]
>>>> from the command line and put the file into the TDB folder. But even
>>>> without statistics, the query plan should take the first triple
>>>> pattern, use the SPO index as s and p are bound, and then pass the
>>>> bindings of ?o to the evaluation of the second triple pattern.
>>>>
>>>> [1]
>>>> https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>>>>
>>>>
>>>>
>>>>>> On 21/09/2022 9.15, Lorenz Buehmann wrote:
>>>>>>> Weird: only 10M triples, and each triple pattern returns only 1
>>>>>>> binding, so the size is tiny. Honestly, I can't think of
>>>>>>> anything except open connections, but as you mentioned, running
>>>>>>> the queries with only one triple pattern works as expected, so
>>>>>>> too many open connections are most likely not the issue.
>>>>>>>
>>>>>>> Can you reproduce this behavior with newer Jena versions like 4.6.1?
>>>>>>>
>>>>>>> Or can you reproduce this on different servers as well?
>>>>>>>
>>>>>>> Is it also stuck if you run the query directly after you restart
>>>>>>> Fuseki?
>>>>>>>
>>>>>>>
>>>>>>> On 19.09.22 13:49, Mikael Pesonen wrote:
>>>>>>>> On 15/09/2022 17.48, Lorenz Buehmann wrote:
>>>>>>>>> Forgot:
>>>>>>>>>
>>>>>>>>> - size of result for each triple pattern? Might affect whether a
>>>>>>>>> hash join can be used.
>>>>>>>> It's one row for each.
>>>>>>>>> - your hardware?
>>>>>>>> Normal server with 16gigs mem.
>>>>>>>>> - is it just the first query after starting Fuseki? Connections
>>>>>>>>> have been closed? Note, there was also a bug in a recent Jena
>>>>>>>>> version, but only with TDB and too many open connections. It has
>>>>>>>>> been resolved with release 4.6.1.
>>>>>>>> Jena has been running quite a while.
>>>>>>>>> Might not be related, but I'm mentioning all things here
>>>>>>>>> nevertheless.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 15.09.22 11:16, Mikael Pesonen wrote:
>>>>>>>>>> This returns one row fast, say :C1
>>>>>>>>>>
>>>>>>>>>> SELECT *
>>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>>> WHERE {
>>>>>>>>>>     <https://x.y.z> a ?t .
>>>>>>>>>>     #?t skos:prefLabel ?l
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and this too:
>>>>>>>>>>
>>>>>>>>>> SELECT *
>>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>>> WHERE {
>>>>>>>>>>     #<https://x.y.z> a ?t .
>>>>>>>>>>     :C1 skos:prefLabel ?l
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But this always hangs until timeout
>>>>>>>>>>
>>>>>>>>>> SELECT *
>>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>>> WHERE {
>>>>>>>>>>     <https://x.y.z> a ?t .
>>>>>>>>>>     ?t skos:prefLabel ?l
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> What am I missing here? I'm using Fuseki web GUI. Thanks!
>>> --
>>> Lingsoft - 30 years of Leading Language Management
>>>
>>> www.lingsoft.fi
>>>
>>> Speech Applications - Language Management - Translation - Reader's
>>> and Writer's Tools - Text Tools - E-books and M-books
>>>
>>> Mikael Pesonen
>>> System Engineer
>>>
>>> e-mail: mikael.pesonen@lingsoft.fi
>>> Tel. +358 2 279 3300
>>>
>>> Time zone: GMT+2
>>>
>>> Helsinki Office
>>> Eteläranta 10
>>> FI-00130 Helsinki
>>> FINLAND
>>>
>>> Turku Office
>>> Kauppiaskatu 5 A
>>> FI-20100 Turku
>>> FINLAND
>>>

Re: Re: Weird sparql problem

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Andy changed something in the algebra optimizer in the latest development 
version, so maybe you can try that, though I don't know if it will change 
anything, as it was more related to FILTER expressions.

On 08.11.22 12:04, Mikael Pesonen wrote:
> Both your suggestions for rewriting the query worked. I'm lost with 
> the reasons, but for future cases, is breaking problematic queries with 
> {} the way to go?
>

Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Both your suggestions for rewriting the query worked. I'm lost with the 
reasons, but for future cases, is breaking problematic queries with {} 
the way to go?

Re: Re: Weird sparql problem

Posted by "rvesse@dotnetrdf.org" <rv...@dotnetrdf.org>.
So yes as suspected the triple patterns are being reordered badly in the BGP:

  (sequence
    (table (vars ?sct_code)
      (row [?sct_code "298314008"])
    )
    (bgp
      (triple ?c skos:inScheme lsu:SNOMEDCT_US)
      (triple ?c skosxl:prefLabel ??0)
      (triple ??0 lsu:code ?sct_code)
    )))

The optimizer doesn’t take into account that the ?sct_code variable is going to be bound by the VALUES clause (table in the algebra), so it considers that triple pattern the least specific one (as it has two variables), causing it to evaluate a much less specific triple pattern first.
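A deliberately simplified model may help show why this happens. The Python sketch below is not Jena's actual reordering code (ARQ's heuristics also weigh where constants appear in a pattern); it assumes a plain "fewest unbound variables first" rule, with ties broken by the order in which the patterns were written, and the name `reorder` is hypothetical. Counting ?sct_code as unbound reproduces the order in the algebra above; counting it as already bound by VALUES puts the lsu:code lookup first.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def is_var(term: str) -> bool:
    # SPARQL variables start with "?"; this also covers ARQ's "??0" blank-node variables.
    return term.startswith("?")

def reorder(patterns: List[Triple], initially_bound: Iterable[str] = ()) -> List[Triple]:
    """Toy greedy heuristic: repeatedly take the pattern with the fewest unbound variables."""
    bound: Set[str] = set(initially_bound)
    remaining = list(patterns)
    ordered: List[Triple] = []

    def unbound_count(p: Triple) -> int:
        return sum(1 for term in p if is_var(term) and term not in bound)

    while remaining:
        # min() returns the first minimal element, so ties keep the written order.
        best = min(remaining, key=unbound_count)
        remaining.remove(best)
        ordered.append(best)
        bound.update(term for term in best if is_var(term))
    return ordered

# The three patterns in the order they appear in the query.
bgp = [
    ("?c", "skosxl:prefLabel", "??0"),
    ("??0", "lsu:code", "?sct_code"),
    ("?c", "skos:inScheme", "lsu:SNOMEDCT_US"),
]
print(reorder(bgp))                                  # VALUES binding ignored
print(reorder(bgp, initially_bound={"?sct_code"}))   # VALUES binding taken into account
```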

Lorenz’s suggestion of generating statistics for your dataset is a good one; statistics would likely tell the optimiser that the ?c skos:inScheme lsu:SNOMEDCT_US triple pattern is actually very non-specific for your dataset.

You could also try Andy’s suggestion elsewhere in the thread, i.e. passing --set arq:optReorderBGP=false to the CLI command in question or, if this is being called from code, calling ARQ.getContext().set(ARQ.optReorderBGP, false);
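For the CLI route, the sketch below only assembles the argument list for a `tdbquery --explain --loc ...` call like the one Lorenz gives, optionally adding `--set arq:optReorderBGP=false`, so the printed plan can be compared with and without BGP reordering. The helper name `tdbquery_argv` and the paths are hypothetical; it assumes the Jena command-line scripts are on PATH, and for a TDB2 database you would pass `tdb2.tdbquery` as the command instead.

```python
import shlex
from typing import List

def tdbquery_argv(loc: str, query: str, reorder_bgp: bool = True,
                  command: str = "tdbquery") -> List[str]:
    """Build an argv list that prints the query plan (--explain) for a TDB database."""
    argv = [command, "--explain", "--loc", loc]
    if not reorder_bgp:
        # CLI counterpart of ARQ.getContext().set(ARQ.optReorderBGP, false) in code.
        argv += ["--set", "arq:optReorderBGP=false"]
    argv.append(query)  # the query text itself, as in: tdbquery ... "query here"
    return argv

# Hypothetical database location and placeholder query.
argv = tdbquery_argv("/path/to/tdb", "SELECT * { ?s ?p ?o } LIMIT 1", reorder_bgp=False)
print(shlex.join(argv))
# To actually run it (needs Jena installed): subprocess.run(argv, check=True)
```

Keeping the default `reorder_bgp=True` gives the plain `--explain` call, which is useful as the baseline for comparison.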

The other thing you can do is explicitly break up your query further i.e.

{ VALUES ?sct_code { "298314008" }
  {  _:b0  lsu:code          ?sct_code .
    ?c    skosxl:prefLabel  _:b0 . }
  {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
  }

Essentially, this forces the engine to evaluate that very unspecific triple pattern last.
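To make the cost of the two orders concrete, here is a toy nested-loop evaluator over an in-memory list of triples that counts the intermediate bindings each order produces. Everything in it is hypothetical: the data (one concept carrying the code plus N other concepts in the same scheme), N itself, and the function names. TDB uses index lookups rather than scans, so the point here is the number of intermediate rows, not the scanning strategy.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")  # also covers ARQ's "??0" blank-node variables

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None
    return extended

def evaluate(order: List[Triple], data: List[Triple], start: Binding) -> Tuple[List[Binding], int]:
    """Join patterns left to right; return final solutions and total intermediate rows."""
    solutions = [start]
    rows = 0
    for pattern in order:
        solutions = [b2 for b in solutions for t in data
                     if (b2 := match(pattern, t, b)) is not None]
        rows += len(solutions)
    return solutions, rows

N = 200  # hypothetical number of other concepts in lsu:SNOMEDCT_US
data: List[Triple] = []
for i in range(N + 1):
    code = "298314008" if i == 0 else f"other-{i}"
    data += [(f"c{i}", "skosxl:prefLabel", f"l{i}"),
             (f"l{i}", "lsu:code", code),
             (f"c{i}", "skos:inScheme", "lsu:SNOMEDCT_US")]

start = {"?sct_code": "298314008"}  # the single row supplied by VALUES
in_scheme = ("?c", "skos:inScheme", "lsu:SNOMEDCT_US")
pref_label = ("?c", "skosxl:prefLabel", "??0")
code_pat = ("??0", "lsu:code", "?sct_code")

bad, bad_rows = evaluate([in_scheme, pref_label, code_pat], data, start)    # order in the algebra
good, good_rows = evaluate([code_pat, pref_label, in_scheme], data, start)  # selective pattern first
print("algebra order:", bad_rows, "intermediate rows; manual order:", good_rows)
```

Both orders return the same single solution; only the amount of work differs, and in this toy it grows linearly with the size of the scheme.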

Another possibility would be to move that triple pattern into a FILTER EXISTS condition, so it’d only be evaluated for matches to your other triple patterns, i.e.

{ VALUES ?sct_code { "298314008" }
    _:b0  lsu:code          ?sct_code .
    ?c    skosxl:prefLabel  _:b0 .
   FILTER EXISTS {  ?c    skos:inScheme     lsu:SNOMEDCT_US }
  }
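A similar toy model for the FILTER EXISTS variant: following the SPARQL semantics of EXISTS, each solution's bindings are substituted into the pattern and the data are checked for a match, so the SNOMEDCT_US membership test runs once per surviving solution instead of enumerating the whole scheme. The data and helper names are hypothetical, and repeated variables inside the filter pattern are not handled.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def substitute(pattern: Triple, binding: Binding) -> Triple:
    """Replace variables that the solution already binds with their values."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)

def exists(pattern: Triple, data: Set[Triple]) -> bool:
    if not any(term.startswith("?") for term in pattern):
        return pattern in data  # fully bound: a single lookup
    # Otherwise scan (simplification: repeated variables in the pattern are ignored).
    return any(all(p.startswith("?") or p == t for p, t in zip(pattern, triple))
               for triple in data)

def filter_exists(solutions: List[Binding], pattern: Triple, data: Set[Triple]) -> List[Binding]:
    return [s for s in solutions if exists(substitute(pattern, s), data)]

data = {
    ("c0", "skos:inScheme", "lsu:SNOMEDCT_US"),
    ("c1", "skos:inScheme", "lsu:OTHER_SCHEME"),  # hypothetical second scheme
}
# Hypothetical solutions left after the two selective triple patterns.
solutions = [
    {"?sct_code": "298314008", "?c": "c0"},
    {"?sct_code": "298314008", "?c": "c1"},
]
kept = filter_exists(solutions, ("?c", "skos:inScheme", "lsu:SNOMEDCT_US"), data)
print(kept)
```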

Hope this helps,

Rob

From: Lorenz Buehmann <bu...@informatik.uni-leipzig.de>
Date: Thursday, 3 November 2022 at 11:12
To: users@jena.apache.org <us...@jena.apache.org>
Subject: Re: Re: Weird sparql problem
tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan - maybe also increase log level to see
more: https://jena.apache.org/documentation/tdb/optimizer.html

Another question: did you generate the TDB stats so that they can be
used by the optimizer?
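If statistics have not been generated yet, the TDB optimizer page linked above describes running the stats tool and placing its output in a file named stats.opt. The sketch below assumes a TDB2 database whose active data directory is Data-0001 (the same directory mentioned here for none.opt) and the `tdb2.tdbstats` script on PATH; as we understand it, a TDB1 database keeps stats.opt in the database directory itself. The helper name and paths are hypothetical. It writes to a temporary name first and then renames, so nothing ever sees a half-written file.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence

def install_stats(data_dir: Path, stats_cmd: Sequence[str]) -> Path:
    """Run a stats command and save its stdout as data_dir/stats.opt."""
    result = subprocess.run(list(stats_cmd), check=True, capture_output=True, text=True)
    tmp = data_dir / "stats.opt.tmp"
    tmp.write_text(result.stdout)
    target = data_dir / "stats.opt"
    tmp.replace(target)  # rename into place in one step
    return target

# Intended use (hypothetical paths):
#   install_stats(Path("/data/tdb/Data-0001"), ["tdb2.tdbstats", "--loc", "/data/tdb"])
# Dry run with a stand-in command (not real stats output), so this runs without Jena:
with tempfile.TemporaryDirectory() as tmp_dir:
    stats_path = install_stats(Path(tmp_dir), [sys.executable, "-c", "print('(stats (count 3))')"])
    written = stats_path.read_text()
    leftovers = sorted(p.name for p in Path(tmp_dir).iterdir())
print(written, leftovers)
```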

For debugging purposes, you could also disable query optimization (put an
empty none.opt file into the $TDB_LOC/Data-0001 dir) and reorder your query
manually, i.e.

> WHERE
>   { VALUES ?sct_code { "298314008" }
>     _:b0  lsu:code          ?sct_code .
>     ?c    skosxl:prefLabel  _:b0 .
>     ?c    skos:inScheme     lsu:SNOMEDCT_US
>   }

Otherwise, without stats and based only on heuristics (e.g. the number of
variables in a triple pattern), the last triple pattern might always be
evaluated first.
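Lorenz's remark about heuristics can be illustrated with a deliberately naive rule: put patterns with fewer variables first. The Java sketch below is a hypothetical simplification, not Jena's actual reordering weights, and it ignores the fact that VALUES fixes ?sct_code. Under those assumptions the single-variable skos:inScheme pattern looks most selective and moves to the front, which matches the order in the algebra Mikael posted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified heuristic: fewer variables = assumed more selective. Not Jena's actual weighting.
public class VarCountHeuristicSketch {

    static long countVars(String pattern) {
        return Arrays.stream(pattern.trim().split("\\s+"))
                .filter(term -> term.startsWith("?"))
                .count();
    }

    // List.sort is stable, so patterns with equal counts keep their written order.
    static List<String> reorder(List<String> patterns) {
        List<String> sorted = new ArrayList<>(patterns);
        sorted.sort(Comparator.comparingLong(VarCountHeuristicSketch::countVars));
        return sorted;
    }

    public static void main(String[] args) {
        // Triple patterns in the order they were written in the query
        List<String> written = List.of(
                "?c skosxl:prefLabel ??0",
                "??0 lsu:code ?sct_code",
                "?c skos:inScheme lsu:SNOMEDCT_US");
        reorder(written).forEach(System.out::println);
    }
}
```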


On 03.11.22 11:11, Mikael Pesonen wrote:
> Here's the parse, hope it helps:
>
> WHERE
>   { VALUES ?sct_code { "298314008" }
>     ?c    skosxl:prefLabel  _:b0 .
>     _:b0  lsu:code          ?sct_code .
>     ?c    skos:inScheme     lsu:SNOMEDCT_US
>   }
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> (prefix ((owl: <http://www.w3.org/2002/07/owl#>)
>          (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
>          (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
>          (skos: <http://www.w3.org/2004/02/skos/core#>)
>          (dcterms: <http://purl.org/dc/terms/>)
>          (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
>          (lsr: <https://resource.lingsoft.fi/>)
>          (id: <http://snomed.info/id/>)
>          (dcat: <http://www.w3.org/ns/dcat#>)
>          (dc: <http://purl.org/dc/elements/1.1/>)
>          (lsu: <https://www.lingsoft.fi/ns/umls/>))
>   (sequence
>     (table (vars ?sct_code)
>       (row [?sct_code "298314008"])
>     )
>     (bgp
>       (triple ?c skos:inScheme lsu:SNOMEDCT_US)
>       (triple ?c skosxl:prefLabel ??0)
>       (triple ??0 lsu:code ?sct_code)
>     )))
>
>
> On 02/11/2022 12.32, rvesse@dotnetrdf.org wrote:
>> For these kinds of performance issues it is useful to see the SPARQL
>> algebra for the whole query, not just fragments of the query. You
>> can use the qparse command for the version of Jena you are using to
>> see how it is optimising your queries, e.g.
>>
>> qparse --explain --query example.rq
>>
>> As Lorenz suggests this may be the optimiser making a bad guess at
>> the appropriate order in which to evaluate the triple patterns within
>> the BGP but without the larger query context or the algebra all we
>> can do is guess.
>>
>> Rob
>>
>> From: Mikael Pesonen <mi...@lingsoft.fi>
>> Date: Tuesday, 1 November 2022 at 12:53
>> To: users@jena.apache.org <us...@jena.apache.org>
>> Subject: Re: Weird sparql problem
>> Different case, but again, hanging makes no sense to the user, whatever
>> the technical reasons are.
>>
>>    VALUES ?sct_code { "298314008" }
>>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>>
>> returns one row immediately, but
>>
>>    VALUES ?sct_code { "298314008" }
>>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>>
>> hangs forever
>>
>>
>>    skos:inScheme lsu:SNOMEDCT_US;
>>
>> On 18/10/2022 9.08, Lorenz Buehmann wrote:
>>> Hi,
>>>
>>> comments inline
>>>
>>> On 17.10.22 14:35, Mikael Pesonen wrote:
>>>> This works as a separate query, but not in the middle, since ?s
>>>> gets new values instead of being bound to the previous ?s.
>>>>
>>>> { select ?t where {
>>>> ?s a ?t .
>>>>   } limit 10}
>>>>    ?t skos:prefLabel ?l
>>>
>>> In the middle of what? Subqueries will be evaluated first. If you
>>> really want labels for classes, you should use DISTINCT in the
>>> subquery so that the intermediate result is small: there shouldn't
>>> be that many classes, but many instances share the same class, so
>>> the join would be more expensive than necessary.
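Lorenz's point about DISTINCT can be made concrete by counting rows: without DISTINCT, every instance contributes one ?t row to the label join; with it, every class contributes one. The Java sketch below uses invented data (300,000 instances spread over three classes, one label per class) and only counts rows; it does not model how Fuseki evaluates subqueries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: how DISTINCT in the subquery shrinks the input to the label join.
public class DistinctSubquerySketch {

    // SELECT ?t vs. SELECT DISTINCT ?t over the ?t column of "?s a ?t"
    static List<String> projectTypes(List<String> typeColumn, boolean distinct) {
        if (distinct) {
            return typeColumn.stream().distinct().collect(Collectors.toList());
        }
        return new ArrayList<>(typeColumn);
    }

    // Join the projected ?t values with "?t skos:prefLabel ?l" (assumes one label per class).
    static long joinWithLabels(List<String> types, Map<String, String> labelOf) {
        return types.stream().filter(labelOf::containsKey).count();
    }

    public static void main(String[] args) {
        // Hypothetical data: 300,000 instances spread over 3 classes
        List<String> typeColumn = new ArrayList<>();
        for (int i = 0; i < 300_000; i++) {
            typeColumn.add(":C" + (i % 3));
        }
        Map<String, String> labelOf = Map.of(":C0", "Class zero", ":C1", "Class one", ":C2", "Class two");
        System.out.println(joinWithLabels(projectTypes(typeColumn, false), labelOf)); // 300000
        System.out.println(joinWithLabels(projectTypes(typeColumn, true), labelOf));  // 3
    }
}
```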
>>>
>>>
>>>> On 17/10/2022 14.56, Mikael Pesonen wrote:
>>>>> ?s a ?t .
>>>>>    ?t skos:prefLabel ?l
>>>>>
>>>>> returns 3 million triples. Maybe it's related to this?
>>> I don't see how this should be related to your initial query where ?s
>>> was bound, which in my opinion should be an easy join. Is it possible
>>> for you to share the dataset somehow? Also, what you can do is to
>>> compute statistics for the TDB database with the tdbstats tool [1] from
>>> the command line and put the output into the TDB folder. But even without stats, the query
>>> plan should take the first triple pattern, use the spo index as s and
>>> p are bound, then pass the bindings of ?o to the evaluation of the
>>> second triple pattern.
>>>
>>> [1]
>>> https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
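The plan Lorenz describes, looking up the first pattern with subject and predicate bound and then feeding each ?t binding into the second pattern, is an index nested-loop join. The Java sketch below mimics that access pattern with nested hash maps standing in for TDB's SPO index; the label value and the extra subject are hypothetical, and real TDB indexes are B+trees over node IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for an SPO index: subject -> predicate -> objects.
// It only mimics the access pattern, not TDB's on-disk structures.
public class SpoLookupSketch {

    private final Map<String, Map<String, List<String>>> spo = new HashMap<>();

    void add(String s, String p, String o) {
        spo.computeIfAbsent(s, k -> new HashMap<>())
           .computeIfAbsent(p, k -> new ArrayList<>())
           .add(o);
    }

    // Lookup with subject and predicate bound, as for "<x> a ?t".
    List<String> objects(String s, String p) {
        return spo.getOrDefault(s, Map.of()).getOrDefault(p, List.of());
    }

    // Evaluates "<subject> a ?t . ?t skos:prefLabel ?l" as two bound lookups.
    List<String[]> typesWithLabels(String subject) {
        List<String[]> rows = new ArrayList<>();
        for (String t : objects(subject, "rdf:type")) {
            // ?t is now bound, so the second pattern is again an S+P lookup
            for (String l : objects(t, "skos:prefLabel")) {
                rows.add(new String[] {t, l});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        SpoLookupSketch index = new SpoLookupSketch();
        index.add("<https://x.y.z>", "rdf:type", ":C1");
        index.add(":C1", "skos:prefLabel", "\"Label of C1\"");
        index.add("<https://other>", "rdf:type", ":C2"); // unrelated data is never touched
        for (String[] row : index.typesWithLabels("<https://x.y.z>")) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```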
>>>
>>>
>>>
>>>>>
>>>>> On 21/09/2022 9.15, Lorenz Buehmann wrote:
>>>>>> Weird, only 10M triples and each triple pattern returns only 1
>>>>>> binding, thus, the size is tiny - honestly I can't think of
>>>>>> anything except for open connections, but as you mentioned, running
>>>>>> the queries with only one triple pattern works as expected, so
>>>>>> too many open connections are most likely not the issue.
>>>>>>
>>>>>> Can you reproduce this behavior with newer Jena versions like 4.6.1?
>>>>>>
>>>>>> Or can you reproduce this on different servers as well?
>>>>>>
>>>>>> Is it also stuck if you run the query directly after you restart
>>>>>> Fuseki?
>>>>>>
>>>>>>
>>>>>> On 19.09.22 13:49, Mikael Pesonen wrote:
>>>>>>>
>>>>>>> On 15/09/2022 17.48, Lorenz Buehmann wrote:
>>>>>>>> Forgot:
>>>>>>>>
>>>>>>>> - size of result for each triple pattern? Might affect whether a hash
>>>>>>>> join can be used.
>>>>>>> It's one row for each.
>>>>>>>> - your hardware?
>>>>>>> A normal server with 16 GB of memory.
>>>>>>>> - is it just the first query after starting Fuseki? Connections
>>>>>>>> have been closed? Note, there was also a bug in a recent Jena
>>>>>>>> version, but only with TDB and too many open connections. It has
>>>>>>>> been resolved with release 4.6.1.
>>>>>>> Jena has been running for quite a while.
>>>>>>>> Might not be related, but I'm mentioning all things here
>>>>>>>> nevertheless.
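On Lorenz's question about whether a hash join can be used: when both sides are small (Mikael reports one row each), building a hash table on one side and probing it with the other is cheap. The Java sketch below is a generic textbook hash join on hypothetical rows, not ARQ's implementation, and it assumes the two sides share only the join variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal hash join of two solution lists on one shared variable.
// Assumes the join variable is bound in every row and no other variable is shared.
public class HashJoinSketch {

    static List<Map<String, String>> hashJoin(List<Map<String, String>> probe,
                                              List<Map<String, String>> build,
                                              String joinVar) {
        // Build phase: index the (ideally smaller) side by the join variable's value
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> row : build) {
            table.computeIfAbsent(row.get(joinVar), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: one hash lookup per row of the other side
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : probe) {
            for (Map<String, String> match : table.getOrDefault(row.get(joinVar), List.of())) {
                Map<String, String> merged = new HashMap<>(row);
                merged.putAll(match);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical results of "<https://x.y.z> a ?t" and "?t skos:prefLabel ?l"
        List<Map<String, String>> types = List.of(Map.of("?t", ":C1"));
        List<Map<String, String>> labels = List.of(
                Map.of("?t", ":C1", "?l", "Label of C1"),
                Map.of("?t", ":C2", "?l", "Label of C2"));
        System.out.println(hashJoin(types, labels, "?t"));
    }
}
```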
>>>>>>>>
>>>>>>>>
>>>>>>>> On 15.09.22 11:16, Mikael Pesonen wrote:
>>>>>>>>> This returns one row fast, say :C1
>>>>>>>>>
>>>>>>>>> SELECT *
>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>> WHERE {
>>>>>>>>>    <https://x.y.z> a ?t .
>>>>>>>>>    #?t skos:prefLabel ?l
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> and this too:
>>>>>>>>>
>>>>>>>>> SELECT *
>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>> WHERE {
>>>>>>>>>    #<https://x.y.z> a ?t .
>>>>>>>>>    :C1 skos:prefLabel ?l
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But this always hangs until timeout
>>>>>>>>>
>>>>>>>>> SELECT *
>>>>>>>>> FROM <https://a.b.c>
>>>>>>>>> WHERE {
>>>>>>>>>    <https://x.y.z> a ?t .
>>>>>>>>>    ?t skos:prefLabel ?l
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> What am I missing here? I'm using the Fuseki web GUI. Thanks!
>> --
>> Lingsoft - 30 years of Leading Language Management
>>
>> www.lingsoft.fi
>>
>> Speech Applications - Language Management - Translation - Reader's
>> and Writer's Tools - Text Tools - E-books and M-books
>>
>> Mikael Pesonen
>> System Engineer
>>
>> e-mail: mikael.pesonen@lingsoft.fi
>> Tel. +358 2 279 3300
>>
>> Time zone: GMT+2
>>
>> Helsinki Office
>> Eteläranta 10
>> FI-00130 Helsinki
>> FINLAND
>>
>> Turku Office
>> Kauppiaskatu 5 A
>> FI-20100 Turku
>> FINLAND
>>
>

Re: Re: Weird sparql problem

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
tdbstats --loc $PATH_TO_TDB_LOCATION

tdbstats --desc $PATH_TO_ASSEMBLER_FILE


On 08.11.22 11:57, Mikael Pesonen wrote:
> I ran your version of the query with none.opt and saw no change. For
>
> tdbstats --loc=DIR|--desc=assemblerFile [--graph=URI]
>
> could you please explain the loc and desc parameters?

Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I ran your version of the query with none.opt and saw no change. For

tdbstats --loc=DIR|--desc=assemblerFile [--graph=URI]

could you please explain the loc and desc parameters?




Re: Re: Weird sparql problem

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan; maybe also increase the log level to see
more: https://jena.apache.org/documentation/tdb/optimizer.html

Another question: did you generate the TDB stats so that they could be
used by the optimizer?

For debugging purposes, you could also disable query optimization (put an
empty none.opt file into the $TDB_LOC/Data-0001 dir) and reorder your query
manually, i.e.

> WHERE
>   { VALUES ?sct_code { "298314008" }
>   _:b0  lsu:code          ?sct_code .
>     ?c    skosxl:prefLabel  _:b0 .
>     ?c    skos:inScheme     lsu:SNOMEDCT_US
>   } 

Otherwise, without stats and based only on heuristics (e.g. the number of
variables in a triple pattern), the last triple pattern might always be
evaluated first.



Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Here's the parse, hope it helps:

WHERE
   { VALUES ?sct_code { "298314008" }
     ?c    skosxl:prefLabel  _:b0 .
     _:b0  lsu:code          ?sct_code .
     ?c    skos:inScheme     lsu:SNOMEDCT_US
   }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(prefix ((owl: <http://www.w3.org/2002/07/owl#>)
          (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
          (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
          (skos: <http://www.w3.org/2004/02/skos/core#>)
          (dcterms: <http://purl.org/dc/terms/>)
          (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
          (lsr: <https://resource.lingsoft.fi/>)
          (id: <http://snomed.info/id/>)
          (dcat: <http://www.w3.org/ns/dcat#>)
          (dc: <http://purl.org/dc/elements/1.1/>)
          (lsu: <https://www.lingsoft.fi/ns/umls/>))
   (sequence
     (table (vars ?sct_code)
       (row [?sct_code "298314008"])
     )
     (bgp
       (triple ?c skos:inScheme lsu:SNOMEDCT_US)
       (triple ?c skosxl:prefLabel ??0)
       (triple ??0 lsu:code ?sct_code)
     )))


On 02/11/2022 12.32, rvesse@dotnetrdf.org wrote:
> For these kinds of performance issues it is useful to see the SPARQL algebra for the whole query, not just fragments of the query. You can use the qparse command for the version of Jena you are using to see how it is optimising your queries, e.g.
>
> qparse --explain --query example.rq
>
> As Lorenz suggests, this may be the optimiser making a bad guess about the order in which to evaluate the triple patterns within the BGP, but without the larger query context or the algebra, all we can do is guess.
>
> Rob
>
> From: Mikael Pesonen <mi...@lingsoft.fi>
> Date: Tuesday, 1 November 2022 at 12:53
> To: users@jena.apache.org <us...@jena.apache.org>
> Subject: Re: Weird sparql problem
> Different case, but again, hanging makes no sense to the user, whatever
> the technical reasons are.
>
>    VALUES ?sct_code { "298314008" }
>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>
> returns one row immediately, but
>
>    VALUES ?sct_code { "298314008" }
>      ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>
> hangs forever
>
>
>    skos:inScheme lsu:SNOMEDCT_US;
>


Re: Weird sparql problem

Posted by "rvesse@dotnetrdf.org" <rv...@dotnetrdf.org>.
For these kinds of performance issues, it is useful to see the SPARQL algebra for the whole query, not just fragments of the query. You can use the qparse command for the version of Jena you are using to see how it is optimising your queries, e.g.

qparse --explain --query example.rq

As Lorenz suggests, this may be the optimiser making a bad guess about the order in which to evaluate the triple patterns within the BGP, but without the larger query context or the algebra, all we can do is guess.

Rob

From: Mikael Pesonen <mi...@lingsoft.fi>
Date: Tuesday, 1 November 2022 at 12:53
To: users@jena.apache.org <us...@jena.apache.org>
Subject: Re: Weird sparql problem
Different case, but again, hanging makes no sense to the user, whatever
the technical reasons are.

  VALUES ?sct_code { "298314008" }
    ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

  VALUES ?sct_code { "298314008" }
    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US

hangs forever


  skos:inScheme lsu:SNOMEDCT_US;


Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
BIND( "298314008" AS ?sct_code )
    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US

also takes a long time, about 8 minutes (all the previous runs were also
slow, but they did finish).

On 02/11/2022 9.35, Lorenz Buehmann wrote:
> I think if you use
>
> BIND( "298314008" AS ?sct_code )
>
> it would work for the second query?
>
> Looks like the query optimizer does the join in the wrong order
>
> @Andy?
>
> On 01.11.22 13:52, Mikael Pesonen wrote:
>> Different case, but again, hanging makes no sense to the user, whatever
>> the technical reasons are.
>>
>>  VALUES ?sct_code { "298314008" }
>>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>>
>> returns one row immediately, but
>>
>>  VALUES ?sct_code { "298314008" }
>>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>>
>> hangs forever
>>
>>
>>  skos:inScheme lsu:SNOMEDCT_US;
>>


Re: Weird sparql problem

Posted by Andy Seaborne <an...@apache.org>.

On 02/11/2022 07:35, Lorenz Buehmann wrote:
> I think if you use
> 
> BIND( "298314008" AS ?sct_code )
> 
> it would work for the second query?
> 
> Looks like the query optimizer does the join in the wrong order
> 
> @Andy?

Maybe, maybe not - there is no way to tell from the report. We don't know
what the data looks like.

"Hang forever" could be a low-level issue. Is CPU being consumed?

You could try
--set arq:optReorderBGP=false

but from the information given, I don't think it will make a difference.

(If it's TDB, I'd expect TDB to pick a reasonable order at the TDB stage.)

     Andy

> 
> On 01.11.22 13:52, Mikael Pesonen wrote:
>> Different case, but again, hanging makes no sense to the user, whatever
>> the technical reasons are.
>>
>>  VALUES ?sct_code { "298314008" }
>>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>>
>> returns one row immediately, but
>>
>>  VALUES ?sct_code { "298314008" }
>>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>>
>> hangs forever
>>
>>
>>  skos:inScheme lsu:SNOMEDCT_US;
>>

Re: Re: Weird sparql problem

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
I think if you use

BIND( "298314008" AS ?sct_code )

it would work for the second query?

Looks like the query optimizer does the join in the wrong order

@Andy?

On 01.11.22 13:52, Mikael Pesonen wrote:
> Different case, but again, hanging makes no sense to the user, whatever
> the technical reasons are.
>
>  VALUES ?sct_code { "298314008" }
>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]
>
> returns one row immediately, but
>
>  VALUES ?sct_code { "298314008" }
>    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US
>
> hangs forever
>
>
>  skos:inScheme lsu:SNOMEDCT_US;
>

Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Different case, but again, hanging makes no sense to the user, whatever
the technical reasons are.

  VALUES ?sct_code { "298314008" }
    ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

  VALUES ?sct_code { "298314008" }
    ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US

hangs forever


  skos:inScheme lsu:SNOMEDCT_US;



Re: Re: Weird sparql problem

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:
> This works as a separate query, but not in the middle, since ?s gets
> new values instead of binding to the previous ?s.
>
> { select ?t where {
> ?s a ?t .
>  } limit 10}
>   ?t skos:prefLabel ?l


In the middle of what? Subqueries are evaluated first. If you really want
labels for classes, you should use DISTINCT in the subquery so that the
intermediate result stays small: there shouldn't be that many classes, but
there are many instances of the same class, so without DISTINCT the join
would be more expensive than necessary.
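To make the DISTINCT suggestion concrete, here is a small Python sketch on made-up data (10,000 instances spread over 3 classes). It is not Jena code and the function name is hypothetical; it only counts how many ?t rows reach the label join with and without DISTINCT in the subquery.

```python
# Hypothetical illustration of the DISTINCT advice. The SPARQL shape being
# modelled is roughly:
#   SELECT ?t ?l WHERE {
#     { SELECT DISTINCT ?t WHERE { ?s a ?t } }
#     ?t skos:prefLabel ?l
#   }
# The data below is made up: many instances, few classes.

def join_labels(instance_type_rows, labels, distinct):
    """Join subquery rows (?t) with `?t skos:prefLabel ?l`.

    Returns (joined rows, number of ?t rows fed into the join)."""
    types = [t for _, t in instance_type_rows]  # the subquery projects only ?t
    if distinct:
        types = sorted(set(types))  # SELECT DISTINCT ?t
    joined = [(t, label) for t in types for label in labels.get(t, [])]
    return joined, len(types)


if __name__ == "__main__":
    rows = [(f"s{i}", f"C{i % 3}") for i in range(10_000)]  # ?s a ?t
    labels = {"C0": ["Zero"], "C1": ["One"], "C2": ["Two"]}
    for distinct in (False, True):
        joined, fed = join_labels(rows, labels, distinct)
        print(f"distinct={distinct}: {fed} rows into the join, {len(joined)} results")
```

Without DISTINCT, 10,000 rows go into the join and produce 10,000 mostly duplicate results; with DISTINCT, 3 rows go in and 3 come out. With LIMIT 10 and no DISTINCT, the subquery could also return the same class ten times.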


>
> On 17/10/2022 14.56, Mikael Pesonen wrote:
>>
>> ?s a ?t .
>>   ?t skos:prefLabel ?l
>>
>> returns 3 million triples. Maybe it's related to this?

I don't see how this is related to your initial query, where ?s was
bound, which in my opinion should be an easy join. Is it possible for
you to share the dataset somehow? Also, you can compute statistics for
the TDB database with the tdbstats tool [1] from the command line and put
the file into the TDB folder. But even without statistics, the query plan
should take the first triple pattern, use the SPO index since s and p are
bound, and then pass the bindings of ?o (here ?t) to the evaluation of the
second triple pattern.

[1] 
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
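The evaluation order described above can be sketched as an index nested-loop join. The Python below uses a toy stand-in for an SPO-ordered index (the class, data and lookup counter are hypothetical, not TDB's API): the first pattern is answered by one lookup on (subject, rdf:type), and each resulting ?t turns the second pattern into another lookup rather than a scan over all skos:prefLabel triples.

```python
from collections import defaultdict


class SPOIndex:
    """Toy stand-in for an SPO-ordered index: (subject, predicate) -> objects.
    Counts lookups so the cost of a plan can be inspected."""

    def __init__(self, triples):
        self._by_sp = defaultdict(list)
        for s, p, o in triples:
            self._by_sp[(s, p)].append(o)
        self.lookups = 0

    def objects(self, s, p):
        self.lookups += 1
        return self._by_sp.get((s, p), [])  # .get does not insert new keys


def types_with_labels(index, subject):
    """Evaluate { <subject> a ?t . ?t skos:prefLabel ?l } left to right."""
    results = []
    # Pattern 1: subject and predicate bound -> one SPO lookup yields ?t.
    for t in index.objects(subject, "rdf:type"):
        # Pattern 2: ?t is now bound, so this is another SPO lookup, not a scan.
        for label in index.objects(t, "skos:prefLabel"):
            results.append((t, label))
    return results


if __name__ == "__main__":
    # Hypothetical data: the one instance from the thread plus 10,000 others.
    triples = [("https://x.y.z", "rdf:type", ":C1"),
               (":C1", "skos:prefLabel", "C1 label")]
    triples += [(f"s{i}", "rdf:type", f":D{i % 50}") for i in range(10_000)]
    triples += [(f":D{j}", "skos:prefLabel", f"D{j} label") for j in range(50)]
    index = SPOIndex(triples)
    print(types_with_labels(index, "https://x.y.z"), index.lookups, "lookups")
```

With this data it returns [(':C1', 'C1 label')] after 2 lookups, however many other instances are added, which is why this is expected to be an easy join.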


>>
>>
>> On 21/09/2022 9.15, Lorenz Buehmann wrote:
>>> Weird, only 10M triples and each triple pattern returns only 1 
>>> binding, thus, the size is tiny - honestly I can't think of anything 
>>> except for open connections, but as you mentioned, running the 
>>> queries with only one triple pattern works as expected, so that too 
>>> many open connections shouldn't be an issue most likely.
>>>
>>> Can you reproduce this behavior with newer Jena versions like 4.6.1?
>>>
>>> Or can you reproduce this on different servers as well?
>>>
>>> Is it also stuck of your run the query directly after you restart 
>>> Fuseki?
>>>
>>>
>>> On 19.09.22 13:49, Mikael Pesonen wrote:
>>>>
>>>>
>>>> On 15/09/2022 17.48, Lorenz Buehmann wrote:
>>>>> Forgot:
>>>>>
>>>>> - size of result for each triple pattern? Might affect if hash 
>>>>> join can be used.
>>>> It's one row for each.
>>>>>
>>>>> - your hardware?
>>>> Normal server with 16gigs mem.
>>>>>
>>>>> - is it just the first query after starting Fuseki? Connections 
>>>>> have been closed? Note, there was also a bug in a recent Jena 
>>>>> version, but only with TDB and too many open connections. It has 
>>>>> been resolved with release 4.6.1.
>>>> Jena has been running quite a while.
>>>>>
>>>>> It might not be related, but I'm mentioning everything here 
>>>>> anyway.
>>>>>
>>>>>
>>>>> On 15.09.22 11:16, Mikael Pesonen wrote:
>>>>>>
>>>>>> This quickly returns one row, say :C1:
>>>>>>
>>>>>> SELECT *
>>>>>> FROM <https://a.b.c>
>>>>>> WHERE {
>>>>>>   <https://x.y.z> a ?t .
>>>>>>   #?t skos:prefLabel ?l
>>>>>> }
>>>>>>
>>>>>>
>>>>>> and this too:
>>>>>>
>>>>>> SELECT *
>>>>>> FROM <https://a.b.c>
>>>>>> WHERE {
>>>>>>   #<https://x.y.z> a ?t .
>>>>>>   :C1 skos:prefLabel ?l
>>>>>> }
>>>>>>
>>>>>>
>>>>>> But this always hangs until it times out:
>>>>>>
>>>>>> SELECT *
>>>>>> FROM <https://a.b.c>
>>>>>> WHERE {
>>>>>>   <https://x.y.z> a ?t .
>>>>>>   ?t skos:prefLabel ?l
>>>>>> }
>>>>>>
>>>>>> What am I missing here? I'm using the Fuseki web GUI. Thanks!
>>>>
>>
>
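
On the earlier question about result sizes and hash joins: a hash join evaluates each triple pattern on its own and then matches the two result sets on the shared variable (?t here), so its cost grows with the full size of both inputs, including ?t skos:prefLabel ?l with nothing bound. A minimal Python sketch of the general technique follows; the pattern results are hypothetical, and this is not Jena's join implementation.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Join two lists of binding dicts on the shared variable `key`.

    Build phase: hash the left input (ideally the smaller one) by its key value.
    Probe phase: stream the right input and emit merged bindings on a match.
    """
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    out = []
    for row in right:
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

# Hypothetical results of evaluating each triple pattern independently.
pattern1 = [{"t": ":C1"}]                      # <https://x.y.z> a ?t
pattern2 = [{"t": ":C1", "l": "Class one"},    # ?t skos:prefLabel ?l
            {"t": ":C2", "l": "Class two"}]    # (every label in the graph)

print(hash_join(pattern1, pattern2, "t"))  # [{'t': ':C1', 'l': 'Class one'}]
```

Note that pattern2 has to be produced in full before any probing pays off, which is why the size of each pattern's result matters when choosing this strategy.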

Re: Weird sparql problem

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
This works as a separate query, but not in the middle of a larger one, 
since ?s gets new values instead of being bound to the previous ?s.

{ SELECT ?t WHERE {
    ?s a ?t .
  } LIMIT 10 }
?t skos:prefLabel ?l
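
This behavior follows from SPARQL's evaluation rules: a subquery is evaluated on its own first (bottom-up), and only the variables it projects are visible to, and joined with, the rest of the pattern. Because the inner SELECT projects only ?t, its ?s is a separate variable from any ?s outside it, and LIMIT 10 keeps at most ten arbitrary rows before any join happens. A toy Python model of those semantics, with hypothetical data; it does not reflect how Jena actually executes the query.

```python
# Hypothetical toy data; "rdf:type" is what the SPARQL keyword "a" expands to.
TRIPLES = [
    ("ex:a", "rdf:type", ":C1"),
    ("ex:b", "rdf:type", ":C2"),
    ("ex:c", "rdf:type", ":C3"),
    (":C1", "skos:prefLabel", "one"),
    (":C2", "skos:prefLabel", "two"),
    (":C3", "skos:prefLabel", "three"),
]

def subquery_types(triples, limit):
    """SELECT ?t WHERE { ?s a ?t } LIMIT n, evaluated without outer bindings."""
    rows = [{"s": s, "t": o} for s, p, o in triples if p == "rdf:type"]
    # Without ORDER BY, which rows LIMIT keeps is unspecified; the toy uses list order.
    # Projecting to ?t drops the inner ?s, so it can never join with an outer ?s.
    return [{"t": r["t"]} for r in rows[:limit]]

def join_labels(triples, rows):
    """Join each row with ?t skos:prefLabel ?l on the shared variable ?t."""
    return [
        {**r, "l": o}
        for r in rows
        for s, p, o in triples
        if p == "skos:prefLabel" and s == r["t"]
    ]

inner = subquery_types(TRIPLES, limit=2)
print(join_labels(TRIPLES, inner))
# [{'t': ':C1', 'l': 'one'}, {'t': ':C2', 'l': 'two'}]
```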

On 17/10/2022 14.56, Mikael Pesonen wrote:
>
> ?s a ?t .
>   ?t skos:prefLabel ?l
>
> returns 3 million triples. Maybe it's related to this?

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND