Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2016/09/01 08:20:33 UTC
Re: Slow SPARQL query
How do I get the snapshot? I found this page
https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
but how do I download it?
Mikael
On 28.8.2016 13:34, Andy Seaborne wrote:
> On 26/08/16 12:17, Mikael Pesonen wrote:
>>
>> I'm happy to try out the snapshot. It's just a matter of running the server
>> - no modifications of data or config needed?
>>
>> Do you know when the new version will be released (weeks, months)?
>>
>> Mikael
>
> We've started talking about it:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201608.mbox/%3C6c93affa-8a09-1d87-2ef2-b235b7a1a8a5%40apache.org%3E
>
>
> (open to everyone)
>
> We can't commit to dates because volunteer time ebbs and flows, and other
> things can overtake plans, so we can't reliably commit to anything.
> But we aim for a release every 6 months, the last release was in May, and I
> think we could go a bit earlier this time.
>
> There are some outstanding items that are desirable to deal with, but
> they are not blockers.
>
> Reports of good (preferably!) and bad experiences with the
> development builds would help. Get them from the snapshot repository; they are simply
> the state of the codebase at the time of the build, done once a day.
>
> Andy
>
>>
>>
>> On 26.8.2016 13:53, Andy Seaborne wrote:
>>> On 26/08/16 11:35, Rob Vesse wrote:
>>>> It's difficult to answer the question about your specific query
>>>> without knowing more about the nature of the data: in your
>>>> case, how many named graphs are in the database?
>>>>
>>>> One thing that jumps out at me is that you use the GRAPH operator in
>>>> your query. That operator essentially requires that a query engine
>>>> applies your inner query to every single graph in your database. In
>>>> practice ARQ will try to do something a bit more efficient than that,
>>>> but this is not guaranteed.
>>>
>>> TDB does all the graphs at the same time if it can. Property paths can
>>> stop this, but basic graph patterns + GRAPH ?var are done as quad table
>>> accesses.
>>>
>>> Ditto for TIM.
>>>
>>> (It's only the general purpose in-memory dataset where you can put any
>>> graph in from any source that has to loop.)
>>>
>>>> Your inner query uses a lot of property paths and so is potentially
>>>> quite expensive. As a first step I would suggest changing * to + if
>>>> you can, as that will avoid having to match the zero-length path,
>>>> which, while quite simple in your case, can be very expensive for more
>>>> generic property paths.
>>>
>>> And that has been sped up recently (after the last release, I'm afraid,
>>> i.e. post Jena 3.1.0 / Fuseki 2.4.0): JENA-1195.
>>>
>>> How long are the skos:narrower* chains?
>>>
>>> Mikael - are you able to try out a SNAPSHOT build?
>>>
>>>> Secondly, if you are able to limit the number of graphs that are under
>>>> consideration, you may be able to substantially improve performance.
>>>> One way to do this would be to place a pattern prior to the GRAPH
>>>> operator that restricts the values of the ?graph variable. ARQ
>>>> should then be able to use that information to restrict which graphs
>>>> in the database it scans.
>>>>
>>>> Rob
>>>>
>>>
>>> Andy
>>>
>>>
>>
>
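Rob's two suggestions can be combined in one sketch. The concept IRI is the one from Mikael's paging query in this thread; the graph IRIs in the VALUES block are hypothetical placeholders, and VALUES is simply one way (an assumption here; Rob does not name a construct) to bind ?graph before the GRAPH operator. Whether switching from * to + is acceptable depends on whether the starting concept itself must appear in the results.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?child WHERE {
  # Restrict ?graph before the GRAPH operator. These IRIs are
  # hypothetical placeholders for the graphs that hold the vocabulary.
  VALUES ?graph {
    <http://example.org/graphs/vocabulary-1>
    <http://example.org/graphs/vocabulary-2>
  }
  GRAPH ?graph {
    # "+" means one or more steps; "*" also matches the zero-length path.
    # With "+", the starting concept is returned only if it can reach
    # itself through a cycle of skos:narrower links.
    <http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower+ ?child
  }
}
```

How much either change helps will depend on the data and on the Jena version in use.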
--
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Leaving ?child out of the inner results was quite an obvious mistake. Now
paging works and queries are fast. Thank you!
Mikael
On 4.9.2016 20:31, Andy Seaborne wrote:
>
>
> On 02/09/16 13:22, Mikael Pesonen wrote:
>>
>> Sorry for bombing you with questions.
>
> No problem in this case though replies may be delayed - the questions
> aren't always simple things to reply to.
>
> Can you make the data available?
>
>
>> I'm trying to add paging with no success:
>>
>> SELECT DISTINCT ?s ?p ?o WHERE {
>> GRAPH ?graph { SELECT DISTINCT ?child WHERE
>> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
>> ?child}}}
>> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>>
>> { SELECT ?s WHERE {
>>
>> ?s <http://purl.org/dc/terms/subject> ?child .
>> ?s <http://purl.org/dc/terms/isPartOf>
>> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
>> } limit 2 }
>> ?s ?p ?o .
>> } }
>
> So in the 2nd part, the inner SELECT ?s will return 2 items, and it hides
> ?child. You need to add it to the SELECT clause. You might as well
> call it ?Z - the results are the same.
>
> And from that block you get rows of ?s ?p ?o. No ?child - it didn't
> get out of the "SELECT ?s"
>
> but the first part "SELECT DISTINCT ?child" has ?child columns - it's
> an unrelated cross product. You'll get all the results for ?child *
> all the triples of 2 isPartOf items.
>
> If you try:
>
> qparse -print=opt you'll see ?child gets renamed to "?/child" in the
> second part.
>
> Andy
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
On 24.9.2016 17:02, Andy Seaborne wrote:
>
>
> On 23/09/16 09:35, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> I have another query that seems to behave illogically. I am searching
>> for terms in a SKOS vocabulary and also need to retrieve the topmost-level
>> concept for each search result.
>>
>>
>> This query returns the entire skos:broader hierarchy for the search results and
>> runs in a second (related lines marked in bold, shown between asterisks):
>>
>> SELECT ?graph ?concept
>> (group_concat(DISTINCT
>> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
>> ?prefLabelms)
>> (group_concat(DISTINCT
>> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
>> ?prefLabels)
>> (group_concat(DISTINCT
>> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
>> ?altLabelms)
>> (group_concat(DISTINCT
>> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS
>> ?altLabels)
>> (group_concat(DISTINCT
>> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>> (group_concat(DISTINCT
>> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
>> *(group_concat(DISTINCT
>> concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS
>> ?topConceptLabels)*
>> WHERE
>> {
>> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>> GRAPH ?graph
>> {
>> {
>> SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>> {
>> {?concept skos:prefLabel ?prefLabelm FILTER (
>> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
>> REGEX(?prefLabelm, "culo", "i"))}
>> UNION
>> {?concept skos:altLabel ?altLabelm FILTER (
>> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
>> REGEX(?altLabelm, "culo", "i"))}
>> }
>> limit 200
>> }
>> ?concept skos:prefLabel ?prefLabel .
>>
>> OPTIONAL { ?concept skos:altLabel ?altLabel }
>> OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
>> rdf:value ?def2 } }
>> *OPTIONAL { ?concept skos:broader* ?topConcept . ?topConcept
>> skos:prefLabel ?topConceptLabel FILTER ( lang(?topConceptLabel) =
>> "fi") }*
>> }
>> }
>> GROUP BY ?graph ?concept
>>
>>
>>
>> This is what I tried first, to get only the one topmost broader concept for
>> each, but it takes 17 seconds to run:
>>
>> SELECT ?graph ?concept *?topConceptLabel*
>> (group_concat(DISTINCT
>> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
>> ?prefLabelms)
>> (group_concat(DISTINCT
>> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
>> ?prefLabels)
>> (group_concat(DISTINCT
>> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
>> ?altLabelms)
>> (group_concat(DISTINCT
>> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS
>> ?altLabels)
>> (group_concat(DISTINCT
>> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>> (group_concat(DISTINCT
>> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
>> WHERE
>> {
>> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>> GRAPH ?graph
>> {
>> {
>> SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>> {
>> {?concept skos:prefLabel ?prefLabelm FILTER (
>> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
>> REGEX(?prefLabelm, "culo", "i"))}
>> UNION
>> {?concept skos:altLabel ?altLabelm FILTER (
>> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
>> REGEX(?altLabelm, "culo", "i"))}
>> }
>> limit 200
>> }
>> ?concept skos:prefLabel ?prefLabel .
>>
>> OPTIONAL { ?concept skos:altLabel ?altLabel }
>> OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
>> rdf:value ?def2 } }
>> *OPTIONAL { ?topConcept skos:topConceptOf ?graph . ?concept
>> skos:broader* ?topConcept . ?topConcept skos:prefLabel ?topConceptLabel
>> FILTER ( lang(?topConceptLabel) = "fi") }*
>
> You can format queries with qparse or use sparql.org.
>
> First:
> GRAPH ?graph
> {
> ...
> OPTIONAL
> { ?concept (skos:broader)* ?topConcept .
> ?topConcept skos:prefLabel ?topConceptLabel
> FILTER ( lang(?topConceptLabel) = "fi" )
> }
>
>
> Second:
> GRAPH ?graph
> {
> ...
> OPTIONAL
> { ?topConcept skos:topConceptOf ?graph .
> ?concept (skos:broader)* ?topConcept .
> ?topConcept skos:prefLabel ?topConceptLabel
> FILTER ( lang(?topConceptLabel) = "fi" )
> }
>
> so the 2nd query does an extra
> "?topConcept skos:topConceptOf ?graph ."
> before the path.
>
> Try putting it after.
Yes, that one reduced time to 10 seconds.
>
> Also, use a version with the path performance fix.
New version was maybe half a second faster.
Okay, so maybe SPARQL is not such an optimized language yet? With a script I can
query all the broader concepts and get the top-level one in one second. Of
course, it's not as elegant and is intuitively more work, but it gets the job done.
Br,
Mikael
>
> Because you have a GROUP BY and DISTINCT, there may be several matches, but
> this is hidden.
>
> Also, GRAPH ?graph { ... OPTIONAL { use of ?graph .... } ... }
>
> means that the engine may have to separate those two uses of ?graph at
> the point the BGP executes and sort it out later. The earlier
> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
> may not have much effect in limiting the search during execution.
>
> Andy
>
>
>
>
>> }
>> }
>> GROUP BY ?graph ?concept *?topConceptLabel*
>>
>>
>>
>> The speed issue seems totally illogical to me but there must be a
>> correct way to perform the latter query then?
>>
>> Br,
>> Mikael
>>
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Okay, by trial and error I found the solution: move the graph selection into the
innermost SELECT:
GRAPH ?graph
{
{
SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
{
?graph dcterms:subject "MEDICAL"
{?concept skos:prefLabel ?prefLabelm FILTER (
(lang(?prefLabelm) = "fi" || lang(...
...
I really have to start studying the inner workings of SPARQL queries...
Mikael
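The restructuring described above can be sketched as a reduced query. The message elides most of the query, so this keeps only the prefLabel branch of the label search and shows the shape rather than Mikael's exact query. One deliberate difference: the inner SELECT projects ?graph. Under SPARQL 1.1 scoping, a variable not listed in a subquery's SELECT is local to it, so an unprojected ?graph inside the subquery would match any resource with dcterms:subject "MEDICAL" in that graph, not necessarily the graph's own IRI as in the original outer pattern.

```sparql
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?graph ?concept ?prefLabelm WHERE {
  GRAPH ?graph {
    # The graph test now sits inside the innermost SELECT, next to the
    # label search, instead of in a separate GRAPH block before it.
    # ?graph is projected so that it joins with the name of the enclosing
    # GRAPH; left out of the SELECT list, it would be a different variable.
    SELECT DISTINCT ?graph ?concept ?prefLabelm WHERE {
      ?graph dcterms:subject "MEDICAL" .
      ?concept skos:prefLabel ?prefLabelm
      FILTER ( (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI")
               && REGEX(?prefLabelm, "culo", "i") )
    }
    LIMIT 200
  }
}
```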
Re: Slow SPARQL query
Posted by Andy Seaborne <an...@apache.org>.
On 23/09/16 09:35, Mikael Pesonen wrote:
>
> Hi,
>
> I have another query that seems to behave illogically. I am searching
> for terms in a SKOS vocabulary and also need to retrieve the topmost-level
> concept for each search result.
>
>
> This query returns the entire skos:broader hierarchy for the search results and
> runs in a second (related lines marked in bold, shown between asterisks):
>
> SELECT ?graph ?concept
> (group_concat(DISTINCT
> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
> ?prefLabelms)
> (group_concat(DISTINCT
> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
> ?prefLabels)
> (group_concat(DISTINCT
> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
> ?altLabelms)
> (group_concat(DISTINCT
> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
> (group_concat(DISTINCT
> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
> (group_concat(DISTINCT
> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
> *(group_concat(DISTINCT
> concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS
> ?topConceptLabels)*
> WHERE
> {
> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
> GRAPH ?graph
> {
> {
> SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
> {
> {?concept skos:prefLabel ?prefLabelm FILTER (
> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
> REGEX(?prefLabelm, "culo", "i"))}
> UNION
> {?concept skos:altLabel ?altLabelm FILTER (
> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
> REGEX(?altLabelm, "culo", "i"))}
> }
> limit 200
> }
> ?concept skos:prefLabel ?prefLabel .
>
> OPTIONAL { ?concept skos:altLabel ?altLabel }
> OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
> rdf:value ?def2 } }
> *OPTIONAL { ?concept skos:broader* ?topConcept . ?topConcept
> skos:prefLabel ?topConceptLabel FILTER ( lang(?topConceptLabel) = "fi") }*
> }
> }
> GROUP BY ?graph ?concept
>
>
>
> This is what I tried first, to get only the one topmost broader concept for
> each, but it takes 17 seconds to run:
>
> SELECT ?graph ?concept *?topConceptLabel*
> (group_concat(DISTINCT
> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
> ?prefLabelms)
> (group_concat(DISTINCT
> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
> ?prefLabels)
> (group_concat(DISTINCT
> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
> ?altLabelms)
> (group_concat(DISTINCT
> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
> (group_concat(DISTINCT
> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
> (group_concat(DISTINCT
> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
> WHERE
> {
> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
> GRAPH ?graph
> {
> {
> SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
> {
> {?concept skos:prefLabel ?prefLabelm FILTER (
> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
> REGEX(?prefLabelm, "culo", "i"))}
> UNION
> {?concept skos:altLabel ?altLabelm FILTER (
> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
> REGEX(?altLabelm, "culo", "i"))}
> }
> limit 200
> }
> ?concept skos:prefLabel ?prefLabel .
>
> OPTIONAL { ?concept skos:altLabel ?altLabel }
> OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
> rdf:value ?def2 } }
> *OPTIONAL { ?topConcept skos:topConceptOf ?graph . ?concept
> skos:broader* ?topConcept . ?topConcept skos:prefLabel ?topConceptLabel
> FILTER ( lang(?topConceptLabel) = "fi") }*
You can format queries with qparse or use sparql.org.
First:
GRAPH ?graph
{
...
OPTIONAL
{ ?concept (skos:broader)* ?topConcept .
?topConcept skos:prefLabel ?topConceptLabel
FILTER ( lang(?topConceptLabel) = "fi" )
}
Second:
GRAPH ?graph
{
...
OPTIONAL
{ ?topConcept skos:topConceptOf ?graph .
?concept (skos:broader)* ?topConcept .
?topConcept skos:prefLabel ?topConceptLabel
FILTER ( lang(?topConceptLabel) = "fi" )
}
so the 2nd query does an extra
"?topConcept skos:topConceptOf ?graph ."
before the path.
Try putting it after.
Also, use a version with the path performance fix.
Because you have a GROUP BY and DISTINCT, there may be several matches, but
this is hidden.
Also, GRAPH ?graph { ... OPTIONAL { use of ?graph .... } ... }
means that the engine may have to separate those two uses of ?graph at
the point the BGP executes and sort it out later. The earlier
GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
may not have much effect in limiting the search during execution.
Andy
> }
> }
> GROUP BY ?graph ?concept *?topConceptLabel*
>
>
>
> The speed issue seems totally illogical to me but there must be a
> correct way to perform the latter query then?
>
> Br,
> Mikael
>
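Andy's suggestion, moving the skos:topConceptOf pattern after the path inside the OPTIONAL, can be sketched as the reduced query below. It keeps only the top-concept lookup and replaces the paged label search with a single pattern taken from the original query, so it illustrates the ordering rather than reproducing the full query. Whether the written order is the order actually executed depends on the optimizer; Mikael reports that this change brought his query from 17 to about 10 seconds.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?graph ?concept ?topConceptLabel WHERE {
  GRAPH ?graph {
    # Simplified stand-in for the paged label search in the full query.
    ?concept skos:prefLabel ?prefLabel
    FILTER ( lang(?prefLabel) = "fi" && REGEX(?prefLabel, "culo", "i") )
    OPTIONAL {
      # Written order: follow the broader chain from the bound ?concept ...
      ?concept skos:broader* ?topConcept .
      # ... and only then keep candidates that are top concepts of ?graph.
      ?topConcept skos:topConceptOf ?graph .
      ?topConcept skos:prefLabel ?topConceptLabel
      FILTER ( lang(?topConceptLabel) = "fi" )
    }
  }
}
```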
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
I have another query that seems to behave illogically. I am searching
for terms in a SKOS vocabulary and also need to retrieve the topmost-level
concept for each search result.
This query returns the entire skos:broader hierarchy for the search results and
runs in a second (related lines marked in bold, shown between asterisks):
SELECT ?graph ?concept
(group_concat(DISTINCT
concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
?prefLabelms)
(group_concat(DISTINCT
concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
?prefLabels)
(group_concat(DISTINCT
concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS ?altLabelms)
(group_concat(DISTINCT
concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
(group_concat(DISTINCT
concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
(group_concat(DISTINCT
concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
*(group_concat(DISTINCT
concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS
?topConceptLabels)*
WHERE
{
GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
GRAPH ?graph
{
{
SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
{
{?concept skos:prefLabel ?prefLabelm FILTER (
(lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
REGEX(?prefLabelm, "culo", "i"))}
UNION
{?concept skos:altLabel ?altLabelm FILTER (
(lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
REGEX(?altLabelm, "culo", "i"))}
}
limit 200
}
?concept skos:prefLabel ?prefLabel .
OPTIONAL { ?concept skos:altLabel ?altLabel }
OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
rdf:value ?def2 } }
*OPTIONAL { ?concept skos:broader* ?topConcept . ?topConcept
skos:prefLabel ?topConceptLabel FILTER ( lang(?topConceptLabel) = "fi") }*
}
}
GROUP BY ?graph ?concept
This is what I tried first, to get only the one topmost broader concept for
each, but it takes 17 seconds to run:
SELECT ?graph ?concept *?topConceptLabel*
(group_concat(DISTINCT
concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
?prefLabelms)
(group_concat(DISTINCT
concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS
?prefLabels)
(group_concat(DISTINCT
concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS ?altLabelms)
(group_concat(DISTINCT
concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
(group_concat(DISTINCT
concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
(group_concat(DISTINCT
concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
WHERE
{
GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
GRAPH ?graph
{
{
SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
{
{?concept skos:prefLabel ?prefLabelm FILTER (
(lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
REGEX(?prefLabelm, "culo", "i"))}
UNION
{?concept skos:altLabel ?altLabelm FILTER (
(lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
REGEX(?altLabelm, "culo", "i"))}
}
limit 200
}
?concept skos:prefLabel ?prefLabel .
OPTIONAL { ?concept skos:altLabel ?altLabel }
OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
rdf:value ?def2 } }
*OPTIONAL { ?topConcept skos:topConceptOf ?graph . ?concept
skos:broader* ?topConcept . ?topConcept skos:prefLabel ?topConceptLabel
FILTER ( lang(?topConceptLabel) = "fi") }*
}
}
GROUP BY ?graph ?concept *?topConceptLabel*
The speed issue seems totally illogical to me but there must be a
correct way to perform the latter query then?
Br,
Mikael
Re: Slow SPARQL query
Posted by Andy Seaborne <an...@apache.org>.
On 02/09/16 13:22, Mikael Pesonen wrote:
>
> Sorry for bombing you with questions.
No problem in this case though replies may be delayed - the questions
aren't always simple things to reply to.
Can you make the data available?
> I'm trying to add paging with no success:
>
> SELECT DISTINCT ?s ?p ?o WHERE {
> GRAPH ?graph { SELECT DISTINCT ?child WHERE
> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
> ?child}}}
> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>
> { SELECT ?s WHERE {
>
> ?s <http://purl.org/dc/terms/subject> ?child .
> ?s <http://purl.org/dc/terms/isPartOf>
> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
> } limit 2 }
> ?s ?p ?o .
> } }
So in the 2nd part, the inner SELECT ?s will return 2 items, and it hides ?child.
You need to add it to the SELECT clause. You might as well call it ?Z -
the results are the same.
And from that block you get rows of ?s ?p ?o. No ?child - it didn't get
out of the "SELECT ?s"
but the first part "SELECT DISTINCT ?child" has ?child columns - it's
an unrelated cross product. You'll get all the results for ?child * all
the triples of 2 isPartOf items.
If you try:
qparse -print=opt you'll see ?child gets renamed to "?/child" in the
second part.
Andy
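Andy's point is that a variable left out of a subquery's SELECT list is invisible outside it, so the ?child in the second block was never joined with the ?child from the ontology lookup; adding ?child to the inner SELECT removes that cross product. A related detail, which is standard SPARQL 1.1 behaviour rather than something stated in this thread: a subquery's LIMIT is applied to its own results before they are joined with the rest of the query, so a limited subquery that does not itself contain the ontology restriction can return subjects that the join then discards. The sketch below, an assumption-based illustration rather than the query Mikael ended up with, therefore keeps the whole subject selection, including the skos:narrower* lookup, inside the paged subquery and fetches the triples outside it. ORDER BY is added because LIMIT/OFFSET pages are only predictable over an ordered result.

```sparql
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?s ?p ?o WHERE {
  {
    # One page of matching subjects. ?child is used only inside this
    # subquery, so it does not need to be projected out of it.
    SELECT DISTINCT ?s WHERE {
      GRAPH ?graph {
        <http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower* ?child
      }
      GRAPH <http://www.lingsoft.fi/resource-meta/> {
        ?s dcterms:subject ?child .
        ?s dcterms:isPartOf <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
      }
    }
    ORDER BY ?s
    LIMIT 2
    OFFSET 0
  }
  # All triples of the subjects on this page.
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    ?s ?p ?o .
  }
}
```

The next page would use OFFSET 2 with the same LIMIT, and so on.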
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Sorry for bombing you with questions. I'm trying to add paging with no success:
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH ?graph { SELECT DISTINCT ?child WHERE
{{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
?child}}}
GRAPH <http://www.lingsoft.fi/resource-meta/> {
{ SELECT ?s WHERE {
?s <http://purl.org/dc/terms/subject> ?child .
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/57aae39836662> .
} limit 2 }
?s ?p ?o .
} }
This should return two sets of records, each having one of the children
from the ontology query as its dcterms:subject. But I get results where the
subject seems to be some other random item from the same ontology.
This works, except I can't set the correct limit:
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH ?graph { SELECT DISTINCT ?child WHERE
{{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
?child}}}
GRAPH <http://www.lingsoft.fi/resource-meta/> {
?s <http://purl.org/dc/terms/subject> ?child .
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/57aae39836662> .
?s ?p ?o .
} }
limit 200
Mikael
On 2.9.2016 12:04, Mikael Pesonen wrote:
>
> But I think we can handle this by adding paging, so not a show stopper...
--
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
But I think we can handle this by adding paging, so it's not a showstopper.
On 2.9.2016 11:52, Mikael Pesonen wrote:
>
> Tested that one. An example query, similar to the ones I've sent here,
> took on average ~14 seconds on 2.3.1 and ~13.5 seconds on 2.4.1.
>
> So it's a slight improvement, but we need to get that query down to a
> couple of seconds for it to be usable in our application.
>
> Mikael
Re: Slow SPARQL query
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Tested that one. An example query, similar to the ones I've sent here, took
on average ~14 seconds on 2.3.1 and ~13.5 seconds on 2.4.1.
So it's a slight improvement, but we need to get that query down to a couple
of seconds for it to be usable in our application.
Mikael
On 1.9.2016 12:40, Andy Seaborne wrote:
> On 01/09/16 09:20, Mikael Pesonen wrote:
>>
>> How do I get the snapshot? I found this page
>> https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
>> but how to download?
>>
>> Mikael
>
> The builds end up in a Maven repository:
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/
>
>
> and you want the "apache-jena-fuseki" module of the SNAPSHOT version:
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/2.4.1-SNAPSHOT/
>
>
> Make sure you scroll down: the newest is at the bottom (at the
> moment it is "20160831.100731-60").
>
> Andy
>
Re: Slow SPARQL query
Posted by Andy Seaborne <an...@apache.org>.
On 01/09/16 09:20, Mikael Pesonen wrote:
>
> How do I get the snapshot? I found this page
> https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
> but how to download?
>
> Mikael
The builds end up in a Maven repository:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/
and you want the "apache-jena-fuseki" module of the SNAPSHOT version:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/2.4.1-SNAPSHOT/
Make sure you scroll down: the newest is at the bottom (at the
moment it is "20160831.100731-60").
Andy