Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2016/09/01 08:20:33 UTC

Re: Slow SPARQL query

How do I get the snapshot? I found this page
https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
but how do I download it?

Mikael


On 28.8.2016 13:34, Andy Seaborne wrote:
> On 26/08/16 12:17, Mikael Pesonen wrote:
>>
>> I'm happy to try out the snapshot. Is it just a matter of running the
>> server, with no modifications of data or config needed?
>>
>> Do you know when the new version will be released (weeks, months)?
>>
>> Mikael
>
> We've started talking about it:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201608.mbox/%3C6c93affa-8a09-1d87-2ef2-b235b7a1a8a5%40apache.org%3E 
>
>
> (open to everyone)
>
> We can't commit to dates because volunteer time ebbs and flows. Other 
> things can overtake plans so we can't reliably commit to anything.  
> But we aim for once every 6 months and the last release was May and I 
> think we could go a bit earlier this time.
>
> There are some outstanding items that are desirable to deal with, but
> not blockers.
>
> Having reports of good (preferably!) and bad experiences with the
> development builds would help (get them from the snapshot repository -
> they are simply the state of the codebase at the time of the build,
> done once a day).
>
>     Andy
>
>>
>>
>> On 26.8.2016 13:53, Andy Seaborne wrote:
>>> On 26/08/16 11:35, Rob Vesse wrote:
>>>> To try to answer the question about your specific query it's
>>>> difficult without knowing more about the nature of the data, in your
>>>> case how many named graphs are in the database?
>>>>
>>>> One thing that jumps out at me is that you use the GRAPH operator in
>>>> your query. That operator essentially requires that a query engine
>>>> applies your inner query to every single graph in your database. In
>>>> practice ARQ will try to do something a bit more efficient than that
>>>> but this is not guaranteed.
>>>
>>> TDB does all the graphs at the same time if it can. Property paths can
>>> stop this but basic graph patterns + GRAPH ?var is done as quad table
>>> accesses.
>>>
>>> Ditto for TIM.
>>>
>>> (It's only the general purpose in-memory dataset where you can put any
>>> graph in from any source that has to loop.)
>>>
>>>> Your inner query uses a lot of property paths and so is potentially
>>>> quite expensive. As a first step I would suggest changing * to + if
>>>> you can, as that will avoid having to match the zero-length path, which
>>>> while quite simple for your case can be very expensive for more
>>>> generic property paths.
>>>
>>> And it has been sped up recently (after the last release I'm afraid,
>>> post Jena 3.1.0 (Fuseki 2.4.0)): JENA-1195
>>>
>>> How long are the skos:narrower* chains?
>>>
>>> Mikael - are you able to try out a SNAPSHOT build?
>>>
>>>> Secondly, if you are able to limit the number of graphs that are under
>>>> consideration you may be able to substantially improve performance.
>>>> One way to do this would be to place a pattern prior to the GRAPH
>>>> operator that restricts the values of the ?graph variable. ARQ
>>>> should then be able to use that information to restrict which graphs
>>>> in the database it scans.
>>>>
>>>> Rob
>>>>
>>>
>>>     Andy
>>>
>>>
>>
>

-- 
www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND

Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Leaving ?child out of the inner results was quite an obvious mistake. Now
paging works and queries are fast. Thank you!

Mikael


On 4.9.2016 20:31, Andy Seaborne wrote:
>
>
> On 02/09/16 13:22, Mikael Pesonen wrote:
>>
>> Sorry for bombing with questions.
>
> No problem in this case though replies may be delayed - the questions 
> aren't always simple things to reply to.
>
> Can you make the data available?
>
>
>> I'm trying to add paging with no success:
>>
>> SELECT DISTINCT ?s ?p ?o WHERE {
>>   GRAPH ?graph { SELECT DISTINCT ?child WHERE
>> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
>> ?child}}}
>> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>>
>> { SELECT ?s WHERE {
>>
>>  ?s <http://purl.org/dc/terms/subject> ?child .
>> ?s <http://purl.org/dc/terms/isPartOf>
>> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
>>       } limit 2 }
>>  ?s ?p ?o .
>> } }
>
> So, 2nd: the inner SELECT ?s will return 2 items and it hides the
> ?child. You need to add it to the SELECT clause.  As it stands, you might
> as well call it ?Z - the results are the same.
>
> And from that block you get rows of ?s ?p ?o.  No ?child - it didn't 
> get out of the "SELECT ?s"
>
> but the first part "SELECT DISTINCT  ?child" has ?child columns - it's 
> an unrelated cross product.  You'll get all the results for ?child * 
> all the triples of 2 isPartOf items.
>
> If you try:
>
> qparse -print=opt you'll see ?child gets renamed to "?/child" in the 
> second part.
>
>     Andy
>
>> This should return two sets of records having dcterms:subject one of
>> children from ontology query. But I get results where subject seems to
>> be other random items from same ontology.
>>
>>
>> This works, except can't set correct limit:
>>
>> SELECT DISTINCT ?s ?p ?o WHERE {
>>   GRAPH ?graph { SELECT DISTINCT ?child WHERE
>> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
>> ?child}}}
>> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>>
>>  ?s <http://purl.org/dc/terms/subject> ?child .
>> ?s <http://purl.org/dc/terms/isPartOf>
>> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
>>
>>  ?s ?p ?o .
>> } }
>> limit 200
>>
>>
>> Mikael
>>
>>
>>
>>
>> On 2.9.2016 12:04, Mikael Pesonen wrote:
>>>
>>> But I think we can handle this by adding paging, so not a show 
>>> stopper...
>>>
>>> On 2.9.2016 11:52, Mikael Pesonen wrote:
>>>>
>>>> Tested that one. An example query, similar to those I've sent here, took
>>>> on average ~14 secs on 2.3.1 and 13.5 secs on 2.4.1.
>>>>
>>>> So a bit of improvement, but we need to get that query down to a couple
>>>> of secs so that it's usable for our application.
>>>>
>>>> Mikael
>>>>
>>>>
>>>> On 1.9.2016 12:40, Andy Seaborne wrote:
>>>>> On 01/09/16 09:20, Mikael Pesonen wrote:
>>>>>>
>>>>>> How do I get the snapshot? I found this page
>>>>>> https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/ 
>>>>>>
>>>>>> but how to download?
>>>>>>
>>>>>> Mikael
>>>>>
>>>>> The builds end up in a maven repo:
>>>>>
>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/ 
>>>>>
>>>>>
>>>>>
>>>>> and you want the "apache-jena-fuseki" module of the SNAPSHOT version:
>>>>>
>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/2.4.1-SNAPSHOT/ 
>>>>>
>>>>>
>>>>>
>>>>> Make sure you scroll down: the newest is at the bottom (just at the
>>>>> moment it is "20160831.100731-60")
>>>>>
>>>>>     Andy
>>>>>
>>>>
>>>
>>

Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Hi,



On 24.9.2016 17:02, Andy Seaborne wrote:
>
>
> On 23/09/16 09:35, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> I have another query that is behaving illogically to me. I am searching
>> for terms in SKOS vocabulary and also need to retrieve topmost level
>> concept for each search result.
>>
>>
>> This query returns entire skos:broader hierarchy for search results and
>> works in a second (marked related lines with bold)
>>
>>     SELECT ?graph ?concept
>>     (group_concat(DISTINCT
>> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
>> ?prefLabelms)
>>     (group_concat(DISTINCT
>> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE")  AS
>> ?prefLabels)
>>     (group_concat(DISTINCT
>> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
>> ?altLabelms)
>>     (group_concat(DISTINCT
>> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS 
>> ?altLabels)
>>     (group_concat(DISTINCT
>> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>>     (group_concat(DISTINCT
>> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
>> *(group_concat(DISTINCT
>> concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS
>> ?topConceptLabels)*
>>     WHERE
>>     {
>>         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>>         GRAPH ?graph
>>         {
>>             {
>>                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>>                 {
>>                     {?concept skos:prefLabel ?prefLabelm FILTER (
>> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
>> REGEX(?prefLabelm, "culo", "i"))}
>>                     UNION
>>                     {?concept skos:altLabel ?altLabelm FILTER (
>> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
>> REGEX(?altLabelm, "culo", "i"))}
>>                 }
>>                 limit 200
>>             }
>>             ?concept skos:prefLabel ?prefLabel .
>>
>>             OPTIONAL { ?concept skos:altLabel ?altLabel  }
>>             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
>> rdf:value ?def2 } }
>> *OPTIONAL {  ?concept skos:broader* ?topConcept . ?topConcept
>> skos:prefLabel ?topConceptLabel FILTER ( lang(?topConceptLabel) = 
>> "fi") }*
>>         }
>>     }
>> GROUP BY ?graph ?concept
>>
>>
>>
>> But this is what I tried first to get only the one topmost broader for
>> each, but it takes 17 seconds to run:
>>
>> SELECT ?graph ?concept *?topConceptLabel*
>>     (group_concat(DISTINCT
>> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS
>> ?prefLabelms)
>>     (group_concat(DISTINCT
>> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE")  AS
>> ?prefLabels)
>>     (group_concat(DISTINCT
>> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
>> ?altLabelms)
>>     (group_concat(DISTINCT
>> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS 
>> ?altLabels)
>>     (group_concat(DISTINCT
>> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>>     (group_concat(DISTINCT
>> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
>>     WHERE
>>     {
>>         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>>         GRAPH ?graph
>>         {
>>             {
>>                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>>                 {
>>                     {?concept skos:prefLabel ?prefLabelm FILTER (
>> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
>> REGEX(?prefLabelm, "culo", "i"))}
>>                     UNION
>>                     {?concept skos:altLabel ?altLabelm FILTER (
>> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
>> REGEX(?altLabelm, "culo", "i"))}
>>                 }
>>                 limit 200
>>             }
>>             ?concept skos:prefLabel ?prefLabel .
>>
>>             OPTIONAL { ?concept skos:altLabel ?altLabel  }
>>             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
>> rdf:value ?def2 } }
>> *OPTIONAL { ?topConcept skos:topConceptOf ?graph . ?concept
>> skos:broader* ?topConcept . ?topConcept skos:prefLabel ?topConceptLabel
>> FILTER ( lang(?topConceptLabel) = "fi") }*
>
> You can format queries with qparse or use sparql.org.
>
> First:
>     GRAPH ?graph
>     {
> ...
>         OPTIONAL
>           { ?concept (skos:broader)* ?topConcept .
>             ?topConcept  skos:prefLabel  ?topConceptLabel
>             FILTER ( lang(?topConceptLabel) = "fi" )
>           }
>
>
> Second:
>    GRAPH ?graph
>    {
> ...
>         OPTIONAL
>           { ?topConcept  skos:topConceptOf  ?graph .
>             ?concept (skos:broader)* ?topConcept .
>             ?topConcept  skos:prefLabel  ?topConceptLabel
>             FILTER ( lang(?topConceptLabel) = "fi" )
>           }
>
> so the 2nd query does an extra
> "?topConcept  skos:topConceptOf  ?graph ."
> before the path.
>
> Try putting it after.
Yes, that reduced the time to 10 seconds.
>
> Also, use a version with the path performance fix.
The new version was maybe half a second faster.

Okay, so maybe SPARQL is not such an optimized language yet? With a script I
can query all the broader concepts and get the top-level one in one second.
Of course that is not so elegant and is intuitively more work, but it gets
the job done.

Br,
Mikael

>
> Because you have a group and DISTINCT, there may be several matches but
> this is hidden.
>
> Also, GRAPH ?graph { ... OPTIONAL { use of ?graph .... } ... }
>
> means that the engine may have to separate those two uses of ?graph at 
> the point the BGP executes and sort it out later.  The earlier
> GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
> may not have so much effect limiting the execution search.
>
>     Andy
>
>
>
>
>>         }
>>     }
>> GROUP BY ?graph ?concept *?topConceptLabel*
>>
>>
>>
>> The speed issue seems totally illogical to me but there must be a
>> correct way to perform the latter query then?
>>
>> Br,
>> Mikael
>>

Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Okay, by trial and error I found the solution: move the graph selection to
the innermost SELECT:

GRAPH ?graph
         {
             {
                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
                 {
                     ?graph dcterms:subject "MEDICAL"
                     {?concept skos:prefLabel ?prefLabelm FILTER ( 
(lang(?prefLabelm) = "fi" || lang(...
...

I really have to start studying the inner workings of SPARQL queries...

Mikael

Re: Slow SPARQL query

Posted by Andy Seaborne <an...@apache.org>.


On 23/09/16 09:35, Mikael Pesonen wrote:
>
> Hi,
>
> I have another query that is behaving illogically to me. I am searching
> for terms in SKOS vocabulary and also need to retrieve topmost level
> concept for each search result.
>
>
> This query returns entire skos:broader hierarchy for search results and
> works in a second (marked related lines with bold)
>
>     SELECT ?graph ?concept
>     (group_concat(DISTINCT
> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE")  AS
> ?prefLabelms)
>     (group_concat(DISTINCT
> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE")  AS
> ?prefLabels)
>     (group_concat(DISTINCT
> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
> ?altLabelms)
>     (group_concat(DISTINCT
> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
>     (group_concat(DISTINCT
> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>     (group_concat(DISTINCT
> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
> *(group_concat(DISTINCT
> concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS
> ?topConceptLabels)*
>     WHERE
>     {
>         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>         GRAPH ?graph
>         {
>             {
>                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>                 {
>                     {?concept skos:prefLabel ?prefLabelm FILTER (
> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
> REGEX(?prefLabelm, "culo", "i"))}
>                     UNION
>                     {?concept skos:altLabel ?altLabelm FILTER (
> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
> REGEX(?altLabelm, "culo", "i"))}
>                 }
>                 limit 200
>             }
>             ?concept skos:prefLabel ?prefLabel .
>
>             OPTIONAL { ?concept skos:altLabel ?altLabel  }
>             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
> rdf:value ?def2 } }
> *OPTIONAL {  ?concept skos:broader* ?topConcept . ?topConcept
> skos:prefLabel ?topConceptLabel FILTER ( lang(?topConceptLabel) = "fi") }*
>         }
>     }
> GROUP BY ?graph ?concept
>
>
>
> But this is what I tried first to get only the one topmost broader for
> each, but it takes 17 seconds to run:
>
> SELECT ?graph ?concept *?topConceptLabel*
>     (group_concat(DISTINCT
> concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE")  AS
> ?prefLabelms)
>     (group_concat(DISTINCT
> concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE")  AS
> ?prefLabels)
>     (group_concat(DISTINCT
> concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS
> ?altLabelms)
>     (group_concat(DISTINCT
> concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
>     (group_concat(DISTINCT
> concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
>     (group_concat(DISTINCT
> concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
>     WHERE
>     {
>         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
>         GRAPH ?graph
>         {
>             {
>                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
>                 {
>                     {?concept skos:prefLabel ?prefLabelm FILTER (
> (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") &&
> REGEX(?prefLabelm, "culo", "i"))}
>                     UNION
>                     {?concept skos:altLabel ?altLabelm FILTER (
> (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") &&
> REGEX(?altLabelm, "culo", "i"))}
>                 }
>                 limit 200
>             }
>             ?concept skos:prefLabel ?prefLabel .
>
>             OPTIONAL { ?concept skos:altLabel ?altLabel  }
>             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL {?def1
> rdf:value ?def2 } }
> *OPTIONAL { ?topConcept skos:topConceptOf ?graph . ?concept
> skos:broader* ?topConcept . ?topConcept skos:prefLabel ?topConceptLabel
> FILTER ( lang(?topConceptLabel) = "fi") }*

You can format queries with qparse or use sparql.org.

First:
     GRAPH ?graph
     {
...
         OPTIONAL
           { ?concept (skos:broader)* ?topConcept .
             ?topConcept  skos:prefLabel  ?topConceptLabel
             FILTER ( lang(?topConceptLabel) = "fi" )
           }


Second:
    GRAPH ?graph
    {
...
         OPTIONAL
           { ?topConcept  skos:topConceptOf  ?graph .
             ?concept (skos:broader)* ?topConcept .
             ?topConcept  skos:prefLabel  ?topConceptLabel
             FILTER ( lang(?topConceptLabel) = "fi" )
           }

so the 2nd query does an extra
"?topConcept  skos:topConceptOf  ?graph ."
before the path.

Try putting it after.
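
A minimal sketch of that reordering, cut down to the patterns that matter.
The skos: PREFIX declaration is the standard SKOS namespace and is an
assumption here, since the thread does not show the prefixes; the rest of
the original query (sub-select, labels, grouping) is left out. Whether it
helps will depend on the data and on what the optimizer does with it.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?graph ?concept ?topConceptLabel
WHERE
{
  GRAPH ?graph
  {
    ?concept skos:prefLabel ?prefLabel .
    OPTIONAL
      { # Path first: walk up from the ?concept that is already bound ...
        ?concept (skos:broader)* ?topConcept .
        # ... then keep only those ancestors that are top concepts.
        ?topConcept  skos:topConceptOf  ?graph .
        ?topConcept  skos:prefLabel  ?topConceptLabel
        FILTER ( lang(?topConceptLabel) = "fi" )
      }
  }
}
```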

Also, use a version with the path performance fix.

Because you have a group and DISTINCT, there may be several matches but
this is hidden.

Also, GRAPH ?graph { ... OPTIONAL { use of ?graph .... } ... }

means that the engine may have to separate those two uses of ?graph at 
the point the BGP executes and sort it out later.  The earlier
GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
may not have so much effect limiting the execution search.

     Andy




>         }
>     }
> GROUP BY ?graph ?concept *?topConceptLabel*
>
>
>
> The speed issue seems totally illogical to me but there must be a
> correct way to perform the latter query then?
>
> Br,
> Mikael
>

Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Hi,

I have another query that is behaving illogically to me. I am searching
for terms in a SKOS vocabulary and also need to retrieve the topmost-level
concept for each search result.


This query returns the entire skos:broader hierarchy for search results and
works in a second (related lines are marked with "# <--" comments):

     SELECT ?graph ?concept
       (group_concat(DISTINCT concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS ?prefLabelms)
       (group_concat(DISTINCT concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS ?prefLabels)
       (group_concat(DISTINCT concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS ?altLabelms)
       (group_concat(DISTINCT concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
       (group_concat(DISTINCT concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
       (group_concat(DISTINCT concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
       (group_concat(DISTINCT concat(?topConceptLabel,"@",lang(?topConceptLabel));separator="/") AS ?topConceptLabels)   # <--
     WHERE
     {
         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
         GRAPH ?graph
         {
             {
                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
                 {
                     { ?concept skos:prefLabel ?prefLabelm
                       FILTER ( (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") && REGEX(?prefLabelm, "culo", "i") ) }
                     UNION
                     { ?concept skos:altLabel ?altLabelm
                       FILTER ( (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") && REGEX(?altLabelm, "culo", "i") ) }
                 }
                 limit 200
             }
             ?concept skos:prefLabel ?prefLabel .

             OPTIONAL { ?concept skos:altLabel ?altLabel }
             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL { ?def1 rdf:value ?def2 } }
             OPTIONAL { ?concept skos:broader* ?topConcept .                  # <--
                        ?topConcept skos:prefLabel ?topConceptLabel
                        FILTER ( lang(?topConceptLabel) = "fi" ) }
         }
     }
     GROUP BY ?graph ?concept



But this is what I tried first, to get only the one topmost broader concept
for each, and it takes 17 seconds to run (changed lines are marked with
"# <--" comments):

     SELECT ?graph ?concept ?topConceptLabel                                  # <--
       (group_concat(DISTINCT concat(?prefLabelm,"@",lang(?prefLabelm));separator="NEWLINE") AS ?prefLabelms)
       (group_concat(DISTINCT concat(?prefLabel,"@",lang(?prefLabel));separator="NEWLINE") AS ?prefLabels)
       (group_concat(DISTINCT concat(?altLabelm,"@",lang(?altLabelm));separator="NEWLINE") AS ?altLabelms)
       (group_concat(DISTINCT concat(?altLabel,"@",lang(?altLabel));separator="NEWLINE") AS ?altLabels)
       (group_concat(DISTINCT concat(?def1,"@",lang(?def1));separator="NEWLINENEWLINE") AS ?defs1)
       (group_concat(DISTINCT concat(?def2,"@",lang(?def2));separator="NEWLINENEWLINE") AS ?defs2)
     WHERE
     {
         GRAPH ?graph { ?graph dcterms:subject "MEDICAL" }
         GRAPH ?graph
         {
             {
                 SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
                 {
                     { ?concept skos:prefLabel ?prefLabelm
                       FILTER ( (lang(?prefLabelm) = "fi" || lang(?prefLabelm) = "la-FI") && REGEX(?prefLabelm, "culo", "i") ) }
                     UNION
                     { ?concept skos:altLabel ?altLabelm
                       FILTER ( (lang(?altLabelm) = "fi" || lang(?altLabelm) = "la-FI") && REGEX(?altLabelm, "culo", "i") ) }
                 }
                 limit 200
             }
             ?concept skos:prefLabel ?prefLabel .

             OPTIONAL { ?concept skos:altLabel ?altLabel }
             OPTIONAL { ?concept skos:definition ?def1 . OPTIONAL { ?def1 rdf:value ?def2 } }
             OPTIONAL { ?topConcept skos:topConceptOf ?graph .                # <--
                        ?concept skos:broader* ?topConcept .
                        ?topConcept skos:prefLabel ?topConceptLabel
                        FILTER ( lang(?topConceptLabel) = "fi" ) }
         }
     }
     GROUP BY ?graph ?concept ?topConceptLabel                                # <--



The speed issue seems totally illogical to me but there must be a 
correct way to perform the latter query then?

Br,
Mikael

Re: Slow SPARQL query

Posted by Andy Seaborne <an...@apache.org>.


On 02/09/16 13:22, Mikael Pesonen wrote:
>
> Sorry for bombing with questions.

No problem in this case though replies may be delayed - the questions 
aren't always simple things to reply to.

Can you make the data available?


> I'm trying to add paging with no success:
>
> SELECT DISTINCT ?s ?p ?o WHERE {
>   GRAPH ?graph { SELECT DISTINCT ?child WHERE
> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
> ?child}}}
> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>
> { SELECT ?s WHERE {
>
>  ?s <http://purl.org/dc/terms/subject> ?child .
> ?s <http://purl.org/dc/terms/isPartOf>
> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
>       } limit 2 }
>  ?s ?p ?o .
> } }

So, 2nd: the inner SELECT ?s will return 2 items and it hides the ?child.
You need to add it to the SELECT clause.  As it stands, you might as well
call it ?Z - the results are the same.

And from that block you get rows of ?s ?p ?o.  No ?child - it didn't get 
out of the "SELECT ?s"

but the first part "SELECT DISTINCT  ?child" has ?child columns - it's 
an unrelated cross product.  You'll get all the results for ?child * all 
the triples of 2 isPartOf items.

If you try:

qparse -print=opt you'll see ?child gets renamed to "?/child" in the 
second part.
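
A sketch of the query with ?child added to the inner SELECT, so that it is
the same variable as the ?child from the first block and the two parts
join instead of forming a cross product. The skos: PREFIX declaration is
the standard SKOS namespace and is an assumption here; everything else is
taken from the query as posted.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?s ?p ?o WHERE {
  GRAPH ?graph {
    SELECT DISTINCT ?child WHERE {
      <http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower* ?child
    }
  }
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    # ?child is projected here, so it is visible outside the sub-select
    # and joins with the ?child produced by the first block.
    { SELECT ?s ?child WHERE {
        ?s <http://purl.org/dc/terms/subject> ?child .
        ?s <http://purl.org/dc/terms/isPartOf>
           <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
      } limit 2 }
    ?s ?p ?o .
  }
}
```

Note that the limit still applies inside the sub-select, before the join
with the first block, so fewer than 2 items may survive the join.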

	Andy

> This should return two sets of records having dcterms:subject one of
> children from ontology query. But I get results where subject seems to
> be other random items from same ontology.
>
>
> This works, except can't set correct limit:
>
> SELECT DISTINCT ?s ?p ?o WHERE {
>   GRAPH ?graph { SELECT DISTINCT ?child WHERE
> {{<http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower*
> ?child}}}
> GRAPH <http://www.lingsoft.fi/resource-meta/> {
>
>  ?s <http://purl.org/dc/terms/subject> ?child .
> ?s <http://purl.org/dc/terms/isPartOf>
> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
>
>  ?s ?p ?o .
> } }
> limit 200
>
>
> Mikael

Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Sorry for bombarding you with questions. I'm trying to add paging, with no success:

SELECT DISTINCT ?s ?p ?o WHERE {
  GRAPH ?graph {
    SELECT DISTINCT ?child WHERE {
      { <http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower* ?child }
    }
  }
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    { SELECT ?s WHERE {
        ?s <http://purl.org/dc/terms/subject> ?child .
        ?s <http://purl.org/dc/terms/isPartOf> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
      } limit 2 }
    ?s ?p ?o .
  }
}


This should return two sets of records whose dcterms:subject is one of 
the children from the ontology query. But I get results where the 
subject seems to be other random items from the same ontology.


This works, except that I can't set the correct limit:

SELECT DISTINCT ?s ?p ?o WHERE {
  GRAPH ?graph {
    SELECT DISTINCT ?child WHERE {
      { <http://www.lingsoft.fi/ontologies/e882a3c73c56c42a> skos:narrower* ?child }
    }
  }
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    ?s <http://purl.org/dc/terms/subject> ?child .
    ?s <http://purl.org/dc/terms/isPartOf> <http://www.lingsoft.fi/rdf/uid/57aae39836662> .
    ?s ?p ?o .
  }
}
limit 200


Mikael




On 2.9.2016 12:04, Mikael Pesonen wrote:
>
> But I think we can handle this by adding paging, so not a show stopper...
>


Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

But I think we can handle this by adding paging, so not a show stopper...

On 2.9.2016 11:52, Mikael Pesonen wrote:
>
> Tested that one. Example query, similar that Ive sent here, took on 
> average ~14 secs on 2.3.1 and 13.5 secs on 2.4.1.
>
> So a bit of improvement but we need to get that query to couple of 
> secs so that its usable for our application.
>
> Mikael
>


Re: Slow SPARQL query

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Tested that one. An example query, similar to those I've sent here, took 
on average ~14 secs on 2.3.1 and 13.5 secs on 2.4.1.

So a bit of improvement, but we need to get that query down to a couple 
of secs so that it's usable for our application.

Mikael


On 1.9.2016 12:40, Andy Seaborne wrote:
> On 01/09/16 09:20, Mikael Pesonen wrote:
>>
>> How do I get the snapshot? I found this page
>> https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
>> but how to download?
>>
>> Mikael
>
> The builds end up in a maven repo:
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/ 
>
>
> and you want the "apache-jena-fuseki" module of the SNAPSHOT version:
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/2.4.1-SNAPSHOT/ 
>
>
> Make sure you scroll down: the newest is at the bottom (just at the 
> moment it is "20160831.100731-60"
>
>     Andy
>


Re: Slow SPARQL query

Posted by Andy Seaborne <an...@apache.org>.

On 01/09/16 09:20, Mikael Pesonen wrote:
>
> How do I get the snapshot? I found this page
> https://builds.apache.org/job/Jena_Development_Deploy/lastStableBuild/
> but how to download?
>
> Mikael

The builds end up in a maven repo:

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/

and you want the "apache-jena-fuseki" module of the SNAPSHOT version:

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/2.4.1-SNAPSHOT/

Make sure you scroll down: the newest is at the bottom (at the 
moment it is "20160831.100731-60").

	Andy