Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2017/10/26 12:47:32 UTC
Slow query when getting rdf:type
Hi, I have trouble understanding why the first query is slow and the second
one is fast. I am using Jena Fuseki 3.4.0.
I want to get all resources that reference <some resource>, and their
types:
SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s a ?type
  }
}

SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s ?p2 ?o2
  }
}
The first one takes 5 seconds, which is too slow for our application. Can it
be rearranged somehow to make it fast? Sorry if this is not the correct forum
for this.
Thanks!
--
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
Yes, I was using the same resource for testing. Jena has been running for
weeks, so it is not a cold system, if I understood correctly. Sorry, what does
inference mean?
Br,
Mikael
On 27.10.2017 13:02, Andy Seaborne wrote:
> In this case, stats won't help. The <some resource> should be the
> starting point.
>
> (quadpattern
> (quad ?g ?s ?p <some:resource>)
> (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
> )
>
> (quadpattern
> (quad ?g ?s ?p <some:resource>)
> (quad ?g ?s ?p2 ?o2)
> )
>
> Are you using inference as well?
>
> Is it the same <some resource>?
>
> Is the timing for the rdf:type variant on a cold system?
>
> Andy
>
>
>
> On 27/10/17 10:22, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using
>> TDB.
>>
>>
>>
>> On 26.10.2017 16:15, Rob Vesse wrote:
>>> Is TDB the underlying database?
>>>
>>> If so is there a stats.opt file in your database directory?
>>>
>>> I remember there being issues in the past with the statistics for
>>> rdf:type triples being wrongly prioritised. You might want to look
>>> at that file, assuming that it exists, and try adjusting the values
>>> associated with rdf:type based upon the guidance in the documentation:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
>>>
>>>
>>> Also if this is a database which is being updated then the
>>> statistics can get out of date relative to the database. You can use
>>> the commandline tdbstats tool to try regenerating this:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>>>
>>>
>>> Note that you will need to stop Fuseki in order to run this, as only
>>> a single process is permitted to access a TDB database at a time.
>>>
>>> Rob
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
With none.opt the query is fast, thanks!
Br,
Mikael
Re: Slow query when getting rdf:type
Posted by Andy Seaborne <an...@apache.org>.
Worth trying with "none.opt" and "fixed.opt".
My guess is that "none.opt" will show a difference in the speed of the
rdf:type query and not for the other one.
Andy
(the overall query counts would be useful as well).
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
The only opt file is stats.opt, and I made sure there was no such file
before running the tool.
Br
Re: Slow query when getting rdf:type
Posted by Andy Seaborne <an...@apache.org>.
Mikael,
Did the database directory have a stats.opt file in it to start with?
Reading back through this thread, I can't see mention of whether there
was one or not.
An experiment on the different optimization approaches:
Take a copy of the database directory when Fuseki is not running.
In the copied TDB directory, look for all the *.opt files and note whether
there are multiple such files. Move them all elsewhere.
Put in the stats you showed as "stats.opt".
Now try your queries using tdbquery --loc=... --file ...
Delete the stats file, put in an empty file called "none.opt", and
try the queries again.
Same again for "fixed.opt": delete the previous ".opt" file and put in an
empty one called "fixed.opt".
The three files for the reordering step are "stats.opt" (with
statistics), "fixed.opt" (a built-in reordering) and "none.opt" (don't
change the order of triples). There should only be one or zero in the
database directory; the default is "fixed".
Andy
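A minimal shell sketch of the experiment described above, assuming a POSIX
shell. DB_COPY, SAVED_DIR, QUERY_FILE and STATS_FILE are hypothetical names and
do not come from the thread; only the tdbquery --loc/--file invocation and the
three .opt file names are taken from the message. The script leaves exactly one
strategy file in the copied directory per run, and calls tdbquery only if it is
on the PATH and the query file exists. Timing is deliberately left out; wrap
the tdbquery call in whatever timer you prefer.

```shell
#!/bin/sh
# Sketch only. All variable names are hypothetical.
# DB_COPY should be a copy of the TDB directory taken while Fuseki is stopped;
# it defaults to an empty scratch directory so the script can be dry-run.
DB_COPY="${DB_COPY:-$(mktemp -d)}"
SAVED_DIR="${SAVED_DIR:-$(mktemp -d)}"
QUERY_FILE="${QUERY_FILE:-query.rq}"

# Move every existing *.opt file out of the copy, keeping it for later.
for f in "$DB_COPY"/*.opt; do
    if [ -e "$f" ]; then
        mv "$f" "$SAVED_DIR"/
    fi
done
STATS_FILE="${STATS_FILE:-$SAVED_DIR/stats.opt}"

# Leave exactly one reorder-strategy file in the database directory.
use_strategy() {
    rm -f "$DB_COPY/stats.opt" "$DB_COPY/fixed.opt" "$DB_COPY/none.opt"
    case "$1" in
        stats)
            # Needs real statistics; fail if none are available.
            [ -f "$STATS_FILE" ] || return 1
            cp "$STATS_FILE" "$DB_COPY/stats.opt"
            ;;
        fixed|none)
            # An empty marker file selects these two strategies.
            : > "$DB_COPY/$1.opt"
            ;;
        *)
            echo "unknown strategy: $1" >&2
            return 2
            ;;
    esac
}

for strategy in stats none fixed; do
    if ! use_strategy "$strategy"; then
        echo "skipping $strategy.opt"
        continue
    fi
    echo "== $strategy.opt =="
    # Run the query only when tdbquery and the query file are available.
    if command -v tdbquery >/dev/null 2>&1 && [ -f "$QUERY_FILE" ]; then
        tdbquery --loc="$DB_COPY" --file "$QUERY_FILE"
    fi
done
```

Running it with DB_COPY pointing at the copied database and QUERY_FILE at a
file holding the rdf:type query gives one result set per strategy, which is
enough to compare "stats", "none" and "fixed" side by side.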
Re: Slow query when getting rdf:type
Posted by Andy Seaborne <an...@apache.org>.
In this case, stats won't help.
The two execution plans are:
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)
and
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s ?p2 ?o2)
)
I can't see why the first is slower - I'd expect the second to be slower, as it
has more answers.
What counts do you get for the two queries?
On 07/11/17 15:02, Mikael Pesonen wrote:
>
> Thanks for explaining! SPARQL query is still as slow as before. So
> getting rdf:type slows it down.
>
> Br,
>
> On 7.11.2017 16:51, ajs6f@apache.org wrote:
>> Yes. That is exactly the expected behavior. Please read the entire page.
>>
>> It explains that the query optimizer can use the stats file to
>> optimize the execution of queries. Any change you would expect to see
>> in behavior will occur at query time. Try your queries again and see
>> if there are changes in the execution times or query explanations.
>>
>>
>> ajs6f
>>
>> Mikael Pesonen wrote on 11/7/17 9:43 AM:
>>>
>>> Thanks for the help. So I output the stats into a tmp file and moved it to
>>> stats.opt in the index folder.
>>>
>>> Rerunning tdbstats still seems to give the same result:
>>>
>>> (stats
>>>   (meta
>>>     (timestamp "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
>>>     (run@ "2017/11/07 16:39:53 EET")
>>>     (count 165865))
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>) 2)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 1097)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>) 896)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>) 36)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>) 29284)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>) 1622)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>) 1097)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>) 5)
>>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
>>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>>>   (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57)
>>>   (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
>>>   (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
>>>   (<http://purl.org/dc/elements/1.1/format> 4696)
>>>   (<http://purl.org/dc/terms/created> 723)
>>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>>>   (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
>>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>>   (<http://purl.org/dc/terms/type> 1623)
>>>   (<http://purl.org/dc/terms/accessRights> 1)
>>>   (<http://purl.org/dc/terms/identifier> 78)
>>>   (<http://purl.org/dc/terms/hasFormat> 3016)
>>>   (<http://purl.org/dc/terms/modified> 30899)
>>>
>>> On 7.11.2017 16:27, ajs6f@apache.org wrote:
>>>> Take a look at the link Rob sent you again. Please read the _entire_
>>>> page carefully. Under:
>>>>
>>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>>>>
>>>>
>>>> You will see: "The command line tdbstats will scan the data and
>>>> produce a rules file based on the frequency of
>>>> properties. The output should first go to a temporary file, then
>>>> that file moved into the database location."
>>>>
>>>> You need to actually use the output of tdbstats by moving it into
>>>> your database directory.
>>>>
>>>>
>>>> ajs6f
>>>>
>>>> Mikael Pesonen wrote on 11/7/17 6:30 AM:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry, I don't understand how tdbstats works. I ran it against the
>>>>> same graph that the slow query runs against and got the
>>>>> result below (some lines removed).
>>>>>
>>>>> Br,
>>>>> Mikael
>>>>>
>>>>> (stats
>>>>>   (meta
>>>>>     (timestamp "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
>>>>>     (run@ "2017/11/07 13:24:16 EET")
>>>>>     (count 165911))
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>) 3)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 1098)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>) 897)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>) 36)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>) 1)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Expression>) 3)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>) 29284)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>) 1623)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>) 2)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>) 1100)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>) 5)
>>>>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
>>>>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>>>>>   (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
>>>>>   ...
>>>>>   (<http://purl.org/dc/elements/1.1/format> 4697)
>>>>>   (<http://purl.org/dc/terms/created> 725)
>>>>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>>>>>   (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
>>>>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>>>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>>>>   (<http://purl.org/dc/terms/type> 1624)
>>>>>   (<http://purl.org/dc/terms/accessRights> 2)
>>>>>   (<http://purl.org/dc/terms/identifier> 78)
>>>>>   ...
>>>>>
>>>>> On 30.10.2017 17:10, Andy Seaborne wrote:
>>>>>> Mikael,
>>>>>>
>>>>>> I can't find anything that makes rdf:type special. Maybe some
>>>>>> distribution of data is the cause but I'm not seeing it.
>>>>>>
>>>>>> Did you get a chance to get some stats?
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On 27/10/17 12:27, Mikael Pesonen wrote:
>>>>>>>
>>>>>>> I also tried this with other properties such as dcterms:created,
>>>>>>> and it didn't slow down with them.
>>>>>>>
>>>>>>> -Mikael
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for explaining! The SPARQL query is still as slow as before. So
getting rdf:type slows it down.
Br,
Re: Slow query when getting rdf:type
Posted by aj...@apache.org.
Yes. That is exactly the expected behavior. Please read the entire page.
It explains that the query optimizer can use the stats file to optimize the execution of queries. Any change you would
expect to see in behavior will occur at query time. Try your queries again and see if there are changes in the execution
times or query explanations.
ajs6f
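The tdbstats step quoted in this thread (send the output to a temporary file,
then move that file into the database location) can be sketched as below,
assuming a POSIX shell. install_stats and DB_DIR are hypothetical names; the
"tdbstats --loc=DIR" form is taken from the TDB optimizer documentation linked
in the thread and should be checked against your Jena version. Fuseki has to be
stopped first, since only one process may access a TDB database at a time.

```shell
#!/bin/sh
# Sketch only: install_stats and DB_DIR are hypothetical names.

# install_stats DB_DIR COMMAND [ARGS...]
# Runs COMMAND, captures its stdout in a temporary file, and moves that file
# to DB_DIR/stats.opt only if the command succeeded and produced output.
install_stats() {
    db_dir="$1"
    shift
    tmp_file="$(mktemp)" || return 1
    if "$@" > "$tmp_file" && [ -s "$tmp_file" ]; then
        mv "$tmp_file" "$db_dir/stats.opt"
    else
        rm -f "$tmp_file"
        return 1
    fi
}

# Example run; does nothing unless DB_DIR is set and tdbstats is installed.
if [ -n "${DB_DIR:-}" ] && command -v tdbstats >/dev/null 2>&1; then
    install_stats "$DB_DIR" tdbstats --loc="$DB_DIR"
fi
```

Writing to a temporary file first means a failed or interrupted run never
leaves a half-written stats.opt in the database directory. The new file only
takes effect at query time, so rerunning tdbstats will print much the same
statistics; the thing to compare is the query timing before and after.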
Mikael Pesonen wrote on 11/7/17 9:43 AM:
>
> Thanks for the help. So outputted stats into tmp file and moved to stats.opt into index folder.
>
> Rerunning tdbstats seems to give same result still:
>
> (stats
> (meta
> (timestamp "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
> (run@ "2017/11/07 16:39:53 EET")
> (count 165865))
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
> 2)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
> 1097)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
> 896)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
> 36)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
> 29284)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
> 1622)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
> 1097)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
> 5)
> (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
> (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57)
> (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
> (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
> (<http://purl.org/dc/elements/1.1/format> 4696)
> (<http://purl.org/dc/terms/created> 723)
> (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
> (<http://purl.org/dc/elements/1.1/description> 6)
> (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
> (<http://purl.org/dc/terms/type> 1623)
> (<http://purl.org/dc/terms/accessRights> 1)
> (<http://purl.org/dc/terms/identifier> 78)
> (<http://purl.org/dc/terms/hasFormat> 3016)
> (<http://purl.org/dc/terms/modified> 30899)
>
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for the help. I wrote the tdbstats output to a temporary file and
moved it to stats.opt in the index folder.
Rerunning tdbstats still seems to give the same result:
(stats
(meta
(timestamp
"2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 16:39:53 EET")
(count 165865))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1097)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
896)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1622)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1097)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57)
(<http://resource.lingsoft.fi/rdf/resource/producer> 3)
(<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
(<http://purl.org/dc/elements/1.1/format> 4696)
(<http://purl.org/dc/terms/created> 723)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1623)
(<http://purl.org/dc/terms/accessRights> 1)
(<http://purl.org/dc/terms/identifier> 78)
(<http://purl.org/dc/terms/hasFormat> 3016)
(<http://purl.org/dc/terms/modified> 30899)
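
As a reading aid for the output above, here is a small sketch that turns the per-predicate counts into shares of the total. It assumes that the (count N) entry in the meta block is the total number of triples scanned, and it only handles the simple (<predicate> count) rules, skipping the ((VAR ...) ...) ones. The excerpt and the function name predicate_shares are illustrative. The result only summarizes the numbers and does not by itself explain the slow query.

```python
import re

# Excerpt copied from the tdbstats output above (shortened).
STATS_EXCERPT = """
(stats
(meta
(count 165865))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>) 2)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://purl.org/dc/terms/created> 723)
(<http://purl.org/dc/terms/modified> 30899)
)
"""

# Matches single-line rules of the form (<predicate-uri> count).
PREDICATE_RULE = re.compile(r"^\s*\(<([^>]+)>\s+(\d+)\)\s*$", re.MULTILINE)
# Matches the total in the meta block (assumed to be the triple count).
TOTAL_COUNT = re.compile(r"\(count\s+(\d+)\)")


def predicate_shares(stats_text):
    """Return {predicate URI: fraction of all triples using that predicate}."""
    total = int(TOTAL_COUNT.search(stats_text).group(1))
    return {
        uri: int(count) / total
        for uri, count in PREDICATE_RULE.findall(stats_text)
    }


if __name__ == "__main__":
    ranked = sorted(predicate_shares(STATS_EXCERPT).items(), key=lambda kv: -kv[1])
    for uri, share in ranked:
        print(f"{share:7.2%}  {uri}")
```

On these numbers rdf:type accounts for roughly a fifth of all triples, while dcterms:created is well under one percent.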
On 7.11.2017 16:27, ajs6f@apache.org wrote:
> Take a look at the link Rob sent you again. Please read the _entire_
> page carefully. Under:
>
> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>
>
> You will see: "The command line tdbstats will scan the data and
> produce a rules file based on the frequency of properties. The output
> should first go to a temporary file, then that file moved into the
> database location."
>
> You need to actually use the output of tdbstats by moving it into your
> database directory.
>
>
> ajs6f
>
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Slow query when getting rdf:type
Posted by aj...@apache.org.
Take a look at the link Rob sent you again. Please read the _entire_ page carefully. Under:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
You will see: "The command line tdbstats will scan the data and produce a rules file based on the frequency of
properties. The output should first go to a temporary file, then that file moved into the database location."
You need to actually use the output of tdbstats by moving it into your database directory.
ajs6f
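
The two steps quoted from the documentation could be scripted roughly as follows. This is a sketch under assumptions: tdbstats is on the PATH, Fuseki is stopped while it runs (as Rob noted, only one process may access a TDB database at a time), and the function names run_tdbstats and install_stats are made up for illustration.

```python
import os
import subprocess
import tempfile


def run_tdbstats(db_dir):
    """Run tdbstats against the TDB database in db_dir and return its output."""
    result = subprocess.run(
        ["tdbstats", f"--loc={db_dir}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def install_stats(stats_text, db_dir):
    """Write the stats to a temporary file, then move it to db_dir/stats.opt."""
    fd, tmp_path = tempfile.mkstemp(prefix="stats-", suffix=".tmp", dir=db_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(stats_text)
    target = os.path.join(db_dir, "stats.opt")
    # The rename only happens once the whole file has been written.
    os.replace(tmp_path, target)
    return target


# Hypothetical usage; the path is an assumption:
# install_stats(run_tdbstats("/path/to/DB"), "/path/to/DB")
```

Writing to a temporary file first means a half-written or empty stats.opt is never left in the database directory if tdbstats fails partway through. The temporary file is created in the database directory here so that the final rename stays on one filesystem; that placement is a design choice of this sketch, not something the documentation prescribes.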
Mikael Pesonen wrote on 11/7/17 6:30 AM:
>
> Hi,
>
> sorry, I don't understand how tdbstats works. I ran it against the same graph that the slow query runs against and
> got the result below (some lines removed):
>
> Br,
> Mikael
>
> (stats
> (meta
> (timestamp "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
> (run@ "2017/11/07 13:24:16 EET")
> (count 165911))
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
> 3)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
> 1098)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
> 897)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
> 36)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
> 1)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Expression>)
> 3)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
> 29284)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
> 1623)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
> 2)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
> 1100)
> ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
> 5)
> (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
> (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
> ...
> (<http://purl.org/dc/elements/1.1/format> 4697)
> (<http://purl.org/dc/terms/created> 725)
> (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
> (<http://purl.org/dc/elements/1.1/description> 6)
> (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
> (<http://purl.org/dc/terms/type> 1624)
> (<http://purl.org/dc/terms/accessRights> 2)
> (<http://purl.org/dc/terms/identifier> 78)
> ...
>
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
sorry, I don't understand how tdbstats works. I ran it against the same
graph that the slow query runs against and got the result below (some
lines removed):
Br,
Mikael
(stats
(meta
(timestamp
"2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 13:24:16 EET")
(count 165911))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1098)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
897)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
1)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Expression>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1623)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1100)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
(<http://purl.org/dc/elements/1.1/format> 4697)
(<http://purl.org/dc/terms/created> 725)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1624)
(<http://purl.org/dc/terms/accessRights> 2)
(<http://purl.org/dc/terms/identifier> 78)
...
On 30.10.2017 17:10, Andy Seaborne wrote:
> Mikael,
>
> I can't find anything that makes rdf:type special. Maybe some
> distribution of data is the cause but I'm not seeing it.
>
> Did you get a chance to get some stats?
>
> Andy
>
>
Re: Slow query when getting rdf:type
Posted by Andy Seaborne <an...@apache.org>.
Mikael,
I can't find anything that makes rdf:type special. Maybe some
distribution of data is the cause but I'm not seeing it.
Did you get a chance to get some stats?
Andy
On 27/10/17 12:27, Mikael Pesonen wrote:
>
> I also tried this with other properties such as dcterms:created, and
> the query didn't slow down with them.
>
> -Mikael
>
>
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I also tried this with other properties such as dcterms:created, and
the query didn't slow down with them.
-Mikael
On 27.10.2017 13:02, Andy Seaborne wrote:
> In this case, stats won't help. The <some resource> should be the
> starting point.
>
> (quadpattern
> (quad ?g ?s ?p <some:resource>)
> (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
> )
>
> (quadpattern
> (quad ?g ?s ?p <some:resource>)
> (quad ?g ?s ?p2 ?o2)
> )))
>
> Are you using inference as well?
>
> Is it the same <some resource>?
>
> Is the timing for the rdf:type variant on a cold system?
>
> Andy
>
>
>
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Slow query when getting rdf:type
Posted by Andy Seaborne <an...@apache.org>.
In this case, stats won't help. The <some resource> should be the
starting point.
(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)
(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s ?p2 ?o2)
)
Are you using inference as well?
Is it the same <some resource>?
Is the timing for the rdf:type variant on a cold system?
Andy
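To answer the cold-system question with numbers, one option is to run each query variant several times and compare the first run against the later ones. The following is a minimal sketch using only the Python standard library. The endpoint URL, the dataset name "ds" and the example resource IRI are assumptions for illustration, not values from this thread; the request follows the standard SPARQL 1.1 Protocol form-encoded POST.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint: adjust host, port and dataset name to your Fuseki setup.
ENDPOINT = "http://localhost:3030/ds/sparql"

QUERY_TEMPLATE = """
SELECT * WHERE {{
  GRAPH ?g {{
    ?s ?p <{resource}> .
    {second_pattern}
  }}
}}
"""


def build_query(resource, second_pattern):
    """Fill in the resource IRI and the second triple pattern."""
    return QUERY_TEMPLATE.format(
        resource=resource, second_pattern=second_pattern
    ).strip()


def build_request(endpoint, query):
    """Build a form-encoded POST request carrying the query."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )


def time_query(endpoint, query, runs=5):
    """Return wall-clock seconds per run; the first run may include cache warm-up."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(build_request(endpoint, query)) as response:
            json.load(response)  # read the whole result so transfer time counts
        timings.append(time.perf_counter() - start)
    return timings


def compare_variants(endpoint, resource):
    """Time the rdf:type variant and the any-predicate variant for one resource."""
    variants = {"rdf:type": "?s a ?type", "any predicate": "?s ?p2 ?o2"}
    return {
        name: time_query(endpoint, build_query(resource, pattern))
        for name, pattern in variants.items()
    }
```

Calling compare_variants(ENDPOINT, "http://example.org/some-resource") against a running server returns the per-run timings for both variants. If only the first run of the rdf:type variant is slow, that points towards a cold cache rather than the query plan.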
Re: Slow query when getting rdf:type
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.
Re: Slow query when getting rdf:type
Posted by Rob Vesse <rv...@dotnetrdf.org>.
Is TDB the underlying database?
If so, is there a stats.opt file in your database directory?
I remember there being issues in the past with the statistics for rdf:type triples being wrongly prioritised. You might want to look at that file, assuming it exists, and try adjusting the values associated with rdf:type based on the guidance in the documentation:
http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
Also, if this is a database that is being updated, the statistics can get out of date relative to the database. You can use the command-line tdbstats tool to try regenerating the file:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
Note that you will need to stop Fuseki in order to run this, as only a single process is permitted to access a TDB database at a time.
Rob
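As a quick first check before stopping Fuseki, the presence and rough freshness of the statistics file can be inspected from outside the server. The sketch below is an assumption-laden heuristic, not part of Jena: the function name stats_status is made up for illustration, and comparing file modification times only hints that the data changed after stats.opt was written.

```python
from pathlib import Path


def stats_status(db_dir):
    """Classify the stats.opt file in a TDB database directory.

    Returns "missing", "stale" or "current". "stale" is only a heuristic:
    it means some other file in the directory was modified after stats.opt.
    """
    db = Path(db_dir)
    stats = db / "stats.opt"
    if not stats.is_file():
        return "missing"
    stats_mtime = stats.stat().st_mtime
    # Newest modification time among all other regular files in the directory.
    newest_other = max(
        (p.stat().st_mtime for p in db.iterdir() if p.is_file() and p != stats),
        default=0.0,
    )
    return "stale" if newest_other > stats_mtime else "current"
```

For example, stats_status("/path/to/DB") (a placeholder path) returning "stale" would suggest regenerating the file with tdbstats as described in the documentation linked above.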
On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
Hi, I have trouble understanding why the first query is slow and the second
one is fast. I am using Jena Fuseki 3.4.0.
I want to get all resources that reference <some resource>, and their
types:
SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s a ?type
  }
}
SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s ?p2 ?o2
  }
}
The first one takes 5 seconds, which is too slow for our application. Can it
be rearranged somehow to make it faster? Sorry if this is not the correct forum
for this.
Thanks!
--