Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2017/10/26 12:47:32 UTC

Slow query when getting rdf:type

Hi, I have trouble understanding why the first query is slow and the
second one is fast. I am using Jena Fuseki 3.4.0.

So I want to get all resources that reference <some resource>, and their 
types:

SELECT * WHERE
{
	GRAPH ?g
	{
		?s ?p <some resource> .
		?s a ?type
	}
}

SELECT * WHERE
{
	GRAPH ?g
	{
		?s ?p <some resource> .
		?s ?p2 ?o2
	}
}


The first one takes 5 seconds, which is too slow for our application. Can
it be rearranged somehow to make it fast? Sorry if this is not the correct
forum for this.

Thanks!

-- 


Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,

yes, I was using the same resource for testing. Jena has been running for
weeks, so it is not a cold system, if I understood correctly. Sorry, what
does inference mean?

Br,
Mikael


On 27.10.2017 13:02, Andy Seaborne wrote:
> In this case, stats won't help.  The <some resource> should be the
> starting point.
>
> (quadpattern
>   (quad ?g ?s ?p <some:resource>)
>   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
> )
>
> (quadpattern
>   (quad ?g ?s ?p <some:resource>)
>   (quad ?g ?s ?p2 ?o2)
> )
>
> Are you using inference as well?
>
> Is it the same <some resource>?
>
> Is the timing for the rdf:type variant on a cold system?
>
>     Andy
>
>
>
> On 27/10/17 10:22, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> thanks! I'll try that when get chance to stop jena. Yes we are using 
>> TDB.
>>
>>
>>
>> On 26.10.2017 16:15, Rob Vesse wrote:
>>> Is TDB the underlying database?
>>>
>>> If so is there a stats.opt  file in your database directory?
>>>
>>> I remember there being issues in the past with the statistics for
>>> rdf:type triples being wrongly prioritised. You might want to look
>>> at that file, assuming that it exists, and try adjusting the values
>>> associated with rdf:type based upon the guidance in the documentation:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file 
>>>
>>>
>>> Also if this is a database which is being updated then the 
>>> statistics can get out of date relative to the database. You can use 
>>> the commandline tdbstats tool to try regenerating this:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>
>>>
>>> Note that you will need to stop Fuseki in order to run this as only 
>>> a single process is permitted to access a TDB database at a time
>>>
>>> Rob
>>>
>>> On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> 
>>> wrote:
>>>
>>>      Hi, I have trouble understanding why the first query is slow 
>>> and second
>>>      one is fast. Using Jena Fuseki 3.4.0.
>>>      So I want to get all resources that reference <some resource>, 
>>> and their
>>>      types:
>>>      SELECT * WHERE
>>>      {
>>>          GRAPH ?g
>>>          {
>>>              ?s ?p <some resource> .
>>>                   ?s a ?type
>>>          }
>>>      }
>>>      SELECT * WHERE
>>>      {
>>>          GRAPH ?g
>>>          {
>>>              ?s ?p <some resource> .
>>>                   ?s ?p2 ?o2
>>>          }
>>>      }
>>>      First one takes 5 seconds which is too slow for our 
>>> application. Can it
>>>      be rearranged somehow to make fast? Sorry if this is not a 
>>> correct forum
>>>      for this.
>>>      Thanks!
>>>      --
>>>
>>>
>>>
>>>
>>

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
With none.opt the query is fast, thanks!

Br,
Mikael


On 8.11.2017 13:24, Andy Seaborne wrote:
> Worth trying with "none.opt" and "fixed.opt".
>
> My guess is that "none.opt" will show a difference in the speed of the
> rdf:type query and not for the other one.
>
>     Andy
>
> (the overall query counts would be useful as well).
>
> On 08/11/17 10:37, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> only opt file is the stats.opt, and I made sure there was not such 
>> file before running the tool.
>>
>> Br
>>
>> On 8.11.2017 11:13, Andy Seaborne wrote:
>>> Mikael,
>>>
>>> Did the database directory have a stats.opt file in it to start with?
>>>
>>> Reading back through this thread, I can't see mention of whether 
>>> there was one or not.
>>>
>>> An experiment on the different optimization approaches:
>>>
>>> Take a copy of the database directory when Fuseki is not running.
>>>
>>> In the copied TDB directory, look for all the *.opt files, note 
>>> whether there are multiple such files. Move them all elsewhere.
>>>
>>> Put in the stats you showed "stats.opt".
>>>
>>> Now try your queries using tdbquery --loc=... --file ...
>>>
>>> Delete the stats file, and put in an empty file called "none.opt"
>>> and try the queries again.
>>>
>>> Same again for "fixed.opt": delete the "opt" file and put in an 
>>> empty one "fixed.opt".
>>>
>>> The three files for the reordering step are "stats.opt" (with
>>> statistics), "fixed.opt" (a built-in reordering) and "none.opt"
>>> (don't change the order of triples). There should only be one or
>>> zero in the database directory; the default is "fixed".
>>>
>>>     Andy
>>



Re: Slow query when getting rdf:type

Posted by Andy Seaborne <an...@apache.org>.
Worth trying with "none.opt" and "fixed.opt".

My guess is that "none.opt" will show a difference in the speed of the
rdf:type query and not for the other one.

	Andy

(the overall query counts would be useful as well).

On 08/11/17 10:37, Mikael Pesonen wrote:
> 
> Hi,
> 
> only opt file is the stats.opt, and I made sure there was not such file 
> before running the tool.
> 
> Br
> 
> On 8.11.2017 11:13, Andy Seaborne wrote:
>> Mikael,
>>
>> Did the database directory have a stats.opt file in it to start with?
>>
>> Reading back through this thread, I can't see mention of whether there 
>> was one or not.
>>
>> An experiment on the different optimization approaches:
>>
>> Take a copy of the database directory when Fuseki is not running.
>>
>> In the copied TDB directory, look for all the *.opt files, note 
>> whether there are multiple such files. Move them all elsewhere.
>>
>> Put in the stats you showed "stats.opt".
>>
>> Now try your queries using tdbquery --loc=... --file ...
>>
>> Delete the stats file, and put in an empty file called "none.opt" and
>> try the queries again.
>>
>> Same again for "fixed.opt": delete the "opt" file and put in an empty 
>> one "fixed.opt".
>>
>> The three files for the reordering step are "stats.opt" (with
>> statistics), "fixed.opt" (a built-in reordering) and "none.opt"
>> (don't change the order of triples). There should only be one or zero
>> in the database directory; the default is "fixed".
>>
>>     Andy
> 

Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,

the only opt file is stats.opt, and I made sure there was no such file
before running the tool.

Br

On 8.11.2017 11:13, Andy Seaborne wrote:
> Mikael,
>
> Did the database directory have a stats.opt file in it to start with?
>
> Reading back through this thread, I can't see mention of whether there 
> was one or not.
>
> An experiment on the different optimization approaches:
>
> Take a copy of the database directory when Fuseki is not running.
>
> In the copied TDB directory, look for all the *.opt files, note 
> whether there are multiple such files. Move them all elsewhere.
>
> Put in the stats you showed "stats.opt".
>
> Now try your queries using tdbquery --loc=... --file ...
>
> Delete the stats file, and put in an empty file called "none.opt" and
> try the queries again.
>
> Same again for "fixed.opt": delete the "opt" file and put in an empty 
> one "fixed.opt".
>
> The three files for the reordering step are "stats.opt" (with
> statistics), "fixed.opt" (a built-in reordering) and "none.opt"
> (don't change the order of triples). There should only be one or zero
> in the database directory; the default is "fixed".
>
>     Andy



Re: Slow query when getting rdf:type

Posted by Andy Seaborne <an...@apache.org>.
Mikael,

Did the database directory have a stats.opt file in it to start with?

Reading back through this thread, I can't see mention of whether there 
was one or not.

An experiment on the different optimization approaches:

Take a copy of the database directory when Fuseki is not running.

In the copied TDB directory, look for all the *.opt files, note whether 
there are multiple such files. Move them all elsewhere.

Put in the stats you showed "stats.opt".

Now try your queries using tdbquery --loc=... --file ...

Delete the stats file, and put in an empty file called "none.opt" and
try the queries again.

Same again for "fixed.opt": delete the "opt" file and put in an empty 
one "fixed.opt".

The three files for the reordering step are "stats.opt" (with
statistics), "fixed.opt" (a built-in reordering) and "none.opt" (don't
change the order of triples). There should only be one or zero in the
database directory; the default is "fixed".

     Andy
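
The experiment described above could be scripted roughly as follows. This is
only a sketch under assumptions, not a procedure taken from the Jena
documentation: the function name use_reorder_strategy, the sibling backup
directory and the paths in the usage comment are made up for illustration.
Only the marker file names (stats.opt, fixed.opt, none.opt) and the
"tdbquery --loc=... --file ..." form come from this thread. Run it against a
copy of the database taken while Fuseki is stopped.

```shell
#!/bin/sh
# Sketch: switch the reordering strategy of a COPIED TDB directory by
# swapping its *.opt marker file. All names here are illustrative.

# use_reorder_strategy DB_DIR STRATEGY [STATS_FILE]
#   STRATEGY   : stats | fixed | none
#   STATS_FILE : saved tdbstats output, required for "stats";
#                it must live outside DB_DIR.
use_reorder_strategy() {
    db_dir=$1
    strategy=$2
    stats_file=${3:-}

    [ -d "$db_dir" ] || { echo "no such directory: $db_dir" >&2; return 1; }

    # Validate the request before touching any file.
    case "$strategy" in
        stats)
            [ -f "$stats_file" ] || { echo "stats file required" >&2; return 1; }
            ;;
        fixed|none)
            ;;
        *)
            echo "unknown strategy: $strategy" >&2
            return 1
            ;;
    esac

    # Move every existing *.opt file elsewhere (a sibling directory, an
    # assumption of this sketch) so that only one remains afterwards.
    backup_dir="${db_dir%/}.opt-backup"
    mkdir -p "$backup_dir" || return 1
    for f in "$db_dir"/*.opt; do
        if [ -e "$f" ]; then
            mv "$f" "$backup_dir/"
        fi
    done

    if [ "$strategy" = stats ]; then
        cp "$stats_file" "$db_dir/stats.opt"
    else
        # An empty marker file is enough for "fixed" and "none".
        : > "$db_dir/$strategy.opt"
    fi
}

# Possible usage (paths are placeholders):
#   use_reorder_strategy /path/to/db-copy none
#   time tdbquery --loc=/path/to/db-copy --file query.rq
```

Repeating the last two commands with "fixed" and "stats" gives the three
timings to compare.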

Re: Slow query when getting rdf:type

Posted by Andy Seaborne <an...@apache.org>.
In this case, stats won't help.

The two execution plans are:

(quadpattern
   (quad ?g ?s ?p <some:resource>)
   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)

and

(quadpattern
   (quad ?g ?s ?p <some:resource>)
   (quad ?g ?s ?p2 ?o2)
)

I can't see why the first is slower - I'd expect the second to be slower,
as it has more answers.

What counts do you get for the two queries?
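
One way to collect those counts is to run COUNT versions of the two queries.
The sketch below only writes the two query files; the function name, the file
names and the <some:resource> IRI are placeholders, and the tdbquery call in
the usage comment assumes a database that no other process has open.

```shell
#!/bin/sh
# Sketch: write COUNT variants of the two queries from this thread.
# <some:resource> is a placeholder; substitute the real IRI.

# write_count_queries OUT_DIR
write_count_queries() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1

    # Variant 1: the slow query (second pattern fixes rdf:type via "a").
    cat > "$out_dir/count-type.rq" <<'EOF'
SELECT (COUNT(*) AS ?n) WHERE
{
  GRAPH ?g
  {
    ?s ?p <some:resource> .
    ?s a ?type
  }
}
EOF

    # Variant 2: the fast query (second pattern has all variables).
    cat > "$out_dir/count-any.rq" <<'EOF'
SELECT (COUNT(*) AS ?n) WHERE
{
  GRAPH ?g
  {
    ?s ?p <some:resource> .
    ?s ?p2 ?o2
  }
}
EOF
}

# Possible usage (paths are placeholders):
#   write_count_queries /tmp/counts
#   tdbquery --loc=/path/to/db-copy --file /tmp/counts/count-type.rq
#   tdbquery --loc=/path/to/db-copy --file /tmp/counts/count-any.rq
```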


On 07/11/17 15:02, Mikael Pesonen wrote:
> 
> Thanks for explaining! SPARQL query is still as slow as before. So 
> getting rdf:type slows it down.
> 
> Br,
> 
> On 7.11.2017 16:51, ajs6f@apache.org wrote:
>> Yes. That is exactly the expected behavior. Please read the entire page.
>>
>> It explains that the query optimizer can use the stats file to 
>> optimize the execution of queries. Any change you would expect to see 
>> in behavior will occur at query time. Try your queries again and see 
>> if there are changes in the execution times or query explanations.
>>
>>
>> ajs6f
>>
>> Mikael Pesonen wrote on 11/7/17 9:43 AM:
>>>
>>> Thanks for the help. So outputted stats into tmp file and moved to 
>>> stats.opt into index folder.
>>>
>>> Rerunning tdbstats seems to give same result still:
>>>
>>> (stats
>>>   (meta
>>>     (timestamp 
>>> "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>) 
>>>
>>>     (run@ "2017/11/07 16:39:53 EET")
>>>     (count 165865))
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://purl.org/vocab/frbr/core#Work>)
>>>    2)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 
>>>
>>>    1097)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://purl.org/vocab/frbr/core#Manifestation>)
>>>    896)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://www.w3.org/2004/02/skos/core#Concept>)
>>>    36)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://www.w3.org/ns/dcat#CatalogRecord>)
>>>    29284)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://purl.org/dc/dcmitype/Text>)
>>>    1622)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://xmlns.com/foaf/0.1/Document>)
>>>    1097)
>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>> <http://purl.org/dc/dcmitype/Collection>)
>>>    5)
>>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
>>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>>> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57) 
>>>
>>>   (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
>>> (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
>>>   (<http://purl.org/dc/elements/1.1/format> 4696)
>>>   (<http://purl.org/dc/terms/created> 723)
>>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>>> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
>>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>>   (<http://purl.org/dc/terms/type> 1623)
>>>   (<http://purl.org/dc/terms/accessRights> 1)
>>>   (<http://purl.org/dc/terms/identifier> 78)
>>>   (<http://purl.org/dc/terms/hasFormat> 3016)
>>>   (<http://purl.org/dc/terms/modified> 30899)
>>>
>>> On 7.11.2017 16:27, ajs6f@apache.org wrote:
>>>> Take a look at the link Rob sent you again. Please read the _entire_ 
>>>> page carefully. Under:
>>>>
>>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>>
>>>>
>>>> You will see: "The command line tdbstats will scan the data and 
>>>> produce a rules file based on the frequency of
>>>> properties. The output should first go to a temporary file, then 
>>>> that file moved into the database location."
>>>>
>>>> You need to actually use the output of tdbstats by moving it into 
>>>> your database directory.
>>>>
>>>>
>>>> ajs6f
>>>>
>>>> Mikael Pesonen wrote on 11/7/17 6:30 AM:
>>>>>
>>>>> Hi,
>>>>>
>>>>> sorry, I don't understand how tdbstats work. I ran it against the 
>>>>> same graph that making the slow query and got the
>>>>> result below (some lines removed)
>>>>>
>>>>> Br,
>>>>> Mikael
>>>>>
>>>>> (stats
>>>>>   (meta
>>>>>     (timestamp 
>>>>> "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>) 
>>>>>
>>>>>     (run@ "2017/11/07 13:24:16 EET")
>>>>>     (count 165911))
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://purl.org/vocab/frbr/core#Work>)
>>>>>    3)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 
>>>>>
>>>>>    1098)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://purl.org/vocab/frbr/core#Manifestation>)
>>>>>    897)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://www.w3.org/2004/02/skos/core#Concept>)
>>>>>    36)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>) 
>>>>>
>>>>>    1)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://purl.org/vocab/frbr/core#Expression>)
>>>>>    3)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://www.w3.org/ns/dcat#CatalogRecord>)
>>>>>    29284)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://purl.org/dc/dcmitype/Text>)
>>>>>    1623)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>) 
>>>>>
>>>>>    2)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://xmlns.com/foaf/0.1/Document>)
>>>>>    1100)
>>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>> <http://purl.org/dc/dcmitype/Collection>)
>>>>>    5)
>>>>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
>>>>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>>>>> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 
>>>>> 59)
>>>>> ...
>>>>>   (<http://purl.org/dc/elements/1.1/format> 4697)
>>>>>   (<http://purl.org/dc/terms/created> 725)
>>>>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>>>>> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1) 
>>>>>
>>>>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>>>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>>>>   (<http://purl.org/dc/terms/type> 1624)
>>>>>   (<http://purl.org/dc/terms/accessRights> 2)
>>>>>   (<http://purl.org/dc/terms/identifier> 78)
>>>>> ...
>>>>>
>>>>> On 30.10.2017 17:10, Andy Seaborne wrote:
>>>>>> Mikael,
>>>>>>
>>>>>> I can't find anything that makes rdf:type special.  Maybe some 
>>>>>> distribution of data is the cause but I'm not seeing it.
>>>>>>
>>>>>> Did you get a chance to get some stats?
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>>
>>>>>> On 27/10/17 12:27, Mikael Pesonen wrote:
>>>>>>>
>>>>>>> Tried this also with other properties such as dcterms:created, 
>>>>>>> and it didnt slow down with them.
>>>>>>>
>>>>>>> -Mikael
>>>>>>>
>>>>>>>
>>>>>>> On 27.10.2017 13:02, Andy Seaborne wrote:
>>>>>>>> In this case, stats won't help. The <some resource> should be
>>>>>>>> the starting point.
>>>>>>>>
>>>>>>>> (quadpattern
>>>>>>>>   (quad ?g ?s ?p <some:resource>)
>>>>>>>>   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>>>>> ?type)
>>>>>>>> )
>>>>>>>>
>>>>>>>> (quadpattern
>>>>>>>>   (quad ?g ?s ?p <some:resource>)
>>>>>>>>   (quad ?g ?s ?p2 ?o2)
>>>>>>>> )
>>>>>>>>
>>>>>>>> Are you using inference as well?
>>>>>>>>
>>>>>>>> Is it the same <some resource>?
>>>>>>>>
>>>>>>>> Is the timing for the rdf:type variant on a cold system?
>>>>>>>>
>>>>>>>>     Andy
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27/10/17 10:22, Mikael Pesonen wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> thanks! I'll try that when get chance to stop jena. Yes we are 
>>>>>>>>> using TDB.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 26.10.2017 16:15, Rob Vesse wrote:
>>>>>>>>>> Is TDB the underlying database?
>>>>>>>>>>
>>>>>>>>>> If so is there a stats.opt  file in your database directory?
>>>>>>>>>>
>>>>>>>>>> I remember there being issues in the past with the statistics 
>>>>>>>>>> for rdf:type triples being wrongly prioritised. You
>>>>>>>>>> might want to look at that file, assuming that it exists, and 
>>>>>>>>>> you try adjusting values associated with rdf:type
>>>>>>>>>> based upon the guidance in the documentation:
>>>>>>>>>>
>>>>>>>>>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also if this is a database which is being updated then the 
>>>>>>>>>> statistics can get out of date relative to the
>>>>>>>>>> database. You can use the commandline tdbstats tool to try 
>>>>>>>>>> regenerating this:
>>>>>>>>>>
>>>>>>>>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Note that you will need to stop Fuseki in order to run this as 
>>>>>>>>>> only a single process is permitted to access a TDB
>>>>>>>>>> database at a time
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On 26/10/2017 13:47, "Mikael Pesonen" 
>>>>>>>>>> <mi...@lingsoft.fi> wrote:
>>>>>>>>>>
>>>>>>>>>>      Hi, I have trouble understanding why the first query is 
>>>>>>>>>> slow and second
>>>>>>>>>>      one is fast. Using Jena Fuseki 3.4.0.
>>>>>>>>>>      So I want to get all resources that reference <some 
>>>>>>>>>> resource>, and their
>>>>>>>>>>      types:
>>>>>>>>>>      SELECT * WHERE
>>>>>>>>>>      {
>>>>>>>>>>          GRAPH ?g
>>>>>>>>>>          {
>>>>>>>>>>              ?s ?p <some resource> .
>>>>>>>>>>                   ?s a ?type
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>>      SELECT * WHERE
>>>>>>>>>>      {
>>>>>>>>>>          GRAPH ?g
>>>>>>>>>>          {
>>>>>>>>>>              ?s ?p <some resource> .
>>>>>>>>>>                   ?s ?p2 ?o2
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>>      First one takes 5 seconds which is too slow for our 
>>>>>>>>>> application. Can it
>>>>>>>>>>      be rearranged somehow to make fast? Sorry if this is not 
>>>>>>>>>> a correct forum
>>>>>>>>>>      for this.
>>>>>>>>>>      Thanks!
>>>>>>>>>>      --
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
> 

Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for explaining! The SPARQL query is still as slow as before, so it
is getting rdf:type that slows it down.

Br,

On 7.11.2017 16:51, ajs6f@apache.org wrote:
> Yes. That is exactly the expected behavior. Please read the entire page.
>
> It explains that the query optimizer can use the stats file to 
> optimize the execution of queries. Any change you would expect to see 
> in behavior will occur at query time. Try your queries again and see 
> if there are changes in the execution times or query explanations.
>
>
> ajs6f
>
> Mikael Pesonen wrote on 11/7/17 9:43 AM:
>>
>> Thanks for the help. So outputted stats into tmp file and moved to 
>> stats.opt into index folder.
>>
>> Rerunning tdbstats seems to give same result still:
>>
>> (stats
>>   (meta
>>     (timestamp 
>> "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
>>     (run@ "2017/11/07 16:39:53 EET")
>>     (count 165865))
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://purl.org/vocab/frbr/core#Work>)
>>    2)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 
>>
>>    1097)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://purl.org/vocab/frbr/core#Manifestation>)
>>    896)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://www.w3.org/2004/02/skos/core#Concept>)
>>    36)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://www.w3.org/ns/dcat#CatalogRecord>)
>>    29284)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://purl.org/dc/dcmitype/Text>)
>>    1622)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://xmlns.com/foaf/0.1/Document>)
>>    1097)
>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>> <http://purl.org/dc/dcmitype/Collection>)
>>    5)
>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 
>> 57)
>>   (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
>> (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
>>   (<http://purl.org/dc/elements/1.1/format> 4696)
>>   (<http://purl.org/dc/terms/created> 723)
>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>   (<http://purl.org/dc/terms/type> 1623)
>>   (<http://purl.org/dc/terms/accessRights> 1)
>>   (<http://purl.org/dc/terms/identifier> 78)
>>   (<http://purl.org/dc/terms/hasFormat> 3016)
>>   (<http://purl.org/dc/terms/modified> 30899)
>>
>> On 7.11.2017 16:27, ajs6f@apache.org wrote:
>>> Take a look at the link Rob sent you again. Please read the _entire_ 
>>> page carefully. Under:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>
>>>
>>> You will see: "The command line tdbstats will scan the data and 
>>> produce a rules file based on the frequency of
>>> properties. The output should first go to a temporary file, then 
>>> that file moved into the database location."
>>>
>>> You need to actually use the output of tdbstats by moving it into 
>>> your database directory.
>>>
>>>
>>> ajs6f
>>>
>>> Mikael Pesonen wrote on 11/7/17 6:30 AM:
>>>>
>>>> Hi,
>>>>
>>>> sorry, I don't understand how tdbstats work. I ran it against the 
>>>> same graph that making the slow query and got the
>>>> result below (some lines removed)
>>>>
>>>> Br,
>>>> Mikael
>>>>
>>>> (stats
>>>>   (meta
>>>>     (timestamp 
>>>> "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
>>>>     (run@ "2017/11/07 13:24:16 EET")
>>>>     (count 165911))
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://purl.org/vocab/frbr/core#Work>)
>>>>    3)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 
>>>>
>>>>    1098)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://purl.org/vocab/frbr/core#Manifestation>)
>>>>    897)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://www.w3.org/2004/02/skos/core#Concept>)
>>>>    36)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>) 
>>>>
>>>>    1)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://purl.org/vocab/frbr/core#Expression>)
>>>>    3)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://www.w3.org/ns/dcat#CatalogRecord>)
>>>>    29284)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://purl.org/dc/dcmitype/Text>)
>>>>    1623)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>) 
>>>>
>>>>    2)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://xmlns.com/foaf/0.1/Document>)
>>>>    1100)
>>>>   ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>> <http://purl.org/dc/dcmitype/Collection>)
>>>>    5)
>>>>   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
>>>>   (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
>>>> (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 
>>>> 59)
>>>> ...
>>>>   (<http://purl.org/dc/elements/1.1/format> 4697)
>>>>   (<http://purl.org/dc/terms/created> 725)
>>>>   (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
>>>> (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 
>>>> 1)
>>>>   (<http://purl.org/dc/elements/1.1/description> 6)
>>>>   (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
>>>>   (<http://purl.org/dc/terms/type> 1624)
>>>>   (<http://purl.org/dc/terms/accessRights> 2)
>>>>   (<http://purl.org/dc/terms/identifier> 78)
>>>> ...
>>>>
>>>> On 30.10.2017 17:10, Andy Seaborne wrote:
>>>>> Mikael,
>>>>>
>>>>> I can't find anything that makes rdf:type special.  Maybe some 
>>>>> distribution of data is the cause but I'm not seeing it.
>>>>>
>>>>> Did you get a chance to get some stats?
>>>>>
>>>>>     Andy
>>>>>
>>>>>
>>>>> On 27/10/17 12:27, Mikael Pesonen wrote:
>>>>>>
>>>>>> Tried this also with other properties such as dcterms:created, 
>>>>>> and it didnt slow down with them.
>>>>>>
>>>>>> -Mikael
>>>>>>
>>>>>>
>>>>>> On 27.10.2017 13:02, Andy Seaborne wrote:
>>>>>>> In this case, stats won't help. The <some resource> should be
>>>>>>> the starting point.
>>>>>>>
>>>>>>> (quadpattern
>>>>>>>   (quad ?g ?s ?p <some:resource>)
>>>>>>>   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
>>>>>>> ?type)
>>>>>>> )
>>>>>>>
>>>>>>> (quadpattern
>>>>>>>   (quad ?g ?s ?p <some:resource>)
>>>>>>>   (quad ?g ?s ?p2 ?o2)
>>>>>>> )
>>>>>>>
>>>>>>> Are you using inference as well?
>>>>>>>
>>>>>>> Is it the same <some resource>?
>>>>>>>
>>>>>>> Is the timing for the rdf:type variant on a cold system?
>>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 27/10/17 10:22, Mikael Pesonen wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> thanks! I'll try that when get chance to stop jena. Yes we are 
>>>>>>>> using TDB.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26.10.2017 16:15, Rob Vesse wrote:
>>>>>>>>> Is TDB the underlying database?
>>>>>>>>>
>>>>>>>>> If so is there a stats.opt  file in your database directory?
>>>>>>>>>
>>>>>>>>> I remember there being issues in the past with the statistics 
>>>>>>>>> for rdf:type triples being wrongly prioritised. You
>>>>>>>>> might want to look at that file, assuming that it exists, and 
>>>>>>>>> you try adjusting values associated with rdf:type
>>>>>>>>> based upon the guidance in the documentation:
>>>>>>>>>
>>>>>>>>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also if this is a database which is being updated then the 
>>>>>>>>> statistics can get out of date relative to the
>>>>>>>>> database. You can use the commandline tdbstats tool to try 
>>>>>>>>> regenerating this:
>>>>>>>>>
>>>>>>>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Note that you will need to stop Fuseki in order to run this as 
>>>>>>>>> only a single process is permitted to access a TDB
>>>>>>>>> database at a time
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>> On 26/10/2017 13:47, "Mikael Pesonen" 
>>>>>>>>> <mi...@lingsoft.fi> wrote:
>>>>>>>>>
>>>>>>>>>      Hi, I have trouble understanding why the first query is 
>>>>>>>>> slow and second
>>>>>>>>>      one is fast. Using Jena Fuseki 3.4.0.
>>>>>>>>>      So I want to get all resources that reference <some 
>>>>>>>>> resource>, and their
>>>>>>>>>      types:
>>>>>>>>>      SELECT * WHERE
>>>>>>>>>      {
>>>>>>>>>          GRAPH ?g
>>>>>>>>>          {
>>>>>>>>>              ?s ?p <some resource> .
>>>>>>>>>                   ?s a ?type
>>>>>>>>>          }
>>>>>>>>>      }
>>>>>>>>>      SELECT * WHERE
>>>>>>>>>      {
>>>>>>>>>          GRAPH ?g
>>>>>>>>>          {
>>>>>>>>>              ?s ?p <some resource> .
>>>>>>>>>                   ?s ?p2 ?o2
>>>>>>>>>          }
>>>>>>>>>      }
>>>>>>>>>      First one takes 5 seconds which is too slow for our 
>>>>>>>>> application. Can it
>>>>>>>>>      be rearranged somehow to make fast? Sorry if this is not 
>>>>>>>>> a correct forum
>>>>>>>>>      for this.
>>>>>>>>>      Thanks!
>>>>>>>>>      --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>



Re: Slow query when getting rdf:type

Posted by aj...@apache.org.
Yes. That is exactly the expected behavior. Please read the entire page.

It explains that the query optimizer can use the stats file to optimize the execution of queries. Any change you would 
expect to see in behavior will occur at query time. Try your queries again and see if there are changes in the execution 
times or query explanations.


ajs6f
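
The workflow discussed in this exchange (send the tdbstats output to a
temporary file, then move that file into the database location) might be
sketched like this. The function name install_tdb_stats, the TDBSTATS
override variable and the temporary file name are assumptions made for the
example; the sketch also assumes that Fuseki is stopped, since only one
process may access a TDB database at a time.

```shell
#!/bin/sh
# Sketch: regenerate TDB statistics and install them as stats.opt.
# Assumes no other process (e.g. Fuseki) has the database open.

# TDBSTATS can be overridden to point at a specific tdbstats script.
TDBSTATS=${TDBSTATS:-tdbstats}

# install_tdb_stats DB_DIR
install_tdb_stats() {
    db_dir=$1
    [ -d "$db_dir" ] || { echo "no such directory: $db_dir" >&2; return 1; }

    # Write to a temporary file next to the database first, so that a
    # failed run never replaces an existing stats.opt.
    tmp="${db_dir%/}.stats.tmp.$$"

    if "$TDBSTATS" --loc="$db_dir" > "$tmp"; then
        mv "$tmp" "$db_dir/stats.opt"
    else
        rm -f "$tmp"
        echo "tdbstats failed; stats.opt left unchanged" >&2
        return 1
    fi
}

# Possible usage (path is a placeholder):
#   install_tdb_stats /path/to/db
```

The new statistics only take effect at query time, so the queries have to be
rerun afterwards to see any difference.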


Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for the help. So I wrote the stats output to a tmp file and moved it to
stats.opt in the index folder.

Rerunning tdbstats still seems to give the same result:

(stats
  (meta
    (timestamp "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
    (run@ "2017/11/07 16:39:53 EET")
    (count 165865))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
   2)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
   1097)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
   896)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
   36)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
   29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
   1622)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
   1097)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
   5)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57)
  (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
  (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
  (<http://purl.org/dc/elements/1.1/format> 4696)
  (<http://purl.org/dc/terms/created> 723)
  (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
  (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
  (<http://purl.org/dc/elements/1.1/description> 6)
  (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
  (<http://purl.org/dc/terms/type> 1623)
  (<http://purl.org/dc/terms/accessRights> 1)
  (<http://purl.org/dc/terms/identifier> 78)
  (<http://purl.org/dc/terms/hasFormat> 3016)
  (<http://purl.org/dc/terms/modified> 30899)



Re: Slow query when getting rdf:type

Posted by aj...@apache.org.
Take a look at the link Rob sent you again. Please read the _entire_ page carefully. Under:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

You will see: "The command line tdbstats will scan the data and produce a rules file based on the frequency of 
properties. The output should first go to a temporary file, then that file moved into the database location."

You need to actually use the output of tdbstats by moving it into your database directory.


ajs6f
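
The step quoted from the documentation can be sketched as follows. This is only an illustration under stated assumptions: the Jena command-line tools are on PATH, Fuseki has been stopped first, and the database directory is passed with the tdbstats --loc option. The function name install_stats and its `command` parameter are hypothetical, added so the sketch can be tried without Jena installed.

```python
import os
import shutil
import subprocess
import tempfile


def install_stats(db_dir, command=None):
    """Run tdbstats, capture its output in a temporary file, then move it to <db_dir>/stats.opt."""
    if command is None:
        # Assumption: tdbstats is on PATH and no other process has the database open.
        command = ["tdbstats", "--loc=" + db_dir]
    # Write to a temporary file first so a failed run never leaves a partial stats.opt.
    fd, tmp_path = tempfile.mkstemp(suffix=".opt")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            subprocess.run(command, stdout=tmp_file, check=True)
        target = os.path.join(db_dir, "stats.opt")
        shutil.move(tmp_path, target)
        return target
    finally:
        # Only reached with the file still present when the command or the move failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Printing the tdbstats output to the console, as in the message below, does not change anything in the database; the optimizer can only use the statistics once the file is in the database directory.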


Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,

Sorry, I don't understand how tdbstats works. I ran it against the same
graph that the slow query runs on and got the result below (some lines
removed).

Br,
Mikael

(stats
  (meta
    (timestamp "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
    (run@ "2017/11/07 13:24:16 EET")
    (count 165911))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
   3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
   1098)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
   897)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
   36)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
   1)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Expression>)
   3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
   29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
   1623)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
   2)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
   1100)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
   5)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
  (<http://purl.org/dc/elements/1.1/format> 4697)
  (<http://purl.org/dc/terms/created> 725)
  (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
  (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
  (<http://purl.org/dc/elements/1.1/description> 6)
  (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
  (<http://purl.org/dc/terms/type> 1624)
  (<http://purl.org/dc/terms/accessRights> 2)
  (<http://purl.org/dc/terms/identifier> 78)
...
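
To make the output above easier to read, here is a rough sketch that pulls the per-predicate counts out of tdbstats output and computes how large a share rdf:type has. The regular expressions are an assumption based only on the layout shown in this message, not on a formal grammar for the stats file, and the function name is hypothetical.

```python
import re

# Matches simple per-predicate entries such as (<http://purl.org/dc/terms/created> 725).
# The two-line ((VAR <rdf:type> <Class>) N) entries are deliberately not matched.
PREDICATE_ENTRY = re.compile(r"\(<([^>]+)>\s+(\d+)\)")
TOTAL_ENTRY = re.compile(r"\(count\s+(\d+)\)")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def predicate_counts(stats_text):
    """Return (total_count, {predicate_iri: count}) read from tdbstats output."""
    total_match = TOTAL_ENTRY.search(stats_text)
    total = int(total_match.group(1)) if total_match else None
    counts = {iri: int(n) for iri, n in PREDICATE_ENTRY.findall(stats_text)}
    return total, counts


# A few lines copied from the output above.
sample = """
(stats
  (meta
    (count 165911))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
   3)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://purl.org/dc/terms/created> 725)
"""

total, counts = predicate_counts(sample)
type_share = counts[RDF_TYPE] / total
print(f"rdf:type share of the counted triples: {type_share:.1%}")
```

In this data rdf:type is by far the most frequent predicate next to foaf:primaryTopic, which is the kind of distribution the rdf:type entries in stats.opt are meant to describe to the optimizer.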



Re: Slow query when getting rdf:type

Posted by Andy Seaborne <an...@apache.org>.
Mikael,

I can't find anything that makes rdf:type special.  Maybe some 
distribution of data is the cause but I'm not seeing it.

Did you get a chance to get some stats?

     Andy



Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I also tried this with other properties such as dcterms:created, and the 
query didn't slow down with them.

-Mikael
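
One way to check whether pattern reordering is behind the rdf:type slowdown is to switch the TDB optimizer off for a test run. The sketch below relies on the behaviour described in the TDB optimizer documentation: an empty none.opt file in the database directory makes TDB execute patterns in the order they are written in the query, and stats.opt or fixed.opt take precedence if they are present. The helper name use_query_order and the .bak suffix are made up for illustration, and Fuseki is assumed to be stopped while the files are changed.

```shell
#!/bin/sh
# Hypothetical helper: make TDB run patterns in the order written in the query.
# Run it only while Fuseki is stopped, then restart Fuseki and re-test.
use_query_order() {
    db_dir="$1"
    # stats.opt and fixed.opt win over none.opt, so move them aside if present.
    for f in stats.opt fixed.opt; do
        if [ -f "$db_dir/$f" ]; then
            mv "$db_dir/$f" "$db_dir/$f.bak"
        fi
    done
    # The content of none.opt is not read; an empty file is enough.
    : > "$db_dir/none.opt"
}

# Usage: use_query_order /path/to/DB
```

With reordering off, the first query starts from its first pattern, the one containing <some resource>. If it then runs as fast as the second query, the reordering of the rdf:type pattern is the likely culprit.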


On 27.10.2017 13:02, Andy Seaborne wrote:
> In this case, stats won't help.  The <some resource> should be the 
> starting point.
>
> (quadpattern
>   (quad ?g ?s ?p <some:resource>)
>   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
> )
>
> (quadpattern
>   (quad ?g ?s ?p <some:resource>)
>   (quad ?g ?s ?p2 ?o2)
> )))
>
> Are you using inference as well?
>
> Is it the same <some resource>?
>
> Is the timing for the rdf:type variant on a cold system?
>
>     Andy
>
>
>
> On 27/10/17 10:22, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> thanks! I'll try that when get chance to stop jena. Yes we are using 
>> TDB.
>>
>>
>>
>> On 26.10.2017 16:15, Rob Vesse wrote:
>>> Is TDB the underlying database?
>>>
>>> If so is there a stats.opt  file in your database directory?
>>>
>>> I remember there being issues in the past with the statistics for 
>>> rdf:type triples being wrongly prioritised. You might want to look 
>>> at that file, assuming that it exists, and you try adjusting values 
>>> associated with rdf:type based upon the guidance in the documentation:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file 
>>>
>>>
>>> Also if this is a database which is being updated then the 
>>> statistics can get out of date relative to the database. You can use 
>>> the commandline tdbstats tool to try regenerating this:
>>>
>>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>>
>>>
>>> Note that you will need to stop Fuseki in order to run this as only 
>>> a single process is permitted to access a TDB database at a time
>>>
>>> Rob
>>>
>>> On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> 
>>> wrote:
>>>
>>>      Hi, I have trouble understanding why the first query is slow 
>>> and second
>>>      one is fast. Using Jena Fuseki 3.4.0.
>>>      So I want to get all resources that reference <some resource>, 
>>> and their
>>>      types:
>>>      SELECT * WHERE
>>>      {
>>>          GRAPH ?g
>>>          {
>>>              ?s ?p <some resource> .
>>>                   ?s a ?type
>>>          }
>>>      }
>>>      SELECT * WHERE
>>>      {
>>>          GRAPH ?g
>>>          {
>>>              ?s ?p <some resource> .
>>>                   ?s ?p2 ?o2
>>>          }
>>>      }
>>>      First one takes 5 seconds which is too slow for our 
>>> application. Can it
>>>      be rearranged somehow to make fast? Sorry if this is not a 
>>> correct forum
>>>      for this.
>>>      Thanks!
>>>      --
>>>
>>>
>>>
>>>
>>

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Slow query when getting rdf:type

Posted by Andy Seaborne <an...@apache.org>.
In this case, stats won't help.  The <some resource> should be the 
starting point.

(quadpattern
   (quad ?g ?s ?p <some:resource>)
   (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)

(quadpattern
   (quad ?g ?s ?p <some:resource>)
   (quad ?g ?s ?p2 ?o2)
)

Are you using inference as well?

Is it the same <some resource>?

Is the timing for the rdf:type variant on a cold system?

     Andy
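
To answer the cold/warm question with numbers, each query can be timed several times in a row against the running server. This is only a sketch: time_query is a hypothetical helper, and the endpoint URL in the usage line (http://localhost:3030/ds/query) is an assumption about the local Fuseki setup, as is the example IRI. It uses curl's --data-urlencode to POST the query and -w '%{time_total}' to print the elapsed seconds.

```shell
#!/bin/sh
# Hypothetical helper: print the total request time (seconds) for one SPARQL query.
time_query() {
    endpoint="$1"
    query="$2"
    # Discard the result body; only the timing written by -w is printed.
    curl -s -o /dev/null -w '%{time_total}\n' \
        --data-urlencode "query=$query" "$endpoint"
}

# Usage (endpoint and IRI are placeholders):
#   q='SELECT * WHERE { GRAPH ?g { ?s ?p <http://example.org/res> . ?s a ?type } }'
#   time_query http://localhost:3030/ds/query "$q"
#   time_query http://localhost:3030/ds/query "$q"   # repeat to compare warm timing
```

If the first run is slow and the repeats are fast, the cost is mostly cache warm-up. If every run is equally slow, the execution plan is the more likely cause.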



On 27/10/17 10:22, Mikael Pesonen wrote:
> 
> Hi,
> 
> Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.
> 
> 
> 
> On 26.10.2017 16:15, Rob Vesse wrote:
>> Is TDB the underlying database?
>>
>> If so is there a stats.opt  file in your database directory?
>>
>> I remember there being issues in the past with the statistics for 
>> rdf:type triples being wrongly prioritised. You might want to look at 
>> that file, assuming that it exists, and you try adjusting values 
>> associated with rdf:type based upon the guidance in the documentation:
>>
>> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file 
>>
>>
>> Also if this is a database which is being updated then the statistics 
>> can get out of date relative to the database. You can use the 
>> commandline tdbstats tool to try regenerating this:
>>
>> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 
>>
>>
>> Note that you will need to stop Fuseki in order to run this as only a 
>> single process is permitted to access a TDB database at a time
>>
>> Rob
>>
>> On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
>>
>>      Hi, I have trouble understanding why the first query is slow and 
>> second
>>      one is fast. Using Jena Fuseki 3.4.0.
>>      So I want to get all resources that reference <some resource>, 
>> and their
>>      types:
>>      SELECT * WHERE
>>      {
>>          GRAPH ?g
>>          {
>>              ?s ?p <some resource> .
>>                   ?s a ?type
>>          }
>>      }
>>      SELECT * WHERE
>>      {
>>          GRAPH ?g
>>          {
>>              ?s ?p <some resource> .
>>                   ?s ?p2 ?o2
>>          }
>>      }
>>      First one takes 5 seconds which is too slow for our application. 
>> Can it
>>      be rearranged somehow to make fast? Sorry if this is not a 
>> correct forum
>>      for this.
>>      Thanks!
>>      --
>>
>>
>>
>>
> 

Re: Slow query when getting rdf:type

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.



On 26.10.2017 16:15, Rob Vesse wrote:
> Is TDB the underlying database?
>
> If so is there a stats.opt  file in your database directory?
>
> I remember there being issues in the past with the statistics for rdf:type triples being wrongly prioritised. You might want to look at that file, assuming that it exists, and you try adjusting values associated with rdf:type based upon the guidance in the documentation:
>
> http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
>
> Also if this is a database which is being updated then the statistics can get out of date relative to the database. You can use the commandline tdbstats tool to try regenerating this:
>
> http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
>
> Note that you will need to stop Fuseki in order to run this as only a single process is permitted to access a TDB database at a time
>
> Rob
>
> On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
>
>      
>      Hi, I have trouble understanding why the first query is slow and second
>      one is fast. Using Jena Fuseki 3.4.0.
>      
>      So I want to get all resources that reference <some resource>, and their
>      types:
>      
>      SELECT * WHERE
>      {
>      	GRAPH ?g
>      	{
>      		?s ?p <some resource> .
>           		?s a ?type
>      	}
>      }
>      
>      SELECT * WHERE
>      {
>      	GRAPH ?g
>      	{
>      		?s ?p <some resource> .
>           		?s ?p2 ?o2
>      	}
>      }
>      
>      
>      First one takes 5 seconds which is too slow for our application. Can it
>      be rearranged somehow to make fast? Sorry if this is not a correct forum
>      for this.
>      
>      Thanks!
>      
>      --
>      
>      
>
>
>
>



Re: Slow query when getting rdf:type

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Is TDB the underlying database?

If so, is there a stats.opt file in your database directory?

I remember there being issues in the past with the statistics for rdf:type triples being wrongly prioritised. You might want to look at that file, assuming it exists, and try adjusting the values associated with rdf:type based on the guidance in the documentation:

http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file

Also, if this database is being updated, the statistics can get out of date relative to the data. You can use the command-line tdbstats tool to regenerate them:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

Note that you will need to stop Fuseki in order to run this, as only a single process is permitted to access a TDB database at a time.

Rob
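
A minimal sketch of the regeneration step, assuming the Jena command-line tools are on the PATH and Fuseki has been stopped. The helper name regen_stats and the temporary file name are made up for illustration. The --loc option and the advice to write to a temporary file first and then move it into the database directory come from the TDB optimizer documentation linked above.

```shell
#!/bin/sh
# Hypothetical helper: rebuild stats.opt for the TDB database in directory $1.
# Run it only while Fuseki is stopped.
regen_stats() {
    db_dir="$1"
    tmp_file="${TMPDIR:-/tmp}/stats.opt.$$"
    # Write the statistics outside the database directory first ...
    tdbstats --loc="$db_dir" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # ... then move the finished file into place as stats.opt.
    mv "$tmp_file" "$db_dir/stats.opt"
}

# Usage: regen_stats /path/to/DB
```

Writing to a temporary file first avoids leaving a half-written stats.opt in the database directory if tdbstats fails partway through.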

On 26/10/2017 13:47, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:

    
    Hi, I have trouble understanding why the first query is slow and second 
    one is fast. Using Jena Fuseki 3.4.0.
    
    So I want to get all resources that reference <some resource>, and their 
    types:
    
    SELECT * WHERE
    {
        GRAPH ?g
        {
            ?s ?p <some resource> .
            ?s a ?type
        }
    }
    
    SELECT * WHERE
    {
        GRAPH ?g
        {
            ?s ?p <some resource> .
            ?s ?p2 ?o2
        }
    }
    
    
    First one takes 5 seconds which is too slow for our application. Can it 
    be rearranged somehow to make fast? Sorry if this is not a correct forum 
    for this.
    
    Thanks!
    
    --