Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2012/08/04 18:49:41 UTC

Re: Fuseki 0.2.3 performance / StatsMatcher warnings

Hi Michael.

(Michael has sent me a copy of the database he's using - a bit big to
email even though it's only 157,809,969 triples and 44 Gbytes.)

I can recreate this - I'm getting a total of 5006ms for the query set of 
17 queries you're using.

With a fix, it's 150ms for all 17 queries, made up of approx 50ms of
server-side execution and 100ms of HTTP networking and results
transmission.

With the workaround, it's about 170ms.

Sorry - the fix will not make the next release, which is already built.
Also, my current fix does the right thing for your case but I want to
make sure it hasn't got any concurrency problems.


Workaround:

Remove the "stats.opt" from the database directory and create a file 
"fixed.opt" in that directory. An empty "fixed.opt" is fine - it's not 
actually read; it's the presence that matters.   Caution - you need to 
get rid of the stats.opt file as it's used in preference.
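
If you'd rather script it, here is a minimal sketch of the workaround in
Python. It is only an illustration: the function name and the ".old"
suffix are my own choices for the example, nothing the code base
provides. It renames stats.opt instead of deleting it (the file simply
must no longer be called "stats.opt") and then creates an empty
fixed.opt.

```python
from pathlib import Path


def apply_fixed_opt_workaround(db_dir: str) -> None:
    """Switch a database directory from stats.opt to fixed.opt.

    Hypothetical helper for illustration; not part of Jena/TDB.
    """
    db = Path(db_dir)
    if not db.is_dir():
        raise NotADirectoryError(db_dir)

    stats = db / "stats.opt"
    if stats.exists():
        # Rename rather than delete so the statistics can be restored.
        # The ".old" suffix is an arbitrary choice for this sketch.
        stats.replace(db / "stats.opt.old")

    # The contents are never read; only the file's presence matters.
    (db / "fixed.opt").touch()
```

Call it with the path of the database directory, e.g.
apply_fixed_opt_workaround("/path/to/DB") (the path is a placeholder).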

You'll need to see if the change affects other, more complex queries.
dbpedia is a very unusual dataset at the best of times (42K distinct
properties). Depressingly, fixed.opt does a reasonable job of
optimizing.  It simply looks for more tightly constrained triple
patterns and mildly avoids rdf:type.

(The other optimizer option is "none.opt", where BGPs in queries are
executed in the order written.  Good for control and experimentation.)
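
To make the file-based selection concrete, here is a rough sketch in
Python of the lookup as described above. This is not the TDB code: the
function name and the returned strings are invented for the example, and
both the relative order of the fixed.opt and none.opt checks and the
"default" fallback are assumptions. The only ordering stated here is
that stats.opt wins over fixed.opt.

```python
from pathlib import Path


def choose_optimizer(db_dir: str) -> str:
    """Report which marker file would select the optimizer (illustrative)."""
    db = Path(db_dir)
    if (db / "stats.opt").exists():
        return "stats"    # used in preference to fixed.opt
    if (db / "fixed.opt").exists():
        return "fixed"    # presence only; the contents are never read
    if (db / "none.opt").exists():
        return "none"     # patterns run in the order written
    # Assumption: what happens with no marker file is not covered here.
    return "default"
```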


Explanation:

I said:
[[
But the performance is not to do with stats - this is all single quad 
lookup.
]]

Not quite true :-)  While the statistics themselves don't matter, the 
system is reading stats.opt too often.  Normally, this isn't too 
important because the file is small, heavily cached and fast to parse 
(it still shouldn't do it).  But the dbpedia stats.opt is big at 2.1 
Mbytes and 42K entries.
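
The general shape of avoiding the repeated reads is "parse once, reuse
the result". The sketch below shows that idea in Python; it is not the
actual fix, and load_once and its cache are hypothetical names for the
example. It re-parses only when the file's modification time changes.
As written it has no locking, so it is not safe for concurrent use,
which is exactly the kind of thing that needs checking in a server.

```python
import os
from typing import Callable, Dict, Tuple

# path -> (modification time, parsed result)
_cache: Dict[str, Tuple[float, object]] = {}


def load_once(path: str, parse: Callable[[str], object]) -> object:
    """Parse `path` only if it is new or its modification time changed."""
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # reuse the already-parsed result
    with open(path, encoding="utf-8") as f:
        parsed = parse(f.read())
    _cache[path] = (mtime, parsed)
    return parsed
```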

	Andy


Re: Fuseki 0.2.3 performance / StatsMatcher warnings

Posted by Andy Seaborne <an...@apache.org>.
On 05/08/12 12:03, Michael Brunnbauer wrote:
>
> Hello Andy,
>
> much better now. Thank you very much :-)

Good!

>
> I guess it's OK to rename stats.opt to stats.opt.old instead of moving it
> out of the directory ?

Yes - it's the specific name "stats.opt" that the code looks for.

	Andy

>
> Regards,
>
> Michael Brunnbauer
>


Re: Fuseki 0.2.3 performance / StatsMatcher warnings

Posted by Michael Brunnbauer <br...@netestate.de>.
Hello Andy,

much better now. Thank you very much :-)

I guess it's OK to rename stats.opt to stats.opt.old instead of moving it
out of the directory ?

Regards,

Michael Brunnbauer


-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel