Posted to users@jena.apache.org by Sarven Capadisli <in...@csarven.ca> on 2012/05/12 19:41:35 UTC

TDB Optimizer

Hi TDBsters,

I was just getting back to playing around with tdbstats to optimize some 
query responses [1]. At first I couldn't get any counts, but then 
discovered that "tdbstats only looks in the default graph (currently)" 
[2]. So I passed one of the graph IRIs and, lo and behold, I got
some stats for that graph.

What I'm wondering is: how can I create stats for all of the graphs in
the store and have them represented in stats.opt? Creating stats for
each graph and then merging them (with proper counts) into a single
stats.opt seems cumbersome and possibly incorrect. After all, I want
the generated statistics to reflect the store as accurately as
possible.

Also, where is the rule file in which we write the Statistics Rule
Language? I can't see how that is supposed to go into stats.opt,
because then how would we obtain the counts for the triple patterns?

Thanks!

[1] http://jena.apache.org/documentation/tdb/optimizer.html
[2] http://tech.groups.yahoo.com/group/jena-dev/message/44946

-Sarven

Re: TDB Optimizer

Posted by Sarven Capadisli <in...@csarven.ca>.

On 12-05-15 10:54 AM, Svatopluk Šperka wrote:
> Hi,
>
> What about using something like "tdbstats --loc=a/b/c --graph=urn:x-arq:UnionGraph"? tdbstats should accept urn:x-arq:UnionGraph according to http://jena.apache.org/documentation/tdb/datasets.html .
>
> 	Svatopluk

Thanks Svatopluk. It looks like that did the trick as far as getting the 
stats for the union of all graphs. However, at the moment, I'm uncertain 
what that entails.

It makes me wonder whether the application of stats can be improved.
That is, can the stats be more granular, on a per-graph basis? If GRAPH
is used in a SPARQL query, can the stats be put to better use? Would
that mean extending the Statistics Rule Language to use a graph
variable as well?

Otherwise, I'm unsure how using anything but
--graph=urn:x-arq:UnionGraph or urn:x-arq:DefaultGraph is actually
helpful for obtaining a stats file: if a single stats file is used and
it comes from a single named graph, it excludes statistics about the
other graphs.

Do I understand this problem correctly?

Thanks,

-Sarven

Re: TDB Optimizer

Posted by Svatopluk Šperka <sp...@gmail.com>.

Hi,

What about using something like "tdbstats --loc=a/b/c --graph=urn:x-arq:UnionGraph"? tdbstats should accept urn:x-arq:UnionGraph according to http://jena.apache.org/documentation/tdb/datasets.html .

	Svatopluk
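
For anyone scripting this, here is a minimal Python sketch of the same invocation. It assumes tdbstats is on the PATH and that it prints the statistics to standard output. The helper names (tdbstats_command, collect_stats) are made up for illustration; only --loc and --graph come from the command above.

```python
import subprocess

# Special graph name for the union of all named graphs (see the TDB datasets page).
UNION_GRAPH = "urn:x-arq:UnionGraph"


def tdbstats_command(loc, graph=None, executable="tdbstats"):
    """Build the tdbstats argument list (hypothetical helper).

    With graph=None no --graph argument is passed, which leaves tdbstats
    looking at the default graph.
    """
    cmd = [executable, f"--loc={loc}"]
    if graph is not None:
        cmd.append(f"--graph={graph}")
    return cmd


def collect_stats(loc, graph=None):
    """Run tdbstats and return whatever it writes to standard output.

    Assumption: the statistics are printed to stdout, so the caller decides
    where (and whether) to save them.
    """
    result = subprocess.run(
        tdbstats_command(loc, graph),
        check=True,           # raise if tdbstats exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling collect_stats("a/b/c", UNION_GRAPH) would then correspond to the command line quoted above.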

Re: TDB Optimizer

Posted by Andy Seaborne <an...@apache.org>.

Sarven,

There isn't a way to capture proper stats across named graphs currently.
The stats are applied to every graph, so it's not sensitive to which
graph (named or default).

The stats.opt file is the rules file.  Be careful not to overwrite it :-(

	Andy
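
One way to guard against that kind of accidental overwrite is to keep a copy of the existing file before installing a new one. The following is only a sketch in Python: the function name install_stats and the ".bak" and ".tmp" suffixes are made up for illustration, and it assumes stats.opt sits directly in the database directory.

```python
import shutil
from pathlib import Path


def install_stats(db_dir, new_stats_text, backup_suffix=".bak"):
    """Write new_stats_text to <db_dir>/stats.opt (hypothetical helper).

    If a stats.opt already exists (for example one holding hand-written
    rules), it is first copied to stats.opt<backup_suffix>. Returns the
    backup path, or None when there was nothing to back up.
    """
    target = Path(db_dir) / "stats.opt"
    backup = None
    if target.exists():
        backup = target.with_name(target.name + backup_suffix)
        shutil.copy2(target, backup)  # keep the previous rules file

    # Write to a temporary file first, then move it into place, so a
    # failed write never leaves a half-written stats.opt behind.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(new_stats_text, encoding="utf-8")
    tmp.replace(target)
    return backup
```

Note that a second run replaces the earlier backup, so anything hand-edited is best kept under version control as well.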