Posted to dev@jena.apache.org by ch...@info-cast.com on 2013/09/10 00:25:27 UTC

huge number of rules for a few RDF statements?

Hi,

I'm considering Jena Rules as a rule-based programming model in
which rules are discovered and accumulated until there are tens of
thousands of them, while the facts used to infer new information
are only a few RDF statements. In this case, the rule engine may
have to check each and every rule against the facts to find the
ones that match - which may imply a scaling issue.

Or should the rules be organized into a set of categories, with
each statement classified first to select the matching rule set
and so reduce the rule processing time?

Will appreciate your insights,

Chan

Re: huge number of rules for a few RDF statements?

Posted by ch...@info-cast.com.

Hi Joshua,

Thanks for the valuable suggestion - I'll certainly try categorizing
the rule set by improving the rule generator. It'd be interesting
to compare the rule engine performance of categorized rules against
the union of all rules.

Best

Chan

Re: huge number of rules for a few RDF statements ?

Posted by Joshua TAYLOR <jo...@gmail.com>.

On Mon, Sep 9, 2013 at 6:25 PM,  <ch...@info-cast.com> wrote:
> I'm considering the Jena Rules as a rule-based programming model
> where rules are being discovered and accumulated to grow tens of
> thousand, while the fact for inferring new info is only a few
> RDF statements. In this case, the rule engine may have to check
> each and every rule for the fact to find out the one matching
> the statements - which may imply a scaling issue.
>
> Or, should the rules be organized into a set of category, and
> the statement is classified first to select the matching rule
> set to reduce the rule processing time ?

I'm not sure whether this is appropriate on the dev@jena.apache.org
list, so I'm only replying on the users list.  The forward-chaining
RETE engine, as I understand it, does some optimization in determining
what triples will match to what rules, so the scaling issue might not
be as much of an issue as you suspect.  However, it doesn't look like it's
too difficult to try out and compare the different approaches (split
up your rules into different categories, then apply reasoning with
just individual categories, and then again with the union of all the
rulesets).  Have you run into scaling issues yet?
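
The comparison suggested above can be sketched without Jena at all. The
following is a toy model, not the Jena rule engine: each hypothetical
ToyRule is reduced to the single predicate it tests, "categorizing" means
grouping rules by that predicate, and the cost measure is simply the
number of rule checks performed for one incoming triple. All class and
method names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (assumption): a rule fires on any triple with a given predicate.
final class RuleCategorySketch {

    /** Hypothetical single-pattern rule. */
    static final class ToyRule {
        final String name;
        final String predicate;
        ToyRule(String name, String predicate) {
            this.name = name;
            this.predicate = predicate;
        }
    }

    /** Names of the rules that fired, plus how many rules were checked. */
    static final class Result {
        final List<String> fired = new ArrayList<>();
        int checks = 0;
    }

    /** Union approach: every rule in the list is checked against the triple. */
    static Result matchUnion(List<ToyRule> rules, String triplePredicate) {
        Result res = new Result();
        for (ToyRule r : rules) {
            res.checks++; // one check per rule, whether it matches or not
            if (r.predicate.equals(triplePredicate)) {
                res.fired.add(r.name);
            }
        }
        return res;
    }

    /** Groups rules into categories keyed by the predicate they test. */
    static Map<String, List<ToyRule>> categorize(List<ToyRule> rules) {
        Map<String, List<ToyRule>> byPredicate = new HashMap<>();
        for (ToyRule r : rules) {
            byPredicate.computeIfAbsent(r.predicate, k -> new ArrayList<>()).add(r);
        }
        return byPredicate;
    }

    /** Categorized approach: classify the triple first, then check only that category. */
    static Result matchCategorized(Map<String, List<ToyRule>> categories, String triplePredicate) {
        List<ToyRule> candidates = categories.getOrDefault(triplePredicate, new ArrayList<>());
        return matchUnion(candidates, triplePredicate);
    }

    /** Builds n rules spread evenly over the given number of predicates. */
    static List<ToyRule> sampleRules(int n, int predicates) {
        List<ToyRule> rules = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rules.add(new ToyRule("rule" + i, "p" + (i % predicates)));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<ToyRule> rules = sampleRules(10_000, 100);
        Result union = matchUnion(rules, "p7");
        Result categorized = matchCategorized(categorize(rules), "p7");
        System.out.println("union:       checks=" + union.checks + " fired=" + union.fired.size());
        System.out.println("categorized: checks=" + categorized.checks + " fired=" + categorized.fired.size());
    }
}
```

Both approaches fire the same rules; only the number of checks differs.
Whether a real engine shows a similar gap depends on how much of this
selection its matching network already does, so this says nothing about
Jena's actual timings.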

//JT
-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/

Re: huge number of rules for a few RDF statements?

Posted by ch...@info-cast.com.

Hi Dave,

Thanks for your valuable insights on how to structure the rule
internals to improve scalability. I'll do some research on my
rule generator to implement that. The scaling issue is running
the rule engine in real time to keep up with real-world message
streams such as tweets.

Best

Chan

Re: huge number of rules for a few RDF statements ?

Posted by Dave Reynolds <da...@gmail.com>.

Hi,

On 09/09/13 23:25, chan@info-cast.com wrote:
> Hi,
>
> I'm considering the Jena Rules as a rule-based programming model
> where rules are being discovered and accumulated to grow tens of
> thousand, while the fact for inferring new info is only a few
> RDF statements. In this case, the rule engine may have to check
> each and every rule for the fact to find out the one matching
> the statements - which may imply a scaling issue.
>
> Or, should the rules be organized into a set of category, and
> the statement is classified first to select the matching rule
> set to reduce the rule processing time ?
>
> Will appreciate your insights,

In theory the primary scaling issue in this case should be the number of 
distinct patterns in the rules rather than the number of rules. In RETE 
the rules are implemented as a pattern matching network and facts are 
dropped in.
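
As a rough illustration of why the number of distinct patterns matters
more than the number of rules, here is a minimal sketch of pattern
sharing. It is not Jena's RETE implementation and it ignores joins
between patterns entirely: rules that use the same triple pattern share
one entry, so a fact dropped in is tested once per distinct pattern. The
class SharedPatternSketch and the "?" wildcard notation are assumptions
made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (assumption): each rule has one triple pattern; "?" is a wildcard.
final class SharedPatternSketch {

    // distinct pattern (subject, predicate, object) -> names of rules using it
    private final Map<List<String>, List<String>> rulesByPattern = new LinkedHashMap<>();
    private int ruleCount = 0;
    private int patternTests = 0;

    /** Registers a rule; rules with an identical pattern share one map entry. */
    void addRule(String name, String subject, String predicate, String object) {
        List<String> pattern = Arrays.asList(subject, predicate, object);
        rulesByPattern.computeIfAbsent(pattern, k -> new ArrayList<>()).add(name);
        ruleCount++;
    }

    int ruleCount() { return ruleCount; }

    int distinctPatterns() { return rulesByPattern.size(); }

    /** Number of pattern tests performed so far across all dropped facts. */
    int patternTests() { return patternTests; }

    /** Drops one fact in: tests it once per distinct pattern, returns activated rules. */
    List<String> drop(String s, String p, String o) {
        List<String> activated = new ArrayList<>();
        for (Map.Entry<List<String>, List<String>> e : rulesByPattern.entrySet()) {
            patternTests++;
            List<String> pat = e.getKey();
            if (matches(pat.get(0), s) && matches(pat.get(1), p) && matches(pat.get(2), o)) {
                activated.addAll(e.getValue());
            }
        }
        return activated;
    }

    private static boolean matches(String patternTerm, String factTerm) {
        return patternTerm.equals("?") || patternTerm.equals(factTerm);
    }

    public static void main(String[] args) {
        SharedPatternSketch net = new SharedPatternSketch();
        for (int i = 0; i < 9000; i++) {
            net.addRule("rule" + i, "?", "ex:p" + (i % 3), "?");
        }
        List<String> activated = net.drop("ex:a", "ex:p1", "ex:b");
        System.out.println("rules=" + net.ruleCount()
                + " distinctPatterns=" + net.distinctPatterns()
                + " patternTests=" + net.patternTests()
                + " activated=" + activated.size());
    }
}
```

In this sketch the per-fact matching cost follows distinctPatterns(), not
ruleCount(). The loop over all distinct patterns is still linear, which
is the same kind of unindexed fan-out that becomes a concern when the
number of distinct patterns is itself very large.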

However, in practice the Jena rules implementation is crude and hasn't 
been designed or tested on huge numbers of rules. So the network it 
produces may be suboptimal (especially if grown incrementally) and there 
is no indexing in the cases where one node fans out to a very large 
number of child nodes. Given the simplicity of the Jena implementation,
at least putting the more discriminating patterns at the start of
the rules is likely to help.
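
The effect of pattern order can be sketched with a naive left-to-right
evaluator. This is an assumption-laden toy, not how Jena evaluates rule
bodies: a two-pattern body sharing the variable ?x is evaluated with a
nested loop, and the count of partial matches produced by the first
pattern stands in for intermediate work. The names PatternOrderSketch,
Pattern, and the ex: terms are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (assumption): body is (?x p1 o1), (?x p2 o2), evaluated left to right.
final class PatternOrderSketch {

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** A body pattern with fixed predicate and object; the subject is the variable ?x. */
    static final class Pattern {
        final String p, o;
        Pattern(String p, String o) { this.p = p; this.o = o; }
        boolean matches(Triple t) { return p.equals(t.p) && o.equals(t.o); }
    }

    static final class Outcome {
        int partialMatches = 0;                            // bindings produced by the first pattern
        final List<String> bindings = new ArrayList<>();   // values of ?x satisfying both patterns
    }

    /** Nested-loop evaluation: bind ?x with the first pattern, then confirm with the second. */
    static Outcome evaluate(List<Triple> facts, Pattern first, Pattern second) {
        Outcome out = new Outcome();
        for (Triple t : facts) {
            if (!first.matches(t)) continue;
            out.partialMatches++;
            for (Triple u : facts) {
                if (u.s.equals(t.s) && second.matches(u)) {
                    out.bindings.add(t.s);
                }
            }
        }
        return out;
    }

    /** Many typed tweets, only one of which carries the rarer "mentions" triple. */
    static List<Triple> sampleFacts(int tweets) {
        List<Triple> facts = new ArrayList<>();
        for (int i = 0; i < tweets; i++) {
            facts.add(new Triple("ex:tweet" + i, "rdf:type", "ex:Tweet"));
        }
        facts.add(new Triple("ex:tweet5", "ex:mentions", "ex:jena"));
        return facts;
    }

    public static void main(String[] args) {
        List<Triple> facts = sampleFacts(1000);
        Pattern broad = new Pattern("rdf:type", "ex:Tweet");
        Pattern selective = new Pattern("ex:mentions", "ex:jena");
        Outcome broadFirst = evaluate(facts, broad, selective);
        Outcome selectiveFirst = evaluate(facts, selective, broad);
        System.out.println("broad first:     partial=" + broadFirst.partialMatches
                + " results=" + broadFirst.bindings);
        System.out.println("selective first: partial=" + selectiveFirst.partialMatches
                + " results=" + selectiveFirst.bindings);
    }
}
```

Both orderings produce the same results; only the number of partial
matches carried past the first pattern changes. How much this matters in
Jena itself would still need to be measured on representative rules.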

The only way to check if Jena could cope with this would be to run some 
representative tests.

Dave