Posted to user@hadoop.apache.org by Luangsay Sourygna <lu...@gmail.com> on 2012/10/19 21:25:10 UTC

rules engine with Hadoop

Hi,

Does anyone know of any (open source) project that builds a rules engine
(based on RETE) on top of Hadoop?
Searching the net, I have only seen a brief reference to Concord/IBM,
but there is barely any information available (and it is surely not
open source).

The alpha and beta memories would be stored in HBase. That should be
possible, no?

Regards,

Sourygna

Re: rules engine with Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
That probably means that your problem is pretty easy.

Just code up a standard rules engine into a mapper.  You can also build a
user-defined function (UDF) in Pig or Hive and Hadoop will handle the
parallelism for you.

On Sat, Oct 20, 2012 at 6:48 AM, Luangsay Sourygna <lu...@gmail.com> wrote:

> My problem would be similar to the first option you write:
> I have a few number of rules (let's say, < 1000) and a huge number of
> inputs (= big data part).
>
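
A minimal sketch of that suggestion: the Hadoop Mapper API below is real,
but the RuleEngine interface is a hypothetical stand-in for whatever engine
gets embedded (JESS, Drools, or something hand-rolled).

    import java.io.IOException;
    import java.util.Collections;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RuleEngineMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        /** Hypothetical stand-in for an embedded engine such as JESS or Drools. */
        public interface RuleEngine {
            Iterable<String> evaluate(String record);
        }

        private RuleEngine engine;

        @Override
        protected void setup(Context context) {
            // In a real job you would initialize JESS/Drools here, e.g. from rule
            // files shipped via the distributed cache. Trivial placeholder rule:
            engine = record -> record.contains("ERROR")
                    ? Collections.singletonList("alert:" + record)
                    : Collections.<String>emptyList();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Every fired rule action becomes one map output record.
            for (String action : engine.evaluate(value.toString())) {
                context.write(new Text(action), NullWritable.get());
            }
        }
    }

The Pig or Hive UDF route is the same idea: wrap the engine.evaluate() call
in a class extending org.apache.pig.EvalFunc or
org.apache.hadoop.hive.ql.exec.UDF and let the query planner handle the
parallelism.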

Re: rules engine with Hadoop

Posted by Peter Lin <wo...@gmail.com>.
The number of rules isn't as important as how the rules are written.

Generally speaking, if you're using a RETE rule engine, the key is
making sure you use rule chaining properly.

I've seen people write really huge rules as if they were writing Java,
which ends up being a horrible mess. As long as the rules make use of
proper rule chaining, the actions of the rules become the output of the
map phase.

In practice though, it might not always be possible to do this, so
your mileage will vary.



On Sat, Oct 20, 2012 at 9:48 AM, Luangsay Sourygna <lu...@gmail.com> wrote:
> My problem would be similar to the first option you write:
> I have a few number of rules (let's say, < 1000) and a huge number of
> inputs (= big data part).
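
A toy sketch of that chaining point, using a hand-rolled forward-chaining
loop rather than any real rule engine API: rule 1's action asserts a
derived fact, rule 2 matches only that derived fact, and only rule 2's
action produces output (what a mapper would write to its Context).

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class ChainingSketch {
        static class Fact {
            final String type;
            final String account;
            final int amount;
            Fact(String type, String account, int amount) {
                this.type = type; this.account = account; this.amount = amount;
            }
        }

        public static void main(String[] args) {
            Deque<Fact> agenda = new ArrayDeque<Fact>();
            List<String> mapOutput = new ArrayList<String>();

            agenda.add(new Fact("Transaction", "A-1", 25000));

            while (!agenda.isEmpty()) {
                Fact f = agenda.poll();
                // Rule 1: a transaction over 10,000 asserts a derived fact (chaining).
                if (f.type.equals("Transaction") && f.amount > 10000) {
                    agenda.add(new Fact("LargeTransaction", f.account, f.amount));
                // Rule 2: fires only on the derived fact; its action is the output.
                } else if (f.type.equals("LargeTransaction")) {
                    mapOutput.add("ALERT account=" + f.account + " amount=" + f.amount);
                }
            }
            mapOutput.forEach(System.out::println);
        }
    }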

Re: rules engine with Hadoop

Posted by Luangsay Sourygna <lu...@gmail.com>.
My problem would be similar to the first option you describe:
I have a small number of rules (let's say, < 1000) and a huge number of
inputs (that is the big data part).

Re: rules engine with Hadoop

Posted by Peter Lin <wo...@gmail.com>.
Embedding a rule engine in map/reduce makes much more sense, but as
Ted points out, scaling it isn't easy.

As long as you break the reasoning into map/reduce stages, it should
work. The devil is in the details and you have to write the rules
efficiently to achieve the goal.


On Fri, Oct 19, 2012 at 3:45 PM, Ted Dunning <td...@maprtech.com> wrote:
> Unification in a parallel cluster is a difficult problem.  Writing very
> large scale unification programs is an even harder problem.
>
> What problem are you trying to solve?
>
> One option would be that you need to evaluate a conventionally-sized
> rulebase against many inputs.  Map-reduce should be trivially capable of
> this.
>
> Another option would be that you want to evaluate a huge rulebase against a
> few inputs.  It isn't clear that this would be useful given the problems of
> huge rulebases and the typically super-linear cost of resolution algorithms.
>
> Another option is that you want to evaluate many conventionally-sized
> rulebases against one or many inputs in order to implement a boosted rule
> engine.  Map-reduce should be relatively trivial for this as well.
>
> What is it that you are trying to do?
>
>
> On Fri, Oct 19, 2012 at 12:25 PM, Luangsay Sourygna <lu...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> Does anyone know any (opensource) project that builds a rules engine
>> (based on RETE) on top Hadoop?
>> Searching a bit on the net, I have only seen a small reference to
>> Concord/IBM but there is barely any information available (and surely
>> it is not open source).
>>
>> Alpha and beta memories would be stored on HBase. Should be possible, no?
>>
>> Regards,
>>
>> Sourygna
>
>
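
One way to picture breaking the reasoning into map/reduce stages is a
driver that chains two jobs: stage 1 applies the first group of rules and
writes the derived facts, stage 2 reads those derived facts and applies the
rules that chain off them. This is only a sketch; it assumes the
hypothetical RuleEngineMapper from the earlier sketch is reused, with each
stage configured for its own group of rules.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TwoStageRulesDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path input = new Path(args[0]);
            Path derived = new Path(args[1]);   // intermediate derived facts
            Path output = new Path(args[2]);

            // Stage 1: raw facts in, derived facts out.
            Job stage1 = Job.getInstance(conf, "rules-stage-1");
            stage1.setJarByClass(TwoStageRulesDriver.class);
            stage1.setMapperClass(RuleEngineMapper.class);
            stage1.setNumReduceTasks(0);
            stage1.setOutputKeyClass(Text.class);
            stage1.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(stage1, input);
            FileOutputFormat.setOutputPath(stage1, derived);
            if (!stage1.waitForCompletion(true)) System.exit(1);

            // Stage 2: derived facts in, final conclusions out.
            Job stage2 = Job.getInstance(conf, "rules-stage-2");
            stage2.setJarByClass(TwoStageRulesDriver.class);
            stage2.setMapperClass(RuleEngineMapper.class);
            stage2.setNumReduceTasks(0);
            stage2.setOutputKeyClass(Text.class);
            stage2.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(stage2, derived);
            FileOutputFormat.setOutputPath(stage2, output);
            System.exit(stage2.waitForCompletion(true) ? 0 : 1);
        }
    }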

Re: rules engine with Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
Unification in a parallel cluster is a difficult problem.  Writing very
large scale unification programs is an even harder problem.

What problem are you trying to solve?

One option would be that you need to evaluate a conventionally-sized
rulebase against many inputs.  Map-reduce should be trivially capable of
this.

Another option would be that you want to evaluate a huge rulebase against a
few inputs.  It isn't clear that this would be useful given the problems of
huge rulebases and the typically super-linear cost of resolution algorithms.

Another option is that you want to evaluate many conventionally-sized
rulebases against one or many inputs in order to implement a boosted rule
engine.  Map-reduce should be relatively trivial for this as well.

What is it that you are trying to do?

On Fri, Oct 19, 2012 at 12:25 PM, Luangsay Sourygna <lu...@gmail.com> wrote:

> Hi,
>
> Does anyone know any (opensource) project that builds a rules engine
> (based on RETE) on top Hadoop?
> Searching a bit on the net, I have only seen a small reference to
> Concord/IBM but there is barely any information available (and surely
> it is not open source).
>
> Alpha and beta memories would be stored on HBase. Should be possible, no?
>
> Regards,
>
> Sourygna
>

Re: rules engine with Hadoop

Posted by Peter Lin <wo...@gmail.com>.
From a Java heap perspective, if you don't want huge full GC pauses,
avoid going over 2 GB.

There are no simple answers on how many facts can be loaded in a rule
engine. If you want to learn more, email me directly. The Hadoop mailing
list isn't an appropriate place to get into the weeds of how to build
efficient rules, since it has nothing to do with Hadoop.

On Sat, Oct 20, 2012 at 2:03 PM, Luangsay Sourygna <lu...@gmail.com> wrote:
> Thanks for all the information. Many papers/book to read in my free time :)...
>
> Just to get an idea, what is the maximum memory consumed by a rule engine
> you have ever seen and what were its characteristic (how many facts
> loaded at the same
> time, how many rules and joins?) ?
>
> On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin <wo...@gmail.com> wrote:
>> All RETE implementations use RAM these days.
>>
>> There are older rule engines that used databases or file systems when
>> there wasn't enough RAM. The key to efficient scale of rulebase
>> systems or expert systems is loading only the data you need. An expert
>> system is inference engine + rules + functions + facts. Some products
>> shameless promote their rule engine as an expert system, when they
>> don't understand what the term means. Some rule engines are expert
>> systems shells, which provide a full programming environment without
>> needing IDE and a bunch of other stuff. For example CLIPS, JESS and
>> Haley come to mind.
>>
>> I would suggest reading Gary Riley's book
>> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>>
>> In terms of nodes, that actually doesn't matter much due to the
>> discrimination network produced by RETE algorithm. What matters more
>> is the number of facts and % of the facts that match some of the
>> patterns declared in the rules.
>>
>> Most RETE implementations materialize the joins results, so that is
>> the biggest factor in memory consumption. For example, if you had 1000
>> rules, but only 3 have joins, they it doesn't make much difference. In
>> contrast, if you had 200 rules and each has 4 joins, it will consume
>> more memory for the same dataset.
>>
>> Proper scaling of rulebase systems requires years of experience and
>> expertise, so it's not something one should rush. It's best to study
>> the domain and methodically develop the rulebase so that it is
>> efficient. I would recommend you use JESS. Feel free to email me
>> directly if your company wants to hire experienced rule developer to
>> assist with your project.
>>
>> RETE rule engines are powerful tools, but it does require experience
>> to scale properly.
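
For context, the per-task JVM heap that this 2 GB guideline applies to is
set through the job configuration. A rough sketch follows; the property
below is the classic MRv1 name "mapred.child.java.opts", while YARN-era
Hadoop uses "mapreduce.map.java.opts" plus container sizing instead.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HeapCappedRulesJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keep the embedded rule engine's heap at or under ~2 GB to avoid
            // the long full-GC pauses mentioned above.
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            Job job = Job.getInstance(conf, "rules-with-2g-heap");
            // ... set the mapper class, input and output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }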

Re: rules engine with Hadoop

Posted by Luangsay Sourygna <lu...@gmail.com>.
Thanks for all the information. Many papers/books to read in my free time :)...

Just to get an idea, what is the maximum memory consumption by a rule
engine that you have ever seen, and what were its characteristics (how
many facts loaded at the same time, how many rules and joins)?

On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin <wo...@gmail.com> wrote:
> All RETE implementations use RAM these days.
>
> There are older rule engines that used databases or file systems when
> there wasn't enough RAM. The key to efficient scale of rulebase
> systems or expert systems is loading only the data you need. An expert
> system is inference engine + rules + functions + facts. Some products
> shameless promote their rule engine as an expert system, when they
> don't understand what the term means. Some rule engines are expert
> systems shells, which provide a full programming environment without
> needing IDE and a bunch of other stuff. For example CLIPS, JESS and
> Haley come to mind.
>
> I would suggest reading Gary Riley's book
> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>
> In terms of nodes, that actually doesn't matter much due to the
> discrimination network produced by RETE algorithm. What matters more
> is the number of facts and % of the facts that match some of the
> patterns declared in the rules.
>
> Most RETE implementations materialize the joins results, so that is
> the biggest factor in memory consumption. For example, if you had 1000
> rules, but only 3 have joins, they it doesn't make much difference. In
> contrast, if you had 200 rules and each has 4 joins, it will consume
> more memory for the same dataset.
>
> Proper scaling of rulebase systems requires years of experience and
> expertise, so it's not something one should rush. It's best to study
> the domain and methodically develop the rulebase so that it is
> efficient. I would recommend you use JESS. Feel free to email me
> directly if your company wants to hire experienced rule developer to
> assist with your project.
>
> RETE rule engines are powerful tools, but it does require experience
> to scale properly.

Re: rules engine with Hadoop

Posted by Peter Lin <wo...@gmail.com>.
All RETE implementations use RAM these days.

There are older rule engines that used databases or file systems when
there wasn't enough RAM. The key to efficiently scaling rulebase
systems or expert systems is loading only the data you need. An expert
system is an inference engine + rules + functions + facts. Some products
shamelessly promote their rule engine as an expert system when they
don't understand what the term means. Some rule engines are expert
system shells, which provide a full programming environment without
needing an IDE and a bunch of other stuff. For example, CLIPS, JESS and
Haley come to mind.

I would suggest reading Gary Riley's book:
http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems

In terms of nodes, that actually doesn't matter much, due to the
discrimination network produced by the RETE algorithm. What matters more
is the number of facts and the percentage of facts that match some of the
patterns declared in the rules.

Most RETE implementations materialize the join results, so that is
the biggest factor in memory consumption. For example, if you had 1,000
rules but only 3 have joins, then it doesn't make much difference. In
contrast, if you had 200 rules and each has 4 joins, it will consume
more memory for the same dataset.

Proper scaling of rulebase systems requires years of experience and
expertise, so it's not something one should rush. It's best to study
the domain and methodically develop the rulebase so that it is
efficient. I would recommend you use JESS. Feel free to email me
directly if your company wants to hire an experienced rule developer to
assist with your project.

RETE rule engines are powerful tools, but they do require experience
to scale properly.


On Sat, Oct 20, 2012 at 10:24 AM, Luangsay Sourygna <lu...@gmail.com> wrote:
> In your RETE implementation, did you just relied on RAM to store the
> alpha and beta memories?
> What if there is a huge number of facts/WME/nodes and that you have to
> retain them for quite a long period (I mean: what happens if the
> alpha&beta memories gets higher than the RAM of your server?) ?
>
> HBase seemed interesting to me because it enables me to "scale out"
> this amount of memory and gives me the MR boost. Maybe there is a more
> interesting database/distributed cache for that?
>
> A big thank you anyway for your reply: I have googled a bit on your
> name and found many papers that should help me in going to the right
> direction (from this link:
> http://www.thecepblog.com/2010/03/06/rete-engines-must-forwards-and-backwards-chain/).
> Till now, the only paper I had found was:
> http://reports-archive.adm.cs.cmu.edu/anon/1995/CMU-CS-95-113.pdf
> (found on wikipedia) which I started to read.
>
> On Fri, Oct 19, 2012 at 10:30 PM, Peter Lin <wo...@gmail.com> wrote:
>> Since I've implemented RETE algorithm, that is a terrible idea and
>> wouldn't be efficient.
>>
>> storing alpha and beta memories in HBase is technically feasible, but
>> it would be so slow as to be useless.
>>
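
A purely illustrative calculation of why the materialized joins dominate
(all numbers here are made-up assumptions): with 10,000 facts on each side
of a single join and a 1% match rate, the beta memory holds about
10,000 x 10,000 x 0.01 = 1,000,000 partial matches; at roughly 100 bytes
of bookkeeping per match, that is already on the order of 100 MB for one
join, and each additional join over those results multiplies it further.
The same 10,000 facts flowing through join-free rules only cost the alpha
memories, which stay proportional to the fact count.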

Re: rules engine with Hadoop

Posted by Luangsay Sourygna <lu...@gmail.com>.
In your RETE implementation, did you rely solely on RAM to store the
alpha and beta memories?
What if there is a huge number of facts/WMEs/nodes and you have to
retain them for quite a long period (I mean: what happens if the
alpha and beta memories grow larger than the RAM of your server)?

HBase seemed interesting to me because it lets me "scale out"
this amount of memory and gives me the MapReduce boost. Maybe there is a
more suitable database/distributed cache for that?

A big thank you anyway for your reply: I have googled your name and
found many papers that should help point me in the right direction
(from this link:
http://www.thecepblog.com/2010/03/06/rete-engines-must-forwards-and-backwards-chain/).
Until now, the only paper I had found was
http://reports-archive.adm.cs.cmu.edu/anon/1995/CMU-CS-95-113.pdf
(found on Wikipedia), which I have started to read.

On Fri, Oct 19, 2012 at 10:30 PM, Peter Lin <wo...@gmail.com> wrote:
> Since I've implemented RETE algorithm, that is a terrible idea and
> wouldn't be efficient.
>
> storing alpha and beta memories in HBase is technically feasible, but
> it would be so slow as to be useless.
>

Re: rules engine with Hadoop

Posted by Peter Lin <wo...@gmail.com>.
Having implemented the RETE algorithm myself, I can say that is a
terrible idea and wouldn't be efficient.

Storing the alpha and beta memories in HBase is technically feasible,
but it would be so slow as to be useless.

On Fri, Oct 19, 2012 at 3:25 PM, Luangsay Sourygna <lu...@gmail.com> wrote:
> Hi,
>
> Does anyone know any (opensource) project that builds a rules engine
> (based on RETE) on top Hadoop?
> Searching a bit on the net, I have only seen a small reference to
> Concord/IBM but there is barely any information available (and surely
> it is not open source).
>
> Alpha and beta memories would be stored on HBase. Should be possible, no?
>
> Regards,
>
> Sourygna
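
Rough orders of magnitude make the point concrete (the figures are only
ballpark assumptions): matching one fact touches the alpha and beta
memories several times, and each in-process lookup costs on the order of
100 ns, whereas each HBase get or put involves at least a network round
trip, typically a millisecond or more. That is a slowdown of roughly four
orders of magnitude per memory access, so a run that takes seconds
entirely in RAM would take hours against HBase.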
