Posted to dev@mahout.apache.org by juber patel <ju...@gmail.com> on 2008/06/01 20:12:39 UTC

Gene Expression Programming in Mahout

hello everyone,

I have been lurking on this list for some time now. I would really
like to contribute to Mahout. As I had discussed earlier, I would like
to include my code, Amiba (http://amiba.sourceforge.net/) in Mahout. I
feel this is the right place for that code.
It implements Gene Expression Programming but it is sequential. I
would like to adapt it for Hadoop and for that I am reading up on
Hadoop.

Could you tell me again whether this fits well with Mahout? And if you
don't mind including it in Mahout, how should I go about it?

I have a day job but will try to push it as fast as I can.

thanks.

-- 
Juber Patel        http://juberpatel.googlepages.com

Re: Gene Expression Programming in Mahout

Posted by juber patel <ju...@gmail.com>.
deneche,

thanks for the links.

Even in my experience, the evaluation of individuals is the costliest
part. Maybe we can devise a strategy to handle this.

juber

On Mon, Jun 2, 2008 at 11:47 PM, deneche abdelhakim <a_...@yahoo.fr> wrote:
> I am working on using Hadoop to distribute the fitness evaluation of (hopefully) any problem written using the Watchmaker framework [https://watchmaker.dev.java.net/]. I already provided a patch with some code [http://issues.apache.org/jira/browse/MAHOUT-56] that lets you distribute the evaluation of the population over the cluster (each node will evaluate a subset of the population).
>
> Thank you for the links, I will take a look at some papers, but in the meantime could you please tell me: which part of the GEP algorithm needs to be distributed (I'm guessing it's the fitness evaluation part)?



-- 
Juber Patel http://juberpatel.googlepages.com

Re: Gene Expression Programming in Mahout

Posted by juber patel <ju...@gmail.com>.
Agreed. I can take care of that for the Amiba code.

juber

On Tue, Jun 3, 2008 at 2:15 AM, Isabel Drost
<ap...@isabel-drost.de> wrote:
> On Monday 02 June 2008, Grant Ingersoll wrote:
>> I can't yet speak to accepting it just yet, although I'm of the
>> mindset the more the merrier.
>
> I would add only one additional condition: Code contributions should have at
> least one active maintainer or sponsor who takes care of the code. I am
> thinking of stuff like fixing bugs, handling feature requests and the like.
> The core developers can surely provide support here, but having someone
> around who is intimately familiar with the code and the concepts behind it
> should make life easier.
>
> Isabel



-- 
Juber Patel http://juberpatel.googlepages.com

Re: Gene Expression Programming in Mahout

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Monday 02 June 2008, Grant Ingersoll wrote:
> I can't yet speak to accepting it just yet, although I'm of the
> mindset the more the merrier.

I would add only one additional condition: Code contributions should have at
least one active maintainer or sponsor who takes care of the code. I am
thinking of stuff like fixing bugs, handling feature requests and the like.
The core developers can surely provide support here, but having someone
around who is intimately familiar with the code and the concepts behind it
should make life easier.

Isabel

-- 
Caveats: it's GNOME, be afraid, be very afraid of the Depends line		-- James 
Troup
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Gene Expression Programming in Mahout

Posted by juber patel <ju...@gmail.com>.
well, this one is gonna be easy for you :)

yes, I am the sole author. Everything from design to implementation is mine.

The JGraphT library was supposed to be used for a grammar where
Individual strings would represent graphs. The implementation is
incomplete and we can easily replace JGraphT with some other graph
library.

There are many opportunities for parallelization.

One way could be creating relatively isolated pools of populations
with limited inter-pool exchange of best individuals. This would give
evolution a chance to explore various paths.
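A minimal sketch of the pool idea above, not Amiba code. Everything in it is an assumption made for illustration: the bit-string genome, the count-the-ones fitness, the mutation-only update (a stand-in for real GEP operators), the ring-shaped exchange and all class and method names. The pools are stepped in a plain loop here; on a cluster each pool would presumably live on its own node, and only the migration step would need communication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class IslandModelSketch {

    // Toy fitness (assumption): number of true bits in the genome.
    static int fitness(boolean[] genome) {
        int ones = 0;
        for (boolean bit : genome) {
            if (bit) ones++;
        }
        return ones;
    }

    // Fittest individual of one pool (first one wins on ties).
    static boolean[] best(List<boolean[]> island) {
        boolean[] top = island.get(0);
        for (boolean[] genome : island) {
            if (fitness(genome) > fitness(top)) top = genome;
        }
        return top;
    }

    // Index of the least fit individual of one pool (first one wins on ties).
    static int worstIndex(List<boolean[]> island) {
        int worst = 0;
        for (int i = 1; i < island.size(); i++) {
            if (fitness(island.get(i)) < fitness(island.get(worst))) worst = i;
        }
        return worst;
    }

    // One generation inside a single pool: flip one random bit of each
    // individual and keep the copy only if it is at least as fit.
    static void evolveOneGeneration(List<boolean[]> island, Random rng) {
        for (int i = 0; i < island.size(); i++) {
            boolean[] child = island.get(i).clone();
            int pos = rng.nextInt(child.length);
            child[pos] = !child[pos];
            if (fitness(child) >= fitness(island.get(i))) island.set(i, child);
        }
    }

    // Limited exchange: a copy of the best of pool i replaces the worst of
    // pool i+1 (ring topology). Emigrants are collected before any pool changes.
    static void migrate(List<List<boolean[]>> islands) {
        List<boolean[]> emigrants = new ArrayList<>();
        for (List<boolean[]> island : islands) {
            emigrants.add(best(island).clone());
        }
        for (int i = 0; i < islands.size(); i++) {
            List<boolean[]> target = islands.get((i + 1) % islands.size());
            target.set(worstIndex(target), emigrants.get(i));
        }
    }

    // Evolves all pools and returns the best fitness found in any of them.
    static int run(int islandCount, int islandSize, int genomeLength,
                   int generations, int migrationInterval, long seed) {
        Random rng = new Random(seed);
        List<List<boolean[]>> islands = new ArrayList<>();
        for (int i = 0; i < islandCount; i++) {
            List<boolean[]> island = new ArrayList<>();
            for (int j = 0; j < islandSize; j++) {
                boolean[] genome = new boolean[genomeLength];
                for (int k = 0; k < genomeLength; k++) genome[k] = rng.nextBoolean();
                island.add(genome);
            }
            islands.add(island);
        }
        for (int gen = 1; gen <= generations; gen++) {
            for (List<boolean[]> island : islands) evolveOneGeneration(island, rng);
            if (gen % migrationInterval == 0) migrate(islands);
        }
        int bestFitness = 0;
        for (List<boolean[]> island : islands) {
            bestFitness = Math.max(bestFitness, fitness(best(island)));
        }
        return bestFitness;
    }

    public static void main(String[] args) {
        System.out.println("best fitness: " + run(4, 10, 32, 200, 20, 42L));
    }
}
```

The migration interval is the knob that controls how isolated the pools are: a large value lets each pool follow its own path for longer, a small value makes the pools behave more like one big population.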

Another is parallelizing the evaluation of individuals against the
fitness function. As someone mentioned, it is probably the costliest
part of the evolution process.
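As a single-machine illustration of this second option, here is a rough sketch that scores a whole population concurrently with a Java parallel stream. It is not Amiba or Mahout code; the class name, the method name and the string-length fitness are made up for the example. It assumes the fitness function has no side effects, so individuals can be scored in any order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

class ParallelEvaluationSketch {

    // Scores every individual using the JVM's common fork/join pool.
    // The result array is in the same order as the population list.
    static <T> double[] evaluateAll(List<T> population,
                                    ToDoubleFunction<? super T> fitness) {
        return population.parallelStream().mapToDouble(fitness).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical individuals; the "fitness" is just the string length.
        List<String> population = Arrays.asList("+a*bc", "a", "*+abc");
        double[] scores = evaluateAll(population, String::length);
        System.out.println(Arrays.toString(scores));
    }
}
```

Because each score depends only on its own individual, the same shape should carry over to several machines: only the individuals and their scores have to travel.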

Frankly I have to think more about it. But I am sure we can leverage
the power of multiple machines to make the process much faster and at
the same time explore more of the solution space. I have some
experience with parallel programming from my post-grad days.


On Mon, Jun 2, 2008 at 11:18 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Hi Juber,
>
> Sounds interesting.
>
> Software grants go by
> http://incubator.apache.org/ip-clearance/ip-clearance-template.html
>
> I can't yet speak to accepting it just yet, although I'm of the mindset the
> more the merrier, but perhaps you can tell us some more about it, in terms
> of the legal fun we get to go through for accepting code:
>
> 1. Are you the sole author?  If not, do the other authors agree with the
> decision to donate?  All authors need to sign the software grant.
> 2. Are there any dependencies on non ASL friendly libraries (i.e. (L)GPL,
> etc.)?  I think I recall a dep. on JGraphT, right?  If so, how much work to
> remove them?
>
> Do you have preliminary thoughts on how it would be put on Hadoop or some
> other distributed mechanism?
>
> Cheers,
> Grant



-- 
Juber Patel http://juberpatel.googlepages.com

Re: Gene Expression Programming in Mahout

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Juber,

Sounds interesting.

Software grants go by http://incubator.apache.org/ip-clearance/ip-clearance-template.html

I can't speak to accepting it just yet, although I'm of the
mindset the more the merrier, but perhaps you can tell us some more
about it, in terms of the legal fun we get to go through for accepting
code:

1. Are you the sole author?  If not, do the other authors agree with
the decision to donate?  All authors need to sign the software grant.
2. Are there any dependencies on non-ASL-friendly libraries (e.g.
(L)GPL)?  I think I recall a dep. on JGraphT, right?  If so, how
much work to remove them?

Do you have preliminary thoughts on how it would be put on Hadoop or  
some other distributed mechanism?

Cheers,
Grant



On Jun 2, 2008, at 1:34 PM, juber patel wrote:

> yes, GEP is related to GA and I feel it provides a more generic way of
> defining populations, fitness functions etc. with the possibility of a
> wide range of grammars for the encoding of the Individual. This
> flexibility can be hugely effective when we can use the computing
> power of clusters.
>
> here is some biblio:
>
> http://www.gene-expression-programming.com/GEPBiblio.asp
>
>
> Deneche,
>
> could you just give me an idea about your work so far?
>
> juber

--------------------------
Grant Ingersoll
http://www.lucidimagination.com



Re: Gene Expression Programming in Mahout

Posted by deneche abdelhakim <a_...@yahoo.fr>.
I am working on using Hadoop to distribute the fitness evaluation of (hopefully) any problem written using the Watchmaker framework [https://watchmaker.dev.java.net/]. I already provided a patch with some code [http://issues.apache.org/jira/browse/MAHOUT-56] that lets you distribute the evaluation of the population over the cluster (each node will evaluate a subset of the population).

Thank you for the links, I will take a look at some papers, but in the meantime could you please tell me: which part of the GEP algorithm needs to be distributed (I'm guessing it's the fitness evaluation part)?
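The "each node evaluates a subset of the population" scheme can be sketched in plain Java as below. This is not the MAHOUT-56 patch and it uses neither the Hadoop nor the Watchmaker API; the class, the method names and the squared-number fitness are assumptions for illustration only. The loop in evaluate() runs the slices one after another, where a cluster would hand each slice to a different node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

class PartitionedEvaluationSketch {

    // Splits indices 0..size-1 into `nodes` contiguous ranges whose lengths
    // differ by at most one. Each range is {startInclusive, endExclusive}.
    static List<int[]> partition(int size, int nodes) {
        List<int[]> ranges = new ArrayList<>();
        int base = size / nodes;
        int extra = size % nodes;
        int start = 0;
        for (int n = 0; n < nodes; n++) {
            int length = base + (n < extra ? 1 : 0);
            ranges.add(new int[] {start, start + length});
            start += length;
        }
        return ranges;
    }

    // Work of one (hypothetical) node: score only the individuals in its range.
    static <T> double[] evaluateSlice(List<T> population, int[] range,
                                      ToDoubleFunction<? super T> fitness) {
        double[] scores = new double[range[1] - range[0]];
        for (int i = range[0]; i < range[1]; i++) {
            scores[i - range[0]] = fitness.applyAsDouble(population.get(i));
        }
        return scores;
    }

    // Driver: one slice per node, then merge the partial results back
    // into population order.
    static <T> double[] evaluate(List<T> population, int nodes,
                                 ToDoubleFunction<? super T> fitness) {
        double[] all = new double[population.size()];
        for (int[] range : partition(population.size(), nodes)) {
            double[] part = evaluateSlice(population, range, fitness);
            System.arraycopy(part, 0, all, range[0], part.length);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> population = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(Arrays.toString(evaluate(population, 2, x -> x * x)));
    }
}
```

Keeping the slices contiguous means the driver only needs the start offset of each slice to put the scores back in place.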

--- On Mon, 2 Jun 2008, juber patel <ju...@gmail.com> wrote:

> From: juber patel <ju...@gmail.com>
> Subject: Re: Gene Expression Programming in Mahout
> To: mahout-dev@lucene.apache.org, apache_mahout@isabel-drost.de
> Date: Monday, 2 June 2008, 19:34
> yes, GEP is related to GA and I feel it provides a more generic way of
> defining populations, fitness functions etc. with the possibility of a
> wide range of grammars for the encoding of the Individual. This
> flexibility can be hugely effective when we can use the computing
> power of clusters.
>
> here is some biblio:
>
> http://www.gene-expression-programming.com/GEPBiblio.asp
>
>
> Deneche,
>
> could you just give me an idea about your work so far?
>
> juber


Re: Gene Expression Programming in Mahout

Posted by juber patel <ju...@gmail.com>.
yes, GEP is related to GA and I feel it provides a more generic way of
defining populations, fitness functions etc. with the possibility of a
wide range of grammars for the encoding of the Individual. This
flexibility can be hugely effective when we can use the computing
power of clusters.

here is some biblio:

http://www.gene-expression-programming.com/GEPBiblio.asp


Deneche,

could you just give me an idea about your work so far?

juber


On Mon, Jun 2, 2008 at 11:48 AM, Isabel Drost
<ap...@isabel-drost.de> wrote:
> On Sunday 01 June 2008, juber patel wrote:
>> I have been lurking on this list for some time now. I would really
>> like to contribute to Mahout. As I had discussed earlier, I would like
>> to include my code, Amiba (http://amiba.sourceforge.net/) in Mahout. I
>> feel this is the right place for that code.
>
> Sounds great!
>
>
>> It implements Gene Expression Programming but it is sequential. I
>> would like to adapt it for Hadoop and for that I am reading up on
>> Hadoop.
>
> If you have any questions, feel free to ask us or post your questions to the
> Hadoop mailing lists.
>
>
>> Could you tell me again if this fits well with Mahout. And if you
>> don't mind including it in Mahout.
>
> Sure. You might want to coordinate with Deneche Abdelhakim, who is working on
> GA for GSoC - as I understand, Gene Expression Programming is related to GA?
>
>
> Isabel



-- 
Juber Patel http://juberpatel.googlepages.com

Re: Gene Expression Programming in Mahout

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Sunday 01 June 2008, juber patel wrote:
> I have been lurking on this list for some time now. I would really
> like to contribute to Mahout. As I had discussed earlier, I would like
> to include my code, Amiba (http://amiba.sourceforge.net/) in Mahout. I
> feel this is the right place for that code.

Sounds great!


> It implements Gene Expression Programming but it is sequential. I
> would like to adapt it for Hadoop and for that I am reading up on
> Hadoop.

If you have any questions, feel free to ask us or post your questions to the
Hadoop mailing lists.


> Could you tell me again if this fits well with Mahout. And if you
> don't mind including it in Mahout.

Sure. You might want to coordinate with Deneche Abdelhakim, who is working on
GA for GSoC - as I understand, Gene Expression Programming is related to GA?


Isabel


-- 
#if _FP_W_TYPE_SIZE < 32#error "Here's a nickel kid.  Go buy yourself a real 
computer."#endif		-- linux/arch/sparc64/double.h
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>