Posted to dev@mahout.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2008/05/19 07:41:57 UTC

[jira] Created: (MAHOUT-56) Watchmaker Integration

Watchmaker Integration
----------------------

                 Key: MAHOUT-56
                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
             Project: Mahout
          Issue Type: Task
            Reporter: Deneche A. Hakim


The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by Grant Ingersoll <gs...@apache.org>.
Sounds good, Deneche, I'll take a look this weekend or early next week.

-Grant
On Jul 3, 2008, at 2:26 AM, Deneche A. Hakim (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Deneche A. Hakim updated MAHOUT-56:
> -----------------------------------
>
>    Attachment: watchmaker-tsp.patch
>
> I moved the class discovery code
> (org.apache.mahout.ga.watchmaker.cd) to the examples directory
> until I figure out how to make it more generic :P
>
> I made some changes to build.xml:
> * *ant compile-examples* will now compile all the code in
> src/main/examples; note that you'll need the ejb.jar library in order
> to compile the cf.taste.ejb example
> * *ant examples-test* will launch all the tests in the
> src/test/examples directory. It will allow us to add unit tests for
> the examples
>
> You can run the CDGA algorithm, after generating the examples-job,
> with the following command:
>
> {noformat}
> <hadoop-0.17.0_HOME>/bin/hadoop jar <mahout_HOME>/core/build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.ga.watchmaker.cd.CDGA <mahout_HOME>/core/src/main/resources/wdbc/ 0.9 1 0.033 0.1 0 100 10
> {noformat}
>
> I will explain later what all those parameters mean...
>



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

I moved the class discovery code (org.apache.mahout.ga.watchmaker.cd) to the examples directory until I figure out how to make it more generic :P

I made some changes to build.xml:
* *ant compile-examples* will now compile all the code in src/main/examples; note that you'll need the ejb.jar library in order to compile the cf.taste.ejb example
* *ant examples-test* will launch all the tests in the src/test/examples directory. It will allow us to add unit tests for the examples

You can run the CDGA algorithm, after generating the examples-job, with the following command:

{noformat}
<hadoop-0.17.0_HOME>/bin/hadoop jar <mahout_HOME>/core/build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.ga.watchmaker.cd.CDGA <mahout_HOME>/core/src/main/resources/wdbc/ 0.9 1 0.033 0.1 0 100 10
{noformat}

I will explain later what all those parameters mean...


> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623116#action_12623116 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

{quote}
Also, I thought we had the wdbc dataset somewhere, but now the example above doesn't work for me for the class discovery.
{quote}

wdbc was in test/resources, and now it should be in examples/test/resources. CDGA does not work anymore because the code in the repository is weird!
Some of the code is not the latest version from the patch!

I am verifying all my code and should soon post a corrective patch. In the meantime, the following command should run CDGA:

{noformat}
$ hadoop-0.17.1/bin/hadoop jar apache-mahout-examples-0.1-dev.job org.apache.mahout.ga.watchmaker.cd.CDGA wdbc 0.9 1 0.033 0.1 0 100 10
{noformat}



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616361#action_12616361 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

In fact, the info file is inspired by ARFF. The main differences are:
* I need to be able to ignore some attributes (for example: ID)
* I need to store the min and max values for the numerical attributes.
* If the dataset is not in the ARFF format, I just need to generate its info file. I think that's much more efficient than converting it to the ARFF format (I am talking here about very large datasets)

And you're right about the Weka and RapidMiner compatibility, so I'll add ARFF dataset format support to my *todo* list.




Re: [jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by Grant Ingersoll <gs...@apache.org>.
I see the wdbc data now.  Where should that go?

On Aug 15, 2008, at 10:09 AM, Grant Ingersoll (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622872 
> #action_12622872 ]
>
> Grant Ingersoll commented on MAHOUT-56:
> ---------------------------------------
>
> I'm going to commit and then move the core/test/examples over to  
> examples/...
>
> Also, I thought we had the wdbc dataset somewhere, but now the  
> example above doesn't work for me for the class discovery.
>



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622872#action_12622872 ] 

Grant Ingersoll commented on MAHOUT-56:
---------------------------------------

I'm going to commit and then move the core/test/examples over to examples/...

Also, I thought we had the wdbc dataset somewhere, but now the example above doesn't work for me for the class discovery.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*Changes*
* org.apache.mahout.ga.watchmaker.MahoutEvaluator removes any existing input directory before storing the population
* org.apache.mahout.ga.watchmaker.cd.FileInfosParser uses the CATEGORICAL token for symbolic (nominal) attributes. This makes it easy to identify a token using its first character.
* org.apache.mahout.ga.watchmaker.cd.tool.CDInfosTool is used to generate the .infos file needed by CDGA for a new dataset.

The new tool works as follows:
* it is invoked using the following command (the dataset path is given as a parameter):

{noformat}
$ ~/hadoop-0.17.0/bin/hadoop jar apache-mahout-0.1-dev-ex.jar org.apache.mahout.ga.watchmaker.cd.tool.CDInfosTool dataset_path
{noformat}

* the tool searches for an existing infos file, in the same directory as the dataset, with the same name and the ".infos" extension, that contains the type of each attribute:
** 'N' numerical attribute
** 'C' categorical attribute
** 'L' label (this is also a categorical attribute)
** 'I' to ignore the attribute
    each attribute is on a separate line
* the tool uses a Hadoop job to parse the dataset and collect the information
* the results are written back to the same .infos file, in a format compatible with CDGA
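
The steps above can be illustrated with a minimal single-process sketch in Python (the real tool runs as a Hadoop job, so this is not CDInfosTool's code). The function names, the list-of-strings row format and the IGNORED output token are assumptions made for this sketch only.

```python
import math

def collect_infos(types, rows):
    """Scan the dataset once and gather per-attribute information.

    types: one of 'N', 'C', 'L', 'I' per attribute, as in the descriptor file.
    rows:  iterable of rows, each a list of string values.
    """
    stats = []
    for t in types:
        if t == "N":
            stats.append([math.inf, -math.inf])  # running [min, max]
        elif t in ("C", "L"):
            stats.append([])                     # distinct values, first-seen order
        elif t == "I":
            stats.append(None)                   # ignored attribute
        else:
            raise ValueError(f"unknown attribute type: {t!r}")
    for row in rows:
        for i, t in enumerate(types):
            if t == "N":
                value = float(row[i])
                stats[i][0] = min(stats[i][0], value)
                stats[i][1] = max(stats[i][1], value)
            elif t in ("C", "L") and row[i] not in stats[i]:
                stats[i].append(row[i])
    return stats

def format_infos(types, stats):
    """Render one line per attribute in the NUMERICAL/CATEGORICAL/LABEL style."""
    names = {"C": "CATEGORICAL", "L": "LABEL"}
    lines = []
    for t, s in zip(types, stats):
        if t == "N":
            lines.append(f"NUMERICAL, {s[0]},{s[1]}")
        elif t in names:
            lines.append(f"{names[t]}, " + ",".join(s))
        else:
            lines.append("IGNORED")  # assumed token; not taken from the real tool
    return lines
```

In a distributed version, each mapper could compute the same per-attribute statistics on its share of the dataset, and a reducer could merge them (min of mins, max of maxes, union of value sets).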

For example, this is the .infos file that the tool generated for the [KDDCup (1999)|http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html] 10% training dataset:

{panel:title=kddcup.data_10_percent.infos}
NUMERICAL, 0.0,58329.0
CATEGORICAL, icmp,udp,tcp
CATEGORICAL, rje,login,time,systat,ntp_u,mtp,uucp_path,bgp,nntp,efs,Z39_50,csnet_ns,tim_i,X11,telnet,ftp_data,finger,other,exec,uucp,netstat,klogin,ecr_i,remote_job,urh_i,netbios_dgm,pop_2,auth,private,shell,printer,kshell,urp_i,vmnet,pop_3,echo,daytime,iso_tsap,courier,tftp_u,sunrpc,red_i,ctf,supdup,gopher,ssh,sql_net,name,smtp,hostnames,netbios_ssn,ftp,IRC,imap4,netbios_ns,http,ldap,eco_i,link,http_443,domain_u,discard,nnsp,pm_dump,domain,whois
CATEGORICAL, S2,SF,OTH,S0,S3,RSTR,RSTO,SH,S1,RSTOS0,REJ
NUMERICAL, 0.0,6.9337562E8
NUMERICAL, 0.0,5155468.0
CATEGORICAL, 0,1
NUMERICAL, 0.0,3.0
NUMERICAL, 0.0,3.0
NUMERICAL, 0.0,30.0
NUMERICAL, 0.0,5.0
CATEGORICAL, 0,1
NUMERICAL, 0.0,884.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,2.0
NUMERICAL, 0.0,993.0
NUMERICAL, 0.0,28.0
NUMERICAL, 0.0,2.0
NUMERICAL, 0.0,8.0
NUMERICAL, 0.0,1.4E-45
CATEGORICAL, 0
CATEGORICAL, 0,1
NUMERICAL, 0.0,511.0
NUMERICAL, 0.0,511.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,255.0
NUMERICAL, 0.0,255.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
NUMERICAL, 0.0,1.0
LABEL, teardrop.,ipsweep.,phf.,nmap.,land.,portsweep.,warezmaster.,smurf.,guess_passwd.,ftp_write.,perl.,loadmodule.,back.,imap.,normal.,pod.,spy.,neptune.,satan.,buffer_overflow.,rootkit.,warezclient.,multihop.
{panel}
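
A reader for lines like the ones in the panel above could look roughly like the following Python sketch. It dispatches on the first character of the token, as the CATEGORICAL change is meant to allow. This is only an illustration: parse_infos_line and its tuple return shape are hypothetical and are not FileInfosParser's actual API.

```python
# Map the first character of a line to the full token it must match.
KINDS = {"N": "NUMERICAL", "C": "CATEGORICAL", "L": "LABEL"}

def parse_infos_line(line):
    """Parse one line such as 'NUMERICAL, 0.0,58329.0' or 'CATEGORICAL, icmp,udp,tcp'."""
    token, _, rest = line.strip().partition(",")
    kind = KINDS.get(token[:1])          # the first character identifies the token
    if kind is None or token != kind:
        raise ValueError(f"unrecognized infos line: {line!r}")
    values = [v.strip() for v in rest.split(",")]
    if kind == "NUMERICAL":
        low, high = (float(v) for v in values)  # exactly two values: min and max
        return (kind, low, high)
    return (kind, values)                # CATEGORICAL or LABEL: list of nominal values
```

Numerical lines come back as (kind, min, max) and the other two as (kind, list of values), so a caller can tell the label attribute apart from ordinary categorical ones.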

*What's Next*
* I think I found a quick workaround to allow CDGA to handle multi-class classification; I should implement it and try it on the KDD dataset
* Run the code on a small cluster and hope that it will work :P



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*changes*
* simplified the tests by using dummy classes instead of TSP and Sudoku
* added a complete Watchmaker example related to TSP. This example comes with Watchmaker; I made some modifications to allow the user to choose how the result will be calculated (standalone or distributed)
* watchmaker-examples-0.4.3.jar is no longer needed; the examples now need the following library: watchmaker-swing-0.4.3.jar (the new libs.jar contains the required libraries and their licence files)
* you can run the TSP example, after generating the examples-job, with the following command:

{noformat} 
<hadoop-0.17.0_HOME>/bin/hadoop jar <mahout_HOME>/core/build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.ga.watchmaker.travellingsalesman.TravellingSalesman
{noformat}

Make sure to check the "distributed" option to solve the problem using mahout.ga.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

I modified the NOTICE.TXT file to conform to the xpp3 license.

I also added more tests using another Watchmaker example (the Sudoku solver) along with TSP. We probably don't need them both, but more tests are always welcome.

I should soon post to the dev list to discuss the next possible steps...



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618640#action_12618640 ] 

Grant Ingersoll commented on MAHOUT-56:
---------------------------------------

Committed revision 681327.

Let's open up bugs/issues off of this, or add to this one if needed.  I think the ARFF support should be done separately. Deneche, do you want to add an issue for that?



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616508#action_12616508 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


Actually, I think what we need are three things:

a) your program, which should work from in-memory data sets (probably labeled matrices of some kind).

b) we need a matrix reader of the kind you propose.

c) we need an ARFF matrix reader.

I think that there is a JIRA issue around that could cover b & c.




[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

I tested CDGA in pseudo-distributed mode (on a single PC) and discovered that I had forgotten to pass the dataset to the mappers :P Well, it's fixed now, and it works in pseudo-distributed mode.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-56:
----------------------------------

      Component/s: Genetic Algorithms
    Fix Version/s: 0.1
         Priority: Minor  (was: Major)



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-56:
----------------------------------

    Attachment: watchmaker-tsp.patch

Added ASL where needed.

Moved StringUtils to utils package.

Deneche, I think you need to clean up the examples that refer to Daniel Dyer.  I'm assuming this is a watchmaker example that you modified.  I believe the way to handle this is to mark it as ASL and somehow link to where you got the code from.  It is already ASL to begin with, but the copyright is Daniel Dyer.  You probably should also put a reference in NOTICES.txt that some of the code was developed by Daniel.

Otherwise, looks pretty good.  I'm no GA expert, but I like the TSP GUI!  :-)   Would be interested in seeing some performance numbers as you distribute this out over multiple nodes, but that is not a requirement for committing.





[jira] Issue Comment Edited: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606994#action_12606994 ] 

adeneche edited comment on MAHOUT-56 at 6/22/08 3:42 AM:
-----------------------------------------------------------------

*what's new*
. class discovery: based on the following paper [Discovering Comprehensible Classification Rules using Genetic Programming|http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-417.pdf], a genetic algorithm that searches for the best binary classification rule for a given dataset. The population, which is a list of possible rules, is passed to each mapper that handles a subset of the dataset. All the new stuff is in the package:

org.apache.mahout.gsoc.watchmaker.classdiscovery

. I refactored some classes from the previous patch to reuse the existing code. The main change is the class STEvolutionEngine<T> that uses a single thread and the corresponding STFitnessEvaluator<T>. More details will be added to the comments

. I added the EasyMock library, which is needed to run the tests
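
To make the class discovery item above more concrete, here is a rough Python sketch of how a population of binary rules could be scored when each mapper only sees a subset of the dataset. It is an illustration under stated assumptions, not the CDGA code: the function names are hypothetical, rules are modelled as plain callables, and the fitness used here (sensitivity times specificity) is an assumed choice that this thread does not confirm.

```python
def map_partition(rules, partition):
    """Mapper side: confusion counts (tp, fp, tn, fn) of every rule on one data subset.

    partition: list of (features, is_target) pairs.
    """
    counts = []
    for rule in rules:
        tp = fp = tn = fn = 0
        for features, is_target in partition:
            predicted = rule(features)
            if predicted and is_target:
                tp += 1
            elif predicted:
                fp += 1
            elif is_target:
                fn += 1
            else:
                tn += 1
        counts.append((tp, fp, tn, fn))
    return counts

def reduce_counts(per_partition):
    """Reducer side: sum the per-partition counts of each rule element-wise."""
    return [tuple(map(sum, zip(*rule_counts))) for rule_counts in zip(*per_partition)]

def fitness(tp, fp, tn, fn):
    """Assumed fitness: sensitivity * specificity (0.0 when a class is absent)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity
```

Because the counts are additive, the mappers never need to see the whole dataset; only the small per-rule tuples travel to the reducer.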

*What needs to be done*
The following steps need to be done before considering this patch to be complete:
. classdiscovery.ga.CDGA (the main tool) needs to become a fully functional command-line tool
. for now CDGA uses the whole dataset for training; it should split it into a training set and a testing set
. because classdiscovery is not generic (at least for now), I should move it to the examples along with its corresponding tests
. arrange the comments
. there is no need to test the code against both TSP and Sudoku; I should remove the Sudoku test to make the tests more comprehensible
. pass the population using the DistributedCache instead of a job parameter



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

This patch should work (I tried it)

It also contains DatasetTextOutputFormat, a TextOutputFormat that allows the input to be split into two disjoint subsets (training and testing).
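
For illustration only, here is a minimal sketch of splitting a dataset into two disjoint subsets. It is not the DatasetTextOutputFormat from the patch: it works on in-memory lines without Hadoop, and the names DatasetSplitSketch, Split, trainingFraction and seed are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not the DatasetTextOutputFormat from the patch. */
final class DatasetSplitSketch {

  /** The two disjoint subsets produced by a split. */
  static final class Split {
    final List<String> training = new ArrayList<String>();
    final List<String> testing = new ArrayList<String>();
  }

  /**
   * Sends each line to exactly one subset. A seeded generator keeps the
   * split reproducible between runs.
   */
  static Split split(List<String> lines, double trainingFraction, long seed) {
    if (!(trainingFraction >= 0.0 && trainingFraction <= 1.0)) {
      throw new IllegalArgumentException("trainingFraction must be in [0, 1]");
    }
    Random rng = new Random(seed);
    Split result = new Split();
    for (String line : lines) {
      if (rng.nextDouble() < trainingFraction) {
        result.training.add(line);
      } else {
        result.testing.add(line);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("row1", "row2", "row3", "row4", "row5");
    Split s = split(lines, 0.8, 42L);
    System.out.println("training=" + s.training);
    System.out.println("testing=" + s.testing);
  }
}
```

Because every line goes to exactly one list, the two subsets are disjoint by construction; the exact sizes vary with the seed.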

The main algorithm, CDGA, contains a bug somewhere, because the results are weird... I guess I know what I have to do for the next few days (apart from hitting the keyboard with my head).

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: libs.zip, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: libs.rar

updated dependencies

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623199#action_12623199 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

I added a small (tiny) tutorial to the wiki. Also, I don't remember when, but I think I accidentally removed some lines from the file NOTICE.TXT, 
so it would be great if a committer could add them back :)

{noformat}
This product includes software developed by the Indiana University
  Extreme! Lab (http://www.extreme.indiana.edu/).

This product includes examples code from the Watchmaker project   
  https://watchmaker.dev.java.net/
{noformat}

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613072#action_12613072 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

{quote}
Deneche, I think you need to clean up the examples that refer to Daniel Dyer. I'm assuming this is a watchmaker example that you modified. I believe the way to handle this is to mark it as ASL and somehow link to where you got the code from. It is already ASL to begin with, but the copyright is Daniel Dyer. You probably should also put a reference in NOTICES.txt that some of the code was developed by Daniel.
{quote}
OK, this should be available in the next patch.

{quote}
Otherwise, looks pretty good. I'm no GA expert, but I like the TSP GUI! Would be interested in seeing some performance numbers as you distribute this out over multiple nodes, but that is not a requirement for committing.
{quote}
This is a very good idea, but it needs a larger TSP problem (should be able to find one), and a cluster. I'll definitely try it.

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597873#action_12597873 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

I started with the Traveling Salesman Problem (TSP) because the reference implementation already exists within Watchmaker.

You'll need to add the following jars to Mahout/core/lib/:

watchmaker-framework-0.4.3.jar
watchmaker-examples-0.4.3.jar (contains reference implementation of the TSP)
uncommons-maths-1.0.2.jar
uncommons-utils.jar

They are all available with Watchmaker 0.4.3 [https://watchmaker.dev.java.net/]

I also included some unit tests that should pass without problem.

The code contains the following 4 classes:
. RouteEvalMapper : a Hadoop mapper that evaluates the fitness of one candidate solution (GA individual)
. MahoutRouteEvaluator : takes a GA population as input and launches a Hadoop job to evaluate the fitness of each individual, 
  then returns the results. It takes care of storing the population in an input file and loading the fitnesses from the job outputs
. MahoutTspEvolutionEngine : distributed implementation of the evolution engine that uses MahoutRouteEvaluator for the evaluations
. PopulationUtils : utility class to store the population in a given FileSystem
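
The round trip described by these four classes (store the population in a file, evaluate one candidate per input record, collect the fitnesses) can be sketched without Hadoop as follows. This is only an illustration under stated assumptions: RouteEvalSketch, its methods, the city coordinates and the use of the closed-tour length as fitness are all made up for this example and are not the code from the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Hadoop-free stand-in for the evaluation round trip. */
final class RouteEvalSketch {

  /** Made-up city coordinates, for this example only. */
  static final Map<String, double[]> CITIES = new HashMap<String, double[]>();
  static {
    CITIES.put("A", new double[] {0, 0});
    CITIES.put("B", new double[] {3, 0});
    CITIES.put("C", new double[] {3, 4});
    CITIES.put("D", new double[] {0, 4});
  }

  /** Stores the population, one route per line (the role of PopulationUtils). */
  static void store(List<List<String>> population, Path file) throws IOException {
    List<String> lines = new ArrayList<String>();
    for (List<String> route : population) {
      lines.add(String.join(" ", route));
    }
    Files.write(file, lines, StandardCharsets.UTF_8);
  }

  /** What a mapper would do for one input line: length of the closed tour. */
  static double evaluateLine(String line) {
    String[] route = line.trim().split("\\s+");
    double total = 0.0;
    for (int i = 0; i < route.length; i++) {
      double[] from = CITIES.get(route[i]);
      double[] to = CITIES.get(route[(i + 1) % route.length]);
      total += Math.hypot(to[0] - from[0], to[1] - from[1]);
    }
    return total;
  }

  /** Stores the population, evaluates every line, returns fitnesses in order. */
  static List<Double> evaluate(List<List<String>> population) throws IOException {
    Path file = Files.createTempFile("population", ".txt");
    try {
      store(population, file);
      List<Double> fitnesses = new ArrayList<Double>();
      for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
        fitnesses.add(evaluateLine(line));
      }
      return fitnesses;
    } finally {
      Files.deleteIfExists(file);
    }
  }

  public static void main(String[] args) throws IOException {
    List<List<String>> population = Arrays.asList(
        Arrays.asList("A", "B", "C", "D"),
        Arrays.asList("A", "C", "B", "D"));
    System.out.println(evaluate(population));
  }
}
```

In the real job each line would be handled by a separate map call, so the evaluation order is not guaranteed; the sketch keeps the file order only because it runs in a single loop.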

This is the simplest possible implementation. The next steps are:
. Use serialization to store/load any kind of individual, not only List<String>
. Use serialization to pass any possible FitnessEvaluator, so that we can use MahoutEvolutionEngine for other problems
. and, as suggested by Ted, use meta-mutation (but I think that will be a separate task)

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: libs.zip

This zip file contains the additional libraries.

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: libs.zip, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Assigned: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-56:
-------------------------------------

    Assignee: Grant Ingersoll

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>         Attachments: watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*What's new*
CDGA should be able to cope with any given dataset (in a certain file format, of course). It uses a specially formatted file that contains enough information about the dataset. This file (called the info file) has the following format:
each attribute is described by a corresponding line in the info file, which can be one of the following:
* IGNORED
  if the attribute is ignored
* LABEL val1, val2,...
  if the attribute is the label (class), and its possible values
* NOMINAL val1, val2,...
  if the attribute is nominal (categorical), and its possible values
* NUMERICAL min, max
  if the attribute is numerical, and its min and max values
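
As a rough illustration of the four line types listed above, here is a minimal sketch of a parser for a single info-file line. It is not the parser from the patch: InfoLineSketch, Kind, Attribute and parseLine are hypothetical names, and the handling of whitespace and malformed lines is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical parser for one line of the info file described above. */
final class InfoLineSketch {

  enum Kind { IGNORED, LABEL, NOMINAL, NUMERICAL }

  static final class Attribute {
    final Kind kind;
    final List<String> values; // filled for LABEL and NOMINAL only
    final double min;          // meaningful for NUMERICAL only
    final double max;          // meaningful for NUMERICAL only

    Attribute(Kind kind, List<String> values, double min, double max) {
      this.kind = kind;
      this.values = values;
      this.min = min;
      this.max = max;
    }
  }

  /** Parses e.g. "IGNORED", "LABEL M, B" or "NUMERICAL 0.5, 10". */
  static Attribute parseLine(String line) {
    // The first token is the keyword; the rest is a comma-separated list.
    String[] parts = line.trim().split("\\s+", 2);
    Kind kind = Kind.valueOf(parts[0].toUpperCase(Locale.ROOT));
    List<String> values = splitValues(parts.length > 1 ? parts[1] : "");
    switch (kind) {
      case IGNORED:
        return new Attribute(kind, Collections.<String>emptyList(), Double.NaN, Double.NaN);
      case NUMERICAL:
        if (values.size() != 2) {
          throw new IllegalArgumentException("NUMERICAL needs min and max: " + line);
        }
        return new Attribute(kind, Collections.<String>emptyList(),
            Double.parseDouble(values.get(0)), Double.parseDouble(values.get(1)));
      default: // LABEL or NOMINAL
        if (values.isEmpty()) {
          throw new IllegalArgumentException(kind + " needs at least one value: " + line);
        }
        return new Attribute(kind, values, Double.NaN, Double.NaN);
    }
  }

  private static List<String> splitValues(String rest) {
    List<String> values = new ArrayList<String>();
    for (String token : rest.split(",")) {
      String value = token.trim();
      if (!value.isEmpty()) {
        values.add(value);
      }
    }
    return values;
  }

  public static void main(String[] args) {
    Attribute label = parseLine("LABEL M, B");
    System.out.println(label.kind + " " + label.values);
  }
}
```

The values M and B in the usage line are only sample label values; an unknown keyword is rejected by Kind.valueOf with an IllegalArgumentException.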

For now I generated the info file manually for the WDBC dataset. The info file should be in the same parent directory as the input, with the same name as the input directory followed by ".infos". For example, for a dataset in

build/examples/wdbc/

the info file should be

build/examples/wdbc.infos
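
The naming rule can be sketched as follows, assuming the ".infos" suffix shown in the example path. The patch itself works with Hadoop paths; this sketch uses java.nio.file as a stand-in, and InfoPathSketch and infoFileFor are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that derives the info-file location for a dataset. */
final class InfoPathSketch {

  /** Returns the sibling of inputDir named "<inputDir name>.infos". */
  static Path infoFileFor(Path inputDir) {
    Path name = inputDir.getFileName();
    if (name == null) {
      throw new IllegalArgumentException("input has no name: " + inputDir);
    }
    return inputDir.resolveSibling(name.toString() + ".infos");
  }

  public static void main(String[] args) {
    System.out.println(infoFileFor(Paths.get("build/examples/wdbc/")));
  }
}
```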

*What's next*
* Map-Reduce program to automatically generate the info file from any given dataset.
* Run CDGA with other datasets
* Multi-class classification

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Resolved: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved MAHOUT-56.
-----------------------------------

    Resolution: Fixed

Going to close this one, we can open up new issues as they arise.

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607956#action_12607956 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

bq. Seems the latest patch doesn't apply all that well. Seems I'm getting double entries of each class in the same file.

Yeah, for me too! It seems I was using a rather old version of TortoiseSVN. I have updated it now and should provide a working patch soon.


> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616730#action_12616730 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


Yes.

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616162#action_12616162 ] 

Andrew Purtell commented on MAHOUT-56:
--------------------------------------

Was Weka's ARFF insufficient? Please see http://weka.sourceforge.net/wekadoc/index.php/en:ARFF_(3.5.1) . Just a suggestion from a potential Mahout user, but ARFF is a de-facto standard in some ML circles, and being able to move from Weka or Rapidminer to Mahout and back, depending on the scale, would be highly advantageous. 

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616531#action_12616531 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


I spoke poorly.  

In-memory is a misnomer.


It should be possible to have a large ARFF dataset in HDFS to be used as input, as well as a large dataset in your format.

However you decide to read your data in, it should be usable by others.  Likewise, by symmetry, for the ARFF input.

How that works should depend a little on your data.  My feeling is that we will need something like a "row-wise splitting matrix input format" that sends groups of rows of a matrix to different mappers.  This input format should accept a configuration argument which is the class to be used to actually decode the format.

It will probably happen that not all algorithms will be quite so happy with this, especially the groups of rows part.  They may want all mappers to see the entire data set (if the data set is, say, a set of population members rather than real data).  They may want the mappers to have some row-wise map input, but have some side data that is read without using an input format.

You are really one of the first to define a real user story for this so you should feel free to define what you need in the context of what you think others might be able to use as well. 
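
To make the "row-wise splitting matrix input format" idea a little more concrete, here is a minimal Hadoop-free sketch: rows are cut into contiguous groups (one group per mapper), and the decoder is chosen by class name, as a configuration argument would do. Everything here is an assumption for illustration: RowDecoder, CsvRowDecoder, RowWiseSplitSketch and their methods are hypothetical names, not an existing Mahout or Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical decoder contract: one text line in, one matrix row out. */
interface RowDecoder {
  double[] decode(String line);
}

/** Example decoder for comma-separated numeric rows. */
final class CsvRowDecoder implements RowDecoder {
  public double[] decode(String line) {
    String[] cells = line.split(",");
    double[] row = new double[cells.length];
    for (int i = 0; i < cells.length; i++) {
      row[i] = Double.parseDouble(cells[i].trim());
    }
    return row;
  }
}

final class RowWiseSplitSketch {

  /** Cuts the lines into contiguous groups of nearly equal size. */
  static List<List<String>> splitRows(List<String> lines, int groups) {
    if (groups < 1) {
      throw new IllegalArgumentException("groups must be >= 1");
    }
    List<List<String>> result = new ArrayList<List<String>>();
    int n = lines.size();
    for (int g = 0; g < groups; g++) {
      int from = (int) ((long) g * n / groups);
      int to = (int) ((long) (g + 1) * n / groups);
      result.add(new ArrayList<String>(lines.subList(from, to)));
    }
    return result;
  }

  /** Instantiates the decoder named by a (hypothetical) configuration value. */
  static RowDecoder newDecoder(String className) throws ReflectiveOperationException {
    return Class.forName(className)
        .asSubclass(RowDecoder.class)
        .getDeclaredConstructor()
        .newInstance();
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    List<String> lines = Arrays.asList("1,2", "3,4", "5,6", "7,8", "9,10");
    RowDecoder decoder = newDecoder(CsvRowDecoder.class.getName());
    for (List<String> group : splitRows(lines, 2)) {
      System.out.println("group of " + group.size() + " rows, first row starts with "
          + decoder.decode(group.get(0))[0]);
    }
  }
}
```

An ARFF decoder and a decoder for the custom format would then be two implementations of the same interface, which is one way to get the symmetry described above.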

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607832#action_12607832 ] 

Grant Ingersoll commented on MAHOUT-56:
---------------------------------------

Hi Deneche,

Seems the latest patch doesn't apply all that well.  Seems I'm getting double entries of each class in the same file.

From the top directory, do:
svn status
svn diff > watchmaker-tsp.patch

Also, no need for the "gsoc" package.  This is full-fledged goodness, no need to qualify.  I'd suggest something like org.apache.mahout.genetic.watchmaker or org.apache.mahout.ga.watchmaker would be good.

Also, if you can zip up the required libraries and attach them, that would save a few trips to track them down.

Thanks,
Grant

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*Description of the changes*
I made the code problem-independent, so I changed the class names to remove any reference to TSP.
The classes are:
. StringUtils: inspired by Hadoop's upcoming Stringifier. Translates any given object (even a non-Serializable one) 
to a one-line XML representation, and vice versa.
. MahoutEvolutionEngine: a generic distributed genetic algorithm engine. The constructor now takes a FitnessEvaluator
  that takes care of evaluating every candidate.
. MahoutEvaluator: evaluates a population of individuals using a given FitnessEvaluator. Uses StringUtils to store 
the population in an input file, and the FitnessEvaluator in the JobConf.
. EvalMapper: a Mapper that evaluates a candidate using the FitnessEvaluator passed in the JobConf.
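
The point of a one-line string form is that a candidate fits on one line of the input file and an evaluator fits in a single configuration value. The patch's StringUtils relies on XStream to produce one-line XML; to stay dependency-free, the sketch below swaps in plain Java serialization plus Base64, so unlike StringUtils it only handles Serializable objects. OneLineCodecSketch, encode and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical stand-in for StringUtils: object to one-line string and back. */
final class OneLineCodecSketch {

  /** Serializes the value and encodes the bytes as Base64 without line breaks. */
  static String encode(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  /** Reverses encode(). Only use this on strings produced by trusted code. */
  static Object decode(String line) throws IOException, ClassNotFoundException {
    byte[] raw = Base64.getDecoder().decode(line);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    ArrayList<String> route = new ArrayList<String>(Arrays.asList("A", "B", "C"));
    String line = encode(route);
    System.out.println(decode(line).equals(route));
  }
}
```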

Note that we no longer need watchmaker-examples to build the code, but we still need it in the tests to 
compare this code with the reference implementation.

*Needed libraries*
You'll need the XStream library [http://xstream.codehaus.org/]; I used version 1.2.1. Add the following jars to core/lib:

xpp3_min-*.jar
xstream-*.jar

I also included the licenses for all the libraries that I added. And if we plan to use xpp3_min-*.jar we need 
to include the following lines somewhere in the Mahout documentation or in the software:

  "This product includes software developed by the Indiana University
  Extreme! Lab (http://www.extreme.indiana.edu/)."

*Next steps*
There is another Watchmaker example that I want to test with this new code, just to confirm that 
the integration is fine. Then we can discuss the next move on the mailing list, which could be one of the following:
. meta-mutations
. for now I'm assuming that each node contains the whole dataset needed to evaluate a candidate. But if the dataset is large enough 
to span multiple nodes, the user should be able to write the evaluation function in terms of mappers and reducers
. ...any suggestions?


> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



Re : [jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by deneche abdelhakim <a_...@yahoo.fr>.
I checked their source code, and all of them are under the terms of the Apache Software License 2.0.


--- On Mon 19.5.08, Ted Dunning (JIRA) <ji...@apache.org> wrote:

> From: Ted Dunning (JIRA) <ji...@apache.org>
> Subject: [jira] Commented: (MAHOUT-56) Watchmaker Integration
> To: mahout-dev@lucene.apache.org
> Date: Monday 19 May 2008, 18:55
> [
> https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597998#action_12597998
> ] 
> 
> Ted Dunning commented on MAHOUT-56:
> -----------------------------------
> 
> 
> What is the license on watchmaker?
> 
> What about the other jars (uncommon-maths and
> uncommon-utils)?
> 
> 
> > Watchmaker Integration
> > ----------------------
> >
> >                 Key: MAHOUT-56
> >                 URL:
> https://issues.apache.org/jira/browse/MAHOUT-56
> >             Project: Mahout
> >          Issue Type: Task
> >            Reporter: Deneche A. Hakim
> >         Attachments: watchmaker-tsp.patch
> >
> >
> > The goal of this task is to allow Watchmaker-defined
> problems to be solved in Mahout.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue
> online.


[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597998#action_12597998 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


What is the license on watchmaker?

What about the other jars (uncommon-maths and uncommon-utils)?


> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>         Attachments: watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*what's new*
. class discovery: based on the paper [Discovering Comprehensible Classification Rules using Genetic Programming|http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-417.pdf], a genetic algorithm that searches for the best binary classification rule for a given dataset. The population, which is a list of candidate rules, is passed to every mapper, and each mapper handles a subset of the dataset. All the new code is in the package:

org.apache.mahout.gsoc.watchmaker.classdiscovery

. I refactored some classes from the previous patch to reuse the existing code. The main change is the class STEvolutionEngine<T>, which uses a single thread, and the corresponding STFitnessEvaluator<T>. More details will be added to the comments

. I added the EasyMock library, which is needed to run the tests
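
The class discovery data flow (every mapper sees the whole population of rules but only its own subset of the dataset) can be sketched without Hadoop as follows. This is an illustration under assumptions, not the code from the patch: ClassDiscoverySketch, Rule, Example and the methods are hypothetical names, and scoring a rule as sensitivity times specificity is assumed here because this message does not say which fitness the patch uses.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of distributed rule evaluation for class discovery. */
final class ClassDiscoverySketch {

  /** A candidate binary classification rule. */
  interface Rule {
    boolean covers(double[] attributes);
  }

  static final class Example {
    final boolean positive;
    final double[] attributes;

    Example(boolean positive, double... attributes) {
      this.positive = positive;
      this.attributes = attributes;
    }
  }

  /** Mapper side: counts {tp, fp, tn, fn} for one rule over one data subset. */
  static long[] count(Rule rule, List<Example> subset) {
    long[] c = new long[4];
    for (Example e : subset) {
      boolean predicted = rule.covers(e.attributes);
      if (predicted && e.positive) {
        c[0]++; // true positive
      } else if (predicted) {
        c[1]++; // false positive
      } else if (!e.positive) {
        c[2]++; // true negative
      } else {
        c[3]++; // false negative
      }
    }
    return c;
  }

  /** Reducer side: adds the counts coming from two subsets. */
  static long[] merge(long[] a, long[] b) {
    long[] sum = new long[4];
    for (int i = 0; i < 4; i++) {
      sum[i] = a[i] + b[i];
    }
    return sum;
  }

  /** Assumed fitness: sensitivity * specificity, 0 when a class is absent. */
  static double fitness(long[] c) {
    double sensitivity = c[0] + c[3] == 0 ? 0.0 : (double) c[0] / (c[0] + c[3]);
    double specificity = c[2] + c[1] == 0 ? 0.0 : (double) c[2] / (c[2] + c[1]);
    return sensitivity * specificity;
  }

  public static void main(String[] args) {
    Rule rule = attributes -> attributes[0] > 5.0;
    List<Example> subset1 = Arrays.asList(new Example(true, 6.0), new Example(false, 4.0));
    List<Example> subset2 = Arrays.asList(new Example(false, 7.0), new Example(true, 3.0));
    long[] total = merge(count(rule, subset1), count(rule, subset2));
    System.out.println(fitness(total));
  }
}
```

Summing the four counts before computing the score matters: averaging per-subset fitness values would in general give a different result from evaluating the rule on the whole dataset.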

*What needs to be done*
The following steps need to be done before this patch can be considered complete:
. classdiscovery.ga.CDGA (the main tool) needs to become a fully functional command-line tool
. for now CDGA uses the whole dataset for training; it should split it into a training set and a testing set
. because classdiscovery is not generic (at least for now), I should move it to the examples along with its corresponding tests
. tidy up the comments
. there is no need to test the code against both TSP and Sudoku; I should remove the Sudoku test to make the tests more comprehensible

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.



[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: libs.zip

Updated dependencies.

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in Mahout.


Re: [jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by Grant Ingersoll <gs...@apache.org>.
I think we should split out a separate issue for ARFF (didn't Karl  
start one already?) and tackle that too.  It seems like reading ARFF  
should be generally useful.


On Jul 24, 2008, at 2:41 PM, Deneche A. Hakim (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616573 
> #action_12616573 ]
>
> Deneche A. Hakim commented on MAHOUT-56:
> ----------------------------------------
>
> After many attempts to load all the information that you gave me into
> my brain-processing-cluster-that-doesnt-work-quite-well, let's see if
> I understand it correctly:
>
> The algorithm handles any dataset in a matrix format, where (in my
> case) the columns are the attributes (one of which is the label)
> and the rows are the data instances.
>
> Working with Hadoop, we'll need to pass the dataset as the mappers'
> input, so it must be a file (or many files). We'll then need a
> custom InputFormat to feed the mappers with the data, and here comes
> the lovely-named "row-wise splitting matrix input format".
>
> Now we want to be able to work with any given dataset file format
> (including ARFF and my custom format), and thus the InputFormat
> needs a decoder that converts the dataset lines into matrix rows.
>


[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616573#action_12616573 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

After many attempts to load all the information that you gave me into my brain-processing-cluster-that-doesnt-work-quite-well, let's see if I understand it correctly:

The algorithm handles any dataset in a matrix format, where (in my case) the columns are the attributes (one of which is the label) and the rows are the data instances.

Working with Hadoop, we'll need to pass the dataset as the mappers' input, so it must be a file (or many files). We'll then need a custom InputFormat to feed the mappers with the data, and here comes the lovely-named "row-wise splitting matrix input format".

Now we want to be able to work with any given dataset file format (including ARFF and my custom format), and thus the InputFormat needs a decoder that converts the dataset lines into matrix rows.
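
To illustrate the decoder idea, here is a rough sketch of what "converts the dataset lines into matrix rows" could look like. The RowDecoder interface and CsvRowDecoder class are hypothetical names made up for this example (they are not Mahout or Hadoop classes), and the comma-separated numeric format is only an assumption; an ARFF decoder would implement the same interface with its own parsing.

```java
import java.util.Arrays;

/** Hypothetical sketch: turns one line of a dataset file into one matrix row. */
public class RowDecoderSketch {

    /** Illustrative interface, not part of Mahout or Hadoop. */
    interface RowDecoder {
        double[] decode(String line);
    }

    /** Decoder for a simple comma-separated numeric format (an assumption). */
    static class CsvRowDecoder implements RowDecoder {
        @Override
        public double[] decode(String line) {
            // Split on commas, tolerating spaces around them.
            String[] tokens = line.trim().split("\\s*,\\s*");
            double[] row = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                row[i] = Double.parseDouble(tokens[i]);
            }
            return row;
        }
    }

    public static void main(String[] args) {
        RowDecoder decoder = new CsvRowDecoder();
        double[] row = decoder.decode("17.99, 10.38, 1");
        System.out.println(Arrays.toString(row));
    }
}
```

With this separation, the input format would only need to know about line boundaries, and the choice of decoder could be made per dataset.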


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

Added comments to the new classes. The CDGA comment describes the meaning of the parameters for the program.


[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616527#action_12616527 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

The whole point of the CDGA example is to show how to use Mahout to run a Genetic Algorithm on a very large dataset, because this is what Map-Reduce is about: large and distributed data.

Now, it won't harm my program to be able to work with in-memory datasets, and I'll be more than happy to implement (a) as soon as there is a "stable" solution for (b) and (c).

I have one question about in-memory datasets: how do we pass them to the mappers? We can't use the job input if the dataset is in-memory, so I assume it is passed as a job parameter. Is that right?


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

Phew! I found the bug: it was hidden in CDFitness and caused the GA to return weird solutions.


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

classdiscovery.ga.CDGA (the main tool) now accepts command-line parameters



[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599398#action_12599398 ] 

Grant Ingersoll commented on MAHOUT-56:
---------------------------------------

{quote}
 "This product includes software developed by the Indiana University
Extreme! Lab (http://www.extreme.indiana.edu/)."

{quote}

This typically goes in NOTICE.txt in the root directory. Feel free to add it; we will clean it up before release.

I hope to look at the rest of this soon, but others should too.


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

This patch should now work fine. I added the wdbc dataset and modified the tests to look in the correct directory. I also corrected CDGA; it should now run with the following command:

{noformat}
$ ~/hadoop-0.17.0/bin/hadoop jar apache-mahout-examples-0.1-dev.job org.apache.mahout.ga.watchmaker.cd.CDGA wdbc 1 0.9 1 0.033 0.1 0 100 10
{noformat}


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment:     (was: libs.rar)


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: tsp-screenshot-1.jpg

TravellingSalesman example GUI


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

*What's new*
* Fixed some bugs that were well hidden in *DummyOutputCollector* and *CDMutation* (why are bugs always hidden!!!); the unit test for the latter has been improved to catch the bug if it ever comes back

* The ClassDiscovery example should be able to handle categorical attributes now, but I still need to add a tool that generates dataset information from any given dataset.

* The *Travelling Salesman* comments have been cleaned up, and a reference to the Watchmaker project has been added to the comments in place of the @author tag. I also added a readme.txt that describes where to look for the changes in the original code.

*What's next*
* A generic map-reduce program to generate dataset information from the dataset itself.

* multi-class classification
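
As a rough illustration of the first "next" item, the sketch below shows per-attribute summaries that can be merged, which is the property a map-reduce job needs so that partial results from different splits can be combined. This is only a guess at what the dataset information might contain: it assumes a range for numerical attributes and the set of distinct values for categorical ones, and the names AttributeSummarySketch, NumericRange and CategoryValues are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of mergeable per-attribute summaries; not taken from the patch. */
public class AttributeSummarySketch {

    /** Range of one numerical attribute. */
    static class NumericRange {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        /** Updates the range with one observed value (the map side). */
        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }

        /** Combines another partial range into this one (the reduce side). */
        void merge(NumericRange other) {
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
        }
    }

    /** Distinct values of one categorical attribute. */
    static class CategoryValues {
        final Set<String> values = new TreeSet<>();

        void add(String value) {
            values.add(value);
        }

        void merge(CategoryValues other) {
            values.addAll(other.values);
        }
    }

    public static void main(String[] args) {
        // Two partial summaries, as if computed over two different input splits.
        NumericRange first = new NumericRange();
        first.add(3.0);
        first.add(7.5);
        NumericRange second = new NumericRange();
        second.add(-1.0);
        first.merge(second);
        System.out.println("min=" + first.min + " max=" + first.max);

        CategoryValues labels = new CategoryValues();
        labels.add("M");
        labels.add("B");
        labels.add("M");
        System.out.println("labels=" + labels.values);
    }
}
```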


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: libs.zip

updated "zipped" dependencies


[jira] Updated: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated MAHOUT-56:
-----------------------------------

    Attachment: watchmaker-tsp.patch

I made a (relatively) small modification to CDGA that allows it to cope with multi-class classification. You can now give it a target class, and it will (try to) discover the classification rule for this class. If you have N classes, just run it N times with a different target each time.

This modification allowed me to run CDGA over the KDD dataset, but it's very, very slow. It takes more than 8 minutes to do a single iteration for one target over the 10% dataset (I didn't have the courage to run it over the whole dataset). At least now I have a good dataset to test on a cluster :)

The target class (the index of the value for the LABEL in the info file) is specified just after the dataset name. The following example runs CDGA over the WDBC dataset with target 1:

{noformat}
$ ~/hadoop-0.17.0/bin/hadoop jar apache-mahout-0.1-dev-ex.jar org.apache.mahout.ga.watchmaker.cd.CDGA wdbc 1 0.9 1 0.033 0.1 0 100 10
{noformat}
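
The "run it N times with a different target each time" idea amounts to treating each run as a two-class problem: target versus everything else. The sketch below shows that relabeling step only. It is an illustration under assumptions, not the CDGA code: the class OneVsRestSketch and the binarize method are hypothetical names, and labels are assumed to be integer indices like the target index described above.

```java
import java.util.Arrays;

/** Hypothetical sketch of the one-target-at-a-time relabeling; not the CDGA implementation. */
public class OneVsRestSketch {

    /** Marks each row as true if its label index equals the target, false otherwise. */
    static boolean[] binarize(int[] labels, int target) {
        boolean[] isTarget = new boolean[labels.length];
        for (int i = 0; i < labels.length; i++) {
            isTarget[i] = labels[i] == target;
        }
        return isTarget;
    }

    public static void main(String[] args) {
        int[] labels = {0, 2, 1, 2, 0};
        int numClasses = 3;
        // One pass per class: each pass would correspond to one separate CDGA run.
        for (int target = 0; target < numClasses; target++) {
            System.out.println("target " + target + ": "
                    + Arrays.toString(binarize(labels, target)));
        }
    }
}
```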

This is the last week of GSoC, so if you have any suggestions about the tests, the comments, or the code, I think it's time for them :)


[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618958#action_12618958 ] 

Deneche A. Hakim commented on MAHOUT-56:
----------------------------------------

bq. Committed revision 681327.

Cool, now the patches should be easier to create :P


bq. Let's open up bugs/issues off of this, or add to this one if needed.

It'll be easier for me if the bugs/issues are added here. I should add some known open issues myself soon.


[jira] Commented: (MAHOUT-56) Watchmaker Integration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616213#action_12616213 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


R handles arff as well.
