
Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2008/05/21 07:23:56 UTC

Thoughts on timeline for first release?

Just curious, what are people thinking about the timeline for a first,
very early release, like an 0.1 release? Any open tasks that I could
pick up to help?

Without rushing anything, I'm keen to retire my current project site
and forward everybody that's interested to Mahout. As long as there's
a .jar distro someone can pick up and use, that's cool.

Sean

Re: Thoughts on timeline for first release?

Posted by Jeff Eastman <je...@windwardsolutions.com>.

Task 1 is completed and R is running :). Maybe this afternoon you can
give me the Cook's tour?
Jeff

Ted Dunning wrote:
> Sorry, didn't mean to sound like that.
>
> I am happy to build the data sets!
>
> I can also demo R for you this evening.

Re: Thoughts on timeline for first release?

Posted by Ted Dunning <te...@gmail.com>.

http://statwww.epfl.ch/davison/teaching/Microarrays/lab/clustering.html

http://research.nhgri.nih.gov/microarray/Gene_Expression_Supplement/

These provide some sample microarray data, along with a lab project for
clustering the data.

On Wed, May 21, 2008 at 10:44 AM, Ted Dunning <te...@gmail.com> wrote:

>
> Sorry, didn't mean to sound like that.
>
> I am happy to build the data sets!
>
> I can also demo R for you this evening.


-- 
ted

Re: Thoughts on timeline for first release?

Posted by Ted Dunning <te...@gmail.com>.

Sorry, didn't mean to sound like that.

I am happy to build the data sets!

I can also demo R for you this evening.

On Wed, May 21, 2008 at 10:36 AM, Jeff Eastman <
jeastman@windwardsolutions.com> wrote:

> Ok, ok, UNCLE!
>
> Things to do:
> - Install R
> - Learn R
>
> Now I have at least two reasons to do that<grin>
> Jeff


-- 
ted

Re: Thoughts on timeline for first release?

Posted by Jeff Eastman <je...@windwardsolutions.com>.

Ok, ok, UNCLE!

Things to do:
- Install R
- Learn R

Now I have at least two reasons to do that <grin>
Jeff


Ted Dunning wrote:
> It is also the work of a moment to build some synthetic data sets using R.
> Real data is cooler, though.

Re: Thoughts on timeline for first release?

Posted by Ted Dunning <te...@gmail.com>.

It is also the work of a moment to build some synthetic data sets using R.
Real data is cooler, though.
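Ted suggests building synthetic clustering data in R. Since Mahout itself is written in Java, here is a rough Java equivalent: a minimal sketch that draws points from isotropic Gaussian blobs around chosen centers and prints them as CSV lines. The class name, method names, and CSV output format are illustrative assumptions, not anything that exists in Mahout or in this thread.

```java
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical helper (not part of Mahout): generates a toy clustering
 * data set by sampling Gaussian noise around fixed cluster centers.
 */
public class SyntheticClusters {

    /**
     * @param centers    cluster centers, one row per cluster (all the same dimension)
     * @param perCluster number of points to draw around each center
     * @param sigma      standard deviation of the noise added to each coordinate
     * @param seed       RNG seed, so the data set is reproducible
     * @return points in cluster order: the first perCluster rows belong to centers[0], and so on
     */
    public static double[][] generate(double[][] centers, int perCluster, double sigma, long seed) {
        Random rng = new Random(seed);
        int dim = centers[0].length;
        double[][] points = new double[centers.length * perCluster][dim];
        int row = 0;
        for (double[] center : centers) {
            for (int i = 0; i < perCluster; i++, row++) {
                for (int d = 0; d < dim; d++) {
                    // center coordinate plus N(0, sigma^2) noise
                    points[row][d] = center[d] + sigma * rng.nextGaussian();
                }
            }
        }
        return points;
    }

    /** Formats one point as a comma-separated line with four decimal places. */
    public static String toCsvLine(double[] point) {
        StringBuilder sb = new StringBuilder();
        for (int d = 0; d < point.length; d++) {
            if (d > 0) {
                sb.append(',');
            }
            // Locale.ROOT keeps '.' as the decimal separator regardless of platform locale
            sb.append(String.format(Locale.ROOT, "%.4f", point[d]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three well-separated 2-D blobs, 100 points each
        double[][] centers = { {0.0, 0.0}, {5.0, 5.0}, {0.0, 5.0} };
        double[][] points = generate(centers, 100, 0.5, 42L);
        for (double[] p : points) {
            System.out.println(toCsvLine(p));
        }
    }
}
```

Keeping the points in cluster order means the true cluster membership of each row is known, which makes it easy to check whether a clustering run recovered the blobs.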

On Wed, May 21, 2008 at 10:25 AM, Jeff Eastman <
jeastman@windwardsolutions.com> wrote:

> Thanks, Ted, and most are small enough to run on a single node. I'm
> investigating further...
>
> Jeff


-- 
ted

Re: Thoughts on timeline for first release?

Posted by Jeff Eastman <je...@windwardsolutions.com>.

Thanks, Ted, and most are small enough to run on a single node. I'm 
investigating further...

Jeff

Ted Dunning wrote:
> Do these 5 suffice:
>
> http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=attUp&view=table
>
> The classification data sets are also reasonable to try with clustering.
> The Irises dataset and the Japanese vowels are both plausible for clustering
> (inter alia, of course).

Re: Thoughts on timeline for first release?

Posted by Ted Dunning <te...@gmail.com>.

Do these 5 suffice?

http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=attUp&view=table

The classification data sets are also reasonable to try with clustering.
The Iris dataset and the Japanese Vowels dataset are both plausible for
clustering (inter alia, of course).
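If the Iris data is used for a clustering example, each row of the UCI `iris.data` file is, as far as I know, four comma-separated numeric measurements followed by a species label. A minimal sketch, assuming that layout, which turns each line into a feature vector and keeps the label aside so it can be compared against the resulting clusters afterwards; `IrisParser` and `Row` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: parses lines laid out like "5.1,3.5,1.4,0.2,Iris-setosa"
 * (numeric features first, class label last) into feature vectors.
 */
public class IrisParser {

    /** One parsed row: numeric features plus the class label. */
    public static final class Row {
        public final double[] features;
        public final String label;

        Row(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Parses all non-blank lines; the last comma-separated field is treated as the label. */
    public static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines, e.g. trailing empty lines in a downloaded file
            }
            String[] fields = trimmed.split(",");
            double[] features = new double[fields.length - 1];
            for (int i = 0; i < features.length; i++) {
                features[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(new Row(features, fields[fields.length - 1].trim()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two sample lines in the assumed layout, plus a blank line
        List<String> sample = Arrays.asList(
                "5.1,3.5,1.4,0.2,Iris-setosa",
                "7.0,3.2,4.7,1.4,Iris-versicolor",
                "");
        for (Row r : parse(sample)) {
            System.out.println(r.label + " -> " + Arrays.toString(r.features));
        }
    }
}
```

The label is deliberately not fed to the clusterer; it is only useful for checking how well the clusters line up with the known classes.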

On Wed, May 21, 2008 at 8:10 AM, Jeff Eastman <
jeastman@windwardsolutions.com> wrote:

> Does anybody have some links to datasets we can use for clustering
> examples? I'm thinking we could publish an EC2 AMI that includes Hadoop and
> Mahout, along with a script to deploy it on a cluster, upload the examples
> and run clustering on it. Is that too ambitious? I'm kinda hoping that we
> can use 0.17 which advertises simpler EC2 deployment than 0.16. If that
> won't meet our schedule then maybe I should work through the 0.16
> deployment.
>
> Jeff


-- 
ted

Re: Thoughts on timeline for first release?

Posted by deneche abdelhakim <a_...@yahoo.fr>.

UCI : http://archive.ics.uci.edu/ml/


--- On Wed 21.5.08, Jeff Eastman <je...@windwardsolutions.com> wrote:

> From: Jeff Eastman <je...@windwardsolutions.com>
> Subject: Re: Thoughts on timeline for first release?
> To: mahout-dev@lucene.apache.org
> Date: Wednesday 21 May 2008, 17:10
> Does anybody have some links to datasets we can use for clustering
> examples? I'm thinking we could publish an EC2 AMI that includes Hadoop
> and Mahout, along with a script to deploy it on a cluster, upload the
> examples and run clustering on it. Is that too ambitious? I'm kinda
> hoping that we can use 0.17 which advertises simpler EC2 deployment than
> 0.16. If that won't meet our schedule then maybe I should work through
> the 0.16 deployment.
>
> Jeff


Re: Thoughts on timeline for first release?

Posted by Jeff Eastman <je...@windwardsolutions.com>.

Does anybody have some links to datasets we can use for clustering
examples? I'm thinking we could publish an EC2 AMI that includes Hadoop
and Mahout, along with a script to deploy it on a cluster, upload the
examples and run clustering on it. Is that too ambitious? I'm kinda
hoping that we can use Hadoop 0.17, which advertises simpler EC2
deployment than 0.16. If that won't meet our schedule, then maybe I
should work through the 0.16 deployment.

Jeff

Grant Ingersoll wrote:
> I was thinking we should get the Taste stuff in (seems to be pretty 
> close to done) and I would like to get Mahout-9 (Naive Bayes) in.  
> This would give us a pretty nice release, I think.  Namely, a couple 
> of clustering implementations, a classifier, and, of course, Taste.  I 
> think I can finish up my part in the next week or so.  Then, we will 
> need to start to figure out all the fun of releases (signatures, 
> notices.txt, etc.)  I'd also like to see us have an easy to use demo 
> of the clustering stuff, but it is all right if we don't.
>
> -Grant

Re: Thoughts on timeline for first release?

Posted by Sean Owen <sr...@gmail.com>.

From my perspective, the Taste code is ready to go for an 0.1 release.
The possible outstanding questions are:

- fully merge taste-build.xml into build.xml?
- put that documentation somewhere else in the documentation tree?

More work on the code continues beyond that, but as far as I am
concerned it is complete, stable, and releasable.

On Wed, May 21, 2008 at 7:01 AM, Grant Ingersoll <gs...@apache.org> wrote:
> I was thinking we should get the Taste stuff in (seems to be pretty close to
> done) and I would like to get Mahout-9 (Naive Bayes) in.  This would give us
> a pretty nice release, I think.  Namely, a couple of clustering
> implementations, a classifier, and, of course, Taste.  I think I can finish
> up my part in the next week or so.  Then, we will need to start to figure
> out all the fun of releases (signatures, notices.txt, etc.)  I'd also like
> to see us have an easy to use demo of the clustering stuff, but it is all
> right if we don't.
>

Re: Thoughts on timeline for first release?

Posted by Grant Ingersoll <gs...@apache.org>.

I was thinking we should get the Taste stuff in (seems to be pretty
close to done) and I would like to get Mahout-9 (Naive Bayes) in.
This would give us a pretty nice release, I think. Namely, a couple
of clustering implementations, a classifier, and, of course, Taste. I
think I can finish up my part in the next week or so. Then, we will
need to start to figure out all the fun of releases (signatures,
notices.txt, etc.) I'd also like to see us have an easy-to-use demo
of the clustering stuff, but it is all right if we don't.

-Grant

On May 21, 2008, at 1:23 AM, Sean Owen wrote:

> Just curious, what are people thinking about the timeline for a first,
> very early release, like an 0.1 release? any open tasks that I could
> pick up to help?
>
> Without rushing anything, I'm keen to retire my current project site
> and forward everybody that's interested to Mahout. As long as there's
> a .jar distro someone can pick up and use, that's cool.
>
> Sean