Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2010/05/11 01:19:13 UTC

Draft of May board report available; comments needed by Wednesday

Our first board report as a top-level project is due quite shortly --
snuck up on me.

I need your feedback and edits on our report by end-of-day Wednesday
if you please:

https://svn.apache.org/repos/asf/lucene/mahout/pmc/board-reports/2010/board-report-may.txt

Feel free to edit this directly, or post comments here and I'll take care of it.

Sean

Re: Draft of May board report available; comments needed by Wednesday

Posted by Ted Dunning <te...@gmail.com>.
The best characterization I have heard recently distinguished between
"traditional statistics" and "data mining".  The key factor
in the distinction was that in traditional statistics, you test hypotheses
against data whereas in data mining you generate hypotheses (called models)
from the data.

In my view, machine learning is pretty closely synonymous with data mining
and the key distinction is learning from the data.  If you exclude LDA, then
you exclude k-means (which is essentially the same algorithm), but both are
classic unsupervised learning applications.  FPgrowth is in much the same
category as clustering.

I think it is a mistake to assume that only supervised learning is machine
learning.

On Tue, May 11, 2010 at 2:08 AM, Robin Anil <ro...@gmail.com> wrote:

> Just a thought: "scalable machine learning and data-mining libraries"?
> FPgrowth is not machine learning. Similarly, LDA is not machine learning but
> more like data modelling. I know, it's all fuzzy, and I wish we had a better
> way to say it: "tools for understanding patterns from data and predicting
> from learned ones".
>

Re: Draft of May board report available; comments needed by Wednesday

Posted by Sean Owen <sr...@gmail.com>.
Sounds good, will qualify that too. (I'm too lazy at the moment, but
the web site ought to say that too, then.)

On Tue, May 11, 2010 at 10:08 AM, Robin Anil <ro...@gmail.com> wrote:
> Just a thought: "scalable machine learning and data-mining libraries"?
> FPgrowth is not machine learning. Similarly, LDA is not machine learning but
> more like data modelling. I know, it's all fuzzy, and I wish we had a better
> way to say it: "tools for understanding patterns from data and predicting
> from learned ones".
> Many people who are not in this field won't know the difference even if we
> say it's all machine learning.
>

Re: Draft of May board report available; comments needed by Wednesday

Posted by Robin Anil <ro...@gmail.com>.
Just a thought: "scalable machine learning and data-mining libraries"?
FPgrowth is not machine learning. Similarly, LDA is not machine learning but
more like data modelling. I know, it's all fuzzy, and I wish we had a better
way to say it: "tools for understanding patterns from data and predicting
from learned ones".
Many people who are not in this field won't know the difference even if we
say it's all machine learning.

On Tue, May 11, 2010 at 12:28 PM, Sean Owen <sr...@gmail.com> wrote:

> No problem, I'll reword that accordingly.
>
> On Tue, May 11, 2010 at 3:16 AM, Ted Dunning <te...@gmail.com> wrote:
> > I think that generally looks good except that I disagree with the
> > scalable = hadoop-based assertion. I think that scalable means that we
> > handle big problems that are difficult with other tools. Hadoop is one key
> > method for this and we use it a lot. In other cases we use other methods.
> > One example is how Taste handles delivery of realtime recs. Another is
> > clever algorithms like Pegasos or SGD, which give large-scale results on
> > single machines.
>

Re: Draft of May board report available; comments needed by Wednesday

Posted by Sean Owen <sr...@gmail.com>.
No problem, I'll reword that accordingly.

On Tue, May 11, 2010 at 3:16 AM, Ted Dunning <te...@gmail.com> wrote:
> I think that generally looks good except that I disagree with the scalable =
> hadoop-based assertion. I think that scalable means that we handle big
> problems that are difficult with other tools. Hadoop is one key method for
> this and we use it a lot. In other cases we use other methods. One example
> is how Taste handles delivery of realtime recs. Another is clever
> algorithms like Pegasos or SGD, which give large-scale results on single
> machines.

Re: Draft of May board report available; comments needed by Wednesday

Posted by Ted Dunning <te...@gmail.com>.
I think that generally looks good except that I disagree with the
scalable = hadoop-based assertion. I think that scalable means that we
handle big problems that are difficult with other tools. Hadoop is one
key method for this and we use it a lot. In other cases we use other
methods. One example is how Taste handles delivery of realtime recs.
Another is clever algorithms like Pegasos or SGD, which give
large-scale results on single machines.

On May 10, 2010, at 4:19 PM, Sean Owen <sr...@gmail.com> wrote:

> Our first board report as a top-level project is due quite shortly --
> snuck up on me.
>
> I need your feedback and edits on our report by end-of-day Wednesday
> if you please:
>
> https://svn.apache.org/repos/asf/lucene/mahout/pmc/board-reports/2010/board-report-may.txt
>
> Feel free to edit this directly, or post comments here and I'll take
> care of it.
>
> Sean

Re: Draft of May board report available; comments needed by Wednesday

Posted by Sean Owen <sr...@gmail.com>.
I don't think it hurts to add a sentence about that, will do.

On Tue, May 11, 2010 at 2:09 PM, Drew Farris <dr...@gmail.com> wrote:
> Should we add anything about the move, getting the website set up, etc? It
> is not directly related to the goals of the project, but it is work that's
> being done. Not sure if this sort of thing goes in the board report or not.

Re: Draft of May board report available; comments needed by Wednesday

Posted by Drew Farris <dr...@gmail.com>.
Should we add anything about the move, getting the website set up, etc? It
is not directly related to the goals of the project, but it is work that's
being done. Not sure if this sort of thing goes in the board report or not.

Drew

On Mon, May 10, 2010 at 7:19 PM, Sean Owen <sr...@gmail.com> wrote:

> Our first board report as a top-level project is due quite shortly --
> snuck up on me.
>
> I need your feedback and edits on our report by end-of-day Wednesday
> if you please:
>
>
> https://svn.apache.org/repos/asf/lucene/mahout/pmc/board-reports/2010/board-report-may.txt
>
> Feel free to edit this directly, or post comments here and I'll take care
> of it.
>
> Sean
>

Re: Draft of May board report available; comments needed by Wednesday

Posted by Sean Owen <sr...@gmail.com>.
No worries, it's already there. I checked.

On Tue, May 11, 2010 at 1:38 PM, Grant Ingersoll <gs...@apache.org> wrote:
> If you haven't (or infra hasn't) already, we'll need to set up an entry in the committee-info.txt file.  It's under the foundation private section.  I'm not entirely sure if it is ASF Members only or not.  I'll send you a link privately.

Re: Draft of May board report available; comments needed by Wednesday

Posted by Grant Ingersoll <gs...@apache.org>.
If you haven't (or infra hasn't) already, we'll need to set up an entry in the committee-info.txt file.  It's under the foundation private section.  I'm not entirely sure if it is ASF Members only or not.  I'll send you a link privately.

On May 11, 2010, at 8:12 AM, Sean Owen wrote:

> Done. I thought I had kept it to 77 chars, but I will double-check that
> before submitting.
> 
> Subscribed, and started the process to add Mahout to
> apache.org/foundation
> (https://issues.apache.org/jira/browse/INFRA-2698), which was also
> listed on the chair duties to-do list.
> 
> On Tue, May 11, 2010 at 12:12 PM, Grant Ingersoll <gs...@apache.org> wrote:
>> Looks good, Sean.  Report should be formatted to no more than 80 chars wide, as the board seems to live in the dark ages still when it comes to screen width.
>> 
>> You should subscribe to board@a.o (see the chair duties on www.apache.org/dev)
>> 
>> I'll change the SVN karma to give you permission and also to set up the Mahout SVN karma.
>> 
>> On May 10, 2010, at 7:19 PM, Sean Owen wrote:



Re: Draft of May board report available; comments needed by Wednesday

Posted by Sean Owen <sr...@gmail.com>.
Done. I thought I had kept it to 77 chars, but I will double-check that
before submitting.

Subscribed, and started the process to add Mahout to
apache.org/foundation
(https://issues.apache.org/jira/browse/INFRA-2698), which was also
listed on the chair duties to-do list.

On Tue, May 11, 2010 at 12:12 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Looks good, Sean.  Report should be formatted to no more than 80 chars wide, as the board seems to live in the dark ages still when it comes to screen width.
>
> You should subscribe to board@a.o (see the chair duties on www.apache.org/dev)
>
> I'll change the SVN karma to give you permission and also to set up the Mahout SVN karma.
>
> On May 10, 2010, at 7:19 PM, Sean Owen wrote:

Re: Draft of May board report available; comments needed by Wednesday

Posted by Grant Ingersoll <gs...@apache.org>.
Looks good, Sean.  Report should be formatted to no more than 80 chars wide, as the board seems to live in the dark ages still when it comes to screen width.

You should subscribe to board@a.o (see the chair duties on www.apache.org/dev)

I'll change the SVN karma to give you permission and also to set up the Mahout SVN karma.

On May 10, 2010, at 7:19 PM, Sean Owen wrote:

> Our first board report as a top-level project is due quite shortly --
> snuck up on me.
> 
> I need your feedback and edits on our report by end-of-day Wednesday
> if you please:
> 
> https://svn.apache.org/repos/asf/lucene/mahout/pmc/board-reports/2010/board-report-may.txt
> 
> Feel free to edit this directly, or post comments here and I'll take care of it.
> 
> Sean