Posted to dev@mahout.apache.org by Isabel Drost <ap...@isabel-drost.de> on 2008/04/01 08:02:21 UTC

Re: Fast Feather Track

On Monday 31 March 2008, Karl Wettin wrote:
> I think it is worth listing all the algorithms people have submitted as
> GSoC proposals. It is an amazingly large group of people when you
> consider how long the project has been around.

+1 Thanks for the comment - added them. Looks really impressive now - 
unfortunately I guess the list was outdated at the moment I wrote it down ;)


> I also think you should add an introduction slide to ML so people who
> do not yet know they can benefit from it will understand. Perhaps that
> is the same thing as the "Problem setting"? I'll rant on though.

+1 Thanks for the rant. It should be the same as "Problem setting". Waking up
this morning, I think the essential part, learning models from data, is still
missing despite the many application examples. Will add that this afternoon.


> Nutch has an ngram based language identifier. Lucene has a "more like
> this" feature. Carrot cluster search results. LingPipe does a whole lot
> of things with text I think many would like to see in Mahout.

Any other examples? I will add these to the next version. (I did not have that
mail when I made the corresponding slide.)


> One important thing is that people might not be aware that they store
> structured minable data. There are a lot of faceted classifications,
> tags, ratings and what not that is not used to its full potential.

I tried to give a few examples on the Problem Setting slide. Maybe this slide
can move further back into some "We need you/what can you do with Mahout"
context, and in its place at the Problem Setting I would put a slide on
learning models from data. Thanks for the examples you gave.


Isabel


-- 
If you wait long enough, it will go away... after having done its damage. If it
was bad, it will be back.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Fast Feather Track

Posted by Karl Wettin <ka...@gmail.com>.
Isabel Drost wrote:
> On Monday 31 March 2008, Karl Wettin wrote:
>> Nutch has an ngram based language identifier. Lucene has a "more like
>> this" feature. Carrot cluster search results. LingPipe does a whole lot
>> of things with text I think many would like to see in Mahout.
> 
> Any other examples? I will add these to the next version. (Did not have that 
> mail when I made the corresponding slide.

Some "did you mean" features must count as machine learning. A nice example
where no data is needed other than users correcting their own typos,
accepting/declining suggestions and inspecting results. (Reinforcement
learning)


       karl