Posted to dev@mahout.apache.org by Isabel Drost <ap...@isabel-drost.de> on 2008/03/30 22:29:13 UTC

Fast Feather Track

Hello,

My proposal to present our project at the Fast Feather session at
ApacheCon EU was accepted.

I am currently preparing the slides for my talk. I would like to include one
slide on the project members who were crazy enough to start all this half a
year ago. It would be nice if I could add a small picture of each of you, so
there is a face beside each name ;)

Please find the initial slides at the following url:
http://www.isabel-drost.de/mahout_fast_feather.odp

If you have any comments on what is missing or what should be done
differently, I am happy about any feedback, criticism, ... :)

Isabel


-- 
It'll be a nice world if they ever get it finished.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Fast Feather Track

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Monday 31 March 2008, Lukas Vlcek wrote:
> As a side note, try to remember how people react to the logo draft and what
> they say about it. This information could help me shape it into the final
> version.

Sure! :)

Isabel

-- 
A foolish consistency is the hobgoblin of little minds. -- Ralph Waldo Emerson

Re: Fast Feather Track

Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,

Nice presentation! I regret I can't attend...

As a side note, try to remember how people react to the logo draft and what
they say about it. This information could help me shape it into the final
version.

Regards,
Lukas

-- 
http://blog.lukas-vlcek.com/

Re: Fast Feather Track

Posted by Karl Wettin <ka...@gmail.com>.
Isabel Drost wrote:
> On Monday 31 March 2008, Karl Wettin wrote:
>> Nutch has an n-gram-based language identifier. Lucene has a "more like
>> this" feature. Carrot clusters search results. LingPipe does a whole lot
>> of things with text that I think many would like to see in Mahout.
> 
> Any other examples? I will add these to the next version. (I did not have
> that mail when I made the corresponding slide.)

Some "did you mean" features must count as machine learning. They are a nice
example where no data is needed other than users correcting their own typos,
accepting or declining suggestions, and inspecting results (reinforcement
learning).


       karl



Re: Fast Feather Track

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Monday 31 March 2008, Karl Wettin wrote:
> I think it is worth listing all the algorithms people have submitted as
> GSoC proposals. It is an amazingly large group of people when you
> consider how long the project has been around.

+1. Thanks for the comment, I added them. It looks really impressive now;
unfortunately, I guess the list was already outdated the moment I wrote it
down ;)


> I also think you should add an introductory slide on ML so that people who
> do not yet know they can benefit from it will understand. Perhaps that
> is the same thing as the "Problem setting"? I'll rant on, though.

+1. Thanks for the rant. It should be the same as the "Problem setting" slide.
Waking up this morning, I still think the essential part, learning models
from data, is missing despite the many application examples. I will add that
this afternoon.


> Nutch has an n-gram-based language identifier. Lucene has a "more like
> this" feature. Carrot clusters search results. LingPipe does a whole lot
> of things with text that I think many would like to see in Mahout.

Any other examples? I will add these to the next version. (I did not have
that mail when I made the corresponding slide.)


> One important thing is that people might not be aware that they store
> structured, minable data. There are a lot of faceted classifications,
> tags, ratings and whatnot that are not used to their full potential.

I tried to give a few examples on the "Problem setting" slide. Maybe that
slide can move further back into some "We need you / what can you do with
Mahout" context, and under "Problem setting" I would put a slide on learning
models from data. Thanks for the examples you gave.


Isabel


-- 
If you wait long enough, it will go away... after having done its damage. If
it was bad, it will be back.

Re: Fast Feather Track

Posted by Karl Wettin <ka...@gmail.com>.
Isabel Drost wrote:
 > I have added a pdf version for those that do not have oo:
 >
 > http://www.isabel-drost.de/mahout_fast_feather.pdf
 >
 > This evening, I will add the missing content of the "Problem setting"

I think it is worth listing all the algorithms people have submitted as
GSoC proposals. It is an amazingly large group of people when you
consider how long the project has been around.

I also think you should add an introductory slide on ML so that people who
do not yet know they can benefit from it will understand. Perhaps that
is the same thing as the "Problem setting"? I'll rant on, though.

You already mention the many relationships with Lucene and that text
mining will probably be something big. How about listing some examples,
starting with the various pseudo-ML features already existing in the
various Lucene trunks, and perhaps how the new algorithms could improve,
or add features to, the structured and unstructured data already
available in their applications?

Nutch has an n-gram-based language identifier. Lucene has a "more like
this" feature. Carrot clusters search results. LingPipe does a whole lot
of things with text that I think many would like to see in Mahout.


One important thing is that people might not be aware that they store
structured, minable data. There are a lot of faceted classifications,
tags, ratings and whatnot that are not used to their full potential.

There is more minable data to be extracted everywhere, and it can often
be used as feedback to improve itself. (Did you ever make music on a
modular synthesizer?)

A photo site could extract social networks by using facial biometrics to
find out who is who in pictures. This social network could then be used to
improve the quality of the biometric classifier.

The site could further expand the social network by looking at who
writes comments on whose pictures. Trust between users could be evaluated
and used to prune which ratings to extract from text comments on
pictures. Those ratings could then be fed to collaborative filtering, used
by users to find new interesting photographers and by the site to show ads
the user is more likely to be interested in.

And so on.


     karl


Re: Fast Feather Track

Posted by Isabel Drost <ap...@isabel-drost.de>.
I have added a PDF version for those who do not have OpenOffice:

http://www.isabel-drost.de/mahout_fast_feather.pdf

This evening, I will add the missing content of the "Problem setting" slide 
and refactor the "Who we are" slide with your pictures and the missing names.

Isabel

-- 
Most people want either less corruption or more of a chance to participate in 
it.

Re: Fast Feather Track

Posted by Ted Dunning <td...@veoh.com>.
See here for a picture of me: http://www.veoh.com/users/ted
