Posted to dev@mahout.apache.org by Thilo Goetz <tw...@gmx.de> on 2008/01/30 20:46:41 UTC

Mahout for NLP?

Hi,

it's great to see an ML project started at Apache!

I have a little bit of background in applying ML to various
NLP tasks such as text classification, POS tagging and
entity detection.  I'm not an ML algorithms guy, though.  I'm
wondering if these kinds of tasks are among those you guys
had in mind when you started this project?

If yes, then I have a follow-up question.  In these NLP tasks,
choosing and extracting the right kinds of features is just
as important as the actual learning algorithm you employ.  Any
thoughts on that?  Would these kinds of feature selection
tasks be in scope for Mahout, or would you consider that a
separate problem to be dealt with elsewhere?

Anyway, I'll certainly hang out here and see where this is
going.  If things are happening around text/NLP, I may be able
to contribute.

I'd also like to mention that over in the UIMA incubator
project, we have a sandbox project going that does Hidden Markov
Model-based POS tagging, with promising results.  Not sure if
there can be any synergies there.  I didn't see HMMs mentioned
in the map/reduce paper and understand this stuff too little
to know if they fit the Statistical Query model.
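
As a rough illustration of why the question seems plausible, at least
for the supervised case: when an HMM tagger is trained from a tagged
corpus, the parameter estimates come from transition and emission
counts, and counts are sums over training sentences that split
naturally into a map step and a reduce step.  The sketch below is
plain Python, not Mahout or UIMA code; the function names, the
"<s>"/"</s>" boundary markers and the toy corpus are all made up for
the example, and it says nothing about unsupervised (Baum-Welch)
training.

```python
from collections import Counter

# Hypothetical sketch: supervised HMM counting as map + reduce.
# None of these names come from Mahout or UIMA.

def map_counts(tagged_sentence):
    """Map step: emit transition and emission counts for one sentence."""
    counts = Counter()
    prev = "<s>"  # assumed sentence-start marker
    for word, tag in tagged_sentence:
        counts[("trans", prev, tag)] += 1
        counts[("emit", tag, word)] += 1
        prev = tag
    counts[("trans", prev, "</s>")] += 1  # assumed sentence-end marker
    return counts

def reduce_counts(partials):
    """Reduce step: add up the per-sentence counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total

def transition_prob(total, prev, tag):
    """Unsmoothed maximum-likelihood estimate of P(tag | prev)."""
    denom = sum(n for (kind, p, _), n in total.items()
                if kind == "trans" and p == prev)
    return total[("trans", prev, tag)] / denom if denom else 0.0

# Toy corpus, invented for the example.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN")],
]
total = reduce_counts(map_counts(s) for s in corpus)
print(transition_prob(total, "NN", "VBZ"))
```

Each sentence can be counted independently, so the map step could run
on separate machines; only the small count tables need to be merged.
A real tagger would also need smoothing and unknown-word handling.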

--Thilo

Re: Mahout for NLP?

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 30, 2008, at 3:51 PM, Thilo Goetz wrote:

> Great, thanks!  I have since found the Mahout incubator proposal,
> which also talks about NLP and mentions UIMA.  I really like that
> text, and it looks like a lot of work went into it.  Why not put
> (some of) it on the Mahout website?

Yep, definitely something we are transitioning to as we get rolling  
here.  I don't have any plans to take down the startup site, though,  
so it should be around for a good long time.

And, yes, I second Isabel's sentiment that we are glad to have some  
UIMA folks snooping around, as I definitely think we will want to  
integrate w/ UIMA at some point.

We have also talked about implementing POS taggers in Mahout, but  
nothing concrete as of yet.  I know Steve Rowe has an interest in a  
Brill implementation, for example.  Mostly, our initial 10 algorithms  
that we picked should be thought of as seeds to form a community of  
interest.

As I see it, Mahout is big enough for a lot of quality approaches,
and, eventually, we should become a TLP once the project has reached a
level of maturity.  The key, of course, is what we, the community,
contribute.  After becoming a TLP, we will obtain world domination as
the machines take over...  :-)

-Grant

Re: Mahout for NLP?

Posted by Thilo Goetz <tw...@gmx.de>.
Great, thanks!  I have since found the Mahout incubator proposal,
which also talks about NLP and mentions UIMA.  I really like that
text, and it looks like a lot of work went into it.  Why not put
(some of) it on the Mahout website?

--Thilo

Isabel Drost wrote:
> On Wednesday 30 January 2008, Thilo Goetz wrote:
>> I have a little bit of background in applying ML to various
>> NLP tasks such as text classification, POS tagging and
>> entity detection. 
> 
> Certainly that is interesting for us.
> 
> 
>> Would these kinds of feature selection tasks be in scope for Mahout, or
>> would you consider that a separate problem to be dealt with elsewhere?
> 
> Sure - if someone wants to work on feature selection, we would be happy to
> welcome them to our project.
> 
> 
>> Anyway, I'll certainly hang out here and see where this is
>> going.  If things are happening around text/NLP, I may be able
>> to contribute.
> 
> Glad to have you here! I am especially happy to have someone from the UIMA 
> project over here.
> 
> 
>> I'd also like to mention that over in the UIMA incubator
>> project, we have a sandbox project going that does Hidden Markov
>> Model-based POS tagging, with promising results.  Not sure if
>> there can be any synergies there. 
> 
> Before going to Apache we actually thought about integrating our results as 
> UIMA components - if that turns out to be possible. So I guess, there 
> certainly will be synergies.
>  
> 


Re: Mahout for NLP?

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Wednesday 30 January 2008, Thilo Goetz wrote:
> I have a little bit of background in applying ML to various
> NLP tasks such as text classification, POS tagging and
> entity detection. 

Certainly that is interesting for us.


> Would these kinds of feature selection tasks be in scope for Mahout, or
> would you consider that a separate problem to be dealt with elsewhere?

Sure - if someone wants to work on feature selection, we would be happy to
welcome them to our project.


> Anyway, I'll certainly hang out here and see where this is
> going.  If things are happening around text/NLP, I may be able
> to contribute.

Glad to have you here! I am especially happy to have someone from the UIMA 
project over here.


> I'd also like to mention that over in the UIMA incubator
> project, we have a sandbox project going that does Hidden Markov
> Model-based POS tagging, with promising results.  Not sure if
> there can be any synergies there. 

Before going to Apache we actually thought about integrating our results as 
UIMA components - if that turns out to be possible. So I guess, there 
certainly will be synergies.
 

-- 
Absence makes the heart grow fonder.		-- Sextus Aurelius
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>