Posted to dev@mahout.apache.org by Benson Margulies <bi...@gmail.com> on 2010/01/16 14:41:01 UTC

Abbreviations?

I have approval from the CEO to contribute our collection of
abbreviations to Mahout.

We use them with the ICU breakers.

I guess IP clearance is called for here, but, thinking ahead, where
would people like to see files of abbreviations in various languages
show up?
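
For readers wondering how a file of abbreviations is typically combined with a sentence breaker, here is a minimal sketch. It is not the contributed code: the one-abbreviation-per-line file format, the class name AbbreviationAwareSplitter, and the resource path passed to loadFromClasspath are all assumptions made for illustration, and java.text.BreakIterator stands in for the ICU breakers mentioned above (ICU4J's com.ibm.icu.text.BreakIterator offers the same first()/next() iteration).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AbbreviationAwareSplitter {

    // Assumed file format: one abbreviation per line (e.g. "Dr."),
    // blank lines and lines starting with '#' are ignored.
    static Set<String> load(Reader in) throws IOException {
        Set<String> result = new HashSet<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty() && !line.startsWith("#")) {
                result.add(line);
            }
        }
        return result;
    }

    // Loads a list from the classpath; the resource name is hypothetical,
    // e.g. "/nlp/abbreviations_en.txt".
    static Set<String> loadFromClasspath(String resource) throws IOException {
        try (InputStream stream =
                 AbbreviationAwareSplitter.class.getResourceAsStream(resource)) {
            if (stream == null) {
                throw new IOException("Resource not found: " + resource);
            }
            return load(new InputStreamReader(stream, StandardCharsets.UTF_8));
        }
    }

    // Splits text into sentences, ignoring any boundary the breaker proposes
    // immediately after a known abbreviation.
    static List<String> split(String text, Set<String> abbreviations, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            String candidate = text.substring(start, end).trim();
            // Last whitespace-delimited token of the candidate sentence.
            String lastToken = candidate.substring(candidate.lastIndexOf(' ') + 1);
            boolean atEnd = end == text.length();
            if (!atEnd && abbreviations.contains(lastToken)) {
                continue; // keep 'start' so this piece merges with the next one
            }
            if (!candidate.isEmpty()) {
                sentences.add(candidate);
            }
            start = end;
        }
        return sentences;
    }
}
```

With a set containing "Dr.", calling split on "Dr. Smith arrived. He sat down." keeps "Dr. Smith arrived." together as one sentence. Keeping the lists as plain data files means each language can ship its own file without code changes.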

Re: Abbreviations?

Posted by Ted Dunning <te...@gmail.com>.
+1 as well.

I think it should be in core rather than utils due to dependency issues.

On Sat, Jan 16, 2010 at 7:16 AM, Olivier Grisel <ol...@ensta.org> wrote:

> 2010/1/16 Grant Ingersoll <gs...@apache.org>:
> > I think we should start a new module, that will be the seed for a
> > subproject, called NLP and that contains the stuff for NLP.
> >
> > Either that or put them in the utils module, which is where I envision
> > all of the things that are "helpful" for ML go, but aren't required.
>
> +1 for an explicit "org.apache.mahout.nlp module". Tools to turn
> wikipedia dumps into term freq vectors could also move there instead
> of "examples".
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Abbreviations?

Posted by Robin Anil <ro...@gmail.com>.
+1 from my side for nlp stuff

On Sun, Jan 17, 2010 at 7:08 PM, Drew Farris <dr...@gmail.com> wrote:

> On Sat, Jan 16, 2010 at 12:49 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > How about src/main/resources/nlp?
> >
>
> +1 to src/main/resources/nlp in mahout-core
>

Re: Abbreviations?

Posted by Drew Farris <dr...@gmail.com>.
On Sat, Jan 16, 2010 at 12:49 PM, Ted Dunning <te...@gmail.com> wrote:
> How about src/main/resources/nlp?
>

+1 to src/main/resources/nlp in mahout-core

Re: Abbreviations?

Posted by Ted Dunning <te...@gmail.com>.
How about src/main/resources/nlp?

On Sat, Jan 16, 2010 at 9:31 AM, Benson Margulies <bi...@gmail.com> wrote:

> Sure.
>
> However, the immediate contribution is data. src/main/resources? Something
> else?
>
> On Sat, Jan 16, 2010 at 10:16 AM, Olivier Grisel
> <ol...@ensta.org> wrote:
> > 2010/1/16 Grant Ingersoll <gs...@apache.org>:
> >> I think we should start a new module, that will be the seed for a
> >> subproject, called NLP and that contains the stuff for NLP.
> >>
> >> Either that or put them in the utils module, which is where I envision
> >> all of the things that are "helpful" for ML go, but aren't required.
> >
> > +1 for an explicit "org.apache.mahout.nlp module". Tools to turn
> > wikipedia dumps into term freq vectors could also move there instead
> > of "examples".
> >
> > --
> > Olivier
> > http://twitter.com/ogrisel - http://code.oliviergrisel.name
> >
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Abbreviations?

Posted by Benson Margulies <bi...@gmail.com>.
Sure.

However, the immediate contribution is data. src/main/resources? Something else?

On Sat, Jan 16, 2010 at 10:16 AM, Olivier Grisel
<ol...@ensta.org> wrote:
> 2010/1/16 Grant Ingersoll <gs...@apache.org>:
>> I think we should start a new module, that will be the seed for a subproject, called NLP and that contains the stuff for NLP.
>>
>> Either that or put them in the utils module, which is where I envision all of the things that are "helpful" for ML go, but aren't required.
>
> +1 for an explicit "org.apache.mahout.nlp module". Tools to turn
> wikipedia dumps into term freq vectors could also move there instead
> of "examples".
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>

Re: Abbreviations?

Posted by Olivier Grisel <ol...@ensta.org>.
2010/1/16 Grant Ingersoll <gs...@apache.org>:
> I think we should start a new module, that will be the seed for a subproject, called NLP and that contains the stuff for NLP.
>
> Either that or put them in the utils module, which is where I envision all of the things that are "helpful" for ML go, but aren't required.

+1 for an explicit "org.apache.mahout.nlp module". Tools to turn
wikipedia dumps into term freq vectors could also move there instead
of "examples".

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name

Re: Abbreviations?

Posted by Grant Ingersoll <gs...@apache.org>.
I think we should start a new module, that will be the seed for a subproject, called NLP and that contains the stuff for NLP.  

Either that or put them in the utils module, which is where I envision all of the things that are "helpful" for ML go, but aren't required.

On Jan 16, 2010, at 8:41 AM, Benson Margulies wrote:

> I have approval from the CEO to contribute our collection of
> abbreviations to Mahout.
> 
> We use them with the ICU breakers.
> 
> I guess IP clearance is called for here, but, thinking ahead, where
> would people like to see files of abbreviations in various languages
> show up?