Posted to dev@mahout.apache.org by Timothy Mann <ma...@gmail.com> on 2012/10/26 05:22:27 UTC

What is the desired behavior of NGrams.generateNGrams()?

I'm trying to write javadoc comments for
org.apache.mahout.common.nlp.NGrams. generateNGramsWithoutLabel() makes
sense, but I'm puzzled by the implementation of generateNGrams().

Map<String,List<String>> NGrams.generateNGrams() returns a Map from
'labels' to a list of 'tokens' (where each token is an n-gram of words
separated by single spaces). In the current implementation only a single
('label', list of tokens) pair is put in the map. The 'label' is just the
first word extracted from the specified text. I am guessing that the
returned Map is being used as a pair. What is the significance of the
'label'?
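
To make the question concrete, here is a minimal sketch of the behavior as
I read it. This is not the Mahout source. The class name
LabeledNGramsSketch, the static method signature, and the gramSize
parameter are hypothetical. It also assumes that the label is left out of
the n-grams and that each token is a window of exactly gramSize words; the
real class may differ on both points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class LabeledNGramsSketch {

  // Hypothetical stand-in for the described behavior, not the Mahout code.
  // Assumption: the first word is the label and is excluded from the n-grams.
  // Assumption: only n-grams of exactly gramSize words are produced.
  static Map<String, List<String>> generateNGrams(String text, int gramSize) {
    String[] words = text.trim().split("\\s+");
    String label = words[0];
    List<String> rest = Arrays.asList(words).subList(1, words.length);

    List<String> tokens = new ArrayList<>();
    // Slide a window of gramSize words and join each window with single spaces.
    for (int i = 0; i + gramSize <= rest.size(); i++) {
      tokens.add(String.join(" ", rest.subList(i, i + gramSize)));
    }

    // Only one (label, tokens) pair ever goes into the map,
    // so the Map is effectively used as a pair.
    return Collections.singletonMap(label, tokens);
  }

  public static void main(String[] args) {
    Map<String, List<String>> result =
        generateNGrams("spam buy cheap pills now", 2);
    // prints {spam=[buy cheap, cheap pills, pills now]}
    System.out.println(result);
  }
}
```

If that reading is right, the caller always has to pull the single entry
back out of the map, which is why the return type looks odd to me.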

Thank you for your help.

-Timothy Mann

Re: What is the desired behavior of NGrams.generateNGrams()?

Posted by Timothy Mann <ma...@gmail.com>.
I recommend simply removing the org.apache.mahout.common.nlp package, unless
there is a long-term plan for it. NGrams is the only class in the package
and no one seems to know what the behavior of Map<String,List<String>>
NGrams.generateNGrams() should be. Furthermore, no one seems to be using
it. Even if someone is using it, the code is very small and could be
incorporated into the non-mahout side of the project.

There is some (independently implemented) n-gram computation going on in
org.apache.mahout.vectorizer.collocations.llr.CollocDriver, but I don't
think this is related to NLP. Otherwise it might make sense to try to merge
the functionality (eventually).

-Tim


On Sat, Nov 3, 2012 at 12:25 PM, Sean Owen <sr...@gmail.com> wrote:

> (I also don't see any usages.)
>
>
> On Sat, Nov 3, 2012 at 5:08 PM, Timothy Mann <ma...@gmail.com>
> wrote:
>
> > It looks like nothing in the core package is using
> > org.apache.mahout.common.nlp.NGrams. Is anyone using this class?
> >
> > -Tim

Re: What is the desired behavior of NGrams.generateNGrams()?

Posted by Sean Owen <sr...@gmail.com>.
(I also don't see any usages.)


On Sat, Nov 3, 2012 at 5:08 PM, Timothy Mann <ma...@gmail.com> wrote:

> It looks like nothing in the core package is using
> org.apache.mahout.common.nlp.NGrams. Is anyone using this class?
>
> -Tim

Re: What is the desired behavior of NGrams.generateNGrams()?

Posted by Timothy Mann <ma...@gmail.com>.
It looks like nothing in the core package is using
org.apache.mahout.common.nlp.NGrams. Is anyone using this class?

-Tim


On Thu, Oct 25, 2012 at 10:22 PM, Timothy Mann <ma...@gmail.com> wrote:

> I'm trying to write javadoc comments for
> org.apache.mahout.common.nlp.NGrams. generateNGramsWithoutLabel() makes
> sense, but I'm puzzled by the implementation of generateNGrams().
>
> Map<String,List<String>> NGrams.generateNGrams() returns a Map from
> 'labels' to a list of 'tokens' (where each token is an n-gram of words
> separated by single spaces). In the current implementation only a single
> ('label', list of tokens) pair is put in the map. The 'label' is just the
> first word extracted from the specified text. I am guessing that the
> returned Map is being used as a pair. What is the significance of the
> 'label'?
>
> Thank you for your help.
>
> -Timothy Mann
>