Posted to user@mahout.apache.org by Loic Descotte <lo...@kelkoo.com> on 2011/10/27 10:53:12 UTC

test classification : give weight to words + words location

Hello,

I'm working on a classification problem. My datasets are basically text entries.

To find the right class, I know that some words are very important. Is
there a way to tell the classifier that these words should have a greater
weight?

Another very important thing is the position of these words and their
distance to other important words.

Example: I want to classify black and white cars. I know that the
words "car", "sedan" and "limo" are very important, and that their
position relative to the words "white" and "black" is very important too.

The sentence "white sedan with dark windows" should be classified under
white cars, not black cars, even though a dark colour word is present.
The position of the colour words (the dark one is further from "sedan"
than "white" is) should help us a lot.


Is there a way to express that with Mahout classifiers (I'm currently testing with SGD)?
If yes, do you have any idea or example of how to do that?

Thanks a lot for your help

Loic


Re: test classification : give weight to words + words location

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Oct 27, 2011 at 2:08 AM, Sean Owen <sr...@gmail.com> wrote:

> How about using bigrams instead of single words?
>

Exactly what I was about to say.

The classifier learns the weights itself. The suggestion here is to
add bigrams to your features, so that they become:

text: white sedan with dark windows
bigrams: white_sedan, sedan_with, with_dark, dark_windows
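
To make the bigram idea concrete, here is a minimal sketch in plain Java that
turns a text entry into unigram and bigram tokens. It is only an illustration:
the class name BigramFeatures and its methods are hypothetical, the whitespace
tokenizer is an assumption, and no Mahout API is used. The resulting tokens
would still need to be encoded into a feature vector before training.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: builds word and bigram tokens.
final class BigramFeatures {
    private BigramFeatures() {}

    /** Lower-cases the text and splits it on runs of whitespace (assumed tokenizer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Joins each pair of adjacent tokens with an underscore. */
    static List<String> bigrams(String text) {
        List<String> tokens = tokenize(text);
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + "_" + tokens.get(i + 1));
        }
        return result;
    }

    /** Returns the single words followed by the bigrams. */
    static List<String> features(String text) {
        List<String> all = tokenize(text);
        all.addAll(bigrams(text));
        return all;
    }

    public static void main(String[] args) {
        // Prints the word tokens, then white_sedan, sedan_with, with_dark, dark_windows
        System.out.println(features("white sedan with dark windows"));
    }
}
```

With these tokens, "white_sedan" is a feature of its own, so the classifier can
give it a weight separate from "white" or "dark" alone.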



Re: test classification : give weight to words + words location

Posted by Sean Owen <sr...@gmail.com>.
How about using bigrams instead of single words?