Posted to user@mahout.apache.org by JAGANADH G <ja...@gmail.com> on 2012/10/11 08:44:21 UTC

Create vector from text

Hi All

As of Mahout 0.7, a classifier takes a vector as its input for classification.
Can anybody guide me on how to create a vector from text? I am not looking to
create a vector from a file stored in HDFS or on the local file system.
At runtime, my system will be receiving text input to classify.

Best regards

-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

Re: Create vector from text

Posted by JAGANADH G <ja...@gmail.com>.
On Thu, Oct 11, 2012 at 12:29 PM, Ted Dunning <te...@gmail.com> wrote:

> You have to tokenize your text and then use some form of vector encoding.
>
> If you have a known dictionary of all interesting words, you can simply
> make a vector as long as the number of words in your dictionary and put a 1
> in the right place.
>
> If you don't want to do that either because you don't know all the words in
> advance or because the number of words is too large, you can use
> a TextValueEncoder to do the deed.  There is sample code in the Mahout in
> Action code for this and Chapter 14 in Mahout in Action talks about the
> code.  You can get the code from http://github.com/tdunning/MiA
>
>

Hi Ted

Thanks for the pointer.
It works.
Sorry to ask another question:
is there any way to get the label for a classifier result with the 0.7 API?

Best regards


Re: Create vector from text

Posted by Ted Dunning <te...@gmail.com>.
You have to tokenize your text and then use some form of vector encoding.

If you have a known dictionary of all interesting words, you can simply
make a vector as long as the number of words in your dictionary and put a 1
in the right place.
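A minimal sketch of the dictionary approach in plain Java, assuming a fixed, hypothetical word list and a naive regex tokenizer (not a Lucene analyzer). The class name DictionaryVectorizer is hypothetical. It returns a plain double[] to stay self-contained; with Mahout you could copy the same positions into an org.apache.mahout.math.RandomAccessSparseVector with set(index, 1.0) before handing it to the classifier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: one vector position per known dictionary word.
public class DictionaryVectorizer {

    // Maps each known word to a fixed index in the vector.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    public DictionaryVectorizer(List<String> words) {
        for (String w : words) {
            // Assign the next free index to each new (lower-cased) word.
            dictionary.putIfAbsent(w.toLowerCase(Locale.ROOT), dictionary.size());
        }
    }

    // Naive tokenizer (an assumption): lower-case, split on non-letter/non-digit runs.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    // Vector as long as the dictionary, with a 1 at each known word's position.
    public double[] vectorize(String text) {
        double[] v = new double[dictionary.size()];
        for (String token : tokenize(text)) {
            Integer idx = dictionary.get(token);
            if (idx != null) {
                v[idx] = 1.0; // words outside the dictionary are simply ignored
            }
        }
        return v;
    }

    public static void main(String[] args) {
        DictionaryVectorizer dv = new DictionaryVectorizer(
                Arrays.asList("mahout", "vector", "text", "hadoop"));
        // prints "[0.0, 1.0, 1.0, 0.0]"
        System.out.println(Arrays.toString(dv.vectorize("Create a vector from text")));
    }
}
```

The weakness Ted points out next is visible here: any word missing from the list contributes nothing, and the vector grows with the dictionary.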

If you don't want to do that either because you don't know all the words in
advance or because the number of words is too large, you can use
a TextValueEncoder to do the deed.  There is sample code for this in the
Mahout in Action source, and Chapter 14 of Mahout in Action discusses that
code.  You can get the code from http://github.com/tdunning/MiA
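A simplified sketch of the idea behind a hashed encoder such as TextValueEncoder (the "hashing trick"): each token is hashed straight to a position in a fixed-size vector, so no dictionary is needed and unseen words still land somewhere. This is plain Java and deliberately not Mahout's implementation; Mahout's encoders use their own hashing (MurmurHash-based), can probe several positions per feature, and write into a Mahout Vector. The class name, the FNV-1a hash, and the size of 1000 are assumptions for illustration; see Chapter 14 and the MiA sample code for real TextValueEncoder usage.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper illustrating feature hashing for text.
public class HashedTextVectorizer {

    private final int cardinality; // fixed vector size, independent of vocabulary

    public HashedTextVectorizer(int cardinality) {
        this.cardinality = cardinality;
    }

    // Naive tokenizer (an assumption): lower-case, split on non-letter/non-digit runs.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    // 32-bit FNV-1a over the token's UTF-8 bytes (stand-in for Mahout's hash).
    static int fnv1a(String token) {
        int h = 0x811c9dc5;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Adds 1 at the hashed position of every token; distinct words may collide.
    public double[] vectorize(String text) {
        double[] v = new double[cardinality];
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue; // leading separators produce an empty first token
            }
            int idx = Math.floorMod(fnv1a(token), cardinality);
            v[idx] += 1.0;
        }
        return v;
    }

    public static void main(String[] args) {
        HashedTextVectorizer hv = new HashedTextVectorizer(1000);
        double[] v = hv.vectorize("the quick brown fox jumps over the lazy dog");
        // prints "9.0": one count per token, whatever positions they hash to
        System.out.println(Arrays.stream(v).sum());
    }
}
```

The trade-off is that the vector size is chosen up front and occasional collisions are accepted in exchange for never having to build or ship a dictionary, which suits text that arrives at runtime.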
