Posted to user@mahout.apache.org by Daniele Volpi <da...@gmail.com> on 2011/12/13 18:29:07 UTC

SequenceFile cast problems

Hi everyone,
I'm trying to implement the Naive Bayes classifier through the
TrainNaiveBayesJob class.
After converting the text files into the SequenceFile required by the "run"
method, using the seqdirectory program, I get this error:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
org.apache.mahout.math.VectorWritable

Do you have some hints on the right usage of this class?

Thanks,
Daniele Volpi

Re: Status naive bayes [Was: Re: SequenceFile cast problems]

Posted by Grant Ingersoll <gs...@apache.org>.
I asked the same question a few months back and got no reply.  I think the goal is to move to the vector-based version, but I'm not convinced that the new one is totally correct yet.  Robin was the primary author of both, but I haven't heard more from him on it.


On Dec 21, 2011, at 1:50 PM, Isabel Drost wrote:

> On 14.12.2011 Grant Ingersoll wrote:
>> While Ted answered the Dissector question, your original issue, I believe,
>> is that Mahout currently has two different NB implementations. 
>> trainclassifier/testclassifier use the old, word based package which
>> requires Text as input.  The new package, which TrainNaiveBayesJob uses,
>> requires VectorWritables.
> 
> While reading that thread it occurred to me that this is sort of confusing for 
> users. What is the reason for keeping both implementations? Would it make sense 
> to keep only the vector-based version?
> 
> 
> Isabel

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Status naive bayes [Was: Re: SequenceFile cast problems]

Posted by Isabel Drost <is...@apache.org>.
On 14.12.2011 Grant Ingersoll wrote:
> While Ted answered the Dissector question, your original issue, I believe,
> is that Mahout currently has two different NB implementations. 
> trainclassifier/testclassifier use the old, word based package which
> requires Text as input.  The new package, which TrainNaiveBayesJob uses,
> requires VectorWritables.

While reading that thread it occurred to me that this is sort of confusing for 
users. What is the reason for keeping both implementations? Would it make sense 
to keep only the vector-based version?


Isabel

Re: SequenceFile cast problems

Posted by Grant Ingersoll <gs...@apache.org>.
I believe it is supposed to, at least at the high level.  We don't have any 1-1 tests, so YMMV.


On Dec 17, 2011, at 8:26 PM, Lance Norskog wrote:

> Does the new approach do the same thing as the old approach?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com






Re: SequenceFile cast problems

Posted by Lance Norskog <go...@gmail.com>.
Does the new approach do the same thing as the old approach?

On Thu, Dec 15, 2011 at 1:56 AM, Daniele Volpi <da...@gmail.com> wrote:
> Yes Grant that was the point of my first question..
> Now I'll take a look at the vector implementation.
> Thanks again
> Daniele



-- 
Lance Norskog
goksron@gmail.com

Re: SequenceFile cast problems

Posted by Daniele Volpi <da...@gmail.com>.
Yes Grant, that was the point of my first question.
Now I'll take a look at the vector implementation.
Thanks again
Daniele

On 14 December 2011 23:44, Grant Ingersoll <gs...@apache.org> wrote:
> While Ted answered the Dissector question, your original issue, I believe, is that Mahout currently has two different NB implementations.  trainclassifier/testclassifier use the old, word based package which requires Text as input.  The new package, which TrainNaiveBayesJob uses, requires VectorWritables.  For the latter case, you don't use the BayesFileFormatter at all.  See the asf-email-examples for how to use the Vector based approach.  I realize this is confusing, but we haven't yet made the transition fully to the new vector based approach.
>
> -Grant

Re: SequenceFile cast problems

Posted by Grant Ingersoll <gs...@apache.org>.
While Ted answered the Dissector question, your original issue, I believe, is that Mahout currently has two different NB implementations.  trainclassifier/testclassifier use the old, word-based package, which requires Text as input.  The new package, which TrainNaiveBayesJob uses, requires VectorWritables.  For the latter case, you don't use the BayesFileFormatter at all.  See the asf-email-examples for how to use the vector-based approach.  I realize this is confusing, but we haven't yet made the transition fully to the new vector-based approach.

-Grant
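
The mismatch described above can be reproduced without Hadoop or Mahout. The following is a minimal sketch, not the actual Mahout code: TextValue and VectorValue are hypothetical stand-ins for org.apache.hadoop.io.Text and org.apache.mahout.math.VectorWritable, and VectorMapper is a hypothetical stand-in for a mapper that is declared to consume vectors. It shows why the failure only appears at run time: the reader hands back whatever value type was written to the file, and the cast to the mapper's declared type fails when the file holds text.

```java
public class CastMismatchDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Text (not the real class).
    static class TextValue {
        final String content;
        TextValue(String content) { this.content = content; }
    }

    // Hypothetical stand-in for org.apache.mahout.math.VectorWritable (not the real class).
    static class VectorValue {
        final double[] weights;
        VectorValue(double[] weights) { this.weights = weights; }
    }

    // Hypothetical mapper that, like a vector-based trainer, only accepts vectors.
    static class VectorMapper {
        double sum(VectorValue value) {
            double total = 0.0;
            for (double w : value.weights) {
                total += w;
            }
            return total;
        }
    }

    // Simulates a record reader: the concrete value type depends on how the
    // file was written, so the static type here is just Object.
    static Object readRecord(boolean vectorized) {
        if (vectorized) {
            return new VectorValue(new double[] {1.0, 2.0});
        }
        return new TextValue("some terms");
    }

    // Simulates the framework casting each value to the mapper's declared type.
    public static String train(boolean vectorized) {
        Object value = readRecord(vectorized);
        try {
            double total = new VectorMapper().sum((VectorValue) value);
            return "ok:" + total;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(train(true));   // vector input is accepted
        System.out.println(train(false));  // text input fails at the cast
    }
}
```

In this sketch the fix is on the input side: the records have to be written as vectors before training, which mirrors the advice to skip BayesFileFormatter and follow the vector-based examples.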

On Dec 14, 2011, at 3:01 AM, Daniele Volpi wrote:

> The version is 0.6-SNAPSHOT
> From terminal both commands trainclassifier and testclassifier work.
> Actually my real purpose is to use the TrainNaiveBayesJob in order to
> obtain a StandardNaiveBayesClassifier that i can use with the
> ModelDissector class similar to chapter 15 in Mahout In Action, maybe the
> procedure is completely wrong.
> Thank you

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: SequenceFile cast problems

Posted by Daniele Volpi <da...@gmail.com>.
OK, I was thinking I could easily use the ModelDissector class, because it
requires an AbstractVectorClassifier and the
StandardNaiveBayesClassifier in the naivebayes package extends that
class.

On 14 December 2011 14:42, Ted Dunning <te...@gmail.com> wrote:
>
> I think that using the model dissector with NaiveBayes will not work
> easily.  The assumption inside the model dissector is that there is a model
> matrix compatible with logistic regression to be had.
>
> The easy way to get everything to work is to simply use a single
> categorical variable that can have four values.  Pretend this variable is
> text.  If you use hashed vector encoding, you should be able to continue,
> but you really need to use StaticWordEncoder (name is approximate).
>
> Also, with a tiny example, NB will give unreasonably pessimistic results.

Re: SequenceFile cast problems

Posted by Ted Dunning <te...@gmail.com>.
I think that using the model dissector with NaiveBayes will not work
easily.  The assumption inside the model dissector is that there is a model
matrix compatible with logistic regression to be had.

The easy way to get everything to work is to simply use a single
categorical variable that can have four values.  Pretend this variable is
text.  If you use hashed vector encoding, you should be able to continue,
but you really need to use StaticWordEncoder (name is approximate).

Also, with a tiny example, NB will give unreasonably pessimistic results.
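
As a rough illustration of the hashed vector encoding mentioned above, here is a minimal plain-Java sketch. It does not use Mahout's encoder classes; the class name HashedEncodingSketch, the FNV-1a hash, and the vector size of 16 are all assumptions chosen for the example. Each word (or categorical value treated as a word) is hashed to a slot in a fixed-size array, and the slot is incremented.

```java
import java.nio.charset.StandardCharsets;

public class HashedEncodingSketch {
    // Maps a word to a slot in [0, size) using a 32-bit FNV-1a hash.
    // This hash is illustrative only; it is not the one Mahout's encoders use.
    public static int slot(String word, int size) {
        int hash = 0x811c9dc5;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return Math.floorMod(hash, size);
    }

    // Builds a fixed-size count vector; distinct words may collide in one slot.
    public static double[] encode(String[] words, int size) {
        double[] vector = new double[size];
        for (String word : words) {
            vector[slot(word, size)] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] vector = encode("free offer free".split(" "), 16);
        double total = 0.0;
        for (double x : vector) {
            total += x;
        }
        // Three words were added, so the counts sum to 3.0.
        System.out.println(total);
    }
}
```

The trade-off in this kind of encoding is that no dictionary is needed and the vector length is fixed up front, at the cost of occasional collisions between unrelated words.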

On Wed, Dec 14, 2011 at 6:01 AM, Daniele Volpi <da...@gmail.com> wrote:

> The version is 0.6-SNAPSHOT
> From terminal both commands trainclassifier and testclassifier work.
> Actually my real purpose is to use the TrainNaiveBayesJob in order to
> obtain a StandardNaiveBayesClassifier that i can use with the
> ModelDissector class similar to chapter 15 in Mahout In Action, maybe the
> procedure is completely wrong.
> Thank you

Re: SequenceFile cast problems

Posted by Daniele Volpi <da...@gmail.com>.
The version is 0.6-SNAPSHOT.
From the terminal, both commands trainclassifier and testclassifier work.
Actually, my real purpose is to use the TrainNaiveBayesJob in order to
obtain a StandardNaiveBayesClassifier that I can use with the
ModelDissector class, similar to chapter 15 in Mahout In Action. Maybe the
procedure is completely wrong.
Thank you


On 14 December 2011 01:24, Ted Dunning <te...@gmail.com> wrote:

> Which version of Mahout?
>
> And what happens when you train the classifier from the command line?

Re: SequenceFile cast problems

Posted by Ted Dunning <te...@gmail.com>.
Which version of Mahout?

And what happens when you train the classifier from the command line?

On Tue, Dec 13, 2011 at 2:27 PM, Daniele Volpi <da...@gmail.com> wrote:

> First of all i've converted the train files in the format:
> target[\t]terms
> through the BayesFileFormatter class.
> Then i've converted these files (one per category) in SequenceFile using
> the seqdirectory program.
> After that I ran this code:
>
> TrainNaiveBayesJob trainer = new TrainNaiveBayesJob();
> trainer.setConf(new Configuration());
>
> String[] params = {"-i" + inputPath, "-o" + outputPath, "-ow", "-el"};
> trainer.run(params);
>
> Here's the error message:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.mahout.math.VectorWritable
> at
>
> org.apache.mahout.classifier.naivebayes.training.IndexInstancesMapper.map(IndexInstancesMapper.java:1)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Re: SequenceFile cast problems

Posted by Daniele Volpi <da...@gmail.com>.
First of all, I converted the training files into the format:
target[\t]terms
using the BayesFileFormatter class.
Then I converted these files (one per category) into a SequenceFile using
the seqdirectory program.
After that I ran this code:

TrainNaiveBayesJob trainer = new TrainNaiveBayesJob();
trainer.setConf(new Configuration());

String[] params = {"-i" + inputPath, "-o" + outputPath, "-ow", "-el"};
trainer.run(params);

Here's the error message:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.classifier.naivebayes.training.IndexInstancesMapper.map(IndexInstancesMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

On 13 December 2011 19:52, Grant Ingersoll <gs...@apache.org> wrote:

> What steps have you done?

Re: SequenceFile cast problems

Posted by Grant Ingersoll <gs...@apache.org>.
What steps have you done?

On Dec 13, 2011, at 12:29 PM, Daniele Volpi wrote:

> Hi everyone,
> I'm trying to implement the Naive Bayes classifier through the
> TrainNaiveBayesJob class.
> After converting the text files into the SequenceFile required by the "run"
> method, using the seqdirectory program, I get this error:
> 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.mahout.math.VectorWritable
> 
> Do you have some hints on the right usage of this class?
> 
> Thanks,
> Daniele Volpi

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com