Posted to user@mahout.apache.org by jung hoon sohn <js...@gmail.com> on 2012/09/15 09:29:30 UTC
KmeansDriver Question
Hello, I am trying to cluster input data using KMeansDriver.
The input vectors were produced from a Lucene index with the
"bin/mahout lucene.vector ..." command, and when I run
KMeansDriver via its run method, I get:
12/09/15 15:18:13 INFO mapred.JobClient: Task Id : attempt_201209121951_0067_m_000000_1, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
        at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.map(SequenceFileTokenizerMapper.java:37)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
This failure repeats for several task attempts, but the job keeps running and generates output data.
I can even run clusterdump on the output cluster data; however, I am
concerned about the effect of the above errors.
Please help me get past this problem.
Thanks.
Jung Hoon
Re: KmeansDriver Question
Posted by Paritosh Ranjan <pr...@xebia.com>.
AFAIK, SequenceFileTokenizerMapper is not called from KMeansDriver.
That mapper tokenizes sequence files, so the error probably occurred during
that step.
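For what it's worth, the exception itself is just Java's generic type-mismatch failure: the mapper's map() signature declares Text keys, but the SequenceFile reader hands it the LongWritable keys the file was actually written with. Stripped of the Hadoop types (the Long and String below are hypothetical stand-ins, not Mahout code), the failure mode looks like this:

```java
public class CastDemo {
    public static void main(String[] args) {
        // A SequenceFile reader returns whatever key class the file was
        // written with; a boxed Long stands in for LongWritable here.
        Object key = 42L;
        try {
            // The mapper casts the key to its declared type; String stands
            // in for org.apache.hadoop.io.Text. The cast compiles but fails
            // at runtime because the object really is a Long.
            String text = (String) key;
            System.out.println(text);
        } catch (ClassCastException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

So the fix belongs on the data side: whatever file is fed to the tokenizer has to have been written with Text keys in the first place.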
On 17-09-2012 12:39, jung hoon sohn wrote:
> Thank you for the reply.
> However the error was thrown during the process of the map (
> org.apache.hadoop.mapreduce.Mapper.run).
> Isn't the mapping function part of the KmeansDriver class?
>
> Thank You.
>
> Jung Hoon
Re: KmeansDriver Question
Posted by jung hoon sohn <js...@gmail.com>.
Thank you for the reply.
However, the error was thrown during the map phase
(org.apache.hadoop.mapreduce.Mapper.run).
Isn't the mapping function part of the KMeansDriver class?
Thank You.
Jung Hoon
On Sat, Sep 15, 2012 at 5:48 PM, Paritosh Ranjan <pr...@xebia.com> wrote:
> I don't think that it is a kmeans driver error.
> SequenceFileTokenizerMapper is not used in KmeansDriver. I think you are
> getting error while transforming data.
Re: KmeansDriver Question
Posted by Paritosh Ranjan <pr...@xebia.com>.
I don't think this is a KMeansDriver error.
SequenceFileTokenizerMapper is not used in KMeansDriver; I think you are
getting the error while transforming the data.
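If the tokenizer really was fed a LongWritable-keyed sequence file, one workaround is a rewrite pass that turns each key into text before tokenizing. This is sketched below with plain-Java stand-ins for the Writable types (a real pass would be a small Hadoop job whose mapper emits new Text(key.toString()) as the key); the class and method names are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RekeyToText {
    // Rewrite numeric (LongWritable-like) keys as strings (Text-like) so a
    // Text-keyed mapper such as SequenceFileTokenizerMapper can consume the
    // records. Long and String stand in for LongWritable and Text.
    static Map<String, String> rekey(Map<Long, String> records) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<Long, String> e : records.entrySet()) {
            out.put(Long.toString(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> in = new LinkedHashMap<>();
        in.put(1L, "first document text");
        in.put(2L, "second document text");
        // Keys are now strings, matching what a Text-keyed mapper expects.
        System.out.println(rekey(in));
    }
}
```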
On 15-09-2012 12:59, jung hoon sohn wrote: