Posted to user@mahout.apache.org by Christopher Schindler <id...@hotmail.com> on 2013/11/17 20:57:45 UTC

createTermFrequencyVectors, Hadoop, cast error

Hi,

After proving out the FuzzyKMeans clustering workflow on the CLI, I'm now moving it into a Java app.

I'm running into an issue I can't seem to get past. 

Error I'm getting: 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.common.StringTuple
    at org.apache.mahout.vectorizer.collocations.llr.CollocMapper.map(CollocMapper.java:41)
...

I understand the type mismatch being reported, but does anyone have insight into the fix? Also, I'm not explicitly opening an FSDataOutputStream, because I believe the Path parameters passed to the Mahout method handle writing the output.


Here's how I'm calling the method:

<snip>
String luceneSequenceFile = "hdfs://<server>:50070/opt/mahout/lucene-seq/index";
String outputDir = "hdfs://<server>:50070/opt/mahout/fkmeans-newsClusters";
String vectorsOutput = "hdfs://<server>:50070/opt/mahout/fkmeans-newsVectorsOutput";

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

DictionaryVectorizer.createTermFrequencyVectors(
    new Path(luceneSequenceFile),
    new Path(outputDir),
    vectorsOutput,
    conf,
    minSupport,
    maxNGramSize,
    minLLRValue,
    normPower,
    true,
    reduceTasks,
    chunkSize,
    sequentialAccessOutput,
    false);
</snip>
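
To spell out how I read the type issue in the trace above, here is a dependency-free sketch. FakeText, FakeStringTuple, TupleMapper and feed are hypothetical stand-ins I made up for illustration; they are not the Hadoop or Mahout classes. The assumption is that the mapper is declared for tuple values while the records it is handed hold plain text, so nothing fails until the first record is cast at runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class CastDemo {
    // Hypothetical stand-ins for org.apache.hadoop.io.Text and
    // org.apache.mahout.common.StringTuple; not the real classes.
    static class FakeText {
        final String value;
        FakeText(String value) { this.value = value; }
    }

    static class FakeStringTuple {
        final List<String> tokens = new ArrayList<>();
        FakeStringTuple(String... items) {
            for (String item : items) {
                tokens.add(item);
            }
        }
    }

    // A mapper declared for exactly one value type.
    interface Mapper<V> {
        int map(V value);
    }

    // Expects already-tokenized input and counts the tokens.
    static class TupleMapper implements Mapper<FakeStringTuple> {
        @Override
        public int map(FakeStringTuple value) {
            return value.tokens.size();
        }
    }

    // Records arrive as plain objects and reach the mapper through a raw
    // reference, so the compiler cannot check the value type. The cast
    // happens inside the mapper's bridge method at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String feed(Mapper mapper, Object record) {
        try {
            return "mapped " + mapper.map(record) + " tokens";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        Mapper<FakeStringTuple> mapper = new TupleMapper();
        // Value type matches what the mapper was declared for.
        System.out.println(feed(mapper, new FakeStringTuple("fuzzy", "kmeans")));
        // Value type is plain text, so the runtime cast fails.
        System.out.println(feed(mapper, new FakeText("fuzzy kmeans")));
    }
}
```

If that reading is right, the first call prints "mapped 2 tokens" and the second prints "ClassCastException", which would point at the value type stored in my input sequence file rather than at the output paths.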

TIA,
Chris