Posted to user@mahout.apache.org by Jayani Withanawasam <ja...@gmail.com> on 2015/03/24 11:12:00 UTC

Error in TF-IDF vector creation - java.lang.IllegalStateException

Hi,

I'm trying to get text classification working in Mahout 1.0 on Hadoop in fully
distributed mode (Ubuntu 12.04, Hadoop 2.6).

In that setup, I get the following error during TF vector creation.

Command:
mahout seq2sparse -i 20news-seq -o 20news-vectors -lnorm -nv -wt tfidf

Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.vectorizer.common.PartialVectorMerger.mergePartialVectors(PartialVectorMerger.java:131)
    at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:206)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:274)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

Here is what had been generated in the 20news-vectors directory in HDFS by
the time the error occurred:

drwxr-xr-x   - hduser supergroup          0 2015-03-24 08:04 20news-vectors/df-count
-rw-r--r--   2 hduser supergroup    1937084 2015-03-24 09:23 20news-vectors/dictionary.file-0
drwxr-xr-x   - hduser supergroup          0 2015-03-24 09:24 20news-vectors/partial-vectors-0
drwxr-xr-x   - hduser supergroup          0 2015-03-24 09:26 20news-vectors/tf-vectors-toprune
drwxr-xr-x   - hduser supergroup          0 2015-03-24 09:22 20news-vectors/tokenized-documents
drwxr-xr-x   - hduser supergroup          0 2015-03-24 09:23 20news-vectors/wordcount

I tried the same command with Hadoop in pseudo-distributed mode and did not
encounter any issues.

Any help or clue to resolve this issue would be much appreciated.

Thank you in advance!
Jayani