Posted to user@mahout.apache.org by Sahitya ML <ml...@gmail.com> on 2013/03/26 16:33:26 UTC

Term vectors not found error

Hello,
  I am new to Mahout and Solr, and I am trying to generate vectors for the
documents stored in Solr. Though I am able to search the documents in Solr, I
get the following error while running lucene.vector. I would very
much appreciate the help. I have attached the xml file used for indexing in
Solr. I am using Mahout 0.6.

Thanks
Sahitya

newscontext@newscontext-VirtualBox:~/mahout$ bin/mahout lucene.vector --dir
/home/newscontext/apache-solr-3.3.0/example/solr/data/index --output
tmp/part-out.vec --field Keywords --dictOut tmp/dict.out --norm 2
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
no HADOOP_HOME set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/newscontext/mahout/examples/target/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/newscontext/mahout/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/newscontext/mahout/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
13/03/26 13:03:27 INFO lucene.Driver: Output File: tmp/part-out.vec
13/03/26 13:03:29 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/03/26 13:03:29 INFO compress.CodecPool: Got brand-new compressor
13/03/26 13:03:29 ERROR lucene.LuceneIterator: There are too many documents
that do not have a term vector for Keywords
Exception in thread "main" java.lang.IllegalStateException: There are too
many documents that do not have a term vector for Keywords
    at
org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
    at
org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
    at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at
org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
    at
org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:109)
    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Re: Term vectors not found error

Posted by Suneel Marthi <su...@yahoo.com>.
Without digging deep into this issue, I would first suggest upgrading to Mahout 0.7 or later (ideally the trunk).
However, you seem to be using Solr 3.3, and the trunk is built against Lucene 4.2, so it may not be able to read your index.

I would start by trying Mahout 0.7.
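
For what it is worth, the error itself ("There are too many documents that do not have a term vector for Keywords") usually means the field was indexed without term vectors. I have not seen your attached schema, so this is only a guess, but the Keywords field definition in schema.xml would typically need term vectors enabled along these lines:

  <!-- hypothetical field definition; the field type is an assumption,
       use whatever analyzed text type your schema already defines -->
  <field name="Keywords" type="text_general" indexed="true" stored="true"
         termVectors="true"/>

After changing the schema you would need to re-index the documents before lucene.vector can find term vectors for that field.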


