Posted to user@mahout.apache.org by OldSkoolMark <ma...@sisa.samsung.com> on 2011/10/20 20:17:45 UTC

Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"

I’m having some trouble getting this to work with my own data. I issue the
following command:

mahout lucene.vector --dir
/home/markr/shgs/apache-solr-3.4.0/example/solr/data/index/ --output
/tmp/part-out.vec --field content_encoded --idField id --dictOut /tmp/dict.out
--norm 2

My intent is to generate term vectors for the content_encoded field, whose
schema.xml entry has the termVectors="true" attribute setting. There is also
a field named 'id'. My data was imported into a sqlite3 db, and id is 'not
null', but content_encoded may be null. When I run, I get the SLF4J multiple
binding warning (just a warning?), and then the following exception:

Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:351)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:72)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:114)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92)
at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:288)
at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:84)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Advice on how to debug this problem would be greatly appreciated.

Mark


--
View this message in context: http://lucene.472066.n3.nabble.com/Exception-in-thread-main-org-apache-lucene-index-CorruptIndexException-unrecognized-format-3-in-file-tp3438539p3438539.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"

Posted by Grant Ingersoll <gs...@apache.org>.
I'll update us to 3.4 anyway, since 3.3 has a nasty bug in it.

On Oct 26, 2011, at 2:47 AM, Lance Norskog wrote:

> Yup! The top-level pom.xml file in mahout gives Lucene 3.3 as the
> requested version. You need to either:
> 1) back-rev your Solr to 3.3, OR
> 2) upgrade your Mahout to use Lucene 3.4 instead of 3.3. You may have
> to make some simple changes in the Mahout code which uses Lucene to
> match any API changes, but there should not be any real differences.
> 
> 
> On 10/24/11, Isabel Drost <is...@apache.org> wrote:
>> On 20.10.2011 OldSkoolMark wrote:
>>> Exception in thread "main" org.apache.lucene.index.CorruptIndexException:
>>> unrecognized format -3 in file "_b.fnm"
>> 
>> Not having much experience with Lucene this looks like you are trying to
>> read the index with Lucene in a version that is older than the one the
>> index was created with?
>> 
>> 
>> Isabel
>> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"

Posted by Lance Norskog <go...@gmail.com>.
Yup! The top-level pom.xml file in mahout gives Lucene 3.3 as the
requested version. You need to either:
1) back-rev your Solr to 3.3, OR
2) upgrade your Mahout to use Lucene 3.4 instead of 3.3. You may have
to make some simple changes in the Mahout code which uses Lucene to
match any API changes, but there should not be any real differences.


On 10/24/11, Isabel Drost <is...@apache.org> wrote:
> On 20.10.2011 OldSkoolMark wrote:
>> Exception in thread "main" org.apache.lucene.index.CorruptIndexException:
>> unrecognized format -3 in file "_b.fnm"
>
> Not having much experience with Lucene this looks like you are trying to
> read the index with Lucene in a version that is older than the one the
> index was created with?
>
>
> Isabel
>


-- 
Lance Norskog
goksron@gmail.com

Re: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"

Posted by Isabel Drost <is...@apache.org>.
On 20.10.2011 OldSkoolMark wrote:
> Exception in thread "main" org.apache.lucene.index.CorruptIndexException:
> unrecognized format -3 in file "_b.fnm"

Not having much experience with Lucene this looks like you are trying to read 
the index with Lucene in a version that is older than the one the index was 
created with?


Isabel
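
For anyone who wants to confirm the version mismatch discussed in this thread before rebuilding anything, here is a minimal sketch in plain Java (no Lucene on the classpath). It rests on two assumptions that the thread itself does not state: that in Lucene 3.x the .fnm file starts with a variable-length int (VInt) holding the format marker, and that the segment files sit unpacked in the index directory rather than inside a compound .cfs file. The class name FnmFormatPeek is made up for this illustration; it is not part of Lucene or Mahout.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of Lucene or Mahout. */
final class FnmFormatPeek {

    /**
     * Decodes one variable-length int: 7 data bits per byte, low-order
     * group first, with the high bit set when more bytes follow.
     */
    static int readVInt(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a VInt");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: this was the last byte
            }
        }
        throw new IOException("VInt is longer than 5 bytes");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FnmFormatPeek <path to .fnm file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            // Assumption: the first VInt of the file is the format marker.
            System.out.println("first VInt in " + args[0] + ": " + readVInt(in));
        }
    }
}
```

If the assumptions hold, running it against the _b.fnm file in the index directory should print the same number that the exception reports. A value that the Lucene version bundled with Mahout does not know would then point to the index having been written by a newer Lucene, as Isabel and Lance suggest.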