Posted to user@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/01/15 14:22:19 UTC

Re: CorruptIndexException or NullPointerException when creating vectors from Lucene

Right, Mahout is currently on Lucene 2.9.  We should upgrade.
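
One way to confirm which Lucene jar the vector driver actually picks up is to ask the JVM where it loads a Lucene class from. The following is only a minimal sketch, not part of Mahout: the class name LuceneJarCheck and the helper describe are made up for illustration, and the manifest version is only printed if the jar records one.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Mahout or Lucene): reports which jar a
// class is loaded from, so the Lucene version on the classpath can be checked.
public class LuceneJarCheck {

    /** Returns a one-line description of where the named class comes from. */
    static String describe(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> c = Class.forName(className, false,
                    LuceneJarCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            Package pkg = c.getPackage();
            // Implementation-Version is read from the jar manifest, if present.
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className
                    + " loaded from "
                    + (src == null ? "(bootstrap/unknown)" : String.valueOf(src.getLocation()))
                    + ", manifest version "
                    + (version == null ? "(not recorded)" : version);
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // IndexReader is the class that appears in the stack trace below.
        System.out.println(describe("org.apache.lucene.index.IndexReader"));
    }
}
```

Run it with the same classpath that is used for org.apache.mahout.utils.vectors.lucene.Driver. If the reported jar is a 2.9 jar while the index was written with 3.0, that would match the "Incompatible format version" error.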

On Jan 15, 2010, at 1:01 AM, Shashikant Kore wrote:

> The first problem seems to be an index version incompatibility.
> 
> Since you created the index with Lucene 3.0, you will need the same
> version to read it. It seems that the version of Lucene used while
> creating the vectors is lower than that.  Can you check whether you
> are using the same Lucene jar when creating the vectors?
> 
> Not sure what the second problem is.
> 
> --shashi
> 
> On Fri, Jan 15, 2010 at 11:11 AM, Rob Ennals <ro...@gmail.com> wrote:
>> Hi Guys,
>> 
>> I'm totally new to Mahout so I'm running into what I expect are newbie issues.
>> 
>> To get started with clustering, I tried importing some indexes from Lucene.
>> 
>> Following the Lucene tutorial, I created a really simple index of the
>> Lucene source code:
>> http://lucene.apache.org/java/3_0_0/demo.html
>> 
>> I then tried to convert this to a Mahout Vector, as per
>> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>> 
>> This gives me a CorruptIndexException:
>> 
>> rob@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/Reference/Installers/lucene-3.0.0/index --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Exception in thread "main"
>> org.apache.lucene.index.CorruptIndexException: Incompatible format
>> version: 2 expected 1 or lower
>>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
>>        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>>        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
>>        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
>>        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
>>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:140)
>> 
>> 
>> I also tried running the driver on the actual Lucene index that I want
>> to apply it to, and this time got a NullPointerException:
>> 
>> rob@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/git/thinklink/scala/bin/index/ --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Jan 14, 2010 9:40:40 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Output File: /home/rob/test/output
>> Exception in thread "main" java.lang.NullPointerException
>>        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>>        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>>        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>> 
>> 
>> In both cases, the indexes should have the "contents" field.
>> 
>> 
>> I assume I'm doing something stupid here. If someone can tell me what
>> that is, then that would be great.
>> 
>> 
>> Thanks
>> 
>> -Rob
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

Re: CorruptIndexException or NullPointerException when creating vectors from Lucene

Posted by Isabel Drost <is...@apache.org>.

On Fri Grant Ingersoll <gs...@apache.org> wrote:

> Right, Mahout is currently on Lucene 2.9.  We should upgrade.

Apart from the issues to be fixed in MAHOUT-246 - is there anything else
that would block upgrading?

Isabel