Posted to user@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/01/15 14:22:19 UTC
Re: CorruptIndexException or NullPointerException when creating vectors from Lucene
Right, Mahout is currently on Lucene 2.9. We should upgrade.
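One quick way to confirm the mismatch is to ask the JVM which jar it actually loads the Lucene classes from, using the same classpath as the Driver run. The following is only a minimal sketch: the class name WhichLucene and the helper locate are made up for illustration and are not part of Mahout or Lucene. It relies on nothing beyond the standard Java reflection API.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Mahout or Lucene): reports where a class is loaded from.
final class WhichLucene {

    /** Returns the location a class was loaded from, or a short note if that cannot be determined. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers in the target class do not run.
            Class<?> c = Class.forName(className, false, WhichLucene.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes usually have no code source.
            return src == null ? "(bootstrap or unknown source)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the Lucene class that appears in the stack trace; any class name can be passed instead.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexReader";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is a Lucene 2.9 jar while the index was written with 3.0, that would be consistent with the "Incompatible format version" error quoted below.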
On Jan 15, 2010, at 1:01 AM, Shashikant Kore wrote:
> The first problem seems to be index version incompatibility.
>
> Since you created the index with Lucene 3.0, you will need the same
> version to read the index. It seems that while creating the vectors, the
> version of Lucene is lower than that. Can you check whether you are using
> the same Lucene jar while creating the vectors?
>
> Not sure what the second problem is.
>
> --shashi
>
> On Fri, Jan 15, 2010 at 11:11 AM, Rob Ennals <ro...@gmail.com> wrote:
>> Hi Guys,
>>
>> I'm totally new to Mahout so I'm running into what I expect are newbie issues.
>>
>> To get started with clustering, I tried importing some indexes from Lucene.
>>
>> Following the Lucene tutorial, I created a really simple index of the
>> Lucene source code:
>> http://lucene.apache.org/java/3_0_0/demo.html
>>
>> I then tried to convert this to Mahout vectors, as per
>> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>>
>> This gives me a CorruptIndexException:
>>
>> rob@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/Reference/Installers/lucene-3.0.0/index --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Exception in thread "main"
>> org.apache.lucene.index.CorruptIndexException: Incompatible format
>> version: 2 expected 1 or lower
>> at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
>> at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>> at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
>> at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
>> at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
>> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
>> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:140)
>>
>>
>> I also tried running the driver on the actual Lucene index that I want
>> to apply it to, and this time got a NullPointerException:
>>
>> rob@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/git/thinklink/scala/bin/index/ --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Jan 14, 2010 9:40:40 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Output File: /home/rob/test/output
>> Exception in thread "main" java.lang.NullPointerException
>> at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>> at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>> at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
>> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
>> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
>> at org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
>> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>>
>>
>> In both cases, the indexes should have the "contents" field.
>>
>>
>> I assume I'm doing something stupid here. If someone can tell me what
>> that is, then that would be great.
>>
>>
>> Thanks
>>
>> -Rob
>>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: CorruptIndexException or NullPointerException when creating
vectors from Lucene
Posted by Isabel Drost <is...@apache.org>.
On Fri Grant Ingersoll <gs...@apache.org> wrote:
> Right, Mahout is currently on Lucene 2.9. We should upgrade.
Apart from the issues to be fixed in MAHOUT-246 - is there anything else
that would block upgrading?
Isabel