Posted to user@mahout.apache.org by "hailong.yang1115" <ha...@gmail.com> on 2011/05/24 09:20:17 UTC

Error converting Lucene index to vectors

Dear all,

I am using Mahout to convert a Lucene index into the vectors needed by the clustering algorithms. However, I get the following error:

[hailong@node125 benchmark]$ mahout lucene.vector --dir index/ --field body --dictOut ./dict.txt --output ./out.txt
Running on hadoop, using HADOOP_HOME=/home/hailong/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /home/hailong/hadoop-0.20.2/conf 
Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:249)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:202)
        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:157)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The Mahout version is 0.4 and the Lucene version is 3.1.0. Any help would be appreciated.
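The sketch below illustrates what a conversion like `lucene.vector` produces: a term dictionary (what `--dictOut` writes) plus one sparse vector per document. It does not use Mahout or Lucene, and it is not Mahout's implementation. The class and method names are hypothetical. It uses raw term counts, whereas real tools typically also apply a weighting such as TF-IDF.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, conceptual sketch (not Mahout code): builds a term dictionary
 * and one sparse term-frequency vector per tokenized document.
 */
public class TermVectorSketch {

    /** Assigns each distinct term a dense index, in first-seen order. */
    static Map<String, Integer> buildDictionary(List<List<String>> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                // dict.size() is evaluated before insertion, so new terms get 0, 1, 2, ...
                dict.putIfAbsent(term, dict.size());
            }
        }
        return dict;
    }

    /** Sparse vector for one document: dictionary index -> raw term count. */
    static Map<Integer, Integer> toVector(List<String> doc, Map<String, Integer> dict) {
        Map<Integer, Integer> vec = new TreeMap<>();
        for (String term : doc) {
            Integer idx = dict.get(term);
            if (idx != null) {
                vec.merge(idx, 1, Integer::sum);
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("mahout", "reads", "lucene"),
                Arrays.asList("lucene", "index", "lucene"));
        Map<String, Integer> dict = buildDictionary(docs);
        System.out.println(dict);
        for (List<String> doc : docs) {
            System.out.println(toVector(doc, dict));
        }
    }
}
```

With the two sample documents, the dictionary maps `mahout`, `reads`, `lucene` and `index` to 0 through 3. The second document becomes a vector with count 2 at index 2 and count 1 at index 3.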


Hailong

2011-05-24 



***********************************************
* Hailong Yang, PhD. Candidate 
* Sino-German Joint Software Institute, 
* School of Computer Science&Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@gmail.com
* Address: G413, New Main Building in Beihang University, 
*              No.37 XueYuan Road,HaiDian District, 
*              Beijing,P.R.China,100191
***********************************************

Re: Error converting Lucene index to vectors

Posted by Grant Ingersoll <gs...@apache.org>.
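Mahout 0.4 is built against an older version of Lucene, so it cannot read an index created by Lucene 3.1.0. The older reader does not recognize the segments format version (-11) that your Lucene 3.1.0 index was written with, which is why it reports a CorruptIndexException even though the index itself is not damaged. Try Mahout trunk, which has been upgraded to Lucene 3.1.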
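To check which format an index was written in before handing it to Mahout, you can read the first four bytes of the newest `segments_N` file. The sketch below does this, assuming a Lucene 2.x/3.x index. In those versions the file starts with a big-endian int format version, with more negative values meaning newer formats. Lucene 4.0 and later changed this header, so the check does not apply to them. The only concrete value used, -11, is the one reported in this thread for a Lucene 3.1.0 index. The class name is hypothetical.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical diagnostic (assumes a Lucene 2.x/3.x index): prints the format
 * version stored at the start of the newest segments_N file.
 */
public class SegmentsFormatCheck {

    // Format version reported in this thread for an index written by Lucene 3.1.0.
    static final int LUCENE_3_1_FORMAT = -11;

    /** Returns the segments_N name with the highest generation, or null if none. */
    static String latestSegmentsFile(String[] names) {
        String best = null;
        long bestGen = -1;
        for (String name : names) {
            // Skips "segments.gen" and unrelated index files.
            if (!name.startsWith("segments_")) {
                continue;
            }
            try {
                // The generation suffix is encoded in base 36 (Character.MAX_RADIX).
                long gen = Long.parseLong(name.substring("segments_".length()), Character.MAX_RADIX);
                if (gen > bestGen) {
                    bestGen = gen;
                    best = name;
                }
            } catch (NumberFormatException e) {
                // Not a commit file name; ignore it.
            }
        }
        return best;
    }

    /** Decodes the first four bytes as a big-endian int. */
    static int formatFromHeader(byte[] header) {
        return ByteBuffer.wrap(header, 0, 4).getInt();
    }

    /** True for the Lucene 3.1 format or a more negative (newer, pre-4.0) one. */
    static boolean isLucene31OrNewer(int format) {
        return format <= LUCENE_3_1_FORMAT;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "index");
        String[] names = dir.list();
        String seg = (names == null) ? null : latestSegmentsFile(names);
        if (seg == null) {
            System.err.println("No segments_N file found in " + dir);
            return;
        }
        byte[] header = new byte[4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(new File(dir, seg)))) {
            in.readFully(header);
        }
        int format = formatFromHeader(header);
        System.out.println(seg + ": format version " + format);
        if (isLucene31OrNewer(format)) {
            System.out.println("Written by Lucene 3.1 or later; an older Lucene, such as the one used by Mahout 0.4, cannot open it.");
        }
    }
}
```

Run it as `java SegmentsFormatCheck index/` against the same directory passed to `--dir`. For the index in this thread, it would be expected to print format version -11.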

-Grant

--------------------------
Grant Ingersoll