Posted to user@mahout.apache.org by "hailong.yang1115" <ha...@gmail.com> on 2011/05/24 09:20:17 UTC
Error convert lucene index to vectors
Dear all,
I am using Mahout to convert a Lucene index into the vectors needed by a clustering algorithm. However, I got the following error:
[hailong@node125 benchmark]$ mahout lucene.vector --dir index/ --field body --dictOut ./dict.txt --output ./out.txt
Running on hadoop, using HADOOP_HOME=/home/hailong/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /home/hailong/hadoop-0.20.2/conf
Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:249)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:202)
    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
The Mahout version is 0.4 and the Lucene version is 3.1.0. Any help would be appreciated.
Hailong
2011-05-24
***********************************************
* Hailong Yang, PhD. Candidate
* Sino-German Joint Software Institute,
* School of Computer Science&Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@gmail.com
* Address: G413, New Main Building in Beihang University,
* No.37 XueYuan Road,HaiDian District,
* Beijing,P.R.China,100191
***********************************************
Re: Error convert lucene index to vectors
Posted by Grant Ingersoll <gs...@apache.org>.
Mahout 0.4 uses an older version of Lucene, and so it won't be able to read an index created by Lucene 3.1.0. Try using trunk, which uses 3.1.
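To confirm which format an index was written in before choosing a Mahout build, one option is to look at the header of the newest segments_N file. The sketch below is only an illustration and not part of Mahout or Lucene: it assumes a pre-4.0 Lucene index, where the segments_N file begins with a 4-byte signed big-endian format number and N is the generation written in base 36. The function names newest_segments_file and read_format_version are made up for this example.

```python
import re
import struct
from pathlib import Path


def newest_segments_file(index_dir):
    """Return the segments_N file with the highest generation, or None."""
    best, best_gen = None, -1
    for path in Path(index_dir).iterdir():
        # Assumption: the generation suffix is base 36 (digits and a-z).
        match = re.fullmatch(r"segments_([0-9a-z]+)", path.name)
        if match:
            gen = int(match.group(1), 36)
            if gen > best_gen:
                best, best_gen = path, gen
    return best


def read_format_version(segments_path):
    """Read the first 4 bytes of the file as a signed big-endian int."""
    with open(segments_path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("segments file is too short to hold a format number")
    return struct.unpack(">i", header)[0]
```

Calling read_format_version(newest_segments_file("index/")) on the index from the report above should, under these assumptions, return the same -11 that appears in the exception message. A reader built against an older Lucene does not recognise that number, which is why it reports the index as corrupt even though the files are intact.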
-Grant
--------------------------
Grant Ingersoll