Posted to java-user@lucene.apache.org by varun sharma <me...@yahoo.co.in> on 2014/08/20 16:29:29 UTC

Re: Lucene index corruption on HDFS

Please do help here.

Thank you,
Varun.


On Tuesday, 15 July 2014 2:14 PM, varun sharma <me...@yahoo.co.in> wrote:
 


I am building my code using Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am trying to do:
Create Index
	1. Build the index in a RAMDirectory, based on data stored on HDFS.
	2. Once built, copy the index onto HDFS.
Search Index
	1. Bring the index stored on HDFS into a RAMDirectory.
	2. Perform a search on the in-memory index.
The error I am facing is:
Exception in thread "main" java.io.EOFException: read past EOF: RAMInputStream(name=segments_2)
    at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:94)
    at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:67)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:326)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at hdfs.SearchFiles.main(SearchFiles.java:85)
I did some research and found that this may be due to index corruption.
Below is my code.
Saving the index into HDFS:
// Getting files present in memory into an array.
String fileList[] = rdir.listAll();
// Reading index files from memory and storing them to HDFS.
for (int i = 0; i < fileList.length; i++) {
    IndexInput indxfile = rdir.openInput(fileList[i].trim(), null);
    long len = indxfile.length();
    int len1 = (int) len;
    // Reading data from file into a byte array.
    byte[] bytarr = new byte[len1];
    indxfile.readBytes(bytarr, 0, len1);
    // Creating file in HDFS directory with name same as that of
    // index file.
    Path src = new Path(indexPath + fileList[i].trim());
    dfs.createNewFile(src);
    // Writing data from byte array to the file in HDFS.
    FSDataOutputStream fs = dfs.create(
            new Path(dfs.getWorkingDirectory() + indexPath + fileList[i].trim()), true);
    fs.write(bytarr);
    fs.flush();
    fs.close();
}
FileSystem.closeAll();
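As an aside on the save loop above: casting the file length to int and buffering the whole file in one byte[] is fragile for large segment files. A chunked copy handles any length. The sketch below uses plain java.io streams so it stands alone; the same loop shape works with Lucene's IndexInput.readBytes on the reading side and Hadoop's FSDataOutputStream on the writing side (the 8 KB chunk size is an arbitrary choice, not anything from the original code).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    // Copies all bytes from in to out in fixed-size chunks and returns the
    // number of bytes copied. Never needs the whole file in memory, so there
    // is no (int) cast on a long file length.
    static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) { // read() may return fewer bytes than buf.length
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 8192);
        System.out.println(copied + " " + java.util.Arrays.equals(data, out.toByteArray()));
    }
}
```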
________________________________

Bringing the index from HDFS into a RAMDirectory and using it:
// Creating a RAMDirectory (memory) object, to be able to create the index
// in memory.
RAMDirectory rdir = new RAMDirectory();
// Getting the list of index files present in the directory into an array.
FSDataInputStream filereader = null;
for (int i = 0; i < status.length; i++) {
    // Reading data from index files on HDFS directory into the filereader
    // object.
    filereader = dfs.open(status[i].getPath());
    int size = filereader.available();
    // Reading data from file into a byte array.
    byte[] bytarr = new byte[size];
    filereader.read(bytarr, 0, size);
    // Creating file in RAM directory with names same as that of
    // index files present in HDFS directory.
    filenm = new String(status[i].getPath().toString());
    String sSplitValue = filenm.substring(57, filenm.length());
    System.out.println(sSplitValue);
    IndexOutput indxout = rdir.createOutput(sSplitValue, IOContext.DEFAULT);
    // Writing data from byte array to the file in RAM directory.
    indxout.writeBytes(bytarr, bytarr.length);
    indxout.flush();
    indxout.close();
}
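One likely pitfall in the read-back loop above, worth noting for anyone hitting the same EOFException: available() is only a hint about bytes readable without blocking and is not guaranteed to be the file length (FileStatus.getLen() gives the length), and a single read(bytarr, 0, size) call is not guaranteed to fill the buffer. Either can leave the copied file short, which would truncate segments_2 exactly as in the stack trace. A read-fully loop sidesteps the short-read half of the problem; Hadoop's FSDataInputStream also inherits readFully from DataInputStream. The sketch below uses plain java.io so it stands alone, and simulates short reads to show the loop working:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Reads exactly buf.length bytes from in, looping until the buffer is
    // full. A lone read() may legally return fewer bytes than requested,
    // so a single call can silently leave the buffer short.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) throw new EOFException("stream ended at " + off + " of " + buf.length);
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 7);
        // Wrap the stream so each read returns at most 100 bytes,
        // simulating the short reads a networked filesystem can produce.
        InputStream shortReads = new ByteArrayInputStream(data) {
            @Override public int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 100));
            }
        };
        byte[] buf = new byte[data.length];
        readFully(shortReads, buf);
        System.out.println(java.util.Arrays.equals(data, buf));
    }
}
```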