Posted to java-user@lucene.apache.org by Ri...@gxs.com on 2008/03/17 22:57:29 UTC
Huge number of Term objects in memory gives OutOfMemory error
I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64-bit Linux. We have fairly large collections (~1 GB collection files, ~1,000,000 documents). When I load test our application with 50 users, all doing simple searches via a web interface, we quickly get an OutOfMemory exception. When I do a jmap dump of the heap, this is what I see:
Size Count Class description
-------------------------------------------------------
195818576 4263822 char[]
190889608 13259 byte[]
172316640 4307916 java.lang.String
164813120 4120328 org.apache.lucene.index.TermInfo
131823104 4119472 org.apache.lucene.index.Term
37729184 604 org.apache.lucene.index.TermInfo[]
37729184 604 org.apache.lucene.index.Term[]
So 4 of the top 7 memory consumers are Term-related. We have 2 GB of RAM available on the system, but we get OOM errors no matter the Java heap settings. Has anyone seen this issue and know how to solve it?
We use separate MultiSearcher instances for each search. (We actually have 2 collections that we search via a MultiSearcher.) We tried using a singleton searcher instance, but our collections are constantly being updated, and a singleton searcher only sees the index as of the moment it was opened. Creating new searcher objects at search time gives up-to-the-minute results.
I've seen some postings referring to an index divisor setting that could reduce the number of Terms held in memory, but I have not seen how to set this value in Lucene.
Any help would be greatly appreciated.
Rich
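One common way out of this kind of OOM is to share a single searcher across all requests and rebuild it only when the index version actually changes, instead of constructing a new MultiSearcher per query (each of which loads its own copy of the term index). The sketch below is a generic, Lucene-free illustration of that pattern; the class and method names are hypothetical, and in Lucene 2.3 the version check would be something like IndexReader.isCurrent() or the static IndexReader.getCurrentVersion(path).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Caches an expensive resource (for example, an IndexSearcher) and
// rebuilds it only when the observed index version changes, so
// concurrent searches share one set of in-memory term structures.
final class VersionedCache<T> {
    private final LongSupplier version; // e.g. () -> IndexReader.getCurrentVersion(path)
    private final Supplier<T> factory;  // e.g. () -> open a new searcher
    private long cachedVersion = Long.MIN_VALUE;
    private T cached;

    VersionedCache(LongSupplier version, Supplier<T> factory) {
        this.version = version;
        this.factory = factory;
    }

    synchronized T get() {
        long v = version.getAsLong();
        if (cached == null || v != cachedVersion) {
            cached = factory.get(); // in real code, also close the old searcher
            cachedVersion = v;
        }
        return cached;
    }
}
```

Searches still see up-to-the-minute results after each index update, but the term dictionary is loaded once per index version rather than once per query.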
Re: Lucene 2.3.1 Index Corruption?
Posted by Jamie <ja...@stimulussoft.com>.
As a further followup:
The following files are located in the index:
ls /usr/local/index
_0.fnm _0.frq _0.nrm _0.prx _0.tii _0.tis _1.cfs indexinfo
_j.cfs segments.gen segments_s
This problem appears to be intermittent and has occurred on several
machines. Is there any incorrect way I could be using Lucene that
would cause this?
Jamie
Jamie wrote:
> Hi There
>
> I am getting the following error while searching a given index:
>
> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(Unknown Source)
>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>     at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
>     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>     at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>     at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>
> My software used to work perfectly under earlier versions of Lucene.
> Since I upgraded to 2.3.1, this problem has arisen.
>
> I'm seriously worried that my customers' indexes will be corrupted.
> Lucene expects to find a file that does not exist.
>
> Any ideas on what might be happening and how to rectify this?
>
> Jamie
>
>
--
Stimulus Software - MailArchiva
Email Archiving And Compliance
USA Tel: +1-713-366-8072 ext 3
UK Tel: +44-20-80991035 ext 3
Email: jamie@stimulussoft.com
Web: http://www.mailarchiva.com
To receive MailArchiva Enterprise Edition product announcements, send a message to: <ma...@stimulussoft.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene 2.3.1 Index Corruption?
Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, opening two writers at once is definitely a recipe for disaster.
Please post back on whether this does or doesn't resolve it.
Previous versions of Lucene didn't write the fdt/fdx files until a
segment was flushed, so it's possible you escaped index corruption
(but lost documents) before. With 2.3, though, Lucene has become more
sensitive to two writers at once.
Mike
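Within a single JVM, one way to make "two writers at once" structurally impossible is to hand out at most one writer handle per index directory. Below is a hedged, Lucene-free sketch of that idea (the class and method names are illustrative, not from the thread); Lucene's own write.lock still protects against writers opened from other processes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hands out at most one writer handle per index directory, so two
// threads can never open competing writers on the same index.
final class WriterRegistry<W> {
    private final ConcurrentHashMap<String, W> writers = new ConcurrentHashMap<>();
    private final Function<String, W> open; // e.g. dir -> new IndexWriter(dir, analyzer)

    WriterRegistry(Function<String, W> open) {
        this.open = open;
    }

    W writerFor(String dir) {
        // computeIfAbsent runs open() at most once per key, even under contention
        return writers.computeIfAbsent(dir, open);
    }
}
```

With this shape, forcefully deleting write.lock is never needed during normal operation, which removes the main way a second writer could silently clobber live segment files.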
Jamie wrote:
> Michael McCandless wrote:
>>
>> Yes fdt/fdx hold stored fields. When the first buffered document
>> is added these files are created.
>>
>> The only way they disappear (through Lucene's APIs) is if a writer
>> is opened on that directory, and, those files are not referenced
>> by the current segments file. This is why I'm concerned about the
>> "two writers at a time" risk. If a 2nd writer is opened while 1st
>> one is still open that would easily cause this issue, so triple
>> check that the messages you send to your logger on having to
>> remove the write.lock are definitely not happening when you hit
>> this corruption.
> I think you could be right. I am going to try the following change:
>
> public void indexMessage(Email email) throws MessageSearchException {
>     VolumeIndex volumeIndex;
>     synchronized (volumeIndexLock) { // note here
>         Volume volume = email.getEmailId().getVolume();
>         volumeIndex = volumeIndexes.get(volume);
>         if (volumeIndex == null) {
>             volumeIndex = new VolumeIndex(volume);
>             volumeIndexes.put(volume, volumeIndex);
>         }
>     }
>     // index outside the map lock, exactly once per message
>     volumeIndex.indexMessage(email);
> }
>
>>
>> Can you post the output of "ls -l" on the corrupted index directory?
>>
>> One more possibility is that this file failed to be created in the
>> first place, yet, IndexWriter flushed the remaining _0.* files. I
>> can see one code path that causes this, however, it only happens
>> if you open a new writer, you call addDocument, you hit an
>> exception specifically in the code trying to create the fdt file
>> (eg something like "too many open files"), then you close the
>> writer. I have a unit test showing this particular exception
>> would result in the _0.* files you see in your index with fdt/fdx
>> missing. Are you really sure you don't see any exceptions,
>> perhaps from very long ago, against this index, when calling
>> addDocument? If you are hitting this case, it's already been
>> fixed (this is LUCENE-1198) and backported to the 2.3 branch. Are
>> you able to checkout the current 2.3 branch and run your test
>> using the JAR from there?
>>
>> Since your index has much later segment files (_1.cfs, _j.cfs),
>> these exceptions could have happened quite a while back (many
>> writers ago) but then only detected when you finally opened a
>> searcher. So if possible, look way back in your error logs...
>>
>> Mike
>>
>> Jamie wrote:
>>
>>> Hi Michael
>>>
>>> I've tried to reindex the index several times and no such luck.
>>> I've enabled lucene debugging as you suggested and will let you
>>> know as soon as I have more information. From what I've read, fdt
>>> files are used to hold field data. Could there be any reason why
>>> this file is not being written? Does Lucene recreate this file
>>> every time from scratch? Why would the file completely disappear?
>>>
>>> Jamie
>>>
>>>
>>>
>>> Michael McCandless wrote:
>>>>
>>>> One more thing: try running with asserts enabled (java -ea).
>>>> Lucene has a number of assertions that may catch something sooner.
>>>>
>>>> Also: how often do you try to open a searcher? Can you try
>>>> opening and then closing a searcher right after you close your
>>>> writer? (Just so we detect the corruption the moment it happens).
>>>>
>>>> Mike
>>>>
>>>> Jamie wrote:
>>>>
>>>>> Hi Michael
>>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> It looks like you ignore any IOException coming out of
>>>>>> IndexWriter.close? Can you put some code in the catch clause
>>>>>> around writer.close to see if you are hitting some exception
>>>>>> there?
>>>>> Sure. I'll do that.
>>>>>>
>>>>>> Also, you forcefully remove the write lock if it's present.
>>>>>> But are you absolutely certain there isn't another writer
>>>>>> actually writing to that index directory?
>>>>> Yes. There is only ever one writer writing.
>>>>>>
>>>>>> Do you copy the index or alter it in some way?
>>>>> No, absolutely not.
>>>>>> One strange thing in your directory listing was the file
>>>>>> "indexinfo", which isn't a Lucene index file. Something else
>>>>>> must be writing that file.
>>>>> Yes, I neglected to mention: it's used by my application to
>>>>> deal with multiple indexes.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Jamie wrote:
>>>>>>
>>>>>>> Hi Michael
>>>>>>>
>>>>>>> Sorry for the late reply. As you guessed, it missed my
>>>>>>> attention.
>>>>>>>
>>>>>>> Michael McCandless wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can you describe what led up to this?
>>>>>>>
>>>>>>> My application indexes emails. In this particular instance, I
>>>>>>> had reindexed all emails from their original sources. The
>>>>>>> error occurred while I was using a search to search through
>>>>>>> the index.
>>>>>>>> Were there any exceptions when adding documents to the index?
>>>>>>> I had a look through all my application debug logs and there
>>>>>>> were no exceptions outputted.
>>>>>>>
>>>>>>>> Was the index newly created with 2.3.1 or created on 2.3.0
>>>>>>>> or 2.2?
>>>>>>> This index was created by v2.3.1
>>>>>>>>
>>>>>>>> What options are you using in your IndexWriter?
>>>>>>> See source code below:
>>>>>>>
>>>>>>> public void indexMessage(Email email) throws MessageSearchException {
>>>>>>>     Volume volume = email.getEmailId().getVolume();
>>>>>>>     VolumeIndex volumeIndex = volumeIndexes.get(volume);
>>>>>>>     if (volumeIndex != null) {
>>>>>>>         volumeIndex.indexMessage(email);
>>>>>>>     } else {
>>>>>>>         volumeIndex = new VolumeIndex(volume);
>>>>>>>         volumeIndex.indexMessage(email);
>>>>>>>         volumeIndexes.put(volume, volumeIndex);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> public class VolumeIndex {
>>>>>>>     IndexWriter writer;
>>>>>>>     Volume volume;
>>>>>>>     Timer closeIndexTimer = new Timer();
>>>>>>>     AccessStatus volumeOpened = AccessStatus.CLOSED;
>>>>>>>     Object indexLock = new Object();
>>>>>>>
>>>>>>>     public synchronized AccessStatus getAccessStatus() { return volumeOpened; }
>>>>>>>
>>>>>>>     public synchronized void setAccessStatus(AccessStatus volumeOpened) {
>>>>>>>         this.volumeOpened = volumeOpened;
>>>>>>>     }
>>>>>>>
>>>>>>>     public VolumeIndex(Volume volume) {
>>>>>>>         this.volume = volume;
>>>>>>>         closeIndexTimer.scheduleAtFixedRate(new TimerTask() {
>>>>>>>             public void run() {
>>>>>>>                 closeIndex(writer);
>>>>>>>             }
>>>>>>>         }, indexOpenTime, indexOpenTime);
>>>>>>>     }
>>>>>>>
>>>>>>>     protected void openIndex() throws MessageSearchException {
>>>>>>>         synchronized (indexLock) {
>>>>>>>             if (getAccessStatus() == AccessStatus.CLOSED) {
>>>>>>>                 logger.debug("openIndex() index will be opened. it is currently closed.");
>>>>>>>                 openIndex(false);
>>>>>>>                 setAccessStatus(AccessStatus.OPEN);
>>>>>>>             } else
>>>>>>>                 logger.debug("openIndex() did not bother opening index. it is already open.");
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     protected void openIndex(boolean retry) throws MessageSearchException {
>>>>>>>         if (volume == null)
>>>>>>>             throw new MessageSearchException("assertion failure: null volume", logger);
>>>>>>>         logger.debug("opening index for write {" + volume + "}");
>>>>>>>         prepareIndex(volume);
>>>>>>>         Index activeIndex = volume.getActiveIndex();
>>>>>>>         logger.debug("opening search index for write {indexpath='" + activeIndex.getPath() + "'}");
>>>>>>>         try {
>>>>>>>             writer = new IndexWriter(activeIndex.getPath(), analyzer);
>>>>>>>         } catch (IOException io) {
>>>>>>>             if (!retry) {
>>>>>>>                 // most obvious reason for error is a lock on the index, due to a hard shutdown;
>>>>>>>                 // resolution: delete the lock and try again
>>>>>>>                 logger.warn("failed to open search index for write. possible write lock due to hard system shutdown.", io);
>>>>>>>                 logger.info("attempting recovery. deleting index lock file and retrying..");
>>>>>>>                 File lockFile = new File(activeIndex.getPath() + File.separatorChar + "write.lock");
>>>>>>>                 lockFile.delete();
>>>>>>>                 try {
>>>>>>>                     openIndex(true);
>>>>>>>                 } catch (MessageSearchException mse) {
>>>>>>>                     throw mse;
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             throw new MessageSearchException("failed to open index writer {location='" + activeIndex.getPath() + "'}", io, logger);
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     public void prepareIndex(Volume volume) throws MessageSearchException {
>>>>>>>         if (volume == null)
>>>>>>>             throw new MessageSearchException("assertion failure: null volume", logger);
>>>>>>>         if (volume.getIndexPath().startsWith("rmi://"))
>>>>>>>             return;
>>>>>>>         File indexDir = new File(volume.getIndexPath());
>>>>>>>         if (!indexDir.exists()) {
>>>>>>>             logger.info("index directory does not exist. will proceed with creation {location='" + volume.getIndexPath() + "'}");
>>>>>>>             boolean success = indexDir.mkdir();
>>>>>>>             if (!success)
>>>>>>>                 throw new MessageSearchException("failed to create index directory {location='" + volume.getIndexPath() + "'}", logger);
>>>>>>>             logger.info("index directory successfully created {location='" + volume.getIndexPath() + "'}");
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     public void indexMessage(Email message) throws MessageSearchException {
>>>>>>>         long s = (new Date()).getTime();
>>>>>>>         if (message == null)
>>>>>>>             throw new MessageSearchException("assertion failure: null message", logger);
>>>>>>>         logger.debug("indexing message {" + message + "}");
>>>>>>>         Document doc = new Document();
>>>>>>>         try {
>>>>>>>             writeMessageToDocument(message, doc);
>>>>>>>             String language = doc.get("lang");
>>>>>>>             if (language == null)
>>>>>>>                 language = getIndexLanguage();
>>>>>>>             synchronized (indexLock) {
>>>>>>>                 openIndex();
>>>>>>>                 writer.addDocument(doc, AnalyzerFactory.getAnalyzer(language, AnalyzerFactory.Operation.INDEX));
>>>>>>>             }
>>>>>>>             logger.debug("message indexed successfully {" + message + ",language='" + language + "'}");
>>>>>>>         } catch (MessagingException me) {
>>>>>>>             throw new MessageSearchException("failed to decode message during indexing", me, logger);
>>>>>>>         } catch (IOException me) {
>>>>>>>             throw new MessageSearchException("failed to index message {" + message + "}", me, logger);
>>>>>>>         } catch (ExtractionException ee) {
>>>>>>>             throw new MessageSearchException("failed to decode attachments in message {" + message + "}", ee, logger);
>>>>>>>         } catch (Exception e) {
>>>>>>>             throw new MessageSearchException("failed to index message", e, logger);
>>>>>>>         }
>>>>>>>         logger.debug("indexing message end {" + message + "}");
>>>>>>>         long e = (new Date()).getTime();
>>>>>>>         logger.debug("indexing time {time='" + (e - s) + "'}");
>>>>>>>     }
>>>>>>>
>>>>>>>     protected void closeIndex(IndexWriter writer) {
>>>>>>>         synchronized (indexLock) {
>>>>>>>             if (getAccessStatus() == AccessStatus.CLOSED)
>>>>>>>                 return;
>>>>>>>             try {
>>>>>>>                 if (writer != null)
>>>>>>>                     writer.close();
>>>>>>>                 try { Thread.sleep(50); } catch (Exception e) {}
>>>>>>>             } catch (Exception io) {}
>>>>>>>             setAccessStatus(AccessStatus.CLOSED);
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     protected void finalize() throws Throwable {
>>>>>>>         logger.debug("volumeindex class is shutting down");
>>>>>>>         try {
>>>>>>>             closeIndexTimer.cancel();
>>>>>>>         } finally {
>>>>>>>             super.finalize();
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>>
>>>>>>>> Is it easy to reproduce?
>>>>>>> It's difficult to reproduce since the problem seems
>>>>>>> intermittent.
>>>>>>>> If so, can you call setInfoStream on your IndexWriter when
>>>>>>>> creating this index and post the resulting output?
>>>>>>> I'll try this but I cannot guarantee anything. Do you see
>>>>>>> anything obvious from the above?
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> Jamie wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi There
>>>>>>>>>
>>>>>>>>> I am getting the following error while searching a given
>>>>>>>>> index:
>>>>>>>>>
>>>>>>>>> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
>>>>>>>>>     at java.io.RandomAccessFile.open(Native Method)
>>>>>>>>>     at java.io.RandomAccessFile.<init>(Unknown Source)
>>>>>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>>>>>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>>>>>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>>>>>>>>>     at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>>>>>>>>>     at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
>>>>>>>>>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>>>>>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>>>>>>>>>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>>>>>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>>>>>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>>>>>>>>>     at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>>>>>>>>>
>>>>>>>>> My software used to work perfectly under earlier versions
>>>>>>>>> of Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>>>>>>>>>
>>>>>>>>> I'm seriously worried that my customers' indexes will be
>>>>>>>>> corrupted. Lucene expects to find a file that does not exist.
>>>>>>>>>
>>>>>>>>> Any ideas on what might be happening and how to rectify this?
>>>>>>>>>
>>>>>>>>> Jamie
>>>>>>>>>
>>>>>>>>>
>
>
Re: Lucene 2.3.1 Index Corruption?
Posted by Jamie <ja...@stimulussoft.com>.
Michael McCandless wrote:
>
> Yes fdt/fdx hold stored fields. When the first buffered document is
> added these files are created.
>
> The only way they disappear (through Lucene's APIs) is if a writer is
> opened on that directory, and, those files are not referenced by the
> current segments file. This is why I'm concerned about the "two
> writers at a time" risk. If a 2nd writer is opened while 1st one is
> still open that would easily cause this issue, so triple check that
> the messages you send to your logger on having to remove the
> write.lock are definitely not happening when you hit this corruption.
I think you could be right. I am going to try the following change:
public void indexMessage(Email email) throws MessageSearchException {
    VolumeIndex volumeIndex;
    synchronized (volumeIndexLock) { // note here
        Volume volume = email.getEmailId().getVolume();
        volumeIndex = volumeIndexes.get(volume);
        if (volumeIndex == null) {
            volumeIndex = new VolumeIndex(volume);
            volumeIndexes.put(volume, volumeIndex);
        }
    }
    // index outside the map lock, exactly once per message
    volumeIndex.indexMessage(email);
}
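A related hardening, per Mike's point (quoted below) about ignored IOExceptions from IndexWriter.close: surface failures from close() instead of discarding them, since a failed close is often the first visible sign of the missing-file problem. A minimal sketch using plain java.io (no Lucene dependency; the class name is illustrative):

```java
import java.io.Closeable;
import java.io.IOException;

// Closes a writer-like resource and reports, rather than swallows,
// any failure; callers can log the returned exception.
final class SafeClose {
    static IOException close(Closeable writer) {
        if (writer == null) {
            return null;
        }
        try {
            writer.close();
            return null; // clean close
        } catch (IOException io) {
            return io; // surface the failure instead of an empty catch block
        }
    }
}
```

Logging the returned exception at close time pins the corruption to the writer session that caused it, instead of discovering it much later when a searcher opens.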
>
> Can you post the output of "ls -l" on the corrupted index directory?
>
> One more possibility is that this file failed to be created in the
> first place, yet, IndexWriter flushed the remaining _0.* files. I can
> see one code path that causes this, however, it only happens if you
> open a new writer, you call addDocument, you hit an exception
> specifically in the code trying to create the fdt file (eg something
> like "too many open files"), then you close the writer. I have a unit
> test showing this particular exception would result in the _0.* files
> you see in your index with fdt/fdx missing. Are you really sure you
> don't see any exceptions, perhaps from very long ago, against this
> index, when calling addDocument? If you are hitting this case, it's
> already been fixed (this is LUCENE-1198) and backported to the 2.3
> branch. Are you able to checkout the current 2.3 branch and run your
> test using the JAR from there?
>
> Since your index has much later segment files (_1.cfs, _j.cfs), these
> exceptions could have happened quite a while back (many writers ago)
> but then only detected when you finally opened a searcher. So if
> possible, look way back in your error logs...
>
> Mike
>
> Jamie wrote:
>
>> Hi Michael
>>
>> I've tried to reindex the index several times and no such luck. I've
>> enabled lucene debugging as you suggested and will let you know as
>> soon as I have more information. From what I've read, fdt files are
>> used to hold field data. Could there be any reason why this file is
>> not being written? Does Lucene recreate this file every time from
>> scratch? Why would the file completely disappear?
>>
>> Jamie
>>
>>
>>
>> Michael McCandless wrote:
>>>
>>> One more thing: try running with asserts enabled (java -ea). Lucene
>>> has a number of assertions that may catch something sooner.
>>>
>>> Also: how often do you try to open a searcher? Can you try opening
>>> and then closing a searcher right after you close your writer?
>>> (Just so we detect the corruption the moment it happens).
>>>
>>> Mike
>>>
>>> Jamie wrote:
>>>
>>>> Hi Michael
>>>>
>>>> Michael McCandless wrote:
>>>>>
>>>>> It looks like you ignore any IOException coming out of
>>>>> IndexWriter.close? Can you put some code in the catch clause
>>>>> around writer.close to see if you are hitting some exception there?
>>>> Sure. I'll do that.
>>>>>
>>>>> Also, you forcefully remove the write lock if it's present. But
>>>>> are you absolutely certain there isn't another writer actually
>>>>> writing to that index directory?
>>>> Yes. There is only ever one writer writing.
>>>>>
>>>>> Do you copy the index or alter it in some way?
>>>> No, absolutely not.
>>>>> One strange thing in your directory listing was the file
>>>>> "indexinfo", which isn't a Lucene index file. Something else must
>>>>> be writing that file.
>>>> Yes, I neglected to mention: it's used by my application to
>>>> deal with multiple indexes.
>>>>>
>>>>> Mike
>>>>>
>>>>> Jamie wrote:
>>>>>
>>>>>> Hi Michael
>>>>>>
>>>>>> Sorry for the late reply. As you guessed, it missed my attention.
>>>>>>
>>>>>> Michael McCandless wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Can you describe what led up to this?
>>>>>>
>>>>>> My application indexes emails. In this particular instance, I had
>>>>>> reindexed all emails from their original sources. The error
>>>>>> occurred while I was using a search to search through the index.
>>>>>>> Were there any exceptions when adding documents to the index?
>>>>>> I had a look through all my application debug logs and there were
>>>>>> no exceptions outputted.
>>>>>>
>>>>>>> Was the index newly created with 2.3.1 or created on 2.3.0 or
>>>>>>> 2.2?
>>>>>> This index was created by v2.3.1
>>>>>>>
>>>>>>> What options are you using in your IndexWriter?
>>>>>> See source code below:
>>>>>>
>>>>>> public void indexMessage(Email email) throws MessageSearchException {
>>>>>>     Volume volume = email.getEmailId().getVolume();
>>>>>>     VolumeIndex volumeIndex = volumeIndexes.get(volume);
>>>>>>     if (volumeIndex != null) {
>>>>>>         volumeIndex.indexMessage(email);
>>>>>>     } else {
>>>>>>         volumeIndex = new VolumeIndex(volume);
>>>>>>         volumeIndex.indexMessage(email);
>>>>>>         volumeIndexes.put(volume, volumeIndex);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> public class VolumeIndex {
>>>>>>     IndexWriter writer;
>>>>>>     Volume volume;
>>>>>>     Timer closeIndexTimer = new Timer();
>>>>>>     AccessStatus volumeOpened = AccessStatus.CLOSED;
>>>>>>     Object indexLock = new Object();
>>>>>>
>>>>>>     public synchronized AccessStatus getAccessStatus() { return volumeOpened; }
>>>>>>
>>>>>>     public synchronized void setAccessStatus(AccessStatus volumeOpened) {
>>>>>>         this.volumeOpened = volumeOpened;
>>>>>>     }
>>>>>>
>>>>>>     public VolumeIndex(Volume volume) {
>>>>>>         this.volume = volume;
>>>>>>         closeIndexTimer.scheduleAtFixedRate(new TimerTask() {
>>>>>>             public void run() {
>>>>>>                 closeIndex(writer);
>>>>>>             }
>>>>>>         }, indexOpenTime, indexOpenTime);
>>>>>>     }
>>>>>>
>>>>>>     protected void openIndex() throws MessageSearchException {
>>>>>>         synchronized (indexLock) {
>>>>>>             if (getAccessStatus() == AccessStatus.CLOSED) {
>>>>>>                 logger.debug("openIndex() index will be opened. it is currently closed.");
>>>>>>                 openIndex(false);
>>>>>>                 setAccessStatus(AccessStatus.OPEN);
>>>>>>             } else
>>>>>>                 logger.debug("openIndex() did not bother opening index. it is already open.");
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     protected void openIndex(boolean retry) throws MessageSearchException {
>>>>>>         if (volume == null)
>>>>>>             throw new MessageSearchException("assertion failure: null volume", logger);
>>>>>>         logger.debug("opening index for write {" + volume + "}");
>>>>>>         prepareIndex(volume);
>>>>>>         Index activeIndex = volume.getActiveIndex();
>>>>>>         logger.debug("opening search index for write {indexpath='" + activeIndex.getPath() + "'}");
>>>>>>         try {
>>>>>>             writer = new IndexWriter(activeIndex.getPath(), analyzer);
>>>>>>         } catch (IOException io) {
>>>>>>             if (!retry) {
>>>>>>                 // most obvious reason for error is a lock on the index, due to a hard shutdown;
>>>>>>                 // resolution: delete the lock and try again
>>>>>>                 logger.warn("failed to open search index for write. possible write lock due to hard system shutdown.", io);
>>>>>>                 logger.info("attempting recovery. deleting index lock file and retrying..");
>>>>>>                 File lockFile = new File(activeIndex.getPath() + File.separatorChar + "write.lock");
>>>>>>                 lockFile.delete();
>>>>>>                 try {
>>>>>>                     openIndex(true);
>>>>>>                 } catch (MessageSearchException mse) {
>>>>>>                     throw mse;
>>>>>>                 }
>>>>>>             }
>>>>>>             throw new MessageSearchException("failed to open index writer {location='" + activeIndex.getPath() + "'}", io, logger);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     public void prepareIndex(Volume volume) throws MessageSearchException {
>>>>>>         if (volume == null)
>>>>>>             throw new MessageSearchException("assertion failure: null volume", logger);
>>>>>>         if (volume.getIndexPath().startsWith("rmi://"))
>>>>>>             return;
>>>>>>         File indexDir = new File(volume.getIndexPath());
>>>>>>         if (!indexDir.exists()) {
>>>>>>             logger.info("index directory does not exist. will proceed with creation {location='" + volume.getIndexPath() + "'}");
>>>>>>             boolean success = indexDir.mkdir();
>>>>>>             if (!success)
>>>>>>                 throw new MessageSearchException("failed to create index directory {location='" + volume.getIndexPath() + "'}", logger);
>>>>>>             logger.info("index directory successfully created {location='" + volume.getIndexPath() + "'}");
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     public void indexMessage(Email message) throws MessageSearchException {
>>>>>>         long s = (new Date()).getTime();
>>>>>>         if (message == null)
>>>>>>             throw new MessageSearchException("assertion failure: null message", logger);
>>>>>>         logger.debug("indexing message {" + message + "}");
>>>>>>         Document doc = new Document();
>>>>>>         try {
>>>>>>             writeMessageToDocument(message, doc);
>>>>>>             String language = doc.get("lang");
>>>>>>             if (language == null)
>>>>>>                 language = getIndexLanguage();
>>>>>>             synchronized (indexLock) {
>>>>>>                 openIndex();
>>>>>>                 writer.addDocument(doc, AnalyzerFactory.getAnalyzer(language, AnalyzerFactory.Operation.INDEX));
>>>>>>             }
>>>>>>             logger.debug("message indexed successfully {" + message + ",language='" + language + "'}");
>>>>>>         } catch (MessagingException me) {
>>>>>>             throw new MessageSearchException("failed to decode message during indexing", me, logger);
>>>>>>         } catch (IOException me) {
>>>>>>             throw new MessageSearchException("failed to index message {" + message + "}", me, logger);
>>>>>>         } catch (ExtractionException ee) {
>>>>>>             throw new MessageSearchException("failed to decode attachments in message {" + message + "}", ee, logger);
>>>>>>         } catch (Exception e) {
>>>>>>             throw new MessageSearchException("failed to index message", e, logger);
>>>>>>         }
>>>>>>         logger.debug("indexing message end {" + message + "}");
>>>>>>         long e = (new Date()).getTime();
>>>>>>         logger.debug("indexing time {time='" + (e - s) + "'}");
>>>>>>     }
>>>>>>
>>>>>>     protected void closeIndex(IndexWriter writer) {
>>>>>>         synchronized (indexLock) {
>>>>>>             if (getAccessStatus() == AccessStatus.CLOSED)
>>>>>>                 return;
>>>>>>             try {
>>>>>>                 if (writer != null)
>>>>>>                     writer.close();
>>>>>>                 try { Thread.sleep(50); } catch (Exception e) {}
>>>>>>             } catch (Exception io) {}
>>>>>>             setAccessStatus(AccessStatus.CLOSED);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     protected void finalize() throws Throwable {
>>>>>>         logger.debug("volumeindex class is shutting down");
>>>>>>         try {
>>>>>>             closeIndexTimer.cancel();
>>>>>>         } finally {
>>>>>>             super.finalize();
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>>
>>>>>>> Is it easy to reproduce?
>>>>>> It's difficult to reproduce since the problem seems intermittent.
>>>>>>> If so, can you call setInfoStream on your IndexWriter when
>>>>>>> creating this index and post the resulting output?
>>>>>> I'll try this but I cannot guarantee anything. Do you see
>>>>>> anything obvious from the above?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Jamie wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi There
>>>>>>>>
>>>>>>>> I am getting the following error while searching a given index:
>>>>>>>>
>>>>>>>> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
>>>>>>>>     at java.io.RandomAccessFile.open(Native Method)
>>>>>>>>     at java.io.RandomAccessFile.<init>(Unknown Source)
>>>>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>>>>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>>>>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>>>>>>>>     at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
>>>>>>>>     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>>>>>>>>     at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
>>>>>>>>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>>>>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>>>>>>>>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>>>>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>>>>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>>>>>>>>     at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>>>>>>>>
>>>>>>>>
>>>>>>>> My software used to work perfectly under earlier versions of
>>>>>>>> Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>>>>>>>>
>>>>>>>> I'm seriously worried my customers' indexes will be corrupted.
>>>>>>>> Lucene expects to find a file that does not exist.
>>>>>>>>
>>>>>>>> Any ideas on what might be happening and how to rectify this?
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene 2.3.1 Index Corruption?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Yes, fdt/fdx hold the stored fields. These files are created when the
first buffered document is added.
The only way they disappear (through Lucene's APIs) is if a writer is
opened on that directory and those files are not referenced by the
current segments file. This is why I'm concerned about the "two
writers at a time" risk. If a 2nd writer is opened while the 1st one
is still open, that could easily cause this issue, so triple-check
that the messages you send to your logger about having to remove the
write.lock are definitely not happening when you hit this corruption.
Can you post the output of "ls -l" on the corrupted index directory?
One more possibility is that this file failed to be created in the
first place, yet IndexWriter flushed the remaining _0.* files. I
can see one code path that causes this; however, it only happens if
you open a new writer, call addDocument, hit an exception
specifically in the code that creates the fdt file (eg something
like "too many open files"), and then close the writer. I have a
unit test showing this particular exception would result in the _0.*
files you see in your index with fdt/fdx missing. Are you really
sure you don't see any exceptions, perhaps from very long ago,
against this index, when calling addDocument? If you are hitting
this case, it's already been fixed (this is LUCENE-1198) and
backported to the 2.3 branch. Are you able to check out the current
2.3 branch and run your test using the JAR from there?
Since your index has much later segment files (_1.cfs, _j.cfs), these
exceptions could have happened quite a while back (many writers ago)
and only been detected when you finally opened a searcher. So if
possible, look way back in your error logs...
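To catch this at write time rather than at the next user search (per the suggestion, quoted further down the thread, of opening a searcher right after closing the writer), a minimal post-close check might look like the following sketch against the 2.3-era API; the path argument is illustrative:

```java
import org.apache.lucene.index.IndexReader;

public class PostCloseCheck {
    // Call immediately after writer.close(); if a segment file such as
    // _0.fdt is missing, IndexReader.open will throw here, pinpointing
    // the moment the index became unreadable.
    public static void verify(String indexPath) throws Exception {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            reader.maxDoc(); // touch the reader so segment files are opened
        } finally {
            reader.close();
        }
    }
}
```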
Mike
Jamie wrote:
> Hi Michael
>
> I've tried reindexing several times, with no luck. I've enabled
> Lucene debugging as you suggested and will let you know as soon as I
> have more information. From what I've read, fdt files are used to
> hold field data. Could there be any reason why this file is not
> being written? Does Lucene recreate this file every time from
> scratch? Why would the file completely disappear?
>
> Jamie
>
>
>
> Michael McCandless wrote:
>>
>> One more thing: try running with asserts enabled (java -ea).
>> Lucene has a number of assertions that may catch something sooner.
>>
>> Also: how often do you try to open a searcher? Can you try
>> opening and then closing a searcher right after you close your
>> writer? (Just so we detect the corruption the moment it happens).
>>
>> Mike
>>
>> Jamie wrote:
>>
>>> Hi Michael
>>>
>>> Michael McCandless wrote:
>>>>
>>>> It looks like you ignore any IOException coming out of
>>>> IndexWriter.close? Can you put some code in the catch clause
>>>> around writer.close to see if you are hitting some exception there?
>>> Sure. I'll do that.
>>>>
>>>> Also, you forcefully remove the write lock if it's present. But
>>>> are you absolutely certain there isn't another writer actually
>>>> writing to that index directory?
>>> Yes. There is only ever one writer writing.
>>>>
>>>> Do you copy the index or alter it in some way?
>>> No, absolutely not.
>>>> One strange thing in your directory listing was the file
>>>> "indexinfo", which isn't a Lucene index file. Something else
>>>> must be writing that file.
>>> Yes, I neglected to mention... it's used by my application to
>>> deal with multiple indexes.
>>>>
>>>> Mike
>>>>
>>>> Jamie wrote:
>>>>
>>>>> Hi Michael
>>>>>
>>>>> Sorry for the late reply. As you guessed, it escaped my attention.
>>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you describe what led up to this?
>>>>>
>>>>> My application indexes emails. In this particular instance, I
>>>>> had reindexed all emails from their original sources. The error
>>>>> occurred while I was running a search against the index.
>>>>>> Were there any exceptions when adding documents to the index?
>>>>> I had a look through all my application debug logs and there
>>>>> were no exceptions recorded.
>>>>>
>>>>>> Was the index newly created with 2.3.1 or created on 2.3.0
>>>>>> or 2.2?
>>>>> This index was created by v2.3.1
>>>>>>
>>>>>> What options are you using in your IndexWriter?
>>>>> See source code below:
>>>>>
>>>>> public void indexMessage(Email email) throws
>>>>> MessageSearchException {
>>>>> Volume volume = email.getEmailId().getVolume();
>>>>> VolumeIndex volumeIndex = volumeIndexes.get(volume);
>>>>> if (volumeIndex!=null) {
>>>>> volumeIndex.indexMessage(email);
>>>>> } else {
>>>>> volumeIndex = new VolumeIndex(volume);
>>>>> volumeIndex.indexMessage(email);
>>>>> volumeIndexes.put(volume,volumeIndex);
>>>>> }
>>>>> }
>>>>> public class VolumeIndex {
>>>>> IndexWriter writer;
>>>>> Volume volume;
>>>>> Timer closeIndexTimer = new Timer();
>>>>> AccessStatus volumeOpened = AccessStatus.CLOSED;
>>>>> Object indexLock = new Object();
>>>>> public synchronized AccessStatus
>>>>> getAccessStatus() { return volumeOpened;}
>>>>>
>>>>> public synchronized void setAccessStatus
>>>>> (AccessStatus volumeOpened) {
>>>>> this.volumeOpened = volumeOpened;
>>>>> }
>>>>> public VolumeIndex(Volume volume) {
>>>>> this.volume = volume;
>>>>> closeIndexTimer.scheduleAtFixedRate(new
>>>>> TimerTask() {
>>>>> public void run() {
>>>>> closeIndex(writer);
>>>>> }
>>>>> }, indexOpenTime, indexOpenTime);
>>>>> }
>>>>>
>>>>> protected void openIndex() throws
>>>>> MessageSearchException {
>>>>> synchronized(indexLock) {
>>>>> if (getAccessStatus()
>>>>> ==AccessStatus.CLOSED) {
>>>>> logger.debug("openIndex() index will
>>>>> be opened. it is currently closed.");
>>>>> openIndex(false);
>>>>> setAccessStatus(AccessStatus.OPEN);
>>>>> } else
>>>>> logger.debug("openIndex() did not
>>>>> bother opening index. it is already open.");
>>>>> }
>>>>> }
>>>>> protected void openIndex(boolean
>>>>> retry) throws MessageSearchException {
>>>>> if (volume == null)
>>>>> throw new MessageSearchException
>>>>> ("assertion failure: null volume",logger);
>>>>> logger.debug("opening index for write
>>>>> {"+volume+"}");
>>>>> prepareIndex(volume);
>>>>> Index activeIndex = volume.getActiveIndex();
>>>>> logger.debug("opening search index for write
>>>>> {indexpath='"+activeIndex.getPath()+"'}");
>>>>> try {
>>>>> writer = new IndexWriter
>>>>> (activeIndex.getPath(), analyzer);
>>>>> } catch (IOException io)
>>>>> {
>>>>> if (!retry) {
>>>>> // most obvious reason for error is
>>>>> that there is a lock on the index, due hard shutdown
>>>>> // resolution delete the lock, and
>>>>> try again
>>>>> logger.warn("failed to open search
>>>>> index for write. possible write lock due to hard system
>>>>> shutdown.",io);
>>>>> logger.info("attempting recovery.
>>>>> deleting index lock file and retrying..");
>>>>> File lockFile = new File
>>>>> (activeIndex.getPath()+File.separatorChar + "write.lock");
>>>>> lockFile.delete();
>>>>> try {
>>>>> openIndex(true);
>>>>> } catch (MessageSearchException mse) {
>>>>> throw mse;
>>>>> }
>>>>> }
>>>>> throw new MessageSearchException("failed
>>>>> to open/ index writer {location='"+activeIndex.getPath()
>>>>> +"'}",io,logger);
>>>>> }
>>>>> }
>>>>>
>>>>> public void prepareIndex(Volume volume) throws
>>>>> MessageSearchException {
>>>>> if (volume==null)
>>>>> throw new MessageSearchException
>>>>> ("assertion failure: null volume",logger);
>>>>> if (volume.getIndexPath
>>>>> ().startsWith("rmi://"))
>>>>> return;
>>>>> File indexDir = new
>>>>> File(volume.getIndexPath());
>>>>> if (!indexDir.exists()) {
>>>>> logger.info("index directory does not exist.
>>>>> will proceed with creation {location='" + volume.getIndexPath()
>>>>> + "'}");
>>>>> boolean success = indexDir.mkdir();
>>>>> if (!success)
>>>>> throw new MessageSearchException
>>>>> ("failed to create index directory {location='" +
>>>>> volume.getIndexPath() + "'}",logger);
>>>>> logger.info("index directory successfully
>>>>> created {location='" + volume.getIndexPath() + "'}");
>>>>> }
>>>>> }
>>>>> public void indexMessage(Email message)
>>>>> throws MessageSearchException {
>>>>> long s = (new Date()).getTime();
>>>>> if (message == null)
>>>>> throw new MessageSearchException("assertion
>>>>> failure: null message",logger);
>>>>> logger.debug("indexing message {"+message+"}");
>>>>> Document doc = new Document();
>>>>> try {
>>>>> writeMessageToDocument
>>>>> (message,doc); String language = doc.get
>>>>> ("lang");
>>>>> if (language==null)
>>>>> language = getIndexLanguage();
>>>>> synchronized (indexLock) {
>>>>> openIndex();
>>>>> writer.addDocument
>>>>> (doc,AnalyzerFactory.getAnalyzer
>>>>> (language,AnalyzerFactory.Operation.INDEX));
>>>>> }
>>>>> logger.debug("message indexed successfully
>>>>> {"+message+",language='"+language+"'}");
>>>>> } catch (MessagingException me)
>>>>> {
>>>>> throw new MessageSearchException("failed to
>>>>> decode message during indexing",me,logger);
>>>>> } catch (IOException me) {
>>>>> throw new MessageSearchException("failed to
>>>>> index message {"+message+"}",me,logger);
>>>>> } catch (ExtractionException ee)
>>>>> {
>>>>> throw new MessageSearchException("failed to
>>>>> decode attachments in message {"+message+"}",ee,logger);
>>>>> } catch (Exception e) {
>>>>> throw new MessageSearchException("failed to
>>>>> index message",e,logger);
>>>>> }
>>>>> logger.debug("indexing message end {"+message+"}");
>>>>> long e = (new Date()).getTime();
>>>>> logger.debug("indexing time {time='"+(e-s)+"'}");
>>>>> }
>>>>> protected void closeIndex(IndexWriter
>>>>> writer) {
>>>>>
>>>>> synchronized(indexLock) {
>>>>> if
>>>>> (getAccessStatus()==AccessStatus.CLOSED)
>>>>> return;
>>>>> try {
>>>>> if (writer!=null)
>>>>> writer.close();
>>>>> try { Thread.sleep(50); } catch
>>>>> (Exception e) {}
>>>>> } catch (Exception io) {}
>>>>> setAccessStatus(AccessStatus.CLOSED);
>>>>> }
>>>>> }
>>>>> protected void finalize() throws Throwable {
>>>>> logger.debug("volumeindex class is shutting down");
>>>>> try {
>>>>> closeIndexTimer.cancel();
>>>>> } finally {
>>>>> super.finalize();
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>>>
>>>>>> Is it easy to reproduce?
>>>>> It's difficult to reproduce since the problem seems intermittent.
>>>>>> If so, can you call setInfoStream on your IndexWriter when
>>>>>> creating this index and post the resulting output?
>>>>> I'll try this but I cannot guarantee anything. Do you see
>>>>> anything obvious from the above?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Jamie wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi There
>>>>>>>
>>>>>>> I am getting the following error while searching a given index:
>>>>>>>
>>>>>>> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No
>>>>>>> such file or directory)
>>>>>>> at java.io.RandomAccessFile.open(Native Method)
>>>>>>> at java.io.RandomAccessFile.<init>(Unknown Source)
>>>>>>> at org.apache.lucene.store.FSDirectory$FSIndexInput
>>>>>>> $Descriptor.<init>(FSDirectory.java:506)
>>>>>>> at org.apache.lucene.store.FSDirectory
>>>>>>> $FSIndexInput.<init>(FSDirectory.java:536)
>>>>>>> at org.apache.lucene.store.FSDirectory.openInput
>>>>>>> (FSDirectory.java:445)
>>>>>>> at org.apache.lucene.index.FieldsReader.<init>
>>>>>>> (FieldsReader.java:75)
>>>>>>> at org.apache.lucene.index.SegmentReader.initialize
>>>>>>> (SegmentReader.java:308)
>>>>>>> at org.apache.lucene.index.SegmentReader.get
>>>>>>> (SegmentReader.java:262)
>>>>>>> at org.apache.lucene.index.SegmentReader.get
>>>>>>> (SegmentReader.java:197)
>>>>>>> at org.apache.lucene.index.MultiSegmentReader.<init>
>>>>>>> (MultiSegmentReader.java:55)
>>>>>>> at org.apache.lucene.index.DirectoryIndexReader
>>>>>>> $1.doBody(DirectoryIndexReader.java:75)
>>>>>>> at org.apache.lucene.index.SegmentInfos
>>>>>>> $FindSegmentsFile.run(SegmentInfos.java:636)
>>>>>>> at org.apache.lucene.index.DirectoryIndexReader.open
>>>>>>> (DirectoryIndexReader.java:63)
>>>>>>> at org.apache.lucene.index.IndexReader.open
>>>>>>> (IndexReader.java:209)
>>>>>>> at org.apache.lucene.index.IndexReader.open
>>>>>>> (IndexReader.java:173)
>>>>>>> at org.apache.lucene.search.IndexSearcher.<init>
>>>>>>> (IndexSearcher.java:48)
>>>>>>>
>>>>>>> My software used to work perfectly under earlier versions of
>>>>>>> Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>>>>>>>
>>>>>>> I'm seriously worried my customers' indexes will be corrupted.
>>>>>>> Lucene expects to find a file that does not exist.
>>>>>>>
>>>>>>> Any ideas on what might be happening and how to rectify this?
>>>>>>>
>>>>>>> Jamie
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stimulus Software - MailArchiva
>>>>> Email Archiving And Compliance
>>>>> USA Tel: +1-713-366-8072 ext 3
>>>>> UK Tel: +44-20-80991035 ext 3
>>>>> Email: jamie@stimulussoft.com
>>>>> Web: http://www.mailarchiva.com
>>>>>
>>>>> To receive MailArchiva Enterprise Edition product
>>>>> announcements, send a message to: <mailarchiva-enterprise-
>>>>> edition-subscribe@stimulussoft.com>
>>>>>
>>>>
>>>
>>>
>>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene 2.3.1 Index Corruption?
Posted by Michael McCandless <lu...@mikemccandless.com>.
It looks like you ignore any IOException coming out of
IndexWriter.close? Can you put some code in the catch clause around
writer.close to see if you are hitting some exception there?
Also, you forcefully remove the write lock if it's present. But are
you absolutely certain there isn't another writer actually writing to
that index directory?
Do you copy the index or alter it in some way? One strange thing in
your directory listing was the file "indexinfo", which isn't a Lucene
index file. Something else must be writing that file.
Mike
Jamie wrote:
> Hi Michael
>
> Sorry for the late reply. As you guessed, it escaped my attention.
>
> Michael McCandless wrote:
>>
>> Hi,
>>
>> Can you describe what led up to this?
>
> My application indexes emails. In this particular instance, I had
> reindexed all emails from their original sources. The error
> occurred while I was running a search against the index.
>> Were there any exceptions when adding documents to the index?
> I had a look through all my application debug logs and there were
> no exceptions recorded.
>
>> Was the index newly created with 2.3.1 or created on 2.3.0 or 2.2?
> This index was created by v2.3.1
>>
>> What options are you using in your IndexWriter?
> See source code below:
>
> public void indexMessage(Email email) throws
> MessageSearchException {
> Volume volume = email.getEmailId().getVolume();
> VolumeIndex volumeIndex = volumeIndexes.get(volume);
> if (volumeIndex!=null) {
> volumeIndex.indexMessage(email);
> } else {
> volumeIndex = new VolumeIndex(volume);
> volumeIndex.indexMessage(email);
> volumeIndexes.put(volume,volumeIndex);
> }
> }
> public class VolumeIndex {
> IndexWriter writer;
> Volume volume;
> Timer closeIndexTimer = new Timer();
> AccessStatus volumeOpened = AccessStatus.CLOSED;
> Object indexLock = new Object();
> public synchronized AccessStatus
> getAccessStatus() { return volumeOpened;}
>
> public synchronized void setAccessStatus(AccessStatus
> volumeOpened) {
> this.volumeOpened = volumeOpened;
> }
> public VolumeIndex(Volume volume) {
> this.volume = volume;
> closeIndexTimer.scheduleAtFixedRate(new
> TimerTask() {
> public void run() {
> closeIndex(writer);
> }
> }, indexOpenTime, indexOpenTime);
> }
>
> protected void openIndex() throws
> MessageSearchException {
> synchronized(indexLock) {
> if (getAccessStatus()==AccessStatus.CLOSED) {
> logger.debug("openIndex() index will be
> opened. it is currently closed.");
> openIndex(false);
> setAccessStatus(AccessStatus.OPEN);
> } else
> logger.debug("openIndex() did not bother
> opening index. it is already open.");
> }
> }
> protected void openIndex(boolean retry)
> throws MessageSearchException {
> if (volume == null)
> throw new MessageSearchException("assertion
> failure: null volume",logger);
> logger.debug("opening index for write {"+volume
> +"}");
> prepareIndex(volume);
> Index activeIndex = volume.getActiveIndex();
> logger.debug("opening search index for write
> {indexpath='"+activeIndex.getPath()+"'}");
> try {
> writer = new IndexWriter
> (activeIndex.getPath(), analyzer);
> } catch (IOException io)
> {
> if (!retry) {
> // most obvious reason for error is that
> there is a lock on the index, due hard shutdown
> // resolution delete the lock, and try
> again
> logger.warn("failed to open search index
> for write. possible write lock due to hard system shutdown.",io);
> logger.info("attempting recovery.
> deleting index lock file and retrying..");
> File lockFile = new File
> (activeIndex.getPath()+File.separatorChar + "write.lock");
> lockFile.delete();
> try {
> openIndex(true);
> } catch (MessageSearchException mse) {
> throw mse;
> }
> }
> throw new MessageSearchException("failed to
> open/ index writer {location='"+activeIndex.getPath()+"'}",io,logger);
> }
> }
>
> public void prepareIndex(Volume volume) throws
> MessageSearchException {
> if (volume==null)
> throw new MessageSearchException
> ("assertion failure: null volume",logger);
> if (volume.getIndexPath().startsWith
> ("rmi://"))
> return;
> File indexDir = new File
> (volume.getIndexPath());
> if (!indexDir.exists()) {
> logger.info("index directory does not exist.
> will proceed with creation {location='" + volume.getIndexPath() +
> "'}");
> boolean success = indexDir.mkdir();
> if (!success)
> throw new MessageSearchException("failed
> to create index directory {location='" + volume.getIndexPath() +
> "'}",logger);
> logger.info("index directory successfully
> created {location='" + volume.getIndexPath() + "'}");
> }
> }
> public void indexMessage(Email message)
> throws MessageSearchException {
> long s = (new Date()).getTime();
> if (message == null)
> throw new MessageSearchException("assertion
> failure: null message",logger);
> logger.debug("indexing message {"+message+"}");
> Document doc = new Document();
> try {
> writeMessageToDocument
> (message,doc); String language = doc.get("lang");
> if (language==null)
> language = getIndexLanguage();
> synchronized (indexLock) {
> openIndex();
> writer.addDocument
> (doc,AnalyzerFactory.getAnalyzer
> (language,AnalyzerFactory.Operation.INDEX));
> }
> logger.debug("message indexed successfully
> {"+message+",language='"+language+"'}");
> } catch (MessagingException me)
> {
> throw new MessageSearchException("failed to
> decode message during indexing",me,logger);
> } catch (IOException me) {
> throw new MessageSearchException("failed to
> index message {"+message+"}",me,logger);
> } catch (ExtractionException ee)
> {
> throw new MessageSearchException("failed to
> decode attachments in message {"+message+"}",ee,logger);
> } catch (Exception e) {
> throw new MessageSearchException("failed to
> index message",e,logger);
> }
> logger.debug("indexing message end {"+message+"}");
> long e = (new Date()).getTime();
> logger.debug("indexing time {time='"+(e-s)+"'}");
> }
> protected void closeIndex(IndexWriter
> writer) {
>
> synchronized(indexLock) {
> if (getAccessStatus
> ()==AccessStatus.CLOSED)
> return;
> try {
> if (writer!=null)
> writer.close();
> try { Thread.sleep(50); } catch
> (Exception e) {}
> } catch (Exception io) {}
> setAccessStatus(AccessStatus.CLOSED);
> }
> }
> protected void finalize() throws Throwable {
> logger.debug("volumeindex class is shutting down");
> try {
> closeIndexTimer.cancel();
> } finally {
> super.finalize();
> }
> }
> }
>
>>
>> Is it easy to reproduce?
> It's difficult to reproduce since the problem seems intermittent.
>> If so, can you call setInfoStream on your IndexWriter when
>> creating this index and post the resulting output?
> I'll try this but I cannot guarantee anything. Do you see anything
> obvious from the above?
>>
>> Mike
>>
>> Jamie wrote:
>>
>>>
>>> Hi There
>>>
>>> I am getting the following error while searching a given index:
>>>
>>> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such
>>> file or directory)
>>> at java.io.RandomAccessFile.open(Native Method)
>>> at java.io.RandomAccessFile.<init>(Unknown Source)
>>> at org.apache.lucene.store.FSDirectory$FSIndexInput
>>> $Descriptor.<init>(FSDirectory.java:506)
>>> at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>
>>> (FSDirectory.java:536)
>>> at org.apache.lucene.store.FSDirectory.openInput
>>> (FSDirectory.java:445)
>>> at org.apache.lucene.index.FieldsReader.<init>
>>> (FieldsReader.java:75)
>>> at org.apache.lucene.index.SegmentReader.initialize
>>> (SegmentReader.java:308)
>>> at org.apache.lucene.index.SegmentReader.get
>>> (SegmentReader.java:262)
>>> at org.apache.lucene.index.SegmentReader.get
>>> (SegmentReader.java:197)
>>> at org.apache.lucene.index.MultiSegmentReader.<init>
>>> (MultiSegmentReader.java:55)
>>> at org.apache.lucene.index.DirectoryIndexReader$1.doBody
>>> (DirectoryIndexReader.java:75)
>>> at org.apache.lucene.index.SegmentInfos
>>> $FindSegmentsFile.run(SegmentInfos.java:636)
>>> at org.apache.lucene.index.DirectoryIndexReader.open
>>> (DirectoryIndexReader.java:63)
>>> at org.apache.lucene.index.IndexReader.open
>>> (IndexReader.java:209)
>>> at org.apache.lucene.index.IndexReader.open
>>> (IndexReader.java:173)
>>> at org.apache.lucene.search.IndexSearcher.<init>
>>> (IndexSearcher.java:48)
>>>
>>> My software used to work perfectly under earlier versions of
>>> Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>>>
>>> I'm seriously worried my customers' indexes will be corrupted.
>>> Lucene expects to find a file that does not exist.
>>>
>>> Any ideas on what might be happening and how to rectify this?
>>>
>>> Jamie
>>>
>>>
>>
>>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene 2.3.1 Index Corruption?
Posted by Jamie <ja...@stimulussoft.com>.
Hi Michael
Sorry for the late reply. As you guessed, it escaped my attention.
Michael McCandless wrote:
>
> Hi,
>
> Can you describe what led up to this?
My application indexes emails. In this particular instance, I had
reindexed all emails from their original sources. The error occurred
while I was running a search against the index.
> Were there any exceptions when adding documents to the index?
I had a look through all my application debug logs and there were no
exceptions recorded.
> Was the index newly created with 2.3.1 or created on 2.3.0 or 2.2?
This index was created by v2.3.1
>
> What options are you using in your IndexWriter?
See source code below:
public void indexMessage(Email email) throws MessageSearchException {
Volume volume = email.getEmailId().getVolume();
VolumeIndex volumeIndex = volumeIndexes.get(volume);
if (volumeIndex!=null) {
volumeIndex.indexMessage(email);
} else {
volumeIndex = new VolumeIndex(volume);
volumeIndex.indexMessage(email);
volumeIndexes.put(volume,volumeIndex);
}
}
public class VolumeIndex {
IndexWriter writer;
Volume volume;
Timer closeIndexTimer = new Timer();
AccessStatus volumeOpened = AccessStatus.CLOSED;
Object indexLock = new Object();
public synchronized AccessStatus getAccessStatus() {
return volumeOpened;}
public synchronized void setAccessStatus(AccessStatus
volumeOpened) {
this.volumeOpened = volumeOpened;
}
public VolumeIndex(Volume volume) {
this.volume = volume;
closeIndexTimer.scheduleAtFixedRate(new TimerTask() {
public void run() {
closeIndex(writer);
}
}, indexOpenTime, indexOpenTime);
}
protected void openIndex() throws MessageSearchException {
synchronized(indexLock) {
if (getAccessStatus()==AccessStatus.CLOSED) {
logger.debug("openIndex() index will be
opened. it is currently closed.");
openIndex(false);
setAccessStatus(AccessStatus.OPEN);
} else
logger.debug("openIndex() did not bother
opening index. it is already open.");
}
}
protected void openIndex(boolean retry) throws
MessageSearchException {
if (volume == null)
throw new MessageSearchException("assertion
failure: null volume",logger);
logger.debug("opening index for write {"+volume+"}");
prepareIndex(volume);
Index activeIndex = volume.getActiveIndex();
logger.debug("opening search index for write
{indexpath='"+activeIndex.getPath()+"'}");
try {
writer = new
IndexWriter(activeIndex.getPath(), analyzer);
} catch (IOException io)
{
if (!retry) {
// most obvious reason for error is that
there is a lock on the index, due hard shutdown
// resolution delete the lock, and try again
logger.warn("failed to open search index for
write. possible write lock due to hard system shutdown.",io);
logger.info("attempting recovery. deleting
index lock file and retrying..");
File lockFile = new
File(activeIndex.getPath()+File.separatorChar + "write.lock");
lockFile.delete();
try {
openIndex(true);
} catch (MessageSearchException mse) {
throw mse;
}
}
throw new MessageSearchException("failed to
open/ index writer {location='"+activeIndex.getPath()+"'}",io,logger);
}
}
public void prepareIndex(Volume volume) throws
MessageSearchException {
if (volume==null)
throw new MessageSearchException("assertion
failure: null volume",logger);
if (volume.getIndexPath().startsWith("rmi://"))
return;
File indexDir = new File(volume.getIndexPath());
if (!indexDir.exists()) {
logger.info("index directory does not exist. will
proceed with creation {location='" + volume.getIndexPath() + "'}");
boolean success = indexDir.mkdir();
if (!success)
throw new MessageSearchException("failed to
create index directory {location='" + volume.getIndexPath() + "'}",logger);
logger.info("index directory successfully created
{location='" + volume.getIndexPath() + "'}");
}
}
public void indexMessage(Email message) throws
MessageSearchException {
long s = (new Date()).getTime();
if (message == null)
throw new MessageSearchException("assertion failure:
null message",logger);
logger.debug("indexing message {"+message+"}");
Document doc = new Document();
try {
writeMessageToDocument(message,doc);
String language = doc.get("lang");
if (language==null)
language = getIndexLanguage();
synchronized (indexLock) {
openIndex();
writer.addDocument(doc,AnalyzerFactory.getAnalyzer(language,AnalyzerFactory.Operation.INDEX));
}
logger.debug("message indexed successfully
{"+message+",language='"+language+"'}");
} catch (MessagingException me)
{
throw new MessageSearchException("failed to decode
message during indexing",me,logger);
} catch (IOException me) {
throw new MessageSearchException("failed to index
message {"+message+"}",me,logger);
} catch (ExtractionException ee)
{
throw new MessageSearchException("failed to decode
attachments in message {"+message+"}",ee,logger);
} catch (Exception e) {
throw new MessageSearchException("failed to index
message",e,logger);
}
logger.debug("indexing message end {"+message+"}");
long e = (new Date()).getTime();
logger.debug("indexing time {time='"+(e-s)+"'}");
}
protected void closeIndex(IndexWriter writer) {
synchronized(indexLock) {
if (getAccessStatus()==AccessStatus.CLOSED)
return;
try {
if (writer!=null)
writer.close();
try { Thread.sleep(50); } catch
(Exception e) {}
} catch (Exception io) {}
setAccessStatus(AccessStatus.CLOSED);
}
}
protected void finalize() throws Throwable {
logger.debug("volumeindex class is shutting down");
try {
closeIndexTimer.cancel();
} finally {
super.finalize();
}
}
}
>
> Is it easy to reproduce?
It's difficult to reproduce since the problem seems intermittent.
> If so, can you call setInfoStream on your IndexWriter when creating
> this index and post the resulting output?
I'll try this but I cannot guarantee anything. Do you see anything
obvious from the above?
>
> Mike
>
> Jamie wrote:
>
>>
>> Hi There
>>
>> I am getting the following error while searching a given index:
>>
>> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
>> at java.io.RandomAccessFile.open(Native Method)
>> at java.io.RandomAccessFile.<init>(Unknown Source)
>> at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>> at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>> at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>> at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
>> at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>> at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
>> at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>> at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>> at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>>
>> My software used to work perfectly under earlier versions of Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>>
>> I am seriously worried that my customer's indexes will be corrupted. Lucene expects to find a file that does not exist.
>>
>> Any ideas on what might be happening and how to rectify this?
>>
>> Jamie
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
--
Stimulus Software - MailArchiva
Email Archiving And Compliance
USA Tel: +1-713-366-8072 ext 3
UK Tel: +44-20-80991035 ext 3
Email: jamie@stimulussoft.com
Web: http://www.mailarchiva.com
To receive MailArchiva Enterprise Edition product announcements, send a message to: <ma...@stimulussoft.com>
Re: Lucene 2.3.1 Index Corruption?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi,
Can you describe what led up to this? Were there any exceptions when
adding documents to the index? Was the index newly created with
2.3.1 or created on 2.3.0 or 2.2?
What options are you using in your IndexWriter?
Is it easy to reproduce? If so, can you call setInfoStream on your
IndexWriter when creating this index and post the resulting output?
Mike
Jamie wrote:
>
> Hi There
>
> I am getting the following error while searching a given index:
>
> java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(Unknown Source)
> at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
> at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
> at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
> at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
> at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
> at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
> at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>
> My software used to work perfectly under earlier versions of Lucene. Since I upgraded to 2.3.1, this problem has arisen.
>
> I am seriously worried that my customer's indexes will be corrupted. Lucene expects to find a file that does not exist.
>
> Any ideas on what might be happening and how to rectify this?
>
> Jamie
>
>
Lucene 2.3.1 Index Corruption?
Posted by Jamie <ja...@stimulussoft.com>.
Hi There
I am getting the following error while searching a given index:
java.io.FileNotFoundException: /usr/local/index/_0.fdt (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
    at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
    at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
    at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
My software used to work perfectly under earlier versions of Lucene. Since I upgraded to 2.3.1, this problem has arisen.
I am seriously worried that my customer's indexes will be corrupted. Lucene expects to find a file that does not exist.
Any ideas on what might be happening and how to rectify this?
Jamie
Re: Huge number of Term objects in memory gives OutOfMemory error
Posted by Michael McCandless <lu...@mikemccandless.com>.
<Ri...@gxs.com> wrote:
>
> Does each searchable have its own copy of Term and TermInfo
> arrays? So the amount in memory would grow with each new
> Searchable instance? If so, it might be worthwhile to implement a
> singleton MultiSearcher that is closed and re-opened periodically.
> What do you think?
Yes, yes and yes a single shared MultiSearcher would be better.
Mike
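The pattern the thread converges on — one shared searcher, replaced periodically rather than per request — can be sketched in plain Java. The `Searcher` and `SearcherFactory` interfaces below are hypothetical stand-ins for Lucene's `MultiSearcher` and whatever code opens one; only the sharing and refresh logic is the point, not a real Lucene API:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a shared, periodically reopened searcher (not actual Lucene API).
// "Searcher" stands in for org.apache.lucene.search.MultiSearcher; open()
// would construct a fresh MultiSearcher over the live indexes.
public class SharedSearcher {
    public interface Searcher {
        void close() throws Exception;
    }
    public interface SearcherFactory {   // hypothetical factory
        Searcher open() throws Exception;
    }

    private final SearcherFactory factory;
    private final long maxAgeMillis;
    private final AtomicReference<Searcher> current = new AtomicReference<Searcher>();
    private volatile long openedAt;

    public SharedSearcher(SearcherFactory factory, long maxAgeMillis) throws Exception {
        this.factory = factory;
        this.maxAgeMillis = maxAgeMillis;
        this.current.set(factory.open());
        this.openedAt = System.currentTimeMillis();
    }

    // All requests share one searcher; it is replaced only when stale, so the
    // term/norm data is loaded once rather than once per request.
    public synchronized Searcher get() throws Exception {
        if (System.currentTimeMillis() - openedAt > maxAgeMillis) {
            Searcher old = current.getAndSet(factory.open());
            openedAt = System.currentTimeMillis();
            old.close(); // NOTE: real code must wait for in-flight searches to drain first
        }
        return current.get();
    }
}
```

Closing the old searcher while searches are still running against it is the hard part in practice; reference counting or a grace period is needed, which this sketch deliberately omits.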
RE: Huge number of Term objects in memory gives OutOfMemory error
Posted by Ri...@gxs.com.
Does each searchable have its own copy of the Term and TermInfo arrays? So the amount in memory would grow with each new Searchable instance? If so, it might be worthwhile to implement a singleton MultiSearcher that is closed and re-opened periodically. What do you think?
Thanks again,
Rich
________________________________________
From: Michael McCandless [lucene@mikemccandless.com]
Sent: Monday, March 17, 2008 6:27 PM
To: java-user@lucene.apache.org
Subject: Re: Huge number of Term objects in memory gives OutOfMemory error
You can call IndexReader.setTermInfosIndexDivisor(int) to reduce how
many index terms are loaded in memory. E.g. setting it to 10 will load
1/10th of what's loaded now, but will slow down searches.
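As a rough sanity check on the effect of that setting, the jmap dump in the original post shows about 165 MB of TermInfo objects and 132 MB of Term objects. If a divisor of 10 loads 1/10th of the index terms, as described above, both figures should scale down by roughly that factor (a back-of-the-envelope estimate only; object overhead and the char[]/String data backing the terms are not modeled separately here):

```java
// Rough estimate of term-index heap usage before/after setting the divisor,
// using the byte counts from the jmap dump in this thread.
public class TermIndexMemory {
    public static void main(String[] args) {
        long termInfoBytes = 164813120L;  // org.apache.lucene.index.TermInfo
        long termBytes     = 131823104L;  // org.apache.lucene.index.Term
        int divisor        = 10;          // proposed setTermInfosIndexDivisor value

        long before = termInfoBytes + termBytes;
        long after  = before / divisor;   // 1/divisor of the terms stay resident
        System.out.println("term index before: " + before / (1024 * 1024) + " MB");
        System.out.println("term index after : " + after / (1024 * 1024) + " MB");
    }
}
```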
Also, you should understand why your index has so many terms. E.g.,
use Luke to peek at the terms and see if they are "valid". If, for
example, you are accidentally indexing binary content as if it were
text, that can easily cause a great many large, unwanted terms.
Mike
<Ri...@gxs.com> wrote:
> I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64 bit linux. We
> have fairly large collections (~1gig collection files, ~1,000,000
> documents). When I try to load test our application with 50 users,
> all doing simple searches via a web interface, we quickly get an
> OutOfMemory exception. When I do a jmap dump of the heap, this is
> what I see:
>
> Size Count Class description
> -------------------------------------------------------
> 195818576 4263822 char[]
> 190889608 13259 byte[]
> 172316640 4307916 java.lang.String
> 164813120 4120328 org.apache.lucene.index.TermInfo
> 131823104 4119472 org.apache.lucene.index.Term
> 37729184 604 org.apache.lucene.index.TermInfo[]
> 37729184 604 org.apache.lucene.index.Term[]
>
> So 4 of the top 7 memory consumers are Term related. We have 2 gig
> of RAM available on the system but we get OOM errors no matter the
> java heap settings. Has anyone seen this issue and know how to
> solve it?
>
> We do use separate MultiSearcher instances for each search. (We
> actually have 2 collections that we search via a MultiSearcher.) We
> tried using a singleton searcher instance but our collections are
> constantly being updated and the singleton searcher only gives you
> results since the searcher was opened. Creating new searcher
> objects at search time gives you up to the minute search results.
>
> I've seen some postings referring to an Index Divisor setting which
> could reduce the Terms in memory, but I have not seen how to set
> this value for Lucene.
>
> Any help would be greatly appreciated.
>
> Rich
Re: Huge number of Term objects in memory gives OutOfMemory error
Posted by Michael McCandless <lu...@mikemccandless.com>.
You can call IndexReader.setTermInfosIndexDivisor(int) to reduce how
many index terms are loaded in memory. E.g. setting it to 10 will load
1/10th of what's loaded now, but will slow down searches.
Also, you should understand why your index has so many terms. E.g.,
use Luke to peek at the terms and see if they are "valid". If, for
example, you are accidentally indexing binary content as if it were
text, that can easily cause a great many large, unwanted terms.
Mike
<Ri...@gxs.com> wrote:
> I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64 bit linux. We
> have fairly large collections (~1gig collection files, ~1,000,000
> documents). When I try to load test our application with 50 users,
> all doing simple searches via a web interface, we quickly get an
> OutOfMemory exception. When I do a jmap dump of the heap, this is
> what I see:
>
> Size Count Class description
> -------------------------------------------------------
> 195818576 4263822 char[]
> 190889608 13259 byte[]
> 172316640 4307916 java.lang.String
> 164813120 4120328 org.apache.lucene.index.TermInfo
> 131823104 4119472 org.apache.lucene.index.Term
> 37729184 604 org.apache.lucene.index.TermInfo[]
> 37729184 604 org.apache.lucene.index.Term[]
>
> So 4 of the top 7 memory consumers are Term related. We have 2 gig
> of RAM available on the system but we get OOM errors no matter the
> java heap settings. Has anyone seen this issue and know how to
> solve it?
>
> We do use separate MultiSearcher instances for each search. (We
> actually have 2 collections that we search via a MultiSearcher.) We
> tried using a singleton searcher instance but our collections are
> constantly being updated and the singleton searcher only gives you
> results since the searcher was opened. Creating new searcher
> objects at search time gives you up to the minute search results.
>
> I've seen some postings referring to an Index Divisor setting which
> could reduce the Terms in memory, but I have not seen how to set
> this value for Lucene.
>
> Any help would be greatly appreciated.
>
> Rich
Re: Huge number of Term objects in memory gives OutOfMemory error
Posted by Paul Smith <ps...@aconex.com>.
I'll bet the byte[] are the Norm data per field. If you have a lot of
fields and do not need the normalization data for every field, I'd
suggest turning that option off for fields whose norms you don't need
for scoring. The calculation, as I understand it, is:
1 byte x (# fields with normalization turned on) x (# documents within
the index)
adds up pretty quickly!
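Plugging the thread's numbers into that formula shows how quickly. Assuming, hypothetically, 20 normed fields over the ~1,000,000-document index from the original post, and up to one open searcher per concurrent user in the worst case (both assumed splits, not figures from the thread), the norms alone come to:

```java
// Norm-array memory estimate: 1 byte x (normed fields) x (documents),
// per the formula above. Field and searcher counts are hypothetical.
public class NormMemory {
    public static void main(String[] args) {
        int normedFields  = 20;        // hypothetical count of fields with norms on
        int documents     = 1000000;   // document count from the original post
        int openSearchers = 50;        // worst case: one searcher per concurrent user

        long perSearcher = 1L * normedFields * documents;
        System.out.println("norms per searcher: " + perSearcher / (1024 * 1024) + " MB");
        System.out.println("norms, all searchers: " + (perSearcher * openSearchers) / (1024 * 1024) + " MB");
    }
}
```

Under those assumptions the norms alone approach a gigabyte across 50 searchers, which fits the OOM pattern reported, and is another argument for the single shared searcher discussed earlier in the thread.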
The char[] and String instances will be your FieldCaches, probably used for
sorting. Do you do any sorting other than by relevance?
cheers,
Paul
On 18/03/2008, at 8:57 AM, <Ri...@gxs.com> wrote:
> I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64 bit linux. We
> have fairly large collections (~1gig collection files, ~1,000,000
> documents). When I try to load test our application with 50 users,
> all doing simple searches via a web interface, we quickly get an
> OutOfMemory exception. When I do a jmap dump of the heap, this is
> what I see:
>
> Size Count Class description
> -------------------------------------------------------
> 195818576 4263822 char[]
> 190889608 13259 byte[]
> 172316640 4307916 java.lang.String
> 164813120 4120328 org.apache.lucene.index.TermInfo
> 131823104 4119472 org.apache.lucene.index.Term
> 37729184 604 org.apache.lucene.index.TermInfo[]
> 37729184 604 org.apache.lucene.index.Term[]
>
> So 4 of the top 7 memory consumers are Term related. We have 2 gig
> of RAM available on the system but we get OOM errors no matter the
> java heap settings. Has anyone seen this issue and know how to
> solve it?
>
> We do use separate MultiSearcher instances for each search. (We
> actually have 2 collections that we search via a MultiSearcher.) We
> tried using a singleton searcher instance but our collections are
> constantly being updated and the singleton searcher only gives you
> results since the searcher was opened. Creating new searcher
> objects at search time gives you up to the minute search results.
>
> I've seen some postings referring to an Index Divisor setting which
> could reduce the Terms in memory, but I have not seen how to set
> this value for Lucene.
>
> Any help would be greatly appreciated.
>
> Rich
Paul Smith
Core Engineering Manager
Aconex
The easy way to save time and money on your project
696 Bourke Street, Melbourne,
VIC 3000, Australia
Tel: +61 3 9240 0200 Fax: +61 3 9240 0299
Email: psmith@aconex.com www.aconex.com