Posted to solr-user@lucene.apache.org by William Pierce <ev...@hotmail.com> on 2008/11/14 18:45:57 UTC

Fatal exception in solr 1.3+ replication

Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the replication handler.   I am running on Windows 2003 Server with Tomcat 6.0.14.   Everything was running fine until I noticed that certain updated records were not showing up on the slave.  Further investigation showed me that the failures have indeed been occurring since early this morning with a fatal exception. Here is a segment of the Tomcat log:
  INFO: Total time taken for download : 0 secs
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
  INFO: Conf files are not downloaded or are in sync
  Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
  INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
  Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler doSnapPull
  SEVERE: SnapPull failed 
  org.apache.solr.common.SolrException: Snappull failed : 
   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
   at org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
   at java.lang.Thread.run(Thread.java:619)
  Caused by: java.lang.RuntimeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
   at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
   ... 11 more
  Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
   at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
   at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
   at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
   at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
   at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
   ... 14 more
  Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 commit

Any ideas, anyone?

-- Bill

Re: Fatal exception in solr 1.3+ replication

Posted by William Pierce <ev...@hotmail.com>.
Not easily, no. It has occurred twice on my machine, but what triggers it I 
do not know.  Mark Miller has provided some explanations for what may be 
going on in Lucene that may be causing this; cf. his last email.

- Bill

--------------------------------------------------
From: "Noble Paul നോബിള്‍ नोब्ळ्" <no...@gmail.com>
Sent: Saturday, November 15, 2008 11:40 PM
To: <so...@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication

> Is this issue visible consistently? I mean, are you able to
> reproduce it easily?
>
>
>
>
> -- 
> --Noble Paul
> 

Re: Fatal exception in solr 1.3+ replication

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Is this issue visible consistently? I mean, are you able to
reproduce it easily?




-- 
--Noble Paul

Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
I meant patch as in a source code patch, so I'm afraid you're kind of in a 
tough spot. That's part of the 'trunk running' risk, unfortunately...

You've done it once though, so I am sure you can manage again, right? 
I'm not sure exactly what state your checkout is in (though I suppose I can 
guess close from the date), which makes producing a source patch 
difficult, but essentially hiding the problem is pretty simple.

Take your checkout (which I hope you have or can get again... the only 
safe way to play the trunk game) and make the following simple changes:

We need to stop using the SolrIndexSearcher constructors that take a 
String or File rather than a Directory or Reader. The key spot is in 
SolrCore:

line: approx 1010
    try {
      newestSearcher = getNewestSearcher(false);
      if (newestSearcher != null) {
        IndexReader currentReader = newestSearcher.get().getReader();
        String newIndexDir = getNewIndexDir();
        if (new File(getIndexDir()).equals(new File(newIndexDir))) {
          IndexReader newReader = currentReader.reopen();

          if (newReader == currentReader) {
            currentReader.incRef();
          }

          tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true);
        } else {
          tmp = new SolrIndexSearcher(this, schema, "main", newIndexDir, true);
        }
      } else {
        tmp = new SolrIndexSearcher(this, schema, "main", getNewIndexDir(), true);
You can see that the two lower SolrIndexSearchers are inited with a String. 
You want to init them with a Directory instead:

        tmp = new SolrIndexSearcher(this, schema, "main", 
            FSDirectory.getDirectory(newIndexDir), true, true);  // you will need to 
            // add another true param here

By passing a Directory rather than a String, the underlying IndexReaders 
will not try to close the Directory and you won't hit that error. Trunk 
no longer has this problem exposed because we now supply  Directories to 
these constructors (though for a different reason).

If you're hesitant about any of this, you might try trunk and just test it 
out after looking at the changes that have been put in, or you might 
email me privately and I may be able to point you to some alternate options.

- Mark


William Pierce wrote:
> Mark,
>
> That sounds great!    Good luck with the cleaning :-)
>
> Let me know how I can get a patch --- I'd prefer not to do a Solr build 
> from source since we are not Java savvy here....:-(
>
> - Bill
>
> --------------------------------------------------
> From: "Mark Miller" <ma...@gmail.com>
> Sent: Saturday, November 15, 2008 12:43 PM
> To: <so...@lucene.apache.org>
> Subject: Re: Fatal exception in solr 1.3+ replication
>
>> Okay, I'm fairly certain I've found it. As usual, take a walk and the 
>> solution pops into your head out of the blue.
>>
>> It looks like Lucene's IndexReader reopen call is not very friendly 
>> with the FSDirectory implementation. If you call reopen and it 
>> returns a new IndexReader, it creates a new reference on the 
>> Directory - so if you reopen an IndexReader that was originally 
>> opened with a non Directory parameter (String or File instead), both 
>> Readers (the reopened one and the one you're reopening on) will close 
>> the Directory when they close. That's not right. That's how we get to 0 
>> faster than we should. So it's kind of a Lucene issue.
>>
>> My guess that this is hidden in the trunk was right, because I think 
>> we are no longer using String, File based IndexReader opens, which 
>> means our IndexReaders don't attempt to close their underlying 
>> Directories now.
>>
>> I can probably send you a patch for the revision you're on to hide this 
>> as well, but I'm already in the doghouse on cleaning right now ; ) 
>> The way my brain works, I'll probably be back to this later though.
>>
>> - Mark
>>
>>


Re: Fatal exception in solr 1.3+ replication

Posted by William Pierce <ev...@hotmail.com>.
Mark,

That sounds great!    Good luck with the cleaning :-)

Let me know how I can get a patch --- I'd prefer not to do a Solr build from 
source since we are not Java savvy here....:-(

- Bill

--------------------------------------------------
From: "Mark Miller" <ma...@gmail.com>
Sent: Saturday, November 15, 2008 12:43 PM
To: <so...@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication

> Okay, I'm fairly certain I've found it. As usual, take a walk and the 
> solution pops into your head out of the blue.
>
> It looks like Lucene's IndexReader reopen call is not very friendly with 
> the FSDirectory implementation. If you call reopen and it returns a new 
> IndexReader, it creates a new reference on the Directory - so if you 
> reopen an IndexReader that was originally opened with a non Directory 
> parameter (String or File instead), both Readers (the reopened one and the 
> one you're reopening on) will close the Directory when they close. That's not 
> right. That's how we get to 0 faster than we should. So it's kind of a 
> Lucene issue.
>
> My guess that this is hidden in the trunk was right, because I think we 
> are no longer using String, File based IndexReader opens, which means our 
> IndexReaders don't attempt to close their underlying Directories now.
>
> I can probably send you a patch for the revision you're on to hide this as 
> well, but I'm already in the doghouse on cleaning right now ; ) The way my 
> brain works, I'll probably be back to this later though.
>
> - Mark
>
>
> William Pierce wrote:
>>> Trunk may actually still hide the issue (possibly), but something really 
>>> funky seems to have gone on and I can't find it yet. Do you have any 
>>> custom code interacting with solr?
>>
>> None whatsoever...I am using out-of-the-box Solr 1.3 (build of 10/23).  I 
>> am using my C# app to send HTTP requests to my Solr instance.
>>
>> Is there something you want me to try at my end that might give you a 
>> clue? Let me know and I can try to help out.
>>
>> Best,
>>
>> - Bill
>>
>> --------------------------------------------------
>> From: "Mark Miller" <ma...@gmail.com>
>> Sent: Saturday, November 15, 2008 10:59 AM
>> To: <so...@lucene.apache.org>
>> Subject: Re: Fatal exception in solr 1.3+ replication
>>
>>> Haven't given up, but this has really got me stumped so far. For every path, 
>>> FSDirectory allows just one instance of FSDirectory to exist, and it 
>>> keeps a ref count of how many have been returned from openDirectory for 
>>> a given path. An FSDirectory will not actually be closed unless all 
>>> references to it are released (it doesn't actually even do anything in 
>>> close, other than drop the reference). So pretty much, the only way to 
>>> get into trouble is to call close enough times to equal how many times 
>>> you called openDirectory, and then try to use the FSDirectory again. 
>>> This is what your stack trace indicates is happening.
>>>
>>> So we can get the call hierarchy for directory.close() in solr, and we 
>>> find that everything is pretty matched up...at worst it looks like a 
>>> reference might not be closed - but that doesn't hurt 
>>> anything...FSDirectory will just think there is something out there 
>>> holding onto a reference for that directory and allow you to continue 
>>> using it, even though no such reference might exist. It's only when 
>>> enough closes are called that an instance will be marked as closed (a 
>>> further openDirectory will return a new open instance). So to get your 
>>> error, something has to close a Directory that it did not get from 
>>> openDirectory.
>>>
>>> We know that the IndexReader that is trying to be reopened must have 
>>> called openDirectory (or something called openDirectory for it) and we 
>>> know that it hasn't called close (the IndexReader has already held up to 
>>> an ensureOpen call on itself in that stack trace). Something else must 
>>> have closed it. I can't find this happening. Nothing else calls close on 
>>> Directory unless it called openDirectory (that I can find using all of 
>>> Eclipse's magical goodness).
>>>
>>> So how does the refcount on the Directory hit 0? I can't find or 
>>> duplicate yet...
>>>
>>> Trunk may actually still hide the issue (possibly), but something really 
>>> funky seems to have gone on and I can't find it yet. Do you have any 
>>> custom code interacting with solr?
>>>
>>> - Mark
>>>
>>> William Pierce wrote:
>>>> Mark,
>>>>
>>>> Thanks for your help and lucid exposition of what may be going on....I 
>>>> will await to hear from you before I plough ahead with the latest trunk 
>>>> bits....
>>>>
>>>> Best regards,
>>>>
>>>> -- Bill
>>>>
>>>> --------------------------------------------------
>>>> From: "Mark Miller" <ma...@gmail.com>
>>>> Sent: Saturday, November 15, 2008 7:29 AM
>>>> To: <so...@lucene.apache.org>
>>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>>
>>>>> Hang on. The more I look at this, the more I am thinking that was not 
>>>>> the problem. Directories are pretty strictly managed by Lucene at the 
>>>>> moment, and it should actually be pretty difficult to have one closed 
>>>>> out from under you. They are singletons and reference counted. The 
>>>>> IndexReader would have to have been closed if it was responsible for 
>>>>> closing the Directory, and in that case, we would not be trying to 
>>>>> reopen it. The Searcher we get the IndexReader from has been inc ref'd 
>>>>> to ensure it won't close. All I can think is that something is 
>>>>> grabbing/stealing a Directory that didn't directly ask for it from 
>>>>> FSDirectory.getDirectory, and is then closing it. I'm trying to hunt 
>>>>> where that could be happening now. Hope to spot something, but appears 
>>>>> pretty mysterious at the moment. I suppose another option is a nasty 
>>>>> little race condition - I've been trying to repeat the error by 
>>>>> sending lots of update/search requests from multiple threads with no 
>>>>> luck though. Looks like I may have to throw in snap puller code 
>>>>> (haven't looked too heavily into any of that before).
>>>>>
>>>>> At worst, if/when a fix is discovered, you will probably be able to 
>>>>> apply just the fix to the revision your working with.
>>>>>
>>>>> - Mark
>>>>>
>>>>> William Pierce wrote:
>>>>>> Mark,
>>>>>>
>>>>>> Thanks for your response --- I do appreciate all you volunteers 
>>>>>> working to provide such a nice system!
>>>>>>
>>>>>> Anyway,  I will try the trunk bits as you said.  The only problem is 
>>>>>> that the later the trunk I use from 1.3,  the more of post 1.3 
>>>>>> capability I get. And I feel somewhat exposed running these bits in 
>>>>>> our production environment...
>>>>>>
>>>>>> Not sure if there's another approach?
>>>>>>
>>>>>> Thanks,
>>>>>> -Bill
>>>>>>
>>>>>> --------------------------------------------------
>>>>>> From: "Mark Miller" <ma...@gmail.com>
>>>>>> Sent: Friday, November 14, 2008 8:43 PM
>>>>>> To: <so...@lucene.apache.org>
>>>>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>>>>
>>>>>>> Hey William, sorry about the trouble. I have to look at this 
>>>>>>> further, but I think the issue is fixed if you grab the latest trunk 
>>>>>>> build. Solr-465 should inadvertently fix things - before that patch, 
>>>>>>> a deprecated constructor for solrsearcher was being called - this 
>>>>>>> constructor caused the underlying IndexReader to close its own 
>>>>>>> Directory, and since IndexReaders are reopened, we don't want that.
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>> Looks like there might be an issue with the reopen - I'm not seeing 
>>>>>>>> what it could be offhand though. Have to find what could be closing 
>>>>>>>> a Directory unexpectedly...I'll try to take a further look over the 
>>>>>>>> weekend.
>>>>>>>>
>>>>>>>> - Mark
>>>>>>>>
>>>>>>>> William Pierce wrote:
>>>>>>>>> Folks:
>>>>>>>>>
>>>>>>>>> I am using the nightly build of 1.3 as of Oct 23 so as to use the 
>>>>>>>>> replication handler.   I am running on windows 2003 server with 
>>>>>>>>> tomcat 6.0.14.   Everything was running fine until I noticed that 
>>>>>>>>> certain updated records were not showing up on the slave.  Further 
>>>>>>>>> investigation showed me that the failures have indeed been 
>>>>>>>>> occurring since early this morning with a fatal exception....here 
>>>>>>>>> is a segment of the tomcat log:
>>>>>>>>>   INFO: Total time taken for download : 0 secs
>>>>>>>>>   Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
>>>>>>>>> fetchLatestIndex
>>>>>>>>>   INFO: Conf files are not downloaded or are in sync
>>>>>>>>>   Nov 14, 2008 5:34:24 AM 
>>>>>>>>> org.apache.solr.update.DirectUpdateHandler2 commit
>>>>>>>>>   INFO: start 
>>>>>>>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>>>>>>>>   Nov 14, 2008 5:34:24 AM 
>>>>>>>>> org.apache.solr.handler.ReplicationHandler doSnapPull
>>>>>>>>>   SEVERE: SnapPull failed   org.apache.solr.common.SolrException: 
>>>>>>>>> Snappull failed :    at 
>>>>>>>>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>    at 
>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>    at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>   Caused by: java.lang.RuntimeException: 
>>>>>>>>> org.apache.lucene.store.AlreadyClosedException: this Directory is 
>>>>>>>>> closed
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
>>>>>>>>>    ... 11 more
>>>>>>>>>   Caused by: org.apache.lucene.store.AlreadyClosedException: this 
>>>>>>>>> Directory is closed
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
>>>>>>>>>    at 
>>>>>>>>> org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
>>>>>>>>>    at 
>>>>>>>>> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
>>>>>>>>>    ... 14 more
>>>>>>>>>   Nov 14, 2008 5:38:52 AM 
>>>>>>>>> org.apache.solr.update.DirectUpdateHandler2 commit
>>>>>>>>>
>>>>>>>>> Any ideas, anyone?
>>>>>>>>>
>>>>>>>>> -- Bill
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
> 

Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
Okay, I'm fairly certain I've found it. As usual, take a walk and the 
solution pops into your head out of the blue.

It looks like Lucene's IndexReader reopen call is not very friendly with 
the FSDirectory implementation. If you call reopen and it returns a new 
IndexReader, it creates a new reference on the Directory - so if you 
reopen an IndexReader that was originally opened with a non-Directory 
parameter (a String or File instead), both Readers (the reopened one and 
the one you're reopening from) will close the Directory when they close. 
That's not right. That's how we get to 0 faster than we should. So it's 
kind of a Lucene issue.
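
To make that double-close concrete, here is a toy model of the 
ref-counting scheme in plain Java. This is not Lucene code - the class 
and method names (RefCountedDirectory, getDirectory, ensureOpen) are 
made up, and only mimic the FSDirectory behavior being described:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-path singleton directories with reference counting
// (illustrative only, not Lucene's actual FSDirectory implementation).
class RefCountedDirectory {
    private static final Map<String, RefCountedDirectory> INSTANCES = new HashMap<>();
    private int refCount;
    private boolean closed;

    // Stands in for FSDirectory.getDirectory: one instance per path,
    // each call bumps the reference count.
    static synchronized RefCountedDirectory getDirectory(String path) {
        RefCountedDirectory d = INSTANCES.get(path);
        if (d == null || d.closed) {
            d = new RefCountedDirectory();
            INSTANCES.put(path, d);
        }
        d.refCount++;
        return d;
    }

    // close() only really closes when the count reaches zero.
    synchronized void close() {
        if (--refCount <= 0) closed = true;
    }

    synchronized void ensureOpen() {
        if (closed) throw new IllegalStateException("this Directory is closed");
    }
}

public class ReopenBugSketch {
    public static void main(String[] args) {
        // One getDirectory call, so the ref count is 1...
        RefCountedDirectory dir = RefCountedDirectory.getDirectory("/index");
        // ...but reopen() hands a second reader the SAME instance without
        // a second getDirectory call, and both readers think they own it.
        dir.close();  // original reader closes: refCount hits 0, marked closed
        try {
            dir.ensureOpen();  // the reopened reader now blows up
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // prints: this Directory is closed
        }
    }
}
```

With matched getDirectory/close pairs the count never hits zero early; 
the bug is that reopen created a second "owner" without a second 
getDirectory.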

My guess that this is hidden in the trunk was right, because I think we 
are no longer using String- or File-based IndexReader opens, which means 
our IndexReaders don't attempt to close their underlying Directories now.

I can probably send you a patch for the revision you're on to hide this as 
well, but I'm already in the doghouse on cleaning right now ; ) The way 
my brain works, I'll probably be back to this later though.

- Mark


William Pierce wrote:
>> Trunk may actually still hide the issue (possibly), but something 
>> really funky seems to have gone on and I can't find it yet. Do you 
>> have any custom code interacting with solr?
>
> None whatsoever...I am using out-of-the-box solr 1.3 (build of 
> 10/23).  I am using my C# app to http requests to my solr instance.
>
> Is there something you want me to try at my end that might give you a 
> clue? Let me know and I can try to help out.
>
> Best,
>
> - Bill
>
> --------------------------------------------------
> From: "Mark Miller" <ma...@gmail.com>
> Sent: Saturday, November 15, 2008 10:59 AM
> To: <so...@lucene.apache.org>
> Subject: Re: Fatal exception in solr 1.3+ replication
>
>> Havn't given up, but this has really got me so far. For every path, 
>> FSDirectory allows just one instance of FSDirectory to exist, and it 
>> keeps a ref count of how many have been returned from openDirectory 
>> for a given path. An FSDirectory will not actually be closed unless 
>> all references to it are released (it doesn't actually even do 
>> anything in close, other than drop the reference). So pretty much, 
>> the only way to get into trouble is to call close enough times to 
>> equal how many times you called openDirectory, and then try to use 
>> the FSDirectory again. This is what your stack trace indicates is 
>> happening.
>>
>> So we can get the call hierarchy for directory.close() in solr, and 
>> we find that everything is pretty matched up...at worst it looks that 
>> a reference might not be closed - but that doesn't hurt 
>> anything...FSDirectory will just think there is something out there 
>> holding onto a reference for that directory and allow you to continue 
>> using it, even though no such reference might exist. Its only when 
>> enough closes are called that an instance will be marked as closed (a 
>> further openDirectory will return a new open instance). So to get 
>> your error, something has to close a Directory that it did not get 
>> from openDirectory.
>>
>> We know that the IndexReader that is trying to be reopened must have 
>> called openDirectory (or something called openDirectory for it) and 
>> we know that it hasn't called close (the IndexReader has already held 
>> up to an ensureOpen call on itself in that stack trace). Something 
>> else must have closed it. I can't find this happening. Nothing else 
>> calls close on Directory unless it called openDirectory (that I can 
>> find using all of Eclipses magical goodness).
>>
>> So how does the refcount on the Directory hit 0? I can't find or 
>> duplicate yet...
>>
>> Trunk may actually still hide the issue (possibly), but something 
>> really funky seems to have gone on and I can't find it yet. Do you 
>> have any custom code interacting with solr?
>>
>> - Mark
>>
>> William Pierce wrote:
>>> Mark,
>>>
>>> Thanks for your help and lucid exposition of what may be going 
>>> on....I will await to hear from you before I plough ahead with the 
>>> latest trunk bits....
>>>
>>> Best regards,
>>>
>>> -- Bill
>>>
>>> --------------------------------------------------
>>> From: "Mark Miller" <ma...@gmail.com>
>>> Sent: Saturday, November 15, 2008 7:29 AM
>>> To: <so...@lucene.apache.org>
>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>
>>>> Hang on. The more I look at this, the more I am thinking that was 
>>>> not the problem. Directories are pretty strictly managed by Lucene 
>>>> at the moment, and it should actually be pretty difficult to have 
>>>> one closed out from under you. They are singletons and reference 
>>>> counted. The IndexReader would have to have been closed if it was 
>>>> responsible for closing the Directory, and in that case, we would 
>>>> not be trying to reopen it. The Searcher we get the IndexReader 
>>>> from has been inc ref'd to ensure it won't close. All I can think 
>>>> is that something is grabbing/stealing a Directory that didn't 
>>>> directly ask for it from FSDirectory.getDirectory, and is then 
>>>> closing it. I'm trying to hunt where that could be happening now. 
>>>> Hope to spot something, but appears pretty mysterious at the 
>>>> moment. I suppose another option is a nasty little race condition - 
>>>> I've been trying to repeat the error by sending lots of 
>>>> update/search requests from multiple threads with no luck though. 
>>>> Looks like I may have to throw in snap puller code (havn't look too 
>>>> heavily into any of that before).
>>>>
>>>> At worst, if/when a fix is discovered, you will probably be able to 
>>>> apply just the fix to the revision your working with.
>>>>
>>>> - Mark
>>>>
>>>> William Pierce wrote:
>>>>> Mark,
>>>>>
>>>>> Thanks for your response --- I do appreciate all you volunteers 
>>>>> working to provide such a nice system!
>>>>>
>>>>> Anyway,  I will try the trunk bits as you said.  The only problem 
>>>>> is that the later the trunk I use from 1.3,  the more of post 1.3 
>>>>> capability I get. And I feel somewhat exposed running these bits 
>>>>> in our production environment...
>>>>>
>>>>> Not sure if there's another approach?
>>>>>
>>>>> Thanks,
>>>>> -Bill
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Mark Miller" <ma...@gmail.com>
>>>>> Sent: Friday, November 14, 2008 8:43 PM
>>>>> To: <so...@lucene.apache.org>
>>>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>>>
>>>>>> Hey William, sorry about the trouble. I have to look at this 
>>>>>> further, but I think the issue is fixed if you grab the latest 
>>>>>> trunk build. Solr-465 should inadvertently fix things - before 
>>>>>> that patch, a deprecated constructor for solrsearcher was being 
>>>>>> called - this constructor caused the underlying IndexReader to 
>>>>>> close its own Directory, and since IndexReaders are reopened, we 
>>>>>> don't want that.
>>>>>>
>>>>>> Mark Miller wrote:
>>>>>>> Looks like there might be an issue with the reopen - I'm not 
>>>>>>> seeing what it could be offhand though. Have to find what could 
>>>>>>> be closing a Directory unexpectedly...I'll try to take a further 
>>>>>>> look over the weekend.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>


Re: Fatal exception in solr 1.3+ replication

Posted by William Pierce <ev...@hotmail.com>.
> Trunk may actually still hide the issue (possibly), but something really 
> funky seems to have gone on and I can't find it yet. Do you have any 
> custom code interacting with solr?

None whatsoever...I am using out-of-the-box Solr 1.3 (build of 10/23).  I am 
using my C# app to send HTTP requests to my Solr instance.

Is there something you want me to try at my end that might give you a clue? 
Let me know and I can try to help out.

Best,

- Bill

--------------------------------------------------
From: "Mark Miller" <ma...@gmail.com>
Sent: Saturday, November 15, 2008 10:59 AM
To: <so...@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication

> Havn't given up, but this has really got me so far. For every path, 
> FSDirectory allows just one instance of FSDirectory to exist, and it keeps 
> a ref count of how many have been returned from openDirectory for a given 
> path. An FSDirectory will not actually be closed unless all references to 
> it are released (it doesn't actually even do anything in close, other than 
> drop the reference). So pretty much, the only way to get into trouble is 
> to call close enough times to equal how many times you called 
> openDirectory, and then try to use the FSDirectory again. This is what 
> your stack trace indicates is happening.
>
> So we can get the call hierarchy for directory.close() in solr, and we 
> find that everything is pretty matched up...at worst it looks that a 
> reference might not be closed - but that doesn't hurt 
> anything...FSDirectory will just think there is something out there 
> holding onto a reference for that directory and allow you to continue 
> using it, even though no such reference might exist. Its only when enough 
> closes are called that an instance will be marked as closed (a further 
> openDirectory will return a new open instance). So to get your error, 
> something has to close a Directory that it did not get from openDirectory.
>
> We know that the IndexReader that is trying to be reopened must have 
> called openDirectory (or something called openDirectory for it) and we 
> know that it hasn't called close (the IndexReader has already held up to 
> an ensureOpen call on itself in that stack trace). Something else must 
> have closed it. I can't find this happening. Nothing else calls close on 
> Directory unless it called openDirectory (that I can find using all of 
> Eclipses magical goodness).
>
> So how does the refcount on the Directory hit 0? I can't find or duplicate 
> yet...
>
> Trunk may actually still hide the issue (possibly), but something really 
> funky seems to have gone on and I can't find it yet. Do you have any 
> custom code interacting with solr?
>
> - Mark
>
> William Pierce wrote:
>> Mark,
>>
>> Thanks for your help and lucid exposition of what may be going on....I 
>> will await to hear from you before I plough ahead with the latest trunk 
>> bits....
>>
>> Best regards,
>>
>> -- Bill
>>
>> --------------------------------------------------
>> From: "Mark Miller" <ma...@gmail.com>
>> Sent: Saturday, November 15, 2008 7:29 AM
>> To: <so...@lucene.apache.org>
>> Subject: Re: Fatal exception in solr 1.3+ replication
>>
>>> Hang on. The more I look at this, the more I am thinking that was not 
>>> the problem. Directories are pretty strictly managed by Lucene at the 
>>> moment, and it should actually be pretty difficult to have one closed 
>>> out from under you. They are singletons and reference counted. The 
>>> IndexReader would have to have been closed if it was responsible for 
>>> closing the Directory, and in that case, we would not be trying to 
>>> reopen it. The Searcher we get the IndexReader from has been inc ref'd 
>>> to ensure it won't close. All I can think is that something is 
>>> grabbing/stealing a Directory that didn't directly ask for it from 
>>> FSDirectory.getDirectory, and is then closing it. I'm trying to hunt 
>>> where that could be happening now. Hope to spot something, but appears 
>>> pretty mysterious at the moment. I suppose another option is a nasty 
>>> little race condition - I've been trying to repeat the error by sending 
>>> lots of update/search requests from multiple threads with no luck 
>>> though. Looks like I may have to throw in snap puller code (havn't look 
>>> too heavily into any of that before).
>>>
>>> At worst, if/when a fix is discovered, you will probably be able to 
>>> apply just the fix to the revision your working with.
>>>
>>> - Mark
>>>
>>> William Pierce wrote:
>>>> Mark,
>>>>
>>>> Thanks for your response --- I do appreciate all you volunteers working 
>>>> to provide such a nice system!
>>>>
>>>> Anyway,  I will try the trunk bits as you said.  The only problem is 
>>>> that the later the trunk I use from 1.3,  the more of post 1.3 
>>>> capability I get. And I feel somewhat exposed running these bits in our 
>>>> production environment...
>>>>
>>>> Not sure if there's another approach?
>>>>
>>>> Thanks,
>>>> -Bill
>>>>
>>>> --------------------------------------------------
>>>> From: "Mark Miller" <ma...@gmail.com>
>>>> Sent: Friday, November 14, 2008 8:43 PM
>>>> To: <so...@lucene.apache.org>
>>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>>
>>>>> Hey William, sorry about the trouble. I have to look at this further, 
>>>>> but I think the issue is fixed if you grab the latest trunk build. 
>>>>> Solr-465 should inadvertently fix things - before that patch, a 
>>>>> deprecated constructor for solrsearcher was being called - this 
>>>>> constructor caused the underlying IndexReader to close its own 
>>>>> Directory, and since IndexReaders are reopened, we don't want that.
>>>>>
>>>>> Mark Miller wrote:
>>>>>> Looks like there might be an issue with the reopen - I'm not seeing 
>>>>>> what it could be offhand though. Have to find what could be closing a 
>>>>>> Directory unexpectedly...I'll try to take a further look over the 
>>>>>> weekend.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>
> 

Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
Haven't given up, but this one really has me stumped so far. For every path, 
FSDirectory allows just one instance of FSDirectory to exist, and it 
keeps a ref count of how many have been returned from openDirectory for 
a given path. An FSDirectory will not actually be closed unless all 
references to it are released (it doesn't actually even do anything in 
close, other than drop the reference). So pretty much, the only way to 
get into trouble is to call close enough times to equal how many times 
you called openDirectory, and then try to use the FSDirectory again. 
This is what your stack trace indicates is happening.

So we can get the call hierarchy for directory.close() in solr, and we 
find that everything is pretty matched up...at worst it looks like a 
reference might not be closed - but that doesn't hurt 
anything...FSDirectory will just think there is something out there 
holding onto a reference for that directory and allow you to continue 
using it, even though no such reference might exist. It's only when 
enough closes are called that an instance will be marked as closed (a 
further openDirectory will return a new open instance). So to get your 
error, something has to close a Directory that it did not get from 
openDirectory.
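
A minimal sketch of that failure mode in plain Java (made-up names, not 
Lucene's actual classes): matched open/close pairs are safe, but one 
stray close from something that never called open() drives the count to 
zero and marks the shared instance closed out from under every other 
holder.

```java
// Toy ref-counted directory handle (illustrative only, not Lucene code).
class Dir {
    private int refs;
    private boolean closed;
    Dir open()   { refs++; return this; }              // stands in for openDirectory
    void close() { if (--refs <= 0) closed = true; }   // really closes only at zero
    void ensureOpen() {
        if (closed) throw new IllegalStateException("this Directory is closed");
    }
}

public class StrayClose {
    public static void main(String[] args) {
        Dir d = new Dir();
        d.open();          // holder A (say, the live IndexReader)
        d.open();          // holder B (say, the snap puller's commit path)
        d.close();         // holder B releases: refs 2 -> 1, still open
        d.close();         // stray close from something that never called open():
                           // refs 1 -> 0, marked closed while holder A still lives
        try {
            d.ensureOpen();  // holder A now hits the AlreadyClosedException analogue
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // prints: this Directory is closed
        }
    }
}
```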

We know that the IndexReader that is trying to be reopened must have 
called openDirectory (or something called openDirectory for it), and we 
know that it hasn't called close (the IndexReader has already survived 
an ensureOpen call on itself in that stack trace). Something else must 
have closed it. I can't find this happening. Nothing else calls close on 
Directory unless it called openDirectory (that I can find using all of 
Eclipse's magical goodness).

So how does the refcount on the Directory hit 0? I can't find it or 
duplicate it yet...

Trunk may actually still hide the issue (possibly), but something really 
funky seems to have gone on and I can't find it yet. Do you have any 
custom code interacting with solr?

- Mark

William Pierce wrote:
> Mark,
>
> Thanks for your help and lucid exposition of what may be going on....I 
> will await to hear from you before I plough ahead with the latest 
> trunk bits....
>
> Best regards,
>
> -- Bill
>
> --------------------------------------------------
> From: "Mark Miller" <ma...@gmail.com>
> Sent: Saturday, November 15, 2008 7:29 AM
> To: <so...@lucene.apache.org>
> Subject: Re: Fatal exception in solr 1.3+ replication
>
>> Hang on. The more I look at this, the more I am thinking that was not 
>> the problem. Directories are pretty strictly managed by Lucene at the 
>> moment, and it should actually be pretty difficult to have one closed 
>> out from under you. They are singletons and reference counted. The 
>> IndexReader would have to have been closed if it was responsible for 
>> closing the Directory, and in that case, we would not be trying to 
>> reopen it. The Searcher we get the IndexReader from has been inc 
>> ref'd to ensure it won't close. All I can think is that something is 
>> grabbing/stealing a Directory that didn't directly ask for it from 
>> FSDirectory.getDirectory, and is then closing it. I'm trying to hunt 
>> where that could be happening now. Hope to spot something, but 
>> appears pretty mysterious at the moment. I suppose another option is 
>> a nasty little race condition - I've been trying to repeat the error 
>> by sending lots of update/search requests from multiple threads with 
>> no luck though. Looks like I may have to throw in snap puller code 
>> (havn't look too heavily into any of that before).
>>
>> At worst, if/when a fix is discovered, you will probably be able to 
>> apply just the fix to the revision your working with.
>>
>> - Mark
>>
>> William Pierce wrote:
>>> Mark,
>>>
>>> Thanks for your response --- I do appreciate all you volunteers 
>>> working to provide such a nice system!
>>>
>>> Anyway,  I will try the trunk bits as you said.  The only problem is 
>>> that the later the trunk I use from 1.3,  the more of post 1.3 
>>> capability I get. And I feel somewhat exposed running these bits in 
>>> our production environment...
>>>
>>> Not sure if there's another approach?
>>>
>>> Thanks,
>>> -Bill
>>>
>>> --------------------------------------------------
>>> From: "Mark Miller" <ma...@gmail.com>
>>> Sent: Friday, November 14, 2008 8:43 PM
>>> To: <so...@lucene.apache.org>
>>> Subject: Re: Fatal exception in solr 1.3+ replication
>>>
>>>> Hey William, sorry about the trouble. I have to look at this 
>>>> further, but I think the issue is fixed if you grab the latest 
>>>> trunk build. Solr-465 should inadvertently fix things - before that 
>>>> patch, a deprecated constructor for solrsearcher was being called - 
>>>> this constructor caused the underlying IndexReader to close its own 
>>>> Directory, and since IndexReaders are reopened, we don't want that.
>>>>
>>>> Mark Miller wrote:
>>>>> Looks like there might be an issue with the reopen - I'm not 
>>>>> seeing what it could be offhand though. Have to find what could be 
>>>>> closing a Directory unexpectedly...I'll try to take a further look 
>>>>> over the weekend.
>>>>>
>>>>> - Mark
>>>>>
>>>>> William Pierce wrote:
>>>>>> Folks:
>>>>>>
>>>>>> I am using the nightly build of 1.3 as of Oct 23 so as to use the 
>>>>>> replication handler.   I am running on windows 2003 server with 
>>>>>> tomcat 6.0.14.   Everything was running fine until I noticed that 
>>>>>> certain updated records were not showing up on the slave.  
>>>>>> Further investigation showed me that the failures have indeed 
>>>>>> been occurring since early this morning with a fatal 
>>>>>> exception....here is a segment of the tomcat log:
>>>>>>   INFO: Total time taken for download : 0 secs
>>>>>>   Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
>>>>>> fetchLatestIndex
>>>>>>   INFO: Conf files are not downloaded or are in sync
>>>>>>   Nov 14, 2008 5:34:24 AM 
>>>>>> org.apache.solr.update.DirectUpdateHandler2 commit
>>>>>>   INFO: start 
>>>>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>>>>>   Nov 14, 2008 5:34:24 AM 
>>>>>> org.apache.solr.handler.ReplicationHandler doSnapPull
>>>>>>   SEVERE: SnapPull failed   org.apache.solr.common.SolrException: 
>>>>>> Snappull failed :    at 
>>>>>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278) 
>>>>>>
>>>>>>    at 
>>>>>> org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208) 
>>>>>>
>>>>>>    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
>>>>>>    at 
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>>>>    at 
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) 
>>>>>>
>>>>>>    at 
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) 
>>>>>>
>>>>>>    at java.lang.Thread.run(Thread.java:619)
>>>>>>   Caused by: java.lang.RuntimeException: 
>>>>>> org.apache.lucene.store.AlreadyClosedException: this Directory is 
>>>>>> closed
>>>>>>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
>>>>>>    at 
>>>>>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350) 
>>>>>>
>>>>>>    at 
>>>>>> org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
>>>>>>    at 
>>>>>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265) 
>>>>>>
>>>>>>    ... 11 more
>>>>>>   Caused by: org.apache.lucene.store.AlreadyClosedException: this 
>>>>>> Directory is closed
>>>>>>    at 
>>>>>> org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
>>>>>>    at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
>>>>>>    at 
>>>>>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533) 
>>>>>>
>>>>>>    at 
>>>>>> org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366) 
>>>>>>
>>>>>>    at 
>>>>>> org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188) 
>>>>>>
>>>>>>    at 
>>>>>> org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124) 
>>>>>>
>>>>>>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
>>>>>>    ... 14 more
>>>>>>   Nov 14, 2008 5:38:52 AM 
>>>>>> org.apache.solr.update.DirectUpdateHandler2 commit
>>>>>>
>>>>>> Any ideas, anyone?
>>>>>>
>>>>>> -- Bill
>>>>>>
>>>>>
>>>>
>>>>
>>
>>


Re: Fatal exception in solr 1.3+ replication

Posted by William Pierce <ev...@hotmail.com>.
Mark,

Thanks for your help and lucid exposition of what may be going on... I will 
wait to hear from you before I plough ahead with the latest trunk bits...

Best regards,

-- Bill

--------------------------------------------------
From: "Mark Miller" <ma...@gmail.com>
Sent: Saturday, November 15, 2008 7:29 AM
To: <so...@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication

> Hang on. The more I look at this, the more I am thinking that was not the 
> problem. Directories are pretty strictly managed by Lucene at the moment, 
> and it should actually be pretty difficult to have one closed out from 
> under you. They are singletons and reference counted. The IndexReader 
> would have to have been closed if it was responsible for closing the 
> Directory, and in that case, we would not be trying to reopen it. The 
> Searcher we get the IndexReader from has been inc ref'd to ensure it won't 
> close. All I can think is that something is grabbing/stealing a Directory 
> that didn't directly ask for it from FSDirectory.getDirectory, and is then 
> closing it. I'm trying to hunt where that could be happening now. Hope to 
> spot something, but appears pretty mysterious at the moment. I suppose 
> another option is a nasty little race condition - I've been trying to 
> repeat the error by sending lots of update/search requests from multiple 
> threads with no luck though. Looks like I may have to throw in snap puller 
> code (haven't looked too heavily into any of that before).
>
> At worst, if/when a fix is discovered, you will probably be able to apply 
> just the fix to the revision you're working with.
>
> - Mark
>
> 

Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
Hang on. The more I look at this, the more I am thinking that was not 
the problem. Directories are pretty strictly managed by Lucene at the 
moment, and it should actually be pretty difficult to have one closed 
out from under you. They are singletons and reference counted. The 
IndexReader would have to have been closed if it was responsible for 
closing the Directory, and in that case, we would not be trying to 
reopen it. The Searcher we get the IndexReader from has been inc ref'd 
to ensure it won't close. All I can think is that something is 
grabbing/stealing a Directory that didn't directly ask for it from 
FSDirectory.getDirectory, and is then closing it. I'm trying to hunt 
where that could be happening now. Hope to spot something, but appears 
pretty mysterious at the moment. I suppose another option is a nasty 
little race condition - I've been trying to repeat the error by sending 
lots of update/search requests from multiple threads with no luck 
though. Looks like I may have to throw in snap puller code (haven't 
looked too heavily into any of that before).

At worst, if/when a fix is discovered, you will probably be able to 
apply just the fix to the revision you're working with.

- Mark
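Mark's description of Lucene's Directory handling (one cached instance per path, reference counted, closed only when the last holder releases it) can be sketched as follows. This is a hypothetical simplification for illustration — `RefCountedDirectory`, `get`, `release`, and `closeIgnoringRefCount` are invented names, not the real Lucene API — but it shows how a component that closes a shared Directory without going through the reference count produces exactly the "this Directory is closed" failure in the log.

```java
import java.util.HashMap;
import java.util.Map;

class AlreadyClosedException extends RuntimeException {
    AlreadyClosedException(String message) { super(message); }
}

class RefCountedDirectory {
    private static final Map<String, RefCountedDirectory> CACHE = new HashMap<>();

    private final String path;
    private int refCount;
    private boolean closed;

    private RefCountedDirectory(String path) { this.path = path; }

    // Analogous to FSDirectory.getDirectory: one shared instance per path,
    // and every caller takes a reference.
    static synchronized RefCountedDirectory get(String path) {
        RefCountedDirectory dir = CACHE.computeIfAbsent(path, RefCountedDirectory::new);
        dir.refCount++;
        return dir;
    }

    private synchronized void ensureOpen() {
        if (closed) throw new AlreadyClosedException("this Directory is closed");
    }

    // Every operation checks the flag first, like Directory.ensureOpen()
    // in the stack trace above.
    synchronized String[] list() {
        ensureOpen();
        return new String[0]; // a real Directory would list index files here
    }

    // The correct path: each holder releases its own reference; the last
    // release actually closes the Directory.
    synchronized void release() {
        if (--refCount == 0) {
            closed = true;
            CACHE.remove(path);
        }
    }

    // The suspected bug: some holder closes the Directory directly,
    // ignoring the other live references.
    synchronized void closeIgnoringRefCount() { closed = true; }
}
```

With two holders of the same path, one rogue close makes every subsequent operation (and hence any reader reopen) by the other holder fail with AlreadyClosedException.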

William Pierce wrote:
> Mark,
>
> Thanks for your response --- I do appreciate all you volunteers 
> working to provide such a nice system!
>
> Anyway, I will try the trunk bits as you said. The only problem is 
> that the later the trunk build I use beyond 1.3, the more post-1.3 
> capability I get. And I feel somewhat exposed running these bits in 
> our production environment...
>
> Not sure if there's another approach?
>
> Thanks,
> -Bill
>


Re: Fatal exception in solr 1.3+ replication

Posted by William Pierce <ev...@hotmail.com>.
Mark,

Thanks for your response --- I do appreciate all you volunteers working to 
provide such a nice system!

Anyway, I will try the trunk bits as you said. The only problem is that 
the later the trunk build I use beyond 1.3, the more post-1.3 capability I 
get. And I feel somewhat exposed running these bits in our production 
environment...

Not sure if there's another approach?

Thanks,
-Bill

--------------------------------------------------
From: "Mark Miller" <ma...@gmail.com>
Sent: Friday, November 14, 2008 8:43 PM
To: <so...@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication

> Hey William, sorry about the trouble. I have to look at this further, but 
> I think the issue is fixed if you grab the latest trunk build. SOLR-465 
> should inadvertently fix things - before that patch, a deprecated 
> constructor for SolrIndexSearcher was being called - this constructor caused 
> the underlying IndexReader to close its own Directory, and since 
> IndexReaders are reopened, we don't want that.

Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
Hey William, sorry about the trouble. I have to look at this further, 
but I think the issue is fixed if you grab the latest trunk build. 
SOLR-465 should inadvertently fix things - before that patch, a 
deprecated constructor for SolrIndexSearcher was being called - this 
constructor caused the underlying IndexReader to close its own 
Directory, and since IndexReaders are reopened, we don't want that.
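The hazard described above can be sketched like this. The class names below (`SharedDirectory`, `SimpleReader`) and the `closeDirOnClose` flag are invented for illustration, not actual Solr/Lucene signatures: a reader constructed the old way closes the shared Directory when the reader itself is closed, so any later reopen() against that Directory fails with the same "this Directory is closed" error seen in the log.

```java
// Invented types for illustration - not real Solr/Lucene classes.
class SharedDirectory {
    private boolean closed;

    void ensureOpen() {
        if (closed) throw new IllegalStateException("this Directory is closed");
    }

    void close() { closed = true; }
}

class SimpleReader {
    private final SharedDirectory dir;
    // What the deprecated constructor effectively enabled: the reader
    // "owns" the Directory and closes it on reader.close().
    private final boolean closeDirOnClose;

    SimpleReader(SharedDirectory dir, boolean closeDirOnClose) {
        this.dir = dir;
        this.closeDirOnClose = closeDirOnClose;
    }

    void close() {
        if (closeDirOnClose) dir.close();
    }

    // Reopening must re-read the Directory, so it has to still be open.
    SimpleReader reopen() {
        dir.ensureOpen();
        return new SimpleReader(dir, false);
    }
}
```

Since readers are reopened rather than recreated, the Directory has to outlive any single reader — which is why the reader must not own it.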

Mark Miller wrote:
> Looks like there might be an issue with the reopen - I'm not seeing 
> what it could be offhand though. Have to find what could be closing a 
> Directory unexpectedly...I'll try to take a further look over the 
> weekend.
>
> - Mark
>


Re: Fatal exception in solr 1.3+ replication

Posted by Mark Miller <ma...@gmail.com>.
Looks like there might be an issue with the reopen - I'm not seeing what 
it could be offhand though. Have to find what could be closing a 
Directory unexpectedly...I'll try to take a further look over the weekend.

- Mark

William Pierce wrote:
> Folks:
>
> I am using the nightly build of 1.3 as of Oct 23 so as to use the replication handler.   I am running on windows 2003 server with tomcat 6.0.14.   Everything was running fine until I noticed that certain updated records were not showing up on the slave.  Further investigation showed me that the failures have indeed been occurring since early this morning with a fatal exception....here is a segment of the tomcat log:
>   INFO: Total time taken for download : 0 secs
>   Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
>   INFO: Conf files are not downloaded or are in sync
>   Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
>   INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
>   Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler doSnapPull
>   SEVERE: SnapPull failed 
>   org.apache.solr.common.SolrException: Snappull failed : 
>    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
>    at org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
>    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>    at java.lang.Thread.run(Thread.java:619)
>   Caused by: java.lang.RuntimeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
>    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
>    at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
>    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
>    ... 11 more
>   Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>    at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
>    at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
>    at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
>    at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
>    at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
>    ... 14 more
>   Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 commit
>
> Any ideas, anyone?
>
> -- Bill
>