Posted to solr-user@lucene.apache.org by Phil Black-Knight <pb...@globalgiving.org> on 2014/10/02 15:25:09 UTC

RE: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

I was helping to look into this with Nick & I think we may have figured out
the core of the problem...

The problem is easily reproducible by starting replication on the slave and
then sending a shutdown command to tomcat (e.g. catalina.sh stop).

With a debugger attached, it looks like the fsyncService thread is blocking
VM shutdown because it is created as a non-daemon thread.

Essentially, what seems to be happening is that the fsyncService thread is
running when 'catalina.sh stop' is executed. The shutdown calls
SnapPuller.destroy(), which aborts the current sync. Around line 517 of
SnapPuller there is code that is supposed to clean up the fsyncService
thread, but I don't think it gets executed, because the thread that
called SnapPuller.fetchLatestIndex() is configured as a daemon thread, so
the JVM ends up shutting it down before it can clean up the fsyncService...
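
To illustrate the daemon vs. non-daemon distinction described above, here is a
minimal plain-JDK sketch. It is not Solr code: the class name, the
daemonFactory() helper and the thread name "fsync-sketch" are made up for
illustration. It only shows that an executor built with the default thread
factory gets non-daemon workers (which keep the JVM alive until the executor is
shut down), while a custom ThreadFactory can mark them as daemons.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class DaemonThreadSketch {

    /** Hypothetical factory: marks every worker thread as a daemon. */
    static ThreadFactory daemonFactory(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true); // daemon threads do not block JVM exit
            return t;
        };
    }

    /** Runs one task on the executor and reports whether its worker is a daemon. */
    static boolean workerIsDaemon(ExecutorService executor) throws Exception {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return executor.submit(probe).get(5, TimeUnit.SECONDS);
        } finally {
            // Always stop the worker; a non-daemon worker left running
            // would keep this demo's JVM alive.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Default thread factory: workers are non-daemon. Prints "false".
        System.out.println(workerIsDaemon(Executors.newSingleThreadExecutor()));
        // Custom factory: workers are daemons. Prints "true".
        System.out.println(workerIsDaemon(
                Executors.newSingleThreadExecutor(daemonFactory("fsync-sketch"))));
    }
}
```

Making the worker a daemon is only one way out; the other is to make sure the
executor is always shut down, which is what the suggestions in this mail aim at.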

So... it seems like:

    if (fsyncService != null)
      ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);

could be added around line 1706 of SnapPuller.java, or

    puller.setDaemon(false);

could be added around line 230 of ReplicationHandler.java. However, this
needs some additional work (and I think it might need to be added
regardless), since the cleanup code in SnapPuller (around line 517) that
shuts down the fsync thread never gets executed:
logReplicationTimeAndConfFiles() can throw an IOException, bypassing the rest
of the finally block. So the call to
logReplicationTimeAndConfFiles() around line 512 would need to be wrapped
in a try/catch block to catch the IOException...
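
The shape of that last fix can be sketched with plain JDK classes. This is not
the actual SnapPuller code or the attached patch: logStats(), shutdownAndAwait()
and fetch() are hypothetical stand-ins (for logReplicationTimeAndConfFiles(), for
a "shut down now and wait" helper, and for the fetch method respectively). The
point is only that a throwing call inside a finally block has to be guarded, or
the executor shutdown after it is skipped.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GuardedCleanupSketch {

    /** Hypothetical stand-in for a bookkeeping step that may throw IOException. */
    static void logStats(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated failure while writing stats");
        }
    }

    /** Plain-JDK stand-in for a "shut down now and await termination" helper. */
    static void shutdownAndAwait(ExecutorService executor) {
        executor.shutdownNow();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    /** Returns the executor so a caller can check that it was shut down. */
    static ExecutorService fetch(boolean loggingFails) {
        ExecutorService fsyncService = Executors.newSingleThreadExecutor();
        try {
            fsyncService.submit(() -> { /* pretend to fsync a file */ });
        } finally {
            try {
                logStats(loggingFails);
            } catch (IOException e) {
                // Report and continue, so the shutdown below still runs.
                System.err.println("could not log stats: " + e.getMessage());
            }
            shutdownAndAwait(fsyncService);
        }
        return fsyncService;
    }

    public static void main(String[] args) {
        // The executor is shut down even though the logging step failed.
        System.out.println(fetch(true).isShutdown());
    }
}
```

Without the inner try/catch, the IOException would leave the finally block
before shutdownAndAwait() is reached, and the non-daemon worker would be left
running.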

I can submit patches if needed... and cross post to the dev mailing list...

-Phil

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

Posted by Phil Black-Knight <pb...@globalgiving.org>.
I haven't seen any activity regarding this in Jira, just curious if it
would be looked into anytime soon...

On Thu, Oct 2, 2014 at 10:11 AM, Phil Black-Knight <
pblackknight@globalgiving.org> wrote:

> see the ticket here:
> https://issues.apache.org/jira/browse/SOLR-6579
>
> including a patch to fix it.

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

Posted by Phil Black-Knight <pb...@globalgiving.org>.
see the ticket here:
https://issues.apache.org/jira/browse/SOLR-6579

including a patch to fix it.

On Thu, Oct 2, 2014 at 9:44 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
> > I was helping to look into this with Nick & I think we may have figured
> > out the core of the problem...
> >
> > The problem is easily reproducible by starting replication on the slave
> > and then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> >
> > With a debugger attached, it looks like the fsyncService thread is
> > blocking VM shutdown because it is created as a non-daemon thread.
>
> <snip>
>
> > I can submit patches if needed... and cross post to the dev mailing
> > list...
>
> File a detailed issue in Jira and attach your patch there.  This is our
> bugtracker.  You need an account on the Apache jira instance to do this:
>
> https://issues.apache.org/jira/browse/SOLR
>
> Thanks,
> Shawn
>
>

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
> I was helping to look into this with Nick & I think we may have figured out
> the core of the problem...
> 
> The problem is easily reproducible by starting replication on the slave and
> then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> 
> With a debugger attached, it looks like the fsyncService thread is blocking
> VM shutdown because it is created as a non-daemon thread.

<snip>

> I can submit patches if needed... and cross post to the dev mailing list...

File a detailed issue in Jira and attach your patch there.  This is our
bugtracker.  You need an account on the Apache jira instance to do this:

https://issues.apache.org/jira/browse/SOLR

Thanks,
Shawn