You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Noble Paul (JIRA)" <ji...@apache.org> on 2014/11/10 15:48:34 UTC

[jira] [Assigned] (SOLR-6579) SnapPuller Replication blocks clean shutdown of tomcat

     [ https://issues.apache.org/jira/browse/SOLR-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-6579:
--------------------------------

    Assignee: Noble Paul  (was: Shalin Shekhar Mangar)

> SnapPuller Replication blocks clean shutdown of tomcat
> ------------------------------------------------------
>
>                 Key: SOLR-6579
>                 URL: https://issues.apache.org/jira/browse/SOLR-6579
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.10.1
>            Reporter: Philip Black-Knight
>            Assignee: Noble Paul
>         Attachments: cleanupSnapPullerFinally.patch
>
>
> main issue was described in the mailing list here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser
> and 
> here:
> but also including the quotes:
> original message from Nick
> {quote}
> Hello,
> I have solr 4.10 running on tomcat 7. I'm doing replication from one master
> to about 10 slaves, with standard configuration:
> {code}
>       <requestHandler name="/replication" class="solr.ReplicationHandler" >
>            <lst name="master">
>              <str name="enable">${enable.master:false}</str>
>              <str name="replicateAfter">commit</str>
>              <str name="replicateAfter">startup</str>
>              <str name="confFiles">schema.xml,stopwords.txt</str>
>              <!-- str name="commitReserveDuration">00:00:10</str -->
>            </lst>
>            <lst name="slave">
>              <str name="enable">${enable.slave:false}</str>
>              <str name="masterUrl">http://master:8080/solr/mycore</str>
>              <str name="pollInterval">00:00:60</str>
>            </lst>
>       </requestHandler>
> {code}
> It appears that if tomcat gets shutdown while solr is replicating, it
> prevents tomcat from shutting down fully. Immediately after receiving the
> shutdown command, a thread dump is logged into catalina.out (this may have
> been turned on by some configuration someone else on my team made). I
> removed some threads that didn't look related, mostly about tomcat session
> replication, or with names like "http-bio-8080-exec-10".
> {code}
>     62252 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014928648&wt=json} status=0 QTime=6
>     63310 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014929699&wt=json} status=0 QTime=6
>     2014-09-29 14:22:10
>     Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed
> mode):
>     "fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d
> waiting on condition [0x00007f3c271f0000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007e1ff4458> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800
> nid=0x203c runnable [0x00007f3c272f1000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.SocketInputStream.socketRead0(Native Method)
>             at java.net.SocketInputStream.read(SocketInputStream.java:152)
>             at java.net.SocketInputStream.read(SocketInputStream.java:122)
>             at
> org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
>             at
> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174)
>             at
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
>             at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
>             at
> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114)
>             at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
>             at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239)
>             at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187)
>             at
> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774)
>             at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424)
>             at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
>             at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
>     "process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b
> waiting on condition [0x00007f3c984e9000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>             at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039
> waiting on condition [0x00007f3cac040000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>             at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000
> nid=0x202e waiting on condition [0x00007f3c27af9000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
>             at java.lang.Thread.run(Thread.java:745)
>     "ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800
> nid=0x202d runnable [0x00007f3c985ea000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c
> runnable [0x00007f3c986eb000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.SocketInputStream.socketRead0(Native Method)
>             at java.net.SocketInputStream.read(SocketInputStream.java:152)
>             at java.net.SocketInputStream.read(SocketInputStream.java:122)
>             at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516)
>             at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501)
>             at
> org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173)
>             at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927)
>             at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
>             - locked <0x00000007df0874b0> (a
> org.apache.tomcat.util.net.SocketWrapper)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000
> nid=0x202b waiting on condition [0x00007f3cac1aa000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000
> nid=0x202a runnable [0x00007f3cac2ab000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>             at java.lang.Thread.run(Thread.java:745)
>     "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
> tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508)
>             at java.lang.Thread.run(Thread.java:745)
>     "clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in
> Object.wait() [0x00007f3c27bfa000]
>        java.lang.Thread.State: TIMED_WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
>             at org.webmacro.util.Clock$ClockThread.run(Clock.java:108)
>             - locked <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
>     "TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027
> waiting on condition [0x00007f3c27cfb000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237)
>     "pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022
> waiting on condition [0x00007f3c98246000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b90b4a0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in
> Object.wait() [0x00007f3c9835d000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>             - locked <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>             at sun.java2d.Disposer.run(Disposer.java:145)
>             at java.lang.Thread.run(Thread.java:745)
>     "MySQL Statement Cancellation Timer" daemon prio=10
> tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue)
>             at java.lang.Object.wait(Object.java:503)
>             at java.util.TimerThread.mainLoop(Timer.java:526)
>             - locked <0x00000007e5f45ad8> (a java.util.TaskQueue)
>             at java.util.TimerThread.run(Timer.java:505)
>     "Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in
> Object.wait() [0x00007f3cad0bc000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e605a8d8> (a java.util.TaskQueue)
>             at java.lang.Object.wait(Object.java:503)
>             at java.util.TimerThread.mainLoop(Timer.java:526)
>             - locked <0x00000007e605a8d8> (a java.util.TaskQueue)
>             at java.util.TimerThread.run(Timer.java:505)
>     "Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in
> Object.wait() [0x00007f3cac3ac000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070b92b230> (a java.lang.Object)
>             at java.lang.Object.wait(Object.java:503)
>             at org.apache.solr.core.CloserThread.run(CoreContainer.java:905)
>             - locked <0x000000070b92b230> (a java.lang.Object)
>     "snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d
> waiting on condition [0x00007f3cac555000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b92b2a8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>             at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
>             at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c
> waiting on condition [0x00007f3cac810000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b92b440> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in
> Object.wait() [0x00007f3cd8137000]
>        java.lang.Thread.State: TIMED_WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
>             at sun.misc.GC$Daemon.run(GC.java:117)
>             - locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
>     "Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b
> runnable [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in
> Object.wait() [0x00007f3cd8c1b000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>             - locked <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>             at
> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
>     "Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in
> Object.wait() [0x00007f3cadffe000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070b8b23b8> (a
> java.lang.ref.Reference$Lock)
>             at java.lang.Object.wait(Object.java:503)
>             at
> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
>             - locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock)
>     "main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable
> [0x00007f3ce675c000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.catalina.core.StandardServer.await(StandardServer.java:452)
>             at org.apache.catalina.startup.Catalina.await(Catalina.java:766)
>             at org.apache.catalina.startup.Catalina.start(Catalina.java:712)
>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>             at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>             at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>             at java.lang.reflect.Method.invoke(Method.java:606)
>             at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
>             at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456)
>     JNI global references: 422
>     Heap
>      PSYoungGen      total 1184256K, used 1031985K [0x00000007aaa80000,
> 0x0000000800000000, 0x0000000800000000)
>       eden space 970240K, 94% used
> [0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000)
>       from space 214016K, 55% used
> [0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000)
>       to   space 212480K, 0% used
> [0x00000007f3080000,0x00000007f3080000,0x0000000800000000)
>      ParOldGen       total 1398272K, used 209681K [0x0000000700000000,
> 0x0000000755580000, 0x00000007aaa80000)
>       object space 1398272K, 14% used
> [0x0000000700000000,0x000000070ccc4700,0x0000000755580000)
>      PSPermGen       total 78848K, used 78659K [0x00000006c0000000,
> 0x00000006c4d00000, 0x0000000700000000)
>       object space 78848K, 99% used
> [0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000)
> right after this, here comes the shutdown:
>     Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await
>     INFO: A valid shutdown command was received via the shutdown port.
> Stopping the Server instance.
>     Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
>     INFO: Pausing ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
>     INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
>     Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService
> stopInternal
>     INFO: Stopping service Catalina
>     Sep 29, 2014 2:22:11 PM
> org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal
>     INFO: JvmRouteBinderValve stopped
>     Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
>     INFO: Manager [localhost#/dy] expiring sessions upon shutdown
> (snip some stuff about an unrelated webapp, then the solr errors)
>     64876 [localhost-startStop-2] INFO  org.apache.solr.core.CoreContainer
>  – Shutting down CoreContainer instance=1359815787
>     64877 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
> [mycore]  CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50
>     64882 [localhost-startStop-2] INFO
>  org.apache.solr.update.UpdateHandler  – closing
> DirectUpdateHandler2{commits=0,autocommit
> maxTime=15000ms,autocommits=0,soft
> autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.SolrCoreState  – Closing SolrCoreState
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – SolrCoreState ref count has
> reached 0 - closing IndexWriter
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – closing IndexWriter with
> IndexWriterCloser
>     64885 [localhost-startStop-2] ERROR
> org.apache.solr.update.DefaultSolrCoreState  – Error during shutdown of
> writer.
>     org.apache.lucene.store.AlreadyClosedException: this IndexWriter is
> closed
>             at
> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
>             at
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384)
>             at
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795)
>             at
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70)
>             at
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371)
>             at
> org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72)
>             at org.apache.solr.core.SolrCore.close(SolrCore.java:1073)
>             at org.apache.solr.core.SolrCores.close(SolrCores.java:117)
>             at
> org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347)
>             at
> org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200)
>             at
> org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315)
>             at
> org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782)
>             at
> org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565)
>             at
> org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
>             at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
>             at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
>             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K),
> 0.0427280 secs]
>     64929 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
> [mycore] Closing main searcher on request.
>     64935 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing
> NRTCachingDirectoryFactory - 3 directories currently being tracked
>     69937 [explicit-fetchindex-cmd] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – Creating new IndexWriter...
>     69938 [explicit-fetchindex-cmd] ERROR
> org.apache.solr.handler.ReplicationHandler  – SnapPull failed
> :org.apache.lucene.store.AlreadyClosedException: Already closed
>             at
> org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341)
>             at
> org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827)
>             at
> org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569)
>             at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511)
>             at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
>             at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
>     69943 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Removing directory before
> core close:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
>     INFO: Manager [localhost#/solr] expiring sessions upon shutdown
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> clearReferencesThreads
>     SEVERE: The web application [/solr] appears to have started a thread
> named [fsyncService-12-thread-1] but has failed to stop it. This is very
> likely to create a memory leak.
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> checkThreadLocalMapForLeaks
>     SEVERE: The web application [/solr] created a ThreadLocal with key of
> type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a
> value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
> (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
> but failed to remove it when the web application was stopped. Threads are
> going to be renewed over time to try and avoid a probable memory leak.
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
>     INFO: Stopping ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
>     INFO: Stopping ProtocolHandler ["ajp-bio-8009"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
>     INFO: Destroying ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
>     INFO: Destroying ProtocolHandler ["ajp-bio-8009"]
> {code}
> At this point it looks like tomcat has shutdown, but the process is still
> running, as checked by ps. If I start a new tomcat and repeat this process
> a few times, we end up with several zombie instances of tomcat running.
> Is this designed behavior? What is supposed to happen when the container is
> shutdown while replication is running? How are others handling this?
> Please let me know if I can provide any more information.
> Thanks in advance,
> Nick
> {quote}
> My followup
> {quote}
> I was helping to look into this with Nick & I think we may have figured out the core of the problem...
> The problem is easily reproducible by starting replication on the slave and then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> With a debugger attached, it looks like the fsyncService thread is blocking VM shutdown because it is created as a non-daemon thread.
> Essentially what seems to be happening is that the fsyncService thread is running when 'catalina.sh stop' is executed. This goes in and calls SnapPuller.destroy() which aborts the current sync. Around line 517 of the SnapPuller, there is code that is supposed to cleanup the fsyncService thread, but I don't think it is getting executed because the thread that called SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the JVM ends up shutting that down before it can cleanup the fysncService...
> So... it seems like:
> {code}
>     if (fsyncService != null) 
> ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
> {code}
> could be added around line 1706 of SnapPuller.java,  or
> {code}
>            // since otherwise the JVM may kill the thread before any cleanup
>            // code can be called
>           puller.setDaemon(false);
> {code}
> could be added around line 230 of ReplicationHandler.java, however this needs some additional work (and I think it might need to be added regardless) since the cleanup code in SnapPuller(around 517) that shuts down the fsync thread never gets execute since logReplicationTimeAndConfFiles() can throw IO exceptions bypassing the rest of the finally block...So the call to 
> {code}logReplicationTimeAndConfFiles(){code} around line 512 would need to get wrapped with a try/catch block to catch the IO exception...
> {quote}
> I will attach a patch as well that I believe fixes the issues. Is there a reason to *not* perform a hard shutdown of the fysnc thread? It seems sane enough to me to do a hard shutdown of the fsyncService if the action is aborted...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org