Posted to dev@whirr.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2012/09/17 16:18:08 UTC

[jira] [Commented] (WHIRR-655) installations can time out on slow networks -timeout needs to be configurable

    [ https://issues.apache.org/jira/browse/WHIRR-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457038#comment-13457038 ] 

Steve Loughran commented on WHIRR-655:
--------------------------------------

This can be triggered by asking for every Hadoop service on one node at once; if I break the install up, then (somehow) the downloads get cached and come in faster the next time.


The console doesn't pick up the problem; it just blocks happily:

{code}
Authorizing firewall ingress to [hdp1] on ports [8021] for [10.0.0.82/32]
no answer to DNS resolution attempt for 10.0.0.82; using fallback
>> running InitScript{INSTANCE_NAME=configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker} on node(hdp1)
2012-09-17 15:09:32

{code}

But {{whirr.log}} shows the system is in trouble.

{code}

2012-09-17 15:08:07,317 DEBUG [jclouds.compute] (main) >> running [/tmp/init-configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker init] as stevel@10.0.0.82
2012-09-17 15:08:07,381 DEBUG [jclouds.compute] (main) << init(0)
2012-09-17 15:08:07,381 DEBUG [jclouds.compute] (main) >> running [echo 'null'|sudo -S /tmp/init-configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker start] as stevel@10.0.0.82
2012-09-17 15:08:08,466 DEBUG [jclouds.compute] (main) << start(0)
2012-09-17 15:09:05,783 ERROR [net.schmizz.sshj.transport.TransportImpl] (reader) Dying because - java.net.SocketTimeoutException: Read timed out

{code}
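The sshj transport reader dying with {{java.net.SocketTimeoutException: Read timed out}} is, presumably, what a blocking socket read reports when its {{SO_TIMEOUT}} expires before the remote side sends anything. The sketch below is a minimal illustration using plain {{java.net}} sockets, not sshj's actual code: a local peer accepts the connection and then stays silent, so the client's read times out with the same exception type. The 200 ms timeout and the class and method names are arbitrary, chosen only for the example.

{code}
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that accepts the connection but never writes,
     * then blocks in read() until the client socket's SO_TIMEOUT expires.
     * Returns true if the read timed out.
     */
    static boolean readTimesOut(int timeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // 'accepted' is held open but nothing is ever written to it: a silent peer.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives, EOF, or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                // Same exception type as in the sshj "Dying because" log line.
                System.out.println("Caught: " + e);
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out = " + readTimesOut(200));
    }
}
{code}

On a slow link the same mechanism applies to any read that waits longer than its configured timeout, which is why a fixed value can be too short for a large first-time download.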

Thread dump of the Whirr CLI while it is blocked:

{code}

"sftp reader" prio=5 tid=7fd053093000 nid=0x117698000 in Object.wait() [117697000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <7f42b0088> (a net.schmizz.sshj.common.Buffer$PlainBuffer)
	at java.lang.Object.wait(Object.java:485)
	at net.schmizz.sshj.connection.channel.ChannelInputStream.read(ChannelInputStream.java:128)
	- locked <7f42b0088> (a net.schmizz.sshj.common.Buffer$PlainBuffer)
	at net.schmizz.sshj.sftp.PacketReader.readIntoBuffer(PacketReader.java:49)
	at net.schmizz.sshj.sftp.PacketReader.getPacketLength(PacketReader.java:57)
	at net.schmizz.sshj.sftp.PacketReader.readPacket(PacketReader.java:73)
	at net.schmizz.sshj.sftp.PacketReader.run(PacketReader.java:85)

"reader" prio=5 tid=7fd04d859000 nid=0x117595000 runnable [117594000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at net.schmizz.sshj.transport.Reader.run(Reader.java:68)

"user thread 1" prio=5 tid=7fd0530db000 nid=0x115e88000 waiting on condition [115e87000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.jclouds.predicates.RetryablePredicate.apply(RetryablePredicate.java:74)
	at org.jclouds.compute.callables.BlockUntilInitScriptStatusIsZeroThenReturnOutput.run(BlockUntilInitScriptStatusIsZeroThenReturnOutput.java:149)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)

"com.google.inject.internal.util.$Finalizer" daemon prio=5 tid=7fd04fe41000 nid=0x116e04000 in Object.wait() [116e03000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <7f4a0b2a8> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <7f4a0b2a8> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at com.google.inject.internal.util.$Finalizer.run(Finalizer.java:114)

"Low Memory Detector" daemon prio=5 tid=7fd04d80b800 nid=0x11641f000 runnable [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=9 tid=7fd04d80b000 nid=0x11631c000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=9 tid=7fd04d80a000 nid=0x116219000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=7fd04d809800 nid=0x116116000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=7fd04d808800 nid=0x116013000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=7fd04f941800 nid=0x115d85000 in Object.wait() [115d84000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <7f44e48f0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <7f44e48f0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=7fd04f941000 nid=0x115c82000 in Object.wait() [115c81000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <7f44e5ea0> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:485)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <7f44e5ea0> (a java.lang.ref.Reference$Lock)

"main" prio=5 tid=7fd050800800 nid=0x10de10000 waiting on condition [10de0f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <7f42b0210> (a com.google.common.util.concurrent.AbstractFuture$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:280)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at org.apache.whirr.actions.ByonClusterAction.doAction(ByonClusterAction.java:151)
	at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:126)
	at org.apache.whirr.ByonClusterController.configureServices(ByonClusterController.java:99)
	at org.apache.whirr.ClusterController.configureServices(ClusterController.java:153)
	at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:114)
	at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:69)
	at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:59)
	at org.apache.whirr.cli.Main.run(Main.java:69)
	at org.apache.whirr.cli.Main.main(Main.java:102)

"VM Thread" prio=9 tid=7fd04f93c800 nid=0x115b7f000 runnable 

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7fd04f800000 nid=0x111218000 runnable 

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7fd04f801000 nid=0x11131b000 runnable 

"Gang worker#2 (Parallel GC Threads)" prio=9 tid=7fd04f801800 nid=0x11141e000 runnable 

"Gang worker#3 (Parallel GC Threads)" prio=9 tid=7fd04f802000 nid=0x111521000 runnable 

"Gang worker#4 (Parallel GC Threads)" prio=9 tid=7fd04f802800 nid=0x111624000 runnable 

"Gang worker#5 (Parallel GC Threads)" prio=9 tid=7fd04f803800 nid=0x111727000 runnable 

"Gang worker#6 (Parallel GC Threads)" prio=9 tid=7fd04f808800 nid=0x11182a000 runnable 

"Gang worker#7 (Parallel GC Threads)" prio=9 tid=7fd04f809000 nid=0x11192d000 runnable 

"Concurrent Mark-Sweep GC Thread" prio=9 tid=7fd04f8e6800 nid=0x1157f9000 runnable 
"Gang worker#0 (Parallel CMS Threads)" prio=9 tid=7fd04f8e5800 nid=0x114df3000 runnable 

"Gang worker#1 (Parallel CMS Threads)" prio=9 tid=7fd04f8e6000 nid=0x114ef6000 runnable 

"VM Periodic Task Thread" prio=10 tid=7fd04d81d800 nid=0x116522000 waiting on condition 

"Exception Catcher Thread" prio=10 tid=7fd050801800 nid=0x10e03f000 runnable 
JNI global references: 1317

{code}
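The dump shows two nested waits: "user thread 1" sleeping inside jclouds' {{RetryablePredicate.apply}} while it polls for the init script to report status zero, and "main" parked in an untimed {{AbstractFuture.get()}} called from {{ByonClusterAction.doAction}}. The sketch below is a hedged illustration of making both waits bounded and configurable per cluster; it is not Whirr or jclouds code. The property key {{whirr.hypothetical.script-timeout-seconds}} and the helpers {{scriptTimeout}} and {{pollUntil}} are hypothetical names invented for this example.

{code}
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class BoundedWait {

    // Hypothetical per-cluster property name; not an existing Whirr option.
    static final String TIMEOUT_KEY = "whirr.hypothetical.script-timeout-seconds";

    /** Reads the per-cluster timeout in seconds, falling back to a default when unset. */
    static Duration scriptTimeout(Properties clusterProps, Duration defaultTimeout) {
        String value = clusterProps.getProperty(TIMEOUT_KEY);
        return value == null ? defaultTimeout : Duration.ofSeconds(Long.parseLong(value.trim()));
    }

    /**
     * Polls the condition every pollInterval until it holds or the deadline passes.
     * Returns true if the condition became true in time, false on timeout.
     */
    static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            // Never sleep past the deadline.
            TimeUnit.NANOSECONDS.sleep(Math.min(remaining, pollInterval.toNanos()));
        }
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty(TIMEOUT_KEY, "1");
        Duration timeout = scriptTimeout(props, Duration.ofMinutes(10));

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for "has the init script finished?" that never succeeds.
            Future<Boolean> done = pool.submit(
                    () -> pollUntil(() -> false, timeout, Duration.ofMillis(100)));
            // Bound the caller's wait as well (with some slack beyond the poll timeout)
            // rather than calling the untimed Future.get(), which can block forever.
            boolean finished = done.get(timeout.toMillis() + 500, TimeUnit.MILLISECONDS);
            System.out.println(finished ? "script completed" : "script timed out after " + timeout);
        } catch (TimeoutException e) {
            System.out.println("caller gave up waiting");
        } finally {
            pool.shutdownNow();
        }
    }
}
{code}

With a bounded wait on both levels, a slow install would end with an explicit timeout message on the console instead of blocking indefinitely, and the limit could be raised for clusters on slow links.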
                
> installations can time out on slow networks -timeout needs to be configurable
> -----------------------------------------------------------------------------
>
>                 Key: WHIRR-655
>                 URL: https://issues.apache.org/jira/browse/WHIRR-655
>             Project: Whirr
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>         Environment: cable connection; first yum install of a set of artifacts hosted on S3
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Downloading RPMs from a yum repo over a slow link can take so long that scripts start to time out & your systems are left in an indeterminate state. It ought to be possible to specify timeouts on a per-cluster basis.

--
This message is automatically generated by JIRA.