You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@brooklyn.apache.org by "Aled Sage (JIRA)" <ji...@apache.org> on 2016/07/12 13:34:20 UTC

[jira] [Resolved] (BROOKLYN-313) Exception stacktrace never logged when resize effector fails, called by AutoScaler policy

     [ https://issues.apache.org/jira/browse/BROOKLYN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aled Sage resolved BROOKLYN-313.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10.0

> Exception stacktrace never logged when resize effector fails, called by AutoScaler policy
> -----------------------------------------------------------------------------------------
>
>                 Key: BROOKLYN-313
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-313
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Aled Sage
>             Fix For: 0.10.0
>
>
> A customer was using the ControlledDynamicWebAppCluster, with a AutoScalerPolicy. However, they also had an EntitlementManager that throws a NullPointerException whenever it is called with {{context==null || context.user==null}} (which happens for automated calls not triggered explicitly by a user, such as for policies).
> The underlying cause of the NullPointerException was never logged. All that made it into the logs was something like:
> {noformat}
> 2016-07-09 16:07:57,491 DEBUG o.a.b.util.core.task.BasicTask [brooklyn-autoscalerpolicy-0]: call from Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 'Task[Auto-scaler]@DHtSIGZf' finishes, ended with error: java.util.concurrent.ExecutionException: org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> {noformat}
> We should *always* log (at least at debug) the underlying cause of an exception, even if it risks logging it multiple times.
> I've reproduced this in a test case. The thread stacktraces are shown below:
> The thread below is about to throw a NullPointerException. The logging from this thread is also shown. The underlying exception is recorded in the task (so a subsequent task.get() would throw it):
> {noformat}
> Daemon Thread [brooklyn-execmanager-cU8EcGHT-39] (Suspended (breakpoint at line 64 in EffectorExceptionLoggingTest$ThrowingEntitlementManager))	
> 	EffectorExceptionLoggingTest$ThrowingEntitlementManager.isEntitled(EntitlementContext, EntitlementClass<T>, T) line: 64	
> 	Entitlements.isEntitled(EntitlementManager, EntitlementClass<T>, T) line: 409	
> 	Entitlements.checkEntitled(EntitlementManager, EntitlementClass<T>, T) line: 414	
> 	EntityManagementSupport$EntityChangeListenerImpl.onEffectorStarting(Effector<?>, Object) line: 469	
> 	EffectorUtils.invokeMethodEffector(Entity, Effector<T>, Map<String,?>) line: 274	
> 	MethodEffector<T>.call(Entity, Map) line: 148	
> 	MethodEffector<T>(AbstractEffector<T>).call(Map, Entity) line: 61	
> 	AbstractEffector$1$1.call() line: 83	
> 	DynamicSequentialTask$DstJob.call() line: 359	
> 	BasicExecutionManager$SubmissionCallable<T>.call() line: 519	
> 	FutureTask<V>.run() line: 262	
> 	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145	
> 	ThreadPoolExecutor$Worker.run() line: 615	
> 	Thread.run() line: 745	
> 2016-07-09 16:33:47,333 WARN  o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-cU8EcGHT-39]: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> 2016-07-09 16:33:51,213 DEBUG o.a.b.u.c.t.BasicExecutionManager [brooklyn-execmanager-cU8EcGHT-39]: Exception running task Task[resize]@ZbLY4WUg (rethrowing): org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> 2016-07-09 16:33:51,214 DEBUG o.a.b.u.core.task.CompoundTask [brooklyn-execmanager-cU8EcGHT-36]: Secondary job queue for Task[Auto-scaler]@pOJ7r6Y0 ignoring error in inessential task Task[resize]@ZbLY4WUg: java.util.concurrent.ExecutionException: org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> {noformat}
> The thread below invoked the effector - note that the AutoScalerPolicy has wrapped the invocation in a task. The logging from this thread is also shown. Within that task, it will not catch the exception:
> {noformat}
> "brooklyn-execmanager-cU8EcGHT-0" daemon prio=5 tid=0x00007fcfe5b5f800 nid=0x6003 waiting on condition [0x00007000018de000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007f93d7ab0> (a java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:187)
>         at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
>         at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
>         at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
>         at org.apache.brooklyn.util.core.task.DynamicTasks$TaskQueueingResult.andWaitForSuccess(DynamicTasks.java:160)
>         at org.apache.brooklyn.core.objs.proxy.EntityProxyImpl.invoke(EntityProxyImpl.java:211)
>         at com.sun.proxy.$Proxy19.resize(Unknown Source)
>         at org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppClusterImpl.resize(ControlledDynamicWebAppClusterImpl.java:297)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$3.resize(AutoScalerPolicy.java:364)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1054)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$14.call(AutoScalerPolicy.java:1)
>         at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
>         at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-07-09 16:36:54,385 DEBUG o.a.b.u.c.t.BasicExecutionManager [brooklyn-execmanager-cU8EcGHT-0]: Exception running task Task[Auto-scaler]@pOJ7r6Y0 (rethrowing): org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> {noformat}
> The thread below invoked the task. It has called {{task.blockUntilEnded}} to wait for the result:
> {noformat}
> "brooklyn-autoscalerpolicy-0" prio=5 tid=0x00007fcfe5b49800 nid=0x1407 waiting on condition [0x0000700002b14000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007f81a07b0> (a java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:187)
>         at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
>         at org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:417)
>         at org.apache.brooklyn.util.core.task.BasicTask.blockUntilEnded(BasicTask.java:407)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.resizeNow(AutoScalerPolicy.java:1066)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy.access$15(AutoScalerPolicy.java:1021)
>         at org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy$13.run(AutoScalerPolicy.java:952)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-07-09 16:38:45,089 DEBUG o.a.b.util.core.task.BasicTask [brooklyn-autoscalerpolicy-0]: call from Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 'Task[Auto-scaler]@pOJ7r6Y0' finishes, ended with error: java.util.concurrent.ExecutionException: org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: Error invoking resize at DynamicWebAppClusterImpl{id=iyo37ngogv}: NullPointerException
> {noformat}
> ---
> Note we'd have had the same problem if the cluster.resize() effector itself had thrown the exception (rather than the entitlements). We'd have only got logging like:
> {noformat}
> 2016-07-09 16:43:52,293 DEBUG o.a.b.u.c.t.BasicExecutionManager [brooklyn-execmanager-NpLl6jt3-1]: Exception running task Task[Auto-scaler]@uLBsngUp (rethrowing): java.lang.NullPointerException
> 2016-07-09 16:43:52,294 DEBUG o.a.b.util.core.task.BasicTask [brooklyn-autoscalerpolicy-0]: call from Thread[brooklyn-autoscalerpolicy-0,5,main], blocking until 'Task[Auto-scaler]@uLBsngUp' finishes, ended with error: java.util.concurrent.ExecutionException: java.lang.NullPointerException
> {noformat}
> This lack of logging is hard to reproduce in a simpler test case where effectors are invoked directly. It is primarily because of the additional layers of task, and the use of {{basicTask.blockUntilEnded}}, that we lose the message.
> However, I have in the past seen something similar when there is an exception in an effector invoked from the web-console: the web-console's activity view showed the full exception stacktrace, but the debug log did not have it. This is very bad for when a customer sends us a debug log!
> ---
> It's unclear what the holistic fix for this should be. 
> We do not want to require the caller of the effector to always wrap the call in a try-catch with logging: the failure of an effector should get logged automatically, and should include the underlying cause. Any other option risks us being sent customer log files that give no indication of what caused the problem.
> ---
> We can hopefully log any effector invocation error in the same way. Alternatively we could treat framework exceptions (e.g. NPE preparing to execute) as different from exceptions during the actual effector implementation. 
> I don't want to do it in low-level code like {{BasicTask.blockUntilEnded}}, because that will probably result in a huge amount of duplicate logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)