Posted to dev@samza.apache.org by Rick Mangi <ri...@chartbeat.com> on 2015/12/18 17:53:44 UTC

CoordinatorStream errors

Hi all,

I just started seeing these errors the other day. I am heavily refactoring my code, but it works locally. I’m wondering if anyone has seen this error when deploying to YARN.

This is in the stderr log of my application master.

Exception in thread "AMRM Callback Handler Thread" org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.util.ConcurrentModificationException
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:299)
Caused by: java.util.ConcurrentModificationException
	at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
	at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:405)
	at org.apache.samza.coordinator.stream.CoordinatorStreamSystemConsumer.getBootstrappedStream(CoordinatorStreamSystemConsumer.java:184)
	at org.apache.samza.coordinator.stream.AbstractCoordinatorStreamManager.getBootstrappedStream(AbstractCoordinatorStreamManager.java:85)
	at org.apache.samza.container.LocalityManager.readContainerLocality(LocalityManager.java:98)
	at org.apache.samza.job.model.JobModel.getContainerToHostValue(JobModel.java:96)
	at org.apache.samza.job.yarn.SamzaTaskManager.onContainerCompleted(SamzaTaskManager.java:213)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onContainersCompleted$1$$anonfun$apply$5.apply(SamzaAppMaster.scala:143)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onContainersCompleted$1$$anonfun$apply$5.apply(SamzaAppMaster.scala:143)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onContainersCompleted$1.apply(SamzaAppMaster.scala:143)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onContainersCompleted$1.apply(SamzaAppMaster.scala:143)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.samza.job.yarn.SamzaAppMaster$.onContainersCompleted(SamzaAppMaster.scala:143)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)

The jobs start up briefly and then the AM starts throwing this error and fails the job.
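
A note on reading the trace above: the innermost LinkedHashMap$KeyIterator frames are consistent with iterating a LinkedHashSet, which is backed by a LinkedHashMap. Its iterators are fail-fast: if the set is structurally modified while an iterator is open, the next call to next() throws ConcurrentModificationException (on a best-effort basis, per the JDK documentation). The minimal Java sketch below reproduces that check deterministically in a single thread. The class name and the string values are illustrative only and are not taken from Samza.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: shows the fail-fast check behind the
// LinkedHashMap iterator frames in the stack trace. Not Samza code.
public class FailFastIteratorDemo {

    // Adds to a LinkedHashSet while iterating it.
    // Returns true if the iterator detected the structural modification.
    static boolean modifyDuringIteration() {
        Set<String> messages = new LinkedHashSet<>();
        messages.add("message-1");
        messages.add("message-2");
        messages.add("message-3");
        try {
            for (String m : messages) {
                if (m.equals("message-1")) {
                    // Modification outside the iterator bumps the backing map's
                    // modCount; the following next() call notices and throws.
                    messages.add("message-4");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + modifyDuringIteration()); // CME thrown: true
    }
}
```

In the reported failure the modification presumably came from another thread rather than from the loop body, which makes it timing-dependent rather than deterministic.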



Re: CoordinatorStream errors

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Ok. Sounds good. Thanks!

On Mon, Dec 21, 2015 at 11:38 AM, Rick Mangi <ri...@chartbeat.com> wrote:

> Hi Navina,
>
> It stopped happening once I deleted an old checkpoint topic. I think in
> the rapid development cycle my checkpoints became invalid. If it happens
> again I will save the logs.
>
> Thanks!


-- 
Navina R.

Re: CoordinatorStream errors

Posted by Rick Mangi <ri...@chartbeat.com>.
Hi Navina,

It stopped happening once I deleted an old checkpoint topic. I think in the rapid development cycle my checkpoints became invalid. If it happens again I will save the logs.

Thanks!


> On Dec 21, 2015, at 2:14 PM, Navina Ramesh <nr...@linkedin.com.INVALID> wrote:
> 
> Hi Rick,
> Can you share the entire log for this issue? I suspect the concurrent
> access happens on bootstrappedSet (a LinkedHashSet, which is not
> thread-safe) between the Job Coordinator and SamzaAppMaster.
> 
> When a container fails, the AM tries to read the locality information. If
> some other container requests the JobModel at the same time, the
> JobCoordinator also bootstraps. However, these two events are supposed to
> happen in order (first the AM reads locality info, then the JC refreshes
> the JobModel). I think this ordering is not guaranteed during job startup,
> when containers may still be coming up.
> I am not entirely sure if this is what is happening.
> 
> It would be great if you could share the log.
> 
> Thanks!
> navina


Re: CoordinatorStream errors

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Hi Rick,
Can you share the entire log for this issue? I suspect the concurrent
access happens on bootstrappedSet (a LinkedHashSet, which is not
thread-safe) between the Job Coordinator and SamzaAppMaster.
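
If this hypothesis is right, one general way to remove such a race is to guard every access to the shared set with a single lock and have readers iterate over a copy taken under that lock. The Java sketch below illustrates the pattern with a hypothetical GuardedSet class, with one writer thread and one reader thread standing in for the two components. It is not Samza's code and not a fix that was applied to CoordinatorStreamSystemConsumer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a set shared by two threads (for example an AM
// callback thread and a job-coordinator thread). Not Samza's actual class.
public class GuardedSet {
    private final Set<String> items = new LinkedHashSet<>();
    private final Object lock = new Object();

    void add(String item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    // Copy under the lock so callers can iterate without holding it.
    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }

    // Runs one writer and one reader concurrently and returns the final size.
    // Any exception thrown in either thread is rethrown here.
    static int runConcurrently(int writes, int reads) {
        GuardedSet set = new GuardedSet();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < writes; i++) {
                set.add("msg-" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int r = 0; r < reads; r++) {
                long chars = 0;
                for (String s : set.snapshot()) { // iterates a private copy
                    chars += s.length();
                }
            }
        });
        writer.setUncaughtExceptionHandler((t, e) -> failure.set(e));
        reader.setUncaughtExceptionHandler((t, e) -> failure.set(e));

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("worker thread failed", failure.get());
        }
        return set.snapshot().size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(5_000, 100)); // 5000
    }
}
```

Copying under the lock keeps the critical section short. Collections.synchronizedSet would also serialize individual calls, but its documentation requires callers to hold the set's lock manually for the whole iteration.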

When a container fails, the AM tries to read the locality information. If
some other container requests the JobModel at the same time, the
JobCoordinator also bootstraps. However, these two events are supposed to
happen in order (first the AM reads locality info, then the JC refreshes
the JobModel). I think this ordering is not guaranteed during job startup,
when containers may still be coming up.
I am not entirely sure if this is what is happening.
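
The intended ordering (the AM reads locality first, then the JobCoordinator refreshes the JobModel) can be expressed as an explicit happens-before edge between two threads. A minimal sketch, assuming a CountDownLatch is an acceptable way to express it: readLocality and refreshJobModel are hypothetical placeholders for the two steps, not Samza methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: forces the "read locality" step to finish before the
// "refresh JobModel" step starts, regardless of thread start order.
public class OrderedBootstrap {

    static List<String> runInOrder() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch localityRead = new CountDownLatch(1);

        Thread coordinator = new Thread(() -> {
            try {
                localityRead.await();          // block until the AM step is done
                events.add("refreshJobModel"); // placeholder for the JC step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread appMaster = new Thread(() -> {
            events.add("readLocality");        // placeholder for the AM step
            localityRead.countDown();          // release the coordinator
        });

        // Start the coordinator first: the latch, not start order, decides.
        coordinator.start();
        appMaster.start();
        try {
            coordinator.join();
            appMaster.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(runInOrder()); // [readLocality, refreshJobModel]
    }
}
```

A single-threaded executor shared by both components would prevent the two steps from overlapping, but on its own it only preserves submission order; the latch is what forces the locality read to finish first.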

It would be great if you could share the log.

Thanks!
navina

-- 
Navina R.