Posted to user@flink.apache.org by Yiannis Gkoufas <jo...@gmail.com> on 2015/02/18 21:18:57 UTC

Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 4024
taskmanager.heap.mb: 8096
taskmanager.numberOfTaskSlots: 12
parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
    at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
    at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
    at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

    at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
    at akka.dispatch.OnComplete.internal(Future.scala:247)
    at akka.dispatch.OnComplete.internal(Future.scala:244)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I could not find any information online on how to solve it.
Any help would be welcome.

Thank you!

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Ufuk Celebi <uc...@apache.org>.

Good idea. I've changed the message. :)

On 04 Mar 2015, at 14:51, Robert Metzger <rm...@apache.org> wrote:

> I agree with Henry.
> We should include the name of the required configuration parameter into the exception.
> Users often run into this issue.

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Robert Metzger <rm...@apache.org>.

I agree with Henry.
We should include the name of the required configuration parameter into the
exception.
Users often run into this issue.

I've filed a JIRA to track the fix:
https://issues.apache.org/jira/browse/FLINK-1646


On Thu, Feb 19, 2015 at 6:18 PM, Henry Saputra <he...@gmail.com>
wrote:

> Would it be helpful to add additional message in the error message in
> NetworkBufferPool#createBufferPool to check the
> taskmanager.network.numberOfBuffers property?
>
>
> - Henry

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Henry Saputra <he...@gmail.com>.

Would it be helpful to add additional message in the error message in
NetworkBufferPool#createBufferPool to check the
taskmanager.network.numberOfBuffers property?


- Henry

On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <jo...@gmail.com> wrote:
> Perfect! It worked! Thanks a lot for the help!

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Yiannis Gkoufas <jo...@gmail.com>.

Perfect! It worked! Thanks a lot for the help!

On 18 February 2015 at 22:13, Fabian Hueske <fh...@gmail.com> wrote:

> 2048 is the default. So you didn't actually increase the number of buffers
> ;-)
>
> Try 4096 or so.

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Fabian Hueske <fh...@gmail.com>.

2048 is the default. So you didn't actually increase the number of buffers
;-)

Try 4096 or so.
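
For context on what doubling the value costs: the setting goes into flink-conf.yaml as `taskmanager.network.numberOfBuffers: 4096`, and each buffer occupies a fixed amount of memory on every TaskManager. The sketch below estimates that footprint. It assumes the default buffer size of 32 KiB (`taskmanager.network.bufferSizeInBytes`); that default is not stated in this thread, so check it against your own configuration. The function name is made up for illustration.

```python
# Assumption: default of taskmanager.network.bufferSizeInBytes (32 KiB).
# This value is not given in the thread; verify it for your Flink version.
BUFFER_SIZE_BYTES = 32 * 1024


def buffer_pool_mib(num_buffers, buffer_size_bytes=BUFFER_SIZE_BYTES):
    """Memory taken by the network buffer pool on one TaskManager, in MiB."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Compare the default (2048) with the suggested value (4096).
    for n in (2048, 4096):
        print(f"{n} buffers -> {buffer_pool_mib(n):.0f} MiB per TaskManager")
```

Under that assumption, going from 2048 to 4096 buffers is a small cost next to the 8096 MB TaskManager heap configured in the original post.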

2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <jo...@gmail.com>:

> Hi!
>
> thank you for your replies!
> I increased the number of network buffers:
>
> taskmanager.network.numberOfBuffers: 2048
>
> but I am still getting the same error:
>
> Insufficient number of network buffers: required 120, but only 2 of 2048
> available.
>
> Thanks a lot!

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Yiannis Gkoufas <jo...@gmail.com>.

Hi!

thank you for your replies!
I increased the number of network buffers:

taskmanager.network.numberOfBuffers: 2048

but I am still getting the same error:

Insufficient number of network buffers: required 120, but only 2 of 2048
available.

Thanks a lot!


On 18 February 2015 at 20:27, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Yiannis,
>
> if you scale Flink to larger setups you need to adapt the number of
> network buffers.
> The background section of the configuration reference explains the details
> on that [1].
>
> Let us know, if that helped to solve the problem.
>
> Best, Fabian
>
> [1] http://flink.apache.org/docs/0.8/config.html#background

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Yiannis,

If you scale Flink to larger setups, you need to adapt the number of network
buffers.
The background section of the configuration reference explains the details
on that [1].

Let us know if that helped to solve the problem.

Best, Fabian

[1] http://flink.apache.org/docs/0.8/config.html#background
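
As a rough illustration of the sizing rule described in that background section, here is a minimal sketch. It assumes the rule of thumb from the configuration reference (slots per TaskManager squared, times the number of TaskManagers, times 4); the function and parameter names are invented for this example, and the result is an estimate rather than a hard requirement.

```python
def recommended_network_buffers(slots_per_taskmanager, num_taskmanagers,
                                channels_factor=4):
    """Rule-of-thumb estimate: slots^2 * taskmanagers * 4.

    The factor 4 is the documentation's allowance for concurrently active
    repartitioning or broadcast channels (an assumption taken from the
    config reference, not from this thread).
    """
    return slots_per_taskmanager ** 2 * num_taskmanagers * channels_factor


if __name__ == "__main__":
    # Cluster from the original post: 10 nodes with 12 task slots each.
    print(recommended_network_buffers(12, 10))
```

For the 10-node, 12-slot cluster in this thread the estimate comes out well above the default of 2048, which is consistent with the error. A job with fewer concurrent shuffles may get by with less.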

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Posted by Stephan Ewen <se...@apache.org>.

Hi Yiannis!

You need to increase the number of buffers for your setup. Here is a FAQ
entry with a few pointers:

http://flink.apache.org/docs/0.8/faq.html#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this
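
As a rough illustration of the sizing, here is a minimal sketch. It assumes the rule of thumb from the Flink configuration docs of that era (slots per TaskManager squared, times the number of TaskManagers, times 4) and assumes the setting is the flink-conf.yaml key taskmanager.network.numberOfBuffers, whose default of 2048 matches the number in the error message. The function name is made up for this example; please check the formula and the key against the FAQ entry above before relying on them.

```python
def recommended_network_buffers(slots_per_taskmanager: int, num_taskmanagers: int) -> int:
    """Rule-of-thumb buffer count (assumed): slots^2 * TaskManagers * 4."""
    return slots_per_taskmanager ** 2 * num_taskmanagers * 4


# Values from the original post: 10 nodes, 12 task slots each.
buffers = recommended_network_buffers(slots_per_taskmanager=12, num_taskmanagers=10)

# Print a line that could be pasted into flink-conf.yaml (key name is an assumption).
print(f"taskmanager.network.numberOfBuffers: {buffers}")
```

For the cluster described in the question this gives 5760, well above the 2048 buffers reported in the exception. Each buffer takes memory on the TaskManager, so the value is a starting point to tune rather than a fixed answer.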

Greetings,
Stephan