You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Jack Huang <ja...@mz.com> on 2016/06/10 17:55:37 UTC

java.rmi.server.ExportException: Port already in use

Hi all,

I am having difficulty determining the reason my Samza task is failing. It
generally failed within 10 minutes of start. When I examine the YARN log I
see the following exception on some but not all containers:

java.rmi.server.ExportException: Port already in use: 40029; nested
exception is:
        java.net.BindException: Address already in use
        at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
        at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
        at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
        at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
        at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
        at java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:383)
        at java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:346)
        at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:118)
        at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:95)
        at javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:404)
        at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:89)
        at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:43)
        at org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
        at org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
        at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:91)
        at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
        at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
Caused by: java.net.BindException: Address already in use
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.ServerSocket.bind(ServerSocket.java:375)
        at java.net.ServerSocket.<init>(ServerSocket.java:237)
        at java.net.ServerSocket.<init>(ServerSocket.java:128)
        at sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:45)
        at sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:345)
        at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
        at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
        ... 16 more

​


Aside from this, I also seeing a lot of garbage collection messages, but no
OutOfMemoryError:

2016-06-10T09:52:48.335+0000: 1.150: [GC (System.gc())
115345K->9996K(1005056K), 0.0097432 secs]
2016-06-10T09:52:48.345+0000: 1.160: [Full GC (System.gc())
9996K->9414K(1005056K), 0.0291634 secs]
2016-06-10T09:52:50.032+0000: 2.846: [GC (Allocation Failure)
271558K->18186K(1005056K), 0.0094124 secs]
2016-06-10T09:52:50.592+0000: 3.406: [GC (Allocation Failure)
280330K->13998K(1005056K), 0.0029778 secs]
2016-06-10T09:52:51.036+0000: 3.850: [GC (Allocation Failure)
276142K->11966K(1005056K), 0.0029768 secs]
2016-06-10T09:52:51.437+0000: 4.252: [GC (Allocation Failure)
274110K->11942K(1005056K), 0.0033398 secs]
2016-06-10T09:52:53.367+0000: 6.182: [GC (Metadata GC Threshold)
114006K->14958K(1037312K), 0.0050709 secs]
2016-06-10T09:52:53.372+0000: 6.187: [Full GC (Metadata GC Threshold)
14958K->8364K(1037312K), 0.0251294 secs]
2016-06-10T09:52:54.438+0000: 7.252: [GC (Allocation Failure)
302764K->14676K(1005056K), 0.0073039 secs]
2016-06-10T09:52:54.952+0000: 7.766: [GC (Allocation Failure)
309076K->18076K(1034752K), 0.0105222 secs]
2016-06-10T09:52:55.471+0000: 8.285: [GC (Allocation Failure)
342684K->14556K(1036288K), 0.0067447 secs]
2016-06-10T09:52:55.970+0000: 8.784: [GC (Allocation Failure)
339164K->15956K(1037312K), 0.0038883 secs]
2016-06-10T09:52:56.473+0000: 9.287: [GC (Allocation Failure)
342612K->13940K(1037312K), 0.0034693 secs]
2016-06-10T09:52:56.958+0000: 9.773: [GC (Allocation Failure)
340596K->15608K(1037824K), 0.0049325 secs]
2016-06-10T09:52:57.452+0000: 10.266: [GC (Allocation Failure)
343288K->18659K(1037824K), 0.0155791 secs]
2016-06-10T09:52:58.000+0000: 10.814: [GC (Allocation Failure)
346339K->19508K(1036800K), 0.0154724 secs]
2016-06-10T09:52:58.528+0000: 11.342: [GC (Allocation Failure)
346164K->23116K(1037312K), 0.0033848 secs]
2016-06-10T09:52:58.999+0000: 11.813: [GC (Allocation Failure)
349772K->26455K(1038336K), 0.0081673 secs]
2016-06-10T09:52:59.488+0000: 12.302: [GC (Allocation Failure)
354647K->27086K(1037824K), 0.0046321 secs]
2016-06-10T09:52:59.937+0000: 12.751: [GC (Allocation Failure)
355278K->30694K(1038336K), 0.0032607 secs]
2016-06-10T09:53:00.375+0000: 13.189: [GC (Allocation Failure)
358886K->34214K(1037824K), 0.0053298 secs]
2016-06-10T09:53:00.819+0000: 13.634: [GC (Allocation Failure)
362406K->34860K(1038848K), 0.0064049 secs]
2016-06-10T09:53:01.261+0000: 14.075: [GC (Allocation Failure)
364588K->38492K(1038848K), 0.0053598 secs]
2016-06-10T09:53:01.708+0000: 14.522: [GC (Allocation Failure)
368220K->40955K(1039360K), 0.0054838 secs]
2016-06-10T09:53:02.156+0000: 14.971: [GC (Allocation Failure)
371707K->42472K(1039360K), 0.0097664 secs]
2016-06-10T09:53:02.602+0000: 15.416: [GC (Allocation Failure)
373224K->46177K(1040384K), 0.0057107 secs]
2016-06-10T09:53:03.052+0000: 15.866: [GC (Allocation Failure)
378465K->46599K(1039872K), 0.0063725 secs]
2016-06-10T09:53:03.497+0000: 16.311: [GC (Allocation Failure)
378887K->50183K(1040384K), 0.0095219 secs]
2016-06-10T09:53:03.945+0000: 16.759: [GC (Allocation Failure)
382983K->52824K(1040384K), 0.0044979 secs]
2016-06-10T09:53:04.392+0000: 17.206: [GC (Allocation Failure)
385624K->54387K(1040896K), 0.0053487 secs]
2016-06-10T09:53:04.841+0000: 17.656: [GC (Allocation Failure)
388211K->56947K(1040896K), 0.0025053 secs]
2016-06-10T09:53:05.303+0000: 18.117: [GC (Allocation Failure)
390771K->59701K(1041408K), 0.0054432 secs]
2016-06-10T09:53:05.757+0000: 18.571: [GC (Allocation Failure)
394037K->60303K(1040896K), 0.0053059 secs]
2016-06-10T09:53:06.208+0000: 19.022: [GC (Allocation Failure)
394639K->63759K(1041408K), 0.0024715 secs]
2016-06-10T09:53:06.661+0000: 19.475: [GC (Allocation Failure)
398607K->66585K(1041408K), 0.0039495 secs]
2016-06-10T09:53:07.109+0000: 19.924: [GC (Allocation Failure)
401433K->67278K(1041408K), 0.0056410 secs]
2016-06-10T09:53:07.563+0000: 20.377: [GC (Allocation Failure)
402126K->70798K(1041408K), 0.0031031 secs]
2016-06-10T09:53:08.038+0000: 20.853: [GC (Allocation Failure)
405646K->73652K(1041408K), 0.0057616 secs]

​

Can anyone help?


Regards,
Jack

Re: java.rmi.server.ExportException: Port already in use

Posted by Jacob Maes <jm...@linkedin.com.INVALID>.
Hey Jack,

Be sure to check the AM log in addition to the container logs. The AM log
will show cases where the AM killed a container (e.g. for exceeding
yarn.container.memory.mb).

The only time I've ever seen a container die with nothing in the container
log, syserr, or AM log, it was being killed locally (i.e. someone or some
script was running "kill -9")

-Jake

On Fri, Jun 10, 2016 at 7:07 PM, Jack Huang <ja...@mz.com> wrote:

> Hi Jake,
>
> Thanks for the reply. It appears that my task fails too many times within
> the yarn.container.retry.window.ms
> ​
> window. Increasing yarn.container.retry.count​seems to solve the issue for
> the time being.
> It still remains unknown why my tasks are failing so often. There is no
> error messages in the log to help...
>
> Thanks,
> Jack
>
>
> On Fri, Jun 10, 2016 at 1:21 PM, Jacob Maes <ja...@gmail.com> wrote:
>
> > Hey Jack,
> >
> > The port collisions are fixed in
> > https://issues.apache.org/jira/browse/SAMZA-932. However, they usually
> > don't occur frequently enough to cause the job to fail. Usually only a
> > couple containers fail with this exception but the job restarts them and
> > eventually runs smoothly.
> >
> > Your GC log looks normal. Don't be discouraged by the term "Allocation
> > Failure". It just means that Garbage Collection was invoked because a
> > memory allocation failed. That's the most common cause of GC in any Java
> > application.
> >
> > -Jake
> >
> > On Fri, Jun 10, 2016 at 10:55 AM, Jack Huang <ja...@mz.com> wrote:
> >
> > > Hi all,
> > >
> > > I am having difficulty determining the reason my Samza task is failing.
> > It
> > > generally failed within 10 minutes of start. When I examine the YARN
> log
> > I
> > > see the following exception on some but not all containers:
> > >
> > > java.rmi.server.ExportException: Port already in use: 40029; nested
> > > exception is:
> > >         java.net.BindException: Address already in use
> > >         at
> > sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
> > >         at
> > > sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
> > >         at
> > > sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
> > >         at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
> > >         at
> > > sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
> > >         at
> > >
> >
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:383)
> > >         at
> > >
> >
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:346)
> > >         at
> > >
> >
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:118)
> > >         at
> > >
> >
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:95)
> > >         at
> > >
> >
> javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:404)
> > >         at
> org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:89)
> > >         at
> org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:43)
> > >         at
> > >
> >
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
> > >         at
> > >
> >
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
> > >         at
> > >
> >
> org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:91)
> > >         at
> > >
> org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
> > >         at
> > > org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
> > > Caused by: java.net.BindException: Address already in use
> > >         at java.net.PlainSocketImpl.socketBind(Native Method)
> > >         at
> > > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> > >         at java.net.ServerSocket.bind(ServerSocket.java:375)
> > >         at java.net.ServerSocket.<init>(ServerSocket.java:237)
> > >         at java.net.ServerSocket.<init>(ServerSocket.java:128)
> > >         at
> > >
> >
> sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:45)
> > >         at
> > >
> >
> sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:345)
> > >         at
> > > sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
> > >         at
> > sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
> > >         ... 16 more
> > >
> > > ​
> > >
> > >
> > > Aside from this, I also seeing a lot of garbage collection messages,
> but
> > no
> > > OutOfMemoryError:
> > >
> > > 2016-06-10T09:52:48.335+0000: 1.150: [GC (System.gc())
> > > 115345K->9996K(1005056K), 0.0097432 secs]
> > > 2016-06-10T09:52:48.345+0000: 1.160: [Full GC (System.gc())
> > > 9996K->9414K(1005056K), 0.0291634 secs]
> > > 2016-06-10T09:52:50.032+0000: 2.846: [GC (Allocation Failure)
> > > 271558K->18186K(1005056K), 0.0094124 secs]
> > > 2016-06-10T09:52:50.592+0000: 3.406: [GC (Allocation Failure)
> > > 280330K->13998K(1005056K), 0.0029778 secs]
> > > 2016-06-10T09:52:51.036+0000: 3.850: [GC (Allocation Failure)
> > > 276142K->11966K(1005056K), 0.0029768 secs]
> > > 2016-06-10T09:52:51.437+0000: 4.252: [GC (Allocation Failure)
> > > 274110K->11942K(1005056K), 0.0033398 secs]
> > > 2016-06-10T09:52:53.367+0000: 6.182: [GC (Metadata GC Threshold)
> > > 114006K->14958K(1037312K), 0.0050709 secs]
> > > 2016-06-10T09:52:53.372+0000: 6.187: [Full GC (Metadata GC Threshold)
> > > 14958K->8364K(1037312K), 0.0251294 secs]
> > > 2016-06-10T09:52:54.438+0000: 7.252: [GC (Allocation Failure)
> > > 302764K->14676K(1005056K), 0.0073039 secs]
> > > 2016-06-10T09:52:54.952+0000: 7.766: [GC (Allocation Failure)
> > > 309076K->18076K(1034752K), 0.0105222 secs]
> > > 2016-06-10T09:52:55.471+0000: 8.285: [GC (Allocation Failure)
> > > 342684K->14556K(1036288K), 0.0067447 secs]
> > > 2016-06-10T09:52:55.970+0000: 8.784: [GC (Allocation Failure)
> > > 339164K->15956K(1037312K), 0.0038883 secs]
> > > 2016-06-10T09:52:56.473+0000: 9.287: [GC (Allocation Failure)
> > > 342612K->13940K(1037312K), 0.0034693 secs]
> > > 2016-06-10T09:52:56.958+0000: 9.773: [GC (Allocation Failure)
> > > 340596K->15608K(1037824K), 0.0049325 secs]
> > > 2016-06-10T09:52:57.452+0000: 10.266: [GC (Allocation Failure)
> > > 343288K->18659K(1037824K), 0.0155791 secs]
> > > 2016-06-10T09:52:58.000+0000: 10.814: [GC (Allocation Failure)
> > > 346339K->19508K(1036800K), 0.0154724 secs]
> > > 2016-06-10T09:52:58.528+0000: 11.342: [GC (Allocation Failure)
> > > 346164K->23116K(1037312K), 0.0033848 secs]
> > > 2016-06-10T09:52:58.999+0000: 11.813: [GC (Allocation Failure)
> > > 349772K->26455K(1038336K), 0.0081673 secs]
> > > 2016-06-10T09:52:59.488+0000: 12.302: [GC (Allocation Failure)
> > > 354647K->27086K(1037824K), 0.0046321 secs]
> > > 2016-06-10T09:52:59.937+0000: 12.751: [GC (Allocation Failure)
> > > 355278K->30694K(1038336K), 0.0032607 secs]
> > > 2016-06-10T09:53:00.375+0000: 13.189: [GC (Allocation Failure)
> > > 358886K->34214K(1037824K), 0.0053298 secs]
> > > 2016-06-10T09:53:00.819+0000: 13.634: [GC (Allocation Failure)
> > > 362406K->34860K(1038848K), 0.0064049 secs]
> > > 2016-06-10T09:53:01.261+0000: 14.075: [GC (Allocation Failure)
> > > 364588K->38492K(1038848K), 0.0053598 secs]
> > > 2016-06-10T09:53:01.708+0000: 14.522: [GC (Allocation Failure)
> > > 368220K->40955K(1039360K), 0.0054838 secs]
> > > 2016-06-10T09:53:02.156+0000: 14.971: [GC (Allocation Failure)
> > > 371707K->42472K(1039360K), 0.0097664 secs]
> > > 2016-06-10T09:53:02.602+0000: 15.416: [GC (Allocation Failure)
> > > 373224K->46177K(1040384K), 0.0057107 secs]
> > > 2016-06-10T09:53:03.052+0000: 15.866: [GC (Allocation Failure)
> > > 378465K->46599K(1039872K), 0.0063725 secs]
> > > 2016-06-10T09:53:03.497+0000: 16.311: [GC (Allocation Failure)
> > > 378887K->50183K(1040384K), 0.0095219 secs]
> > > 2016-06-10T09:53:03.945+0000: 16.759: [GC (Allocation Failure)
> > > 382983K->52824K(1040384K), 0.0044979 secs]
> > > 2016-06-10T09:53:04.392+0000: 17.206: [GC (Allocation Failure)
> > > 385624K->54387K(1040896K), 0.0053487 secs]
> > > 2016-06-10T09:53:04.841+0000: 17.656: [GC (Allocation Failure)
> > > 388211K->56947K(1040896K), 0.0025053 secs]
> > > 2016-06-10T09:53:05.303+0000: 18.117: [GC (Allocation Failure)
> > > 390771K->59701K(1041408K), 0.0054432 secs]
> > > 2016-06-10T09:53:05.757+0000: 18.571: [GC (Allocation Failure)
> > > 394037K->60303K(1040896K), 0.0053059 secs]
> > > 2016-06-10T09:53:06.208+0000: 19.022: [GC (Allocation Failure)
> > > 394639K->63759K(1041408K), 0.0024715 secs]
> > > 2016-06-10T09:53:06.661+0000: 19.475: [GC (Allocation Failure)
> > > 398607K->66585K(1041408K), 0.0039495 secs]
> > > 2016-06-10T09:53:07.109+0000: 19.924: [GC (Allocation Failure)
> > > 401433K->67278K(1041408K), 0.0056410 secs]
> > > 2016-06-10T09:53:07.563+0000: 20.377: [GC (Allocation Failure)
> > > 402126K->70798K(1041408K), 0.0031031 secs]
> > > 2016-06-10T09:53:08.038+0000: 20.853: [GC (Allocation Failure)
> > > 405646K->73652K(1041408K), 0.0057616 secs]
> > >
> > > ​
> > >
> > > Can anyone help?
> > >
> > >
> > > Regards,
> > > Jack
> > >
> >
>

Re: java.rmi.server.ExportException: Port already in use

Posted by Jack Huang <ja...@mz.com>.
Hi Jake,

Thanks for the reply. It appears that my task fails too many times within
the yarn.container.retry.window.ms
​
window. Increasing yarn.container.retry.count​seems to solve the issue for
the time being.
It still remains unknown why my tasks are failing so often. There is no
error messages in the log to help...

Thanks,
Jack


On Fri, Jun 10, 2016 at 1:21 PM, Jacob Maes <ja...@gmail.com> wrote:

> Hey Jack,
>
> The port collisions are fixed in
> https://issues.apache.org/jira/browse/SAMZA-932. However, they usually
> don't occur frequently enough to cause the job to fail. Usually only a
> couple containers fail with this exception but the job restarts them and
> eventually runs smoothly.
>
> Your GC log looks normal. Don't be discouraged by the term "Allocation
> Failure". It just means that Garbage Collection was invoked because a
> memory allocation failed. That's the most common cause of GC in any Java
> application.
>
> -Jake
>
> On Fri, Jun 10, 2016 at 10:55 AM, Jack Huang <ja...@mz.com> wrote:
>
> > Hi all,
> >
> > I am having difficulty determining the reason my Samza task is failing.
> It
> > generally failed within 10 minutes of start. When I examine the YARN log
> I
> > see the following exception on some but not all containers:
> >
> > java.rmi.server.ExportException: Port already in use: 40029; nested
> > exception is:
> >         java.net.BindException: Address already in use
> >         at
> sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
> >         at
> > sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
> >         at
> > sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
> >         at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
> >         at
> > sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
> >         at
> >
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:383)
> >         at
> >
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:346)
> >         at
> >
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:118)
> >         at
> >
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:95)
> >         at
> >
> javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:404)
> >         at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:89)
> >         at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:43)
> >         at
> >
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
> >         at
> >
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
> >         at
> >
> org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:91)
> >         at
> > org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
> >         at
> > org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
> > Caused by: java.net.BindException: Address already in use
> >         at java.net.PlainSocketImpl.socketBind(Native Method)
> >         at
> > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> >         at java.net.ServerSocket.bind(ServerSocket.java:375)
> >         at java.net.ServerSocket.<init>(ServerSocket.java:237)
> >         at java.net.ServerSocket.<init>(ServerSocket.java:128)
> >         at
> >
> sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:45)
> >         at
> >
> sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:345)
> >         at
> > sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
> >         at
> sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
> >         ... 16 more
> >
> > ​
> >
> >
> > Aside from this, I also seeing a lot of garbage collection messages, but
> no
> > OutOfMemoryError:
> >
> > 2016-06-10T09:52:48.335+0000: 1.150: [GC (System.gc())
> > 115345K->9996K(1005056K), 0.0097432 secs]
> > 2016-06-10T09:52:48.345+0000: 1.160: [Full GC (System.gc())
> > 9996K->9414K(1005056K), 0.0291634 secs]
> > 2016-06-10T09:52:50.032+0000: 2.846: [GC (Allocation Failure)
> > 271558K->18186K(1005056K), 0.0094124 secs]
> > 2016-06-10T09:52:50.592+0000: 3.406: [GC (Allocation Failure)
> > 280330K->13998K(1005056K), 0.0029778 secs]
> > 2016-06-10T09:52:51.036+0000: 3.850: [GC (Allocation Failure)
> > 276142K->11966K(1005056K), 0.0029768 secs]
> > 2016-06-10T09:52:51.437+0000: 4.252: [GC (Allocation Failure)
> > 274110K->11942K(1005056K), 0.0033398 secs]
> > 2016-06-10T09:52:53.367+0000: 6.182: [GC (Metadata GC Threshold)
> > 114006K->14958K(1037312K), 0.0050709 secs]
> > 2016-06-10T09:52:53.372+0000: 6.187: [Full GC (Metadata GC Threshold)
> > 14958K->8364K(1037312K), 0.0251294 secs]
> > 2016-06-10T09:52:54.438+0000: 7.252: [GC (Allocation Failure)
> > 302764K->14676K(1005056K), 0.0073039 secs]
> > 2016-06-10T09:52:54.952+0000: 7.766: [GC (Allocation Failure)
> > 309076K->18076K(1034752K), 0.0105222 secs]
> > 2016-06-10T09:52:55.471+0000: 8.285: [GC (Allocation Failure)
> > 342684K->14556K(1036288K), 0.0067447 secs]
> > 2016-06-10T09:52:55.970+0000: 8.784: [GC (Allocation Failure)
> > 339164K->15956K(1037312K), 0.0038883 secs]
> > 2016-06-10T09:52:56.473+0000: 9.287: [GC (Allocation Failure)
> > 342612K->13940K(1037312K), 0.0034693 secs]
> > 2016-06-10T09:52:56.958+0000: 9.773: [GC (Allocation Failure)
> > 340596K->15608K(1037824K), 0.0049325 secs]
> > 2016-06-10T09:52:57.452+0000: 10.266: [GC (Allocation Failure)
> > 343288K->18659K(1037824K), 0.0155791 secs]
> > 2016-06-10T09:52:58.000+0000: 10.814: [GC (Allocation Failure)
> > 346339K->19508K(1036800K), 0.0154724 secs]
> > 2016-06-10T09:52:58.528+0000: 11.342: [GC (Allocation Failure)
> > 346164K->23116K(1037312K), 0.0033848 secs]
> > 2016-06-10T09:52:58.999+0000: 11.813: [GC (Allocation Failure)
> > 349772K->26455K(1038336K), 0.0081673 secs]
> > 2016-06-10T09:52:59.488+0000: 12.302: [GC (Allocation Failure)
> > 354647K->27086K(1037824K), 0.0046321 secs]
> > 2016-06-10T09:52:59.937+0000: 12.751: [GC (Allocation Failure)
> > 355278K->30694K(1038336K), 0.0032607 secs]
> > 2016-06-10T09:53:00.375+0000: 13.189: [GC (Allocation Failure)
> > 358886K->34214K(1037824K), 0.0053298 secs]
> > 2016-06-10T09:53:00.819+0000: 13.634: [GC (Allocation Failure)
> > 362406K->34860K(1038848K), 0.0064049 secs]
> > 2016-06-10T09:53:01.261+0000: 14.075: [GC (Allocation Failure)
> > 364588K->38492K(1038848K), 0.0053598 secs]
> > 2016-06-10T09:53:01.708+0000: 14.522: [GC (Allocation Failure)
> > 368220K->40955K(1039360K), 0.0054838 secs]
> > 2016-06-10T09:53:02.156+0000: 14.971: [GC (Allocation Failure)
> > 371707K->42472K(1039360K), 0.0097664 secs]
> > 2016-06-10T09:53:02.602+0000: 15.416: [GC (Allocation Failure)
> > 373224K->46177K(1040384K), 0.0057107 secs]
> > 2016-06-10T09:53:03.052+0000: 15.866: [GC (Allocation Failure)
> > 378465K->46599K(1039872K), 0.0063725 secs]
> > 2016-06-10T09:53:03.497+0000: 16.311: [GC (Allocation Failure)
> > 378887K->50183K(1040384K), 0.0095219 secs]
> > 2016-06-10T09:53:03.945+0000: 16.759: [GC (Allocation Failure)
> > 382983K->52824K(1040384K), 0.0044979 secs]
> > 2016-06-10T09:53:04.392+0000: 17.206: [GC (Allocation Failure)
> > 385624K->54387K(1040896K), 0.0053487 secs]
> > 2016-06-10T09:53:04.841+0000: 17.656: [GC (Allocation Failure)
> > 388211K->56947K(1040896K), 0.0025053 secs]
> > 2016-06-10T09:53:05.303+0000: 18.117: [GC (Allocation Failure)
> > 390771K->59701K(1041408K), 0.0054432 secs]
> > 2016-06-10T09:53:05.757+0000: 18.571: [GC (Allocation Failure)
> > 394037K->60303K(1040896K), 0.0053059 secs]
> > 2016-06-10T09:53:06.208+0000: 19.022: [GC (Allocation Failure)
> > 394639K->63759K(1041408K), 0.0024715 secs]
> > 2016-06-10T09:53:06.661+0000: 19.475: [GC (Allocation Failure)
> > 398607K->66585K(1041408K), 0.0039495 secs]
> > 2016-06-10T09:53:07.109+0000: 19.924: [GC (Allocation Failure)
> > 401433K->67278K(1041408K), 0.0056410 secs]
> > 2016-06-10T09:53:07.563+0000: 20.377: [GC (Allocation Failure)
> > 402126K->70798K(1041408K), 0.0031031 secs]
> > 2016-06-10T09:53:08.038+0000: 20.853: [GC (Allocation Failure)
> > 405646K->73652K(1041408K), 0.0057616 secs]
> >
> > ​
> >
> > Can anyone help?
> >
> >
> > Regards,
> > Jack
> >
>

Re: java.rmi.server.ExportException: Port already in use

Posted by Jacob Maes <ja...@gmail.com>.
Hey Jack,

The port collisions are fixed in
https://issues.apache.org/jira/browse/SAMZA-932. However, they usually
don't occur frequently enough to cause the job to fail. Usually only a
couple containers fail with this exception but the job restarts them and
eventually runs smoothly.

Your GC log looks normal. Don't be discouraged by the term "Allocation
Failure". It just means that Garbage Collection was invoked because a
memory allocation failed. That's the most common cause of GC in any Java
application.

-Jake

On Fri, Jun 10, 2016 at 10:55 AM, Jack Huang <ja...@mz.com> wrote:

> Hi all,
>
> I am having difficulty determining the reason my Samza task is failing. It
> generally failed within 10 minutes of start. When I examine the YARN log I
> see the following exception on some but not all containers:
>
> java.rmi.server.ExportException: Port already in use: 40029; nested
> exception is:
>         java.net.BindException: Address already in use
>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
>         at
> sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
>         at
> sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
>         at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
>         at
> sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
>         at
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:383)
>         at
> java.rmi.server.UnicastRemoteObject.exportObject(UnicastRemoteObject.java:346)
>         at
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:118)
>         at
> javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:95)
>         at
> javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:404)
>         at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:89)
>         at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:43)
>         at
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
>         at
> org.apache.samza.container.SamzaContainer$$anonfun$main$2.apply(SamzaContainer.scala:66)
>         at
> org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:91)
>         at
> org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
>         at
> org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
> Caused by: java.net.BindException: Address already in use
>         at java.net.PlainSocketImpl.socketBind(Native Method)
>         at
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>         at java.net.ServerSocket.bind(ServerSocket.java:375)
>         at java.net.ServerSocket.<init>(ServerSocket.java:237)
>         at java.net.ServerSocket.<init>(ServerSocket.java:128)
>         at
> sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:45)
>         at
> sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:345)
>         at
> sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
>         ... 16 more
>
> ​
>
>
> Aside from this, I also seeing a lot of garbage collection messages, but no
> OutOfMemoryError:
>
> 2016-06-10T09:52:48.335+0000: 1.150: [GC (System.gc())
> 115345K->9996K(1005056K), 0.0097432 secs]
> 2016-06-10T09:52:48.345+0000: 1.160: [Full GC (System.gc())
> 9996K->9414K(1005056K), 0.0291634 secs]
> 2016-06-10T09:52:50.032+0000: 2.846: [GC (Allocation Failure)
> 271558K->18186K(1005056K), 0.0094124 secs]
> 2016-06-10T09:52:50.592+0000: 3.406: [GC (Allocation Failure)
> 280330K->13998K(1005056K), 0.0029778 secs]
> 2016-06-10T09:52:51.036+0000: 3.850: [GC (Allocation Failure)
> 276142K->11966K(1005056K), 0.0029768 secs]
> 2016-06-10T09:52:51.437+0000: 4.252: [GC (Allocation Failure)
> 274110K->11942K(1005056K), 0.0033398 secs]
> 2016-06-10T09:52:53.367+0000: 6.182: [GC (Metadata GC Threshold)
> 114006K->14958K(1037312K), 0.0050709 secs]
> 2016-06-10T09:52:53.372+0000: 6.187: [Full GC (Metadata GC Threshold)
> 14958K->8364K(1037312K), 0.0251294 secs]
> 2016-06-10T09:52:54.438+0000: 7.252: [GC (Allocation Failure)
> 302764K->14676K(1005056K), 0.0073039 secs]
> 2016-06-10T09:52:54.952+0000: 7.766: [GC (Allocation Failure)
> 309076K->18076K(1034752K), 0.0105222 secs]
> 2016-06-10T09:52:55.471+0000: 8.285: [GC (Allocation Failure)
> 342684K->14556K(1036288K), 0.0067447 secs]
> 2016-06-10T09:52:55.970+0000: 8.784: [GC (Allocation Failure)
> 339164K->15956K(1037312K), 0.0038883 secs]
> 2016-06-10T09:52:56.473+0000: 9.287: [GC (Allocation Failure)
> 342612K->13940K(1037312K), 0.0034693 secs]
> 2016-06-10T09:52:56.958+0000: 9.773: [GC (Allocation Failure)
> 340596K->15608K(1037824K), 0.0049325 secs]
> 2016-06-10T09:52:57.452+0000: 10.266: [GC (Allocation Failure)
> 343288K->18659K(1037824K), 0.0155791 secs]
> 2016-06-10T09:52:58.000+0000: 10.814: [GC (Allocation Failure)
> 346339K->19508K(1036800K), 0.0154724 secs]
> 2016-06-10T09:52:58.528+0000: 11.342: [GC (Allocation Failure)
> 346164K->23116K(1037312K), 0.0033848 secs]
> 2016-06-10T09:52:58.999+0000: 11.813: [GC (Allocation Failure)
> 349772K->26455K(1038336K), 0.0081673 secs]
> 2016-06-10T09:52:59.488+0000: 12.302: [GC (Allocation Failure)
> 354647K->27086K(1037824K), 0.0046321 secs]
> 2016-06-10T09:52:59.937+0000: 12.751: [GC (Allocation Failure)
> 355278K->30694K(1038336K), 0.0032607 secs]
> 2016-06-10T09:53:00.375+0000: 13.189: [GC (Allocation Failure)
> 358886K->34214K(1037824K), 0.0053298 secs]
> 2016-06-10T09:53:00.819+0000: 13.634: [GC (Allocation Failure)
> 362406K->34860K(1038848K), 0.0064049 secs]
> 2016-06-10T09:53:01.261+0000: 14.075: [GC (Allocation Failure)
> 364588K->38492K(1038848K), 0.0053598 secs]
> 2016-06-10T09:53:01.708+0000: 14.522: [GC (Allocation Failure)
> 368220K->40955K(1039360K), 0.0054838 secs]
> 2016-06-10T09:53:02.156+0000: 14.971: [GC (Allocation Failure)
> 371707K->42472K(1039360K), 0.0097664 secs]
> 2016-06-10T09:53:02.602+0000: 15.416: [GC (Allocation Failure)
> 373224K->46177K(1040384K), 0.0057107 secs]
> 2016-06-10T09:53:03.052+0000: 15.866: [GC (Allocation Failure)
> 378465K->46599K(1039872K), 0.0063725 secs]
> 2016-06-10T09:53:03.497+0000: 16.311: [GC (Allocation Failure)
> 378887K->50183K(1040384K), 0.0095219 secs]
> 2016-06-10T09:53:03.945+0000: 16.759: [GC (Allocation Failure)
> 382983K->52824K(1040384K), 0.0044979 secs]
> 2016-06-10T09:53:04.392+0000: 17.206: [GC (Allocation Failure)
> 385624K->54387K(1040896K), 0.0053487 secs]
> 2016-06-10T09:53:04.841+0000: 17.656: [GC (Allocation Failure)
> 388211K->56947K(1040896K), 0.0025053 secs]
> 2016-06-10T09:53:05.303+0000: 18.117: [GC (Allocation Failure)
> 390771K->59701K(1041408K), 0.0054432 secs]
> 2016-06-10T09:53:05.757+0000: 18.571: [GC (Allocation Failure)
> 394037K->60303K(1040896K), 0.0053059 secs]
> 2016-06-10T09:53:06.208+0000: 19.022: [GC (Allocation Failure)
> 394639K->63759K(1041408K), 0.0024715 secs]
> 2016-06-10T09:53:06.661+0000: 19.475: [GC (Allocation Failure)
> 398607K->66585K(1041408K), 0.0039495 secs]
> 2016-06-10T09:53:07.109+0000: 19.924: [GC (Allocation Failure)
> 401433K->67278K(1041408K), 0.0056410 secs]
> 2016-06-10T09:53:07.563+0000: 20.377: [GC (Allocation Failure)
> 402126K->70798K(1041408K), 0.0031031 secs]
> 2016-06-10T09:53:08.038+0000: 20.853: [GC (Allocation Failure)
> 405646K->73652K(1041408K), 0.0057616 secs]
>
> ​
>
> Can anyone help?
>
>
> Regards,
> Jack
>