Posted to users@activemq.apache.org by Siebo <bl...@gmail.com> on 2017/04/13 09:20:02 UTC

Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

I'm facing the same problem: after 2-3 days, ActiveMQ stops because of an
OutOfMemoryError on the broker side:



After doing some research, I have some theories:
- A TCP connection that hits an EOFException cannot be closed.
- Each TCP connection runs in a separate thread, so the thread cannot be
destroyed while the TCP connection is still alive.
- When the number of threads exceeds the limit, it causes an OutOfMemoryError.

The problem has been occurring for the last 3 months, and I have to restart
the ActiveMQ service daily to avoid the OutOfMemoryError.
I would appreciate any ideas and solutions.

Note: I am using ActiveMQ 5.14.1

Thanks,
Siebo



--
View this message in context: http://activemq.2283324.n4.nabble.com/org-apache-activemq-broker-TransportConnection-Transport-Transport-Connection-to-tcp-XX-XX-XXX-XXX-5n-tp4722840p4724925.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Tim Bain <tb...@alumni.duke.edu>.
That's great news.

Sorry for never responding to your last two emails; I still had them
flagged to come back to, but nothing in the thread dump gave me any idea of
what was going on. Now that you've determined that the behavior was driven
by external forces rather than something within the broker, that makes
complete sense.

Tim


Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Siebo <bl...@gmail.com>.
Hi,

My problem is finally solved. It was caused by a mistake in the load-balancing
tool, which continuously issued a command to check whether ActiveMQ was alive.
For some reason, this made the number of threads grow over time, and ActiveMQ
died when it exceeded the maximum number of threads.
Thank you all (especially Tim); you helped me a lot in resolving this trouble.
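
The thread does not say how the load-balancing tool performed its check, so the
following is only an illustration of what a liveness probe can look like when
it is careful to release its connection after every call. The function name,
host, port, and timeout are hypothetical and not taken from the tool in
question.

```python
import socket


def broker_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    The `with` block closes the probe's socket as soon as the check is done,
    so repeated checks do not accumulate open connections on the client side.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Note that a probe which connects and disconnects without speaking the broker's
wire protocol may itself show up in the broker log as a failed transport
connection, so a check like this is probably best pointed at a dedicated
monitoring endpoint if one is available.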

Thanks and best regards,
Siebo




Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Siebo <bl...@gmail.com>.
Hi Tim,

After quite a long time, I got the thread dumps.
I am attaching only the one from the 'master' ActiveMQ machine.
thread_dump.log
<http://activemq.2283324.n4.nabble.com/file/n4725874/thread_dump.log>
Here is the odd information I noticed in the thread dump:
- There are 4955 threads in the WAITING state.
- Most of them (4917 threads) come from
org.apache.activemq.thread.DedicatedTaskRunner.runTask.
- 4899 threads relate to a queue/topic
mq1-42989-1493220210257-1:1:xxxx
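
Counts like the ones above can be produced from a thread dump with a short
script. This is a minimal sketch, assuming a HotSpot-style dump in which every
thread entry starts with a line beginning with a double quote and contains a
"java.lang.Thread.State:" line. The sample dump is synthetic and is not taken
from the attached thread_dump.log; the function names are made up for this
example.

```python
import re
from collections import Counter

STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def split_threads(dump: str) -> list:
    """Split a dump into one text block per thread entry."""
    blocks, current = [], []
    for line in dump.splitlines():
        if line.startswith('"'):
            if current:
                blocks.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def summarize(dump: str, frame: str) -> tuple:
    """Return (threads per state, number of threads whose stack mentions frame)."""
    states = Counter()
    with_frame = 0
    for block in split_threads(dump):
        match = STATE_RE.search(block)
        states[match.group(1) if match else "UNKNOWN"] += 1
        if frame in block:
            with_frame += 1
    return states, with_frame


# Synthetic sample, shaped like a thread dump but invented for illustration.
SAMPLE = '''"worker-1" #10 daemon prio=5
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java)

"worker-2" #11 daemon prio=5
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java)

"main" #1 prio=5
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
'''

states, matching = summarize(SAMPLE, "DedicatedTaskRunner.runTask")
print(dict(states), matching)
```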

Do you have any ideas based on this thread dump?
I wonder whether my trouble would be solved if UseDedicatedTaskRunner were
disabled.

Thanks and best regards,
Siebo




Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Siebo <bl...@gmail.com>.
Hi Tim,

Sorry I could not reply sooner; the Nabble forum was having some errors
last Friday.


> So are all client IPs represented proportionally in the EOFExceptions?

No, only one consumer shows up with EOFExceptions in ActiveMQ's log. That
consumer runs on the same machine as the broker, so its IP appears as
127.0.0.1/0:0:0:0:0:0:0:1 in the log file.


> You would have to explicitly enable websockets, and your clients would
> have 
> to explicitly use them, so it sounds like you're not using them. That's 
> fine, it just means that JIRA doesn't apply to you. 

I understand that the Jira ticket does not apply to me. But I would like to
know how to enable websockets for my broker. Can I do that in the ActiveMQ
config file, activemq.xml?


> How many threads fall into each of the three categories? 

I cannot tell from the log file which threads fall into which category. I am
attaching it as well.


> Keep in mind that the broker starts threads when clients connect (to read 
> data from the sockets), so having more threads on the active broker isn't 
> entirely unexpected. But the size of the difference might indicate that 
> clients are opening connections for each message or something similar. 
> Since you haven't told us anything about your client workload, I can't say 
> for sure whether the difference in thread count is expected or is a
> cause for concern. 

The 'active' broker receives ~140 messages per hour, and it keeps receiving
messages all day.
More info: after 2-3 days, an OutOfMemoryError occurs. We restart both
brokers daily to avoid that error.

Thanks and best regards,
Siebo




Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Tim Bain <tb...@alumni.duke.edu>.
Siebo,

On Apr 18, 2017 4:28 AM, "Siebo" <bl...@gmail.com> wrote:

Hi Tim,


> OK, for the EOFException, are your brokers behind a load balancer like
> this thread's OP was? It sounds like you're not, so what's on the other
> end of those connections? One possibility is real client processes, or
> another is another broker in a network of brokers setup. In either case,
> are all clients/brokers misbehaving equally, or are some fine and others
> very unhappy?

→ I can tell that all consumers were working fine while the EOFException
occurred continuously. Pretty weird, huh?


My definition of misbehaving didn't have to mean that the client would
report errors. For example, if the client was continually opening new
connections to the broker without closing the old ones, it would be
misbehaving (not doing what it's supposed to) even though it would
successfully process data and would report no errors.

So are all client IPs represented proportionally in the EOFExceptions?

> The JIRA you linked to was specifically related to websockets; is that a
> configuration you're using?

→ I don't quite follow. I made no changes to ActiveMQ's activemq.xml other
than using the MySQL persistenceAdapter instead of KahaDB. So I think the
current configuration is using websockets.


You would have to explicitly enable websockets, and your clients would have
to explicitly use them, so it sounds like you're not using them. That's
fine, it just means that JIRA doesn't apply to you.

> I completely understand not being able to upgrade the version of ActiveMQ
> on a production server, but I don't buy the argument that it's not
> possible to take a thread dump just because it's a production server.
> Taking a thread dump is not a performance impact, and you should push back
> on whoever is telling you that you're not allowed to do it. Unless, of
> course, you don't care about the thread count and only want to pursue the
> EOFException question.

→ Yes, you're right. My problem is that I don't have permission to generate a
thread dump on the production server. It has already been requested. I hope to
receive the thread dump soon...


I misunderstood: I thought you were saying that your request was denied,
but now I understand that you just needed someone else to perform it. No
problem there.

> Also, what does the stack trace for the EOFException say the broker was
> doing when the EOFException occurred?

→ I think the broker was doing nothing in this case. It had just started, and
the EOFException appeared in the log file although no message had been sent to
it (on the 'inactive' ActiveMQ server).
As far as I know, the ActiveMQ InactivityMonitor checks for 'inactive'
connections all the time. I wonder whether that is the only job the broker was
doing when the EOFException occurred.
Here are some lines from the log; the EOFException was logged about once a
minute:


Your attachment contains stacks for three threads. The first is a failover
transport that is trying to reconnect because the connection has been lost.
The third is a TCP transport that is waiting for data to be sent to it. The
second is a thread in a thread pool that has not been given any work to do;
that's not a problem; it just means you have some extra capacity in your
thread pool (and that's a good thing). None of these threads look like a
problem, though I'm curious why the failover transport isn't connected and
needs to reconnect.

How many threads fall into each of the three categories?

Keep in mind that the broker starts threads when clients connect (to read
data from the sockets), so having more threads on the active broker isn't
entirely unexpected. But the size of the difference might indicate that
clients are opening connections for each message or something similar.
Since you haven't told us anything about your client workload, I can't say
for sure whether the difference in thread count is expected or is a
cause for concern.


Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Siebo <bl...@gmail.com>.
Hi Tim,


> OK, for the EOFException, are your brokers behind a load balancer like
> this thread's OP was? It sounds like you're not, so what's on the other
> end of those connections? One possibility is real client processes, or
> another is another broker in a network of brokers setup. In either case,
> are all clients/brokers misbehaving equally, or are some fine and others
> very unhappy? 

→ I can tell that all consumers were working fine while the EOFException
occurred continuously. Pretty weird, huh?

> The JIRA you linked to was specifically related to websockets; is that a
> configuration you're using? 

→ I don't quite follow. I made no changes to ActiveMQ's activemq.xml other
than using the MySQL persistenceAdapter instead of KahaDB. So I think the
current configuration is using websockets.


> I completely understand not being able to upgrade the version of ActiveMQ
> on a production server, but I don't buy the argument that it's not
> possible to take a thread dump just because it's a production server.
> Taking a thread dump is not a performance impact, and you should push back
> on whoever is telling you that you're not allowed to do it. Unless, of
> course, you don't care about the thread count and only want to pursue the
> EOFException question. 

→ Yes, you're right. My problem is that I don't have permission to generate a
thread dump on the production server. It has already been requested. I hope to
receive the thread dump soon...


> Also, what does the stack trace for the EOFException say the broker was
> doing when the EOFException occurred? 

→ I think the broker was doing nothing in this case. It had just started, and
the EOFException appeared in the log file although no message had been sent to
it (on the 'inactive' ActiveMQ server).
As far as I know, the ActiveMQ InactivityMonitor checks for 'inactive'
connections all the time. I wonder whether that is the only job the broker was
doing when the EOFException occurred.
Here are some lines from the log; the EOFException was logged about once a
minute:


Thanks and best regards,
Siebo




Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Tim Bain <tb...@alumni.duke.edu>.
Also, what does the stack trace for the EOFException say the broker was
doing when the EOFException occurred?


Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Tim Bain <tb...@alumni.duke.edu>.
OK, for the EOFException, are your brokers behind a load balancer like this
thread's OP was? It sounds like you're not, so what's on the other end of
those connections? One possibility is real client processes; another is
another broker in a network-of-brokers setup. In either case, are all
clients/brokers misbehaving equally, or are some fine and others very
unhappy?

The JIRA you linked to was specifically related to websockets; is that a
configuration you're using?

I completely understand not being able to upgrade the version of ActiveMQ
on a production server, but I don't buy the argument that it's not possible
to take a thread dump just because it's a production server. Taking a
thread dump is not a performance impact, and you should push back on
whoever is telling you that you're not allowed to do it. Unless, of course,
you don't care about the thread count and only want to pursue the EOFException
question.

BTW, kill -3 will generate a thread dump without needing to install
additional software, but it generates it to standard out for the broker
process, so it only helps if standard out has been redirected somewhere you
can access it. So this might or might not be useful to you.
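
For reference, the same signal can be sent from a script. This is a minimal
sketch for Unix-like systems; the function name is made up for this example,
and the PID in the usage comment is a placeholder that has to be replaced with
the broker's real process ID.

```python
import os
import signal


def request_thread_dump(pid: int) -> None:
    """Send SIGQUIT (signal 3) to a process, the same signal `kill -3` sends.

    A HotSpot JVM normally reacts by writing a thread dump to its standard
    output and then keeps running. Other kinds of processes may terminate on
    SIGQUIT, so only use this on a JVM process.
    """
    os.kill(pid, signal.SIGQUIT)


# Usage (placeholder PID, not a real broker process):
# request_thread_dump(12345)
```

As with kill -3, the dump goes to the broker's standard output, so this only
helps if that output is captured somewhere you can read it.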

Tim


Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Siebo <bl...@gmail.com>.
Hi Tim,

First, I want to say thank you for your help.
I would like to add more information about this trouble:
I have 2 machines running ActiveMQ as Master-Master (for some reason, I use
another tool to decide which machine is 'active' and which is 'inactive',
instead of using JDBC Master-Slave). I named them mq1 and mq2.
Currently, mq1 is 'active' while mq2 is 'inactive'.
The EOFException is logged continuously in both machines' log files.

I checked the number of threads by running the following command:
top -H
Result:
mq1 (active): 5381 (2 running, 5379 sleeping)
mq2 (inactive): 277 (1 running, 276 sleeping)

The EOFException no longer seems to be the reason for the OutOfMemoryError.
I wonder whether a thread leak occurred on mq1, as described in this ticket:
https://issues.apache.org/jira/browse/AMQ-6482.

Because these machines are used in production, I cannot generate a thread
dump or upgrade the ActiveMQ version without a definite conclusion.

Thanks and best regards,
Siebo




Re: org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://XX.XX.XXX.XXX:5445 failed: java.io.EOFException

Posted by Tim Bain <tb...@alumni.duke.edu>.
An EOFException is an indication that the OS believes that the socket is
closed, so at the socket level, the OS will not hold onto large numbers of
sockets under the conditions you describe.  You can confirm this by using
netstat to see how many sockets are open when you reach (or approach) the
failure point.
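
To make the netstat check concrete, here is a minimal sketch that counts
sockets per TCP state for one port. It assumes Linux-style `netstat -tan`
output (protocol, two queue columns, local address, foreign address, state).
The port 61616 and the sample text are placeholders invented for this example,
not data from this thread.

```python
from collections import Counter


def count_socket_states(netstat_output: str, port: int) -> Counter:
    """Count TCP sockets per state whose local address ends with :port."""
    states = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix):
            states[fields[5]] += 1
    return states


# Made-up sample in the shape of `netstat -tan` output on Linux.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:61616           0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:61616          10.0.0.9:40112          ESTABLISHED
tcp        0      0 10.0.0.5:61616          10.0.0.9:40113          CLOSE_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:51000          ESTABLISHED
"""

print(count_socket_states(SAMPLE, 61616))
```

If the totals stay small as the broker approaches the failure point, the
sockets themselves are probably not what is piling up.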

At the application level (i.e. the JVM), it's unlikely that the thread
can't be destroyed due to the exact reason you described (that it can't
tell that the socket is dead), but it's possible that it's not able to exit
due to some other reason. You can check whether this theory might hold
water by seeing how many threads are alive soon after you restart the
broker and comparing that with the number of threads alive just before you
restart the broker the next time. If you have JMX enabled, JConsole or
JVisualVM can both do that, or you can use the jstack command line tool
from a JDK.
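
The before/after comparison can be scripted once two jstack outputs have been
saved. This is a minimal sketch, assuming HotSpot-style dumps in which each
thread entry starts with a line beginning with a double quote. The function
names and the two tiny sample dumps are invented for illustration.

```python
def count_threads(dump_text: str) -> int:
    """Count thread entries: lines that start with a double-quoted thread name."""
    return sum(1 for line in dump_text.splitlines() if line.startswith('"'))


def thread_growth(dump_after_restart: str, dump_before_next_restart: str) -> int:
    """Return how many more threads the later dump has than the earlier one."""
    return count_threads(dump_before_next_restart) - count_threads(dump_after_restart)


# Synthetic placeholders standing in for two saved jstack outputs.
EARLY = '"main" #1 prio=5\n"worker-1" #2 daemon prio=5\n'
LATE = EARLY + '"worker-2" #3 daemon prio=5\n"worker-3" #4 daemon prio=5\n'

print(thread_growth(EARLY, LATE))
```

A growth figure that keeps climbing from one day to the next would support the
thread-leak theory; a flat one would point elsewhere.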

Does the OutOfMemoryError indicate that heap space is the resource you ran
out of, or was it something else (threads, file handles, etc.)? If it was
heap, you can use a memory profiler or sampler to see what objects are
being created, which might help someone on this list figure out what's
going on.
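
The text after "java.lang.OutOfMemoryError:" in the log usually names the
exhausted resource. As a rough sketch, the mapping below covers the HotSpot
messages I'm aware of; the function name and the category labels are made up
for this example, and the list is not guaranteed to be complete.

```python
def classify_oom(message: str) -> str:
    """Map an OutOfMemoryError message to the resource that likely ran out."""
    text = message.lower()
    if "native thread" in text:
        # e.g. "unable to create new native thread"
        return "threads"
    if "java heap space" in text or "gc overhead limit exceeded" in text:
        return "heap"
    if "metaspace" in text or "permgen" in text:
        return "class metadata"
    if "direct buffer memory" in text:
        return "direct memory"
    return "unknown"


print(classify_oom("java.lang.OutOfMemoryError: unable to create new native thread"))
print(classify_oom("java.lang.OutOfMemoryError: Java heap space"))
```

A "threads" result would point at thread count or OS limits rather than heap
size, in which case a memory profiler is unlikely to show the cause.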

Also, did you configure appropriate limits for the memory store so producer
flow control will prevent you from filling memory just from accepting too
many non-persistent messages (or paging in too many messages from a
persistence store)? And is your heap large enough for the way you're using
your broker?

Tim
