
Posted to users@activemq.apache.org by Cadmean <hz...@hotmail.com> on 2015/07/03 07:56:23 UTC

Transport failed, please helpT_T

I have built a broker network with 4 brokers and 5000 clients. (I have
changed the max connections of each broker to 2500 in activemq.xml.) Every
broker uses failover forever to build the connection.
The problem is that, after an uncertain time, some clients (about 20, with
OSes including AIX and Suse Linux) start to show the following logs and keep
trying to reconnect to the broker. Also, I can see those clients keep going
online and offline when listening to the system topic
TOPIC://ActiveMQ.Advisory.Connection.

[2015-06-10 09:05:51,412 [WARN][ActiveMQ Transport: tcp://83.28.33.224:61616@47664]--Transport (tcp://83.28.33.224:61616@47664) failed, reason: , attempting to automatically reconnect
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:386)
    at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)
    at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
    at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
    at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
    at java.lang.Thread.run(Thread.java:735)

Environment:
Suse Linux 11 SP2, JDK 1.7
The JVM memory is set to 2G




--
View this message in context: http://activemq.2283324.n4.nabble.com/Transport-failed-please-helpT-T-tp4698539.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Transport failed, please helpT_T

Posted by Cadmean <hz...@hotmail.com>.

The problem has been solved. 

After using WireShark, I found that both the [SYN] and [FIN] packets look
good. So I examined the logic of the code for building/closing connections
and found that someone had changed the code, which caused the problem.

Thanks for your advice. 








Re: Transport failed, please helpT_T

Posted by Tim Bain <tb...@alumni.duke.edu>.

I wouldn't call it arrogance, but it's definitely a bad assumption (and my
experience has been that even at large companies, intranets are generally
less stable and reliable than the Internet as a whole, so assuming that
your networking department can't possibly do something wrong gives them far
too much credit).  Either way, using WireShark to dig into what's going on
at a network level is still your best starting place; let us know what you
find from that and we may be able to help you from there.

Are you seeing any warnings in the broker logs related to closing
connections due to inactivity?  That's one thing (of many) that could
explain EOFExceptions...
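
One way to check for such warnings is to scan the broker log for the inactivity exception. The following is a minimal Java sketch, not a definitive tool. It assumes the broker reports these closures with the class name InactivityIOException somewhere in the log line; the class name InactivityLogScan and the sample lines are made up for illustration, so verify the match string against your own logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InactivityLogScan {
    // Returns only the lines that mention the inactivity exception class.
    // Assumption: inactivity-related closures carry "InactivityIOException".
    static List<String> inactivityLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains("InactivityIOException")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative lines only; these are NOT real broker output.
        List<String> sample = Arrays.asList(
                "WARN connection A closed: InactivityIOException (illustrative line)",
                "INFO broker started (illustrative line)");
        for (String hit : inactivityLines(sample)) {
            System.out.println(hit);
        }
    }
}
```

If lines like this show up at the same times as the client-side EOFExceptions, the inactivity monitor is a likely suspect; if not, the cause is probably elsewhere.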

Since the version you're using doesn't have the bug fix I referenced, it's
possible that upgrading to 5.10.2 or 5.11.1 would fix this.  Do you have
the ability to try one of those versions in a test environment to see if it
eliminates the problem?

Also, what technology are you using for your client code?  Java?  C++?
Perl?

Tim

On Wed, Jul 8, 2015 at 6:41 PM, Cadmean <hz...@hotmail.com> wrote:

> Thank you very much for your reply. I think it is very helpful.
>
> 1. You are right. I should not have been so arrogant as to say that the
> intranet cannot be the problem; I will ask the network department for help
> next week.
>
> 2. For now, the 20ish clients experience those connection problems
> continually. When I checked today, I found 10 more machines experiencing the
> same problems.
>
> 3. The version I use is 5.10.0; sorry for omitting that.

Re: Transport failed, please helpT_T

Posted by Cadmean <hz...@hotmail.com>.

Thank you very much for your reply. I think it is very helpful.

1. You are right. I should not have been so arrogant as to say that the
intranet cannot be the problem; I will ask the network department for help
next week.

2. For now, the 20ish clients experience those connection problems
continually. When I checked today, I found 10 more machines experiencing the
same problems.

3. The version I use is 5.10.0; sorry for omitting that.








Re: Transport failed, please helpT_T

Posted by Tim Bain <tb...@alumni.duke.edu>.

Assuming that intranet == "stable network without any firewalls,
misconfigurations, or hiccups" sounds like a huge mistake to me, and even
more so when you've posted a question indicating that your logs are full of
messages indicating that you have connection problems.  That's not to say
that there can't be bugs in the ActiveMQ code that could cause this
behavior, but it's far from the only possible cause for what you're
seeing.  And I second what Art said: if your security department will allow
it, you want to use a network sniffer such as WireShark or tcpdump (but
WireShark is generally preferred) to figure out what's going on at a
network level; trying to piece it together from only debug logs is likely
to be difficult.

Also, to clarify: are you saying that for those 20ish clients who start
experiencing connection problems, they experience those connection problems
continually?  Or do they recover after a few failures, only to have other
clients fail later?

One last thing: the version of ActiveMQ you're using is ALWAYS relevant
information, and should be included in any post to this mailing list asking
for help.  How are we supposed to help figure out what's going on (or if
it's a known bug that's been fixed in a later version) if you don't tell us
what version you're using?  For example,
https://issues.apache.org/jira/browse/AMQ-5241 is fixed in 5.10.1 and
5.11.0, but I have no idea whether you're running a version that has that
fix.
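
The version check described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, it assumes plain numeric "major.minor.patch" version strings, and it assumes that any release after 5.11.0 also carries the fix.

```java
class FixVersionCheck {
    // Parses "major.minor.patch" into three ints; missing parts count as 0.
    // Assumption: versions are plain numeric (no "-SNAPSHOT" or similar suffix).
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < 3; i++) {
            nums[i] = i < parts.length ? Integer.parseInt(parts[i]) : 0;
        }
        return nums;
    }

    // True for 5.10.1 and later on the 5.10 line, and for 5.11.0 and later.
    static boolean hasAmq5241Fix(String version) {
        int[] v = parse(version);
        if (v[0] != 5) {
            return v[0] > 5;
        }
        if (v[1] != 10) {
            return v[1] > 10;
        }
        return v[2] >= 1;
    }

    public static void main(String[] args) {
        System.out.println(hasAmq5241Fix("5.10.0"));
        System.out.println(hasAmq5241Fix("5.10.1"));
        System.out.println(hasAmq5241Fix("5.11.0"));
    }
}
```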

Tim

On Tue, Jul 7, 2015 at 6:32 PM, Cadmean <hz...@hotmail.com> wrote:

> 1. Since all the clients are in the INTRANET, I don't think the network
> could be a problem, but I will check it anyway.
>
> 2. Right now, I haven't started producing messages. In this case, all the
> clients are just consumers without receiving any messages. So I think
> message redeliveries cannot be the cause of the problem.
>
> The next thing I will try is enabling debug logging to see if there is
> any helpful information.
>
> Thanks a lot. :D

Re: Transport failed, please helpT_T

Posted by Cadmean <hz...@hotmail.com>.

1. Since all the clients are in the INTRANET, I don't think the network could
be a problem, but I will check it anyway.

2. Right now, I haven't started producing messages. In this case, all the
clients are just consumers without receiving any messages. So I think
message redeliveries cannot be the cause of the problem.

The next thing I will try is enabling debug logging to see if there is
any helpful information.

Thanks a lot. :D








Re: Transport failed, please helpT_T

Posted by artnaseef <ar...@artnaseef.com>.

First thing I would look at here is diagnostics from the network level
itself.  WireShark or tcpdump can be used to get a better understanding of
why the connections are dropping.

If the network between the client and brokers is unreliable, this will
happen a lot and it will significantly interfere with the messaging.

Also check the broker log files for any indications of causes of the dropped
connections.

With all of that said, with the failover transport, these failures should be
short-lived and all of the applications should continue to operate normally. 
The impact of greatest concern coming to mind is the increased probability
of message redeliveries, but that is a normal occurrence with JMS (in other
words, applications need to handle this possibility with or without these
dropped connections).
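
For reference, a failover connection string for a multi-broker setup like the one in the original post might be assembled as in the Java sketch below. Treat it as a sketch under assumptions: the host names are hypothetical, the helper is not part of any ActiveMQ API, and the option names (maxReconnectAttempts, initialReconnectDelay) should be checked against the failover transport documentation for the version in use.

```java
import java.util.Arrays;
import java.util.List;

class FailoverUriSketch {
    // Builds "failover:(uri1,uri2,...)?options" from a list of broker URIs.
    // This is plain string assembly; it does not validate the options.
    static String failoverUri(List<String> brokerUris, String failoverOptions) {
        StringBuilder sb = new StringBuilder("failover:(");
        for (int i = 0; i < brokerUris.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(brokerUris.get(i));
        }
        sb.append(')');
        if (!failoverOptions.isEmpty()) {
            sb.append('?').append(failoverOptions);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical hosts; 61616 is the port that appears in the posted log.
        List<String> brokers = Arrays.asList(
                "tcp://broker1.example:61616",
                "tcp://broker2.example:61616");
        // -1 is intended to mean "retry forever"; the delay is in milliseconds.
        System.out.println(failoverUri(brokers,
                "maxReconnectAttempts=-1&initialReconnectDelay=1000"));
    }
}
```

The resulting string would be passed to the client's connection factory. With retries set to forever, a dropped connection shows up as a reconnect warning rather than an application failure, which matches the behavior in the posted log.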




Re: Transport failed, please helpT_T

Posted by Christopher Shannon <ch...@gmail.com>.

I don't think there is enough information here to really solve the issue.
The EOFException just means that the clients have been disconnected but it
doesn't say why.  It could be a number of reasons including network issues
causing the disconnect or exceptions on message receive, etc.  Can you try
turning up the logging to debug to see if any more useful information shows
up in the logs?
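
To illustrate why the EOFException carries so little information: the posted stack trace shows it being thrown from DataInputStream.readInt, and readInt throws EOFException whenever the stream ends before four bytes arrive, whatever the reason. The Java sketch below reproduces that with an in-memory stream standing in for the socket; the class and method names are made up for illustration and are not ActiveMQ code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

class EofDemo {
    // Tries to read a 4-byte int from the given bytes, the same call that
    // appears in the posted stack trace. Returns "EOF" if the stream ends
    // first, otherwise the decoded value as text.
    static String readLengthPrefix(byte[] bytesFromPeer) {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytesFromPeer))) {
            return String.valueOf(in.readInt());
        } catch (EOFException e) {
            return "EOF";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLengthPrefix(new byte[] {0, 0, 0, 42})); // all four bytes present
        System.out.println(readLengthPrefix(new byte[0]));              // stream ended immediately
        System.out.println(readLengthPrefix(new byte[] {0, 0}));        // stream ended mid-read
    }
}
```

Both truncated cases end the same way, so the exception alone cannot distinguish a broker-side close from a network drop or a client-side close.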
