You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Nitin Shah <ns...@btisystems.com> on 2014/03/25 19:55:06 UTC

Qpid and Behavior on NTP time change

Hi ,

I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.

Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:


1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.

2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running. THERE IS A SHIFT IN THE TIME BASE.

Can someone explain what is going on with this?

Any guidance and may be suggestions will be welcome. We would like to understand what is happening internally and see if we are doing something wrong.

Thanks

Nitin

RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Andrew,

We have this problem whether we set it forward or backwards. I agree that we should prevent it being set backwards, but we have an issue in either direction. 

Nitin

Ps we can try and gather more information. Any suggestions as to what would help? 

-----Original Message-----
From: Andrew Stitcher [mailto:astitcher@redhat.com] 
Sent: Wednesday, March 26, 2014 11:20 AM
To: dev@qpid.apache.org
Cc: Gordon Sim (gsim@redhat.com); Atif Faheem; Tony Hart; Lynne Morrison
Subject: Re: Qpid and Behavior on NTP time change

On Wed, 2014-03-26 at 10:56 -0400, Nitin Shah wrote:
> Hi Andrew and Qpid team,
> 
> We are really blocked by this issue and need to have a conversation going about the options to resolve our issue. I am adding others from our team so they can chime in if necessary.

As I said in my initial reply. If anything on the system sets the time backwards some running programs will have a problem. As you have determined qpid is one of those programs. The fact that it is messages processed by qpid itself is not the issue.

Tell your users not to set the time backwards/check for it and don't allow it.

As this is somewhat supposition on my part from knowing common problems it may not be your actual issue. Please check and give more details as to exactly what is the problem.

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Andrew Stitcher <as...@redhat.com>.
On Wed, 2014-03-26 at 10:56 -0400, Nitin Shah wrote:
> Hi Andrew and Qpid team,
> 
> We are really blocked by this issue and need to have a conversation going about the options to resolve our issue. I am adding others from our team so they can chime in if necessary.

As I said in my initial reply. If anything on the system sets the time
backwards some running programs will have a problem. As you have
determined qpid is one of those programs. The fact that it is messages
processed by qpid itself is not the issue.

Tell your users not to set the time backwards/check for it and don't
allow it.

As this is somewhat supposition on my part from knowing common problems
it may not be your actual issue. Please check and give more details as
to exactly what is the problem.

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Andrew Stitcher <as...@redhat.com>.
On Wed, 2014-03-26 at 11:34 -0400, Nitin Shah wrote:
> We have been looking at the code and wondered if CLOCK_MONOTONIC can be used.  Qpid uses  AbsTime everywhere. 
> Now it is possible that Qpid has decided to use REALTIME because there is the need for a real time clock between machines running Qpid; there might be timers in the client & the daemon that depend on using the same time reference.
> 
> Any thoughts?

If you want to try a local patch using CLOCK_MONOTONIC would be a good
approach.

However note that there are at least a couple of issues with just
changing the AbsTime implementation.

The Timer code uses pthread_cond_timedwait() for scheduling. This
assumes that it is given times from CLOCK_REALTIME by default. So you
will also need to change the way that Condition is constructed to use
CLOCK_MONOTONIC instead. (use pthread_condattr_init(),
pthread_condattr_setclock() to create an condattr struct and then attach
it to the condvar when making it with pthread_cond_init().

There are places where we use Abstime to construct a date/time - this is
not directly possible with CLOCK_MONOTONIC as the EPOCH would now be the
machine boot not 1/1/1970.

But give this a try and let us know - create a Jira, attaching patches,
that will give us a way to commit your changes.

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
We have been looking at the code and wondered if CLOCK_MONOTONIC can be used.  Qpid uses  AbsTime everywhere. 
Now it is possible that Qpid has decided to use REALTIME because there is the need for a real time clock between machines running Qpid; there might be timers in the client & the daemon that depend on using the same time reference.

Any thoughts?

Nitin




-----Original Message-----
From: Nitin Shah [mailto:nshah@btisystems.com] 
Sent: Wednesday, March 26, 2014 10:56 AM
To: Andrew Stitcher; dev@qpid.apache.org
Cc: Gordon Sim (gsim@redhat.com); Atif Faheem; Tony Hart; Lynne Morrison
Subject: RE: Qpid and Behavior on NTP time change

Hi Andrew and Qpid team,

We are really blocked by this issue and need to have a conversation going about the options to resolve our issue. I am adding others from our team so they can chime in if necessary.

Thanks

Nitin

-----Original Message-----
From: Nitin Shah
Sent: Tuesday, March 25, 2014 4:44 PM
To: Andrew Stitcher; dev@qpid.apache.org
Cc: Gordon Sim (gsim@redhat.com); Nitin Shah
Subject: RE: Qpid and Behavior on NTP time change

Andrew,

We have the NTP running on all the cards and I think I misled you on that. However, we have a CLI that allows the user to set the time and date and that is propagated across all cards via qpid messaging. First the module that has the qpid broker running updates the clock locally and that seems to cause the lock up. If NTP does it , that seems fine.

Nitin

-----Original Message-----
From: Andrew Stitcher [mailto:astitcher@redhat.com]
Sent: Tuesday, March 25, 2014 4:26 PM
To: dev@qpid.apache.org
Cc: Nitin Shah; Gordon Sim (gsim@redhat.com)
Subject: Re: Qpid and Behavior on NTP time change

On Tue, 2014-03-25 at 14:55 -0400, Nitin Shah wrote:
> Hi ,
> 
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
> 
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
> 
> 
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
> 
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running. THERE IS A SHIFT IN THE TIME BASE.
> 
> Can someone explain what is going on with this?
> 
> Any guidance and may be suggestions will be welcome. We would like to understand what is happening internally and see if we are doing something wrong.

Qpid like many other programs will get confused if the time ever goes backwards whilst it is running. I think that there should not be a problem if NTP only adjusts the time forwards.

AFAIR ntp dameons always used to be arranged so that they adjusted the clock without ever setting the time backwards. Because it has been known for many years that many parts of Unix/Linux cannot cope with time going backwards.

Incidentally why are you using AMQP to send NTP messages about when NTP is well adapted for the task itself?

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Hi Andrew and Qpid team,

We are really blocked by this issue and need to have a conversation going about the options to resolve our issue. I am adding others from our team so they can chime in if necessary.

Thanks

Nitin

-----Original Message-----
From: Nitin Shah 
Sent: Tuesday, March 25, 2014 4:44 PM
To: Andrew Stitcher; dev@qpid.apache.org
Cc: Gordon Sim (gsim@redhat.com); Nitin Shah
Subject: RE: Qpid and Behavior on NTP time change

Andrew,

We have the NTP running on all the cards and I think I misled you on that. However, we have a CLI that allows the user to set the time and date and that is propagated across all cards via qpid messaging. First the module that has the qpid broker running updates the clock locally and that seems to cause the lock up. If NTP does it , that seems fine.

Nitin

-----Original Message-----
From: Andrew Stitcher [mailto:astitcher@redhat.com] 
Sent: Tuesday, March 25, 2014 4:26 PM
To: dev@qpid.apache.org
Cc: Nitin Shah; Gordon Sim (gsim@redhat.com)
Subject: Re: Qpid and Behavior on NTP time change

On Tue, 2014-03-25 at 14:55 -0400, Nitin Shah wrote:
> Hi ,
> 
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
> 
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
> 
> 
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
> 
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running. THERE IS A SHIFT IN THE TIME BASE.
> 
> Can someone explain what is going on with this?
> 
> Any guidance and may be suggestions will be welcome. We would like to understand what is happening internally and see if we are doing something wrong.

Qpid like many other programs will get confused if the time ever goes backwards whilst it is running. I think that there should not be a problem if NTP only adjusts the time forwards.

AFAIR ntp dameons always used to be arranged so that they adjusted the clock without ever setting the time backwards. Because it has been known for many years that many parts of Unix/Linux cannot cope with time going backwards.

Incidentally why are you using AMQP to send NTP messages about when NTP is well adapted for the task itself?

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Andrew,

We have the NTP running on all the cards and I think I misled you on that. However, we have a CLI that allows the user to set the time and date and that is propagated across all cards via qpid messaging. First the module that has the qpid broker running updates the clock locally and that seems to cause the lock up. If NTP does it , that seems fine.

Nitin

-----Original Message-----
From: Andrew Stitcher [mailto:astitcher@redhat.com] 
Sent: Tuesday, March 25, 2014 4:26 PM
To: dev@qpid.apache.org
Cc: Nitin Shah; Gordon Sim (gsim@redhat.com)
Subject: Re: Qpid and Behavior on NTP time change

On Tue, 2014-03-25 at 14:55 -0400, Nitin Shah wrote:
> Hi ,
> 
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
> 
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
> 
> 
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
> 
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running. THERE IS A SHIFT IN THE TIME BASE.
> 
> Can someone explain what is going on with this?
> 
> Any guidance and may be suggestions will be welcome. We would like to understand what is happening internally and see if we are doing something wrong.

Qpid like many other programs will get confused if the time ever goes backwards whilst it is running. I think that there should not be a problem if NTP only adjusts the time forwards.

AFAIR ntp dameons always used to be arranged so that they adjusted the clock without ever setting the time backwards. Because it has been known for many years that many parts of Unix/Linux cannot cope with time going backwards.

Incidentally why are you using AMQP to send NTP messages about when NTP is well adapted for the task itself?

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

Re: Qpid and Behavior on NTP time change

Posted by Andrew Stitcher <as...@redhat.com>.
On Tue, 2014-03-25 at 14:55 -0400, Nitin Shah wrote:
> Hi ,
> 
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
> 
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
> 
> 
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
> 
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running. THERE IS A SHIFT IN THE TIME BASE.
> 
> Can someone explain what is going on with this?
> 
> Any guidance and may be suggestions will be welcome. We would like to understand what is happening internally and see if we are doing something wrong.

Qpid like many other programs will get confused if the time ever goes
backwards whilst it is running. I think that there should not be a
problem if NTP only adjusts the time forwards.

AFAIR ntp dameons always used to be arranged so that they adjusted the
clock without ever setting the time backwards. Because it has been known
for many years that many parts of Unix/Linux cannot cope with time going
backwards.

Incidentally why are you using AMQP to send NTP messages about when NTP
is well adapted for the task itself?

Andrew



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Gordon Sim <gs...@redhat.com>.
On 03/26/2014 09:07 PM, Nitin Shah wrote:
> I believe all messages get affected on the sender side. So nothing gets sent and therefore nothing gets received.
 >
> We have enabled heartbeats and yes we set the reconnect option always inbuilt.  We set the connection option = "{reconnect: true, tcp_nodelay: true, heartbeat: 10}";
>
> What we see is that after the date and time get set on the module that has the Qpidd running, the messages stop getting sent. If one restarts the broker it recovers and send messages that may have been previously sent.

How are you determining that the messages aren't being sent? Are you 
monitoring the interaction over the wire?

When you say 'sent' here (as opposed to 'delivered', a distinction you 
made earlier), I'm assuming you mean they are not leaving the client. Is 
that true?

Can you verify whether the client is continually disconnecting due to 
heartbeat issues (changing the clock on the broker is I guess quite 
likely to cause issues with the timers detecting or sending heartbeats)? 
Unless your heartbeat interval is very low, I would still expect you to 
get some messages through on each reconnection, before you get disconnected.

Do your messages have any ttl set on them?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
My responses embedded below

-----Original Message-----
From: Gordon Sim [mailto:gsim@redhat.com] 
Sent: Wednesday, March 26, 2014 12:52 PM
To: dev@qpid.apache.org
Subject: Re: Qpid and Behavior on NTP time change

On 03/25/2014 06:55 PM, Nitin Shah wrote:
> Hi ,
>
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
>
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
>
>
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
>
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running.

Is it only these messages with which you are having problems, or are all messages through qpidd affected? Try to determine whether they are not getting sent or are not getting delivered as those are obviously two different issues. You can use wireshark for example to examine the interactions.

Nitin:: I believe all messages get affected on the sender side. So nothing gets sent and therefore nothing gets received. 
Are you using heartbeats on your connections, and are you using any reconnect logic (inbuilt or application defined)? Are there errors in the qpidd logs?
Nitin:: We have enabled heartbeats and yes we set the reconnect option always inbuilt.  We set the connection option = "{reconnect: true, tcp_nodelay: true, heartbeat: 10}";

What we see is that after the date and time get set on the module that has the Qpidd running, the messages stop getting sent. If one restarts the broker it recovers and send messages that may have been previously sent. 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Hi Gordon and Ted,

Thanks for your responses. Yesterday we tried experiments by turning OFF heartbeats and that had NO problems. I assume that Heartbeats are there to detect client failures ( like they die ). Is there a way to have heartbeats also enabled?

Nitin


______________________________
From: Gordon Sim [gsim@redhat.com]
Sent: Tuesday, April 08, 2014 4:50 AM
To: dev@qpid.apache.org
Subject: Re: Qpid and Behavior on NTP time change

On 04/03/2014 09:50 PM, Nitin Shah wrote:
> Gordon , Qpid team,
>
> Can you explain what that means.

Changing the time will confuse the broker (and the c++ client) in their
processing of heartbeats (since they use the time to determine when
heartbeats should be sent and when they should be expected).

As Andrew suggested earlier in the thread, you could try changing to use
a monotonic clock (by patching your local source). I believe there is a
JIRA open for this, and if you do pursue that route and want to attach a
patch and details of your experiments, that would speed up getting it
into a release.




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Gordon Sim <gs...@redhat.com>.
On 04/03/2014 09:50 PM, Nitin Shah wrote:
> Gordon , Qpid team,
>
> Can you explain what that means.

Changing the time will confuse the broker (and the c++ client) in their 
processing of heartbeats (since they use the time to determine when 
heartbeats should be sent and when they should be expected).

As Andrew suggested earlier in the thread, you could try changing to use 
a monotonic clock (by patching your local source). I believe there is a 
JIRA open for this, and if you do pursue that route and want to attach a 
patch and details of your experiments, that would speed up getting it 
into a release.




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Ted Ross <tr...@redhat.com>.
Nitin,

What I think Gordon was pointing out is that your log suggests that when 
the time was changed, your broker connections immediately timed-out as 
though they had not seen heartbeats for a long time.

As an experiment, can you try running without heartbeats (or a very long 
heartbeat interval) to see if you have the same problem?

You say the system "locks up".  Can you be very specific about what is 
actually happening?  Can you use the "qpid-stat" tool on the broker 
system to monitor the broker's operation?  The -c option shows active 
connections, -q shows queues and their counters.  Both should shed some 
light on what is causing the perceived lock-up.

-Ted

On 04/03/2014 04:50 PM, Nitin Shah wrote:
> Gordon , Qpid team,
>
> Can you explain what that means. We have the situation where , when the time gets changed and in this case forward , qpid invariably locks up and messages stop flowing. This mode of failure we see a lot off and it seems like there is no solution we have come across. We are gathering logs as I speak by turning on the verbose logs and can post those to you as soon as we have them. Your help in getting to the bottom of this will help us tremendously.
>
> So the scenario is that the module that is running the QPID broker, its time is updated by the linux 'date' command and this time information is then sent to the other modules via QPID messages. As soon as the date change is applied on the other modules ( as a result of receiving the date change message and executing the linux 'date' command) , qpid seems to lock up and no more messages can be transferred. The time change is a jump and could be 15 minutes.
>
> Let me know if I can provide any  further information.
>
> Thanks
>
> Nitin
>
> -----Original Message-----
> From: Gordon Sim [mailto:gsim@redhat.com]
> Sent: Friday, March 28, 2014 11:41 AM
> To: dev@qpid.apache.org
> Subject: Re: Qpid and Behavior on NTP time change
>
> On 03/28/2014 02:50 PM, Nitin Shah wrote:
>> When TIME changed thru the CLI on the module that is running the broker , a time Change time of  a minute through CLI saw the following errors on primary Module ( running the broker ) but no errors on the other modules.
>>
>> 2014-03-27T10:38:34.919457-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37806 timed out: closing
>> 2014-03-27T10:38:34.928981-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37807 timed out: closing
>> 2014-03-27T10:38:34.929669-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37810 timed out: closing
>> 2014-03-27T10:38:34.930076-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37808 timed out: closing
>> 2014-03-27T10:38:34.930490-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37809 timed out: closing
>> 2014-03-27T10:38:34.930855-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37811 timed out: closing
>> 2014-03-27T10:38:34.931833-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37812 timed out: closing
>> 2014-03-27T10:38:34.932094-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37813 timed out: closing
>> 2014-03-27T10:38:34.932484-04:00 scm1 qpidd[28177]: 2014-03-27
>> 10:38:34 [Protocol] error Connection
>> qpid.172.16.0.4:5672-172.16.0.4:37814 timed out: closing
>>
>>
>> Changed time by 10 minutes
>> Saw the same errors as above on primary SCM ( the one running the
>> broker )and saw following errors on payload modules
>>
>> 2014-03-27T10:40:54.413172-04:00 pld0103 TransceiverAgent[9469]:
>> [1.E.Mbus]:  qpid connection failed exception
>> 2014-03-27T10:40:54.435011-04:00 pld0103 DigiAgentSim[9467]:
>> [1.E.Mbus]:  qpid connection failed exception
> This is consistent with heartbeat failures.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Gordon , Qpid team,

Can you explain what that means. We have the situation where , when the time gets changed and in this case forward , qpid invariably locks up and messages stop flowing. This mode of failure we see a lot off and it seems like there is no solution we have come across. We are gathering logs as I speak by turning on the verbose logs and can post those to you as soon as we have them. Your help in getting to the bottom of this will help us tremendously. 

So the scenario is that the module that is running the QPID broker, its time is updated by the linux 'date' command and this time information is then sent to the other modules via QPID messages. As soon as the date change is applied on the other modules ( as a result of receiving the date change message and executing the linux 'date' command) , qpid seems to lock up and no more messages can be transferred. The time change is a jump and could be 15 minutes. 

Let me know if I can provide any  further information.

Thanks

Nitin

-----Original Message-----
From: Gordon Sim [mailto:gsim@redhat.com] 
Sent: Friday, March 28, 2014 11:41 AM
To: dev@qpid.apache.org
Subject: Re: Qpid and Behavior on NTP time change

On 03/28/2014 02:50 PM, Nitin Shah wrote:
> When TIME changed thru the CLI on the module that is running the broker , a time Change time of  a minute through CLI saw the following errors on primary Module ( running the broker ) but no errors on the other modules.
>
> 2014-03-27T10:38:34.919457-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37806 timed out: closing
> 2014-03-27T10:38:34.928981-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37807 timed out: closing
> 2014-03-27T10:38:34.929669-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37810 timed out: closing
> 2014-03-27T10:38:34.930076-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37808 timed out: closing
> 2014-03-27T10:38:34.930490-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37809 timed out: closing
> 2014-03-27T10:38:34.930855-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37811 timed out: closing
> 2014-03-27T10:38:34.931833-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37812 timed out: closing
> 2014-03-27T10:38:34.932094-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37813 timed out: closing
> 2014-03-27T10:38:34.932484-04:00 scm1 qpidd[28177]: 2014-03-27 
> 10:38:34 [Protocol] error Connection 
> qpid.172.16.0.4:5672-172.16.0.4:37814 timed out: closing
>
>
> Changed time by 10 minutes
> Saw the same errors as above on primary SCM ( the one running the 
> broker )and saw following errors on payload modules
>
> 2014-03-27T10:40:54.413172-04:00 pld0103 TransceiverAgent[9469]: 
> [1.E.Mbus]:  qpid connection failed exception
> 2014-03-27T10:40:54.435011-04:00 pld0103 DigiAgentSim[9467]: 
> [1.E.Mbus]:  qpid connection failed exception

This is consistent with heartbeat failures.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Gordon Sim <gs...@redhat.com>.
On 03/28/2014 02:50 PM, Nitin Shah wrote:
> When TIME changed thru the CLI on the module that is running the broker , a time Change time of  a minute through CLI saw the following errors on primary Module ( running the broker ) but no errors on the other modules.
>
> 2014-03-27T10:38:34.919457-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37806 timed out: closing
> 2014-03-27T10:38:34.928981-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37807 timed out: closing
> 2014-03-27T10:38:34.929669-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37810 timed out: closing
> 2014-03-27T10:38:34.930076-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37808 timed out: closing
> 2014-03-27T10:38:34.930490-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37809 timed out: closing
> 2014-03-27T10:38:34.930855-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37811 timed out: closing
> 2014-03-27T10:38:34.931833-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37812 timed out: closing
> 2014-03-27T10:38:34.932094-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37813 timed out: closing
> 2014-03-27T10:38:34.932484-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37814 timed out: closing
>
>
> Changed time by 10 minutes
> Saw the same errors as above on primary SCM ( the one running the broker )and saw following errors on payload modules
>
> 2014-03-27T10:40:54.413172-04:00 pld0103 TransceiverAgent[9469]: [1.E.Mbus]:  qpid connection failed exception
> 2014-03-27T10:40:54.435011-04:00 pld0103 DigiAgentSim[9467]: [1.E.Mbus]:  qpid connection failed exception

This is consistent with heartbeat failures.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


RE: Qpid and Behavior on NTP time change

Posted by Nitin Shah <ns...@btisystems.com>.
Hi Gordon,

I wanted to give you more information from the logs we saw. I am going to try this on the 0.26 release as well. I don't expect the problem to have gone away in the new release. I would like to know if you want other logs turned on to see more info. Let me know what if that helps.

Nitin

LOGS below


When TIME changed thru the CLI on the module that is running the broker , a time Change time of  a minute through CLI saw the following errors on primary Module ( running the broker ) but no errors on the other modules.

2014-03-27T10:38:34.919457-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37806 timed out: closing
2014-03-27T10:38:34.928981-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37807 timed out: closing
2014-03-27T10:38:34.929669-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37810 timed out: closing
2014-03-27T10:38:34.930076-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37808 timed out: closing
2014-03-27T10:38:34.930490-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37809 timed out: closing
2014-03-27T10:38:34.930855-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37811 timed out: closing
2014-03-27T10:38:34.931833-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37812 timed out: closing
2014-03-27T10:38:34.932094-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37813 timed out: closing
2014-03-27T10:38:34.932484-04:00 scm1 qpidd[28177]: 2014-03-27 10:38:34 [Protocol] error Connection qpid.172.16.0.4:5672-172.16.0.4:37814 timed out: closing


Changed time by 10 minutes
Saw the same errors as above on primary SCM ( the one running the broker )and saw following errors on payload modules

2014-03-27T10:40:54.413172-04:00 pld0103 TransceiverAgent[9469]: [1.E.Mbus]:  qpid connection failed exception
2014-03-27T10:40:54.435011-04:00 pld0103 DigiAgentSim[9467]: [1.E.Mbus]:  qpid connection failed exception

-----Original Message-----
From: Gordon Sim [mailto:gsim@redhat.com] 
Sent: Wednesday, March 26, 2014 12:52 PM
To: dev@qpid.apache.org
Subject: Re: Qpid and Behavior on NTP time change

On 03/25/2014 06:55 PM, Nitin Shah wrote:
> Hi ,
>
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
>
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
>
>
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
>
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running.

Is it only these messages with which you are having problems, or are all messages through qpidd affected? Try to determine whether they are not getting sent or are not getting delivered as those are obviously two different issues. You can use wireshark for example to examine the interactions.

Are you using heartbeats on your connections, and are you using any reconnect logic (inbuilt or application defined)? Are there errors in the qpidd logs?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org For additional commands, e-mail: dev-help@qpid.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Qpid and Behavior on NTP time change

Posted by Gordon Sim <gs...@redhat.com>.
On 03/25/2014 06:55 PM, Nitin Shah wrote:
> Hi ,
>
> I have talked to this mailing list before as we are users of QPID in our distributed system. Our architecture has multiple modules connected together via a backplane Ethernet link. We use QPID version 0.24 release both broker and clients are C++.
>
> Recently we have been trying to synchronize time across all  modules using the messaging mechanism. However, we are seeing the following problem and the sequence of operation is this:
>
>
> 1.       The module running the Qpid Broker receives an time update from a NTP server. It updates the local date/time and hwclock ( we are on Linux ) and sends a message that suggests all the other modules to update their clocks accordingly using the messaging in Qpid.
>
> 2.       These messages are not getting sent or are not getting delivered to the other modules until we kill and restart the broker. It seems this must be as a result of the time change that has happened where the broker is running.

Is it only these messages with which you are having problems, or are all 
messages through qpidd affected? Try to determine whether they are not 
getting sent or are not getting delivered as those are obviously two 
different issues. You can use wireshark for example to examine the 
interactions.

Are you using heartbeats on your connections, and are you using any 
reconnect logic (inbuilt or application defined)? Are there errors in 
the qpidd logs?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org