You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mynewt.apache.org by Simon Ratner <si...@proxy.co> on 2017/09/05 00:50:28 UTC

Premature supervision timeout

Hi devs,

I am tracking a nimble issue (on nrf52) that seems to surface
intermittently - a supervision timeout when i am not expecting one. Below
is a section of the log, nimble is the master here and as you can see, the
time between connection being established and the tmo error is ~260ms:

007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated: connect;
peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16 scan_window=16
itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512 min_ce_len=16
max_ce_len=768 own_addr_ty
007760 [ts=60624960ssb, mod=64 level=1] peer: connected; conn_handle=14
status=0 addr=73:a0:1a:2e:b1:df
007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated: exchange
mtu
007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated: discover
service by uuid; uuid=..
007793 *** ble_ll_conn.c:2160 *** cputime=61336489 anchor_point=61386140
last_rxd_pdu_cputime=61084251 tmo=300000
007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
status=520 itvl=40 latency=1 tmo=512
007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected; conn_handle=14
reason=520


The line marked with *** is in `ble_ll_conn_event_end`, just before it
reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match the
time when I see the "connected" host event (line #7760), and the anchor
point is presumably the time of the next scheduled conn event. What I
thought was interesting is that the tmo value is 300ms, i.e.
conn_itvl(50ms) * 6 rather than the supervision timeout. The connection sm
is in `BLE_LL_CONN_STATE_CREATED` state, and not
`BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having already
received the "connected" event from the host.

Any ideas what could be going on here?

cheers,
simon

Re: Premature supervision timeout

Posted by Simon Ratner <si...@proxy.co>.
So just increasing the conn itvl hasn't solved it, which makes me suspect
it isn't simply a timing thing, but rather a peer not responding to the
request (or responding in an unexpected way). I tried up to 180ms, which
should give it a full second to receive the first data frame.

014762 [ts=115328104ssb, mod=4 level=1] GAP procedure initiated: connect;
peer_addr_type=1 peer_addr=42:62:84:90:38:f7 scan_itvl=16 scan_window=16
itvl_min=24 itvl_max=144 latency=1 supervision_timeout=512 min_ce_len=16
max_ce_len=768
014792 [ts=115562464ssb, mod=64 level=1] peer: connected; conn_handle=29
status=0 addr=42:62:84:90:38:f7
014794 [ts=115578088ssb, mod=4 level=1] GATT procedure initiated: exchange
mtu
014907 *** ble_ll_conn.c:2160 *** cputime=116849198 anchor_point=117028850
last_rxd_pdu_cputime=115946959 tmo=1080000
014910 [ts=116484344ssb, mod=64 level=1] peer: updated; conn_handle=29
status=520 itvl=144 latency=1 tmo=512
014912 [ts=116499968ssb, mod=64 level=2] peer: failed to exchange mtu;
conn_handle=29 status=7
014915 [ts=116523404ssb, mod=64 level=1] peer: disconnected; conn_handle=29
reason=520

I will try the 1.2 branch and report.


On Tue, Sep 5, 2017 at 12:43 PM, will sanfilippo <wi...@runtime.io> wrote:

> Glad other folks were answering you Simon; I was away for a bit.
>
> I do not have anything else to add. The only thing is that if you do not
> have a sniffer one (possible) way to determine what is going on is to look
> at statistics. You need to look at the stats directly before and after the
> issue which could be a problem. What that will tell you is if the
> peripheral heard the connect request from the master and to determine which
> side is not hearing the data frames. If you want to go that route please
> let me know and I can tell which stats to look for. This might be difficult
> for you to set up if it is that infrequent.
>
> I would also try out the new code base as well if that is easily doable
> for you. Many things were fixed/changed so using that codebase going
> forward would certainly be best (imo)
>
>
> > On Sep 5, 2017, at 11:57 AM, Łukasz Rymanowski <
> lukasz.rymanowski@codecoup.pl> wrote:
> >
> > Hi,
> >
> > On Sep 5, 2017 8:15 PM, "Simon Ratner" <si...@proxy.co> wrote:
> >
> > On Tue, Sep 5, 2017 at 11:01 AM, Łukasz Rymanowski <
> > lukasz.rymanowski@codecoup.pl> wrote:
> >
> >>
> >> Note that this is how BLE works. Master sends LE Create Connection on
> >> Advertising event and assumes connection is created. In this point of
> time
> >> host gets Connection Complete Event according to BT spec. However, for
> the
> >> reasons Will described, it might happen that peer will not answer on any
> > of
> >> the first 6 connection events. In this case connection is considered as
> >> dropped and  Disconnect Complete Event will come with "Connection failed
> > to
> >> be established" goes to host
> >
> >
> > Got it, thanks Łukasz! I am sure that the timeout is happening in exactly
> > this place.Since it seems that in all cases the times are very close, i
> > think i can increase connection interval to make this significantly less
> > frequent.
> >
> >
> > Please let us know the result.
> > BTW Air logs would help to confirm what is happening.
> >
> > \Łukasz
>
>

Re: Premature supervision timeout

Posted by will sanfilippo <wi...@runtime.io>.
Glad other folks were answering you Simon; I was away for a bit.

I do not have anything else to add. The only thing is that if you do not have a sniffer one (possible) way to determine what is going on is to look at statistics. You need to look at the stats directly before and after the issue which could be a problem. What that will tell you is if the peripheral heard the connect request from the master and to determine which side is not hearing the data frames. If you want to go that route please let me know and I can tell which stats to look for. This might be difficult for you to set up if it is that infrequent.

I would also try out the new code base as well if that is easily doable for you. Many things were fixed/changed so using that codebase going forward would certainly be best (imo)


> On Sep 5, 2017, at 11:57 AM, Łukasz Rymanowski <lu...@codecoup.pl> wrote:
> 
> Hi,
> 
> On Sep 5, 2017 8:15 PM, "Simon Ratner" <si...@proxy.co> wrote:
> 
> On Tue, Sep 5, 2017 at 11:01 AM, Łukasz Rymanowski <
> lukasz.rymanowski@codecoup.pl> wrote:
> 
>> 
>> Note that this is how BLE works. Master sends LE Create Connection on
>> Advertising event and assumes connection is created. In this point of time
>> host gets Connection Complete Event according to BT spec. However, for the
>> reasons Will described, it might happen that peer will not answer on any
> of
>> the first 6 connection events. In this case connection is considered as
>> dropped and  Disconnect Complete Event will come with "Connection failed
> to
>> be established" goes to host
> 
> 
> Got it, thanks Łukasz! I am sure that the timeout is happening in exactly
> this place.Since it seems that in all cases the times are very close, i
> think i can increase connection interval to make this significantly less
> frequent.
> 
> 
> Please let us know the result.
> BTW Air logs would help to confirm what is happening.
> 
> \Łukasz


Re: Premature supervision timeout

Posted by Łukasz Rymanowski <lu...@codecoup.pl>.
Hi,

On Sep 5, 2017 8:15 PM, "Simon Ratner" <si...@proxy.co> wrote:

On Tue, Sep 5, 2017 at 11:01 AM, Łukasz Rymanowski <
lukasz.rymanowski@codecoup.pl> wrote:

>
> Note that this is how BLE works. Master sends LE Create Connection on
> Advertising event and assumes connection is created. In this point of time
> host gets Connection Complete Event according to BT spec. However, for the
> reasons Will described, it might happen that peer will not answer on any
of
> the first 6 connection events. In this case connection is considered as
> dropped and  Disconnect Complete Event will come with "Connection failed
to
> be established" goes to host


Got it, thanks Łukasz! I am sure that the timeout is happening in exactly
this place.Since it seems that in all cases the times are very close, i
think i can increase connection interval to make this significantly less
frequent.


Please let us know the result.
BTW Air logs would help to confirm what is happening.

\Łukasz

Re: Premature supervision timeout

Posted by Simon Ratner <si...@proxy.co>.
On Tue, Sep 5, 2017 at 11:01 AM, Łukasz Rymanowski <
lukasz.rymanowski@codecoup.pl> wrote:

>
> Note that this is how BLE works. Master sends LE Create Connection on
> Advertising event and assumes connection is created. In this point of time
> host gets Connection Complete Event according to BT spec. However, for the
> reasons Will described, it might happen that peer will not answer on any of
> the first 6 connection events. In this case connection is considered as
> dropped and  Disconnect Complete Event will come with "Connection failed to
> be established" goes to host


Got it, thanks Łukasz! I am sure that the timeout is happening in exactly
this place. Since it seems that in all cases the times are very close, i
think i can increase connection interval to make this significantly less
frequent.

Re: Premature supervision timeout

Posted by Łukasz Rymanowski <lu...@codecoup.pl>.
Hi Simon,

On 5 September 2017 at 19:40, Simon Ratner <si...@proxy.co> wrote:

> Just found this thread, which has come up a couple of times before (I think
> that's what you were referring to last time we spoke, Will?)
> https://www.mail-archive.com/dev@mynewt.incubator.apache.org/msg02454.html


AFAIK it is not valid anymore as Nimble now exchange features first as
well. After that Nimble takes decision to send LL_LENGTH_REQ or not.
I believe Andrzej fixed that some time ago.


>
> Could this be related, in a sense that the peer is sending some stray
> rejection frame that is causing connection to not be established? I do see
> this with both Android and iOS, for what it's worth (and both iOS10 and
> iOS11). What would be an easy way for me to confirm?
>
>
> On Tue, Sep 5, 2017 at 10:34 AM, Simon Ratner <si...@proxy.co> wrote:
>
> > Indeed that would be an improvement in error reporting :)
>

PR is in place for that. With this we will be sure if connection is
established or not in your scenario.

Note that this is how BLE works. Master sends LE Create Connection on
Advertising event and assumes connection is created. In this point of time
host gets Connection Complete Event according to BT spec. However, for the
reasons Will described, it might happen that peer will not answer on any of
the first 6 connection events. In this case connection is considered as
dropped and  Disconnect Complete Event will come with "Connection failed to
be established" goes to host


> However i am not convinced this is what i am seeing here - see my other
> > response.
> >
> > On Tue, Sep 5, 2017 at 10:28 AM, Łukasz Rymanowski <
> > lukasz.rymanowski@codecoup.pl> wrote:
> >
> >> Hi
> >>
> >> On 5 September 2017 at 19:08, will sanfilippo <wi...@runtime.io> wrote:
> >>
> >> > I do not think this is really an answer but it is the best I can do
> >> > without more information.
> >> >
> >> > When a device initially “connects” the state of the connection is not
> >> > considered established until a data frame is received from the other
> >> device
> >> > in the connection. The initial supervision timeout is 6 connection
> >> > intervals and is not based on the supervision timeout. That is why you
> >> see
> >> > the disconnect in 6 connection intervals.
> >>
> >>
> >> Actually I had plan some time ago to fix the error code on such event,
> >> because in this case we should use CONNECTION FAILED TO BE ESTABLISHED
> >> 0x3e
> >> Will put it on my todo list :)
> >>
> >>
> >> > So either the connect request from the master to the peripheral was
> not
> >> > received by the peripheral or it was received but no further data
> >> packets
> >> > were transferred and that is why the connection dropped.
> >> >
> >> > What version of code are you using? When I did the anchor point/last
> rxd
> >> > cputime math I see the difference between the two is 301889. This
> >> implies
> >> > to me that cputime is counting at 1 MHz and not at 32.768kHz. Which
> also
> >> > implies that you are not using the latest code.
> >> >
> >> > NOTE: I would expect this to happen occasionally, and more
> occasionally
> >> if
> >> > there are alot of devices transmitting in close proximity or if the
> two
> >> > devices connecting dont have a great RF link.
> >> >
> >> >
> >> Łukasz
> >>
> >>
> >> > > On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
> >> > >
> >> > > Hi devs,
> >> > >
> >> > > I am tracking a nimble issue (on nrf52) that seems to surface
> >> > > intermittently - a supervision timeout when i am not expecting one.
> >> Below
> >> > > is a section of the log, nimble is the master here and as you can
> see,
> >> > the
> >> > > time between connection being established and the tmo error is
> ~260ms:
> >> > >
> >> > > 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated:
> >> connect;
> >> > > peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16
> >> scan_window=16
> >> > > itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512
> >> min_ce_len=16
> >> > > max_ce_len=768 own_addr_ty
> >> > > 007760 [ts=60624960ssb, mod=64 level=1] peer: connected;
> >> conn_handle=14
> >> > > status=0 addr=73:a0:1a:2e:b1:df
> >> > > 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated:
> >> exchange
> >> > > mtu
> >> > > 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated:
> >> discover
> >> > > service by uuid; uuid=..
> >> > > 007793 *** ble_ll_conn.c:2160 *** cputime=61336489
> >> anchor_point=61386140
> >> > > last_rxd_pdu_cputime=61084251 tmo=300000
> >> > > 007795 [ts=60898380ssb, mod=64 level=1] peer: updated;
> conn_handle=14
> >> > > status=520 itvl=40 latency=1 tmo=512
> >> > > 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected;
> >> > conn_handle=14
> >> > > reason=520
> >> > >
> >> > >
> >> > > The line marked with *** is in `ble_ll_conn_event_end`, just before
> it
> >> > > reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match
> >> the
> >> > > time when I see the "connected" host event (line #7760), and the
> >> anchor
> >> > > point is presumably the time of the next scheduled conn event. What
> I
> >> > > thought was interesting is that the tmo value is 300ms, i.e.
> >> > > conn_itvl(50ms) * 6 rather than the supervision timeout. The
> >> connection
> >> > sm
> >> > > is in `BLE_LL_CONN_STATE_CREATED` state, and not
> >> > > `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having
> >> already
> >> > > received the "connected" event from the host.
> >> > >
> >> > > Any ideas what could be going on here?
> >> > >
> >> > > cheers,
> >> > > simon
> >> >
> >> >
> >>
> >
> >
>

Re: Premature supervision timeout

Posted by Simon Ratner <si...@proxy.co>.
Just found this thread, which has come up a couple of times before (I think
that's what you were referring to last time we spoke, Will?)
https://www.mail-archive.com/dev@mynewt.incubator.apache.org/msg02454.html

Could this be related, in a sense that the peer is sending some stray
rejection frame that is causing connection to not be established? I do see
this with both Android and iOS, for what it's worth (and both iOS10 and
iOS11). What would be an easy way for me to confirm?


On Tue, Sep 5, 2017 at 10:34 AM, Simon Ratner <si...@proxy.co> wrote:

> Indeed that would be an improvement in error reporting :)
> However i am not convinced this is what i am seeing here - see my other
> response.
>
> On Tue, Sep 5, 2017 at 10:28 AM, Łukasz Rymanowski <
> lukasz.rymanowski@codecoup.pl> wrote:
>
>> Hi
>>
>> On 5 September 2017 at 19:08, will sanfilippo <wi...@runtime.io> wrote:
>>
>> > I do not think this is really an answer but it is the best I can do
>> > without more information.
>> >
>> > When a device initially “connects” the state of the connection is not
>> > considered established until a data frame is received from the other
>> device
>> > in the connection. The initial supervision timeout is 6 connection
>> > intervals and is not based on the supervision timeout. That is why you
>> see
>> > the disconnect in 6 connection intervals.
>>
>>
>> Actually I had plan some time ago to fix the error code on such event,
>> because in this case we should use CONNECTION FAILED TO BE ESTABLISHED
>> 0x3e
>> Will put it on my todo list :)
>>
>>
>> > So either the connect request from the master to the peripheral was not
>> > received by the peripheral or it was received but no further data
>> packets
>> > were transferred and that is why the connection dropped.
>> >
>> > What version of code are you using? When I did the anchor point/last rxd
>> > cputime math I see the difference between the two is 301889. This
>> implies
>> > to me that cputime is counting at 1 MHz and not at 32.768kHz. Which also
>> > implies that you are not using the latest code.
>> >
>> > NOTE: I would expect this to happen occasionally, and more occasionally
>> if
>> > there are alot of devices transmitting in close proximity or if the two
>> > devices connecting dont have a great RF link.
>> >
>> >
>> Łukasz
>>
>>
>> > > On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
>> > >
>> > > Hi devs,
>> > >
>> > > I am tracking a nimble issue (on nrf52) that seems to surface
>> > > intermittently - a supervision timeout when i am not expecting one.
>> Below
>> > > is a section of the log, nimble is the master here and as you can see,
>> > the
>> > > time between connection being established and the tmo error is ~260ms:
>> > >
>> > > 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated:
>> connect;
>> > > peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16
>> scan_window=16
>> > > itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512
>> min_ce_len=16
>> > > max_ce_len=768 own_addr_ty
>> > > 007760 [ts=60624960ssb, mod=64 level=1] peer: connected;
>> conn_handle=14
>> > > status=0 addr=73:a0:1a:2e:b1:df
>> > > 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated:
>> exchange
>> > > mtu
>> > > 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated:
>> discover
>> > > service by uuid; uuid=..
>> > > 007793 *** ble_ll_conn.c:2160 *** cputime=61336489
>> anchor_point=61386140
>> > > last_rxd_pdu_cputime=61084251 tmo=300000
>> > > 007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
>> > > status=520 itvl=40 latency=1 tmo=512
>> > > 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected;
>> > conn_handle=14
>> > > reason=520
>> > >
>> > >
>> > > The line marked with *** is in `ble_ll_conn_event_end`, just before it
>> > > reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match
>> the
>> > > time when I see the "connected" host event (line #7760), and the
>> anchor
>> > > point is presumably the time of the next scheduled conn event. What I
>> > > thought was interesting is that the tmo value is 300ms, i.e.
>> > > conn_itvl(50ms) * 6 rather than the supervision timeout. The
>> connection
>> > sm
>> > > is in `BLE_LL_CONN_STATE_CREATED` state, and not
>> > > `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having
>> already
>> > > received the "connected" event from the host.
>> > >
>> > > Any ideas what could be going on here?
>> > >
>> > > cheers,
>> > > simon
>> >
>> >
>>
>
>

Re: Premature supervision timeout

Posted by Simon Ratner <si...@proxy.co>.
Indeed that would be an improvement in error reporting :)
However i am not convinced this is what i am seeing here - see my other
response.

On Tue, Sep 5, 2017 at 10:28 AM, Łukasz Rymanowski <
lukasz.rymanowski@codecoup.pl> wrote:

> Hi
>
> On 5 September 2017 at 19:08, will sanfilippo <wi...@runtime.io> wrote:
>
> > I do not think this is really an answer but it is the best I can do
> > without more information.
> >
> > When a device initially “connects” the state of the connection is not
> > considered established until a data frame is received from the other
> device
> > in the connection. The initial supervision timeout is 6 connection
> > intervals and is not based on the supervision timeout. That is why you
> see
> > the disconnect in 6 connection intervals.
>
>
> Actually I had plan some time ago to fix the error code on such event,
> because in this case we should use CONNECTION FAILED TO BE ESTABLISHED 0x3e
> Will put it on my todo list :)
>
>
> > So either the connect request from the master to the peripheral was not
> > received by the peripheral or it was received but no further data packets
> > were transferred and that is why the connection dropped.
> >
> > What version of code are you using? When I did the anchor point/last rxd
> > cputime math I see the difference between the two is 301889. This implies
> > to me that cputime is counting at 1 MHz and not at 32.768kHz. Which also
> > implies that you are not using the latest code.
> >
> > NOTE: I would expect this to happen occasionally, and more occasionally
> if
> > there are alot of devices transmitting in close proximity or if the two
> > devices connecting dont have a great RF link.
> >
> >
> Łukasz
>
>
> > > On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
> > >
> > > Hi devs,
> > >
> > > I am tracking a nimble issue (on nrf52) that seems to surface
> > > intermittently - a supervision timeout when i am not expecting one.
> Below
> > > is a section of the log, nimble is the master here and as you can see,
> > the
> > > time between connection being established and the tmo error is ~260ms:
> > >
> > > 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated:
> connect;
> > > peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16
> scan_window=16
> > > itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512 min_ce_len=16
> > > max_ce_len=768 own_addr_ty
> > > 007760 [ts=60624960ssb, mod=64 level=1] peer: connected; conn_handle=14
> > > status=0 addr=73:a0:1a:2e:b1:df
> > > 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated:
> exchange
> > > mtu
> > > 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated:
> discover
> > > service by uuid; uuid=..
> > > 007793 *** ble_ll_conn.c:2160 *** cputime=61336489
> anchor_point=61386140
> > > last_rxd_pdu_cputime=61084251 tmo=300000
> > > 007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
> > > status=520 itvl=40 latency=1 tmo=512
> > > 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected;
> > conn_handle=14
> > > reason=520
> > >
> > >
> > > The line marked with *** is in `ble_ll_conn_event_end`, just before it
> > > reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match
> the
> > > time when I see the "connected" host event (line #7760), and the anchor
> > > point is presumably the time of the next scheduled conn event. What I
> > > thought was interesting is that the tmo value is 300ms, i.e.
> > > conn_itvl(50ms) * 6 rather than the supervision timeout. The connection
> > sm
> > > is in `BLE_LL_CONN_STATE_CREATED` state, and not
> > > `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having already
> > > received the "connected" event from the host.
> > >
> > > Any ideas what could be going on here?
> > >
> > > cheers,
> > > simon
> >
> >
>

Re: Premature supervision timeout

Posted by Łukasz Rymanowski <lu...@codecoup.pl>.
Hi

On 5 September 2017 at 19:08, will sanfilippo <wi...@runtime.io> wrote:

> I do not think this is really an answer but it is the best I can do
> without more information.
>
> When a device initially “connects” the state of the connection is not
> considered established until a data frame is received from the other device
> in the connection. The initial supervision timeout is 6 connection
> intervals and is not based on the supervision timeout. That is why you see
> the disconnect in 6 connection intervals.


Actually I had plan some time ago to fix the error code on such event,
because in this case we should use CONNECTION FAILED TO BE ESTABLISHED 0x3e
Will put it on my todo list :)


> So either the connect request from the master to the peripheral was not
> received by the peripheral or it was received but no further data packets
> were transferred and that is why the connection dropped.
>
> What version of code are you using? When I did the anchor point/last rxd
> cputime math I see the difference between the two is 301889. This implies
> to me that cputime is counting at 1 MHz and not at 32.768kHz. Which also
> implies that you are not using the latest code.
>
> NOTE: I would expect this to happen occasionally, and more occasionally if
> there are alot of devices transmitting in close proximity or if the two
> devices connecting dont have a great RF link.
>
>
Łukasz


> > On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
> >
> > Hi devs,
> >
> > I am tracking a nimble issue (on nrf52) that seems to surface
> > intermittently - a supervision timeout when i am not expecting one. Below
> > is a section of the log, nimble is the master here and as you can see,
> the
> > time between connection being established and the tmo error is ~260ms:
> >
> > 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated: connect;
> > peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16 scan_window=16
> > itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512 min_ce_len=16
> > max_ce_len=768 own_addr_ty
> > 007760 [ts=60624960ssb, mod=64 level=1] peer: connected; conn_handle=14
> > status=0 addr=73:a0:1a:2e:b1:df
> > 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated: exchange
> > mtu
> > 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated: discover
> > service by uuid; uuid=..
> > 007793 *** ble_ll_conn.c:2160 *** cputime=61336489 anchor_point=61386140
> > last_rxd_pdu_cputime=61084251 tmo=300000
> > 007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
> > status=520 itvl=40 latency=1 tmo=512
> > 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected;
> conn_handle=14
> > reason=520
> >
> >
> > The line marked with *** is in `ble_ll_conn_event_end`, just before it
> > reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match the
> > time when I see the "connected" host event (line #7760), and the anchor
> > point is presumably the time of the next scheduled conn event. What I
> > thought was interesting is that the tmo value is 300ms, i.e.
> > conn_itvl(50ms) * 6 rather than the supervision timeout. The connection
> sm
> > is in `BLE_LL_CONN_STATE_CREATED` state, and not
> > `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having already
> > received the "connected" event from the host.
> >
> > Any ideas what could be going on here?
> >
> > cheers,
> > simon
>
>

Re: Premature supervision timeout

Posted by Simon Ratner <si...@proxy.co>.
Thanks Will,

I am indeed running pre-1.1 code at 1MHz. Planning to move to 1.1/1.2
shortly; if you think this is something that would behave better in the new
codebase, that would accelerate that process ;)

I see this happening regularly with just three devices (one nimble, two
phones), with the phones being in close proximity so signal should be
pretty solid.

It is strange to me that the host would report BLE_GAP_EVENT_CONNECT with a
success status to the application before connection is considered
established. Doesn't this imply that at least some response was received
from the peer? If an additional data frame is required, is there something
I can do to elicit a more immediate response from the peer device?
Currently my sequence is `connect` -> `exchange mtu` -> `discover
services`, with the timeout usually happening during service discovery (or
occasionally earlier, during mtu exchange).



On Tue, Sep 5, 2017 at 10:08 AM, will sanfilippo <wi...@runtime.io> wrote:

> I do not think this is really an answer but it is the best I can do
> without more information.
>
> When a device initially “connects” the state of the connection is not
> considered established until a data frame is received from the other device
> in the connection. The initial supervision timeout is 6 connection
> intervals and is not based on the supervision timeout. That is why you see
> the disconnect in 6 connection intervals. So either the connect request
> from the master to the peripheral was not received by the peripheral or it
> was received but no further data packets were transferred and that is why
> the connection dropped.
>
> What version of code are you using? When I did the anchor point/last rxd
> cputime math I see the difference between the two is 301889. This implies
> to me that cputime is counting at 1 MHz and not at 32.768kHz. Which also
> implies that you are not using the latest code.
>
> NOTE: I would expect this to happen occasionally, and more occasionally if
> there are alot of devices transmitting in close proximity or if the two
> devices connecting dont have a great RF link.
>
> > On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
> >
> > Hi devs,
> >
> > I am tracking a nimble issue (on nrf52) that seems to surface
> > intermittently - a supervision timeout when i am not expecting one. Below
> > is a section of the log, nimble is the master here and as you can see,
> the
> > time between connection being established and the tmo error is ~260ms:
> >
> > 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated: connect;
> > peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16 scan_window=16
> > itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512 min_ce_len=16
> > max_ce_len=768 own_addr_ty
> > 007760 [ts=60624960ssb, mod=64 level=1] peer: connected; conn_handle=14
> > status=0 addr=73:a0:1a:2e:b1:df
> > 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated: exchange
> > mtu
> > 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated: discover
> > service by uuid; uuid=..
> > 007793 *** ble_ll_conn.c:2160 *** cputime=61336489 anchor_point=61386140
> > last_rxd_pdu_cputime=61084251 tmo=300000
> > 007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
> > status=520 itvl=40 latency=1 tmo=512
> > 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected;
> conn_handle=14
> > reason=520
> >
> >
> > The line marked with *** is in `ble_ll_conn_event_end`, just before it
> > reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match the
> > time when I see the "connected" host event (line #7760), and the anchor
> > point is presumably the time of the next scheduled conn event. What I
> > thought was interesting is that the tmo value is 300ms, i.e.
> > conn_itvl(50ms) * 6 rather than the supervision timeout. The connection
> sm
> > is in `BLE_LL_CONN_STATE_CREATED` state, and not
> > `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having already
> > received the "connected" event from the host.
> >
> > Any ideas what could be going on here?
> >
> > cheers,
> > simon
>
>

Re: Premature supervision timeout

Posted by will sanfilippo <wi...@runtime.io>.
I do not think this is really an answer but it is the best I can do without more information.

When a device initially “connects” the state of the connection is not considered established until a data frame is received from the other device in the connection. The initial supervision timeout is 6 connection intervals and is not based on the supervision timeout. That is why you see the disconnect in 6 connection intervals. So either the connect request from the master to the peripheral was not received by the peripheral or it was received but no further data packets were transferred and that is why the connection dropped.

What version of code are you using? When I did the anchor point/last rxd cputime math I see the difference between the two is 301889. This implies to me that cputime is counting at 1 MHz and not at 32.768kHz. Which also implies that you are not using the latest code.

NOTE: I would expect this to happen occasionally, and more occasionally if there are alot of devices transmitting in close proximity or if the two devices connecting dont have a great RF link.

> On Sep 4, 2017, at 5:50 PM, Simon Ratner <si...@proxy.co> wrote:
> 
> Hi devs,
> 
> I am tracking a nimble issue (on nrf52) that seems to surface
> intermittently - a supervision timeout when i am not expecting one. Below
> is a section of the log, nimble is the master here and as you can see, the
> time between connection being established and the tmo error is ~260ms:
> 
> 007734 [ts=60421848ssb, mod=4 level=1] GAP procedure initiated: connect;
> peer_addr_type=1 peer_addr=73:a0:1a:2e:b1:df scan_itvl=16 scan_window=16
> itvl_min=24 itvl_max=40 latency=1 supervision_timeout=512 min_ce_len=16
> max_ce_len=768 own_addr_ty
> 007760 [ts=60624960ssb, mod=64 level=1] peer: connected; conn_handle=14
> status=0 addr=73:a0:1a:2e:b1:df
> 007763 [ts=60648396ssb, mod=4 level=1] GATT procedure initiated: exchange
> mtu
> 007765 [ts=60664020ssb, mod=4 level=1] GATT procedure initiated: discover
> service by uuid; uuid=..
> 007793 *** ble_ll_conn.c:2160 *** cputime=61336489 anchor_point=61386140
> last_rxd_pdu_cputime=61084251 tmo=300000
> 007795 [ts=60898380ssb, mod=64 level=1] peer: updated; conn_handle=14
> status=520 itvl=40 latency=1 tmo=512
> 007800 [ts=60937440ssb, mod=64 level=1] peer: disconnected; conn_handle=14
> reason=520
> 
> 
> The line marked with *** is in `ble_ll_conn_event_end`, just before it
> reports `BLE_ERR_CONN_SPVN_TMO`. The time of last PDU seems to match the
> time when I see the "connected" host event (line #7760), and the anchor
> point is presumably the time of the next scheduled conn event. What I
> thought was interesting is that the tmo value is 300ms, i.e.
> conn_itvl(50ms) * 6 rather than the supervision timeout. The connection sm
> is in `BLE_LL_CONN_STATE_CREATED` state, and not
> `BLE_LL_CONN_STATE_ESTABLISHED` as I would've expected, having already
> received the "connected" event from the host.
> 
> Any ideas what could be going on here?
> 
> cheers,
> simon