You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mynewt.apache.org by Simon Ratner <si...@proxy.co> on 2016/06/30 19:39:00 UTC

Disconnect reason wrong on local termination

Hi devs,

I've been looking at the reasons I get in BLE_GAP_EVENT_DISCONNECT to make
sure I cover all edge cases, and this is what I see (on an nrf51-based
board):

      switch (ctxt->disconnect.reason) {
        case BLE_HS_ENOTCONN:
          /* I see this when the local host has terminated the connection.
Feels like this should
           * be BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL) instead, per spec.
*/
          break;
        case BLE_HS_HCI_ERR(BLE_ERR_CONN_SPVN_TMO):
          /* Supervision timeout, I see this occasionally. */
          break;
        case BLE_HS_HCI_ERR(BLE_ERR_REM_USER_CONN_TERM):
        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_RESRCS):
        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_PWROFF):
          /* Remote termination, I see the first one when remote peer
terminates the connection. */
          break;
        case BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL):
          /* Local termination, I never see this one. */
          break;
      }

Local termination should not be setting reason to BLE_HS_ENOTCONN (which,
when mapped onto the space of error codes in 4.2 vol 2 part D, ends up
being BLE_ERR_MEM_CAPACITY, confusingly).

Re: Disconnect reason wrong on local termination

Posted by Christopher Collins <cc...@apache.org>.
On Thu, Jun 30, 2016 at 12:39:00PM -0700, Simon Ratner wrote:
> Hi devs,
> 
> I've been looking at the reasons I get in BLE_GAP_EVENT_DISCONNECT to make
> sure I cover all edge cases, and this is what I see (on an nrf51-based
> board):
> 
>       switch (ctxt->disconnect.reason) {
>         case BLE_HS_ENOTCONN:
>           /* I see this when the local host has terminated the connection.
> Feels like this should
>            * be BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL) instead, per spec.

I thought reporting the HCI error code for locally-terminated
connections would be confusing.  However, I have to admit I am not
really sure what I was thinking there... consistently reporting the HCI
reason code seems to make more sense.  If there are no objections from
anyone else, I will make this change to develop shortly.

Thanks,
Chris

Re: Disconnect reason wrong on local termination

Posted by Simon Ratner <si...@proxy.co>.
Just following up on the BLE_HS_ENOMEM, the most obvious place where
ble_gattc_disc_svc_by_uuid
returns that is when it doesn't have a free gattc_proc.

I have max_gattc_procs set to NIMBLE_OPT(MAX_CONNECTIONS), any reason why
that would not be enough? Is it possible procs are not being returned in
some situations? Or do you think this rc is coming from someplace deeper in
the stack?


On Thu, Jun 30, 2016 at 1:31 PM, Simon Ratner <si...@proxy.co> wrote:

> It is sometimes hard to tell if they are "expected", but below are a
> couple of sample traces.
>
> Based on the timestamps, I would say the first two are expected (device
> was slow to respond), while the last one is unexpected. In the last trace,
> the gatt op failed for another reason (rc=6 i.e. BLE_HS_ENOMEM... as an
> aside, any idea which specific resource it may have run out of?) then
> nothing for >2500ms, followed by a supervision timeout. In fact, now that
> you mention it, I have seen relays as long as 20~30sec before eventually
> hitting the supervision timeout - I had previously assumed it was some sort
> of an idle timeout triggered by the other end (a phone).
>
> ---8<---
>
> 844456:[ts=824663680ssb, mod=4 level=1] GAP procedure initiated: connect;
> addr_type=1 addr=4d:ad:24:22:94:a5 scan_itvl=16 scan_window=16 itvl_min=24
> itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
> 845483:[ts=825666608ssb, mod=64 level=1] gatt_cli: connection established;
> status=0 handle=2 peer_id_addr=4d:ad:24:22:94:a5
> peer_ota_addr=4d:ad:24:22:94:a5 conn_itvl=40 conn_latency=0
> supervision_timeout=256
> 845505:[ts=825688080ssb, mod=4 level=1] GATT procedure initiated: discover
> service by uuid; uuid=0304xxxx...
> 845789:[ts=825965264ssb, mod=64 level=2] gatt_cli: failed to read svc;
> conn_handle=2 status=7
> 845801:[ts=825976976ssb, mod=64 level=1] gatt_cli: disconnected;
> conn_handle=2 reason=520
>
> ---8<---
>
> 894919:[ts=873943792ssb, mod=4 level=1] GAP procedure initiated: connect;
> addr_type=1 addr=4d:ad:24:22:94:a5 scan_itvl=16 scan_window=16 itvl_min=24
> itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
> 895991:[ts=874990640ssb, mod=64 level=1] gatt_cli: connection established;
> status=0 handle=1 peer_id_addr=4d:ad:24:22:94:a5
> peer_ota_addr=4d:ad:24:22:94:a5 conn_itvl=40 conn_latency=0
> supervision_timeout=256
> 896013:[ts=875012688ssb, mod=4 level=1] GATT procedure initiated: discover
> service by uuid; uuid=0304xxxx
> 896096:[ts=875093696ssb, mod=4 level=1] GATT procedure initiated: read by
> uuid; start_handle=40 end_handle=65535 uuid=0304yyyy
> 906398:[ts=885154208ssb, mod=64 level=2] gatt_cli: failed to read chr;
> conn_handle=1 status=7
> 906408:[ts=885163968ssb, mod=64 level=1] gatt_cli: disconnected;
> conn_handle=1 reason=520
>
> ---8<---
>
> 101152:[ts=98780800ssb, mod=4 level=1] GAP procedure initiated: connect;
> addr_type=1 addr=58:cc:bb:09:9e:f1 scan_itvl=16 scan_window=16 itvl_min=24
> itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
> 101967:[ts=99576816ssb, mod=64 level=1] gatt_cli: connection established;
> status=0 handle=4 peer_id_addr=58:cc:bb:09:9e:f1
> peer_ota_addr=58:cc:bb:09:9e:f1 conn_itvl=40 conn_latency=0
> supervision_timeout=256
> 101989:[ts=99598288ssb, mod=4 level=1] GATT procedure initiated: discover
> service by uuid; uuid=0304xxxx
> 102002:[ts=99610976ssb, mod=64 level=2] gatt_cli: failed to read svc;
> conn_handle=4 rc=6
> 104643:[ts=102190320ssb, mod=64 level=1] gatt_cli: disconnected;
> conn_handle=4 reason=520
>
> ---8<---
>
>
>
> On Thu, Jun 30, 2016 at 12:45 PM, will sanfilippo <wi...@runtime.io>
> wrote:
>
>> Quick question: do you see supervision timeouts when you do not expect
>> them? If so, does this happen more/less when you are either a master or a
>> slave? I know this gets a bit tricky at times to “debug”, but if you are
>> seeing unexpected supervision timeouts do they occur very close to when the
>> device says it is connected (returns the connection complete event)? During
>> the bluetooth unplugfest we realized that the controller is not properly
>> returning the “connection failed to be established” error; it is returning
>> a supervision timeout.
>>
>> I would be interested to know if the devices are connecting, staying
>> connected for a while, then timing out…
>>
>>
>> > On Jun 30, 2016, at 12:39 PM, Simon Ratner <si...@proxy.co> wrote:
>> >
>> > Hi devs,
>> >
>> > I've been looking at the reasons I get in BLE_GAP_EVENT_DISCONNECT to
>> make
>> > sure I cover all edge cases, and this is what I see (on an nrf51-based
>> > board):
>> >
>> >      switch (ctxt->disconnect.reason) {
>> >        case BLE_HS_ENOTCONN:
>> >          /* I see this when the local host has terminated the
>> connection.
>> > Feels like this should
>> >           * be BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL) instead, per
>> spec.
>> > */
>> >          break;
>> >        case BLE_HS_HCI_ERR(BLE_ERR_CONN_SPVN_TMO):
>> >          /* Supervision timeout, I see this occasionally. */
>> >          break;
>> >        case BLE_HS_HCI_ERR(BLE_ERR_REM_USER_CONN_TERM):
>> >        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_RESRCS):
>> >        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_PWROFF):
>> >          /* Remote termination, I see the first one when remote peer
>> > terminates the connection. */
>> >          break;
>> >        case BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL):
>> >          /* Local termination, I never see this one. */
>> >          break;
>> >      }
>> >
>> > Local termination should not be setting reason to BLE_HS_ENOTCONN
>> (which,
>> > when mapped onto the space of error codes in 4.2 vol 2 part D, ends up
>> > being BLE_ERR_MEM_CAPACITY, confusingly).
>>
>>
>

Re: Disconnect reason wrong on local termination

Posted by Simon Ratner <si...@proxy.co>.
> Based on the timestamps, I would say the first two are expected
>
...

> 896096:[ts=875093696ssb, mod=4 level=1] GATT procedure initiated: read by
> uuid; start_handle=40 end_handle=65535 uuid=0304yyyy
> 906398:[ts=885154208ssb, mod=64 level=2] gatt_cli: failed to read chr;
> conn_handle=1 status=7
> 906408:[ts=885163968ssb, mod=64 level=1] gatt_cli: disconnected;
> conn_handle=1 reason=520
>

I misread that timestamp, thats a >10s gap, not 300ms.

Re: Disconnect reason wrong on local termination

Posted by Christopher Collins <cc...@apache.org>.
On Thu, Jun 30, 2016 at 01:31:00PM -0700, Simon Ratner wrote:
> It is sometimes hard to tell if they are "expected", but below are a couple
> of sample traces.
> 
> Based on the timestamps, I would say the first two are expected (device was
> slow to respond), while the last one is unexpected. In the last trace, the
> gatt op failed for another reason (rc=6 i.e. BLE_HS_ENOMEM... as an aside,
> any idea which specific resource it may have run out of?)

My guess is that the BLE_HS_ENOMEM is caused by a lack of mbufs.  The
problem is not too few gattc procedures; if it were, the initial call to
ble_disc_svc_by_uuid() would have returned the error instead.

This tells me two things:

1. We need to be more efficient with mbufs.  In particular, nimble
should be able to operate with smaller (and hence more) mbufs.  This is
something we have known about for a while but haven't gotten a chance to
address yet.  At the moment, nimble requires large mbufs (>= 260 bytes).
This obviously has an impact on the size of the mbuf pool your
application can allocate.

2. The GATT layer should not give up so easily when the layer below it
(ATT) fails due to memory exhaustion.  It should just try again a bit
later.

If I recall, your application is scanning most of the time.  I have not
initated many GATT procedures while scanning, and I think this might be
causing the mbuf shortage, so this is something I will be messing with.

In the meantime, you might try the following:

* Increase the number of mbufs allocated to msys, if your platform has
capacity for it.

* Disable logging (perhaps a non-starter, I know).  Logging to the
console is quite slow, and mbufs are being held in queues rather than
being processed while your application is logging.

Re: Disconnect reason wrong on local termination

Posted by Simon Ratner <si...@proxy.co>.
It is sometimes hard to tell if they are "expected", but below are a couple
of sample traces.

Based on the timestamps, I would say the first two are expected (device was
slow to respond), while the last one is unexpected. In the last trace, the
gatt op failed for another reason (rc=6 i.e. BLE_HS_ENOMEM... as an aside,
any idea which specific resource it may have run out of?) then nothing for
>2500ms, followed by a supervision timeout. In fact, now that you mention
it, I have seen relays as long as 20~30sec before eventually hitting the
supervision timeout - I had previously assumed it was some sort of an idle
timeout triggered by the other end (a phone).

---8<---

844456:[ts=824663680ssb, mod=4 level=1] GAP procedure initiated: connect;
addr_type=1 addr=4d:ad:24:22:94:a5 scan_itvl=16 scan_window=16 itvl_min=24
itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
845483:[ts=825666608ssb, mod=64 level=1] gatt_cli: connection established;
status=0 handle=2 peer_id_addr=4d:ad:24:22:94:a5
peer_ota_addr=4d:ad:24:22:94:a5 conn_itvl=40 conn_latency=0
supervision_timeout=256
845505:[ts=825688080ssb, mod=4 level=1] GATT procedure initiated: discover
service by uuid; uuid=0304xxxx...
845789:[ts=825965264ssb, mod=64 level=2] gatt_cli: failed to read svc;
conn_handle=2 status=7
845801:[ts=825976976ssb, mod=64 level=1] gatt_cli: disconnected;
conn_handle=2 reason=520

---8<---

894919:[ts=873943792ssb, mod=4 level=1] GAP procedure initiated: connect;
addr_type=1 addr=4d:ad:24:22:94:a5 scan_itvl=16 scan_window=16 itvl_min=24
itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
895991:[ts=874990640ssb, mod=64 level=1] gatt_cli: connection established;
status=0 handle=1 peer_id_addr=4d:ad:24:22:94:a5
peer_ota_addr=4d:ad:24:22:94:a5 conn_itvl=40 conn_latency=0
supervision_timeout=256
896013:[ts=875012688ssb, mod=4 level=1] GATT procedure initiated: discover
service by uuid; uuid=0304xxxx
896096:[ts=875093696ssb, mod=4 level=1] GATT procedure initiated: read by
uuid; start_handle=40 end_handle=65535 uuid=0304yyyy
906398:[ts=885154208ssb, mod=64 level=2] gatt_cli: failed to read chr;
conn_handle=1 status=7
906408:[ts=885163968ssb, mod=64 level=1] gatt_cli: disconnected;
conn_handle=1 reason=520

---8<---

101152:[ts=98780800ssb, mod=4 level=1] GAP procedure initiated: connect;
addr_type=1 addr=58:cc:bb:09:9e:f1 scan_itvl=16 scan_window=16 itvl_min=24
itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=16 max_ce_len=768
101967:[ts=99576816ssb, mod=64 level=1] gatt_cli: connection established;
status=0 handle=4 peer_id_addr=58:cc:bb:09:9e:f1
peer_ota_addr=58:cc:bb:09:9e:f1 conn_itvl=40 conn_latency=0
supervision_timeout=256
101989:[ts=99598288ssb, mod=4 level=1] GATT procedure initiated: discover
service by uuid; uuid=0304xxxx
102002:[ts=99610976ssb, mod=64 level=2] gatt_cli: failed to read svc;
conn_handle=4 rc=6
104643:[ts=102190320ssb, mod=64 level=1] gatt_cli: disconnected;
conn_handle=4 reason=520

---8<---



On Thu, Jun 30, 2016 at 12:45 PM, will sanfilippo <wi...@runtime.io> wrote:

> Quick question: do you see supervision timeouts when you do not expect
> them? If so, does this happen more/less when you are either a master or a
> slave? I know this gets a bit tricky at times to “debug”, but if you are
> seeing unexpected supervision timeouts do they occur very close to when the
> device says it is connected (returns the connection complete event)? During
> the bluetooth unplugfest we realized that the controller is not properly
> returning the “connection failed to be established” error; it is returning
> a supervision timeout.
>
> I would be interested to know if the devices are connecting, staying
> connected for a while, then timing out…
>
>
> > On Jun 30, 2016, at 12:39 PM, Simon Ratner <si...@proxy.co> wrote:
> >
> > Hi devs,
> >
> > I've been looking at the reasons I get in BLE_GAP_EVENT_DISCONNECT to
> make
> > sure I cover all edge cases, and this is what I see (on an nrf51-based
> > board):
> >
> >      switch (ctxt->disconnect.reason) {
> >        case BLE_HS_ENOTCONN:
> >          /* I see this when the local host has terminated the connection.
> > Feels like this should
> >           * be BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL) instead, per spec.
> > */
> >          break;
> >        case BLE_HS_HCI_ERR(BLE_ERR_CONN_SPVN_TMO):
> >          /* Supervision timeout, I see this occasionally. */
> >          break;
> >        case BLE_HS_HCI_ERR(BLE_ERR_REM_USER_CONN_TERM):
> >        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_RESRCS):
> >        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_PWROFF):
> >          /* Remote termination, I see the first one when remote peer
> > terminates the connection. */
> >          break;
> >        case BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL):
> >          /* Local termination, I never see this one. */
> >          break;
> >      }
> >
> > Local termination should not be setting reason to BLE_HS_ENOTCONN (which,
> > when mapped onto the space of error codes in 4.2 vol 2 part D, ends up
> > being BLE_ERR_MEM_CAPACITY, confusingly).
>
>

Re: Disconnect reason wrong on local termination

Posted by will sanfilippo <wi...@runtime.io>.
Quick question: do you see supervision timeouts when you do not expect them? If so, does this happen more/less when you are either a master or a slave? I know this gets a bit tricky at times to “debug”, but if you are seeing unexpected supervision timeouts do they occur very close to when the device says it is connected (returns the connection complete event)? During the bluetooth unplugfest we realized that the controller is not properly returning the “connection failed to be established” error; it is returning a supervision timeout.

I would be interested to know if the devices are connecting, staying connected for a while, then timing out…


> On Jun 30, 2016, at 12:39 PM, Simon Ratner <si...@proxy.co> wrote:
> 
> Hi devs,
> 
> I've been looking at the reasons I get in BLE_GAP_EVENT_DISCONNECT to make
> sure I cover all edge cases, and this is what I see (on an nrf51-based
> board):
> 
>      switch (ctxt->disconnect.reason) {
>        case BLE_HS_ENOTCONN:
>          /* I see this when the local host has terminated the connection.
> Feels like this should
>           * be BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL) instead, per spec.
> */
>          break;
>        case BLE_HS_HCI_ERR(BLE_ERR_CONN_SPVN_TMO):
>          /* Supervision timeout, I see this occasionally. */
>          break;
>        case BLE_HS_HCI_ERR(BLE_ERR_REM_USER_CONN_TERM):
>        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_RESRCS):
>        case BLE_HS_HCI_ERR(BLE_ERR_RD_CONN_TERM_PWROFF):
>          /* Remote termination, I see the first one when remote peer
> terminates the connection. */
>          break;
>        case BLE_HS_HCI_ERR(BLE_ERR_CONN_TERM_LOCAL):
>          /* Local termination, I never see this one. */
>          break;
>      }
> 
> Local termination should not be setting reason to BLE_HS_ENOTCONN (which,
> when mapped onto the space of error codes in 4.2 vol 2 part D, ends up
> being BLE_ERR_MEM_CAPACITY, confusingly).