Posted to dev@mynewt.apache.org by Lukasz Wolnik <lu...@gmail.com> on 2018/08/05 19:22:43 UTC

Re: Reducing GATT write attribute's timeout and read attribute's BLE_HS_ENOMEM

Hi Chris,

It's been more than a year, but I finally have some findings.

On another Mynewt-powered device I got the BLE_HS_ENOMEM error again, this
time from ble_gattc_write_flat. I tried changing BLE_GATT_MAX_PROCS
(down to 2 and up to 8), but it had no effect.

My device kept the connection alive but stopped receiving any
communication from its peer: no notifications, reads, writes, etc. Luckily
it always happened on the 5th consecutive connection (each of the previous
ones had been terminated by my app using ble_gap_terminate(conn_handle,
BLE_ERR_REM_USER_CONN_TERM)).

By changing MSYS_1_BLOCK_COUNT I could move the BLE_HS_ENOMEM error to a
different connection. For each value, it appeared on:

8 - 4th connection
12 - 5th connection (default value)
16 - 6th connection

Yay! This thing is not random at all.
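
Read as plain arithmetic, the three data points above fit a straight line: every 4 extra blocks postpone the failure by exactly one connection. That is consistent with a fixed number of blocks going missing per connection, although it does not by itself prove where they go. A minimal sketch of that calculation follows; the struct and function names are hypothetical and only the three observations come from this message.

```c
#include <assert.h>
#include <stddef.h>

/* One observation: pool size and the connection on which ENOMEM appeared. */
struct observation {
    int block_count;   /* value of MSYS_1_BLOCK_COUNT */
    int failing_conn;  /* 1-based index of the failing connection */
};

/*
 * Extra blocks needed to postpone the failure by one connection,
 * estimated from two observations. Returns 0 if both observations
 * failed on the same connection (no slope can be computed).
 */
int blocks_per_connection(const struct observation *a,
                          const struct observation *b)
{
    assert(a != NULL && b != NULL);

    int conn_delta = b->failing_conn - a->failing_conn;
    if (conn_delta == 0) {
        return 0;
    }
    return (b->block_count - a->block_count) / conn_delta;
}
```

Feeding it any two of the pairs (8, 4), (12, 5) and (16, 6) gives the same slope, which is what makes the failure look like a steady leak rather than a random event.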

I assume it isn't NimBLE failing to free its resources, so I'm
going to look at my dynamically allocated structure that
uses the os_memblock_get functions. I'm also going to read up on Mynewt mbufs.

Thank you very much for pointing me in the right direction. I regained my
faith in the BLE stack. No more panic attacks while pitching my device and
getting to the demo part :)

Kind regards,
Łukasz

On Mon, May 15, 2017 at 9:09 PM Łukasz Wolnik <lu...@gmail.com>
wrote:

> Hi Szymon,
>
> Thanks for the clarification. I was going to write a queue system for GATT
> writes/reads but it looks like I should rely on the lower layer and just
> reconnect if my central app has a problem communicating with peripheral
> devices.
>
> Kind regards,
> Łukasz
>
> On Mon, May 15, 2017 at 9:07 PM, Łukasz Wolnik <lu...@gmail.com>
> wrote:
>
>> Hi Chris,
>>
>> Thanks a lot for your responses. They are very helpful and are radically
>> shaping the way I'm going to develop the second version of my app.
>>
>> I'm pretty sure, when I investigated the issue with gdb, that the problem
>> was not enough GATT procs available. I'll experiment with MSYS_1_BLOCK_COUNT
>> and BLE_GATT_MAX_PROCS and let you know if increasing BLE_GATT_MAX_PROCS
>> helped. Thanks a lot for sharing these two config values. And yes, it'd be
>> great if, alongside the error, it told you which resource is not available
>> and what the current limits are.
>>
>> Right, so it's a 30-second, not a 20-second, timeout. My app is a wearable
>> item and it's crucial for it to be robust. I think what I can do is
>> terminate the connection manually when I don't get a confirmation within 1
>> second.
>>
>> Kind regards,
>> Łukasz
>>
>> On Mon, May 15, 2017 at 7:06 PM, Christopher Collins <ch...@runtime.io>
>> wrote:
>>
>>> On Mon, May 15, 2017 at 11:01:38AM -0700, Christopher Collins wrote:
>>> > Hi Łukasz,
>>> >
>>> > On Mon, May 15, 2017 at 12:33:59PM +0100, Łukasz Wolnik wrote:
>>> > > Hello,
>>> > >
>>> > > From time to time my ble_gattc_write_flat (run as central) is timing out
>>> > > after 20 seconds while sending to an Android 6 phone (in peripheral mode).
>>> > > Is there a way to reduce the timeout to just 1 second? At the moment if
>>> > > there's an issue with writing, my newt program has to wait 20 seconds until
>>> > > it can respond to a timeout.
>>> > >
>>> > > What's the best strategy here? Keep "bombarding" the peripheral with
>>> > > multiple writes until receiving the first confirmation? Or reduce the timeout
>>> > > from 20 seconds (I don't know where this value is coming from) and resend
>>> > > only when I get an HCI 19 timeout error in the callback?
>>>
>>> Oops, I forgot to respond to your actual question :).  Sorry about that.
>>> The 30-second timeout is hardcoded in the spec, and is currently not
>>> configurable (Vol. 3, Part F, 3.3.3).  It might be useful to make this
>>> configurable, but the device would not be standards compliant.
>>>
>>> Chris
>>>
>>
>>
>

Re: Reducing GATT write attribute's timeout and read attribute's BLE_HS_ENOMEM

Posted by Christopher Collins <ch...@runtime.io>.
On Mon, Aug 06, 2018 at 02:03:22AM +0100, Lukasz Wolnik wrote:
> Hi Chris,
> 
> I have resolved the issue. It wasn't my mbuf structure but MSYS_1's pool
> memory leak (caused by my app).

[...]

> 
> I finally have a stable Mynewt app <-> Android repeated communication even
> on MSYS_1_BLOCK_COUNT = 8! So happy with it. Thanks again for the ride this
> bug turned out to be.

Great to hear.  Good job chasing the problem down!

Chris

Re: Reducing GATT write attribute's timeout and read attribute's BLE_HS_ENOMEM

Posted by Lukasz Wolnik <lu...@gmail.com>.
Hi Chris,

I have resolved the issue. It wasn't my mbuf structure but MSYS_1's pool
memory leak (caused by my app).

After verifying that the same actions recreated in btshell result in no
MSYS_1 memory leak (BTW, SHELL_TASK and its mpool stat is a great tool!) I
turned my attention back to my code. Ha!

A couple of hours later I found the culprit below in my ble_gattc_read callback.



static int cb_on_read(..., ..., struct ble_gatt_attr *attr, ...)
{
        // To *save* time I copy-pasted and inlined the while loop below from
        // the print_mbuf function.
        while (attr->om != NULL)
        {
            strncpy(p_message, attr->om->om_data, attr->om->om_len);

            p_message += attr->om->om_len;
            attr->om = SLIST_NEXT(attr->om, om_next); // GUY: overwrites attr->om, which ends up NULL
        }

        // To fix the above, simply use const struct os_mbuf *om = attr->om;
        // before the while loop and operate on a copy of the received
        // attribute's pointer.
        // ...
}
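
The fix described in the comment can be sketched in isolation. The loop walks the chain through a local cursor, so the pointer stored in the attribute is still intact when the callback returns and the stack can presumably free the chain as usual. The types below are hypothetical stand-ins that mimic only the fields the loop touches (they are not the real struct os_mbuf or struct ble_gatt_attr), and the bounds check on the destination buffer is an addition that the original snippet does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct os_mbuf: only the fields used here. */
struct sketch_mbuf {
    const unsigned char *om_data;  /* payload of this buffer */
    size_t om_len;                 /* number of payload bytes */
    struct sketch_mbuf *om_next;   /* next buffer in the chain, or NULL */
};

/* Hypothetical stand-in for struct ble_gatt_attr. */
struct sketch_attr {
    struct sketch_mbuf *om;        /* head of the received chain */
};

/*
 * Copy the chained payload into dst (at most dst_len - 1 bytes) and
 * NUL-terminate it. Returns the number of payload bytes copied.
 * attr->om is only read, never reassigned.
 */
size_t sketch_copy_attr(const struct sketch_attr *attr, char *dst,
                        size_t dst_len)
{
    size_t used = 0;

    assert(attr != NULL && dst != NULL && dst_len > 0);

    /* Local cursor: advancing it leaves attr->om untouched. */
    for (const struct sketch_mbuf *om = attr->om; om != NULL;
         om = om->om_next) {
        size_t room = dst_len - 1 - used;
        size_t n = om->om_len < room ? om->om_len : room;

        memcpy(dst + used, om->om_data, n);
        used += n;
        if (n < om->om_len) {
            break;  /* destination full, payload truncated */
        }
    }
    dst[used] = '\0';
    return used;
}
```

In a real callback the same pattern applies with const struct os_mbuf *om = attr->om; as the cursor. memcpy is used instead of strncpy here because attribute values are raw bytes and may contain zeros.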


I finally have a stable Mynewt app <-> Android repeated communication even
on MSYS_1_BLOCK_COUNT = 8! So happy with it. Thanks again for the ride this
bug turned out to be.

Kind regards,
Łukasz
