You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mynewt.apache.org by Szymon Janc <sz...@codecoup.pl> on 2017/01/20 09:21:16 UTC

[RFC] Reducing size of BLE Security Manager

Hi,

I was recently looking on how we could reduce size of SM code.
So my proposal is to change the way PDUs are parsed and constructed.

Instead of having ble_sm_foo_parse(), ble_sm_foo_write() and ble_sm_foo_tx()
for parsing and constructing PDU byte by byte we could use packed structures
for describing PDU and let compiler figure out details related to
unaligned access.

This also reduces number of memcpy since there is no need for intermediate
structures describing PDUs as those are packed now and can be cast  directly
to underlying mbuf buffer.

So now parsing incoming data looks like:
    struct ble_sm_pair_confirm *cmd;

    rc = ble_hs_mbuf_pullup_base(om, sizeof(*cmd));
    if (rc)
        return;

    cmd = (struct ble_sm_pair_confirm *)(*om)->om_data;

/* use cmd-> to access fields */

while constructing and sending PDU:
    struct ble_sm_pair_confirm *cmd;
    struct os_mbuf *txom;

   cmd = ble_sm_cmd_get(BLE_SM_OP_PAIR_CONFIRM, sizeof(*cmd), &txom);
    if (cmd == NULL)
       return;

    /* fill cmd with data */

    ble_sm_tx(proc->conn_handle, txom);


For convenience casting in RX path could be also wrapped in some helper macro.

My experiments shows following code size reduction in net_nimble_host.a when
building bletiny app:   137112 bytes to 136008 bytes so almost 1100
bytes difference.

Code is available at [1]. I'm not making it a pull request yet since
I'd like to get some
feedback about this approach from others. Also I still need to get
tests passing since
SM keys related tests fail now (and I'm not yet sure why).   I tested
this with Android
phone and both legacy and LE SC seems to work just fine.

Also this approach with packed structs could probably be used for
other protocols
like L2CAP or ATT.

Comments are welcome.


[1] https://github.com/sjanc/incubator-mynewt-core/commits/sm

-- 
pozdrawiam
Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by Christopher Collins <cc...@apache.org>.

On Fri, Jan 20, 2017 at 10:21:16AM +0100, Szymon Janc wrote:
> Code is available at [1]. I'm not making it a pull request yet since
> I'd like to get some feedback about this approach from others. Also I
> still need to get tests passing since SM keys related tests fail now
> (and I'm not yet sure why).   I tested this with Android phone and
> both legacy and LE SC seems to work just fine.

(you may already know all this...)

By the way, you can run the unit tests using:

    newt run net/nimble/host/test

When you use the "run" command, the tests get run in gdb, and the first
test failure causes gdb to halt.  This is useful for debugging test
failures.

Some of the host unit tests are a bit... interesting to debug :).  I can
take a look at the failure if you like.

Chris

Re: [RFC] Reducing size of BLE Security Manager

Posted by Szymon Janc <sz...@codecoup.pl>.

Hi,

On 20 January 2017 at 21:42, will sanfilippo <wi...@runtime.io> wrote:
> Hopefully I am not going to drag this discussion on too long, but I like this stuff so…
>
> The cortex-M processors have byte, half-word and word instructions. It will use the appropriate instruction when you access bytes, half-words or words.
>
> For example, here is an excerpt of disassembled code. In this example, req and cp point to non-packed structures. These elements are 16-bits in their respective structures.
>
> /* Copy timeoput from cp to req */
> req->timeout = cp->timeout;
>     deb8:       88fa            ldrh    r2, [r7, #6]
>     deba:       811a           strh    r2, [r3, #8]
>
> This is the same code but with req being packed. Note that cp is not packed:
>
> /* Copy timeout from cp to req */
> req->timeout = cp->timeout;
>     df2a:       7998            ldrb    r0, [r3, #6]
>     df2c:       7010            strb    r0, [r2, #0]
>     df2e:       79db            ldrb    r3, [r3, #7]
>     df30:       7053            strb    r3, [r2, #1]
>
> Since the compiler cannot assume that the 16-bit value is aligned within the req structure, it has to use byte instructions to store the bytes. This gets even worse for unaligned 32-bit values. This is what I was trying to point out as one of the pitfalls of using packed structures re: code size.

Right, but this is pretty much the same as result of (x is 32bit int)

    u8ptr[0] = (uint8_t)x;
    u8ptr[1] = (uint8_t)(x >> 8);
    u8ptr[2] = (uint8_t)(x >> 16);
    u8ptr[3] = (uint8_t)(x >> 24);

which you cannot avoid when constructing PDU do be send over the wire
if not using packed structure for those.

For the record:
 https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/ARM-Options.html
-munaligned-access
-mno-unaligned-access
Enables (or disables) reading and writing of 16- and 32- bit values
from addresses that are not 16- or 32- bit aligned. By default
unaligned access is disabled for all pre-ARMv6 and all ARMv6-M
architectures, and enabled for all other architectures. If unaligned
access is not enabled then words in packed data structures are
accessed a byte at a time.

So I think we should be safe that GCC won't do any crazy things like
taking different code path depending on address being aligned or not.

>>
>> Maybe I wasn't clear enough, but those are suppose to be used *only*
>> for mapping them to/from memory buffer ie only accessed as pointers.
>> So above mentioned  mydata.e32 =50 is not suppose to happen ever.
>
>
> I think I understand what you are getting at here. I agree; you should limit the use of the packed structures to this.
>
> In the end, I really think it depends on what you do with these packed structures as to whether or not you will save space, but with your caveat above (how you use them), I agree that we will probably see code savings given the way the current code is written.
>
> All good! Fun discussion.

Indeed! :)

>
>> On Jan 20, 2017, at 11:45 AM, Szymon Janc <sz...@codecoup.pl> wrote:
>>
>> Hi,
>>
>> On 20 January 2017 at 19:14, Christopher Collins <cc...@apache.org> wrote:
>>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>>> I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).
>>>>
>>>> struct my_struct
>>>> {
>>>>      uint8_t e8;
>>>>      uint16_t e16;
>>>>      uint32_t e32;
>>>> } __packed__          /* I know this syntax is wrong, just an example */
>>>> struct my_struct my_data
>>>>
>>>> In your C code when you do this: my_data.e32 = 50, what is the
>>>> compiler going to do? If the structure is not packed, it knows it can
>>>> use an instruction that accesses words. If the structure is packed,
>>>> well, I guess it is up to the compiler what to do. In the past, I have
>>>> seen compilers that add code or call functions that will check the
>>>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>>>> will use a word instruction; if it happens to reside on a byte
>>>> boundary it needs to access the bytes individually to put them in a
>>>> register for use.
>>
>> Maybe I'm confusing something but isn't it that read from memory is always
>> word sized? Even if one access single byte?
>>
>>>
>>> I'm not really adding anything here, but here is something I realized
>>> recently.  When you tell gcc to pack a struct, it has two effects:
>>>
>>>    1. Eliminates padding.
>>>    2. Assumes instances of the struct are not properly aligned.
>>
>> Yes, that is main reason to use packed structures - to eliminate padding
>> and assume unaligned access.
>>
>> Maybe I wasn't clear enough, but those are suppose to be used *only*
>> for mapping them to/from memory buffer ie only accessed as pointers.
>> So above mentioned  mydata.e32 =50 is not suppose to happen ever.
>>
>>> For MCUs which don't support unaligned accesses, the second effect may
>>> carry some hidden costs. Even if the struct is defined such that it
>>> wouldn't contain any padding, and even if all instances of the struct
>>> are properly aligned, adding the __packed__ attribute will result in an
>>> increase in code size.  The increase occurs because gcc can no longer
>>> assume that the struct or any of its members are aligned.
>>
>> But how is that worse then reading/writing byte by byte? You need to read
>> whole word to access byte, right?
>>
>> I did quick test for nrf51dk (bletiny with SM enabled):
>> development branch      region `FLASH' overflowed by 26860 bytes
>> sm branch                    region `FLASH' overflowed by 25968 bytes
>>
>> :-)
>>
>> --
>> pozdrawiam
>> Szymon K. Janc
>



-- 
pozdrawiam
Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by will sanfilippo <wi...@runtime.io>.

Hopefully I am not going to drag this discussion on too long, but I like this stuff so…

The cortex-M processors have byte, half-word and word instructions. It will use the appropriate instruction when you access bytes, half-words or words.

For example, here is an excerpt of disassembled code. In this example, req and cp point to non-packed structures. These elements are 16-bits in their respective structures.

/* Copy timeoput from cp to req */
req->timeout = cp->timeout;
    deb8:       88fa            ldrh    r2, [r7, #6]
    deba:       811a           strh    r2, [r3, #8]

This is the same code but with req being packed. Note that cp is not packed:

/* Copy timeout from cp to req */
req->timeout = cp->timeout;
    df2a:       7998            ldrb    r0, [r3, #6]
    df2c:       7010            strb    r0, [r2, #0]
    df2e:       79db            ldrb    r3, [r3, #7]
    df30:       7053            strb    r3, [r2, #1]

Since the compiler cannot assume that the 16-bit value is aligned within the req structure, it has to use byte instructions to store the bytes. This gets even worse for unaligned 32-bit values. This is what I was trying to point out as one of the pitfalls of using packed structures re: code size.

> 
> Maybe I wasn't clear enough, but those are suppose to be used *only*
> for mapping them to/from memory buffer ie only accessed as pointers.
> So above mentioned  mydata.e32 =50 is not suppose to happen ever.


I think I understand what you are getting at here. I agree; you should limit the use of the packed structures to this.

In the end, I really think it depends on what you do with these packed structures as to whether or not you will save space, but with your caveat above (how you use them), I agree that we will probably see code savings given the way the current code is written.

All good! Fun discussion.

> On Jan 20, 2017, at 11:45 AM, Szymon Janc <sz...@codecoup.pl> wrote:
> 
> Hi,
> 
> On 20 January 2017 at 19:14, Christopher Collins <cc...@apache.org> wrote:
>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>> I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).
>>> 
>>> struct my_struct
>>> {
>>>      uint8_t e8;
>>>      uint16_t e16;
>>>      uint32_t e32;
>>> } __packed__          /* I know this syntax is wrong, just an example */
>>> struct my_struct my_data
>>> 
>>> In your C code when you do this: my_data.e32 = 50, what is the
>>> compiler going to do? If the structure is not packed, it knows it can
>>> use an instruction that accesses words. If the structure is packed,
>>> well, I guess it is up to the compiler what to do. In the past, I have
>>> seen compilers that add code or call functions that will check the
>>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>>> will use a word instruction; if it happens to reside on a byte
>>> boundary it needs to access the bytes individually to put them in a
>>> register for use.
> 
> Maybe I'm confusing something but isn't it that read from memory is always
> word sized? Even if one access single byte?
> 
>> 
>> I'm not really adding anything here, but here is something I realized
>> recently.  When you tell gcc to pack a struct, it has two effects:
>> 
>>    1. Eliminates padding.
>>    2. Assumes instances of the struct are not properly aligned.
> 
> Yes, that is main reason to use packed structures - to eliminate padding
> and assume unaligned access.
> 
> Maybe I wasn't clear enough, but those are suppose to be used *only*
> for mapping them to/from memory buffer ie only accessed as pointers.
> So above mentioned  mydata.e32 =50 is not suppose to happen ever.
> 
>> For MCUs which don't support unaligned accesses, the second effect may
>> carry some hidden costs. Even if the struct is defined such that it
>> wouldn't contain any padding, and even if all instances of the struct
>> are properly aligned, adding the __packed__ attribute will result in an
>> increase in code size.  The increase occurs because gcc can no longer
>> assume that the struct or any of its members are aligned.
> 
> But how is that worse then reading/writing byte by byte? You need to read
> whole word to access byte, right?
> 
> I did quick test for nrf51dk (bletiny with SM enabled):
> development branch      region `FLASH' overflowed by 26860 bytes
> sm branch                    region `FLASH' overflowed by 25968 bytes
> 
> :-)
> 
> -- 
> pozdrawiam
> Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by Szymon Janc <sz...@codecoup.pl>.

Hi,

On 20 January 2017 at 19:14, Christopher Collins <cc...@apache.org> wrote:
> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>> I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).
>>
>> struct my_struct
>> {
>>       uint8_t e8;
>>       uint16_t e16;
>>       uint32_t e32;
>> } __packed__          /* I know this syntax is wrong, just an example */
>> struct my_struct my_data
>>
>> In your C code when you do this: my_data.e32 = 50, what is the
>> compiler going to do? If the structure is not packed, it knows it can
>> use an instruction that accesses words. If the structure is packed,
>> well, I guess it is up to the compiler what to do. In the past, I have
>> seen compilers that add code or call functions that will check the
>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>> will use a word instruction; if it happens to reside on a byte
>> boundary it needs to access the bytes individually to put them in a
>> register for use.

Maybe I'm confusing something but isn't it that read from memory is always
word sized? Even if one access single byte?

>
> I'm not really adding anything here, but here is something I realized
> recently.  When you tell gcc to pack a struct, it has two effects:
>
>     1. Eliminates padding.
>     2. Assumes instances of the struct are not properly aligned.

Yes, that is main reason to use packed structures - to eliminate padding
and assume unaligned access.

Maybe I wasn't clear enough, but those are suppose to be used *only*
for mapping them to/from memory buffer ie only accessed as pointers.
So above mentioned  mydata.e32 =50 is not suppose to happen ever.

> For MCUs which don't support unaligned accesses, the second effect may
> carry some hidden costs. Even if the struct is defined such that it
> wouldn't contain any padding, and even if all instances of the struct
> are properly aligned, adding the __packed__ attribute will result in an
> increase in code size.  The increase occurs because gcc can no longer
> assume that the struct or any of its members are aligned.

But how is that worse then reading/writing byte by byte? You need to read
whole word to access byte, right?

I did quick test for nrf51dk (bletiny with SM enabled):
 development branch      region `FLASH' overflowed by 26860 bytes
 sm branch                    region `FLASH' overflowed by 25968 bytes

:-)

-- 
pozdrawiam
Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by will sanfilippo <wi...@runtime.io>.

I think it really comes down to what we want to get out of this. 

* If we are doing this for code readability, I think we all agree that the structure access is more readable than the other approach.
* If we are worried about RAM, packing the structures is better, especially the crazy way the BLE spec is written (no regard to alignment in their PDU’s at all).
* If we are worried about code size, well, not sure :-)
* If we are worried about speed, packing can only hurt. 

I do wonder though what this will do on the cortex-m0.


> On Jan 20, 2017, at 10:14 AM, Christopher Collins <cc...@apache.org> wrote:
> 
> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>> I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).
>> 
>> struct my_struct
>> {
>> 	uint8_t e8;
>> 	uint16_t e16;
>> 	uint32_t e32;
>> } __packed__		/* I know this syntax is wrong, just an example */
>> struct my_struct my_data
>> 
>> In your C code when you do this: my_data.e32 = 50, what is the
>> compiler going to do? If the structure is not packed, it knows it can
>> use an instruction that accesses words. If the structure is packed,
>> well, I guess it is up to the compiler what to do. In the past, I have
>> seen compilers that add code or call functions that will check the
>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>> will use a word instruction; if it happens to reside on a byte
>> boundary it needs to access the bytes individually to put them in a
>> register for use.
> 
> I'm not really adding anything here, but here is something I realized
> recently.  When you tell gcc to pack a struct, it has two effects:
> 
>    1. Eliminates padding.
>    2. Assumes instances of the struct are not properly aligned.
> 
> For MCUs which don't support unaligned accesses, the second effect may
> carry some hidden costs.  Even if the struct is defined such that it
> wouldn't contain any padding, and even if all instances of the struct
> are properly aligned, adding the __packed__ attribute will result in an
> increase in code size.  The increase occurs because gcc can no longer
> assume that the struct or any of its members are aligned.
> 
> Chris

Re: [RFC] Reducing size of BLE Security Manager

Posted by Christopher Collins <cc...@apache.org>.

On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
> I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).
> 
> struct my_struct
> {
> 	uint8_t e8;
> 	uint16_t e16;
> 	uint32_t e32;
> } __packed__		/* I know this syntax is wrong, just an example */
> struct my_struct my_data
> 
> In your C code when you do this: my_data.e32 = 50, what is the
> compiler going to do? If the structure is not packed, it knows it can
> use an instruction that accesses words. If the structure is packed,
> well, I guess it is up to the compiler what to do. In the past, I have
> seen compilers that add code or call functions that will check the
> alignment of e32. If e32 happens to reside on a 4-byte boundary it
> will use a word instruction; if it happens to reside on a byte
> boundary it needs to access the bytes individually to put them in a
> register for use.

I'm not really adding anything here, but here is something I realized
recently.  When you tell gcc to pack a struct, it has two effects:

    1. Eliminates padding.
    2. Assumes instances of the struct are not properly aligned.

For MCUs which don't support unaligned accesses, the second effect may
carry some hidden costs.  Even if the struct is defined such that it
wouldn't contain any padding, and even if all instances of the struct
are properly aligned, adding the __packed__ attribute will result in an
increase in code size.  The increase occurs because gcc can no longer
assume that the struct or any of its members are aligned.

Chris

Re: [RFC] Reducing size of BLE Security Manager

Posted by will sanfilippo <wi...@runtime.io>.

Szymon

> If you do byte by byte construction you do unaligned access too and
> compiler needs to handle it anyway:)

I was referring to C code that accesses a packed structure, not necessarily the construction part of it. For example: (and in this example I am assuming the processor can access bytes anywhere, 16-bit values on 16-bit boundaries and 32-bit values on 32-bit boundaries).

struct my_struct
{
	uint8_t e8;
	uint16_t e16;
	uint32_t e32;
} __packed__		/* I know this syntax is wrong, just an example */
struct my_struct my_data

In your C code when you do this: my_data.e32 = 50, what is the compiler going to do? If the structure is not packed, it knows it can use an instruction that accesses words. If the structure is packed, well, I guess it is up to the compiler what to do. In the past, I have seen compilers that add code or call functions that will check the alignment of e32. If e32 happens to reside on a 4-byte boundary it will use a word instruction; if it happens to reside on a byte boundary it needs to access the bytes individually to put them in a register for use.

BTW, you should compile that code for the M0. It would be interesting to see what the compiler does.


> On Jan 20, 2017, at 9:14 AM, Szymon Janc <sz...@codecoup.pl> wrote:
> 
> Hi Will,
> 
> On 20 January 2017 at 17:56, will sanfilippo <wi...@runtime.io> wrote:
>> I have mixed feelings about packed structures. For processors that cannot handle unaligned accesses I have always found that they increased code size. Every access of an element in that structure needs code to determine the alignment of that element.
> 
> If you do byte by byte construction you do unaligned access too and
> compiler needs to handle it anyway:)
> 
>> Sure, they save RAM, so if that is what you want then fine, but code size? When you did this code size comparison did you do it on a processor that handles unaligned access? This can also impact the speed at which the code runs although that is rarely an issue.
> 
> Size result I mentioned is from 'newt size'  FLASH column. Target is
> nrf52 DK with Cortex M4
>    app=@apache-mynewt-core/apps/bletiny
>    bsp=@apache-mynewt-core/hw/bsp/nrf52dk
>    build_profile=optimized
>    cflags=-DSTATS_NAME_ENABLE
> 
> So it does reduce ROM usage.
> 
>> 
>> About reducing copies. I am sure you know this, but folks should be careful doing something like mystruct = (struct mystruct *)om->om_data. You are not guaranteed that the data is contiguous so you better m_pullup first.
> 
> Yeap, actually I was thinking on having this wrapped with nice macro
> doing that for us.
> 
>> 
>> The controller does byte-by-byte copies and does not use packed structs. If we find that they generally svae code space we can modify that code as well.
> 
> yeap, if we agreed on using packed structures I'll work on other
> protocols too (including HCI and controller).
> 
> 
> One note from my side is that having packed structures with meaningful
> members name is much easier to read and understand without
> having to resort to spec.  (yes, this is personal opinion:P)
> 
>> 
>>> On Jan 20, 2017, at 8:21 AM, Christopher Collins <cc...@apache.org> wrote:
>>> 
>>> Hi Szymon,
>>> 
>>> On Fri, Jan 20, 2017 at 10:21:16AM +0100, Szymon Janc wrote:
>>>> Hi,
>>>> 
>>>> I was recently looking on how we could reduce size of SM code.
>>>> So my proposal is to change the way PDUs are parsed and constructed.
>>>> 
>>>> Instead of having ble_sm_foo_parse(), ble_sm_foo_write() and ble_sm_foo_tx()
>>>> for parsing and constructing PDU byte by byte we could use packed structures
>>>> for describing PDU and let compiler figure out details related to
>>>> unaligned access.
>>> [...]
>>> 
>>> I think that's a great idea.  The ATT code does something similar,
>>> though there is probably more work to be done there.  In my opinion,
>>> using packed structs for parsing and encoding doesn't just reduce code
>>> size, it also simplifies the code.
>>> 
>>> Chris
>> 
> 
> 
> 
> -- 
> pozdrawiam
> Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by Szymon Janc <sz...@codecoup.pl>.

Hi Will,

On 20 January 2017 at 17:56, will sanfilippo <wi...@runtime.io> wrote:
> I have mixed feelings about packed structures. For processors that cannot handle unaligned accesses I have always found that they increased code size. Every access of an element in that structure needs code to determine the alignment of that element.

If you do byte by byte construction you do unaligned access too and
compiler needs to handle it anyway:)

> Sure, they save RAM, so if that is what you want then fine, but code size? When you did this code size comparison did you do it on a processor that handles unaligned access? This can also impact the speed at which the code runs although that is rarely an issue.

Size result I mentioned is from 'newt size'  FLASH column. Target is
nrf52 DK with Cortex M4
    app=@apache-mynewt-core/apps/bletiny
    bsp=@apache-mynewt-core/hw/bsp/nrf52dk
    build_profile=optimized
    cflags=-DSTATS_NAME_ENABLE

So it does reduce ROM usage.

>
> About reducing copies. I am sure you know this, but folks should be careful doing something like mystruct = (struct mystruct *)om->om_data. You are not guaranteed that the data is contiguous so you better m_pullup first.

Yeap, actually I was thinking on having this wrapped with nice macro
doing that for us.

>
> The controller does byte-by-byte copies and does not use packed structs. If we find that they generally svae code space we can modify that code as well.

yeap, if we agreed on using packed structures I'll work on other
protocols too (including HCI and controller).


One note from my side is that having packed structures with meaningful
members name is much easier to read and understand without
having to resort to spec.  (yes, this is personal opinion:P)

>
>> On Jan 20, 2017, at 8:21 AM, Christopher Collins <cc...@apache.org> wrote:
>>
>> Hi Szymon,
>>
>> On Fri, Jan 20, 2017 at 10:21:16AM +0100, Szymon Janc wrote:
>>> Hi,
>>>
>>> I was recently looking on how we could reduce size of SM code.
>>> So my proposal is to change the way PDUs are parsed and constructed.
>>>
>>> Instead of having ble_sm_foo_parse(), ble_sm_foo_write() and ble_sm_foo_tx()
>>> for parsing and constructing PDU byte by byte we could use packed structures
>>> for describing PDU and let compiler figure out details related to
>>> unaligned access.
>> [...]
>>
>> I think that's a great idea.  The ATT code does something similar,
>> though there is probably more work to be done there.  In my opinion,
>> using packed structs for parsing and encoding doesn't just reduce code
>> size, it also simplifies the code.
>>
>> Chris
>



-- 
pozdrawiam
Szymon K. Janc

Re: [RFC] Reducing size of BLE Security Manager

Posted by will sanfilippo <wi...@runtime.io>.

I have mixed feelings about packed structures. For processors that cannot handle unaligned accesses I have always found that they increased code size. Every access of an element in that structure needs code to determine the alignment of that element. Sure, they save RAM, so if that is what you want then fine, but code size? When you did this code size comparison did you do it on a processor that handles unaligned access? This can also impact the speed at which the code runs although that is rarely an issue.

About reducing copies. I am sure you know this, but folks should be careful doing something like mystruct = (struct mystruct *)om->om_data. You are not guaranteed that the data is contiguous so you better m_pullup first.

The controller does byte-by-byte copies and does not use packed structs. If we find that they generally svae code space we can modify that code as well.

> On Jan 20, 2017, at 8:21 AM, Christopher Collins <cc...@apache.org> wrote:
> 
> Hi Szymon,
> 
> On Fri, Jan 20, 2017 at 10:21:16AM +0100, Szymon Janc wrote:
>> Hi,
>> 
>> I was recently looking on how we could reduce size of SM code.
>> So my proposal is to change the way PDUs are parsed and constructed.
>> 
>> Instead of having ble_sm_foo_parse(), ble_sm_foo_write() and ble_sm_foo_tx()
>> for parsing and constructing PDU byte by byte we could use packed structures
>> for describing PDU and let compiler figure out details related to
>> unaligned access.
> [...]
> 
> I think that's a great idea.  The ATT code does something similar,
> though there is probably more work to be done there.  In my opinion,
> using packed structs for parsing and encoding doesn't just reduce code
> size, it also simplifies the code.
> 
> Chris

Re: [RFC] Reducing size of BLE Security Manager

Posted by Christopher Collins <cc...@apache.org>.

Hi Szymon,

On Fri, Jan 20, 2017 at 10:21:16AM +0100, Szymon Janc wrote:
> Hi,
> 
> I was recently looking on how we could reduce size of SM code.
> So my proposal is to change the way PDUs are parsed and constructed.
> 
> Instead of having ble_sm_foo_parse(), ble_sm_foo_write() and ble_sm_foo_tx()
> for parsing and constructing PDU byte by byte we could use packed structures
> for describing PDU and let compiler figure out details related to
> unaligned access.
[...]

I think that's a great idea.  The ATT code does something similar,
though there is probably more work to be done there.  In my opinion,
using packed structs for parsing and encoding doesn't just reduce code
size, it also simplifies the code.

Chris