You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mynewt.apache.org by Will San Filippo <wi...@micosa.net> on 2017/04/26 01:51:13 UTC

Problems with erasing flash on nordic devices

Hello:

Recently there has been some discussion around image upload, erasing flash, and connection supervision timeouts. We have recently noticed a case where it appears that due to erasing the flash the connection will time out (supervision timeout). Changing the supervision timeout, slave latency, etc will not guarantee success (although may make it less likely to occur).

We are currently evaluating fixes for this issue so in the meantime just be aware that this could occur.

For those who want the gory details:

What I think is happening is the following. A connection event starts at the correct time (it is not delayed by the flash erase) but at some point prior to getting the ADDRESS event a flash erase starts. The ADDRESS event is used internally to capture a timer value which records the start of the PDU. A slave will reset its anchor point when it receives a PDU as long as the access address matches. So what I think is happening is that the ADDRESS event and timer capture gets delayed because the CPU is halted during the flash erase causing the anchor point to get reset to an invalid time. The slave (peripheral) then wakes up at the incorrect time for all connection events from that point forward causing an eventual supervision timeout.

I realize that some might say to just drop the PDU if the timing is off. This would probably go a long way to making it much less likely to occur but there is still a chance that the connection will time out. The only way to guarantee this not occurring is to synchronize flash erase with radio events (something the current nimble stack does not do).

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Hrm. I do believe that works, though I admit I dont understand it at all.

Chasing a bunch of other problems while I confirm though.

Perhaps a radio comitter can chime in?

After an erase command the device disconnects (as expected as during the
flash erase the ble radio isnt serviced) and the logs say it successfully
returns to advertising, but it never shows up. (bleprph and blesplit
standard upstream)

Oddly If I break and step through bleprph_advertise it does advertise, so
its like a timing issue, the radio isnt ready yet or something...




On Fri, May 26, 2017 at 4:42 PM, Simon Ratner <si...@proxy.co> wrote:

> Sorry, not what i meant at all (I get terser as midnight approaches, ha!)
>
> You are only reading 64 bytes from flash, which is 16 32-bit ints, so your
> result is entirely expected. The rest of the data array is uninitialised.
> Correct code would look something like this:
>
> uint32_t data[64 >> 2];
> ...
> bytes_to_read = min(64, fa->fa_size - data_off);
> flash_area_read(fa, data_off, data, bytes_to_read);
> ...
> for (i = 0; i < bytes_to_read >> 2; i++){
>   if (data[i] != (uint32_t) -1) {
>     goto not_empty;
>   }
> }
> for (i = i << 2; i < bytes_to_read; i++) {
>   if (*(((uint8_t *) data) + i) != (uint8_t) -1) {
>     goto not_empty;
>   }
> }
> ...
>
>
>
> On Fri, May 26, 2017 at 11:09 AM, Jacob Rosenthal <jakerosenthal@gmail.com
> >
> wrote:
>
> > Just trying to cargo cult what chris said. whats appropriate? uint32_t?
> >
> > Could be related, but what Im seeing is
> >
> > data after flash_area_read before erase
> >
> > (gdb) p data
> > $1 = {3735928559 <repeats 30 times>, 126616, 46299, 1, 45733, 1, 145408,
> > 112640, 112640, 128076, 103483, 1, 127960, 536872908, 43465, 4294967295,
> > 4294967295, 4294967295,
> >   4294967295, 1, 536872908, 4294967295, 43503, 1, 127960, 127948,
> > 536872908, 127960, 536872963, 112640, 536872960, 46289, 109467, 103572,
> > 16777728}
> > (gdb) p data
> > $2 = {2532554812, 36, 32, 860, 18, 0, 0, 0, 536887296, 145449,
> 4085258752,
> > 4085286932, 553879568, 1744979993, 1610760970, 1744979992, 3735928559
> > <repeats 14 times>, 126616,
> >   46299, 1, 45733, 1, 145408, 112640, 112640, 128076, 103483, 1, 127960,
> > 536872908, 43465, 4294967295, 4294967295, 4294967295, 4294967295, 1,
> > 536872908, 4294967295, 43503, 1,
> >   127960, 127948, 536872908, 127960, 536872963, 112640, 536872960, 46289,
> > 109467, 103576, 1090519552}
> >
> >
> > data after flash_area_read  after erase
> >
> >
> > $2 = {4294967295 <repeats 16 times>, 536874064, 536874068, 536883092,
> > 536873880, 0, 38233, 536883088, 536874048, 536874048, 536873534,
> 536876440,
> > 111639, 2, 536874048, 126612,
> >   46299, 1, 45733, 1, 145408, 112640, 112640, 126612, 46299, 32, 45733,
> 32,
> > 145408, 32, 0, 536873624, 103483, 32, 0, 536873688, 108847, 1, 127956,
> > 536883076, 536878212,
> >   127956, 536873687, 2, 127968, 46289, 109155, 103576, 1090519552}
> > (gdb)
> >
> > seems like it successfully erases only 16 uint32s
> >
> >
> > On Thu, May 25, 2017 at 11:49 PM, Simon Ratner <si...@proxy.co> wrote:
> >
> > > At a glance, you are confusing bytes with ints (data is declared as
> > > unsigned int), so for the most part you are comparing uninitialised
> stack
> > > memory.
> > >
> > > Hth,
> > >
> > > simon
> > >
> > > On Thu, May 25, 2017 at 11:04 PM, Jacob Rosenthal <
> > jakerosenthal@gmail.com
> > > >
> > > wrote:
> > >
> > > > Pushed something like you described, new flash_area_is_empty function
> > in
> > > > flash_map
> > > > https://github.com/apache/incubator-mynewt-core/pull/281
> > > >
> > > > Currently doesnt actually work.. either my code sucks or the erase
> > doesnt
> > > > actually ever succeed on nrf51 is is leaving behind non 0xff bytes..
> > not
> > > > sure which
> > > >
> > > > On Tue, May 16, 2017 at 5:26 PM, Christopher Collins <
> chris@runtime.io
> > >
> > > > wrote:
> > > >
> > > > > On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> > > > > > Seems like it could go in flash_map?
> > > > > >
> > > > > > Its also been requested to also have a function that detects if
> an
> > > area
> > > > > > should be erased, based on it having any non 0xff bytes within.
> > > > > >
> > > > > > Should that go there as well.
> > > > >
> > > > > That is the same thing, isn't it?
> > > > >
> > > > > > Whats a good method to loop through all bytes of a flash area?
> > > > >
> > > > > Perhaps something like the following:
> > > > >
> > > > > 1. Allocate a buffer that is properly aligned to be accessed by an
> > > > > unsigned int*.  It may take some testing to
> > > > > determine a good size that balances speed and size efficiency; I
> > would
> > > > > start with 64 bytes.  I would also start by putting this buffer on
> > the
> > > > > stack.
> > > > >
> > > > > 2. Read a chunk of flash into the buffer
> > > > > (min(size-of-buffer, num-unread-bytes).
> > > > >
> > > > > 3. Use an unsigned int* to iterate through the buffer.  If the
> > pointer
> > > > > points to something other than (unsigned int)-1, then the flash is
> > not
> > > > > erased.
> > > > >
> > > > > 4. If there are any remaining bytes (i.e., number of bytes read is
> > not
> > > a
> > > > > multiple of sizeof(unsigned int)), compare them individually
> against
> > > > > 0xff using a uint8_t*.
> > > > >
> > > > > Chris
> > > > >
> > > >
> > >
> >
>

Re: Problems with erasing flash on nordic devices

Posted by Simon Ratner <si...@proxy.co>.
Sorry, not what i meant at all (I get terser as midnight approaches, ha!)

You are only reading 64 bytes from flash, which is 16 32-bit ints, so your
result is entirely expected. The rest of the data array is uninitialised.
Correct code would look something like this:

uint32_t data[64 >> 2];
...
bytes_to_read = min(64, fa->fa_size - data_off);
flash_area_read(fa, data_off, data, bytes_to_read);
...
for (i = 0; i < bytes_to_read >> 2; i++){
  if (data[i] != (uint32_t) -1) {
    goto not_empty;
  }
}
for (i = i << 2; i < bytes_to_read; i++) {
  if (*(((uint8_t *) data) + i) != (uint8_t) -1) {
    goto not_empty;
  }
}
...



On Fri, May 26, 2017 at 11:09 AM, Jacob Rosenthal <ja...@gmail.com>
wrote:

> Just trying to cargo cult what chris said. whats appropriate? uint32_t?
>
> Could be related, but what Im seeing is
>
> data after flash_area_read before erase
>
> (gdb) p data
> $1 = {3735928559 <repeats 30 times>, 126616, 46299, 1, 45733, 1, 145408,
> 112640, 112640, 128076, 103483, 1, 127960, 536872908, 43465, 4294967295,
> 4294967295, 4294967295,
>   4294967295, 1, 536872908, 4294967295, 43503, 1, 127960, 127948,
> 536872908, 127960, 536872963, 112640, 536872960, 46289, 109467, 103572,
> 16777728}
> (gdb) p data
> $2 = {2532554812, 36, 32, 860, 18, 0, 0, 0, 536887296, 145449, 4085258752,
> 4085286932, 553879568, 1744979993, 1610760970, 1744979992, 3735928559
> <repeats 14 times>, 126616,
>   46299, 1, 45733, 1, 145408, 112640, 112640, 128076, 103483, 1, 127960,
> 536872908, 43465, 4294967295, 4294967295, 4294967295, 4294967295, 1,
> 536872908, 4294967295, 43503, 1,
>   127960, 127948, 536872908, 127960, 536872963, 112640, 536872960, 46289,
> 109467, 103576, 1090519552}
>
>
> data after flash_area_read  after erase
>
>
> $2 = {4294967295 <repeats 16 times>, 536874064, 536874068, 536883092,
> 536873880, 0, 38233, 536883088, 536874048, 536874048, 536873534, 536876440,
> 111639, 2, 536874048, 126612,
>   46299, 1, 45733, 1, 145408, 112640, 112640, 126612, 46299, 32, 45733, 32,
> 145408, 32, 0, 536873624, 103483, 32, 0, 536873688, 108847, 1, 127956,
> 536883076, 536878212,
>   127956, 536873687, 2, 127968, 46289, 109155, 103576, 1090519552}
> (gdb)
>
> seems like it successfully erases only 16 uint32s
>
>
> On Thu, May 25, 2017 at 11:49 PM, Simon Ratner <si...@proxy.co> wrote:
>
> > At a glance, you are confusing bytes with ints (data is declared as
> > unsigned int), so for the most part you are comparing uninitialised stack
> > memory.
> >
> > Hth,
> >
> > simon
> >
> > On Thu, May 25, 2017 at 11:04 PM, Jacob Rosenthal <
> jakerosenthal@gmail.com
> > >
> > wrote:
> >
> > > Pushed something like you described, new flash_area_is_empty function
> in
> > > flash_map
> > > https://github.com/apache/incubator-mynewt-core/pull/281
> > >
> > > Currently doesnt actually work.. either my code sucks or the erase
> doesnt
> > > actually ever succeed on nrf51 is is leaving behind non 0xff bytes..
> not
> > > sure which
> > >
> > > On Tue, May 16, 2017 at 5:26 PM, Christopher Collins <chris@runtime.io
> >
> > > wrote:
> > >
> > > > On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> > > > > Seems like it could go in flash_map?
> > > > >
> > > > > Its also been requested to also have a function that detects if an
> > area
> > > > > should be erased, based on it having any non 0xff bytes within.
> > > > >
> > > > > Should that go there as well.
> > > >
> > > > That is the same thing, isn't it?
> > > >
> > > > > Whats a good method to loop through all bytes of a flash area?
> > > >
> > > > Perhaps something like the following:
> > > >
> > > > 1. Allocate a buffer that is properly aligned to be accessed by an
> > > > unsigned int*.  It may take some testing to
> > > > determine a good size that balances speed and size efficiency; I
> would
> > > > start with 64 bytes.  I would also start by putting this buffer on
> the
> > > > stack.
> > > >
> > > > 2. Read a chunk of flash into the buffer
> > > > (min(size-of-buffer, num-unread-bytes).
> > > >
> > > > 3. Use an unsigned int* to iterate through the buffer.  If the
> pointer
> > > > points to something other than (unsigned int)-1, then the flash is
> not
> > > > erased.
> > > >
> > > > 4. If there are any remaining bytes (i.e., number of bytes read is
> not
> > a
> > > > multiple of sizeof(unsigned int)), compare them individually against
> > > > 0xff using a uint8_t*.
> > > >
> > > > Chris
> > > >
> > >
> >
>

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Just trying to cargo cult what chris said. whats appropriate? uint32_t?

Could be related, but what Im seeing is

data after flash_area_read before erase

(gdb) p data
$1 = {3735928559 <repeats 30 times>, 126616, 46299, 1, 45733, 1, 145408,
112640, 112640, 128076, 103483, 1, 127960, 536872908, 43465, 4294967295,
4294967295, 4294967295,
  4294967295, 1, 536872908, 4294967295, 43503, 1, 127960, 127948,
536872908, 127960, 536872963, 112640, 536872960, 46289, 109467, 103572,
16777728}
(gdb) p data
$2 = {2532554812, 36, 32, 860, 18, 0, 0, 0, 536887296, 145449, 4085258752,
4085286932, 553879568, 1744979993, 1610760970, 1744979992, 3735928559
<repeats 14 times>, 126616,
  46299, 1, 45733, 1, 145408, 112640, 112640, 128076, 103483, 1, 127960,
536872908, 43465, 4294967295, 4294967295, 4294967295, 4294967295, 1,
536872908, 4294967295, 43503, 1,
  127960, 127948, 536872908, 127960, 536872963, 112640, 536872960, 46289,
109467, 103576, 1090519552}


data after flash_area_read  after erase


$2 = {4294967295 <repeats 16 times>, 536874064, 536874068, 536883092,
536873880, 0, 38233, 536883088, 536874048, 536874048, 536873534, 536876440,
111639, 2, 536874048, 126612,
  46299, 1, 45733, 1, 145408, 112640, 112640, 126612, 46299, 32, 45733, 32,
145408, 32, 0, 536873624, 103483, 32, 0, 536873688, 108847, 1, 127956,
536883076, 536878212,
  127956, 536873687, 2, 127968, 46289, 109155, 103576, 1090519552}
(gdb)

seems like it successfully erases only 16 uint32s


On Thu, May 25, 2017 at 11:49 PM, Simon Ratner <si...@proxy.co> wrote:

> At a glance, you are confusing bytes with ints (data is declared as
> unsigned int), so for the most part you are comparing uninitialised stack
> memory.
>
> Hth,
>
> simon
>
> On Thu, May 25, 2017 at 11:04 PM, Jacob Rosenthal <jakerosenthal@gmail.com
> >
> wrote:
>
> > Pushed something like you described, new flash_area_is_empty function in
> > flash_map
> > https://github.com/apache/incubator-mynewt-core/pull/281
> >
> > Currently doesnt actually work.. either my code sucks or the erase doesnt
> > actually ever succeed on nrf51 is is leaving behind non 0xff bytes.. not
> > sure which
> >
> > On Tue, May 16, 2017 at 5:26 PM, Christopher Collins <ch...@runtime.io>
> > wrote:
> >
> > > On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> > > > Seems like it could go in flash_map?
> > > >
> > > > Its also been requested to also have a function that detects if an
> area
> > > > should be erased, based on it having any non 0xff bytes within.
> > > >
> > > > Should that go there as well.
> > >
> > > That is the same thing, isn't it?
> > >
> > > > Whats a good method to loop through all bytes of a flash area?
> > >
> > > Perhaps something like the following:
> > >
> > > 1. Allocate a buffer that is properly aligned to be accessed by an
> > > unsigned int*.  It may take some testing to
> > > determine a good size that balances speed and size efficiency; I would
> > > start with 64 bytes.  I would also start by putting this buffer on the
> > > stack.
> > >
> > > 2. Read a chunk of flash into the buffer
> > > (min(size-of-buffer, num-unread-bytes).
> > >
> > > 3. Use an unsigned int* to iterate through the buffer.  If the pointer
> > > points to something other than (unsigned int)-1, then the flash is not
> > > erased.
> > >
> > > 4. If there are any remaining bytes (i.e., number of bytes read is not
> a
> > > multiple of sizeof(unsigned int)), compare them individually against
> > > 0xff using a uint8_t*.
> > >
> > > Chris
> > >
> >
>

Re: Problems with erasing flash on nordic devices

Posted by Simon Ratner <si...@proxy.co>.
At a glance, you are confusing bytes with ints (data is declared as
unsigned int), so for the most part you are comparing uninitialised stack
memory.

Hth,

simon

On Thu, May 25, 2017 at 11:04 PM, Jacob Rosenthal <ja...@gmail.com>
wrote:

> Pushed something like you described, new flash_area_is_empty function in
> flash_map
> https://github.com/apache/incubator-mynewt-core/pull/281
>
> Currently doesnt actually work.. either my code sucks or the erase doesnt
> actually ever succeed on nrf51 is is leaving behind non 0xff bytes.. not
> sure which
>
> On Tue, May 16, 2017 at 5:26 PM, Christopher Collins <ch...@runtime.io>
> wrote:
>
> > On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> > > Seems like it could go in flash_map?
> > >
> > > Its also been requested to also have a function that detects if an area
> > > should be erased, based on it having any non 0xff bytes within.
> > >
> > > Should that go there as well.
> >
> > That is the same thing, isn't it?
> >
> > > Whats a good method to loop through all bytes of a flash area?
> >
> > Perhaps something like the following:
> >
> > 1. Allocate a buffer that is properly aligned to be accessed by an
> > unsigned int*.  It may take some testing to
> > determine a good size that balances speed and size efficiency; I would
> > start with 64 bytes.  I would also start by putting this buffer on the
> > stack.
> >
> > 2. Read a chunk of flash into the buffer
> > (min(size-of-buffer, num-unread-bytes).
> >
> > 3. Use an unsigned int* to iterate through the buffer.  If the pointer
> > points to something other than (unsigned int)-1, then the flash is not
> > erased.
> >
> > 4. If there are any remaining bytes (i.e., number of bytes read is not a
> > multiple of sizeof(unsigned int)), compare them individually against
> > 0xff using a uint8_t*.
> >
> > Chris
> >
>

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Pushed something like you described, new flash_area_is_empty function in
flash_map
https://github.com/apache/incubator-mynewt-core/pull/281

Currently doesnt actually work.. either my code sucks or the erase doesnt
actually ever succeed on nrf51 is is leaving behind non 0xff bytes.. not
sure which

On Tue, May 16, 2017 at 5:26 PM, Christopher Collins <ch...@runtime.io>
wrote:

> On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> > Seems like it could go in flash_map?
> >
> > Its also been requested to also have a function that detects if an area
> > should be erased, based on it having any non 0xff bytes within.
> >
> > Should that go there as well.
>
> That is the same thing, isn't it?
>
> > Whats a good method to loop through all bytes of a flash area?
>
> Perhaps something like the following:
>
> 1. Allocate a buffer that is properly aligned to be accessed by an
> unsigned int*.  It may take some testing to
> determine a good size that balances speed and size efficiency; I would
> start with 64 bytes.  I would also start by putting this buffer on the
> stack.
>
> 2. Read a chunk of flash into the buffer
> (min(size-of-buffer, num-unread-bytes).
>
> 3. Use an unsigned int* to iterate through the buffer.  If the pointer
> points to something other than (unsigned int)-1, then the flash is not
> erased.
>
> 4. If there are any remaining bytes (i.e., number of bytes read is not a
> multiple of sizeof(unsigned int)), compare them individually against
> 0xff using a uint8_t*.
>
> Chris
>

Re: Problems with erasing flash on nordic devices

Posted by Christopher Collins <ch...@runtime.io>.
On Tue, May 16, 2017 at 05:11:36PM -0700, Jacob Rosenthal wrote:
> Seems like it could go in flash_map?
> 
> Its also been requested to also have a function that detects if an area
> should be erased, based on it having any non 0xff bytes within.
> 
> Should that go there as well.

That is the same thing, isn't it?

> Whats a good method to loop through all bytes of a flash area?

Perhaps something like the following:

1. Allocate a buffer that is properly aligned to be accessed by an
unsigned int*.  It may take some testing to
determine a good size that balances speed and size efficiency; I would
start with 64 bytes.  I would also start by putting this buffer on the
stack.

2. Read a chunk of flash into the buffer
(min(size-of-buffer, num-unread-bytes).

3. Use an unsigned int* to iterate through the buffer.  If the pointer
points to something other than (unsigned int)-1, then the flash is not
erased.

4. If there are any remaining bytes (i.e., number of bytes read is not a
multiple of sizeof(unsigned int)), compare them individually against
0xff using a uint8_t*.

Chris

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Seems like it could go in flash_map?

Its also been requested to also have a function that detects if an area
should be erased, based on it having any non 0xff bytes within.

Should that go there as well. Whats a good method to loop through all bytes
of a flash area?

On Tue, May 16, 2017 at 11:01 AM, Christopher Collins <ch...@runtime.io>
wrote:

> On Tue, May 16, 2017 at 09:31:11AM -0700, marko kiiskila wrote:
> > Maybe we could add hw/hal/src/hal_flash.c:hal_flash_erase_if_needed()?
> > Which would erase slots which have any non-0xff bytes, and then use it
> from
> > image erase.
>
> Good idea.  It might be best if that function were added to a package
> other than the HAL, though, since it can be implemented using existing
> HAL functions.
>
> Chris
>

Re: Problems with erasing flash on nordic devices

Posted by Christopher Collins <ch...@runtime.io>.
On Tue, May 16, 2017 at 09:31:11AM -0700, marko kiiskila wrote:
> Maybe we could add hw/hal/src/hal_flash.c:hal_flash_erase_if_needed()?
> Which would erase slots which have any non-0xff bytes, and then use it from
> image erase.

Good idea.  It might be best if that function were added to a package
other than the HAL, though, since it can be implemented using existing
HAL functions.

Chris

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Something like that might make sense as theres a good bit of duplicated
code locating the proper flash area the PR I sent
https://github.com/apache/incubator-mynewt-core/pull/281

On Tue, May 16, 2017 at 9:31 AM, marko kiiskila <ma...@runtime.io> wrote:

>
> > On May 15, 2017, at 3:46 PM, Christopher Collins <ch...@runtime.io>
> wrote:
> >
> > Hi Jacob,
> >
> > On Mon, May 15, 2017 at 02:51:35PM -0700, Jacob Rosenthal wrote:
> >> So Ill duplicate the upload command to an erase command.
> >>
> >> Is there a bit of metadata I could check to see if a slot was filled and
> >> erase was necessary? If so I could conditionally erase in the upload
> >> command still, for backwards compat reasons.
> >
> > No, I'm afraid there is no way to distinguish an empty slot from a
> > partially written one.  We would need to add a new newtmgr command for
> > that (or add an option to "image list").
> >
> > Just to be clear, there also isn't an erase command at the moment.  This
> > still needs to be added.
> >
>
> Maybe we could add hw/hal/src/hal_flash.c:hal_flash_erase_if_needed()?
> Which would erase slots which have any non-0xff bytes, and then use it from
> image erase.
>

Re: Problems with erasing flash on nordic devices

Posted by marko kiiskila <ma...@runtime.io>.
> On May 15, 2017, at 3:46 PM, Christopher Collins <ch...@runtime.io> wrote:
> 
> Hi Jacob,
> 
> On Mon, May 15, 2017 at 02:51:35PM -0700, Jacob Rosenthal wrote:
>> So Ill duplicate the upload command to an erase command.
>> 
>> Is there a bit of metadata I could check to see if a slot was filled and
>> erase was necessary? If so I could conditionally erase in the upload
>> command still, for backwards compat reasons.
> 
> No, I'm afraid there is no way to distinguish an empty slot from a
> partially written one.  We would need to add a new newtmgr command for
> that (or add an option to "image list").
> 
> Just to be clear, there also isn't an erase command at the moment.  This
> still needs to be added.
> 

Maybe we could add hw/hal/src/hal_flash.c:hal_flash_erase_if_needed()?
Which would erase slots which have any non-0xff bytes, and then use it from
image erase.

Re: Problems with erasing flash on nordic devices

Posted by Christopher Collins <ch...@runtime.io>.
Hi Jacob,

On Mon, May 15, 2017 at 02:51:35PM -0700, Jacob Rosenthal wrote:
> So Ill duplicate the upload command to an erase command.
> 
> Is there a bit of metadata I could check to see if a slot was filled and
> erase was necessary? If so I could conditionally erase in the upload
> command still, for backwards compat reasons.

No, I'm afraid there is no way to distinguish an empty slot from a
partially written one.  We would need to add a new newtmgr command for
that (or add an option to "image list").

Just to be clear, there also isn't an erase command at the moment.  This
still needs to be added.

Chris

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
So Ill duplicate the upload command to an erase command.

Is there a bit of metadata I could check to see if a slot was filled and
erase was necessary? If so I could conditionally erase in the upload
command still, for backwards compat reasons.

On Fri, May 5, 2017 at 6:10 PM, Christopher Collins <ch...@runtime.io>
wrote:

> On Fri, May 05, 2017 at 05:58:52PM -0700, Jacob Rosenthal wrote:
> > Thats basically what Im saying. Though  if you just do #2 I wont need a
> > separate erase command. It'll erase and disconnect the first time, but
> not
> > the second time. Ill get an erase command for free.
>
> I don't think that will work without some more intelligence in the image
> upload command handler.  Here is the typical sequence of events:
>
> 1. Image upload (offset=0) command received.
> 2. Command handler erases second image slot.
> 3. BLE connection drops during erase.
> 4. Command handler writes first part of image to beginning of second
>    slot.
>
> If you try to reconnect and upload again, the firmware will need to
> erase the second slot once again, as the first image part has already
> been written.  By separating the flash erase from the write, you can
> ensure the slot is empty before starting the upload.
>
> The extra "intelligence" I mentioned would be the command handler
> comparing the flash contents against the data contained in the upload
> request.  If the contents are identical, the firmware could skip the
> erase and act like it just wrote the image segment (or indicate to the
> client that it should resume the upgrade at a particular offset).
>
> Chris
>

Re: Problems with erasing flash on nordic devices

Posted by Christopher Collins <ch...@runtime.io>.
On Fri, May 05, 2017 at 05:58:52PM -0700, Jacob Rosenthal wrote:
> Thats basically what Im saying. Though  if you just do #2 I wont need a
> separate erase command. It'll erase and disconnect the first time, but not
> the second time. Ill get an erase command for free.

I don't think that will work without some more intelligence in the image
upload command handler.  Here is the typical sequence of events:

1. Image upload (offset=0) command received.
2. Command handler erases second image slot.
3. BLE connection drops during erase.
4. Command handler writes first part of image to beginning of second
   slot.

If you try to reconnect and upload again, the firmware will need to
erase the second slot once again, as the first image part has already
been written.  By separating the flash erase from the write, you can
ensure the slot is empty before starting the upload.

The extra "intelligence" I mentioned would be the command handler
comparing the flash contents against the data contained in the upload
request.  If the contents are identical, the firmware could skip the
erase and act like it just wrote the image segment (or indicate to the
client that it should resume the upgrade at a particular offset).

Chris

Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Thats basically what Im saying. Though  if you just do #2 I wont need a
separate erase command. It'll erase and disconnect the first time, but not
the second time. Ill get an erase command for free.

On Fri, May 5, 2017 at 5:41 PM, will sanfilippo <wi...@runtime.io> wrote:

> 1) Add a command to erase the unused image (the image that is not running).
> 2) Modify the code such that it only erases the flash if it is not already
> erased.
>

Re: Problems with erasing flash on nordic devices

Posted by will sanfilippo <wi...@runtime.io>.
Jacob:

I think we might be talking past each other here. I do understand that the code is auto-erasing on upload. Let’s face it: if you want to upload a new image you are gonna have to erase the one that is there. Sure, there may be a case where it is not there or is already the one you want, but that is not going to be the norm.

So to be a bit more concise, here is what I think would have to be done:

1) Add a command to erase the unused image (the image that is not running).
2) Modify the code such that it only erases the flash if it is not already erased.

The application would then have to deal with reconnecting after any disconnects.


> On May 5, 2017, at 5:34 PM, Jacob Rosenthal <ja...@gmail.com> wrote:
> 
> Im not erasing. It is auto erasing on upload. So I cant upload.
> 
> On Fri, May 5, 2017 at 5:32 PM, will sanfilippo <wi...@runtime.io> wrote:
> 
>> BTW: there is an image list command that will tell you if there is an
>> image that you need to erase.
>> 


Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Im not erasing. It is auto erasing on upload. So I cant upload.

On Fri, May 5, 2017 at 5:32 PM, will sanfilippo <wi...@runtime.io> wrote:

> BTW: there is an image list command that will tell you if there is an
> image that you need to erase.
>

Re: Problems with erasing flash on nordic devices

Posted by will sanfilippo <wi...@runtime.io>.
So I left something out of my proposed work-around. The code would have to be modified such that if the page/sector was already erased the image upload would not blindly erase it.

BTW: there is an image list command that will tell you if there is an image that you need to erase.


> On May 5, 2017, at 5:15 PM, Jacob Rosenthal <ja...@gmail.com> wrote:
> 
> You've hit on my easy solution exactly. If we only erase when theres an
> image present to be erase, then I can just reconnect after disconnect. Its
> not fixed, but itll do.
> 
> *Currently It is not possible to update an nrf51 device over ble.  *When I
> reconnect, it erases again, and again drops the connection.
> 
> So perhaps theres a good way to determine if an image is present and if an
> erase is necessary?
> 
> 
> 
> On Fri, May 5, 2017 at 5:02 PM, will sanfilippo <wi...@runtime.io> wrote:
> 
>> Anyway, there have been thoughts on this but no solution has been decided
>> upon. For now your best bet is to simply reconnect if the connection drops
>> (imo).
>> 


Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
You've hit on my easy solution exactly. If we only erase when theres an
image present to be erase, then I can just reconnect after disconnect. Its
not fixed, but itll do.

*Currently It is not possible to update an nrf51 device over ble.  *When I
reconnect, it erases again, and again drops the connection.

So perhaps theres a good way to determine if an image is present and if an
erase is necessary?



On Fri, May 5, 2017 at 5:02 PM, will sanfilippo <wi...@runtime.io> wrote:

> Anyway, there have been thoughts on this but no solution has been decided
> upon. For now your best bet is to simply reconnect if the connection drops
> (imo).
>

Re: Problems with erasing flash on nordic devices

Posted by will sanfilippo <wi...@runtime.io>.
What is the simple solution to which you are referring? Not erasing if you do not need to? Or what was mentioned in the thread regarding discarding a PDU if the time it arrived does not seem to make sense? As I am sure you realize, neither of those will guarantee a fix.

Anyway, there have been thoughts on this but no solution has been decided upon. For now your best bet is to simply reconnect if the connection drops (imo).


> On May 5, 2017, at 4:23 PM, Jacob Rosenthal <ja...@gmail.com> wrote:
> 
> Any thoughts on this?
> 
> Also, Theres a simple solution while a more complex one is discussed.
> The current code erases every time no matter what, theres even a code
> comment there about it.
> https://github.com/apache/incubator-mynewt-core/blob/afa6d53254cbf444a3f44cc1851f0b038227edb6/mgmt/imgmgr/src/imgmgr.c#L322
> 
> The update process already includes at least one reset and reconnect (after
> test/confirm), so adding another reconnect after first firmware upload
> resets as it erases is 'fine'.
> 
> Im not sure how to get started on that myself. Thoughts on this?
> 
> 
> On Tue, Apr 25, 2017 at 6:51 PM, Will San Filippo <wi...@micosa.net> wrote:
> 
>> Hello:
>> 
>> Recently there has been some discussion around image upload, erasing
>> flash, and connection supervision timeouts. We have recently noticed a case
>> where it appears that due to erasing the flash the connection will time out
>> (supervision timeout). Changing the supervision timeout, slave latency, etc
>> will not guarantee success (although may make it less likely to occur).
>> 
>> We are currently evaluating fixes for this issue so in the meantime just
>> be aware that this could occur.
>> 
>> For those who want the gory details:
>> 
>> What I think is happening is the following. A connection event starts at
>> the correct time (it is not delayed by the flash erase) but at some point
>> prior to getting the ADDRESS event a flash erase starts. The ADDRESS event
>> is used internally to capture a timer value which records the start of the
>> PDU. A slave will reset its anchor point when it receives a PDU as long as
>> the access address matches. So what I think is happening is that the
>> ADDRESS event and timer capture gets delayed because the CPU is halted
>> during the flash erase causing the anchor point to get reset to an invalid
>> time. The slave (peripheral) then wakes up at the incorrect time for all
>> connection events from that point forward causing an eventual supervision
>> timeout.
>> 
>> I realize that some might say to just drop the PDU if the timing is off.
>> This would probably go a long way to making it much less likely to occur
>> but there is still a chance that the connection will time out. The only way
>> to guarantee this not occurring is to synchronize flash erase with radio
>> events (something the current nimble stack does not do).


Re: Problems with erasing flash on nordic devices

Posted by Jacob Rosenthal <ja...@gmail.com>.
Any thoughts on this?

Also, Theres a simple solution while a more complex one is discussed.
The current code erases every time no matter what, theres even a code
comment there about it.
https://github.com/apache/incubator-mynewt-core/blob/afa6d53254cbf444a3f44cc1851f0b038227edb6/mgmt/imgmgr/src/imgmgr.c#L322

The update process already includes at least one reset and reconnect (after
test/confirm), so adding another reconnect after first firmware upload
resets as it erases is 'fine'.

Im not sure how to get started on that myself. Thoughts on this?


On Tue, Apr 25, 2017 at 6:51 PM, Will San Filippo <wi...@micosa.net> wrote:

> Hello:
>
> Recently there has been some discussion around image upload, erasing
> flash, and connection supervision timeouts. We have recently noticed a case
> where it appears that due to erasing the flash the connection will time out
> (supervision timeout). Changing the supervision timeout, slave latency, etc
> will not guarantee success (although may make it less likely to occur).
>
> We are currently evaluating fixes for this issue so in the meantime just
> be aware that this could occur.
>
> For those who want the gory details:
>
> What I think is happening is the following. A connection event starts at
> the correct time (it is not delayed by the flash erase) but at some point
> prior to getting the ADDRESS event a flash erase starts. The ADDRESS event
> is used internally to capture a timer value which records the start of the
> PDU. A slave will reset its anchor point when it receives a PDU as long as
> the access address matches. So what I think is happening is that the
> ADDRESS event and timer capture gets delayed because the CPU is halted
> during the flash erase causing the anchor point to get reset to an invalid
> time. The slave (peripheral) then wakes up at the incorrect time for all
> connection events from that point forward causing an eventual supervision
> timeout.
>
> I realize that some might say to just drop the PDU if the timing is off.
> This would probably go a long way to making it much less likely to occur
> but there is still a chance that the connection will time out. The only way
> to guarantee this not occurring is to synchronize flash erase with radio
> events (something the current nimble stack does not do).