You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mynewt.apache.org by marko kiiskila <ma...@runtime.io> on 2018/07/26 12:10:10 UTC

FCB revamp

Hi,

there’s few issues with FCB which I need to fix.

1) Compressing flash sector in the middle of write.
Currently the cycle of operation is expected to be the following:
fcb_append() -> get offset to write data to
flash_area_write() -> write data
fcb_append_finish() -> writes CRC etc

This is not too great if the location returned by fcb_append() gets rotated to be
scratch area before fcb_append_finish() is called. Which is quite possible
if you’re using FCB to store logs, and data comes from different tasks.

2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
(log entries) is not good enough.

To fix 1 I was going to add linkage to fcb_entry such that FCB can track
of incomplete writes, and add a write routine which should be used when
doing the write.

Issue 2 is a little bit more cumbersome. I want to be able to at least
allow backwards compatibility, where code figures out from data in
the flash which CRC format is there. But then be able to switch to
16 bit CRC format as new flash areas are written. Or keep using
the old format, because you want to do downgrade to old SW version
if necessary.

So that’s what I’m currently thinking of trying to achieve. Let me know
if you guys have other things to take into consideration here.

Re: FCB revamp

Posted by marko kiiskila <ma...@runtime.io>.


> On Jul 26, 2018, at 3:10 PM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Hi,
> 
> there’s few issues with FCB which I need to fix.
> 
> 1) Compressing flash sector in the middle of write.
> Currently the cycle of operation is expected to be the following:
> fcb_append() -> get offset to write data to
> flash_area_write() -> write data
> fcb_append_finish() -> writes CRC etc
> 
> This is not too great if the location returned by fcb_append() gets rotated to be
> scratch area before fcb_append_finish() is called. Which is quite possible
> if you’re using FCB to store logs, and data comes from different tasks.

Note that this is not an issue when storing config in FCB; locking within
config module ascertains that this will not happen. But for log_fcb this
race condition exists. Which makes me lean towards not having the
write tracking inside FCB, but rather at a user of FCB. I.e. with logs it
would cancel writes where target is an area in the flash which was
erased from underneath.

> 
> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
> (log entries) is not good enough.
> 
> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
> of incomplete writes, and add a write routine which should be used when
> doing the write.
> 
> Issue 2 is a little bit more cumbersome. I want to be able to at least
> allow backwards compatibility, where code figures out from data in
> the flash which CRC format is there. But then be able to switch to
> 16 bit CRC format as new flash areas are written. Or keep using
> the old format, because you want to do downgrade to old SW version
> if necessary.
> 
> So that’s what I’m currently thinking of trying to achieve. Let me know
> if you guys have other things to take into consideration here.

Re: FCB revamp

Posted by will sanfilippo <wi...@runtime.io>.

Interesting; I see what this method does for you. I still think there is a fairly high chance of getting a false positive with a 1 byte crc but that might not be a huge issue.

Anyway, thanks for this!

Will

> On Jul 31, 2018, at 12:47 AM, Laczen JMS <la...@gmail.com> wrote:
> 
> Hi Marko and Will,
> 
> I have experienced the same problem while developing nvs for zephyr. I
> always got stuck when I would except that there would be changes to
> flash by an "external" factor. As soon as this could happen there is a
> problem with the meaning of the crc. The crc fails this means:
> a. Incomplete write,
> b. Some tampering occurred,
> c. The len is not pointing to the correct location,
> In all these cases the result will be that the data is not considered
> valid, the only case where a increased crc might be helping is in b.
> 
> In the latest version of nvs I have changed how the data is written to
> flash. The data itself is written from the start of a sector, a
> "allocation entry" is written from the end of a flash sector. This
> allocation entry contains a sector offset and length of the data
> written.  The allocation entry is of fixed size. The writing of data
> is done using the following method:
> 1. Check if there is sufficient space (difference between allocation
> entry location and next data location),
> 2. Write the data at the specific location, this could include a data crc,
> 3. Write the allocation entry, this includes a crc calculated over the
> item length and offset,
> 
> When there is insufficient space a allocation entry of all zeros is
> written at the end of the allocation entries, and a new sector is
> used.
> 
> At startup the allocation entry write position is found by stepping
> through the allocation entries until all 0xff are found. The data
> write position is found by searching the position from which all data
> up to the allocation entry write position is 0xff.
> 
> The disadvantage is of course that the allocation entries take up some
> space (only the offset is added).
> 
> KInd regards,
> 
> Jehudi
> 
> Op ma 30 jul. 2018 om 23:16 schreef Cufi, Carles <Ca...@nordicsemi.no>:
>> 
>> + Andrzej
>> 
>> On 30/07/2018, 18:45, "will sanfilippo" <wi...@runtime.io> wrote:
>> 
>>    I guess you could also use some error correction/detection coding scheme as well if you wanted to try and actually recover data that had errors in it (but that seems a bit much). Another thing I have seen done is that just the header has FEC on it. This way, you can almost always get the proper length. Again, not sure if any of this is warranted.
>> 
>>    Will
>>> On Jul 30, 2018, at 6:44 AM, marko kiiskila <ma...@runtime.io> wrote:
>>> 
>>> Hi,
>>> 
>>>> On Jul 30, 2018, at 1:47 PM, Laczen JMS <la...@gmail.com> wrote:
>>>> 
>>>> Hi Marko and Will,
>>>> 
>>>> I have been studying fcb and I think you can leave it at 8 bit as it
>>>> was. The crc can only be interpreted as a check that
>>>> the closing was done properly. As soon as the crc check fails this
>>>> only means that the closing failed. It could just as well
>>>> be fixed to zero instead of a crc.
>>> 
>>> That is a valid point. If you can’t trust your data path to flash, what can you trust?
>>> 
>>>> 
>>>> When writing errors (bit errors) would occur fcb would need a lot more
>>>> protection, the  length which is written first could
>>>> no longer be trusted and it would be impossible to find the crc. The
>>>> writing of the length can also be a more problematic
>>>> case to solve, what happens when the write of the length fails and the
>>>> length that is read is pointing outside the sector ?
>>> 
>>> On the other hand, there are flashes where multiple write cycles to
>>> same location are allowed; you can keep turning more bits to zero.
>>> There you can corrupt the data after writing. And on some the default,
>>> unwritten state is 0 (this we need to address for FCB, image management).
>>> 
>>> You’re right; adding mechanisms to detect corruption of length field, for
>>> example, starts to get complicated and costly. Recovery is easier to do,
>>> however, if we use a stronger version of CRC. I.e. if CRC does not match,
>>> then just go forward a byte at a time until it does.
>>> 
>>> 
>>>> 
>>>> Kind regards,
>>>> 
>>>> Jehudi
>>>> 
>>>> 
>>>> Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
>>>>> 
>>>>> Thanks for the response, Will.
>>>>> 
>>>>> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
>>>>> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
>>>>> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
>>>>> mechanism on a channel where we expect bit errors.
>>>>> 
>>>>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
>>>>> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
>>>>> there is no gain.
>>>>> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
>>>>> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
>>>>> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
>>>>> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
>>>>> option ‘force 8 bits’.
>>>>> 
>>>>> 
>>>>>> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
>>>>>> 
>>>>>> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
>>>>>> 
>>>>>> Will
>>>>>>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> there’s few issues with FCB which I need to fix.
>>>>>>> 
>>>>>>> 1) Compressing flash sector in the middle of write.
>>>>>>> Currently the cycle of operation is expected to be the following:
>>>>>>> fcb_append() -> get offset to write data to
>>>>>>> flash_area_write() -> write data
>>>>>>> fcb_append_finish() -> writes CRC etc
>>>>>>> 
>>>>>>> This is not too great if the location returned by fcb_append() gets rotated to be
>>>>>>> scratch area before fcb_append_finish() is called. Which is quite possible
>>>>>>> if you’re using FCB to store logs, and data comes from different tasks.
>>>>>>> 
>>>>>>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
>>>>>>> (log entries) is not good enough.
>>>>>>> 
>>>>>>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
>>>>>>> of incomplete writes, and add a write routine which should be used when
>>>>>>> doing the write.
>>>>>>> 
>>>>>>> Issue 2 is a little bit more cumbersome. I want to be able to at least
>>>>>>> allow backwards compatibility, where code figures out from data in
>>>>>>> the flash which CRC format is there. But then be able to switch to
>>>>>>> 16 bit CRC format as new flash areas are written. Or keep using
>>>>>>> the old format, because you want to do downgrade to old SW version
>>>>>>> if necessary.
>>>>>>> 
>>>>>>> So that’s what I’m currently thinking of trying to achieve. Let me know
>>>>>>> if you guys have other things to take into consideration here.
>>>>>> 
>>>>> 
>>> 
>> 
>> 
>>

Re: FCB revamp

Posted by Sterling Hughes <st...@gmail.com>.

Hi,

>>>
>>
>> I think (b) is more commonly failure of flash over time, and writes
>> and/or erases that fail.  Some times bits will be “stuck” on
>> flashes, especially inexpensive ones in high volume consumer products
>> that have no correction for this.  The probability that this will 
>> happen
>> across a population of millions of devices is probably pretty high, 
>> and
>> the bugs are hard to find.
>>
> The failure were bits will be stuck is both (b) and (c). There is no 
> reason
> why the stuck bits would not appear in the len data.

Yup, my concern is making sure that invalid data is not passed up to the 
higher layer, which I think would be the case with FCB now.  I think in 
production, over a population of devices, this will manifest itself as 
random crashes.

>>
> I guess by magic you mean some kind of recognizable start record, this
> would certainly limit the data you can store in fcb.
>
> Maybe we can team up to create a fcb2, taking some ideas from fcb and
> some of nvs and create a basis that is easily extended to different 
> kind of
> records to be stored in flash.
>

Definitely, I think this would be great.

> I am thinking about:
> a. using a allocation entry table at the end of a sector
> b. define the allocation entry as:
>     type + len + offset + ate crc (1 byte) + data crc (2 bytes)
>     where: type can be used to determine what is inside the data:
>         0: record without specific meaning (fcb today)
>         1: record contains data id (16 bit) followed by data
>         2: record contains data name (string finished by \0) followed 
> by data
> c. nvs and possibly other storage solutions are added on top of fcb2.
> d. write in such a way that streaming is allowed
>

I agree.  I think interleaving the various types is also important (as 
you reference above), as I think it would be good to share flash storage 
areas with one big FCB that could be used for config/other elements.

Sterling

Re: FCB revamp

Posted by Laczen JMS <la...@gmail.com>.

Hi Sterling,

Op di 31 jul. 2018 om 23:11 schreef Sterling Hughes
<st...@gmail.com>:
>
> Hi,
>
> On 31 Jul 2018, at 0:47, Laczen JMS wrote:
>
> > Hi Marko and Will,
> >
> > I have experienced the same problem while developing nvs for zephyr. I
> > always got stuck when I would except that there would be changes to
> > flash by an "external" factor. As soon as this could happen there is a
> > problem with the meaning of the crc. The crc fails this means:
> > a. Incomplete write,
> > b. Some tampering occurred,
> > c. The len is not pointing to the correct location,
> > In all these cases the result will be that the data is not considered
> > valid, the only case where a increased crc might be helping is in b.
>
> I think (b) is more commonly failure of flash over time, and writes
> and/or erases that fail.  Some times bits will be “stuck” on
> flashes, especially inexpensive ones in high volume consumer products
> that have no correction for this.  The probability that this will happen
> across a population of millions of devices is probably pretty high, and
> the bugs are hard to find.
>
The failure were bits will be stuck is both (b) and (c). There is no reason
why the stuck bits would not appear in the len data.
> >
> > In the latest version of nvs I have changed how the data is written to
> > flash. The data itself is written from the start of a sector, a
> > "allocation entry" is written from the end of a flash sector. This
> > allocation entry contains a sector offset and length of the data
> > written.  The allocation entry is of fixed size. The writing of data
> > is done using the following method:
> > 1. Check if there is sufficient space (difference between allocation
> > entry location and next data location),
> > 2. Write the data at the specific location, this could include a data
> > crc,
> > 3. Write the allocation entry, this includes a crc calculated over the
> > item length and offset,
> >
> > When there is insufficient space a allocation entry of all zeros is
> > written at the end of the allocation entries, and a new sector is
> > used.
> >
> > At startup the allocation entry write position is found by stepping
> > through the allocation entries until all 0xff are found. The data
> > write position is found by searching the position from which all data
> > up to the allocation entry write position is 0xff.
> >
>
> This is a good way of accomplishing this, another is to use a magic on
> every entry.  It would be interesting to allow streaming of entries
> using this approach — fixed length entries that contain type, offset,
> len — perhaps encoded more efficiently.  We’ve been discussing
> adding a flash store interface to Mynewt similar to NVS and one of the
> requirements is definitely to not require the full amount of RAM to
> write a larger BLOB entry to the flash.
I guess by magic you mean some kind of recognizable start record, this
would certainly limit the data you can store in fcb.

Maybe we can team up to create a fcb2, taking some ideas from fcb and
some of nvs and create a basis that is easily extended to different kind of
records to be stored in flash.

I am thinking about:
a. using a allocation entry table at the end of a sector
b. define the allocation entry as:
    type + len + offset + ate crc (1 byte) + data crc (2 bytes)
    where: type can be used to determine what is inside the data:
        0: record without specific meaning (fcb today)
        1: record contains data id (16 bit) followed by data
        2: record contains data name (string finished by \0) followed by data
c. nvs and possibly other storage solutions are added on top of fcb2.
d. write in such a way that streaming is allowed

Kind regards,

Jehudi
>
> Sterling

Re: FCB revamp

Posted by Sterling Hughes <st...@gmail.com>.

Hi,

On 31 Jul 2018, at 0:47, Laczen JMS wrote:

> Hi Marko and Will,
>
> I have experienced the same problem while developing nvs for zephyr. I
> always got stuck when I would except that there would be changes to
> flash by an "external" factor. As soon as this could happen there is a
> problem with the meaning of the crc. The crc fails this means:
> a. Incomplete write,
> b. Some tampering occurred,
> c. The len is not pointing to the correct location,
> In all these cases the result will be that the data is not considered
> valid, the only case where a increased crc might be helping is in b.

I think (b) is more commonly failure of flash over time, and writes 
and/or erases that fail.  Some times bits will be “stuck” on 
flashes, especially inexpensive ones in high volume consumer products 
that have no correction for this.  The probability that this will happen 
across a population of millions of devices is probably pretty high, and 
the bugs are hard to find.

>
> In the latest version of nvs I have changed how the data is written to
> flash. The data itself is written from the start of a sector, a
> "allocation entry" is written from the end of a flash sector. This
> allocation entry contains a sector offset and length of the data
> written.  The allocation entry is of fixed size. The writing of data
> is done using the following method:
> 1. Check if there is sufficient space (difference between allocation
> entry location and next data location),
> 2. Write the data at the specific location, this could include a data 
> crc,
> 3. Write the allocation entry, this includes a crc calculated over the
> item length and offset,
>
> When there is insufficient space a allocation entry of all zeros is
> written at the end of the allocation entries, and a new sector is
> used.
>
> At startup the allocation entry write position is found by stepping
> through the allocation entries until all 0xff are found. The data
> write position is found by searching the position from which all data
> up to the allocation entry write position is 0xff.
>

This is a good way of accomplishing this, another is to use a magic on 
every entry.  It would be interesting to allow streaming of entries 
using this approach — fixed length entries that contain type, offset, 
len — perhaps encoded more efficiently.  We’ve been discussing 
adding a flash store interface to Mynewt similar to NVS and one of the 
requirements is definitely to not require the full amount of RAM to 
write a larger BLOB entry to the flash.

Sterling

Re: FCB revamp

Posted by Laczen JMS <la...@gmail.com>.

Hi Marko and Will,

I have experienced the same problem while developing nvs for zephyr. I
always got stuck when I would except that there would be changes to
flash by an "external" factor. As soon as this could happen there is a
problem with the meaning of the crc. The crc fails this means:
a. Incomplete write,
b. Some tampering occurred,
c. The len is not pointing to the correct location,
In all these cases the result will be that the data is not considered
valid, the only case where a increased crc might be helping is in b.

In the latest version of nvs I have changed how the data is written to
flash. The data itself is written from the start of a sector, a
"allocation entry" is written from the end of a flash sector. This
allocation entry contains a sector offset and length of the data
written.  The allocation entry is of fixed size. The writing of data
is done using the following method:
1. Check if there is sufficient space (difference between allocation
entry location and next data location),
2. Write the data at the specific location, this could include a data crc,
3. Write the allocation entry, this includes a crc calculated over the
item length and offset,

When there is insufficient space a allocation entry of all zeros is
written at the end of the allocation entries, and a new sector is
used.

At startup the allocation entry write position is found by stepping
through the allocation entries until all 0xff are found. The data
write position is found by searching the position from which all data
up to the allocation entry write position is 0xff.

The disadvantage is of course that the allocation entries take up some
space (only the offset is added).

KInd regards,

Jehudi

Op ma 30 jul. 2018 om 23:16 schreef Cufi, Carles <Ca...@nordicsemi.no>:
>
> + Andrzej
>
> On 30/07/2018, 18:45, "will sanfilippo" <wi...@runtime.io> wrote:
>
>     I guess you could also use some error correction/detection coding scheme as well if you wanted to try and actually recover data that had errors in it (but that seems a bit much). Another thing I have seen done is that just the header has FEC on it. This way, you can almost always get the proper length. Again, not sure if any of this is warranted.
>
>     Will
>     > On Jul 30, 2018, at 6:44 AM, marko kiiskila <ma...@runtime.io> wrote:
>     >
>     > Hi,
>     >
>     >> On Jul 30, 2018, at 1:47 PM, Laczen JMS <la...@gmail.com> wrote:
>     >>
>     >> Hi Marko and Will,
>     >>
>     >> I have been studying fcb and I think you can leave it at 8 bit as it
>     >> was. The crc can only be interpreted as a check that
>     >> the closing was done properly. As soon as the crc check fails this
>     >> only means that the closing failed. It could just as well
>     >> be fixed to zero instead of a crc.
>     >
>     > That is a valid point. If you can’t trust your data path to flash, what can you trust?
>     >
>     >>
>     >> When writing errors (bit errors) would occur fcb would need a lot more
>     >> protection, the  length which is written first could
>     >> no longer be trusted and it would be impossible to find the crc. The
>     >> writing of the length can also be a more problematic
>     >> case to solve, what happens when the write of the length fails and the
>     >> length that is read is pointing outside the sector ?
>     >
>     > On the other hand, there are flashes where multiple write cycles to
>     > same location are allowed; you can keep turning more bits to zero.
>     > There you can corrupt the data after writing. And on some the default,
>     > unwritten state is 0 (this we need to address for FCB, image management).
>     >
>     > You’re right; adding mechanisms to detect corruption of length field, for
>     > example, starts to get complicated and costly. Recovery is easier to do,
>     > however, if we use a stronger version of CRC. I.e. if CRC does not match,
>     > then just go forward a byte at a time until it does.
>     >
>     >
>     >>
>     >> Kind regards,
>     >>
>     >> Jehudi
>     >>
>     >>
>     >> Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
>     >>>
>     >>> Thanks for the response, Will.
>     >>>
>     >>> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
>     >>> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
>     >>> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
>     >>> mechanism on a channel where we expect bit errors.
>     >>>
>     >>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
>     >>> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
>     >>> there is no gain.
>     >>> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
>     >>> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
>     >>> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
>     >>> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
>     >>> option ‘force 8 bits’.
>     >>>
>     >>>
>     >>>> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
>     >>>>
>     >>>> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
>     >>>>
>     >>>> Will
>     >>>>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
>     >>>>>
>     >>>>> Hi,
>     >>>>>
>     >>>>> there’s few issues with FCB which I need to fix.
>     >>>>>
>     >>>>> 1) Compressing flash sector in the middle of write.
>     >>>>> Currently the cycle of operation is expected to be the following:
>     >>>>> fcb_append() -> get offset to write data to
>     >>>>> flash_area_write() -> write data
>     >>>>> fcb_append_finish() -> writes CRC etc
>     >>>>>
>     >>>>> This is not too great if the location returned by fcb_append() gets rotated to be
>     >>>>> scratch area before fcb_append_finish() is called. Which is quite possible
>     >>>>> if you’re using FCB to store logs, and data comes from different tasks.
>     >>>>>
>     >>>>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
>     >>>>> (log entries) is not good enough.
>     >>>>>
>     >>>>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
>     >>>>> of incomplete writes, and add a write routine which should be used when
>     >>>>> doing the write.
>     >>>>>
>     >>>>> Issue 2 is a little bit more cumbersome. I want to be able to at least
>     >>>>> allow backwards compatibility, where code figures out from data in
>     >>>>> the flash which CRC format is there. But then be able to switch to
>     >>>>> 16 bit CRC format as new flash areas are written. Or keep using
>     >>>>> the old format, because you want to do downgrade to old SW version
>     >>>>> if necessary.
>     >>>>>
>     >>>>> So that’s what I’m currently thinking of trying to achieve. Let me know
>     >>>>> if you guys have other things to take into consideration here.
>     >>>>
>     >>>
>     >
>
>
>

Re: FCB revamp

Posted by "Cufi, Carles" <Ca...@nordicsemi.no>.

+ Andrzej

On 30/07/2018, 18:45, "will sanfilippo" <wi...@runtime.io> wrote:

    I guess you could also use some error correction/detection coding scheme as well if you wanted to try and actually recover data that had errors in it (but that seems a bit much). Another thing I have seen done is that just the header has FEC on it. This way, you can almost always get the proper length. Again, not sure if any of this is warranted.
    
    Will
    > On Jul 30, 2018, at 6:44 AM, marko kiiskila <ma...@runtime.io> wrote:
    > 
    > Hi,
    > 
    >> On Jul 30, 2018, at 1:47 PM, Laczen JMS <la...@gmail.com> wrote:
    >> 
    >> Hi Marko and Will,
    >> 
    >> I have been studying fcb and I think you can leave it at 8 bit as it
    >> was. The crc can only be interpreted as a check that
    >> the closing was done properly. As soon as the crc check fails this
    >> only means that the closing failed. It could just as well
    >> be fixed to zero instead of a crc.
    > 
    > That is a valid point. If you can’t trust your data path to flash, what can you trust?
    > 
    >> 
    >> When writing errors (bit errors) would occur fcb would need a lot more
    >> protection, the  length which is written first could
    >> no longer be trusted and it would be impossible to find the crc. The
    >> writing of the length can also be a more problematic
    >> case to solve, what happens when the write of the length fails and the
    >> length that is read is pointing outside the sector ?
    > 
    > On the other hand, there are flashes where multiple write cycles to
    > same location are allowed; you can keep turning more bits to zero.
    > There you can corrupt the data after writing. And on some the default,
    > unwritten state is 0 (this we need to address for FCB, image management).
    > 
    > You’re right; adding mechanisms to detect corruption of length field, for
    > example, starts to get complicated and costly. Recovery is easier to do,
    > however, if we use a stronger version of CRC. I.e. if CRC does not match,
    > then just go forward a byte at a time until it does.
    > 
    > 
    >> 
    >> Kind regards,
    >> 
    >> Jehudi
    >> 
    >> 
    >> Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
    >>> 
    >>> Thanks for the response, Will.
    >>> 
    >>> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
    >>> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
    >>> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
    >>> mechanism on a channel where we expect bit errors.
    >>> 
    >>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
    >>> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
    >>> there is no gain.
    >>> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
    >>> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
    >>> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
    >>> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
    >>> option ‘force 8 bits’.
    >>> 
    >>> 
    >>>> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
    >>>> 
    >>>> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
    >>>> 
    >>>> Will
    >>>>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
    >>>>> 
    >>>>> Hi,
    >>>>> 
    >>>>> there’s few issues with FCB which I need to fix.
    >>>>> 
    >>>>> 1) Compressing flash sector in the middle of write.
    >>>>> Currently the cycle of operation is expected to be the following:
    >>>>> fcb_append() -> get offset to write data to
    >>>>> flash_area_write() -> write data
    >>>>> fcb_append_finish() -> writes CRC etc
    >>>>> 
    >>>>> This is not too great if the location returned by fcb_append() gets rotated to be
    >>>>> scratch area before fcb_append_finish() is called. Which is quite possible
    >>>>> if you’re using FCB to store logs, and data comes from different tasks.
    >>>>> 
    >>>>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
    >>>>> (log entries) is not good enough.
    >>>>> 
    >>>>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
    >>>>> of incomplete writes, and add a write routine which should be used when
    >>>>> doing the write.
    >>>>> 
    >>>>> Issue 2 is a little bit more cumbersome. I want to be able to at least
    >>>>> allow backwards compatibility, where code figures out from data in
    >>>>> the flash which CRC format is there. But then be able to switch to
    >>>>> 16 bit CRC format as new flash areas are written. Or keep using
    >>>>> the old format, because you want to do downgrade to old SW version
    >>>>> if necessary.
    >>>>> 
    >>>>> So that’s what I’m currently thinking of trying to achieve. Let me know
    >>>>> if you guys have other things to take into consideration here.
    >>>> 
    >>> 
    >

Re: FCB revamp

Posted by will sanfilippo <wi...@runtime.io>.

I guess you could also use some error correction/detection coding scheme as well if you wanted to try and actually recover data that had errors in it (but that seems a bit much). Another thing I have seen done is that just the header has FEC on it. This way, you can almost always get the proper length. Again, not sure if any of this is warranted.

Will
> On Jul 30, 2018, at 6:44 AM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Hi,
> 
>> On Jul 30, 2018, at 1:47 PM, Laczen JMS <la...@gmail.com> wrote:
>> 
>> Hi Marko and Will,
>> 
>> I have been studying fcb and I think you can leave it at 8 bit as it
>> was. The crc can only be interpreted as a check that
>> the closing was done properly. As soon as the crc check fails this
>> only means that the closing failed. It could just as well
>> be fixed to zero instead of a crc.
> 
> That is a valid point. If you can’t trust your data path to flash, what can you trust?
> 
>> 
>> When writing errors (bit errors) would occur fcb would need a lot more
>> protection, the  length which is written first could
>> no longer be trusted and it would be impossible to find the crc. The
>> writing of the length can also be a more problematic
>> case to solve, what happens when the write of the length fails and the
>> length that is read is pointing outside the sector ?
> 
> On the other hand, there are flashes where multiple write cycles to
> same location are allowed; you can keep turning more bits to zero.
> There you can corrupt the data after writing. And on some the default,
> unwritten state is 0 (this we need to address for FCB, image management).
> 
> You’re right; adding mechanisms to detect corruption of length field, for
> example, starts to get complicated and costly. Recovery is easier to do,
> however, if we use a stronger version of CRC. I.e. if CRC does not match,
> then just go forward a byte at a time until it does.
> 
> 
>> 
>> Kind regards,
>> 
>> Jehudi
>> 
>> 
>> Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
>>> 
>>> Thanks for the response, Will.
>>> 
>>> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
>>> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
>>> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
>>> mechanism on a channel where we expect bit errors.
>>> 
>>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
>>> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
>>> there is no gain.
>>> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
>>> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
>>> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
>>> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
>>> option ‘force 8 bits’.
>>> 
>>> 
>>>> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
>>>> 
>>>> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
>>>> 
>>>> Will
>>>>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> there’s few issues with FCB which I need to fix.
>>>>> 
>>>>> 1) Compressing flash sector in the middle of write.
>>>>> Currently the cycle of operation is expected to be the following:
>>>>> fcb_append() -> get offset to write data to
>>>>> flash_area_write() -> write data
>>>>> fcb_append_finish() -> writes CRC etc
>>>>> 
>>>>> This is not too great if the location returned by fcb_append() gets rotated to be
>>>>> scratch area before fcb_append_finish() is called. Which is quite possible
>>>>> if you’re using FCB to store logs, and data comes from different tasks.
>>>>> 
>>>>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
>>>>> (log entries) is not good enough.
>>>>> 
>>>>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
>>>>> of incomplete writes, and add a write routine which should be used when
>>>>> doing the write.
>>>>> 
>>>>> Issue 2 is a little bit more cumbersome. I want to be able to at least
>>>>> allow backwards compatibility, where code figures out from data in
>>>>> the flash which CRC format is there. But then be able to switch to
>>>>> 16 bit CRC format as new flash areas are written. Or keep using
>>>>> the old format, because you want to do downgrade to old SW version
>>>>> if necessary.
>>>>> 
>>>>> So that’s what I’m currently thinking of trying to achieve. Let me know
>>>>> if you guys have other things to take into consideration here.
>>>> 
>>> 
>

Re: FCB revamp

Posted by marko kiiskila <ma...@runtime.io>.

Hi,

> On Jul 30, 2018, at 1:47 PM, Laczen JMS <la...@gmail.com> wrote:
> 
> Hi Marko and Will,
> 
> I have been studying fcb and I think you can leave it at 8 bit as it
> was. The crc can only be interpreted as a check that
> the closing was done properly. As soon as the crc check fails this
> only means that the closing failed. It could just as well
> be fixed to zero instead of a crc.

That is a valid point. If you can’t trust your data path to flash, what can you trust?

> 
> When writing errors (bit errors) would occur fcb would need a lot more
> protection, the  length which is written first could
> no longer be trusted and it would be impossible to find the crc. The
> writing of the length can also be a more problematic
> case to solve, what happens when the write of the length fails and the
> length that is read is pointing outside the sector ?

On the other hand, there are flashes where multiple write cycles to
same location are allowed; you can keep turning more bits to zero.
There you can corrupt the data after writing. And on some the default,
unwritten state is 0 (this we need to address for FCB, image management).

You’re right; adding mechanisms to detect corruption of length field, for
example, starts to get complicated and costly. Recovery is easier to do,
however, if we use a stronger version of CRC. I.e. if CRC does not match,
then just go forward a byte at a time until it does.


> 
> Kind regards,
> 
> Jehudi
> 
> 
> Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
>> 
>> Thanks for the response, Will.
>> 
>> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
>> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
>> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
>> mechanism on a channel where we expect bit errors.
>> 
>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
>> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
>> there is no gain.
>> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
>> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
>> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
>> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
>> option ‘force 8 bits’.
>> 
>> 
>>> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
>>> 
>>> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
>>> 
>>> Will
>>>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> there’s few issues with FCB which I need to fix.
>>>> 
>>>> 1) Compressing flash sector in the middle of write.
>>>> Currently the cycle of operation is expected to be the following:
>>>> fcb_append() -> get offset to write data to
>>>> flash_area_write() -> write data
>>>> fcb_append_finish() -> writes CRC etc
>>>> 
>>>> This is not too great if the location returned by fcb_append() gets rotated to be
>>>> scratch area before fcb_append_finish() is called. Which is quite possible
>>>> if you’re using FCB to store logs, and data comes from different tasks.
>>>> 
>>>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
>>>> (log entries) is not good enough.
>>>> 
>>>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
>>>> of incomplete writes, and add a write routine which should be used when
>>>> doing the write.
>>>> 
>>>> Issue 2 is a little bit more cumbersome. I want to be able to at least
>>>> allow backwards compatibility, where code figures out from data in
>>>> the flash which CRC format is there. But then be able to switch to
>>>> 16 bit CRC format as new flash areas are written. Or keep using
>>>> the old format, because you want to do downgrade to old SW version
>>>> if necessary.
>>>> 
>>>> So that’s what I’m currently thinking of trying to achieve. Let me know
>>>> if you guys have other things to take into consideration here.
>>> 
>>

Re: FCB revamp

Posted by Laczen JMS <la...@gmail.com>.

Hi Marko and Will,

I have been studying fcb and I think you can leave it at 8 bit as it
was. The crc can only be interpreted as a check that
the closing was done properly. As soon as the crc check fails this
only means that the closing failed. It could just as well
be fixed to zero instead of a crc.

When writing errors (bit errors) would occur fcb would need a lot more
protection, the  length which is written first could
no longer be trusted and it would be impossible to find the crc. The
writing of the length can also be a more problematic
case to solve, what happens when the write of the length fails and the
length that is read is pointing outside the sector ?

Kind regards,

Jehudi


Op ma 30 jul. 2018 om 11:10 schreef marko kiiskila <ma...@runtime.io>:
>
> Thanks for the response, Will.
>
> I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
> it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
> onlly happen on crash/power outage within a specific time window. This is not used as an error detection
> mechanism on a channel where we expect bit errors.
>
> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
> really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
> there is no gain.
> There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
> has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
> So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
> a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
> option ‘force 8 bits’.
>
>
> > On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
> >
> > I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
> >
> > Will
> >> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
> >>
> >> Hi,
> >>
> >> there’s few issues with FCB which I need to fix.
> >>
> >> 1) Compressing flash sector in the middle of write.
> >> Currently the cycle of operation is expected to be the following:
> >> fcb_append() -> get offset to write data to
> >> flash_area_write() -> write data
> >> fcb_append_finish() -> writes CRC etc
> >>
> >> This is not too great if the location returned by fcb_append() gets rotated to be
> >> scratch area before fcb_append_finish() is called. Which is quite possible
> >> if you’re using FCB to store logs, and data comes from different tasks.
> >>
> >> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
> >> (log entries) is not good enough.
> >>
> >> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
> >> of incomplete writes, and add a write routine which should be used when
> >> doing the write.
> >>
> >> Issue 2 is a little bit more cumbersome. I want to be able to at least
> >> allow backwards compatibility, where code figures out from data in
> >> the flash which CRC format is there. But then be able to switch to
> >> 16 bit CRC format as new flash areas are written. Or keep using
> >> the old format, because you want to do downgrade to old SW version
> >> if necessary.
> >>
> >> So that’s what I’m currently thinking of trying to achieve. Let me know
> >> if you guys have other things to take into consideration here.
> >
>

Re: FCB revamp

Posted by marko kiiskila <ma...@runtime.io>.

Thanks for the response, Will.

I made it one-byte because the scenario where the CRC check strength comes into play is somewhat rare;
it is to detect partial writes. I.e. when fcb_append_finish() fails to get called. This, presumably, would
onlly happen on crash/power outage within a specific time window. This is not used as an error detection
mechanism on a channel where we expect bit errors.

The way I did it was I added 2 syscfg knobs to control which CRC is included in the build. In case you get
really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled,
there is no gain.
There’s 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region
has not been initialized with anything, then the ‘inherit’ option will prefer 16 bits over 8 bits.
So if you need to worry about backwards compatibility, set ‘inherit’. If you want to use 16 bit, and are flashing
a new device, use ‘inherit’ or set ‘force 16bits’. If you need to keep FCB region backwards compatible, use
option ‘force 8 bits’.

> On Jul 27, 2018, at 11:05 PM, will sanfilippo <wi...@runtime.io> wrote:
> 
> I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).
> 
> Will
>> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
>> 
>> Hi,
>> 
>> there’s few issues with FCB which I need to fix.
>> 
>> 1) Compressing flash sector in the middle of write.
>> Currently the cycle of operation is expected to be the following:
>> fcb_append() -> get offset to write data to
>> flash_area_write() -> write data
>> fcb_append_finish() -> writes CRC etc
>> 
>> This is not too great if the location returned by fcb_append() gets rotated to be
>> scratch area before fcb_append_finish() is called. Which is quite possible
>> if you’re using FCB to store logs, and data comes from different tasks.
>> 
>> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
>> (log entries) is not good enough.
>> 
>> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
>> of incomplete writes, and add a write routine which should be used when
>> doing the write.
>> 
>> Issue 2 is a little bit more cumbersome. I want to be able to at least
>> allow backwards compatibility, where code figures out from data in
>> the flash which CRC format is there. But then be able to switch to
>> 16 bit CRC format as new flash areas are written. Or keep using
>> the old format, because you want to do downgrade to old SW version
>> if necessary.
>> 
>> So that’s what I’m currently thinking of trying to achieve. Let me know
>> if you guys have other things to take into consideration here.
>

Re: FCB revamp

Posted by will sanfilippo <wi...@runtime.io>.

I realize I am a bit late with these comments but better late than never I guess. Well, maybe some or all of these comments should never be made :-) This could be a horrible idea but why not just make it backward incompatible? (for #2). Maybe have a syscfg for folks that want to use the old format in the code? I would even consider making all entries have a two-byte CRC (regardless of length of data written). I presume the resume that it was one-byte was to save flash space? Heck, maybe even allow the syscfg to specify a 32-bit CRC (for those with lots of flash and/or really do not want a false positive to occur).

Will
> On Jul 26, 2018, at 5:10 AM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Hi,
> 
> there’s few issues with FCB which I need to fix.
> 
> 1) Compressing flash sector in the middle of write.
> Currently the cycle of operation is expected to be the following:
> fcb_append() -> get offset to write data to
> flash_area_write() -> write data
> fcb_append_finish() -> writes CRC etc
> 
> This is not too great if the location returned by fcb_append() gets rotated to be
> scratch area before fcb_append_finish() is called. Which is quite possible
> if you’re using FCB to store logs, and data comes from different tasks.
> 
> 2) CRC is 1 byte only. It’s ok for short config entries, but for longer things
> (log entries) is not good enough.
> 
> To fix 1 I was going to add linkage to fcb_entry such that FCB can track
> of incomplete writes, and add a write routine which should be used when
> doing the write.
> 
> Issue 2 is a little bit more cumbersome. I want to be able to at least
> allow backwards compatibility, where code figures out from data in
> the flash which CRC format is there. But then be able to switch to
> 16 bit CRC format as new flash areas are written. Or keep using
> the old format, because you want to do downgrade to old SW version
> if necessary.
> 
> So that’s what I’m currently thinking of trying to achieve. Let me know
> if you guys have other things to take into consideration here.