You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@nuttx.apache.org by "resyfer (via GitHub)" <gi...@apache.org> on 2024/03/23 15:10:05 UTC

[I] [Discussion] NAND flash file system [nuttx]

resyfer opened a new issue, #11979:
URL: https://github.com/apache/nuttx/issues/11979

Hi, as part of GSoC '24, I wish to propose a file system to support NAND flash file systems, that would take into account the challenges of an RTOS, limited memory of embdedded systems, and the practical range of devices used in these systems.

The file system I have in mind, what I call mnemofs, is not something entirely new. It's a heavily modified version of [littlefs](https://github.com/littlefs-project/littlefs) that has been modified to suit NAND flash needs like bad block management.

It will be quite long, and I would really love suggestions and constructive criticism on this, and I would be grateful if you could give this a read.

## NAND Flash vs NOR Flash

While [this page](https://www.embedded.com/flash-101-nand-flash-vs-nor-flash/) and [this page](https://www.baeldung.com/cs/flash-memory-nor-vs-nand) sum it up pretty well, these are the advantages and disadvantages of NAND flash in a nutshell:
- Advantages: Less size, (about 40%) less cost per byte, faster writes/erases.
- Disadvantages: Slower reads, bad blocks right from production, higher chance of bit flipping, higher standby power requirement.

## Mnemofs

### Behavior
COnsidering the reasoning behind similar file systems like [`UBIFS`](http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf) and [`littlefs`](https://github.com/littlefs-project/littlefs/blob/master/DESIGN.md), mnemofs will also be a Copy-On-Write file system. This is accentuated by the fact that a block needs to be erased before a page can be *updated* in it, thus requiring out-of-place updates.

### Journal
To facilitate atomic updates, which in turn promotes power-loss resilience, a journal is required.

The journal in mnemofs will be of `n+2` blocks and will be in the form of a circular singly linked list. It's sufficient to be a singly linked list, any case where traversal is required also requires traversal from the start (for eg. to recreate a file with its updates).

The first `n` blocks will store updates, followed by their checksums (VERY similar to `littlefs`). The function of the last two blocks is explained later.

Due to the limitations of journaling file systems, the journal will move once it has faced enough of P/E cycles (Program/Erase cycles). Again, the moving of the journal is explained later.

The journal will be committed after it's full, or any log that needs to be inserted can be written to it.

### Master Node & Master Blocks

The directories will be, as usual, in a tree structure. The master node is the root node. As mnemofs is CoW, once the journal is being committed, the tree will be updated from the bottom by making copies with the updated values. It will be done to prevent multiple updates to the inner.

This means that, like demonstrated in [littlefs design document](https://github.com/littlefs-project/littlefs/blob/master/DESIGN.md#existing-designs), the root will be the last to get updated, which is the master node.

Every update of the master node will be written to one of the last two blocks of the journal (alternatively). There are two blocks at the end of the journal, called the master blocks, as the smallest erasable unit is one block. To prevent traversal for the master nodes, the head of the journal will have the values of the master blocks.

The master node contains a pointer to the journal, while the journal, through this mechanism, contains pointers to the master node.

### Moving journal

Moving of the journal will happen when the blocks the journal is on wears down to the upper limit. Then, the entire journal will be moved. To prevent the constraints of a contiguous space, a linked list structure was used.

The journal will be copied by first copying the latest change such that block `n+1` block has 1 log, while `n+2` block has at most 1 log. Then the head of the journal will be copied, and then the rest of the block.

This would mean that the master block will be updated after moving of journal to point to the new journal. Thus a new log will be put into the **new** journal for this update.

### Power-Loss Resilience

The journal, plus maintaining the old master node, will lead to power-loss resilience as the old tree can be reconstructed upon mount again.

Thus it will look something like this:
![image](https://github.com/apache/nuttx/assets/74897008/4f091ffc-406e-4226-b51a-e48adfef2ae2)

### Block Allocator

Blocks will have a 2 bit wear value, ie. 0-3. Once all blocks hit 3, they will all be reset.

There is a need for a block allocator that has 1 job. It gives the most suitable block to write to whenever asked. The storage will be divided into groups of 2^14 blocks, called chunks. The chunk wear will be a total of the wear of blocks inside it.

Averages are:
|Chunk Wear Ranges|Chunk Average Wear|
|---|---|
|0|0|
|(0, 16384]|1|
|(16384, 32768]|2|
|(32768, 49152]|3|

The block allocator will have 3 separate data structures, but all of them will share the same node structure.

![image11](https://github.com/apache/nuttx/assets/74897008/886deb64-504d-4a88-a66d-3cdf01673886)

The free chunks are completely free chunks, and are sorted in a min-heap according to their wear levels. They are written in the image below like `chunk_no (wear)`.

![image12](https://github.com/apache/nuttx/assets/74897008/f8c3148a-0d84-461b-abe6-06b3a6d91351)

Then partially filled chunks will be sorted in a min-heap using their wear levels, as well as a red-black tree (using a different set of points) using their chunk numbers for easy searching.

![image7](https://github.com/apache/nuttx/assets/74897008/0d996402-0ebe-4389-ba9b-bb05678e0cab)

The final section, the fully filled chunks, will be sorted according to their chunk numbers as a red-black tree.

![image8](https://github.com/apache/nuttx/assets/74897008/3475d7e1-43d5-41bd-8887-a54d2dc70e14)

The block allocator will use up all of the free chunks first, and then the partially filled chunks.

#### Blocks in a chunk

Status of blocks will be in [Roaring bitmap(s)](https://roaringbitmap.org/). The graphs in the first comment shows the comparisons.

### Data

#### Directories and direntries

Directories will be inaccessible-to-user files containing directory entries.

Each file name and directory name will be hashed to an integer to reduce the string comparisons. It will be done such that lexically similar strings will have different hashes, so that for hash collisions, string comparison is quick.

Graph in first comment for comparison of a personally devised hash-search.

#### Files

On-flash files will be arranges in a CTZ skip list like in littlefs.

There will be a global LRU for changes to files and directories. When a node is removed from the LRU, it will be written to the journal. Thus, to recreate a portion of the file after change, the LRU needs to be read, then the journal, and then finally the flash.

This is quicker than reading the journal or pages directly from flash at the expense of some RAM.

## mkmnemofs

This will format the flash to the file system's needs. It will be a standalone utility as well as part of the mount options.

## Conclusion

This is mnemofs. This was quite a lot, and I hope you believe me, my original attempt was around 19 pages worth of information. Thus, if you need clarification for any part, or if you find any glaring holes in this, please do let me know. I can then update this as well. Thank you in advance!

## Glaring Limitations

- This follows in the footsteps of JFFS2 to be a memory hog, and uses quite a bit of memory, especially as it linearly increases.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [I] [Discussion] NAND flash file system [nuttx]

Posted by "acassis (via GitHub)" <gi...@apache.org>.

acassis commented on issue #11979:
URL: https://github.com/apache/nuttx/issues/11979#issuecomment-2016948921

   @resyfer wow!!! Amazing work!
   
   @xiaoxiang781216 could you or someone from Xiaomi with previous experience with NAND file system review this proposal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [I] [Discussion] NAND flash file system [nuttx]

Posted by "xiaoxiang781216 (via GitHub)" <gi...@apache.org>.

xiaoxiang781216 commented on issue #11979:
URL: https://github.com/apache/nuttx/issues/11979#issuecomment-2017091222

   @Donny9 could you review the plan?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [I] [Discussion] NAND flash file system [nuttx]

Posted by "resyfer (via GitHub)" <gi...@apache.org>.

resyfer commented on issue #11979:
URL: https://github.com/apache/nuttx/issues/11979#issuecomment-2016522644

   # Graphs
   
   ## Block status
   <img src='https://github.com/apache/nuttx/assets/74897008/4c8df51c-077a-4be7-84d9-6122caf320c7' width='250'>
   <img src='https://github.com/apache/nuttx/assets/74897008/64dbe6f0-892b-4e2b-83dc-0ee80edea33d' width='250'>
   <br/>
   <img src='https://github.com/apache/nuttx/assets/74897008/8263a837-0e1e-4676-b29f-5b49510d52a1' width='250'>
   <img src='https://github.com/apache/nuttx/assets/74897008/97d4bdc6-e875-4f17-a9f3-07590e4754eb' width='250'>
   
   ## Hash Search
   <img src='https://github.com/apache/nuttx/assets/74897008/4ea1aed4-2830-49fc-87c4-532c20f55793' width='250'>
   <img src='https://github.com/apache/nuttx/assets/74897008/adc3b158-921f-4ec7-9578-da16e65c3f11' width='250'>
   
   ## Randomness
   
   POSIX random function slightly modified:
   
   <img src='https://github.com/apache/nuttx/assets/74897008/f5912d0c-8344-42e5-9798-b91494dd2412' width='250'>
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org