File System Paper
Filesystem options for the Netbook LX
During the development of the Netbook LX some filesystem issues were brought up:
- A lot of stuff needs to be packed into the NAND flash
- During development a CF card needs to be used as the root filesystem and this interacts badly with suspend/resume.
- Whilst the machine won't be rebooted very often, long disk checks are undesirable and should be avoided where possible.
- Both CF and NAND flash have a quite limited number of write cycles and the devices should remain working for years, rather than days or weeks.
- The system needs to be resistant to corruption when batteries go flat/power fails
- There should be a way of integrating extra applications on CF or MMC with the base system on NAND.
- Users need to store their files and data somewhere.
- User data is really important and it should be highest priority not to lose it whatever happens.
- Users will remove removable media whenever they feel like it - this should be dealt with gracefully, _always_
- We can't easily change BooSt to read different filesystems
- We are expecting 20,000 warm boots but < 10 cold boots over 4 years.
There are a number of different filesystems, bits of hardware, 'layout' issues and kernel issues which interact in a complicated fashion. This document describes a system layout that meets the needs listed above. Background discussion on the hardware and fundamentals is now in File System Background.
Here are what seem to be plausible schemes for laying out our filesystems. Most issues are now resolved, but there will no doubt be a few wrinkles still to deal with.
There are two plausible layouts:
- Running from flash (JFFS2) with some stuff in RAM to reduce writes.
- Loading a system image from flash, then running entirely in RAM.
In many ways these tend towards a similar setup but the fundamental differences are:
Using a writable filesystem on the flash:
- Allows system updates - security updates, users can add and remove applications to get the set they want in the flash.
- Only gives about 1.7x compression, thus reducing the base application set.
Using a read-only filesystem on the flash:
- Means that system updates must be monolithic - complete flash image re-write, fatal if it fails.
- Gives better compression (2-2.5x?), allowing a larger base application set.
- Any security updates/app changes applied in RAM would be lost on cold boot - confusing to the user who thinks they've done them.
We decided to favour the former approach due to the importance of easily being able to swap apps around to get the set you need that fits.
The first fixed thing is that BooSt can only read a FAT fs, and we can't easily change that, so at least the kernel must be in a FAT partition. BooSt also lives in its own small partition (0.5Mb). BooSt manages a spare area so that it can replace bad blocks in the internal flash. If this spare area is proportional to the size of the FAT partition then keeping that partition small gives us a bit of extra flash space. (Any NAND 'spare' blocks the bootloader is managing separately from the OS are 'wasted space', or at least an inefficiency.) We could probably gain a small amount of space by making BooSt understand JFFS2, but that's tricky and thus probably not worth the effort.
BooSt could load one big kernel+initrd image, but for the reason above it's best for BooSt to load the kernel from a small (1.5Mb) partition and for the kernel to read the remaining partitioned flash itself.
There will also be a small user data partition on the flash (1-2Mb). See User Data for more details.
- /dev/mtdblock/1 - 0.5Mb BooSt (raw?)
- /dev/mtdblock/2 - ~1Mb Kernel (FAT16?)
- /dev/mtdblock/3 - ~60Mb Rootfs (jffs2)
- /dev/mtdblock/4 - 1-2Mb User files (jffs2)
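As a rough illustration of what the compression ratios quoted earlier mean for the ~60Mb rootfs partition, the arithmetic can be sketched as follows. The ratios are the estimates given in this document (the 2-2.5x figure is itself marked as uncertain), the function name is made up for the example, and filesystem overhead is ignored.

```python
def uncompressed_capacity(partition_mb, ratio):
    """Approximate amount of uncompressed data that fits in a partition."""
    return partition_mb * ratio

ROOTFS_MB = 60  # approximate rootfs partition size from the list above

jffs2 = uncompressed_capacity(ROOTFS_MB, 1.7)    # writable JFFS2 estimate
ro_low = uncompressed_capacity(ROOTFS_MB, 2.0)   # read-only image, low estimate
ro_high = uncompressed_capacity(ROOTFS_MB, 2.5)  # read-only image, high estimate

print(f"JFFS2:     ~{jffs2:.0f}Mb of applications and data")
print(f"Read-only: ~{ro_low:.0f}-{ro_high:.0f}Mb of applications and data")
```

On these assumptions the writable layout holds roughly 102Mb of uncompressed content against 120-150Mb for a read-only image, which is the space being traded for the ability to update and swap applications.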
There are several 'types' of user data:
- Application configs - screen background, preferences, printer name etc.
- User config - personal info - name, address, passwords
- User 'unobvious' data - data managed by the system - contacts, appointments
- User Files - stuff they are working on, have downloaded, or created
These need to be treated differently in order for the device to 'do what people expect', within the resources available.
There simply isn't space for much user data in internal flash (<2Mb). There is probably some more space in RAM, but still limited to a few tens of MB. And the more they have here the less space they have for running apps. Exactly how much there is typically spare remains to be seen.
This means that a base machine with no removable media can only be used in a fairly limited way - no backup for created files in RAM, little room for downloads. It almost certainly makes sense to include at least a few Mb of external storage (CF/MMC) with the machine to enable backups.
Important and rarely-changed user config data (name, networking setup) will be stored in the internal flash. There will be a small (1-2Mb) partition dedicated to this purpose. Other 'internal' user data such as contacts and appointments will also be stored here. All other files the user is expected to manage themselves. The contacts and appointments info is treated differently from most other data because users are not used to managing it like a 'file/document' - it's 'data in the machine'. This has implications for synchronisation too.
Filesystem layout details
- / is /dev/mtdblock/3 (the JFFS2 rootfs partition listed above) mounted read-only for general use.
We'll use tmpfs for the RAM filesystem as it's variable-size and size-limitable.
- /tmp (temporary files): In tmpfs, reducing writes to flash and speeding things up.
- /var (application temporary data, logging): In tmpfs (keeps logging and other writes off flash). Risk of losing package database changes.
- /etc (config files): In tmpfs.
- /etc and /var both contain information that should be persistent (application config and package database). Backing this up to flash (possibly just specific files) is a good idea. An anacron job to do this should probably run daily (about 1500 times over the 4-year lifetime) to mirror any changes onto the corresponding JFFS2 flash filesystem. This could be done for specific files or just changed files. This requires remounting / read-write and making the JFFS2 versions of /tmp, /var and /etc visible. I think this requires them to actually be under /orig/ or something (from whence the original contents are copied, and thus where they can be backed up to).
- /home (personal data, and personal app config) is complicated. It involves personal files, application data and removable media. See “user data” for the full discussion. Appears to user with a friendlier name: 'My Files'?
Inserted media will appear here as 'CF Card' and 'MMC Card', which will be links to the mounted devices in /mnt/cf1 and /mnt/mmc1. This makes the (first partition on the) removable media visible to the user so that they can manage files and space themselves. Applications on the CF card will be in a separate (read-only ext2?) partition (and thus do not appear here). This is idiot-proof but potentially confusing.
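The 'CF Card' and 'MMC Card' links could be maintained by a small helper called on insertion and removal. This is only a sketch: the function names are invented for the example, the mount points are the ones mentioned above, and how the helper gets triggered (hotplug or otherwise) is not covered.

```python
import os

# User-visible names and mount points as described in the text above.
MEDIA_LINKS = {
    "CF Card": "/mnt/cf1",
    "MMC Card": "/mnt/mmc1",
}

def media_inserted(home, name, links=MEDIA_LINKS):
    """Create the user-visible link for a newly mounted card."""
    link = os.path.join(home, name)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link left by an unclean removal
    os.symlink(links[name], link)
    return link

def media_removed(home, name):
    """Remove the link, tolerating the case where it is already gone."""
    link = os.path.join(home, name)
    if os.path.islink(link):
        os.remove(link)
```

Both functions are safe to call repeatedly, which matters because users will pull cards out at arbitrary moments and the removal event may arrive more than once or not at all.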
Contacts and agenda-type info will live in mtdblock/4, mounted as what? (/home/internalfiles?)
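The daily backup of /etc and /var from RAM to flash, described in the list above, could look something like the following sketch. It copies only files whose content has changed, so unchanged files cost no flash writes. The function name is hypothetical, the /orig location is only the tentative suggestion made above, and the remount of / as read-write is not shown. Deleted files and symlinks are not handled.

```python
import filecmp
import os
import shutil

def mirror_changed(ram_dir, flash_dir):
    """Copy regular files from ram_dir to flash_dir only when they differ.

    Returns the sorted list of relative paths that were actually written.
    """
    written = []
    for root, _dirs, files in os.walk(ram_dir):
        rel_root = os.path.relpath(root, ram_dir)
        for name in files:
            src = os.path.join(root, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # this sketch handles regular files only
            rel = os.path.normpath(os.path.join(rel_root, name))
            dst = os.path.join(flash_dir, rel)
            if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
                continue  # identical content: skip the flash write
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            written.append(rel)
    return sorted(written)
```

An anacron job might call, for example, mirror_changed("/etc", "/orig/etc") and mirror_changed("/var", "/orig/var"). Run once a day for four years, that is roughly 1,460 runs, in line with the figure of about 1500 given above.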
The main difference is that this version will have a CF card provided. It will probably have the same RAM/NAND use as the SoHo version for the base system, but will also store large applications and files on the CF card.
This means that we need to manage the appearance/removal of the apps on the CF in a seamless and user-friendly manner.
It is also an option to run a complete system from the CF card, as we have done in development, working just like a laptop drive. This allows the device to use standard Debian in its entirety, but requires power management to work reliably with the CF card (initial 2.4.19 support available since 2004.05.24).
Adding applications from removable media
This is not trivial. There are schemes out there for managing apps on removable media and disappearing apps, but none really do what we want. See FileSystemBackground for details. Each application needs to be announced to the system (menu entries, icons). This requires the menu system to know where to look for these resources. Applications also need to be able to find their files (normally in /usr/lib/appname and /usr/share/appname) and libraries (normally in /usr/lib/appname). Applications can be compiled with a 'prefix' so that they know they are below /media/cf/, but this means recompiling every app for use on external media, and also means that you can only add apps via CF, not via MMC.
A rather niftier technique is to use 'rootkit' technology to change system paths on the fly so that apps think they are installed normally, but actually remain on the media. This requires either a kernel hack to know about the specific paths or (rather less hackily) a library which is loaded first that will trap all filesystem calls and rewrite paths as required. This would be implemented by using the LD_PRELOAD environment variable, which can specify any libraries which must be loaded before running an executable. The only disadvantage of this is that prelinking is broken (this is used by, for example, OpenOffice to greatly speed up loading). It would be possible to implement it as a special kernel hack instead to avoid this problem. Ugly but effective.
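To make the LD_PRELOAD route concrete, a launcher for applications on removable media might set the variable as in this sketch. The library path is purely hypothetical (no such library exists yet), and the function names are invented for the example.

```python
import os
import subprocess

# Hypothetical location of the path-rewriting library; not defined in this document.
SHIM = "/usr/lib/libmediapath.so"

def preload_env(shim=SHIM, base_env=None):
    """Return a copy of the environment with the shim added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD takes a colon- or space-separated list; keep earlier entries.
    env["LD_PRELOAD"] = f"{shim}:{existing}" if existing else shim
    return env

def launch(argv, shim=SHIM):
    """Start an application from removable media with the shim preloaded."""
    return subprocess.Popen(argv, env=preload_env(shim))
```

Setting the variable per launched application, rather than globally, would leave applications in the base system unaffected by the shim.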
This approach will automatically adapt to the current mount point of an application. The main disadvantage is that someone needs to write the code to trap all the VFS calls and change them accordingly. A detailed spec is needed for what this code should do.
Strip off the current mount path from references to /usr/lib, /usr/share and /usr/bin when reading. Add it back on when writing. Make sure that it doesn't happen more than once to a given path. Affected calls are: fopen, fclose, read, write, stat, etc… (do 64-bit versions as well).
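The path mapping itself can be prototyped separately from the call trapping. The sketch below is one possible reading of the notes above, not a settled design: it assumes the application's view of /usr/lib, /usr/share and /usr/bin is mapped onto the media mount point, with an inverse mapping that hides the mount point again, and it refuses to add the prefix twice. The function names are invented, /mnt/cf1 is used only as an example mount point, and the real shim would be a C library wrapping the libc calls.

```python
import posixpath

# Prefixes named in the notes above.
REDIRECTED = ("/usr/lib", "/usr/share", "/usr/bin")

def _under(path, prefix):
    """True if path is prefix itself or lies below it."""
    return path == prefix or path.startswith(prefix + "/")

def add_mount(path, mount):
    """Map a path the application asks for onto the media, at most once."""
    path = posixpath.normpath(path)
    mount = posixpath.normpath(mount)
    if _under(path, mount):
        return path  # already rewritten: do not prefix twice
    if any(_under(path, p) for p in REDIRECTED):
        return mount + path
    return path  # anything else is left alone

def strip_mount(path, mount):
    """Inverse mapping: hide the mount point from the application."""
    path = posixpath.normpath(path)
    mount = posixpath.normpath(mount)
    if _under(path, mount):
        rest = path[len(mount):] or "/"
        if any(_under(rest, p) for p in REDIRECTED):
            return rest
    return path
```

For example, add_mount("/usr/lib/app/x.so", "/mnt/cf1") gives "/mnt/cf1/usr/lib/app/x.so", and applying it again leaves the path unchanged. A real shim would also have to decide, per file, whether the copy on NAND or the copy on the media should win; that policy still needs to be specified.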