The file systems of Linux
In contrast to Windows and macOS, which are strictly controlled by the Microsoft and Apple corporations respectively, Linux is a large open-source project developed by a community of enthusiasts. Its code always remains available to those who want to contribute, and anyone is free to tweak it to suit individual requirements or to build their own distribution on its foundation. That is why Linux exists in so many variants, and the same goes for its file system, a fundamental part of any OS defined in the article on Filesystem basics. The kernel supports numerous storage formats, yet the most commonly used are those of the Ext family, XFS, Btrfs, F2FS, JFS, and ReiserFS. Below you can find their brief descriptions and learn about their main specifics.
Please note: Linux also provides read-write access to the file systems of other platforms, such as NTFS, FAT32 and exFAT of Microsoft Windows, HFS+ of macOS, and ZFS of BSD, Solaris and Unix, which are addressed in dedicated articles.
Ext (Extended File System) was released in 1992 as the very first format designed specifically for Linux. However, it still had serious performance limitations and was quickly superseded by Ext2. This filesystem and its later revisions – Ext3 and Ext4 – became the default choice for a large majority of Linux distributions.
Ext2 has proved to be more efficient owing to its structure, which is based around the concept of inodes. Such an index descriptor keeps the attributes of a certain object, like a file or directory, and points to the locations of its underlying data. The space in Ext2 is divided into blocks that form larger units referred to as Block Groups. The information about all Block Groups is maintained by the Descriptor Table positioned straight after the Superblock. Each Block Group keeps inodes in its own Inode Table. It also tracks the state of its blocks and inodes using the Block and the Inode Bitmaps. Meanwhile, the name of a file or directory doesn’t constitute a part of its inode: names are mapped to the corresponding inode numbers via directories, which are implemented as a special kind of file.
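To make the layout more concrete, here is a minimal sketch of how an inode number can be translated into a Block Group and a slot in that group's Inode Table. It relies on the Ext2 convention that inode numbers start at 1; the function name and the inodes_per_group and inode_size values are illustrative assumptions, since a real filesystem reads them from its Superblock.

```python
def locate_inode(inode_number: int, inodes_per_group: int, inode_size: int):
    """Return (block group, index in its Inode Table, byte offset in that table)."""
    if inode_number < 1:
        raise ValueError("Ext2 inode numbers start at 1")
    group = (inode_number - 1) // inodes_per_group
    index = (inode_number - 1) % inodes_per_group
    return group, index, index * inode_size


# Illustrative values only; real ones come from the Superblock.
print(locate_inode(2, inodes_per_group=8192, inode_size=128))     # (0, 1, 128)
print(locate_inode(8193, inodes_per_group=8192, inode_size=128))  # (1, 0, 0)
```

Inode 2, which Ext2 reserves for the root directory, therefore lands in the second slot of the first Block Group.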
Most Linux filesystems are similar in that the name is not regarded as an attribute and is instead defined as an alias for a file in a certain directory. A file object can be linked from many locations and exist under different names (the so-called hard links). This can lead to serious and even insurmountable difficulties in recovering file names after file deletion or logical damage.
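The alias nature of names can be observed from user space. The following sketch, which assumes a temporary directory on a filesystem that supports hard links, creates two names for one file and checks that both resolve to the same inode. The file names and the helper function are made up for the illustration.

```python
import os
import tempfile


def hard_link_demo():
    with tempfile.TemporaryDirectory() as directory:
        first = os.path.join(directory, "report.txt")
        second = os.path.join(directory, "alias.txt")
        with open(first, "w") as handle:
            handle.write("same data")

        os.link(first, second)  # add a second name for the same inode

        info_first, info_second = os.stat(first), os.stat(second)
        same_inode = info_first.st_ino == info_second.st_ino
        link_count = info_first.st_nlink

        os.remove(first)  # removes one name only; the inode stays alive
        with open(second) as handle:
            survived = handle.read()
        return same_inode, link_count, survived


print(hard_link_demo())
```

Deleting one name merely decrements the link count stored in the inode, so the data stays reachable through the remaining name.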
Ext3 is in fact an upgraded version of Ext2 that supports journaling. The journal in Ext3 is organized as a log file, which records all changes to the filesystem and protects it from corruption in the event of a crash.
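The principle of journaling can be sketched with a toy model. This is not the on-disk journal format of Ext3; the class, its method names and the block contents are hypothetical and only show why a committed record can be replayed after a crash while an incomplete one is discarded.

```python
class ToyJournal:
    def __init__(self):
        self.blocks = {}   # main area: block number -> content
        self.journal = []  # log of transactions not yet applied

    def begin(self, changes):
        """Write the intended changes to the journal first."""
        record = {"changes": dict(changes), "committed": False}
        self.journal.append(record)
        return record

    def commit(self, record):
        """Mark the record as complete; only now is it safe to replay."""
        record["committed"] = True

    def checkpoint(self):
        """Apply committed records to the main area and empty the journal."""
        for record in self.journal:
            if record["committed"]:
                self.blocks.update(record["changes"])
        self.journal.clear()

    def recover(self):
        """After a crash: replay committed records, drop incomplete ones."""
        self.checkpoint()


fs = ToyJournal()
tx = fs.begin({10: "inode v2", 11: "bitmap v2"})
fs.commit(tx)
fs.begin({12: "half-written"})  # simulated crash before commit
fs.recover()
print(fs.blocks)  # {10: 'inode v2', 11: 'bitmap v2'}
```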
Ext4 is an improvement over Ext3, which changed the method of data allocation from individual blocks to extents. The idea behind it is to write most of the file’s data to a continuous area and then note down only the address of its first block and the number of blocks in the sequence. Up to four extents can be stored directly in the inode, while the rest are arranged as a B+tree. In addition, Ext4 postpones block allocation until the data is actually written to the disk (delayed allocation), and is thus able to minimize fragmentation.
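The saving that extents bring can be illustrated with a short sketch. It assumes the block numbers of a file are given in file order and collapses every consecutive run into a (first block, number of blocks) pair; the function name is chosen for the example and does not come from any Ext4 tool.

```python
def blocks_to_extents(blocks):
    """Collapse a list of block numbers into (first block, length) extents."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the last run
        else:
            extents.append((block, 1))         # block starts a new run
    return extents


print(blocks_to_extents([100, 101, 102, 103, 200, 201, 500]))
# [(100, 4), (200, 2), (500, 1)]
```

Seven block addresses shrink to three extents here, and a fully contiguous file of any length needs just one.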
On the whole, it is considered to be one of the most flexible general-purpose FS types, which has also gained a reputation for solid stability.
XFS is another very mature filesystem that was initially created by Silicon Graphics and applied on the company's IRIX servers. In 2001, it made its way to the Linux kernel and is now supported by most Linux distributions, some of which, like Red Hat Enterprise Linux, even use it by default.
This FS type is optimized for storing very large files and volumes on a single host. It splits its storage space into equally sized areas called Allocation Groups. Each of them acts like a distinct filesystem, i.e. has its own Superblock and manages its own structures and space usage. Free space is tracked with the help of two B+trees: one indexes the contiguous free-space regions by their first block, and the other by the number of blocks they are composed of. Storage blocks are assigned to files using the same extent-based approach. All files and directories in XFS are represented by their individual inodes. The list of a file’s extents may be stored directly in the inode or, in the case of a very large or fragmented file, tracked by another B+tree linked to it. And just like inodes in Ext, XFS inodes do not contain names, which are available only in the corresponding directory entries.
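The benefit of indexing free space both by position and by size can be shown with a simplified sketch. Sorted Python lists stand in for the two B+trees, and the class, its methods and the sample extents are assumptions made for the illustration, not XFS structures.

```python
import bisect


class FreeSpaceIndex:
    def __init__(self, free_extents):
        # The same free extents, ordered two ways.
        self.by_start = sorted(free_extents)  # (start, length)
        self.by_size = sorted((length, start) for start, length in free_extents)

    def allocate(self, wanted):
        """Best fit: take the smallest free extent holding `wanted` blocks."""
        i = bisect.bisect_left(self.by_size, (wanted, 0))
        if i == len(self.by_size):
            return None  # no free extent is large enough
        length, start = self.by_size.pop(i)
        self.by_start.remove((start, length))
        if length > wanted:  # return the unused remainder to both indexes
            rest_start, rest_length = start + wanted, length - wanted
            bisect.insort(self.by_start, (rest_start, rest_length))
            bisect.insort(self.by_size, (rest_length, rest_start))
        return (start, wanted)

    def free_near(self, block):
        """First free extent that starts at or after `block`."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        return self.by_start[i] if i < len(self.by_start) else None


index = FreeSpaceIndex([(0, 8), (100, 3), (50, 20)])
print(index.allocate(4))    # (0, 4)
print(index.free_near(40))  # (50, 20)
```

The size-ordered index answers "where is a region big enough?", while the position-ordered one answers "what is free close to this block?", which helps keep a file's data together.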
XFS applies journaling to all updates of its metadata. Changes are written to the Journal before the actual blocks get modified, which enables quick recovery after a crash. In general, this FS type is designed to be highly scalable and operates very well on server hardware.
Btrfs (B-tree File System) is one of the most popular new-generation formats for Linux, and a lot of effort is being put into making it stable. It was developed by Oracle and has been supported by the mainline Linux kernel since 2009. Fedora and SUSE already employ it by default.
Btrfs is designed to work on a wide range of devices, from smartphones to high-end servers. Moreover, it incorporates the features of a logical volume manager, being able to span multiple storage devices, and offers many other advanced capabilities.
As its name suggests, Btrfs relies heavily on B-tree structures, each composed of internal nodes and leaves. An internal node points to a child node or leaf, while a leaf contains an item with some information. The actual layout and content of an item depend on the type of the given B-tree. The Root B-tree, whose location is available in the Superblock, has references to the rest of the B-trees. The Chunk B-tree manages logical to physical address mapping, whereas the Device B-tree conversely links the physical blocks on the underlying devices to their virtual addresses. The File System B-tree is responsible for the allocation of files and folders. Small files are stored right there, inside extent items. Larger ones are placed outside, in contiguous areas called extents. In such a case, an extent item references all extents holding the file’s data. Directory items include file names and point to their inode items. Inode items, in turn, are used for other properties, like size, permissions, etc.
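The relationship between directory, inode and extent items can be sketched as follows. A plain dictionary stands in for the File System B-tree, with keys shaped as (object id, item type, offset). All ids, names and addresses are made up, and the file name is used directly as the offset of a directory item for readability, which is a simplification of this sketch.

```python
# A toy File System tree: key -> item. All values are illustrative.
fs_tree = {
    (256, "DIR_ITEM", "notes.txt"): {"inode": 257},
    (256, "DIR_ITEM", "video.mkv"): {"inode": 258},
    (257, "INODE_ITEM", 0): {"size": 11, "mode": 0o644},
    (257, "EXTENT_DATA", 0): {"inline": b"hello world"},  # small file kept in the item
    (258, "INODE_ITEM", 0): {"size": 8192, "mode": 0o644},
    (258, "EXTENT_DATA", 0): {"disk_start": 4096000, "length": 4096},
    (258, "EXTENT_DATA", 4096): {"disk_start": 9000000, "length": 4096},
}


def resolve(tree, dir_id, name):
    """Follow name -> directory item -> inode item -> extent items."""
    inode_id = tree[(dir_id, "DIR_ITEM", name)]["inode"]
    inode = tree[(inode_id, "INODE_ITEM", 0)]
    extent_keys = sorted(
        key for key in tree if key[0] == inode_id and key[1] == "EXTENT_DATA"
    )
    return inode, [tree[key] for key in extent_keys]


print(resolve(fs_tree, 256, "notes.txt"))
print(resolve(fs_tree, 256, "video.mkv"))
```

The small file carries its content inside the extent item, whereas the larger one only lists the outside areas that hold its data.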
Btrfs is a Copy-on-Write (CoW) filesystem. Instead of employing a journal, it never overwrites blocks in place: the modified copy of a block is written to a different free location, and the references are switched to it only after the write has completed. This mechanism helps it to eliminate the risk of data corruption when an update gets interrupted, for example, due to power loss. Thanks to it and a wide variety of other attractive features, Btrfs is finding more and more adherents among modern Linux users.
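The Copy-on-Write idea can be reduced to a toy model with a single data block and a single pointer. The class and its crash_before_switch flag are hypothetical and exist only to show that an interrupted update leaves the previous version intact.

```python
class CowStore:
    def __init__(self, data):
        self.blocks = {0: data}  # physical location -> content
        self.root = 0            # pointer to the current valid version
        self.next_free = 1

    def update(self, new_data, crash_before_switch=False):
        location = self.next_free
        self.next_free += 1
        self.blocks[location] = new_data  # write the new version elsewhere
        if crash_before_switch:
            return                        # pointer untouched: old data still valid
        old = self.root
        self.root = location              # switch the pointer in one step
        del self.blocks[old]              # the old location becomes free space

    def read(self):
        return self.blocks[self.root]


store = CowStore("version 1")
store.update("version 2", crash_before_switch=True)
print(store.read())  # version 1
store.update("version 3")
print(store.read())  # version 3
```

If the old location were kept instead of released, it would amount to a snapshot of the earlier state, which is why CoW filesystems can offer snapshots cheaply.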
F2FS (Flash-Friendly File System) is another modern format, introduced by Samsung Electronics in 2012. It has been designed specifically for storage devices based on NAND flash memory, and thus is most extensively utilized in modern smartphones and removable storage media.
F2FS works on the basis of the log-structured FS approach (LFS) and takes into account such peculiarities of flash storage as constant access time and a limited number of data rewriting cycles. Instead of creating one large chunk for writing, F2FS assembles the blocks into separate chunks (up to 6) that are written concurrently.
It divides its storage space into fixed-sized segments. Consecutive segments make up a section, and several sections constitute a zone. Data allocation in them is performed with the help of nodes. The latter come in three types: inodes, direct nodes and indirect nodes. An inode stores metadata, including the name, size and other properties of a file. A direct node indicates the location of its data blocks, while an indirect node points to blocks in other nodes. The physical addresses of these nodes can be found in the Node Address Table (NAT). The content itself is stored in the Main Area. The sections in it separate the data blocks from the node blocks with service information. The usage status of all blocks is recorded by the Segment Information Table (SIT). The Segment Summary Area (SSA) specifies which blocks are assigned to which node.
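The segment, section and zone hierarchy comes down to simple integer arithmetic, as the sketch below shows. The function name and the parameter values are assumptions for the example; on a real volume the geometry is fixed when the filesystem is created.

```python
def locate_block(block, blocks_per_segment, segments_per_section, sections_per_zone):
    """Map a block number in the Main Area to its segment, section and zone."""
    segment = block // blocks_per_segment
    section = segment // segments_per_section
    zone = section // sections_per_zone
    return {
        "segment": segment,
        "section": section,
        "zone": zone,
        "offset_in_segment": block % blocks_per_segment,
    }


# Illustrative geometry: 512 blocks per segment, 4 segments per section,
# 2 sections per zone.
print(locate_block(5000, 512, 4, 2))
# {'segment': 9, 'section': 2, 'zone': 1, 'offset_in_segment': 392}
```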
When free segments run low, F2FS cleans itself up in the background while the system is idle. The cleaning algorithm selects the victim segments based on the number of used blocks according to the SIT or by their age.
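The two selection criteria can be sketched as two small functions. The first simply picks the segment with the fewest used blocks. The second also weighs in the age, using the cost-benefit score from the classic log-structured filesystem literature; treating that exact formula as what F2FS computes is an assumption of this sketch, as are the field names and sample numbers.

```python
def pick_victim_greedy(segments):
    """Choose the segment with the fewest valid blocks (cheapest to clean)."""
    return min(segments, key=lambda seg: seg["valid_blocks"])


def pick_victim_cost_benefit(segments, blocks_per_segment, now):
    """Prefer segments that are both sparsely used and old."""
    def score(seg):
        utilization = seg["valid_blocks"] / blocks_per_segment
        age = now - seg["last_modified"]
        return (1 - utilization) * age / (1 + utilization)
    return max(segments, key=score)


segments = [
    {"id": 0, "valid_blocks": 100, "last_modified": 90},
    {"id": 1, "valid_blocks": 150, "last_modified": 10},
    {"id": 2, "valid_blocks": 400, "last_modified": 5},
]
print(pick_victim_greedy(segments)["id"])                 # 0
print(pick_victim_cost_benefit(segments, 512, 100)["id"])  # 1
```

Segment 0 has the fewest used blocks but was modified recently, so its remaining data is likely to change again soon; the age-aware policy therefore prefers the older segment 1.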
The described organization makes F2FS perform quite well on solid-state storage. Yet, so far, it has been mostly applied on portable devices and is rarely encountered on desktop and server machines.
JFS (Journaled File System) was created by IBM in 1990. The original version, sometimes referred to as JFS1, was implemented in the company’s AIX OS. Later, JFS2 was released and ported to Linux after it became open-source.
A JFS volume is composed of regions called Allocation Groups, and each of them contains one or more FileSets. All files and directories are described by their individual inodes, whereas the data content is represented by one or more extents. All extents are indexed by a dedicated B+tree. The content of small directories is stored within their inodes, while larger ones are organized as B+trees. B+trees also control the usage of storage space: the first tree stores the starting blocks of free extents, and the second one – the number of free extents. JFS also comprises a separate log area and writes to it whenever metadata changes take place.
In general, JFS is considered to be a speedy and reliable filesystem. However, it seldom sees any enhancements and is now falling out of use, being surpassed by more modern options.
ReiserFS is an alternative Linux format optimized for storing a large number of small files. It was initially designed by Namesys in 2001 and brought a number of features that were very innovative at the time of its introduction. Yet, eventually, its maintenance was handed over to volunteers due to certain technical issues.
ReiserFS is organized around the S+tree, which is composed of internal and leaf nodes. This structure is used to manage all files, directories and metadata. It contains items of four basic types: direct, indirect, directory and stat items. Direct items hold actual data, indirect items just link to certain data blocks, directory items represent entries in a directory, and stat items contain the properties of files and folders. Each item has its unique key used to locate it in the tree. This key includes the item’s identifier, address and type.
Files and file fragments that don’t occupy an entire block are combined and stored directly in the leaf nodes of the S+tree. This mechanism is called tail-packing, and it helps to reduce the amount of wasted space and fragmentation. Moreover, ReiserFS doesn’t make any changes directly to the S+tree: it writes them to the Journal first and then copies them to the required location on the storage.
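The space saved by tail-packing can be estimated with a little arithmetic. The sketch below is a deliberate simplification: it ignores item headers and the tree structure itself and assumes that all tails can be packed together perfectly. The function names and the 4096-byte block size are assumptions made for the example.

```python
def blocks_without_packing(file_sizes, block_size=4096):
    """Every file gets whole blocks; the last one is usually partly empty."""
    return sum(-(-size // block_size) for size in file_sizes)  # ceiling division


def blocks_with_packing(file_sizes, block_size=4096):
    """Full blocks are stored as usual; all tails share common blocks."""
    full_blocks = sum(size // block_size for size in file_sizes)
    tail_bytes = sum(size % block_size for size in file_sizes)
    return full_blocks + -(-tail_bytes // block_size)


sizes = [100, 700, 5000, 4096]  # bytes
print(blocks_without_packing(sizes))  # 5
print(blocks_with_packing(sizes))     # 3
```

The gain grows with the share of files that are much smaller than a block, which is the workload ReiserFS was optimized for.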
All in all, ReiserFS has good search capabilities and enables compact allocation of small files. However, this format is no longer actively supported, and it is very unlikely to remain relevant in the near future.
Hint: Information on the possibility of successful data recovery from the mentioned FS types can be found in the articles describing the specifics of data recovery from different operating systems and the chances for data recovery. To get a grasp of how the procedure should be carried out, please use the manual on data recovery from Linux.
If you wish to get to know the filesystems utilized in other environments, please read the following articles:
The filesystems of Windows: FAT/FAT32, exFAT, NTFS, ReFS, HPFS
Last update: April 19, 2023