The basics of file systems
Today, digital information can be stored on a wide variety of devices: internal and external hard drives, memory cards of photo and video cameras, USB flash drives, RAID sets and other complex storage systems. Data is kept on them in the form of files, such as documents, pictures, databases and email messages, which have to be efficiently organized on the disk and easily retrieved when needed.
The following article provides a general overview of the file system, the major means of data management on any storage, and describes the peculiarities of its different types.
What is a file system?
Any computer file is stored on a storage medium with a given capacity. In essence, each storage is a linear space for reading or for both reading and writing digital information. Each byte on it has an offset from the start of the storage, known as its address, and is referenced by this address. A storage can be pictured as a grid of numbered cells, each cell being a single byte. Any item saved to the storage gets its own cells.
Generally, computer storages reference any byte with a pair of values: a sector and an offset within that sector. A sector is a group of bytes (usually 512) and the minimum addressable unit of the physical storage. For example, if sectors are counted from 1, byte 1040 on a hard disk drive is referenced as sector #3 with an in-sector offset of 16 bytes ([sector]+[sector]+[16 bytes], that is, 512 + 512 + 16). This scheme optimizes storage addressing, since smaller numbers are enough to refer to any portion of information on the storage.
To omit the second part of the address (the in-sector offset), files are usually stored starting from the beginning of a sector and occupy whole sectors. For example, a 10-byte file occupies a whole sector, a 512-byte file also occupies one whole sector, while a 514-byte file occupies two entire sectors.
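The addressing arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 512-byte sector size is the common value mentioned in the text, and sectors are counted from 1 to match the example.

```python
SECTOR_SIZE = 512  # bytes; the usual sector size mentioned in the text


def locate(byte_offset, sector_size=SECTOR_SIZE):
    """Return (sector_number, offset_in_sector), counting sectors from 1."""
    sector_index, in_sector = divmod(byte_offset, sector_size)
    return sector_index + 1, in_sector


def sectors_needed(file_size, sector_size=SECTOR_SIZE):
    """Number of whole sectors occupied by a file of file_size bytes."""
    # Ceiling division: a partially filled sector still counts as a whole one.
    return -(-file_size // sector_size)


print(locate(1040))  # sector 3, offset 16
print([sectors_needed(size) for size in (10, 512, 514)])  # 1, 1 and 2 sectors
```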
Each file is stored in "unused" sectors and can be read later by its known position and size. However, how do we know which sectors are occupied and which are free? Where are the size, position and name of the file stored? This is exactly what the file system is responsible for.
In general, a file system (often abbreviated as FS) is a structured representation of data and a set of metadata describing this data. It is applied to the storage during the format operation. This structure can cover the whole storage or only an isolated segment of it, a disk partition. Usually, it operates in blocks, not sectors. FS blocks are groups of sectors that optimize storage addressing. Modern types generally use block sizes from 1 to 128 sectors (512-65536 bytes). Files are usually stored at the start of a block and take up entire blocks.
Constant write and delete operations on a storage cause fragmentation: files are no longer stored as whole units but get divided into fragments. For example, suppose a volume is completely occupied by files of about 4 blocks each (e.g. a collection of photos). A user wants to store a file that takes up 8 blocks and therefore deletes the first and the last files. This frees 8 blocks, but the first free segment is located near the start of the storage and the second one near its end. In this case, the 8-block file is split into two parts of 4 blocks each, which fill the free "holes". The information about both fragments and the fact that they belong to the same file is stored in the file system.
In addition to the user's data, the file system also contains its own parameters (such as the block size), file descriptors (including the size, location and fragments of each file), names and the directory hierarchy. It may also store security information, extended attributes and other parameters.
To meet diverse user requirements, such as storage performance, stability and reliability, many FS types (or formats) have been developed, each serving certain purposes more effectively.
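The fragmentation scenario above can be reproduced with a toy allocator. This is a minimal sketch under stated assumptions: the volume is modelled as a list of "used" flags, the 24-block size and the function names are hypothetical, and free space is filled in order from the start of the storage. Real file systems use more elaborate allocation strategies.

```python
def free_runs(used):
    """Return (start, length) for every run of free blocks."""
    runs, start = [], None
    for index, taken in enumerate(used):
        if not taken and start is None:
            start = index
        elif taken and start is not None:
            runs.append((start, index - start))
            start = None
    if start is not None:
        runs.append((start, len(used) - start))
    return runs


def allocate(used, blocks_needed):
    """Fill free runs in order and return the file's fragments."""
    fragments = []
    for start, length in free_runs(used):
        if blocks_needed == 0:
            break
        take = min(length, blocks_needed)
        fragments.append((start, take))
        blocks_needed -= take
    if blocks_needed:
        raise OSError("not enough free space")
    for start, length in fragments:
        for index in range(start, start + length):
            used[index] = True  # mark the blocks as occupied
    return fragments


# Six 4-block files fill a 24-block volume; the first and last are deleted.
volume = [True] * 24
for index in list(range(0, 4)) + list(range(20, 24)):
    volume[index] = False

fragments = allocate(volume, 8)
print(fragments)  # two fragments of 4 blocks each, at blocks 0 and 20
```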
File systems of Windows
Microsoft Windows employs two major file systems: NTFS, the primary format most modern versions of this OS use by default, and FAT, which was inherited from old DOS and has exFAT as its later extension. ReFS was also introduced by Microsoft as a new generation format for server computers starting from Windows Server 2012. HPFS developed by Microsoft together with IBM can be found only on extremely old machines running Windows NT up to 3.5.
FAT (File Allocation Table) is one of the simplest FS types and has been around since the 1980s. It consists of the FS descriptor sector (boot sector or superblock), the block allocation table (referred to as the File Allocation Table) and plain storage space for data. Files in FAT are stored in directories. Each directory is an array of 32-byte records, each defining a file or its extended attributes (e.g. a long name). A record points to the first block of a file. Every subsequent block can be found through the block allocation table by using it as a linked list.
The block allocation table contains an array of block descriptors. A zero value indicates that the block is not used, and a non-zero one relates to the next block of a file or a special value for its end.
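Reading the table as a linked list can be sketched as follows. This is a simplified model, not the on-disk FAT format: the table is a plain Python list, `END_OF_CHAIN` is a placeholder for the reserved end-of-file values that real FAT variants use, and the function name is hypothetical.

```python
FREE = 0            # a zero entry marks an unused block
END_OF_CHAIN = -1   # placeholder for the special end-of-file value


def follow_chain(table, first_block):
    """Return the list of blocks of a file, starting from its first block."""
    chain, seen = [], set()
    block = first_block
    while block != END_OF_CHAIN:
        if block in seen or table[block] == FREE:
            raise ValueError("corrupted chain at block %d" % block)
        seen.add(block)
        chain.append(block)
        block = table[block]  # the entry holds the number of the next block
    return chain


# A 10-block table with one file stored in blocks 2 -> 5 -> 6.
table = [FREE] * 10
table[2], table[5], table[6] = 5, 6, END_OF_CHAIN

print(follow_chain(table, 2))  # blocks 2, 5, 6
```

The directory record only supplies the starting point (block 2 here); everything else comes from the table.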
The numbers in FAT12, FAT16 and FAT32 stand for the number of bits in a block allocation table entry used to address an FS block. This means that FAT12 can use up to 4096 different block references and FAT16 up to 65536. A 32-bit FAT32 entry could hold 4294967296 values, but only 28 of its bits are used for the block number, which gives up to 268435456 references. The actual maximum count of blocks is even lower and depends on the implementation of the FS driver.
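These figures are simply powers of two, as the short sketch below shows. The names are hypothetical; the one added detail is that FAT32 reserves the upper 4 bits of each 32-bit entry, so the 28-bit figure is printed as well.

```python
ENTRY_BITS = {"FAT12": 12, "FAT16": 16, "FAT32": 32}
FAT32_USED_BITS = 28  # the upper 4 bits of a FAT32 entry are reserved


def max_block_references(fat_type):
    """Number of distinct values an entry of the given FAT type can hold."""
    return 2 ** ENTRY_BITS[fat_type]


for name in ENTRY_BITS:
    print(name, max_block_references(name))
print("FAT32 with 28 usable bits:", 2 ** FAT32_USED_BITS)
```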
FAT12 and FAT16 used to be applied to old floppy disks and do not find extensive employment nowadays. FAT32 is still widely used for memory cards and USB sticks. The format is supported by smartphones, digital cameras and other portable devices.
FAT32 can be used on Windows-compatible external storages or disk partitions under 32 GB in size when they are formatted with the built-in tool of this OS, or up to 2 TB when other means are employed to format the storage. The file system also doesn't allow creating files of 4 GB or larger. To address this issue, exFAT was introduced, which has no practical size limitations and is frequently used on modern external hard drives and SSDs.
NTFS (New Technology File System) was introduced in 1993 with Windows NT and is currently the most common file system for end user computers based on Windows. Most operating systems of the Windows Server line use this format as well.
This FS type is quite reliable thanks to journaling and supports many features, including access control and encryption. Each file in NTFS is stored as a descriptor in the Master File Table plus its data content. The Master File Table contains entries with all the information about files: size, allocation, name, etc. The first 16 entries of the table are reserved for system files, such as the BitMap, which keeps a record of all free and used clusters, the Log used for journaling records and the BadClus containing information about bad clusters. The first and the last sectors of the file system contain its settings (the boot record or the superblock). This format uses 48-bit and 64-bit values to reference files and is therefore able to support data storages of extremely high capacity.
ReFS (Resilient File System) is the latest Microsoft development, introduced with Windows Server 2012 and now also available for Windows 10. Its architecture differs completely from other Windows formats and is mainly organized in the form of a B+-tree. ReFS has high tolerance to failures thanks to its new features. The most noteworthy among them is Copy-on-Write (CoW): no metadata is modified without being copied, and data is not written over the existing data but placed in another area on the disk. After any modification, a new copy of the metadata is saved to a free area on the storage, and then the system creates a link from the older metadata to the newer copy. As a result, a significant number of older copies are kept in different places, which makes data recovery easy as long as this storage space has not been overwritten.
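The Copy-on-Write idea can be illustrated with a toy in-memory store. This is a conceptual sketch only and does not reflect the actual ReFS on-disk structures: the `CowStore` class, its fields and its methods are hypothetical names, and a Python dictionary stands in for disk blocks.

```python
class CowStore:
    """Toy copy-on-write store: updates never overwrite an existing block."""

    def __init__(self):
        self.blocks = {}     # block address -> stored bytes
        self.next_free = 0   # next unused address
        self.current = None  # address of the live copy
        self.history = []    # addresses of superseded copies

    def write(self, data):
        # Place the new version in a free block instead of overwriting.
        address = self.next_free
        self.next_free += 1
        self.blocks[address] = data
        # Only after the new copy exists is the reference switched to it.
        if self.current is not None:
            self.history.append(self.current)
        self.current = address
        return address

    def read(self):
        return self.blocks[self.current]


store = CowStore()
store.write(b"metadata v1")
store.write(b"metadata v2")

print(store.read())                    # the newest copy
print(store.blocks[store.history[0]])  # the older copy is still present
```

Because the older block is left untouched until its space is reused, a recovery tool can still find the previous version.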
HPFS (High Performance File System) was created by Microsoft in cooperation with IBM and introduced with OS/2 1.20 in 1989 as a file system for servers that could provide much better performance when compared to FAT. In contrast to FAT, which simply allocates any first free cluster on the disk for the file fragment, HPFS seeks to arrange the file in contiguous blocks, or at least ensure that its fragments (referred to as extents) are placed maximally close to each other. At the beginning of HPFS, there are three control blocks occupying 18 sectors: the boot block, the super block and the spare block. The remaining storage space is divided into parts of contiguous sectors referred to as bands taking 8 MB each. A band has its own sector allocation bitmap showing which sectors in it are occupied (1 – taken, 0 – free). Each file and directory has its own F-Node located close to it on the disk – this structure contains the information about the location of a file and its extended attributes. A special directory band located in the center of the disk is used for storing directories, while the directory structure itself is a balanced tree with alphabetical entries.
Hint: Information about the data recovery prospects of the FS types used by Windows can be found in the articles on data recovery specifics of different OS and chances for data recovery. For detailed instructions and recommendations, please read the manual devoted to data recovery from Windows.
File systems of macOS
Apple's macOS applies two FS types: HFS+, an extension to their legacy HFS used on old Macintosh computers, and APFS, a format employed by modern Macs running macOS 10.14 and later.
HFS+ used to be the primary format of Apple products, including Mac computers, iPods and Xserve servers, before it was replaced by APFS in macOS High Sierra. Advanced server products also use Apple Xsan, a clustered file system derived from StorNext and CentraVision.
HFS+ uses B-trees for placing and locating files. Volumes are divided into sectors, typically 512 bytes in size, which are then grouped into allocation blocks, the number of which depends on the size of the entire volume. The information concerning free and used allocation blocks is kept in the Allocation File. Allocation blocks assigned to each file as extents are recorded in the Extents Overflow File. And, finally, all file attributes are listed in the Attributes File. Data reliability is improved through journaling, which makes it possible to keep track of all changes to the system and quickly return it to a working state after unexpected events. Other supported features include hard links to directories, logical volume encryption, access control and data compression.
The Apple File System (APFS) aims to address fundamental issues present in its predecessor and was developed to work efficiently with modern flash storages and solid-state drives. This 64-bit format uses the copy-on-write method to increase performance, in which each block is copied before changes are applied to it, and offers many data integrity and space-saving features. All the contents and metadata about files and folders, along with other APFS structures, are kept in the APFS Container. The Container Superblock stores information about the number of blocks in the Container, the block size, etc. Information about all allocated and free blocks of the Container is managed with the help of Bitmap Structures. Each volume in the Container has its own Volume Superblock, which provides information about this volume. All files and folders of the volume are recorded in the File and Folder B-Tree, while the Extents B-Tree is responsible for extents, that is, references to file contents (the file start and its length in blocks).
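An extent, as described above, is just a pair of a start block and a length in blocks. The sketch below shows how such pairs can be used to find the block that holds a given byte of a file. It is a generic illustration rather than the APFS format: the `Extent` type, the function name and the 4096-byte block size are assumptions made for the example.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    start_block: int  # first block of the fragment on the storage
    block_count: int  # length of the fragment in blocks


def block_for_offset(extents, byte_offset, block_size=4096):
    """Return the storage block holding the given byte of the file."""
    logical = byte_offset // block_size  # block index inside the file
    for extent in extents:
        if logical < extent.block_count:
            return extent.start_block + logical
        logical -= extent.block_count  # skip this extent, try the next one
    raise ValueError("offset is beyond the end of the file")


# A file stored in two fragments of 4 blocks each.
file_extents = [Extent(100, 4), Extent(500, 4)]

print(block_for_offset(file_extents, 0))              # first block of fragment 1
print(block_for_offset(file_extents, 4 * 4096))       # first block of fragment 2
print(block_for_offset(file_extents, 5 * 4096 + 10))  # second block of fragment 2
```

Compared with listing every block individually, two small records are enough here to describe all eight blocks of the file.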
Hint: Details on the possibility of data recovery from these FS types can be found in the articles about the peculiarities of data recovery depending on the operating system and chances for data recovery. If you're interested in the practical side of the procedure, please refer to the guide on data recovery from macOS.
File systems of Linux
As an open-source system, Linux serves as a platform for implementing, testing and using many different types of file systems. The most popular formats for Linux include:
Ext2, Ext3, Ext4 are simply different versions of the "native" Linux Ext file system. This type is under active development and improvement. Ext3 is just an extension of Ext2 that uses transactional file writing operations with a journal. Ext4 is a further development of Ext3, extended with the support of optimized file allocation information (extents) and extended file attributes. This FS is frequently used as the "root" one in most Linux installations.
ReiserFS - an alternative Linux file system optimized for storing a huge number of small files. It has good search capabilities and enables compact allocation of files by storing their tails or simply very small items along with metadata in order to avoid using large FS blocks for this purpose. However, this format is no longer actively developed and supported.
XFS - a robust journaling file system that was initially created by Silicon Graphics and used by the company's IRIX servers. In 2001, it made its way to the Linux kernel and is now supported by most Linux distributions, some of which, like Red Hat Enterprise Linux, even use it by default. This FS type is optimized for storing very big files and volumes on a single host.
JFS - a file system developed by IBM for the company's powerful computing systems. JFS1 usually stands for JFS, JFS2 is the second release. Currently, this project is open-source and implemented in most modern Linux versions.
Btrfs - a file system based on the copy-on-write principle (COW) that was designed by Oracle and has been supported by the mainline Linux kernel since 2009. Btrfs embraces the features of a logical volume manager, being able to span multiple devices, and offers much higher fault tolerance, better scalability, easier administration, etc. together with a number of advanced possibilities.
F2FS – a Linux file system designed by Samsung Electronics that is adapted to the specifics of storage devices based on the NAND flash memory that are widely used in modern smartphones and other computing systems. This type works on the basis of the log-structured FS approach (LFS) and takes into account such peculiarities of flash storage as constant access time and a limited number of data rewriting cycles. Instead of creating one large chunk for writing, F2FS assembles the blocks into separate chunks (up to 6) that are written concurrently.
The concept of "hard links" used in operating systems of this kind makes most Linux FS types similar in that the file name is not regarded as a file attribute but is rather defined as an alias for a file in a certain directory. A file object can be linked from many locations, even several times from the same directory under different names. This can lead to serious and even insurmountable difficulties in recovering file names after file deletion or logical damage.
Hint: Information about the possibility of successful data recovery from the mentioned FS types can be found in the articles describing the specifics of data recovery from different operating systems and chances for data recovery. To get a grasp of how the procedure should be carried out, please use the manual on data recovery from Linux.
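The behaviour of hard links can be observed with the standard Python library. This is a small demonstration under stated assumptions: it runs in a temporary directory on a file system that supports hard links, and the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    first = os.path.join(directory, "report.txt")
    second = os.path.join(directory, "alias.txt")

    with open(first, "w") as handle:
        handle.write("same content")

    # Create a second name for the same file object in the same directory.
    os.link(first, second)
    same_object = os.path.samefile(first, second)
    link_count = os.stat(first).st_nlink

    # Removing one name does not remove the file while another name exists.
    os.remove(first)
    with open(second) as handle:
        surviving = handle.read()

print(same_object, link_count, surviving)
```

Neither name is the "real" one: both are equal aliases for the same file object, which is why a name cannot always be restored once its directory entry is gone.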
File systems of BSD, Solaris, Unix
The most common file system for these operating systems is UFS (Unix File System) also often referred to as FFS (Fast File System).
Currently, UFS (in different editions) is supported by all Unix-family operating systems and is a major file system of the BSD OS and the Sun Solaris OS. Modern computer technologies tend to implement replacements for UFS in different operating systems (ZFS for Solaris, JFS and derived formats for Unix etc.).
Hint: The information about the likelihood of a successful result when it comes to data recovery from these FS types can be found in the articles about OS-specific peculiarities of data recovery and chances for data recovery. The process itself is described in the instruction dedicated to data recovery from Unix, Solaris and BSD.
Clustered file systems
Clustered file systems are used in computer cluster systems and support distributed storage.
Distributed FS types include:
ZFS – Sun company "Zettabyte File System" - a format developed for distributed storages of Sun Solaris OS.
Apple Xsan – the Apple company evolution of CentraVision and later StorNext.
VMFS – "Virtual Machine File System" developed by VMware company for its VMware ESX Server.
GFS – Red Hat Linux "Global File System".
JFS1 – the original (legacy) design of IBM JFS used in older AIX storage systems.
Common properties of these file systems include distributed storages support, extensibility and modularity.
To learn about other technologies used to store and manipulate data, please refer to the storage technologies section.
Last update: September 10, 2021