Apple Core Storage and its data organization principles. Possible challenges in data recovery
For a long time Apple offered very little storage flexibility, especially to average Mac users, who lack technical expertise or are simply reluctant to study complex RAID concepts in order to benefit from the AppleRAID technology. The introduction of Core Storage in 2011, although it went almost unnoticed by the general public, paved the way for some important storage management features whose implementation became possible only thanks to this technology. Yet, along with the considerable benefits it brought to Macintosh computers, Core Storage has some disadvantages with regard to the safety of the stored data and the chances of its recovery.
Apple Core Storage basics
First introduced in Mac OS X 10.7 Lion, Core Storage was one of the key components of the Mac’s data management system up to macOS 10.13 High Sierra. In essence, it is Apple’s proprietary implementation of a logical volume manager: just like Linux LVM, it acts as a layer of virtualization between the partition scheme applied to the physical storage and the file systems its volumes are formatted with.
In the basic model, a partition scheme assigns logical parameters to the physical disk and establishes fixed boundaries between the “regions” on it (called partitions), so that the OS can manage the information in each region independently and present the regions to the user as separate logical disks. After that, each partition can be formatted with a file system, whose structures define the way data chunks are actually organized on it.
In contrast, a logical volume manager enables a far more flexible relationship between disks and volumes than the one offered by conventional partition schemes: partitions can be dynamically allocated by the system, and a single volume can span more than one physical storage device.
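To illustrate how a single volume can span more than one device, the following Python sketch maps a logical byte offset to a device and an offset within that device. It assumes simple linear concatenation of the member devices; the function name locate and this layout are illustrative assumptions, not a description of the actual Core Storage on-disk mapping.

```python
from typing import Sequence, Tuple


def locate(logical_offset: int, device_sizes: Sequence[int]) -> Tuple[int, int]:
    """Map a logical volume offset to (device index, offset on that device).

    Assumption: the devices are simply concatenated one after another.
    """
    if logical_offset < 0:
        raise ValueError("offset must not be negative")
    remaining = logical_offset
    for index, size in enumerate(device_sizes):
        if remaining < size:
            return index, remaining
        # The offset lies beyond this device, so skip over it.
        remaining -= size
    raise ValueError("offset is beyond the end of the volume")
```

With two devices of 100 and 200 bytes, locate(150, [100, 200]) points to byte 50 of the second device, although the volume itself is addressed as one continuous range.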
Initially, Core Storage served as the basis for FileVault 2 – the technology which brought native full-disk encryption capabilities to Macs in Lion. In Mountain Lion, the ability to extend a single volume beyond one physical drive expanded the use of Core Storage to the Fusion Drive configuration – a combination of a hard disk drive and a solid-state drive treated as one logical unit.
How is data organized by Core Storage?
The structure of Core Storage is quite similar to that of Linux LVM. It also consists of four major levels, though they do not correspond to the LVM ones exactly: one or several Physical Volumes are combined into a Logical Volume Group, in which Logical Volumes are created and which may export one or several Logical Volume Families:
like in Linux LVM, a Physical Volume (PV) is the most basic building block, usually a real physical storage device (for example, a hard disk drive or an SSD), but it may also be a disk image or a set of disks which make up a RAID system. However, to become a PV, a storage device must be partitioned with the GPT partitioning scheme (GUID Partition Table) and get its own identifier called a GUID. In addition, each PV keeps some information about the Logical Volume Group it belongs to;
a Logical Volume Group (LVG) is the equivalent of a volume group in LVM: it encompasses one or more Physical Volumes, forming a single storage pool for Logical Volumes. As a rule, one Logical Volume is set up with the total capacity of all Physical Volumes;
a Logical Volume (LV) is a virtual storage device within a Logical Volume Group which is formatted with a file system (HFS+) and gets mounted. Data on a Logical Volume is organized like on any traditional volume, so it can be easily accessed and read;
a Logical Volume Family (LVF) is a new concept introduced by Apple. It keeps various metadata related to the Logical Volumes of a Logical Volume Group, together with a set of properties regarding their encryption. All the specified properties are inherited by each of the LVs.
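The relationship between the four levels can be sketched as a small data model. The Python classes below are purely illustrative: their names and fields are assumptions made for this example and do not reflect the actual Core Storage on-disk structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalVolume:
    device: str       # for example "/dev/disk0s2" (illustrative value)
    size_bytes: int


@dataclass
class LogicalVolume:
    name: str
    size_bytes: int
    file_system: str = "HFS+"


@dataclass
class LogicalVolumeFamily:
    encrypted: bool   # encryption properties are shared by all member LVs
    volumes: List[LogicalVolume] = field(default_factory=list)


@dataclass
class LogicalVolumeGroup:
    name: str
    physical_volumes: List[PhysicalVolume] = field(default_factory=list)
    families: List[LogicalVolumeFamily] = field(default_factory=list)

    def capacity(self) -> int:
        """Total size of the pool formed by all Physical Volumes."""
        return sum(pv.size_bytes for pv in self.physical_volumes)
```

In this model a group built from a 128 GB and a 1 TB Physical Volume reports their combined size as its capacity, which is the size a single Logical Volume would typically receive.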
Because it can carry out storage allocation operations in the background, Core Storage facilitates the disk encryption performed by FileVault 2. The earlier version of FileVault stored an encrypted file system in regular files, but this file-based approach was far from perfect, especially when an entire volume had to be encrypted. Core Storage allows the data of a volume to be encrypted at the block level: it creates Logical Volumes for encrypted user and system data and moves blocks in and out of an encrypted partition. This way, when an encrypted volume is unlocked, a new Logical Volume is created which contains the whole encrypted file system and the unallocated space as a single chunk of blocks. A typical FileVault 2 configuration looks as follows:
/dev/disk0 – a physical storage system with several volumes;
/dev/disk0s3 – a physical volume on disk0s3, whose content is encrypted and which is included in the Core Storage Logical Volume Group;
/dev/disk1 – a Logical Volume in a Logical Volume Group which is the starting point for decryption of the contents of disk0s3.
Secondly, Core Storage has become the main mechanism helping to automate the process of data distribution between the two components of a Fusion Drive, which usually has the following composition:
/dev/disk0 – a physical solid-state drive which is part of a Core Storage LVG;
/dev/disk1 – a physical hard disk drive included in the same Core Storage LVG;
/dev/disk2 – a logical volume which consists of disk0 and disk1.
In this configuration disk0 is set as the primary device in the Logical Volume Group, so the system prioritizes it for storing files: frequently used data is moved in 128 KB blocks to the faster SSD storage, and rarely used data is moved back to the hard disk drive. Data migration is performed by four major Core Storage calls: RdChunkCS, WrChunkCS, RdBgMigCS and WrBgMigCS. As a result, the user gets an optimized system which combines the performance of flash memory with the capacity of magnetic storage.
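The idea of moving frequently used data to the SSD in 128 KB blocks can be illustrated with a toy model. The Python sketch below simply ranks chunks by the number of accesses and keeps the most popular ones on the SSD. This ranking rule and the names chunk_index and plan_ssd_chunks are assumptions made for the example; Apple's actual migration policy is not documented here and is certainly more elaborate.

```python
from collections import Counter
from typing import Iterable, Set

CHUNK_SIZE = 128 * 1024  # migration unit mentioned in the text


def chunk_index(byte_offset: int) -> int:
    """Return the number of the 128 KB chunk that contains the offset."""
    return byte_offset // CHUNK_SIZE


def plan_ssd_chunks(accessed_offsets: Iterable[int], ssd_capacity_chunks: int) -> Set[int]:
    """Pick the chunks that should live on the SSD.

    Assumption: the most frequently accessed chunks win; ties are broken
    by the lower chunk number to keep the result deterministic.
    """
    counts = Counter(chunk_index(offset) for offset in accessed_offsets)
    ranked = sorted(counts, key=lambda chunk: (-counts[chunk], chunk))
    return set(ranked[:ssd_capacity_chunks])
```

Every chunk that is not selected would stay on, or be migrated back to, the hard disk drive.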
Advantages and disadvantages of Core Storage
Core Storage is a dependable and high-performance volume format. It provides the foundation for Fusion Drive and its intelligent data migration, not to mention the in-place transformations required for the implementation of the FileVault 2 disk encryption. Yet, the technology has several significant shortcomings one should be aware of:
unlike Linux LVM, Core Storage doesn’t support thin provisioning;
live capacity expansion is not available in Core Storage as opposed to LVM, so it is impossible to expand the storage pool as storage needs grow. In fact, the diskutil command does provide the option of resizing Core Storage groups and volumes, but it is poorly documented and bears an inherent risk of total data loss;
Disk Utility is unable to manipulate the Core Storage layout – one cannot create or remove logical volumes, or view groups or families, without using the Terminal;
Core Storage doesn’t support the new Apple APFS file system: after the installation of macOS High Sierra or later (macOS Mojave for Fusion Drive) the Logical Volume Group will be converted to a special APFS container;
the technology doesn’t offer any fault-tolerance options. What is more, each drive belonging to Core Storage is a part of a single whole and cannot be accessed separately should one of the member disks be disconnected or fail. All the data kept by the defective system is inevitably lost.
Possibility of data recovery
To make sure that Core Storage is indeed enabled on the problem drives, one can check the hexadecimal view of the largest partition of each disk: the “signature number” 0x4353 (the ASCII characters “CS”) must be present at offset 0x58.
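The same check can be automated. The Python sketch below reads the beginning of a partition (or of its image file) and looks for the signature. It assumes that the offset 0x58 is counted from the start of the partition and that the value is stored as the byte 0x43 followed by 0x53; the function names are illustrative.

```python
CS_SIGNATURE = b"CS"        # bytes 0x43 0x53
CS_SIGNATURE_OFFSET = 0x58  # assumed to be relative to the partition start


def has_core_storage_signature(header: bytes) -> bool:
    """Check a buffer holding the first bytes of a partition."""
    end = CS_SIGNATURE_OFFSET + len(CS_SIGNATURE)
    return header[CS_SIGNATURE_OFFSET:end] == CS_SIGNATURE


def partition_has_core_storage(path: str) -> bool:
    """Read the first 512 bytes of a partition or image file and test them."""
    with open(path, "rb") as source:
        return has_core_storage_signature(source.read(512))
```

On macOS, reading a raw device such as /dev/rdisk0s2 usually requires administrator privileges, so running the check against a disk image is often more convenient.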
All the user data Core Storage holds is spanned across the component disks. Moreover, special encrypted metadata at the end of each storage is required for its correct interpretation. Therefore, the presence of all components is absolutely necessary for data recovery – total failure of at least one of them leads to irreversible data loss. Also, severe damage to the encrypted metadata or the storage GUID makes it impossible to retrieve intact files. Among other potential issues are:
The loss of a partition table on one of the disks
In this case, the storage can still be opened if all the partitions are defined manually: the start sector and the size in sectors of each partition must be specified correctly.
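A manually defined partition comes down to two numbers that are converted into a byte range on the disk. The Python sketch below shows this conversion and reads the first bytes of such a partition from a disk image. The class and function names, as well as the default sector size of 512 bytes, are assumptions made for the example; drives with 4096-byte sectors need the corresponding value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualPartition:
    start_sector: int
    sector_count: int
    sector_size: int = 512  # assumption: 512-byte sectors

    @property
    def byte_offset(self) -> int:
        """Position of the partition start on the disk, in bytes."""
        return self.start_sector * self.sector_size

    @property
    def byte_length(self) -> int:
        """Size of the partition in bytes."""
        return self.sector_count * self.sector_size


def read_partition_start(image_path: str, partition: ManualPartition, length: int = 512) -> bytes:
    """Read up to `length` bytes from the start of a manually defined partition."""
    with open(image_path, "rb") as image:
        image.seek(partition.byte_offset)
        return image.read(min(length, partition.byte_length))
```

If the start sector is specified incorrectly, the bytes returned here will not contain the expected Core Storage header, which is a quick way to notice a wrong value.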
Problems related to encryption of the storage
When full-disk encryption is enabled on a Mac, a special storage encryption key is generated and written to the metadata area or to a special file on the system configuration partition called EncryptedRoot.plist.wipekey. If the metadata area gets corrupted or access to EncryptedRoot.plist.wipekey becomes impossible, the data unfortunately cannot be decrypted and its recovery is impossible.
In other cases, Core Storage metadata is recognized by UFS Explorer and the storage can be assembled and then scanned by the program.
Last update: February 6, 2019