Clustered file systems

In personal computing, the most common practice is to attach a storage device to one dedicated machine and let a local filesystem manage the data under the control of its OS, as explained in the article on Filesystem basics. Yet this approach is not always effective in enterprise-grade infrastructure. A complex system may involve numerous drives located on different servers, and those servers may still need to work together on a single demanding task or have concurrent access to the same data. One way to achieve this is by means of a clustered filesystem. Such a solution makes the drives appear as a single storage unit, available for simultaneous use irrespective of the underlying storage media or the computers hosting them. This article gives a basic understanding of the technology and describes the different cluster FS types.

Content:

What is a clustered filesystem?
Common clustered filesystem types

What is a clustered filesystem?

A regular filesystem is not intended to be mounted on more than one server at a time. Doing so can lead to serious inconsistencies that damage its logical structure. For instance, being unaware of each other’s activities, two servers may try to allocate the same block of storage to different files, each relying on the free-block information it has loaded into its own memory. Or one server may already have modified certain blocks while the others, unaware of the change, keep using the outdated content. A clustered filesystem addresses this problem.
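The block-allocation conflict described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the SharedDisk and NaiveServer classes, the server names and the file names are all hypothetical, and a real filesystem keeps far more state than a single free-block list.

```python
class SharedDisk:
    """Toy block device: a free-block map plus a record of who owns each block."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks
        self.owner = [None] * n_blocks


class NaiveServer:
    """Hypothetical server that mounts the disk as if it were a local filesystem.

    It copies the free-block map into memory once and never re-reads it.
    """

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cached_free = list(disk.free)  # private copy that can go stale

    def allocate(self, filename):
        # Pick the first block that looks free in the local (possibly stale) copy.
        block = self.cached_free.index(True)
        self.cached_free[block] = False
        previous_owner = self.disk.owner[block]
        # Write the allocation to the shared disk without checking its real state.
        self.disk.free[block] = False
        self.disk.owner[block] = (self.name, filename)
        return block, previous_owner


disk = SharedDisk(8)
server_a = NaiveServer("server-a", disk)
server_b = NaiveServer("server-b", disk)

block_a, _ = server_a.allocate("report.doc")
block_b, clobbered = server_b.allocate("photo.jpg")

# Both servers chose the same block, so the second write silently
# replaces the first server's allocation.
print(block_a, block_b, clobbered)
```

Because server-b never learns that server-a has already taken the block, both files end up pointing at the same storage, which is the kind of structural damage a clustered filesystem is designed to prevent.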

A clustered filesystem can be mounted on multiple servers at once, while being accessed by them on the block level and managed as a unified entity. It puts together the available storage capacity and shares it between the servers. At the same time, discrepancies are eliminated, since each server stays in sync with the actual filesystem state, as if all their applications were running on the same machine.

Such a filesystem operates on block devices (hard disks, SSDs, storage arrays, etc.) that can be attached to the server directly or connected via networking protocols, such as iSCSI, Fibre Channel or ATA over Ethernet, and imported into the cluster. The most typical implementation is built on a storage area network (SAN).

Hint: In case you need to recover data from SAN, please refer to the provided guide.

As far as functionality is concerned, a clustered filesystem is similar to any traditional format, like NTFS of Microsoft Windows or ext4 of Linux. It likewise serves as a mechanism to organize data on a storage device and retrieve it when needed. The difference is that it is mounted on two or more servers connected into a cluster configuration. All members of this cluster are able to read from and write to the shared storage resource, just as they would with their local drives. At the same time, all changes made by one machine immediately become visible to the rest, so that data integrity is preserved. The filesystem itself coordinates input/output operations and may lock them to avoid collisions. For this to be possible, the essential filesystem metadata may be either distributed across all the servers in a cluster or stored on a centralized metadata server.
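The role of locking in this coordination can be sketched in a few lines. The example below is an assumption-laden stand-in, not how any particular product works: a single threading.Lock inside one process plays the part of a cluster-wide lock, two threads play the part of two servers, and the LockedDisk class and worker function are hypothetical names. Real clustered filesystems rely on a distributed lock manager or a metadata server for this.

```python
import threading


class LockedDisk:
    """Toy shared disk whose allocations are serialized by a single lock."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks
        self.owner = [None] * n_blocks
        self.lock = threading.Lock()  # stand-in for a cluster-wide lock

    def allocate(self, server, filename):
        # Only one "server" at a time may inspect and update the free-block map,
        # so every allocation is based on the current shared state.
        with self.lock:
            block = self.free.index(True)
            self.free[block] = False
            self.owner[block] = (server, filename)
            return block


def worker(disk, server, results):
    # Each simulated server allocates 100 blocks for 100 files.
    for i in range(100):
        results.append(disk.allocate(server, f"file-{i}"))


disk = LockedDisk(200)
results_a, results_b = [], []
threads = [
    threading.Thread(target=worker, args=(disk, "server-a", results_a)),
    threading.Thread(target=worker, args=(disk, "server-b", results_b)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

allocated = results_a + results_b
# Every block was handed out exactly once, regardless of how the threads interleaved.
print(len(allocated), len(set(allocated)))
```

The price of this safety is that allocations are serialized, which is one reason real implementations use finer-grained locks rather than a single global one.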

Clustered filesystems offer many benefits for multiserver environments. They simplify storage administration, allowing the whole cluster to be managed remotely or from any server in the cluster. Most of them include a volume manager, which makes it possible to provision the necessary amount of capacity. It also becomes easy to extend the system by adding a new server. Some provide advanced safety features, like replication and snapshots. On the downside, this storage model often involves costly equipment, including disk arrays, switches, cabling and host bus adapters, and its complex architecture may be difficult to maintain.

Common clustered filesystem types

As with traditional filesystems, clustered solutions are created by different vendors for various application scenarios. Thus, they may differ significantly in their design and capabilities. The most widely known cluster FS types are as follows:

GFS2 (Global File System 2) – the primary clustered filesystem for Linux, developed by Red Hat and provided with RHEL. It has replaced the original GFS version and has been a part of the kernel package since 2009.
VMFS (Virtual Machine File System) – a popular clustered filesystem optimized for storing virtual machine files. It was developed by VMware Inc. for their VMware ESX Server and is utilized by the company's server virtualization products. Presently, there are six versions of VMFS that correspond to the ESX/ESXi Server releases.
OCFS2 (Oracle Cluster File System 2) – a general-purpose clustered filesystem created by Oracle that has been integrated into the Linux kernel since 2006.
SNFS (StorNext File System) – a clustered filesystem made by Quantum Corporation that enables Windows, Linux and Apple machines to read and write to the same volume. Its original name was CentraVision File System (CVFS).
Xsan – Apple’s clustered filesystem for macOS, based on the previously mentioned StorNext filesystem.
CXFS (Clustered XFS) – a proprietary clustered filesystem designed by Silicon Graphics (SGI) specifically for the IRIX operating system and implemented on IRIX- and Linux-based servers.
VxCFS (Veritas Cluster File System) – a clustered filesystem developed by Veritas Technologies and distributed with their VERITAS Storage Foundation products. It can be employed on servers powered by IBM AIX, Linux, HP-UX and Solaris.
IBM Spectrum Scale (formerly GPFS, General Parallel File System) – a clustered filesystem released by IBM in 1998 for their AIX operating system. Later, it became available for Linux and Windows Server.
Lustre – a highly scalable open-source clustered file system, initially made available in 2003 by Cluster File Systems Inc. The file system is now commonly used in scientific research and other data-intensive computing environments, including some of the world’s most powerful supercomputers.
GlusterFS – an open-source clustered file system designed to handle large amounts of unstructured data. It was initially launched in 2005 by Gluster Inc., which Red Hat acquired in 2011. Today, GlusterFS is widely applied in cloud environments.
CephFS – a file system component of the Ceph distributed storage platform, an open-source project actively supported by organizations such as Red Hat, SUSE and others. CephFS integrates with many platforms and is actively used in cloud infrastructures (including OpenStack), containerized environments (including Kubernetes, OpenShift) and for enterprise storage.
BeeGFS (formerly FhGFS, Fraunhofer Parallel File System) – a high-performance open-source clustered file system developed in 2005 by the Fraunhofer Institute in Germany. Over time, BeeGFS has evolved and become widely employed in scientific research, supercomputing centers and some enterprise environments.
OrangeFS – an open-source clustered file system that is based on PVFS (Parallel Virtual File System). Optimized for high performance, OrangeFS has broad application in high-throughput computing environments and large research institutions.


Last update: November 15, 2024
