RAID ARRAYS: DATA ORGANIZATION AND RECOVERY
This article describes how data is organized on RAID arrays and provides background knowledge
for both self-service and professional data recovery from RAID arrays.
The sources of this article are public information, RAID specifications and our own data recovery experience.
Copyright © SysDevSoftware, Bogdan Shulga 2007-2008
This article is published for educational purposes only. Any commercial use is prohibited.
RAID array terms
When speaking about RAID arrays, we should first agree on the terms we use. Please note: some terms are explained within the body of this document and/or in the article for the corresponding RAID level. If you are using a particular kind of RAID and want to learn more about it, it is recommended to read the entire document. The most common terms are given below:
- RAID - Redundant Array of Independent Disks. A storage scheme in which a set of independent storage devices is combined into one logical storage. Depending on the actual data organization, it increases storage stability, performance and/or reliability.
- Hardware RAID - a hardware-driven RAID. It consists of a hardware RAID controller (a chip or a board) that manages the array, and a set of attached storage devices. Typically, the operating system (OS) sees a hardware RAID as one monolithic storage device. Data organization is managed by the hardware controller and can be configured through the RAID controller's BIOS settings.
- Software RAID - a software-driven RAID. It requires no specific hardware and is created by the OS on a set of independent storage units. Logically it is one monolithic storage. Data organization is managed by OS drivers (which consume CPU time), so no additional hardware is required.
Well-known examples are NT LDM software RAIDs and the Linux, BSD and MacOS software RAIDs.
- Virtual RAID - a hardware or software RAID virtually reconstructed from its components. It is a virtual storage created by data recovery software to emulate the original RAID storage for data recovery purposes.
- RAID component (unit) - a disk or disk partition used as data storage for the RAID.
- Mirroring - a data organization scheme in which copies of the data are distributed among units. 'Mirror' means that if a data block exists on one storage unit, at least one copy of the same block resides on another unit. This gives good fault tolerance: if a unit is physically damaged, a copy of its data is still available on the other unit(s) of the RAID.
The classic mirror is RAID 1.
- Striping - a technique and data organization scheme used to significantly increase the input/output (I/O) performance of a RAID by distributing data fragments among units. Data on the RAID is divided into small parts (so-called stripes) that are distributed across all available units. The speed-up is achieved because data can be read from or written to all units independently, in parallel.
The classic stripe set is RAID 0.
- Parity - bitwise addition modulo 2 (logical eXclusive OR, XOR) of the data from different RAID units. Each parity byte is calculated as the XOR of the corresponding data bytes of the data units: P = U1 XOR U2 XOR ... XOR UN. An important property of this operation is that the data byte of any unit can be reconstructed from the parity byte and the data bytes of the other units, e.g. for the first unit: U1 = P XOR U2 XOR ... XOR UN. Parity is used as a fault-tolerance technique under the assumption that no more than one disk fails at the same time.
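The parity property described above can be checked with a short Python sketch. It assumes all blocks have the same length; the function names `xor_parity` and `rebuild_unit` are illustrative and not part of any RAID tool.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """Byte-by-byte XOR of equally sized blocks: P = U1 XOR U2 XOR ... XOR UN."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_unit(missing: int, blocks: List[bytes], parity: bytes) -> bytes:
    """Recover one unit's block as P XOR (blocks of all the other units)."""
    others = [b for i, b in enumerate(blocks) if i != missing]
    return xor_parity(others + [parity])


units = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = xor_parity(units)
print(p.hex())                                # ee22
print(rebuild_unit(0, units, p) == units[0])  # True
```

Because XOR is its own inverse, the same function both computes parity and rebuilds a lost unit.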
RAID array types (levels)
Depending on the data organization scheme and techniques used, or their combinations, there are different types (levels) of RAID. Each RAID level was introduced as a solution for a specific use case and/or implementation. Each has its own advantages and disadvantages, measured in terms of cost, performance, reliability and purpose.
Below you can find information about commonly-used levels of RAID.
RAID, level 0 (RAID0, data striping)
RAID, level 0 represents the classic implementation of data striping. The abbreviation 'RAID' is not really correct for this array type, because it provides no redundancy at all. This array level can be built on two or more storage units. Stripes are data fragments, usually 4096, 8192 ... 256K bytes in size (each value is the previous one multiplied by 2). Implementations may also use other stripe sizes, but they must be a multiple of the storage sector size (usually 512 bytes). Each subsequent data stripe is located on the next storage unit.
Figure 1. Stripes organization on RAID 0 (2 units)
This stripe allocation scheme can speed up I/O operations by up to U times (where U is the number of units in the RAID 0). This is achieved by sending concurrent or consecutive I/O requests to different units (usually different hard disk drives). For example, to read stripes 0..3 (a data segment 4 stripes long) on a 2-unit array, the controller should send 2 concurrent read requests: read the first two stripes from Unit 1 and the first two stripes from Unit 2. Both units then perform the physical reads at the same time, so the controller gets the result twice as fast.
Because of this stripe organization, RAID 0 uses almost all storage space for data: there is no redundancy in the data area. However, the RAID 0 storage size is sometimes less than the sum of the individual unit sizes, because the controller may reserve some space for its 'own technical needs' and because the array may be built from disks of different sizes. In the latter case the controller uses the same amount of space on every unit, equal to the size of the smallest unit. So the RAID 0 storage size is: (min(Unit Size) - Reserved) x Units Count.
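The stripe-to-unit mapping and the request grouping in the example above can be sketched in Python. This is a minimal sketch assuming zero-based numbering of units and stripes and no controller-reserved area; `raid0_locate` and `split_read` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def raid0_locate(stripe: int, units: int) -> Tuple[int, int]:
    """Map a logical stripe number to (unit index, stripe number on that unit)."""
    return stripe % units, stripe // units


def split_read(first: int, count: int, units: int) -> Dict[int, List[int]]:
    """Group a run of logical stripes into one request (list of unit stripes) per unit."""
    requests: Dict[int, List[int]] = defaultdict(list)
    for stripe in range(first, first + count):
        unit, unit_stripe = raid0_locate(stripe, units)
        requests[unit].append(unit_stripe)
    return dict(requests)


# Stripes 0..3 on a 2-unit array. Units are counted from 0 here,
# so unit 0 is "Unit 1" and unit 1 is "Unit 2" in the text.
print(split_read(0, 4, 2))  # {0: [0, 1], 1: [0, 1]}
```

Each per-unit list can be issued as one request, and the units serve their requests in parallel.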
Advantages of RAID, level 0:
- Extremely high performance in both read and write;
- Simple implementation (even most on-board SATA controllers support RAID 0);
- Up to 100% disk space is used for data;
- Cheapest storage space solution.
Disadvantages
- No fault tolerance: failure of any unit causes data loss.
Recovery perspectives
- Controller failure/disassembled array: all data is easy to recover. You need to know the stripe size and the unit order.
- Damaged unit: if any of the units becomes unreadable, no continuous data segment longer than StripeSize*(UnitsCount-1) can be fully recovered.
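The StripeSize*(UnitsCount-1) limit follows from the layout: one stripe in every row belongs to the failed unit. The Python sketch below lists the lost logical byte ranges and the intact gaps between them. It assumes zero-based unit numbering and no reserved area; `lost_ranges` is a hypothetical name.

```python
from typing import List, Tuple


def lost_ranges(failed_unit: int, units: int, stripe_size: int,
                rows: int) -> List[Tuple[int, int]]:
    """Logical byte ranges [start, end) in the first `rows` rows stored on the failed unit."""
    ranges = []
    for row in range(rows):
        stripe = row * units + failed_unit  # logical stripe number held by the failed unit
        start = stripe * stripe_size
        ranges.append((start, start + stripe_size))
    return ranges


lost = lost_ranges(failed_unit=1, units=3, stripe_size=4096, rows=3)
gaps = [b[0] - a[1] for a, b in zip(lost, lost[1:])]
print(lost)  # [(4096, 8192), (16384, 20480), (28672, 32768)]
print(gaps)  # [8192, 8192], i.e. StripeSize * (UnitsCount - 1)
```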
RAID, level 1 (RAID1, data mirroring)
RAID, level 1 represents the classic implementation of data mirroring. The classic scheme is a pair (or more) of units holding the same data. The data size of RAID 1 equals the size of the smallest unit, minus any controller-reserved space. When the controller reads data from RAID 1, it may schedule requests to different disks to speed up I/O. Writes are performed either in parallel (to both disks at the same time: faster) or to one disk after another (more fault-tolerant). RAID 1 does not use any data segmentation.
Advantages of RAID, level 1:
- Read operations as fast as on a single disk, or faster;
- Very high fault-tolerance;
- The RAID can keep operating as long as at least one mirror disk is left (in so-called 'degraded mode');
- One of the most widely available solutions (even most on-board SATA controllers support RAID 1).
Disadvantages
- Most expensive disk space (each unit stores the same information);
- Slow writes (in practice, they can be even slower than on a single disk).
Recovery perspectives
- Controller failure/disassembled array: all data is easy to recover from any of the units.
- Damaged unit: data can be recovered from any readable unit.
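If every mirror copy has some unreadable areas, one common approach is to merge the copies block by block, taking each block from whichever unit can read it. The sketch below assumes the mirrors were in sync and that unreadable blocks are marked as `None`; `merge_mirrors` is a hypothetical name.

```python
from typing import List, Optional, Sequence

Block = Optional[bytes]  # None marks a block that could not be read


def merge_mirrors(copies: Sequence[Sequence[Block]]) -> List[Block]:
    """Take each block from the first mirror copy on which it is readable."""
    merged: List[Block] = []
    for i in range(len(copies[0])):
        # Stays None only if the block is unreadable on every copy.
        merged.append(next((c[i] for c in copies if c[i] is not None), None))
    return merged


disk_a = [b"A", None, b"C"]
disk_b = [None, b"B", b"C"]
print(merge_mirrors([disk_a, disk_b]))  # [b'A', b'B', b'C']
```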
RAID, level 4 (RAID4, stripe set with dedicated parity)
This is the first reasonably successful attempt at a compromise between fault tolerance, speed and cost. The scheme is an ordinary stripe set (like RAID 0) extended with one more dedicated unit that stores error-correcting parity information.
The array is built from 3 or more disks. There were two similar RAID levels implementing this idea: Level 3 (RAID 3) and Level 4 (RAID 4). RAID 3 required complex hardware and is no longer used.
Figure 4. Stripe Set with dedicated parity (RAID 4)
How does fault tolerance work? The stripe set stores the actual RAID data. Each 'column' of stripes (see Figure 4) is combined with XOR to produce the parity (refer to RAID array terms for a detailed description of parity).
Because of its design, RAID 4 behaves mostly like RAID 0 (fast reads, large capacity), but additionally has an internal error-correction feature: if some stripe cannot be read, the controller can reconstruct it from the other stripes and the parity information. The dedicated parity disk does not hold data and serves only as a 'backup unit'.
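The RAID 4 layout can be sketched in Python as follows. It assumes the parity unit is the last one and that units and rows are numbered from 0; real controllers may place the parity unit elsewhere, and `raid4_locate` and `raid4_row` are hypothetical names. The output also shows why writes are slow: every row, and therefore every write, involves the same parity unit.

```python
from typing import List, Tuple


def raid4_locate(stripe: int, units: int) -> Tuple[int, int]:
    """Map a logical data stripe to (unit index, row); parity is assumed on the last unit."""
    data_units = units - 1
    return stripe % data_units, stripe // data_units


def raid4_row(row: int, units: int) -> List[str]:
    """Labels of one row: data stripes on units 0..U-2, parity 'P' on unit U-1."""
    data_units = units - 1
    first = row * data_units
    return [f"D{first + i}" for i in range(data_units)] + ["P"]


for r in range(3):
    print(raid4_row(r, 4))
# ['D0', 'D1', 'D2', 'P']
# ['D3', 'D4', 'D5', 'P']
# ['D6', 'D7', 'D8', 'P']
```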
Advantages of RAID, level 4:
- Extra-fast read operations;
- Fault-tolerance;
- The RAID can keep operating in 'degraded mode' if one of the disks goes bad (working like RAID 0 if the parity disk failed, or reconstructing stripes from parity if a data disk failed);
- Cost-efficient fault tolerance.
Disadvantages
- Very slow writes: every write/update requires a parity update on the single dedicated disk. This is the 'bottleneck' of this solution and its main disadvantage.
- Slow reads in degraded mode (data disk failed) due to the high load on the parity unit.
Recovery perspectives
- Controller failure/disassembled array: all data is easy to recover. N-1 disks are required, and the data disks are preferred (to build a virtual RAID 0); you need to know the disk order and the stripe size.
- Damaged unit: recovery chances are near 100% if no more than 1 disk failed. If 2 or more disks failed, the problem is the same as for RAID 0.
RAID, level 5 (RAID5, stripe set with distributed parity)
At present, this is the best compromise between fault tolerance, speed and cost. The scheme is an ordinary stripe set (like RAID 0) that mixes data and parity information.
Like RAID 4, it requires at least 3 disks, but unlike RAID 4, RAID 5 has no dedicated parity disk, so there is no 'queue' for parity updates on writes.
Depending on the RAID purpose, implementation, vendor and so on, there are different methods of distributing parity across the stripe set. The most commonly used methods are: Left Symmetric (backward dynamic parity distribution; the most widely used), Right Symmetric (forward dynamic parity distribution), Left Asymmetric (backward parity distribution) and Right Asymmetric (forward parity distribution). Other methods (such as 'dedicated column', 'delayed parity' and so on) are rarely used and vendor-specific.
Figure 5. Left Symmetric parity distribution (RAID 5)
Figure 6. Left Asymmetric parity distribution (RAID 5)
Figure 7. Right Symmetric parity distribution (RAID 5)
Figure 8. Right Asymmetric parity distribution (RAID 5)
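The four parity distributions can be expressed as a small Python sketch. The formulas follow the conventions used by the Linux md driver (an assumption: other vendors may number units or rows differently, so check the result against the figures). 'Left' layouts start parity on the last unit and move it backward, 'right' layouts start on the first unit and move it forward; 'symmetric' layouts start the data right after the parity unit and wrap around, while 'asymmetric' layouts fill data units left to right, skipping parity. The function names are hypothetical.

```python
from typing import List, Tuple

SYMMETRIC = ("left-symmetric", "right-symmetric")
ASYMMETRIC = ("left-asymmetric", "right-asymmetric")


def raid5_row(row: int, n: int, layout: str) -> Tuple[int, List[int]]:
    """Return (parity unit, data units in logical order) for one row of an n-unit RAID 5."""
    if layout not in SYMMETRIC + ASYMMETRIC:
        raise ValueError(f"unknown layout: {layout}")
    if layout.startswith("left"):
        parity = (n - 1) - row % n  # backward: last unit first
    else:
        parity = row % n            # forward: first unit first
    if layout in SYMMETRIC:
        data = [(parity + 1 + i) % n for i in range(n - 1)]  # wrap after parity
    else:
        data = [u for u in range(n) if u != parity]          # left to right
    return parity, data


def raid5_locate(stripe: int, n: int, layout: str) -> Tuple[int, int]:
    """Map a logical data stripe to (unit index, row)."""
    row, index = divmod(stripe, n - 1)
    _, data = raid5_row(row, n, layout)
    return data[index], row


def draw(n: int, rows: int, layout: str) -> None:
    for row in range(rows):
        _, data = raid5_row(row, n, layout)
        cells = ["P"] * n
        for i, unit in enumerate(data):
            cells[unit] = f"D{row * (n - 1) + i}"
        print(cells)


draw(3, 3, "left-symmetric")
# ['D0', 'D1', 'P']
# ['D3', 'P', 'D2']
# ['P', 'D4', 'D5']
```

Trying each layout name with `draw` is a quick way to compare a candidate layout with the figures when the parity distribution method of an array is unknown.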
Fault tolerance works as in RAID 4: the stripe set stores the actual data and parity information; each 'column' of stripes is XORed into the 'parity stripe' of that column.
Because of its design, RAID 5 works like RAID 0 (fast reads, large capacity) and, like RAID 4, has an internal error-correction ability (if some stripe cannot be read, the controller can reconstruct it from the other stripes and the parity information). The actual RAID storage size is (U-1) x (min(Unit Size) - Reserved).
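The capacity formulas given so far for RAID 0, RAID 1 and RAID 5 can be collected in one Python function. RAID 4 uses the same formula as RAID 5, since one unit's worth of space holds parity. The reserved size is controller-specific and is only an input here; `usable_capacity` is a hypothetical name.

```python
from typing import Sequence


def usable_capacity(unit_sizes: Sequence[int], level: int, reserved: int = 0) -> int:
    """Usable RAID size (in the unit of the inputs) from the sizes of its units."""
    units = len(unit_sizes)
    per_unit = min(unit_sizes) - reserved  # each unit contributes only the smallest unit's size
    if level == 0 and units >= 2:
        return per_unit * units
    if level == 1 and units >= 2:
        return per_unit
    if level in (4, 5) and units >= 3:
        return per_unit * (units - 1)
    raise ValueError(f"unsupported level {level} for {units} units")


sizes_gb = [500, 500, 400]
print(usable_capacity(sizes_gb, 0))  # 1200
print(usable_capacity(sizes_gb, 5))  # 800
```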
Advantages of RAID, level 5:
- Extra-fast read operations;
- Fast writes, although speed depends on the data and on the parity distribution method;
- Fault-tolerance;
- The RAID can keep operating in 'degraded mode' if one of the disks goes bad (in stripe-reconstruction mode);
- Cost-efficient fault tolerance.
Disadvantages
- Slower writes than RAID 0;
- Write speed depends on the content and on the parity distribution method.
Recovery perspectives
- Controller fault/disassembled array: all data is easy to recover. All disks are preferred, but N-1 are required; you need to know the disk order, stripe size and parity distribution method.
- Damaged unit: recovery chances are near 100% if no more than 1 disk failed. If 2 or more disks failed, the problem is the same as for RAID 0.
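Reading a RAID 5 with one missing unit combines the two ideas above: rebuild the missing stripe of a row by XOR, then put the data stripes in logical order. The Python sketch below assumes the Left Symmetric layout with Linux md style numbering (zero-based units and rows) and in-memory blocks, with `None` for the missing unit; all names are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-by-byte XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def left_symmetric_row(row: int, n: int) -> Tuple[int, List[int]]:
    """(parity unit, data units in logical order) for the Left Symmetric layout."""
    parity = (n - 1) - row % n
    return parity, [(parity + 1 + i) % n for i in range(n - 1)]


def read_row(row: int, blocks: List[Optional[bytes]]) -> bytes:
    """Return the data of one row; at most one unit may be missing (None)."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot rebuild a row with more than one missing unit")
    if missing:
        present = [b for b in blocks if b is not None]
        blocks = list(blocks)
        # The missing stripe is the XOR of all other stripes (data and parity) in the row.
        blocks[missing[0]] = xor_blocks(present)
    _, data_units = left_symmetric_row(row, len(blocks))
    return b"".join(blocks[u] for u in data_units)


# Row 1 of a 3-unit array: unit 2 holds D2, unit 1 holds parity, unit 0 holds D3.
d2, d3 = b"cc", b"dd"
row1 = [d3, xor_blocks([d2, d3]), d2]
print(read_row(1, [None, row1[1], row1[2]]))  # b'ccdd'
```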
RAID, level 6 (RAID6, stripe set with double distributed parity)
This is a new, still rarely used solution for reliable and, at the same time, cost-efficient data storage. The idea of RAID 6 is to extend the RAID 5 scheme with one more parity stripe computed by a different algorithm based on Galois field algebra. This scheme adds a second redundancy unit and allows efficient correction of errors on up to two disks.
Data organization on RAID 6 is similar to RAID 5: data and parity (P-stripe) are rotated among the disks. The difference is one more stripe (the so-called Q-stripe) that always follows the P-stripe and contains the GF (Galois field) sum of the data located in the same 'column'.
For more information about RAID 6 and Q-stripe algorithms, refer to this page:
http://www.cs.utk.edu/~plank/plank/papers/CS-96-332.html
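A per-byte sketch of the P and Q computation in Python is shown below. It assumes the convention used by the Linux md driver: arithmetic in GF(2^8) with the polynomial 0x11D and generator g = 2, so Q = g^0*D0 + g^1*D1 + ... (addition is XOR). Other implementations may use a different polynomial, generator or disk order. The example rebuilds one data byte from Q alone, the case where a data disk and the P disk are lost together. All names are hypothetical.

```python
from typing import List, Optional, Tuple

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254, since a^255 = 1 for every non-zero a."""
    return gf_pow(a, 254)


def pq(data: List[int]) -> Tuple[int, int]:
    """P = XOR of the data bytes; Q = sum of g^i * D_i with g = 2."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_from_q(data: List[Optional[int]], q: int) -> int:
    """Rebuild the single missing data byte (None) using Q only."""
    missing = data.index(None)
    partial = 0
    for i, d in enumerate(data):
        if d is not None:
            partial ^= gf_mul(gf_pow(2, i), d)
    # q ^ partial equals g^missing * D_missing; divide by g^missing.
    return gf_mul(q ^ partial, gf_inv(gf_pow(2, missing)))


data = [0x12, 0x34, 0x56]
p, q = pq(data)
print(hex(recover_from_q([0x12, None, 0x56], q)))  # 0x34
```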
Advantages of RAID, level 6:
- Extra-fast read operations;
- Fast writes, although speed depends on the data and on the parity distribution method;
- Very good fault-tolerance;
- The RAID can keep operating in 'degraded mode' if one or even two disks go bad (in stripe-reconstruction mode);
- Cost-efficient fault tolerance.
Disadvantages
- Slower writes than RAID 0;
- Write speed depends on the content and on the parity distribution method.
Recovery perspectives
- Controller fault/disassembled array: all data is easy to recover. All disks are preferred, but N-1 or N-2 are required; you need to know the disk order, stripe size and parity distribution method.
- Damaged unit: recovery chances are near 100% if no more than 2 disks failed. If more than 2 disks failed, the problem is the same as for RAID 0.
Nested RAID: level 0+1, level 10, level 50, level 51 etc.
These are nested RAID implementations based on levels 0, 1 and 5. RAID, level 0+1 is a pair of mirrored stripe sets that increases fault tolerance with no performance loss. RAID, level 10 is a stripe set built over a set of mirrors to increase performance and capacity. Both require at least 4 disks. RAID 50 is a stripe set of individual RAID 5 arrays, built for performance, and RAID 51 is a mirror of individual RAID 5 arrays, built for fault tolerance (these require at least 6 disks). Examples of RAID 0+1 and RAID 10 are given below.
Figure 2. Data organization on mirror of stripes (RAID 0+1; 6 units)
Figure 3. Data organization on stripe of mirrors (RAID 10; 6 units, 2x3 mirrors)
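The difference between the two nested layouts can be sketched in Python by asking where a logical stripe and its copies live. The unit numbering (0..5) and the grouping of units into mirrors or stripe sets below are assumptions for illustration, not taken from the figures; the function names are hypothetical.

```python
from typing import List, Tuple


def raid10_locate(stripe: int, mirrors: List[List[int]]) -> Tuple[List[int], int]:
    """RAID 10: stripes rotate over mirror groups; every unit of the group holds a copy."""
    group, row = stripe % len(mirrors), stripe // len(mirrors)
    return mirrors[group], row


def raid01_locate(stripe: int, stripe_sets: List[List[int]]) -> Tuple[List[int], int]:
    """RAID 0+1: each stripe set is a full RAID 0; the same position in every set holds a copy."""
    width = len(stripe_sets[0])
    position, row = stripe % width, stripe // width
    return [s[position] for s in stripe_sets], row


# Hypothetical 6-unit arrays, units numbered 0..5.
print(raid10_locate(4, [[0, 1], [2, 3], [4, 5]]))  # ([2, 3], 1)
print(raid01_locate(4, [[0, 1, 2], [3, 4, 5]]))    # ([1, 4], 1)
```

In both cases the returned row is the stripe number on each listed unit, so either listed unit can serve the read.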
Advantages of nested RAIDs:
- Increased speed or fault-tolerance;
- The RAID can keep operating in degraded mode;
- RAID 10 and RAID 0+1 are widely available solutions (supported even by some on-board controllers).
Disadvantages
- Expensive disk space for mirrors;
- Hard to manage/maintain.
Recovery perspectives
- Controller failure/disassembled array: all data is easy to recover.
- Damaged unit: recovery chances are near 100% if it is possible to virtually assemble at least one stripe set (RAID 10, RAID 50) or at least one mirror instance (RAID 0+1, RAID 51).
Data Recovery
Given the RAID layouts and parameters described above, data from a disassembled RAID array is easy to recover. To do so, you have to:
- Determine the RAID array level with certainty;
- Refer to the description of that RAID type to find the exact segment layout across the units;
- Refer to the 'Recovery perspectives' section of the RAID type description and determine the additional required parameters (unit order, stripe size, parity distribution, if applicable);
- Read the disks according to the data layout diagram and parameters.
The simplest recovery result is a RAID storage image file that can be written to a new disk or RAID. It can also be analyzed with any data recovery software that supports disk image files.
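For RAID 0, producing such an image file amounts to interleaving the stripes of the component images. The Python sketch below assumes the component images are files listed in array order and that data on every unit starts at the same `data_offset` (a hypothetical parameter standing in for any controller metadata area). Other levels would replace the row-reading step with their own layout and parity logic. `build_raid0_image` is a hypothetical name.

```python
from contextlib import ExitStack
from typing import List


def build_raid0_image(unit_paths: List[str], stripe_size: int, out_path: str,
                      data_offset: int = 0) -> int:
    """Interleave stripes of RAID 0 component images (in array order) into one image.

    Returns the number of bytes written.
    """
    written = 0
    with ExitStack() as stack:
        units = [stack.enter_context(open(p, "rb")) for p in unit_paths]
        out = stack.enter_context(open(out_path, "wb"))
        for f in units:
            f.seek(data_offset)  # skip the assumed per-unit metadata area
        while True:
            row = [f.read(stripe_size) for f in units]
            if any(len(chunk) < stripe_size for chunk in row):
                break  # stop at the first row where any unit has no full stripe left
            out.write(b"".join(row))
            written += stripe_size * len(units)
    return written
```

With the wrong unit order or stripe size the image is still produced, but file systems on it will look corrupted, so checking the result with data recovery software is a practical way to validate the parameters.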
Recovery can be automated, based on knowledge of the RAID array layout, with the UFS Explorer Professional Recovery product, which supports virtual reconstruction and data recovery from:
- RAID arrays, levels 0, 1, 0+1, 10, 4, 5, 6, 50, 51 (and also RAID 7, a hardware variation of RAID 4);
- Degraded RAID arrays, levels 1, 0+1, 10, 5 and 6;
- The most common file systems used on NAS RAID storage and servers with RAID arrays.
Last update: 09.04.2008