|
|  |
UNDELETE FROM LINUX, UNIX, BSD: WHY NOT?
This article describes general concept how Linux-like systems handle file deletion on Linux/Unix file systems (Ext2/Ext/ReiserFS/XFS)
and answers why the files could not be un-deleted.
The sources of this article are public information and our own analysis and data recovery experience.
Copyright © SysDevSoftware, Bogdan Shulga 2007
This artical is published for educational purposes only. Any commercial use is prohibited.
How data is organized?
Most existing file systems, including Linux file systems, use so-called block data organization. Originally, storage devices could logically operate with small data units, called sectors. Usually, the size of these sectors is 512 bytes.
Each data fragment or file on disk takes one or more these sectors. To access this file, storage driver must address this sector and read data from it. Assuming 'Linear Block Access' (LBA) we may think about storage sectors as about array of cells with ordinal numbers.
Due to disk addressing optimization, file system realization logically gathers equal sets of sectors to so-called blocks. Each block is the set of sectors that could be logically addressed with file system driver. Minimal possible block size is one sector; in practice, it's never used block size over 64KB.
Most existing file systems (Windows FAT/NTFS, Linux Ext2/Ext3/XFS etc.) use the block as smallest logical disk unit. This means that file (or file tail) below block size will take entire this block. Some file systems (like ReiserFS), however, may use free block 'tails' to store small files and data fragments.
Assuming most common case, we have file organization on the disk as follows:
- File logically is divided to blocks (parts with size equal to block size);
- If there are file tail, file system driver allocates one more block for it (or in some cases - block tail (ReiserFS));
- File is written to existing free disk blocks.
What is free disk block? This is data areas of file systems that is not used by any existing file, file system structure or is not reserved for technical needs.
Free space and fragmentation
In practice, many sequential create file/append data/truncate data/delete file requests makes free space on file system fragmented.
Figure 1. How fregmentation appears
Figure 1 shows simplest example of fragmentation appearance. When 'File 3' is written, there no linear disk space to to store all file data and so file is saved across two non-linked free blocks fragments.
Note: file system driver in this example not wipes blocks 1-2 after 'File 1', but just marks them as 'free'. Before 'File 3' is written, blocks 1-2 still had file data of 'File 1'.
Also note that in practice, large files may consist of up to few hundreds of unlinked data fragments, few blocks each.
How fragments are linked?
File system stores special objects to describe files: information nodes (briefly - inodes). Among all, inode contains the following information:
- Object type;
- Object size;
- Object allocation table/list/tree.
When file system driver reads inode, it can determine: what object is (file, directory, link etc.) and decide about object read/write/handling strategy. From object size, driver may calculate how many blocks are used by object, including size of file object tail. Finally, from object allocation data, driver may get information about actual data blocks locations.
How object allocation information is organized? The key part of object allocation information is array, list or B-tree of pointers to data blocks or to continuous fragments of blocks. The first part or root of this information is stored as part of inode.
So why no undelete?
Unlike Windows, Linux/BSD/Unix operation system drivers clean the part of inode information after file is deleted: it fills with zero object size, object type/mode information and object allocation information. This means after file if deleted, software knows nothing about it.
Assume files 2 and 3 on Figure 1 are RAW encrypted files (no headers; both are like 'white noise'). Assume both takes full blocks and both are deleted. These means that no allocation information left and data recovery software is unable to detect file 2 or file 3 boundaries: both are too similar and they are mixed one with another. This is classic example of 'impossible recovery' situation.
Practical situation is not so far from this: binary files fragments are too similar for data recovery software; there is much heavy fragmentation and so on. All this makes pessimistic prognoses for file undelete.
No chance?
Recovery is still possible. There set of recovery methods that could help with file recovery, but none can give 100%, or even 80% result (remember about encrypted or compressed file fragments that data recovery software is unable to classify):
- Signature-based search: software searches 'known' file fragments and makes assumption about next continuous fragment contents; usually unable to give exact file size (except there are found known file header that contains file size by itself); helpless for cases of heavy fragmentation.
- Statistical fragments analysis: makes assumption about fragment links, based on statistical methods of data analysis. Could be helpful for homogenic file (most bmp pictures, some archives etc.), but is helpless for heterogenic content (like CD/DVD images etc);
- Lost file system structures search: software finds lost file system structures that could help to determine layout of lost fragments.
If you are thinking about data recovery by yourself, be ready that any data recovery software for Linux/Unix/BSD file systems is unable to process with real undelete. Large part of recovery process is the manual work with data fragments analysis.
Our data recovery products now are developed into two RAW recovery directions: signature search (to recognize file types) and file system structures search.
This allows combining set of stages of data recovery with level-by-level data classification and going closer to higher recovery rates.
Last update: 24.03.2007
|