SYSDEV LABORATORIES - Data Recovery and Data Access solutions  
About | Legal notice | Site map Home Contact
Solutions  |  Products  |  Services  |  Knowledge Base  |  Downloads  |  Support  |  Order
  Linux undelete: why it often does not work?
This article describes general concept how Linux-like systems handle file deletion on Linux/Unix file systems (Ext2/Ext/ReiserFS/XFS) and answers why the files could not be un-deleted.

The sources of this article are public information and our own analysis and data recovery experience.
 

The information below describes general issues of deleted files recovery from Linux/Unix file systems and explains why it's hard or impossible to retrive deleted files with thier exact size and names from these file systems. Please take to account that in some cases deleted data recovery is still possible and could be available with our products. Please read product information of the specialized software products to check if deleted file recovery is available with our products for the specific file system.


How data is organized?

Most existing file systems, including Linux file systems, use so-called block data organization. Originally, storage devices could logically operate with small data units, called sectors. Usually, the size of these sectors is 512 bytes. Each data fragment or file on disk takes one or more these sectors. To access this file, storage driver must address this sector and read data from it. Assuming 'Linear Block Access' (LBA) we may think about storage sectors as about array of cells with ordinal numbers.

Due to disk addressing optimization, file system realization logically gathers equal sets of sectors to so-called blocks. Each block is the set of sectors that could be logically addressed with file system driver. Minimal possible block size is one sector; in practice, it's never used block size over 64KB.

Most existing file systems (Windows FAT/NTFS, Linux Ext2/Ext3/XFS etc.) use the block as smallest logical disk unit. This means that file (or file tail) below block size will take entire this block. Some file systems (like ReiserFS), however, may use free block 'tails' to store small files and data fragments.

Assuming most common case, we have file organization on the disk as follows:
  • File logically is divided to blocks (parts with size equal to block size);
  • If there are file tail, file system driver allocates one more block for it (or in some cases - block tail (ReiserFS));
  • File is written to existing free disk blocks.
What is free disk block? This is data areas of file systems that is not used by any existing file, file system structure or is not reserved for technical needs.


Free space and fragmentation

In practice, many sequential create file/append data/truncate data/delete file requests makes free space on file system fragmented.

Block fragmentation

Figure 1. How fregmentation appears

Figure 1 shows simplest example of fragmentation appearance. When 'File 3' is written, there no linear disk space to to store all file data and so file is saved across two non-linked free blocks fragments.

Note: file system driver in this example not wipes blocks 1-2 after 'File 1', but just marks them as 'free'. Before 'File 3' is written, blocks 1-2 still had file data of 'File 1'.

Also note that in practice, large files may consist of up to few hundreds of unlinked data fragments, few blocks each.


How fragments are linked?

File system stores special objects to describe files: information nodes (briefly - inodes). Among all, inode contains the following information:
  • Object type;
  • Object size;
  • Object allocation table/list/tree.
When file system driver reads inode, it can determine: what object is (file, directory, link etc.) and decide about object read/write/handling strategy. From object size, driver may calculate how many blocks are used by object, including size of file object tail. Finally, from object allocation data, driver may get information about actual data blocks locations.

How object allocation information is organized? The key part of object allocation information is array, list or B-tree of pointers to data blocks or to continuous fragments of blocks. The first part or root of this information is stored as part of inode.


So why no undelete?

Unlike Windows, Linux/BSD/Unix operation system drivers clean the part of inode information after file is deleted: it fills with zero object size, object type/mode information and object allocation information. This means after file if deleted, software knows nothing about it.

Assume files 2 and 3 on Figure 1 are RAW encrypted files (no headers; both are like 'white noise'). Assume both takes full blocks and both are deleted. These means that no allocation information left and data recovery software is unable to detect file 2 or file 3 boundaries: both are too similar and they are mixed one with another. This is classic example of 'impossible recovery' situation.

Practical situation is not so far from this: binary files fragments are too similar for data recovery software; there is much heavy fragmentation and so on. All this makes pessimistic prognoses for file undelete.


No chance?

Recovery is still possible. There set of recovery methods that could help with file recovery, but none can give 100%, or even 80% result (remember about encrypted or compressed file fragments that data recovery software is unable to classify):
  • File system journal analysis. Previous versions of file descriptors still might remain in journal so recoverty could be possible.
  • Incomplete structures analysis. Software may guess file system files by non-wiped fragments of file metadata that still may occur on disk.
  • Signature-based search: software searches 'known' file fragments and makes assumption about next continuous fragment contents; usually unable to give exact file size (except there are found known file header that contains file size by itself); helpless for cases of heavy fragmentation.
  • Statistical fragments analysis: makes assumption about fragment links, based on statistical methods of data analysis. Could be helpful for homogenic file (most bmp pictures, some archives etc.), but is helpless for heterogenic content (like CD/DVD images etc);
  • Lost file system structures search: software finds lost file system structures that could help to determine layout of lost fragments.
If you are thinking about data recovery by yourself, be ready that any data recovery software for Linux/Unix/BSD file systems is often unable to process with real undelete. Large part of recovery process could be the manual work with unnamed files or file fragments analysis.

Our data recovery products now are developed into two recovery directions: IntelliRAW™ signature search (to recognize file types) and file system structures analysis. This allows combining set of stages of data recovery with level-by-level data classification and going closer to best recovery rates.


Last update: 04.06.2009
Copyright © 2004-2009 SysDevSoftware, the Development & Research division of SysDev Laboratories LLC. All rights reserved.