How useful is unrm when it comes to the undeletion of larger files?

My assumption was that a large file would be broken into chunks of about 25% of a cylinder group. On FFS-type filesystems rotdelay is 0 by default (see man tunefs), and the allocator switches to another cylinder group once a file has used 25% of a group's capacity. A file the size of one cylinder group (typically 5 to 20 MB) would then consist of only about four chunks, so things should not look too bad.
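The expectation above is simple arithmetic. The following sketch assumes that model (a new chunk starts every time the file has used a fixed fraction, 25% by default, of a cylinder group). The function name and the `fraction` parameter are illustrative only and do not come from tunefs or TCT. A round 2 GB file size is used instead of the exact 2.1 GB:

```python
import math
from typing import Tuple

MB = 1024 * 1024


def expected_chunks(file_bytes: int, cg_bytes: int, fraction: float = 0.25) -> Tuple[int, int]:
    """Return (chunk_bytes, number_of_chunks) under the simple model that the
    allocator moves to another cylinder group after `fraction` of a group."""
    chunk_bytes = int(cg_bytes * fraction)
    return chunk_bytes, math.ceil(file_bytes / chunk_bytes)


if __name__ == "__main__":
    chunk, n = expected_chunks(2048 * MB, 10 * MB)
    # 2560 KB per chunk, i.e. 2560 lines of 1 KB each, and 820 chunks
    print(chunk // 1024, n)
```

With a 10 MB cylinder group, this model predicts chunks of 2560 one-KB lines.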

For my port of TCT to HP-UX, I tested unrm several times on a file larger than 2 GB to verify that the large-file support worked correctly. I used bigfile to create a 2.1 GB file on an empty, freshly created 3 GB FFS filesystem. Then I filled the disk up to the last fragment (on HP-UX, df -g gives detailed information about a filesystem) with additional files containing "a"s. I deleted the 2.1 GB file, invoked unrm, and ran checkbig on its output.

While waiting almost three hours for the unrm run to finish, I decided to analyze the output of unrm to see how contiguously the file had been stored on disk. Since I had created the file with the bigfile utility, it consisted of a huge number of sequentially numbered lines, each 1024 characters long. So it was easy to add some logic to checkbig that reports each run of contiguous line numbers it finds while checking the unrm output for completeness. Post-processing this output into a histogram (using histo.pl from bigtest-0.6.tar.gz) and feeding it to gnuplot yielded the following surprise:
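The run detection described above can be sketched as follows. This is not the actual checkbig code. It assumes, hypothetically, that each 1024-byte bigfile line begins with its decimal line number. Records that do not start with digits (the "a" filler, zero blocks) are treated as breaks between runs. `line_numbers` and `contiguous_runs` are illustrative names:

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

DIGITS = re.compile(rb"\d+")


def line_numbers(data: bytes, record_len: int = 1024) -> Iterator[Optional[int]]:
    """Yield the line number at the start of each record, or None when the
    record does not start with digits (filler, zero block, foreign data).
    Assumes (hypothetically) that every bigfile line starts with its number."""
    for off in range(0, len(data) - record_len + 1, record_len):
        m = DIGITS.match(data, off)  # match() anchors at position `off`
        yield int(m.group()) if m and m.end() <= off + record_len else None


def contiguous_runs(numbers: Iterable[Optional[int]]) -> List[Tuple[int, int, int]]:
    """Group line numbers, in the order unrm emitted them, into runs of
    consecutive numbers. Returns (start, end, size) tuples."""
    runs: List[Tuple[int, int, int]] = []
    start = prev = None
    for n in numbers:
        if start is not None and n is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if start is not None:
            runs.append((start, prev, prev - start + 1))
        start = prev = n  # None means "not inside a run"
    if start is not None:
        runs.append((start, prev, prev - start + 1))
    return runs
```

The sizes of these runs, counted in 1 KB lines, are what goes into the histogram.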

The 3 GB partition used a cylinder group size of 10 MB, so the peak around 2500 (lines of 1 KB each, i.e. about 2.5 MB, or 25% of a cylinder group) is what I was hoping to see. But instead of roughly 800 chunks of 2.5 MB each, I got a total of 1849 chunks, only 300+ of which were about 2.5 MB in size. There is also a peak at the very left of the x-axis: more than 500 chunks of 8 KB, which is the minimum logical block size of that filesystem. How come? Looking at the raw output of checkbig was not very enlightening:

Allocation cluster 1 start 1536400 end 1536423 size 24
Allocation cluster 2 start 0 end 95 size 96
Allocation cluster 3 start 1536424 end 1536447 size 24
Allocation cluster 4 start 1536288 end 1536399 size 112
Allocation cluster 5 start 766240 end 768735 size 2496
Allocation cluster 6 start 1536448 end 1538783 size 2336
Zero block found at offset: 5218304d|0x4fa000h
Zero block found at offset: 5219328d|0x4fa400h
Zero block found at offset: 5220352d|0x4fa800h
Zero block found at offset: 5221376d|0x4fac00h
Zero block found at offset: 5222400d|0x4fb000h
Zero block found at offset: 5223424d|0x4fb400h
Zero block found at offset: 5224448d|0x4fb800h
Zero block found at offset: 5225472d|0x4fbc00h
Allocation cluster 7 start 768744 end 768751 size 8
Allocation cluster 8 start 104 end 135 size 32
Allocation cluster 9 start 96 end 103 size 8
Allocation cluster 10 start 768736 end 768743 size 8
Allocation cluster 11 start 136 end 2591 size 2456
Allocation cluster 12 start 768752 end 770143 size 1392
Allocation cluster 13 start 1538784 end 1540191 size 1408
Allocation cluster 14 start 770144 end 770151 size 8
Allocation cluster 15 start 2592 end 2679 size 88
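To turn raw output like the listing above into histogram input, a small parser is enough. This sketch assumes only the line format visible in the listing. It treats `start` and `end` as line numbers within the file, so sizes are in 1 KB lines, which seems consistent with the 8 and roughly 2500 sizes. It is a hypothetical stand-in for the histo.pl step, not a port of it:

```python
import re
from collections import Counter

# Matches lines such as "Allocation cluster 5 start 766240 end 768735 size 2496"
CLUSTER = re.compile(r"Allocation cluster (\d+) start (\d+) end (\d+) size (\d+)")


def parse_checkbig(text):
    """Return ([(index, start, end, size), ...], number_of_zero_block_lines)."""
    clusters, zero_blocks = [], 0
    for line in text.splitlines():
        line = line.strip()
        m = CLUSTER.fullmatch(line)
        if m:
            idx, start, end, size = map(int, m.groups())
            if size != end - start + 1:  # sanity check on each record
                raise ValueError(f"inconsistent cluster: {line}")
            clusters.append((idx, start, end, size))
        elif line.startswith("Zero block found"):
            zero_blocks += 1
    return clusters, zero_blocks


def size_histogram(clusters, bin_width=100):
    """Count clusters per size bin (sizes in 1 KB lines)."""
    return Counter((c[3] // bin_width) * bin_width for c in clusters)
```

Writing the sorted `(bin, count)` pairs as two whitespace-separated columns gives a file that gnuplot can plot directly.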

Then I decided to plot the chunk size against the starting line number of each chunk, to see whether the allocation behaviour had changed over time while the file was being created:

Again a surprise: what looked random in the raw data now appears to be three different strategies. The first change occurred when the file was approx. 800 MB in size, the second around 1.6 GB. The third phase is what I would have expected throughout the whole file, i.e. almost all chunks being 2.5 MB in size.
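The phases can also be quantified by splitting the chunks at the observed change points. In the following sketch, the boundaries are only the approximate values read off the plot, converted to 1 KB lines (800 MB ≈ 819200 lines, 1.6 GB ≈ 1638400 lines). The function name is hypothetical:

```python
from bisect import bisect_right
from statistics import median


def phase_summary(chunks, boundaries):
    """chunks: iterable of (start_line, size_in_lines) pairs.
    boundaries: sorted start lines at which a new phase is assumed to begin.
    Returns one (chunk_count, median_size) tuple per phase."""
    phases = [[] for _ in range(len(boundaries) + 1)]
    for start, size in chunks:
        phases[bisect_right(boundaries, start)].append(size)
    return [(len(p), median(p) if p else 0) for p in phases]


# Approximate change points (in 1 KB lines) taken from the plot
BOUNDARIES = [800 * 1024, 1600 * 1024]
```

A median is used instead of a mean so that the many 8-line chunks do not hide a phase whose typical chunk is still about 2.5 MB.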

Currently I have no explanation for these "strategy changes". As mentioned above, the file was created in a fresh, empty 3 GB partition, so there should have been no space constraints influencing the allocation behaviour. At first I suspected that the 1 KB writes bigfile uses when writing line after line might influence the allocation strategy. But I ran a second test in which I created the large file on a different partition and then copied it with cp into the fresh, empty 3 GB partition. The results were 100% identical. Time permitting, I will rewrite bigfile to create the file via mmap and use madvise(MADV_SEQUENTIAL) to give the OS some a priori knowledge while the file is being created. But it may well be that madvise only influences the paging strategy and not the disk allocation strategy...
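The planned mmap variant could look roughly like the following Python sketch. The record layout (zero-padded line number, filler, newline) is a hypothetical stand-in for bigfile's real format. `mmap.madvise` and `MADV_SEQUENTIAL` only exist on some platforms and Python versions (3.8+), so the call is guarded. As noted above, the hint may affect only paging, not block allocation:

```python
import mmap
import os
import tempfile

RECORD = 1024  # bytes per line, as in bigfile


def make_record(n: int) -> bytes:
    # Hypothetical layout: zero-padded line number, filler, newline (1024 bytes).
    head = b"%010d " % n
    return head + b"x" * (RECORD - len(head) - 1) + b"\n"


def write_bigfile_mmap(path: str, nlines: int) -> None:
    if nlines <= 0:
        raise ValueError("nlines must be positive (cannot mmap an empty file)")
    size = nlines * RECORD
    with open(path, "wb+") as f:
        f.truncate(size)  # give the file its final size before mapping it
        with mmap.mmap(f.fileno(), size) as mm:
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)  # access-pattern hint only
            for n in range(nlines):
                mm[n * RECORD:(n + 1) * RECORD] = make_record(n)
            mm.flush()  # push dirty pages to the file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bigfile.dat")
        write_bigfile_mmap(path, 1000)  # small test file
        print(os.path.getsize(path))    # 1024000
```

Comparing the checkbig histograms for a file written this way and one written with plain write() calls would show whether the hint makes any difference on a given filesystem.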

For people wanting to undelete larger files from FFS filesystems, these results mean that one has to expect substantial fragmentation. Reassembling the file from the chunks will require detailed knowledge of the structure of the file's content.
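For the bigfile test data, that structural knowledge is the embedded line numbers, so reassembly reduces to sorting chunks by their first line number. The sketch below shows this. It assumes each recovered chunk is keyed by its first line number (for example, from the run detection above). It skips overlapping lines and reports missing ranges. `reassemble` is a hypothetical helper. Real files usually carry no such sequence numbers, which is exactly why reassembly is hard in general:

```python
from typing import Dict, List, Tuple


def reassemble(chunks: Dict[int, bytes], record_len: int = 1024) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Concatenate chunks in line-number order.
    chunks maps the first line number of each recovered chunk to its bytes.
    Returns (data, gaps), where gaps lists (first_missing, last_missing) lines."""
    out = bytearray()
    gaps: List[Tuple[int, int]] = []
    expected = 0  # next line number still needed
    for first in sorted(chunks):
        data = chunks[first]
        nrec = len(data) // record_len
        if first > expected:
            gaps.append((expected, first - 1))
        skip = max(0, expected - first) * record_len  # drop lines already emitted
        if skip < len(data):
            out += data[skip:]
        expected = max(expected, first + nrec)
    return bytes(out), gaps
```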