Tuesday, February 24, 2009

Maximum Misunderstanding about Defrag Benchmarks

Last year MaximumPC wrote an article, "The Disk Defrag Difference", in which they claimed that the defrag programs they tested were not much better than the defragmenter built into Windows Vista.
They used the HDD benchmarks in PCMark Vantage to see whether a given defrag program would improve the PCMark score or not. Unfortunately they didn't read the fine print. I stumbled upon it by accident, so I can't criticise them too much. You have to read the PCMark05 Whitepaper quite carefully to find it. Page 17, under the heading "HDD Tests" (emphasis added):
For these tests, we use RankDisk, an application developed and copyrighted by Intel®. In our testing, we found RankDisk to be suitable for a neutral benchmark. RankDisk is used to record a trace of disk activity during usage of typical applications. These traces can then be replayed to measure the performance of disk operations for that usage.
RankDisk records disk access events using the device drivers and bypasses the file system and the operating system’s cache. This makes the measurement independent of the file system overhead or the current state of the operating system. In replaying traces, RankDisk always creates and operates on a new “dummy” file. This file is created in the same (or closest possible) physical location of the target hard disk. This allows the replaying of traces to be safe (does not destroy any existing files) and comparable across different systems. Due to the natural fragmentation of hard disks over time, they should be defragmented before running these tests.
The traces used for each test were created from real usage. The traces contain different amount of writing and reading on the disk; total ratio in the HDD test suite disk operations is 53% reads and 47% of writes.
Bear in mind that the PCMark benchmarks are written to compare different systems to one another to see which one is faster. So it makes sense for them to record a "trace" of disk activity, and then try to replay it on various systems. But the "trace" data does not refer to files on the test system, only clusters. In other words, the original reference system may have had a system file like user.dll that was loaded from clusters 25, 26 and 27. When the HDD "read" benchmark runs on my system, it reads clusters 25, 26 and 27, irrespective of the location of my copy of user.dll, which could be stored on clusters 127, 254 and 55 if it was fragmented.
In order to benchmark the "write" activity, the original reference system could have written data to clusters 1020, 1021 and 1022. On my system these clusters may be in use, so the benchmark program finds the "nearest" available clusters, say 1200, 1201 and 1203, and does its timing accordingly. It's the only safe way to do a "write" benchmark without corrupting the test system, so from PCMark's point of view, it's close enough.
In my example, notice that cluster 1202 is not available, and this could ruin the benchmark's measured performance, depending on the characteristics of the drive. That's why they say that the benchmark should be run on a defragmented system.
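To make that remapping concrete, here is a rough sketch (in VB, to match the utility later in this post) of a "nearest available cluster" search. RankDisk's actual logic is Intel's and is not published, so the function and its cluster map are purely illustrative:

' Illustrative only: find the free cluster closest to the one the trace
' wants, scanning outward in both directions from the requested cluster.
' blnUsed() is a hypothetical map of which clusters are in use.
Private Function NearestFreeCluster(blnUsed() As Boolean, _
                                    ByVal lngWanted As Long) As Long
    Dim lngOffset As Long
    If Not blnUsed(lngWanted) Then
        NearestFreeCluster = lngWanted      ' wanted cluster is free
        Exit Function
    End If
    For lngOffset = 1 To UBound(blnUsed) - LBound(blnUsed)
        ' Try one step below the wanted cluster, then one step above
        If lngWanted - lngOffset >= LBound(blnUsed) Then
            If Not blnUsed(lngWanted - lngOffset) Then
                NearestFreeCluster = lngWanted - lngOffset
                Exit Function
            End If
        End If
        If lngWanted + lngOffset <= UBound(blnUsed) Then
            If Not blnUsed(lngWanted + lngOffset) Then
                NearestFreeCluster = lngWanted + lngOffset
                Exit Function
            End If
        End If
    Next lngOffset
    NearestFreeCluster = -1                 ' no free cluster at all
End Function

In the example above, a search of this kind aimed at clusters 1020 to 1022 would come back with 1200, 1201 and 1203, because 1202 is occupied.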
So when I run a PCMark HDD benchmark on my system, I am comparing the read speed of my hard drive to the read speed of the reference system's hard drive. Similarly, I am comparing write speeds, with the exception that the software will use available empty space on my system, rather than the actual space used on the reference system. It's a limitation of the software, but it is required for safety. The benchmark results bear no relation to the actual location of system files on the test system, only to the location of free disk space.
So in effect MaximumPC was measuring the ability of each defrag program to create free disk space where the benchmark required free space, based on the write patterns of the reference system. This is not the same as measuring the improvement in file placement, and thus the performance boost, claimed by the defrag software.
It's incredibly difficult to measure a "performance gain" on a Windows system in any kind of objective way. I'm still not convinced I have the answer, but time will tell. Raxco have a utility called FileAccessTimer, and Diskeeper refer to readfile in some of their literature.
A package that reads all the files on a PC and times them doesn't help either, because it will read both the "fast" files and the "slow" files, so these will balance one another out. I would love to find a way of timing how long it takes to open a Word document or Excel spreadsheet, but have yet to figure out a way of doing this accurately. Maybe I'll have to write one.
Update Wednesday 10am: I realised that I can create an Excel document that displays the current time, calculated on opening. I can do much the same on an Access data file. So now I have written a very simple VB utility that displays the time and then opens a requested file:
Private Sub Form_Load()
    Dim strCommand As String
    ' The file to open is passed on the command line
    strCommand = Command()
    ' Stamp the start time; Timer returns seconds since midnight,
    ' including fractions of a second
    Form1.txtTimeDisplay.Caption = Format(Time(), "HH:nn:ss") _
        & vbCrLf & Timer
    Form1.txtFileName = Trim(strCommand)
    Form1.Refresh
    ' Launch the document in its associated application
    Call OpenThisDoc(Form1.hwnd, strCommand)
    ' Stamp the time again once control returns to this program
    Form1.txtTimeDisplay2.Caption = Format(Time(), "HH:nn:ss") _
        & vbCrLf & Timer
End Sub
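For completeness, OpenThisDoc can be as simple as a thin wrapper around the ShellExecute API; a minimal sketch, assuming that is all it does:

' Assumed helper, not shown in the original listing: open a document
' in whatever application is registered for its file type.
Private Declare Function ShellExecute Lib "shell32.dll" Alias "ShellExecuteA" _
    (ByVal hwnd As Long, ByVal lpOperation As String, ByVal lpFile As String, _
     ByVal lpParameters As String, ByVal lpDirectory As String, _
     ByVal nShowCmd As Long) As Long

Public Sub OpenThisDoc(ByVal hwnd As Long, ByVal strFile As String)
    Call ShellExecute(hwnd, "open", Trim(strFile), vbNullString, vbNullString, 1)
End Sub

Run it from a command prompt as, say, FileTimer.exe C:\Test\Large.doc (the name is hypothetical), and the two captions show when the launch started and when it finished.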
Unfortunately it is only accurate to the nearest second, so I'm not sure how useful it will end up being. Time will tell.
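As for the self-timing Excel document mentioned above, one way to build it is a Workbook_Open handler in the test spreadsheet; this is only a sketch of the idea, and the cell and format are arbitrary:

' Hypothetical ThisWorkbook module in the test spreadsheet:
' stamps the moment the workbook finished opening into cell A1.
Private Sub Workbook_Open()
    Worksheets(1).Range("A1").Value = Format(Time(), "HH:nn:ss") & " / " & Timer
End Sub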
Update 2: It seems that Word 2007 and Excel 2007 return control to the VB program after they have opened the requested file. This enables me to get the launch time accurate to the nearest 10th of a second. Since Access97 already has the "Timer" function built in, the accuracy is maintained there too. I am also including my SMSQ "Stress Test" program in the benchmarking mix.
Test Results: see the article above, and more results collected since. It turns out the results are repeatable to within a 15% variation, using an average of 10 measurements taken to the nearest 100th of a second.
