Fragmentation is an issue for hard drives because they are mechanical devices with seek times and (to a lesser extent) varying read speeds. If you could reduce the seek time of a hard drive to zero, there would be no need to defrag the drive, because every sector of the drive would take just as long to read, so the performance improvement of rearranging the sectors would be zero. In a mechanical drive this will never happen, simply because the laws of physics tell us that any object with mass and velocity requires time to reach its destination, and time to change course.
In any RAM-based system, the rules change. It takes just as long to read one memory address as another; if it didn't, our software would behave very differently. So reading 2k from memory address A00000 takes just as long as reading it from address B00000 or 000000. There is no "seek" time, just a "read" time. Similarly, reading 4k as two 2k chunks from two widely separated addresses takes just as long as reading them from adjacent addresses.
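To put rough numbers on the difference, here is a minimal back-of-envelope sketch in Python. The seek time and transfer rates are illustrative assumptions, not measurements of any particular hardware: on a mechanical drive every fragment costs an extra seek, while on a RAM drive the fragment count makes no difference at all.

    # Back-of-envelope model: a mechanical drive pays a seek penalty per fragment,
    # a RAM drive does not. All figures below are assumptions for illustration.

    SEEK_MS = 9.0           # assumed average seek + rotational delay per fragment
    HDD_MB_PER_S = 100.0    # assumed sequential transfer rate of the hard drive
    RAM_MB_PER_S = 3000.0   # assumed transfer rate of the RAM drive

    def hdd_read_ms(size_mb, fragments):
        """Time to read a file split into `fragments` pieces on a mechanical drive."""
        return fragments * SEEK_MS + (size_mb / HDD_MB_PER_S) * 1000.0

    def ram_read_ms(size_mb, fragments):
        """On a RAM drive the address, and hence the fragment count, is irrelevant."""
        return (size_mb / RAM_MB_PER_S) * 1000.0

    for frags in (1, 27, 1000):
        print(f"{frags:5d} fragments: HDD {hdd_read_ms(10, frags):8.1f} ms, "
              f"RAM drive {ram_read_ms(10, frags):5.1f} ms")

For a 10 MB file the RAM drive figure never changes, while the hard drive figure grows with every extra fragment, which is the whole case for defragging a mechanical disk.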
I have ignored the effects of caching for the purposes of this example, because I am assuming the worst case, where no caching has occurred, either for the hard drive or the RAM drive.
So why did I defrag it the first time round? I guess I associate "defrag" with "tidy up". There used to be a Norton Utility called DS for Directory Sort, and on slow DOS systems with a lot of file names in a single folder, DS helped quite a bit. Nowadays Windows does the sorting for you, so provided the directories are not too long, you don't even notice the sort order of a folder.
Now for the fun part: flash drives have a limited lifespan, typically 100,000-300,000 writes to any one memory area. There is no restriction on the number of times you can read the drive, only on the number of times you can write to it. On a well-designed drive, this means you would have to write new information to the entire drive 27 times a day for 10-30 years before the drive is exhausted. On a cheap or badly-designed drive, the story is different. Most flash drives use the FAT32 file system, and any change to the size (in clusters) or location of a file affects the File Allocation Table (FAT), so the FAT gets rewritten each time such a change is made. On a badly-designed flash drive, the FAT is rewritten to the same memory location each time, so that individual memory area will fail after 100,000 writes, resulting in data loss.
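The 27-times-a-day figure is just arithmetic on the write-cycle rating. Here is a quick sketch of the calculation, assuming a well-designed drive that spreads writes evenly over every memory area:

    # Endurance arithmetic: if writes are spread evenly over the whole drive,
    # full-drive rewrites per day map directly onto the per-area write-cycle rating.

    REWRITES_PER_DAY = 27   # rewrite the entire drive 27 times a day, as in the post

    for cycle_rating in (100_000, 300_000):
        years = cycle_rating / (REWRITES_PER_DAY * 365)
        print(f"{cycle_rating:7,} write cycles at {REWRITES_PER_DAY} rewrites/day "
              f"~= {years:.0f} years")
    # prints roughly 10 years for 100,000 cycles and 30 years for 300,000 cycles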
Bear in mind that a defrag program updates the FAT after each cluster (or group of clusters) is moved, so defragmenting a file with 27 fragments will result in 27 changes to the FAT, and therefore 27 writes to the same area of memory. This is done to prevent data loss due to a reset or power failure. Defrag programs typically remove hundreds if not thousands of fragments, so a single defrag pass has the potential to create thousands of write requests to the FAT, on top of all the other reads and writes going on in other areas of the drive.
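As a toy illustration of that write amplification (the file names and fragment counts below are made up), assume the defrag tool flushes the FAT once for every fragment it moves, precisely so that a reset mid-move cannot corrupt the file:

    # Count FAT updates for one defrag pass: one FAT rewrite per fragment moved,
    # so the on-disk state stays consistent if the machine resets mid-move.
    # On a drive without wear levelling, every one of these writes hits the
    # same physical memory area.

    def defrag_fat_writes(fragment_counts):
        """fragment_counts maps file name -> number of fragments to be moved."""
        return sum(fragment_counts.values())

    example_volume = {"movie.avi": 27, "mail.pst": 412, "pagefile.sys": 1803}
    print(defrag_fat_writes(example_volume), "writes to the same FAT area")  # 2242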
If your defrag program removes 100,000 fragments, it has the potential to permanently damage the flash drive! Without wanting to sound alarmist, it is something to consider, and since you have no way of knowing whether your flash drive is well designed or not, it isn't worth the risk. Why waste time defragging to gain 0% performance improvement in exchange for the risk of 100% data loss? The technique for avoiding this situation is called "wear levelling", but not all manufacturers use it. I found the following description from a flash drive manufacturer:
The quote below sounds optimistic, because it was written by a flash drive manufacturer. I've used several USB flash drives (and lost two of them), and the chance of losing data by misplacing the entire device is higher than the chance of wearing it out, but the point I'm trying to make is that flash drives are not designed or intended to be defragmented.

Longevity/Lifespan
Unlike DRAM, flash memory chips have a limited lifespan. Further, different flash chips have a different number of write cycles before errors start to occur. Flash chips with 300,000 write cycles are common, and currently the best flash chips are rated at 1,000,000 write cycles per block (with 8,000 blocks per chip). Now, just because a flash chip has a given write cycle rating, it doesn't mean that the chip will self-destruct as soon as that threshold is reached. It means that a flash chip with a 1 million Erase/Write endurance threshold limit will have only 0.02 percent of the sample population turn into a bad block when the write threshold is reached for that block.
The better solid state flash drive manufacturers have two ways to increase the longevity of the drives: First, a "balancing" algorithm is used. This monitors how many times each disk block has been written. This will greatly extend the life of the drive. The better manufacturers have "wear-leveling" algorithms that balance the data intelligently, avoiding both exacerbating the wearing of the blocks and "thrashing" of the disk: When a given block has been written above a certain percentage threshold, the solid state flash drive will (in the background, avoiding performance decreases) swap the data in that block with the data in a block that has exhibited a "read-only-like" characteristic.
Second, should bad blocks occur, they are mapped out as they would be on a rotating disk. With usage patterns of writing gigabytes per day, each flash-based solid state drive should last hundreds of years, depending on capacity. If it has a DRAM cache, it'll last even longer.
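To picture the "wear-leveling" the quote describes, here is a deliberately simplified sketch: the block count and swap threshold are invented, and the extra erases caused by the background copy itself are ignored, so treat it as the general idea rather than any real controller's firmware.

    # Toy wear leveller: track erases per physical block and, when a hot block gets
    # too far ahead of the coldest one, swap their contents so wear spreads evenly.

    BLOCKS = 64                     # assumed number of physical blocks
    WEAR_GAP_LIMIT = 8              # assumed gap that triggers a background swap

    erase_count = [0] * BLOCKS
    logical_to_physical = list(range(BLOCKS))   # simplistic flash translation layer

    def write_logical(lba):
        phys = logical_to_physical[lba]
        erase_count[phys] += 1      # every rewrite of a block costs one erase cycle
        coldest = min(range(BLOCKS), key=lambda b: erase_count[b])
        if erase_count[phys] - erase_count[coldest] > WEAR_GAP_LIMIT:
            # move the hot data to the coldest block (the copy cost is ignored here)
            cold_lba = logical_to_physical.index(coldest)
            logical_to_physical[lba], logical_to_physical[cold_lba] = coldest, phys

    # Hammer one logical block (think: the FAT) and watch the wear spread anyway.
    for _ in range(10_000):
        write_logical(0)
    print("most-worn block saw", max(erase_count), "erases instead of 10,000")

On a drive without this kind of remapping, those 10,000 writes would all land on the same physical area, which is exactly the failure mode described earlier.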
Hard drives wear out too, but defragging can help extend the life of a mechanical hard drive, not to mention improve performance. The same is not true for flash drives. You have been warned!
1 comment:
Hi Donn,
Interesting post, as usual! I just wanted to make a couple of points. There are actually 2 types of flash memory: NOR and NAND. NOR is like RAM, NAND mimics a block device (like HDD). You can see a description at http://en.wikipedia.org/wiki/Flash_memory.
Also, regarding the 100,000-300,000 write lifespan - that is the case for NOR flash. NAND endurance is 1,000,000 erase cycles and above (http://www.data-io.com/pdf/NAND/Samsung/NANDtechnology.pdf).
Just wanted to point out the NAND differences.
Joe Abusamra