
DeDup... WOW

Okay, we have all heard how cool dedup is in production environments, like backups and VMware or other virtual environments where you have lots of repeated data, but I am simply amazed at how much duplicate data I have on my home boxes. So far I am just copying my directories from the old non-deduped pool to the one that I have upgraded to the latest zpool version with dedup enabled, and the results are amazing. I haven't even finished copying everything yet, but I have 592GB done so far and a dedup ratio of 1.43, which works out to about 254GB of duplicate data (592GB × 0.43), and that is effectively free disk space. This is all without trying to produce duplicated blocks. Another amazing fact is that most of this data is PDFs, MP3s, video files, and ISOs for OSes. All of these file types mostly contain compressed data or were compressed using bzip2 or gzip, yet ZFS was still able to squeeze even more space savings out of the disk.
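For anyone who wants to check the arithmetic, here is a small sketch in Python. It assumes the dedup ratio is logical (referenced) data divided by the data actually allocated on disk. The function name and the flag are made up for illustration. The 254GB figure above falls out if the 592GB is the space allocated on disk; if the 592GB is instead the amount of data copied in, the same ratio gives a smaller saving, so both readings are shown.

```python
def dedup_savings(size_gb, ratio, size_is_allocated):
    """Return (logical_gb, allocated_gb, saved_gb) for a dedup ratio.

    Assumption: ratio = logical data / allocated data.
    size_is_allocated says which of the two quantities size_gb is.
    """
    if ratio < 1.0:
        raise ValueError("a dedup ratio is never below 1.0")
    if size_is_allocated:
        allocated = size_gb
        logical = size_gb * ratio
    else:
        logical = size_gb
        allocated = size_gb / ratio
    return logical, allocated, logical - allocated


# Numbers from the post: 592GB and a ratio of 1.43.
for label, flag in (("592GB is what is allocated on disk", True),
                    ("592GB is what was copied in", False)):
    logical, allocated, saved = dedup_savings(592, 1.43, flag)
    print(f"{label}: logical {logical:.1f}GB, "
          f"allocated {allocated:.1f}GB, saved {saved:.1f}GB")
```

Either way the saving is well over a hundred gigabytes on data that was mostly compressed already.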

Since I frequent #zfs on freenode.org, I guess I hear more users coming online and asking about the best motherboard, controller, and SATA drive layout for their killer home fileserver. I have seen users talk about having a dozen or more 1.5TB drives in a raidz2 layout, so 10TB or more of usable space. It always amazes me how a home user can use that much space. I have 1.5TB of storage in my raidz pool, which is huge and took quite a while to fill up with stuff that I mostly won't ever use but can't stand to part with because I might need it. And there is no real way for a home user to back up large quantities of data. DVDs aren't really an option: I don't know of an application that burns DVDs and offers an interface that will sit and ask you to insert your next DVD. Twenty years ago there was a program that did this with floppies, and I may be missing something, but nothing like it seems to be available for Solaris yet. Even then, I am not sure I would sit there and feed 100 DVD-Rs into my fileserver, and it would probably take a week to back up 25% of my small 1.5TB fileserver. Blu-ray burners are just too expensive, as is the media, and 1TB over DSL or cable is just too painful to think about, and costly as well.

What I have found is that the more space someone has, the less likely they are to delete stuff that they should. There is also a much greater chance of files being just plain misplaced. While moving files to the deduped pool, I noticed files that I had moved from other, older boxes to my large raidz pool hidden away in [machinename]/home/[username]/.local/share/Trash. Oops, those were deleted months or years ago and were supposed to be gone, but some applet hid them away and will never delete them, so there they sit wasting disk space.
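If you want to hunt for the same kind of leftovers, something like the following Python sketch could do it. It walks a directory tree and reports every directory whose path ends in .local/share/Trash, along with how many bytes are sitting in it. The function names are my own, and it only looks for that one Trash location, so treat it as a starting point and not a complete cleanup tool.

```python
import os

# The trash location mentioned above, relative to a user's home directory.
TRASH_SUFFIX = os.path.join(".local", "share", "Trash")


def dir_size(path):
    """Sum the sizes of all files under path, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                # lstat so symlinks are counted as links, not followed
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total


def find_trash_dirs(root):
    """Return a list of (path, size_in_bytes) for Trash directories under root."""
    results = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if dirpath.endswith(os.sep + TRASH_SUFFIX):
            results.append((dirpath, dir_size(dirpath)))
            dirnames[:] = []  # already measured, no need to descend
    return results
```

Calling find_trash_dirs() on the top of the pool where the old home directories were copied returns the forgotten Trash directories and their sizes. It only reports, it does not delete anything.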

I can only imagine what people who have these monster ZFS fileservers with 10+ TB of disk space will find one day. I guess it will really be fun for those geeks who want to try to fill their massive mega ZFS servers while dedup keeps finding all the duplicates, so their attempts at filling them run into a brick wall after about 5TB or so because they won't be able to find enough unique data they are interested in. Come on, how much anime and how many rock MP3s are there out there once you dedup them? ;-)

 
