
Bad Disk and destroying the pool, back up in less than 5 hours.

After about 6 months on its death bed, my 36GB SCSI hard disk finally died. It had been giving me a few zpool checksum errors from time to time and a few SCSI errors in dmesg, but it kept working; I had scrubbed the pool and cleared the errors 3 or 4 times in the last few months. The drive was probably 4-5 years old when I bought it off eBay, and I had it another 5 years or so, so its death was not unexpected. I'm not sure what I had planned when I put the disk in the raidz pool that holds 2 other 18.2GB drives. It turns out the replacement drive I purchased 6 months ago was a few megabytes smaller than the other 18.2GB drives. I'm not sure of the exact difference, but it was enough to make the ZFS that is part of Solaris 10u8 complain that the replacement disk was too small. This is supposedly fixed in OpenSolaris, but this is a SPARC box and I'm not ready to deploy SPARC OpenSolaris on my system yet, so I needed another solution.

The pool had 190 filesystems and snapshots cross-mounted all over the system, and I didn't feel like writing a script to copy and recreate the layout. It turns out that wasn't necessary: 6 commands, plus enough disk space to hold the data, were all it took to get the system back to normal.

zfs snapshot -r pool@now # creates a snapshot called @now on every filesystem in the pool
zfs send -R pool@now > /largedisk/pool_backup_zfs_stream # dump the whole pool, snapshots and properties included, to a file
zpool destroy pool # add -f if necessary
zpool create pool raidz c1t0d0 c1t1d0 c2t0d0 # recreate the pool
cat /largedisk/pool_backup_zfs_stream | zfs recv -Fd -v pool # restore everything from the stream file
reboot # restore all the complicated mount points and restart services that failed after the pool destruction
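Since the pool is destroyed partway through, it may be worth wrapping the sequence so that it stops at the first failure and can be previewed first. The following is only a sketch of that idea, not what I actually ran: `run`, `rebuild_pool` and the `DRY_RUN` variable are made-up names, and the pool, stream file and disk names are just the ones from the commands above. Note that `zfs snapshot` takes a lowercase `-r` for recursion, while `zfs send` takes an uppercase `-R`.

```shell
#!/bin/sh
# Sketch only: the rebuild sequence as a function with a dry-run mode.
# run, rebuild_pool and DRY_RUN are hypothetical names for illustration.

DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1"
    fi
}

rebuild_pool() {
    pool=$1
    stream=$2
    shift 2
    disks="$*"          # remaining arguments are the disks for the new raidz

    # Chained with && so the pool is never destroyed unless the
    # snapshot and the send to the stream file both succeeded.
    run "zfs snapshot -r $pool@now" &&
    run "zfs send -R $pool@now > $stream" &&
    run "zpool destroy $pool" &&
    run "zpool create $pool raidz $disks" &&
    run "zfs recv -Fd -v $pool < $stream"
}

# Preview:  rebuild_pool pool /largedisk/pool_backup_zfs_stream c1t0d0 c1t1d0 c2t0d0
# For real: DRY_RUN=0, then the same call, followed by a reboot.
```

The commands are passed as plain strings to keep the redirections readable, so this would need proper quoting before being used with paths that contain spaces.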


Not only did this restore the pool's filesystems to where they left off, it also recreated all the snapshots, mountpoints, and ZFS settings. It did it so well that all programs stored on the pool came up without any issues. I was simply amazed at how easy it was to recover the system, including 190 filesystems and snapshots comprising nearly 30GB of data, in about 3 hours. I don't think a standard RAID5 LVM setup can be created and mounted in that amount of time, much less restore a complex filesystem structure and all its data.
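One way to confirm that every filesystem and snapshot really came back is to save the list of dataset names before destroying the pool and compare it with the list after the restore. This is a rough sketch I did not use at the time: `compare_listings` and the listing file names are made up, and it assumes the listings were produced with something like `zfs list -H -r -t filesystem,snapshot,volume -o name pool`.

```shell
#!/bin/sh
# Sketch only: compare dataset listings taken before and after the rebuild.
# Assumed to have been saved beforehand, for example:
#   zfs list -H -r -t filesystem,snapshot,volume -o name pool > /largedisk/before.txt
#   (rebuild the pool)
#   zfs list -H -r -t filesystem,snapshot,volume -o name pool > /largedisk/after.txt

compare_listings() {
    # $1 = listing saved before the destroy, $2 = listing after the restore
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    # comm -23 prints the names that appear only in the "before" listing
    missing=$(comm -23 "$1.sorted" "$2.sorted")
    rm -f "$1.sorted" "$2.sorted"
    if [ -n "$missing" ]; then
        echo "missing after restore:"
        echo "$missing"
        return 1
    fi
    echo "all $(wc -l < "$1" | tr -d ' ') datasets present"
}

# Usage: compare_listings /largedisk/before.txt /largedisk/after.txt
```

This only checks names, not properties or contents, but it would catch a filesystem or snapshot that did not make it through the send and receive.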

 
