VDI part 3 and DEDUP can lead to system instability.
I hope you all aren't bored with VDI yet, but at least I'm posting again. I do find VDI an incredibly cool and powerful technology. I can clone VirtualBox guests in less than 10 seconds once the guest image is imported, then try something and throw it away when I'm done.
Now on to some growing pains and workarounds for them. I followed the VDI demo config, which says you need to limit the ZFS ARC to 2GB. That is all well and good if you are only running VDI on the box, but of course I'm not. I am running OSOL b132 with dedup enabled (at least on some file systems) on the VirtualBox host. If you follow zfs-discuss you will notice a disturbing number of reports of ZFS taking a long time to do things on file systems that have dedup enabled. Not only are writes slow unless you have a slog (separate log device(s)) and large amounts of RAM, but zfs destroy can take a long time, and if it is not allowed to complete, the zpool import on reboot can take even longer. Some reports mention up to 60 hours without use of the machine.
Last night I decided to do some cleanup on my system. I thought I had cleaned out the file system using rm -rf, which really doesn't impose the large penalty of a zfs destroy, at least on b132, and I thought I was good. But the -f means "don't bug me if I fail", so I ended up with 12GB of files left on the file system when I ran zfs destroy.
Since the command didn't return immediately, I dug into my ZFS toolkit of utilities. zilstat showed little activity. arcstat showed about 30% ARC cache misses, and iopattern (part of Brendan Gregg's DTrace Toolkit) showed 100% random I/O, which is why it was taking so long. This went on for over an hour, and the box was still quite usable and responsive. Then arcstat began improving: I was seeing less than 10% cache misses, which led me to hope it would be done soon. But then the box stopped responding. It had decent ping times, but ssh connections were timing out. I happened to have gkrellm running over ssh -X, which was periodically updating, and I saw one core pegged at 100%, the other less than 5% busy, and the disk doing less than 500kB/s, which is bad for a 1TB SATA drive in a zpool by itself that is usually capable of delivering 80MB/s or more.
The box did eventually get back to normal. In the end a Linux box that was using an iSCSI target on the pool timed out, was pretty unhappy, and had to be rebooted, but that was the worst of it.
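One way to avoid the leftover-files surprise would be to measure what is still on the file system before running zfs destroy. The following is only a sketch of that idea in Python, not something I ran on the box that night. The function name is made up for illustration, and it reports directories it could not read instead of skipping them quietly, which is exactly what bit me with rm -rf.

```python
import os


def leftover(mountpoint):
    """Walk mountpoint and return (file_count, total_bytes, errors).

    errors collects every OSError hit while listing directories or
    stat-ing files, so failures are reported rather than swallowed.
    """
    files = 0
    size = 0
    errors = []
    # onerror is called with the OSError when a directory cannot be listed.
    for dirpath, _dirnames, filenames in os.walk(mountpoint, onerror=errors.append):
        for name in filenames:
            try:
                # lstat so symlinks are counted themselves, not their targets.
                size += os.lstat(os.path.join(dirpath, name)).st_size
                files += 1
            except OSError as exc:
                errors.append(exc)
    return files, size, errors
```

Calling leftover() on the mount point of the file system about to be destroyed and seeing anything other than zero files and an empty error list would have told me that rm -rf had not finished the job.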
Since that happened I decided to change the ARC max limit I set in /etc/system; it is now set to 4GB. I have also put lockNgo, the program I mentioned in part 2 of this series, into root's crontab, where it now executes every 5 minutes. When I woke up this morning I logged into a VDI desktop and all went smoothly, with 1.5GB of free RAM available on the system. I may try removing the ARC limit and just adjusting the amount of memory that lockNgo forces to be available on the system. Currently I do give up the use of 1GB of RAM, but that's better than limiting ARC to 2GB on a file server. I should also be able to tune the cron job not to run 24/7: I'm not going to need a virtual at 3am, so why reserve the memory? I still haven't got around to doing any more zfs destroys, but I will have to do more cleanup eventually.
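For anyone who wants to reproduce the two settings, here is a small Python sketch that prints the lines involved. Treat it as an illustration under assumptions: I am reading 4GB as 4 GiB, the zfs_arc_max tunable takes a byte count, the lockNgo path is hypothetical and its arguments are left out because they depend on how you run it, and the 7am to 11pm window is just an example of not running 24/7. The minutes are spelled out as a list, which any cron should accept.

```python
def arc_max_line(gib):
    """Render an /etc/system line capping the ZFS ARC at gib GiB (value in bytes, hex)."""
    return "set zfs:zfs_arc_max = 0x%x" % (gib * 1024 ** 3)


def cron_line(command, every_minutes=5, first_hour=0, last_hour=23):
    """Render a crontab entry that runs command every N minutes within an hour range."""
    # Explicit minute list instead of step shorthand.
    minutes = ",".join(str(m) for m in range(0, 60, every_minutes))
    if (first_hour, last_hour) == (0, 23):
        hours = "*"
    else:
        hours = "%d-%d" % (first_hour, last_hour)
    return "%s %s * * * %s" % (minutes, hours, command)


# Hypothetical install path; substitute wherever lockNgo actually lives.
LOCKNGO = "/usr/local/bin/lockNgo"

print(arc_max_line(4))
print(cron_line(LOCKNGO))
print(cron_line(LOCKNGO, first_hour=7, last_hour=23))
```

The first line goes into /etc/system and needs a reboot to take effect; the other two are alternative entries for root's crontab, the second one skipping the small hours when nobody needs a virtual.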
My VDI setup currently uses 2 boxes. The Sun Blade 1000 is running Solaris 10u8, VDI core, clustered MySQL, and OpenDS, as well as acting as the iSCSI target for the virtuals so that it doesn't compete with VirtualBox on the AMD machine for memory. It will soon expand to include Squid and DNS so I can power down my Sun Blade 1500 (Frankenstein) to save a bit of power, since the Blade 1000 isn't really taxed running VDI. I am currently serving virtual desktops on up to 3 Sun Ray 1s, and possibly on Sun Desktop Client, which is a virtual Sun Ray 1 that runs on Windows. I haven't started up more than 2 virtual desktops yet, but I have more testing to do.