The Lazy Engineer Solution

Never underestimate a lazy engineer’s ability to find the simplest solution.

I’ve been notified of ECC RAM errors on some of my blades in the data center. Here are the old way and the new way of dealing with them.

Old Way

  1. Contact stakeholders to let them know about the issue
  2. Schedule downtime for the services
  3. Create a boot CD with the maintenance tools
  4. Wait until after business hours
  5. Drive to the DC
  6. Reboot the server from the diagnostic CD, hoping that it copied OK
  7. Chew nails nervously while the tools run, fending off calls about when the downtime will end
  8. Finish and hope (pray!) the server starts
  9. Check that all services have returned to production levels
  10. Drive back to the office, listening to 24 voicemail messages about how services are down
  11. Send out notices and apologies while the boss berates you for the downtime

New Way

  1. Let the boss know
  2. Download the tool and create an ISO
  3. Migrate the virtual servers to another host
  4. Connect to the remote access card, point the virtual CD at the diagnostic ISO and set the boot order
  5. Reboot the host and run the tests while getting coffee (the only time I leave my seat…)
  6. Restart the blade and migrate the virtual servers back to it
  7. Let the boss know it’s all done, with no interruption to services
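
For anyone who wants to script it, the order of operations in the new way can be sketched roughly like this. It is only a toy model in Python, not the tooling I actually used: the Host class, move_vms, diagnose_without_downtime and the run_diagnostics callback are all made-up names, and nothing here talks to a real hypervisor or remote access card. It also assumes you only move the guests back if the diagnostics pass, which is my own addition to the list above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Host:
    """Hypothetical stand-in for a blade; vms holds the names of its guests."""
    name: str
    vms: List[str] = field(default_factory=list)


def move_vms(names: List[str], src: Host, dst: Host, log: List[str]) -> None:
    """Move the named guests from src to dst, one at a time."""
    for vm in names:
        src.vms.remove(vm)
        dst.vms.append(vm)
        log.append(f"migrate {vm}: {src.name} -> {dst.name}")


def diagnose_without_downtime(
    faulty: Host,
    spare: Host,
    run_diagnostics: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Evacuate the faulty blade, test it, and repopulate it if it passes.

    run_diagnostics is a hypothetical callback standing in for "boot the
    diagnostic ISO over the remote access card and wait for the result".
    """
    log: List[str] = []

    # Remember which guests came from the faulty blade so that only those
    # are moved back, leaving the spare's own guests alone.
    moved = list(faulty.vms)
    move_vms(moved, faulty, spare, log)

    # The blade is now empty, so rebooting it interrupts nothing.
    log.append(f"boot {faulty.name} from diagnostic ISO")
    passed = run_diagnostics(faulty.name)
    log.append(f"diagnostics {'passed' if passed else 'failed'} on {faulty.name}")

    # Assumption: guests only return to the blade once it tests clean.
    if passed:
        move_vms(moved, spare, faulty, log)
    return passed, log
```

Calling diagnose_without_downtime(Host("blade1", ["web", "db"]), Host("blade2"), check), where check is whatever function reports the test result, returns a pass/fail flag plus a log of each step. The point of the ordering is that the reboot only happens once the blade is empty.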

Nothing like lazy engineers to work out a way for me to do all that without leaving my seat… No wonder virtualisation is the new way.

EDIT: Except that if it were in the cloud, I wouldn’t have to worry about it at all.

