SysAdmin Weekly - 037 - When Incident Response Plans Meet Reality #8
I've been thinking about Incident Response / Backup & DR success and horror stories, and I'd love to get the community's take on this as well. I'll start with a story of my own.

Many years ago I was working as an MSP field engineer supporting a range of different customers. We had one fairly large medical customer with a MASSIVE over-dependency that had been pointed out several times. Their practice management application ran entirely on top of a single large SQL instance... like terabytes large. I don't recall the exact size, but if I remember correctly it was close to double digits. See, this entity performed a number of different imaging tests, so these were large-format images that had to be retained for a length of time because of HIPAA and other regulatory requirements.

With all that in mind, you likely see the issue here. Production DEPENDED on a SINGLE large SQL instance with no HA involved. Not at the OS layer, not at the SQL layer... not at the hypervisor layer. We had asked several times for a meeting to address this and plan it into the org's official backup & DR plan, to no avail.

Well, as so often happens in IT... Murphy (à la Murphy's Law) decided to show up. That SQL box fell flat on its face one day and required recovery from backup. While we were able to recover the workload and data, it was NOT instantaneous. If I recall, the recovery took somewhere in the range of 30 hours, meaning patients could not be seen or have imaging tests completed, potentially holding up urgent medical care until the system was back online.

This is one case where a proper backup & DR discussion, and the changes and planning that followed from it, would have made a world of difference. Curious to hear your stories as well! Would love to share them on the show in the future! Cheers!
Hey All!
Just dropped a new episode of SysAdmin Weekly that digs into incident response, disaster recovery, and business continuity. The episode talks specifically about why so many plans look great on paper and then completely fall apart the moment something actually goes wrong.
Episode links
The transcript for this episode can be found here:
sysadmin-weekly-037-incident-response-plans-meet-reality.txt
Full episode of the podcast can be viewed here:
YouTube - https://youtu.be/n_AvIKPoM7g
Spotify - https://open.spotify.com/episode/0MNGmYiGlGk6l9v8HrpZit?si=kLvUMflnSVWzl9IBISTyPQ
Also, don't forget our newsletter! It was just updated yesterday! - https://newsletter.sysadminweekly.com/
Discussion Continues
In this episode, Eric and I discuss:
If you’ve ever:
…this one’s for you.
I’d love to hear from you!
Tell us about your:
Drop your thoughts below 👇
Happy Friday and have a great weekend!