Surviving a Server Disaster, Post-Mortem of a Failed CPanel Transfer
It all started very fine. I spent weeks carefully crafting my new cloud server on https://www.digitalocean.com/ , spent hours performing all sorts of optimizations with great tools from https://www.cloudlinux.com/, brought in my AI powered Best Friend, and threw in every backup option you can imagine.
I then tested, and battle tested Google Cloud Storage Buckets to store static assets at the edge of every major Data Centre around the world, and threw in Universal SSL so all traffic would be encrypted for everybody (Even the cat.jpg ! )
Always Plan For Failure
After having successfully moved over about 100 websites on there over a week, it became almost routine.
That server is so much optimized, running on so much power, with Intel Broadwell 2697Av4 with a clock speed of 2.6Ghz, and the Intel Skylake 8168 with a clock speed of 2.7Ghz, it just hummed away at them… On a busy day of monitoring load, it never got past 20% load…
Then it happened…
I had just finished transferring a website over, with database holding a little over 1Million Records (I seriously need to persuade that client to use https://docs.gravityforms.com/captcha/ on their website… Can’t believe we got 700K worth of spam on there…), and all of a sudden, I got all sorts of High Load alerts… My humming server just spiked at > 80% load within minutes (But didn’t crash, thanks to software load balancing in place)
Keep Calm, Do your Checklist
Time to think, and fast…
I checked server load, MySql went from 11% to 600% (It’s a 8vCPU machine, so 600% means 100% of 6 cores 😉 Took me a few days to figure that one out a couple of years ago when I was young ).
First thing I noticed, all tables were using the MyISAM storage engine. That’s quite old stuff, and I had optimized by DB for InnoDB Storage Engine, so I thought, okay, just migrate the bloody thing and it’ll work…
I kind of worked… Load dropped from 600% to 200%, but the website was still running slower than me in the morning (those who know me will understand 🙂 )
Check the logs
To cut a long story of fighting my way through PROCESS LISTS that got stuck, I finally ended up noticing the tables had lost ALL their indexes !
Oh, for that one, I decided to transfer the database using PHPMyAdmin instead of the CPanel Transfer Utility because that one had failed earlier…
Biggest mistake of my life… For some obscure reasons, the new versions of PHPMyAdmin exports DO NOT preserve indexes… duh… huh… What ? !
So I had choices
- Rebuild the indexes manually
- Find what was wrong and fix it
I decided to fix (I like the challenge)
So I went into the transfer logs, and saw there was an error on one of the tables… More of a view than a table really… Some stupid plugin developers out there… … …
Anyway, I fixed the table, ran mysqldump this time via SSH (this one preserves indexes !), restored onto the server, and restarted MySQL…
1 hour later, it’s at 5% CPU Load !!! And everything is humming calmly again
The Super-Star Function
If you’re new to Server and Database Management, this simple utility could mean life and death to your server (Or whether you’ll still have a job to go to on Monday Morning 🙂 )
mysqldump -u the_user_name -p the_database_name > the_database_name.sql
And stop using PHPMyAdmin (The voice of Umar Bahadoor was screaming in my head, telling me “You’re a DevOps, you know all the command line tools, WHY have you switched to GUI???!!!”)
- Use Your Brains (I raised a support ticket to CPanel, I’m still waiting for an answer…)
- DO NOT EVER export data using PHPMyAdmin (unless I missed one of the options?) if your tables got indexes (Seriously, they should…). Go on to SSH and mysqldump!
- MyISAM storage engine is… … … Just don’t use that… … …
- INDEX your tables… I went from 600% load to 5% load just because of missing indexes… (If I catch one of you using un-indexed tables on that server, you will hear from me ! 😉 )
- Don’t do migrations on a Friday Evening (Seriously… don’t… even CPanel Support isn’t online…)
- Planning to fail does save your bum (If that load balancer wasn’t there, I would probably have had to restore yesterday’s backup… Even after a reboot, MySQL would have used 100% of resources, plus PHP (even php-fpm would fail… Hail mod_lsapi ! ), the server would have constantly crashed after each reboot (I learned that one the hard way a few years ago while under a DDoS attack… solution in that case ? lock down DNS of the culprit website and figure out how to fix it… Since then we’ve moved to Cloudflare’s Global Managed DNS which secures traffic even before it reaches your server, and never suffered downtimes because of a DDoS again…)
So with a little preparedness, and a lot of experience, you can avoid disasters from happening.
You just got to use the right tools for the right job 🙂
April 26, 2018