The insight is that scale causes rare events to become common.
Sure, without automated monitoring we’d be blind, and without automated problem-solving we’d be overwhelmed. So yes, “automate everything.”
But some things you can’t automate. … You can’t “automate” the recruiting, training, rapport, culture, and downright caring of teams of human beings who are awake 24/7/365, with skills ranging from multi-tasking on support chat to communicating clearly and professionally over the phone to logging into servers and identifying and fixing issues as fast as (humanly?) possible.
And you can’t “automate” away the rare things, even the technical ones. By their nature they’re difficult to define, hence difficult to monitor, and difficult to repair without the forensic skills of a human engineer.
with high growth, the surprise appears quickly