Three Devs And A Maybe

148: Site Reliability Engineering with Niall Murphy

Information:

Synopsis

In this week’s episode we are lucky to be joined by Niall Murphy to discuss the discipline of Site Reliability Engineering. We start off by speaking about how he got into computing, how the SRE role came to be and what drew him to it. From here, we highlight the position of an SRE within a company/group, what SLAs are, and the positives of capping operations work at 50% and of running blameless postmortems. This leads us to talk about why striving for 100% uptime is actually detrimental to the product, and the benefits of having an Error Budget. Finally, we discuss how the role has evolved since its inception, the Wheel of Misfortune and what drew him to contribute to the seminal SRE book.