Observability (C3-P-O series)

As I described in the introduction to this series, I’ve used the acronym C3-P-O as a tool to help the teams I lead remember some key principles for maintaining a healthy infrastructure environment. This post is about the “O” in the acronym - Observability.

I touched on Observability in my previous post on PRO Infrastructure, so I won’t go into much further detail here other than to reinforce the principle.

It is vitally important that all aspects of a service’s behavior can be seen, tracked, trended, and interrogated. Only by gathering metrics, making them visible, and comparing them to expected and historical trend lines can we understand whether the component is healthy, operating as expected, and fulfilling its intended purpose. This information should be made broadly available throughout the organization so that others don’t have to guess about what might be happening. Going further, there are now commonly available tools and services that can use historical data to predict, often quite accurately, when trouble is brewing. That’s a superpower that you absolutely need to build into your organization’s capabilities.
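
To make "comparing them to expected and historical trend lines" a little more concrete, here’s a minimal Python sketch of the idea. It is only an illustration: the function name `is_anomalous`, the `threshold` of three standard deviations, and the sample latency numbers are all assumptions of mine, and real monitoring tools use far more sophisticated models than a simple deviation check.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Return True when `current` sits more than `threshold` standard
    deviations away from the mean of `history` (a hypothetical baseline)."""
    if len(history) < 2:
        # Not enough data points to form a baseline, so don't raise a flag.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Made-up request latencies (in milliseconds) from recent samples.
latency_ms = [120, 118, 125, 122, 119, 121, 124]

print(is_anomalous(latency_ms, 123))  # a reading close to the recent trend
print(is_anomalous(latency_ms, 310))  # a reading far outside the recent trend
```

The point is less the math than the habit: a metric only tells you something once it’s measured against what "normal" has looked like for that service.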

One of the best resources on Observability that I’ve come across is available from Honeycomb, covering the critical aspects of observability engineering. I can’t add much to what they’ve published on the subject, so head over and check it out.

That’s it for C3-P-O - I hope you enjoyed the series and maybe took a few useful tidbits from it.

Copyright

CC BY-NC-ND 4.0