Teller's Tech

Providing instruction on DevOps, cloud computing, and site reliability engineering topics

06/16/2026

Focus on Top Incidents for Reliability

06/16/2026

Ship It Weekly Podcast: Ship It Conversations: Meta’s Francois Richard on AI Incident Response, SLOs, and Reliability at Scale

This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.

In this Ship It: Conversations episode, I talk with Francois Richard, Engineering Director at Meta, about reliability at scale, how AI is changing production risk, what teams actually learn from incidents, and why recovery practice matters just as much as prevention.

We talk about the proactive and reactive sides of reliability, why SLOs should represent a promise to users instead of just another dashboard number, how incident reviews should drive real system improvements, and how teams can practice recovery before production forces the lesson on them.

The bigger theme here is that reliability is not just about avoiding failure. It is about knowing what happens when prevention fails. That means practicing regional failure, understanding overload behavior, improving incident response, using AI carefully during investigation, and making reliability targets match the actual lifecycle and importance of the system.

Highlights

• Why reliability work starts with both prevention and recovery

06/15/2026

Teller Talks Cloud Glue

06/15/2026

Google Cloud Apigee Bug

06/14/2026

AWS Lambda Tenant Isolation Guide

06/12/2026

Coinbase Matching Engine Outage

06/12/2026

Meta Instagram AI Hijack

06/12/2026

Ship It Weekly Podcast: Coinbase Outage, Meta AI Account Recovery, AWS AgentCore Code Injection, Apigee Tenant Isolation, and the Glue That Breaks Production

This episode of Ship It Weekly is about the hidden glue holding production together.

Brian covers Coinbase’s May 7 outage postmortem, where an AWS us-east-1 cooling failure exposed the difference between being “multi-AZ” on paper and actually being able to recover when stateful, low-latency systems are tied to a failed zone.

Then he looks at Meta’s AI-assisted Instagram support issue and why account recovery is identity infrastructure, not just customer support. If AI can influence password resets, email changes, MFA resets, or account ownership flows, that workflow needs to be treated like a production control plane.

The episode also covers AWS AgentCore CLI CVE-2026-11393, where collaborator metadata could break out into generated Python code during agent import, and an Apigee cross-tenant issue from Google’s Apigee security bulletins that shows why tenant isolation has to be tested beyond the obvious happy path.

06/11/2026

Ship It Weekly KEDA alert

06/10/2026

DevOps Lightning Round

Address

Greencastle, PA 17225

Website

https://tellerstech.com/, https://shipitweekly.fm/, https://oncallbrief.com/
