10 Years of Keeping the Cloud Clean With Cloud Custodian
Celebrating a decade since Cloud Custodian was open sourced
When I joined Capital One in 2015, the company was deep into one of the most ambitious cloud transformations in financial services. The vision came straight from the top:
“Increasingly we’re focusing on cloud computing and building the underlying capabilities such that product development will be faster and faster and more effective.” — Richard Fairbank, CEO, Capital One -2015
Cloud Custodian was born in that moment — inside a large enterprise moving fast to embrace the cloud, while needing to ensure control at scale. The challenges were universal: cost sprawl, security posture, compliance across hundreds of teams. How do you give developers the freedom to innovate with any tools they choose, while keeping the organization well-managed?
The answer at the time was one-off scripts and vendor tools that showed you the problem but didn’t fix it. We needed something better. Not another dashboard to admire problems in, but a tool that could actually fix them in real time, at scale, while also educating developers on the best way to do things in the future, all while ensuring the organizational policies were continuously enforced.
I kept coming back to something I’d noticed across my career: in software, you spend far more time reading code than writing it. Cloud governance needed the same thing, a shared declarative language, readable by engineers, FinOps, and security teams alike. One that could not just identify waste and risk, but act on it. Shut down the idle resources. Enforce the tagging. Fix the misconfiguration. Right then in realtime, not next quarter. Real world remediation that works at enterprise scale, inclusive of exception management and policy application across varying environments
That gap is what became Cloud Custodian. Starting in July 2015, I began building a rules engine for cloud governance and took it to enterprise wide adoption. On April 19, 2016, it was announced at AWS Summit in Chicago that Cloud Custodian was open source.

That was the whole idea. It still is.
That decision to open source it turned out to be the most important one we made. Because every organization moving to the cloud was hitting the same wall. Financial services, healthcare, retail, gaming, technology: the specific rules were different, but the underlying problem was identical. They needed a shared language for what “well-managed” means and a continuous workflow to enforce it.

Ten years later that problem hasn’t gone away. If anything it’s gotten harder. Cloud environments are more complex, multi-cloud is the norm, and now AI and machine learning workloads are adding entirely new categories of resources: GPU fleets, model serving infrastructure, training pipelines, with cost profiles and security surfaces that most governance tooling wasn’t built for. And with AI agents now generating and deploying infrastructure code autonomously, the volume and velocity of what needs to be governed has never been higher. The organizations getting this right aren’t running quarterly cloud audits. They’re enforcing policy continuously, as code, as close to the source as possible.
That’s exactly what Cloud Custodian was designed to do when we open sourced it ten years ago. And it’s exactly why it matters more today than ever.
Here’s the story of ten years of keeping the cloud clean.
Early Adoption
The response to the open source release was faster than any of us expected. Within weeks of the April 2016 announcement, teams from other organizations were filing issues and submitting pull requests. They were hitting the exact same problems — tagging chaos, security configuration drift, cloud bills nobody could fully explain. The tool resonated because the problem was universal.
The community that formed early was pragmatic and production-focused. These weren’t people kicking tires — they were running real workloads and needed real solutions. Policies that had been maintained as one-off scripts at one organization started showing up as contributions that worked across many.
Going Multi-Cloud
The multi-cloud story is really a community story. Cloud providers heard the same ask from their customers and responded by contributing directly to the project. Microsoft contributed a team of engineers to build out Azure. The same happened with GCP, then Oracle contributing OCI support, then Tencent contributing Tencent Cloud. Each new provider was a group of customers of those providers saying: we have this problem too, and we want to solve it together. What made it even more meaningful was that engineers from the cloud providers themselves — AWS, Microsoft, and Google — became active contributors too, helping ensure Cloud Custodian worked natively and deeply with their own platforms.
This is what made Cloud Custodian genuinely cloud-agnostic. The same YAML policy language, the same resource-filter-action primitives, the same CI/CD workflow — applied natively to each cloud’s own APIs and event streams. You learn it once. It works everywhere.
The CNCF Journey
On June 25, 2020 Capital One donated Cloud Custodian to the CNCF Sandbox, giving the project neutral governance and open stewardship. No single company controlling its direction, just a community building something together.
The Sandbox was the first step. Two years later in September 2022 the community completed the due diligence process to achieve CNCF Incubating status, demonstrating production adoption at scale, a healthy diverse contributor base, and sound open source practices. Along the way the project completed an independent security audit with Ada Logics, integrated continuous fuzzing via OSS-Fuzz, and shipped signed container artifacts. The kind of rigor enterprise users rightly expect from a project running in their most critical environments.
The next step is CNCF Graduation, the foundation’s highest maturity level, and the project is actively working toward that milestone.
Kubernetes and Shift Left
As Kubernetes became the default runtime for enterprise workloads, the community had a natural ask: can we bring the same governance as code approach to our clusters?
In October 2022 Cloud Custodian shipped a Kubernetes admission controller, bringing the same YAML policy language to in-cluster enforcement. No new tool, no new language to learn.
That same release brought Terraform and shift-left support, extending governance as code to infrastructure before it’s even provisioned. Catch misconfigurations in the CI pipeline, not in production. And it works on a developer workstation pre-commit, or directly annotates pull requests in code hosting platforms.
Together these two capabilities completed the governance as code lifecycle: validate in IaC, enforce at cluster admission, remediate in the live cloud. One language, one workflow, end to end.
Community and Contributors
Ten years in, the numbers speak for themselves: 500M+ downloads, 500+ contributors, and thousands of organizations running Cloud Custodian in production across every major industry.
But the real story is the people. Engineers from organizations like Capital One, Intuit, JP Morgan Chase, FICO, Siemens, Microsoft, Amazon, Cox Automotive, Avalara, HBO Max, FactSet, Ticketmaster, Verint, Datadog, Eli Lilly, Northwestern Mutual and many more have all shaped what this project is today. Cloud provider engineers from AWS, Microsoft, and Google contributed directly, helping ensure Cloud Custodian works natively with their platforms.
Feature requests came from real production problems. Bug reports came from real environments. The clouds we support today were built by the people who needed them.
The Next Chapter: AI, Data, and Community
Ten years ago, “governance as code” was a novel framing. Today it’s an expectation. The question for the next decade is how governance as code evolves as the cloud itself evolves.
The most immediate frontier is AI infrastructure. GPU fleets, model serving endpoints, training pipelines, vector databases — these are not just new resource types, they are a new category of cloud spend that most governance tooling wasn’t designed for. The cost profile of a single GPU instance left running overnight dwarfs what an idle EC2 ever cost. And with AI agents now autonomously generating and deploying infrastructure, the velocity of provisioning has outpaced any manual governance process. Cloud Custodian is already extending deeper support for AI-native cloud services including AWS Bedrock, Google Vertex AI, and others, so the same governance as code policies that manage your compute and storage can govern your AI workloads too. Building on that foundation, at Stacklet we are taking this further with a control plane for cloud and AI, autonomously optimizing cost, enforcing security, and ensuring compliance without waiting for a human to act.
Data platforms like Snowflake and Databricks have become critical parts of the enterprise cloud footprint, with their own cost sprawl, access control challenges, and compliance requirements. Runaway queries, idle warehouses, unchecked data sharing — these are governance problems, and they deserve the same governance as code treatment that cloud resources have had for a decade. Extending Cloud Custodian to data platforms is the next natural step.
Cloud Custodian’s role in that future is the same as it has always been, to be the best open source tool for expressing what “well-managed” means in code, and for enforcing it continuously across every cloud, cluster, data platform, and AI workload you run.
Thank You
To every contributor who has opened a pull request, filed a bug, written documentation, or helped someone in the community chat over the past ten years, thank you. To Capital One for having the conviction to build this and the generosity to open source it. And to the CNCF for providing a home where it could grow beyond us.
Here’s to the next ten years.
Categories
- AI
- cloud-custodian
- FinOps