CONTRIBUTED BY RAVI SHARMA, MANAGER, SOFTWARE DEVELOPMENT, T-Mobile
Most people know T-Mobile as a wireless service provider. After all, we have an international presence and we’re the third largest mobile carrier in the United States. But we’re also a technology company with new products that include our TVision Home television service, our T-Mobile Money personal banking offering, and our SyncUp Drive vehicle monitoring and roadside assistance device.
Behind the scenes, T-Mobile is also a leader in the open source community. We have shared 35+ code repositories on GitHub — including our POET pipeline framework automation library — to help other organizations support their internal and external customers by adopting robust and intelligent practices that speed up the CI/CD cycle.
I’m a senior systems reliability engineer in T-Mobile’s system reliability engineering (SRE) unit. Our team successfully rolled out phase-1 of POET implementation to 30+ teams. This was a huge success and the plan is to scale it up to our 350 developer teams and 5,000 active users with a stable, reliable CI/ CD pipeline using a combination of Jenkins and CloudBees Core running on a Kubernetes cluster.
FEWER PLUGINS, MORE MASTERS
We started by building a streamlined container-based pipeline infrastructure that is centrally managed and easily adaptable to development methodologies. The result frees our developer teams to focus on developing and testing applications rather than on maintaining the Jenkins environment.
We then reduced the number of Jenkins plugins we use in our master from 200 to four. There are over 1,000 such add-ons, including build tools, testing utilities and cloud-integration resources. They are an excellent way to extend the platform, but they are also the Achilles’ heel of Jenkins because they can cause conflicts.
Next, we moved from a single master powering all our Jenkins slaves to multiple masters, and now have 30 pipeline engines powering roughly 10 teams each. This setup has reduced CPU loads and other bottlenecks while allowing T-Mobile’s DevOps teams to continue enjoying the benefits of horizontal scaling.
SPINNING-UP JENKINS PIPELINES IN TWO MINUTES
As a result of this work, my SRE team can now spin up a Jenkins master from a Docker image in roughly two minutes, test it and roll it out to our production environment. Individual teams can then customize their CI/CD pipelines to meet the needs of specific projects. We allow these teams to extend the platform, but we have restricted the list of add-ons to 16 core plugins. These plugins are preconfigured in a Docker container, and every team starts with an identical CI/CD pipeline, which they can then set up to their liking at the folder level.
This streamlined and centralized approach to deploying our pipeline allows the SRE team to put everything in motion and then get out of the way. But that’s only half the story. The real magic happens when our developer teams take ownership of the simplified CI/CD pipelines. They no longer have to worry about the underlying Jenkins technology and can shift their attention to on-boarding their solutions.
The POET Pipeline minimizes the need for Jenkins Groovy code, which is cumbersome, error-prone and difficult to incorporate into third-party libraries. Instead, everything starts with pipeline definition files located within the pipeline source code and step containers are created to perform builds, deployments and other pipeline functions.
We include 40 generic containers in the POET Pipeline, so our developers don’t have to start from scratch. Of course, they have to know how to create Docker containers and how to write a YAML file to extend the pipeline functionalities. By simplifying the infrastructure, keeping plugins to a minimum and eliminating the need for Groovy, we’ve given our developers the freedom to define their own pipelines without having to depend on a centralized management team.
OUR DEVELOPERS ARE NO LONGER INFRASTRUCTURE ENGINEERS
To further empower our developers, we’ve authored comprehensive POET Pipeline documentation, including easy-to-understand help files, tutorials and videos for self-guided learning and self-service support. This valuable resource also frees up our pipeline management team and our developers to concentrate on innovation.
This documentation is part of the “customer-focused” approach we’ve adopted. We treat our internal development teams as our customers, and the POET Pipeline is our product. Can you imagine T-Mobile asking subscribers to rebuild their smartphones every time they make a call? Or making them talk to a CSR before sending a text message? Then why should we ask our developers to serve double duty as infrastructure engineers?
On top of keeping developers happy and simplifying management tasks, our streamlined POET Pipeline framework has dramatically reduced downtime. Our plugin-heavy, single-master Jenkins environment hogged CPU cycles, caused all kinds of configuration headaches and was constantly going down.
On any given week, we had to restart Jenkins two or three times. Sometimes, our builds put such a strain on our environment that we had to restart it overnight and reset everything when our teams weren’t working. With POET Pipeline, we’ve reduced downtime to a single such incident per year.
SCALING OUR SUCCESSES
By eliminating the need for a pipeline specialist on every development team, we have also incurred substantial labor and cost savings as a result of our work with Jenkins and CloudBees Core. When you consider a typical work year of 2,000 hours and multiply that by 350 teams, you’re looking at hundreds of thousands of hours and tens of millions of dollars. We can now redirect these resources into and to building revenue-generating products to better serve T-Mobile’s external customers.
These numbers are huge, but don’t let them fool you into thinking that the POET Pipeline is not for you. We may have hundreds of teams and thousands of developers, but Jenkins is scalable, and any size organization can use the tools we’ve developed. That’s why we’ve chosen to share our pipeline with the open source community on GitHub.
INNOVATING WITH THE WORLD
Innovation does not occur in a vacuum. By putting our code out there for others to use and modify, we are helping developers around the world shift their focus from managing pipelines to building better applications. In turn, we benefit by applying the wisdom of the wider community to our internal projects.
Everyone wins, but the real winners are T-Mobile’s customers. They can look forward to new and improved offerings because we’re spending less time managing our pipeline framework and more time delivering the products and services that simplify and enhance their lives.We have also incurred substantial labor and cost savings as a result of our work with Jenkins and CloudBees Core. When you consider a typical work year of 2,000 hours and multiply that by 350 teams, you’re looking at hundreds of thousands of hours and tens of millions of dollars.