Infrastructure As A Platform
09 Nov 2020
Sometimes I hear the words “DevOps is the new IT”. I must say that I can relate to this a lot.
In the early days, IT used to be a generic pool of people managing anything related to computers, as well as any other electronic equipment. In the modern world a similar generic placeholder exists, and it has the same mission: close any existing gaps in team composition and staffing. The name of this placeholder varies: people call it DevOps, Ops, Infrastructure, Admin and so on. I am not saying that this is bad. In fact I believe the opposite - it proves that the role is a necessity. There must be someone looking across the entire stack at work. Different organizations have evolved different approaches to DevOps - some have isolated Ops teams working on the infrastructure, some mix this responsibility into developers' responsibilities while keeping an independent team of SREs to watch the workloads in production. Sometimes developers do deploys themselves, sometimes there is a dedicated team or person doing this.
I would like to share my vision of the place of DevOps and where it is going. By no means is this the only way; it is one of many ways to approach the topic. I would like to speculate a little and try to explain why I believe this approach is good.
One of the major goals of developers is to deliver new functionality to the product and make sure old functionality keeps working. This means that anything beyond that is a distraction. One may say that this is a simplification because of tech debt and similar concerns, but let's take it as the business point of view, because it is pretty close to real life.
Now I will jot down some regular Infra/Ops-related activities:
- Provision servers
- Configure network
- Update OS versions
- Monitor servers
How do these activities relate to delivering new functionality? Well, if a server does not exist, then one might say that provisioning it and deploying some useful code on it delivers new functionality, and they would be right. On the other hand, if we are doing it for the Nth time, what is new about such functionality?
If we want to provision a new server, will we make it from scratch, or follow some guidelines, or use some tooling?
Meet Internal Tooling
Alright, if we provision new servers often, using internal tooling to do so seems legitimate. It could be a responsibility within the team, or it might fall to a DevOps Tooling (you can use another name here) team. As soon as the product takes off and needs more services and environments, it makes little sense to provision them manually. There are many reasons:
- This manual labor is an expensive waste of developer work time
- The art of configuring a service becomes arcane knowledge if undocumented
- It is nearly impossible to do controlled improvements of something which merely exists as a mental construct in some people’s heads
- The people doing this work are very hard to scale
This is where infra-as-code and configuration management solutions come into play. They provide a way to turn arcane knowledge of infra provisioning into defined processes. It is hard to overestimate the benefit - suddenly infrastructure can receive iterative improvements too!
It all sounds very exciting, but one must be careful to keep the zoo of internal tools under control. Quite often engineers are guilty of over-engineering, because the desire to build something new, improve or automate is an inherent quality of an engineer. As the product portfolio grows, it becomes important to standardize across the products and reuse existing knowledge where possible.
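To make the idea of "knowledge as a defined process" a bit more concrete, here is a minimal sketch in Python. It is not how any particular infra-as-code tool works; `ServerSpec` and `plan` are hypothetical names invented for this illustration. The point is only that once the desired state is written down as data, the difference to the current state can be computed, reviewed and improved iteratively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical declarative description of one server."""
    name: str
    os_version: str
    cpu: int
    memory_gb: int


def plan(desired, current):
    """Return (action, server_name) pairs that move `current` towards `desired`."""
    desired_by_name = {spec.name: spec for spec in desired}
    current_by_name = {spec.name: spec for spec in current}
    actions = []
    # Servers that are declared: create the missing ones, update the drifted ones.
    for name, spec in sorted(desired_by_name.items()):
        if name not in current_by_name:
            actions.append(("create", name))
        elif current_by_name[name] != spec:
            actions.append(("update", name))
    # Servers that exist but are no longer declared get removed.
    for name in sorted(current_by_name):
        if name not in desired_by_name:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = [ServerSpec("api", "22.04", 2, 4), ServerSpec("db", "22.04", 4, 16)]
    current = [ServerSpec("api", "20.04", 2, 4), ServerSpec("old", "20.04", 1, 1)]
    for action in plan(desired, current):
        print(action)
```

The plan is just a list, so it can be reviewed like any other code change before anything is applied.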
Infrastructure As A Platform
‘Platform’ is a popular buzzword in the software world. It practically means a way to build software in an extensible and programmable way, and then offer these capabilities to the public to build upon. Take a look at Amazon AWS or Facebook - they offer a platform where numerous third parties come in and build their products. If you want to read more about platforms - here is an old, but very popular rant on the topic.
I would say that Infrastructure should adopt this vision as well. Of course it is not possible in all cases. Quite often internal infrastructure teams are very limited in resources, but the right mindset can help to shape the platform even with limited resources.
Here are the key concepts of a platform inspired by the internet (my interpretation):
- A platform must be externally programmable and extensible
- Dogfooding is paramount
- No business logic built-in - this comes on top of the platform
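A minimal sketch of these three concepts in Python, under the assumption that the platform is exposed as a small programmable API. The `Platform` class and its `register` and `provision` methods are hypothetical names made up for this example, not an existing library.

```python
from typing import Callable, Dict

# A provisioner takes request parameters and returns a description of the resource.
Provisioner = Callable[[dict], dict]


class Platform:
    """Hypothetical core: it only routes requests and knows nothing about products."""

    def __init__(self) -> None:
        self._provisioners: Dict[str, Provisioner] = {}

    def register(self, kind: str, provisioner: Provisioner) -> None:
        """Extension point: anyone can add a new resource kind from the outside."""
        if kind in self._provisioners:
            raise ValueError(f"resource kind already registered: {kind}")
        self._provisioners[kind] = provisioner

    def provision(self, kind: str, params: dict) -> dict:
        """Single entry point used by product teams and the infra team alike."""
        if kind not in self._provisioners:
            raise KeyError(f"unknown resource kind: {kind}")
        return self._provisioners[kind](params)


if __name__ == "__main__":
    platform = Platform()
    # Dogfooding: the infra team registers its own offering through the public API.
    platform.register("vm", lambda params: {"kind": "vm", **params})
    print(platform.provision("vm", {"cpu": 2}))
```

Business logic, such as what a billing service needs, lives in the caller that builds `params`, not in the `Platform` class itself.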
Why do I think that Infrastructure must build up a platform vision, even though Jeff Bezos has already figured it out for us? The answer (or at least part of the answer) is internal requirements. It is not enough to spin up a bunch of VMs and set up a network to get what a software company needs internally. Providing internal (and potentially externalizable) services for infrastructure resource management offers a manageable and controlled way to handle the offerings of an Infrastructure team and scale them as the company grows.
It helps (at the very least) with the following challenges:
- Standardize operations solutions across the company’s products (logging, monitoring, backup, etc.)
- Perform controlled and continuous improvements of infrastructure
- Support multiple versions of infrastructure setups
- Remove the bottleneck of a small Infra team by empowering developers to provision their own services in a standardized way
- Comply with various regulatory standards across products
- Keep legacy systems under control
I am not saying that it is impossible to solve these challenges without having an Infrastructure Platform - what I am trying to convey is that the platform setup has inherent qualities which help a lot if you are struggling to solve these challenges.
If you are unfamiliar with the concept of a ‘Golden Path’ from Spotify - check it out. I like how they define a Golden Path - it is an ‘opinionated and supported’ path to ‘build something’. The Golden Path concept is perfectly aligned with the idea of an Infrastructure Platform - it is basically a way to offer infrastructure services as a platform. In some cases it is not possible to wrap everything into code, so some interfaces come in the form of manuals and guidelines - which is perfectly fine (as long as they are documented).
As a wrap-up I would like to note that this concept is by no means the only way to do things, and might not be appropriate in certain cases like early startup environments. There is a certain overhead to the approach and it must be taken with a grain of salt. Consider this paragraph a brief disclaimer :)
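One possible way to express an ‘opinionated and supported’ path in code is a template with fixed defaults and a short list of settings that teams may tune. The sketch below is my own illustration, not Spotify's implementation; `GOLDEN_PATH_DEFAULTS`, `ALLOWED_OVERRIDES` and `golden_path_service` are hypothetical names, and the default values are made up.

```python
from typing import Optional

# Hypothetical opinionated defaults that every service gets for free.
GOLDEN_PATH_DEFAULTS = {
    "logging": "central",
    "monitoring": True,
    "backup": "daily",
    "replicas": 2,
}

# Hypothetical set of settings a team may change and still be supported.
ALLOWED_OVERRIDES = {"replicas", "backup"}


def golden_path_service(name: str, overrides: Optional[dict] = None) -> dict:
    """Build a service definition from the defaults plus the permitted overrides."""
    overrides = overrides or {}
    unsupported = set(overrides) - ALLOWED_OVERRIDES
    if unsupported:
        # Leaving the path is a deliberate decision, so it fails loudly here.
        raise ValueError(f"off the golden path: {sorted(unsupported)}")
    return {"name": name, **GOLDEN_PATH_DEFAULTS, **overrides}


if __name__ == "__main__":
    print(golden_path_service("billing", {"replicas": 3}))
```

Developers can provision a standardized service with one call, while the Infra team keeps control over what "standard" means by changing the defaults in one place.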