Sunday, December 18, 2016

Offloading Service Deployment Responsibilities to a Cloud Service Provider

Working as an engineer at a progressive tech startup, I quickly realized that when you are tasked with building a production-level system, you are implicitly also responsible for maintaining that system. That maintenance may seem like a simple task at first (“The code runs on my machine, just run the same command on some server in the cloud. Easy”), but when it comes to technology, the seemingly simple details are usually what make the job incredibly difficult. The task to “just run the same command on a server in the cloud” quickly unravels into a number of issues:
  • Which server do I run the command on?
  • How do I provision that server?
  • How do I get my code to that server?
  • How do I get my secret values (API keys, certificates, etc.) to the service that will run on that server?
  • How do I keep the server patched with the latest security updates, both at the operating system level and for the background daemons?
  • Whenever I make changes to the code in my version control system, how do I deploy those changes to the server (CI/CD)?
And these few issues are just the tip of the iceberg. Each issue is likely to have another “just do something” solution that generates another list of issues to be resolved.

Looking at all of the issues that need to be resolved just to deploy some code can be daunting for a new engineer who is used to building software in a single comfortable language (pure Python or Java or maybe even C) and simply running it locally on his or her own system. And even though many of the issues of deploying software have become ubiquitous enough to evolve from checklists of best practices into an entire career field dubbed “devops”, sometimes it is easier to avoid these issues entirely by offloading them to a cloud service provider.

Offloading responsibilities to a cloud service provider comes at the cost of giving up some degree of control over the deployed environment. For certain services, it may be important to retain a high level of control over the deployed environment, at the cost of also retaining the responsibility of maintaining that environment. In this post, I will cover the 3 most common deployment strategies, each of which trades away a level of control over the environment in exchange for easier deployment and maintenance of a service in the cloud. These 3 strategies are the server level strategy, the container level strategy, and the function level strategy.

Server level

The service can be installed directly on a server (or more precisely, on a virtual machine). This is the most traditional way to deploy a service and also gives the operations team the most control over the deployed environment.

Pros

  • Access to low-level resources - The service can directly use hardware-accelerated resources, including network hardware.

Cons

  • Patch management - All patches on the server must be managed, especially security updates.
  • Development environment != production environment - If the development and production environments are not kept in sync, the result is typically the “well it works on MY machine” argument between developers and the operations team.
  • Scaling management - Scaling of the servers in the cluster must be managed.

Use-cases

  • Network service - A packet router that forwards most packets using hardware-accelerated network interfaces, but routes certain packets to a user-level process for inspection or logging.

Container level

By giving up some control over the deployment and operation of the service (and the associated responsibility of maintaining that control), services can be deployed at the container level. A container separates the higher-level functionality of the operating system (the userland) from the lower-level kernel. For example, a virtual machine can run a single version of a Linux kernel, but then have several higher-level operating systems running on top of that kernel inside containers, such as Ubuntu 14.04, Ubuntu 16.04, CentOS 6, and another Ubuntu 14.04. Since separating the higher-level operating system from the kernel makes each deployment much more lightweight, the operations team can run many more independent operating system instances than is traditionally possible with full virtual machines.
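To make this kernel/userland split concrete, here is a minimal Python sketch. It is purely illustrative (it assumes a Linux container runtime such as Docker and a distribution that ships /etc/os-release, which Ubuntu does but CentOS 6 does not):

    # kernel_vs_userland.py - shows that containers share the host kernel
    # while providing their own higher-level operating system userland.
    import platform

    def main():
        # The kernel version is inherited from the host (or VM) kernel.
        print("Kernel: %s" % platform.release())
        # The distribution identity comes from the container's own userland.
        with open("/etc/os-release") as f:
            for line in f:
                if line.startswith("PRETTY_NAME="):
                    print("Userland: %s" % line.split("=", 1)[1].strip().strip('"'))

    if __name__ == "__main__":
        main()

Running the same script in an Ubuntu 14.04 container and an Ubuntu 16.04 container on one host prints the same kernel line but different userland lines, which is exactly the separation described above.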

Pros

  • Development environment = Production environment - Since the environment is explicitly defined by the developer, the development and production environments are effectively identical. This drastically reduces the “well it works on MY machine” arguments between developers and the operations team: if it works on the developer’s machine, it is very likely to work on the production machine as well.

Cons

  • Patch management - Patches must be managed both at the server level and at the container level. However, this is not as challenging as in a server-level deployment strategy since the virtual machine kernel will require significantly less maintenance than higher-level operating system services, and the higher-level operating system services will be explicitly defined and managed by the developer.
  • Scaling management - In addition to managing the scaling of the servers in the cluster, the containers within that cluster must also be managed.

Use-cases

  • Services that require a custom environment
  • Large monolithic services

Function level

By giving up even more control over the deployment and operation of the service, a service can be deployed as pure code to a function as a service (FaaS) platform. Popular examples of FaaS are AWS Lambda (https://aws.amazon.com/lambda/), Google Cloud Functions (https://cloud.google.com/functions/docs/), and Azure Functions (https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). Each of these platforms allows a user to simply upload code that is triggered by any number of events, such as a schedule, an HTTP request (useful for webhooks), a certain log being generated, or a number of other platform-specific events. Although FaaS may seem like magic at first, since logically you are running pure code in the cloud without any servers, FaaS is simply an abstraction on top of containers that moves the responsibility of container management to the cloud service provider. With the least amount of control (and associated responsibility) over the deployment, FaaS is the easiest deployment method to maintain.
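As a minimal sketch of what “pure code” looks like here, the following is a Lambda-style Python handler. The event/context signature is Lambda’s standard Python interface; the queryStringParameters field assumes an API Gateway HTTP trigger, and the function name and response shape are illustrative rather than prescriptive:

    # handler.py - minimal AWS Lambda-style function (Python).
    import json

    def handler(event, context):
        # With an API Gateway (HTTP) trigger, the event carries the request
        # and the returned dict becomes the HTTP response; other triggers
        # pass different event shapes.
        params = event.get("queryStringParameters") or {}
        name = params.get("name", "world")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": "Hello, %s!" % name}),
        }

Deploying this is just a matter of uploading the code; which server runs it, how it scales, and how it is patched are the provider’s problem.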

Pros

  • No patch management - All patches/updates are managed by the FaaS provider.
  • Automatically scales - All scaling is managed by the FaaS provider.
  • Price - The typical pay-per-execution model of FaaS means the service is only charged while the code is executing. For example, if a service is triggered every 5 minutes and runs for 200ms per invocation, it is billed for only about 0.07% of the wall-clock time that an always-on server would be billed for (see the worked example after this list). However, if the service is running 100% of the time, FaaS pricing is usually more expensive than maintaining the system yourself (see pricing in the cons section).
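The duty-cycle arithmetic behind that pricing point can be checked in a few lines of Python (illustrative numbers only; a real bill also depends on the provider’s unit price and minimum billing increments):

    # Back-of-the-envelope duty cycle for a function triggered every
    # 5 minutes that runs for 200 ms per invocation.
    period_s = 5 * 60   # seconds between triggers
    run_s = 0.2         # execution time per invocation, in seconds

    duty_cycle = run_s / period_s
    print("Billed fraction of wall-clock time: %.3f%%" % (duty_cycle * 100))
    # -> Billed fraction of wall-clock time: 0.067%

At equal unit prices that is an enormous saving; in practice FaaS unit prices are higher, which is why the always-on case flips (see the cons section below).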

Cons

  • Price - Per unit of compute, FaaS is more expensive than maintaining an identical system yourself. However, the time spent building and maintaining that identical system is time the engineer could be spending on creating new features for your service. Therefore, FaaS is likely to be less expensive in the long run.
  • Constrained to a limited list of environments - FaaS provides a limited list of runtime environments, typically one per supported programming language. The environment usually contains all the standard libraries and tools required for most applications, but if specific customizations to the environment are required to run the service, FaaS may not be an option.
  • Downloads entire service on each run - FaaS works by starting a new container that downloads the entire service code (usually compressed) and then executes that code. If the codebase is large, the download takes a long time, causing a longer delay between when the function is triggered and when it actually executes. However, if the code is properly minified (a standard workflow in nodejs projects), this download time is relatively negligible.
  • Development environment ~= production environment - The developer is responsible for ensuring that the development environment matches the production environment so that behavior remains the same in both. Although this is easier to accomplish with FaaS than with traditional server-level deployments, since the FaaS environments are usually well-defined, it is more difficult than with container-level deployments, where the environment is explicitly defined by the developer.

Use-cases

  • Simple services that can run in a provided environment - One of the main principles of microservices is that a microservice should have one responsibility. Adhering to this principle fits FaaS like a glove, since the services are small and the responsibilities are well-defined. If you are looking to deploy a large monolithic service, however, FaaS might not be the best option.


Although these lists and descriptions of each deployment strategy are not comprehensive, they are usually good enough to make a decision on which deployment strategy is best for a given system or service.
