Tuesday, September 25, 2018

Webpack 4 - Resolving globally installed dependencies

When using NPM (or yarn) to manage JavaScript dependencies for a project, it is best practice to install the dependencies locally in the project so that multiple NPM projects on the system do not have clashing dependencies or dependency versions. However, it is best to break away from this pattern in favor of using a globally installed version of the dependency if the following cases hold true:

The dependency is massive or takes an extraordinarily long time to install.
The project is the only NPM project (or one of very few closely related NPM projects) on the system (e.g. running inside a docker container).

One instance where both of these criteria hold true is building a docker container for a BuckleScript (or ReasonML) project.

BuckleScript is primarily a compiler that compiles either OCaml or ReasonML code into JavaScript. Therefore, it would seem that bs-platform (the NPM BuckleScript dependency) should only be required as a devDependency and used during the build process. However, bs-platform also contains some code that must be included in the project during runtime. Therefore, bs-platform must be included in the project as either a dependency or a peerDependency.

The simplest option to include bs-platform in the project is to add it to the dependencies field of package.json and allow NPM (or yarn) to install it locally into the project's node_modules directory. This method works great on a development machine where npm dependencies are cached and only need to be installed again if the package is deleted from the local node_modules directory. However, when installing npm dependencies in a docker container, any change to the package.json file would trigger a full install of all of the npm packages. This typically isn’t a problem for smaller npm dependencies that take less than a few seconds to install, but bs-platform installs and compiles an OCaml compiler from scratch. On my fast machine, this process takes over 6 minutes. On my slower machines, this process takes nearly half an hour. Waiting for over 6 minutes to build the docker container whenever anything changes in the package.json file is unacceptable. Especially during development when package.json changes nearly all the time.

Waiting for docker to build the image with bs-platform installed every time as a local dependency....

So now the situation is this: we have a dependency that takes an extraordinarily long time to install AND this project is the only NPM project running on the system (i.e. docker container). Looks like we have a perfect candidate for breaking away from best practice of installing the dependency locally, and, instead, install the dependency globally ahead of time. This will allow us to install bs-platform a single time and cache it as a docker layer. Then, any changes to package.json will happen on a subsequent docker layer without requiring the reinstall of bs-platform.

Next, since we need to resolve the globally installed dependency at run time, we include it as a peerDependency in package.json. This will inform our build tool (Webpack 4) that the project requires bs-platform, BUT it should already be installed on the system.

Finally, we configure webpack to resolve bs-platform as a globally installed dependency instead of a locally installed dependency by adding the following lines to webpack.config.js:

  resolve: {
    alias: {
      'bs-platform': path.resolve(execSync('npm root -g').toString().trim(),
                                  'bs-platform')
    }
  },

When we are ready to build the ReasonML project, we will run bsb -make-world and webpack --mode production which will output our final JavaScript file to send to the client’s browser.

Tuesday, July 17, 2018

Phoenix + ReasonReact - Full-Stack Functional Programming

Image from resir014's post

As I started designing a new web application project, I had the wonderful opportunity to choose which technologies to use to power it. Like many other developers, I have seen the technical debt that is inherent to some of the potential choices, so I decided to take a step back and objectively evaluate which technologies would work best for this specific project. Based on an interview with Jesper Louis Anderson, I decided to consider functional programming languages as a starting point since they are the most “agile” types of languages.

For the backend web server, I wanted to use a functional programming language, but I wasn’t ready to give up all of the niceties of Rails (convention over configuration, generators, ActiveRecord). This led me to the Phoenix Framework, which is basically Rails, but instead of running on Ruby, it runs on Elixir in the Erlang RunTime System. This means that I get all of the niceties of Rails with all the power of the Erlang RunTime System and OTP (fast, scalable, reliable). The choice was easy.

For the client, I wanted to use static typing, immutability, and composable components, but I didn’t want the complexities of handling a different dependency for each of these criteria. That’s when I found ReasonML. ReasonML is basically a re-skinned syntax of OCaml to make it look more like JavaScript and less like a scary functional programming language. Out of the box, ReasonML provides static typing, immutability, and other useful functional programming features (Variants are pretty great!). Also, since ReasonML is also developed by Facebook developers, the project is very closely tied with React. Good enough for me!

Now that I chose the winning team of technologies to use, my next step was to “google” all of the technologies together to see if someone has already written a blog post on how to wire them all up together. Unfortunately, even though these two technologies (Phoenix and ReasonReact) are awesome, they are still fairly new and haven’t yet gotten an invitation to the “popular web technologies” club, so I was left to wiring up the two technologies together myself. To help you, humble reader, from also not finding any results on using these two technologies together, I have put together this post to show you how I managed to get the two working together quite nicely.

So if you are ready to harness the awesome power of the Phoenix Framework as your server and ReasonReact as your client, please read on. (To get spoilers for this post, follow along with the complete code here).

Setting up the Phoenix Project

For this simple web application, we are going to start by creating a simple Phoenix project. If you aren’t familiar with the Phoenix framework yet (don’t worry; I haven’t heard of it until I started this project, and now I am obsessed with it), it is basically a Ruby on Rails project except instead of using Ruby, it uses Elixir and runs on top of the powerful Erlang RunTime System. If you aren’t familiar with the Erlang RunTime System, all you need to know now is that it is very fast, scalable, and reliable.

To get started, first we need to install Erlang and Elixir on the machine. Usually, when installing language runtimes on my machine, I like to use a version manager so I can keep my machine clean of binaries and configuration files, and also so I can easily switch between versions of the language runtime in case either something is not compatible with the current version I have installed or I want to try out the new features of the newest version. My version manager of choice for Erlang and Elixir is asdf.

To install asdf, follow the installation instructions on the asdf readme. On macOS, this will look something like the following: 

 git clone https://github.com/asdf-vm/asdf.git ~/.asdf --branch v0.5.0  
 echo -e '\n. $HOME/.asdf/asdf.sh' >> ~/.bash_profile  
 echo -e '\n. $HOME/.asdf/completions/asdf.bash' >> ~/.bash_profile

Next, restart your shell so that the PATH changes take effect.

Now, install Erlang:

 asdf plugin-add erlang  
 asdf install erlang 20.3.8  
 asdf global erlang 20.3.8

and Elixir:

 asdf plugin-add elixir  
 asdf install elixir 1.6.6  
 asdf global elixir 1.6.6

Now that we have Erlang and Elixir installed, we can install the Phoenix framework. Follow the Phoenix set up instructions. They should look something like this:

 mix install local.hex  
 mix archive.install https://github.com/phoenixframework/archives/raw/master/phx_new.ez

Next, we have to make sure we have node.js running on the machine. I prefer to use nvm as the node version manager. Follow the nvm installation instructions to install nvm. The instructions should look something like this:

 curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash  
 nvm install v8.11.3  
 nvm use v8.11.3

Now we have everything installed, and we’re ready to start our first project!

Creating the Phoenix Project

To get our application started, we need to manually create all the necessary directories, Elixir files, and configuration files… Just kidding! 😆

If you are familiar with Rails generators, then you will be right at home here. We can create the entire starting point of a Phoenix application with a single command:

 $ mix phx.new myapp --no-ecto  
 * creating myapp/config/config.exs  
 * creating myapp/config/dev.exs  
 * creating myapp/config/prod.exs  
 * creating myapp/config/prod.secret.exs  
 * creating myapp/config/test.exs  
 * creating myapp/lib/myapp/application.ex  
 * creating myapp/lib/myapp.ex  
 * creating myapp/lib/myapp_web/channels/user_socket.ex  
 * creating myapp/lib/myapp_web/views/error_helpers.ex  
 * creating myapp/lib/myapp_web/views/error_view.ex  
 * creating myapp/lib/myapp_web/endpoint.ex  
 * creating myapp/lib/myapp_web/router.ex  
 * creating myapp/lib/myapp_web.ex  
 * creating myapp/mix.exs  
 * creating myapp/README.md  
 * creating myapp/test/support/channel_case.ex  
 * creating myapp/test/support/conn_case.ex  
 * creating myapp/test/test_helper.exs  
 * creating myapp/test/myapp_web/views/error_view_test.exs  
 * creating myapp/lib/myapp_web/gettext.ex  
 * creating myapp/priv/gettext/en/LC_MESSAGES/errors.po  
 * creating myapp/priv/gettext/errors.pot  
 * creating myapp/lib/myapp_web/controllers/page_controller.ex  
 * creating myapp/lib/myapp_web/templates/layout/app.html.eex  
 * creating myapp/lib/myapp_web/templates/page/index.html.eex  
 * creating myapp/lib/myapp_web/views/layout_view.ex  
 * creating myapp/lib/myapp_web/views/page_view.ex  
 * creating myapp/test/myapp_web/controllers/page_controller_test.exs  
 * creating myapp/test/myapp_web/views/layout_view_test.exs  
 * creating myapp/test/myapp_web/views/page_view_test.exs  
 * creating myapp/.gitignore  
 * creating myapp/assets/brunch-config.js  
 * creating myapp/assets/css/app.css  
 * creating myapp/assets/css/phoenix.css  
 * creating myapp/assets/js/app.js  
 * creating myapp/assets/js/socket.js  
 * creating myapp/assets/package.json  
 * creating myapp/assets/static/robots.txt  
 * creating myapp/assets/static/images/phoenix.png  
 * creating myapp/assets/static/favicon.ico  
 Fetch and install dependencies? [Yn] Y  
 * running mix deps.get  
 * running mix deps.compile  
 * running cd assets && npm install && node node_modules/brunch/bin/brunch build  
 We are all set! Go into your application by running:  
   $ cd myapp  
 Start your Phoenix app with:  
   $ mix phx.server  
 You can also run your app inside IEx (Interactive Elixir) as:  
   $ iex -S mix phx.server

This command will create a new Phoenix project called “myapp” in the “myapp” directory. The “--no-ecto” flag tells the Phoenix generator not to include Ecto. Ecto is basically the persistence layer of a Phoenix application, and it allows your app to communicate with a backend database (for Rails developers, think ActiveRecord). Since we are just creating a simple frontend application, we are purposely keeping the application simple by not spinning up a database. If you need a database for your application, you can omit the “--no-ecto” flag, set up a database, and continue to follow along with this tutorial. If you want to see more details about the “mix phx.new” command, check out the great Phoenix Up and Running guide.

The “mix phx.new” command will also prompt you to fetch and install dependencies. That is fine.

To test out if the application is working correctly, run the following commands:

 cd myapp  
 mix phx.server

Then, with your favorite web browser, browse to http://localhost:4000. If everything worked correctly, then you should now see the default Phoenix application page. 🎉

Adding ReasonReact to the Phoenix Project

When choosing a Javascript development environment, many developers have come to the agreement that static typing is very important, immutable objects are very important, and composability of components is very important. Since I am lazy about dependency management, I would like to get all of these features automatically built into a single dependency. Enter ReasonReact.

ReasonReact is basically React, but written in ReasonML instead of Javascript. ReasonML is basically a syntax re-write of OCaml to make it look more like JavaScript and less like a scary functional programming language.

Did someone say "monad"? 😱

To get ReasonReact to run in the browser, however, we are going to need to compile the ReasonML code into JavaScript. That is where Bucklescript comes in. To install the necessary ReasonReact dependencies in our project, run the following commands:

 cd assets  
 npm install —save-dev bs-platform # this will likely need take a couple of minutes to run the first time  
 npm install —save react react-dom reason-react

Now that ReasonReact is installed into the project, let’s create a small ReasonReact component that will get rendered on the home page of the application. Add the following files to your project (borrowed from the reason-react-example project):

myapp/assets/js/SimpleRoot.re

 ReactDOMRe.renderToElementWithId(<Page message="Hello!" />, "index");

myapp/assets/js/Page.re

 let component = ReasonReact.statelessComponent("Page");  
 let handleClick = (_event, _self) => Js.log("clicked!");  
 let make = (~message, _children) => {  
  ...component,  
  render: self =>  
   <div onClick=(self.handle(handleClick))>  
    (ReasonReact.string(message))  
   </div>,  
 };

Then add the following bucklescript configuration file to the assets directory:

myapp/  assets/bsconfig.json

 {  
  "name": "myapp",  
  "bsc-flags": ["-bs-no-version-header", "-bs-super-errors"],  
  "reason": {"react-jsx": 2},  
  "bs-dependencies": ["reason-react"],  
  "sources": [  
   {  
    "dir": "js",  
    "subdirs": true  
   }  
  ],  
  "package-specs": {  
   "module": "commonjs",  
   "in-source": true  
  },  
  "suffix": ".bs.js",  
  "refmt": 3  
 }

And add the following commands to “scripts” in the assets/package.json file:  

myapp/assets/package.json

  ...  
  "scripts": {  
   "build": "bsb -make-world",  
   "start": "bsb -make-world -w",  
   "clean": "bsb -clean-world”,  
   ...

This will allow us to run the Bucklescript commands through npm. To make sure everything is installed and working correctly, “cd” into your “assets” directory and run the following command to build your ReasonReact component:
 

 npm run build

If everything is working correctly, Bucklescript will compile all of the ReasonML files (*.re) in the assets/js directory recursively into JavaScript commonjs files with the extension .bs.js.

Now we have the JavaScript files, but how do we get them to actually run in any of the web pages served through the Phoenix web app? To accomplish this, we only need to make a few more changes.

By default, a Phoenix web app uses brunch as the JavaScript build system (instead of webpack, gulp, grunt, etc.), so we’ll stick with that. Open up the assets/brunch-config.js file and add the JavaScript file of the root React component generated by Bucklescript to the “modules/autoRequire” object. This will make brunch build the final JavaScript package to run the ReactComponent when it gets loaded into the browser. Otherwise, the React component code will be bundled with the final JavaScript file, but it won’t be executed when it gets loaded into the browser.

myapp/assets/brunch-config.js

  ...  
  modules: {  
   autoRequire: {  
    "js/app.js": ["js/app", "js/SimpleRoot.bs.js"]  
   }  
  },  
  ...

Now, the last step of displaying the React component in a web page served by the Phoenix web app is to create a “div” tag with the “id” of “index” so that the React component will know where to attach. We will do this by editing the default template created with the Phoenix web app. Delete all of the contents in the lib/myapp_web/templates/page/index.html.eex file and replace them with the following:  

myapp/lib/myapp_web/templates/page/index.html.eex

  <div id=“index”></div>

Save this file and start up the Phoenix server again with mix phx.server in the root directory of the project (if it’s not already running). Load up the page in your favorite web browser again (http://localhost:4000), and if everything worked correctly, you should now see the React component displayed beneath the default Phoenix header. If this worked, then you are now running a ReasonReact client from a Phoenix web application! Great job! 🎉

Hot-loading ReasonReact

So one thing you know as a developer is you are not going to write the code once, compile it, and be done forever (unless you are a perfect programmer who can foresee all future requirements of the system 🔮). You are going to be constantly changing the ReasonReact code and seeing how it affects your application. To reduce the time it takes to write the code, save the code, compile the code, restart the server, write the code, save the code, and so on, you’ve likely used hot-reloading. Hot-reloading is when your development environment watches your code for changes, and then automatically compiles and redeploys the changes to your development server and you can immediately see the impact of the code you are currently working on. Luckily, Phoenix does this automatically with both the Elixir code in the “lib” directory and client code in the “assets” directory when you run the “mix phx.server” command. E.g. if you change (and save) and Elixir file in lib/myapp_web/controllers or a JavaScript file in assets/js/, then the changes will automatically be updated in your running development server. Awesome, right? Well what about ReasonML code? Phoenix doesn’t watch or update this code automatically. Bummer. 😞

However, since Phoenix is watching for changes in JavaScript files in assets/js/, and when we make changes to the ReasonML files and compile them into JavaScript with Bucklescript, then Phoenix *does* see the updated JavaScript files and automatically hot-reloads them into your running development server. So all we have to do is have Bucklescript watch for changes to our ReasonML files and automatically compile the new JavaScript files when it sees changes. Fortunately, we already added this command to our assets/package.json file as the “start” script. If you run “npm run start” from the “assets” directory, then Bucklescript starts in “watch” mode, so it will automatically compile any ReasonML changes into JavaScript, Phoenix will see the changes to the generated JavaScript files and automatically hot-reload them into the running development server. Great! 😄

So now we know we can have hot-reloading of all of our code during development (both Elixir server code and ReasonReact frontend code) with the following two commands:  

 cd assets && npm run start  
 mix phx.server

But how do we combine both of these commands into a single command we can run when we’re ready for a long and productive coding session? (Because, yes, running a single command is far more effective than running two commands. I’m that lazy.) We can just run both of these two commands asynchronously inside of our own Mix task! (For Ruby developers, Mix is basically Rake for Elixir. For other developers, Mix is basically Make but more fun.)

Create the following file in the lib/mix/tasks directory:

myapp/lib/mix/tasks/myapp.server.ex

 defmodule Mix.Tasks.Myapp.Server do  
  use Mix.Task  
  @shortdoc "For Development - watches changes in Phoenix and ReasonML code."  
  @moduledoc """  
   For Development - watches changes in Phoenix and ReasonML code.  
  """  
  def run(_args) do  
   tasks = [  
    Task.async(fn -> Mix.shell.cmd "cd assets && npm run start" end),  
    Task.async(fn -> Mix.Task.run "phx.server" end),  
   ]  
   Enum.map(tasks, &Task.await(&1, :infinity))  
  end  
 end

Now we compile the task by running “mix compile” in the root directory of our project.

 mix compile

And finally, we can start our entire server, with hot-reloading for both server and client code, with the single command in the root directory of the project:

 mix myapp.server

Conclusion

Throughout this post, we learned how to install Erlang, Elixir, node.js, Phoenix, and Bucklescript, how to serve a ReasonReact client from a Phoenix web application, and how to set up hot-reloading for all of the code in the project. We went through each of these steps pretty quickly, so if you are having trouble with any of the specific technologies in any of the steps, I strongly advise you to look through the official documentation of that technology (each of the technologies discussed in this article has pretty great documentation). If nothing else, I hope this post encourages you to check out my two latest favorite technologies (Phoenix/Elixir and ReasonReact) and gives you an idea of how to synthesize the two to make even greater, fully-functional web applications. (Full code here)

Monday, January 16, 2017

Building Packages with Docker and Hosting Them in a Private APT Repository on AWS S3

Recently, I’ve needed to automate the installation of a specific package on an Ubuntu machine. Normally, this task would be as easy as apt-get install <pkgname>, but the officially hosted package on the Ubuntu APT repository was not built with the required configure options. The most maintainable solution to this problem then was to download the source, compile it with the right configure options on the target operating system distribution, and upload the resulting package to a privately hosted APT repository so that it could later be downloaded with apt-get install.

The challenge of compiling the source code on the target operating system distribution turned out to be an annoying issue since the target operating system was a completely different operating system than my development environment. One potential solution would be to spin up a virtual machine of the target operating system distribution, run the build script on that virtual machine, and then copy the newly built package to my development machine where I could upload it to the private APT repository (I did not want to upload the resulting package to the APT repository directly from the VM since I did not want to copy my secret credentials of the APT repository onto another machine). Although this solution would work, it would be very annoying and time-consuming to spin up a new VM, run a script on that VM, and then copy files to a different machine every time there was a change in the build script, build process, or source code itself. Annoying and time-consuming tasks are the enemy of a fully automated system, so I wanted to find a way to run the entire process from my development machine with a single command. The answer was containerization. Specifically, Docker.

The more convenient solution is basically the same as the annoying solution except the VM is replaced with a Docker container. A single script on the development machine creates a new Docker container of the target operating system distribution, copies build scripts into the container, runs the build scripts inside the container, and copies the resulting packages back to the development machine where they can be uploaded to the private APT repository.

As an example, I will walk through the specific solution of building Squid 3.5 with SSL enabled for Ubuntu 16.04 and upload the resulting packages to a private APT repository hosted on AWS S3.

The first action in this script is to create a Docker container. On creation, the Docker container runs the build scripts and places the resulting packages into the $CONTAINER_ARTIFACT_PATH directory in the container. The script then copies each package from the $CONTAINER_ARTIFACT_PATH directory on the container to the $LOCAL_ARTIFACT_PATH directory on the development machine. Once all of the packages are on the development machine, the development machine pushes them to the APT repository hosted on AWS S3 using deb-s3 with the AWS credentials configured on the development machine.

Get the rest of the code here on Github.

Thursday, December 22, 2016

Automate Everything with the Right Tools for Security and Profit

The security of a system is only as strong as the metaphorical “weakest link” in that system. In the case of our product, the weakest link tends to be the deployment of our infrastructure. Although the engineering skills at our company are spectacular, our combined experience in DevOps is less substantial. That lack of experience, plus the startup mantra guiding us to “move fast and break things” has resulted in an infrastructure deployment strategy that is a manual process twisted up in a mishmash of infrastructure deployment technologies. Sometimes the delicate ecosystem of deployment technologies miraculously manages to deploy a running system into the production environment, but, more often than not, mistakes are made during the deployment process that cause the maintenance of the running system to be a major headache.

In an attempt to make the deployment of the infrastructure more reliable and secure, I have targeted two rules that must be followed to improve the infrastructure deployment strategy: automate everything and use tools only as they are intended.

Automate everything

To automate everything is one of the core requirements of effective DevOps. Fully automating a process is the only way to scale that process faster than linearly with respect to active human operators. In addition to the DevOps advantage of making possible the unprecedented scaling of system, automating everything also results in several benefits directly related to security. In this section, I describe some of these benefits.

Reviewable infrastructure

At the company, we have a very strict and well-defined process of merging code into the main application. One of the steps in that process includes a mandatory code review from certain owners of the specific component that is being modified. This code review not only ensures quality of the submitted code changes (and, therefore, of the entire codebase itself), it also ensures security of critical components of the application (such as the authentication mechanism) by requiring the security team to review the code before it is submitted.

Similarly, the automated parts of our infrastructure require code reviews from owners of the specific sections of infrastructure that are being modified. However, since part of our infrastructure deployment is manual, those changes require a ticket to be submitted through a change management system which eventually end up in the inbox of a member of the operations team. That member of the operations team, who likely has complete administrator access to the system, then manually implements the change. Although we can log the changes that are manually made to the infrastructure using an auditing service like AWS CloudTrail or AWS Config, discrepancies or mistakes in the implementation can only be noticed after they have already occurred (if they are even noticed at all). Fully automating all of the deployed infrastructure allows us to apply the same rigor of review, for both quality and security, of deployed infrastructure changes as we do for code changes of the main application before the changes are even applied to the system.

Auditable infrastructure

Where reviewability of infrastructure changes is useful before the change occurs, auditability of the infrastructure is useful after the changes have already been applied to the system. Fully automating the infrastructure means that all of the infrastructure is recorded in updated documentation in the form of the automated code. In the case where you, or an external auditor, needs to review how the infrastructure is designed, you can simply refer to the latest version of your automation code in your version control system as the most up to date documentation of the infrastructure.

If any of the infrastructure is deployed manually, then the documentation of that part of the system must be manually updated. The tedious, and typically less prioritized, process of manually keeping documentation in sync with an ever changing project inevitably results in an outdated, and incorrect, description of the infrastructure. However, if the infrastructure is constructed based on the documentation itself, as in a DevOps fully automated system, then the documentation will always inherently be in sync with the real system.

Disaster recovery

In addition to being able to audit, review, and version control a fully automated infrastructure, it is also significantly easier to relaunch a fully automated infrastructure in its entirety. Even though an entire system may not typically need to be completely redeployed under normal conditions, a disaster may require the entire system to be completely redeployed into a different region. The recovery time objective (RTO) of a critical system is usually very short, which requires the mean time to recovery (MTTR) to be as quick as possible. With a fully automated infrastructure, the MTTR can be reduced to the time it takes to press one, or maybe several, buttons (or maybe less if disaster failover is also automated!). Not only is the MTTR of a fully automated infrastructure quicker than a partially manually deployed infrastructure, it is also more reliable and significantly less stressful to deploy it.

Automatic rollback

One advantage of using a version control system is that each iteration of the system is recorded. Not only can you review previous versions of the system, you can also deploy previous versions of the system. This feature is especially useful if a mistake is made in the infrastructure that needs to be immediately rolled back to a previous state. In a manually deployed infrastructure, it can be difficult to even remember what changes were made, and even more difficult to figure out how to reverse them.

No snowflake systems

Another security challenge associated with infrastructure deployment is updating security patches and configuration settings on each of the resources. For example, if a piece of software running throughout your environment requires a security patch, then that update only needs to occur once in the automation code. Similarly, if the configuration of all load balancers needs to be updated to use a more stronger security policy, then that update only needs to occur once in the automation code. If these changes were made manually to each system, then, depending on the complexity of the change, the human operator is likely to unintentionally configure each one slightly differently. Slight differences of configuration settings of the systems can lead to security vulnerabilities that may go unnoticed for a very long time.

Use tools as they are intended

In addition to automating the entire infrastructure, choosing the right tool for the job is very important from a security point of view. Using a tool that is designed for a specific task helps to ensure readability and reliability of the deployed infrastructure defined by that tool.

More specifically, when defining an infrastructure, use an infrastructure definition tool (CloudFormation or Terraform). When configuring servers, use a server configuration tool (Chef, Puppet, or Ansible). When defining a Docker container, use a Dockerfile. When packaging and distributing a piece of software, use a package management system (yum, apt, etc.). Although it seems obvious to use the right tool for the job, each tool requires time and effort of the human operator to learn to use effectively. Couple the extra effort required to learn to use the tool with the fact that many of the tools also offer features to half-heartedly accomplish other tasks, many human operators are tempted to use a single tool outside of its intended domain. Although learning and using a single tool while ignoring other potentially more logical options may seem like a time-saving temptation, the added complexity of using a tool outside of its intended domain results in layers of technical debt that inevitably take more time to resolve in the future.

One example that I have seen of using a tool outside of its intended domain is using Red Hat’s Ansible, a server configuration tool, to define an infrastructure. The main difference between the configuration of an infrastructure and the configuration of a specific server is the number of configuration points in each of these types of systems. An infrastructure has a relatively limited number of configuration points (ELB options, route tables, etc.), whereas a server has an intractably large number of configuration points (installed software, configuration of that software, environment variables, etc.). Because of this difference, infrastructure tool templates are easier to read and understand than server configuration tool templates since infrastructure tool templates are able to explicitly define all configuration points of the system. On the reverse side, server configuration tools are only able to explicitly define the desired configuration points (make sure package A is installed with a specific configuration) while ignoring any part of the system that has not been mentioned (do not uninstall package B if it was not mentioned in the server configuration template). The additional complexity requiring the understanding both the server configuration and the initial state that it applies to is necessary for server configuration, but that additional complexity is unnecessary for infrastructure definition. Therefore, the additional unnecessary complexity of using a server configuration tool to define an infrastructure introduces unnecessary risk into the deployment of the infrastructure.

Another example that I have seen of using a tool outside of its intended domain is using Ansible (again) combined with preprocessing shell scripts to define Docker containers. In this instance, several bash scripts would generate a Dockerfile by replacing variables in a Dockerfile.tpl file (using a combination of environment variables and variables defined in the bash scripts themselves), build the container by running the newly created Dockerfile artifact that would run an Ansible playbook on itself, and then upload the resulting container to a remote container repository. Later, several shell scripts from another repository would pull and run that container with variables defined from the new environment. Needless to say, following the variables through this process or recreating a simple local environment of this tightly coupled system to test these containers proved exceedingly difficult. Understanding that most of this process could have been defined in a single Dockerfile (without Ansible or the complicated preprocessing scripts), accepting this high-level of unnecessary complexity equates to accepting a high-level of unnecessary risk in deploying this system. (In fairness to the writer of this process, that system was initially created to deploy a system directly onto a VM. The containerization of the system was later added as a constraint, and insufficient resources were granted to properly rewrite the process to address the new constraint.)

Solution

Although automating all of the infrastructure and choosing the right tools for the job is a difficult and time-consuming task, it is a necessary task to create a resilient and secure infrastructure. In this section, I describe a workable solution using several specific tools. These tools may not work for your specific system, but they may provide a good place to start.

Infrastructure definition - Use CloudFormation to define resources in the cloud such as VPCs, route tables, SQS queues, and EC2 instances. Include the installation of a pull-based server configuration agent on each EC2 instance defined so that it will be able to configure itself when it boots.
Server configuration - Use a pull-based server configuration tool, such as Chef, that can define the configuration of each server in the infrastructure based on the “role” of that server (secure transparent proxies have configuration X, bastion hosts have configuration Y, etc.). When the machines boot up from the infrastructure definition tool, they automatically pull their own configuration from the server configuration tool.
Container building tool - Use a Dockerfile to define how a container should be built. The additional complexity of requiring preprocessing with bash scripts or self-configuration with Ansible is likely to be a warning sign that the system is not designed properly. Reassess the design and try to follow Docker’s best practices.
FaaS deployment tools - I am a fan of running small services as FaaS since most of the infrastructure responsibilities are delegated to the cloud service provider. Launch these services with FaaS deployment tool such as Serverless.

Although developing a resilient and secure infrastructure is a difficult and complicated task, following these two rules will immediately take you a long way. Also, as an added benefit, your security team and auditor will thank you.

Sunday, December 18, 2016

Offloading Service Deployment Responsibilities to a Cloud Service Provider

Working as an engineer at a progressive tech startup, I quickly realized that when you are tasked to build a production-level system, you are implicitly also responsible for maintaining that system. That maintenance may seem like a simple task at first (“The code runs on my machine, just run the same command on some server in the cloud. Easy”), but when it comes to technology, the seemingly simple details are usually what makes the job incredibly difficult. The task to “just run the same command on a server in the cloud” quickly unravels into a number of issues:

Which server do I run the command on?
How to I provision that server?
How do I get my code to that server?
How do I get my secret values to the service that will run on that server (API keys, certificates, etc.)
How do I keep the server patched with the latest security updates, both on the OS level and the background daemons?
Whenever I make changes in the code in my version control system, how do I deploy those new changes to server (CI/CD)?

And these few issues are just the tip of the iceberg. Each issue is likely to have another “just do something” solution that generates another list of issues to be resolved.

Looking at all of the issues that need to be resolved just to deploy some code can be daunting for a new engineer who is used to building software in a single comfortable language (pure Python or Java or maybe even C) and simply running it locally on his or her own system. And even while many of the issues of deploying software have been ubiquitous enough to evolve from checklists of best practices into an entire career field dubbed “devops”, sometimes it is easier to just avoid these issues entirely by offloading them to a cloud service provider.

The tradeoff of offloading responsibilities to a cloud service provider comes at the cost of losing some detail of control over the deployed environment. For certain services, it may be important to retain a high level of control over the deployed environment at the cost of retaining the responsibility of maintaining that environment. In this post, I will cover the 3 most common deployment strategies that tradeoff levels of detailed control of the environment for ease of deployment and maintenance of a service in the cloud. These 3 strategies are the server level strategy, the container level strategy, and the function level strategy.

Server level

The service can be installed directly on a server (or more precisely, on a virtual machine). This is the most traditional way to deploy a service and also gives the operations team the most control over the deployed environment.

Pros

Access to low-level resources - Access to hardware-accelerated resources including network hardware.

Cons

Patch management - All patches on the server must be managed. Especially security updates.
Development environment != production environment - Typically results in the “well it works on MY machine” argument between developers and the operations team if the development environment and production environment are not kept in sync.
Scaling management - Scaling of the servers in the cluster must be managed.

Use-cases

Network service - A packet router that forwards most packets using hardware-accelerated network interfaces, but routes certain packets to a user-level process for inspection or logging.

Container level

Giving up some control over the deployment and operation of the service (and also the associated responsibility of maintaining that control), services can be deployed at the container level. A container is an abstraction of the higher-level functionality of the operating system from the lower-level kernel. For example, a virtual machine can be running a certain version of a Linux kernel, but then have several higher-level operating systems running on top of the kernel inside containers, such as Ubuntu 14.04, Ubuntu 16.04, CentOS 6, and another Ubuntu 14.04. Since the separation of the higher-level operating system from the kernel makes the deployment of each operating system much more lightweight, this gives the operations team the option to run many more independently running operating systems than traditionally possible.

Pros

Development environment = Production environment - Since the environment is explicitly defined by the developer, the development environment and the production environment will always be the same. This advantage drastically reduces the “well it works on MY machine” arguments between developers and the operations team since if it works on the developer’s machine, then it is very likely to still work on the production machine.

Cons

Patch management - Patches must be managed both at the server level and at the container level. However, this is not as challenging as in a server-level deployment strategy since the virtual machine kernel will require significantly less maintenance than higher-level operating system services, and the higher-level operating system services will be explicitly defined and managed by the developer.
Scaling management - In addition to managing the scaling of the servers in the cluster, the containers within that cluster must also be managed.

Use-cases

Service that requires custom environment
Large monolithic services

Function level

Giving up even more control of the deployment and operation of the service, a service can be deployed as pure code to a function as a service (FaaS) platform. Popular examples of FaaS are AWS Lambda (https://aws.amazon.com/lambda/), Google Functions (https://cloud.google.com/functions/docs/), and Azure Functions (https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). Each of these platforms allows a user to simply upload code that can be triggered on any number of events, such as scheduled time, an HTTP request (useful for webhooks), a certain log being generated, or a number of other platform specific events. Although FaaS may seem like magic at first since logically you are running pure code in the cloud without any servers, FaaS is simply an abstraction on top of containers that moves the responsibility of container management to the cloud service provider. With the least amount of control (and associated responsibility) over the deployment, FaaS is the easiest deployment method to maintain.

Pros

No patch management - All patches/updates are managed by the FaaS provider.

Automatically scales - All scaling is managed by FaaS provider.
Price - The typical pay-per-execution model of FaaS means the service is only charged while the code is executing. Therefore, if a service is triggered every 5 minutes and only runs for 200ms, then it will be charged only 4% of the cost of the same service that requires the server to be running at all times. However, if the service is running 100% of the time, FaaS pricing is usually more expensive than maintaining the system yourself (see pricing in cons section).

Cons

Price - FaaS is more expensive than maintaining an identical system yourself. However, the time that is spent building and maintaining that identical system is time that the engineer could be spending on creating new features for your service. Therefore, FaaS is likely to be less expensive in the long run.
Constrained to a limited list of environments - FaaS provides a limited list of runtime environments, typically one per each supported programming language. The environment usually contains all the standard libraries and tools required for most applications, but if specific customizations to the environment are required to run the service, FaaS may not be an option.
Downloads entire service on each run - FaaS works by starting a new container that downloads the entire service code (usually compressed) and then executing that code. If the codebase is large, then the download will take a long time causing a longer delay between the time the function was triggered and the time it is executed. However, if code if properly minified (a standard workflow with nodejs projects), then this download time is relatively negligible.
Development environment ~= production environment - The developer must be responsible for ensuring that the development environment matches the production environment so that behavior remains the same in both environments. Although this is easier to accomplish in FaaS than with traditional server-level deployments since the FaaS environments are usually well-defined, it is more difficult than container-level deployments where the environment is explicitly defined by the developer.

Use-cases

Simple services that can be run in a provided environment - One of the main principles of microservices is that a microservice should have one responsibility. Adhering to this principle fits FaaS like a glove since the services are small and the responsibilities are well-defined. If you are looking to deploy a large monolith service, however, FaaS might not be the best option.

Although these lists and descriptions of each deployment strategy are not comprehensive, they are usually good enough to make a decision on which deployment strategy is best for a given system or service.