I have been building, testing, and deploying NixOS systems in production, at medium to large scale, to cloud environments like AWS and GCP since 2015. I also use NixOS for my own development laptops and workstations so that I have reproducible systems to work on, and for each codebase I set up a Nix flake or Nix shell to manage its dependencies and provide a reliable development environment for everyone on the team.
Below is an introduction to NixOS and some of its core components (e.g. Nix the config language, Nix the package manager, and nixpkgs the package registry), plus a brief explanation of the innovations that make creating cloud infrastructure with NixOS so productive.
What is NixOS?
NixOS is a Linux-based operating system built on top of the Nix package manager. It is customizable and focuses on building reliable system images from an expression written in a declarative configuration language (confusingly, also called Nix). System administrators specify the desired state of their system rather than the steps to achieve that state.
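To make "desired state rather than steps" concrete, here is a minimal sketch of what such an expression can look like. The host name, the choice of services and packages, and the release string are assumptions picked for illustration, not taken from a real deployment.

```nix
# configuration.nix (illustrative sketch, not a complete system definition)
{ config, pkgs, ... }:
{
  # Hypothetical host name, chosen only for this example.
  networking.hostName = "web-1";

  # Declare that these services should exist and be running;
  # NixOS works out the units, users, and config files needed.
  services.openssh.enable = true;
  services.nginx.enable = true;

  # Declare which ports the firewall should leave open.
  networking.firewall.allowedTCPPorts = [ 22 80 ];

  # System-level packages available to all users.
  environment.systemPackages = [ pkgs.git pkgs.htop ];

  # Assumed release; set this to the release the machine was installed with.
  system.stateVersion = "23.11";
}
```

Nothing here says how to install nginx or in what order to open ports. The expression only states what should be true of the finished system.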
One of the key features of NixOS is its ability to roll back system configuration changes, including changes to system-level packages. This is possible because Nix (the package manager, not the language) eschews the shared mutable state of the Filesystem Hierarchy Standard (FHS) that other Linux distributions are based upon, so it can store multiple versions of each package side by side and let users switch between them. This allows administrators to easily revert their system to a previous state if something goes wrong.
How is this different to convergent tools like Chef, Puppet, Ansible, etc.?
In Chef, Puppet, and related tools, administrators write scripts or manifests that specify the steps required to configure a system. These are usually written in Ruby, YAML, or a specialized variant of one of them, and are executed on the target system to make the necessary changes. Over time this can leave systems in divergent states even though they all use the same current version of the scripts, manifests, or recipes, because those were run at different times or because different prior versions were applied to each system along the way. Often these issues are subtle and hard to troubleshoot.
In contrast, NixOS's declarative language is a lazily evaluated functional programming language ("Nix"). In NixOS, administrators specify the desired state of their system by setting configuration options, which are then evaluated to generate the necessary system configuration files and package sets. The system reaches the desired state by evaluating the modules that implement the options the administrators have set. This separates concerns between users (administrators) and providers (module implementers), and it gives module implementers automated VM tests with which to continuously test their modules through the same interface that users see (the configuration options).
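The split between option users and module implementers can be sketched as follows. The service name `hello-web` and its `port` option are hypothetical, invented for this example. The `lib` and `systemd.services` pieces are standard nixpkgs and NixOS facilities.

```nix
# hello-web.nix: a hypothetical module, shown only to illustrate the pattern.
{ config, lib, pkgs, ... }:
let
  cfg = config.services.hello-web; # hypothetical option namespace
in
{
  # The interface: what an administrator is allowed to set.
  options.services.hello-web = {
    enable = lib.mkEnableOption "a hypothetical hello-web service";
    port = lib.mkOption {
      type = lib.types.port;
      default = 8080;
      description = "TCP port the service listens on.";
    };
  };

  # The implementation: only takes effect when the option is enabled.
  config = lib.mkIf cfg.enable {
    systemd.services.hello-web = {
      description = "Hypothetical hello-web service";
      wantedBy = [ "multi-user.target" ];
      after = [ "network.target" ];
      serviceConfig = {
        # Serves an empty directory with Python's built-in HTTP server.
        ExecStart = "${pkgs.python3}/bin/python3 -m http.server --directory /var/empty ${toString cfg.port}";
        DynamicUser = true;
      };
    };
    networking.firewall.allowedTCPPorts = [ cfg.port ];
  };
}
```

An administrator who imports this module would only write `services.hello-web.enable = true;` (and optionally a `port`). The systemd unit and the firewall rule follow from that setting.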
One key difference between these approaches is that the declarative configuration language used in NixOS allows administrators to specify the desired state of their system as configuration options without having to worry about the details of how to achieve that state. This can make it easier to reason about the configuration of a system and to make changes in a predictable and repeatable way, given that it is also backed by a package manager focused on build reproducibility, especially when the nixpkgs package set in use is pinned.
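One common way to fix the nixpkgs package set is a flake, sketched below. The branch name, the configuration name `example`, the system string, and the `./configuration.nix` path are assumptions for illustration.

```nix
# flake.nix (sketch; the exact nixpkgs revision is recorded in flake.lock)
{
  # Assumed release branch; any nixpkgs reference works here.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";

  outputs = { self, nixpkgs }: {
    # "example" is a hypothetical configuration name.
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      # Assumed to sit next to this file.
      modules = [ ./configuration.nix ];
    };
  };
}
```

Because the lock file records the exact nixpkgs revision, machines built from the same commit of this repository evaluate against the same package set until the lock is deliberately updated.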
Meaningful Tests, Introspection & Debuggability
One big productivity boost I have personally experienced while using NixOS to build system images and configs is that there are multiple scopes, methods, and levels available to me to test and introspect system configuration changes that provide much faster feedback cycles for the larger changes than mainstream alternatives for system configuration (e.g. Chef, Puppet, Ansible, Docker, etc.).
There are multiple ways to test or introspect NixOS configurations with different goals:
- Automated Virtual Machine (VM) tests
You can write and run simulated multi-node integration tests using QEMU VMs. This lets me test my configurations in a controlled environment without affecting my running system or deploying anything. This saves me time! For instance, checking that multiple nodes each send a unique, randomly generated log message that is eventually collected by a log collector node within a timeout period is simple to write as an automated NixOS VM test. Such tests can also check that a port in a single-machine harness is accessible and responding to HTTP.
- REPL mode
While writing system configurations you can introspect other expressions in a REPL much like you can with most modern programming languages today which speeds up some parts of development. Heck, you can even use a REPL just for interactively building and refining derivations too.
- Dry-run or diff mode
When building or modifying NixOS configurations, you can use the --dry-run flag to see which package builds your configuration changes would trigger without actually performing them. This is useful for checking the syntax and linkage of your configuration expressions and for previewing the Nix package differences between generations of configurations, which helps you validate the surface area of your change.
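As a sketch of the automated VM tests described above, the following two-node test checks that an HTTP port on one machine is reachable from another. The node names, test name, and nginx response are invented for illustration. It assumes a nixpkgs channel is available as `<nixpkgs>` and recent enough to provide `pkgs.testers.runNixOSTest`.

```nix
# http-test.nix (illustrative sketch of a NixOS VM test)
let
  pkgs = import <nixpkgs> { };
in
pkgs.testers.runNixOSTest {
  name = "http-reachable"; # hypothetical test name

  nodes = {
    # A server node with nginx answering every request with a fixed body.
    server = { ... }: {
      services.nginx.enable = true;
      services.nginx.virtualHosts."server".locations."/".return = "200 'ok'";
      networking.firewall.allowedTCPPorts = [ 80 ];
    };

    # A client node that only needs curl.
    client = { pkgs, ... }: {
      environment.systemPackages = [ pkgs.curl ];
    };
  };

  # The test script is Python, run by the NixOS test driver.
  testScript = ''
    start_all()
    server.wait_for_unit("nginx.service")
    server.wait_for_open_port(80)
    # Retry until the request succeeds or the timeout (in seconds) expires.
    client.wait_until_succeeds("curl --fail http://server/", timeout=60)
  '';
}
```

Building this file (for example with `nix-build http-test.nix`) boots both VMs under QEMU, runs the script, and fails the build if any step fails, all without touching the machine it runs on beyond the Nix store.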
The Value Proposition
The biggest value proposition of NixOS is that, because you can build purely functional configurations (although there are escape hatches too), you can define hermetically sealed, reproducible system configurations no matter what state a random package repository was in when you ran the script or what state the filesystem was in. It provides the ability to reason more locally about system configurations (within parameters), which makes debugging subtle issues much simpler and makes failures repeatable, so you aren't second-guessing yourself while babysitting a CI/CD pipeline through repeated retries until it magically works.
This creates a much more productive workflow for delivering high quality software infrastructures.