I have been building, testing, and deploying NixOS systems in production, at medium to large scale, to cloud environments like AWS and GCP since 2015. I also use NixOS on my own development laptops and workstations so that I have reproducible systems to work on, and for each codebase I set up a Nix flake or Nix shell to manage its dependencies and give everyone on the team a reliable development environment.
Below is an introduction to NixOS and some of its core components (e.g. Nix the config language, Nix the package manager, nixpkgs the package registry), plus a brief explanation of the innovations that make creating cloud infrastructures using NixOS so productive.
What is NixOS?
NixOS is a Linux-based operating system that's built on top of the Nix package manager. It is highly customizable and focuses on building reliable system images from an expression written in a declarative configuration language (confusingly, also called Nix). This means that system administrators specify the desired state of their system rather than the steps to achieve that state.
One of the key features of NixOS is its ability to roll back system configuration changes, including changes to system-level packages. This is possible because the Nix package manager can store multiple versions of each package side by side and lets users switch between them. It does so by eschewing the shared mutable state of the Filesystem Hierarchy Standard (FHS) that other Linux distributions are predicated on. Rollbacks are especially useful for testing and debugging, as they allow administrators to easily revert their system to a previous state if something goes wrong.
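As a rough illustration of what "specifying the desired state" looks like, here is a minimal sketch of a NixOS configuration module. The host name, user name, and package choices are hypothetical placeholders picked for this example; a real system would also need hardware and boot loader settings, which are omitted here.

```nix
# configuration.nix: a minimal sketch, not a complete bootable system.
{ config, pkgs, ... }:

{
  # Hypothetical host name, used only for illustration.
  networking.hostName = "example-host";

  # Declare that the OpenSSH daemon should be running.
  services.openssh.enable = true;

  # Declare which packages are available system-wide.
  environment.systemPackages = [
    pkgs.git
    pkgs.htop
  ];

  # Hypothetical user account with sudo rights via the wheel group.
  users.users.alice = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
  };
}
```

Running `nixos-rebuild switch` builds this description and activates it as a new system generation, and `nixos-rebuild switch --rollback` returns to the previous generation.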
How is this different to convergent tools like Chef, Puppet, Ansible, etc.?
In Chef, Puppet, and related tools, administrators write scripts or manifests that specify the steps required to configure a system. These are usually written in Ruby, YAML, or a specialized variant of one of them, and are executed on the target system to make the necessary changes. Over time this can lead to divergent states between systems that use the same current version of the scripts, manifests, or recipes, because they were run at different times or because different prior versions were applied to each system. Often these issues are subtle and hard to troubleshoot.
In contrast, NixOS's declarative language is a lazily evaluated functional programming language ("Nix"). In NixOS, administrators specify the desired state of their system using Nix expressions, which are then evaluated to generate the necessary system configuration files and package sets.
One key difference between these approaches is that the declarative configuration language used in NixOS allows administrators to specify the desired state of their system without having to worry about the details of how to achieve that state. This can make it easier to reason about the configuration of a system and to make changes in a predictable and repeatable way.
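To make the "lazily evaluated functional language" point a little more concrete, here is a small sketch of a plain Nix expression. The file name, the `mkHost` helper, and the host names and ports are all made up for this example; it is not a NixOS configuration, just the language on its own.

```nix
# hosts.nix: hypothetical file name. One way to evaluate it:
#   nix-instantiate --eval --strict hosts.nix
let
  # A pure function: the same arguments always produce the same attribute set.
  mkHost = name: port: {
    hostName = name;
    listenPort = port;
    description = "${name} listens on port ${toString port}";
  };

  # Lazy evaluation: this binding is never used, so the error is never raised.
  unused = throw "this expression is never evaluated";
in
{
  web = mkHost "web" 443;
  db = mkHost "db" 5432;
}
```

Because the result depends only on the expression's inputs, evaluating it today or next month on a different machine yields the same value, which is the property that run-order-dependent scripts tend to lack.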
Meaningful Tests, Introspection & Debuggability
One big productivity boost I have personally experienced while using NixOS to build system images and configs is that I can test and introspect system configuration changes at multiple scopes and levels, using several different methods. For larger changes, this gives much faster feedback cycles than mainstream alternatives for system configuration (e.g. Chef, Puppet, Ansible, Docker, etc.).
There are multiple ways to test or introspect NixOS configurations with different goals:
- Automated Virtual Machine (VM) tests
You can run simulated multi-node integration tests in QEMU VMs, with automated setup, assertions, and teardown. This lets you test your configurations in a controlled environment without affecting your running system, and it reaches a level of testing that is hard to get from mainstream configuration systems, in much less time. For instance, it is simple to write an automated test in which several nodes each send a unique log message and a log collector node must receive all of them within a timeout. Such tests can also check, in a single-machine harness, that a port is accessible and responding to HTTP.
- REPL mode
While writing system configurations, you can introspect other expressions in a REPL, much as you can in most modern programming languages, which speeds up some parts of development.
- Dry-run or diff mode
When building or modifying a NixOS configuration, you can use the --dry-run flag to simulate the changes that build the system configuration without actually making them. This is useful for checking the syntax of your configuration files and for previewing the Nix package differences between generations of configurations.
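The automated VM tests in the list above can be sketched roughly as follows. This is an illustrative example rather than a test from a real codebase: the file name, test name, node names, and page content are assumptions, and it assumes a nixpkgs recent enough to provide `pkgs.testers.runNixOSTest`. It boots two QEMU VMs and asserts that one can reach the other over HTTP within a timeout.

```nix
# http-test.nix: hypothetical file name. Build and run with: nix-build http-test.nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.testers.runNixOSTest {
  name = "http-reachable";

  # Node 1: a web server serving a single static page.
  nodes.server = { pkgs, ... }: {
    services.nginx.enable = true;
    services.nginx.virtualHosts."server".root =
      pkgs.writeTextDir "index.html" "hello from server\n";
    networking.firewall.allowedTCPPorts = [ 80 ];
  };

  # Node 2: a client that only needs curl.
  nodes.client = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.curl ];
  };

  # The test driver script is Python; each node is available under its name.
  testScript = ''
    start_all()
    server.wait_for_unit("nginx.service")
    server.wait_for_open_port(80)
    # Retry until the request succeeds or the 60 second timeout expires.
    client.wait_until_succeeds("curl --fail http://server/", timeout=60)
  '';
}
```

The whole setup, including both machine definitions and the assertions, lives in one expression, so the test can run on a developer laptop or in CI without touching any long-lived infrastructure.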
The Value Proposition
The biggest value proposition of NixOS is that, because you can build purely functional configurations (although there are escape hatches too), you can define hermetically sealed, reproducible system configurations, no matter what state a random package repository was in when you ran the script or what state the filesystem was in. It lets you reason more locally about system configurations (within parameters), which makes debugging subtle issues much simpler. It also makes failures repeatable, so you aren't second-guessing yourself while babysitting a CI/CD pipeline through repeated retries until it magically works.
This creates a much more productive workflow for delivering high quality software infrastructures.
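One common way to get the reproducibility described above is to pin inputs with a Nix flake. The sketch below is a hedged example, not a recommended layout: the `example` configuration name, the chosen nixpkgs branch, the `x86_64-linux` system, and the `./configuration.nix` path are all assumptions for illustration.

```nix
# flake.nix: a minimal sketch of a flake that exposes one NixOS system.
{
  # The exact nixpkgs revision is recorded in flake.lock on first use,
  # so later builds use the same package set until the lock is updated.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }: {
    # "example" is a hypothetical configuration name.
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}
```

With the lock file committed alongside the configuration, two people building the same commit evaluate against the same nixpkgs revision rather than whatever a package mirror happens to serve that day.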