Go Cantabular!
The UK-based Office for National Statistics has selected Cantabular to allow flexible dissemination of Census 2021 data. The UK Census is a significant project in both scale and budget (the 2011 Census was estimated to cost £482 million over a decade).
Cantabular is designed for organisations that want to share statistics derived from sensitive and potentially personally identifiable data, whilst protecting privacy. It is designed to modernise statistics.
Sensible Code decided to use the Go programming language (also known as Golang) to build the latest version of its product. The Cantabular team have been working on it for three years and found Go a great fit; almost everyone developing in Go at Sensible Code learned it on the project. Go has a small set of keywords to learn, and a compact and readable language specification.
Strong tooling and library support
The toolkit provided by the language core helps all developers, whether new to the language or not. A standout is the gofmt tool, which formats valid Go code into a uniform standard. Because the tool defines the standard, there are no time-wasting debates about how code should be formatted, and our entire Go codebase looks uniform and is easier to read. In addition, gofmt runs as part of the continuous integration process in Cantabular to ensure the code is formatted correctly for every build.
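As a rough illustration of that kind of formatting check (a sketch, not the check Cantabular's pipeline actually runs), the standard library's go/format package can report whether a piece of source is already in gofmt's canonical form. The function name isFormatted and the two sample sources are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isFormatted reports whether src is already in canonical gofmt style.
// It is a hypothetical helper for illustration only.
func isFormatted(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err // src is not valid Go source
	}
	return bytes.Equal(src, formatted), nil
}

func main() {
	tidy := []byte("package main\n\nfunc main() {}\n")
	messy := []byte("package main\nfunc  main(){}\n")

	for _, src := range [][]byte{tidy, messy} {
		ok, err := isFormatted(src)
		fmt.Printf("formatted=%v err=%v\n", ok, err)
	}
}
```

On the command line, gofmt -l lists the files whose formatting differs from gofmt's output, which is a common way to fail a build on unformatted code.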
Furthermore, Go features a test framework (go test) and performance tooling for profiling (pprof); both are used in Cantabular. By bundling the core development tools with the language, Go reduces the friction of getting started. This is especially important for new developers, who also benefit from the useful defaults these tools include.
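To give a flavour of the profiling side, here is a minimal sketch that captures a CPU profile with the standard runtime/pprof package. The workload busyWork and the helper profileWork are invented for this example and have nothing to do with Cantabular's code.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload to profile.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileWork runs busyWork under the CPU profiler and returns
// the raw profile data that was collected.
func profileWork(n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	busyWork(n)
	pprof.StopCPUProfile() // stops profiling and flushes the data to buf
	return buf.Bytes(), nil
}

func main() {
	profile, err := profileWork(50000000)
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of CPU profile data\n", len(profile))
}
```

In practice the profile would usually be written to a file and explored with go tool pprof rather than held in memory.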
Go’s extensive standard library enabled the Cantabular team to minimise external dependencies for this enterprise-scale project. As this excellent paper by Russ Cox (@_rsc) highlights, the convenience of including external code comes with the overhead of auditing and managing dependencies. This is a concern in security-critical environments; a program’s attack surface can increase with large numbers of dependencies, both direct and transitive.
Safety features
Security is an important consideration for Cantabular since it processes sensitive data.
Go is a memory-safe language and includes features that avoid security bugs that might occur if Cantabular were developed in another language. Although data races can circumvent this memory safety, there is also a built-in race detector to help catch them.
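A small sketch of the sort of code the race detector helps with: several goroutines update a shared counter, and a mutex keeps the updates synchronised. The counter type and countConcurrently function are hypothetical names for this example. If the lock were removed, the increment would become an unsynchronised write, which is the kind of problem that running with the -race flag (for example go test -race or go run -race) is designed to report.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared value guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc adds one to the counter while holding the lock.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countConcurrently increments one counter from several goroutines
// and returns the final value once they have all finished.
func countConcurrently(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait() // all increments happen before this returns
	return c.n
}

func main() {
	fmt.Println(countConcurrently(100))
}
```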
Go’s static type system helps too. Data types of values are checked at compilation time, before the code is even run. This process helps catch bugs, rather than discovering a problem while the code is actually running. Additionally, Go’s static typing helps Cantabular offer protection against accidentally publishing raw data by defining distinct types for publication-safe and publication-unsafe values.
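The idea of distinct types can be sketched as follows. This is not Cantabular's actual code: the names RawCount, SafeCount, protect and publish are invented, and the suppression rule is a placeholder standing in for real disclosure control. The point is only that publish accepts a SafeCount, so passing a RawCount variable is rejected by the compiler.

```go
package main

import "fmt"

// RawCount is a hypothetical type for a count taken straight from
// sensitive data. It must not be published directly.
type RawCount int

// SafeCount is a hypothetical type for a count that has been through
// disclosure control and may be published.
type SafeCount int

// protect converts a raw count into a publishable one. The rule used
// here (suppress counts below a threshold to zero) is only a placeholder.
func protect(c, threshold RawCount) SafeCount {
	if c < threshold {
		return 0
	}
	return SafeCount(c)
}

// publish only accepts values that are already marked as safe.
func publish(c SafeCount) string {
	return fmt.Sprintf("published count: %d", c)
}

func main() {
	raw := RawCount(3)

	// publish(raw) // does not compile: raw is a RawCount, not a SafeCount

	fmt.Println(publish(protect(raw, 10)))
}
```

One caveat with this simple design is that an explicit conversion such as SafeCount(raw) still compiles, so the types guard against accidents rather than deliberate bypasses.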
The built-in Go tooling discussed earlier helps with safety too; go vet is a built-in static analysis tool that checks for mistakes or bad coding practices. A large range of third-party static analysis tools supplement the built-in tooling. As these analysis tools are often written in Go too, they are easy to deploy (see the next section). This makes it simple to incorporate them into continuous integration pipelines to provide automated code quality checks that can help highlight potential security flaws.
Rapid build and deployment
Cantabular customers run our code in their secure environments; we do not have access to their systems to debug deployment problems. Because Go compiles code to standalone binary executables, customers shouldn’t need to configure prerequisite software to use Cantabular. One hitch is that producing fully static Go binaries is not quite as simple as it could be, but it is at least possible. Static binaries remove guesswork about which libraries and packages our customers have installed, and avoid us having to suggest additional software requirements (e.g. Docker, Ansible or others).
Go code can be compiled for multiple platforms from a single operating system, further reducing barriers for a customer looking to run our software. Enterprise customers have varying computing requirements. Some customers are Windows-only shops, and the ability to offer a Windows installation option with minimum effort is a great benefit to them. Internally, we’ve benefited from Go’s cross-platform support too: most developers at Sensible Code use Linux, but developers on our Go projects sometimes use OS X.
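As a small sketch of what cross-compilation looks like, the program below reports the platform it was built for. The build commands in the comments use the standard GOOS, GOARCH and CGO_ENABLED environment variables; they are generic examples and not Cantabular's actual build configuration. The function name buildTarget is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// Example builds from a Linux machine (illustrative only):
//
//	go build                               // binary for the host platform
//	GOOS=windows GOARCH=amd64 go build     // Windows executable
//	CGO_ENABLED=0 go build                 // disables cgo, which commonly
//	                                       // yields a statically linked binary

// buildTarget reports the OS/architecture pair this binary was compiled for.
func buildTarget() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
	fmt.Println("built for", buildTarget())
}
```

Whether disabling cgo is enough for a fully static binary depends on which packages a program uses, which is part of the hitch mentioned above.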
Last, and by no means least, quick builds were a must-have for the Go language developers from the outset, and at Sensible Code we benefit directly from this by not waiting long for our code to compile!
Going forward
Cantabular is a strategic product for the company; we must provide long-term support to all our customers since the product will be operational for some years. We therefore need to consider Go’s plans for forward compatibility.
Go 1 comes with a compatibility promise: code written for Go 1 is intended to keep compiling and running correctly with later Go 1 releases. Being able to rebuild the source with minimum effort, to incorporate subsequent security fixes to the Go compiler and libraries over a support period, reduces code maintenance overhead. It also means that future Go performance improvements can be passed on to customers. Hopefully, this compatibility promise will hold when Go 2 is finally released too, as this post suggests:
Go 2 must also bring along all the existing Go 1 source code. We must not split the Go ecosystem.
Where do we Go next?
There are tradeoffs in the choice of a programming language, and a team has to find a compromise by selecting a language that suits the application domain as well as developer comfort and productivity. The positive experiences with Go certainly make it a candidate for future SensibleCode projects.
This post is based on a talk that Steven Maude gave on behalf of SensibleCode at Golang Amsterdam in 2019.
Thanks to Orne Brocaar and the rest of the great team of organisers for inviting us to speak and for hosting us in the evening.
Want to know more about Cantabular? Visit the Cantabular site!
SensibleCode are looking for a Statistical Disclosure Control Specialist to help build core expertise in the discipline.