NTPsec: a Secure, Hardened NTP Implementation

Note: This article was first published in the October 2016 issue of Linux Journal.

Network time synchronization (aligning your computer's clock to the same Coordinated Universal Time, or UTC, that everyone else is using) is both necessary and a hard problem. Many internet protocols rely on being able to exchange UTC timestamps accurate to small tolerances, but the clock crystal in your computer drifts (its frequency varies by temperature), so it needs occasional adjustments.

That's where life gets complicated. Sure, you can get another computer to tell you what time it thinks it is, but if you don't know how long that packet took to get to you, the report isn't very useful. On top of that, its clock might be broken—or lying.

To get anywhere, you need to exchange packets with several computers that allow you to compare your notion of UTC with theirs, estimate network delays, apply statistical cluster analysis to the resulting inputs to get a plausible approximation of real UTC, and then adjust your local clock to it. Generally speaking, you can get sustained accuracy on the order of 10 milliseconds this way, although asymmetrical routing delays can make it much worse if you're in a bad neighborhood of the internet.
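To make the arithmetic behind that exchange concrete, here is a minimal Python sketch. It is not code from NTP Classic or NTPsec. The function names and sample numbers are invented for illustration, and the median step is a deliberately crude stand-in for the real statistical cluster analysis. The offset and delay formulas are the standard four-timestamp calculation used by the NTP on-wire protocol.

```python
from statistics import median


def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    All values are in seconds. The offset is how far the server's clock
    is ahead of ours, assuming the outbound and return paths take
    equally long.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def estimate_offset(samples):
    """Combine several servers' answers (hypothetical helper).

    ASSUMPTION: a plain median stands in for the real selection and
    clustering algorithms, which are far more elaborate. It is used here
    only because it shrugs off a single wildly wrong server.
    """
    return median(offset_and_delay(*s)[0] for s in samples)


# Invented sample exchanges: two sane servers and one that is 5 s off.
samples = [
    (100.000, 100.060, 100.061, 100.021),
    (100.000, 100.075, 100.076, 100.051),
    (100.000, 105.010, 105.011, 100.021),
]

for s in samples:
    off, dly = offset_and_delay(*s)
    print(f"offset={off:+.3f}s delay={dly:.3f}s")

print(f"combined estimate: {estimate_offset(samples):+.3f}s")
```

The offset formula is only exact when the two directions of travel take equally long. If they don't, the error can be as large as half the round-trip delay, which is why asymmetrical routing hurts accuracy so badly.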

The protocol for doing this is called NTP (Network Time Protocol), and the original implementation was written near the dawn of internet time by an eccentric genius named Dave Mills. Legend has it that Dr. Mills was the person who got a kid named Vint Cerf interested in this ARPANET thing. Whether that's true or not, for decades Mills was the go-to guy for computers and high-precision time measurement.

Eventually, though, Dave Mills semi-retired, then retired completely. His implementation (which we now call NTP Classic) was left in the hands of the Network Time Foundation (NTF) and Harlan Stenn, the man Information Week feted as "Father Time" in 2015. Unfortunately, on NTF's watch, some serious problems accumulated. By that year, the codebase was already more than a quarter-century old, and techniques that had been state of the art when it was first built were showing their age. The code had become rigid and difficult to modify, a problem exacerbated by the fact that very few people actually understood the Byzantine time-synchronization algorithms at its core.

Among the real-world symptoms of these problems were serious security issues. That same year of 2015, InfoSec researchers began to realize that NTP Classic installations were being routinely used as DDoS amplifiers—ways for crackers to packet-lash target sites by remote control. NTF, which had complained for years of being under-budgeted and understaffed, seemed unable to fix these bugs.

This is intended to be a technical article, so I'm going to pass lightly over the political and fundraising complications that ensued. There was, alas, a certain amount of drama. When the dust finally settled, a very reluctant fork of the Mills implementation had been performed in early June 2015 and named NTPsec. I had been funded on an effectively full-time basis by the Linux Foundation to be NTPsec's architect/tech-lead, and we had both the nucleus of a capable development team and some serious challenges.

This much about the drama I will say because it is technically relevant: one of NTF's major problems was that although NTP Classic was nominally under an open-source license, NTF retained pre-open-source habits of mind. Development was closed and secretive, technically and socially isolated by NTF's determination to keep using the BitKeeper version-control system. One of our mandates from the Linux Foundation was to fix this, and one of our first serious challenges was simply moving the code history to git.

This is never trivial for a codebase as large and old as NTP Classic, and it's especially problematic when the old version-control system is proprietary with code you can't touch. I ended up having to revise Andrew Tridgell's SourcePuller utility heavily—yes, the same code that triggered Linus Torvalds' famous public break with BitKeeper back in 2005—to do part of the work. The rest was tedious and difficult hand-patching with reposurgeon. A year later in May 2016—far too late to be helpful—BitKeeper went open source.
