- The Birth of Link-State Routing
"Routing is a hard problem." That's what colleagues told me when I joined BBN in 1971, right after getting my undergraduate and master's degrees in computer science at Harvard University. As a student, I had built the hardware and software to connect Harvard's first interactive computer, the DEC PDP-1, to the infant Arpanet (the first packet switching computer network and precursor of the Internet).1 I had also taken a Harvard course taught by two senior BBN engineers who built the original interface message processors (IMPs), the switching nodes that routed traffic across the network. While at BBN, and with almost all the course requirements for a PhD in hand, it was time to find a possible dissertation topic. So I was glad to learn that dynamic routing—determining the best paths for network traffic in real time—was considered a difficult computer science issue because I planned to tackle it.
By 1971, the Arpanet had been up and running long enough to disclose shortcomings in the performance and stability of the original design. My job at BBN was to redesign and rewrite all the software for the IMP, not just the routing module. I fondly recall releasing new IMP software to the whole network (amounting to a few dozen nodes at the time) every other Tuesday morning for two years. Fifty software versions later, the IMPs had more stable congestion management, better reliability, and higher throughput.2 But routing still remained a challenging area for further study.
By the end of 1974 I had completed my PhD dissertation for Harvard on routing3 while working full-time at BBN. This dissertation described the problem, analyzed and compared many routing algorithms, and pointed out topics for further work, such as hierarchical routing for networks of networks. But I had not resolved some of the nagging network accidents, outages, and other crises caused by the original Arpanet routing system.
Why routing is a difficult problem
Routing is challenging for three reasons. It's a realtime software process running on many nodes.4 Second, it's a significant optimality problem when considered as an algorithm running on a single node. Finally, it's demanding of system resources, placing further constraints on what is possible in practice.
Routing in a distributed computer network (the Arpanet being the prototypical example) is unlike most other software applications:
• Routing is a nonstop application; it does not start with a set of known initial conditions nor does it run to completion.
• Routing is decentralized, with no master clock, and no master routing table. Each node must perform its own routing calculations.
• The routing process is inherently distributed. It runs in all nodes all the time. The constraints of time and space imply that nodes are working with different information. Routing loops are a snag to be avoided.
• Routing must be adaptive, using alternate routes to avoid congested parts of the net.
• Routing continuously measures the net, while sending traffic over it, affecting the measurements. Oscillations are a danger; stability is a challenge.
• Routing must also be a fail-safe application that continues to function if nodes fail, even if the net is partitioned into two or more pieces. Each subset of the network should keep working, and recover seamlessly when an outage is repaired.
• Perhaps the scariest aspect of routing is that it is mission-critical to the entire network. As we had ample opportunity at BBN to witness, failures in routing can cause an entire network to stop working. For instance, in 1971 the IMP at Harvard had a memory error that caused its routing updates to be all zeroes. That led all other nodes in the net to conclude that the Harvard IMP was the best route to everywhere, and to send all their traffic there. Naturally, the network collapsed. We developed heuristics to detect such problems and prevent their wider propagation. The fact remains that local routing failures, unlike almost any other local failure, can have global consequences.
When considered as a classic algorithm, routing is an "interesting" computer science problem, with three critical, interrelated components:
• As a real-time...