Short Lectures on Distributed Computing 4: Self-Healing and Compact Routing

Disttibued algorithms and networks differ from their centralised counterparts in that failure is part of their DNA! i.e. they can continue despite failures and need to be designed to handle such failures. Failures can come in many forms – from crash failures (a component is knocked out and refuses to get up), transient failures (a component gets punched hard, falls down, recovers on wobbly feet!), Byzantine (a component (our fictitious boxer, as you have no doubt figured out by now) has taken money from bookies to fix the match and is trying to lose ruining the whole fight).

An adversary – the other boxer is a good way to model attacks – i.e. I seek to make a statement such that my boxer does well against the best boxer known; my boxer can stand forever against a boxer that does not use an undercut; my boxer does well against a boxer that throws random punches but may do poorly against one who knows that I didn’t have breakfast today;my referee does well against a bookie who’s paid my boxer to get the match fixed!

The paradigm which looks at keeping the distributed system running despite failures in some acceptable form (maybe degraded from the original) is called fault-tolerance. Self-healing is a fault-tolerance paradigm for reconfigurable networks, in a broad sense. Reconfigurable networks are networks which permit changes to be made quickly , say, in contrast with hard-wired networks where to change the network topology, you would have to physically disconnect and change wires. One example is overlay networks e.g the Internet which can be thought of as virtual connrections made over an underlay – a network of wirings which permit any two computers (say, with IP addresses) to connect to each other.

Self-healing can now be thought of as a game where an adverary removes/deletes/adds a node from the network/graph and the neighbours of the deleted node respond locally by creating new connections in the reconfigurable network with the objective maintaining global properties such as connectivity, distances, expansion etc while allowing only minimal increase in degree of individual nodes. Formalising and designing self-healing algorithms was the subject of my PhD thesis and this lecture should give some flavour of it.

The other topic this lecture deals with is compact routing – a very interesting topic that generated a lot of interest in the distributed graph algorithms community. Those with basic familiarity with routing will remember that to route messages in a network, two data structures are used: the routing table which is available at every node, and routing labels which are the address and related information that each packet contains. These can be thought of as corresponding to post-offices and addresses on letters to be delivered to individuals.

Now, the simplest way to biuld a routing table would be to run something like a Djikstra algorithm all pairs shortest path algorithm and at every node store which neighbour to forward a packet to for every destination. For example, if a node has three neighbours a,b, and c, and it gets a packet destined for node x, the node can see in its routing table that node b is on the shortest path from itself to x, and it can thus forward the packet to node b. This is called shortest path routing and is most desirable since it follows the quickest route to get packets across (ignoring other real world constraints such as bad weather, lazy porters etc etc..). However, the cost is that each node now stores an O(n) sized table where n is the number of nodes in the network. This is considered too much!

Thus, the research area of compact routing is obsessed with keeping the tables (and also the labels) to  smaller (than n) size at the expense of approximating shortest paths – some thought on the matter leads to conclusion that this is an unavoidable tradeoff.

At the same time, what about the six degrees of seperation? – that idea states that not only do you know everybody in the world through a chain of six people, if these people were willing enough – they could even route a postcard from you to them! – thus, this is a really compact routing protocol. This effect was discovered by the Milgram experiment in the 1930s USA.

The difference is that this efffect relies a lot on geographical information i.e. Chicago is closer to New York than Albuquerque, thus, if I have a choice between a friend in New York and another in Albuquerque, I will send the packet to my friend in New York (i.e. I will do greedy routing) if my objective is to send the packet to a Barack Obama, resident of Chicago! In general, compact routing on networks has very little global information, thus, the challenge is higher. This information is gathered and a protocol built. This, however, creates a new challenge – what happens to the routing if the network changes. Compact self-healing routing is a research area that seeks to provide continuing compact routing when the network is changing, at least in the self-healing paradigm.

Lecture Slides are available here (in pdf).

Short Lectures on Distributed Computing 3: Let’s do some elections!

Elections are in the air!! – Oh well, that’s not much fun any more sadly so let’s do more mundane things like Computing!

Leader Election is  one of the foundational problems of distributed computing – starting with a 1977 paper by Le Lann. The idea is for the nodes in the network to elect a leader among their own – no mean task even for computers!

Here, the drosophila of distributed computing comes into form – the Ring topology! We discuss some examples and algorithms on rings. Then we visit some of our own results – sublinear probablistic Leader elections on topologies like good expanders and the most simultaneously time and message efficient Leader Election!

We also visit verification and proof labelling – an attempt to understand complexity in distributed computing in similar terms to P vs NP centralised computing.

Here are the lecture slides in pdf.

Short Lectures on Distributed Computing: Lecture 2 – Trees and formal message passing model

This lecture looks at the most basic of distributed algorithms: flooding (with a rather interesting question that arose accidentally!). We also see how it can be used to build spanning trees – Breadth First Search (BFS), Depth First Search (DFS) Trees, Broadcast and Convergecast (reverse broadcast)!

We look at the message passing model in a more formal manner including the timing model – synchronous or asynchronous! Thus, we fill up more of the empty slots of the huge matrix of possibilities of models in this multi-agent world of distributed computing!

Here are the lecture slides in pdf.

A report from DA@L – Distributed Algorithms @ Loughborough

A 180 degrees science from DA@L
A 180 degrees scene from DA@L – Distributed Algorithms @ Loughborough

Jan 28th, 2019 was a historic day at Loughborough University – well, at least, for distributed algorithms, since the first Distributed Algorithms @ Loughborough workshop was held there. Even though I am biased as the organiser, my feeling is that the day went well and the attendees enjoyed the workshop and all that was on offer – the high quality talks, the hospitality at Burleigh court – the large lunch and dinner, the drinks and even the foosball!

Somebody at Burleigh court kept trying hard to convince us (by putting up signposts) that our workshop was on Distribution Algorithms (rather than on Distributed Algorithms)  but still the speakers pushed on with the high quality talks on distributed algorithms, the slides and abstracts of which are now available on the conference site or linked right below (in order of presentation). Many thanks again to all the attendees and to EPSRC for making this possible.



DA@L is cooking!

DA@L is ready.  We have an exciting line up for the Distributed Algorithms @ Loughborough workshop for this coming Monday, Jan 28th. The details of the workshop  are at – many thanks to EPSRC for funding the COSHER project.

  • Artur Czumaj, Warwick, UK. Round Compression for Parallel Approximate Matching Algorithms
  • Matthew Daggitt, Cambridge, UK. An algebraic approach to routing
  • Michael Elkin, Ben-Gurion University, Israel. Distributed Routing
  • Shay Kutten, Technion, Israel. Brief history of local checkability: from self stabilization to complexity theory
  • Iain Phillips, Loughborough University, UK. How to break the Internet in new and interesting ways
  • Thomas Sauerwald, Cambridge, UK. On coalescence time in graphs — When is coalescing as fast as meeting?
  • Paul Spirakis, Liverpool University, UK. Models for Programmable Matter
  • Amitabh Trehan, Loughborough University, UK. COSHER: Self-Healing Compact Routing and other problems in low memory
  • Posco Tso,  Loughborough University, UK. The Federated RaspberryPi µ-Infrastructure Testbed


Short Lectures on Distributed Computing: Lecture 1 – Introduction

The first in a short series of lectures attempting the impossible task of covering distributed computing in 4 lectures to a group at the University of Bergen!

The Zoo of distributed computing  where you’ll find many kind of creatures – regular or strange, benign or dangerous, dumb or rational and cunning, healthy or weak, quick or slow, boring or interesting. Centralised vs Distributed – Message passing (flooding etc) with a hint of Shared memory and PRAM etc.

Here are the lecture slides in pdf.

COSHER, Jonas and DA@L

My EPSRC first grant COSHER (Compact Self-Healing Routing) is coming to it’s final stages. Maybe I should have shouted about it earlier but we do have a website at My wonderful postoc Jonas Lefevre has recently moved on to his new mission after his year here. We have a number of interesting results, I believe, and more than that, a number of interesting ideas in the pipeline. In particular, we have some interesting ideas on algorithms for low memory large networks which I shall discuss here at some point in the future. We also have work on variants of leader election and fully dynamic self-healing routing in the pipeline. The time was almost too short – for this, it is good that EPSRC has now removed the funding cap of £125K from the first grant (besides renaming it to NIA – New Investigator Award) making it practically possible to support a postdoc for more than a year. Merci beacoup Jonas and thank you, EPSRC.

Finally, I am planning to organise a workshop centered around the themes of the grant – Reliability, Routing and Memory constraints on January 28th at Loughborough. I already have some exciting ‘Ayes’ to participation. The idea is to have some practitioners tell us what they really need and some theorists like me to feel happy that our dreamy ideas will be transformative in reality! I plan to call it DA@L – Distributed Algorithms @ Loughborough. I know, I know, it probably reminds a few of you of Lentil soup (daal), which doesn’t seem as exciting as butter chicken but, hey, climate change is real and maybe we have to do our bit eating low carbon usage foods! 🙂 –  If you have some ideas and/or plan to participate, please drop me a message!