Weapons of Math Destruction and the Hippocratic Oath

Don’t mess with the maths – it can shoot you in the foot! -‘You talking to me, you talking to me…’

Here is a very interesting article on the dangers of the big data hype: http://spectrum.ieee.org/tech-talk/computing/software/are-you-making-a-weapon-of-math-destruction.

Author Cathy O’Neil (The mathbabe!) in her book Weapons of Math Destruction (What a name!) states that we should remember that predictive models and algorithms are really just “opinions embedded in math.” Well said, indeed. After all, maths is another language – a very powerful one for emotionless, ‘logical’ calculus though.

Maybe time for all to take the Modeler’s Hippocratic Oath (Derman and Wilmott):

∼ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

∼ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

∼ I will never sacrifice reality for elegance without explaining why I have done so.

∼ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.

∼ I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.

Belfast to Mexico City via Self-healing Compact Routing!

Thanks to the Newton Fund and the Mexican Academy of Sciences (AMC) I get to spend six weeks in Mexico City visiting my colleague at UNAM, researching, what else but Compact Self-Healing Routing… and authentic Mexican food 😉

The first part of this post is by way of thanks to the Newton Fund, the UK academies (say, The Royal Society) and the Mexican Academy of Sciences (AMC) for a Newton mobility grant which will allow me to visit my colleague Armando Castaneda at UNAM for six weeks in August and September this year. The call funds up to three months of ‘foreign activity’ so I could possibly have asked for more time but I was unsure of being able to get away for so long. Armando managed to visit me at Belfast some time ago so this can even be thought of as a return visit! He even managed to time his visit to coincide with the Belfast Marathon (while he’s training for a marathon) – ‘curiouser and curiouser’, as Alice would say!

Basics of Routing: To send a letter in an envelope, what is needed is an address (packet header) and a database at each post office (routing tables)!

So, what’s this resilient compact routing? About three years ago, I was fortunate to be an I-CORE postdoc with Danny Dolev and Armando was a postdoc at Technion. We started discussing ideas around routing and, possibly due to my long line of work with self-healing algorithms (on which many blog posts may follow!), we started to gravitate towards the question: Can we route messages despite failures in the network? At about the same time, Shiri Chechik bested our Leader Election paper (On the complexity of universal leader election) with her paper Compact routing schemes with improved stretch for the best paper award at PODC 2013. With many life-changing events in between (such as getting faculty positions and moving somewhere with a drop of many degrees in average temperature and a good deal more precipitation!), the first paper in this line just managed to struggle over the line in early 2016. Compact Routing Messages in Self-Healing Trees (arXiv) was a finalist for the best paper award at ICDCN 2016.

So, what’s this resilient compact routing (take 2)? Routing is a very important ‘primitive’ for networks – the ability of the network to take a message from a source node and deliver it to a target. We encounter it every time we get onto a network: as soon as we connect to a router or a website, send an email, make a Skype/VoIP call, etc. In practice, the most used routing protocols are based on well-known standard graph distance-finding algorithms such as Dijkstra’s and Bellman-Ford, which is itself a testament to the longevity of these algorithms and to the power of graph algorithms in general. If a node x gets a packet which started at node a and needs to end at node b, node x will refer to its routing table – a table which tells it which of its neighbours to send the packet to.
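As a rough illustration of where such a table can come from, here is a minimal Python sketch that runs Dijkstra's algorithm from one node and records, for every destination, the neighbour lying on a shortest path. The network, the node names and the link costs are made up for the example, and real routing protocols involve a lot more machinery than this.

```python
import heapq


def build_routing_table(graph, source):
    """Return {destination: next_hop} for `source` using Dijkstra's algorithm.

    `graph` maps each node to a dict {neighbour: link_cost} (hypothetical format).
    """
    dist = {source: 0}
    next_hop = {}
    done = set()
    # Heap entries: (distance so far, node, first hop taken when leaving source)
    heap = [(0, source, None)]
    while heap:
        d, node, first = heapq.heappop(heap)
        if node in done:
            continue  # stale entry: a shorter route was already finalised
        done.add(node)
        if first is not None:
            next_hop[node] = first
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # A direct neighbour of the source is its own first hop.
                heapq.heappush(heap, (nd, nbr, nbr if first is None else first))
    return next_hop


# Made-up network: the direct link a-c is expensive, so a reaches c through x.
example_network = {
    "a": {"x": 1, "c": 4},
    "x": {"a": 1, "b": 1, "c": 1},
    "b": {"x": 1},
    "c": {"a": 4, "x": 1},
}

table_at_a = build_routing_table(example_network, "a")
print(table_at_a)  # {'x': 'x', 'b': 'x', 'c': 'x'}
```

Note that the table has one entry per destination, which is exactly the size problem discussed next.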

Often, a routing table will contain an entry for every node in the network telling where to forward a message addressed to that node. This means the table grows with the size of the network and can become really huge. In practice, there are ways around this. One way, which makes for some nice theory, is to do some preprocessing on the network, e.g. build spanning trees, do a DFS traversal, maybe some renaming and port changes, to reduce the size of the routing tables and the packet header. The crux of many of these schemes seems to be the (now seemingly simple) idea of interval routing (which by itself may not be compact) introduced by Santoro and Khatib in 1982. The idea may be summarised as follows:

  • Starting from a particular node, do a Depth First Search (DFS) traversal and construct the corresponding DFS tree. Also, give each node a label that is that node’s DFS number (ID) (say, the time step at which that node was first encountered in the DFS traversal).
  • Now, at each node, if you store the ‘largest’ ID of its subtree and the IDs of its children, you can now get intervals – which tell you which neighbour in the tree a node should send a packet addressed to another node. Hence, the name Interval Routing.
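The two steps above can be sketched in a few lines of Python. This is only a toy version on a small made-up tree: the function names, the dictionary representation and the example tree are my own assumptions for illustration, not the construction from the original paper.

```python
def label_tree(tree, root):
    """DFS-number the nodes of `tree` (dict: node -> list of children).

    Returns (dfs_id, largest, parent), where largest[v] is the biggest
    DFS ID found in the subtree rooted at v.
    """
    dfs_id, largest, parent = {}, {}, {root: None}
    counter = 0

    def visit(v):
        nonlocal counter
        dfs_id[v] = counter  # time step at which v is first encountered
        counter += 1
        for child in tree.get(v, []):
            parent[child] = v
            visit(child)
        largest[v] = counter - 1  # last ID handed out inside v's subtree

    visit(root)
    return dfs_id, largest, parent


def next_hop(tree, parent, dfs_id, largest, current, target_id):
    """Choose the tree neighbour to forward to, using only local intervals."""
    if target_id == dfs_id[current]:
        return current  # the packet has arrived
    for child in tree.get(current, []):
        # The subtree of `child` owns the interval [dfs_id[child], largest[child]].
        if dfs_id[child] <= target_id <= largest[child]:
            return child
    return parent[current]  # target lies outside this subtree: go up


def route(tree, parent, dfs_id, largest, source, target):
    """Follow next_hop decisions from source to target; assumes target is in the tree."""
    path = [source]
    while path[-1] != target:
        path.append(next_hop(tree, parent, dfs_id, largest, path[-1], dfs_id[target]))
    return path


# Made-up example tree rooted at "r".
example_tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
ids, largest, parent = label_tree(example_tree, "r")
print(ids)  # {'r': 0, 'a': 1, 'c': 2, 'd': 3, 'b': 4, 'e': 5}
print(route(example_tree, parent, ids, largest, "c", "e"))  # ['c', 'a', 'r', 'b', 'e']
```

Each node only looks at its own ID and the intervals of its children, and the packet header only needs to carry the target's DFS ID.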

This spawned an active and productive field of research over the last few decades, particularly in the effort to reduce the size of both the labels and the tables while keeping the price paid in terms of distance as small as possible. It is well known that if you reduce the space you use for routing, you cannot always use the shortest paths and must pay in some measure by using a longer path (this measure is called stretch).
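To make stretch concrete: it is commonly defined as the worst-case ratio, over all pairs of nodes, between the length of the route actually taken and the length of a shortest path. The small Python sketch below computes it under some simplifying assumptions of my own: the graph is unweighted, and routes are forced to follow a spanning tree of the graph. The ring network is a made-up example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph (dict: node -> neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def stretch(graph_adj, route_adj):
    """Worst-case ratio (routed distance) / (shortest distance) over all node pairs.

    `route_adj` is the subgraph that routes are restricted to, here a spanning tree.
    """
    worst = 1.0
    for s in graph_adj:
        shortest = bfs_distances(graph_adj, s)
        routed = bfs_distances(route_adj, s)
        for t in graph_adj:
            if t != s:
                worst = max(worst, routed[t] / shortest[t])
    return worst


# A ring of four nodes, and the spanning tree obtained by dropping the edge 3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(stretch(ring, tree))  # 3.0: nodes 0 and 3 are adjacent, but the tree route takes 3 hops
```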

Looking at the above description, one will notice that developing these data structures seems to rely heavily on the initial DFS traversal. This implies that one may need to do a lot of recomputation if the network changes. In this spirit, there are some (but not many) works on ‘resilient’ compact routing, i.e. compact routing that can handle changes to the network. Ours is one such attempt, in which we show how to do the well-known Thorup-Zwick routing (over trees for now) in the self-healing model. In brief, the self-healing model is a responsive model of resilience where an adversary chooses a node/processor to take down or even insert (presumably to do the most damage in either case) and the neighbours of the attacked node/processor react by adding some edges/connections to the graph/network. We show how to do both the compact routing and the self-healing in low memory (O(log^n) at most, where n is the number of nodes in the network).
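To give a feel for the model only, here is a deliberately naive Python sketch of one round: the adversary deletes a node and its former neighbours repair the hole by adding edges among themselves. The repair rule used here (chaining the neighbours in sorted order) is an assumption chosen for brevity. It is not the algorithm from our paper, which has to be much more careful about degree, distances and memory.

```python
def delete_and_heal(adj, victim):
    """Remove `victim` from `adj` (dict: node -> set of neighbours), then heal.

    Naive illustrative repair: link the victim's former neighbours in a chain,
    so the network stays connected and each of them gains at most two edges.
    The dictionary is modified in place and also returned.
    """
    orphans = sorted(adj.pop(victim))
    for node in orphans:
        adj[node].discard(victim)
    for left, right in zip(orphans, orphans[1:]):
        adj[left].add(right)
        adj[right].add(left)
    return adj


# Made-up worst case for a tree: the adversary takes down the hub of a star.
star = {
    "h": {"a", "b", "c", "d"},
    "a": {"h"},
    "b": {"h"},
    "c": {"h"},
    "d": {"h"},
}

healed = delete_and_heal(star, "h")
print(healed)
```

After the deletion the four leaves form the path a, b, c, d. A chain keeps things connected but can stretch distances badly, which is one reason a real self-healing scheme needs a smarter repair structure.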

I will defer the technical details of both self-healing and our compact self-healing routing to another post but needless to say, I am quite excited about continuing this work!

Driving BCTCS 2016

The 32nd British Colloquium of Theoretical Computer Science (BCTCS 2016) was held from March 22nd to 24th at Queen’s University Belfast (QUB). Here are some insights, pictures and links to some talks.

The 32nd British Colloquium of Theoretical Computer Science (BCTCS 2016) was held in Belfast from March 22nd to 24th in the picturesque setting of Queen’s University Belfast and the newly remodelled Graduate School. It was a good amount of work – a bit like a 200 m race where you accelerate rapidly and take a while to stop! This was more so because my colleague Alan Stewart, who brought the conference here, promptly retired, leaving me in charge! (Though he still did much of the work even post-retirement 🙂)

I did my PhD in the USA in the area of algorithms and I have discovered that the areas of focus in theoretical Computer Science differ markedly between the US and the UK. The UK has traditional strength in Languages and Logic whereas in the US there seems to be more strength in algorithms-based theory (this is something that the EPSRC readily admit!). BCTCS had good representation across the themes, particularly since some of our speakers were from across the pond(s) (including Iceland!).

Slides from some of the talks are available at http://www.amitabhtrehan.net/bctcs.html

Hope you have a look and find them interesting!

Halg!

Highlights of Algorithms, supermarine Paris, June 6th to 8th, 2016.

Halg! Here comes the Algorithms. Halg! .. and the algorithmicians.

Highlights of Algorithms  is taking place in Paris right now (June 6th to 8th). Surprisingly, all of Paris I have seen seems above water (contrary to what I expected seeing the news!). I think it’s a great idea- getting (hopefully) the best papers on algorithms in the past year at one venue (no formal proceedings). The atmosphere is lively and relaxed with probably more participants than the organisers expected (overflowing room!).

In true (and increasingly rare?) ode to ‘Theory’, Stephen Alstrup’s talk on labelling schemes used paper and OHP, as seen below:

IMG_20160606_151933

 

The talk schedule is given at http://highlightsofalgorithms.org/program/. The talks today have ranged across topics from dynamic algorithms and mechanism design to distributed graph algorithms. Each talk, of course, deserves a post of its own to do it justice. Tomorrow afternoon, I will present my paper Compact Routing Messages in Self-Healing Trees in one of the short paper sessions.

On a lighter note, a Google search for halg reveals this ‘urban dictionary’ entry: http://www.urbandictionary.com/define.php?term=Halg:

On a basic level, a “halg” is a break in a conversation that is being held over MSN. The length of a “halg” depends on circumstances, time of day and the people involved in the conversation.

Hugh! Hold on, I need a halg before my next post.

88

..Being a follow up to the glorious 42 (earlier post) in being two times 42 (almost?) – strong, weak and electromagnetic (but not gravitational) at the same time! eh?

After I published 42,  Fred, a physicist colleague commented that 42 is important in some very fundamental physics and he was disappointed that more hadn’t been made out of this. He also sent some evidence of this grave atrocity!

PastedGraphic-1

From my amateur interest in theoretical particle physics (I was a student of Murray Gell-Mann, after all 😉 Well, I can keep that suspense for another blog post) – I know that the search for the Grand Unified Theory (GUT) made some great progress (with ideas such as string theory). Basically, there are 4 fundamental forces – strong, weak, electromagnetic and gravity. It seems like while the others have agreed to be unified, gravity has been a b*** and has refused to be unified – Any UK/EU readers reading this post? 😉

Well, getting back to 42, apparently, at a particular high energy, the three ‘friendly’ forces become equal, at a value of 1/42 (behold, 42 makes an appearance). If you go to the Wikipedia page on 42, you’ll find a number of entries on the importance of 42 – in fact, exactly 42 entries in total under Mathematics, Science, Technology, Astronomy and Religion!! – wow, was that intentional?

Q: For a given number x, can I find x interesting things to say about it? I suppose there may be a way to do that!

Now, what about 88! Of course, 88 is two fat ladies! – that’s it! If you’ve ever had the honour to play Tambola! – I remember as a kid being carted around to Tambola games with my mother and family. At least, that adds to interesting things one can say about the first 90 numbers as listed on this Tambola nicknames site. Now, clearly the nicknames site has an Indian bias – where else would the nickname for the number 83 be India wins Cricket World Cup?

PS: ‘R u sure that 9 time 6 equals 42? My child has a different answer.’ – another comment from a careful reader! I hope you caught that in the last post! My answer, of course, would have been to ask Douglas Adams (if he was alive) or to query his faulty super duper computer yet again!

Discovering the Distributed Algorithms behind Biological Cell Communication

What are the distributed algorithms behind cell communication? Stuck in a sandpit, my colleagues at QUB and I gather up some ideas which will hopefully also find some applications.

Ever been in a sandpit? When I was asked to be in one (called the Applied Maths Sandpit, at our new EPS faculty), I was not sure what it would be. It could be an innocent fun activity in the sand as in the first picture, an unlikely dream holiday at a golf course (as in the second picture), or, most likely, a gruelling day which I was not forewarned of (third picture). As it turns out, it was rather nice and, ultimately, useful. There was no sand around, of course! (and neither was there much warm sun, unsurprisingly for here).

For a start, I met a few brilliant colleagues I did not know existed. Then, in a day of really intense, cut-throat, Dragons’ Den-like competition, we pitched our, mostly half-baked, ideas (btw, I am joking about the ‘cut-throat’; after all, we were (applied) mathematicians, not MBA entrepreneurs at the gathering!). Amazingly, at the end of it, our emergently formed motley crew of me, Fred Currell, Thomas Huettemann, Dermot Green and Alessandro Ferraro (all except me from QUB Maths and Physics) has been given resources to recruit and spoil a PhD student for a proposal built on the ideas in this post.

So, here comes a very high level,  sparse, ambitious, and rough sketch:

Living organisms can be thought of as clusters of cells in communication (typically at gap-junction interfaces). Within interacting communities new cells are born and old ones are removed, through (sometimes programmed) death. There is a strong environmental influence on these processes. On a much smaller scale, quantum-mechanical processes are at work within cells, complicating the picture further. We think real-life processes are efficient, fault-tolerant, self-healing and scalable, leading us to hypothesise that there must be powerful distributed algorithms somewhere in these networks of cells waiting to be discovered.

Networks are often modelled as graphs: the cells are nodes, and a common surface between two cells is an edge that facilitates communication. In biological systems, things change, and this dynamism in networks is often addressed by failure models, including adversarial and accidental (random) death. The network can react to these changes in various ways and we seek a mathematical framework in which to formulate and analyse the various questions arising.

The team thought of three systems which could be of interest (some of the team already work on these though I know little about them at the moment):

  1. Volvocine green algae: these capture the evolutionary emergence of multicellularity, including what is believed to be the simplest multi-cellular eukaryotic organism, Tetrabaena socialis.

    Rough outline of phylogenetic relationships in volvocine green algae.

  2. Light harvesting in photosynthetic organisms: the mechanism whereby living organisms harvest energy from light is believed to be one of the clearest biological systems countering the view that life is too “warm and wet” for quantum phenomena to be relevant.

    A quantum machine for efficient light-energy harvesting (from the paper)

  3. The tumour spheroid: a simple multicellular mimic of a tumour, this system is amenable to direct laboratory study and is known to show many of the hallmarks of cancer, including the lack of growth regulation mechanisms, meaning it seeks to grow avidly.

    A study showing effects on size, shape and growth rate of tumours (from the paper)

The PhD advertisement is here (if you know of suitable candidates): http://www.qub.ac.uk/schools/eeecs/Research/PhDStudy/PhD-2016-17-65/

Talking of Maths sandpits, somebody is already putting them to work: MathsSandPit.co.uk

Question: What’s the best way to discover algorithms that nature uses? (Of course, this is a very old question!).

42

Hunting for the ultimate questions and answers a la ‘hitchhiker’s guide to the galaxy’, and some rather ‘useful’ algorithms!

Might as well begin with the answer to the ‘Ultimate Question of Life, the Universe, and Everything’. I would have titled my blog ‘The Algorithm’ if I knew how to reach the answer! Of course, the answer is deduced by the supercomputer, Deep Thought, after 7½ million years of computation. Deep Thought then points out that the answer seems meaningless because the beings who instructed it never actually knew what the Question was. So, when asked to produce The Ultimate Question, Deep Thought builds an even more powerful computer (the planet Earth) to do so, which is supposed to run for 10 million years to find the question and so on…(and you know the rest). Of course, independently, the answer is also discovered to be 9 times 6 (equals 42)!

So, to begin with, here’s an algorithm ….

hitchalgorithm

Question: Are you aware of other such fundamentally useful algorithms?