[Title Page, 0:00] I guess I should start by explaining the first slide. IÕm here thanks to Conor inviting me but in reality IÕm just masquerading as someone who should be at a mathematically structured functional programming conference. My background is in fairly conventional mathematics, geometry, topology, and a bit of theoretical physics, and while my day job is largely programming, nothing I do at work has anything to do with anything IÕll say here. Functional programming is just a spare time activity for me. (Not completely true, I do a little bit of functional Python.) A few years back I got interested in Haskell. Much of Haskell is documented in academic papers so I started being exposed to highly mathematical computer science papers on things like Intuitionistic Logic, the Curry-Howard isomorphism, applications of Category Theory to and so on. Despite being an undergrad mathematician and going on to doing a PhD, and having a long standing interest in computing since I was a kid, I hadnÕt come across any of this stuff. Any computing I saw as part of a mathematics course was numerical methods and the only structures I saw were flat arrays of floating point numbers. It seems to me that thereÕs a bit of a cultural divide. So I thought IÕd try to cross it, and as I learnt things I decided to write blog posts about them, largely so I could remember what I had learnt, but also to make the crossing easier for others. But also I like to see connections between different branches of mathematics and as I come from outside of CS maybe I get a chance to see some connections with CS that would otherwise be tricky to see. Anyway I just put whatever I feel like in my blog and today IÕm going to survey a bunch of things IÕve written about over the last couple of years tied together with a common theme. Some of these are other peopleÕs papers that IÕve enjoyed, but I hope one or two things are novel. Anyway, because of Haskell one thing really caught my interest - type theory. In particular, types had much in common with everyday algebra - you can add and multiply types, thereÕs a zero and one, you have distributivity, exponentials and so on. I should say which type theory IÕm talking about here but that can just unfold as we go along, bearing in mind that IÕm really talking about a subset of the Haskell type system. These are all concepts familiar from high school algebra and I think itÕs fun to put these ideas to work with types. CS people seem to make up new logics the way mathematicians make up new algebras. But an algebraic approach is more natural to me. On one level we can formalise what I mean by High School Algebra precisely. Back in 1888 Dedekind gave a list of identities that hold for all positive integers. [Dedekind ŌHigh School IdentitiesĶ 1, 2:30] Here are some of them. TheyÕre very similar to the axioms for a commutative semiring except that they donÕt mention zero. The amazing thing is that all of these identities hold also for types. IÕll make that more precise in a moment. [Dedekind ŌHigh School IdentitiesĶ 2:46] But even more amazing is that this is the rest or DedkindÕs list, and still these identities hold for types if we interpret exponentials as function types. (Or objects in bicartesian closed categories, but I wonÕt talk about that.) By the way, Tarski became interested in these identities and asked whether they were a complete set of axioms for the positive integers and +, * and exponentiation. Turns out they donÕt because there are a bunch of exotic identities that canÕt be proved from these axioms. [Commutativity] Anyway, just in case itÕs not clear IÕll give some examples of how these identities relate to types. Each equality corresponds to an isomorphism we can write in Haskell. By isomorphism I mean total function with an inverse. Commutativity just corresponds to swapping the terms in a disjunction of types. Left gets mapped to right and right gets mapped to left. [Distributivity, 3:52] Distributivity is almost as straightforward. YouÕre essentially just ŌpullingĶ the Left or Right out of the pair. [Associativity of Exponentiation, 4:00] I donÕt know what to call this axiom so I made up this name. ItÕs almost the same as the Haskell uncurry function except you need to swap the arguments. You can write down the inverses to these examples more or less immediately. The other identities are pretty easy as well. But note that something important is missing: zero. Of course zero is the type with no elements and 1 is the type with one element. (And 2 is the type with two elements and so on.) Possibly this was because of an ancient debate over what value 0^0 should take. But all computer scientists all know that 0^0=1 so we can add a few more identities. [Missing Identities, 4:17] There are some more slightly more obscure ones we could add too, but I donÕt really want to make a big deal of these axioms. The important point for me is that they form a way in to the subject because I might not be a computer scientist, but I do know how to do algebra. The crucial thing to note here is that the equality signifies the presence of an isomorphism. If we use the identities to derive an equality, backtracking through the details of the derivation will give a chain of functions whose composition implements the corresponding isomorphism. ItÕs like the Curry-Howard isomorphism. Mathematicians usually only care that a proof exists, but computer scientists need to know which proof you use because each proof gives a different program. But what about recursive types? For example, consider the type of naturals, n. [The Natural Type 1, 5:30] We could write it something like this using notation from the CS literature. The integers are a fixed point for the Ō1+Ķ operation. [The Natural Type, 2] We can write this in Haskell. [The Natural Type 3] But ultimately what weÕre saying is that n=1+n. Clearly we have departed from what Dedekind was talking about because in the original context there is no integer n such that n=1+n. But this isnÕt as scary as it looks. The high school identities donÕt imply a cancellation rule so n=1+n doesnÕt get us into trouble. We canÕt cancel the nÕs to get 0=1. Roughly speaking, for a suitable type theory any equation of the form x=f(x) of a fairly broad class (eliminating things like x=x) has a unique solution. ThatÕs an important point because if we find two types satisfying the same identity then they are isomorphic types (in the right type theory). [Binary Trees, 6:16] Consider the type of simple binary trees defined by the equation t=1+t^2. A tree is either a single node corresponding to the first term, or a pair of trees corresponding to the second term, and IÕve given examples of both types. [Can we deduce...?, 6:30, 1] ItÕs tempting to solve the equation t=1+t^2 and say that t is given by this expression and deduce that t^7=t. Even if we decided not to use complex numbers, itÕs still tempting to use this next argument instead. [Can we deduce...?, 2] Do either of these have any validity? Types are not complex numbers, and there isnÕt an obvious way to subtract types, but the strange thing is that there is a remarkable 18-step isomorphism that packs 7 trees into 1 isomorphically. [Justification, 7:07] One of my all time favourite papers is on precisely this subject, Andreas BlassÕs ŌSeven Trees in OneĶ. Later followed up with Fiore and LeinsterÕs paper who answered the above question precisely with a pair of theorems that show, surprisingly, that we can make the above argument valid. [The Theorems, 7:26] Here are the theorems quoted directly. The terms need some explanation. N is the naturals (starting at zero) and N[x] means polynomials in x with coefficients in N. A rig (which is another name for a semiring) is an algebraic structure that satisfies the axioms I gave earlier that donÕt involve exponentials (except we donÕt need to assume commutativity of multiplication). In a ring, addition forms an abelian group under addition which means that do also have subtraction. So in summary, the first theorem says that under mild conditions on p, if assuming x=p(x) we can show polynomials are equal using subtraction, we can also show it without subtraction. It might require a quite different derivation, but it can be done. The second theorem says that we donÕt even have to prove that the polynomials are equal. Under slightly more restricted conditions on p we only have to show that the polynomials are equal when evaluated at all of the roots of x=p(x). This means that the two suspect derivations I gave earlier can be turned into something valid. So we can use subtraction in intermediate stages of some computations with types. So what does a proof of t=t^7 look like if we donÕt use subtraction? Just for fun, and in the spirit of making reasoning about types as elementary as possible, I thought IÕd express this problem as a one player solitaire-like game. [A Game 1,9:11] The game is played on a semi-infinite strip of squares, each square large enough to hold a pile of pennies. You can see how IÕve labelled the squares with powers of t. [A Game 2] The problem of proving t^7=t is the challenge of getting from the first position to the second position here. The legal moves look like this. [Rule 1,9:39] Either two neighbour-but-one pennies undergo fusion to form one penny [Rule 2] or one penny undergoes fission and splits into two neighbours but one. Each legal move corresponds to a step in an algebraic derivation and an entire solution gives rise to a function mapping t^7 -> t and vice versa. [Solution, 10:09] Soon after I posted this to my blog with the name Nuclear Pennies someone came up with this solution. Cale Gibbard I believe. Just to prove that this really is high school level stuff, someone else who read my blog (Brent Yorgey) used this game to help teach the easy converse to the above theorem: if q_1(x)!=q_2(x) then you canÕt legally get from the position representing q_1 to q_2. It was probably pretty novel for these kids to see complex numbers having an application to a game. But what they didnÕt know is that they were also proving that various types werenÕt isomorphic, or at least that you canÕt build an isomorphism from the isomorphisms defined by the high school identities. [Functions of Types, 10:46] High school mathematics isnÕt just about algebra. When I was a lad we also studied calculus. At first it might seem that calculus has nothing to do with types in the sense IÕve been talking about, but Conor McBride made a remarkable discovery that the derivative of a type isnÕt just meaningful, itÕs potentially a very useful tool. But before that I need to talk about type functions. I donÕt want to set up too much formal structure as that wouldnÕt be appropriate in the informal approach IÕm taking here. I just want to think of functions as things that map types to types, like a Haskell type constructor. We can define type functions both explicitly and implicitly. In the first case, IÕm defining explicitly t to represent triples. The argument to t is the thing weÕre making a triple of. t is also an example of a container in the sense Conor talks about. So for any type x, t(x) is a triple of xÕs. In the second case IÕm defining l implicitly. A list of xÕs is either the empty list (1) or an x (forming the head) and another list that forms the tail. The isomorphism that forms the equality here is simply the list constructor. ConorÕs paper has stricter formal notation for all this but if we use that itÕs harder to see the connection with high school algebra. [Leibniz Rules 1, 12:06] The notation for calculus used by mathematicians can be a bit inconsistent. Here are some familiar rules of calculus often known as the Leibniz rules that we can use to differentiate type functions. In particular note the Leibniz product rule where you either differentiate the first term or you differentiate the second term but not both. [Leibniz Rules 2] But IÕll also use the ÔprimeÕ notation to indicate derivatives and IÕve rewritten some of the above rules using it as well as adding the chain rule which is more convenient in this form. [Holey Triple 1, 12:56] Now letÕs introduce the idea of a ÔholeÕ in a container. So considering the t(x)=x^3 example before, t(x) is a triple of xÕs, t(x^2) is a triple of pairs of xÕs and so on. A container with a hole is one where one of the contents is missing but we still have information about where the hole was. For example we see there are three different ways to make precisely one hole in in a triple. [Holey Triple 2] Now if we apply the Leibniz rules to x^3 we get exactly the same thing. The Leibniz rules tell us exactly how to ensure we only make one hole in our type. For example, the rule for differentiating a product says that either we differentiate one of the factors, or the other, but not both. We either get the hole in one factor or the other. The resulting data type in this case consists of three different kinds of pairs, or equivalently one pair and a three-valued value to indicate which position the hole was made in. (3, being 1+1+1, is just the type whose elements have one of three values.) [A List with a Hole... 2, 14:13] We can try another example. This time weÕll use the equation for lists I gave earlier. If you remember the rules for implicit differentiation (which I did at high school) we can write down an equation for lÕ(x) that can be immediately translated into Haskell, say. [A List with a Hole... 2] But we can do a bit better. With some rearrangement we see that lÕ(x) and l(x)^2 satisfy the same equation and so are isomorphic. In other words, a list with a hole is the same as two lists. [...is Two Lists 1, 14:51] This makes perfect sense. Just take the bit before and after the hole. ThatÕs precisely the same information as having a list with a hole in it. This isnÕt just abstract nonsense. The big application for holes is for making Gerard HuetÕs zippers. [...is Two Lists 2] We can use a hole to represent a position in a list waiting for a value to be inserted into it. But we can also do the same thing with other types such as trees and these are now commonplace in functional code. The useful thing is that with appropriate zippers you can walk up and down structures shifting the hole about as you go. ItÕs a bit like having a pointer into the structure in an imperative language, except that pointer is 100% guaranteed to point to a valid element of the structure purely by virtue of its type. You canÕt accidentally have a hole somewhere outside of the structure youÕre working with, you simply canÕt represent such a thing. Calculus makes it easy to derive the correct zipper for a given type. [The Illicit Way to do That 1, 16:05] HereÕs an illicit view of the same thing. Instead of leaving the equation for l(x) implicit we can try to make it explicit. We have to not only introduce subtraction but also division. Really it makes no sense at all, and yet the infinite series seems to make intuitive sense. A list has either zero elements or one element or two and so on. It follows immediately that a list with a hole is two lists. [The Illicit Way to do That 2] Pretend you didnÕt see that. [More Illicit Stuff 1] WeÕve talked about the Leibniz rules, but most of us meet calculus first in an analytic way as a kind of limit. For example, this equation holds for any sufficiently differentiable function f. But if we choose delta such that delta^2=0 then the formula is exact. [More Illicit Stuff 1] Actually, thatÕs not the illicit bit. Just as you can do complex arithmetic by setting i^2=-1, you can define a structure where delta^2=0. In fact, if you can implement complex numbers in your favourite programming language then you can define such a delta and it gives an elegant way to compute derivatives known as automatic differentiation. (Or at least itÕs one type of AD.) That, by the way, is something IÕve done at work. But the type theory comes in like this. Imagine delta^2=0 held for a type. That would mean something like that you are allowed to have an element of type delta, but not two such elements in the same computer at one time. ItÕs a bit like a C++ singleton or, IÕve been told, like something from affine logic. If you have such a type, then there is a type theoretical interpretation of the equation. An f(x+delta) is an container consisting not of elements of x but of either elements of x or elements of delta. But there can only be one element of delta. So we either have no elements of delta, so we have f(x), or we have precisely one element of delta which we can think of as living in a hole, in which case we get delta fÕ(x). Anyway, I donÕt know of any appropriate type system where this would be useful but it does give some intuition. Anyway, back on track. You might have noticed that IÕm missing some rules for differentiation. We donÕt know how to differentiate exponentials. [Differentiating Exponentials, 16:56] LetÕs look at some high school examples. ThereÕs a big problem. When we differentiate x^n we seem to need to be able to subtract 1. But we havenÕt defined subtraction of types. Conversely, if we could figure out how to differentiate exponentials, maybe that would tell us how to subtract. But before I do anything along those lines, hereÕs an example of why you might want something like subtraction. Suppose you have some type X and you want to store a pair of XÕs. Then you just need some data of type x^2. But what if you know that the two elements of the pair might be equal and want to store this fact. [Maybe Identical Pairs 1, 17:28] You could just store something of type 2x^2, ie. a pair and a binary flag signalling equality. But thatÕs wasteful as we store two copies of our data whether theyÕre identical or not. [Maybe Identical Pairs 2] A better match would be x+x^2. But thatÕs still wasteful as the x^2 component will never actually be used to store identical pairs even though it can do so. If the pairs are identical itÕs only the first term that gets used. [Maybe Identical Pairs 3] What we really want is a type y such that y+x=x^2. The corresponding isomorphism should embed x along the diagonal of x^2 and y should cover everything off the diagonal. I call it the antidiagonal, the type of distinct pairs. [Maybe Identical Pairs 4] In a sense, y=x^2-x=x(x-1). How do we construct such a thing? [Maybe Identical Pairs 5] IÕll borrow some notation from Knuth here. He defines the underlined exponent like this. Often itÕs called the falling factorial or the Pochhammer symbol. WeÕre initially interested in the case n=2. The earlier slide gives us a hint. [Differentiating Exponentials, 18:54] Check out the n(n-1). If we could differentiate exponentials twice (with n a type, not just an integer) weÕd be able to get something with an n(n-1) in it. (And differentiating just once should recover something containing an n.) In fact, if we set x=1 we could get exactly the type we want. But (1) can we do it? and (2) does what we get actually solve y+x=x^2? [Exponentials, 19:30] Back to our high school rules again. Consider the rules for exponentials. TheyÕre well known rules but a few years back Ralf Hinze noticed that they actually give a really nice way to lazily memoise functions. The idea is that if an exponent can be written as a sum or a product then it can be reduced to a combination of exponents of the individual terms or factors as in the third and fourth rule here. Using this we we can eliminate exponentials for all types that can be built from zero and one using sums and products. [Binary Lists, 20:09] For example consider a list of two-valued booleans whose equation is given by x=1+2x. Either we have 1, an empty list, or 2x, a boolean value and another list. We can carry out this reduction and at the end we can write an equation for t(y) for any type y. We can see that t(y) is an infinite binary tree which we can draw like this. [A Binary List Trie, 21:13] The equation says such a tree always has a root and two subtrees making it an infinite structure. Note how we have a 1-1 correspondence between nodes in the tree and lists of bools. That is why this structure memoises functions f from x->y. For each value of x this tree stores a value of f at the appropriate node. IÕve labelled the elements of the type Ô2Õ as T and F. Taking a left branch means having an element F in our list. Taking the right branch means there was a T. And we stop walking if we reach the end of the list. This is just like the traditional trie (spelt T R I E) structure for strings where we find the value we want by taking a path through the trie corresponding to the string. You can think of the ÔaddressÕ of an element in this tree structure being given by elements of x. More importantly, we have replaced a function with a tree structure and equivalently weÕve replaced an exponential type with one without the tricky exponential. [A Trie With a Hole] We can differentiate this to get an infinite binary tree with a hole. I just did an ikmplicit differentiation again. We can easily interpret this equation: A binary tree with a hole either has a hole at the root, in which case we just get two subtrees, or it has a hole on one side or the other, in which case we have a root element, a branch that is a tree, and a branch that is a tree with a hole. IÕve underlined the term that contains the example hole IÕve drawn. In this case the right branch is a whole tree, and the left branch has a hole. Now remember what I said earlier. Making a hole in a container, and then choosing the contents of the container to all be 1, we should be left with just the exponent. Now think about the trie. If we make a hole in the tree, the position, or ŌaddressĶ, of the hole is given by a binary list, like [F,F,T] in this case If we now store the type 1 everywhere in this tree weÕre essentially storing no information in it other than the position of the hole. In other words, weÕve explained why the derivative of y^n at y=1 is n. WeÕre basically encoding elements of a type as the position of a hole in a trie. So what happens if we differentiate twice? [A Trie with Two Holes, 24:11] We get two holes. But obviously these holes canÕt be in the same place, as once weÕve made one hole, that position no longer exists for us to make another hole in it. So if a single hole encodes a value of type x, a pair of holes encodes two *distinct* values of type x. ItÕs the antidiagonal. We now have a sort of explanation for why the second derivative of y^n at y=1 is n(n-1). As I said above, n(n-1) is ÔmorallyÕ the type of pairs of distinct elements. So in total we now have a procedure for performing our subtraction. We can evaluate n(n-1) by using HinzeÕs method to eliminate the difficult part of y^x, and then setting y=1. This reveals that there is a deep connection between between derivatives of types, and collections of distinct elements. One last thing to do is ensure that this actually works. WeÕve done some strange things along the way and really what IÕve given is more of a plausibility argument. We can code this up as Haskell code like this. [The Details] IÕll write p (for pair) = tÕÕ(1) and remember that x=tÕ(1) is the original list of booleans. What IÕve done is try to give each term a meaningful constructor name. Each possibility for a pair of distinct boolean lists is explicitly represented. For example, if the first of the pair is empty then the second element of the pair canÕt be empty, so it must have both a head and a tail, ie. a bool and a list of bools. At the other end, if the lists have the same head then the tails must differ so we can use p recursively to encode that. The result is that we actually get a kind of compression for pairs of lists. If, as we walk down the lists together, we find they have shared tails, then we can just store that shared tail once. For lists of more general types not only do we get to share tails, but even if elements differ we can exploit sharing within those elements. We might get this with lists of lists of booleans. And if we do this with trees (binary trees say, not TRIEs) then we get explicit sharing within the tree when the two subtrees of the same node are the same. ItÕs not a complete solution to the problem of sharing, but it does give us something useful. What I find surprising here is that I didnÕt really have to think very hard to write this code. I was just doodling with elementary calculus and tried deriving the corresponding code. It was only after the fact that I realised that this code was recursively descending through my tree trying to factor out the commonality between subtrees. ItÕs weird when code writes itself like that and you have to then reverse engineer it to figure out what it does! IÕve a few more examples of those on my blog. First time through it wasnÕt totally trivial to extract the corresponding Haskell code. But we can simplify the process for more general types and make the compiler do the work for us. [More General Types, 28:02] By using the Leibniz product rule for x^(m+n) written as the product of x^m and x^n we find out how to compute the antidiagonal of the sum of two types. We can get a similar formula for the antidiagonal of the product of two types. Note how the terms correspond to the terms in the product rule for 2nd derivatives. In fact, the rule for the sum looks just like the usual binomial expansion. We get a similar rule for product types by using a second derivative version of the chain rule. So just as we can use HinzeÕs rules to reduce exponentials, we can also reduce antidiagonals. We can code this up in Haskell using multiparameter type classes. It becomes even even more easy to automate if we use a generic functional programming language or Lammel and Peyton JonesÕs ŌScrap Your BoilerplateĶ approach. BTW If you donÕt trust this nonsensical calculus you can take these rules for products and sums as defining the underlined exponent. You can then use some of the high school identities to prove by induction that they behave like antidiagonals for all types built from zero, one, sums, products and fixedpoints. We can even go further and try to find higher order antidiagonals, ie. consider the type of n distinct elements of a type. IÕll just sketch how it goes. [More General Antidiagonals, 29:39] Just as I used the Leibniz product rule and the chain rule to get a rule for sums and products, we can use the generalised product rule and the generalised chain rule (known as the Faa di Bruno formula) to get equations for reducing the generalised antidiagonals of sum and product types. (Note that n and r are naturals here!) The first one is a neat example of what is known as Umbral calculus. If we wrote it with non-underlined exponents then it would be the usual binomial formula. So underlined exponents satisfy a similar equation to non-underlined exponents. ItÕs a weird kind of ŌshadowyĶ copy of what normal exponents do, and for a long time they werenÕt well understood, hence the name Umbral. ItÕs nice to see identities from combinatorics making sense for types, though of course I havenÕt written down the actual isomorphisms, just asserted their existence by writing an equals sign. In fact, IÕm not even sure what the best language is for this - maybe you could do it in Agda or Omega. The second identity uses the Faa di Bruno formula which contains things called Bell polynomials. Everything makes sense for types because all of the Bell polynomial coefficients are positive integers. And if you donÕt know the Faa di Bruno formula, I sketched a proof of this formula on my blog just a few days ago. The proof was derived by thinking about making holes in containers of containers. So not only does this kind of combinatorial algebra and calculus have something to say about types. Thinking about derivatives as holes gives useful intuition when thinking about combinatorics, algebra and calculus! In fact Jacques Carette seems to be the person defining the approach we should take by considering what are known as species. I hope to see lots more being written on combinatorics and types in the future. And that seems like a good note to end on. I hope IÕve convinced you that all kinds of informal arguments about algebra and calculus can generate useful results. Thank you. And make the questions easy :-)