Foundations of Computer Science Lecture Notes
ENGR 3520, Fall 2013
Thursday, Nov 21, 2013

λ-calculus

λ-terms. A λ-term is either:
  - a variable x, y, z, ...
  - λx.M   (where x is a variable and M a λ-term)
  - M N    (where M, N are λ-terms)
  - (M)    (where M is a λ-term)

Examples:  x    λx.x    λy.(λx.x)    λx.(x (λy.y))

Conventions: Application associates to the left, so that M N P should be
understood as (M N) P. The body of a function λx.M goes as far to the right
as possible. Thus, λx.λy.x should be understood as λx.(λy.x). The scope of λx
in λx.M is all of M.

A variable x is free in a λ-term if some use of x does not appear in the
scope of a λx. Examples: y is free in λx.y; x is free in λy.(x (λx.x)); z is
not free in λz.(λx.z).

Substitution. The substitution M[x := N] replaces every free occurrence of x
in M by N:

  x[x := N] = N
  y[x := N] = y                                 (if x ≠ y)
  (M₁ M₂)[x := N] = M₁[x := N] M₂[x := N]
  (λx.M)[x := N] = λx.M
  (λy.M)[x := N] = λy.(M[x := N])               (if y is not free in N)

We can always rename bound variables to avoid conflict:
  λx.M = λy.(M[x := y])    if y is not free in M.

(There is a question of what happens when the substitution rules do not
apply. For instance, (λx.y)[y := x] is not defined, because x, the variable
parameter in λx.y, is free in x, the term being substituted for y. One
possibility is to return an error. Another possibility is to rename the
problematic variable parameter, getting (λz.y)[y := x], which by the
substitution rules above gives λz.x. The latter approach is often called
capture-avoiding substitution.)

Reduction Rules. Reduction (or evaluation) M → N is defined by:

  (λx.M) N → M[x := N]
  M P → N P       (if M → N)
  P M → P N       (if M → N)
  λx.M → λx.N     (if M → N)

A term of the form (λx.M) N is called a redex. It is the one place where a
reduction does any work.

Examples:

  (λx.x) (λy.y) → x[x := λy.y] = λy.y

  ((λx.(λy.x)) z₁) z₂ → (λy.x)[x := z₁] z₂
                      = (λy.z₁) z₂
                      → z₁[y := z₂] = z₁

  ((λx.(λy.y)) (λz.z)) (λx.(λy.x)) → (λy.y)[x := λz.z] (λx.(λy.x))
                                   = (λy.y) (λx.(λy.x))
                                   → y[y := λx.(λy.x)] = λx.(λy.x)
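The rules above can be turned into a short program. The following is a
minimal sketch, not part of the notes: it represents λ-terms as tagged Python
tuples (a representation I chose for brevity) and implements capture-avoiding
substitution and a single leftmost β-reduction step.

```python
# Sketch of λ-terms and capture-avoiding substitution (representation and
# helper names are mine, not the notes'). Terms are tuples:
#   ("var", name)    ("lam", name, body)    ("app", fun, arg)
import itertools

fresh = ("z%d" % i for i in itertools.count())  # supply of fresh variable names

def free_vars(t):
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, n):
    """Capture-avoiding M[x := N]."""
    kind = t[0]
    if kind == "var":
        return n if t[1] == x else t
    if kind == "app":
        return ("app", subst(t[1], x, n), subst(t[2], x, n))
    y, body = t[1], t[2]
    if y == x:                    # (λx.M)[x := N] = λx.M
        return t
    if y in free_vars(n):         # rename the bound variable to avoid capture
        z = next(fresh)
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, n))

def beta(t):
    """One leftmost β-reduction step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":         # the redex case: (λx.M) N → M[x := N]
            return subst(f[2], f[1], a)
        r = beta(f)
        if r is not None:
            return ("app", r, a)
        r = beta(a)
        if r is not None:
            return ("app", f, r)
    elif t[0] == "lam":
        r = beta(t[2])
        if r is not None:
            return ("lam", t[1], r)
    return None

# (λx.x) (λy.y) → λy.y
I = ("lam", "x", ("var", "x"))
print(beta(("app", I, ("lam", "y", ("var", "y")))))  # ('lam', 'y', ('var', 'y'))
```

Note how `subst` renames the bound variable whenever it is free in the term
being substituted in, exactly the capture-avoiding strategy described above.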
The λ-term ((λx.(λy.x)) z₁) z₂ could be written more simply as
(λx.λy.x) z₁ z₂ using the conventions above. Similarly,
((λx.(λy.y)) (λz.z)) (λx.(λy.x)) could be written
(λx.λy.y) (λz.z) (λx.λy.x). I will use the conventions fully from now on.

I will generally expand out substitutions immediately in reductions, skipping
the intermediate step of showing what the substitution looks like.

A λ-term is in normal form if there is no applicable reduction. Not every
λ-term eventually reduces to a normal form:

  (λx.x x) (λx.x x) → (λx.x x) (λx.x x) → (λx.x x) (λx.x x) → ...

There can be more than one redex in a λ-term, meaning that there may be more
than one applicable reduction. For instance, ((λx.x) (λy.x)) ((λx.λy.x) z₁ z₂).
A property of the λ-calculus is that all the ways to reduce a term to a
normal form yield the same normal form (up to renaming of bound variables).
This is called the Church–Rosser property. It says that the order in which we
perform reductions to reach a normal form is not important.

Encoding Booleans. Even though the λ-calculus only has variables and
functions, we can encode traditional data types within it.

Boolean values encoding (à la Church):

  true  = λx.λy.x
  false = λx.λy.y

In what sense are these encodings of Boolean values? Booleans are useful
because they allow you to select one branch or the other of a conditional
expression. The trick is that when B reduces to either true or false, then
B M N reduces either to M or to N, respectively:
If B →* true, then

  B M N →* true M N = (λx.λy.x) M N → (λy.M) N → M

while if B →* false, then

  B M N →* false M N = (λx.λy.y) M N → (λy.y) N → N

We can also define an explicit if expression:

  if = λc.λx.λy.c x y

so that B M N could also be written if B M N.

We can define logical operators:

  and = λm.λn.m n m
  or  = λm.λn.m m n
  not = λm.λx.λy.m y x

Thus:

  and true false = (λm.λn.m n m) true false
                 → (λn.true n true) false
                 → true false true
                 = (λx.λy.x) false true
                 → (λy.false) true
                 → false

  not false = (λm.λx.λy.m y x) false
            → λx.λy.false y x
            = λx.λy.(λu.λv.v) y x
            → λx.λy.(λv.v) x
            → λx.λy.x
            = true
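Since Python has first-class functions, the Boolean encodings above can be
transcribed almost verbatim. This is a sketch; the uppercase names and the
`to_bool` decoder are mine, not part of the notes.

```python
# Church Booleans as Python lambdas.
TRUE  = lambda x: lambda y: x                  # λx.λy.x
FALSE = lambda x: lambda y: y                  # λx.λy.y

AND = lambda m: lambda n: m(n)(m)              # λm.λn.m n m
OR  = lambda m: lambda n: m(m)(n)              # λm.λn.m m n
NOT = lambda m: lambda x: lambda y: m(y)(x)    # λm.λx.λy.m y x

def to_bool(b):
    """Decode a Church Boolean by using it to select between True and False."""
    return b(True)(False)

print(to_bool(AND(TRUE)(FALSE)))   # False
print(to_bool(NOT(FALSE)))         # True
```

The decoder works precisely because of the selection property above: applying
an encoded Boolean to two arguments returns the first for true, the second
for false.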
Encoding Natural Numbers.

  0 = λf.λx.x
  1 = λf.λx.f x
  2 = λf.λx.f (f x)
  3 = λf.λx.f (f (f x))
  4 = ...

In general, natural number n is encoded as λf.λx.fⁿ x.

Successor operation:

  succ = λn.λf.λx.(n f) (f x)

  succ 1 = (λn.λf.λx.(n f) (f x)) (λf.λx.f x)
         → λf.λx.((λf.λx.f x) f) (f x)
         → λf.λx.(λx.f x) (f x)
         → λf.λx.f (f x)
         = 2

Other operations:

  plus    = λm.λn.λf.λx.(m f) (n f x)
  times   = λm.λn.λf.λx.m (n f) x
  iszero? = λn.n (λx.false) true

  plus 1 2 = (λm.λn.λf.λx.(m f) (n f x)) 1 2
           → (λn.λf.λx.(1 f) (n f x)) 2
           → λf.λx.(1 f) (2 f x)
           = λf.λx.((λf.λx.f x) f) ((λf.λx.f (f x)) f x)
           → λf.λx.(λx.f x) ((λf.λx.f (f x)) f x)
           → λf.λx.f ((λf.λx.f (f x)) f x)
           → λf.λx.f ((λx.f (f x)) x)
           → λf.λx.f (f (f x))
           = 3
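The numeral encodings also transcribe directly into Python. A sketch (the
uppercase names and the `to_int` decoder are mine): a numeral n applies f
exactly n times, so decoding is just counting applications of an increment
function.

```python
# Church numerals as Python lambdas, following the definitions above.
ZERO  = lambda f: lambda x: x                               # λf.λx.x
SUCC  = lambda n: lambda f: lambda x: n(f)(f(x))            # λn.λf.λx.(n f) (f x)
PLUS  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
TIMES = lambda m: lambda n: lambda f: lambda x: m(n(f))(x)

def to_int(n):
    """Decode a Church numeral by counting how many times it applies f."""
    return n(lambda k: k + 1)(0)

ONE   = SUCC(ZERO)
TWO   = SUCC(ONE)
THREE = SUCC(TWO)

print(to_int(PLUS(ONE)(TWO)))     # 3
print(to_int(TIMES(TWO)(THREE)))  # 6
```

Note that this `succ` builds n+1 by applying (n f) to (f x), i.e. f n more
times on top of one application, matching the succ 1 → 2 derivation above.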
  times 2 3 = (λm.λn.λf.λx.m (n f) x) 2 3
            → (λn.λf.λx.2 (n f) x) 3
            → λf.λx.2 (3 f) x
            = λf.λx.(λf.λx.f (f x)) ((λf.λx.f (f (f x))) f) x
            → λf.λx.(λf.λx.f (f x)) (λx.f (f (f x))) x
            → λf.λx.(λx.(λx.f (f (f x))) ((λx.f (f (f x))) x)) x
            → λf.λx.(λx.(λx.f (f (f x))) (f (f (f x)))) x
            → λf.λx.(λx.f (f (f (f (f (f x)))))) x
            → λf.λx.f (f (f (f (f (f x)))))
            = 6

  iszero? 0 = (λn.n (λx.false) true) (λf.λx.x)
            → (λf.λx.x) (λx.false) true
            → (λx.x) true
            → true

  iszero? 2 = (λn.n (λx.false) true) (λf.λx.f (f x))
            → (λf.λx.f (f x)) (λx.false) true
            → (λx.(λx.false) ((λx.false) x)) true
            → (λx.false) ((λx.false) true)
            → false

More difficult is defining a predecessor function, taking a nonzero natural
number n and returning n − 1. There are several ways of defining such a
function; here is probably the simplest:

  pred = λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)

  pred 2 = (λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)) (λf.λx.f (f x))
         → λf.λx.(λf.λx.f (f x)) (λg.λh.h (g f)) (λu.x) (λu.u)
         → λf.λx.(λx.(λg.λh.h (g f)) ((λg.λh.h (g f)) x)) (λu.x) (λu.u)
         → λf.λx.(λg.λh.h (g f)) ((λg.λh.h (g f)) (λu.x)) (λu.u)
         → λf.λx.(λg.λh.h (g f)) (λh.h ((λu.x) f)) (λu.u)
         → λf.λx.(λg.λh.h (g f)) (λh.h x) (λu.u)
         → λf.λx.(λh.h ((λh.h x) f)) (λu.u)
         → λf.λx.(λh.h (f x)) (λu.u)
         → λf.λx.(λu.u) (f x)
         → λf.λx.f x
         = 1

Note that pred 0 is just 0:

  pred 0 = (λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)) (λf.λx.x)
         → λf.λx.(λf.λx.x) (λg.λh.h (g f)) (λu.x) (λu.u)
         → λf.λx.(λx.x) (λu.x) (λu.u)
         → λf.λx.(λu.x) (λu.u)
         → λf.λx.x
         = 0

Encoding Pairs.

  pair   = λx.λy.λz.z x y
  first  = λp.p (λx.λy.x)
  second = λp.p (λx.λy.y)

  first (pair 1 2) = (λp.p (λx.λy.x)) ((λx.λy.λz.z x y) 1 2)
                   → (λp.p (λx.λy.x)) ((λy.λz.z 1 y) 2)
                   → (λp.p (λx.λy.x)) (λz.z 1 2)
                   → (λz.z 1 2) (λx.λy.x)
                   → (λx.λy.x) 1 2
                   → (λy.1) 2
                   → 1

  second (pair 1 2) = (λp.p (λx.λy.y)) ((λx.λy.λz.z x y) 1 2)
                    → (λp.p (λx.λy.y)) ((λy.λz.z 1 y) 2)
                    → (λp.p (λx.λy.y)) (λz.z 1 2)
                    → (λz.z 1 2) (λx.λy.y)
                    → (λx.λy.y) 1 2
                    → (λy.y) 2
                    → 2

Recursion. With conditionals and basic data types, we are very close to
having a Turing-complete programming language (that is, one that can simulate
Turing machines). All that is missing is a way to do iteration: loops. It
turns out we can write recursive functions in the λ-calculus, which gives us
loops.

Consider factorial. Intuitively, we would like to define fact by

  fact = λn.(iszero? n) 1 (times n (fact (pred n)))

but this is not a valid definition, since the right-hand side refers to the
term being defined. It is really an equation, the same way x = 3x is an
equation. Consider that equation, x = 3x. Define F(t) = 3t. Then, a solution
of x = 3x is really a fixed point of F, namely, a value t₀ for which
F(t₀) = t₀. And F has only one fixed point, namely t₀ = 0, which gives us the
one solution to x = 3x, namely x = 0.

Similarly, if we define

  F_fact = λf.λn.(iszero? n) 1 (times n (f (pred n)))

then we see that the definition we're looking for is a fixed point of F_fact,
namely, a term f such that F_fact f = f. Indeed, if we have such a term, then:

  f 3 = F_fact f 3
      = (λf.λn.(iszero? n) 1 (times n (f (pred n)))) f 3
      → (λn.(iszero? n) 1 (times n (f (pred n)))) 3
      → (iszero? 3) 1 (times 3 (f (pred 3)))
      →* times 3 (f (pred 3))
      →* times 3 (f 2)
      = times 3 (F_fact f 2)
      →* times 3 (times 2 (f 1))
      = times 3 (times 2 (F_fact f 1))
      →* times 3 (times 2 (times 1 (f 0)))
      = times 3 (times 2 (times 1 (F_fact f 0)))
      →* times 3 (times 2 (times 1 1))
      →* 6

(The notation →* indicates that one or more reductions have taken place.)

Thus, what we need is a way to find fixed points in the λ-calculus. The
following function does just that:

  Y = λf.(λx.f (x x)) (λx.f (x x))

Y F gives us a fixed point of F. (Technically, Y F reduces in one step to a
fixed point of F):

  Y F = (λf.(λx.f (x x)) (λx.f (x x))) F
      → (λx.F (x x)) (λx.F (x x))
      → F ((λx.F (x x)) (λx.F (x x)))
      → F (F ((λx.F (x x)) (λx.F (x x))))
      → ...

So indeed, (λx.F (x x)) (λx.F (x x)) is a fixed point of F. We can use Y to
define our factorial function:

  fact = Y F_fact

and we can check:

  fact 3 = Y F_fact 3
         = (λf.(λx.f (x x)) (λx.f (x x))) F_fact 3
         → (λx.F_fact (x x)) (λx.F_fact (x x)) 3
         → F_fact ((λx.F_fact (x x)) (λx.F_fact (x x))) 3
         = F_fact fact′ 3
         = (λf.λn.(iszero? n) 1 (times n (f (pred n)))) fact′ 3
         → (λn.(iszero? n) 1 (times n (fact′ (pred n)))) 3
         → (iszero? 3) 1 (times 3 (fact′ (pred 3)))
         →* times 3 (fact′ (pred 3))
         →* times 3 (fact′ 2)
         = times 3 (F_fact fact′ 2)
         →* times 3 (times 2 (fact′ 1))
         = times 3 (times 2 (F_fact fact′ 1))
         →* times 3 (times 2 (times 1 (fact′ 0)))
         = times 3 (times 2 (times 1 (F_fact fact′ 0)))
         →* times 3 (times 2 (times 1 1))
         →* 6

where fact′ = (λx.F_fact (x x)) (λx.F_fact (x x)) is the fixed point of
F_fact, such that fact′ = F_fact fact′.
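The fixed-point construction can also be tried in Python, with one caveat (my
observation, not from the notes): Python evaluates arguments eagerly, so the
plain Y as written above would loop forever. The standard workaround is the
call-by-value variant Z, which η-expands the self-application so it is only
unfolded when applied. This sketch uses Python numbers and conditionals in
place of the Church encodings.

```python
# Z, a call-by-value fixed-point combinator: an η-expanded Y, delaying the
# self-application x x behind a lambda so eager Python does not diverge.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# F_fact from the notes, with native numbers standing in for Church numerals.
F_fact = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

fact = Z(F_fact)
print(fact(3))   # 6
print(fact(5))   # 120
```

As in the λ-calculus derivation, `fact` is a fixed point of `F_fact`: each
call unfolds one more layer of `F_fact` until the base case is reached.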