A Brief Introduction to the λ-Calculus (Part 2)

In the second part of this series on λ-calculus, we’ll explore how to perform recursion in λ-calculus and how to represent numbers in λ-calculus using Peano arithmetic. At the end of this post, I’ll show how to code an algorithm to compute the nth number in the Fibonacci sequence using λ-calculus.

A word of warning, this post contains some challenging concepts. Take it slow and make sure you understand each step before moving on to the next one.

Self application

To begin with, we’ll look at an interesting behavior that happens when you call the self_apply function with itself as an argument.

Recall that self_apply is defined as:

self_apply = λi.(i i)

Applying self_apply to itself:

(self_apply self_apply)
=> (λi.(i i) λi.(i i))
=> (λi.(i i) λj.(j j)) [α-conversion]
=> (λj.(j j) λj.(j j)) [β-reduction]
=> (self_apply self_apply)

Notice that applying α-conversion and β-reduction the statement (self_apply self_apply) results in (self_apply self_apply). No matter how many operations are applied to this expression we get the same expression as a result.

If you had a machine which could choose which operation to apply to a given λ-calculus expression and which would apply them automatically, this expression would cause an automated infinite self repeating operation. We’ll take advantage of this behavior soon.

Recursion

Recall from part 1 that we can’t perform recursion in λ-calculus by using function names inside function definitions. Let’s temporarily lift that restriction to imagine an example of what we’d like to achieve:

WARNING: What follows is not real λ-calculus! It is only presented for illustrative purposes.

f = λi.(f (G i))

Repeatedly expanding the definition of such a function would look like this:

(f x)
=> (λi.(f (G i)) x)
=> (λi.(λi'.(f (G i')) (G i)) x)
=> (λi.(λi'.(λi''.(f (G i'')) (G i')) (G i)) x)

I had to rename variables as part of these pseudo-operations because this isn’t actually valid λ-calculus.

Let’s stop the recursion there by deleting the f and see what would result if we could terminate this iteration. Again, this is not a real λ-calculus operation:

(λi.(λi'.(λi''.(G i'') (G i')) (G i)) x)
=> (λi'.(λi''.(G i'') (G i')) (G x)
=> (λi''.(G i'') (G (G x))
=> (G (G (G x)))

As you can see, this would prefix a series of calls to G applied to x.

This illustrates the kind of behavior we want, repeatedly expanding f, resulting in a series of recursive operations.

We can achieve this kind of recursion without referring to the function name in its function body by taking the next iteration of f as an argument, f’, to the function f. We’re back to real λ-calculus from now on:

f = λf'.λi.λc.(c i (f' f' (G i)))

Notice that we need to pass a copy of the f function to f as its first argument to use for further recursion. We use a selector function, c, to terminate the recursion by discarding the (f’ f’ (G i)) expression if c is true. In the case where c is false we continue the recursion by selecting the (f’ f’ (G i)) expression. We have two instances of f’ in the recursion expression so that further recursive calls to f can re-use the copy of f we already have.

When c is false, f returns a function, (f’ f’ (G i)), which takes a selector function as an argument to terminate the iteration. This way we can specify when to terminate the function by passing it a series of false and true values.

We can call the function recursively to duplicate G a finite number of times in front of x as follows. We need to call f with itself as an argument followed by its intended argument, x, and a string of Boolean selector functions:

(f f x false false true)
=> (λf'.λi.λc.(c i (f' f' (G i))) f x false false true)
=> (λi.λc.(c i (f f (G i))) x false false true)
=> (λc.(c x (f f (G x))) false false true)
=> ((false x (f f (G x))) false true)
=> ((λf'.λi.λc.(c i (f' f' (G i))) f (G x)) false true)
=> ((λc.(c (G x) (f f (G (G x)))) false true)
=> ((false (G x) (f f (G (G x)))) true)
=> ((λf'.λi.λc.(c i (f' f' (G i))) f (G (G x))) true)
=> ((λi.λc.(c i (f f (G i))) (G (G x))) true)
=> ((λc.(c (G (G x)) (f f (G (G (G x)))))) true)
=> (true (G (G x)) (f f (G (G (G x)))))
=> (G (G x))

Notice how we use the false function to continue the recursion and the true function to terminate the recursion.

Y

This kind of manual recursive function call is fine for performing recursion when we know the number of iterations we need ahead of time, but what about if we don’t know the number of iterations ahead of time? We need an automatic mechanism for performing recursion which can terminate when we encounter a terminating condition.

We saw a function, self_apply, which performs an automatic infinite operation sequence in the first section of this article. Notice in that section how the body (i i) causes the infinite repetition? Let’s modify self_apply by adding in a conditional function as an argument that can conditionally discard the repetition (i i):

half_recur = λf.λi.(f (i i))

If we pass false to this function, the (i i) self application is not evaluated:

half_recur false F G 
=> λf.λi.(f (i i)) false F G
=> λi.(false (i i)) F G
=> (false (F F)) G
=> (λfirst.λsecond.second (F F)) G
=> λsecond.second G
=> G

false terminates the automatic self application!

If we apply this function to true, we get the self apply behavior, (F F), as a result.

Applying half_recur to a function, F, we get another function:

half_recur F
=> λf.λi.(f (i i)) F
=> λi.(F (i i))

By applying this function to itself, we get the famous Y-combinator, also known as the fixed point combinator, discovered by Haskell Curry:

Y = λf.(λi.(f (i i)) λi.(f (i i)))

The Y-combinator has a very useful behavior when it’s applied to a function:

(Y F)
=> (λf.(λi.(f (i i)) λi.(f (i i))) F)
=> (λi.(F (i i)) λi.(F (i i)))
=> (λi.(F (i i)) λi'.(F (i' i'))) [α-conversion]
=> (F (λi'.(F (i' i')) λi'.(F (i' i'))))
=> (F (λi.(F (i i)) λi.(F (i i)))) [α-conversion]
== (F (Y F))

The Y-combinator duplicates the function it’s passed as a prefix to itself!

This behavior allows us to perform general recursion.

Notice that this is not necessarily an infinite recursion. If F is a conditional function, a false condition would discard the Y-combinator and terminate the recursion!

As a personal note, I think that Curry’s discovery of the Y-combinator was an act of sheer genius. It is a remarkable mechanism.

Numbers

In the late 19th century, Guiseppe Peano created a set of axioms for natural numbers which we can use to perform arithmetic in λ-calculus. (I’m going to skip over the undecidability of Peano’s arithmetic for the purpose of this article).

Peano’s axioms use a “successor” function, as well as defining the number 0 as a natural number. We’ll use the following two functions for zero and successor:

zero = identity == λi.i
successor = λn.λs.((s false) n)

This kind of numbering system, described in Greg Michaelson’s book, is an alternative to Church Numerals. Michaelson’s formulation of Peano’s arithmetic makes it easier for us to perform comparisons on numbers.

Any natural number can be created by applying the successor function a number of times to zero. For example:

one = (successor zero)
two = (successor (successor zero))
three = (successor (successor (successor zero)))

Every time we apply successor to a number, it creates a pair of arguments false and n to a function s. We can define the natural numbers like so:

one = successor zero
=> λn.λs.((s false) n) zero
=> λs.((s false) zero)

two = successor one == successor successor zero
=> λn.λs.((s false) n) λn.λs.((s false) n) zero
=> λn.λs.((s false) n) λs.((s false) zero)
=> λn.λs.((s false) n) λs'.((s' false) zero)
=> λs.((s false) λs'.((s' false) zero))

and so on...

Notice that all numbers are functions which take a selector function as an argument.

We choose to define numbers this way because in the case where we apply a number to true, the result will be false, unless the number is zero, in which case we’ll get true as a result:

zero true
=> λi.i true
=> true

one true
=> λs.((s false) zero) true
=> ((true false) zero)
=> ((λfirst.λsecond.first false) zero)
=> (λsecond.false zero)
=> false

This is because true has the same behavior as select_first and the first argument which will be passed to the selector for any non-zero number will always be false because of how we defined successor (successor always puts false as the first argument to the selector).

This behavior allows us to create an is_zero function:

is_zero = λn.(n true)

is_zero applied to zero will evaluate to true, and if applied to any other natural number, it will evaluate to false.

Since we defined natural numbers as recursive applications of the successor function, we can discard one application of the successor function to get the predecessor of a number. We’ve previously seen a function which will discard the first argument in a pair of functions, the select_second function (a.k.a false).

Here’s what happens when we apply a non-zero natural number (successor X) to false:

((successor X) false)
=> ((λn.λs.((s false) n) X) false)
=> (λs.((s false) X) false)
=> ((false false) X)
=> ((λfirst.λsecond.second false) X)
=> (λsecond.second X)
=> X

By applying the number to false, we’ve stripped the successor function prefix from X!

We can define a predecessor function like this:

predecessor = λn.((is_zero n) n (n false))

We need to handle zero as a special case, so we use is_zero in this function.

Addition

In order to add two numbers to one another, we’ll take the approach of incrementing one number while decrementing the other:

add_iter = λadd_iter'.λx.λy.((is_zero y) x (add_iter' (successor x) (predecessor y)))

We can then define the add function by calling this increment/decrement function recursively with the Y-combinator:

add = Y add_iter

Let’s try adding 1+1.

This next section is pretty long, but it demonstrates that addition can work just by applying the four operations of λ-calculus:

add one one
=> (Y add_iter) one one
=> λf.(λi.(f (i i)) λi.(f (i i))) add_iter one one
=> (λi.(add_iter (i i)) λi’.(add_iter (i’ i’))) one one
=> add_iter (λi’.(add_iter (i’ i’)) λi’.(add_iter (i’ i’))) one one

Replacing (λi’.(add_iter (i’ i’)) λi’.(add_iter (i’ i’))) by (Y add_iter)

=> add_iter (Y add_iter) one one
=> λadd_iter'.λx.λy.((is_zero y) x (add_iter' (successor x) (predecessor y))) (Y add_iter)) one one
=> λx.λy.((is_zero y) x ((Y add_iter) (successor x) (predecessor y))) one one
=> λy.((is_zero y) one ((Y add_iter) (successor one) (predecessor y))) one
=> ((is_zero one) one ((Y add_iter) (successor one) (predecessor one)))
=> (false one ((Y add_iter) (successor one) (predecessor one)))
=> (false one ((Y add_iter) (successor one) (predecessor one)))
=> ((Y add_iter) (successor one) (predecessor one))

Replacing successor one by two and predecessor one by zero

=> (Y add_iter) two zero
=> (λf.(λi.(f (i i)) λi.(f (i i))) add_iter add_iter) two zero
=> ((λi.(add_iter (i i)) λi.(add_iter (i i))) add_iter) two zero
=> ((λi.(add_iter (i i)) λi’.(add_iter (i’ i’)))) two zero
=> add_iter (λi’.(add_iter (i’ i’)) λi’.(add_iter (i’ i’))) two zero

Replacing (λi’.(add_iter (i’ i’)) λi’.(add_iter (i’ i’))) by (Y add_iter)

=> add_iter (Y add_iter) two zero
=> λadd_iter'.λx.λy.((is_zero y) x (add_iter' (successor x) (predecessor y))) (Y add_iter) two zero
=> λx.λy.((is_zero y) x ((Y add_iter) (successor x) (predecessor y))) two zero
=> λy.((is_zero y) two ((Y add_iter) (successor two) (predecessor y))) zero
=> ((is_zero zero) two ((Y add_iter) (successor two) (predecessor zero)))
=> (true two ((Y add_iter) (successor two) (predecessor zero)))
=> two

Yes, I know that took a lot of work; but it is a valid arithmetic procedure which is performed using nothing but functions!

Fibonacci

Now that we have numbers, arithmetic, and recursion, it’s trivial to define a function to calculate the nth number in the Fibonacci sequence:

fib_iter = λfib_iter'.
λn.((is_zero n)
zero
((is_zero (predecessor n))
one
(add
(fib_iter' (predecessor n))
(fib_iter' (predecessor (predecessor n))))))

fib = Y fib_iter

Conclusion

That’s essentially all we need to use λ-calculus to perform general computations on the natural numbers. As Countess Ada Lovelace famously realized, any machine which can perform general numerical computations is sufficiently powerful to perform computations on any kind of data.

There’s a lot more to λ-calculus than what I’ve discussed in these posts, but this should be enough to give you the tools you need to continue exploring λ-calculus.

References

An Introduction to Functional Programming Through Lambda Calculus by Greg Michaelson

Lambda Calculus at Wikipedia

The Lambda Calculus at Stanford Encyclopedia of Philosophy

Haskell theoretical foundations – Lambda calculus

Normal, Applicative and Lazy Evaluation

Advertisement

A Brief Introduction to the λ-Calculus (Part 1)

In this post, I’ll be discussing the untyped λ-calculus (lambda calculus). λ-calculus forms the basis of all functional programming languages and is one of the three theoretical models of computing, the other two being Turing Machines and Recursive Function Theory.

Alonso Church created λ-calculus in the 1930s as a formal system of mathematical logic for computation based on function abstraction and application. Church envisioned a simple language which contains only functions. λ-calculus doesn’t even have Boolean values or numbers. We’ll explore how to represent Boolean values using only functions below and we’ll cover representing numbers using only functions in a later post.

The most important distinction between imperative languages and functional languages is how they perform abstraction. In an imperative language, abstraction is performed by assigning names to variables which can change value over time. This is similar to how the parts of a Turing Machine can change over time; e.g. the tape of the machine can move. In functional programming languages, abstraction is performed by assigning names to values and functions which never change and computing new values by applying functions to values. In λ-calculus, functions are immutable values which we can name.

For example, here is the identity function:

λi.i

The function starts with the λ symbol, followed by a name, representing the argument of the function. A period separates the argument of the function from the body of the function. The body of the function can be any λ-calculus expression. Names in λ-calculus can be any string of characters except spaces, parentheses, . and λ.

For convenience, we can give names to our functions (technically this is an extension of the λ-calculus). Let’s define a few example functions so you can get used to the syntax:

identity = λi.i
self_application = λi.ii
apply = λx.(λy.xy)

Don’t worry too much about what these do just yet, you’ll probably be able to figure out what they do after you learn about β-reduction below.

Notice that the body of a function can also contain other functions. Functions are first class in λ-calculus, just like in any other functional programming language!

In the identity function above, i is used to form an abstraction. In the function, i can refer to anything until the function is specialized by application. For example, suppose we have a name, G, we could apply the identity function to the name G to specialize the function. We indicate that we want to apply the function by placing it in front of the argument G in parentheses:

(λi.i G)

In order to actually apply the value, we replace the instances of i in the function body with the argument, G. In this case, we get the result G:

=> G

The process we performed above is an operation. We use => to indicate that an operation was performed on (λi.i G) to compute G. We could also write the steps above like this:

(λi.i G) => G

Operations advance an algorithm to its next step, eventually resulting in a solution. In imperative languages, there are many operations, each of which alter the value of variables or the program counter. In λ-calculus there are only four operations:

  • λ-abstraction (lambda abstraction)
  • β-reduction (beta reduction)
  • α-conversion (alpha conversion)
  • η-conversion (eta conversion)

It’s important to note that naming functions is not an operation. Function names are assigned statically, they can not be used inside their own function definition. They are simply an alias for the function and can’t be used for recursion.

Operations in the λ-calculus

λ-abstraction

λ-abstraction is simply the introduction of a lambda function. For example, we could create a new function using the λ-abstraction operation:

λx.xy

Also, notice that there are two names, x and y, in the body of the function but only x appears in the function argument. This is a valid λ-calculus expression.

We say that the name x is bound and the name y is free. A function which has no free variables is called a combinator and a function with at least one free variable is called a closure.

Sometimes the parentheses are excluded from nested functions like this:

apply = λx.λy.xy

In this case, you can always add parentheses by following the rule that lambda abstractions are right-associative.

β-reduction

β-reduction is process of applying a function to a value, as we saw above:

(λi.ij G) => Gj

For convenience, the parentheses are often excluded. If they are not used, you can always add them by using the rule that function applications are left-associative.

α-conversion

α-conversion allows us to rename an argument to avoid a name collision. To do this, we choose a different name for the argument and replace the old name with the new one wherever it exists in the function body:

λi.ij => λp.pj

For example, the two instances of j in this expression refer to different values, so we need to rename them using α-conversion before applying β-reduction:

(λi.iji λj.j) => (λi.iji λx.x) => (λx.x jλx.x) => jλx.x

η-conversion

η-conversion is used when the argument of a function only appears as the last term of the function body. In this case, the function can be simplified to remove the argument:

λi.RWCi => RWC

Applicative versus normal order reduction

Unlike in imperative languages, in λ-calculus, the order in which these operations are performed is undefined. For example when a function has an argument which contains an expression which could be simplified by β-reduction, we can choose which β-reduction to apply first. Consider the following expression:

(λi.ij (λx.xy G))

Here we apply β-reduction to the argument first. This is called applicative order reduction:

(λi.ij (λx.xy G)) => (λi.ij Gy) => Gyj

Here we apply β-reduction to the left-most function first. This is called normal order reduction:

(λi.ij (λx.xy G)) => (λx.xy G)j => Gyj

Church and Rosser showed that all evaluation orders of an expression in λ-calculus result in the same value.

This property of λ-calculus and functional programming languages in general is called execution order independence. Execution order independence enables the parallel execution of many lambda calculus expressions.

Making decisions

In order to model general computation, we need a way to choose from two alternatives. In order to do this, we introduce the select_first and select_second functions:

select_first = λfirst.λsecond.first
select_second = λfirst.λsecond.second

select_first consumes one argument, and then a second, and discards the second argument:

select_first A B
=> ((λfirst.λsecond.first A) B
=> (λsecond.A B)
=> A

select_second chooses the second argument:

select_second A B
=> ((λfirst.λsecond.second A) B
=> (λsecond.second B)
=> B

Consider the structure of an if-then statement in an imperative language.

if condition then A else B

If the condition is true, then we want to evaluate the first expression, A, and if the condition is false, then we want to evaluate the second expression, B.

But select_first has the behavior of evaluating the first expression!

select_first A B => A

We can rename select_first to true:

true A B => A

This performs a behavior equivalent to:

if true then A else B

We can similarly use select_second to represent false:

false A B => B

This performs a behavior equivalent to:

if false then A else B

Cond

Then the whole if-then statement can be expressed as a function cond that takes another function, c, which chooses either the expression a or the expression b:

cond = λa.λb.λc.((c a) b)

Let’s evaluate cond with true as an argument:

cond A B true
=> λa.λb.λc.((c a) b) A B λfirst.λsecond.first
=> λb.λc.((c A) b) B λfirst.λsecond.first
=> λc.((c A) B) λfirst.λsecond.first
=> ((λfirst.λsecond.first A) B)
=> (λsecond.A B)
=> A

It’s easy to see how passing false to cond will produce the desired behavior of evaluating to B.

Not

Not can be expressed as:

if condition then false else true

So we can simply apply false and true to cond to get a definition for not:

not = λx.(((cond false) true) x)

Applying not to true, we get:

not true
=> λx.(((cond false) true) x) true
=> (((cond false) true) true)
=> λa.λb.λc.((c a) b) false true true
=> λb.λc.((c false) b) true true
=> λc.((c false) true) true
=> ((true false) true)
=> ((λfirst.λsecond.first false) true)
=> (λsecond.false true)
=> false

And

And can be expressed as:

if A then B else false

If A is true then the expression is true if B is true, otherwise it’s false. If A is false, then it doesn’t matter what value B is, the expression is false.

and = λx.λy.((x y) false)

Applying true and true to this function we get:

and true true
=> ((λx.λy.((x y) false) true) true)
=> (λy.((true y) false) true)
=> ((true true) false)
=> ((λfirst.λsecond.first true) false)
=> (λsecond.true false)
=> true

Or

Or can be expressed as:

if A then true else B

If A is true then it doesn’t matter what B is, the expression is true. If A is false, then the expression is true if B is true, otherwise it’s false.

or = λx.λy.(((cond true) y) x)

Let’s apply false and true to or:

or false true
=> ((λx.λy.(((cond true) y) x) false) true)
=> (λy.(((cond true) y) false) true)
=> (((cond true) true) false)
=> (((λa.λb.λc.((c a) b) true) true) false)
=> ((λb.λc.((c true) b) true) false)
=> (λc.((c true) true) false)
=> ((false true) true)
=> ((λfirst.λsecond.second true) true)
=> (λsecond.second true)
=> true

Conclusion

At this point, you have a good enough grasp of the concepts of λ-calculus to make simple, non-recursive functions which act on Boolean values.

Next time we’ll cover how to represent numbers in the λ-calculus and how to perform recursion using the Y-combinator!

You can continue reading part 2 of this series here.

A side note on computability

A function which can be evaluated in the λ-calculus with a finite number of operations is called λ-computable. All λ-computable functions are computable on a Turing Machine and all Turing computable functions are λ-computable (See Church-Turing thesis). As a result, Turing Machines and the λ-calculus are equivalent in terms of what kinds of functions they can evaluate. Because of this equivalence, any problem which can be solved efficiently with an imperative programming language can also be solved efficiently with a functional programming language.

Formal syntax of λ-calculus

<expression> := <name> | <function> | <application>
<function> := λ<name>.<expression>
<application> := (<function> <expression>)

References

An Introduction to Functional Programming Through Lambda Calculus by Greg Michaelson

Lambda Calculus at Wikipedia

The Lambda Calculus at Stanford Encyclopedia of Philosophy

Haskell theoretical foundations – Lambda calculus

Normal, Applicative and Lazy Evaluation