The *Lisp 1 Programmer's Manual* appeared in 1960, and the
*Lisp 1.5 Programmer's Manual* (McCarthy 1965) was published
in 1962. The early history of Lisp is described in McCarthy 1978.

The two dialects in which most major Lisp programs of the 1970s were written are MacLisp (Moon 1978; Pitman 1983), developed at the MIT Project MAC, and Interlisp (Teitelman 1974), developed at Bolt Beranek and Newman Inc. and the Xerox Palo Alto Research Center. Portable Standard Lisp (Hearn 1969; Griss 1981) was a Lisp dialect designed to be easily portable between different machines. MacLisp spawned a number of subdialects, such as Franz Lisp, which was developed at the University of California at Berkeley, and Zetalisp (Moon 1981), which was based on a special-purpose processor designed at the MIT Artificial Intelligence Laboratory to run Lisp very efficiently. The Lisp dialect used in this book, called Scheme (Steele 1975), was invented in 1975 by Guy Lewis Steele Jr. and Gerald Jay Sussman of the MIT Artificial Intelligence Laboratory and later reimplemented for instructional use at MIT. Scheme became an IEEE standard in 1990 (IEEE 1990). The Common Lisp dialect (Steele 1982, Steele 1990) was developed by the Lisp community to combine features from the earlier Lisp dialects to make an industrial standard for Lisp. Common Lisp became an ANSI standard in 1994 (ANSI 1994).

The characterization of numbers as "simple data" is a barefaced bluff.
In fact, the treatment of numbers is one of the trickiest and most
confusing aspects of any programming language. Some typical issues
involved are these: Some computer systems distinguish *integers*,
such as 2, from *real numbers*, such as 2.71.
Is the real number 2.00 different from the integer 2? Are the
arithmetic operations used for integers the same as the operations
used for real numbers? Does 6 divided by 2 produce 3, or 3.0? How
large a number can we represent? How many decimal places of accuracy
can we represent? Is the range of integers the same as the range of
real numbers? Above and beyond these questions, of course, lies a
collection of issues concerning roundoff and truncation errors: the
entire science of numerical analysis. Since our focus in this
book is on large-scale program design rather than on numerical
techniques, we are going to ignore these problems. The numerical
examples in this chapter will exhibit the usual roundoff behavior that
one observes when using arithmetic operations that preserve a limited
number of decimal places of accuracy in noninteger operations.

Throughout this book, when we wish to emphasize the distinction between the input typed by the user and the response printed by the interpreter, we will show the latter in slanted characters.

Lisp systems typically provide features to aid the user in formatting expressions. Two especially useful features are one that automatically indents to the proper pretty-print position whenever a new line is started and one that highlights the matching left parenthesis whenever a right parenthesis is typed.

Lisp obeys the convention that every expression has a value. This convention, together with the old reputation of Lisp as an inefficient language, is the source of the quip by Alan Perlis (paraphrasing Oscar Wilde) that "Lisp programmers know the value of everything but the cost of nothing."

In this book, we do not show the interpreter's response to evaluating definitions, since this is highly implementation-dependent.

Chapter 3 will show that this notion of environment is crucial, both for understanding how the interpreter works and for implementing interpreters.

It may seem strange that the evaluation rule says, as part of the
first step, that we should evaluate the leftmost element of a
combination, since at this point that can only be an operator such as
`+` or `*` representing a built-in primitive procedure
such as addition or multiplication. We will see later that it is
useful to be able to work with combinations whose operators are
themselves compound expressions.

Special syntactic forms that are simply convenient alternative surface
structures for things that can be written in more uniform ways are
sometimes called *syntactic sugar*, to use a phrase coined by
Peter Landin. In comparison with users of other languages, Lisp
programmers, as a rule, are less concerned with matters of syntax.
(By contrast, examine any Pascal manual and notice how much of it is
devoted to descriptions of syntax.) This disdain for syntax is due
partly to the flexibility of Lisp, which makes it easy to change
surface syntax, and partly to the observation that many "convenient"
syntactic constructs, which make the language less uniform, end up
causing more trouble than they are worth when programs become large
and complex. In the words of Alan Perlis, "Syntactic sugar causes
cancer of the semicolon."

Observe that there are two different operations being combined here:
we are creating the procedure, and we are giving it the name
`square`. It is possible, indeed important, to be able to separate
these two notions to create procedures without naming them, and
to give names to procedures that have already been created. We will
see how to do this in section 1.3.2.
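As a preview of what section 1.3.2 will make precise, the two notions can be separated with `lambda`; this is only a sketch of the idea:

```scheme
;; Creating a procedure without naming it, and applying it directly:
((lambda (x) (* x x)) 5)    ; => 25

;; Giving a name to a procedure created separately:
(define square (lambda (x) (* x x)))
(square 5)                  ; => 25
```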

Throughout this book, we will describe the general syntax of expressions by using italic symbols delimited by angle brackets (e.g., *<name>*) to denote the "slots" in the expression to be filled in when such an expression is actually used.

More generally, the body of the procedure can be a sequence of expressions. In this case, the interpreter evaluates each expression in the sequence in turn and returns the value of the final expression as the value of the procedure application.

Despite the simplicity of the substitution idea, it turns out to be
surprisingly complicated to give a rigorous mathematical definition of
the substitution process. The problem arises from the possibility of
confusion between the names used for the formal parameters of a
procedure and the (possibly identical) names used in the expressions
to which the procedure may be applied. Indeed, there is a long history
of erroneous definitions of *substitution* in the literature of
logic and programming semantics. See Stoy 1977 for a careful
discussion of substitution.

In chapter 3 we will introduce *stream processing*, which is a
way of handling apparently "infinite" data structures by incorporating
a limited form of normal-order evaluation. In section 4.2 we will
modify the Scheme interpreter to produce a normal-order variant of
Scheme.

"Interpreted as either true or false" means this: In Scheme, there are
two distinguished values that are denoted by the constants `#t`
and `#f`. When the interpreter checks a predicate's
value, it interprets `#f` as false. Any other value is
treated as true. (Thus, providing `#t` is logically
unnecessary, but it is convenient.) In this book we will use names
`true` and `false`, which are associated with the
values `#t` and `#f` respectively.
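For instance, values that might look "false-ish" in other languages count as true in Scheme; a brief sketch (the quoted symbols are just convenient return values):

```scheme
(if 0 'yes 'no)     ; => yes, since 0 is not #f
(if '() 'yes 'no)   ; => yes, since the empty list is not #f
(if #f 'yes 'no)    ; => no
```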

`Abs` also uses the "minus" operator `-`, which, when
used with a single operand, as in `(- x)`, indicates
negation.

A minor difference between `if` and `cond` is that the
*<e>* part of each `cond` clause may be a sequence of expressions.
If the corresponding *<p>* is found to be
true, the expressions *<e>* are evaluated in
sequence and the value of the final expression in the sequence is
returned as the value of the `cond`. In an `if`
expression, however, the *<consequent>* and
*<alternative>* must be single expressions.
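For example, a clause can interleave a printing side effect with the value it returns. This hypothetical procedure is only a sketch; `display` and `newline` are standard printing procedures:

```scheme
(define (report-sign x)
  (cond ((< x 0)
         (display "negative")   ; first expression: a side effect
         (newline)
         -1)                    ; final expression: the clause's value
        (else
         (display "nonnegative")
         (newline)
         1)))
```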

Declarative and imperative descriptions are intimately related, as indeed are mathematics and computer science. For instance, to say that the answer produced by a program is "correct" is to make a declarative statement about the program. There is a large amount of research aimed at establishing techniques for proving that programs are correct, and much of the technical difficulty of this subject has to do with negotiating the transition between imperative statements (from which programs are constructed) and declarative statements (which can be used to deduce things). In a related vein, an important current area in programming-language design is the exploration of so-called very high-level languages, in which one actually programs in terms of declarative statements. The idea is to make interpreters sophisticated enough so that, given "what is" knowledge specified by the programmer, they can generate "how to" knowledge automatically. This cannot be done in general, but there are important areas where progress has been made. We shall revisit this idea in chapter 4.

This square-root algorithm is actually a special case of Newton's method, which is a general technique for finding roots of equations. The square-root algorithm itself was developed by Heron of Alexandria in the first century A.D. We will see how to express the general Newton's method as a Lisp procedure in section 1.3.4.

We will usually give predicates names ending with question marks, to help us remember that they are predicates. This is just a stylistic convention. As far as the interpreter is concerned, the question mark is just an ordinary character.

Observe that we express our initial guess as 1.0 rather than 1. This
would not make any difference in many Lisp implementations. MIT
Scheme, however, distinguishes between exact integers and decimal
values, and dividing two integers produces a rational number rather
than a decimal. For example, dividing 10 by 6 yields 5/3, while
dividing 10.0 by 6.0 yields 1.6666666666666667. (We will learn how to
implement arithmetic on rational numbers in section 2.1.1.) If we start with an initial guess of
1 in our square-root program, and *x* is an exact integer, all
subsequent values produced in the square-root computation will be
rational numbers rather than decimals. Mixed operations on rational
numbers and decimals always yield decimals, so starting with an
initial guess of 1.0 forces all subsequent values to be decimals.
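The behavior described above can be summarized in a few interactions (results shown as in MIT Scheme; other implementations may differ):

```scheme
(/ 10 6)       ; => 5/3, an exact rational number
(/ 10.0 6.0)   ; => 1.6666666666666667, a decimal
(/ 10 6.0)     ; => 1.6666666666666667 -- mixed operations yield decimals
```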

Readers who are worried about the efficiency issues involved in using procedure calls to implement iteration should note the remarks on "tail recursion" in section 1.2.1.

It is not even clear which of these procedures is a more efficient implementation. This depends upon the hardware available. There are machines for which the "obvious" implementation is the less efficient one. Consider a machine that has extensive tables of logarithms and antilogarithms stored in a very efficient manner.

The concept of consistent renaming is actually subtle and difficult to define formally. Famous logicians have made embarrassing errors here.

Lexical scoping dictates that free variables in a procedure are taken to refer to bindings made by enclosing procedure definitions; that is, they are looked up in the environment in which the procedure was defined. We will see how this works in detail in chapter 3 when we study environments and the detailed behavior of the interpreter.

Embedded definitions must come first in a procedure body. The management is not responsible for the consequences of running programs that intertwine definition and use.

In a real program we would probably use the
block structure introduced in the last section to hide the definition
of `fact-iter`:

    (define (factorial n)
      (define (iter product counter)
        (if (> counter n)
            product
            (iter (* counter product)
                  (+ counter 1))))
      (iter 1 1))

We avoided doing this here so as to minimize the number of things to think about at once.

When we discuss the implementation of
procedures on register machines in chapter 5, we will see that any
iterative process can be realized "in hardware" as a machine that
has a fixed set of registers and no auxiliary memory. In contrast,
realizing a recursive process requires a machine that uses an
auxiliary data structure known as a *stack*.

Tail recursion has long been known as a compiler optimization trick. A coherent semantic basis for tail recursion was provided by Carl Hewitt (1977), who explained it in terms of the "message-passing" model of computation that we shall discuss in chapter 3. Inspired by this, Gerald Jay Sussman and Guy Lewis Steele Jr. (see Steele 1975) constructed a tail-recursive interpreter for Scheme. Steele later showed how tail recursion is a consequence of the natural way to compile procedure calls (Steele 1977). The IEEE standard for Scheme requires that Scheme implementations be tail-recursive.

An example of this was hinted at in section 1.1.3: The interpreter itself evaluates expressions using a tree-recursive process.

For example, work through in detail how the reduction rule applies to the problem of making change for 10 cents using pennies and nickels.

One approach to coping with redundant computations is to arrange matters
so that we automatically construct a table of values as they
are computed. Each time we are asked to apply the procedure to some
argument, we first look to see if the value is already stored in the
table, in which case we avoid performing the redundant computation.
This strategy, known as *tabulation* or *memoization*, can be
implemented in a straightforward way. Tabulation can sometimes be
used to transform processes that require an exponential number of
steps (such as ` count-change`) into processes whose space and time
requirements grow linearly with the input. See exercise 3.27.
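A minimal sketch of this idea, using an association list as the table (exercise 3.27 develops it properly; `set!` is not introduced until chapter 3):

```scheme
(define (memoize f)
  (let ((table '()))                     ; the table of remembered values
    (lambda (x)
      (let ((cached (assoc x table)))
        (if cached
            (cdr cached)                 ; reuse the stored result
            (let ((result (f x)))
              (set! table (cons (cons x result) table))
              result))))))
```

Note that for a recursive procedure such as `count-change` to benefit, its internal recursive calls must themselves go through the memoized version.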

These elements of Pascal's triangle are called the *binomial
coefficients*, because the *n*th row consists of the
coefficients of the terms in the expansion of (*x* +
*y*)^{n}. This pattern for computing
the coefficients appeared in Blaise Pascal's 1653 seminal work on
probability theory, *Traité du triangle
arithmétique*. According to Knuth (1973), the same pattern
appears in *Szu-yuen Yü-chien* ("The Precious Mirror of the
Four Elements"), published by the Chinese mathematician Chu Shih-chieh
in 1303, in the works of the twelfth-century Persian poet and
mathematician Omar Khayyam, and in the works of the twelfth-century
Hindu mathematician Bháscara Áchárya.

These statements mask a great deal of oversimplification. For instance, if we count process steps as "machine operations" we are making the assumption that the number of machine operations needed to perform, say, a multiplication is independent of the size of the numbers to be multiplied, which is false if the numbers are sufficiently large. Similar remarks hold for the estimates of space. Like the design and description of a process, the analysis of a process can be carried out at various levels of abstraction.

More precisely, the number of multiplications required is equal to 1
less than the log base 2 of *n* plus the number of ones in the
binary representation of *n*. This total is always less than
twice the log base 2 of *n*. The arbitrary constants
*k*_{1} and *k*_{2} in the definition of
order notation imply that, for a logarithmic process, the base to
which logarithms are taken does not matter, so all such processes are
described as Θ(log *n*).

You may wonder why anyone would care about raising numbers to the 1000th power. See section 1.2.6.


This iterative algorithm is ancient. It appears in the
*Chandah-sutra* by Áchárya Pingala, written before
200 B.C. See Knuth 1981, section 4.6.3, for a full discussion and
analysis of this and other methods of exponentiation.


This algorithm, which is sometimes known as the "Russian peasant method" of multiplication, is ancient. Examples of its use are found in the Rhind Papyrus, one of the two oldest mathematical documents in existence, written about 1700 B.C. (and copied from an even older document) by an Egyptian scribe named A'h-mose.
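A sketch of the method in Scheme: halve one factor, double the other, and add the doubled value whenever the halved factor is odd (the procedure name here is our own):

```scheme
(define (peasant-multiply a b)
  (cond ((= a 0) 0)
        ((even? a) (peasant-multiply (/ a 2) (+ b b)))   ; halve a, double b
        (else (+ b (peasant-multiply (- a 1) b)))))      ; a odd: add in b

(peasant-multiply 6 7)   ; => 42
```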


This exercise was suggested to us by Joe Stoy, based on an example in Kaldewaij 1990.

Euclid's Algorithm is so called because it appears in Euclid's
*Elements* (Book 7, ca. 300 B.C.). According to Knuth (1973), it
can be considered the oldest known nontrivial
algorithm. The ancient Egyptian method of multiplication (exercise 1.18) is surely older, but, as
Knuth explains, Euclid's algorithm is the oldest known to have been
presented as a general algorithm, rather than as a set of illustrative
examples.

This theorem was proved in 1845 by Gabriel Lamé, a French
mathematician and engineer known chiefly for his contributions to
mathematical physics. To prove the theorem, we consider pairs
(*a_k*, *b_k*), where *a_k* ≥ *b_k*, for which Euclid's Algorithm
terminates in *k* steps.

If *d* is a divisor of *n*, then so is *n/d*.
But *d* and *n/d* cannot both be greater than
√*n*.

Pierre de Fermat (1601-1665) is considered to be the founder of
modern number theory. He obtained many important number-theoretic
results, but he usually announced just the results, without providing
his proofs. Fermat's Little Theorem was stated in a letter he wrote
in 1640. The first published proof was given by Euler in 1736 (and an
earlier, identical proof was discovered in the unpublished manuscripts
of Leibniz). The most famous of Fermat's results, known as
Fermat's Last Theorem, was jotted down in 1637 in his copy of
the book *Arithmetic* (by the third-century Greek
mathematician Diophantus) with the remark "I have discovered a truly
remarkable proof, but this margin is too small to contain it."
Finding a proof of Fermat's Last Theorem became one of the most famous
challenges in number theory. A complete solution was finally given in
1995 by Andrew Wiles of Princeton University.

The reduction steps in the cases where the exponent *e* is greater
than 1 are based on the fact that, for any integers *x*, *y*, and
*m*, we can find the remainder of *x* times *y* modulo *m* by
computing separately the remainders of *x* modulo *m* and *y* modulo
*m*, multiplying these, and then taking the remainder of the
result modulo *m*. For instance, in the case where *e* is even, we
compute the remainder of *b*^{e/2} modulo *m*, square this, and take
the remainder modulo *m*. This technique is useful because it means
we can perform our computation without ever having to deal with
numbers much larger than *m*.
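This reduction is the structure of the `expmod` procedure of section 1.2.6; for reference, a version consistent with that description (`square` is defined here for self-containedness):

```scheme
(define (square x) (* x x))

(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (square (expmod base (/ exp 2) m)) m))
        (else
         (remainder (* base (expmod base (- exp 1) m)) m))))

(expmod 2 10 1000)   ; => 24, since 2^10 = 1024
```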

Numbers that fool the Fermat test are called *Carmichael
numbers*, and little is known about them other than that they are
extremely rare. There are 255 Carmichael numbers below 100,000,000.
The smallest few are 561, 1105, 1729, 2465, 2821, and 6601. In
testing primality of very large numbers chosen at random, the chance
of stumbling upon a value that fools the Fermat test is less than the
chance that cosmic radiation will cause the computer
to make an error in carrying out a "correct" algorithm. Considering
an algorithm to be inadequate for the first reason but not for the
second illustrates the difference between mathematics and engineering.

One of the most striking applications of probabilistic
prime testing has been to the field of cryptography. Although it is
now computationally infeasible to factor an arbitrary 200-digit
number, the primality of such a number can be checked in a few seconds
with the Fermat test. This fact forms the basis of a technique for
constructing "unbreakable codes" suggested by Rivest, Shamir, and
Adleman (1977). The resulting *RSA algorithm* has become a
widely used technique for enhancing the security of electronic
communications. Because of this and related developments, the study
of prime numbers, once considered the epitome of a topic in "pure"
mathematics to be studied only for its own sake, now turns out to have
important practical applications to cryptography, electronic funds
transfer, and information retrieval.

This series, usually written in the equivalent form π/4 = 1 - 1/3 + 1/5 - 1/7 + ···, is due to Leibniz. We'll see how to use
this as the basis for some fancy numerical tricks in section 3.5.3.

Notice that we have used block structure (section 1.1.8) to embed the
definitions of `pi-next` and `pi-term` within
`pi-sum`, since these procedures are unlikely to be useful for any
other purpose. We will see how to get rid of them altogether in
section 1.3.2.

The intention of exercises 1.31-1.33 is to demonstrate the
expressive power that is attained by using an appropriate abstraction
to consolidate many seemingly disparate operations. However, though
accumulation and filtering are elegant ideas, our hands are somewhat
tied in using them at this point since we don't yet have data
structures to provide suitable means of combination for these
abstractions. We will return to these ideas in section 2.2.3 when we
show how to use *sequences* as interfaces for combining filters
and accumulators to build even more powerful abstractions. We will see
there how these methods really come into their own as a powerful and
elegant approach to designing programs.

This formula was discovered by the seventeenth-century English mathematician John Wallis.

It would be clearer and less intimidating to people learning Lisp if a
name more obvious than `lambda`, such as
`make-procedure`, were used. But the convention is firmly
entrenched. The notation is adopted from the λ-calculus, a
mathematical formalism introduced by the mathematical
logician Alonzo Church (1941). Church developed the λ-calculus to
provide a rigorous foundation for
studying the notions of function and function application. The
λ-calculus has become a basic tool for mathematical
investigations of the semantics of programming languages.

Understanding internal definitions well enough to be sure a program means what we intend it to mean requires a more elaborate model of the evaluation process than we have presented in this chapter. The subtleties do not arise with internal definitions of procedures, however. We will return to this issue in section 4.1.6, after we learn more about evaluation.

We have used 0.001 as a representative "small" number to indicate a tolerance for the acceptable error in a calculation. The appropriate tolerance for a real calculation depends upon the problem to be solved and the limitations of the computer and the algorithm. This is often a very subtle consideration, requiring help from a numerical analyst or some other kind of magician.

This can be accomplished using `error`, which takes as
arguments a number of items that are printed as error messages.

Try this during a boring lecture: Set your calculator to radians mode and then repeatedly press the cos button until you obtain the fixed point.
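One can simulate the calculator trick directly in Scheme; a small sketch (the limit is the unique fixed point of cosine, approximately 0.7391):

```scheme
(define (repeat-cos guess n)
  (if (= n 0)
      guess
      (repeat-cos (cos guess) (- n 1))))

(repeat-cos 1.0 100)   ; => approximately 0.7391
```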

↦ (pronounced "maps to") is the mathematician's
way of writing `lambda`. *y* ↦ *x*/*y* means
`(lambda (y) (/ x y))`, that is, the function whose value at
*y* is *x/y*.

Observe that this is a combination whose operator is itself a combination. Exercise 1.4 already demonstrated the ability to form such combinations, but that was only a toy example. Here we begin to see the real need for such combinations when applying a procedure that is obtained as the value returned by a higher-order procedure.
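Exercise 1.4's toy example, for comparison, was:

```scheme
;; From exercise 1.4: the operator of the combination is itself
;; the compound expression (if (> b 0) + -).
(define (a-plus-abs-b a b)
  ((if (> b 0) + -) a b))

(a-plus-abs-b 5 -3)   ; => 8
```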

See exercise 1.45 for a further generalization.

Elementary calculus books usually describe Newton's method in terms
of the sequence of approximations *x _{n+1} = x_{n}
- g(x_{n}) / Dg(x_{n})*. Having language for talking
about processes and using the idea of fixed points simplifies the
description of the method.

Newton's method does not always converge to an answer, but it can be shown that in favorable cases each iteration doubles the number-of-digits accuracy of the approximation to the solution. In such cases, Newton's method will converge much more rapidly than the half-interval method.

For finding square roots, Newton's method converges rapidly to the correct solution from any starting point.

The notion of first-class status of programming-language elements is due to the British computer scientist Christopher Strachey (1916-1975).

We'll see some examples of this after we introduce data structures in chapter 2.

The major implementation cost of first-class procedures is that allowing procedures to be returned as values requires reserving storage for a procedure's free variables even while the procedure is not executing. In the Scheme implementation we will study in section 4.1, these variables are stored in the procedure's environment.