One of the most common optimizations performed by compilers is the optimization of variable lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup-variable-value operation of the evaluator machine. This searches for a variable by comparing it with each variable that is currently bound, working frame by frame outward through the run-time environment. This search can be expensive if the frames are deeply nested or if there are many variables. For example, consider the problem of looking up the value of x while evaluating the expression (* x y z) in an application of the procedure that is returned by
(let ((x 3) (y 4))
(lambda (a b c d e)
(let ((y (* a b x))
(z (+ c d x)))
(* x y z))))
Since a let expression is just syntactic sugar for a
lambda combination, this expression is equivalent to
((lambda (x y)
(lambda (a b c d e)
((lambda (y z) (* x y z))
(* a b x)
(+ c d x))))
3
4)
Each time lookup-variable-value searches for x, it must
determine that the symbol x is not eq? to y or
z (in the first frame), nor to a, b, c, d, or
e (in the second frame). We will assume, for the moment, that
our programs do not use define--that variables are
bound only with lambda. Because our language is
lexically
scoped, the run-time environment for any expression will have a
structure that parallels the lexical structure of the program in which
the expression appears.
Thus, the compiler can know, when it analyzes the
above expression, that each time the procedure is applied the variable
x in (* x y z) will be found two frames out from the
current frame and will be the first variable in that frame.
We can exploit this fact by inventing a new kind of variable-lookup operation, lexical-address-lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables to pass over in that frame. Lexical-address-lookup will produce the value of the variable stored at that lexical address relative to the current environment. If we add the lexical-address-lookup operation to our machine, we can make the compiler generate code that references variables using this operation, rather than lookup-variable-value. Similarly, our compiled code can use a new lexical-address-set! operation instead of set-variable-value!.
In order to generate such code, the compiler must be able to determine
the lexical address of a variable it is about to compile a reference
to. The lexical address of a variable in a program depends on where
one is in the code. For example, in the following program, the
address of x in expression e1 is (2,0)--two frames back
and the first variable in the frame. At that point y is at
address (0,0) and c is at address (1,2). In expression
e2,
x is at (1,0),
y is at (1,1), and
c is at (0,2).
((lambda (x y)
(lambda (a b c d e)
((lambda (y z) e1)
e2
(+ c d x))))
3
4)
One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which variables will be at which positions in which frames in the run-time environment when a particular variable-access operation is executed. The compile-time environment is a list of frames, each containing a list of variables. (There will of course be no values bound to the variables, since values are not computed at compile time.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses an empty compile-time environment. When a lambda body is compiled, compile-lambda-body extends the compile-time environment by a frame containing the procedure's parameters, so that the sequence making up the body is compiled with that extended environment. At each point in the compilation, compile-variable and compile-assignment use the compile-time environment in order to generate the appropriate lexical addresses.
Exercises
through
describe how to complete this sketch of
the lexical-addressing strategy in order to incorporate lexical lookup
into the compiler.
Exercise
describes another use for the
compile-time environment.
Exercise.
Write a procedure lexical-address-lookup that implements the new
lookup operation. It should take two arguments--a lexical address
and a run-time environment--and return the value of the variable
stored at the specified lexical address. Lexical-address-lookup
should signal an error if the value of the variable is the symbol
*unassigned*.
Also write a procedure lexicaladdressset! that
implements the operation that changes the value of the variable at a
specified lexical address.
Exercise. Modify the compiler to maintain the compile-time environment as described above. That is, add a compile-time-environment argument to compile and the various code generators, and extend it in compile-lambda-body.
Exercise. Write a procedure find-variable that takes as arguments a variable and a compile-time environment and returns the lexical address of the variable with respect to that environment. For example, in the program fragment that is shown above, the compile-time environment during the compilation of expression e1 is ((y z) (a b c d e) (x y)). Find-variable should produce
(find-variable 'c '((y z) (a b c d e) (x y))) (1 2)(find-variable 'x '((y z) (a b c d e) (x y))) (2 0)
(find-variable 'w '((y z) (a b c d e) (x y))) not-found
Exercise.
Using find-variable from exercise
,
rewrite compile-variable and compile-assignment to output
lexical-address instructions. In cases where find-variable
returns not-found (that is, where the variable is not in the
compile-time environment), you should have the code generators use the
evaluator operations, as before, to search for the binding.
(The only place a variable that is not found at compile time can be is in
the global environment, which is part of the run-time environment but
is not part of the compile-time environment.
Thus, if you wish, you may have the evaluator operations look directly in
the global environment, which can be obtained with the operation
(op get-global-environment), instead of having them search the whole run-time
environment found in env.)
Test the modified compiler on a few simple cases, such as the nested
lambda combination at the beginning of this section.
Exercise.
We argued in section
that internal definitions
for block structure should not be considered ``real'' defines. Rather,
a procedure body should be interpreted as if the internal variables being
defined were installed as ordinary lambda variables initialized to their
correct values using set!. Section
and
exercise
showed how to modify the metacircular
interpreter to accomplish this by scanning out internal definitions. Modify
the compiler to perform the same transformation before it compiles a procedure
body.
Exercise.
In this section we have focused on the use of the compile-time
environment to produce lexical addresses. But there are other uses
for compile-time environments. For instance, in
exercise
we increased the efficiency of compiled
code by open-coding primitive procedures. Our implementation treated
the names of open-coded procedures as reserved words. If a program
were to rebind such a name, the mechanism described in
exercise
would still open-code it as a primitive,
ignoring the new binding. For example, consider the procedure
(lambda (+ * a b x y) (+ (* a x) (* b y)))which computes a linear combination of x and y. We might call it with arguments +matrix, *matrix, and four matrices, but the open-coding compiler would still open-code the + and the * in (+ (* a x) (* b y)) as primitive + and *. Modify the open-coding compiler to consult the compile-time environment in order to compile the correct code for expressions involving the names of primitive procedures. (The code will work correctly as long as the program does not define or set! these names.)