What Is Meant by Data?

We began the rational-number implementation in
section by implementing the rational-number
operations `add-rat`, `sub-rat`, and so on in terms of three
unspecified procedures: `make-rat`, `numer`, and `denom`.
At that point, we could think of the operations as being defined in
terms of data objects--numerators, denominators, and rational
numbers--whose behavior was specified by the latter three procedures.

But exactly what is meant by *data*? It is not enough to say
``whatever is implemented by the given selectors and constructors.''
Clearly, not every arbitrary set of three procedures can serve as an
appropriate basis for the rational-number implementation. We need to
guarantee that,
if we construct a rational number `x` from a pair
of integers `n` and `d`, then extracting the `numer` and the
`denom` of `x` and dividing them should yield the same result
as dividing `n` by `d`. In other words, `make-rat`,
`numer`, and `denom` must satisfy the condition that, for any
integer `n` and any non-zero integer `d`, if `x`
,
then

In fact, this is the only condition `make-rat`, `numer`, and
`denom` must fulfill in order to form a suitable basis for a
rational-number representation. In general, we can think of data as
defined by some collection of selectors and constructors, together
with specified conditions that these procedures must fulfill in order
to be a valid representation.^{}

This point of view can serve to define not only ``high-level'' data
objects, such as rational numbers, but lower-level objects as well.
Consider the notion of a pair, which we used in order to define our
rational numbers. We never actually said what a pair was, only that
the language supplied procedures `cons`, `car`, and `cdr`
for operating on pairs. But the only thing we need to know about
these three operations
is that if we glue two objects together using
`cons` we can retrieve the objects using `car` and `cdr`.
That is, the operations satisfy the condition that, for any objects
`x` and `y`, if `z` is `(cons x y)` then `(car z)`
is `x` and `(cdr z)` is `y`. Indeed, we mentioned that
these three procedures are included as primitives in our language.
However, any triple of procedures that satisfies the above condition
can be used as the basis for implementing pairs. This point is
illustrated strikingly by the fact that we could implement `cons`,
`car`, and `cdr` without using any data structures at all but
only using procedures. Here are the definitions:

(define (cons x y) (define (dispatch m) (cond ((= m 0) x) ((= m 1) y) (else (error "Argument not 0 or 1 - CONS" m)))) dispatch)This use of procedures corresponds to nothing like our intuitive notion of what data should be. Nevertheless, all we need to do to show that this is a valid way to represent pairs is to verify that these procedures satisfy the condition given above.(define (car z) (z 0))

(define (cdr z) (z 1))

The subtle point to notice is that the value returned by `(cons x
y)` is a procedure--namely the internally defined procedure `
dispatch`, which takes one argument and returns either `x` or `
y` depending on whether the argument is 0 or 1. Correspondingly, `
(car z)` is defined to apply `z` to 0. Hence, if `z` is the
procedure formed by `(cons x y)`, then `z` applied to 0 will
yield `x`. Thus, we have shown that `(car (cons x y))` yields
`x`, as desired. Similarly, `(cdr (cons x y))` applies the
procedure returned by `(cons x y)` to 1, which returns `y`.
Therefore, this procedural implementation of pairs is a valid
implementation, and if we access pairs using only `cons`, `
car`, and `cdr` we cannot distinguish this implementation from one
that uses ``real'' data structures.

The point of exhibiting the procedural representation of pairs is not
that our language works this way (Scheme, and Lisp systems in general,
implement pairs directly, for efficiency reasons) but that it could
work this way. The procedural representation, although obscure, is a
perfectly adequate way to represent pairs, since it fulfills the only
conditions that pairs need to fulfill. This example also demonstrates
that the ability to manipulate procedures as objects automatically
provides the ability to represent compound data. This may seem a
curiosity now, but procedural representations of data will play a
central role in our programming repertoire. This style of programming
is often called
*message passing*, and we will be using it as a
basic tool in chapter 3 when we address the issues of modeling and
simulation.

**Exercise.**
Here is an alternative procedural representation of pairs. For this
representation, verify that `(car (cons x y))` yields `x` for
any objects `x` and `y`.

(define (cons x y) (lambda (m) (m x y)))What is the corresponding definition of(define (car z) (z (lambda (p q) p)))

**Exercise.**
Show that we can represent pairs of nonnegative integers using only
numbers and arithmetic operations if we represent the pair *a* and *b*
as the integer that is the product 2^{a} 3^{b}. Give the corresponding
definitions of the procedures `cons`, `car`, and `cdr`.

**Exercise.**
In case representing pairs as procedures wasn't mind-boggling enough,
consider that, in a language that can manipulate procedures, we can
get by without numbers (at least insofar as nonnegative integers are
concerned) by implementing 0 and the operation of adding 1 as

(define zero (lambda (f) (lambda (x) x)))This representation is known as(define (add-1 n) (lambda (f) (lambda (x) (f ((n f) x)))))

Define `one` and `two` directly (not in terms of `zero`
and `add-1`). (Hint: Use substitution to evaluate `(add-1 zero)`).
Give a direct definition of the addition procedure `+` (not in
terms of repeated application of `add-1`).