Today's Computers, Intelligent Machines and Our Future

Hans Moravec
Stanford University
July 21, 1976 (this version 1978)

Introduction:

The unprecedented opportunities for experiments in complexity presented by the first modern computers in the late 1940's raised hopes in early computer scientists (e.g. John von Neumann and Alan Turing) that the ability to think, our greatest asset in our dealings with the world, might soon be understood well enough to be duplicated. Success in such an endeavor would extend mankind's mind in the same way that the development of energy machinery extended his muscles.

In the thirty years since then computers have become vastly more capable, but the goal of human performance in most areas seems as elusive as ever, in spite of a great deal of effort. The last ten years, in particular, have seen thousands of person-years devoted directly to the problem, referred to as Artificial Intelligence or AI. Attempts have been made to develop computer programs which do mathematics, computer programming and common sense reasoning, are able to understand natural languages and interpret scenes seen through cameras and spoken language heard through microphones, and to play games humans find challenging.

There has been some progress. Samuel's checker program can occasionally beat checker champions. Chess programs regularly play at good amateur level, and in March 1977 a chess program from Northwestern University, running on a CDC Cyber-176 (which is about 20 times as fast as previous computers used to play chess), won the Minnesota Open Championship against a slate of class A and expert players. A ten year effort at MIT has produced a system, Mathlab, capable of doing symbolic algebra, trigonometry and calculus operations better in many ways than most humans experienced in those fields. Programs exist which can understand English sentences with restricted grammar and vocabulary, given the letter sequence, or interpret spoken commands from hundred word vocabularies. Some can do very simple visual inspection tasks, such as deciding whether or not a screw is at the end of a shaft.

The most difficult tasks to automate, for which computer performance to date has been most disappointing, are those that humans do most naturally, such as seeing, hearing and common sense reasoning. A major reason for the difficulty has become very clear to me in the course of my work on computer vision. It is simply that the machines with which we are working are still a hundred thousand to a million times too slow to match the performance of human nervous systems in those functions for which humans are specially wired. This enormous discrepancy is distorting our work, creating problems where there are none, making others impossibly difficult, and generally causing effort to be misdirected.

In the early days of AI the thought that existing machines might be much too small was widespread, but people hoped that clever mathematics and advancing computer technology could soon make up the difference. The idea that available compute power might still be vastly inadequate has since been swept under the rug, due to wishful thinking and a feeling that there was nothing to be done about it anyway, and that voicing such an opinion could cause AI to be considered impractical, resulting in reduced funding. This attitude has had some bad effects, one of them being that AI research has been centered on computers less powerful than absolutely necessary.

The first section of this essay discusses natural intelligence.
It notes two major branches of the animal kingdom in which intelligence evolved independently, and suggests that it is easier to construct than is sometimes assumed. The second part compares the information processing ability of present computers with intelligent nervous systems. The factor of one million is derived in two different ways. Section three examines the development of electronics, and concludes the state of the art can provide more power than is now available, and that the one million gap could be closed in ten years. Part four introduces some hardware and software aspects of a system which would be able to make use of the advancing technology, providing a means for achieving human equivalence, perhaps by the next decade. Part five considers the implications of the emergence of intelligent machines, and concludes that they are the final step in a revolution in the nature of life. Classical evolution based on DNA, random mutations and natural selection may be completely replaced by the much faster process of intelligence mediated cultural and technological evolution.

Section 1: The Natural History of Intelligence

Product lines: Natural evolution has produced a continuum of complexities of behavior, from the mechanical simplicity of viruses to the magic of mammals. In the higher animals most of the complexity resides in the nervous system. Evolution of the brain began in early multi-celled animals a billion years ago with the development of cells capable of transmitting electrochemical signals. Because neurons are more localized than hormones they allow a greater variety of signals in a given volume. They also provide evolution with a more uniform medium for experiments in complexity. The advantages of implementing behavioral complexity in neural nets seem to have been overwhelming, since all modern animals more than a few cells in size have them [animal refs.].

Two major branches in the animal kingdom, vertebrates and mollusks, contain species which can be considered intelligent. Both stem from one of the earliest multi-celled organisms, an animal something like a hydra made of a double layer of cells and possessing a primitive nerve net. Most mollusks are intellectually unimpressive sessile shellfish, but one branch, the cephalopods, possesses high mobility, large brains and imaging eyes. These structures evolved independently of the corresponding equipment in vertebrates and there are fascinating differences. The optic nerve connects to the back of the retina, so there is no blind spot. The brain is annular, forming a ring encircling the esophagus. The circulatory system, also independently evolved, has three blood pumps, a systemic heart pumping oxygenated blood to the tissues and two gill hearts, each pumping venous blood to one gill. The oxygen carrier is a green copper compound called hemocyanin, evolved from an earlier protein that also became hemoglobin.

These animals have some unique abilities. Shallow water octopus and squid are covered by a million individually controlled color changing effectors called chromatophores, whose functions are camouflage and communication. The capabilities of this arrangement have been demonstrated by a cuttlefish accurately imitating a checkerboard it was placed upon, and an octopus in flight which produced a pattern like the seaweed it was traversing, coruscating backward along the length of its body, diverting the eye from the true motion. Deep sea squid have photophores capable of generating large quantities of multicolored light.
Some are as complex as eyes, containing irises and lenses [squid]. The light show is modulated by emotions in major and subtle ways. There has been little study of these matters, but this must provide means of social interaction. Since they also have good vision, there is the potential for high bandwidth communication.

Cephalopod intelligence has not been extensively investigated, but a few controlled experiments indicate rapid learning in small octopus [Boycott]. The Cousteau film in the references shows an octopus' response to a problem requiring a two stage solution. A fishbowl containing a lobster is sealed with a cork and dropped into the water near it. The octopus is attracted, and spends a long while alternately probing the container in various ways and returning to its lair in iridescent frustration. On the final iteration it exits its little hole in the ground and unhesitatingly wraps three tentacles around the bowl, and one about the cork, and pulls. The cork shoots to the surface and the octopus eats. The Time-Life film contains a similar sequence, with a screw top instead of a cork! If small octopus have almost mammalian behavior, what might giant deep sea squid be capable of? The behavior of these large brained, apparently shy, animals has virtually never been observed.

Birds are more closely related to humans than are cephalopods, their common ancestor with us being a 300 million year old early reptile. Size-limited by the dynamics of flying, some birds have reached an intellectual level comparable to the highest mammals. Crows and ravens are notable for frequently outwitting people. Their intuitive number sense (ability to perceive the cardinality of a set without counting) extends to seven, as opposed to three or four in us. Such a sense is useful for keeping track of the number of eggs in a nest. Experiments have shown [Stettner] that most birds are more capable of high order "reversal" and "learning set" learning than all mammals except the higher primates. In mammals these abilities increase with increasing cerebral cortex size. In birds the same functions depend on areas not present in mammalian brains, forebrain regions called the "Wulst" and the hyperstriatum. The cortex is small and relatively unimportant. Clearly this is another case of independent evolution of similar mental functions. Penguins, now similar to seals in behavior and habitat, might be expected to become fully aquatic, and evolve analogously to the great whales.

The cetaceans are related to us through a small 30 million year old primitive mammal. Some species of dolphin have body and brain masses identical to ours, and paleontology reveals they have been this way several times as long. They are as good as us at many kinds of problem solving, and perhaps at language. The references contain many anecdotes, and describe a few controlled experiments, showing that dolphins can grasp and communicate complex ideas. Killer whales have brains seven times human size, and their ability to formulate plans is better than the dolphins', on whom they occasionally feed. Sperm whales, though not the largest animals, have the world's largest brains. There may be intelligence mediated conflict with large squid, their main food supply.

Elephants have brains about five times human size, matriarchal tribal societies, and complex behavior.
Indian domestic elephants usually learn 500 commands, limited by the range of tractor-like tasks their owners need done, and form voluntary mutual benefit relationships with their trainers, exchanging labor for baths. They can solve problems such as how to sneak into a plantation at night to steal bananas, after having been belled (answer: stuff mud into the bells). And they remember for decades. Inconvenience and cost have prevented more elephant research.

The apes are our cousins. Chimps and gorillas can learn to use tools and to communicate with human sign languages at a retarded level. As chimps have one third, and gorillas one half, human brain size, similar results should be achievable with the larger brained, but less human-like animals. Though no other species has managed to develop a technological culture, it may be that some of them can be made partners in ours, accelerating its evolution with their unique capabilities.

Nervous System Size and Intelligence

A feature shared by all living organisms whose behavior is complex enough to indicate near-human intelligence is a nervous system of a hundred billion neurons. Imaging vision requires a billion neurons. A million brain cells usually permits fast and interesting, but stereotyped, behavior as in a bee. A thousand is adequate for slow moving animals with minimal sensory input, such as slugs and worms. A hundred runs most sessile animals. The portions of nervous systems for which tentative wiring diagrams have been obtained (e.g. much of the brain of the large neuroned sea slug, Aplysia, the flight controller of the locust and the early stages of some vertebrate visual systems) reveal that the neurons are configured into efficient, clever assemblies. This should not be surprising, as unnecessary redundancy means unnecessary metabolic load, a distinct selective disadvantage.

[FIGURE: Highlights in the evolution of terrestrial intelligence, drawn as a branching tree running from non-living chemicals four billion years ago, through the earliest cells, multi-cellular animals, the invention of the neuron, and the mollusk and vertebrate lines, to octopus and squid, birds, whales, elephants, primates and present day massive technology. The distance along the edge of the tree is proportional to the square root of the time from the present. This seems to space things nicely.]

Evolution has stumbled on many ways of speeding up its own progress, since species that adapt more quickly have a selective advantage. Most of these speedups, such as sex and dying of old age, are refinements of one of the oldest, the encoding of genetic information in the easily mutated and modular DNA molecule.
In the last few million years the genetically evolved ability of animals, especially mammals, to learn a significant fraction of their behavior after birth has provided a new medium for growth of complexity. Modern man, though perhaps not the most individually intelligent animal on the planet, is the species in which this cultural evolution seems to have had the greatest effect, making human culture the most potent force on the earth's surface. Our cultural and technological evolution has proceeded by massive interchange of ideas and information, trial and error guided by the ability to predict the outcome of simple situations, and other techniques mediated by our intelligence. The process is self reinforcing because its consequences, such as improved communication methods and increased wealth and population, allow more experiments and faster cross fertilization among different lines of inquiry. Many of its techniques have not been available to biological evolution. The effect is that present day global civilization is developing capabilities orders of magnitude faster. Of course biological evolution has had a massive head start.

Although cultural evolution has developed methods beyond those of its genetic counterpart, the overall process is essentially the same. It involves trying large numbers of possibilities, selecting the best ones, and combining successes from different lines of investigation. This requires time and other finite resources. Finding the optimum assembly of a particular type of component which achieves a desired function usually requires examination of a number of possibilities exponential in the number of components in the solution. With fixed resources this implies a design time rising exponentially with complexity. Alternatively the resources can be used in stages, to design subassemblies, which are then combined into larger units, and so on, until the desired result is achieved. This can be much faster, since the effort rises exponentially with the incremental size of each stage and linearly with the number of stages, with an additional small term, for overall planning, exponential in the number of stages. The resulting construct will probably use more of the basic component and be less efficient than an optimal design.

Biological evolution is affected by these considerations as much as our technology. If a device is so difficult to design that our technology cannot build it, then neither should we expect to find it in the biological world. Conversely, if we find some naturally evolved thing, we can rest assured that designing an equally good one is not an impossibly difficult task. Presumably there is a way of using the physics of the universe to construct entities functionally equivalent to human beings, but vastly smaller and more efficient. Terrestrial evolution has not had the time or space to develop such things. But by building within the sequence atoms, amino acids, proteins, cells, organs, animals (often concurrently), it produced a technological civilization out of inanimate matter in only two billion years.
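The staged design argument above can be illustrated with a toy calculation (my own, not part of the original essay; the component counts and branching factor are arbitrary):

def flat_design_cost(n, b=2):
    # Trials needed to search every arrangement of n primitive components at once.
    return b ** n

def staged_design_cost(n, k, b=2):
    # Trials when subassemblies of k components are designed first and then combined:
    # linear in the number of stages, exponential in the stage size, plus a small
    # planning term exponential in the number of stages.
    stages = n // k
    return stages * b ** k + b ** stages

print(flat_design_cost(64))        # about 1.8e19 trials
print(staged_design_cost(64, 8))   # 2304 trials

The staged total is astronomically smaller, at the price of a construct assembled from standard subunits rather than a globally optimal one, which is the tradeoff described above.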
Harangue: The existence of several examples of intelligence designed under these constraints should give us great confidence that we can achieve the same in a time span similar to that of other technological accomplishments. The situation is analogous to the history of heavier than air flight, where birds, bats and insects clearly demonstrated the possibility before our culture mastered it. Flight without adequate power to weight ratio is heartbreakingly difficult (viz. Langley's steam powered aircraft or current attempts at man powered flight), whereas with enough power (on good authority!) a shingle will fly. Refinement of the aerodynamics of lift and turbulence is most effectively tackled after some experience with suboptimal aeroplanes. After the initial successes our culture was able to far surpass biological flight in a few decades.

Although there are known brute force solutions to most AI problems, current machinery makes their implementation impractical. Instead we are forced to expend our human resources trying to find computationally less intensive answers, even where there is no evidence that they exist. This is honorable scientific endeavor, but, like trying to design optimal airplanes from first principles, a slow way to get the job done. With more processing power, competing presently impractical schemes could be compared by experiment, with the outcomes often suggesting incremental or revolutionary improvements. Computationally expensive highly optimizing compilers would permit efficient code generation at less human cost. The expanded abilities of existing systems such as Mathlab, the symbolic mathematics system from MIT, which can be used as a desk calculator for doing algebra and trigonometry as well as arithmetic, along with new experimental results, would accelerate theoretical development. Gains made this way would improve the very systems being used, causing more speedup. The intermediate results would be inefficient kludges busily contributing to their own improvement. The end result is systems as efficient and clever as any designed by more theoretical approaches, but sooner, because more of the labor has been done by machines. With enough power anything will fly. The next section examines how much is needed.

Section 2: Measuring Processing Power

During the past ten years Digital Equipment Corporation's PDP-10 has become the standard computer for AI and related research, partly because it was designed with advanced techniques, such as time sharing and unusual computer languages, in mind. When first introduced, the PDP-10 was considered a large machine. By today's standards it is medium size. The PDP-10 dealt with in this section is the KA model, the standard until very recently. The very largest scientific computers, heavily used in physics, chemistry and other fields, made by companies such as Control Data Corp. and IBM, are about 100 times the speed of the KA. When it was new a KA system cost about half a million dollars. Large computers sell for around 10 million.

Low level vision: The visual system of a few animals has been studied in some detail, especially the layers of the optic nerve near the retina. The neurons comprising these structures are used efficiently to compute local operations like high pass filtering and edge, curvature, orientation and motion detection. Assuming the visual cortex (and possibly the optic nerve itself) is as computationally intensive as the retina, successive layers producing increasingly abstracted representations, we can estimate the total capability. There are a million separate fibers in a cross section of the human optic nerve. The thickness of the optical cortex is a thousand times the depth occupied by the neurons which apply a single simple operation. The eye is capable of processing images at the rate of ten per second (flicker at higher frequencies is detected by special operators).
This means that the human visual system evaluates 10,000 million pixel simple operators each second. A tightly hand coded simple operator, like high pass filtering by subtraction of a local average, applied to a million pixel picture takes at least 160 seconds when executed on a PDP-10, not counting timesharing. Since the computer can evaluate only one at a time, the effective rate is 1/160 million pixel simple operators per second. Thus a hand coded PDP-10 falls short of being the equal of the human visual system by a speed factor of 1.6 million. It may not be necessary to apply every operator to every portion of every picture, and a general purpose computer, being more versatile than the optic nerve, can take advantage of this. I grant an order of magnitude for this effect, reducing the optic nerve to a mere 100,000 PDP-10 equivalents.

[FIGURE: Compute Power and Energy of various devices, plotted as CP (bits/sec) against CE (bits) on logarithmic scales. Plotted points include a pocket calculator, the KA-10, KL-10, CDC 7600, IBM 360/195, Cray and a proposed NASA wind tunnel simulator among machines, and a sponge, slug, bee, human vision, chimp, human and sperm whale among organisms. The Cray machine is an extremely fast and large scientific computer. The NASA simulator would probably be a general purpose computer 100 times as powerful as the biggest existing machines. It has not been designed yet.]

The size of this factor is related to having chosen to implement our algorithms in machine language. If we had opted to disassemble a number of PDP-10's and reconfigure the components to do the computation, far fewer (perhaps only one!) would have been required. On the other hand if we had run our algorithms in interpreted Lisp, 10 to 100 times as many would be needed. The tradeoff is that the design time varies inversely with the execution efficiency. A good Lisp program to compute a given function is much easier to produce than an efficient machine language program, or an equivalent piece of hardware.

As a practical example of the kind of problem this gap poses in current research, consider my work. The task is to construct a program which can drive a vehicle sensing the world with a TV camera through terrain cluttered with obstacles, avoiding the obstacles and getting to a desired place. The programs are written efficiently and in the spirit of computing only as much as is actually required to track objects from one image to the next, and to judge their distance from the parallax caused by vehicle motion. In spite of this it takes a large program several minutes of computing to process each frame. Differences in performance caused by changes in the program can often be determined only after tens of images have been processed, implying a run time of hours. This greatly limits experimentation. Also, many ideas on how to significantly improve performance cannot reasonably be tried because they slow down the computation by another factor of 10 or 100, increasing typical runs to days and weeks! Many (such as taking pictures at much smaller intervals than the current two foot motions) require very little additional programming, and would be almost certain to improve things.
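Restating the visual system comparison earlier in this section as explicit arithmetic may be useful; this is only a transcription of the figures already quoted, not new data:

# Figures quoted in the text above.
pixels_per_image     = 1e6    # fibers in a cross section of the optic nerve
operators_in_depth   = 1e3    # successive simple operations applied by the visual cortex
images_per_second    = 10     # frame rate of the eye
seconds_per_operator = 160    # one hand coded million-pixel operator on a KA-10

human_rate = pixels_per_image * operators_in_depth * images_per_second  # pixel operations/sec
pdp10_rate = pixels_per_image / seconds_per_operator                    # pixel operations/sec

print("human visual system: %.0e pixel operations/sec" % human_rate)   # 1e10
print("KA-10              : %.0e pixel operations/sec" % pdp10_rate)   # about 6e3
print("speed gap          : %.1e" % (human_rate / pdp10_rate))         # 1.6e6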
Entropy measurement: Is there a quantitative way in which the processing power of a system, independent of its detailed nature, can be measured? A feature of things which compute massively is that they change state in complicated and unexpected ways. The reason for believing that, say, a stationary rock does little computing is its high predictability. By this criterion the amount of computing done by a device is in the mind of the beholder. A machine with a digital display which flashed 1, 2, 3, 4 etc., at fixed intervals would seem highly predictable to an adult of our culture, but might be justifiably considered to be doing an interesting, nontrivial and informative computation by a young child.

Information theory provides a measure for this idea. If a system is in a given state and can change to one of a number of next states with equal probability, the information in the transition, which I will call the Compute Energy (CE), is given by

CE = log2 N

where N is the number of next states. The measure is in binary digits, bits. If we consider the system in the long run, considering all the states it might ever eventually be in, then this measure expresses the total potential variety of the system. A machine which can accomplish a given thing faster is more powerful than a slower one. A measure for Compute Power is obtained by dividing the above sum by the time required for a transition. Thus:

CP = log2 N / t

The units are bits/second. Slightly more complicated formulas, which give lower values, apply if the transition probabilities and times are not all equal. These measures are highly analogous to the energy and power capacities of a battery. Some properties follow: They are linear, i.e. the compute power and energy of a system of two or more independent machines is the sum of the individual powers and energies; Speeding up a machine by a factor of n increases the CP by the same factor; A completely predictable system has a CP and CE of zero; A machine with a high short term CP, which can reach a moderate number of states in a short time, can yet have a low CE, if the total number of states attainable in the long run is not high.

A representative computer: For the KA-PDP10, considering one instruction time, we have (roughly) that in one microsecond this machine is able to execute one of 2^5 different instructions, involving one of 2^4 accumulators and one of 2^18 memory locations, most of these combinations resulting in distinct next states. This corresponds to a CP of

log2 (2^5 x 2^4 x 2^18) bit / 10^-6 sec = 27 x 10^6 bit/sec

This number is reduced by the fact that different instruction sequences can result in the same outcome, and increased slightly by information flowing in from high speed storage devices connected to the computer, for a net of about 8.5x10^6 bit/sec (details in [Moravec]). The CP is also limited by the total compute energy. If we ignore external devices, this is simply the total amount of memory, about 36x2^18 = 9.4x10^6 bits. The PDP-10 could execute at its maximum effectiveness for 9.4/8.5 = 1.1 seconds before reaching a state which could have been arrived at more quickly another way. The energy can be extended indefinitely, however, by addition of external storage devices, such as disks and tapes. Overall, the processing power of a typical major AI center computer is at most 10^7 bits/sec. Time sharing reduces this to about 10^6 b/s per user. Programming in a moderately efficient high level language costs another factor of 10, and running under an interpreter may result in a per user power of a mere 10,000 bits/sec, if the source code is efficient.
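As a sketch (mine, not from the referenced memo) of how the CP and CE definitions above apply to the KA-10 figures just quoted:

from math import log2

def compute_power(next_states, transition_time):
    # CP = log2(N) / t, in bits per second.
    return log2(next_states) / transition_time

# One instruction: 2^5 opcodes, 2^4 accumulators, 2^18 addresses, about one microsecond.
raw_cp = compute_power(2**5 * 2**4 * 2**18, 1e-6)
print("raw KA-10 CP: %.1e bit/sec" % raw_cp)                 # 2.7e7, before corrections

memory_ce = 36 * 2**18                                       # 36 bit words, 2^18 of them
print("KA-10 CE    : %.1e bits" % memory_ce)                 # about 9.4e6
print("seconds at full power: %.1f" % (memory_ce / 8.5e6))   # about 1.1, using the corrected CP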
A typical nervous system: We now consider the processing ability of animal nervous systems, using humans as an example. Since the data is even more scanty than what we assumed about the PDP-10, some not unassailable assumptions need to be made. The first is that the processing power resides in the neurons and their interconnections, and not in more compact nucleic acid or other chemical encodings. There is no currently widely accepted evidence for the latter, while neural mechanisms for memory and learning are being slowly revealed. A second is that the neurons are used reasonably efficiently, as detailed analysis of small nervous systems and small parts of large ones reveals (and common sense applied to evolution suggests). Thirdly, that neurons are fairly simple, and their state can be represented by a binary variable, "firing" or "not firing", which can change about once per millisecond. Finally we assume that human nervous systems contain about 40 billion neurons.

Considering the space of all possible interconnections of these 40 billion (treating this as the search space available to natural evolution in its unwitting attempt to produce intelligence, in the same sense that the space of all possible programs is available to someone trying to create intelligence in a computer), we note that there is no particular reason why every neuron should not be able to change state every millisecond. The number of combinations thus reachable from a given state is 2^(40x10^9), the binary log of which gives CE = 40x10^9. This leads to a compute power of

CP = 40 x 10^9 bit / 10^-3 sec = 40 x 10^12 bit/sec

which is about a million times the maximum power of the KA-10. Keep in mind that much of this difference is due to the high level of interpretation in the KA, compared to what we assumed for the nervous system. Rewiring its gates or transistors for each new task would greatly increase the CP, but also the programming time. If the processor is made of 100,000 devices which can change state in 100 ns, the potential CP available through reconfiguration is 10^5 bits/10^-7 sec = 10^12 b/s. The CE would be unaffected. If automatic design and fabrication methods result in small quantity integrated circuit manufacture becoming less expensive and more widely practiced, my calculations may prove overly pessimistic.

Thermodynamic efficiency: Thermodynamics and information theory provide us with a minimum amount of energy per bit of information generated at a given background temperature (the energy required to out shout the thermal noise). This is approximately the Boltzmann constant,

1.38 x 10^-16 erg/deg variable = 0.96 x 10^-16 erg/deg bit

The reduction is due to the theoretical fact that a "variable", also known as a degree of freedom, is worth log2 e bits, about 1.44 bits. This measure allows us to estimate the overall energy efficiency of computing engines. For instance, we determined the computing power of the brain, which operates at 300 degrees K, to be 40x10^12 bits/sec. This corresponds to a physical power of

40 x 10^12 bit/sec x 300 deg x 0.96 x 10^-16 erg/deg bit = 1.15 erg/sec = 1.15 x 10^-7 watt

The brain runs on approximately 40 watts, so we conclude that it is 10^-8 times as efficient as the physical limits allow.
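The efficiency figure above can be checked with a few lines of arithmetic (again only a restatement of the numbers in the text):

k_erg_per_deg_variable = 1.38e-16
erg_per_deg_bit = k_erg_per_deg_variable / 1.44   # a variable is worth log2(e), about 1.44 bits

brain_cp       = 40e12      # bits/sec, from the neuron argument above
temperature    = 300.0      # degrees K
actual_power_w = 40.0       # watts consumed by the brain

minimum_erg_per_sec = brain_cp * temperature * erg_per_deg_bit
minimum_watts = minimum_erg_per_sec * 1e-7        # 1 erg/sec = 1e-7 watt

print("thermodynamic minimum: %.2e watt" % minimum_watts)                # about 1.15e-7
print("efficiency           : %.0e" % (minimum_watts / actual_power_w))  # about 3e-9, i.e. of order 10^-8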
Doing the same calculation for the KA10, again at 300 deg, we see that a CP of 8.5x10^6 bit/sec is worth 2.44x10^-14 watts. Since this machine needs 10 kilowatts the efficiency is only 10^-18. Conceivably a ten watt, but otherwise equivalent, KA10 could be designed today, if care were taken to use the best logic for the required speed in every assembly. The efficiency would then still be only 10^-15.

As noted previously, there is a large cost inherent in the organization of a general purpose computer. We might investigate the computing efficiency of the logic gates of which it is constructed (as was, in fact, done with the brain measure). A standard TTL gate can change state in about 10 ns, and consumes 10^-3 watt. The switching speed corresponds to a CP of 10^8 bit/sec, or a physical power of 2.87x10^-13 watt. So the efficiency is 10^-10, only one hundred times worse than a vertebrate neuron. The newer semiconductor logic families are even better. C-MOS is twice as efficient as TTL, and Integrated Injection Logic is 100 times better, putting it on a par with neurons. Experimental superconducting Josephson junction logic operates at 4 deg K, switches in 10^-11 sec, and uses 10^-7 watts per gate. This implies a physical compute power of 3.5x10^-12 watt, and an efficiency of 7x10^-5, 1000 times better than neurons. At room temperature it requires a refrigerator that consumes 100 times as much energy as the logic, to pump the waste heat uphill from 4 degrees to 300. Since the background temperature of the universe is about 4 degrees, this can probably eventually be done away with.

It is thus likely that there exist ways of interconnecting gates made with known techniques which would result in behavior effectively equivalent to that of human nervous systems. Using a million I^2L gates, or 10 thousand Josephson junction gates, and a trillion bits of slower bulk storage, all running at full speed, such assemblies would consume as little as, or less than, the power needed to operate a brain of the conventional type. Past performance indicates that the amount of human and electronic compute power available is inadequate to design such an assembly within the next few years. The problem is much reduced if the components used are suitable large subassemblies. Statements of good high level computer languages are the most effective such modularizations yet discovered, and are probably the quickest route to human equivalence, if the necessary raw processing power can be accessed through them.

This section has indicated that a million times the power of typical existing machines is required. The next suggests this should be available at reasonable cost in about ten years.

Section 3: The Growth of Processing Power

The references below present, among other things, the following data points on a price curve:

[FIGURE: Transistor price versus year, on a logarithmic price scale running from $100 down to .0001 cent, declining steadily from 1950 through 1976. The annotated points are:]

1951  $100 transistor
1952  transistor hearing aid
1955  transistor radios
1957  $10 transistor
1960  $1 transistor
1962  $100,000 small computer (IBM 1620)
1965  $0.08 transistor (IC)
1966  $1000 4 function calculator
1967  $6000 scientific calculator
1968  $10,000 small computer (PDP 8)
1970  $200 4 function calculator
1972  1K RAMs (1 cent/bit)
1974  $1000 small computer (PDP 11)
1975  4K RAMs (.1 cent/bit)
1976  $5 4 function calculator (.05 cent/transistor)

The numbers indicate a remarkably stable evolution. The price per electronic switch has declined by a steady factor of ten every five years, if speed and reliability gains are included. Occasionally there is a more precipitous drop, when a price threshold which opens a mass market is reached. This makes for high incentives, stiff price competition and mass production economies. It happened in the early sixties with transistor radios, and is going on now for pocket calculators and digital wristwatches. It is beginning for microcomputers, as these are incorporated into consumer products such as stoves, washing machines, televisions and sewing machines, and soon cars. During such periods the price can plummet by a factor of 100 in a five year period. Since the range of application for cheap processors is larger than for radios and calculators, the explosion will be more pronounced.

The pace of these gains is in no danger of slackening in the foreseeable future. In the next decade the current period may seem to be merely the flat portion of an exponential rise. On the immediate horizon are the new semiconductor techniques, I^2L, and super fast D-MOS, CCD for large sensors and fast bulk memory, and magnetic bubbles for mass storage. The new 16K RAM designs use a folded (thicker) cell structure to reduce the area required per bit, which can be interpreted as the first step towards 3 dimensional integration, which could vastly increase the density of circuitry. The use of V-MOS, an IC technique that vertically stacks the elements of a MOS transistor, is expanding. In the same direction, electron beam and X-ray lithography will permit smaller circuit elements. In the longer run we have ultra fast and efficient Josephson junction logic, of which small IC's exist in an IBM lab, optical communication techniques, currently being incorporated into intermediate distance telephone links, and other things now just gleams in the eye of some fledgling physicist or engineer.

My favorite fantasies include the "electronics" of super-dense matter, either made of muonic atoms, where the electrons are replaced by more massive negative particles, or of atoms constructed of magnetic monopoles which (if they exist) are very massive and affect each other more strongly than electric charges. The electronics and chemistry of such matter, where the "electron" orbitals are extremely close to the nucleus, would be more energetic, and circuitry built of it should be astronomically faster and smaller, and probably hotter. Mechanically it should exhibit higher strength to weight ratios. The critical superconducting transition field strengths and temperatures would be higher. For monopoles there is the possibility of combination magnetic electric circuitry which can contain, among many other goodies, DC transformers, where an electric current induces a monopole current at right angles to it, which in turn induces another electric current. One might also imagine quantum DC transformers, matter composed of a chainlike mesh of alternating orbiting electric and magnetic charges.
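To make the trend read off the chart above concrete, here is a rough extrapolation (my own arithmetic; the starting point and the assumption that the tenfold-per-five-years rate simply continues are taken from the text):

price_1960 = 1.0          # dollars per transistor, from the chart
factor_per_5_years = 10.0

for year in (1960, 1965, 1970, 1975, 1980, 1985):
    price = price_1960 / factor_per_5_years ** ((year - 1960) / 5.0)
    print("%d: roughly $%g per transistor" % (year, price))
# 1975 comes out near a tenth of a cent, in the neighborhood of the RAM and calculator
# price points listed above; continuing the curve is what motivates the projections
# in the next paragraph.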
I interpret these things to mean that the cost of computing will fall by a factor of 100 during the next 5 years, as a consequence of the processor explosion, and by the usual factor of 10 in the 5 years after that. As an approximation to what is available today, note that in large quantities an LSI-11 sells for under $500. This provides a moderately fast 16 bit processor with 4K of memory. Another $500 could buy an additional 32K of memory, if we bought in quantity. The result would be a respectable machine, somewhat less powerful than the KA-10, for $1000. At the crude level of approximation employed in the previous section, a million machines of this type should permit human equivalence. A million dollars would provide a thousand of them today (a much better buy, in terms of raw processing power, than a million dollar large processor). In ten years a million dollars should provide the equivalent of a million such machines, in the form of a smaller number of faster processors, putting human equivalence within reach. A roomful of isolated small computers is unlikely to prove very useful for our purposes. The next section suggests how to make them work together.

Section 4: Mega Processing

The following discussion is based on an interconnection system for computers described in a more technical version of this essay [Moravec], based on Batcher sorting nets, which has approximately the following properties. Every processor may send a fairly long message to any other processor about every quarter of a microsecond. The messages from all the processors are emitted in synchronized waves. A wave takes one microsecond to filter through the interconnection net, causing there to be four waves in the net at one time. Each message includes a priority number introduced by the sending computer. The network delivers to each processor the message with the highest priority addressed to it, if any. The processor sending each delivered message receives an acknowledgement; the processors whose messages were blocked by higher priority ones receive notices of failure. The amount of network logic per processor is small, and grows as the square of the log of the number of processors. This low growth rate ensures that even in a system of a million processors the cost of the interconnection is no greater than the cost of the processors.

A major feature of this scheme is its flexibility. It can function as any of the fixed interconnection patterns of current experimental multiprocessors, or as a hexagonal mesh, or a 7 dimensional cubic lattice, should that be desired, or the tree organization being considered in a Stanford proposal. It can simulate programmed pipeline machines, where numbers stream between units that combine and transform them. What is more, it can do all of these things simultaneously, since messages within one isolated subset of the processors have no effect on messages in a disjoint subset. This permits a very convenient kind of "time" sharing, where individual users get and return processors as their demands change. Such mimicry fails to take advantage of the ability to reconfigure the interconnection totally every message wave. There are many applications, such as searching a tree of possibilities in reasoning or game playing, where this could be used very effectively. Several existing programming languages can be extended to make this capability conveniently available to programmers.
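The delivery rule described above (each processor receives the highest priority message addressed to it, senders get acknowledgements or failure notices) can be sketched functionally as follows. This is not the Batcher sorting network itself, only a statement of what one message wave accomplishes, and the message format here is my own assumption:

def deliver_wave(messages):
    # messages: list of (sender, destination, priority, payload) for one synchronized wave.
    best = {}
    for sender, dest, priority, payload in messages:
        if dest not in best or priority > best[dest][2]:
            best[dest] = (sender, dest, priority, payload)
    delivered = {dest: payload for (_, dest, _, payload) in best.values()}
    acknowledged = {sender for (sender, _, _, _) in best.values()}
    failed = {m[0] for m in messages if m[0] not in acknowledged}
    return delivered, acknowledged, failed

wave = [(1, 7, 5, "low"), (2, 7, 9, "high"), (3, 4, 1, "x")]
print(deliver_wave(wave))
# processor 7 gets "high", processor 4 gets "x"; senders 2 and 3 are acknowledged,
# sender 1 is notified that its message was blocked by a higher priority one.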
Conventional programming languages consist of strings of commands and conditional commands to be obeyed by the computer. This type of programming can be extended to make reasonably convenient use of a parallel computer by providing means by which the programmer can specify that several strings of such commands can be carried out concurrently, and by providing large data objects such as arrays which are manipulated by operations that work on all the elements of the objects simultaneously. The high bandwidth of the communications net is required to transmit data manipulation commands to multiple parts of large structures (by a chain letter technique), and to pass program segments from processor to processor. [Moravec] contains many more details, and also suggests what may be a more elegant solution.

We will probably want the first versions of such a system to be able to serve several independent users simultaneously. The system's resources would be managed by the system monitor, a program running on several machines which maintains a pool of free processors and parcels them out on request, and which also handles file system requests (bulk storage would be connected to a handful of the processors) and allocation of other devices. Processes belonging to a single user will be initiated by a particular master machine, probably the one connected to his console. This master can create a tree of subprocesses, possibly intercommunicating, running on different machines. It should be possible, for example, to do vision by having one subset configured as an array processor for efficient implementation of retina-like processing, while another is running an Algol/APL for the less structured analytic geometry needed to interpret the image, and yet a third is operating a Lisp system doing abstract reasoning about the scene. Many existing systems permit this kind of organization, but they are hampered by having an absurdly small amount of computing power.

How is a system of this kind initialized, and how does one abort an out of control process taking place in part of it without affecting the rest? A possibility is to have an "executive" class of messages (perhaps signalled by a particular bit in the data portion), which user jobs are not permitted to emit. Reception of such messages might cause resetting of the processor, loading of memory locations within it, and starting execution at a requested location. A single externally controllable machine can be used to get things going, fairly quickly if it emits a self replicating chain letter (sketched below).

Now consider reliability. The system can obviously tolerate any reasonable number of inoperable processors, by simply declaring them unavailable for use. Failures in the communication net are much more serious, and under most situations will require the system to stop operating normally. It is possible to write diagnostic programs which can track down defective comparator elements or broken data wires. If something should happen to the clock signals to a given level it would be necessary to wheel out an oscilloscope. If reliability were a critical issue it would be possible to include a duplicate net, to run things while the other was being debugged.
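The "self replicating chain letter" mentioned above can be pictured as a doubling fan-out: every processor that already holds the program forwards a copy on each message wave, so the number of loaded machines doubles per wave. A toy count (mine; the real mechanism is in [Moravec]):

def waves_to_load(n_processors):
    # Count message waves needed if each loaded processor forwards one copy per wave.
    loaded, waves = 1, 0
    while loaded < n_processors:
        loaded *= 2
        waves += 1
    return waves

print(waves_to_load(10**6))   # 20 waves, about 5 microseconds at 4 waves per microsecond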
Section 5: The Future

Suppose my projections are correct, and the hardware requirements for human equivalence are available in 10 years for about the current price of a medium large computer. Suppose further that software development keeps pace (and it should be increasingly easy, because big computers are great programming aids), and machines able to think as well as humans begin to appear in 10 years. If the cost of electronics continues to plummet beyond then (and the existence of increasingly cheaper and better robot labor, in addition to scientific and engineering improvements, should ensure that), an additional 15 years should bring human equivalence into the pocket calculator price range. I also assume that sensors and effectors for such devices will be able to match human performance, since even today's technology is able to supersede it in many areas.

What then? Well, even if these machines are only as clever as human beings, they will have enormous advantages over humans in competitive situations. Their production and upkeep is vastly less expensive, so more of them can be put to work with given resources. They can be easily specialized for given tasks, and be programmed to work tirelessly. Because we are not constrained to use any particular type of component in building them, versions can be designed to work efficiently in environments in which sustaining humans is very expensive, such as deep in the oceans, and more importantly in boundless outer space. Most significantly of all, they can be put to work as programmers and engineers, with the task of optimizing the software and hardware which make them what they are. The successive generations of machines produced this way will be increasingly smarter and more cost effective. Of course, there is no reason to assume that human equivalence represents any sort of upper bound. When pocket calculators can out-think humans, what will a really big computer be like?

Regardless of how benevolent these machines are made, homo sapiens will simply be outclassed. Societies and economies are as surely subject to evolutionary pressures as biological organisms. Failing social systems eventually wither and die, and are replaced by more successful competitors, and those that can sustain the most rapid expansion dominate sooner or later. I expect the human race to expand into space in the near future, and O'Neill's habitats for people will be part of this. But as soon as machines are able to match human performance, the economics against human colonies become very persuasive. Just as it was much cheaper to send Pioneer to Jupiter and Viking to Mars than men to the Moon, so it will be cheaper to build orbiting power stations with robot rather than human labor. A machine can be designed to live in free space and love it, drinking in unattenuated sunlight and tolerating hard radiation. And instead of expensive pressurized, gravitied, decorated human colonies, the machines could be put to work converting lunar material into orbiting automatic factories. The doubling time for a machine society of this type would be much shorter than for human habitats, and the productive capability would expand correspondingly faster.

The first societies in space will be composed of co-operating humans and machines, but as the capabilities of the self-improving machine component grow, the human portion will function more and more as a parasitic drag. Communities with a higher ratio of machines to people will be able to expand faster, and will become the bulk of the intelligent activity in the solar system.
In the long run the sheer physical inability of humans to keep up with these rapidly evolving progeny of our minds will ensure that the ratio of people to machines approaches zero, and that a direct descendant of our culture, but not our genes, inherits the universe. This may not be as bad as it sounds, since the machine society can, and for its own benefit probably should, take along with it everything we consider important, up to and including the information in our minds and genes. Real live human beings, and a whole human community, could then be reconstituted if an appropriate circumstance ever arose.

Since biology has committed us to personal death anyway, with whatever immortality we can hope for residing only in our children and our culture, shouldn't we be happy to see that culture become as capable as possible? In fact, attempting to hobble its growth is an almost certain recipe for long term suicide. The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will go nova, or we will be invaded from the stars by a culture that didn't try to slow down its own evolution, or any number of other things. The bigger, more diverse and competent our offspring are, the more capable they will be of detecting and dealing with the problems that arise.

For the egomaniacs among us there is another possibility. The main problem in keeping up with the machines is that we evolve by the old DNA + nucleated cell + sex + personal death method, while our machines evolve by the new improved intelligence + language + culture + science + technology technique, which is so very much faster that our biology seems to stand still in comparison. If we could somehow transfer our evolution to the faster form, we should be able to hold our own. At first thought genetic engineering might seem to be the key. Successive generations of human beings could be designed by engineering mathematics and on the basis of computer simulations just like airplanes and computers are now. But this is just like building robots out of proteins instead of metal and plastic. Being made of protein is in fact a major drawback. That stuff is stable only in a narrow temperature and pressure range, sensitive to all sorts of high energy disturbances, and so on, and rules out many construction techniques and components.

Is there some way to retain our essential humanness, at least temporarily until we think of something better, while transferring ourselves to a more malleable form? Imagine the following process (meant to suggest a variety of ways such a thing could be done). You are in an operating theater, and a brain surgeon (probably a machine) is in attendance. On a table next to yours is a potentially human equivalent computer, dormant now for lack of a program to run. Your skull, but not your brain, is under the influence of a local anaesthetic. You are fully conscious. Your brain case is opened, and the surgeon peers inside. Its attention is directed at a small clump of about 100 neurons somewhere near the surface. It examines, non-destructively, the three dimensional structure and chemical makeup of that clump with neutron tomography, phased array radio encephalography, etc., and derives all the relevant parameters. It then writes a program which can simulate the behavior of the clump as a whole, and starts it running on a small portion of the computer next to you.
It then carefully runs very fine wires from the computer to the edges of the neuron assembly, to provide the simulation with the same inputs the neurons are getting. You and it both check out the accuracy of the simulation. After you are satisfied, it carefully inserts tiny relays between the edges of the clump and the rest of the brain, and runs another set of wires from the relays to the computer. Initially these simply transmit the clump's signals through to the brain, but on command they can connect the simulation instead. A button which activates the relays when pressed is placed in your hand. You press it, release it and press it again. There should be no difference. As soon as you are satisfied, the simulation connection is established firmly, and the now unconnected clump of neurons is removed. The process is repeated over and over for adjoining clumps, until the entire brain has been dealt with. Occasionally several clump simulations are combined into a single equivalent but more efficient program. Though you have not lost consciousness, or even your train of thought, your mind has been removed from the brain and transferred to the machine. A final step is the disconnection of your old sensory and motor system, to be replaced by higher quality ones in your new home. This last part is no different than the installation of functioning artificial arms, legs, pacemakers, kidneys, ears, hearts and eyes being done or contemplated now.

Advantages become apparent as soon as the process is complete. Somewhere in your machine is a control labelled "speed". It was initially set to "slow", to enable the simulations to remain synchronized with the rest of your old brain, but now the setting is changed to "fast". You can communicate, react and think at a thousand times your former rate. But this is only a minor first step. Major possibilities stem from the fact that the machine has a port which enables the changing program that is you to be read out, non-destructively, and also permits new portions of program to be read in. This allows you to conveniently examine, modify, improve and extend yourself in ways currently completely out of the question. Or, your entire program can be copied into a similar machine, resulting in two thinking, feeling versions of you. Or a thousand, if you want. And your mind can be moved to computers better suited for given environments, or simply technologically improved, far more conveniently than the difficult first transfer.

The program can also be copied to a dormant information storage medium, such as magnetic tape. In case the machine you inhabit is fatally clobbered, a copy of this kind can be read into an unprogrammed computer, resulting in another you, minus the memories accumulated since the copy was made. By making frequent copies, the concept of personal death could be made virtually meaningless. Another plus is that since the essence of you is an information packet, it can be sent over information channels. Your program can be read out, radioed to the moon, say, and infused there into a waiting computer. This is travel at the speed of light. The copy that is left behind could be shut down until the trip is over, at which time the program representing you with lunar experiences is radioed back, and transfused into the old body. But what if the original were not shut down during the trip? There would then be two separate versions of you, with different memories for the trip interval.
When the organization of the programs making up humans is adequately understood, it should become possible to merge two sets of memories. To avoid confusion, they would be carefully labelled as to which had happened where, just as our current memories are usually labelled with the time of the events they record. This technique opens another vast realm of possibilities. Merging should be possible not only between two versions of the same individual but also between different persons. And there is no particular reason why mergings cannot be selective, involving some of the other person's memories, and not others. This is a very superior form of communication, in which memories, skills, attitudes and personalities can be rapidly and effectively shared.

The amount of memory storage an individual will typically carry will certainly be greater than humans make do with today, but the growth of knowledge will ensure the impracticability of everybody lugging around all the world's knowledge. This implies that individuals will have to pick and choose what their minds contain at any one time. There will often be knowledge and skills available from others superior to a person's own. The incentive to substitute those talents for native ones will be overwhelming most of the time. This will result in a gradual erosion of individuality, and formation of an incredibly potent community mind.

A pleasant possibility presents itself. Why should the mind transferral process be limited to human beings? Earthly life contains several species with brains as large as or larger than man's, from dolphins, our cephalic equals, to elephants and the large whales, and perhaps giant squid, with brains up to twenty times as big. If the technical problem of translation can be overcome, and it may be quite difficult for squid, in particular, since their minds evolved entirely independently, then our culture could be fused with theirs, with each component used according to its value. In fact, a synthesis of all terrestrial life is desirable, with the simpler organisms contributing only the information in their DNA, if that's all they have. In this way all the knowledge generated by terrestrial biological and cultural evolution will be retained in the data banks, available whenever needed. This is a far more secure form of storage than the present one, where genes and ideas are lost as species become extinct and individuals die.

We now have a picture of a super-consciousness, the synthesis of terrestrial life, and perhaps jovian and martian life as well, constantly improving and extending itself, spreading outwards from the solar system, converting non-life into mind. There may be other such bubbles expanding from elsewhere. What happens when we meet another? Well, it's presumptuous of me to say at this tender stage of the evolution, but fusion of us with them is certainly a possibility, requiring only a translation scheme between the data representations. This process, possibly occurring now elsewhere, might convert the entire universe into an extended thinking entity.

References

Section 1: The Natural History of Intelligence

[animal]
JERISON, Harry J., "Paleoneurology and the Evolution of Mind", Scientific American, Vol. 234, No. 1, January 1976, 90-101.
RIOPELLE, A.J., ed., "Animal Problem Solving", Penguin Books, 1967.
GOODRICH, Edwin S., "Studies on the Structure and Development of Vertebrates", Dover Publications Inc., New York, 1958.
BITTERMAN, M. E., "The Evolution of Intelligence", Scientific American, Vol. 212, No. 1, January 1965, 92-100.
BUCHSBAUM, Ralph, "Animals without Backbones", The University of Chicago Press, 1948.
GRIFFIN, Donald R., ed., "Animal Engineering", W.H. Freeman and Company, San Francisco, June 1974.
FARAGO, Peter, and Lagnado, John, "Life in Action", Alfred A. Knopf, New York, 1972.
BURIAN, Z. and Spinar, Z.V., "Life Before Man", American Heritage Press, 1972.
BONNER, John Tyler, "Cells and Societies", Princeton University Press, Princeton, 1955.

[squid]
COUSTEAU, Jacques-Yves and Diole, Philippe, "Octopus and Squid", Doubleday & Company, Garden City, N.Y., 1973 (also a televised film of the same name).
BOYCOTT, Brian B., "Learning in the Octopus", Scientific American, Vol. 212, No. 3, March 1965, 42-50.
"The Octopus", a televised film, Time-Life films.
LANE, Frank W., "The Kingdom of the Octopus", Worlds of Science Books, Pyramid Publications Inc., October 1962.

[bird]
BAKKER, Robert T., "Dinosaur Renaissance", Scientific American, Vol. 232, No. 4, April 1975.
STETTNER, Laurence Jay and Matyniak, Kenneth A., "The Brain of Birds", Scientific American, Vol. 218, No. 6, June 1968, 64-76.

[whale]
LILLY, John C., "The Mind of the Dolphin" & "Man and Dolphin", Doubleday and Company, New York, 1967.
STENUIT, Robert, "The Dolphin, Cousin to Man", Bantam Books, New York, 1972.
COUSTEAU, Jacques-Yves and Diole, Philippe, "Whales and Dolphins", Doubleday & Company, Garden City, N.Y., 1972.
"The Whale", a BBC produced film shown in the NOVA television series.
FICHTELIUS, Karl-Erik and Sjolander, Sverre, "Smarter than Man?", Ballantine Books, New York, 1974.
McINTYRE, Joan, ed., "Mind in the Waters", Charles Scribner's Sons, San Francisco, 1974.

[elephant]
RENSCH, Bernhard, "The Intelligence of Elephants", Scientific American, February 1957, 44.

[primate]
"The First Words of Washoe", televised film shown in the NOVA series.
LeGROS CLARK, W.E., "History of the Primates", The University of Chicago Press, Chicago, 1966.
PFEIFFER, John, "The Human Brain", Worlds of Science Books, Pyramid Publications Inc., New York, 1962.

Section 2: Measuring Processing Power

WILLOWS, A.O.D., "Giant Brain Cells in Mollusks", Scientific American, Vol. 224, No. 2, February 1971, 69-75.
HUBEL, David H., "The Visual Cortex of the Brain", Scientific American, November 1963, 54-62.
KANDEL, Eric R., "Nerve Cells and Behavior", Scientific American, Vol. 223, No. 1, July 1970, 57-70.
WILLIAMS, Peter L. and Warwick, Roger, "Functional Neuroanatomy of Man", W.B. Saunders Company, Philadelphia, 1975.
"PDP-10 Reference Handbook", Digital Equipment Corporation, Maynard, Mass., 1971.
AGRANOFF, Bernard W., "Memory and Protein Synthesis", Scientific American, Vol. 216, No. 6, June 1967, 115-122.
TRIBUS, Myron and McIrvine, Edward C., "Energy and Information", Scientific American, Vol. 224, No. 3, September 1971, 179-188.
KENNEDY, Donald, "Small Systems of Nerve Cells", Scientific American, Vol. 216, No. 5, May 1967, 44-52.
GLASSTONE, Samuel and Lewis, David, "Elements of Physical Chemistry", D. Van Nostrand Co. Inc., New York, 1960.
BAKER, Peter F., "The Nerve Axon", Scientific American, Vol. 214, No. 3, March 1966, 74-82.
MILLER, Richard T., "Super Switch", 284, in Science Year annual 1975, Field Enterprises Educational Corp., 1975.
LANDAUER, Rolf, "Fundamental Limitations in the Computational Process", IBM, Yorktown Heights, N.Y., 1976.

Section 3: The Growth of Processing Power

McWHORTER, Eugene W., "The Small Electronic Calculator", Scientific American, Vol. 234, No. 3, March 1976, 88-98.
TIEN, P. K., "Integrated Optics", Scientific American, Vol. 230, No. 4, April 1974, 28-35.
HODGES, David A., "Trends in Computer Hardware Technology", Computer Design, Vol. 15, No. 2, February 1976, 77-85.
HITTINGER, William C., "Metal-Oxide-Semiconductor Technology", Scientific American, Vol. 229, No. 2, August 1973, 48-57.
SCRUPSKI, Stephen E., et al., "Technology Update", Electronics, Vol. 48, No. 21, McGraw-Hill, October 16, 1975, 74-127.
BOBECK, Andrew H. and Scovil, H. E. D., "Magnetic Bubbles", Scientific American, Vol. 224, No. 6, June 1971, 78-90.
VACROUX, Andre G., "Microcomputers", Scientific American, Vol. 232, No. 5, May 1975, 32-40.
HEATH, F. G., "Large-Scale Integration in Electronics", Scientific American, Vol. 222, No. 3, February 1970, 22-31.
TURN, Rien, "Computers in the 1980's", Columbia University Press, Rand Corporation, 1974.
RAJCHMAN, Jan A., "Integrated Computer Memories", Scientific American, Vol. 217, No. 1, July 1967, 18-31.
HITTINGER, William C. and Sparks, Morgan, "Microelectronics", Scientific American, Vol. 213, No. 5, November 1965, 56-70.

Section 4: Mega Processing

BATCHER, K.E., "Sorting Networks and their Applications", 1968 Spring Joint Computer Conf. Proceedings, April 1968, 307-314.
KNUTH, D.E., "Sorting and Searching", The Art of Computer Programming, Vol. 3, Addison-Wesley, 1973.
VAN VOORHIS, David C., "An Economical Construction for Sorting Networks", 1974 National Computer Conference Proceedings, April 1974, 921-927.

General Reference

MORAVEC, Hans P., "The Role of Raw Power in Intelligence", Stanford AI Memo (to be published), available from the author.