Report on The National Science Foundation Research Planning Workshop

 

on

 

Cognition and Spatial Reasoning:

The Human-Machine Connection

 

May 15, 1997

Turf Valley Conference Center

Ellicott City, MD

 

Conference Organizers:

 

Susan L. Epstein

Dept. of Computer Science

Hunter College and

The Graduate School,

City University of New York

Jack J. Gelfand

Department of Psychology

Princeton University

Michael M. Marefat

Department of Electrical

and Computer Engineering

University of Arizona

 

Sponsored by the National Science Foundation Programs in:

Knowledge Models and Cognitive Systems

Robotics and Machine Intelligence

Database and Expert Systems

Interactive Systems

 

 

 

 

Table of Contents

Executive Summary

1. Introduction

2. Overview of Spatial Representation and Reasoning

2.1 Representation

2.2 Diagrammatic representations

2.3 Incomplete information

2.4 Modes of reasoning

2.5 Research issues

3. Representation of Space in the Brain

3.1 Processing pathways in the visual system

3.2 Spatial coordinate systems for motor control

3.3 Premotor cortex

3.4 Research issues

4. Geometric Reasoning and CAD/CAM

4.1 Introduction

4.2 State of the art

4.3 Research issues

5. Robotics and Perception

5.1 State of the art in machine vision

5.2 Research issues

6. Diagrammatic Reasoning

6.1 State of the art in diagrammatic reasoning

6.2 Research issues

7. Linguistic Representation of Space

8. Qualitative Physical Reasoning

8.1 State of the art in qualitative physical reasoning

8.2 Research issues

9. Interdisciplinary Research Opportunities

References

Summaries of Workshop Presentations

Thursday, May 15

Session 1: Spatial Cognition and Representation of Space in the Brain

Session 2: Robotics and Perception

Session 3: Linguistic Representation of Space

Friday, May 16

Session 4: Spatial Representation and Reasoning

Session 5: Diagrammatic Reasoning

Session 6: Qualitative Physical Reasoning

Session 7: Geometric Reasoning and CAD

Saturday, May 17

Session 8: Formal Theories of Spatial Reasoning

Workshop Contributors and Attendees

 

Executive Summary

Because space is a fundamental feature of our environment, spatial knowledge and spatial perception play crucial roles in even the most ordinary human problem solving. People process spatial information when they navigate, when they manipulate objects, and when they design them. Although people manage such tasks quite easily, automated spatial reasoning has proven to be a difficult task, presumably because computers lack the appropriate representation and processing mechanisms for spatial information.

On May 15-17, 1997 in Ellicott City, MD, the National Science Foundation sponsored a workshop at which 24 attendees examined the current state of the art in machine and human spatial perception, reasoning, and cognition. This report, "Cognition and Spatial Reasoning: The Human-Machine Connection," summarizes their discussion as they sought common ground for interdisciplinary research. It is intended as a research planning guide for the National Science Foundation in its continued support of similar interdisciplinary efforts.

The participants are actively engaged in research on spatial cognition within a variety of disciplines: computer science, mechanical and manufacturing engineering, cognitive science, linguistics, and philosophy. Nonetheless, there were recurrent themes in their presentations:

• Appropriate models are a function of the spatial reasoning task under consideration.

• Proper choice and integration of different models is essential.

• The integration of large scale and small scale information is an important challenge.

• Reasoning must proceed both quantitatively and qualitatively.

• Representation and reasoning must provide for spatial ambiguity and abstraction.

• Representation must accommodate dynamic change.

Because these issues arose repeatedly, across a broad range of applications, it was the attendees’ consensus that interdisciplinary research efforts to explore these issues from multiple points of view would be both productive and potentially innovative. Neuroscience and cognitive science research on the sources of human capabilities are important too, given the skill and ease with which people process spatial information. We therefore offer the following broad recommendations:

• Establish a special research program in spatial reasoning, similar to those on scientific databases and collaborative environments.

• Couple neuroscience research on the representation of space in the brain with research into better models and representation languages for space and spatial properties.

• Support more research into the ontological foundations of formal spatial logics as coupled with computer-aided design (CAD/CAM) and geographic information systems (GIS).

• Encourage applications of work in the linguistic representation of space and human spatial conceptualization to practical areas, such as human-computer interaction.

• Support research to develop analogical reasoning and learning, and model-based reasoning and learning in the spatial domain.

• Support research on AI spatial reasoning in concrete task domains. Such studies should generate tractable results, useful in solving problems that arise in practice, and should produce immediately beneficial systems.

• Establish intra-NSF and multi-agency sponsored research on spatial reasoning.

• Continue to organize and provide seed money for workshops and meetings. These have proved to be effective forums for the transfer of knowledge between the diverse communities involved.

The first sections of this report introduce spatial cognition and provide an overview of spatial representation and reasoning. Subsequent sections provide details on CAD/CAM, robotics and perception, diagrammatic reasoning, the linguistic representation of space, and qualitative physical reasoning. A more detailed discussion of our recommendations appears in Section 9.

We thank the session leaders, B. Chandrasekaran, Ernie Davis, Michael Graziano, Pat Hayes, Yumi Iwasaki, Avi Kak, Ari Requicha, and Barbara Tversky, for their ideas, enthusiasm, and original drafts of portions of this manuscript. We are also grateful to Ernie Davis for his sensitive and erudite editing. We accept full responsibility for any portions of this document that do not reflect their wisdom. We thank the programs in Knowledge Models and Cognitive Systems, Robotics and Machine Intelligence, Database and Expert Systems, and Interactive Systems at the National Science Foundation for their support. We also thank Larry Reeker, of the Program on Knowledge Models and Cognitive Systems at the National Science Foundation, for his encouragement and support in the development and organization of this meeting. His vision sets an exemplary standard for us all.

 

 

1. Introduction

 

Each person has a physical presence that manifests itself in space, and thereby creates problems to be addressed there. Furthermore, each of us has a variety of sensors that collect information about space and our presence in it, sensors that provide a constant stream of potentially relevant input data. Thus space is a fundamental category of thought, one that plays a deep role in many aspects of human cognition.

By "cognition" in this document we mean all the "processes by which the sensory input is transformed, reduced, elaborated, stored, recovered and used" (Neisser 1967). Thus cognition goes beyond perception to include encoding and subsequent manipulations. Spatial cognition refers to cognition whose input references space, and spatial reasoning is the body of methods and tools that represents or processes spatial information to derive, make explicit, or predict new spatial knowledge.

The ability to locate oneself in the world or to navigate successfully through a physical environment is an apparently elementary task for people, but a difficult one for machines. That alone has been sufficient to target spatial reasoning as a significant subject in both human and machine intelligence. Spatial reasoning has therefore been studied and applied in robotics, CAD/CAM, expert systems, spatial databases and GIS, computer vision and image processing, linguistics, neuroscience, and computer graphics.

Faced with a more challenging task, however, such as the design of state-of-the-art aircraft, people also bring to bear rich spatial representations, reasoning methods, and communication skills from which machines could benefit. The Boeing 777 aircraft, for example, was designed in a paperless environment. The execution of this enormous project required spatial reasoning for a variety of tasks, including the conception and design of parts and systems from initial sketches to full specifications, the generation of manufacturing plans for individual components, and the construction of assembly sequences for robots. The development of automated systems for similar tasks will require sophisticated tools for the representation and manipulation of spatial information.

The next generation of CAD/CAM systems will rely even further upon powerful tools that represent and manage spatial information. Such systems might be able to perform a complete automated verification of design consistency; negotiate, record, and retrieve design decisions; and generate and evaluate manufacturing options for optimization among competing goals. The development of such tools requires substantial progress in many directions, particularly in spatial information systems.

The analysis of spatial reasoning and spatial cognition is a problem that has engaged researchers from a broad range of disciplines. This document is a compendium of expertise from scientists in computer vision, CAD/CAM, and artificial intelligence, as well as neuroscientists, cognitive scientists, linguists, and philosophers. Despite our widely disparate orientations, methodologies, and vocabularies, we found recurrent themes and issues for spatial representation and reasoning.

In the case of representation we recognize the need to:

• Within a single representation, combine spatial information on a very large scale with spatial information on a very small scale, and then reason with that representation.

• Integrate coarse and detailed models smoothly.

• Provide multiple models, each of which describes the same world but supports a different reasoning task. The reasoner must select a single appropriate model or integrate several of them.

• Express and manipulate spatial information along a continuum from purely qualitative (e.g., "A is inside B") to properties of intermediate specificity (e.g., "the diameter of A is less than that of B") to purely quantitative properties (e.g., a sphere two inches in diameter).

• Treat implicit information, inferred information, and presupposed information in distinct and appropriate ways.

• Forgo, as unattainable, precise measurements, ideal shapes, and exact positions, whether measured or generated (spatial imprecision).

• Employ deliberate ambiguity and abstraction in both diagrams and discourse to communicate ideas that are not yet completely formed. For example, in early architectural sketches, blobs are used to represent buildings that do not yet have a defined shape.

• Support perspective (or frame of reference).

• Represent change, both as a dynamic across time (e.g., movement) and to convey static or dynamic ideas through animation.
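The qualitative-to-quantitative continuum listed above can be made concrete in a few lines of code. The sketch below (all class and function names are our own illustrative inventions, not from the workshop) expresses facts about two regions at each level of specificity, checked against a fully metric model:

```python
# The same spatial situation described at three levels of specificity:
# purely qualitative ("A is inside B"), intermediate ("diameter of A is
# less than that of B"), and purely quantitative (exact centers and radii).

from dataclasses import dataclass

@dataclass
class Circle:
    x: float
    y: float
    r: float          # radius, so the diameter is 2 * r

def inside(a: Circle, b: Circle) -> bool:
    """Purely qualitative relation: 'A is inside B'."""
    dist = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
    return dist + a.r <= b.r

def smaller_diameter(a: Circle, b: Circle) -> bool:
    """Intermediate specificity: 'the diameter of A is less than that of B'."""
    return 2 * a.r < 2 * b.r

a = Circle(0.0, 0.0, 1.0)       # fully quantitative: a circle of diameter 2
b = Circle(0.5, 0.0, 3.0)

print(inside(a, b))             # the qualitative fact follows from the metric model
print(smaller_diameter(a, b))   # and so does the intermediate one
```

The point of the continuum is that a reasoner must be able to move in the other direction as well: to draw conclusions from the qualitative facts alone, when no metric model is available.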

In the case of reasoning we recognize the need to:

• Draw analogies and use case-based reasoning to identify similarities and differences between shapes, components, and structures, and to apply such information to solution construction.

• Employ plausible reasoning, for example, to determine automatically whether a set of three-dimensional spatial information is internally consistent, or to identify the set of all possible spatial configurations that satisfy some set of expressed spatial constraints.

• Identify and generate useful abstractions automatically, invoke the correct one(s), and seamlessly integrate information between different views (multi-scale abstractions and focus).

• Integrate multiple kinds of spatial expertise.

• Learn about space and problem solving in space, for example, learning structure and shape from examples, so that images can be understood, retrieved and manipulated based on their content, or that robots can automatically abstract and learn correct grasping strategies.

• Support communication, negotiation, and an interface among intelligent agents that collaborate in spatial reasoning.

2. Overview of Spatial Representation and Reasoning

2.1 Representation

A remarkable point of agreement among all the workshop participants was their repeated references to information and its representation. This was particularly noticeable in the two neuroscience presentations in Session 1. Haxby and Graziano naturally spoke of information being represented, projected, and fed back; of different kinds of information about objects being represented in different parts of the brain.

There are, however, three caveats to this consensus. First, some AI researchers (Brooks 1991) and many connectionists reject using the language of representation to characterize cognitive states. Second, even if such a language of representation is deemed useful, it may not necessarily be the language of information processing. For example, Paul Churchland has described neural structures that are involved in visuo-motor coordination as a frog catches its prey, a fly (personal communication). Churchland shows that, in a suitably transformed space, the solution to the control problem can be expressed as a linear relationship between the (transformed values of) location of the prey and a motor control parameter. In the frog’s neural structure, visual information is represented in one neural layer and the motor control information in another layer directly below it. The two layers are so connected that the prey location information directly sets the motor control parameter. It is clear that the states of the layer constitute a representation of information; it is less clear that the connections between layers are best viewed in terms of processing information. Third, computer scientists and cognitive scientists approach the analysis of spatial representation and cognition in very different ways, so that quite separate issues and vocabulary tend to arise in the two fields. In computer science, a representation or a reasoning process is an artifact designed to satisfy particular purposes. Hence, the structure is perfectly known; the key issue is the adequacy of the representation or process to the given task. In cognitive science, by contrast, representations and processes are aspects of cognition in an experimental subject that can be examined only through rather indirect and subtle experimentation and theorizing; the key issue is how well the theory accounts for the empirical data.
Thus, some issues, such as the use of precise versus qualitative representation, are central in computer science but peripheral in cognitive science because they are inaccessible to researchers there. Other issues, such as frame of reference or viewpoint, are central in cognitive science, but peripheral in computer science because they are not algorithmically significant. It is notable, however, that in vision and in natural language research, where the computational and cognitive approaches have been particularly successful at finding common ground, this dichotomy in representation between them is much less apparent. There is every reason to hope that as our understanding increases the same will be true in the other research areas discussed in this report. In addition, as more sophisticated methods to probe the human brain are developed, we may reach a common ground between processing in the brain and its computational significance.

2.2 Diagrammatic representations

Computer science has many different internal representations for two-dimensional and three-dimensional spatial information (Glasgow, Narayanan, and Chandrasekaran 1995). Most of them share the following properties:

• Each instance of the representation denotes a unique spatial layout, up to some natural class of isomorphisms.

• Each well-formed instance of the representation is guaranteed to be geometrically consistent, or at least the conditions for consistency are easily checked.

• It is straightforward to render the representation as a two-dimensional picture or a three-dimensional model.

A representation with the first two of these properties is known as a vivid representation in the sense of (Levesque 1986). A representation with all three properties is known as a diagrammatic representation. Where precise spatial information is available or can easily be posited, diagrammatic representations have obvious advantages. In general, there are straightforward, well-known, efficient algorithms to manipulate them. They are easily understood by programmers and users, and easily connected to graphical interfaces. Examples of diagrammatic representations include occupancy arrays and other retina-like representations, polygons described by the coordinates of their vertices, spline representations of curves, and constructive solid geometry representations of volume, where component shapes, dimensions, and relative positions are given exactly. Diagrammatic representations are discussed at greater length in Section 6.
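One of the diagrammatic representations named above, constructive solid geometry, can be sketched in a few lines (the names here are our own, for illustration only). Every well-formed tree of primitives and set operations denotes a unique, geometrically consistent region, and rendering it into an occupancy array, the "retina-like" form also mentioned above, is straightforward:

```python
# A tiny CSG representation: primitive shapes with exact dimensions and
# positions, combined by set operations, each region given as a membership
# predicate over points in the plane.

def box(x0, y0, x1, y1):
    return lambda x, y: x0 <= x <= x1 and y0 <= y <= y1

def disk(cx, cy, r):
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2

def union(a, b):
    return lambda x, y: a(x, y) or b(x, y)

def difference(a, b):
    return lambda x, y: a(x, y) and not b(x, y)

# A square plate with a circular hole: exact shapes, dimensions, positions.
shape = difference(box(0, 0, 10, 10), disk(5, 5, 2))

# Render the CSG tree as a coarse occupancy array.
grid = [["#" if shape(x + 0.5, y + 0.5) else "." for x in range(10)]
        for y in range(10)]
print("\n".join("".join(row) for row in grid))
```

Note how the three properties hold by construction: the tree denotes exactly one layout, no well-formed tree can be inconsistent, and rendering is a direct evaluation of the membership predicate.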

2.3 Incomplete information

Artificial intelligence for high-level spatial reasoning must necessarily deal with incomplete information and with an exceptionally wide range of spatial properties and relations (Davis 1990; McDermott 1987). These include at least topological relations (Randell, Cui, and Cohn 1992); measures of distance, area, and volume; angles and directions; differential properties, such as tangents and curvature (deKleer 1977); symmetries; notions of one shape approximating another; repeated structures; and dynamic properties such as motion, shape change, and flow (Zhao 1994). Human high-level reasoning integrates all these concepts smoothly, and an automated system of general intelligence must do likewise.

Moreover, spatial reasoning in AI must deal with an extraordinarily wide range of scales. It is not at all unusual for a problem in commonsense physical reasoning to involve lengths varying across seven or eight orders of magnitude. Consider, for example, the reasoning involved in driving a car across the United States, where the distances involved range from the 3000 miles of the journey to motions of a fraction of an inch in controlling the car.

Similar issues arise in computer vision and in natural language processing. Due to occlusion and a limited field of vision, the information directly extractable from visual perception of a three-dimensional scene is almost always incomplete. The information is also almost always uncertain and imprecise, due to noise in the data and imperfect output from the low-level visual processors. Natural language descriptions of spatial properties and relations are almost always vague and geometrically incomplete.

For these reasons, diagrammatic representations do not suffice in these domains. Rather, a general-purpose AI system must employ a representation that is able to express partial knowledge of any of the above-mentioned types of spatial information, and an inference system capable of dealing with this partial knowledge. To avoid confusion and inconsistency in the definition of representations with such broad scope, it is crucial to be precise about the exact meaning of the representation. (Contrast this with the narrower spatial representation languages used in most other parts of computer science, where there is much less potential for ambiguity, and it is often safe to have ambiguities resolved purely procedurally.) Therefore, formal theories of spatial representations and inference, as well as formal methods to define and analyze models, concepts, representation languages, and inference techniques become particularly important in AI applications. Examples of representations that have been studied for incomplete spatial information include constructive solid geometry representations with interval bounds on the dimensions and angles involved, exact shape representations with tolerance to express permitted variance from the ideal, and systems of topological or metric constraints between regions.
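The first of the representations for incomplete information listed above, dimensions given only as interval bounds, can be sketched as follows (a minimal illustration of interval arithmetic; the class and variable names are our own):

```python
# Partial spatial knowledge via interval bounds: dimensions are intervals
# rather than exact values, and derived quantities are computed so that
# every geometrically possible value falls within the derived interval.

class Interval:
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi)
                          for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# A rectangular part whose width and height are known only to tolerances:
width = Interval(2.0, 2.1)
height = Interval(1.0, 1.2)

perimeter = (width + height) * Interval(2, 2)
print(perimeter)   # every possible perimeter lies within this interval
```

The inference system then operates on such intervals directly, answering queries such as "is the perimeter certainly under 7?" without ever committing to exact dimensions.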

2.4 Modes of reasoning

Cognitive tasks involve many different modes of spatial reasoning. In deduction, a program infers the truth of a proposition or the value of a measurement from given information. In matching, the program determines whether two representations can denote the same region, a key problem in model-based visual recognition and in queries for image-database management. In explanation, the program is expected to propose a spatial model to account for perceptual data. In generalization, the program abstracts key spatial properties from a collection of spatial instances. For example, Epstein and Gelfand have augmented a hierarchical mixture of experts game-playing program called Hoyle with the ability to abstract spatial properties significant to the game (Epstein, Gelfand and Lesniak, 1996; Epstein, Gelfand, and Lock 1998). In summarization, a program is expected to find the salient spatial properties of a single image, a mode used in the generation of natural language scene descriptions for the indices of an image database. In path planning, given a representation of a region with clear space and obstacles and an object to be moved, the program is expected to find a path to move the object through the space. In simulation, given the geometry and other characteristics of an initial physical situation, the program is expected to predict what will occur (DeCuyper, Keymeulen, and Steels 1995; Funt 1980; Gardin and Meltzer 1989). Strictly speaking, this last mode is physical rather than spatial reasoning, but in many important applications, most of the complexity arises from the geometry, so the geometric reasoning dominates the problem.

Each of these reasoning modes may be called upon in dynamic problems as well as static ones. In dynamic problems, the information given and the conclusions to be derived may involve motions and time-varying spatial relations. Simulation, of course, is always dynamic.
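The path-planning mode described above can be illustrated with one standard algorithm, breadth-first search over an occupancy grid (this is one illustrative choice, not the only approach the report envisions; names are our own):

```python
# Path planning on an occupancy grid: '#' marks obstacles, '.' marks clear
# space. Breadth-first search finds a shortest 4-connected path from start
# to goal, or reports that no path exists.

from collections import deque

def plan_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}          # also serves as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:               # reconstruct the path by backtracking
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None                        # no path exists

grid = ["....",
        ".##.",
        ".#..",
        "...."]
print(plan_path(grid, (0, 0), (2, 3)))
```

Real path planners must of course move beyond grids to continuous configuration spaces and moving obstacles, which is where the representational issues of Sections 2.2 and 2.3 reappear.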

2.5 Research issues

The following areas of research are critical in the development of spatial representation and inference for a broadly intelligent system.

• The philosophical foundations of spatial cognition include the metaphysics of space and its relation to the rest of physical reality, and the epistemology of spatial knowledge and its relation to other knowledge. What, fundamentally, is a region? What does it mean for an object to occupy a region? What kinds of objects do occupy regions? What ontological status should be granted to such things as holes, which occupy space, but are not objects in the usual sense (Casati and Varzi 1994)? How might one’s understanding of space be viewed as an abstraction of a prior, more immediate, understanding of physics?

• The ontology of spatial cognition includes the models that might be available to support it. Is standard Euclidean geometry always the best model for reasoning about space? Or are there alternative models, such as discrete geometries, geometries with infinitesimals, tolerance spaces, or mereological and topological models, that are sometimes or always more useful as a basis for analysis? What is the class of spatial regions that need be considered in commonsense reasoning? For instance, can it be assumed that all objects occupy an extended region, or must we admit objects that occupy a point, curve, or surface? Can it be assumed that all regions of interest are connected? That they are regular? That they have only finitely many holes and bumps? And so on.

• The characterization of spatial knowledge and spatial inference determines what kinds of spatial properties are relevant, what kinds of partial knowledge arise, and what kinds of inferences must be made in the spatial reasoning tasks that broadly intelligent systems will face (Newell 1981).

• Representation for spatial cognition includes the development of languages that can express the properties and partial knowledge in the characterization.

• Complexity and algorithmics for spatial cognition analyze the formal complexity of inference over these languages and develop complete algorithms for tractable classes of inference.

• Tools and implementation develop and implement inference systems for spatial knowledge that are practical for the tasks to be carried out.

• Characterize, define, analyze, and implement the uncertain and plausible inferences useful in spatial reasoning. For instance, if the diameter of R is less than 1 foot, then its circumference is probably less than 50 feet. Very little work has been done in this area.
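The circumference inference above can be given a partial formal grounding (this rendering is our own illustration, not the report's): for a convex region, a classical result gives perimeter ≤ π × diameter, so a diameter bound yields a hard circumference bound; for arbitrary, possibly non-convex regions the inference is merely plausible, which is precisely what makes it an example of uncertain reasoning.

```python
# For a convex plane region, perimeter <= pi * diameter (a consequence of
# Cauchy's formula for the perimeter of a convex body). So "diameter of R
# is under 1 foot" *guarantees* a circumference under pi feet if R is
# convex, and only plausibly bounds it (e.g. by 50 feet) otherwise.

import math

def convex_circumference_bound(diameter):
    """Sound upper bound on perimeter, valid only for convex regions."""
    return math.pi * diameter

print(convex_circumference_bound(1.0))   # about 3.14 feet, well under 50
```

A theory of plausible spatial inference would need to quantify how likely the non-convex exceptions are in a given task domain, which is exactly the kind of work the bullet above calls for.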

Should one study spatial reasoning independently of a specific task domain that, in other respects, is fairly well understood? The argument against working in the abstract is that it is too easy to define broad representation languages and classes of inference, almost all of which are intractable or uncomputable in the worst case. The challenge for AI is to develop systems that are useful for the problems that arise in practice, and there is no way to predict what kinds of problems arise without a specific practical problem. The argument against working on concrete problems is that, at this time, any task domain that we understand well enough to be reasonably confident of the limits of the spatial reasoning involved will likely be a task domain where the scope of the representation and inferences needed are quite narrow, for which highly specialized techniques that do not extend to a general method may well suffice. For the present, therefore, it seems wise to pursue a mixture of approaches, working simultaneously on developing theories from very specific well-understood tasks; from broad, poorly understood tasks; and from purely abstract considerations. Hopefully, these will eventually meet and integrate.

 

3. Representation of Space in the Brain

 

 

3.1 Processing pathways in the visual system

It was often pointed out in the workshop that humans have a distinctive capability to reason spatially. Though we cannot answer all of the questions about how humans accomplish this with our present level of understanding, we will discuss some of what is known in this section. Over half of the cerebral cortex in primates is devoted exclusively or primarily to visual processing. This large region of cortex appears to be organized as a mosaic of modules, or separate visual areas, each of which has different properties and processes different kinds of information. It has been suggested that these modules are grouped into functional processing streams. Each of these proposed streams consists of a set of modules connected in a hierarchical fashion. For example, one stream, which courses along the bottom part of the primate brain, is thought to be primarily involved in the analysis of form, color, and texture. At the lowest level of this stream, the neurons have simple response properties, responding to lines and edges. At the highest level, the neurons respond in a selective fashion to complex colored objects such as faces and hands. In the motion processing stream, neurons at the lowest levels respond to simple motion of dots or lines, and at the highest levels neurons are sensitive to complex patterns of motion such as spiral or whole-field rotations. Less well understood is the primate spatial location stream. Although a set of cortical modules has been found to be involved in spatial processing, it is not yet clear if these modules are hierarchically organized. While the visual system seems to divide visual perception into these different processes, the three streams are not independent, but are highly interconnected at every level of the hierarchy. (A good introductory review of this discussion is given in Kandel et al., 1991. A more technical review is given by Merigan and Maunsell, 1993.)

The apparent separation of processing of form and space information continues in the working memory circuits of the prefrontal cortex. Working memory is the short-term cache memory of the brain that is used in cognitive processing and decision-making (Baddeley, 1986). Haxby and coworkers have used brain imaging techniques to show that the human brain region in prefrontal cortex used for short-term memory of positional information is in a different location from that devoted to the short-term memory for objects (Courtney et al., 1998; Courtney et al., 1996). This dichotomy may account for the ease with which we can reason qualitatively about space and is an important potential area for research.

3.2 Spatial coordinate systems for motor control

Though some recent progress has been made in understanding spatial and motion processing for cognitive tasks, much more is known about the processing of space and motion for the purpose of motor control. The terminus of the motion and spatial processing streams in the visual system, as discussed in the last section, receives much more than just visual input. It also receives tactile, joint, auditory, and vestibular information. (For a good overview see Andersen, 1987.) Because of its multimodal nature, this cortical region is ideally suited to process the space surrounding the body. This area, called posterior parietal cortex, projects to a variety of areas involved in the further processing of visual space and visuo-motor coordination. (For a good overview see Gross and Graziano, 1995.) In this manner, light falling on the retina can eventually result in motor behavior.

When we look at an object, its image is projected through the cornea and lens and evokes neuronal activity on a localized part of the retina. Already, the location of the stimulus is partly encoded; that is, the firing of retinal neurons can signal the location of the stimulus on the retina, in what are called retinocentric coordinates. However, if we reach toward that object, we must control the joints and muscles of the arm using a set of motor coordinates. How does the visuo-motor system, outlined above, transform the coordinates of objects as projected onto the retina into motor coordinates? The answer would seem to lie primarily in posterior parietal cortex and structures projecting to it. Recent work has implicated the premotor cortex, very close to the motor output stage, in the final stages of the visuo-motor transformation. It is in this area that neurons encode the locations of nearby objects in coordinates that are useful for the motor system.
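The coordinate transformation described above can be caricatured in a few lines. This is a deliberate simplification of our own (real cortical coding is population-based and nonlinear): a stimulus given in retinocentric coordinates is shifted by the eye-in-head and head-on-trunk positions to yield body-centered coordinates suitable for reaching.

```python
# A toy model of the retinocentric-to-body-centered transformation:
# successive frames of reference differ by the posture of the body part
# linking them, so the transformation is (here) a sum of offsets.

def retina_to_body(stim_retinal, eye_in_head, head_on_trunk):
    """All arguments are 2-D (x, y) offsets in a common angular unit."""
    return tuple(s + e + h for s, e, h in
                 zip(stim_retinal, eye_in_head, head_on_trunk))

# A stimulus 5 units right on the retina, eyes rotated 10 units right,
# head turned 2 units left:
print(retina_to_body((5, 0), (10, 0), (-2, 0)))   # (13, 0)
```

Even this caricature makes the computational problem visible: the brain must combine signals from several modalities (retinal, proprioceptive, vestibular) to recover each offset before any reach can be planned.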

3.3 Premotor cortex

The premotor cortex in the frontal lobe is known to be involved in sensory-motor integration (Wise, 1985). Its neurons respond to somatosensory stimuli and are also active during voluntary movement. As first shown by Rizzolatti et al. (1981) and subsequently corroborated by others (Graziano, Hu and Gross, 1997), many of the neurons in premotor cortex that respond to touching the surface of the skin in a particular area also respond to visual stimuli on a particular location on the retina. That is, the neurons are bimodal. These bimodal neurons have matching tactile and visual receptive fields.

 

 

Figure 1. Pictorial representation of bimodal touch and visual receptive fields in premotor cortex, as described in the text (from Graziano et al., 1997).

Fogassi, Rizzolatti and colleagues (1992) have found that the visual receptive fields of most premotor cells do not move when the monkey moves its eyes. Rather, the receptive fields seem to be stationary in space. On this basis the investigators suggested that the receptive fields were fixed to the head, or possibly the trunk, and therefore coded space in head- or trunk-centered coordinates rather than in coordinates centered on the retina. However, this idea remained untested because the investigators did not study the effect of head and trunk movement. Head-centered visual receptive fields should move when the head is rotated, and trunk-centered receptive fields should move with the trunk.

Subsequent studies (Graziano, Yap and Gross, 1994; Graziano, Hu and Gross, 1997) have shown that premotor cortex does not contain a single, simple egocentric coordinate system as Rizzolatti and colleagues had hypothesized. Instead, for most bimodal neurons with a tactile response on the arm, the visual receptive field is anchored to the arm and moves when the arm moves. Likewise, for most bimodal neurons with a tactile response on the face, the visual receptive field is anchored to the face and moves as the head is rotated. Figure 1 shows examples of the responses of two such neurons. In Fig. 1(A), the stippled area on the face shows the tactile receptive field of the neuron. Touching the skin here activates the neuron. The boxed area shows the visual receptive field. Objects placed in this region of space, near the face, also activate the neuron. This region of space is fixed to the head, moving when the head is moved. Fig. 1(B) shows a similar example, but for the arm instead of the head.

Visual receptive fields anchored to the arm can encode stimulus location in arm-centered coordinates, and would be useful for guiding arm movements. Visual receptive fields anchored to the head can likewise encode stimuli in head-centered coordinates, useful for guiding head movements. This body-part-centered or motor-effector-centered scheme can provide a general solution to a problem of sensory-motor integration: sensory stimuli are located in a coordinate system anchored to a particular body part, helping to guide the movements of that body part.

 

3.3 Research issues

 

Although much has been learned about the behavior of neurons, and thus the information processing, in parietal cortex and in premotor cortex, these two brain regions are only part of the sequence of areas that span the brain from the sensory side to the motor side. Little is known about most of the dorsal stream areas in the occipital lobe, and less is known about the motor areas that receive information from premotor cortex. Still less is known about how all these areas interact with each other. Clearly the visuo-motor system in the primate brain remains largely unexplored. The most important questions, however, go well beyond neuroscience. How do these brain areas and neuronal responses account for human spatial processing, for hand-eye coordination, for the abilities of athletes, for the everyday abilities of all of us, and for the disabilities of brain-damaged people? Can brain systems for processing space and movement be mimicked by machines? Can robots be built to move autonomously, avoid obstacles, and reach out for targets? These questions are all interdisciplinary in nature and can best be studied by neuroscientists collaborating with technologists in these fields.

 

4. Geometric Reasoning and CAD/CAM

4.1 Introduction

Mechanical and electromechanical products such as automobiles, cameras, and machine tools constitute a large proportion of our gross national product. The geometry of physical parts and assemblies plays a crucial role throughout a product’s life cycle, from design through manufacture, inspection, assembly, field service and disposal. This geometry also impacts products at all scales, from enormous aircraft and ships to micro-electromechanical systems (beginning to appear in the marketplace), and even nano-electromechanical systems (still very much at the research stage).

Given a representation of the geometry of a desired product (plus ancillary data such as material or surface finish), a typical problem is to determine the proper manufacturing processes and the sequence of the operations needed to manufacture, inspect and assemble it. Such process planning, inspection planning, and assembly planning problems are central in modern manufacturing. Although the supply of software and hardware systems for CAD/CAM is a billion-dollar industry, and computers are ubiquitous in the modern industrial workplace, activities such as design and planning remain almost exclusively the province of humans. These activities primarily require reasoning about geometry, and synthesis rather than analysis; both are notoriously difficult for computers.

Geometric reasoning for CAD/CAM is not only of great practical interest, but also of scientific interest in itself. CAD/CAM is a rich, real-world domain. The need to deal with the real world, in all its complexity, poses important problems that must be addressed if AI is to fulfill its potential as a major intellectual discipline and as a source of technology for applications. These problems range from the relatively mundane yet technically difficult, such as ensuring that AI systems collaborate synergistically with other enterprise software, to fundamental issues, such as reasoning with multiple models in the presence of spatial uncertainty and partial information.

4.2 State of the art

AI methods coupled with quantitative geometric computation now successfully tackle difficult problems in geometric reasoning in CAD/CAM. Recognition of machinable features is a problem in representation conversion. Given a solid model of a part (an unambiguous representation of the part’s geometry (Requicha 1980)) or a design-oriented representation of the part in terms of design features such as bosses, webs or slots, find a manufacturing-oriented representation in terms of machinable features such as holes or pockets, regions of the workpiece that can be associated with specific manufacturing processes. In a traditional means-ends analysis framework, machinable features can be viewed as differences between an initial state of the workpiece (the stock or raw material) and the goal state (the desired part). These differences may be reduced by applying the appropriate machining operations.
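The means-ends view sketched above can be made concrete in a few lines. The following is an illustrative sketch only, not any of the cited systems: "volumes" are modeled as sets of cells rather than true solid models, and the operation names are invented.

```python
# A minimal sketch of the means-ends view of feature recognition:
# machinable features are the differences between the initial state
# (the stock) and the goal state (the desired part), reduced by
# applying machining operations. Sets of cells stand in for solids.

def machinable_features(stock, part):
    """The volume to remove: difference between stock and desired part."""
    return stock - part

def plan_operations(stock, part, operations):
    """Greedily reduce the stock/part difference with operations,
    each of which removes a known volume."""
    remaining = machinable_features(stock, part)
    plan = []
    for name, removed in operations:
        if removed <= remaining:      # operation removes only excess material
            plan.append(name)
            remaining -= removed
    return plan, remaining

# Toy example: a 1-D "workpiece" of cells 0..9; the part keeps cells 2..7.
stock = set(range(10))
part = set(range(2, 8))
ops = [("face-mill left end", {0, 1}), ("face-mill right end", {8, 9})]
plan, leftover = plan_operations(stock, part, ops)
print(plan)      # ['face-mill left end', 'face-mill right end']
print(leftover)  # set() -- all excess material accounted for
```

A real recognizer must of course work with true solid models and handle interacting features, which is where the algorithms described below come in.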

Kim, Marefat, and Requicha describe various algorithms for machining-feature recognition. Kim’s algorithms use convex decomposition to successively identify outermost faces, information intrinsically relevant to a wide spectrum of manufacturing processes. The convex decomposition is then converted into form feature decomposition using combination operations, some of which have been generated through an inductive learning procedure (Wang and Kim 1994). Marefat computes a three-dimensional qualitative model of the workpiece’s geometry. His spatial reasoning procedures search the qualitative model for alternative interpretations of the geometry. Each interpretation is a set of three-dimensional primitives that correspond to features that can be produced by machining operations and their spatial relationships. Requicha and his students use AI techniques, such as blackboards and truth maintenance, in conjunction with geometric modeling computations. Their systems reason with hints from various sources, such as solid-modeling geometry, design features, or tolerances, and validate or reject those hints through geometric computations. For further details see (Han and Requicha 1997; Johnson and Marefat 1996; Kim 1992; Raman and Marefat 1997; Vandenbrande and Requicha 1993; Waco and Kim 1994).

Feature recognition shares only some similarities with vision. Like vision, feature recognition involves 3D-scene segmentation and object recognition and must accommodate partial data. Whereas in vision, partial data is typically due to occlusion, in feature recognition, it is due to volumetric feature interactions. Unlike vision, however, feature recognition does not have to deal with noise.

Shape analogies are important for such tasks as reuse of process and assembly plans, content-based retrieval and querying of design databases, and case-based learning. Marefat uses qualitative geometric descriptions to compare workpieces and determine shape similarities. This approach solves a difficult problem in shape analogy in a restricted domain (Johnson and Marefat 1997). The inference mechanisms model three-dimensional spatial relationships as qualitative matrices and provide hierarchical methods for mapping between alternative descriptions of similar spatial configurations.

Spyridi, Requicha and Spitz (Spyridi and Requicha 1990) are developing a planner for dimensional inspection with Coordinate Measuring Machines (in essence, highly accurate 3D-digitizers). The input is a solid model of a part plus a description of tolerance specifications to be confirmed. The output is a sequence of inspection operations characterized by the part-to-machine orientation, the surface to be inspected, the probe to be used, the probe orientation, and the probe path. The planner uses a least commitment approach in a state space formulation. Some of the operators enforce accessibility constraints through elaborate geometric computations involving Minkowski operations on objects’ faces and Boolean operations on direction cones (sets of directions in space). Sensor placement and planning for dimensional inspection of three-dimensional components using machine vision have also been studied (Yang and Marefat 1994).

Inspection planning has strong intellectual ties with robotics. Accessibility of surfaces to inspection probes of coordinate measuring machines and placement of cameras in an active sensing system are closely related to issues of visibility in vision. Assembly planning algorithms must compute collision-free paths for inserting parts in an assembly. This is essentially an accessibility computation for a set of faces, and is closely related to accessibility analysis for inspection. Path planning is necessary both in inspection planning and in many robotic problems.

Fixture design is closely related both to grasping problems in robotics, and to assembly planning, because the fixture is an assembly of components, and loading the part into the fixture is an assembly operation. Penev and Requicha developed a system for the automatic design of fixtures for immobilizing a workpiece (Penev and Requicha 1997). The input is a solid model of a part and a description of a task to be executed while the part is in the fixture. The system produces a set of contact points where suitable clamps and locators are to be placed, and then constructs the fixture as an assembly of modular components from a given set of solid hardware primitives. Rule-based task analysis subsystems encode desirable and undesirable portions of the object as zones of attraction and repulsion in an artificial potential field. This field is used in a generate-and-test approach that employs randomization to construct initial sets of contact points and kinematic computations to test them and ensure that the part cannot move.
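The generate-and-test idea described above can be sketched in miniature. Everything here is illustrative: the potential field, the candidate points on a 1-D "part boundary", and the immobilization test all stand in for the rule-based analysis and kinematic computations of the actual system.

```python
# A hedged sketch of generate-and-test fixture design: candidate contact
# points are sampled at random, kept only if they lie in regions of positive
# potential (attraction minus repulsion), and tested for immobilization.

import random

def field(p, attract, repel):
    """Potential at point p: attraction-zone pulls minus repulsion-zone pushes."""
    score = 0.0
    for a in attract:
        score += 1.0 / (1.0 + abs(p - a))
    for r in repel:
        score -= 1.0 / (1.0 + abs(p - r))
    return score

def generate_and_test(candidates, attract, repel, immobilizes, n_contacts=3):
    """Randomly propose contact sets; accept the first one whose points all
    lie in attractive regions and which passes the immobilization test."""
    random.seed(0)
    for _ in range(100):
        contacts = random.sample(candidates, n_contacts)
        if all(field(p, attract, repel) > 0 for p in contacts) and immobilizes(contacts):
            return contacts
    return None

# Toy 1-D part boundary: contacts must be well spread out to immobilize it.
pts = list(range(10))
contacts = generate_and_test(pts, attract=[0, 5, 9], repel=[2],
                             immobilizes=lambda c: max(c) - min(c) >= 6)
print(contacts is not None)   # True: a valid contact set was found
```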

While most of the work in geometric reasoning for CAD/CAM aims to automate or support the geometric and visual reasoning processes of the human engineer, some efforts have also been made to support the development of human engineers skilled in visual analysis and synthesis. The Visual Reasoning Tutor uses geometric sweeping operations and rule-based intelligent tutoring modules to enhance the visual reasoning skills of young engineers (Mengshoel and Kim 1996). Such tools could also aid in understanding the underlying procedures of human spatial cognition.

The work presented at the workshop focused on reasoning with solid models, i.e., complete representations of the detailed geometry of parts or assemblies. Other models are relevant for CAD/CAM, too, including higher-level functional and behavioral models, and kinematic models composed of sticks and joints. These models do not describe a part’s geometry unambiguously, but are nevertheless important as useful abstractions at various stages of product design. Reasoning with incomplete geometric data is still in a relatively primitive stage in the CAD/CAM domain.

4.3 Research issues

A number of difficult problems associated with autonomous or semi-autonomous spatial reasoning in design and manufacture remain. Understanding how human designers and planners tackle any of these problems could provide useful guidance for the design of advanced CAD/CAM systems.

Detailed quantitative representations of geometry, such as solid models, are inappropriate at the initial stages of design. Ideally, design and manufacturing systems should be able to reason at multiple levels of abstraction and use multiple views or perspectives. This raises a host of unsolved problems concerning the interplay between qualitative and quantitative models, which models to use when, consistency between models, and integration and cooperation between distinct views or abstraction levels.

Some models are deliberately ambiguous and contain only partial information about their objects. The ambiguities are to be removed at later stages of the design process, possibly through negotiation involving several aspects of the design. For example, the exact dimensions of a clearance hole are not important to a designer (within some reasonable bounds), and may be decided by such manufacturing considerations as existing tools. Reasoning with ambiguous geometric models is difficult and not well understood.

Design and planning problems typically have large solution spaces. Many of these solutions, however, are unacceptable because they are too far from optimal. For industrial acceptance, CAD/CAM systems must produce high-quality designs and plans. At first, automated systems may not be able to deliver acceptable solution quality for many industrial problems. The systems will still be useful, however, provided that they can accept and use human advice. Such intervention must be anticipated at the system design stage, not grafted onto a system designed for purely automatic operation. Graceful human-computer cooperation in CAD/CAM is a non-trivial issue.

Spatial uncertainty is inherent in physical objects and processes. For example, we cannot manufacture a perfect cube, and even if we could, we would never be able to detect that it was perfect. Yet qualitative reasoning seems to require certain topological predicates (e.g., connectivity) that must be extracted reliably from imprecise numerical data. Can qualitative information be used to guide numerical computations and fight the unavoidable round-off errors? Early solid modelers were plagued by numerical inaccuracies; even modern modelers are far from robust when certain unstable computations are required. For example, determining the common area of two solids that just touch often cannot be done reliably, but that contact region is important when we must move an object into contact with another in a compliant assembly operation.
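One standard way to keep qualitative predicates stable in the face of such numerical uncertainty is to classify against a tolerance rather than test for exact contact. The sketch below is illustrative; the predicate names and the tolerance value are assumptions, not taken from any cited modeler.

```python
# Robust qualitative classification of contact: instead of testing an exact
# zero distance (which round-off error makes meaningless), classify a signed
# distance relative to a tolerance band.

EPS = 1e-6  # tolerance reflecting manufacturing and computational uncertainty

def contact_state(distance):
    """Classify the signed distance between two surfaces.
    Positive = separated, negative = interpenetrating."""
    if distance > EPS:
        return "apart"
    if distance < -EPS:
        return "overlapping"
    return "in contact"     # within tolerance: treat as touching

print(contact_state(0.5))      # apart
print(contact_state(1e-9))     # in contact -- an exact test would say "apart"
print(contact_state(-1e-9))    # in contact -- an exact test would say "overlapping"
```

The qualitative answer ("in contact") is stable under perturbations smaller than the tolerance, which is exactly what the just-touching-solids example above requires.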

Current CAD/CAM subsystems are usually invoked sequentially. For example, the output of the design activity is fed into a feature recognizer, which extracts a complete set of machining features for the part, and passes these to a process planner that decides how to machine the part. Such sequential invocation is suboptimal and violates the modern principles of concurrent engineering. A better approach would operate all these subsystems concurrently and negotiate among them a solution that considers design requirements, manufacturing constraints, and so on. For example, a feature recognizer and a process planner operating incrementally could provide to a designer immediate feedback about the manufacturing consequences of design decisions. Cooperation among different problem solvers raises difficult and unsolved problems of system design, communications and coordination.

Intelligent CAD/CAM systems require a great deal of domain knowledge, knowledge that evolves dynamically and differs from company to company. Company-specific preferences about which objects to design, and how to design and manufacture them are important because they provide the company with competitive advantages. How can such domain knowledge and preferences involving object geometry and spatial configuration strategies be captured in a CAD/CAM system? Manual encoding would be difficult, costly, and extremely difficult to maintain in a dynamic environment. A better approach would be to use machine learning to acquire that knowledge from examples. This would require systems that accept human advice, and powerful learning techniques for the spatial domain.

5. Robotics and Perception

5.1. State of the art in machine vision

Although many important problems remain to be solved, significant progress has been made in vision for robotics and automation during the last decade. For example, 10 years ago it would have been impossible to recognize and locate the objects in a pile like that in Figure 2(a), but today many laboratories can do so. Similarly, a decade ago it would have been impossible for a mobile robot to make sense of the cluttered hallway shown in Figure 3, but today it is not much of a challenge. These two figures are taken from (Grewe and Kak 1995) and (Kosaka and Kak 1992), respectively.

 

 

Figure 2 A scene containing objects from the model-base is shown in (a). A color-composite light-stripe image of the scene is shown in (b). (c) A three-dimensional plot of the points detected. (d) The result of the segmentation process.

The fundamental problem that fueled interest in computer vision half a century ago, however, remains unsolved today: to recognize and locate 3-D objects in single 2-D images in the presence of clutter. A scene like that in Figure 2(a) can be analyzed only with the help of a laser-based range sensor that generates a three-dimensional image of the scene. The other frames of Figure 2 show a structured-light image of the scene (b), a height map that illustrates the three-dimensional nature of the data collected (c), and a segmentation map obtained by detecting range discontinuities and by the thresholding of curvature (d). The high quality of the segmentation is the primary reason for the robustness of 3-D vision in this case.

Scenes like Figure 2(a) are not amenable to interpretation from single 2-D images primarily because such scenes have an extremely large number of degrees of freedom. Each object’s position and orientation can vary in six independent ways, so, given the number of objects that can be in a scene, a great many parameters determine the scene’s composition.

 

 

Figure 3 (a) A camera image taken from the robot while it is engaged in hallway navigation. (b) A scene expectation map rendered from a 3D model of the hallway. (c) Ellipses representing the uncertainties associated with the vertices of the edges in the expectation map. (d) A reprojection of those model edges into the camera image shown in (a) after the robot determined its position in the hallway.

If scene variability can be controlled, it is indeed possible to carry out automatic scene interpretation in a robust manner from single 2-D images. The reason that a mobile robot can readily make sense of the scene of Figure 3 is that a hallway has only three degrees of freedom with respect to the mobile robot. As the robot travels down a hallway, it can keep track of its positional uncertainty and project this uncertainty into the camera image to bound where the different features of a hallway may show up in a camera image. Figure 3(b) is a projection into the camera image of what the robot expects to see. This projection (or expectation image) is made on the fly from a 3-D wire-frame model of the hallway, stored in the robot’s on-board memory. The ellipses attached to the lines in the expectation map, derived from the robot's known positional uncertainty, help place bounds on where in the camera image the computer should seek the counterpart of a feature in the expectation map. This immediately leads to expectation-driven detection of edges in the camera image. Figure 3(c) is the set of all the edges extracted from the camera image. Clearly, the computer is not bogged down by irrelevant information. Figure 3(d) is a re-projection into the camera image of those expectation map edges that were successfully matched with the edges extracted from the scene.
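The core computation, propagating positional uncertainty into a bounded image search window, can be sketched in one dimension. The pinhole camera model and all numbers below are illustrative assumptions, not the calibration of the actual system.

```python
# A hedged sketch of expectation-driven search: the robot's lateral position
# uncertainty (sigma, in meters) is projected through a pinhole camera model
# into an interval of pixel columns, so edge detection searches only that
# window rather than the whole image.

def pixel_window(feature_x, feature_z, robot_sigma, focal_px, n_sigma=3.0):
    """Project a hallway feature at lateral offset feature_x and depth
    feature_z (meters, robot frame) into the image; return a search interval
    covering +/- n_sigma of lateral robot-position uncertainty."""
    u = focal_px * feature_x / feature_z                  # ideal pixel column
    du = focal_px * (n_sigma * robot_sigma) / feature_z   # uncertainty in pixels
    return u - du, u + du

lo, hi = pixel_window(feature_x=1.0, feature_z=5.0, robot_sigma=0.05, focal_px=500)
print((round(lo), round(hi)))   # (85, 115): search ~30 pixels, not the whole scanline
```

The same idea extends to 2-D: an uncertainty ellipse in the image, like those drawn in the expectation map, bounds both coordinates of each expected vertex.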

5.2. Research issues

While object recognition remains a central problem in robot perception, other equally important problems include how to inspect objects whose identities and locations are known, how to construct models of 3-D objects, and how to elicit a desired behavior from a robot. Many issues remain, issues associated with the representations (i.e., models) that robotic systems use to acquire information about, reason upon, and act in the world. The remainder of this section describes some areas where further research progress is needed.

Machine vision would benefit from richer knowledge representations of space. Most robotic models are based solely on geometry. An important potential enhancement would be more complex abstract topological relationships, such as "on," "above," "below," "behind," "left of," and "inside," as well as non-metric measures such as "close," "far," and "near." Such representations would allow choices of coordinate frames that may not be strictly Euclidean.
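To make the suggestion concrete, qualitative relations like these can be computed directly from geometric data. The sketch below derives two such predicates from axis-aligned bounding boxes; the boxes, the threshold, and the particular definitions are illustrative assumptions.

```python
# A small sketch of qualitative spatial predicates layered on top of a
# purely geometric representation (axis-aligned boxes: (xmin, ymin, xmax, ymax),
# with y increasing upward).

def above(a, b):
    """True if box a lies entirely above box b."""
    return a[1] >= b[3]

def near(a, b, threshold=1.0):
    """True if the horizontal gap between the boxes is below a threshold
    (a crude non-metric 'near' built from metric data)."""
    gap = max(b[0] - a[2], a[0] - b[2], 0.0)
    return gap < threshold

cup = (2.0, 1.0, 3.0, 2.0)
table = (0.0, 0.0, 5.0, 1.0)
print(above(cup, table))   # True: the cup sits on top of the table
print(near(cup, table))    # True: zero horizontal gap
```

A planner can then reason over such symbolic relations ("the cup is on the table") without repeatedly consulting raw coordinates.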

The inclusion of motion is an important issue. Since a robot’s world is dynamic, enhancing the robot with representations and mechanisms for temporal changes and constraints will increase its ability to perceive and act in its world. Despite the difficulties that motion introduces into a scene, understanding and reasoning about dynamic scenes are essential.

Machines could reason more efficiently about physical space if they had task-level constraints in a unified formulation. Machines’ current ability to perceive and act in the physical world is only possible when their task space is clearly defined. Currently it is difficult to represent a task in a unified, simple way that allows a robot to interact with the underlying data structures that represent the world encompassing the task. Task-level programming and improving current human-robot interactions are two areas where research progress is needed.

It is also critical to recognize the other parts of the modeling process. Environments to be modeled must include not only the geometric space but also the related environment parameters and processes, such as rich texture maps, lighting models, and models of the sensors used to explicitly generate higher level information. Most current modeling is of fairly simple, structured indoor environments, often static ones. The modeling of richer environments is important, particularly outdoor ones, which tend to have less structure and are therefore more difficult to represent.

Machines would also benefit from the integration of new sensors. Compared to biological systems, current machine sensors are quite limited. Performance would be enhanced with the creation of faster and more accurate sensors. 3-D imaging technology is improving, and new range-finding systems are being developed that appear promising. In addition, proper exploration and perception in a spatial environment requires many different senses. The fusion of multiple sensor sources (such as video, audio, and touch) would provide redundant and complementary information to improve performance and reduce errors in perception.

The final crucial research issue is the facilitation of behavior-based control. A behavior is a set of stimulus-action pairs. Activated by the detection of a trigger stimulus, the behavior determines the most appropriate action as a response to the stimulus, and then performs that action. For example, a robot navigating a hallway must be able to detect a potential obstacle (stimulus) and invoke its obstacle avoidance behavior (response). Equally important stimuli, however, are logical (as opposed to physical) events, such as the addition of information to a database or a change in task specifications. Logical events make no change in the external environment.

Detection of either physical or logical trigger events in the image domain may require processing time that is exponential in the image size if the search is not guided by the target that is sought. Thus one must define behaviors with an explicit representation of the trigger stimulus plus a strategy for how that target may be used to optimize the search for it. This sort of guidance is classically termed attention; in the visual domain, visual attention. A visual attention mechanism guided by the database of sought objects would simplify the general, single-image object recognition problem posed above because it would explicitly reduce the combinatorics of the number of degrees of freedom and the number of objects.

At one time, behavior-based control, as introduced by the subsumption architecture, promised intelligent control of a robot, but recent formal analyses have shown otherwise. The mechanisms excluded from subsumption (hierarchies, intermediate representations, attention, goals) are in fact critical for the strategy to scale up to human-sized problems. A subtle redefinition of behaviors satisfies the scaling requirements. The original definition of a behavior in subsumption required that trigger events be found only in the external world and that all action of a behavior be applied only to the external world. If the world is redefined to include both external and internal worlds (that is, the representations inside the robot), then those powerful mechanisms are naturally facilitated. The next challenge is to develop an effective algorithm for composing behavior sets to satisfy particular tasks. For robots that manifest spatial cognition, a first priority is to develop a theory for needed knowledge acquisition from sensors.
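A control loop built from such stimulus-action pairs can be sketched as follows. The behaviors, the trigger predicates, and the world dictionary (which deliberately mixes external state with internal state, per the redefinition above) are all illustrative, not from any system discussed at the workshop.

```python
# A minimal sketch of behavior-based control with the "world" broadened to
# include the robot's internal representations: one physical trigger
# (a range reading) and one logical trigger (a task-specification change).

def obstacle_ahead(world):            # physical trigger stimulus
    return world["range_cm"] < 50

def avoid_obstacle(world):            # response: turn away
    world["heading"] += 30

def task_changed(world):              # logical trigger: an internal event
    return world["new_task"]

def replan(world):                    # response: update the internal plan
    world["new_task"] = False
    world["plan"] = ["goto", world["task"]]

behaviors = [(obstacle_ahead, avoid_obstacle), (task_changed, replan)]

world = {"range_cm": 30, "heading": 0, "new_task": True,
         "task": "dock", "plan": []}
for trigger, action in behaviors:     # one control cycle
    if trigger(world):
        action(world)
print(world["heading"])               # 30: the physical trigger fired
print(world["plan"])                  # ['goto', 'dock']: the logical trigger fired
```

Note that attention enters as soon as trigger detection becomes expensive: each trigger predicate should carry its own strategy for narrowing the search for its stimulus, rather than scanning the whole sensory input.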

6. Diagrammatic Reasoning

Diagrams and models play a central role in human spatial reasoning. Psychological studies of spatial and temporal reasoning have established that individuals construct mental models of relevant domains. Johnson-Laird has observed that people reason better from diagrams than from verbal premises, and that the most helpful diagrams are those that correspond to people’s mental models. Furthermore, he has demonstrated that teaching people diagramming strategies improved their performance in reasoning tasks (Bauer and Johnson-Laird 1993).

Research in diagrammatic reasoning attempts to understand the advantages of diagrammatic problem solving in humans and machines. The purposes of such research include the development of artificial agents that can take advantage of diagrams for problem solving as efficiently and as elegantly as people can, and the development of effective instruction methods to teach people visualization skills for problem solving. To do so, we must understand what diagrams represent to people (their informational aspect), how people process them (their computational aspect), and how they are used (their pragmatics).

The informational aspect of diagrams tailors them to the task at hand. In their work on explaining why pictures are helpful in problem solving, Larkin and Simon focused on the information localization property of diagrams (Larkin and Simon 1987). Since information is spatially organized in diagrams, one is likely to find relevant information by searching the small area surrounding the current focus of attention. In the case of diagrams that are intentionally designed for a particular task, an even more significant feature is the fact that they represent just the right level of abstraction for a given problem. Since such diagrams are designed with a particular class of problems in mind, they bring out just the features of the presented situation that are relevant for solving that specific type of problem. The word "abstraction", however, must be taken with a grain of salt in the context of diagrams. Though all diagrams abstract out some irrelevant aspects of the situation depicted, they are not strictly less detailed than the original representation, since depictions must take a concrete form with a specific size, shape, orientation, color, and so forth. In a sense, a diagram not only abstracts but also adds details. Thus, interpreting a diagram as an abstraction requires a tacit understanding of the features of the diagram that are intended, and those that are not.

Diagrams clearly have a computational aspect as well. Abstraction alone cannot be the reason for the power of diagrams in problem solving, for if it were, there would be no reason actually to draw a diagram once the right level of abstraction was identified and a new abstract description created. For the diagram to be helpful in problem solving, the particular abstraction must be mapped to a two-dimensional drawing so that the relevant features are easily recognizable by the processor of the visual information. For example, analogical processing, including the detection of regularity and symmetry, has been shown to play a central role in human visual cognition (Ferguson, Aminoff, and Gentner 1996). In general, once the right level of abstraction is identified, the process of mapping from the old representation to the new one must take into account the computational characteristics of the processing mechanism that accompanies the new one if the new representation is to increase the overall problem solving efficiency. The various conventions common in many types of diagrams should also strongly guide the design of diagrams (Tversky 1995). Even though such conventions are not computational considerations per se, they may arise from some underlying computational characteristics of the visual cognitive mechanism, and, moreover, they will certainly affect the efficiency and accuracy of the processing of diagrams by human eyes.

Diagrams are rarely used in isolation. To understand diagrammatic reasoning, it is important to study how diagrams are used in conjunction with other means of representation and communication. In physical and mathematical problems, diagrams are used along with a symbolic representation (Forbus 1980; Forbus et al. 1991; Iwasaki, Tessler, and Law 1995; Novak and Bulko 1992).

Diagrams are often modified during the problem solving process to reflect the state of the symbolic part of the representation. Sometimes, multiple diagrams are used: to complement one another, for a comparison, or as an animation to show a sequence of continuous or discrete changes. In human communication, people point to and wave their hands at different parts of diagrams. Though ephemeral, such gestures certainly must add elements to people’s mental model of the situation depicted by the diagram, and their effects must be taken into consideration when studying the advantages of diagrams in problem solving.

6.1 State of the art in diagrammatic reasoning

AI researchers have begun to appreciate the advantages of using diagrams and models in reasoning (e.g., (Glasgow and Papadias 1992)). Proof-theoretic methods of inference suffer from an inability to demonstrate directly that an inference is invalid. The best that they can do is to fail to find a derivation. In contrast, a diagram or model can establish directly that an inference is invalid by instantiating the premises together with the negation of the putative conclusion. On the other hand, cognitive psychologists are currently studying the use of diagrammatic methods to help logically-untrained individuals reason better. This "model" method relies on teaching reasoners to keep a mental record of the alternative possibilities. Preliminary results have demonstrated a 30% improvement in reasoning. Hence, the method looks promising as a simple aid to reasoning.

With respect to the use of diagrams for teaching people reasoning strategies, some researchers have expressed doubts about the effects of diagrams on the inference process itself. For example, "In view of the dramatic effects that alternative representations may produce on search and recognition processes, it may seem surprising that the differential effects on inference appear less strong. Inference is largely independent of representation if the information content of the two sets of inference rules (one operating on diagrams and the other operating on verbal statements) is equivalent" (Larkin and Simon 1987). Barwise and Etchemendy have similarly argued that the truth behind the adage that a picture is worth a thousand words is that diagrams and pictures are good at presenting a wealth of specific conjunctive information (Barwise and Etchemendy 1990). "It is much harder to use them," they say, "to present indefinite information, negative information, or disjunctive information." Such information, they claim, is often better conveyed by sentences. Fortunately, these views are overly pessimistic. We now know that it is possible to devise diagrams that help individuals to reason about both disjunctions and negations (Bauer and Johnson-Laird 1993). There are still other important concepts (e.g., universal quantification) for which an effective diagramming strategy is yet to be devised.

6.2. Research issues

A major challenge in psychology is to extend the use of diagrammatic procedures to help individuals improve their reasoning about probabilities. People without training in the probability calculus are notoriously bad at probabilistic reasoning, and even experts can have difficulties. Consider, for example, the following problem. The probability of a DNA match is one in one million if the suspect is not guilty. The suspect’s DNA matches the crime sample. Is the suspect likely to be guilty? Most people respond, "yes," which is wrong. Current studies have shown that the cause of the error is the difficulty of building the full set of mental models (equivalent to the correct partition of the problem). Given the success of diagrams in helping individuals to make deductions, using diagrammatic methods as a tool to improve probabilistic reasoning is an exciting possibility. The aim of such research should be mutual gains: a deeper understanding of how individuals reason, better computer programs for reasoning, and a powerful aid to human reasoning.
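
The arithmetic behind the correct answer can be made concrete with Bayes' rule. The sketch below is illustrative only: the one-in-a-million pool of plausible suspects is an assumed prior, not part of the original problem statement.

```python
def posterior_guilt(prior_guilt, p_match_if_innocent, p_match_if_guilty=1.0):
    """Bayes' rule: P(guilty | DNA match)."""
    numerator = prior_guilt * p_match_if_guilty
    denominator = numerator + (1.0 - prior_guilt) * p_match_if_innocent
    return numerator / denominator

# With one million plausible suspects, the prior P(guilty) is 1/1,000,000.
# The one-in-a-million match statistic then yields a posterior of only ~0.5,
# so a match alone does not make guilt likely.
print(posterior_guilt(prior_guilt=1e-6, p_match_if_innocent=1e-6))
```

The intuition matches the "full partition" of mental models: roughly one innocent person in such a pool would match by chance, so a match leaves two roughly equally likely candidates.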

A second major challenge is the development of a diagrammatic reasoning system that can not only interpret diagrams and solve problems using diagrams but can also design useful diagrams for problem solving and communication. Such a system should be able to construct diagrams, transform them automatically, and inspect them to make necessary inferences. The achievement of this goal will require research in AI, psychology, linguistics, and human-computer interaction to understand both the computational aspects and the pragmatics of diagrammatic representation.

7. Linguistic Representation of Space

How do we talk about space and the things in it? What does the way we talk about space reveal about the way we think about space? Language and cognition both schematize the world; to what extent do these schematizations coincide, and to what extent do they differ, and why? This section integrates the state of the art with research challenges in the field.

Like the mind and the nervous system, language distinguishes the what — objects — from the where — spatial relations among objects. Objects are usually expressed with nouns, an open-class category; spatial relations are usually expressed with forms from closed-class categories such as prepositions. Objects are normally referred to at the basic level, the level of horse and table, and recognized by distinctive shapes. Faces are a notable exception. Typically labeled at the level of an individual, they are distinguished by internal configurations. Although research on the language of objects is well-established, mysteries remain, for example, our inability to describe faces precisely, contrasted with our highly accurate recognition of them. What sorts of things and relations does language convey well, and what does it not convey well?

Research on the linguistic expression of spatial relations is in initial, promising stages. Though geometric interpretations of terms like "above" and "in" are tempting, they are unlikely to be sufficient. Functions also affect uses (a pear perched on top of a pile of fruit is said to be "in" the bowl even if it is above the bowl’s borders) as do goals ("above" a tree to pluck an apple describes a different spatial region from "above" a tree to photograph it). Moreover, different languages map physical and functional relations to language differently. Spatial relations exist both within and between objects; the language of object parts and relations is another topic worthy of investigation.

The language of the dynamic aspects of space — events, activities, and changes — appears to be more complex than the language of the static world, again like perception and cognition. As for objects, language gives clues as to how people segment events and think about movement and change. Perhaps because these aspects of language and cognition are more complex, research is in its nascent stage and should be encouraged.

Conveying the location of an object typically requires a reference object and a perspective. Although a visual perspective is given by the viewpoint on a scene, language (and thought) allow the assumption of other perspectives on objects and on relations among objects. For spatial relations, three basic perspectives or frames of reference have been distinguished: relative (viewer-centered), intrinsic (object-centered), and absolute (environment-centered). (There are some complexities not captured by this trichotomy.) Recent research shows that languages differ in the extent to which they rely on these frames of reference, and that linguistic differences may influence how speakers organize space. For example, speakers perform differently on non-linguistic spatial tasks depending on whether their language relies heavily on the relative frame of reference (e.g., from the viewer’s perspective, X is to the left of or behind Y), or has no relative frame of reference and uses the absolute frame of reference instead (e.g., X is to the east of or uphill from Y) (Levinson 1996). Perspective is also a promising area of research because it can supply coherence in extended spatial discourse, for example, in directions for traveling from A to B, instructions for operating a device, or descriptions of an environment or a face. The construction of extended spatial discourse, including perspective and hierarchical organization, is another promising topic.
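
The relative/absolute contrast can be sketched computationally. In this toy illustration (the coordinate conventions and function names are our own, not taken from the literature), the same scene is described in an environment-centered frame and in a viewer-centered frame that depends on the speaker's heading:

```python
import math

def absolute_relation(ref, target):
    """Environment-centered frame: compass direction of target from ref.
    Coordinates are (x, y) with +x = east and +y = north."""
    dx, dy = target[0] - ref[0], target[1] - ref[1]
    if abs(dx) >= abs(dy):
        return "east of" if dx > 0 else "west of"
    return "north of" if dy > 0 else "south of"

def relative_relation(viewer, heading_deg, target):
    """Viewer-centered frame: front/behind/left/right of the viewer.
    heading_deg is compass-style (0 = facing north, 90 = facing east)."""
    dx, dy = target[0] - viewer[0], target[1] - viewer[1]
    h = math.radians(heading_deg)
    forward = dx * math.sin(h) + dy * math.cos(h)    # along the line of gaze
    rightward = dx * math.cos(h) - dy * math.sin(h)  # toward the viewer's right
    if abs(forward) >= abs(rightward):
        return "in front of" if forward > 0 else "behind"
    return "right of" if rightward > 0 else "left of"

# The same configuration, two kinds of description (a viewer stands at Y):
x, y = (1, 0), (0, 0)
print(absolute_relation(y, x))       # "east of":  X is east of Y
print(relative_relation(y, 0, x))    # facing north: X is right of Y
print(relative_relation(y, 180, x))  # facing south: X is left of Y
```

Note that the absolute description is stable under any viewer rotation, while the relative description flips, which is exactly the contrast the cross-linguistic studies exploit.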

Learning spatial language is also an important area for research. The existence of differences in the way languages organize space raises complex issues about how learners acquire the spatial semantic system of their language. Learning spatial language can no longer be regarded, as it often was, as a matter of simply learning the words for a universally shared set of non-linguistic spatial concepts. Instead, children must apply their non-linguistic understanding of space to the task of discovering the categories and conventions of the input language (Bowerman 1996). How this interaction between non-linguistic cognition and the input language takes place is still poorly understood.

Spatial inference is a challenging area, particularly in extended spatial discourse. Although the language of spatial relations and perspective lends itself to inference, the requisite reasoning is not always straightforward. For example, a finger is part of the hand and the hand is part of the arm, but it is odd to say that a finger is part of the arm. As another example, if Bob is in front of Alan and Carl is in front of Bob, transitivity suggests that Carl is in front of Alan; yet Bob may also be in front of Carl, if the two are facing each other.
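
The part-whole example can be made concrete. A reasoner that naively treats "part of" as transitive derives exactly the inference that sounds odd in ordinary language (a minimal sketch; the relation encoding is our own):

```python
def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

part_of = {("finger", "hand"), ("hand", "arm")}

# Logically impeccable, linguistically odd: "a finger is part of the arm".
print(("finger", "arm") in transitive_closure(part_of))  # True
```

The gap between the formally valid inference and its awkward linguistic expression is one reason spatial inference over discourse cannot simply reuse textbook relational closure.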

A fundamental property of language is its role in interaction: language is typically used with others, and even solo descriptions assume a particular audience. Differences in perspective, and the ineffability of certain things and relations, are important issues in the study of interactive spatial language.

Spatial language encompasses more than words. It also includes gestures, such as pointing or indicating extent or manner; enactments, such as claiming territory by putting down a marker; and diagrams, such as maps and charts. Depictive language conveys space more directly than symbolic (verbal) language does. Research on how machines might exploit these depictive modes would be constructive.

Spatial language is coopted to talk about many other things. We say "this field is wide open," "that idea has no support," "she’s at the top of the heap," "he’s down in the dumps," and "that family is on welfare." What aspects of meaning get transferred metaphorically? To what extent does metaphoric use of spatial language reflect spatial thinking, or is it just words?

All these issues are ideal for interdisciplinary research. In fact, it is difficult to see how they can be resolved without the contributions of cognitive psychologists, linguists, anthropologists, neuroscientists, and computer scientists.

 

8. Qualitative Physical Reasoning

8.1 State of the art in qualitative physical reasoning

Consider a half-full wineglass being gradually tilted. Without knowing any further details, can one predict whether wine will eventually spill onto some surface, and when? In this simple scenario, qualitative reasoning enables one to say that the wine will come out if the tilting continues. To answer the second question, however, one may need to know precise numeric details of the wine level, the shape and size of the glass, and even the viscosity of the liquid and the rate of tilting.

Qualitative reasoning (QR) is concerned with the first kind of inference, which uses qualitative information about the physical phenomena to draw useful conclusions about structures and behaviors of physical systems. The same kind of reasoning can be applied to domains involving expert knowledge of, for example, electronic circuitry, fluid dynamics or design of complex mechanisms. Examples of qualitative information include directions of movement, relative magnitudes of physical quantities, boundary topological information, connectivity of circuit components, and typical behaviors of physical objects. Qualitative knowledge is often usefully integrated with quantitative — typically numerical — information.
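
The flavor of such inference can be sketched with a sign algebra, a common QR device (the encoding below is our own illustration): each quantity is abstracted to its sign, and combination rules operate on signs alone, returning an explicit "ambiguous" value when qualitative information cannot decide.

```python
def q_add(a, b):
    """Qualitative sum of two signed influences ('+', '0', '-')."""
    if a == "0":
        return b
    if b == "0" or a == b:
        return a
    return "?"  # opposing influences: the sign is qualitatively ambiguous

def q_mul(a, b):
    """Qualitative product of two signs."""
    if "0" in (a, b):
        return "0"
    return "+" if a == b else "-"

# Two reinforcing influences keep a determinate sign; two opposing
# influences do not -- which is precisely where QR must fall back on
# quantitative detail, as in the "when will the wine spill?" question.
print(q_add("+", "0"))  # "+"
print(q_add("+", "-"))  # "?"
print(q_mul("-", "-"))  # "+"
```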

In the past decade, the qualitative reasoning community has developed techniques for using qualitative information about devices, processes, and physical constraints to diagnose faults in electrical circuits and predict behaviors of complex mechanical systems such as clocks and air conditioners. Qualitative descriptions abstract away numerical details, so the conclusions drawn from them are often usefully robust. In addition, since qualitative reasoning uses concepts such as causality and "natural" descriptions of the topology of systems, it can often provide intuitive explanations of the underlying physical phenomena, making QR useful for tutoring and human-oriented interface design.

Qualitative Physical Reasoning (QPR) is the application of QR ideas to physical systems. Early applications of QPR focused on systems with a small number of degrees of freedom, and considered only the topological structure of the device, such as the connectivity of a circuit. More recently, the work in QPR has involved analyses of fluid flow or mechanical interactions, such as interacting gear pairs, which depend crucially on shapes, and involve far more degrees of freedom, perhaps infinitely many. This reasoning is centrally concerned with space, time and spatio-temporal objects and boundaries, e.g., (Yip and Zhao 1996). Robot navigation, for example, typically involves spatial planning using models of space with incomplete metric information (Epstein 1998). The analysis of complex mechanical systems also considers spatial interactions in their configuration space descriptions. Another central task of QPR is that of locating meaningful objects and patterns in continuous physical fields, such as weather fronts in meteorological data sets. These inferences are inherently more complex.

A central concern in QPR is the identification of a suitable conceptual vocabulary — often called an ontology — to describe the systems and processes, since the necessary inferences must be expressible in these terms. For example, qualitative spatial reasoning about movement is often centrally concerned with whether or not paths can cross boundaries, so the relationship between volume and boundary often plays a crucial role in the descriptions used by such systems. The resulting concept of space is different from the traditional coordinate-based description. As another example, QPR often treats objects as essentially dynamic, giving an ontological picture which may be useful in diagrammatic reasoning, where objects are usually considered as static.

8.2 Research issues

Research in QPR is spurred by the following questions. What are appropriate ontological bases for various kinds of tasks, applications, and modes of reasoning? What do they have in common? Are there any universal "minimal" ontologies of general applicability? How does QPR integrate with the more traditional (quantitative) methods and techniques used throughout science and engineering? Where can existing spatio-temporal QPR techniques be usefully applied, and in what other areas will they prove useful?

In particular, the following research problems are worthy of immediate attention:

• The description of relations involving tolerance judgments and order-of-magnitude reasoning.

• Qualitative descriptions of continuously deformable objects.

• The birth, evolution and destruction or disassembly of objects, especially complex assemblies and local phenomena in continuous fields (e.g. thunderstorms).

• Computing qualitative descriptions from sensory inputs.

• Techniques for utilizing spatial concepts in specialized reasoners.

• Tools for rapid prototyping of QPR problem solvers.

9. Interdisciplinary Research Opportunities

We envision many areas of natural and significant impact involving spatial reasoning, particularly in the areas of CAD/CAM, robotics, GIS, human-computer interaction, game playing, and molecular biology. It is the consensus of this meeting that the most promising current research opportunities lie in multipronged approaches rather than in the development of a universal spatial reasoner.

A deeper understanding of spatial cognition requires better models and representation languages for space and spatial properties. The kinds of representations include qualitative, symbolic, linguistic, numerical, and analogical models. The types of spatial phenomena to be addressed include shape, orientation, large-scale space, visual space, body-centered space, and diagrams. There are important interrelationships among how the brain processes visual information, how humans structure their internal spatial representations, how natural language structures space, and how a formal logic or computational model might represent spatial knowledge.

We offer below suggestions for 18 interdisciplinary research projects inspired by interaction at this meeting. Of course, intrinsic to any of them must be a careful scientific evaluation of the results.

• Transformation and inspection of visual and spatial representations for computational imagery. Studies of imagery and visual perception have resulted in an understanding of basic primitives for reasoning with image representation. For tasks that would require imagery if carried out by humans, intelligent systems require the implementation of primitive and high-level functions for image construction, transformation, and inspection.

• Ontological foundations for formal spatial logics, including applications to GIS. There has been noticeable progress in qualitative spatial logics over the last decade or so. Whereas early work emphasized the simulation of human spatial reasoning processes, serious attention should also be devoted to the object-domains toward which the reasoning itself is directed. Philosophical and AI research in formal ontology has reached a degree of sophistication that can provide articulated grounds for innovative work in this direction. There are already some interesting GIS results in cognitive geography.

• Research in geometric reasoning. Better frameworks for representing and manipulating geometric knowledge could lead to designs that provide more adequate separation between generic principles and application constraints. An ability to reason about geometric concepts and relationships would bring AI research to real-world problems with the right mixture of complexities. Such AI research results would directly apply to the solution of problems central to modern manufacturing, including process planning, inspection planning, and assembly planning.

• Mutual confirmation and verification of data and processes as reported by neuroscience, computational models, and cognitive psychology. For example, it would be constructive to have a computer model accept neuroscience data and use an automated vision system to make predictions that could be confirmed in the neuroscience laboratory.

• Spatial reasoning for engineering, CAD/CAM, and graphics. Engineering graphics is the classical discipline for both the teaching of engineering visualization and the diagrammatic communication of engineering design. The kinds of spatial reasoning and spatial ontologies used in this discipline are poorly understood; they are largely an art, albeit an important one. Spatial representations and methods of reasoning about 3D objects could provide more effective ways to teach this discipline and could formalize the language of graphic design.

• Linguistically inspired and cognitively grounded natural language processing for human-computer interaction. Progress in understanding human spatial cognition in AI, psychology, and linguistics could be used to create new systems for natural language understanding and generation, for example, in textual summaries of diagrams or visual scenes. Conversely, natural language systems provide an arena for testing theories of how space and language interact.

• Notions of similarity with respect to spatial reasoning, including geometric, qualitative, and domain-specific factors. Knowledge of how analogical reasoning, retrieval, and learning operate in spatial domains is important, as is how spatial analogies are used in image understanding, navigation, and CAD/CAM. Improved understanding could lead to advanced software systems for computational biology and for GIS.

• Spatial representations and model-based reasoning in robotic navigation. Learning and representation of 3D environments is essential for robotic navigation. Issues such as sensing, object recognition and avoidance, hierarchical structure, and pattern learning are all relevant to this study, as is the integration of qualitative and quantitative information.

• Neurally-based sensory control for robotics, and the validation of neuroscientific theories by implementation. The visual guidance of movement is one of the oldest open problems in the study of the human brain and behavior. The same problem is central to building robots that reach toward, avoid, or navigate intelligently around objects in their environments. Data from neuroscience on how the brain solves this problem can inspire engineering solutions for robots, and, in turn, robots can be used to test the plausibility of neuroscientific theories. Cognitive neuroscience, computation, visual psychophysics, and systems neuroscience have contributed to the development of sophisticated models of visual attention. These models have begun to produce testable predictions that may assist progress toward our understanding of biological vision. Furthermore, since visual attention is ubiquitous in visual perception, the models promise to advance the state of the art in computer vision as well. For spatial cognition, these models could be applied, for example, to the interpretation of hand-drawn diagrams used in spatial communication.

• The relationship between spatial cognition and spatial language. Spatial cognition provides the basic understanding of space that is encoded in spatial language, but the two systems are not isomorphic. For example, closed-class spatial forms such as prepositions give only a schematic characterization of the shape of a located object and of the object with respect to which it is located. How spatial language is constrained by spatial perception and conceptualization is still poorly understood, nor is it clear how much learning the spatial forms of a particular language may influence non-linguistic spatial perception and conceptualization. Research on the interaction of the two systems is essential to arriving at a better understanding of how human cognition manages space.

• How diagrams are used in design and analysis. The use of diagrams pervades many areas of design (e.g., architectural design, device design) and analysis (e.g., analysis of proofs, analysis of circuits). It is important to understand the role diagrams play in these applications, and how they could be applied in computational approaches to design and analysis. For example, one could consider the use of diagrams in the development of interactive tools for learning and applying formal techniques of logic.

• Spatial aspects of multimodal communication, including gestures, enactments, and diagrams. Communication is not restricted to words; more typically, it is multimodal. While speaking and using diagrams, people seamlessly interweave words, depictions, gestures, and enactments. Depictions use spatial relations to convey spatial or other relations. Gestures, such as pointing, refer directly to topics, and enactments, such as demonstrations of events, schematize actual events. Depictions, gestures, and enactments communicate more directly than words do.

• Interactions. Spatial problem solving is often done in pairs or groups, where communities develop their own common ground and conventions for sketches, diagrams, and verbal expressions. What sorts of conventions prove useful for what sorts of tasks? How are differences in perspective and construal resolved?

• Automated reasoning with solids. A major issue here is to determine adequate representation and information-processing methods that can be used to evaluate different designs and also closely approximate reality. Often, many representations are possible for each object. Automatic conversion between semantically equivalent representations and the choice of an appropriate representation are serious issues. Furthermore, limitations on real-world manufacturability and approximations inherent in numeric and algebraic calculations make the study of approximate spatial calculations and approximate geometries crucial. Feature modeling and extraction play an important role in these computations.

• Spatio-temporal segmentation of events. Although events are continuous in space and time, they are perceived and conceived categorically. What determines the categorization, and how is it talked about? How do these spatio-temporal objects interact and correlate over space and time?

• Investigation of the metamathematical properties of spatial representation languages, and the design of new languages with desirable computational properties. Knowledge of properties such as decidability, completeness, categoricity, and expressiveness is vital to the evaluation and selection of an appropriate language and implementation for specific tasks. These results should guide the integration of such logics into computationally efficient, application-oriented integrated systems.

• Cross-linguistic and cross-cultural variation in the organization of space. How children come to understand space and external representations of it (e.g., diagrams, maps, and models) could be addressed by various approaches, including developmental studies and computational modeling of developmental results. Recent research suggests that there are significant differences across both languages and cultures in the way space is structured for purposes of communication and social interaction. The nature and extent of this variation is, however, still poorly understood. Research into this domain has the potential to illuminate what is universal in human spatial conceptualization, and what can be modified on the basis of experience.

• Computational toolboxes for building problem solvers that employ spatial representation and reasoning. What is the minimal common set of transformation operations on spatial representations? How are they applied? What are the basic data types and organizational principles? Powerful programming environments for building spatial reasoners are essential to test, refine, and apply theories of spatial cognition.

 

References

Andersen, R. A. 1987. Inferior Parietal Lobule Function in Spatial Perception and Visuomotor Integration. In Handbook of Physiology, 483-518.

Baddeley, A. 1986. Working Memory. Oxford: Oxford University Press.

Barwise, J. and Etchemendy, J. 1990. Visual Information and Valid Reasoning, in Visualization in Mathematics, W. Zimmerman (Ed.), Washington D. C.: Mathematical Association of America.

Bauer, M. I. and Johnson-Laird, P. N. 1993. How Diagrams Can Improve Reasoning. Psychological Science, 4 (6), 372-378.

Bowerman, M. 1996. Learning How to Structure Space for Language: A Crosslinguistic Perspective. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Ed.), Language and Space, Cambridge, MA: MIT Press.

Casati, R. and Varzi, A. 1994. Holes and Other Superficialities. Cambridge, MA: MIT Press.

Courtney, S. M., Petit, L., Maisog, J. M., Ungerleider, L. G., Haxby, J. V. 1998. An Area Specialized for Spatial Working Memory in Human Frontal Cortex. Science, 279, 1347-1351.

Courtney, S. M., Ungerleider, L. G., Keil, K., Haxby, J. V. 1996. Object and Spatial Visual Working Memory Activate Separate Neural Systems in Human Cortex. Cerebral Cortex 6(1), 39-49.

Davis, E. 1990. Representations of Commonsense Knowledge, Chapter 6. San Mateo, CA: Morgan Kaufmann.

De Renzi, E. 1982. Disorders of Space Exploration and Cognition, New York: John Wiley & Sons.

DeCuyper, J., Keymeulen, D. and Steels, L. 1995. A Hybrid Architecture for Modeling Liquid Behavior. In J. Glasgow, N. H. Narayanan, & B. Chandrasekaran (Ed.), Diagrammatic Reasoning: Cognitive and Computational Perspectives, 731-752. Cambridge MA: AAAI Press/MIT Press.

de Kleer, J. 1977. Multiple Representations of Knowledge in a Mechanics Problem-Solver. In Proceedings of IJCAI-77, 299-304.

Epstein, S. L. 1998. Pragmatic Navigation: Reactivity, Heuristics, and Search. Artificial Intelligence, 100(1-2) 275-322.

Epstein, S. L., Gelfand, J. and Lesniak, J. 1996. Pattern-Based Learning and Spatially Oriented Concept Formation in a Multi-Agent, Decision-Making Expert. Computational Intelligence, 12, 199-221.

Epstein, S. L., Gelfand, J. and Lock, E. T. 1998. Learning Game-Specific Spatially-Oriented Heuristics. Constraints, in press.

Ferguson, R. W., Aminoff, A. and Gentner, D. 1996. Modeling Qualitative Differences in Symmetry Judgments. In Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society, 534-539.

Fogassi, L., Gallese, V., di Pellegrino, G., Fadiga, L., Gentilucci, M., Luppino, G., Matelli, M., Pedotti, A. and Rizzolatti, G. 1992. Space Coding by Premotor Cortex. Experimental Brain Research, 89, 686-698.

Forbus, K. 1980. Spatial and Qualitative Aspects of Reasoning about Motion. In Proceedings of the First National Conference on Artificial Intelligence, 170-173.

Forbus, K. , Nelson, P. and Faltings, B. 1991. Qualitative Spatial Reasoning: The CLOCK Project. Artificial Intelligence, 51 (1-3), 417-471.

Funt, B. V. 1980. Problem Solving with Diagrammatic Representations. Artificial Intelligence, 13, 201-230.

Gardin, F. and Meltzer, B. 1989. Analogical Representations of Naive Physics. Artificial Intelligence, 38, 139-159.

Glasgow, J. and Papadias, D. 1992. Computational Imagery. Cognitive Science, 16 (3): 355-394.

Glasgow, J., Narayanan, N. H. and Chandrasekaran, B. (1995). Diagrammatic Reasoning: Cognitive and Computational Perspectives. Cambridge, MA: The MIT Press.

Graziano, M. S. A. , Hu, X. T. and Gross, C. G. 1997. Visuospatial Properties of Ventral Premotor Cortex. J. Neurophys., 77, 2268-2292.

Graziano, M. S. A., Yap, G. S. and Gross, C. G. 1994. Coding of Visual Space by Premotor Neurons, Science, 266, 1054-1057.

Grewe, L. and Kak, A. C. 1995. Interactive Learning of a Multi-Attribute Hash Table Classifier for Fast Object Recognition. Computer Vision and Image Understanding, 61 (3), 387-416.

Gross, C. G. and Graziano, M. S. A. 1995. Multiple Representations of Space in the Brain. The Neuroscientist, 1, 43-50.

Han, J.-H. and Requicha, A. A. G. 1997. Integration of Feature Based Design and Feature Recognition. Computer-Aided Design, 29 (5), 393-403.

Iwasaki, Y., Tessler, S. and Law, K. H. 1995. REDRAW: Diagrammatic Reasoner for Qualitative Structural Analysis. In J. Glasgow, B. Chandrasekaran, & H. Narayanan (Ed.), Diagrammatic Reasoning in Computational and Cognitive Perspectives on Problem Solving with Diagrams, Menlo Park, CA: AAAI Press.

Johnson, E. and Marefat, M. 1996. Qualitative Spatial Reasoning for Manufacturing Features. In Proceedings of the Artificial Intelligence and Manufacturing Workshop, Albuquerque, NM., 88-97.

Johnson, E. and Marefat, M. 1997. Systematic Spatial Inferencing Using a Qualitative Model for Manufactured Components. IEEE Transactions on Pattern Analysis and Machine Intelligence, submitted, Tech. Rept., TR-ISL-19, Intelligent Systems Laboratory, University of Arizona, Tucson, AZ, Feb. 1997.

Kandel, E., Schwartz, J. and Jessell, T. 1991. Principles of Neural Science, Chapter 30. New York: Elsevier.

Kim, Yong Se. 1992. Recognition of Form Features Using Convex Decomposition. Computer-Aided Design, 24 (9), 461-476.

Kosaka, A. and Kak, A. C. 1992. Fast Vision-Guided Mobile Robot Navigation Using Model-Based Reasoning and Prediction of Uncertainties. Computer Vision, Graphics, and Image Processing: Image Understanding, 271-329.

Larkin, J. and Simon, H. 1987. Why a Diagram is (Sometimes) Worth 10,000 Words. Cognitive Science, 11, 65-99.

Levesque, H. 1986. Making Believers out of Computers. Artificial Intelligence, 30, 81-108.

Levinson, S. C. 1996. Frames of Reference and Molyneux’s Question: Crosslinguistic Evidence. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Ed.), Language and Space, Cambridge, MA: MIT Press.

McDermott, D. 1987. Spatial Reasoning. In S. Shapiro (Ed.), The Encyclopedia of Artificial Intelligence, John Wiley and Sons.

Mengshoel, O. J. and Kim, Y. S. 1996. Intelligent Critiquing and Tutoring of Spatial Reasoning Skills. AI in Engineering Design, Analysis, Manufacturing, 10(3), 235-249.

Merigan, W. H. and Maunsell, J. H. R. 1993. How Parallel Are the Primate Visual Pathways? Annual Review of Neuroscience, 16, 369-402.

Neisser, U. 1967. Cognitive Psychology. New York: Appleton-Century-Crofts.

Newell, A. 1981. The Knowledge Level. AI Magazine, 2(2), 1-20, 33.

Novak, G. and Bulko, W. 1992. Uses of Diagrams in Solving Physics Problems. In Proceedings of the AAAI Symposium on Reasoning with Diagrammatic Representations, Stanford, CA, 139-144.

Penev, K. and Requicha, A. A. G. 1997. Automatic Fixture Synthesis in 3D. In Proceedings of the IEEE International Conference on Robotics & Automation, 1713-1718. Albuquerque, NM: IEEE Press.

Raman, R. and Marefat, M. 1997. Qualitative Spatial Inferencing in Interpreting Designs for CAPP. Technical Report TR-ISL-21, Intelligent Systems Laboratory, University of Arizona, Tucson, AZ, April 1997.

Randell, D., Cui, Z. and Cohn, A. G. 1992. A Spatial Logic Based on Regions and Connections. In Proceedings of the Third International Conference on Knowledge Representation and Reasoning, 165-176.

Requicha, A. A. G. 1980. Representations for Rigid Solids: Theory, Methods, and Systems. ACM Computing Surveys, 12 (4): 437-464.

Rizzolatti, G., Scandolara, C., Matelli, M. and Gentilucci, M. 1981. Afferent Properties of Periarcuate Neurons in Macaque Monkeys. II. Visual Responses. Behavioral Brain Research, 2, 147-163.

Robertson, I. H. and Marshall, J. C. 1993. Unilateral Neglect: Clinical and Experimental Studies, Hillsdale, NJ: Erlbaum.

Spyridi, A. J. and Requicha, A. A. G. 1990. Accessibility Analysis for the Automatic Inspection of Mechanical Parts by Coordinate Measuring Machines. In Proceedings of the IEEE International Conference on Robotics & Automation, 1284-1289. Cincinnati, OH: IEEE Press.

Tversky, B. 1995. Cognitive Origins of Graphic Productions. In F. T. Marchese (Ed.), Understanding Images, 25-53. New York: Springer-Verlag.

Vandenbrande, J. and Requicha, A. 1993. Spatial Reasoning for the Automatic Recognition of Machinable Features in Solid Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15 (10): 1269-1285.

Waco, D. and Kim, Y. 1994. Geometric Reasoning for Machining Features Using Convex Decomposition. Computer-Aided Design, 26 (6), 477-489.

Wang, E. and Kim, Y. S. 1994. Inductive Generation of Combination Operations for Form Feature Recognition Using Convex Decomposition. Computers and Industrial Engineering, 27(1-4), 123-126.

Wise, S. P. 1985. The Primate Premotor Cortex: Past, Present, and Preparatory, Annual Review of Neuroscience, vol. 8, pp. 1-19.

Yang, C. and Marefat, M. M. 1994. Object Oriented Concepts and Mechanisms for Feature-Based Computer Integrated Inspection. Journal of Advances in Engineering Software, 20 (2-3), 157-179.

Zhao, F. 1994. Extracting and Representing Qualitative Behaviors of Complex Systems in Phase Spaces. Artificial Intelligence, 69 (1-2), 51-92.

Summaries Of Workshop Presentations

Thursday, May 15

Session 1: Introductory Talks

Spatial Cognition and Representation of Space in the Brain

9:00 A.M. – 10:00 A.M.

Introduction to Spatial Cognition

Barbara Tversky

Department of Psychology

Stanford University

10:00 A.M. – 11:00 A.M.

Functional Imaging Studies of Spatial Perception and Working Memory

James Haxby

National Institute of Mental Health

11:15 A.M. – 12:15 P.M.

Task-Based Visuomotor Representation of Space in the Brain

Michael Graziano

Department of Psychology

Princeton University

Thursday, May 15

 

Session 2: Robotics and Perception, 1:30 p.m. - 3:15 p.m.

Session Leader: Avi Kak, Robot Vision Lab, Purdue University

 

Sensor Planning for Robotics Tasks:

Integrating Geometric, Optical and Motion Constraints

Peter K. Allen

Department of Computer Science

Columbia University

 

If I want to use a camera for a machine vision task, where should I place it? This simple question is at the heart of the sensor planning problem, which integrates information about the environment (such as CAD models), knowledge about the available sensors, and knowledge about the task to automatically determine sensing strategies and parameters. I will describe a system called MVP (Machine Vision Planning) that can determine the locus of camera positions, orientations, and lens parameters that satisfy the visibility, field-of-view, resolution, and focus requirements of a given machine vision task. I will also describe two new extensions to MVP. The first extension answers the question "What viewpoint should I choose if things are moving in the scene?" We solve this using a new swept-volume algorithm that computes temporal occlusion volumes. The second extension answers the question "Where should I look next?" for the problem of building an accurate 3-D model of an object from different viewpoints. This will be demonstrated with a system that builds 3-D models from range data. The sensor planner is able to compute the location of the next range scan that will most reduce the current model's uncertainty, so that an accurate model is built with a minimum number of scans.
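The "where should I look next" idea can be illustrated with a toy greedy planner. This is a hedged sketch, not the MVP system itself: the viewpoints, visible-patch sets, and function names below are invented for illustration.

```python
# Toy next-best-view selection: pick the candidate viewpoint whose scan
# would cover the most currently-unseen surface patches. Illustrative
# only; real planners score viewpoints against a model's uncertainty.

def next_best_view(candidates, seen):
    """candidates: dict viewpoint -> set of visible patch ids.
    seen: set of patch ids already covered by previous scans."""
    return max(candidates, key=lambda v: len(candidates[v] - seen))

def plan_scans(candidates, all_patches):
    """Greedily order scans until every patch is covered (or stuck)."""
    seen, order = set(), []
    while seen != all_patches:
        v = next_best_view(candidates, seen)
        gained = candidates[v] - seen
        if not gained:  # remaining patches are invisible from every viewpoint
            break
        seen |= gained
        order.append(v)
    return order, seen

if __name__ == "__main__":
    views = {"front": {1, 2, 3}, "back": {4, 5}, "top": {3, 4}}
    print(plan_scans(views, {1, 2, 3, 4, 5}))
```

Each iteration scores every candidate by its marginal coverage, mirroring the idea of choosing the scan that most reduces model uncertainty.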

 

Perception and Behavior-Based Control

John K. Tsotsos

Department of Computer Science

University of Toronto

This talk attempts to reconcile successful behavior-based robot architectures with results presented by Tsotsos showing that mechanisms that were anathema to early behaviorist dogma, such as attention and goal-directed processing, are perhaps necessary if vision is used as a major sensor. S* is a conceptual strategy for control that facilitates these mechanisms. Further, a unique method for determining whether a particular S* behavior collection can satisfy a given mission is described. A language for mission plan specification is defined, permitting the integration of deliberative and reactive specifications, and a proof procedure is sketched that can determine whether a particular behavior set violates the timing and feasibility constraints of a specific mission.

Thursday, May 15

 

Session 3: Linguistic Representation of Space, 3:30 P.M. - 5:15 P.M.

Session Leader: Barbara Tversky, Department of Psychology, Stanford University

Spatial Structure in Language and Vision

Leonard Talmy

Department of Linguistics

Center for Cognitive Science

State University of New York at Buffalo

Human cognition comprehends a certain number of relatively distinct major cognitive systems. These include: language, (different modalities of) perception, reasoning, affect, attention, memory, and cultural structure. Some research suggests that each cognitive system has some structural properties that may be uniquely its own; some further structural properties that it shares with only one or a few other cognitive systems; and some fundamental structural properties that it has in common with all the cognitive systems. It can be assumed that each such cognitive system is more integrated and interpenetrated with connections from other cognitive systems than is envisaged by the strict modularity notion. I term this view the "overlapping systems" model of cognitive organization. In this presentation, I compare the conceptual structuring system of language with apparent or suggested aspects of structuring in visual perception. Thus, some factors with a significant structural role in visual perception -- such as symmetry, rotation, and dilation -- are at best minimally represented in the closed-class forms of languages. And conversely, linguistic closed-class forms express such categories as "reality status" -- e.g., inflections that represent a proposition as factual, conditional, potential, or counterfactual -- that have little part in visual perception. But both language and visual perception do have a number of apparent structural factors in common. These include (the representation or perception of): object structure, object arrangement, interior structure within bulk, the topological character of such structure, reference frames, multiple hierarchical embedding of structure, the distribution of attention over a scene, and the deployment of a perspective point relative to a scene.

 

Partitioning Space: Crosslinguistic Perspectives on How Languages Classify

Spatial Relations and How Children Acquire the Categories.

Melissa Bowerman

Max Planck Institute for Psycholinguistics

The ability to perceive and interpret spatial relationships is clearly supported and constrained by both human biology and experience with universal environmental conditions like gravity. Consistent with this, it has been widely assumed that children learn spatial morphemes for concepts they have already acquired on a nonlinguistic basis -- e.g., IN for a universal notion of "containment" and ON for "contact" or "support". But recent crosslinguistic work challenges this view: spatial semantic categories are often structured surprisingly differently across languages, and children show sensitivity to language-specific principles of spatial classification by as early as two years of age. Clearly both universal and language-specific factors contribute to the structure of spatial semantic categories and to the process by which children acquire them, but how these factors interact is still poorly understood.

 

Friday, May 16

 

Session 4: Spatial Representation and Reasoning, 8:30 A.M. - 10:15 A.M.

Session Leader: B. Chandrasekaran, Department of Computer & Information Sciences, The Ohio State University

A model-based approach to spatial reasoning in AI

Janice Glasgow

Department of Computing and Information Sciences

Queen's University

Model-based reasoning proves the truth of a proposition by computation in the semantic domain. In contrast, rule-based reasoning proves truth by formal manipulation of formulas. A growing body of research in cognitive science suggests that human spatial reasoning is model-based rather than rule-based. The presentation will begin with a cognitive perspective on model-based reasoning. A modal logic of spatial assertions for reasoning in spatial domains will be presented, along with possible extensions that address structural hierarchy, temporal modalities, multiple views, and analogy.
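The contrast between model-based and rule-based reasoning can be made concrete with a minimal sketch in which a spatial proposition is verified by computation on a model of the scene. The scene, coordinates, and relation names below are hypothetical, not drawn from Glasgow's system.

```python
# Model-based spatial reasoning sketch: verify a proposition by direct
# computation on a spatial model (here, a coordinate map), rather than
# by chaining inference rules. The scene itself is invented.

model = {"lamp": (0, 2), "desk": (0, 0), "rug": (3, 0)}  # name -> (col, row)

def left_of(a, b, scene):
    """Computed directly from the model; no explicit inference rules."""
    return scene[a][0] < scene[b][0]

def above(a, b, scene):
    return scene[a][1] > scene[b][1]

# "The lamp is above the desk, and the desk is left of the rug" is
# checked against the model; transitivity never needs to be stated.
assert above("lamp", "desk", model)
assert left_of("desk", "rug", model)
```

The point of the sketch is that relations fall out of the model's geometry for free, which is the computational advantage claimed for model-based reasoning.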

The Integration of Spatially-Oriented Reasoners in a Decision-Making System

Susan Epstein

Department of Computer Science

City University of New York

Jack Gelfand

Department of Psychology

Princeton University

Machines that reason effectively and efficiently about actions in two-dimensional space should incorporate visual perception. They require knowledge representations that appropriately focus attention and capture ideas such as "symmetry" and "boundary." They require a reasoning architecture that incorporates visual perception into high-level reasoning, carefully balancing the contributions of both. Finally, they require algorithms that extract perceptually significant input and apply it inductively. One long-range goal is to have the faster, compiled, visually-based procedures augment and then gradually supplant more costly computation. The primary issue we see is how to learn meaningful features of space. Our work addresses this in the domain of two-person, perfect-information board games.

Friday, May 16

Session 5: Diagrammatic Reasoning, 10:30 A.M. - 12:15 P.M.

Session Leader: Yumi Iwasaki, Department of Computer Science, Stanford University

 

Diagrammatic Reasoning: How space intersects qualitative reasoning and analogy

Kenneth D. Forbus

Institute for the Learning Sciences

Northwestern University

 

It is well known that diagrams and models play a central role in human spatial reasoning. What these roles are, and how they interact with other cognitive processes, are important research questions. This talk outlines a partial answer to each of these questions. (1) One role of diagrams is to serve as a substrate for qualitative reasoning. We illustrate this with examples from reasoning about motion, problem solving, and reasoning about structures. (2) The same similarity computations used in conceptual processing may also play a central role in visual cognition. We illustrate this by describing Ferguson's MAGI model of symmetry, and how analogical encoding can be used to understand diagrams.

Mental Models, Diagrams, and Human Reasoning

Phil Johnson-Laird

Department of Psychology

Princeton University

 

Mental models have a structure corresponding to the structure of what they represent. They are therefore similar to diagrams. According to this theory, people reason by constructing models of the premises, formulating conclusions from them, and checking whether any models are counterexamples. Hence, certain sorts of diagrams should help people to reason. This talk presents three corroboratory lines of evidence. First, people reason better from diagrams that correspond to models than from verbal premises. Second, they often develop their own diagrammatic strategies. Third, when naive reasoners are taught a diagrammatic strategy, it improves their reasoning even when they carry it out in imagination only.
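The verification step of the theory, checking a candidate conclusion against every model of the premises for counterexamples, can be sketched computationally. The premises and relation names below are invented examples, not materials from the reported experiments.

```python
# Mental-models-style inference sketch: enumerate the models (possible
# left-to-right arrangements) consistent with the premises, then accept
# a conclusion only if no model is a counterexample. Premises invented.

from itertools import permutations

def models(objects, constraints):
    """Yield every ordering satisfying each (a, 'left_of', b) constraint."""
    for order in permutations(objects):
        pos = {o: i for i, o in enumerate(order)}
        if all(pos[a] < pos[b] for a, _, b in constraints):
            yield order

def follows(objects, premises, conclusion):
    """A conclusion follows iff it holds in all models of the premises."""
    a, _, b = conclusion
    return all(order.index(a) < order.index(b)
               for order in models(objects, premises))

premises = [("cup", "left_of", "plate"), ("plate", "left_of", "fork")]
assert follows({"cup", "plate", "fork"}, premises, ("cup", "left_of", "fork"))
```

Problems with few models are predicted to be easy, as here; problems whose premises admit many models make the counterexample search, and hence human reasoning, harder.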

Friday, May 16

 

Session 6: Qualitative Physical Reasoning, 1:30 P.M. - 3:15 P.M.

Session Leader: Pat Hayes, Institute for Human & Machine Cognition, University of West Florida

 

The Spatial Semantic Hierarchy for Large-Scale and Visual Space

Benjamin Kuipers

Computer Science Department

University of Texas at Austin

 

We have developed the Spatial Semantic Hierarchy (SSH) as a heterogeneous representation for knowledge of large-scale space: the cognitive map.

Each level of the SSH has its own descriptive ontology and its own mathematical foundation. The objects, relations, and assumptions at each level are abstracted from the levels below. The control level allows the robot and its environment to be formalized as a continuous dynamical system, whose stable equilibrium points can be abstracted to a discrete set of "distinctive states." Trajectories linking these states can be abstracted to actions, giving a discrete causal graph representation of the state space. The causal graph of states and actions can in turn be abstracted to a topological network of places and paths. Local metrical models, such as occupancy grids, of neighborhoods of places and paths can then be built on the framework of the topological network without their usual problems of global consistency.

The immediacy and global accessibility of visual space contrasts dramatically with the locality and incremental access that characterize large-scale space. Nonetheless, we are exploring the hypothesis that the structure of knowledge of visual space is an SSH-like hierarchy of significantly different representations.

At the control level are small active "attention buffers" that focus processing power on tracking a few selected visual features during motion. An attention buffer is like the fovea in representing a focus of attention, but quite different in our implementation in that it is not anatomically linked to the center of the visual field, and several can exist simultaneously and move independently.

At the intermediate symbolic levels, visual features such as blobs, edges, corners, and their geometric relationships, are used to confirm or disconfirm schema descriptions of extended objects in the scene.

The metrical level description consists of several distinct frames of reference: the 2D visual frame, the 3D egocentric frame, and the 3D world-centered frame. The three frames of reference have different ways of representing changing position and changing percepts with continuous motion.

As we did in the construction of the original SSH for large-scale space, the challenge here is to exploit both empirical constraints about human behavior and computational constraints from the task of visually-guided navigation, in the context of building a robot implementation.
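The abstraction from a causal graph of distinctive states to a topological network of places can be sketched as follows. The states, actions, and place assignments are invented; this illustrates the idea only, not the SSH implementation.

```python
# Sketch of two SSH levels: a causal graph of distinctive states and
# actions, abstracted to a topological graph of places. All names are
# hypothetical examples, not data from an actual robot.

# Causal level: (state, action) -> resulting state
causal = {
    ("s1", "travel"): "s2",
    ("s2", "turn"):   "s3",
    ("s3", "travel"): "s1",
}

# Topological level: each distinctive state abstracts to a place.
place_of = {"s1": "corridor-end", "s2": "junction", "s3": "junction"}

def topological_edges(causal, place_of):
    """Abstract state transitions into connections between distinct places."""
    edges = set()
    for (s, _action), s2 in causal.items():
        p, q = place_of[s], place_of[s2]
        if p != q:  # transitions within one place add no topology
            edges.add(frozenset((p, q)))
    return edges

print(topological_edges(causal, place_of))
```

Local metrical models could then be attached per place, which is how the SSH avoids demanding a single globally consistent map.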

Session 6: Qualitative Physical Reasoning (continued)

Spatial Reasoning About Physical Fields

Feng Zhao

Department of Computer and Information Sciences

The Ohio State University

Many important physical phenomena such as temperature, air flow, and sound are described as spatially distributed continuous fields. Visual thinking plays an important role in physical reasoning. Based on research in automating diverse reasoning tasks about dynamical systems, nonlinear controllers, kinematic mechanisms, and fluid motion, we have identified a style of visual thinking --- imagistic reasoning. Imagistic reasoning organizes computations around image-like, analogue representations so that perceptual and symbolic operations can be brought to bear to infer structure and behavior. We have developed a computational paradigm --- spatial aggregation (SA) --- as a realization of imagistic reasoning. SA comprises a field ontology, a mechanism of multi-layer aggregation, and a programming language. It takes a spatially continuous field as input and produces high-level descriptions of structure and behavior. Programs incorporating imagistic reasoning have been shown to perform at an expert level in domains that defy current engineering methods.
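The multi-layer aggregation idea can be illustrated with a one-dimensional toy: adjacent field samples with the same qualitative trend are aggregated into higher-level segments. This is a hedged sketch, far simpler than the actual SA framework, and the field values are made up.

```python
# Minimal spatial-aggregation-style sketch: a sampled 1-D field is
# abstracted into qualitative segments (rising / falling / flat),
# a higher-level description of the field's structure. Illustrative
# only; SA proper handles multi-dimensional fields and many layers.

def qualitative_trend(a, b):
    return "rising" if b > a else "falling" if b < a else "flat"

def aggregate(field):
    """Group adjacent sample pairs into (trend, run-length) segments."""
    segments = []
    for i in range(len(field) - 1):
        t = qualitative_trend(field[i], field[i + 1])
        if segments and segments[-1][0] == t:
            segments[-1] = (t, segments[-1][1] + 1)  # extend current segment
        else:
            segments.append((t, 1))                  # start a new segment
    return segments

if __name__ == "__main__":
    temperature = [10, 12, 15, 15, 13, 11]
    print(aggregate(temperature))  # [('rising', 2), ('flat', 1), ('falling', 2)]
```

The output is an analogue of SA's high-level behavior description: symbolic structure extracted from a continuous field by local, perceptual-style operations.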

Friday, May 16

 

Session 7: Geometric Reasoning and CAD, 3:45 P.M. - 5:30 P.M.

Session Leader: Ari Requicha, Institute for Robotics & Intelligent Systems, University of Southern California

 

Spatial Reasoning with Three-Dimensional Shape Features

Michael Marefat

Department of Electrical and Computer Eng.

University of Arizona

 

This talk presents a model for qualitative spatial inferencing about the shape of manufactured components. The model incorporates the shape features of the component as well as mathematical spatial relationships between the features. Unlike previous formulations of feature spatial relationships, which use vague, English-based vocabularies, these spatial relationships are formulated using the half-spaces created by the faces of a shape feature so as to completely capture feature interactions. With the incorporation of these spatial relationships, the model is capable of supporting spatial inferencing tasks that benefit planning and problem solving in manufacturing. Specifically, these tasks are: (i) generation of the multiple interpretations of a component in terms of its three-dimensional shape features; (ii) identification of the symmetric interpretations of a component; and (iii) inferring shape analogies between components and subparts of components. We will discuss the definition, formulation, and computational complexity of these inferencing tasks, as well as their properties. The utility of these computational inferences within the design and manufacturing domains will be shown.
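The half-space formulation can be illustrated in two dimensions: if each face of a feature contributes a half-plane, a convex feature is the intersection of those half-planes, and containment tests reduce to sign checks. The pocket geometry below is invented for illustration; the talk concerns 3-D features.

```python
# Toy illustration of half-space-based shape relations (2-D for
# brevity). A face is modeled as the half-plane {x : n . x <= d};
# a convex feature is the intersection of its faces' half-planes.
# The feature geometry here is hypothetical.

def in_half_space(point, normal, d):
    """True if point lies in the half-plane n . x <= d."""
    return normal[0] * point[0] + normal[1] * point[1] <= d

def inside_feature(point, faces):
    """Membership in a convex feature: inside every face's half-plane."""
    return all(in_half_space(point, n, d) for n, d in faces)

# A unit-square pocket, written with outward normals as n . x <= d:
# x >= 0, x <= 1, y >= 0, y <= 1.
pocket = [((-1, 0), 0), ((1, 0), 1), ((0, -1), 0), ((0, 1), 1)]

assert inside_feature((0.5, 0.5), pocket)
assert not inside_feature((2.0, 0.5), pocket)
```

Because the relations are algebraic rather than verbal, interactions between overlapping features can be computed exactly rather than described approximately.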

Geometric and Visual Reasoning for CAD/CAM

Yong Se Kim

Department of General Engineering

University of Illinois at Urbana-Champaign

In this talk, two aspects of geometric and visual reasoning needed in the computer-aided design and manufacturing area will be discussed. First, I will describe a form feature recognition method that automatically identifies high-level geometric entities from the CAD data of mechanical parts so that manufacturing-relevant information can be obtained. Then, I will talk about an instructional software system that helps engineering students develop visual reasoning skills.

Our feature recognition method uses a hierarchical volumetric decomposition which abstracts the outside-in geometric relations between the boundary faces of the part. Manufacturing applications in machining process planning and assembly mating have been developed.

The visual reasoning instructional software utilizes the so-called missing-view problem, in which 3-D solid objects are to be constructed from two orthographic projections through visual analysis and synthesis. The system includes a geometric framework and a critiquing and tutoring module, and has been used in an engineering graphics course.

 

Saturday, May 17

 

Session 8: Formal Theories of Spatial Reasoning, 9:00 A.M. - 10:45 A.M.

Session Leader: Ernie Davis, Department of Computer Science, New York University

 

Formal Calculi for Qualitative Spatial Reasoning

A. G. Cohn

Division of Artificial Intelligence

University of Leeds

Following on from work in the philosophical logic literature by Whitehead and others on logical calculi for representing space in a qualitative way, researchers in AI and Geographical Information Systems have recently been developing and continuing this work. Much of the research has concentrated on representing and reasoning about topology, but other spatial features such as orientation, distance, size, shape, and change have also been considered. Initially much of the research concentrated on representation, and indeed this work continues, though there is now also a body of research on computational and complexity results. In my talk I will survey this work and point to areas for future work.

The Structure of Spatial Location

Achille C. Varzi

Department of Philosophy

Columbia University

Ordinary reasoning about space is first and foremost reasoning about entities located in space. The exact nature and properties of this locative relation in turn depend on the sorts of entities one considers. Material bodies, for instance, "occupy" the regions in which they are located; immaterial entities (such as holes or shadows, or perhaps events and processes) are less exclusive and can share a location with their peers. In my talk I shall review some reasons for regarding an account of spatial location as a central ingredient in the representation of our spatial competence. I shall offer some examples of what such an account amounts to, of the difficulties involved, and of the main directions along which a theory of location can be developed formally and made to interact with other fundamental ingredients of qualitative spatial reasoning, such as mereology, topology, morphology, and kinematics.

Workshop Contributors and Attendees

 

Peter K. Allen
Department of Computer Science
Columbia University
500 W. 120th Street
New York, NY 10027
allen@cs.columbia.edu

Michael Anderson
Department of Computer Science
University of Hartford
West Hartford, CT 06117
anderson@hartford.edu

Melissa Bowerman
Max Planck Institute for Psycholinguistics
Postbus 310
6500 AH Nijmegen
The Netherlands
melissa@mpi.nl

B. Chandrasekaran
Department of Computer and Information Science
591 Dreese Lab
2015 Neil Avenue
The Ohio State University, Columbus, OH 43201
chandra@cis.ohio-state.edu

A. G. Cohn
Division of AI
School of Computer Studies
University of Leeds
LEEDS, LS2 9JT, UK
agc@scs.leeds.ac.uk

Ernie Davis
Department of Computer Science
New York University
251 Mercer St.
New York, NY 10012
davise@cs.nyu.edu

Susan L. Epstein
Department of Computer Science
Hunter College of The City University of New York
695 Park Avenue
New York, NY 10021
epstein@roz.hunter.cuny.edu

Kenneth D. Forbus
Qualitative Reasoning Group
The Institute for the Learning Sciences
Northwestern University
1890 Maple Avenue
Evanston, Illinois, 60201
forbus@ils.nwu.edu

Jack Gelfand
Psychology Department
1-S-6 Green Hall
Washington Road
Princeton University
Princeton, NJ 08544

Mike Graziano
Psychology Department
Green Hall
Princeton University
Princeton, NJ 08544
graziano@princeton.edu

Patrick J. Hayes
Institute for Human and Machine Cognition
University of West Florida
11000 University Parkway
Pensacola, FL 32514
phayes@ai.uwf.edu

Janice Glasgow
Department of Computing and Information Science
Queen’s University, Kingston
Ontario, Canada K7L 3N6
janice@qucis.queensu.ca

James Haxby
National Institute of Mental Health
Building 10 Room 4C10
9000 Rockville Pike
Bethesda, MD 20892
haxby@nih.gov

Yumi Iwasaki
Knowledge Systems Laboratory
Gates Bldg. 2A, M/C 9020
Department of Computer Science
Stanford University
Stanford, CA 94305
iwasaki@ksl.stanford.edu

Phil Johnson-Laird
Department of Psychology
Green Hall
Princeton, NJ 08544
phil@clarity.princeton.edu

Avi Kak
Robot Vision Lab
1285 EE Building
Purdue University
W. Lafayette, IN 47907-1285
kak@purdue.edu

Yong Se Kim
University of Illinois at Urbana-Champaign
104 S. Mathews Ave.
Urbana, IL 61801-2996
yskim@ux1.cso.uiuc.edu

Benjamin Kuipers
Department of Computer Sciences
University of Texas at Austin
Austin, Texas 78712 USA
kuipers@cs.utexas.edu

Michael Marefat
Department of Electrical and Computer Engineering
University of Arizona
Tucson, AZ 85721
marefat@ece.arizona.edu

Aristides A. G. Requicha
Institute for Robotics and Intelligent Systems
University of Southern California
Los Angeles, CA 90089-0781
requicha@lipari.usc.edu

John Tsotsos
Department of Computer Science
6 King’s College Road, Room 283D
University of Toronto,
Toronto, Ontario, Canada M5S 3H5
tsotsos@vis.toronto.edu

Leonard Talmy
Department of Linguistics
685 Baldy Hall
State University of New York
Buffalo, NY 14260
talmy@acsu.buffalo.edu

Barbara Tversky
Department of Psychology
Building 420
Stanford University
Stanford, CA 94305-2130
bt@Psych.Stanford.EDU

Achille C. Varzi
Department of Philosophy
Columbia University
MC 4971 Philosophy Hall
1150 Amsterdam Avenue
New York NY 10027
varzi@columbia.edu

Feng Zhao
Dept. of Computer & Information Science
The Ohio State University
2015 Neil Avenue
Columbus, OH 43210-1277
fz@cis.ohio-state.edu