Lev Manovich on Tue, 18 Aug 1998 23:12:59 -0700 (PDT)



Syndicate: A THEORY OF CULTURAL INTERFACES 2/3


II. Cinema

The printed word tradition, which initially dominated the language of
cultural interfaces, is becoming less important, while the part played by
cinematic elements is growing progressively stronger. This is consistent
with a general trend in modern society towards presenting more and
more information in the form of time-based audio-visual moving
image sequences, rather than as text. As new generations of both
computer users and computer designers are growing up in a media-
rich environment dominated by television rather than by printed texts,
it is not surprising that they favor cinematic language over the
language of print.
        A hundred years after cinema's birth, cinematic ways of seeing
the world, of structuring time, of narrating a story, of linking one
experience to the next, are being extended to become the basic ways in
which computer users access and interact with all cultural data. In this
way, the computer fulfills the promise of cinema as a visual Esperanto
which pre-occupied many film artists and critics in the 1920s, from
Griffith to Vertov. Indeed, millions of computer users communicate
with each other through the same computer interface. And, in contrast
to cinema where most of its "users" were able to "understand"
cinematic language but not "speak" it (i.e., make films), all computer
users can "speak" the language of the interface. They are active users of
the interface, employing it to perform many tasks: send email, run
basic applications, organize files and so on.
        The original Esperanto never became truly popular. But cultural
interfaces are widely used and are easily learned. We have a truly
unprecedented situation in the history of cultural languages:
something which is designed by a rather small group of people is
immediately adopted by millions of computer users.  How is it possible
that people around the world adopt today something which a 20-
something programmer in Northern California has hacked together
just the night before?  Shall we conclude that we are somehow
biologically "wired" to the interface language, the way we are "wired,"
according to the original hypothesis of Noam Chomsky, to different
natural languages?
        Interestingly, the speed with which the language of cultural
interfaces is being formulated at the end of the twentieth century is
comparable to the speed with which cinematic language was
formulated exactly a hundred years ago. In both cases, the ease with
which the users "acquired" these languages was to a large extent due to
the fact that these languages drew on previous and already well
acquired cultural forms. In the case of cinema, it was theater, magic
lantern shows and other nineteenth century forms of public
entertainment. Cultural interfaces in their turn draw on older cultural
forms such as the printed word and cinema. I have already discussed
some ways in which the printed word tradition structures interface
language; now it is cinema's turn.
        I will begin with probably the most important case of cinema's
influence on cultural interfaces - the mobile camera. Originally
developed as part of 3-D computer graphics technology for such
applications as computer-aided design, flight simulators and computer
movie making, during the 1980's and 1990's the camera model became
as much of an interface convention as scrollable windows or the cut-
and-paste function. It became an accepted way of interacting with any data
which is represented in three dimensions -- which, in a computer
culture, means literally anything and everything: the results of a
physical simulation, an architectural site, design of a new molecule,
financial data, the structure of a computer network and so on. As
computer culture is gradually spatializing all representations and
experiences, they become subjected to the camera's particular grammar
of data access. Zoom, tilt, pan and track: we now use these operations to
interact with data spaces, models, objects and bodies.
        Abstracted from its temporary historical "imprisonment" within
the physical body of a movie camera directed at physical reality, a
virtualized camera also becomes an interface to all types of media
besides 3-D space. As an example, consider the GUI (Graphical User
Interface) of the leading computer animation software --
PowerAnimator from Alias/Wavefront. [11] In this interface, each
window, regardless of whether it displays a 3-D model, a graph or even
plain text, contains Dolly, Track and Zoom buttons. In this way, the
model of a virtual camera is extended to apply to navigation through
any kind of information, not only that which has been spatialized. It is
particularly important that the user is expected to dolly and pan over
text as though it were a 3-D scene. Cinematic vision triumphed over the
print tradition, with the camera subsuming the page. The Gutenberg
galaxy turned out to be just a subset of the Lumières' universe.
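The camera grammar described here -- zoom, tilt, pan, track, dolly -- can be sketched schematically. The class below is purely illustrative: its names and structure are my own assumptions for the sake of the sketch, not the actual API of PowerAnimator or any other package.

```python
# A schematic sketch of the virtual camera's "grammar of data access":
# dolly (move along the view axis), track (move across it), and zoom
# (change the field of view). Names and structure are illustrative only,
# not the API of PowerAnimator or any actual software.

class VirtualCamera:
    def __init__(self):
        self.position = [0.0, 0.0, 10.0]   # camera location in the scene
        self.view_axis = [0.0, 0.0, -1.0]  # direction the camera faces
        self.fov = 60.0                    # field of view in degrees

    def dolly(self, distance):
        # Move the camera along its view axis, toward or away from the data.
        self.position = [p + distance * a
                         for p, a in zip(self.position, self.view_axis)]

    def track(self, dx, dy):
        # Slide the camera sideways or up and down, orientation unchanged.
        self.position[0] += dx
        self.position[1] += dy

    def zoom(self, factor):
        # Narrow or widen the field of view without moving the camera.
        self.fov /= factor

# The same operations apply whether the window holds a 3-D model,
# a graph, or plain text treated as a scene.
cam = VirtualCamera()
cam.dolly(4.0)       # move 4 units toward the scene
cam.track(1.0, 0.5)
cam.zoom(2.0)        # halve the field of view
print(cam.position, cam.fov)
```

The point of the sketch is that nothing in these operations refers to what is being looked at: the same three methods navigate a molecule, a financial dataset, or a page of text.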
        Another feature of cinematic perception which persists in
cultural interfaces is a rectangular framing of represented reality.
Cinema itself inherited this framing from Western painting. Since the
Renaissance, the frame acted as a window into a larger space assumed
to extend beyond the frame. This space was cut by the frame's rectangle
into two parts: "onscreen space," the part which is inside the frame,
and the part which is outside. In the famous formulation of Leon
Battista Alberti, the frame acted as a window onto the world. Or, in a
more recent formulation of Jacques Aumont and his co-authors, "The
onscreen space is habitually perceived as included within a more vast
scenographic space. Even though the onscreen space is the only visible
part, this larger scenographic part is nonetheless considered to exist
around it." [12]
        Just as a rectangular frame of painting and photography presents
a part of a larger space outside it, a window in HCI presents a partial
view of a larger document. But if in painting (and later in
photography), the framing chosen by an artist was final, the computer
interface benefits from a new invention introduced by cinema: the
mobility of the frame. As a kino-eye moves around the space revealing
its different regions, so can a computer user scroll through a window's
contents.
        It is not surprising that screen-based interactive 3-D
environments, such as VRML worlds, also use cinema's rectangular
framing, since they rely on other elements of cinematic vision,
specifically a mobile virtual camera. It may be more surprising to
realize that the Virtual Reality (VR) interface, often promoted as the most
"natural" interface of all, utilizes the same framing. [13] As in cinema,
the world presented to a VR user is cut by a rectangular frame. As in
cinema, this frame presents a partial view of a larger space. [14] As in
cinema, the virtual camera moves around to reveal different parts of
this space.
        Of course, the camera is now controlled by the user and in fact is
identified with his/her own sight. Yet, it is crucial that in VR one is
seeing the virtual world through a rectangular frame, and that this
frame always presents only a part of a larger whole. This frame creates a
distinct subjective experience which is much closer to cinematic
perception than to unmediated sight.
        Interactive virtual worlds, whether accessed through a screen-
based or a VR interface, are often discussed as the logical successor to
cinema, as potentially the key cultural form of the twenty-first century,
just as cinema was the key cultural form of the twentieth century.
These discussions usually focus on the issues of interaction and
narrative. So, the typical scenario for twenty-first century cinema
involves a user represented as an avatar existing literally "inside" the
narrative space, rendered with photorealistic 3-D computer graphics,
interacting with virtual characters and perhaps other users, and
affecting the course of narrative events.
        It is an open question whether this and similar scenarios
commonly invoked in new media discussions of the 1990's indeed
represent an extension of cinema or whether they should rather be thought of
as a continuation of some theatrical traditions, such as improvisational
or avant-garde theater. But what undoubtedly can be observed in the
1990's is how virtual technology's dependence on cinema's mode of
seeing and language is becoming progressively stronger. This coincides
with the move from proprietary and expensive VR systems to more
widely available and standardized technologies, such as VRML (Virtual
Reality Modeling Language). [15]
        The creator of a VRML world can define a number of viewpoints
which are loaded with the world. [16] These viewpoints automatically
appear in a special menu in a VRML browser which allows the user to
step through them, one by one. Just as in cinema, ontology is coupled
with epistemology: the world is designed to be viewed from particular
points of view. The designer of a virtual world is thus a
cinematographer as well as an architect. The user can wander around
the world or she can save time by assuming the familiar position of a
cinema viewer for whom the cinematographer has already chosen the
best viewpoints.
        Equally interesting is another option which controls how a
VRML browser moves from one viewpoint to the next. By default, the
virtual camera smoothly travels through space from the current
viewpoint to the next as though on a dolly, its movement
automatically calculated by the software. Selecting the "jump cuts"
option makes it cut from one view to the next. Both modes are
obviously derived from cinema. Both are more efficient than exploring
the world on one's own.
        With a VRML interface, nature is firmly subsumed under
culture. The eye is subordinated to the kino-eye. The body is
subordinated to a virtual body of a virtual camera. While the user can
investigate the world on her own, freely selecting trajectories and
viewpoints, the interface privileges cinematic perception -- cuts, pre-
computed dolly-like smooth motions of a virtual camera, and pre-
selected viewpoints.
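The two transition modes just described -- the software-computed, dolly-like travel and the "jump cuts" option -- can be sketched schematically. The code below is an illustration of the logic only; the function names, coordinates and step counts are my assumptions, not actual VRML syntax or browser code.

```python
# A schematic sketch of a VRML browser's two viewpoint-transition modes:
# a smooth, dolly-like move whose path is computed by the software, versus
# a cinematic jump cut. Illustrative only -- not actual VRML or browser code.

def smooth_transition(start, end, steps):
    """Interpolate camera positions between two designer-chosen
    viewpoints, as a VRML browser does by default."""
    path = []
    for i in range(steps + 1):
        t = i / steps
        path.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return path

def jump_cut(start, end):
    """The 'jump cuts' option: cut directly from one view to the next."""
    return [start, end]

# Two hypothetical pre-selected viewpoints defined by the world's designer.
entrance = (0.0, 1.6, 10.0)
balcony  = (5.0, 4.0, 2.0)

print(smooth_transition(entrance, balcony, 4))  # dolly-like path of 5 positions
print(jump_cut(entrance, balcony))              # two positions, nothing between
```

Either way, the user steps through views the designer has already composed: the epistemology of the world is built into its ontology.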
        The area of computer culture where the cinematic interface is being
transformed into a cultural interface most aggressively is computer
games. By the 1990's, game designers had moved from two to three
dimensions and had begun to incorporate cinematic language in an
increasingly systematic fashion. Games started featuring lavish
opening cinematic sequences (called in the game business
"cinematics") to set the mood, establish the setting and introduce the
narrative. Frequently, the whole game would be structured as an
oscillation between interactive fragments requiring the user's input and
non-interactive cinematic sequences, i.e. "cinematics". [17]  As the decade
progressed, game designers were creating increasingly complex -- and
increasingly cinematic -- interactive virtual worlds. Regardless of a
game's genre -- action/adventure, fighting, flight simulator, first-
person action, racing or simulation -- games came to rely on
cinematography techniques borrowed from traditional cinema,
including the expressive use of camera angles and depth of field, and
dramatic lighting of 3-D sets to create mood and atmosphere. At the
beginning of the decade, games used digital video of actors
superimposed over 2-D or 3-D backgrounds; by its end, they had
switched to fully synthetic characters. [18] This switch also made virtual
worlds more cinematic, as the characters could be better visually
integrated with their environments. [19]
        A particularly important example of how computer games use --
and extend -- cinematic language is their implementation of a dynamic
point of view. In driving and flying simulators and in combat games,
such as Tekken 2 (Namco, 1994 -), after a certain event takes place (a car
crashes, a fighter is knocked down), it is automatically replayed
from a different point of view. Other games such as the Doom series (Id
Software, 1993 -) and Dungeon Keeper (Bullfrog Productions, 1997)
allow the user to switch between the point of view of the hero and a
top down "bird's eye" view. Finally, Nintendo went even further by
dedicating four buttons on their N64 joypad to controlling the view of
the action. While playing Nintendo games such as Super Mario 64
(Nintendo, 1996) the user can continuously adjust the position of the
camera. Some Sony PlayStation games such as Tomb Raider (Eidos, 1996)
also use the buttons on the PlayStation joypad for changing the point of
view.
        The incorporation of virtual camera controls into the very
hardware of game consoles is truly a historic event. Directing the
virtual camera becomes as important as controlling the hero's actions.
This is admitted by the game industry itself. For instance, a package for
Dungeon Keeper lists four key features of the game, of which the
first two concern control over the camera: "switch your perspective,"
"rotate your view," "take on your friend," "unveil hidden levels." In
games such as this one, cinematic perception functions as the subject in
its own right. [20] Here, the computer games are returning to "The New
Vision" movement of the 1920s (Moholy-Nagy, Rodchenko, Vertov
and others), which foregrounded the new mobility of the photo and film
camera and made unconventional points of view a key part of their
poetics.
        The fact that computer games continue to encode, step by step,
the grammar of a kino-eye in software and in hardware is not an
accident. This encoding is consistent with the overall trajectory driving
the computerization of culture since the 1940's: the
automation of all cultural operations.  This automation gradually
moves from basic to more complex operations: from image processing
and spell checking to software-generated characters, 3-D worlds, and
Web sites. The side effect of this automation is that once particular
cultural codes are implemented in low-level software and hardware,
they are no longer seen as choices but as unquestionable defaults. To
take the automation of imaging as an example, in the early 1960's the
newly emerging field of computer graphics incorporated a linear one-
point perspective in 3-D software, and later directly in hardware. [21] As a
result, linear perspective became the default mode of vision in digital
culture, be it computer animation, computer games, visualization or
VRML worlds. Now we are witnessing the next stage of this process:
the translation of cinematic grammar of points of view into software
and hardware. As Hollywood cinematography is translated into
algorithms and computer chips, its conventions become the default
method of interacting with any data subjected to spatialization, with a
narrative, and with other human beings. (At SIGGRAPH '97 in Los
Angeles, one of the presenters called for the incorporation of
Hollywood-style editing in multi-user virtual worlds software. In such
an implementation, user interaction with other avatars would be
automatically rendered using classical Hollywood conventions for
filming dialog. [22]) Element by element, cinema is being poured into a
computer: first one-point linear perspective; next the mobile camera
and a rectangular window; next cinematography and editing
conventions, along with digital personas based on acting
conventions borrowed from cinema; to be followed by make-up, set
design, and, of course, the narrative structures themselves. From one
cultural language among others, cinema is becoming the cultural
interface, a toolbox for all cultural communication, overtaking the
printed word.
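The one-point linear perspective that became the default mode of vision reduces, at its core, to a division by depth. The sketch below is a minimal illustration of that projection; the function and its parameters are my own simplification, not any particular package's rendering pipeline.

```python
# A minimal sketch of one-point linear perspective, the projection model
# that graphics software and hardware made the default mode of vision:
# a 3-D point is mapped onto the image plane by dividing by its depth.

def project(point, focal_length=1.0):
    """Project a 3-D point (x, y, z) onto a 2-D image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Two points on the same line of sight: the farther one lands closer
# to the vanishing point at the image center.
print(project((2.0, 1.0, 4.0)))   # (0.5, 0.25)
print(project((2.0, 1.0, 8.0)))   # (0.25, 0.125)
```

Once this division by depth is cast into silicon, it stops being one possible construction of space among many and becomes, as argued above, an unquestionable default.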
        But, in one sense, all computer software has already been based
on a particular cinematic logic.  Consider the key feature shared by all
modern human-computer interfaces - overlapping windows. [23] All
modern interfaces display information in overlapping and resizable
windows arranged in a stack, similar to a pile of papers on a desk. As a
result, the computer screen can present the user with a practically
unlimited amount of information despite its limited surface.
        The overlapping windows of HCI can be understood as a synthesis of
two basic techniques of twentieth-century cinema: temporal montage
and montage within a shot. In temporal montage, different images
follow each other in time, while in montage within the shot, these
images co-exist within the screen. The first technique defines the
cinematic language as we know it; the second is used more rarely.
Examples of this technique are vignettes within the screen, employed in
early cinema to show the interlocutor in a telephone conversation;
superimpositions of a few images and multiple screens used by the
avant-garde filmmakers; and the use of deep focus and a particular
compositional strategy (for instance, a character looking through a
window, such as in Citizen Kane, Ivan the Terrible and Rear Window)
to juxtapose close and far away scenes. [24]
        As its popularity testifies, temporal montage works.
However, it is not a very efficient method of communication: the
display of each additional piece of information takes time to watch,
thus slowing communication. It is not accidental that the European
avant-garde of the 1920's, inspired by the engineering ideal of efficiency,
experimented with various alternatives, trying to load the screen with as
much information at one time as possible. [25] In his 1927 Napoleon, Abel
Gance uses a multiscreen system which shows three images side by
side. Two years later, in A Man with a Movie Camera (1929), we watch
Dziga Vertov speeding up the temporal montage of individual shots,
more and more, until he seems to realize: why not simply
superimpose them in one frame? Vertov overlaps the shots,
achieving temporal efficiency -- but he also pushes the limits of a
viewer's cognitive capacities. His superimposed images are hard to
read -- information becomes noise. Here cinema reaches one of its
limits imposed on it by human psychology; from that moment on,
cinema retreats, relying on temporal montage or deep focus, and
reserving superimpositions for infrequent cross-dissolves.
        In the window interface, the two opposites -- temporal montage and
montage within the shot -- finally come together. The user is
confronted with a montage within the shot -- a number of windows
present at once, each window opening up into its own reality. This,
however, does not lead to the cognitive confusion of Vertov's
superimpositions because the windows are opaque rather than
transparent, so the user is only dealing with one of them at a time. In
the process of working with a computer, the user repeatedly switches
from one window to another, i.e. the user herself becomes the editor
accomplishing montage between different shots. In this way, the window
interface synthesizes two different techniques of presenting
information within a rectangular screen developed by cinema.
        This last example shows once again the extent to which human-
computer interfaces -- and the cultural interfaces which follow them --
are cinematic, inheriting cinema's particular ways of organizing
perception, attention and memory. Yet it also demonstrates the
cognitive distance between cinema and the computer age. For the
viewers of the 1920's, the temporal replacement of one image by
another, as well as the superimposition of two images, were an
aesthetic and perceptual shock, a truly modern and unfamiliar
experience -- as testified, for instance, by Walter Benjamin's description
of cinema in his Artwork essay. [26] Film directors were able to use
montage to create meaning, because the cut from one image to another
was a meaningful, even traumatic (if we are to believe Benjamin)
event. At the end of the century, however, anaesthetized first by
cinema and then by television channel flipping, we feel at home with a
number of overlapping windows on a computer screen. We switch
back and forth between different applications, processes, tasks. Not only
are we no longer shocked, but in fact we feel angry when a computer
occasionally crashes because we opened too many windows at once.
        Cinema, the major cultural form of the twentieth century, has
found a new life as the toolbox of a computer user. What was an
individual artistic vision -- of Griffith, Eisenstein, Gance, Vertov -- has
become a way of work and a way of life for millions in the computer
age. Cinema's aesthetic strategies have become basic organizational
principles of computer software. The window in a fictional world of a
cinematic narrative has become a window in a datascape. In short,
what was cinema has become human-computer interface.
        I will conclude this section by discussing a few artistic projects
which, in different ways, offer alternatives to this trajectory. To
summarize it once again, the trajectory involves gradual translation of
elements and techniques of cinematic perception and language into a
decontextualized set of tools to be used as an interface to any data. In
the process of this translation, cinematic perception is divorced from its
original material embodiment (camera, film stock), as well as from the
historical contexts of its formation. If in cinema the camera functioned
as a material object, co-existing, spatially and temporally, with the
world it was showing us, it has now become a set of abstract operations.
The art projects described below refuse this separation of cinematic
vision from the material world. They reunite perception and material
reality by making the camera and what it records a part of a virtual
world's ontology. They also refuse the universalization of cinematic
vision by computer culture, which (just as post-modern visual culture
in general) treats cinema as a toolbox, a set of "filters" which can be
used to process any input. In contrast, each of these projects employs a
unique cinematic strategy which has a specific relation to the particular
virtual world it reveals to the user.
        In The Invisible Shape of Things Past Joachim Sauter and Dirk
Lüsenbrink of the Berlin-based Art+Com collective created a truly
innovative cultural interface for accessing historical data about Berlin's
history. [27] The interface de-virtualizes cinema, so to speak, by placing
the records of cinematic vision back into their historical and material
context. As the user navigates through a 3-D model of Berlin, he or she
comes across elongated shapes lying on city streets. These shapes,
which the authors call "filmobjects", correspond to documentary
footage recorded at those points in the city. To create each
shape the original footage is digitized and the frames are stacked one
after another in depth, with the original camera parameters
determining the exact shape. The user can view the footage by clicking
on the first frame. As the frames are displayed one after another, the
shape gets correspondingly thinner.
        In keeping with the already noted general trend of computer
culture towards the spatialization of every cultural experience, this cultural
interface spatializes time, representing it as a shape in a 3-D space. This
shape can be thought of as a book, with individual frames stacked one
after another as book pages. The trajectory through time and space
taken by a camera becomes a book to be read, page by page. The records
of the camera's vision become material objects, sharing the space with the
material reality which gave rise to this vision. Cinema is solidified.
This project, then, can also be understood as a virtual monument to
cinema. The (virtual) shapes situated around the (virtual) city remind
us about the era when cinema was the defining form of cultural
expression -- as opposed to a toolbox for data retrieval and use, as it is
becoming today in a computer.
        Hungarian-born artist Tamás Waliczky openly refuses the
default mode of vision imposed by computer software, that of the one-
point linear perspective. Each of his computer animated films The
Garden (1992), The Forest (1993) and The Way (1994) utilizes a
particular perspectival system: a water-drop perspective in The Garden,
a cylindrical perspective in The Forest and a reverse perspective in The
Way. Working with computer programmers, the artist created custom-
made 3-D software to implement these perspectival systems. Each of
the systems has an inherent relationship to the subject of a film in
which it is used. In The Garden, it is the perspective of a small
child, for whom the world does not yet have an objective existence. In
The Forest, the mental trauma of emigration is transformed into the
endless roaming of a camera through the forest which is actually just a
set of transparent cylinders. Finally, in The Way, the self-sufficiency
and isolation of a Western subject from his/her environment are
conveyed by the use of a reverse perspective.
        In Waliczky's films the camera and the world are made into a
single whole, whereas in The Invisible Shape of Things Past the
records of the camera are placed back into the world. Rather than
simply subjecting his virtual worlds to different types of perspectival
projection, Waliczky modified the spatial structure of the worlds
themselves. In The Garden, a child playing in a garden becomes the
center of the world; as he moves around, the actual geometry of all the
objects around him is transformed, with objects getting bigger as he gets
close to them. To create The Forest, a number of cylinders were placed
inside each other, each cylinder mapped with a picture of a tree,
repeated a number of times. In the film, we see a camera moving
through this endless static forest in a complex spatial trajectory -- but
this is an illusion. In reality, the camera does move, but the
architecture of the world is constantly changing as well, because each
cylinder is rotating at its own speed. As a result, the world and its
perception are fused together.