Augmenting reality without augmenting vision

A common narrative that people tell about virtual and augmented reality (VR and AR) goes something like this: "VR means total immersion in an environment, allowing a game designer to involve you directly in their completely hand-fabricated version of reality. It does this by completely supplanting your field of vision with a simulated 3D environment. AR, on the other hand, only supplants part of your field of vision, allowing overlays of simulated objects and information atop what is otherwise seen normally in the world."

The attentive reader will notice that one sense in particular was heavily emphasized in this explanation: vision. It seems like many people almost take it as a given that supplanting or augmenting reality means changing what we see in a very literal way, and sometimes this idea becomes almost a magic bullet, as though manipulating vision is all it takes to create compelling experiences, as if a more convincing simulation of vision is the main missing piece for telling better stories that center a human interactor. It's for this reason that I've taken to somewhat tongue-in-cheekedly referring to 3D virtual environments as "eye simulators" (thanks @zarawesome) to distinguish them from all the myriad other ways that one could consider rendering, or communicating, a simulation of space, even 3D space (I mean, you can "go up and down" between levels in games where the interface uses conventions from purely textual IF or ASCII-rendered Roguelikes).

Despite my occasional frustration with the hype surrounding VR and "immersive (visual) realism," I believe that constructed, visual virtualities have an awesome potential beyond their current use in games. Recently, Tale of Tales pointed their followers to the Real Time Art Manifesto that they published 10 years ago, and the most interesting part to me is the bit on storytelling, where they actually explicitly reject the idea of using this medium for telling constructed, drama-managed stories:

Embrace non-linearity.
Let go of the idea of plot.
Realtime is non-linear.
Tell the story through interaction.
Do not use in-game movies or other non-realtime devices to tell the story.
Do not create a “drama manager”: let go of plot!
Plot is not compatible with realtime.

Think “poetry”, not “prose”.

The ancient Greek philosopher Aristotle recognized six elements in Drama.
PLOT
what happens in a play, the order of events,
is only one of them.
Next to plot we have
THEME
or the main idea in the work
CHARACTER
or the personality or role played by an actor
DICTION
the choice and delivery of words
MUSIC/RHYTHM
the sound, rhythm and melody of what is being said
SPECTACLE
the visual elements of the work.
All of these can be useful in non-linear realtime experiences. Except plot.

But the realtime medium offers additional elements that easily augment or replace plot.
INTERACTIVITY
the direct influence of the viewer on the work
IMMERSION
the presence of the viewer in the work
AN AUDIENCE OF ONE
every staging of the work is done for an audience of a single person in the privacy of his
or her
home.

Perhaps this issue is not limited to visual realtime art, as it were; perhaps it's simply a reflection of the new-at-the-time, but by now well-established, idea that indeed there is a tension between allowing full manipulation of an environment (visually realized or not) by an interactor and conveying a structured plot with mandatory authorial beats. But I do think it underscores the main theme of this post: visual and narrative suspension of disbelief are not one and the same.

Perhaps that statement seems obvious, but since interactive narrative researchers who grew up on Star Trek positioned the Holodeck as the holy grail for games, it has been surprisingly difficult to disentangle these two things. Let's call it the Holodeck Fallacy.

When you presume the Holodeck Fallacy, it's all too easy to draw the conclusion that if VR aspires to be the Holodeck, then augmented reality should aspire to be the interface from Minority Report.

Microsoft certainly seems to have made this assumption with the HoloLens, and to a lesser extent so has Nintendo.

But recently, I've been much more interested in the ways that our reality can be, and already has been, meaningfully augmented without manipulating one's visual field: specifically, through audio.

Audio as an alternative augmenter

Whenever we talk about augmenting reality, we need to answer two questions:

1) Which part of reality? What is the "default" thing that we expect a human to be doing, in which we are going to intervene with some computational process?

2) What are we augmenting it with?

A really interesting and increasingly well-explored answer to (1) is physical location, as in "location-based games," and to take a really specific example, running for physical exercise. Well, okay, it's interesting to me because it's an activity I happen to enjoy and engage in regularly, but here are some augmented running experiences I have had that fundamentally changed the way I experience that activity:

- Listening to a handcrafted, or algorithmically-generated, playlist of music.
- Using a GPS tracker that occasionally (on which occasion depends on app settings) reports my distance, pace, time, heart rate, and/or other information it can access.
- Playing Zombies, Run!

While the answer to (2) is "audio" for all of these, only the last example also has the answer "story." Largely, until recently, augmentation has been used primarily for adding more factual information to one's day-to-day experience; heck, even a wristwatch could be seen in this light. Collecting data about your geographical trail and repeating it back to you, which in turn may affect your behavior, seems to logically follow from other informational gadgets, but using the same thing for storytelling feels like it's treading new ground.

The way Zombies, Run! works is that, essentially, you are listening to a radio play about a zombie post-apocalypse while you run, which is narrated to you as though you are a character in the story receiving communication from a base through a headset. The story is interspersed with lengths of silence that space it out to match the length of run you specify.
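To make the spacing idea concrete, here is a minimal sketch of one way to spread story clips over a run. I don't know how the app actually does it; the function name `schedule_clips` and the equal-gap strategy are my own assumptions for illustration.

```python
def schedule_clips(clip_seconds, run_seconds):
    """Return (start_time, duration) pairs for each story clip.

    Hypothetical strategy: pad the clips with equal stretches of
    silence so the last clip ends exactly when the run does.
    """
    total_story = sum(clip_seconds)
    if total_story > run_seconds:
        raise ValueError("the story is longer than the run")

    gaps = len(clip_seconds) - 1
    # With zero or one clip there is nothing to space out.
    silence = (run_seconds - total_story) / gaps if gaps > 0 else 0.0

    schedule = []
    t = 0.0
    for duration in clip_seconds:
        schedule.append((t, duration))
        t += duration + silence
    return schedule


if __name__ == "__main__":
    # Three clips (1, 2, and 1 minutes) spread over a 30-minute run.
    for start, duration in schedule_clips([60, 120, 60], 30 * 60):
        print(f"clip starts at {start:.0f}s and lasts {duration}s")
```

A real scheduler would probably vary the gaps or fill them with the runner's own music, but the arithmetic of stretching a fixed story over a chosen duration is the same.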

If that were all there were to it, Zombies, Run! would be nothing but an amusing second-person podcast. Even that does interesting things as augmented reality: it lets your imagination map what you see, along with the other bodily sensations of running, onto narrative events.

There are a few more tricks, however:

1) As you run, you (at random, I think) collect items such as med packs, clothing, and books that, after your run, can be used to build up an in-game base to which you have an interface through the app. This mechanism cleverly separates the "staring at a screen" part of the game from the running part.

2) At your own requested frequency, your run will be peppered with zombie chases: you get a warning that they are a certain distance behind you, and if you pick up your pace, you outrun them. If you don't get your pace up quickly enough, they catch you, and you lose one or more pieces of collected inventory.

This last mechanism is interesting mainly because of how it lets reality affect virtuality, not just the other way around. It's almost a cheap form of "biofeedback" that circumvents sensors by using your GPS position plotted over time as an indirect measure of your physical actions. It's one of the few ways you can actually "interact" with the game as you run, since otherwise your location doesn't really matter.
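As a rough sketch of how GPS positions over time can stand in for a sensor, the snippet below estimates speed from timestamped fixes and decides whether a chase was outrun. This is my guess at the general shape, not the game's actual logic: the `Fix` type, the function names, and the 20% speed-up threshold are all assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, good enough for a sketch


@dataclass
class Fix:
    """One GPS reading: time in seconds, position in decimal degrees."""
    t: float
    lat: float
    lon: float


def haversine_m(a, b):
    """Great-circle distance between two fixes, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def average_speed(fixes):
    """Average speed in m/s over a time-ordered list of fixes."""
    elapsed = fixes[-1].t - fixes[0].t
    if elapsed <= 0:
        return 0.0
    distance = sum(haversine_m(p, q) for p, q in zip(fixes, fixes[1:]))
    return distance / elapsed


def outran_zombies(before, during, required_speedup=1.2):
    """True if speed during the chase beat the pre-chase speed by the
    (hypothetical) required factor."""
    return average_speed(during) >= required_speedup * average_speed(before)


if __name__ == "__main__":
    jog = [Fix(0, 0.0000, 0.0), Fix(10, 0.0003, 0.0), Fix(20, 0.0006, 0.0)]
    sprint = [Fix(20, 0.0006, 0.0), Fix(30, 0.0010, 0.0), Fix(40, 0.0014, 0.0)]
    print("escaped" if outran_zombies(jog, sprint) else "caught")
```

Real GPS traces are noisy, so anything beyond a toy would need smoothing before comparing speeds.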

Of course, one could go further with the idea of adding a feedback line from the player's current situation into a narrative fiction. A story told to you while running but where your own location information played into the narration, and where turning in a particular cardinal direction could affect the course of the story, could be interesting. Such things are already sometimes hand-crafted; one thinks of old museum "audio tours," their extension into self-guided city tours, and more recent projects like Improv Everywhere's flashmob project, "The MP3 experiment."
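For the cardinal-direction idea, a minimal sketch might compute the runner's heading from two consecutive positions and use it to pick the next story beat. Everything here is hypothetical: the function names, the four-way split, and the placeholder branch text are mine, purely for illustration.

```python
import math


def bearing_degrees(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2 (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def cardinal(bearing):
    """Snap a bearing to the nearest of the four cardinal directions."""
    names = ["north", "east", "south", "west"]
    return names[round(bearing / 90) % 4]


# Placeholder story branches, keyed by the direction the runner turned.
BRANCHES = {
    "north": "You head for the hills.",
    "east": "You follow the river.",
    "south": "You cut through the old town.",
    "west": "You double back toward base.",
}


def next_beat(prev_pos, cur_pos):
    """Pick a story branch from two (lat, lon) positions."""
    heading = cardinal(bearing_degrees(*prev_pos, *cur_pos))
    return BRANCHES[heading]


if __name__ == "__main__":
    print(next_beat((0.0, 0.0), (0.0, 0.001)))  # the runner moved due east
```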

What taking this idea further means, then, is coming up with new enumerations of augmentable activities (walking and running, yes -- but what else?), new means of augmenting them, and, to inform their pairings, new ways that these two things might influence one another. How might an audio story change the way that someone traverses a space, and vice versa? How could we use the data available through a mobile device's sensors -- voice, accelerometer, location, elevation -- to influence a response from a helpful guide or a cunning adversary? Could one make an AI version of the narrator from The Stanley Parable that crafts routes for you to follow in any given (well-mapped) location and reprimandingly adapts to your diversions?

In general, I love the idea of a voice speaking in my ear as I move about a space otherwise in solitude -- telling me things about what I am seeing, suggesting avenues for exploration, or augmenting my visual perception with fiction. The last has the power to transform the ordinary or the mundane, perhaps environments that I see every day, into magical objects and spaces, to imbue them with new meaning and appreciate them in a new light. That, to me, is the real appeal of augmented reality, and it's possible -- perhaps even better -- to do it all without a heads-up display.

Augmenting virtuality with people

One of the ways I've been augmenting my runs recently is by listening to podcasts, and this morning I discovered Imaginary Worlds, a podcast about science fiction and fantasy across different media. The first one I listened to was about Then She Fell, a recent immersive theater project by New York's Third Rail Projects.

I've thought and written before about immersive theater, but thinking about it anew in the context of augmented realities made me see connections I hadn't previously. Imaginary Worlds narrator Eric Molinsky comments, with regard to Then She Fell, that what felt compelling to him was the intimacy: the experience of having an actor deliver lines inches from your face, making excruciating amounts of eye contact, while also listening and responding to everything you say and do with the attention and improvisational cleverness that only humans, so far, really know how to provide.

This made me think that people working on immersive theater experiences are really doing a kind of similar thing to what AR designers are trying to do, but they are approaching the objective from opposite directions: where one takes "reality" as primary and then augments it with meaning that comes from something imaginary, a virtuality, the other takes the fiction, the "virtuality," as primary and substitutes a standard literary figure -- a character in the story -- with something from "reality," namely a guest to the production who doesn't know how the story will play out. Immersive theater replaces stage makeup and exaggerated drama with the intensity our brains generate when a real live person in front of us is expecting interaction.

Molinsky notes that one thing he didn't like about Then She Fell was the ambiguity, or perhaps even under-thought quality, of the "audience character:" "I didn't know who I was supposed to be," he says. In other words, what is the audience member's role in the story? In the framing of augmented virtuality, that this was experienced as a failure mode makes perfect sense: while the flow of information from story to interactor is well-established, because humanity has plenty of examples to follow for how that direction works, the other direction of taking unpredictable interactions and reflecting them into story meaning has only video games as a guide. And in that case, linguistic and performative interfaces have been purposely limited, because otherwise, a process wouldn't know how to handle them -- unlike a human actor.

Incidentally, just last night I finally got around to playing Jason Rohrer's Sleep is Death.

Sleep is Death also plays this "augmented virtuality" trick, but with video games as a starting point rather than theater, such that the "typically automated" function substituted with human choice, rather than a character reading lines in a play, is the game itself. Sleep is Death substitutes the uniformity of pre-programmed game responses with on-the-fly, human responses to player-typed dialogue or actions. The game is networked two-player: one of you is the player and one the controller. After each time the player does some action (which the controller can see), the controller has a limited time window in which they can swap out scenery or sprites, type a response in a speech bubble, or provide some other game-like response. It's possible to, like a producer of a play, spend a good long while constructing your scenes before allowing an audience, but then all the dynamic action (including dialogue and switching scenes) is up to improvisation.

If you, like me, find yourself too impatient with the controller interface to experience the game first-hand, you can look at some of the "flipbooks" generated from play on the game's website, which offer some insight.

To reiterate: the main point of this post is that visual suspension of disbelief is neither necessary nor sufficient for narrative suspension of disbelief, and I worry that in the (well-deserved!) river of attention being poured into visual augmentation and VR, we are in some danger of conflating breakthroughs in these technologies with breakthroughs in storytelling. I would like to see attention paid also to the ways that other senses (audio, haptic, olfactory, gustatory, proprioceptive...?) can augment fictional experiences, as well as to the role of social play, i.e. the potentially transformative role that other real-live humans have to play in shaping these experiences, whether at a safe internet distance, inches from your face, or some virtually-distorted mediation between the two.
