
Generativity & interpretation: a study of generated comics

In his classic comics theory book Understanding Comics, Scott McCloud introduces six kinds of panel transitions: moment-to-moment, action-to-action, subject-to-subject, scene-to-scene, aspect-to-aspect, and non sequitur.
Since my last reading of the book, I've been curious whether these transition types can be operationalized toward any of the following goals, approximately ordered from "most human effort needed" to "least human effort needed":
  • A Dadaist/Oulipian collaborative cartooning game where players take turns rolling a six-sided die and drawing a panel on a shared piece of paper according to the transition type determined by the die roll.
  • "Fridge poetry" for comics: come up with a fixed set of panels that link together via the different transition types, then let humans decide how to order them.
  • A board game in which each player has a "hand" of panels, as well as some goals that align with good global comic construction; play uses the same die-roll mechanism as in the first idea.
  • A mixed-initiative digital comic creation tool in which the system suggests possible next panels based on transition types, and the human selects and modifies these panels.
  • A comic-generating program that creates abstract "comic specifications" by automating next panel selection, and lets a human render the comic concretely.
  • A fully-automated comic generator that does all of the above plus visual element placement and rendering.
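The die-roll mechanic shared by the first and third ideas is simple enough to sketch in code. Here is a hypothetical Python version (the face-to-transition mapping is my own assumption; any fixed assignment would do):

```python
import random

# McCloud's six transition types, one per die face (assumed ordering).
TRANSITIONS = [
    "moment-to-moment",
    "action-to-action",
    "subject-to-subject",
    "scene-to-scene",
    "aspect-to-aspect",
    "non sequitur",
]

def roll_transition(rng=random):
    """Roll a six-sided die and return the transition type the next panel must use."""
    return TRANSITIONS[rng.randrange(6)]
```

In the collaborative game, each player would call something like `roll_transition()` on their turn and draw a panel satisfying the result.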
A few days ago, I did an experiment in which I made some panels out of index cards and a combination of two sticker packs, then used die rolls to select the first panel and each subsequent one. At first, I tried a straightforward application of McCloud's transition types, which meant doing a lot of human work: interpreting panel sequences in particular ways and adding modifiers/emotes to make that meaning more visible. Here are the first few generated results:

The nice thing about comics (especially when wordless) is that they can be understood as telling stories through a very simple language of visual elements and their spatial relationships to one another, e.g. their relative size, rotation, horizontal and vertical juxtaposition, and the distance between them. Of course, by using robot stickers with humanoid faces (and further augmenting them with emotes), the human brain of the reader fills in some of the semantic gaps that would otherwise be impossible to resolve. Human brains' ability to fill in gaps is also why comics are simpler than animation in this respect: animations are expected to provide continuous motion between frames, whereas two comic frames need only be plausibly connected by some narrative justification. And that's where transition types come in: when you exclude non sequiturs, they constrain the space of next panels to ones that "make sense."

Figuring out how to tell a computer which concrete meanings to apply to an arrangement of visual elements seems like a deep and difficult problem, so I decided to see if I could sidestep it and solve the simpler problem of telling a computer how to generate abstract arrangements of visual elements according to panel transition types. To do so, I came up with the following terms:

Visual element (VE): unique identifier from an infinite set, mappable to visually distinct image components, such as anthropomorphic "characters," scenery, and geometric shapes.

Frame: a named panel outline dictating the minimum number of visual elements required to fill it in, e.g. "give" requires three visual elements (a giver, a gift, and a giftee). The frame should contain instructions for visual rendering, e.g. an image with three holes for the spatial positions of each element.

Panel: a frame with its holes filled by visual elements, and optionally some additional VEs (e.g. observers carried over from previous panels).

Modifier: visual details overlaid on frames and VEs to add semantic coherence to the comic, such as floating emotes, facial expressions, motion lines, word balloons, and other text.
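As one way these terms might translate into data structures, here is an illustrative Python sketch (the class shapes and example names are my own, not the program described below):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Frame:
    """A named panel outline with a fixed number of VE holes."""
    name: str
    nholes: int

@dataclass
class Panel:
    """A frame whose holes are filled by visual elements (VEs, here integer ids),
    plus optional extras such as observers carried over from earlier panels."""
    frame: Frame
    elements: list                                  # VE ids, at least nholes of them
    modifiers: list = field(default_factory=list)   # e.g. "emote:heart", "motion-lines"

# Example: the "give" frame needs a giver, a gift, and a giftee.
give = Frame("give", 3)
panel = Panel(give, elements=[0, 1, 2], modifiers=["emote:heart"])
```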

McCloud's transitions can (mostly) be made sense of in these terms:

Moment: panel i+1 has the same frame and VEs as panel i but different modifiers.
Action: panel i+1 has the same VEs (give or take) as panel i but a different frame and modifiers.
Scene: different frame, VEs, and modifiers.
Subject: panel i+j+1 shows VEs from panel i in the same or a similar frame to panel i+j.
Aspect: panel i+1 shows a subset of panel i's VEs together with new VEs.

But of course, these transition types are designed for human interpretation, not machine generation, and there's still a considerable amount of gap-filling to do: what distinguishes an "aspect" change from an "action" change other than the interpretation of different visual elements being part of the same space vs. part of the next step in time? What distinguishes a scene change from a non sequitur unless the new scene is eventually revealed to connect with the previous one? And furthermore, there's a lot of nondeterminism in when visual elements are allowed to join or leave the narrative, and when new ones can be generated.

So I came up with a more machine-friendly set of panel transitions:

Moment: keep VEs, change frame and/or modifiers.
Add: introduce a VE that didn't appear in the previous panel (but might have appeared earlier).
Subtract: remove a VE from the previous panel (and potentially choose a new frame).
Meanwhile: choose a new frame and only show VEs that didn't appear in the previous panel, generating new VEs if necessary.
Rendez-vous: choose a new frame and fill it with any combination of previously-appearing VEs. Generate new VEs only when there aren't enough previous VEs to fill the frame.

Finally, I also introduced an End transition to allow the generated comic strip to terminate.
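One appeal of this set is that each transition's constraint on the next panel can be stated mechanically, as a pool of eligible VEs. Here is my own formalization of the definitions above as a Python sketch (Add may additionally mint a brand-new VE, and Meanwhile/Rendez-vous mint fresh VEs when the pool is too small to fill the chosen frame):

```python
def candidate_pool(transition, just_prior, all_prior):
    """Previously seen VEs eligible for the next panel, per transition.
    just_prior: VEs in the previous panel; all_prior: every VE seen so far."""
    prior_only = [v for v in all_prior if v not in just_prior]
    return {
        "Moment":     list(just_prior),   # exactly the same cast
        "Add":        list(just_prior),   # plus one from prior_only (or fresh)
        "Subtract":   list(just_prior),   # minus one, chosen at random
        "Meanwhile":  prior_only,         # only VEs absent from the last panel
        "RendezVous": list(all_prior),    # any combination of old VEs
        "End":        [],                 # terminate with a blank panel
    }[transition]
```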

After a couple more paper prototype tests, I wrote an ML program to generate abstract comics in this form, e.g.

- ComicGen.gen [] 2;
val it =
  : (ComicGen.panel * ComicGen.transition) list
The bit of the program that interprets transitions is:

case transition of
    Moment =>
      (* same cast, possibly a new frame and modifiers *)
      let val {name, nholes} = pickFrame currentNVEs
      in ({name=name, elements=justPrior}, totalNVEs) end
  | Add =>
      (* bring in one VE absent from the previous panel, minting it if needed *)
      let val unused = nonmembers justPrior allPrior
          val howManyNew = 1 (* Random.randRange (1,2) rand *)
          val {name, nholes} = pickFrame (currentNVEs + howManyNew)
          val (new_elts, new_total) = pickRandomVEs unused howManyNew totalNVEs
      in ({name=name, elements=new_elts @ justPrior}, new_total) end
  | Subtract =>
      (* drop a random VE; fall back to an empty frame when none remain *)
      if List.length justPrior > 0 then
        let val nVEs = currentNVEs - 1
            val {name, nholes} = pickFrame nVEs
            val elts = removeRandom justPrior
        in ({name=name, elements=elts}, totalNVEs) end
      else
        let val {name, nholes} = pickFrame 0
        in ({name=name, elements=[]}, totalNVEs) end
  | Meanwhile =>
      (* new frame filled only with VEs absent from the previous panel *)
      let val skipVEs = nonmembers justPrior allPrior
          val {name, nholes} = pickRandomFrame ()
          val (elts, newTotal) = pickRandomVEs skipVEs nholes totalNVEs
      in ({name=name, elements=elts}, newTotal) end
  | RendezVous =>
      (* new frame filled with any previously seen VEs *)
      let val {name, nholes} = pickRandomFrame ()
          val (elts, newTotal) = pickRandomVEs allPrior nholes totalNVEs
      in ({name=name, elements=elts}, newTotal) end
  | End => ({name="blank", elements=[]}, totalNVEs)
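For readers who don't read ML, here is a loose Python sketch of the same transition interpreter plus the driver loop around it. Everything concrete here (the frame vocabulary, the uniform choice of transitions, forcing an opening Meanwhile to mint the initial cast) is my invention, not the original program:

```python
import random

# Invented frame vocabulary: name -> number of VE holes.
FRAMES = {"blank": 0, "alone": 1, "close-up": 1, "meet": 2, "chase": 2, "give": 3}
TRANSITIONS = ["Moment", "Add", "Subtract", "Meanwhile", "RendezVous"]

def pick_frame(nholes, rng):
    """A frame with exactly nholes holes, falling back to 'blank'."""
    fits = [name for name, k in FRAMES.items() if k == nholes]
    return rng.choice(fits) if fits else "blank"

def take_or_mint(pool, k, total, rng):
    """Take up to k VEs from pool, minting fresh integer ids as needed."""
    pool = list(pool)
    rng.shuffle(pool)
    out = pool[:k]
    while len(out) < k:       # not enough old VEs: generate new ones
        out.append(total)
        total += 1
    return out, total

def step(transition, just_prior, all_prior, total, rng):
    """Interpret one transition; returns ((frame_name, VE list), new_total)."""
    if transition == "Moment":
        return (pick_frame(len(just_prior), rng), list(just_prior)), total
    if transition == "Add":
        pool = [v for v in all_prior if v not in just_prior]
        new, total = take_or_mint(pool, 1, total, rng)
        elems = new + list(just_prior)
        return (pick_frame(len(elems), rng), elems), total
    if transition == "Subtract":
        elems = list(just_prior)
        if elems:
            elems.pop(rng.randrange(len(elems)))
        return (pick_frame(len(elems), rng), elems), total
    if transition == "Meanwhile":
        frame = rng.choice([n for n, k in FRAMES.items() if k > 0])
        pool = [v for v in all_prior if v not in just_prior]
        elems, total = take_or_mint(pool, FRAMES[frame], total, rng)
        return (frame, elems), total
    if transition == "RendezVous":
        frame = rng.choice([n for n, k in FRAMES.items() if k > 0])
        elems, total = take_or_mint(all_prior, FRAMES[frame], total, rng)
        return (frame, elems), total
    return ("blank", []), total  # End

def gen(max_panels, rng=None):
    """Generate up to max_panels (panel, transition) pairs, stopping early on End."""
    rng = rng or random.Random()
    total, just_prior, all_prior, strip = 0, [], set(), []
    for i in range(max_panels):
        t = "Meanwhile" if i == 0 else rng.choice(TRANSITIONS + ["End"])
        panel, total = step(t, just_prior, all_prior, total, rng)
        strip.append((panel, t))
        if t == "End":
            break
        just_prior, all_prior = panel[1], all_prior | set(panel[1])
    return strip
```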
I did a few hand-renderings of these generated strips:

Later, I also wrote a Ceptre version of the generator, mostly just for the comparison exercise. My current conclusions: the Ceptre version is indeed more concise (especially when putting aside the re-implemented arithmetic and basic datatypes), but it was quite a bit more difficult to get bug-free. (If only there were some analog of types for generative logic programs...)

If I continue working on this project, I plan to port my ML code to JavaScript and write a panel renderer so that I can let people play with the generator in a browser. If anyone wants to scoop me for this step, though, please feel free, since this is not my primary research project and I should probably move on from it. :)

Theoretically speaking, there's already a fair amount to reflect on here. I'm used to taking a simulationist approach to narrative generation, i.e. modeling an action possibility space for virtual agents and letting action descriptions constrain the generated artifact. With comic generation, I'm struck by how much the usual nonsensical output of Markov chaining is mitigated by prioritizing referral to previous visual elements, and how "narrative-feeling" these generated panel sequences manage to be.

Mattie Brice has written about the strange lack of interpretive components to games, pointing to the Tarot as an example of a practice that does centralize interpretation within a generative system. (Tarot in particular could be an interesting system to try to operationalize for narrative generation due to two-dimensional "spreads" symbolizing more complex relationships than simple temporal sequentiality.) Divination systems have an established link to generative stories through Nick Montfort's observations of the I Ching and Llull machine being pre-digital examples of text generators. And Mitu Khandaker-Kokoris spoke about two understandings of "immersion," one which is the typical VR fantasy, and the other of which comes from human brains filling in gaps left by sufficiently agentive-seeming abstract rule systems.

I feel like the current mainstream of game design and PCG is so literalist, measuring the effectiveness of play or generated artifacts in terms of how immediately legible they are, while arbitrarily privileging other forms of difficulty (spatial reasoning, twitch reflexes, etc.). In contrast, I find the recurring themes of meaning-making as a mechanic really exciting, conceptualizing human engagement with a system as the effort of interpretative meditation, and embracing that phenomenon as an alternative to other metrics for "flow"/"immersion"/"fun."


  1. This is a really cool project and I think the ICCC community would love it, by the way (deadline's February and it's in Paris!)

    The last paragraph is interesting to me - I think I agree with what you're saying, and I would love to do more work in that space, but personally I feel constrained already by how evaluation-light my work is, and I feel obliged to do work that is easier to do (lazy, meaningless but reviewer-pleasing) evaluation on. It's a sad realisation (the papers I'm drafting for next year are all of this form, in fact).

    (This is @mtrc btw, it took me forever to sign in to comment >_>)

