Programming interactivity with substructural logic
This will be the first of four posts in which I attempt to organize my thoughts about an idea for a project. It's related to logic programming, interactive fiction, and the OBT talk I gave this past January, but I'm trying to extend the ideas beyond interactive fiction -- which represents one particular interaction pattern -- into more general process-calculus-ish terms.
Rob Simmons, who just defended a thesis that I take a great deal of inspiration from, has a good primer on substructural logic programming over on his blog. To review briefly, the key concepts I'm building on are:
- logic programming, in which a program is specified by declaring several atomic base types and a collection of logical propositions describing how they relate to one another, and a particular proof search strategy determines the semantics of its execution;
- forward-chaining (or bottom-up) logic programming, a particular proof-search (i.e., execution) strategy in which a collection of base facts evolves independently of any goal, halting when no new information can be learned;
- linear (or otherwise substructural) logic programming, in which the propositions making up a program are drawn from a setting where we deal with resources instead of facts. A rule "a -o b" can be read as "replace a with b" rather than "if we know a, add b to the database of things we know, keeping a."
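To make the forward-chaining, resource-consuming reading concrete, here is a tiny simulation in Python (not an actual logic language; the rule representation is purely illustrative): the state is a multiset of atoms, each rule consumes its left-hand side and produces its right-hand side, and execution runs until quiescence.

```python
from collections import Counter

def step(state, rules):
    """Fire the first applicable rule, consuming its premises.

    A real engine would choose among applicable rules
    nondeterministically; picking the first keeps the sketch short.
    """
    for lhs, rhs in rules:
        if all(state[atom] >= n for atom, n in lhs.items()):
            # Linear reading of "lhs -o rhs": replace lhs with rhs,
            # rather than adding rhs while keeping lhs around.
            return (state - lhs) + rhs
    return None  # quiescence: no rule can fire

def run(state, rules):
    """Forward-chain to saturation: step until no rule applies."""
    while (nxt := step(state, rules)) is not None:
        state = nxt
    return state

# "a -o b" and "b -o c", read "replace a with b", "replace b with c"
rules = [(Counter({"a": 1}), Counter({"b": 1})),
         (Counter({"b": 1}), Counter({"c": 1}))]
print(run(Counter({"a": 2}), rules))  # Counter({'c': 2}): every a became a c
```

Under the ordinary persistent reading, the same rules would instead saturate with a, b, and c all present; linearity is what makes the state genuinely evolve.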
The thing I especially like about the forward-chaining interpretation of logic programming is that it makes what seems like a static, declarative description of a system come alive. Each step of program execution is a soup of state, some pieces active, some dormant, and the program effectively describes a pattern-match on that state, rolling it forward in ways that aren't completely specified by the programmer; behavior arises from emergent configurations and nondeterministic choices. This seems like a programming setting that's ripe for writing physical simulations, animations, and the "procedural generation" of content that's so popular in gaming these days. The fact that we can't do this already is, perhaps, mostly a question of tool support -- hooking up some visualization to a trace of a substructural logic language like Celf or Ollibot -- but I think there are still some important theoretical concerns, especially if we as programmers want, e.g.,
- to be able to specify different rendering modes;
- to be able to handle some external control over the proof search/nondeterministic choices from, say, a human;
- to specify these systems in modular and composable pieces.
A very abstract process signature
To understand where I'm trying to go, especially on that last point, consider how you might specify the interface in, say, Standard ML, to a module representing an interactive process.
signature Process = sig
  type el
  type il
  val listen : el -> il
  val think : il -> il
  val reply : il -> el
end
At a very abstract level, what any interactive agent (think: a REPL, a turn-based game, a web server, a human in a conversation) has to do is listen (translate from what they are hearing from some other agent into their own mental model), think (perform some internal processing on the data), and reply (turn the evolution of their thoughts back into something the opposing agent can understand).
This is not too different from the signature I first wrote for Functioning, a little framework for writing games in Standard ML using tom7's graphics and physics library ports (of SDL/ML and Box2D). (Its name is a pun on Processing.) In a game setting, listen polls for keyboard and mouse events, think handles those events in terms of the game logic/physics engine, and reply renders graphics. The "el" (external language) is user input, and the "il" (internal language) is the game's data structures.
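To make the intended use of this signature concrete, here is a sketch of the interaction loop in Python (the Shout example and the frame function are hypothetical illustrations, not part of Functioning or any actual framework): one "frame" of interaction composes listen, think, and reply to map external input to external output.

```python
class Shout:
    """A toy interactive agent: hears text, thinks, replies louder.

    Mirrors the SML signature: listen : el -> il, think : il -> il,
    reply : il -> el, where el is raw text and il is internal data.
    """
    def listen(self, e):   # external language -> internal language
        return e.strip()

    def think(self, i):    # internal processing step
        return i.upper()

    def reply(self, i):    # internal language -> external language
        return i + "!"

def frame(process, e):
    """One turn of the interaction: el -> el."""
    return process.reply(process.think(process.listen(e)))

print(frame(Shout(), "  hello  "))  # HELLO!
```

In the game setting, listen would instead poll input events, think would step the game logic, and reply would render a frame.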
The problem with this model is that a game is a thing with a lot of internal state. It's got to handle screen positions of various things, check for collisions, and do all kinds of micromanagement, in addition to tracking the state of the overarching narrative -- the coarser-grained state transitions that, say, move the player to the next level. In Standard ML, you can either try your darnedest to be purely functional and cram all that stuff into one giant (possibly nested) record representing the "il" type, or you can turn to good ol' global references and do things the imperative way.
I feel like linear logic programming hits a certain sweet spot in terms of handling state. You get a sort of "frame rule" (in the separation logic sense) where you don't have to mention state you're not changing, but you're still able to describe your game logic in a declarative, appropriately-high-level way.
Props-as-signals
For a tiny example, I'm going to return to good ol' switch toggling. Consider the following CLF (which will evolve into pseudo-CLF) program:
state : type.
on : state.
off : state.
switch : state -> type.
t1 : switch on -o {switch off}.
t2 : switch off -o {switch on}.
If we query the program with some initial state for switch, it will never saturate and so never terminate. But what we have in mind is that the state should evolve only when something interactively probes it. We can capture this by tossing in an extra token to represent a "signal" from an external source.
sigtoggle : type.
t1 : sigtoggle * switch on -o {switch off}.
t2 : sigtoggle * switch off -o {switch on}.
Now a rule can fire only when a sigtoggle enters the context. How might we specify some pattern of toggle signals to send to this program? Perhaps all we want to say is that there's some stream of them being generated arbitrarily often, which we might write like
ext : 1 -o {sigtoggle}.
Or maybe we want to synchronously wait for a reply. This requires globally modifying the program:
sigtoggle : type. sigtoggled : type.
t1 : sigtoggle * switch on -o {switch off * sigtoggled}.
t2 : sigtoggle * switch off -o {switch on * sigtoggled}.
ext : sigtoggled -o {sigtoggle}.
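To see that this does what we want, here is a small simulation of the synchronous program in Python (not CLF; the atom names and the consume/produce encoding of rules are mine, purely for illustration): the state is a multiset of atoms, and we log the switch's state after each step.

```python
from collections import Counter

# t1, t2, and ext from the synchronous program, as (consume, produce)
RULES = [
    (Counter({"sigtoggle": 1, "switch_on": 1}),
     Counter({"switch_off": 1, "sigtoggled": 1})),   # t1
    (Counter({"sigtoggle": 1, "switch_off": 1}),
     Counter({"switch_on": 1, "sigtoggled": 1})),    # t2
    (Counter({"sigtoggled": 1}),
     Counter({"sigtoggle": 1})),                     # ext
]

def trace(state, steps):
    """Run a bounded number of steps (the loop never quiesces)."""
    log = []
    for _ in range(steps):
        for lhs, rhs in RULES:
            if all(state[a] >= n for a, n in lhs.items()):
                state = (state - lhs) + rhs  # consume lhs, produce rhs
                break
        log.append("on" if state["switch_on"] else "off")
    return log

print(trace(Counter({"sigtoggle": 1, "switch_on": 1}), 4))
# ['off', 'off', 'on', 'on']: each toggle now takes two steps,
# one for the switch's transition and one for the ext waypoint.
```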
This synchronous program can run in CLF with the desired semantics, which of course are no different from the simple infinitely-toggling, single-"process" program we started with, except that we've expanded the loop to include a waypoint. Already, there's something unsatisfying about this: it tangles up the listen, think, and reply phases. What we started with -- the act of transitioning the switch between on and off -- was the think phase, munging bits of internal state that we don't necessarily intend anyone we're communicating with to make any sense of. The signals are the only things we want to cross that boundary. So what I'm imagining is being able to draw a boundary around each of these subprograms, in maybe the following syntax:
proc Switch (sigtoggle -o {sigtoggled}) =
  t1 : switch on -o {switch off}.
  t2 : switch off -o {switch on}.
end

proc Toggle (sigtoggled -o {sigtoggle}) =
  ext : 1 -o {1}.
end
I intend this to have the same semantics as the preceding program, but with the distinction that
proc Toggle (sigtoggled -o {sigtoggle}) =
  ext : switch X -o {1}.
end
would be a scope error, because switch isn't in scope for the Toggle process. The types in parentheses represent the way processes are allowed to communicate, i.e. the communication graph.
Now suppose the switch does want to communicate its state to the external process. How could we specify this?
reply : state -> type.
proc Switch (sigtoggle -o {reply X}) =
  t1 : switch on -o {switch off * reply off}.
  t2 : switch off -o {switch on * reply on}.
end
This isn't quite right, because now the specific reply must be mentioned in the internal rule in addition to being specified as an output at the interface level. I could see a way of elaborating it that does what I have in mind, but it no longer feels canonical.
What I eventually want to be able to do is:
- be able to separately compile individual processes, configure them to connect signals to different UNIX I/O ports (including stdin), and run them together
- specify configurations that can e.g. receive or send from many different instances of the same process specification
- have linguistic tools for reasoning about these interactions, e.g. by way of Twelf-style directives or metatheoretic analyses, to prove statements of the form "So long as process p is always going to send another signal s, the program satisfies (liveness/termination/deadlock freedom)."
The last bullet is probably the most theoretical angle (and therefore the one that most appeals to me); I want there to be something as clear, universal, and simple as progress and preservation, but for these partial specifications in logic languages. Something like Rob's generative invariants would probably be a good starting place, but of course the fact that these specifications are partial means we need to reason with respect to the properties of anything that can hook up to them.
(Meta) external inputs...
I've also been having a bunch of meetings with people, specifically Will Byrd, Frank Pfenning, Rob Simmons, Umut Acar, and Elsa Gunter. I learned that Umut has some ideas about enabling this kind of interactivity from a self-adjusting/dynamic computation standpoint and linking it to Functional Reactive Programming. Frank and Elsa have a grant to work on making a session-typed language practical for use, and while session types may be the wrong model for exactly what I have in mind, certainly some of the core ideas will be relevant (and good example fodder, e.g. a prime number sieve would be a neat thing to try to specify in terms of modular processes).
Elsa in particular has looked at human/computer interaction examples, and one thing she said that I thought was interesting was that in most cases she's looked at, the human input need only be thought of as a base-type value or an instance of an enumeration -- which immediately made me think of how the whole reason interactive fiction is way cooler than CYOA is that your choices are infinite: they come from a certain well-defined recursive type, but one with infinitely many elements.
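That contrast can be sketched in, say, Python types (all the names here are hypothetical, just for illustration): a CYOA choice inhabits a finite enumeration, while an IF command inhabits a recursive datatype with infinitely many values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CyoaChoice(Enum):
    """A CYOA input: one of finitely many page choices."""
    TURN_TO_PAGE_12 = 12
    TURN_TO_PAGE_47 = 47

@dataclass
class Obj:
    """An object description; objects can contain objects, so this
    type already has infinitely many inhabitants."""
    name: str
    inside: Optional["Obj"] = None

@dataclass
class Take:
    """One IF command form, e.g. take (the lamp in the box in ...)."""
    obj: Obj

cmd = Take(Obj("box", Obj("lamp")))  # nesting can go arbitrarily deep
```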
Anyway, suggestions for further reading are welcome.
That sounds really cool! I am excited about your research and would love to alpha test any tools you come up with :D
I'm glad you mentioned FRP; my officemate gave a talk on it a couple of weeks ago (that's where most of his research is), and I was strongly reminded of substructural logic programming during the talk. I forget exactly what it was that reminded me of it, but there was something there, I promise!
I would like to make the argument that HCI itself is but a branch of programming languages research. The very idea of "human-computer interaction" is nothing other than a fancy word for "programming". And programming, being an explanatory activity, is inherently linguistic in nature. The ideas you sketch here may be seen as a step towards a fuller theory of programming that encompasses more explicitly the demands usually called "hci".
btw, what is CYOA?
ReplyDelete"choose your own adventure" -- a term from the interactive narrative community usually used to contrast with what they do.
I am reminded of a few tangents, which I will describe badly because it is late at night. I will attempt to clarify in the future.
A few tangents:
- Bret Victor gave a seminar talk at Adobe where he demoed a sort of geometric pattern matching to create behavioral animations. (Interesting, maybe, because of how these geometric pattern matches immediately handle continuous values, without looking like math.)
- Which actually brings me to my second tangent, which is that it seems like thinking about sloppier (inexactly-represented, real-valued) state makes sense in a game/UI context; cramming this into logic isn't something that I've seen, but I'm not a PL person.
- One way of looking at linear logic is that it makes parallel execution "automatic"; this reminds me of http://www.dangermouse.net/esoteric/whenever.html, and of Alice's "Do In Order"/"Do Together" constructs. This mode works well when you are working with a world-like game featuring objects, which is maybe all games ever.
- As I think I said earlier, I'm reminded of PureData / MAX by this -- box/wire languages where safety is guaranteed because you can't plug the wrong wire into the wrong box, and you get some (apparent) parallelism by the arrangement and wiring of boxes. [Not actually quite true; Pd is kinda dynamic, but boxes with the wrong wire connected just don't do anything. MAX I don't know. I have a more-statically-typed-ish wire programming language in Preshack.] Also interesting to compare the way that Pd's messages are consumed to linear logic/pattern matching. Oh, worth noting the difference between Pd ("message passing") and, say, maya's hypergraph or blender's shader networks ("dataflow").
- Related to Pd, I have a prototype somewhere of a Pd-like language where messages are passed by throwing physical objects into a simulation; in this case, messages of improper type just bounce off of the receiver, and fly out into the world to mess with other signal processing. Wonderfully chaotic.
I've been implementing a language for GUI programming with linear types and temporal modalities, along the lines of my ICFP paper from 2011, and I find it a very nice style of programming. It's in a functional, rather than a logic programming, style, so it does not directly line up with your ideas.
However, one observation that I've made is that now that the right types are available, a lot of programs go from being higher-order programs to being first-order programs. That is, you don't have to write callbacks or reason explicitly about event loops any more -- instead you write recursive functions on streams and reason (co)inductively about them. This is a big win for comprehensibility, IMO. (It also lets me advocate for FP on the grounds that it lets you eliminate higher-order functions, which sounds paradoxical enough to get the attention of the audience.)
The first-orderness suggests that interactive programs are indeed a ripe target for hitting with the logic programming hammer. I'll just suggest that you might find explicit temporal modalities useful as well.
Also, if you want to see a compiler, get in touch with me and I'll send you my ML code. (I'm delaying the public release until I finish adding higher kinds and index refinements -- right now it just has F-style polymorphism.)
There's some work from Edinburgh on using Linear Logic proof search to plan dialogue between agents that looks relevant:
http://dream.inf.ed.ac.uk/projects/dialoguell/