Monday, March 25, 2013

Modeling gameplay in Celf, Part 3

(This is another iteration of the example I developed in Part 1 and Part 2, but barring incrementally understanding the code, I think this post is relatively self-contained. Celf-contained, if you will.)

When I took a simple choice-based ("CYOA") game with a few bits of inventorial state and tried to add handles onto the rules so as to specify a specific sequence of player choices, something interesting happened: I had to make new decisions about which parts of the game the player could control, and how. For instance, whether they win or get eaten by a grue depends on a prior choice to take the lamp from the den or not; they cannot control their fate after that point. This makes clear that "getting the lamp" and "opening the door" are player-facing game controls, whereas "getting eaten by a grue" is a choice made by the game. We wound up enumerating those actions as follows.

'start : action.
'opendoor : action.
'getlamp : action.
'getkey : action.
'starttoden : action.
'starttocellar : action.
'dentocellar : action.
'cellartodoor : action.
'cellartoden : action.

It's tempting, then, to give the player a generalized, combinatorial command language, rather than a finite set of available actions, like so:

'startat : room -> action.
'open : object -> action.
'get : object -> action.
'moveto : room -> action.

For this version (which includes a few other small syntactic changes) the game rules look like this:

start_to_den    : cur_act ('startat den) * at_start -o {at_den * tick}.
start_to_cellar : cur_act ('startat cellar) * at_start -o {at_cellar * tick}.

den_to_cellar : at_den * cur_act ('moveto cellar) -o {at_cellar * tick}.
den_to_lamp   : at_den * cur_act ('get key) * ~got key -o {at_key}.
den_to_key    : at_den * cur_act ('get lamp) * ~got lamp -o {at_lamp}.
get_key       : at_key -o {got key * at_den * tick}.
get_lamp      : at_lamp -o {got lamp * at_den * tick}.

cellar_to_den  : at_cellar * cur_act ('moveto den) -o {at_den * tick}.
cellar_to_door : at_cellar * cur_act ('open door) -o {at_door}.

open_door_without_key : at_door * ~got key -o {at_cellar * ~got key * tick}.
open_door_with_key    : at_door * got key -o {at_dark}.

dark_with_lamp    : at_dark * got lamp -o {at_win}.
dark_without_lamp : at_dark * ~got lamp -o {at_lose}.

Then I started to wonder if I could recover the "fuzz testing" abilities from the original, branching-choice version of the game: could I still use Celf's logic programming engine to randomly "play" the game?

So I replaced this rule, which pulls a next action from a sequential table

next_act : tick * cur N * nth_act N A -o {cur_act A * cur (s N)}.

with this one:

player : tick * cur N -o {cur (s N) * (Pi a:action.cur_act a)}.

...and wasn't optimistic. The Pi a:action part within the forward-chaining monad generates a template cur_act in the context that can be instantiated with any action. Na├»vely, what I thought would happen is that forward chaining would instantiate cur_act at non-applicable actions all over the place, meaning that queries on end states would most of the time fail (the game would reach stuck states).

Thinking about this more, in terms of focusing behavior and by analogy with A -> B, a rule generating Pi x:A.B ought to keep the Pi in focus, forcing a choice of A (e.g. action). But since the proposition in question is actually a type, and depends upon the particular derivation of it, I suspect (as suggested, but glossed over, in Frank's course notes) that it's generating a fresh unification variable that will remain unresolved until further constraints are introduced. In this sense, it sort of gives Pi a more positive character than ->.

 The upshot is that my #query * * * 50 init -o {report END NSTEPS} generates 50 pretty solutions, some winning and some losing, a shorter example of which looks something like this:

Solution: \X1. {
    let {[X2, [X3, [X4, [X5, X6]]]]} = X1 in 
    let {[X7, X8]} = player [X6, X5] in 
    let {[X9, X10]} = start_to_den [X8 !('startat !den), X4] in 
    let {[X11, X12]} = player [X10, X7] in 
    let {X13} = den_to_lamp [X9, [X12 !('get !key), X2]] in 
    let {[X14, [X15, X16]]} = get_key X13 in 
    let {[X17, X18]} = player [X16, X11] in 
    let {X19} = den_to_key [X15, [X18 !('get !lamp), X3]] in 
    let {[X20, [X21, X22]]} = get_lamp X19 in 
    let {[X23, X24]} = player [X22, X17] in 
    let {[X25, X26]} = den_to_cellar [X21, X24 !('moveto !cellar)] in 
    let {[X27, X28]} = player [X26, X23] in 
    let {X29} = cellar_to_door [X25, X28 !('open !door)] in 
    let {X30} = open_door_with_key [X29, X14] in 
    let {X31} = dark_with_lamp [X30, X20] in 
    let {X32} = report_win [X31, X27] in X32}
 #END = w
 #NSTEPS = s !(s !(s !(s !(s !z))))

...which demonstrates (thanks to the ' syntactic markers) how the "player AI" chose to instantiate the universal quantification in a goal-directed way to satisfy the rule, with exactly the random-but-constrained character as before.

So I think that's pretty neat.

 (Code here.)

1 comment:

  1. Rob Simmons pointed out to me that a rule like

    player : tick * acts (cons A As) * cur N
    -o {cur_act A * acts As * cur (s N)}.

    ...and a query seeded with "acts As" for an unbound As also works this way, unifying As with the list of player commands as a separate term. Huh.