Wednesday, June 4, 2014

Relating Event Calculus and Linear Logic

My research uses linear logic to describe actions, i.e. to write rules to answer the question "what happens if..." rather than "what holds if...". The desire to reason about actions follows a long line of endeavors to create AI with logic, where back then AI seemed to mean "computational agents that can take initiative, have goals, reason about their environments, and act upon them." AI-inspired logicians have been reasoning about actions for a long time, and one fruitful tool for doing so is the Event Calculus (Kowalski and Sergot 1986).

In this post, I'm going to describe my understanding of the relationship between Event Calculus and linear logic, by way of some Celf code I wrote last week after reading Vladimir Alexiev's "Event Calculus as a Linear Logic Program." Primarily, this is a post about that particular paper and the tinkering I did around it, but I'll also explain some of the context I've picked up on the way.

Background: Logics of Events and Situations


The Event Calculus (EC) and its predecessor, McCarthy's Situation Calculus (SC), both carry the notion of a fluent, or a proposition indexed by some contingency, i.e. rather than simply "P holds" for a proposition P, we say "P holds under [some circumstance]." They could perhaps, then, be formulated as modal logics in their own right, but instead, they were designed as logic programs, or collections of axioms embedded in first-order logic.

In SC, circumstances are situations, which are described by initial collections of predicates and sequences of actions which can then be "evaluated" on a situation. For instance, if on-table A is a predicate describing a block A being on the table, then {on-table A} is a situation, as is do(pickup A, {on-table A}).

In EC, circumstances are times, and an event is an action that happens at a time. Rather than "P holds," the basic judgment is "P holds at time T." Implication in such a logic seems to map naturally onto the idea of causation: "If 'rain' happens at time T, then 'wet' holds at time T." In The central problem is to determine which fluents hold in which time intervals.

EC can be seen as "dual" to SC in the sense that rather than specify actions in terms of the sets of propositions (situations) that hold between events, we reason independently about events and their effects. I'll be concentrating on EC for this post.

A simple (albeit needlessly violent) example domain is the Yale Shooting Problem, wherein available actions are load, reload, shoot, and wait, and fluents are loaded and alive, each of which can be negated to form another fluent. In EC we can state the domain as follows:

causes(load, loaded).
causes(unload, not loaded).
causes(shoot, not alive).
%% no rule: causes(wait, _).
exclusive(F, not F).
exclusive(F, F).

Given this specification, as well as an initial state and some happens(E,T)assumptions, we'd like to calculate the intervals in which all of the fluents and their negations hold. In brief terms, the event calculus does this by:
- first assuming that by default, fluents persist across time; then
- analyzing happens clauses to introduce new fluents and split intervals in which exclusive fluents hold.

The first of these steps is sometimes referred to as the commonsense law of inertia, i.e. that if causes(E, F) and happens(E, T), all fluents other than F remain at T in their state before T. In terms you may be more familiar with from settings like Separation Logic, this law is a frame property. Actually, AI people use similar language sometimes: by default, first-order logic has a frame problem that calculi like SC and EC must each somehow solve.

Linear logic (LL), on the other hand, supports "local reasoning" or "inertia" or a "frame property" on an intrinsic, or meta-logical, level: the property

P |- Q
--------------
P * R |- Q * R

is simply provable from the inference rules defining LL connectives (or, if we formulate the frame property on a more structural level, then it follows by the Cut principle for the logic).

But more fundamentally, LL and EC (or SC) aren't really in the same category: LL is a logic and EC is a logic program, or at least a specification of one that can be realized in many forms, as we'll see below. Both can be used to model domains of action and state change, but EC does a specific thing with that model, which is calculate intervals for fluents. Linear logic is agnostic to what you do with the encoding, but as a full and proper logic, it includes the ability to:

- use compositional reasoning (i.e. cut or substitution) on entailments and specifications;
- treat the model itself as a logic program, with agent's actions simulated directly as proof search;
- do a post-hoc analysis of the available proof search space;
- use a proof as a meaningful record of action sequences or trees, perhaps to be treated as a plan;
- control proof search with proof-theoretical tools like focusing to better understand the coarse structure by which proofs may be compared or equated to one another.

The Event Calculus as One of Many Linear Logic Programs


Alexiev's paper did not compare LL and EC at the level of languages for encoding actions. They instead noted that the canonical transcription of EC into a first-order logic language such as Prolog makes an unnecessary use of negation as failure (NAF), and that the use of a linear logic-based logic programming language could eliminate the need for this poorly-behaved construct.

The original "simplified event calculus" program in ordinary LP with NAF (emphasis mine) is:

holds(F,T) :-
  happens(Ei, Ti), initiates(Ei, F),
  happens(Et, Tt), terminates(Et, F),
  between(Ti, T, Tt), not broken(Ti, F, Tt).
broken(Ti, F, Tt) :-
  happens(E, T), between(Ti, T, Tt),
  (initiates(E, F1); terminates(E, F1)),
  exclusive(F, F1).
exclusive(F, F).

The essence of the linear variant, as I understand it, is that rather than directly calculating holds(F,T) by calculating unbroken intervals that initiate and terminate the fluent, we start off assuming fluents that hold at the start hold indefinitely, and then iteratively splits the intervals until quiescence. At that point, all of the intervals are at their longest while still consistent with the events known to have occurred. A snippet of the code follows (wherein G stands for a goal continuation):

happens E T G :- causes E Fs, record Fs T G.
record nil T G :- G.
record (F::Fs) T G :- insert F T (record Fs T G).
insert F2 T G :- exclusive F1 F2, int F1 T1 T2, between T1 T T2,
  (int F1 T1 T -o int F2 T T2 -o G).

holds F T G :- (int F T1 T2, between T1 T T2, top) & G.

Here "int F T1 T2" means "fluent F holds between times T1 and T2, as far as we currently know." I've highlighted the most essential rule, which propagates information about a fluent F2 caused by some event that happens at time T, by finding a fluent F1 that's exclusive with it, for which we currently believe it holds past time T, and replacing that knowledge with the new knowledge that F1 holds prior to T and F2 holds afterwards.

I found the continuation-passing style of this program distracting (especially in the last rule, where it necessitates using top and & to ignore irrelevant parts of the context but preserve them for the continuation). So did the author, it seems, who also opted to re-implement this program in Lygon. Unfortunately Lygon uses classical linear logic, which I find utterly unreadable, so I transcribed it into my LLPL of choice, Celf.

Take 0: Direct Transcription into Celf


The core of the program written using forward chaining in Celf is:

happens/causes 
: happens E T * causes E Fs -o {record Fs T}. 

record/nil 
: record fnil T -o {1}.
record/cons 
: record (fcons F Fs) T -o {insert F T * record Fs T}. 

split
: insert F2 T * exclusive F1 F2 * int F1 T1 T2 * between T1 T T2
        -o {int F1 T1 T * int F2 T T2}.

As far as I can tell, this is correct and deterministic, although the determinism is a little less obvious than in the continuation passing-style program. It makes clear to me that the fundamental way linearity "replaces" NAF is by using quiescence: the program doesn't stop until there aren't any more inconsistent intervals it can split.

At this point, though, I started experiencing some philosophical nagging: this program is in some ways an interpreter for EC, or perhaps an encoding, but I wondered if there were some way to make more fundamental use of linear logic and its very appealing frame property... i.e. if I could compile EC domains into linear logic in a way that would faithfully reproduce EC semantics.

I then recalled some discussion among the senior grad students in my group from several years ago about the idea of an encoding vs. an embedding (of an object-logic into a meta-logic). I took away an intuitive sense for the distinction -- an encoding represents object-logic widgets (e.g. connectives) as meta-logic data constructors and uses the meta-logic machinery for manipulating data to operate over them -- whereas an embedding directly interprets a widget in the object logic as a viable semantically-corresponding widget in the meta-logic. The most salient example of the latter I can think of is "higher-order abstract syntax," the idea from LF that we model hypotheticals/binding in object calculi with LF's built in notion of hypothetical context.

The line still feels blurry to me, though, and is even further muddled by this discussion about "shallow" vs "deep" embeddings in Coq, which on the face of it looks an awful lot like the distinction I was just making.

And somehow I also feel like this distinction is relevant to "interpretation vs. compilation," except that I'd expect encoding to map to compilation and embedding to map to interpretation, but while I was working on this Event Calculus stuff, I kept thinking to myself "I want an embedding/compilation, not an encoding/interpretation."

I think I can make sense of that mismatch as follows: in an interpreter, "translating" and running a program are the same step, but in a compiler, the output is source code you can inspect before running it. I wanted to know what "compiled" linear logic source code would look like in the image of translation from Event Calculus.

This post feels super long already, so I think I'm gonna toss my "compilation" experiments into a Part II.

Further Reading


Iliano Cervesato did a lot of work relating Event Calculus and linear logic back in the '90s. Some papers of his that are on my reading list include:

What the Event Calculus actually does, and how to do it efficiently
A general modal framework for the event calculus and its skeptical and credulous variants
Modal event calculus in Lolli

2 comments:

  1. My take on encoding vs embedding:

    The receiver of an encoding (aka "deep embedding") has access to the whole program at once, like compilers do. The encoding incurs the sort of dispatch overhead that afflicts interpreters.

    The receiver of an embedding (aka "shallow embedding") doesn't access the whole program at once, like interpreters shouldn't. The embedding may avoid some dispatch overhead, like compilers do.

    ReplyDelete
  2. AFAICT, a deep embedding of language B into language A is when you are using A to process B, e.g. you might be writing an interpreter for B (or a compiler) using A as your implementation language. It is a sort of embedding (actually, your term "encoding" fits better) since in your A code you can create objects that represent pieces of B code and then you can run them through the `eval` function that you wrote. This is the preferred method when writing compilers, interpreters, etc. However, this approach is quite clumsy if you all want to do is to use some B code inside of your A program.

    On the other hand, you have shallow embeddings of B into A. That's when you use language A to write B programs. Take for example B to be the language of arithmetic expressions and A to be any mainstream programming language (e.g. Haskell). You could say that the standard library of A, which defines all the arithmetic operators, is actually a shallow embedding of B into A (i.e. it lets you write arithmetic expressions as Haskell code).

    The difference is the one between either defining and implementing the arithmetic operators directly (shallow embedding) or building a data structure that represents arithmetic expressions and writing an algorithm which evaluates it (deep embedding). The former is more practical if you want to use arithmetic expressions in your A programs, while the latter is more useful if instead you just want to use A to analyze and transform the expressions.

    As such, deep embeddings are usually quite boring, it's just a matter of translating the definition of B into an interpreter in A. Shallow embeddings, on the other hand, are often quite a bit more elegant since you must devise a way how to make A code look and behave exactly like B code.

    This definition isn't really technical and I think the distinction is commonly a bit more vague. An embedding is often called more shallow the more directly it maps constructs of B to constructs of A (i.e. reusing what is in A vs implementing it from scratch).

    As such, in your case, I would call it a deep embedding, since you basically have the `happens` and `causes` as inputs and then you write some code to interpret this data. I don't know anything about LLP or EC, but I imagine that in order to embed EC in some PL in a shallow manner, the PL would have to have some notion of temporality/dynamicity, otherwise you will be forced to simulate it yourself.


    I hope I didn't muddle things any further.

    Best of luck,
    Jirka

    PS: Blogging PhD research seems like a good idea, I should give it a try too.

    ReplyDelete