### coinductive winnowing

I implemented a version of the winnow algorithm for online learning in SML, because I was trying to understand learning algorithms in terms of coinductive data.

The idea behind a learning algorithm, as I understand it (as someone who was just taught about them yesterday), is that given a stream of tagged inputs (like images marked with the locations of faces), you should get better and better at predicting the tags -- specifically, you should make a number of total mistakes bounded by the input size.

Alternatively, you can think of the "tags" as a function from input to answer (for simplicity, from bitstring to bit) that the learning algorithm is trying to approximate. So the "learner" gets to take this function, which I call the "reference function", as an input, and compare its predictions to the actual result, which it can use to update some priors.

Roughly, it has the shape:

1. Given an input, make a prediction

2. Check the answer with the reference function

3. If we're wrong, refine our approximation

4. GOTO 1

The winnow algorithm has a particularly simple-to-implement instantiation of steps 1 and 3. In a sentence: we keep around a vector of

The correctness of the algorithm, while interesting, is not the point -- my intention by posting this on a PL blog is just to show how you can code up an online algorithm with coinductive data, since I couldn't seem to find something like this when I searched.

The first question I had was: what is the type of a learning algorithm? It should be something that first takes a function from input to output (the thing we're approximating), and returns a function from input to output... along with another function, the better approximation. In code, if we have

then

The learner datatype looks sort of like a stream in that it consists of a single constructor that wraps a suspended computation, but that suspended computation isn't just a thunk -- it actually takes an input. If you think of that input as a stream itself (as I blithely alluded to in the intro), then the learner is sort of like a stream transformer that moves in lock-step with its input.

With winnow written, you can define some function

Then repeatedly call

for various instantiations of y to see learning in action.

Complete code is here.

The idea behind a learning algorithm, as I understand it (as someone who was just taught about them yesterday), is that given a stream of tagged inputs (like images marked with the locations of faces), you should get better and better at predicting the tags -- specifically, you should make a number of total mistakes bounded by the input size.

Alternatively, you can think of the "tags" as a function from input to answer (for simplicity, from bitstring to bit) that the learning algorithm is trying to approximate. So the "learner" gets to take this function, which I call the "reference function", as an input, and compare its predictions to the actual result, which it can use to update some priors.

Roughly, it has the shape:

1. Given an input, make a prediction

2. Check the answer with the reference function

3. If we're wrong, refine our approximation

4. GOTO 1

The winnow algorithm has a particularly simple-to-implement instantiation of steps 1 and 3. In a sentence: we keep around a vector of

*weights*for each position of our bit-vector input, make our prediction based on the weighted sum of the bits of the input, and either halve or double the weights on the positive bits if we are wrong (depending in which direction we were wrong).The correctness of the algorithm, while interesting, is not the point -- my intention by posting this on a PL blog is just to show how you can code up an online algorithm with coinductive data, since I couldn't seem to find something like this when I searched.

The first question I had was: what is the type of a learning algorithm? It should be something that first takes a function from input to output (the thing we're approximating), and returns a function from input to output... along with another function, the better approximation. In code, if we have

type 'a input = 'a vector

datatype ans = YES | NO

datatype bit = ZERO | ONE

datatype 'a learner = Susp of 'a input -> (ans * 'a learner)

then

`winnow : (bit input -> ans) -> bit learner`

The learner datatype looks sort of like a stream in that it consists of a single constructor that wraps a suspended computation, but that suspended computation isn't just a thunk -- it actually takes an input. If you think of that input as a stream itself (as I blithely alluded to in the intro), then the learner is sort of like a stream transformer that moves in lock-step with its input.

With winnow written, you can define some function

`chi`

that accepts certain bit strings, go to the repl and do

- val (Susp f) = winnow chi;

val f = fn : bit input -> ans * bit learner

Then repeatedly call

`- val (x, Susp f) = f y;`

val x = NO : ans

val f = fn : bit input -> ans * bit learner

for various instantiations of y to see learning in action.

Complete code is here.

I had forgotten about this update rule, it's great to be reminded of it. (I work with perceptron updates or stochastic gradient ascent a lot; winnow is intuitively very related.)

ReplyDeleteThe definition of a learning algorithm is a little more annoying to formalize if there's noise in the system. The learner will never be 100% correct and may not even be as close as it could be, but is presumably still a "learner". But your definition seems to work pretty well if we want to avoid saying something horribly general such as we'd need to in order to allow for the learners that don't learn...