### coinductive winnowing

I implemented a version of the winnow algorithm for online learning in SML, because I was trying to understand learning algorithms in terms of coinductive data.

The idea behind a learning algorithm, as I understand it (as someone who was just taught about them yesterday), is that given a stream of tagged inputs (like images marked with the locations of faces), you should get better and better at predicting the tags -- specifically, you should make a number of total mistakes bounded by the input size.

Alternatively, you can think of the "tags" as a function from input to answer (for simplicity, from bitstring to bit) that the learning algorithm is trying to approximate. So the "learner" gets to take this function, which I call the "reference function", as an input, and compare its predictions to the actual result, which it can use to update some priors.

Roughly, it has the shape:
1. Given an input, make a prediction
2. Check the answer with the reference function
3. If we're wrong, refine our approximation
4. GOTO 1
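The loop above can be sketched as a single step function. This is a minimal sketch under assumed names: `predict`, `refine`, and `ref_fn` are hypothetical, standing in for whatever prediction, update, and reference machinery a particular algorithm supplies, and `state` is its current priors.

```
(* One iteration of the generic online-learning loop.
   predict : state -> input -> ans
   refine  : state -> input -> ans -> state
   ref_fn  : input -> ans
   (all hypothetical names, not from the post's actual code) *)
fun learn_step predict refine ref_fn state input =
  let
    val guess  = predict state input        (* 1. make a prediction *)
    val actual = ref_fn input               (* 2. check the answer  *)
    val state' =
      if guess = actual
      then state                            (* right: keep our priors *)
      else refine state input actual        (* 3. wrong: refine them  *)
  in
    (guess, state')                         (* 4. ready for the next input *)
  end
```

Iterating this step over a stream of inputs, threading `state'` back in each time, gives the GOTO.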

The winnow algorithm has a particularly simple-to-implement instantiation of steps 1 and 3. In a sentence: we keep a vector of weights, one per position of our bit-vector input, make our prediction based on the weighted sum of the bits of the input, and either halve or double the weights on the positive bits if we are wrong (depending on which direction we were wrong in).

The correctness of the algorithm, while interesting, is not the point -- my intention in posting this on a PL blog is just to show how you can code up an online algorithm with coinductive data, since I couldn't find anything like this when I searched.

The first question I had was: what is the type of a learning algorithm? It should be something that first takes a function from input to answer (the thing we're approximating), and returns a function from input to answer... paired with another learner, the better approximation. In code, if we have

```
type 'a input = 'a vector

datatype ans = YES | NO
datatype bit = ZERO | ONE

datatype 'a learner = Susp of 'a input -> (ans * 'a learner)
```

then

`winnow : (bit input -> ans) -> bit learner`

The learner datatype looks sort of like a stream in that it consists of a single constructor that wraps a suspended computation, but that suspended computation isn't just a thunk -- it actually takes an input. If you think of that input as a stream itself (as I blithely alluded to in the intro), then the learner is sort of like a stream transformer that moves in lock-step with its input.
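To make the lock-step picture concrete, here is a sketch of what `winnow` could look like against the types above. This is my reconstruction, not the post's actual code: the threshold (the input length `n`) and the initial weights (all 1.0, allocated lazily on the first input, since we don't know `n` until then) are assumed choices.

```
(* A sketch of winnow against the learner type above.
   ref_fn is the reference function we are approximating.
   Threshold and initial weights are hypothetical choices. *)
fun winnow (ref_fn : bit input -> ans) : bit learner =
  let
    fun bitval ONE = 1.0 | bitval ZERO = 0.0

    (* weights is NONE until we see the first input and learn n *)
    fun step (weights : real vector option) : bit learner =
      Susp (fn (x : bit input) =>
        let
          val n = Vector.length x
          val w = case weights of
                      SOME w => w
                    | NONE => Vector.tabulate (n, fn _ => 1.0)
          val sum =
            Vector.foldli
              (fn (i, b, acc) => acc + bitval b * Vector.sub (w, i))
              0.0 x
          val guess = if sum >= real n then YES else NO
          val actual = ref_fn x
          (* scale the weights sitting on the positive bits by c *)
          fun adjust c =
            Vector.mapi
              (fn (i, wi) =>
                 case Vector.sub (x, i) of ONE => c * wi | ZERO => wi)
              w
          val w' =
            case (guess, actual) of
                (NO, YES) => adjust 2.0   (* guessed too low: double *)
              | (YES, NO) => adjust 0.5   (* guessed too high: halve *)
              | _ => w                    (* correct: leave alone *)
        in
          (guess, step (SOME w'))
        end)
  in
    step NONE
  end
```

Note how the "stream transformer" shape falls out: each `Susp` consumes one input and yields the answer together with the next state of the learner, closed over the updated weights.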

With winnow written, you can define some function `chi` that accepts certain bit strings, go to the REPL, and do

```
- val (Susp f) = winnow chi;
val f = fn : bit input -> ans * bit learner
```

Then repeatedly call
```
- val (x, Susp f) = f y;
val x = NO : ans
val f = fn : bit input -> ans * bit learner
```

for various instantiations of y to see learning in action.
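For instance, a toy `chi` (a hypothetical example, not the post's) might accept exactly the bit strings whose first bit is set:

```
(* Hypothetical reference function: accept iff the first bit is ONE *)
fun chi (x : bit input) : ans =
  case Vector.sub (x, 0) of
      ONE => YES
    | ZERO => NO

(* A sample input to feed the learner at the REPL *)
val y = Vector.fromList [ONE, ZERO, ONE]
```

Feeding the learner inputs like `y` and rebinding `f` each time, you should see its predictions start disagreeing with `chi` and then converge.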

Complete code is here.
