# Working notes on bytecode stuff
### 2024-12-15

So far, I've done the easy stuff: constants, and `if`s.

There's still some easy stuff left:

* [ ] lists
* [ ] dicts
* [ ] `when`
* [ ] `panic`

So I'll do those next.

But then we've got two doozies: patterns and bindings, and tuples.
#### Tuples make things hard

In fact, it's tuples that make things hard. The idea is that, when possible, tuples should be stored on the stack. That makes them a different creature from anything else. But the goal is to be able, in a function call, to just push a tuple onto the stack, and then match against it. Because a tuple _isn't_ just another `Value`, that makes things challenging. BUT: matching against all other `Value`s should be straightforward enough?

I think the way to do this is to reify patterns. Rather than trying to emit bytecodes to embody patterns, the patterns are some kind of data that get compiled and pushed onto a stack, like keywords and interned strings and whatnot. Then you can push a pattern onto the stack right behind a value, and have a `match` opcode that pops them both off.

Things get a bit gnarly since patterns can be nested. I'll start with the basic cases and run from there.

But where things get *very* gnarly is considering tuples on the stack. How do you pop off a tuple?

Two thoughts:

1. Just put tuples on the heap, and treat function arguments/matching differently.
2. Have a "register" that stages values to be pattern matched.
##### Regarding the first option

I recall seeing somebody somewhere comment that trying to represent function arguments as tuples caused tons of pain. I can see why that would be the case, from an implementation standpoint. We should have _values_, and not do fancy bookkeeping if we don't have to.

_Conceptually_, it makes a great deal of sense to think of tuples as being deeply the same as function invocation. But _practically_, they are different things, especially with Rust underneath.

This feels like it cuts along the grain, and so this is what I will try. I suspect that I'll end up specializing a lot around function arguments and calling, but that feels more tractable than the bookkeeping around stack-based tuples.
### 2024-12-17

Next thoughts: take some things systematically rather than choosing an approach first.

#### Things that always match

* Placeholder.
  - I _think_ this is just a no-op. A `let` expression leaves its rhs pushed on the stack.
* Word: put something on the stack, and bind a name.
  - This should follow the logic of locals as articulated in _Crafting Interpreters_.

In both of these cases, there's no conditional logic, simply a bind.

#### Things that never bind

* Atomic values: put the rhs on the stack, then do an equality check, and panic if it fails. Leave the value on the stack.
#### Analysis

In terms of bytecode, I think the thing to do, in the simple case, is the following:

* `push` a `pattern` onto the stack.
* `match`: pops the pattern and the value off the stack, then applies the pattern to the value. It leaves the value on the stack, and pushes a special value onto the stack representing a match, or not.
  - We'll probably want `match-1`, `match-2`, `match-3`, etc., opcodes for matching a value that's that far back in the stack. E.g., `match-1` matches against not the top element, but the `top - 1` element.
  - This is _specifically_ for matching function arguments and `loop` forms.
* There are a few different things we might do from here:
  - `panic_if_no_match`: panic if the last thing is a `no_match`, or just keep going if not.
  - `jump_if_no_match`: in a `match` form or a function, we'll want to move to the next clause if there's no match, so jump to the next clause's `pattern` `push` code.
* Compound patterns are going to be more complex.
  - I think, for example, we're going to need opcodes that work on our data structures: say, a `match_compound` opcode that starts digging into the pattern.
* Compound patterns are specifically _data structures_. So simple structures should be stack-allocated, and complex structures should be pointers to something on the heap. Maybe?
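To make the simple case concrete, here's a minimal sketch of a `match` operation over reified patterns. All the names here (`Pattern`, `Value`, `exec_match`) are hypothetical stand-ins, not Ludus's actual types:

```rust
// A sketch of the `match` opcode: pop a reified pattern, peek at the
// value underneath (leaving it on the stack), push a match marker.

#[derive(Debug, PartialEq)]
enum Pattern {
    Placeholder,  // always matches, binds nothing
    Word,         // always matches, binds a local
    Atomic(i64),  // matches by equality
}

#[derive(Debug, PartialEq)]
enum Value {
    Number(i64),
    Pattern(Pattern), // reified patterns live on the stack like any value
    Matched(bool),    // the special value `match` pushes
}

fn exec_match(stack: &mut Vec<Value>) {
    // Pop the pattern off the top of the stack.
    let pat = match stack.pop() {
        Some(Value::Pattern(p)) => p,
        _ => panic!("match: expected a pattern on top of the stack"),
    };
    // Apply it to the value underneath, without popping that value.
    let matched = match (&pat, stack.last()) {
        (Pattern::Placeholder, _) | (Pattern::Word, _) => true,
        (Pattern::Atomic(n), Some(Value::Number(m))) => n == m,
        _ => false,
    };
    stack.push(Value::Matched(matched));
}

fn main() {
    let mut stack = vec![Value::Number(42), Value::Pattern(Pattern::Atomic(42))];
    exec_match(&mut stack);
    // The value stays; a match marker lands on top for
    // `panic_if_no_match` / `jump_if_no_match` to consume.
    assert_eq!(stack, vec![Value::Number(42), Value::Matched(true)]);
}
```

A `jump_if_no_match` would then pop the `Matched` marker and branch on it.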
#### A little note

For instructions that need more than 256 possibilities, we'll need to mush two `u8`s together into a `u16`. The one-liner for this is:

```rust
let number = ((first as u16) << 8) | second as u16;
```
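For completeness, the round trip — combining two `u8` operands into a `u16` and splitting it back apart (the function names are mine, not the codebase's):

```rust
// Mush two u8 operands into a u16: the first byte is the high byte.
fn combine(first: u8, second: u8) -> u16 {
    ((first as u16) << 8) | second as u16
}

// Split a u16 back into its two operand bytes.
fn split(number: u16) -> (u8, u8) {
    ((number >> 8) as u8, (number & 0xff) as u8)
}

fn main() {
    let n = combine(0x12, 0x34);
    assert_eq!(n, 0x1234);
    assert_eq!(split(n), (0x12, 0x34));
}
```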
#### Oy, stacks and expressions

One thing that's giving me grief is when to pop, and when not to, on the value stack.

We need to make sure that a line of code leaves the stack exactly where it was before it ran, with the exception of binding forms: `let`, `fn`, `box`, etc. Those leave one (or more!) items on the stack.

In the simplest case, we have a line of code that's just a constant:

```
false
```

This should emit the bytecode instructions (more or less):

```
push false
pop
```

The push comes from the `false` value. The pop comes from the end of a (nonbinding) line.

The problem is that there's no way (at all, in Ludus) to distinguish between an expression that's just a constant and a complete line of code that is an expression.

So if we have the following:

```
let foo = false
```

We want:

```
push false
```

Or, rather, given that `foo` is a word pattern, what we actually want is:

```
push false        # constant
push pattern/word # load pattern
pop
pop               # compare
push false        # for the binding
```
But it's worth it here to explore Ludus's semantics. There are actually only three binding forms (for now): `let`, `fn`, and `box`. Figuring out `let` will help a great deal. `match` also binds things, but at the very least, it doesn't bind with expressions on the rhs, only a single value.

Think, too, about expressions: everything comes down to a single value (of course), even tuples (especially now that I'm separating function calls from tuple values (probably)). So: anything that *isn't* a binding form should, before the `pop` at the end of a line, leave only a single value on the stack. Which suggests that, as odd as it is, pushing a single `nil` onto the stack, just to pop it, might make sense. Or, perhaps the thing to do is to peek: if the line in question is binding or not, emit different bytecode. That's probably the thing to do. Jesus, Scott.

And **another** thing worth internalizing: every single instruction that's not an explicit push or pop should leave the stack length unchanged. So store and load always need to swap in a `nil`.
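The "peek at the line" strategy above can be sketched as follows. The names (`Line`, `Op`, `compile_line`) are hypothetical, and I've hard-coded the `false` example from above:

```rust
// Peek at whether a line is a binding form, and emit different bytecode:
// nonbinding lines get a trailing pop; binding lines leave their value.

enum Line {
    Expr, // e.g. `false`
    Let,  // e.g. `let foo = false`
}

#[derive(Debug, PartialEq)]
enum Op {
    PushFalse,
    PushPattern,
    Match,
    Pop,
}

fn compile_line(line: &Line, code: &mut Vec<Op>) {
    match line {
        // A bare expression: push its value, then pop at end of line,
        // restoring the stack to its pre-line state.
        Line::Expr => {
            code.push(Op::PushFalse);
            code.push(Op::Pop);
        }
        // A binding form: push the value, match the pattern, and leave
        // the bound value sitting on the stack.
        Line::Let => {
            code.push(Op::PushFalse);
            code.push(Op::PushPattern);
            code.push(Op::Match);
        }
    }
}

fn main() {
    let mut code = Vec::new();
    compile_line(&Line::Expr, &mut code);
    assert_eq!(code, vec![Op::PushFalse, Op::Pop]);

    let mut code = Vec::new();
    compile_line(&Line::Let, &mut code);
    assert_eq!(code, vec![Op::PushFalse, Op::PushPattern, Op::Match]);
}
```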
### 2024-12-23

Compiling functions.

So I'm working through the functions chapter of _Crafting Interpreters_, and there are a few things I'm trying to wrap my head around.

First, I'm thinking that since we're not using raw pointers, we'll need some functional indirection to get our current byte.

One of the hard things here is that, unlike Lox, Ludus doesn't have fixed-arity functions. That means the bindings for function calls can't be as dead simple as in Lox. More to the point, because we don't know everything statically, we'll need to do some dynamic magic.

Bob Nystrom's program uses three useful auxiliary constructs to make functions straightforward:

* `CallFrame`s, which know which function is being called, have their own instruction pointer, and hold an offset to the first stack slot the function can use:

```c
typedef struct {
  ObjFunction* function;
  uint8_t* ip;
  Value* slots;
} CallFrame;
```

Or the Rust equivalent:

```rust
struct CallFrame {
    function: LFn,
    ip: usize,
    stack_root: usize,
}
```
* `Closure`s, which are actual objects that live alongside functions. They have a reference to a function and to an array of "upvalues"...
* `Upvalue`s, which are ways of pointing to values _below_ the `stack_root` of the call frame.
##### Digression: Prelude

I decided to skip Prelude resolution in the compiler and only work with locals. But actually, closures, arguments, and the prelude are kind of the same problem: referring to values that aren't currently available on the stack.

We do, however, know the following at compile time:

* Whether a binding's target is on the stack, in a closure, or in the prelude.
* This does, however, require that function arguments work in a different way.

The way to do this, I reckon, is this:

* Limit arguments (to, say, no more than 7).
* A `CallFrame` includes an arity field.
* It also includes an array of length 7.
* Each `match` operation in function arguments clones from the call frame, and the first instruction for any given body (i.e., once we've done the match) is to clear the argument registers in the `CallFrame`, thus decrementing the refcounts of all the heap-allocated objects.
* And the current strategy of scoping and popping in the current implementation of `match` will work just fine!
Meanwhile, we don't actually need upvalues, because bindings cannot change in Ludus. So instead of upvalues and their indirection, we can just have a `values` field on a closure. The compiler, meanwhile, will know how to emit instructions both to close over those values *and* to offer correct offsets.

The only part I haven't figured out quite yet is how to encode access to what's stored in a closure.

Also, I'm not certain we need the indirection of a closure object in Ludus. The function object itself can do the work, no?

And the compiler knows which function it's closing over, so we can emit a bunch of instructions to close stuff over easily, after compiling the function and putting it in the constants table. The way to do this is to yank the value to the top of the stack using normal name-resolution procedures, and then use a two-byte operand: `Op::Close` + the index of the function in the constants table.

##### End of digression.

And, because we know exactly what is bound in a given closure, we can actually emit instructions to close over a given value easily.
#### A small optimization

The lifetimes make things complicated; I'm not sure I'd want to manage them manually, given how much they make my head hurt in Rust. I do get the sense that we will, at some point, need some lifetimes. A `Chunk` right now is chunky, with lots of owned `Vec`s.

Nystrom separates `Chunk`s and `Compiler`s, which, yes! But then we have a problem: all of the information to climb back to source code is in the `Compiler` and not in the `Chunk`. How to manage that encoding?

(Also, the keyword and string intern tables should be global, and not live in a single compiler, since we're about to get nested compilers...)
### 2024-12-24

Other interesting optimizations abound:

* `add`, `sub`, `inc`, `dec`, `type`, and other extremely frequently used, simple functions can be compiled directly to built-in opcodes. We still need functions for them, with the same arities, for higher-order function use.
  - The special-case logic lives in the `Synthetic` compiler branch, rather than anywhere else.
  - It's probably best to disallow re-binding these names anywhere _except_ Prelude, where we'll want them shadowed.
  - We can enforce this in `Validator` rather than `Compiler`.
* `or` and `and` are likewise built-in, but because they don't evaluate their arguments eagerly, they're another, different special case: a series of eval, `jump_if_false`, eval, `jump_if_false` instructions.
* More to the point, the difference between `or` and `and` and the other built-ins is that `or` and `and` are variadic, where I was originally thinking about `add` and co. as fixed-arity, with variadic behaviours defined by a shadowing/backing Ludus function. That isn't necessary, I don't think.
* Meanwhile, `and` and `or` will also, of necessity, have backing shadowing functions.
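The eval, `jump_if_false` chain for a variadic `and` can be sketched with backpatched jump targets. The opcode names here are illustrative stand-ins (`Eval(n)` abbreviates "compile and evaluate argument n"), and I'm assuming a Lox-style non-popping `jump_if_false`, so a false value is left behind as the result:

```rust
// Compile a variadic `and` into: eval arg, jump_if_false end, pop,
// eval arg, jump_if_false end, pop, ..., eval last arg.

#[derive(Debug, PartialEq)]
enum Op {
    Eval(usize),        // stand-in for evaluating argument n
    JumpIfFalse(usize), // absolute target; does not pop the tested value
    Pop,                // discard the previous (truthy) value
}

fn compile_and(num_args: usize) -> Vec<Op> {
    let mut code = Vec::new();
    let mut patches = Vec::new();
    for i in 0..num_args {
        if i > 0 {
            code.push(Op::Pop);
        }
        code.push(Op::Eval(i));
        // Every argument but the last gets an early exit.
        if i < num_args - 1 {
            patches.push(code.len());
            code.push(Op::JumpIfFalse(0)); // placeholder, backpatched below
        }
    }
    // All early exits jump past the end of the whole form.
    let end = code.len();
    for at in patches {
        code[at] = Op::JumpIfFalse(end);
    }
    code
}

fn main() {
    let code = compile_and(3);
    assert_eq!(code.len(), 7);
    assert_eq!(code[1], Op::JumpIfFalse(7));
    assert_eq!(code[4], Op::JumpIfFalse(7));
}
```

`or` is the mirror image with a `jump_if_true` (or a negation), same backpatching shape.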
#### More on CallFrames and arg passing

* We don't actually need the arguments register! I was complicating things. The stack between `stack_root` and the top will be _exactly_ the same as an arguments register would have been in my imagination. So we can determine the number of arguments passed in with `stack.len() - stack_root`, and we can access argument positions with `stack_root + n`, since the first argument is at `stack_root`.
  - This has the added benefit of not having to do any dances to keep the refcounts of heap-allocated objects as low as possible. No extra `Clone`s here.
* In addition, we need two `check_arity` ops: one for fixed-arity clauses, and one for clauses with splatterns. Easily enough done. Remember: opcodes are for special cases!
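The arithmetic above is simple enough to write down directly. This is a sketch with a pared-down `CallFrame` and an `i64` stack standing in for real values:

```rust
// Argument access via stack_root: the caller pushes arguments on top of
// its own values, and the frame's stack_root marks where they begin.

struct CallFrame {
    stack_root: usize, // first stack slot this call may use
}

// Number of arguments actually passed in this call.
fn num_args(stack: &[i64], frame: &CallFrame) -> usize {
    stack.len() - frame.stack_root
}

// The nth argument; the first argument lives at stack_root itself.
fn arg(stack: &[i64], frame: &CallFrame, n: usize) -> i64 {
    stack[frame.stack_root + n]
}

fn main() {
    // One caller-owned value, then two arguments pushed for the call.
    let stack = vec![99, 10, 20];
    let frame = CallFrame { stack_root: 1 };
    assert_eq!(num_args(&stack, &frame), 2);
    assert_eq!(arg(&stack, &frame, 0), 10);
    assert_eq!(arg(&stack, &frame, 1), 20);
}
```

A fixed-arity `check_arity` op then just asserts `num_args == expected`; the splattern variant asserts `num_args >= expected`.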
#### Tail calls

* The way to implement tail calls is actually now really straightforward! The idea is simply to have a `TailCall` opcode rather than a `Call` opcode. In place of creating a new call frame and pushing it onto the call stack on top of the old one, you pop the old call frame, then push the new one onto the call stack.
* That does mean the `Compiler` will need to keep track of tail calls. This should be pretty straightforward, actually, and the logic is already there in `Validator`.
* The thing here is that the new call frame simply requires the same return location as the old one it's replacing.
* That reminds me that there's an issue in terms of keeping track of not just the IP, but the chunk. In Lox, the IP is a pointer to a `u8`, which works great in C. But in Rust, we can't use a raw pointer like that; we use an index into a `Vec<u8>`. Which means the return location needs both a chunk and an index, not just a `u8` pointer:

```rust
struct StackFrame<'a> {
    function: LFn,
    stack_root: usize,
    // `return` is a keyword in Rust, so the field needs another name
    return_loc: (&'a Chunk, usize),
}
```

(I hate that there's a lifetime here.)

This gives us a way to access everything we need: where to return to, the root of the stack, the chunk (function->chunk), and the closures (function->closures).
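The pop-then-push dispatch for tail calls can be sketched like so. `Frame` and the opcode shapes are hypothetical stand-ins (with a plain `usize` return IP instead of the chunk-plus-index pair, to keep the lifetime out of the sketch):

```rust
// Call pushes a new frame; TailCall replaces the current frame, reusing
// its return location, so the call stack doesn't grow on self-calls.

struct Frame {
    function: &'static str,
    return_ip: usize,
}

enum Op {
    Call { function: &'static str, return_ip: usize },
    TailCall { function: &'static str },
}

fn dispatch(op: Op, call_stack: &mut Vec<Frame>) {
    match op {
        // An ordinary call stacks a new frame on top of the caller's.
        Op::Call { function, return_ip } => {
            call_stack.push(Frame { function, return_ip });
        }
        // A tail call pops the old frame first, inheriting its return
        // location for the new frame.
        Op::TailCall { function } => {
            let old = call_stack.pop().expect("tail call with empty call stack");
            call_stack.push(Frame { function, return_ip: old.return_ip });
        }
    }
}

fn main() {
    let mut call_stack = vec![Frame { function: "main", return_ip: 0 }];
    dispatch(Op::Call { function: "loop", return_ip: 17 }, &mut call_stack);
    dispatch(Op::TailCall { function: "loop" }, &mut call_stack);
    // Still two frames: the tail call replaced rather than stacked,
    // and it kept the replaced frame's return location.
    assert_eq!(call_stack.len(), 2);
    assert_eq!(call_stack.last().unwrap().return_ip, 17);
    println!("top frame: {}", call_stack.last().unwrap().function);
}
```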
### 2024-12-26

One particular concern here, which needs some work: recursion is challenging.

In particular, the issue is that if, as I have been planning, a function closes over all its values at the moment it is compiled, the only value type that requires updating is a function. A function can be declared but not yet defined; then, when another function that uses it is defined, the closed-over value will point to the declaration but not the definition.

One way to handle this, I think, is `std::cell::OnceCell`. Unlike a `RefCell`, a `OnceCell` has no runtime borrow-tracking overhead. What happens, effectively, is that the cell starts out empty. Then, once you have the value you want to put in there, you call `set` on the `OnceCell`, and it does what it needs to.

This allows closures to be closed over right after compilation.
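The declare-then-define pattern with `OnceCell` looks roughly like this. The `LFn`-with-a-`chunk` shape is a simplified stand-in for the real function type:

```rust
use std::cell::OnceCell;
use std::rc::Rc;

// A function value whose definition may arrive after its declaration.
struct LFn {
    name: &'static str,
    chunk: OnceCell<Vec<u8>>, // empty until the definition is compiled
}

fn main() {
    // Declare `countdown` so other functions can close over it...
    let countdown = Rc::new(LFn { name: "countdown", chunk: OnceCell::new() });
    let closed_over = Rc::clone(&countdown);
    assert!(closed_over.chunk.get().is_none());

    // ...then fill in the definition exactly once, when it's compiled.
    countdown
        .chunk
        .set(vec![0x01, 0x02])
        .expect("chunk was already set");

    // The earlier closed-over reference now sees the definition.
    assert_eq!(closed_over.chunk.get(), Some(&vec![0x01, 0x02]));
    println!("{} is defined", closed_over.name);
}
```

`set` returns `Err` if the cell is already full, which is exactly the "defined exactly once" guarantee a binding needs.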