Working notes on bytecode stuff
2024-12-15
So far, I've done the easy stuff: constants, and ifs.
There's still some easy stuff left:
- lists
- dicts
- when
- panic
So I'll do those next.
But then we've got two doozies: patterns and bindings, and tuples.
Tuples make things hard
In fact, it's tuples that make things hard.
The idea is that, when possible, tuples should be stored on the stack.
That makes them a different creature than anything else.
But the goal is to be able, in a function call, to just push a tuple onto the stack, and then match against it.
Because a tuple isn't just another Value, that makes things challenging.
BUT: matching against all other Values should be straightforward enough?
I think that the way to do this is to reify patterns.
Rather than try to emit bytecodes to embody patterns, the patterns are some kind of data that get compiled and pushed onto a stack like keywords and interned strings and whatnot.
And then you can push a pattern onto the stack right behind a value, and then have a match opcode that pops them off.
Things get a bit gnarly since patterns can be nested. I'll start with the basic cases and run from there.
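A rough sketch of the reified-pattern idea, basic cases only (no nesting, no tuples). Every name here (Value, Pattern, Slot, match_op) is made up for illustration, and the assumption that values and patterns share one stack is mine; nothing in the actual VM is settled yet.

```rust
// All names here are hypothetical; the notes do not pin down these types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Keyword(&'static str),
}

// A reified pattern: plain data, not a sequence of bytecodes.
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Placeholder,        // always matches
    Word(&'static str), // always matches (binding is left out of this sketch)
    Atom(Value),        // matches only an equal value
}

// Assumption: the stack holds both values and patterns.
#[derive(Debug, Clone, PartialEq)]
enum Slot {
    Value(Value),
    Pattern(Pattern),
}

// The `match` opcode: pops the pattern, then the value behind it,
// and reports whether the pattern accepts the value.
// Returns None if the stack does not hold a value with a pattern on top.
fn match_op(stack: &mut Vec<Slot>) -> Option<bool> {
    let pattern = match stack.pop()? {
        Slot::Pattern(p) => p,
        Slot::Value(_) => return None,
    };
    let value = match stack.pop()? {
        Slot::Value(v) => v,
        Slot::Pattern(_) => return None,
    };
    Some(match pattern {
        Pattern::Placeholder | Pattern::Word(_) => true,
        Pattern::Atom(expected) => expected == value,
    })
}
```

Pushing a value, then a pattern, then calling match_op leaves the stack empty and hands back the verdict. Nested patterns would need Pattern to become recursive, which is where the gnarliness starts.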
But where things get very gnarly is tuples on the stack. How do you pop off a tuple?
Two thoughts:
- Just put tuples on the heap. And treat function arguments/matching differently.
- Have a "register" that stages values to be pattern matched.
Regarding the first option
I recall seeing somebody somewhere make a comment that trying to represent function arguments as tuples caused tons of pain. I can see why that would be the case, from an implementation standpoint. We should just have values, and not do fancy bookkeeping if we don't have to.
Conceptually, it makes a great deal of sense to think of tuples as being deeply the same as function invocation. But practically, they are different things, especially with Rust underneath.
This feels like it cuts along the grain, and so this is what I will try.
I suspect that I'll end up specializing a lot around function arguments and calling, but that feels more tractable than the bookkeeping around stack-based tuples.
2024-12-17
Next thoughts: take some things systematically rather than choosing an approach first.
Things that always match
- Placeholder.
  - I think this is just a no-op. A let expression leaves its rhs pushed on the stack.
- Word: put something on the stack, and bind a name.
  - This should follow the logic of locals as articulated in Crafting Interpreters.
In both of these cases, there's no conditional logic, simply a bind.
Things that never bind
- Atomic values: put the rhs on the stack, then do an equality check, and panic if it fails. Leave the thing on the stack.
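The three simple cases above, sketched in one place. This assumes locals are just stack slots (the Crafting Interpreters approach); there that bookkeeping lives in the compiler, and I've folded it into a single hypothetical Vm struct here only to keep the sketch short. The names Vm, bind, and lookup are made up, and "panic" is modelled as an Err.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
}

enum Pattern {
    Placeholder,  // always matches, binds nothing
    Word(String), // always matches, binds a name
    Atom(Value),  // never binds; matches only an equal value
}

struct Vm {
    stack: Vec<Value>,
    // Assumption: a local is a name plus the stack slot holding its value.
    locals: Vec<(String, usize)>,
}

impl Vm {
    // Applies a pattern to the value on top of the stack.
    // The value stays on the stack in every case.
    fn bind(&mut self, pattern: &Pattern) -> Result<(), String> {
        let slot = self
            .stack
            .len()
            .checked_sub(1)
            .ok_or("nothing on the stack to match against")?;
        match pattern {
            // No-op: the rhs is already sitting on the stack.
            Pattern::Placeholder => Ok(()),
            // No conditional logic, simply a bind.
            Pattern::Word(name) => {
                self.locals.push((name.clone(), slot));
                Ok(())
            }
            // Equality check; a failure stands in for a panic.
            Pattern::Atom(expected) => {
                if &self.stack[slot] == expected {
                    Ok(())
                } else {
                    Err(format!(
                        "no match: expected {:?}, got {:?}",
                        expected, self.stack[slot]
                    ))
                }
            }
        }
    }

    // Resolves a name to its value, most recent binding first.
    fn lookup(&self, name: &str) -> Option<&Value> {
        self.locals
            .iter()
            .rev()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, slot)| &self.stack[*slot])
    }
}
```

Note that none of the three cases changes the stack length, which is what makes the word case work: the bound name simply points at the slot where the rhs already lives.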
Analysis
In terms of bytecode, I think one thing to do, in the simple case, is to do the following:
- push a pattern onto the stack
- match: pops the pattern and the value off the stack, and then applies the pattern to the value. It leaves the value on the stack, and pushes a special value onto the stack representing a match, or not.
  - We'll probably want match-1, match-2, match-3, etc., opcodes for matching a value that's that far back in the stack. E.g., match-1 matches against not the top element, but the top - 1 element.
  - This is specifically for matching function arguments and loop forms.
- There are a few different things we might do from here:
  - panic_if_no_match: panic if the last thing is a no_match, or just keep going if not.
  - jump_if_no_match: in a match form or a function, we'll want to move to the next clause if there's no match, so jump to the next clause's pattern push code.
- Compound patterns are going to be more complex.
  - I think, for example, we're going to need opcodes that work on our data structures, as when you have a match_compound opcode and you start digging into the pattern.
- Compound patterns are specifically data structures. So simple structures should be stack-allocated, and complex structures should be pointers to something on the heap. Maybe?
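Here is a toy interpreter loop for the simple-case opcodes listed above. It is a sketch under a few assumptions of mine: Match(n) stands in for match, match-1, match-2, and so on; the match result is a Slot::Matched marker; panic_if_no_match and jump_if_no_match both consume that marker; and jump targets are absolute instruction indices. Compound patterns are left out entirely.

```rust
// Hypothetical names throughout; this is not the real instruction set.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Placeholder,
    Atom(Value),
}

#[derive(Debug, Clone, PartialEq)]
enum Slot {
    Value(Value),
    Pattern(Pattern),
    Matched(bool), // the "special value" recording match or no match
}

#[derive(Debug, Clone)]
enum Op {
    Push(Value),
    PushPattern(Pattern),
    Match(usize),         // Match(0) is match, Match(1) is match-1, ...
    PanicIfNoMatch,
    JumpIfNoMatch(usize), // target: the next clause's pattern push
    Jump(usize),
}

fn run(code: &[Op]) -> Result<Vec<Slot>, String> {
    let mut stack: Vec<Slot> = Vec::new();
    let mut ip = 0;
    while ip < code.len() {
        let op = &code[ip];
        ip += 1;
        match op {
            Op::Push(v) => stack.push(Slot::Value(v.clone())),
            Op::PushPattern(p) => stack.push(Slot::Pattern(p.clone())),
            Op::Match(depth) => {
                let pattern = match stack.pop() {
                    Some(Slot::Pattern(p)) => p,
                    other => return Err(format!("expected a pattern, found {:?}", other)),
                };
                // With the pattern popped, depth 0 is the top of the stack.
                let index = stack
                    .len()
                    .checked_sub(1 + *depth)
                    .ok_or("stack underflow")?;
                let matched = match (&pattern, &stack[index]) {
                    (Pattern::Placeholder, Slot::Value(_)) => true,
                    (Pattern::Atom(expected), Slot::Value(actual)) => expected == actual,
                    _ => return Err("can only match against a value".to_string()),
                };
                // The value stays put; only the verdict is pushed.
                stack.push(Slot::Matched(matched));
            }
            Op::PanicIfNoMatch => match stack.pop() {
                Some(Slot::Matched(true)) => {}
                Some(Slot::Matched(false)) => return Err("no match".to_string()),
                other => return Err(format!("expected a match result, found {:?}", other)),
            },
            Op::JumpIfNoMatch(target) => match stack.pop() {
                Some(Slot::Matched(true)) => {}
                Some(Slot::Matched(false)) => ip = *target,
                other => return Err(format!("expected a match result, found {:?}", other)),
            },
            Op::Jump(target) => ip = *target,
        }
    }
    Ok(stack)
}
```

A two-clause match form then compiles to: push the value, push clause one's pattern, Match(0), JumpIfNoMatch to clause two's pattern push, clause one's body, Jump past the end; then the same shape for clause two, with PanicIfNoMatch on the last clause.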
A little note
For instructions that need more than 256 possibilities, we'll need to mush two u8s together into a u16. The one-liner for this is:
let number = ((first as u16) << 8) | second as u16;
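For reference, the same one-liner wrapped in a function, plus its inverse for the emitting side. The function names are made up; the byte order (high byte first) is simply what the one-liner above implies. The standard library's u16::from_be_bytes and u16::to_be_bytes do the same job.

```rust
// Combines two operand bytes into one u16, high byte first.
fn read_u16(first: u8, second: u8) -> u16 {
    ((first as u16) << 8) | second as u16
}

// Splits a u16 back into (high byte, low byte) for emitting.
fn split_u16(number: u16) -> (u8, u8) {
    ((number >> 8) as u8, (number & 0xff) as u8)
}

// Same result as read_u16, using the standard library.
fn read_u16_std(first: u8, second: u8) -> u16 {
    u16::from_be_bytes([first, second])
}
```

The cast has to happen before the shift: shifting a u8 left by 8 overflows, which panics in debug builds.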
Oy, stacks and expressions
One thing that's giving me grief is when to pop and when not to on the value stack.
So, like, we need to make sure that a line of code leaves the stack exactly where it was before it ran, with the exception of binding forms: let, fn, box, etc. Those leave one (or more!) items on the stack.
In the simplest case, we have a line of code that's just a constant:
false
This should emit the bytecode instructions (more or less):
push false
pop
The push comes from the false value.
The pop comes from the end of a (nonbinding) line.
The problem is that there's no way (at all, in Ludus) to distinguish between an expression that's just a constant and a line that is a complete line of code that's an expression.
So if we have the following:
let foo = false
We want:
push false
Or, rather, given that foo is a word pattern, what we actually want is:
push false # constant
push pattern/word # load pattern
pop
pop # compare
push false # for the binding
But it's worth it here to explore Ludus's semantics.
It's the case that there are actually only three binding forms (for now): let, fn, and box.
Figuring out let will help a great deal.
Match also binds things, but at the very least, match doesn't bind with expressions on the rhs, but a single value.
Think, too, about expressions: everything comes down to a single value (of course), even tuples (especially now that I'm separating function calls from tuple values (probably)).
So: anything that isn't a binding form should, before the pop from the end of a line, only leave a single value on the stack.
Which suggests that, as odd as it is, pushing a single nil onto the stack, just to pop it, might make sense.
Or, perhaps the thing to do is to peek: check whether the line in question is a binding form or not, and emit different bytecode accordingly.
That's probably the thing to do. Jesus, Scott.
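A minimal sketch of the peek approach, assuming a hypothetical Line type that already distinguishes binding forms from bare expressions, and using the simple version of let (just push the rhs, no pattern opcodes). Compiler, Line, Constant, and Op are made-up names, and only constants are handled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Constant {
    Nil,
    False,
}

// Hypothetical: a line is either a bare expression or a binding form.
enum Line {
    Expr(Constant),
    Let { name: String, rhs: Constant },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(Constant),
    Pop,
}

#[derive(Default)]
struct Compiler {
    code: Vec<Op>,
    locals: Vec<String>, // names bound so far, in stack order
}

impl Compiler {
    // Peeks at the kind of line and emits different bytecode for each.
    fn compile_line(&mut self, line: &Line) {
        match line {
            // Non-binding line: push the value, then pop at end of line,
            // so the stack ends up exactly where it started.
            Line::Expr(constant) => {
                self.code.push(Op::Push(*constant));
                self.code.push(Op::Pop);
            }
            // Binding form: the value stays on the stack as the local.
            Line::Let { name, rhs } => {
                self.code.push(Op::Push(*rhs));
                self.locals.push(name.clone());
            }
        }
    }
}
```

Compiling the bare line false gives push false, pop; compiling let foo = false gives just push false, with foo recorded as the name of that slot.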
And another thing worth internalizing: every single instruction that's not an explicit push or pop should leave the stack length unchanged.
So store and load need always to swap in a nil