Scott Richmond 2024-12-17 23:45:39 -05:00
parent 096d8d00bc
commit 48754f92a4
4 changed files with 199 additions and 26 deletions

bytecode_thoughts.md Normal file

@ -0,0 +1,87 @@
# Working notes on bytecode stuff
### 2024-12-15
So far, I've done the easy stuff: constants, and ifs.
There's still some easy stuff left:
* [ ] lists
* [ ] dicts
* [ ] when
* [ ] panic
So I'll do those next.
But then we've got two doozies: patterns and bindings, and tuples.
#### Tuples make things hard
In fact, it's tuples that make things hard.
The idea is that, when possible, tuples should be stored on the stack.
That makes them a different creature than anything else.
But the goal is to be able, in a function call, to just push a tuple onto the stack, and then match against it.
Because a tuple _isn't_ just another `Value`, that makes things challenging.
BUT: matching against all other `Values` should be straightforward enough?
I think that the way to do this is to reify patterns.
Rather than try to emit bytecodes to embody patterns, the patterns are some kind of data that get compiled and pushed onto a stack like keywords and interned strings and whatnot.
And then you can push a pattern onto the stack right behind a value, and then have a `match` opcode that pops them off.
Things get a bit gnarly since patterns can be nested. I'll start with the basic cases and run from there.
But where things get *very* gnarly is in considering tuples on the stack.
How do you pop off a tuple?
Two thoughts:
1. Just put tuples on the heap. And treat function arguments/matching differently.
2. Have a "register" that stages values to be pattern matched.
##### Regarding the first option
I recall seeing somebody somewhere make a comment that trying to represent function arguments as tuples caused tons of pain.
I can see why that would be the case, from an implementation standpoint.
We should have _values_, and not do fancy bookkeeping if we don't have to.
_Conceptually_, it makes a great deal of sense to think of tuples as being deeply the same as function invocation.
But _practically_, they are different things, especially with Rust underneath.
This feels like it cuts along the grain, and so this is what I will try.
I suspect that I'll end up specializing a lot around function arguments and calling, but that feels more tractable than the bookkeeping around stack-based tuples.
### 2024-12-17
Next thoughts: take some things systematically rather than choosing an approach first.
#### Things that always match
* Placeholder.
- I _think_ this is just a no-op. A `let` expression leaves its rhs pushed on the stack.
* Word: put something on the stack, and bind a name.
- This should follow the logic of locals as articulated in _Crafting Interpreters_.
In both of these cases, there's no conditional logic, simply a bind.
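A minimal sketch of what that bind-and-resolve logic might look like, assuming the _Crafting Interpreters_ approach where a binding's index in the compiler's list doubles as its stack slot. The names `Scope`, `bind`, `resolve`, `enter`, and `leave` are hypothetical, chosen for illustration rather than taken from the compiler:

```rust
/// Hypothetical compile-time record: a name and the scope depth it was bound at.
#[derive(Debug, Clone, PartialEq)]
struct Binding {
    name: String,
    depth: isize,
}

/// Hypothetical tracker. Assumption: a binding's index is also its stack slot.
#[derive(Debug, Default)]
struct Scope {
    bindings: Vec<Binding>,
    depth: isize,
}

impl Scope {
    /// Binding a word needs no conditional logic: the value is already on the stack.
    /// Returns the slot the name now refers to.
    fn bind(&mut self, name: &str) -> usize {
        self.bindings.push(Binding {
            name: name.to_string(),
            depth: self.depth,
        });
        self.bindings.len() - 1
    }

    /// Search from the newest binding backwards so inner names shadow outer ones.
    fn resolve(&self, name: &str) -> Option<usize> {
        self.bindings.iter().rposition(|b| b.name == name)
    }

    /// Entering a block deepens the scope.
    fn enter(&mut self) {
        self.depth += 1;
    }

    /// Leaving a block drops its bindings and reports how many `pop`s to emit.
    fn leave(&mut self) -> usize {
        self.depth -= 1;
        let before = self.bindings.len();
        while self.bindings.last().map_or(false, |b| b.depth > self.depth) {
            self.bindings.pop();
        }
        before - self.bindings.len()
    }
}
```

Returning `Option<usize>` from `resolve` makes the unbound-name case explicit, so the caller has to decide what to do before emitting an operand.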
#### Things that never bind
* Atomic values: put the rhs on the stack, then do an equality check, and panic if it fails. Leave the thing on the stack.
#### Analysis
In terms of bytecode, I think one thing to do, in the simple case, is to do the following:
* `push` a `pattern` onto the stack
* `match`--pops the pattern and the value off the stack, and then applies the pattern to the value. It leaves the value on the stack, and pushes a special value onto the stack representing a match, or not.
- We'll probably want `match-1`, `match-2`, `match-3`, etc., opcodes for matching a value that's that far back in the stack. E.g., `match-1` matches against not the top element, but the `top - 1` element.
- This is _specifically_ for matching function arguments and `loop` forms.
* There are a few different things we might do from here:
- `panic_if_no_match`: panic if the last thing is a `no_match`, or just keep going if not.
- `jump_if_no_match`: in a `match` form or a function, we'll want to move to the next clause if there's no match, so jump to the next clause's `pattern` `push` code.
* Compound patterns are going to be more complex.
- I think, for example, that we're going to need opcodes that work on our data structures: say, a `match_compound` opcode that starts digging into the pattern.
* Compound patterns are specifically _data structures_. So simple structures should be stack-allocated, and complex structures should be pointers to something on the heap. Maybe?
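The simple case above could be sketched roughly as follows. Everything here is an assumption made for illustration: `Value`, `Pattern`, and `Slot` are cut-down stand-ins for the real types, `match_at` plays the role of `match` (depth 0) and `match-1`, `match-2`, etc., and having `panic_if_no_match` pop the match result is a guess, not a decision.

```rust
/// Hypothetical stand-in for the real value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Number(f64),
    Keyword(&'static str),
}

/// Hypothetical reified patterns, simple cases only.
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Placeholder,        // always matches, binds nothing
    Word(&'static str), // always matches, binds a name
    Atom(Value),        // matches only an equal value, never binds
}

/// One stack slot: a value, a pattern, or the special result of a match.
#[derive(Debug, Clone, PartialEq)]
enum Slot {
    Value(Value),
    Pattern(Pattern),
    Match,
    NoMatch,
}

/// Pop the pattern, apply it to the value `depth` slots below the new top,
/// leave that value where it is, and push `Match` or `NoMatch`.
fn match_at(stack: &mut Vec<Slot>, depth: usize) -> Result<(), String> {
    let Some(Slot::Pattern(pattern)) = stack.pop() else {
        return Err("expected a pattern on top of the stack".to_string());
    };
    let idx = stack.len().checked_sub(depth + 1).ok_or("stack underflow")?;
    let Slot::Value(value) = &stack[idx] else {
        return Err("expected a value to match against".to_string());
    };
    let matched = match &pattern {
        Pattern::Placeholder | Pattern::Word(_) => true,
        Pattern::Atom(expected) => expected == value,
    };
    stack.push(if matched { Slot::Match } else { Slot::NoMatch });
    Ok(())
}

/// Pop the match result (an assumption) and fail on `NoMatch`.
fn panic_if_no_match(stack: &mut Vec<Slot>) -> Result<(), String> {
    match stack.pop() {
        Some(Slot::Match) => Ok(()),
        Some(Slot::NoMatch) => Err("no match".to_string()),
        _ => Err("expected a match result".to_string()),
    }
}
```

A `jump_if_no_match` would look the same as `panic_if_no_match`, except that it would advance the instruction pointer to the next clause instead of returning an error.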
#### A little note
For instructions that need more than 256 possibilities, we'll need to mush two `u8`s together into a `u16`. The one-liner for this is:
```rust
let number = ((first as u16) << 8) | second as u16;
```
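For the round trip, here is a small sketch of both directions. The helper names `split_u16` and `join_u16` are hypothetical, and putting the high byte first is an assumption that simply follows the shift in the one-liner:

```rust
/// Split a 16-bit operand into two bytes, high byte first.
/// (Byte order is an assumption; it just has to agree with `join_u16`.)
fn split_u16(number: u16) -> (u8, u8) {
    ((number >> 8) as u8, (number & 0xff) as u8)
}

/// Recombine the two bytes with the same expression as the one-liner above.
fn join_u16(first: u8, second: u8) -> u16 {
    ((first as u16) << 8) | second as u16
}
```

For example, `split_u16(300)` gives `(1, 44)`, since 300 = 1 * 256 + 44, and `join_u16(1, 44)` gives back `300`. The standard library's `u16::to_be_bytes` and `u16::from_be_bytes` do the same job.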
#### Oy, stacks and expressions
One thing that's giving me grief is when to pop, and when not to, on the value stack.
Consider


@ -7,35 +7,41 @@ use num_traits::FromPrimitive;
#[derive(Copy, Clone, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)] #[derive(Copy, Clone, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)]
pub enum Op { pub enum Op {
Return,
Constant, Constant,
Jump, Jump,
JumpIfFalse, JumpIfFalse,
Pop,
PushBinding,
Store,
Load,
} }
impl std::fmt::Display for Op { impl std::fmt::Display for Op {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
use Op::*; use Op::*;
match self { match self {
Return => write!(f, "return"),
Constant => write!(f, "constant"), Constant => write!(f, "constant"),
Jump => write!(f, "jump"), Jump => write!(f, "jump"),
JumpIfFalse => write!(f, "jump_if_false"), JumpIfFalse => write!(f, "jump_if_false"),
Pop => write!(f, "pop"),
PushBinding => write!(f, "push_binding"),
Store => write!(f, "store"),
Load => write!(f, "load"),
} }
} }
} }
#[derive(Clone, Debug, PartialEq)] #[derive(Clone, Debug, PartialEq)]
pub struct Local { pub struct Binding {
name: &'static str, name: &'static str,
depth: u8, depth: isize,
} }
#[derive(Clone, Debug, PartialEq)] #[derive(Clone, Debug, PartialEq)]
pub struct Chunk<'a> { pub struct Chunk<'a> {
pub locals: Vec<Local>, pub bindings: Vec<Binding>,
scope_depth: usize, scope_depth: isize,
local_count: usize, num_bindings: usize,
pub constants: Vec<Value>, pub constants: Vec<Value>,
pub bytecode: Vec<u8>, pub bytecode: Vec<u8>,
pub spans: Vec<SimpleSpan>, pub spans: Vec<SimpleSpan>,
@ -51,9 +57,9 @@ pub struct Chunk<'a> {
impl<'a> Chunk<'a> { impl<'a> Chunk<'a> {
pub fn new(ast: &'a Spanned<Ast>, name: &'static str, src: &'static str) -> Chunk<'a> { pub fn new(ast: &'a Spanned<Ast>, name: &'static str, src: &'static str) -> Chunk<'a> {
Chunk { Chunk {
locals: vec![], bindings: vec![],
scope_depth: 0, scope_depth: -1,
local_count: 0, num_bindings: 0,
constants: vec![], constants: vec![],
bytecode: vec![], bytecode: vec![],
spans: vec![], spans: vec![],
@ -107,6 +113,15 @@ impl<'a> Chunk<'a> {
self.spans.push(self.span); self.spans.push(self.span);
} }
fn bind(&mut self, name: &'static str) {
println!("binding {name} at depth {}", self.scope_depth);
self.bindings.push(Binding {
name,
depth: self.scope_depth,
});
println!("{:?}", self.bindings)
}
pub fn compile(&mut self) { pub fn compile(&mut self) {
use Ast::*; use Ast::*;
match self.ast { match self.ast {
@ -129,11 +144,22 @@ impl<'a> Chunk<'a> {
} }
Block(lines) => { Block(lines) => {
self.scope_depth += 1; self.scope_depth += 1;
println!("now entering scope level {}", self.scope_depth);
for expr in lines { for expr in lines {
self.visit(expr); self.visit(expr);
self.emit_op(Op::Pop);
} }
self.emit_op(Op::Return); self.emit_op(Op::Store);
self.scope_depth -= 1; self.scope_depth -= 1;
while let Some(binding) = self.bindings.last() {
if binding.depth > self.scope_depth {
self.emit_op(Op::Pop);
self.bindings.pop();
} else {
break;
}
}
self.emit_op(Op::Load);
} }
If(cond, then, r#else) => { If(cond, then, r#else) => {
self.visit(cond); self.visit(cond);
@ -151,12 +177,30 @@ impl<'a> Chunk<'a> {
self.bytecode[jif_idx + 1] = jif_offset as u8; self.bytecode[jif_idx + 1] = jif_offset as u8;
self.bytecode[jump_idx + 1] = jump_offset as u8; self.bytecode[jump_idx + 1] = jump_offset as u8;
} }
// Let(patt, expr) => { Let(patt, expr) => {
// self.visit(expr); println!("let binding!");
// self.visit(patt); self.visit(expr);
// } self.visit(patt);
// WordPattern(name) => {} }
// PlaceholderPattern => {} WordPattern(name) => {
self.bind(name);
}
Word(name) => {
println!("resolving binding {name}");
println!("current bindings {:?}", self.bindings);
self.emit_op(Op::PushBinding);
let biter = self.bindings.iter().enumerate().rev();
for (i, binding) in biter {
println!("at index {i}");
if binding.name == *name {
self.bytecode.push(i as u8);
break;
}
}
}
PlaceholderPattern => {
self.bind("_");
}
_ => todo!(), _ => todo!(),
} }
} }
@ -169,12 +213,16 @@ impl<'a> Chunk<'a> {
let op = Op::from_u8(*byte).unwrap(); let op = Op::from_u8(*byte).unwrap();
use Op::*; use Op::*;
match op { match op {
Return => println!("{i:04}: {op}"), Pop | Store | Load => println!("{i:04}: {op}"),
Constant => { Constant => {
let (_, next) = codes.next().unwrap(); let (_, next) = codes.next().unwrap();
let value = &self.constants[*next as usize].show(self); let value = &self.constants[*next as usize].show(self);
println!("{i:04}: {:16} {next:04}: {value}", op.to_string()); println!("{i:04}: {:16} {next:04}: {value}", op.to_string());
} }
PushBinding => {
let (_, next) = codes.next().unwrap();
println!("{i:04}: {:16} {next:04}", op.to_string());
}
Jump | JumpIfFalse => { Jump | JumpIfFalse => {
let (_, next) = codes.next().unwrap(); let (_, next) = codes.next().unwrap();
println!("{i:04}: {:16} {next:04}", op.to_string()) println!("{i:04}: {:16} {next:04}", op.to_string())


@ -1,5 +1,7 @@
use chumsky::{input::Stream, prelude::*}; use chumsky::{input::Stream, prelude::*};
mod memory_sandbox;
mod spans; mod spans;
mod lexer; mod lexer;
@ -52,7 +54,9 @@ pub fn run(src: &'static str) {
pub fn main() { pub fn main() {
let src = " let src = "
if false let foo = :let_foo
let bar = if true
then { then {
:foo :foo
:bar :bar
@ -63,6 +67,10 @@ if false
2 2
3 3
} }
foo
bar
"; ";
run(src); run(src);
} }


@ -4,6 +4,7 @@ use crate::spans::Spanned;
use crate::value::Value; use crate::value::Value;
use chumsky::prelude::SimpleSpan; use chumsky::prelude::SimpleSpan;
use num_traits::FromPrimitive; use num_traits::FromPrimitive;
use std::mem::swap;
#[derive(Debug, Clone, PartialEq)] #[derive(Debug, Clone, PartialEq)]
pub struct Panic { pub struct Panic {
@ -29,7 +30,7 @@ pub struct Vm<'a> {
pub stack: Vec<Value>, pub stack: Vec<Value>,
pub chunk: &'a Chunk<'a>, pub chunk: &'a Chunk<'a>,
pub ip: usize, pub ip: usize,
pub bindings: Vec<(u8, usize)>, pub return_register: Value,
} }
impl<'a> Vm<'a> { impl<'a> Vm<'a> {
@ -38,24 +39,28 @@ impl<'a> Vm<'a> {
chunk, chunk,
stack: vec![], stack: vec![],
ip: 0, ip: 0,
bindings: vec![], return_register: Value::Nil,
} }
} }
pub fn push(&mut self, value: Value) { pub fn push(&mut self, value: Value) {
println!("{:04} pushing {value:?}", self.ip);
self.stack.push(value); self.stack.push(value);
} }
pub fn pop(&mut self) -> Value { pub fn pop(&mut self) -> Value {
self.stack.pop().unwrap() let value = self.stack.pop().unwrap();
println!("{:04} popping {value:?}", self.ip);
value
} }
pub fn interpret(&mut self) -> Result<Value, Panic> { pub fn interpret(&mut self) -> Result<Value, Panic> {
let byte = self.chunk.bytecode[self.ip]; let Some(byte) = self.chunk.bytecode.get(self.ip) else {
let op = Op::from_u8(byte).unwrap(); return Ok(self.stack.pop().unwrap());
};
let op = Op::from_u8(*byte).unwrap();
use Op::*; use Op::*;
match op { match op {
Return => Ok(self.stack.pop().unwrap()),
Constant => { Constant => {
let const_idx = self.chunk.bytecode[self.ip + 1]; let const_idx = self.chunk.bytecode[self.ip + 1];
let value = self.chunk.constants[const_idx as usize].clone(); let value = self.chunk.constants[const_idx as usize].clone();
@ -70,7 +75,7 @@ impl<'a> Vm<'a> {
} }
JumpIfFalse => { JumpIfFalse => {
let jump_len = self.chunk.bytecode[self.ip + 1]; let jump_len = self.chunk.bytecode[self.ip + 1];
let cond = self.stack.pop().unwrap(); let cond = self.pop();
match cond { match cond {
Value::Nil | Value::False => { Value::Nil | Value::False => {
self.ip += jump_len as usize + 2; self.ip += jump_len as usize + 2;
@ -82,6 +87,31 @@ impl<'a> Vm<'a> {
} }
} }
} }
Pop => {
self.pop();
self.ip += 1;
self.interpret()
}
PushBinding => {
let binding_idx = self.chunk.bytecode[self.ip + 1] as usize;
let binding_value = self.stack[binding_idx].clone();
self.push(binding_value);
self.ip += 2;
self.interpret()
}
Store => {
self.return_register = self.pop();
self.push(Value::Nil);
self.ip += 1;
self.interpret()
}
Load => {
let mut value = Value::Nil;
swap(&mut self.return_register, &mut value);
self.push(value);
self.ip += 1;
self.interpret()
}
} }
} }
} }