working & thinking

Scott Richmond 2024-12-24 12:35:44 -05:00
parent a4f12c8f7d
commit ef0ac40dbe
5 changed files with 164 additions and 26 deletions


@ -132,3 +132,100 @@ That's probably the thing to do. Jesus, Scott.
And **another** thing worth internalizing: every single instruction that's not an explicit push or pop should leave the stack length unchanged.
So store and load need always to swap in a `nil`
### 2024-12-23
Compiling functions.
So I'm working through the functions chapter of _CI_, and there are a few things that I'm trying to wrap my head around.
First, I'm thinking that since we're not using raw pointers, we'll need some functional indirection to get our current byte.
So one of the hard things here is that, unlike with Lox, Ludus doesn't have fixed-arity functions. That means that the bindings for function calls can't be as dead simple as in Lox. More to the point, because we don't know everything statically, we'll need to do some dynamic magic.
The Bob Nystrom program uses three useful auxiliary constructs to make functions straightforward:
* `CallFrame`s, which know which function is being called, have their own instruction pointer, and hold an offset for the first stack slot that can be used by the function.
```c
typedef struct {
ObjFunction* function;
uint8_t* ip;
Value* slots;
} CallFrame;
```
Or the Rust equivalent:
```rust
struct CallFrame {
function: LFn,
ip: usize,
stack_root: usize,
}
```
* `Closure`s, which are actual objects that live alongside functions. They have a reference to a function and to an array of "upvalues"...
* `Upvalue`s, which are ways of pointing to values _below_ the `stack_root` of the call frame.
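A minimal sketch of the "functional indirection" for getting the current byte, assuming the chunk keeps its bytecode in a `Vec<u8>` and the frame's `ip` is an index into it. `read_byte` and the stripped-down `Chunk` and `LFn` here are hypothetical stand-ins, not the real Ludus types:

```rust
use std::rc::Rc;

// Hypothetical, stripped-down stand-ins for the real types.
struct Chunk {
    bytecode: Vec<u8>,
}

struct LFn {
    chunk: Chunk,
}

struct CallFrame {
    function: Rc<LFn>,
    ip: usize,         // index into function.chunk.bytecode, not a pointer
    stack_root: usize, // first stack slot this call may use
}

impl CallFrame {
    fn new(function: Rc<LFn>, stack_root: usize) -> Self {
        CallFrame { function, ip: 0, stack_root }
    }

    // The indirection: fetch the current byte through the frame,
    // then advance the instruction pointer. Returns None past the end.
    fn read_byte(&mut self) -> Option<u8> {
        let byte = self.function.chunk.bytecode.get(self.ip).copied()?;
        self.ip += 1;
        Some(byte)
    }
}
```

Returning `Option<u8>` is just one choice; a VM that trusts its compiler could index directly and panic on a bad `ip` instead.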
##### Digression: Prelude
I decided to skip the Prelude resolution in the compiler and only work with locals. But actually, closures, arguments, and the prelude are kind of the same problem: referring to values that aren't currently available on the stack.
We do, however, know at compile time whether a binding's target is on the stack, in a closure, or in the prelude.
That does require the function arguments to work in a different way, though.
The way to do this, I reckon, is this:
* Limit arguments (to, say, no more than 7).
* A `CallFrame` includes an arity field.
* It also includes an array of length 7.
* Each `match` operation in function arguments clones from the call frame, and the first instruction for any given body (i.e. once we've done the match) is to clear the arguments registers in the `CallFrame`, thus decrementing all the refcounts of all the heap-allocated objects.
* And the current strategy of scoping and popping in the current implementation of `match` will work just fine!
Meanwhile, we don't actually need upvalues, because bindings cannot change in Ludus. So instead of upvalues and their indirection, we can just emit a bunch of instructions that populate a `values` field on a closure. The compiler, meanwhile, will know how to emit instructions both to copy those values in *and* to compute the correct offsets for reading them back.
The only part I haven't figured out quite yet is how to encode access to what's stored in a closure.
Also, I'm not certain we need the indirection of a closure object in Ludus. The function object itself can do the work, no?
And the compiler knows which function it's closing over, and we can emit a bunch of instructions to close stuff over easily, after compiling the function and putting it in the constants table. The way to do this is to yank the value to the top of the stack using normal name resolution procedures, and then use a two-byte operand, `Op::Close` + index of the function in the constants table.
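Roughly what that could look like, as a sketch only. Assumptions: the opcodes are plain `u8` constants rather than the real `Op` enum, `OP_PUSH_LOCAL` stands in for "normal name resolution", the constants table is just a slice of functions, and `OP_CLOSE` pops the value it captures (which may or may not be what the real VM wants, given the push/pop rule above):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Number(f64),
}

#[derive(Debug, Default)]
struct LFn {
    closed: Vec<Value>,
}

impl LFn {
    fn close(&mut self, val: Value) {
        self.closed.push(val);
    }
}

// Hypothetical two-byte instructions: opcode + one operand byte.
const OP_PUSH_LOCAL: u8 = 0; // operand: stack slot to copy to the top
const OP_CLOSE: u8 = 1; // operand: index of the function in the constants table

fn run(code: &[u8], stack: &mut Vec<Value>, fns: &mut [LFn]) {
    let mut ip = 0;
    while ip + 1 < code.len() {
        let (op, operand) = (code[ip], code[ip + 1] as usize);
        ip += 2;
        match op {
            OP_PUSH_LOCAL => {
                // Yank the value to the top of the stack.
                let val = stack[operand].clone();
                stack.push(val);
            }
            OP_CLOSE => {
                // Move the top of the stack into the function's closed values.
                let val = stack.pop().expect("Close needs a value on the stack");
                fns[operand].close(val);
            }
            other => panic!("unknown opcode {}", other),
        }
    }
}
```

Because the compiler emits the `Close` instructions in a known order, a closed-over value's position in `closed` is known at compile time, which is one possible answer to how to encode access to it.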
##### End of digression.
And, because we know exactly what is bound in a given closure, we can actually emit instructions to close over a given value easily.
#### A small optimization
The lifetimes make things complicated, but I'm not sure that I would want to actually manage them manually, given how much they make my head hurt with Rust. I do get the sense that we will, at some point, need some lifetimes. A `Chunk` right now is chunky, with lots of owned `Vec`s.
Uncle Bob separates `Chunk`s and `Compiler`s, which, yes! But then we have a problem: all of the information to climb back to source code is in the `Compiler` and not in the `Chunk`. How to manage that encoding?
(Also the keyword and string intern tables should be global, and not only in a single compiler, since we're about to get nested compilers...)
### 2024-12-24
Other interesting optimizations abound:
* `add`, `sub`, `inc`, `dec`, `type`, and other extremely frequently used, simple functions can be compiled directly to built-in opcodes. We still need functions for them, with the same arities, for higher order function use.
- The special-case logic is in the `Synthetic` compiler branch, rather than anywhere else.
- It's probably best to disallow re-binding these names anywhere _except_ Prelude, where we'll want them shadowed.
- We can enforce this in `Validator` rather than `Compiler`.
* `or` and `and` are likewise built-in, but because they don't evaluate their arguments eagerly, they're a different special case: a series of eval, `jump_if_false`, eval, `jump_if_false` instructions.
* More to the point, the difference between `or` and `and` here and the built-ins is that `or` and `and` are variadic, where I was originally thinking about `and` and co. as fixed-arity, with variadic behaviours defined by a shadowing/backing Ludus function. That isn't necessary, I don't think.
* Meanwhile, `and` and `or` will also, of necessity, have backing shadowing functions.
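A sketch of the eval / `jump_if_false` chain for a variadic `and`, with back-patched jump offsets. Everything here is made up for illustration: the opcodes are bare `u8` constants, each argument is assumed to be already compiled to its own byte sequence, `JUMP_IF_FALSE` is assumed to peek rather than pop, and at least one argument is assumed:

```rust
const PUSH_TRUE: u8 = 0;
const PUSH_FALSE: u8 = 1;
const JUMP_IF_FALSE: u8 = 2; // operand: forward offset; peeks, does not pop
const POP: u8 = 3;

// Compile a variadic `and` from already-compiled argument bytecode.
fn compile_and(args: &[Vec<u8>]) -> Vec<u8> {
    let mut code = Vec::new();
    let mut jumps = Vec::new(); // positions of operands to back-patch
    for (i, arg) in args.iter().enumerate() {
        code.extend_from_slice(arg);
        if i + 1 < args.len() {
            code.push(JUMP_IF_FALSE);
            jumps.push(code.len());
            code.push(0); // placeholder offset
            code.push(POP); // discard the truthy value before the next arg
        }
    }
    let end = code.len();
    for idx in jumps {
        // Offset is measured from the byte after the operand.
        code[idx] = (end - (idx + 1)) as u8;
    }
    code
}

// Tiny interpreter: returns the final stack and how many instructions ran.
fn run(code: &[u8]) -> (Vec<bool>, usize) {
    let (mut stack, mut ip, mut executed) = (Vec::new(), 0, 0);
    while ip < code.len() {
        executed += 1;
        match code[ip] {
            PUSH_TRUE => { stack.push(true); ip += 1; }
            PUSH_FALSE => { stack.push(false); ip += 1; }
            POP => { stack.pop(); ip += 1; }
            JUMP_IF_FALSE => {
                let offset = code[ip + 1] as usize;
                ip += 2;
                if stack.last() == Some(&false) { ip += offset; }
            }
            other => panic!("unknown opcode {}", other),
        }
    }
    (stack, executed)
}
```

The instruction count is only there to make the short-circuiting visible: once an argument comes up false, none of the later arguments run and the false value is what's left on the stack.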
#### More on CallFrames and arg passing
* We don't actually need the arguments register! I was complicating things. The stack between the `stack_root` and the top will be _exactly_ the same as an arguments register would have been in my imagination. So we can determine the number of arguments passed in with `stack.len() - stack_root`, and we can access argument positions with `stack_root + n`, since the first argument is at `stack_root`.
- This has the added benefit of not having to do any dances to keep the refcount of any heap-allocated objects as low as possible. No extra `Clone`s here.
* In addition, we need two `check_arity` ops: one for fixed-arity clauses, and one for clauses with splatterns. Easily enough done. Remember: opcodes are for special cases!
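Sketching that out, assuming a bare-bones `CallFrame` that only carries `stack_root` and a stack of any value type. The method names (`arg_count`, `arg`, `check_arity`, `check_splat_arity`) are made up here, and all of this only holds at the moment of the call, before the body pushes anything else:

```rust
struct CallFrame {
    stack_root: usize,
}

impl CallFrame {
    // Number of arguments passed: everything from stack_root to the top.
    // Assumes the invariant stack_root <= stack.len().
    fn arg_count<T>(&self, stack: &[T]) -> usize {
        stack.len() - self.stack_root
    }

    // The nth argument (0-based) lives at stack_root + n.
    fn arg<'s, T>(&self, stack: &'s [T], n: usize) -> Option<&'s T> {
        stack.get(self.stack_root + n)
    }

    // Fixed-arity clause: the count must match exactly.
    fn check_arity<T>(&self, stack: &[T], arity: usize) -> bool {
        self.arg_count(stack) == arity
    }

    // Clause with a splattern: at least `min` arguments are required.
    fn check_splat_arity<T>(&self, stack: &[T], min: usize) -> bool {
        self.arg_count(stack) >= min
    }
}
```

Nothing is cloned to answer any of these questions, which is the refcount benefit noted above.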
#### Tail calls
* The way to implement tail calls is actually now really straightforward! The idea is to simply have a `TailCall` rather than a `Call` opcode. In place of creating a new stack frame and pushing it to the call stack on top of the old call frame, you pop the old call frame, then push the new one to the call stack.
* That does mean the `Compiler` will need to keep track of tail calls. This should be pretty straightforward, actually, and the logic is already there in `Validator`.
* The thing here is that the new stack frame simply requires the same return location as the old one it's replacing.
* That reminds me that there's an issue in terms of keeping track of not just the IP, but the chunk. In Lox, the IP is a pointer to a `u8`, which works great in C. But in Rust, we can't use a raw pointer like that; instead we use an index into a `Vec<u8>`. Which means the return location needs both a chunk and an index, not just a `u8` pointer:
```rust
struct StackFrame<'a> {
function: LFn,
stack_root: usize,
return_to: (&'a Chunk, usize), // `return` is a reserved word, so it can't be a field name
}
```
(I hate that there's a lifetime here.)
This gives us a way to access everything we need: where to return to, the root of the stack, the chunk (function->chunk), the closures (function->closures).
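A sketch of the `Call` vs. `TailCall` difference on the call stack. To dodge the lifetime, this version swaps the `&Chunk` for a plain chunk index (an assumption, not a decision), and a `name` string stands in for the function; `call` and `tail_call` are hypothetical helpers:

```rust
#[derive(Debug, Clone, PartialEq)]
struct StackFrame {
    name: &'static str,        // stands in for `function: LFn`
    stack_root: usize,
    return_to: (usize, usize), // (chunk index, ip) instead of (&Chunk, usize)
}

// Call: push a new frame on top of the caller's frame.
fn call(
    frames: &mut Vec<StackFrame>,
    name: &'static str,
    stack_root: usize,
    return_to: (usize, usize),
) {
    frames.push(StackFrame { name, stack_root, return_to });
}

// TailCall: pop the old frame, then push the new one,
// which inherits the old frame's return location.
fn tail_call(frames: &mut Vec<StackFrame>, name: &'static str, stack_root: usize) {
    let old = frames.pop().expect("tail call with an empty call stack");
    frames.push(StackFrame { name, stack_root, return_to: old.return_to });
}
```

However many tail calls happen in a row, the call stack stays the same depth, and the last callee returns straight to whoever made the original non-tail call.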


@ -4,6 +4,7 @@ use crate::value::*;
use chumsky::prelude::SimpleSpan; use chumsky::prelude::SimpleSpan;
use num_derive::{FromPrimitive, ToPrimitive}; use num_derive::{FromPrimitive, ToPrimitive};
use num_traits::FromPrimitive; use num_traits::FromPrimitive;
use std::rc::Rc;
#[derive(Copy, Clone, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)] #[derive(Copy, Clone, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)]
pub enum Op { pub enum Op {
@ -98,7 +99,7 @@ fn is_binding(expr: &Spanned<Ast>) -> bool {
use Ast::*; use Ast::*;
match ast { match ast {
Let(..) | LBox(..) => true, Let(..) | LBox(..) => true,
Fn(name, ..) => *name != "*anon", Fn(name, ..) => !name.is_empty(),
_ => false, _ => false,
} }
} }
@ -176,7 +177,7 @@ impl Chunk {
self.spans.push(self.span); self.spans.push(self.span);
self.bytecode.push(constant_index as u8); self.bytecode.push(constant_index as u8);
self.spans.push(self.span); self.spans.push(self.span);
self.bind("*constant"); self.bind("");
} }
fn emit_op(&mut self, op: Op) { fn emit_op(&mut self, op: Op) {
@ -278,19 +279,19 @@ impl Chunk {
} }
PlaceholderPattern => { PlaceholderPattern => {
self.emit_op(Op::MatchWord); self.emit_op(Op::MatchWord);
self.bind("_"); self.bind("");
} }
NilPattern => { NilPattern => {
self.emit_op(Op::MatchNil); self.emit_op(Op::MatchNil);
self.bind("nil"); self.bind("");
} }
BooleanPattern(b) => { BooleanPattern(b) => {
if *b { if *b {
self.emit_op(Op::MatchTrue); self.emit_op(Op::MatchTrue);
self.bind("true"); self.bind("");
} else { } else {
self.emit_op(Op::MatchFalse); self.emit_op(Op::MatchFalse);
self.bind("false"); self.bind("");
} }
} }
NumberPattern(n) => { NumberPattern(n) => {
@ -373,6 +374,7 @@ impl Chunk {
} }
_ => unreachable!(), _ => unreachable!(),
} }
// TODO: implement longer synthetic expressions
for term in rest { for term in rest {
todo!() todo!()
} }
@ -398,10 +400,10 @@ impl Chunk {
} }
} }
Match(scrutinee, clauses) => { Match(scrutinee, clauses) => {
dbg!(&scrutinee);
self.visit(scrutinee.as_ref()); self.visit(scrutinee.as_ref());
let mut jump_idxes = vec![]; let mut jump_idxes = vec![];
let mut clauses = clauses.iter(); let mut clauses = clauses.iter();
// TODO: add guard checking
while let Some((MatchClause(pattern, _, body), _)) = clauses.next() { while let Some((MatchClause(pattern, _, body), _)) = clauses.next() {
self.scope_depth += 1; self.scope_depth += 1;
self.visit(pattern); self.visit(pattern);
@ -431,8 +433,27 @@ impl Chunk {
self.bytecode[idx] = self.bytecode.len() as u8 - idx as u8 + 2; self.bytecode[idx] = self.bytecode.len() as u8 - idx as u8 + 2;
} }
} }
Fn(lfn) => { Fn(name, body, doc) => {
let fn_chunk = Chunk::new() let mut chunk = Chunk::new(body, self.name, self.src);
chunk.compile();
if crate::DEBUG_COMPILE {
println!("==function: {name}==");
chunk.disassemble();
}
let lfn = crate::value::LFn {
name,
doc: *doc,
chunk,
};
let fn_val = Value::Fn(Rc::new(lfn));
self.emit_constant(fn_val);
self.bind(name);
}
FnDeclaration(name) => {
todo!()
}
FnBody(clauses) => {
self.emit_op(Op::ResetMatch);
} }
_ => todo!(), _ => todo!(),
} }


@ -89,7 +89,8 @@ pub enum Ast {
Box<Option<Spanned<Self>>>, Box<Option<Spanned<Self>>>,
Box<Spanned<Self>>, Box<Spanned<Self>>,
), ),
Fn(LFn), Fn(&'static str, Box<Spanned<Ast>>, Option<&'static str>),
FnBody(Vec<Spanned<Ast>>),
FnDeclaration(&'static str), FnDeclaration(&'static str),
Panic(Box<Spanned<Self>>), Panic(Box<Spanned<Self>>),
Do(Vec<Spanned<Self>>), Do(Vec<Spanned<Self>>),
@ -211,11 +212,10 @@ impl fmt::Display for Ast {
.join("\n") .join("\n")
) )
} }
Fn(name, clauses, _) => { FnBody(clauses) => {
write!( write!(
f, f,
"fn: {}\n{}", "{}",
name,
clauses clauses
.iter() .iter()
.map(|clause| clause.0.to_string()) .map(|clause| clause.0.to_string())
@ -223,6 +223,9 @@ impl fmt::Display for Ast {
.join("\n") .join("\n")
) )
} }
Fn(name, body, ..) => {
write!(f, "fn: {name}\n{}", body.0)
}
FnDeclaration(_name) => todo!(), FnDeclaration(_name) => todo!(),
Panic(_expr) => todo!(), Panic(_expr) => todo!(),
Do(terms) => { Do(terms) => {
@ -928,7 +931,12 @@ where
let lambda = just(Token::Reserved("fn")) let lambda = just(Token::Reserved("fn"))
.ignore_then(fn_unguarded.clone()) .ignore_then(fn_unguarded.clone())
.map_with(|clause, e| (Fn("*anon", vec![clause], None), e.span())); .map_with(|clause, e| {
(
Fn("", Box::new((Ast::FnBody(vec![clause]), e.span())), None),
e.span(),
)
});
let fn_clauses = fn_clause let fn_clauses = fn_clause
.clone() .clone()
@ -1016,7 +1024,10 @@ where
} else { } else {
unreachable!() unreachable!()
}; };
(Fn(name, vec![clause], None), e.span()) (
Fn(name, Box::new((Ast::FnBody(vec![clause]), e.span())), None),
e.span(),
)
}); });
let docstr = select! {Token::String(s) => s}; let docstr = select! {Token::String(s) => s};
@ -1038,7 +1049,10 @@ where
} else { } else {
unreachable!() unreachable!()
}; };
(Fn(name, clauses, docstr), e.span()) (
Fn(name, Box::new((Ast::FnBody(clauses), e.span())), docstr),
e.span(),
)
}); });
let fn_ = fn_named.or(fn_compound).or(fn_decl); let fn_ = fn_named.or(fn_compound).or(fn_decl);


@ -362,7 +362,8 @@ impl<'a> Validator<'a> {
self.declare_fn(name.to_string()); self.declare_fn(name.to_string());
self.status.tail_position = tailpos; self.status.tail_position = tailpos;
} }
Fn(name, clauses, ..) => { FnBody(..) => unreachable!(),
Fn(name, body, ..) => {
let mut is_declared = false; let mut is_declared = false;
match self.bound(name) { match self.bound(name) {
Some((_, _, FnInfo::Declared)) => is_declared = true, Some((_, _, FnInfo::Declared)) => is_declared = true,
@ -380,8 +381,12 @@ impl<'a> Validator<'a> {
let from = self.status.used_bindings.len(); let from = self.status.used_bindings.len();
let mut arities = HashSet::new(); let mut arities = HashSet::new();
let (Ast::FnBody(clauses), _) = body.as_ref() else {
unreachable!()
};
for clause in clauses { for clause in clauses {
// TODO: validate all parts of clauses // we have to do this explicitly here because of arity checking
let (expr, span) = clause; let (expr, span) = clause;
self.ast = expr; self.ast = expr;
self.span = *span; self.span = *span;
@ -390,12 +395,7 @@ impl<'a> Validator<'a> {
self.validate(); self.validate();
} }
// this should be right // collect info about what the function closes over
// we can't bind anything that's already bound,
// even in arg names
// so anything that is already bound and used
// will, of necessity, be closed over
// we don't want to try to close over locals in functions
let mut closed_over = HashSet::new(); let mut closed_over = HashSet::new();
for binding in self.status.used_bindings.iter().skip(from) { for binding in self.status.used_bindings.iter().skip(from) {
if self.bound(binding.as_str()).is_some() { if self.bound(binding.as_str()).is_some() {


@ -8,13 +8,19 @@ use std::rc::Rc;
#[derive(Clone, Debug, PartialEq)] #[derive(Clone, Debug, PartialEq)]
pub struct LFn { pub struct LFn {
pub name: &'static str, pub name: &'static str,
pub body: Vec<Spanned<Ast>>, pub doc: Option<&'static str>,
// pub doc: Option<&'static str>,
// pub enclosing: Vec<(usize, Value)>, // pub enclosing: Vec<(usize, Value)>,
// pub has_run: bool, // pub has_run: bool,
// pub input: &'static str, // pub input: &'static str,
// pub src: &'static str, // pub src: &'static str,
pub chunk: Chunk, pub chunk: Chunk,
pub closed: Vec<Value>,
}
impl LFn {
pub fn close(&mut self, val: Value) {
self.closed.push(val);
}
} }
#[derive(Clone, Debug, PartialEq)] #[derive(Clone, Debug, PartialEq)]