From 6ded94f7d0bd098b36b51fd1d397314a00d848b4 Mon Sep 17 00:00:00 2001
From: Scott Richmond
Date: Fri, 6 Jun 2025 00:06:23 -0400
Subject: [PATCH] oops: put working document under version control

---
 may_2025_thoughts.md | 272 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 may_2025_thoughts.md

diff --git a/may_2025_thoughts.md b/may_2025_thoughts.md
new file mode 100644
index 0000000..3facff0
--- /dev/null
+++ b/may_2025_thoughts.md
@@ -0,0 +1,272 @@
+# Catching back up
+## May 2025
+
+### Bugs
+
+#### `match` is not popping things correctly
+```
+=== source code ===
+
+let foo = match :foo with {
+    :foo -> 1
+    :bar -> 2
+    :baz -> 3
+}
+foo
+
+=== chunk: test ===
+IDX | CODE | INFO
+0000: reset_match
+0001: constant 0000: :foo
+0003: match_constant 0000: :foo
+0005: jump_if_no_match 0006
+0007: constant 0001: 1
+0009: store
+0010: pop
+0011: jump 0023
+0013: match_constant 0002: :bar
+0015: jump_if_no_match 0006
+0017: constant 0003: 2
+0019: store
+0020: pop
+0021: jump 0013
+0023: match_constant 0004: :baz
+0025: jump_if_no_match 0006
+0027: constant 0005: 3
+0029: store
+0030: pop
+0031: jump 0003
+0033: panic_no_match
+0034: load
+0035: match_word
+0036: panic_if_no_match
+0037: push_binding 0000
+0039: store
+0040: pop
+0041: pop
+0042: load
+
+
+
+=== vm run: test ===
+0000: [] nil
+0000: reset_match
+0001: [] nil
+0001: constant 0000: :foo
+0003: [:10] nil
+0003: match_constant 0000: :foo
+0005: [:10] nil
+0005: jump_if_no_match 0006
+0007: [:10] nil
+0007: constant 0001: 1
+0009: [:10|1] nil
+0009: store
+0010: [:10|nil] 1
+0010: pop
+0011: [:10] 1
+0011: jump 0023
+0036: [:10] 1
+0036: panic_if_no_match
+0037: [:10] 1 <== Should "return" from match here
+0037: push_binding 0000
+0039: [:10|:10] 1
+0039: store
+0040: [:10|nil] :10
+0040: pop
+0041: [:10] :10
+0041: pop
+0042: [] :10
+0042: load
+:foo
+```
+Should return `1`.
+
+Instruction `0037` is where it goes off the rails.
+
+### Things left undone
+Many. But things that were actively under development and in a state of unfinishedness:
+
+1. Tuple patterns
+2. Loops
+3. Function calls
+
+#### Tuple patterns
+This is the blocking issue for loops, function calls, etc.
+You need tuple pattern matching to get proper looping and function calls.
+Here are some of the issues I'm having:
+* Is it possible to represent tuples on the stack? Right now they're allocated on the heap, which isn't great for function calls.
+* How to represent complex patterns? There are a few possibilities:
+    - Hard-coded into the bytecode (this is probably the thing to do?)
+    - Represented as a data structure, which itself would have to be allocated on the heap
+    - Some hybrid of the two:
+        * Easy scalar values are hard-coded: `nil`, `true`, `:foo`, `10` are all built into the bytecode + constants table
+        * Perhaps dict patterns will have to be data structures
+
+#### Patterns, generally
+On reflection, I think the easiest (perhaps not simplest!) way to go is to model the patterns as separate datatypes stored per-chunk in a vec.
+The idea is that we push a value onto the stack, and then have a `match` instruction that takes an index into the pattern vec.
+We don't even need to store all pattern types in that vec: constants (which already get stored in the constant vec), interpolations, and compound patterns.
+`nil`, `false`, etc. are singletons and can be handled like (but not exactly as) the `nil` and `false` _values_.
+This also means we can outsource the pattern matching mechanics to Rust, which means we don't have to fuss with "efficient compiling of pattern matching" titchiness.
+This also has the benefit, while probably being fast _enough_, of reflecting the conceptual domain of Ludus, in which patterns and values are different DSLs within the language.
+So: model it that way.
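A minimal sketch of that model, in Rust. Every name here (`Value`, `Pattern`, `Chunk`, `match_at`) is hypothetical and stands in for whatever the real VM uses. The assumptions: patterns live in a per-chunk vec, constant patterns only hold an index into the constants vec, and a `match` instruction would pass its operand to something like `match_at`.

```rust
// Hypothetical types for illustration only; not the actual Ludus VM types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
    Keyword(&'static str),
    Tuple(Vec<Value>),
}

#[derive(Debug, Clone)]
enum Pattern {
    Placeholder,         // matches anything, binds nothing
    Word(&'static str),  // matches anything, binds the value to a name
    Constant(usize),     // index into the chunk's constants vec
    Tuple(Vec<Pattern>), // compound: same length, members match pairwise
}

struct Chunk {
    constants: Vec<Value>,
    patterns: Vec<Pattern>,
}

impl Chunk {
    /// What a `match` instruction with operand `idx` might do:
    /// return the bindings on success, `None` on no match.
    fn match_at(&self, idx: usize, value: &Value) -> Option<Vec<(&'static str, Value)>> {
        let mut bindings = Vec::new();
        if self.matches(&self.patterns[idx], value, &mut bindings) {
            Some(bindings)
        } else {
            None
        }
    }

    fn matches(
        &self,
        pattern: &Pattern,
        value: &Value,
        bindings: &mut Vec<(&'static str, Value)>,
    ) -> bool {
        match (pattern, value) {
            (Pattern::Placeholder, _) => true,
            (Pattern::Word(name), v) => {
                bindings.push((*name, v.clone()));
                true
            }
            (Pattern::Constant(i), v) => &self.constants[*i] == v,
            (Pattern::Tuple(ps), Value::Tuple(vs)) => {
                ps.len() == vs.len()
                    && ps.iter().zip(vs).all(|(p, v)| self.matches(p, v, bindings))
            }
            (Pattern::Tuple(_), _) => false,
        }
    }
}
```

The matching itself is plain recursive Rust, which is the "outsource the mechanics to Rust" idea: the bytecode only needs the index.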
+
+### Now that we've got tuple patterns
+#### May 23, 2025
+A few thoughts:
+* Patterns aren't _things_, they're complex conditional forms. I had not really learned that; note that the "compiling pattern matching efficiently" paper is about how to optimize those conditionals. The tuple pattern compilation more closely resembles an `if` or `when` form.
+* Tuple patterns break the nice stack-based semantics of binding. So do other patterns. That means I had to separate out bindings and the stack. I did this by introducing a representation of the stack into the compiler (it's just a stack-depth counter).
+    - This ended up being rather titchy. I think there's a lot of room to simplify things by avoiding manipulating this counter directly. My sense is that I should probably move a lot of the `emit_op` calls into methods that ensure all the bookkeeping happens automatically.
+* `when` is much closer to `if` than `match`; remember that!
+* Function calls should be different from tuple pattern matching. Tuples are currently (and maybe forever?) allocated on the heap. Function calls should *not* have to pass through the heap. The good news: `Arguments` is already a different AST node type than `Tuple`; we'll want an `ArgumentsPattern` pattern node type that's different from (and thus compiled differently than) `TuplePattern`. They'll be similar--the matching logic is the same, after all--but the arguments will be on the stack already, and won't need to be unpacked in the same way.
+    - One difficulty will be matching against different arities? But actually, we should compile these arities as different functions.
+    - Given splats, can we actually compile functions into different arities? Consider the following:
+        ```
+        fn foo {
+            (x) -> & arity 1
+            (y, z) -> & arity 2
+            (x, y, 2) -> & arity 3
+            (...z) -> & arity 0+
+        }
+        ```
+        `foo(1, 2, 2)` and `foo(1, 2, 4)` would invoke different "arities" of the function, 3 and 0+, respectively.
I suspect the simpler thing really is to just compile each function as a singleton, and just keep track of the number of arguments you're matching.
+* Before we get to function calls, `loop`/`recur` make sense as the starting ground. I had started that before, and there's some code in those branches of the compiler. But I ran into tuple pattern matching. That's now done, although actually, the `loop`/`recur` situation probably needs a rewrite from the ground up.
+* Just remember: we're not aiming for *fast*, we're aiming for *fast enough*. And I don't have a ton of time. So the thing to do is to be as little clever as possible.
+* I suspect the dominoes will fall reasonably quickly from here through the following:
+    - [x] `list` and `dict` patterns
+    - [x] updating `repeat`
+    - [x] standing up `loop`/`recur`
+    - [x] standing up functions
+    - [x] more complex synthetic expressions
+    - [x] `do` expressions
+* That will get me a lot of the way there. What's left after that which might be challenging?
+    - [x] string interpolation
+    - [x] splats
+    - [ ] splatterns
+    - [x] string patterns
+    - [x] partial application
+    - [ ] tail calls
+    - [ ] stack traces in panics
+    - [ ] actually good lexing, parsing, and validation errors. I got some of the way there in the fall, but everything needs to be "good enough."
+* After that, we're in integration hell: taking this thing and putting it together for Computer Class 1. Other things that I want (e.g., `test` forms) are for later on.
+* There's then a whole host of things I'll need to get done for CC2:
+    - some kind of actual parsing strategy (that's good enough for "Dissociated Press"/Markov chains)
+    - actors
+    - animation in the frontend
+* In addition to a lot of this, I think I need some kind of testing solution. The Janet interpreter is pretty well-behaved.
+
+### Now that we've got some additional opcodes and loop/recur working
+#### 2025-05-27
+The `loop` compilation is _almost_ the same as a function body.
That said, the thing that's different is that we don't know the arity of the function that's called.
+
+A few possibilities:
+* Probably the best option: enforce a new requirement that splat patterns in function clauses *must* be longer than any explicit arity of the function. So, taking the above:
+    ```
+    fn foo {
+        (x) -> & arity 1
+        (y, z) -> & arity 2
+        (x, y, 2) -> & arity 3
+        (...z) -> & arity 0+
+    }
+    ```
+This would give you a validation error that splats must be longer than any other arity.
+Similarly, we could enforce this:
+```
+fn foo {
+    (x) -> & arity 1
+    (x, y) -> & arity 2
+    (x, ...) & arity n > 1; error! too short
+    (x, y, ...) & arity n > 2; ok!
+}
+```
+The algorithm for compiling functions ends up being a little bit titchy, because we'll have to store multiple functions (i.e. chunks) with different arities.
+Each arity gets a different chunk.
+And the call function opcode comes with a second argument that specifies the number of arguments, 0 to 7.
+(That means we only get a max of 7 arguments, unless I decide to give the call opcode two bytes, and make the call/return register much bigger.)
+
+Todo, then:
+* [x] reduce the call/return register to 7
+* [x] implement single-arity compilation
+* [x] implement single-arity calls, but with two bytes
+* [x] compile multiple-arity functions
+* [x] add closures
+
+### Some thoughts while chattin w/ MNL
+On `or` and `and`: these should be reserved words with special casing in the parser.
+You can't pass them as functions, because that would change their semantics.
+So they *look* like functions, but they don't *behave* like functions.
+In Clojure, if you try `(def foo or)`, you get an error that you "can't take the value of a macro."
+I'll need to change Ludus so that `or` and `and` expressions actually generate different AST nodes, and then compile them from there.
+
+AND: done.
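To make the semantic point about `or` and `and` concrete, here is a small tree-walking sketch in Rust, not the bytecode compiler itself. The names (`Ast`, `eval`, `truthy`) and the truthiness rule (only `nil` and `false` are falsy) are assumptions for illustration. The point is that dedicated `And`/`Or` nodes stop evaluating as soon as the answer is known, which an ordinary function call, with its eagerly evaluated arguments, could not do.

```rust
// Hypothetical AST and evaluator; names and truthiness rule are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
}

enum Ast {
    Literal(Value),
    Panic, // stands in for any expression we do not want evaluated needlessly
    And(Vec<Ast>),
    Or(Vec<Ast>),
}

// Assumed rule: everything except `nil` and `false` counts as true.
fn truthy(v: &Value) -> bool {
    !matches!(v, Value::Nil | Value::Bool(false))
}

fn eval(ast: &Ast) -> Result<Value, String> {
    match ast {
        Ast::Literal(v) => Ok(v.clone()),
        Ast::Panic => Err("panic!".to_string()),
        // `and` stops at the first falsy argument.
        Ast::And(args) => {
            let mut last = Value::Bool(true);
            for arg in args {
                last = eval(arg)?;
                if !truthy(&last) {
                    break;
                }
            }
            Ok(last)
        }
        // `or` stops at the first truthy argument.
        Ast::Or(args) => {
            let mut last = Value::Bool(false);
            for arg in args {
                last = eval(arg)?;
                if truthy(&last) {
                    break;
                }
            }
            Ok(last)
        }
    }
}
```

In a bytecode compiler the same shape would presumably become conditional jumps over the remaining arguments.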
+
+### Implementing functions & calls, finally
+#### 2025-06-01
+Just to explain where I am to myself:
+* I have a rough draft (probably not yet fully functional) of function compilation in the compiler.
+* Now I have to implement two op codes: `Call` and `Return`.
+* I now need to implement call frames.
+* A frame in _Crafting Interpreters_ has: a pointer to a Lox function object, an ip, and an index into the value stack that indicates the stack bottom for this function call. Taking each of these in turn:
+* The lifetime for the pointer to the function:
+    - The pointer to the function object cannot, I think, be an explicit lifetime reference, since I don't think I know how to prove to the Rust borrow checker that the function will live long enough, especially since it's inside an `Rc`.
+    - That suggests that I actually need the whole `Value::Fn` struct and not just the inner `LFn` struct, so I can borrow it.
+* The ip and stack bottom are just placeholders and don't change.
+
+### Partially applied functions
+#### 2025-06-05
+Partially applied functions are a little complicated, because they involve both the compiler and the VM.
+Maybe.
+My sense is that, perhaps, the way to do them is actually to create a different value branch.
+They ....
+
+### Jumping! Numbers! And giving up the grind
+#### 2025-06-05
+Ok! So.
+This won't be ready for next week.
+That's clear enough now--even though I've made swell progress!
+One thing I just discovered, which, well, it feels silly I haven't found this before.
+Jump instructions, all of them, need 16 bits, not 8.
+
+That means some fancy bit shifting, and likely some refactoring of the compiler & vm to make them easier to work with.
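One way the refactoring might look, as a hedged sketch: emit every jump with a two-byte placeholder, then back-patch it once the target is known. `Compiler`, `emit_jump`, `patch_jump`, `read_u16`, and the `JUMP` opcode byte are all hypothetical names, and the sketch assumes jump operands are forward offsets stored big-endian, which may not match the real encoding.

```rust
use std::convert::TryFrom;

// Hypothetical opcode byte and compiler struct, for illustration only.
const JUMP: u8 = 0x01;

struct Compiler {
    code: Vec<u8>,
}

impl Compiler {
    /// Emit a jump with a two-byte placeholder operand.
    /// Returns the index of the operand's first byte so it can be patched later.
    fn emit_jump(&mut self, op: u8) -> usize {
        self.code.push(op);
        self.code.push(0xff);
        self.code.push(0xff);
        self.code.len() - 2
    }

    /// Back-patch the operand so the jump lands at the current end of the code.
    /// The offset is counted from the byte just after the operand.
    fn patch_jump(&mut self, operand_idx: usize) {
        let offset = self.code.len() - (operand_idx + 2);
        let offset = u16::try_from(offset).expect("jump too long for a 16-bit operand");
        let [high, low] = offset.to_be_bytes();
        self.code[operand_idx] = high;
        self.code[operand_idx + 1] = low;
    }
}

/// How the VM side might read the operand back out of the bytecode.
fn read_u16(code: &[u8], ip: usize) -> u16 {
    u16::from_be_bytes([code[ip], code[ip + 1]])
}
```

Keeping the byte splitting inside `patch_jump` and `read_u16` means the rest of the compiler and VM never touch the shifts directly.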
+
+For reference, here's the algorithm for munging u8s and u16s:
+
+```rust
+let a: u16 = 14261;
+let b_high: u8 = (a >> 8) as u8; // high byte: 55
+let b_low: u8 = a as u8; // low byte: 181 (the cast truncates to the low 8 bits)
+let c: u16 = ((b_high as u16) << 8) + b_low as u16; // reassembled: 14261
+println!("{a} // {b_high}/{b_low} // {c}");
+```
+
+To reiterate the punch list that *I would have needed for Computer Class 1*:
+* [ ] jump instructions need 16 bits of operand
+* [ ] splatterns
+    - [ ] validator should ensure splatterns are the longest patterns in a form
+* [ ] add guards to loop forms
+* [ ] check loop forms against function calls: do they still work the way we want them to?
+* [ ] tail call elimination
+* [ ] stack traces in panics
+* [ ] actually good error messages
+    - [ ] parsing
+    - [ ] my memory is that validator messages are already good?
+    - [ ] panics, esp. no match panics
+* [ ] getting to prelude
+    - [ ] `base` should load into Prelude
+    - [ ] prelude should run properly
+    - [ ] prelude should be loaded into every context
+* [ ] packaging things up
+    - [ ] add a `to_json` method for values
+    - [ ] teach Rudus to speak our protocols (stdout and turtle graphics)
+    - [ ] there should be a Rust function that takes Ludus source and returns valid Ludus status json
+    - [ ] compile Rust to WASM
+    - [ ] wire Rust-based WASM into JS
+    - [ ] FINALLY, test Rudus against Ludus test cases
+
+So this is the work of the week of June 16, maybe?
+
+Just trying to get a sense of what needs to happen for CC2:
+* [ ] Actor model (objects, Spacewar!)
+* [ ] Animation hooked into the web frontend (Spacewar!)
+* [ ] Saving and loading data into Ludus (perceptrons, dissociated press)
+