Thoughts about bytecode VM #95
Labels
No Label
accepted
bug
clj
documentation
enhancement
errors
infrastructure
later
next
now
optimization
proposal
question
research
semantics
syntax
ux
vm
wontfix
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: twc/ludus#95
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
One thing Computer Class left me with was a sense that Ludus is just too slow. As expected, a bytecode VM seems necessary to speed things up. (That said, probably it's worth profiling how much is spent in the interpreter vs. drawing things. See #94. And that said, probably the thing that's actually taking the most time and slowing things down the most is the schlemiel's algorithm in appending things to lists, which we do A LOT of in drawing turtle graphics.) (
9752a87f27
fixes the schlemiel's algorithm problem, cutting runtimes substantially--but things are still way slow.)A few thoughts about this.
There are two possible models here:
The rest of this comment in this issue presumes some version of (2).
The idea here is to create two flat arrays that represent a Ludus program: a u8 byte array that's instructions (byte code); and a f64 array that are the constant values (NaN-boxed typed values). These are both extremely straightforward to either send from Janet to Zig, or to pass out of a Janet WASM blob and INTO a Zig one. (Actually, we'll need 3: we'll need line info for panics.)
As Munificent Bob puts it, the VM is faster thusly: "Remember that whole family of AST classes we defined in jlox? In clox, we’ve reduced that down to three arrays: bytes of code, constant values, and line information for debugging."
The only real difference/issue I can see is that the constants array will likely need 2 bytes of address info, rather than 1, since I can see 256 constants easily (but not 65k). This is because, instead of building a constants array for a given chunk, we'll be doing it for an entire script/program at once.
In addition, instead of storing line info, we can store token info, which will just be an index in the tokens array generated during scanning (this will be a
[]u16
).One thing that having a separate front and backend will require is more/duplicate bookkeeping: we'll need two versions of the bytecode table, one in Janet, the other in Zig (or C). We'll need to pass an index to a token where a panic occurred; we'll need a way of exfiltrating a return trace; we'll need a version the console and any in/outputs that can be passed back and forth; and we'll need to figure out how to actually build a hybrid program into WASM.
Does this suggest that path #1, RiiZig, is the right thing to do? (Sigh.)
That does suggest we put rather more effort into #18, and that we need a test-case-a-thon (that is NOT tied to Janet).