rudus/thoughts.md
2024-11-09 14:10:08 -05:00

VM thoughts

Initial thoughts

We want numbers and bools as unboxed as possible.

Nil is a singleton, and should be static.

Strings come in two flavours:

  • String literals, which are static/interned.
  • Constructed strings, which should be Rc<String>.

Keywords are static/interned.

Tuples should be refcounted for now.
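
A minimal sketch of how these representations might look as a Rust enum. The variant names, the `Interner` type, the `u32` keyword index, and `Rc<Vec<Value>>` for tuples are assumptions made for illustration, not settled design.

```rust
use std::rc::Rc;

/// Hypothetical value type: small values are stored inline (unboxed),
/// literals are static, and constructed data is reference-counted.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,                   // singleton, no payload
    Number(f64),           // unboxed
    Bool(bool),            // unboxed
    Keyword(u32),          // index into the interner's table (assumed width)
    Str(&'static str),     // string literal, static
    String(Rc<String>),    // constructed string
    Tuple(Rc<Vec<Value>>), // refcounted for now (assumed layout)
}

/// Hypothetical keyword interner: a keyword is just an index into `names`.
struct Interner {
    names: Vec<&'static str>,
}

impl Interner {
    fn new() -> Self {
        Interner { names: Vec::new() }
    }

    /// Return the keyword for `name`, adding it to the table if needed.
    /// A linear scan keeps the sketch short; a real table would use a map.
    fn intern(&mut self, name: &'static str) -> Value {
        let idx = match self.names.iter().position(|n| *n == name) {
            Some(i) => i,
            None => {
                self.names.push(name);
                self.names.len() - 1
            }
        };
        Value::Keyword(idx as u32)
    }

    /// Read a keyword back as text, e.g. for printing.
    fn resolve(&self, value: &Value) -> Option<&'static str> {
        match value {
            Value::Keyword(i) => self.names.get(*i as usize).copied(),
            _ => None,
        }
    }
}
```

Interning the same name twice yields the same `Value::Keyword`, so keyword comparison is an integer comparison.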

Optimization and other thoughts

2024-11-09

  • To put tuples on the stack, we need to know both how long they are (number of members) and how big they are (amount of memory), since tuples can contain other tuples.
    • All other values must be one stack cell:
      • nil is its own thing
      • numbers are a wrapped f64 (at least until we get to NaN-boxed values)
      • booleans are a wrapped bool
      • keywords are a wrapped u16 or u32, which is an index into a vec of &strs, which can be read back into a string when printed
      • strings are a &str or an Rc<String> (with two possible wrappers: Value::Str or Value::String)
      • dicts are imbl::HashMap<u16, Value>, with the hash generated on the index of the keyword
      • sets are imbl::HashSet<Value>, with the caveat that f64 implements neither Eq nor Hash, which means that a Value can't be used as a hash key out of the box. The way around this, I think, is to implement Eq (and Hash) for Value by hand, with a panic if we try to put NaN in a set
      • functions are Rc<LFn>
      • boxes are Rc<RefCell<Value>>
      • That means everything is a wrapped Copy type (:nil, :number, :bool), an interned reference (:keyword, :string), an Rc reference type (:string, :box, :fn), or a persistent reference type that has its own clone (:list, :dict, :set)
      • This doesn't cover everything yet. Other reference types will be Rc'd structs: to wit, processes and packages.
    • Tuples, meanwhile, have a special representation on the stack.
      • They start with a Value::TupleStart(len: u8, size: u8).
      • They then have a number of members.
      • They end with a Value::TupleEnd(len: u8, size: u8).
      • len indicates the number of members in the tuple; size indicates the size of the tuple on the stack, including the TupleStart and TupleEnd cells. For (), len is 0, and size is 2. Nesting tuples will lead to larger divergences, and will increase size but not len.
      • If somebody tries to stuff more than 255 members in a tuple, nested or not, we get a validation error to tell them to use a list.
        • Or promote it to be a reference type? The natural encoding of a list in Ludus is (car, cdr) (or (data, next)). I believe the way to get this out of a scope (block or function) is to expand the tuple fully, which could lead very quickly to very large tuples.
        • But we can easily distinguish between argument tuples and value tuples, and promote value tuples with a size larger than 255 to a Value::BigTuple(Rc<Vec<Value>>).
        • But in no case should we allow arguments to get bigger than 255.
        • Keeping small value tuples on the stack is worthwhile, especially given the importance of result tuples, which should stay on the stack.
  • This naturally leads to questions about pattern matching, especially when we get to a stack-based bytecode VM.
    • A pattern, like a tuple, is a series of cells.
    • The goal is to keep pattern sizes and lengths identical to the tuple data representation.
    • That means that, like data representations, a pattern has to include both a set of bytecode instructions and a data representation on the stack.
    • In fact, I suspect that the fastest way to encode this will be to push the data representation of the scrutinee on the stack, and then to push the pattern, and to then compare within the stack, at different offsets.
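
The tuple layout above can be sketched roughly as follows. The `Cell` and `Expr` types and the `push_expr` and `top_tuple_span` helpers are hypothetical names for illustration; only `TupleStart`, `TupleEnd`, `len`, and `size` come from the notes. The sketch rejects any tuple whose `len` or `size` would not fit in a `u8`, and leaves out the `BigTuple` promotion.

```rust
/// One stack cell. Only a few value kinds are shown.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cell {
    Nil,
    Number(f64),
    Bool(bool),
    TupleStart { len: u8, size: u8 },
    TupleEnd { len: u8, size: u8 },
}

/// Hypothetical tree form of a tuple expression before it is flattened.
#[allow(dead_code)]
enum Expr {
    Nil,
    Number(f64),
    Bool(bool),
    Tuple(Vec<Expr>),
}

/// Flatten `expr` onto `stack` and return how many cells it occupies.
/// On error the stack is restored to its previous height.
fn push_expr(stack: &mut Vec<Cell>, expr: &Expr) -> Result<usize, String> {
    match expr {
        Expr::Nil => { stack.push(Cell::Nil); Ok(1) }
        Expr::Number(n) => { stack.push(Cell::Number(*n)); Ok(1) }
        Expr::Bool(b) => { stack.push(Cell::Bool(*b)); Ok(1) }
        Expr::Tuple(members) => {
            let start = stack.len();
            // Placeholder; patched once the real size is known.
            stack.push(Cell::TupleStart { len: 0, size: 0 });
            let mut size = 2usize; // TupleStart + TupleEnd
            for member in members {
                match push_expr(stack, member) {
                    Ok(n) => size += n,
                    Err(e) => { stack.truncate(start); return Err(e); }
                }
            }
            if members.len() > 255 || size > 255 {
                stack.truncate(start);
                return Err("tuple too large for the stack; use a list".to_string());
            }
            let (len, size8) = (members.len() as u8, size as u8);
            stack[start] = Cell::TupleStart { len, size: size8 };
            stack.push(Cell::TupleEnd { len, size: size8 });
            Ok(size)
        }
    }
}

/// Find the cells of the tuple on top of the stack by reading `size`
/// from its TupleEnd, without scanning the members.
fn top_tuple_span(stack: &[Cell]) -> Option<std::ops::Range<usize>> {
    match stack.last()? {
        Cell::TupleEnd { size, .. } => {
            let start = stack.len().checked_sub(*size as usize)?;
            Some(start..stack.len())
        }
        _ => None,
    }
}
```

For `(1, (true, nil))` this produces seven cells: the outer tuple has `len` 2 and `size` 7, the inner one `len` 2 and `size` 4. Because `TupleEnd` repeats `size`, a scrutinee and a pattern sitting next to each other on the stack can each be located from the top down by offset arithmetic alone.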

Let's not reinvent the wheel

Or, crates we will use

  • chumsky for parsing
  • ariadne for reporting parsing errors
  • imbl for persistent data structures
  • boxing for NaN boxing
  • tailcall for tail recursion

We additionally might want crates for:

  • processes/actors, although given that Ludus will be single-threaded for the foreseeable future, it may be lighter weight to just write my own process abstraction
  • in that case, we will need a ring buffer, e.g. the ringbuf crate
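
A rough sketch of what a hand-rolled, single-threaded process abstraction could look like. Everything here is hypothetical (`Msg`, `Process`, `run`), and `std::collections::VecDeque` stands in for the mailbox so the example needs no external crates. It does not use the ringbuf API.

```rust
use std::collections::VecDeque;

/// Hypothetical message type; a real one would carry Ludus values.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    Number(f64),
    Stop,
}

/// Hypothetical process: a mailbox plus some state.
struct Process {
    mailbox: VecDeque<Msg>, // stand-in for a ring buffer
    total: f64,
    done: bool,
}

impl Process {
    fn new() -> Self {
        Process { mailbox: VecDeque::new(), total: 0.0, done: false }
    }

    /// Enqueue a message; it is handled on a later `step`.
    fn send(&mut self, msg: Msg) {
        self.mailbox.push_back(msg);
    }

    /// Handle at most one message. Returns true if any work was done.
    fn step(&mut self) -> bool {
        if self.done {
            return false;
        }
        match self.mailbox.pop_front() {
            Some(Msg::Number(n)) => { self.total += n; true }
            Some(Msg::Stop) => { self.done = true; true }
            None => false,
        }
    }
}

/// Round-robin scheduler: keep stepping every process until none
/// of them makes progress.
fn run(processes: &mut [Process]) {
    loop {
        let mut progressed = false;
        for p in processes.iter_mut() {
            progressed |= p.step();
        }
        if !progressed {
            break;
        }
    }
}
```

With a single thread there is no need for locks or atomics: each process handles one message per turn, and the scheduler is an ordinary loop.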