So the thing here is that this is a much, much simpler version of the rather unwieldy Clojure parser framework I wrote. It leans pretty hard on the Janet PEG framework, but is also bespoke for my purposes here.
This lets me parse not raw text, but scanned tokens, which makes parsing a much easier lift. Also, it lets me do things like insert :break tokens, which really helps manage collection types and blocks.
At present, I do need to shift to returning a data structure rather than `nil`s and `true`s. That way, I can advance the parser by a certain amount: however much a passing test-rule would consume.
Do I really need to optimize this? I don't think so: we can do that later. I could store various information on stacks.
Alternately, returning data structures may well complicate things: KISS!
The upshot, however: we must not have both test-rule and apply-rule, since combinators without a fixed advance amount can't work that way. So we have only `apply-rule`, which `advance`s the parser by however much should be consumed. BUT! We can always, and should wherever possible, stash the origin index and roll back any changes (i.e. in `!` and `*`)
)
(defn pprint [x] (printf "%M" x))
(os/cd "janet") ### XXX: to remove, repl only
(import ./scanner :prefix "")
(defn new-parser
``
Creates a new parser from a list of tokens. A parser has three fields in addition to `:tokens`:
:i the index, in the list of tokens, of the current token
:errors an array of errors, treated like a stack
:captured an array of captures (tokens or arrays of tokens), treated like a stack.
``
[{:tokens tokens}]
@{:tokens tokens :i 0 :errors @[] :captured @[]})
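## A minimal sketch of building a parser by hand, without going through the scanner.
## The token shape here (a struct with only a :type key) is an assumption for
## illustration; real tokens come from the scanner and may carry more fields.
(comment
  (def p (new-parser {:tokens [{:type :word} {:type :break} {:type :eof}]}))
  (get p :i)                  # -> 0
  (length (get p :captured))  # -> 0
  (length (get p :errors))    # -> 0
  )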
(defn pop-to
"Pops items from a stack until it contains `len` items. Returns the stack."
[stack len]
(assert (>= len 0))
(while (< len (length stack)) (array/pop stack))
stack)
(defn popn
"Pops n items from a stack. Returns the stack."
[stack n]
(loop [_ :range [0 n]] (array/pop stack))
stack)
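## A quick sketch of the difference between the two stack helpers:
## `pop-to` takes a target length, `popn` takes a number of items to drop.
## Both mutate the array they are given and return it.
(comment
  (pop-to @[:a :b :c :d] 1)  # -> @[:a]
  (popn @[:a :b :c :d] 1)    # -> @[:a :b :c]
  )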
(defn current "Returns the current token." [{:tokens tokens :i i}] (get tokens i))
(defn previous "Returns the token before the current token." [{:tokens tokens :i i}] (get tokens (dec i)))
(defn next "Returns the token after the current token." [{:tokens tokens :i i}] (get tokens (inc i)))
(def add +) ### I `def` `+` later; stash this
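(defn advance
  "Advances the parser by `count`, or 1 if count is not supplied."
  [parser &opt count]
  (default count 1)
  ## `add` is the core `+`, stashed above because `+` is redefined later
  (update parser :i add count))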
(defn stash
  "Stashes a parser state: records the current index and the lengths of the error and capture stacks."
  [{:i i :errors errs :captured cap}]
  {:i i :err-len (length errs) :cap-len (length cap)})
(defn restore
"Restores the parser to a stashed state: resets :i, pops any new errors and captures off their stacks."
[parser state]
(def {:i i :err-len err-len :cap-len cap-len} state)
(def {:errors errors :captured captured} parser)
(put parser :i i)
(pop-to errors err-len)
(pop-to captured cap-len)
parser)
(defn at "Returns the token at index i." [parser i] (get (get parser :tokens) i))
(defn capture "Captures the last token." [parser] (update parser :captured array/push (previous parser)))
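## A sketch of the stash/restore round trip that the notes at the top rely on:
## take a snapshot, let a rule consume tokens and push captures or errors,
## then roll everything back. The tokens and the :some-error value are
## made up for illustration.
(comment
  (def p (new-parser {:tokens [{:type :a} {:type :b} {:type :c}]}))
  (def origin (stash p))      # {:i 0 :err-len 0 :cap-len 0}
  (advance p)                 # :i is now 1
  (capture p)                 # pushes the token just consumed, {:type :a}
  (array/push (get p :errors) :some-error)
  (restore p origin)
  (get p :i)                  # -> 0
  (length (get p :captured))  # -> 0
  (length (get p :errors))    # -> 0
  )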
(defn apply-keyword
"Applies a keyword-based rule. Consumes the passing token."
"Tests a rule: returns whether a rule passes but does not consume any tokens."
[rule parser]
(def origin (stash parser))
(def result (apply-rule rule parser))
(restore parser origin)
result)
(defn !
"Not: negates a rule. If the rule passes, does nothing, returns `nil`. If the rule fails, advances the parser a single token, returns true. Works well only for single token rules."
"Panics the parser, consuming all tokens until the rule matches (including the match). It also adds an error node to both the capture and the error stacks."
## next problem: where do we back out to, and how?
## the answer: an "anchor" point, where we `(try)` something, and if it panics, start up with another line; in a block it will be a line; in a script it will be toplevel
## maybe terms in a tuple, list, dict
## it seems noteworthy that the things we want to return to are all either "any" or "some", and all either end with a newline or a break
## they're not all either any or some (empty data structures, but lines in blocks and scripts)
## also: there's a finite number of possibilities for any of them:
## another well-formed term, or the end of the item
"Takes a parser and an origin state: takes everything captured since the origin, gathers it up in a single array, pops it all off the capture stack, and pushes the gathered captures back on the stack."
[parser origin]
(def from (get origin :cap-len))
(def captured (get parser :captured))
(def grouped (array/slice captured from -1))
(pop-to captured from)
(array/push captured grouped)
captured)
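## A sketch of what grouping does to the capture stack, assuming the function
## above is bound to `capture-group` (the name `&` calls below). Everything
## captured after the stashed origin is folded into one nested array; earlier
## captures are left alone. Tokens are made up for illustration.
(comment
  (def p (new-parser {:tokens [{:type :a} {:type :b} {:type :c}]}))
  (advance p)
  (capture p)               # captured: @[{:type :a}]
  (def origin (stash p))    # :cap-len is 1
  (advance p)
  (capture p)
  (advance p)
  (capture p)               # captured: @[{:type :a} {:type :b} {:type :c}]
  (capture-group p origin)  # -> @[{:type :a} @[{:type :b} {:type :c}]]
  )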
(defn &
"Groups rules: pops all captured patterns and pushes them on the captured stack as a single array."
[rule]
@{:name (keyword (string "&" (name rule)))
:rule (fn &* [parser]
(def origin (stash parser))
(def result (apply-rule rule parser))
(if result
(do
(capture-group parser origin)
true)
(do
(restore parser origin)
nil)))})
(defn fn-name
"Returns the name of a function."
[f]
(def rep (string f))
(string/slice rep 10 -2))
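## A sketch of why the slice bounds are 10 and -2: a named function prints as
## "<function NAME>", so dropping the ten characters of "<function " and the
## trailing ">" leaves just the name.
(comment
  (string pprint)   # -> "<function pprint>"
  (fn-name pprint)  # -> "pprint"
  )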
(defn /
"Substitution: takes a rule and a function, and applies the function to any captures arising from the rule; the captures are individual parameters to the function."