Ludus should include PEGs as part of Prelude #110

New Issue

scott · 2024-11-28T21:45:26Z

scott commented

2024-11-28 21:45:26 +00:00

For more sophisticated parsing than ELIZA (i.e. for make-a-LISP), we'll want something beyond the string pattern matching.

Take a cue from Janet and Guile: include PEGs as part of the language, instead of regex.

Take a cue from Janet that a parser is actually itself a data structure.

Consider:

let just_a = "a"
let a_or_b = (:or, "a", "b")

let terminator = (:or, :whitespace, ")")
let word = (:many, (:or, :alphanumeric, "+", "-", "*", "/", "-", "_"))
let number = (:many, :digit)
let term = (:or, word, number)
let s_expr = (:between, ("(", ")"), (:sep_by, terminator, term))

parse(s_expr, "(+ 1 2)") &=> (:ok, ["+", "1", "2"])

But note that we can't actually do a lisp that way, since that will only match flat S-expressions. We'll need a way to accomplish recursion. So, as with Janet, we can also do mutually recursive and easily labelled parsers by using a dict:

let s_expr = #{
  :terminator (:or, :whitespace, ")")
  :punctuation (:or, "+", "-", "*", "/", "-", "_")
  :word (:many, (:or, :alphanumeric, :punctuation)
  :number (:many, :digit)
  :term (:or, :word, :number, :s_expr)
  :s_expr (:between, ("(", ")"), (:sep_by, :terminator, :term))
}

parse(s_expr, "(+ 1 (/ 4 2))" & => (:ok, (:s_expr, [(:word, "+"), (:number, "1"), (:s_expr, [(:word, "/"), (:number, "4"), (:number, "2")]))

parse is basically a giant multiclause function/match form that unpacks a parser using pattern matching and compares it against input.

One thing that this requires, I suppose, is rationalizing and finalizing our ways of representing strings as sequences. At current, only strings with valid ASCII chars can be yanked out of strings in this way. Eventually—the same eventually I think we'll have a full-blown PEG in Prelude—we'll have to have some sort of way to work with UTF-8. I have been avoiding that. See: https://faultlore.com/blah/text-hates-you/.

For more sophisticated parsing than ELIZA (i.e. for make-a-LISP), we'll want something beyond the string pattern matching. Take a cue from Janet and Guile: include PEGs as part of the language, instead of regex. Take a cue from Janet that a parser is actually itself a data structure. Consider: ``` let just_a = "a" let a_or_b = (:or, "a", "b") let terminator = (:or, :whitespace, ")") let word = (:many, (:or, :alphanumeric, "+", "-", "*", "/", "-", "_")) let number = (:many, :digit) let term = (:or, word, number) let s_expr = (:between, ("(", ")"), (:sep_by, terminator, term)) parse(s_expr, "(+ 1 2)") &=> (:ok, ["+", "1", "2"]) ``` But note that we can't actually do a lisp that way, since that will only match flat S-expressions. We'll need a way to accomplish recursion. So, as with Janet, we can also do mutually recursive and easily labelled parsers by using a dict: ``` let s_expr = #{ :terminator (:or, :whitespace, ")") :punctuation (:or, "+", "-", "*", "/", "-", "_") :word (:many, (:or, :alphanumeric, :punctuation) :number (:many, :digit) :term (:or, :word, :number, :s_expr) :s_expr (:between, ("(", ")"), (:sep_by, :terminator, :term)) } parse(s_expr, "(+ 1 (/ 4 2))" & => (:ok, (:s_expr, [(:word, "+"), (:number, "1"), (:s_expr, [(:word, "/"), (:number, "4"), (:number, "2")])) ``` `parse` is basically a giant multiclause function/`match` form that unpacks a parser using pattern matching and compares it against input. One thing that this requires, I suppose, is rationalizing and finalizing our ways of representing strings as sequences. At current, only strings with valid ASCII chars can be yanked out of strings in this way. Eventually—the same eventually I think we'll have a full-blown PEG in Prelude—we'll have to have some sort of way to work with UTF-8. I have been avoiding that. See: https://faultlore.com/blah/text-hates-you/.

scott added the

enhancement

ux

labels 2024-11-28 21:45:43 +00:00

scott commented

2024-11-28 22:10:12 +00:00

@matt Pinging you here, because one of the cases I'm thinking about for "history of computing by example" is make-a-LISP (cf https://github.com/kanaka/mal). I don't know if that's actually what we want to do, but I do think that Ludus probably does need a builtin parsing tool, and I can't see how regex is the thing.

This may not be the thing, but it's closer to the thing than regex's line noise.

Thoughts?

@matt Pinging you here, because one of the cases I'm thinking about for "history of computing by example" is make-a-LISP (cf https://github.com/kanaka/mal). I don't know if that's actually what we want to do, but I do think that Ludus probably does need a builtin parsing tool, and I can't see how regex is the thing. This may not be the thing, but it's closer to the thing than regex's line noise. Thoughts?

Sign in to join this conversation.