Ludus should include PEGs as part of Prelude #110

Open
opened 2024-11-28 21:45:26 +00:00 by scott · 1 comment
Owner

For more sophisticated parsing than ELIZA (i.e. for make-a-LISP), we'll want something beyond the string pattern matching.

Take a cue from Janet and Guile: include PEGs as part of the language, instead of regex.

Take a cue from Janet that a parser is actually itself a data structure.

Consider:

let just_a = "a"
let a_or_b = (:or, "a", "b")

let terminator = (:or, :whitespace, ")")
let word = (:many, (:or, :alphanumeric, "+", "-", "*", "/", "-", "_"))
let number = (:many, :digit)
let term = (:or, word, number)
let s_expr = (:between, ("(", ")"), (:sep_by, terminator, term))

parse(s_expr, "(+ 1 2)") &=> (:ok, ["+", "1", "2"])

But note that we can't actually do a lisp that way, since that will only match flat S-expressions. We'll need a way to accomplish recursion. So, as with Janet, we can also do mutually recursive and easily labelled parsers by using a dict:

let s_expr = #{
  :terminator (:or, :whitespace, ")")
  :punctuation (:or, "+", "-", "*", "/", "-", "_")
  :word (:many, (:or, :alphanumeric, :punctuation)
  :number (:many, :digit)
  :term (:or, :word, :number, :s_expr)
  :s_expr (:between, ("(", ")"), (:sep_by, :terminator, :term))
}

parse(s_expr, "(+ 1 (/ 4 2))" & => (:ok, (:s_expr, [(:word, "+"), (:number, "1"), (:s_expr, [(:word, "/"), (:number, "4"), (:number, "2")]))

parse is basically a giant multiclause function/match form that unpacks a parser using pattern matching and compares it against input.

One thing that this requires, I suppose, is rationalizing and finalizing our ways of representing strings as sequences. At current, only strings with valid ASCII chars can be yanked out of strings in this way. Eventually—the same eventually I think we'll have a full-blown PEG in Prelude—we'll have to have some sort of way to work with UTF-8. I have been avoiding that. See: https://faultlore.com/blah/text-hates-you/.

For more sophisticated parsing than ELIZA (i.e. for make-a-LISP), we'll want something beyond the string pattern matching. Take a cue from Janet and Guile: include PEGs as part of the language, instead of regex. Take a cue from Janet that a parser is actually itself a data structure. Consider: ``` let just_a = "a" let a_or_b = (:or, "a", "b") let terminator = (:or, :whitespace, ")") let word = (:many, (:or, :alphanumeric, "+", "-", "*", "/", "-", "_")) let number = (:many, :digit) let term = (:or, word, number) let s_expr = (:between, ("(", ")"), (:sep_by, terminator, term)) parse(s_expr, "(+ 1 2)") &=> (:ok, ["+", "1", "2"]) ``` But note that we can't actually do a lisp that way, since that will only match flat S-expressions. We'll need a way to accomplish recursion. So, as with Janet, we can also do mutually recursive and easily labelled parsers by using a dict: ``` let s_expr = #{ :terminator (:or, :whitespace, ")") :punctuation (:or, "+", "-", "*", "/", "-", "_") :word (:many, (:or, :alphanumeric, :punctuation) :number (:many, :digit) :term (:or, :word, :number, :s_expr) :s_expr (:between, ("(", ")"), (:sep_by, :terminator, :term)) } parse(s_expr, "(+ 1 (/ 4 2))" & => (:ok, (:s_expr, [(:word, "+"), (:number, "1"), (:s_expr, [(:word, "/"), (:number, "4"), (:number, "2")])) ``` `parse` is basically a giant multiclause function/`match` form that unpacks a parser using pattern matching and compares it against input. One thing that this requires, I suppose, is rationalizing and finalizing our ways of representing strings as sequences. At current, only strings with valid ASCII chars can be yanked out of strings in this way. Eventually—the same eventually I think we'll have a full-blown PEG in Prelude—we'll have to have some sort of way to work with UTF-8. I have been avoiding that. See: https://faultlore.com/blah/text-hates-you/.
scott added the
enhancement
next
proposal
ux
labels 2024-11-28 21:45:43 +00:00
Author
Owner

@matt Pinging you here, because one of the cases I'm thinking about for "history of computing by example" is make-a-LISP (cf https://github.com/kanaka/mal). I don't know if that's actually what we want to do, but I do think that Ludus probably does need a builtin parsing tool, and I can't see how regex is the thing.

This may not be the thing, but it's closer to the thing than regex's line noise.

Thoughts?

@matt Pinging you here, because one of the cases I'm thinking about for "history of computing by example" is make-a-LISP (cf https://github.com/kanaka/mal). I don't know if that's actually what we want to do, but I do think that Ludus probably does need a builtin parsing tool, and I can't see how regex is the thing. This may not be the thing, but it's closer to the thing than regex's line noise. Thoughts?
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: twc/ludus#110
No description provided.