Real UTF-8 functions #66

Open
opened 2024-06-05 17:13:40 +00:00 by scott · 1 comment
Owner

The Prelude should have some real UTF-8 functions, especially now that we're using UTF-8 under the hood.

scott added this to the 0.3.0 milestone 2024-06-05 17:13:40 +00:00
scott added the enhancement, later, and research labels 2024-06-05 17:13:40 +00:00
Author
Owner

We have some things bundled under this (#91), but this is a general review of string manipulation functions. Currently we have: `upcase`, `downcase`, `strip`, `words`, `sentence`, `trim`, `split`, and `join`. We want `chars`. Note that `length` gives the number of bytes in a string, not the number of characters.
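To make the bytes-versus-characters gap concrete, here is a minimal sketch in Python (not Ludus). The helper names `byte_length` and `code_point_length` are hypothetical and only illustrate the distinction; they are not proposed Prelude names.

```python
# Illustration only: Python, not Ludus. Helper names are hypothetical.

def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def code_point_length(s: str) -> int:
    """Number of Unicode code points in s (Python's len on str)."""
    return len(s)


name = "Zo\u00e9"  # "Zoé", with é as the single code point U+00E9

print(byte_length(name))        # 4: 'Z' and 'o' are 1 byte each, 'é' is 2
print(code_point_length(name))  # 3
```

A byte-counting `length` agrees with the code point count only while a string stays within ASCII.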

What do we want/need beyond this?

Principles of caution, here:

  • UTF-8 is extremely subtle and complicated; we DO NOT want to have to explain these subtle complications to learners.
  • We already ran into some issues with Jules in the first iteration of Computer Class wanting to use é in names.
  • Perhaps the way to proceed here is to lean into ASCII and its history, rather than trying to fix something in code.
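A rough sketch of why é is awkward, again in Python rather than Ludus: the same visible character can be stored as one code point or as two, and the two forms differ in both byte count and equality. The `is_ascii` helper is a hypothetical name, shown only as one way an ASCII-only approach could detect such input.

```python
# Illustration only: Python, not Ludus. is_ascii is a hypothetical helper.
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # 'e' followed by a combining acute accent


def is_ascii(s: str) -> bool:
    """True if every character in s is in the ASCII range."""
    return s.isascii()


print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(len(composed.encode("utf-8")))                         # 2
print(len(decomposed.encode("utf-8")))                       # 3
print(is_ascii("Jules"), is_ascii("Jos\u00e9"))              # True False
```

Both forms render identically, which is the kind of subtlety the first bullet warns against having to explain to learners.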

cf: https://stackoverflow.com/questions/27331819/whats-the-difference-between-a-character-a-code-point-a-glyph-and-a-grapheme, https://news.ycombinator.com/item?id=20054745

scott added this to the Polishing Prelude project 2024-07-21 20:15:38 +00:00
Reference: twc/ludus#66