WASM out of bounds memory access bugs #49

Closed
opened 2025-07-01 05:29:03 +00:00 by scott · 2 comments
Owner

I'm about to cry.

I've finally got all the fundamentals working, and WASM decides to throw a possibly show-ending bug. I don't know what to do.

To reproduce:

  • pull this repo, branch actors, latest commit
  • serve up the pkg directory and open http://localhost:8080/index.html (or similar) in a web browser
  • open a JS console
  • use ludus.run to execute ludus code.

Behaviour:

  • a synchronous ludus script will generally work no problem, e.g. ludus.run(":foo") gives us the proper result.
  • an asynchronous ludus script will tend to cause flaky, occasional errors, e.g., ludus.run("print!(:foo);sleep!(3000);print!(:bar)")

These look different in different browsers.

In Firefox, we get "RuntimeError: index out of bounds" at byte 1_751_972 in rudus_bg.wasm. (The wat equivalent file is 57 megabytes. I cannot debug this.) Chrome is by far the most stable (so I'm not quite able to reproduce it right now), but when it fails we get an unhandled error in Promise, "Out of Bounds Memory Access." In Safari, we usually (though not always, and it's not the only error) get "Unhandled Promise Rejection: RuntimeError: Out of bounds memory access (evaluating 'wasm.closure327_externref_shim(arg0, arg1, arg2)')." It's noteworthy that Safari will even crash running a single-line synchronous Ludus script.
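Since each browser words these failures differently, it may help to record them in one shape before comparing. The following is only a sketch: `collectRejections` is a hypothetical helper, not something in this repo, and it relies only on the standard `unhandledrejection` event and its `reason` property.

```javascript
// Hypothetical helper: collect unhandled promise rejections in a uniform shape
// so the per-browser wording can be compared side by side.
// `target` is any EventTarget that receives "unhandledrejection" events.
function collectRejections(target, log = []) {
  target.addEventListener("unhandledrejection", (event) => {
    const reason = event.reason;
    log.push({
      // Error objects carry name/message; anything else is stringified.
      name: reason && reason.name ? reason.name : typeof reason,
      message: reason && reason.message ? reason.message : String(reason),
      at: Date.now(),
    });
  });
  return log;
}
```

On the page this would be called as `collectRejections(window)`, and inside the worker as `collectRejections(self)`; the returned array can then be inspected from the console after a few runs.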

I have tried several things to address this.

The most consequential of these has been to increase the stack size of whatever I build, which has made Chrome a reasonably stable target, and Firefox mostly stable. To do this, I dropped export CARGO_BUILD_RUSTFLAGS="-Clink-arg=-zstack-size=1000000" into the command line. (Why are we using an environment variable for this?) I learned about this through a link now lost to time.

From this link (https://stackoverflow.com/questions/79621931/runtimeerror-memory-access-out-of-bounds-for-wasm-bindgen-and-wasm-pack), I spent some time trying to figure out how to use talc instead of the default allocator; that didn't help (and was very slow?).

Related links:

  • https://users.rust-lang.org/t/request-animation-frame-in-regular-wasm-bindgen-not-wasm-bindgen-start-function/77064/8
  • https://stackoverflow.com/questions/73829866/runtimeerror-memory-access-out-of-bounds-in-wasm
  • https://stackoverflow.com/questions/73284608/instantiating-freeing-a-wasm-module-repeatedly-causes-memory-access-out-of-bo
  • https://github.com/rustwasm/wasm-bindgen/discussions/4185
  • https://github.com/rustwasm/wasm-bindgen/discussions/3474 <-- this may have the answer: be extra sure never to call init twice?
  • https://github.com/rustwasm/wasm-bindgen/issues/3368

A few maybe-maybe-not related issues:

  • Chrome seems to behave and give me what I expect in the console.
  • Firefox (recently?) started to give me a Rust panic when calling ludus.run twice in the same console session, because the run verb is getting passed through the worker to ludus, which doesn't know how to respond to a run verb. This is very strange to me, because the previous runs go to completion, and that means the world loop should have returned, but apparently it hasn't.
  • Safari is giving me multiple "Worker: Ludus has been initialized" calls, which is very strange. The run function bound to onmessage in the worker sets initialized to true... not right away but after await init().

Fixing the ordering here actually FIXED THIS ISSUE. I think.

The Firefox-doesn't-kill-the-Ludus-VM-properly problem is a different issue. One to fix.
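The Safari symptom (several "Ludus has been initialized" logs) is consistent with init being called again while the first call is still pending, since the initialized flag only flips after the await. One way to guard against that is sketched below. This is not the code from the repo or from the fixing commit: `makeInitOnce`, `fakeInit`, and `onMessage` are hypothetical names, and `fakeInit` merely stands in for the wasm-bindgen `init` export.

```javascript
// Hypothetical sketch: make sure the loader runs at most once, even when
// several messages arrive before the first load has finished.
function makeInitOnce(loadWasm) {
  let pending = null; // assigned synchronously, before any await can yield
  return function initOnce() {
    if (pending === null) {
      pending = loadWasm(); // later callers reuse this same promise
    }
    return pending;
  };
}

// Stand-in for the real loader: counts calls and resolves after a short delay.
let initCalls = 0;
const fakeInit = async () => {
  initCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
};

const initOnce = makeInitOnce(fakeInit);

// Simulated worker message handler: every message awaits the shared promise.
async function onMessage(msg) {
  await initOnce();
  return `handled ${msg}`;
}
```

The design point is that the guard is set before the first await rather than after it, so a second message that arrives mid-load sees the pending promise instead of starting a second load.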
Author
Owner

@matt I think I just put this to bed, as you see. But if you could pull the most recent commit from the actors branch down and see what happens if/when you serve this on your machine, that would be swell.

FWIW, this is the relevant commit: https://alea.ludus.dev/twc/rudus/commit/4e7557cbcccc22eaf987a415d738a1a649fbd1f9
Author
Owner

Okay! Done.

scott closed this issue 2025-07-02 16:06:40 +00:00
Reference: twc/rudus#49