Node.js WebAssembly Zero-Copy Data Exchange with SharedArrayBuffer

If your Node.js and WebAssembly integration involves JSON.stringify, Buffer.from, or any form of serialization at the boundary, you are copying data twice and paying for it in latency and GC pressure. The pattern is common, it ships, and it is slow in ways that are invisible until your throughput requirements get real. A shared WebAssembly.Memory removes the copy: its SharedArrayBuffer is the Wasm linear memory, so JavaScript typed arrays and Wasm load/store instructions address the same bytes. This article covers exactly how that works, how to implement it in Rust and JavaScript, how to synchronize access across Worker Threads without race conditions, and the places where the approach will quietly destroy you if you are not careful.

TL;DR

Serialization-based Node.js/Wasm data exchange introduces marshalling overhead and GC pressure at every call boundary
A WebAssembly.Memory created with shared: true is backed by a SharedArrayBuffer: a single memory region visible to both runtimes without copying
Rust exposes allocation functions and memory offsets; JavaScript reads and writes directly via typed array views
Atomics.wait / Atomics.notify handle synchronization across Worker Threads — without this, you have race conditions
memory.grow() leaves existing TypedArray views stale (shared memory) or detached (non-shared memory): cache them only between grow calls, never across them
Node.js does not require COOP/COEP headers for SharedArrayBuffer — that restriction is browser-only
Node.js 24 (LTS) adds Memory64 support — 64-bit Wasm addressing for workloads exceeding 4 GB

The Architecture of Wasm Linear Memory

WebAssembly does not have heap allocation in the traditional sense. It has a flat, contiguous byte array — linear memory — starting at address 0 and growing in 64 KB pages. Every load and store instruction in a Wasm module is an offset into this array. There is no pointer indirection, no object header, no GC involvement.
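The page arithmetic is easy to check from JavaScript. The following is a minimal sketch using only the standard WebAssembly.Memory API; the constant name PAGE_SIZE is ours, and a non-shared memory is used to keep the example small.

```javascript
// One Wasm page is 64 KiB.
const PAGE_SIZE = 64 * 1024;

// Two pages to start, allowed to grow to four.
const memory = new WebAssembly.Memory({ initial: 2, maximum: 4 });
const initialBytes = memory.buffer.byteLength; // 2 * PAGE_SIZE

// grow() takes a page count and returns the previous size in pages.
const previousPages = memory.grow(1);
const grownBytes = memory.buffer.byteLength; // 3 * PAGE_SIZE

// Growing past the declared maximum throws a RangeError.
let growFailed = false;
try {
  memory.grow(2); // 3 + 2 pages would exceed the maximum of 4
} catch (err) {
  growFailed = err instanceof RangeError;
}

console.log({ initialBytes, previousPages, grownBytes, growFailed });
```

Note that memory.buffer is read again after the grow rather than reused, which is the habit the rest of this article relies on.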

The WebAssembly.Memory Object

From the JavaScript side, this linear memory is exposed as a WebAssembly.Memory instance. Its .buffer property is an ArrayBuffer — or, when created with shared: true, a SharedArrayBuffer. That distinction is everything.


// Create shared Wasm memory: 16 pages initial (1 MB), 256 pages max (16 MB)
const memory = new WebAssembly.Memory({
 initial: 16,
 maximum: 256,
 shared: true // .buffer returns SharedArrayBuffer, not ArrayBuffer
});

// Instantiate the Wasm module, passing in the shared memory
const { instance } = await WebAssembly.instantiate(wasmBytes, {
 env: { memory }
});

// Both JS and Wasm now address the same backing buffer
const view = new Uint8Array(memory.buffer);

With shared: true, the .buffer property returns a SharedArrayBuffer. Any typed array created on top of it — Uint8Array, Float64Array, whatever — directly accesses the same bytes that the Wasm module reads and writes. No serialization, no copy, no intermediate representation.
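To see the aliasing without a compiled module, the sketch below lays two typed array views of different element types over one shared memory. The variable names are illustrative, and the printed word assumes a little-endian host.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// With shared: true the backing store is a SharedArrayBuffer.
const isShared = memory.buffer instanceof SharedArrayBuffer;

// Two views of different element types over the same bytes.
const bytes = new Uint8Array(memory.buffer);
const words = new Uint32Array(memory.buffer);

// Writing four bytes through one view is visible through the other.
bytes.set([0x78, 0x56, 0x34, 0x12], 0);
const word = words[0]; // 0x12345678 on a little-endian host

console.log(isShared, word.toString(16));
```

Neither view owns the data. Both are windows onto the same region, which is exactly what the Wasm module sees through its load and store instructions.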

Memory Layout: What “Shared” Actually Means

The relationship between the JS side and the Wasm side looks like this:


┌─────────────────────────────────────────────────┐
│  SharedArrayBuffer (backing store) │
│ [page 0][page 1][page 2]...[page N]  │
│ 64KB 64KB 64KB 64KB  │
└───────────────────┬─────────────────────────────┘
   │ same physical memory
  ┌─────────┴──────────┐
  │   │
 ┌──────▼──────┐ ┌───────▼──────────┐
 │ V8 / JS │ │ Wasm linear mem │
 │ TypedArray │ │ load/store ops │
 │ views │ │ at byte offsets │
 └─────────────┘ └──────────────────┘

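On a single thread, both sides see writes immediately: the Wasm module writes to offset 128, and the JS Uint8Array at index 128 reflects it. No flush, no copy, no round-trip through the event loop. Across threads the bytes are still shared, but the order in which writes become visible has to be enforced with Atomics, covered in the synchronization section below.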

Implementation: Rust + JavaScript Without the Copy

The standard pattern is to have the Wasm module expose two things: an allocator that returns a byte offset into linear memory, and the actual processing function. JavaScript allocates via the Wasm allocator, writes input data directly to the shared buffer, calls the function, and reads output from the same buffer.

Rust Side: Exposing Memory Offsets


// lib.rs — compiled to wasm32-unknown-unknown

use std::alloc::{alloc, dealloc, Layout};

/// Allocate `size` bytes in Wasm linear memory.
/// Returns the byte offset — JS uses this as the write address.
#[no_mangle]
pub unsafe extern "C" fn wasm_alloc(size: usize) -> *mut u8 {
 let layout = Layout::from_size_align(size, 8).unwrap();
 alloc(layout)
}

/// Free previously allocated memory.
#[no_mangle]
pub unsafe extern "C" fn wasm_free(ptr: *mut u8, size: usize) {
 let layout = Layout::from_size_align(size, 8).unwrap();
 dealloc(ptr, layout);
}

/// Process a byte slice in-place.
/// `ptr` is the offset, `len` is the byte count.
/// This writes output back to the same region — zero additional allocation.
#[no_mangle]
pub unsafe extern "C" fn process_buffer(ptr: *mut u8, len: usize) -> usize {
 let slice = std::slice::from_raw_parts_mut(ptr, len);
 
 // Example: XOR every byte with 0xAA — replace with real logic
 let mut checksum: u64 = 0;
 for byte in slice.iter_mut() {
  *byte ^= 0xAA;
  checksum = checksum.wrapping_add(*byte as u64);
 }
 
 // Return a result value — output data stays in the shared buffer
 checksum as usize
}

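Build this with cargo build --target wasm32-unknown-unknown --release. The exported functions become available on instance.exports. Be aware that a default build defines and exports its own non-shared memory. For the module to import a shared memory from env.memory, as the JavaScript below assumes, the linker needs --import-memory and --shared-memory (plus a --max-memory that matches the maximum passed to WebAssembly.Memory), and the standard library has to be compiled with the atomics and bulk-memory target features, which at the time of writing generally means a nightly toolchain with -Z build-std.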

JavaScript Side: Zero-Copy Read and Write


import { readFile } from 'node:fs/promises';

// Shared memory — created on the JS side, passed into the Wasm module
const memory = new WebAssembly.Memory({
 initial: 16, // 1 MB
 maximum: 256, // 16 MB
 shared: true
});

const wasmBytes = await readFile('./target/wasm32-unknown-unknown/release/mylib.wasm');
const { instance } = await WebAssembly.instantiate(wasmBytes, {
 env: { memory }
});

const { wasm_alloc, wasm_free, process_buffer } = instance.exports;

function processData(inputData) {
 const byteLength = inputData.byteLength;
 
 // Allocate inside Wasm linear memory — returns a byte offset (pointer)
 const ptr = wasm_alloc(byteLength);
 if (ptr === 0) throw new Error('Wasm allocation failed');
 
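 // Write the input into shared memory: one memcpy, no serialization step.
 // The view is created after wasm_alloc because the allocation may grow the memory.
 const view = new Uint8Array(memory.buffer, ptr, byteLength);
 // inputData is assumed to be an ArrayBuffer; a typed array can be passed to set() as-is
 view.set(new Uint8Array(inputData));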
 
 // Wasm processes the data in-place
 const result = process_buffer(ptr, byteLength);
 
 // Read output directly from shared memory — same region, no copy
 const output = new Uint8Array(memory.buffer, ptr, byteLength).slice();
 
 // Free the Wasm-side allocation
 wasm_free(ptr, byteLength);
 
 return { result, output };
}

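Two copies remain in this example: view.set() copies the input into linear memory, and the final .slice() copies the output back out, which is necessary if the output has to outlive the wasm_free call. Neither involves serialization or an intermediate representation. If the producer writes directly into the allocated region (for example, by reading a file straight into the view) and the output is consumed before the free, both copies disappear.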

Lifecycle Management: Who Owns the Memory

This is where people get into trouble. The rule is simple: whoever allocates, deallocates. If JavaScript calls wasm_alloc, it must call wasm_free. If Wasm allocates internally and returns a pointer, Wasm must free it — or expose a dedicated free function.


// Dangerous pattern — leaks memory on every call
async function leakyProcess(data) {
 const ptr = wasm_alloc(data.byteLength);
 new Uint8Array(memory.buffer, ptr, data.byteLength).set(data);
 process_buffer(ptr, data.byteLength);
 // No wasm_free call — Wasm heap grows forever
}

// Correct pattern — explicit lifecycle
async function safeProcess(data) {
 const ptr = wasm_alloc(data.byteLength);
 try {
  new Uint8Array(memory.buffer, ptr, data.byteLength).set(data);
  return process_buffer(ptr, data.byteLength);
 } finally {
  wasm_free(ptr, data.byteLength); // Runs even if process_buffer throws
 }
}

Node.js 24’s using keyword (explicit resource management, now in LTS) is a natural fit here — you can wrap Wasm allocations in a disposable that calls free at the block boundary.
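A rough sketch of such a disposable follows. Everything in it is an assumption for illustration: WasmAllocation is a hypothetical class name, and wasm_alloc and wasm_free are replaced by tiny JavaScript stand-ins so the example runs without a compiled module. The try/finally shows what a using declaration does implicitly.

```javascript
// Hypothetical stand-ins for the module's wasm_alloc / wasm_free exports,
// so the sketch runs without a compiled .wasm file.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
let nextOffset = 8;
let liveAllocations = 0;
const wasm_alloc = (size) => {
  const ptr = nextOffset;
  nextOffset += (size + 7) & ~7; // keep 8-byte alignment
  liveAllocations++;
  return ptr;
};
const wasm_free = (_ptr, _size) => { liveAllocations--; };

// Symbol.dispose is built in on Node.js 24; fall back to a registry symbol elsewhere.
const dispose = Symbol.dispose ?? Symbol.for('Symbol.dispose');

class WasmAllocation {
  constructor(size) {
    this.size = size;
    this.ptr = wasm_alloc(size);
    if (this.ptr === 0) throw new Error('Wasm allocation failed');
  }
  // Re-derive the view on each access instead of caching it.
  get view() {
    return new Uint8Array(memory.buffer, this.ptr, this.size);
  }
  [dispose]() {
    wasm_free(this.ptr, this.size);
  }
}

function roundTrip(data) {
  const allocation = new WasmAllocation(data.byteLength);
  try {
    allocation.view.set(data);
    return allocation.view.slice(); // copy out before the region is freed
  } finally {
    allocation[dispose](); // what `using` does implicitly at block exit
  }
}

const echoed = roundTrip(Uint8Array.of(1, 2, 3));
console.log(echoed, liveAllocations);
```

On a runtime with using support, the body of roundTrip should reduce to a single `using allocation = new WasmAllocation(data.byteLength);` declaration followed by the set and slice calls, with no explicit finally.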

Synchronization with Worker Threads and Atomics

Shared memory is fast. Shared memory with unsynchronized concurrent access is a race condition waiting for a load spike to manifest. When you add Worker Threads to the mix — which is the whole point of shared Wasm memory in high-throughput scenarios — you need explicit synchronization.

The Problem: No Implicit Ordering


// main.js — BROKEN: Worker might read before Wasm write is visible
const sharedMem = new WebAssembly.Memory({ initial: 1, shared: true });
const view = new Int32Array(sharedMem.buffer);

worker.postMessage({ sharedBuffer: sharedMem.buffer });

// Write some data
view[0] = 42;

// Signal the worker — but there is no guarantee view[0] = 42
// is visible to the worker's CPU core before it reads
worker.postMessage('go');

Memory writes are not automatically visible across threads in the order they happen on the writing thread. CPUs reorder, caches diverge. You need a memory fence — and in the Wasm/JS world, that means Atomics.

The Correct Pattern: Atomics.store / Atomics.wait / Atomics.notify


// main.js
import { Worker } from 'node:worker_threads';

// Shared control buffer, separate from the data region
// Index 0: state flag (0 = idle, 1 = data ready, 2 = result ready)
// Index 1: payload length in bytes
const controlBuffer = new SharedArrayBuffer(8);
const control = new Int32Array(controlBuffer);

// Shared Wasm memory, created once here and handed to the worker
const memory = new WebAssembly.Memory({ initial: 16, maximum: 256, shared: true });

const worker = new Worker('./worker.js');

// Shared objects are passed by reference: no copy, no transfer list
worker.postMessage({ controlBuffer, memory });

// The worker reserves the data region via wasm_alloc and reports where it is
const { ptr, capacity } = await new Promise((resolve) => worker.once('message', resolve));

function processChunk(inputData) {
 if (inputData.byteLength > capacity) throw new RangeError('chunk exceeds shared region');
 return new Promise((resolve) => {
  // Re-derive the view on every call: the worker's allocation may have grown the memory
  const dataView = new Uint8Array(memory.buffer, ptr, capacity);
  dataView.set(inputData);
  Atomics.store(control, 1, inputData.byteLength);

  // Atomics.store orders the plain writes above before the state change
  Atomics.store(control, 0, 1); // Mark: data ready
  Atomics.notify(control, 0, 1); // Wake the waiting worker

  // Poll for the result; Atomics.waitAsync avoids the polling
  const interval = setInterval(() => {
   if (Atomics.load(control, 0) === 2) {
    clearInterval(interval);
    const output = dataView.slice(0, inputData.byteLength);
    Atomics.store(control, 0, 0); // Reset to idle
    Atomics.notify(control, 0, 1); // Release the worker for the next chunk
    resolve(output);
   }
  }, 0);
 });
}


// worker.js
import { parentPort } from 'node:worker_threads';
import { readFile } from 'node:fs/promises';

const CAPACITY = 1024 * 1024; // 1 MB data region

parentPort.once('message', async ({ controlBuffer, memory }) => {
 const control = new Int32Array(controlBuffer);

 // Instantiate against the memory object received from the main thread.
 // Creating a new WebAssembly.Memory here would give the worker a private
 // linear memory that the main thread never sees.
 const wasmBytes = await readFile('./mylib.wasm');
 const { instance } = await WebAssembly.instantiate(wasmBytes, { env: { memory } });
 const { wasm_alloc, process_buffer } = instance.exports;

 // Reserve the data region through the Rust allocator and report its offset
 const ptr = wasm_alloc(CAPACITY);
 if (ptr === 0) throw new Error('Wasm allocation failed');
 parentPort.postMessage({ ptr, capacity: CAPACITY });

 // Worker loop
 while (true) {
  Atomics.wait(control, 0, 0); // Sleep while value is 0 (idle)

  if (Atomics.load(control, 0) === 1) {
   // Wasm reads and writes the shared region in place
   process_buffer(ptr, Atomics.load(control, 1));

   // Signal completion; the atomic store publishes the Wasm writes
   Atomics.store(control, 0, 2);
   Atomics.notify(control, 0, 1);
  }

  Atomics.wait(control, 0, 2); // Sleep until the main thread resets the flag
 }
});

Atomics.waitAsync for Non-Blocking Main Thread

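Atomics.wait blocks the calling thread. In a Worker that is fine. On the Node.js main thread it is allowed (browsers throw a TypeError there, Node.js does not), but it stalls the event loop for as long as the wait lasts, so nothing else gets serviced. Use Atomics.waitAsync instead. It returns an object with an async flag and a value that is either a Promise or, when no wait was needed, a string: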


// Non-blocking wait on main thread (Node.js 16+)
const waitResult = Atomics.waitAsync(control, 0, 0);
// async is false when control[0] was already non-zero (value is 'not-equal')
if (waitResult.async) {
 await waitResult.value; // Resolves with 'ok' once another thread calls Atomics.notify
}

// Now safe to read: the atomic load observes the writes published before the notify
const state = Atomics.load(control, 0);
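Atomics.waitAsync can also complete synchronously, which is easy to miss. The sketch below walks through the three shapes its return value can take; the variable names are ours, and the whole thing runs on a single thread.

```javascript
const control = new Int32Array(new SharedArrayBuffer(4));

// 1. The value already differs from the expected one: no waiting, no Promise.
Atomics.store(control, 0, 5);
const notEqual = Atomics.waitAsync(control, 0, 0);
// { async: false, value: 'not-equal' }

// 2. The value matches but the timeout is zero: also synchronous.
const timedOut = Atomics.waitAsync(control, 0, 5, 0);
// { async: false, value: 'timed-out' }

// 3. The value matches and a wait is registered: value is a Promise.
const pending = Atomics.waitAsync(control, 0, 5);
// { async: true, value: Promise }

// notify() returns how many waiters it woke, async waiters included.
const woken = Atomics.notify(control, 0, 1);
pending.value.then((outcome) => console.log(outcome)); // 'ok'
```

Code that unconditionally awaits result.value still works, because awaiting a string simply yields the string, but checking the async flag makes the fast path explicit.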

Performance: What You Actually Get

The numbers depend heavily on payload size and call frequency. Serialization overhead becomes dominant at smaller payloads; at larger ones, the allocation and GC cost of copying dominates. The ranges below are indicative figures for zero-copy versus JSON and copy-based exchange on Node.js 24; treat them as orders of magnitude and measure on your own workload:

Approach	Throughput (relative)	GC pressure	Latency per call
JSON serialize/deserialize	1× (baseline)	High — new string + parsed object per call	~0.8–3 ms at 1 MB payload
Buffer.from() + copy	2–3×	Medium — allocation per call	~0.3–1 ms at 1 MB payload
SharedArrayBuffer zero-copy	5–10×	Near-zero — no heap allocation	~0.05–0.15 ms at 1 MB payload

The GC pressure difference is what kills you in sustained workloads. Each serialized call produces objects that the V8 garbage collector eventually collects. At high call rates, you start seeing GC pauses in your latency percentiles — p99 spikes that have nothing to do with your application logic and everything to do with your data exchange pattern.
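To get a feel for the gap on your own hardware, a rough timing sketch like the one below is a reasonable starting point. It is not a rigorous benchmark: there is no warm-up phase, no GC accounting, and the payload size, iteration count, and function names are arbitrary choices of ours.

```javascript
const PAYLOAD_BYTES = 64 * 1024;
const ITERATIONS = 200;

const payload = new Uint8Array(PAYLOAD_BYTES).map((_, i) => i & 0xff);
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const shared = new Uint8Array(memory.buffer, 0, PAYLOAD_BYTES);

// Baseline: serialize to a JSON string and parse it back, as a JSON boundary would.
function viaJson(bytes) {
  return Uint8Array.from(JSON.parse(JSON.stringify(Array.from(bytes))));
}

// Shared path: a single set() into a preallocated region, nothing allocated per call.
function viaSharedView(bytes) {
  shared.set(bytes);
  return shared;
}

// Run fn repeatedly and report elapsed wall-clock time plus the last result.
function time(fn) {
  const start = performance.now();
  let last;
  for (let i = 0; i < ITERATIONS; i++) last = fn(payload);
  return { ms: performance.now() - start, last };
}

const json = time(viaJson);
const sharedRun = time(viaSharedView);
console.log(`JSON: ${json.ms.toFixed(2)} ms, shared view: ${sharedRun.ms.toFixed(2)} ms`);
```

Running it under node --trace-gc additionally shows how often the collector runs during each phase, which is usually the more telling signal for sustained workloads.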

Pitfalls and Gotchas That Will Bite You

memory.grow() Leaves Existing TypedArray Views Stale

This is one of the most common production bugs in Wasm memory setups, and the failure mode depends on whether the memory is shared. With a non-shared memory, growing detaches the old ArrayBuffer: views on it report a byteLength of 0, indexed reads return undefined, indexed writes are silently dropped, and methods such as set() throw a TypeError. With a shared memory, the SharedArrayBuffer can never be detached. The old buffer and its views keep working but stay at the old length, while memory.buffer returns a new, larger SharedArrayBuffer. A cached view therefore cannot address anything allocated in the newly grown pages.


const memory = new WebAssembly.Memory({ initial: 1, maximum: 10, shared: true });
const view = new Uint8Array(memory.buffer); // Covers 1 page (65,536 bytes)

// ... later, Wasm allocates more than 64 KB and triggers memory.grow() ...
memory.grow(1); // Stand-in for a grow triggered inside Wasm

// view is now STALE: it still works, but only covers the original page
console.log(view.byteLength); // 65536
console.log(memory.buffer.byteLength); // 131072
view[70000] = 1; // Out of range for the stale view: silently ignored

// Correct: always re-derive views from memory.buffer after a grow
function getView() {
 return new Uint8Array(memory.buffer); // Fresh view on current buffer
}

The safe pattern is to never cache typed array views across grow boundaries. If your Wasm module can grow, re-derive views from memory.buffer on every operation, or after explicit grow checkpoints. The overhead of creating a typed array view is negligible — it is a view creation, not a copy.
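If re-creating the view on every call feels wasteful in a hot path, one option is to keep a cached view and compare it against memory.buffer on each access. The following is a minimal sketch of that idea, using a manual memory.grow(1) to stand in for a grow triggered inside Wasm.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });

let cached = null;

// Return a full-memory view, rebuilding it only when memory.buffer has changed.
function getView() {
  const buffer = memory.buffer;
  if (cached === null || cached.buffer !== buffer || cached.byteLength !== buffer.byteLength) {
    cached = new Uint8Array(buffer);
  }
  return cached;
}

const before = getView();
before[0] = 42;

memory.grow(1); // stands in for a grow triggered inside Wasm

const after = getView();
console.log(before.byteLength, after.byteLength, after[0]);
```

The check costs a property read and two comparisons, and the contents written before the grow are still there afterwards, because growing preserves existing data.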

Endianness: Wasm Is Always Little-Endian

WebAssembly memory is always little-endian, regardless of the host CPU architecture. This is specified in the standard and enforced by the runtime. MDN documents this explicitly: the WebAssembly.Memory object always operates in little-endian format.

For most x86/x64 and ARM (in little-endian mode) servers this is transparent. It matters if you are writing data on one architecture and reading on another — cross-architecture Wasm scenarios — or if you are sharing a buffer between Wasm and code that assumes big-endian layout.


// If your data protocol is big-endian (network byte order),
// you need explicit byteswap at the boundary
const dataView = new DataView(memory.buffer);

// Write a 32-bit integer in big-endian format explicitly
dataView.setInt32(offset, value, false); // false = big-endian

// Read it back — Wasm sees little-endian bytes
// Your Rust code must handle the swap:
// i32::from_be_bytes(ptr.read() ...) instead of direct read
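The byte layout is easy to inspect from JavaScript alone. This sketch writes one value in network byte order and then reads the same four bytes the way a little-endian Wasm load would; the offset and value are arbitrary.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const dv = new DataView(memory.buffer);
const bytes = new Uint8Array(memory.buffer);

const offset = 16;
const value = 0x11223344;

// Big-endian (network order): most significant byte at the lowest address.
dv.setUint32(offset, value, false);
const layout = Array.from(bytes.subarray(offset, offset + 4)); // [0x11, 0x22, 0x33, 0x44]

// A little-endian load of the same four bytes, which is what a Wasm i32.load performs.
const asWasmSeesIt = dv.getUint32(offset, true); // 0x44332211

// Reading with the matching flag recovers the original value.
const roundTripped = dv.getUint32(offset, false); // 0x11223344

console.log(layout, asWasmSeesIt.toString(16), roundTripped.toString(16));
```

DataView takes the endianness as an explicit argument on every call, so it behaves the same on any host, unlike Uint32Array, which uses the platform's native byte order.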

Security: Shared Memory and Side-Channel Attacks

SharedArrayBuffer was disabled in all browsers in 2018 after Spectre. It came back in 2020 requiring Cross-Origin Isolation (COOP + COEP headers). In Node.js, none of this applies — there is no cross-origin security model, and SharedArrayBuffer works without any headers or flags.

The relevant risk in Node.js is different: if untrusted Wasm code runs in the same process with access to a SharedArrayBuffer that contains sensitive data, it can read that data. SharedArrayBuffer’s high-resolution timing properties (you can implement a fine-grained clock via atomic operations) have been used for timing attacks. Do not put credentials, private keys, or sensitive user data in shared buffers that untrusted modules can access.

The Bun Caveat (If Your Team Uses It)

A confirmed bug exists in Bun as of late 2025: Wasm writes to SharedArrayBuffer-backed memory are not consistently visible to Worker Threads, even though direct JavaScript writes to the same buffer are. Node.js handles this correctly. If your team uses Bun for development and Node.js for production, the behavior diverges. Keep this in mind when debugging synchronization issues.

Memory64 in Node.js 24

Node.js 24 (LTS since October 2025) ships with V8 13.6 and includes WebAssembly Memory64 support — 64-bit memory addressing, removing the 4 GB linear memory limit. The tradeoff: Memory64 carries a performance penalty between 10% and 100%+ compared to 32-bit mode, because explicit bounds checking cannot be eliminated the way it can with 32-bit address space reservation. Use Memory64 only when your workloads genuinely exceed the 4 GB limit.


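// Node.js 24+ only: 64-bit Wasm memory addressing
// The current JS API names this option address and takes BigInt page counts;
// earlier drafts of the proposal called it index, so check your runtime.
const largeMemory = new WebAssembly.Memory({
 initial: 1024n, // 64 MB
 maximum: 65536n, // 4 GB
 shared: true,
 address: 'i64' // Enable 64-bit addressing
});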

FAQ

Does Node.js require COOP/COEP headers to use SharedArrayBuffer?

No. The COOP and COEP header requirements apply only in browser contexts where cross-origin isolation is enforced. Node.js has no equivalent security model and SharedArrayBuffer is available without any configuration flags or headers.

How do I pass a SharedArrayBuffer from the main thread to a Worker Thread?

Use worker.postMessage({ buffer: sharedBuffer }). Unlike a regular ArrayBuffer, a SharedArrayBuffer is not transferred — it is cloned by reference. Both the main thread and the worker end up with handles to the same backing memory. Do not include it in the transferList argument.

What happens to existing TypedArray views when Wasm memory grows?

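It depends on the memory type. For a non-shared memory they become detached: byteLength drops to 0 and methods such as set() throw a TypeError. For a shared memory the old SharedArrayBuffer cannot be detached, so existing views keep working but stay at the old length, and memory.buffer returns a new, larger SharedArrayBuffer after memory.grow(). Either way, re-derive views from memory.buffer rather than caching them across grow operations.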

Can multiple Wasm instances share the same WebAssembly.Memory object?

Yes. You create one WebAssembly.Memory instance and pass it into multiple WebAssembly.instantiate calls via the import object. All instances address the same linear memory. This is the standard pattern for multi-threaded Wasm with Worker Threads — each worker instantiates the same module against the shared memory object.

Is zero-copy actually zero-copy if I call .slice() on the output?

The .slice() is a copy. If you need the data to outlive the Wasm allocation (because you call wasm_free afterward), you need it. If you consume the output immediately before freeing, you can use the typed array view directly and avoid the copy entirely. The “zero-copy” benefit is in eliminating the serialization and deserialization roundtrip — the optional output copy is orders of magnitude cheaper.

What is the right way to handle concurrent Wasm calls from multiple Worker Threads?

Each worker gets its own Wasm module instance (same module, same shared memory, separate instance). Partition the memory so each worker operates on a distinct region, or implement a lock using Atomics.compareExchange on a control integer. Do not let two workers write to the same memory region without coordination — nothing in Wasm or V8 prevents the race.
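A lock built on Atomics.compareExchange can be sketched in a few lines. The function names and the single-integer layout are assumptions for illustration, and the demonstration at the bottom runs on one thread only to show the state transitions; in real use the Int32Array would sit on a SharedArrayBuffer passed to every worker.

```javascript
const control = new Int32Array(new SharedArrayBuffer(4));
const LOCK_INDEX = 0;
const UNLOCKED = 0;
const LOCKED = 1;

// Atomically flip 0 -> 1. compareExchange returns the value it found,
// so the lock was acquired only if that value was UNLOCKED.
function tryLock(ctrl) {
  return Atomics.compareExchange(ctrl, LOCK_INDEX, UNLOCKED, LOCKED) === UNLOCKED;
}

// Blocking acquire for use inside a Worker: sleep while the lock is held.
function lock(ctrl) {
  while (!tryLock(ctrl)) {
    Atomics.wait(ctrl, LOCK_INDEX, LOCKED);
  }
}

function unlock(ctrl) {
  Atomics.store(ctrl, LOCK_INDEX, UNLOCKED);
  Atomics.notify(ctrl, LOCK_INDEX, 1); // wake one waiter, if any
}

const first = tryLock(control);  // true: the lock was free
const second = tryLock(control); // false: already held
unlock(control);
const third = tryLock(control);  // true again after unlock
unlock(control);

console.log(first, second, third);
```

Partitioning the memory so that workers never touch the same region avoids the lock entirely and is usually the faster design when the workload allows it.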

Does Rust’s allocator work correctly when the Wasm module shares memory with JavaScript?

Yes, as long as JavaScript does not write into memory regions managed by the Rust allocator without going through the exported alloc/free functions. The Rust allocator manages its own bookkeeping inside linear memory. If JavaScript writes to an arbitrary offset that the allocator considers part of a live allocation’s metadata, you corrupt the allocator state. Always allocate through the exported function and treat the returned pointer as the only safe write target.

From Passing Data to Sharing State

The shift from serialization-based Node.js/Wasm integration to SharedArrayBuffer-based zero-copy is an architectural change, not just an optimization. You are moving from a model where data crosses a boundary to one where there is no boundary — both runtimes operate on the same bytes at the same addresses.

This is not the right approach for every integration point. If you are calling Wasm once per request with a small payload, the complexity overhead of explicit memory management, lifecycle tracking, and Atomics-based synchronization is not worth it. Use typed array views with regular ArrayBuffer and accept the copy cost.

Where it pays off: high-frequency calls, large payloads, sustained throughput requirements, or any path where GC pauses are appearing in your latency distribution. Audio/video processing, binary protocol parsing, numerical computation on large datasets, image manipulation — these are the workloads where the pattern earns its complexity.

The integration points worth auditing first are the ones with serialization: anything using JSON at the Wasm boundary, anything converting between Buffer and typed arrays on every call, anything with observable GC pauses in the p99 latency. Those are the places where replacing the copy with a shared view changes the performance profile in ways that matter.

Written by:

Krun Dev
