This is a card in Dave's Virtual Box of Cards.

JavaScript string parsing performance test

Page created: 2023-11-25


Finding: Built-in browser regex operations are faster than manually scanning strings in JavaScript, even with poorly written code.

The actual JS test is on this page. You can run it yourself.

I wanted to test the performance of doing a super low-effort regex-based line-by-line parse of a text script versus a character-by-character scan with manual state.

The test "script" may contain a bit of syntax we’re looking for in this format:

[xxxx] yyyyy

The regex method looks roughly like this:

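const lines = input.split('\n');
let matchCount = 0;
lines.forEach((line) => {
	// The brackets must be escaped, or [.*] is read as a character class.
	const m = /\[(.*)\] .*/.exec(line);
	if (m && m[1] === 'hello world') {
		matchCount++;
	}
});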

The string scan method looks roughly like this:

for(i=0; i<input.length; i++){
	if(input[i] === '['){ we're in a bracket }
	if(input[i] === ']'){
		bracket end, capture
		if(captured 'hello world'){
			increment match count
		}
	}
	if(input[i] === '\n'){ we're no longer in a bracket }
}

The simple state machine stuff is mostly elided from the scan example. You can 'View Source' on this page to see the actual code.
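For readers who don't want to dig through the page source, here is a minimal runnable sketch of what a scan with manual state could look like. It is not the code from this page: the function name countByScan, the target parameter, and the single start index used as the "state" are all assumptions made for illustration, and it does not check for the space after the closing bracket.

```javascript
// Hypothetical helper (not the page's actual code): counts bracketed
// captures equal to `target` in a single pass over the characters.
function countByScan(input, target = 'hello world') {
	let count = 0;
	let start = -1; // index just past the latest '[' on this line, or -1 if none
	for (let i = 0; i < input.length; i++) {
		const c = input[i];
		if (c === '[') {
			start = i + 1; // we're in a bracket
		} else if (c === ']' && start !== -1) {
			// bracket end: capture everything between '[' and ']'
			if (input.slice(start, i) === target) {
				count++;
			}
			start = -1;
		} else if (c === '\n') {
			start = -1; // an unclosed bracket (like "[weeep") ends with its line
		}
	}
	return count;
}

console.log(countByScan('[hello world] Hello World!\n[weeep\n')); // 1
```

Even in this stripped-down form, every character costs at least one comparison in interpreted (or JIT-compiled) JS, which is the work the regex version hands off to native code.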

Regex much faster than comparing string[i] by character

As I’d suspected, the silly split and match regex method yielded much better results. Even though looping through the characters a single time looks much more efficient, the JS logic simply can’t compete with the heavily optimized native regex functions!

After multiple runs, I’m seeing that both versions can find 10 copies of the syntax match in under a millisecond (with Firefox on my desktop machine).

At 100 copies (and 65,000 characters of "script"), the regex version starts to pull ahead and at 1,000 copies (650,000 chars) it’s at least twice as fast.

Beyond that (10,000 copies, or 6.5 million characters of script, and above) the regex version is well over twice as fast.

Running the same tests on Firefox on my Android phone yields surprisingly similar results. (Clearly my desktop is not being fully utilized!) On the phone, the regex performance is even comparatively better.

Try it yourself!

The script box below isn’t a textarea for entry. It’s just showing the input text "script".

You can make as many copies of it as you want to test. The char count is the length of the resulting copies in memory.

The tests will report the milliseconds taken and number of matches (which should equal the number of script copies you made).

foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop!
foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop! Foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop!
[hello world] Hello World!
foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop! Foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop! foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop! Foo bar baz bop boop bippity-bop bong. Bing bong bong bong ding dong blooooop!
[weeep
foo bar baz bop boop
foo bar baz bop boop
foo bar baz bop boop
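The copy-and-time procedure described above can be sketched roughly as follows. This is an illustration under assumptions, not the page's real test code: runTest and countByRegex are hypothetical names, the script string is a shortened stand-in for the text in the box, and the copies are built with String.prototype.repeat.

```javascript
// Hypothetical harness: times one parser over `copies` repetitions of a script.
function runTest(label, parse, script, copies) {
	const input = script.repeat(copies); // all copies held in memory as one string
	const t0 = performance.now();
	const matches = parse(input);
	const ms = performance.now() - t0;
	return { label, chars: input.length, matches, ms };
}

// Compact restatement of the split-and-regex approach, used here as the parser.
function countByRegex(input) {
	let count = 0;
	for (const line of input.split('\n')) {
		const m = /\[(.*)\] .*/.exec(line);
		if (m && m[1] === 'hello world') count++;
	}
	return count;
}

// Shortened stand-in for the script box contents (an assumption for brevity).
const script = 'foo bar baz\n[hello world] Hello World!\n[weeep\nfoo bar baz bop boop\n';
const result = runTest('regex', countByRegex, script, 1000);
console.log(`${result.label}: ${result.matches} matches in ${result.ms} ms (${result.chars} chars)`);
```

The match count should equal the number of copies, which doubles as a cheap sanity check that the parser being timed is actually doing the work.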

Aside: performance.now()

I’m using the performance.now() (developer.mozilla.org) method to get a monotonic timestamp. This is supposed to be a floating-point value with fractional milliseconds that starts on page load (or thereabouts).

MDN explains that we can’t have nice things: the accuracy of performance.now() is intentionally off by microseconds to prevent its use in timing attacks and fingerprinting. Fine, whatever. But in fact, Firefox and Safari don’t give fractional milliseconds at all! They both return whole integers.

Argh!

(Chrome apparently gives you fractional values, but I don’t run proprietary advertising delivery software on my computer unless required by the workplace.)

performance.now() still has the advantage of being monotonic, though. By contrast, Date.now() isn’t guaranteed to be monotonic (you never know, ntpd could be skewing your clock to correct it or something and that could put your times off).
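If you want to see what resolution your own browser actually hands out, one rough way is to spin until the timestamp changes and record the smallest step. The sketch below assumes a hypothetical helper named smallestTick; the number it reports depends on the browser and its privacy settings, so treat it as an estimate rather than a spec value.

```javascript
// Hypothetical helper: estimates the smallest observable step of
// performance.now() by busy-waiting until the returned value changes.
function smallestTick(samples = 20) {
	let smallest = Infinity;
	for (let s = 0; s < samples; s++) {
		const t0 = performance.now();
		let t1 = performance.now();
		while (t1 === t0) {
			t1 = performance.now(); // spin until the clock visibly advances
		}
		smallest = Math.min(smallest, t1 - t0);
	}
	return smallest; // in milliseconds
}

console.log(`smallest observed tick: ${smallestTick()} ms`);
```

Any single measurement shorter than the reported tick is noise, which is why timings under a millisecond (like the 10-copy runs) can't really separate the two methods.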