JS: Atomic group polyfill not detected as a ReDOS mitigation #9062

Open
pygy opened this issue May 8, 2022 · 3 comments
pygy commented May 8, 2022

Description of the false positive

You can polyfill atomic groups in JS by using the /(?=(...))\1/ pattern, but LGTM doesn't seem to understand it and reports a false positive here.

Here's the culprit:

var oneEscapeOrCharClassMatcher = /^(?:\\.|\[(?=((?:\\.|.)*?))\1\])$/;

There are other similar cases in the project. It is a RegExp composition lib that provides an atomic(x) helper that wraps x accordingly (and uses such RegExps internally).
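
A minimal sketch of the polyfill described above, using only standard JavaScript RegExp syntax. The pattern `a+` and the input are arbitrary stand-ins, and nothing here is taken from compose-regexp itself. Once a lookahead has succeeded the engine does not re-enter it, so the captured text is fixed and `\1` consumes exactly that text.

```javascript
// Plain greedy group: the engine can backtrack and give one "a" back.
const backtracking = /^(?:a+)ab$/;

// Polyfilled atomic group: the lookahead captures "aaa" once, \1 consumes it,
// and the engine cannot go back to try a shorter capture.
const atomic = /^(?=(a+))\1ab$/;

const sample = "aaab";
const results = {
  backtracking: backtracking.test(sample), // true
  atomic: atomic.test(sample),             // false
};
console.log(results);
```

The second pattern failing on `"aaab"` is the point: an atomic group refuses the backtracking step that would otherwise let the match succeed.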

Still, to my amusement (and slight consternation), LGTM caught a polynomial ReDOS in my lib... Thanks for the project.

URL to the alert on the project page on LGTM.com

https://lgtm.com/projects/g/compose-regexp/compose-regexp.js/snapshot/e31d432f942019263401085e38558c5661dc7460/files/commonjs/compose-regexp.js?sort=name&dir=ASC&mode=heatmap#xdcee8d483c053100:1

@pygy pygy changed the title LGTM.com - false positive JS: Atomic group polyfill not detected as a ReDOS mitigation May 8, 2022
@erik-krogh
Contributor

Yes, the ReDoS analysis doesn't really understand back-references or lookaheads.

And yes, that's an FP.


But I also think your regexp is broken, and fixing the regexp would also remove the ReDoS FP.

I can get the regexp to match strings like "\\d", "\\w", and "[]".
But I can't get it to match any string that has something between the square brackets.
The reason for that is the lazy repetition (*?), because that lazy repetition will always just trivially match the empty string.
The following backref will then match the empty string, which is only possible when the entire string is "[]".
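
The effect can be seen by isolating the lookahead and inspecting what it captures. This is only an illustrative sketch: the input `"a-z]"` stands for the text that follows the opening bracket, and the greedy variant is included purely for contrast.

```javascript
// The lookahead from the original regexp, with its lazy quantifier.
const lazyLookahead = /(?=((?:\\.|.)*?))/;
// The same lookahead with a greedy quantifier, for comparison only.
const greedyLookahead = /(?=((?:\\.|.)*))/;

const afterBracket = "a-z]";

// Lazy: zero repetitions already satisfy the lookahead, so group 1 is "".
const lazyCapture = lazyLookahead.exec(afterBracket)[1];
// Greedy with ".": everything is consumed, including the closing bracket.
const greedyCapture = greedyLookahead.exec(afterBracket)[1];

console.log(JSON.stringify(lazyCapture));   // ""
console.log(JSON.stringify(greedyCapture)); // "a-z]"
```

Neither capture is usable as-is: the lazy one is always empty, and the greedy one with `.` swallows the `]` that the rest of the pattern still needs.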

From the name of the regexp it sounds like you are trying to match char escapes (which it does) and char classes (it only matches []).

Maybe try /^(?:\\.|\[[^\]]+\])$/ instead, that way your regexp matches anything enclosed by square brackets, and it doesn't match [] (which is a syntax error).


Some examples that show why I think your current regexp is broken:

> /^(?:\\.|\[(?=((?:\\.|.)*?))\1\])$/.test("\\d") // should match - and does - good
true
> /^(?:\\.|\[(?=((?:\\.|.)*?))\1\])$/.test("[]") // should not match - but does - bad
true
> /^(?:\\.|\[(?=((?:\\.|.)*?))\1\])$/.test("[a-z]") // should match - but doesn't - bad
false
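
For comparison, a short sketch that runs the suggested replacement, /^(?:\\.|\[[^\]]+\])$/, over the same three inputs.

```javascript
const suggested = /^(?:\\.|\[[^\]]+\])$/;

// Same inputs as in the transcript above.
const inputs = ["\\d", "[]", "[a-z]"];
const outcomes = inputs.map((s) => suggested.test(s));

console.log(outcomes); // [ true, false, true ]
```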

@pygy
Author

pygy commented May 9, 2022

Mmmh, indeed, well spotted thanks, there's a hole in my test suite, and another one in my understanding of look aheads :-)

Thankfully, the bug has no consequence on the functionality of the lib. This is part of the logic to determine if a non-capturing group is required before applying a quantifier. This bug means there will be useless non-capturing groups. It will soon be fixed.

Edit2: I went for the atomic version of your suggestion, with * as quantifier: /^(?:\\.|\[(?=((?:\\.|[^\]])*))\1\])$/
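
To illustrate why the bug only produced redundant groups, here is a rough sketch of the kind of check described above. `quantifySketch` is a hypothetical helper written for this example; it is an assumption about the general shape of the logic, not compose-regexp's actual code.

```javascript
// Hypothetical helper: append a quantifier, wrapping the source in (?:...)
// unless it already looks like a single token (one char, escape, or class).
function quantifySketch(source, quantifier, singleTokenMatcher) {
  const isSingleToken = source.length === 1 || singleTokenMatcher.test(source);
  return (isSingleToken ? source : "(?:" + source + ")") + quantifier;
}

// The original matcher from the issue, and the repaired one from the edit.
const brokenMatcher = /^(?:\\.|\[(?=((?:\\.|.)*?))\1\])$/;
const repairedMatcher = /^(?:\\.|\[(?=((?:\\.|[^\]])*))\1\])$/;

console.log(quantifySketch("[a-z]", "+", brokenMatcher));   // (?:[a-z])+
console.log(quantifySketch("[a-z]", "+", repairedMatcher)); // [a-z]+
console.log(quantifySketch("ab", "+", repairedMatcher));    // (?:ab)+
```

Both `(?:[a-z])+` and `[a-z]+` match the same strings, so under this sketch the broken matcher costs only an unnecessary wrapper.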

Btw, /[]+/ is not a syntax error in JS, even in u mode.

I may reject it at some point, because it makes little sense to quantify a class that never matches, but for now, since it doesn't cause syntax errors I'll keep it.
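
A quick sketch to check this: both constructions below are accepted by the engine, with and without the `u` flag, and the empty class then matches nothing. The sample strings are arbitrary.

```javascript
// Neither constructor call throws a SyntaxError.
const plainEmptyClass = new RegExp("[]+");
const unicodeEmptyClass = new RegExp("[]+", "u");

// An empty character class can never match a character.
const samples = ["", "a", "[]", "]]]"];
const anyMatch = samples.some(
  (s) => plainEmptyClass.test(s) || unicodeEmptyClass.test(s)
);

console.log(anyMatch); // false
```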

@erik-krogh
Contributor

> Mmmh, indeed, well spotted thanks, there's a hole in my test suite, and another one in my understanding of look aheads :-)

No problem.
Regular expressions are hard. I was basically dreaming in automata when I made the ReDoS libraries, and even I get something wrong every once in a while.

> Btw, /[]+/ is not a syntax error in JS

Hmm. I misremembered. It's a syntax error in some languages, I just remembered wrong for JS.

> Edit2: I went for the atomic version of your suggestion, with * as quantifier: /^(?:\\.|\[(?=((?:\\.|[^\]])*))\1\])$/

This is nitpicking, but I'm not fond of using features like backreferences when something can be expressed using a plain regular expression.

Looking at your regexp and mine, they don't match the same things (not just the [] thing).
You have a greedy atomic match that will try to match \\. before trying to match [^\]], and that's why your regexp correctly matches a string such as "[a\\]]" (because the "\\" string is not eaten by the [^\]] term).
Try to flip the order of \\. and [^\]] in your regexp, then the string "[a\\]]" will no longer be matched.
(My regexp would mistakenly not match "[a\\]]", and your regexp made me realize it).

The proper fix (I think), is to modify the regular expression to something like: /^(?:\\.|\[([^\]\\]|\\.)*\])$/.
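
The differences discussed above can be checked side by side. The sketch below runs the four variants from this thread against "[a\\]]" (the character class `[a\]]`); the property names are labels made up for this example.

```javascript
const candidates = {
  // First suggestion: no escape handling inside the brackets.
  plusOnly: /^(?:\\.|\[[^\]]+\])$/,
  // Atomic version, escape alternative tried first.
  atomicEscapeFirst: /^(?:\\.|\[(?=((?:\\.|[^\]])*))\1\])$/,
  // Same, with the two alternatives flipped.
  atomicFlipped: /^(?:\\.|\[(?=((?:[^\]]|\\.)*))\1\])$/,
  // Plain regular expression with disjoint alternatives.
  plainFinal: /^(?:\\.|\[([^\]\\]|\\.)*\])$/,
};

const escapedBracket = "[a\\]]";
const verdicts = Object.fromEntries(
  Object.entries(candidates).map(([name, re]) => [name, re.test(escapedBracket)])
);

console.log(verdicts);
// { plusOnly: false, atomicEscapeFirst: true, atomicFlipped: false, plainFinal: true }
```

In the last variant the two alternatives cannot match the same character, since `[^\]\\]` excludes the backslash, so the order of the alternatives no longer matters and no backreference is needed.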
