JS/RB/PY/Java: add suspicious range query #9712

erik-krogh · 2022-06-24T19:46:38Z

CVE-2021-42740: TP/TN

See the example JS results to see what this query flags.

The issues found are not security related in the vast majority of cases, but they are still clearly bugs.

Example results (JS is the most interesting): JavaScript, Python, Ruby, Java.

Evaluations look fine: Ruby, Python, JavaScript, Java.
There is a slight slowdown, but I haven't been able to find a badly performing predicate in my new code.

Ruby: Some of the Ruby results are FPs due to the parser not parsing escapes as RegExpEscape.
I haven't looked into why that happens, but I'm quite sure it's somehow a bug in the parser and not the query.

Ruby/Java: The parsing of nested char classes is wrong, e.g. /[a-z&&[^a-c]]+/.
The nested [ and ] are parsed as literals instead of being parsed as a nested char class.
You can see how it should be parsed here: https://regex101.com/r/X6q22R/1
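As a rough illustration of the intended semantics (not of the CodeQL parser itself), the class `[a-z&&[^a-c]]` is the intersection of `a-z` with everything except `a-c`. Python's `re` module has no intersection syntax, so the sketch below models the class with plain sets. The helper name `char_range` is hypothetical.

```python
def char_range(low: str, high: str) -> set:
    """Return the set of characters from low to high, both included."""
    return {chr(c) for c in range(ord(low), ord(high) + 1)}


# [a-z&&[^a-c]]: intersect a-z with the complement of a-c,
# which is the same as removing a-c from a-z.
nested_class = char_range("a", "z") - char_range("a", "c")

# The literal brackets are not members when the class is parsed as nested.
print("".join(sorted(nested_class)))
print("[" in nested_class, "]" in nested_class)
```

A parser that treats the inner `[` and `]` as literals would instead put those two characters into the class, which is the misparse described above.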

github-actions · 2022-06-24T19:47:53Z

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

github-actions · 2022-06-26T20:43:48Z

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

github-actions · 2022-06-26T20:51:27Z

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [, \, ], ^, _ and `.

On other occasions the dash in a regular expression is not escaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

import java.util.regex.Pattern;
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-f]{6}", color);
    }
}

However, the A-f range matches every uppercase letter, and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

import java.util.regex.Pattern;
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-F]{6}", color);
    }
}

References

Mitre.org: CWE-020
github.com: CVE-2021-42740
wh0.github.io: Exploiting CVE-2021-42740
Common Weakness Enumeration: CWE-20.

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [, \, ], ^, _ and `.

On other occasions the dash in a regular expression is not escaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

function isValidHexColor(color) {
    return /^#[0-9a-fA-f]{6}$/i.test(color);
}

However, the A-f range matches every uppercase letter, and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

function isValidHexColor(color) {
    return /^#[0-9A-F]{6}$/i.test(color);
}

References

Mitre.org: CWE-020
github.com: CVE-2021-42740
wh0.github.io: Exploiting CVE-2021-42740
Common Weakness Enumeration: CWE-20.

python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [, \, ], ^, _ and `.

On other occasions the dash in a regular expression is not escaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.
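To make the two ranges concrete, here is a minimal sketch that enumerates what they cover by code point. The helper `chars_between` is a hypothetical name used only for this illustration and is not part of the query.

```python
import re


def chars_between(low: str, high: str) -> str:
    """Return every character from low to high, both included."""
    return "".join(chr(c) for c in range(ord(low), ord(high) + 1))


# Characters that A-z covers in addition to the letters.
a_to_z_mixed = chars_between("A", "z")
non_letters = [c for c in a_to_z_mixed if not c.isalpha()]
print(non_letters)

# The accidental ,-_ range from [a-zA-Z0-9%=.,-_].
comma_range = chars_between(",", "_")
print(len(comma_range), repr(comma_range))

# Every digit is already inside ,-_, so the explicit 0-9 is redundant.
digits_covered = [d for d in "0123456789" if d in comma_range]
print(digits_covered)

# The regex engine agrees: "^" is accepted by [a-zA-z].
print(re.fullmatch(r"[a-zA-z]", "^") is not None)
```

Listing the members this way is a quick check when reviewing a character class whose endpoints are not both letters of the same case or both digits.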

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the A-f range matches every uppercase letter, and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None
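The difference between the two patterns can be checked directly. The sketch below compiles both and runs a few sample inputs through them; the sample strings and the names `BUGGY` and `FIXED` are chosen here for illustration only.

```python
import re

BUGGY = re.compile(r'^#[0-9a-fA-f]{6}$')  # A-f spans code points 65..102
FIXED = re.compile(r'^#[0-9a-fA-F]{6}$')  # only hex digits

samples = ["#00ff7F", "#XXYYZZ", "#[\\]^_`"]

for s in samples:
    # Print whether each pattern accepts the sample.
    print(s, bool(BUGGY.match(s)), bool(FIXED.match(s)))
```

Only the first sample is a real hex color. The buggy pattern accepts all three, because X, Y, Z and the punctuation characters all fall inside A-f.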

References

Mitre.org: CWE-020
github.com: CVE-2021-42740
wh0.github.io: Exploiting CVE-2021-42740
Common Weakness Enumeration: CWE-20.

ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [, \, ], ^, _ and `.

On other occasions the dash in a regular expression is not escaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

def is_valid_hex_color(color)
    /^#[0-9a-fA-f]{6}$/.match(color)
end

However, the A-f range matches every uppercase letter, and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

def is_valid_hex_color(color)
    /^#[0-9a-fA-F]{6}$/.match(color)
end

References

Mitre.org: CWE-020
github.com: CVE-2021-42740
wh0.github.io: Exploiting CVE-2021-42740
Common Weakness Enumeration: CWE-20.

esbena · 2022-06-28T21:21:50Z

Taking one step back: I think it is preferable to avoid having queries that surface our misparses to end-users.

Suggestions:

wait with merging the Ruby/Java versions of the query, and just move ahead with JS/Python for now.
avoid reporting results if we are able to detect that a misparse is likely

erik-krogh · 2022-06-29T11:17:22Z

avoid reporting results if we are able to detect that a misparse is likely

I've filtered out those results for Java/Ruby.

esbena

Partial review that I don't want to leave hanging while I'm on holiday.

esbena · 2022-07-01T12:59:22Z

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.ql

@@ -0,0 +1,18 @@
+/**
+ * @name Suspicious regexp range


"Suspicious" is too much of a catch-all, I think we can mention the property of the range that makes it suspicious (also, regexp vs regular expression):

"Too large regular expression range"

"Regular expression range with unintended content"

(I think it is OK to ignore the inverted ranges in the prose)

esbena · 2022-07-01T12:59:22Z

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.ql

@@ -0,0 +1,18 @@
+/**
+ * @name Suspicious regexp range
+ * @description Some ranges in regular expression might match more than intended.


The conventional form for descriptions is if-then:

Overly permissive regular expression ranges may cause regular expressions to match more than anticipated

(security angle ends with "may allow an attacker to bypass ...")

esbena · 2022-07-01T12:59:22Z

python/ql/lib/semmle/python/security/SuspiciousRegexpRangeQuery.qll

+      // any non-alpha numeric as part of the range
+      not isAlphanumeric([low, high].toUnicode())
+    ) and
+    // some cases I want to exclude from being flagged


Suggested change

// some cases I want to exclude from being flagged

// allowlist for known ranges

esbena · 2022-07-01T12:59:22Z

python/ql/lib/semmle/python/security/SuspiciousRegexpRangeQuery.qll

+  // the same with " " and "!". " " is the first printable character, and "!" is the first non-white-space printable character.
+  result.isRange([" ", "!"], _)
+  or
+  // I've seen this often enough, looks OK.


Suggested change

// I've seen this often enough, looks OK.

// the `[@-_]` range is intentional

esbena · 2022-07-01T12:59:22Z

python/ql/lib/semmle/python/security/SuspiciousRegexpRangeQuery.qll

+  result.isRange(0.toUnicode(), _)
+}
+
+/** Gets all chars between (and including) `low` and `high`. */


Suggested change

/** Gets all chars between (and including) `low` and `high`. */

/** Gets a char between (and including) `low` and `high`. */

esbena · 2022-07-01T12:59:22Z

python/ql/lib/semmle/python/security/SuspiciousRegexpRangeQuery.qll

+
+/** Gets all chars between (and including) `low` and `high`. */
+bindingset[low, high]
+private string inRange(string low, string high) {


Suggested change

private string inRange(string low, string high) {

private string getInRange(string low, string high) {

(minor)
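For readers less familiar with QL, the reviewed snippets can be approximated in Python. This is a simplified sketch, not the query's actual logic: only the non-alphanumeric endpoint check and the allowlist entries for " ", "!" and U+0000 come from the quoted diff. The "different kinds of endpoints" rule and all Python names are assumptions made for illustration.

```python
from typing import Iterator


def get_in_range(low: str, high: str) -> Iterator[str]:
    """Yield a char between (and including) low and high, one at a time."""
    for code in range(ord(low), ord(high) + 1):
        yield chr(code)


def is_ascii_alnum(c: str) -> bool:
    return c.isascii() and c.isalnum()


def is_allowlisted(low: str) -> bool:
    # Allowlist for known ranges, mirroring the quoted diff:
    # ranges starting at " ", "!" or U+0000 are treated as intentional.
    return low in (" ", "!", "\x00")


def char_kind(c: str) -> str:
    # Only called for ASCII alphanumerics.
    if c.isdigit():
        return "digit"
    return "lower" if c.islower() else "upper"


def is_suspicious(low: str, high: str) -> bool:
    if is_allowlisted(low):
        return False
    # From the diff: any non-alphanumeric endpoint makes the range suspicious.
    if not (is_ascii_alnum(low) and is_ascii_alnum(high)):
        return True
    # ASSUMPTION (not shown in the review): endpoints of different kinds,
    # such as A-z or 0-z, are also flagged.
    return char_kind(low) != char_kind(high)


for low, high in [("a", "z"), ("A", "z"), (",", "_"), (" ", "~")]:
    print(repr(low), repr(high), is_suspicious(low, high))
```

The generator form of `get_in_range` reflects the suggested doc comment: a QL predicate with a result yields one char per result tuple rather than returning "all chars" as a collection.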

erik-krogh added WIP Awaiting evaluation labels Jun 24, 2022

github-actions bot added documentation Java JS Python Ruby labels Jun 24, 2022

erik-krogh force-pushed the badRange branch 5 times, most recently from 09de3b8 to 7932be5 Compare Jun 27, 2022

erik-krogh removed WIP Awaiting evaluation labels Jun 28, 2022

add suspicious-regexp-range query

a343cea

erik-krogh force-pushed the badRange branch from 7932be5 to a343cea Compare Jun 28, 2022

erik-krogh marked this pull request as ready for review Jun 28, 2022

erik-krogh requested review from as code owners Jun 28, 2022

erik-krogh added 2 commits Jun 29, 2022

filter out potential misparses from rb/suspicious-regexp-range

2e295e4

filter out potential misparses from java/suspicious-regexp-range

9ecc3a2

esbena reviewed Jul 1, 2022
