JS/RB/PY/Java: add suspicious range query #9712

Open
wants to merge 3 commits into base: main


@erik-krogh erik-krogh commented Jun 24, 2022

CVE-2021-42740: TP/TN

See the example JS results to see what this query flags.

The issues found are not security related in the vast majority of cases, but they are still clearly bugs.

Example results (JS is the most interesting): JavaScript, Python, Ruby, Java.

Evaluations look fine: Ruby, Python, JavaScript, Java.
There is a slight slowdown, but I haven't been able to find a badly performing predicate in my new code.

Ruby: Some of the Ruby results are FPs due to the parser not parsing escapes as RegExpEscape.
I haven't looked into why that happens, but I'm quite sure it's somehow a bug in the parser and not the query.

Ruby/Java: The parsing of nested char classes is wrong, e.g. /[a-z&&[^a-c]]+/.
The nested [ and ] are parsed as literals instead of being parsed as a nested char class.
You can see how it should be parsed here: https://regex101.com/r/X6q22R/1

@erik-krogh erik-krogh added WIP Awaiting evaluation labels Jun 24, 2022
@github-actions github-actions bot commented Jun 24, 2022

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

@github-actions github-actions bot commented Jun 26, 2022

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [ \ ] ^ _ and `.

In other cases, a dash in a character class is left unescaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6 digit hex color.

import java.util.regex.Pattern;
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-f]{6}", color);
    }
}

However, the A-f range matches every uppercase letter (as well as the characters [ \ ] ^ _ and `), and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

import java.util.regex.Pattern;
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-F]{6}", color);
    }
}

References

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [ \ ] ^ _ and `.

In other cases, a dash in a character class is left unescaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6 digit hex color.

function isValidHexColor(color) {
    return /^#[0-9a-fA-f]{6}$/i.test(color);
}

However, the A-f range matches every uppercase letter (as well as the characters [ \ ] ^ _ and `), and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

function isValidHexColor(color) {
    return /^#[0-9A-F]{6}$/i.test(color);
}

References

python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [ \ ] ^ _ and `.

In other cases, a dash in a character class is left unescaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.
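
To make the two ranges above concrete, the following sketch lists what they cover by walking the code points between the endpoints. The helper name chars_in_range is made up for this illustration and is not part of the query or of the qhelp file.

```python
def chars_in_range(low: str, high: str) -> str:
    """Return every character from low to high, both included."""
    return "".join(chr(cp) for cp in range(ord(low), ord(high) + 1))


# A-z covers the uppercase letters, the lowercase letters, and the six
# punctuation characters that sit between "Z" and "a" in ASCII.
a_to_z = chars_in_range("A", "z")
extras = [c for c in a_to_z if not c.isalpha()]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# The ,-_ range produced by the unescaped dash in [a-zA-Z0-9%=.,-_].
comma_to_underscore = chars_in_range(",", "_")
print(len(comma_to_underscore))  # 52
print(set("0123456789") <= set(comma_to_underscore))  # True: it swallows 0-9
```

Listing the code points this way is a quick sanity check when a range has a punctuation endpoint and it is unclear what lies in between.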

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6 digit hex color.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the A-f range matches every uppercase letter (as well as the characters [ \ ] ^ _ and `), and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None
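
As a quick check of the two patterns above, the following sketch runs both against a few inputs. The names BAD, FIXED and accepts, and the sample strings, are chosen for this illustration only.

```python
import re

# The two patterns from the example: A-f (too wide) versus A-F (intended).
BAD = re.compile(r'^#[0-9a-fA-f]{6}$')
FIXED = re.compile(r'^#[0-9a-fA-F]{6}$')


def accepts(pattern: re.Pattern, color: str) -> bool:
    """Return True if the pattern matches the candidate color string."""
    return pattern.match(color) is not None


# Columns: input, accepted by BAD, accepted by FIXED.
for color in ["#1a2B3c", "#XXYYZZ", "#__^^[["]:
    print(color, accepts(BAD, color), accepts(FIXED, color))
# #1a2B3c True True
# #XXYYZZ True False
# #__^^[[ True False
```

The last input shows that the wide range admits punctuation as well as letters outside A-F.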

References

ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the characters [ \ ] ^ _ and `.

In other cases, a dash in a character class is left unescaped, which causes it to be interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6 digit hex color.

def is_valid_hex_color(color)
    /^#[0-9a-fA-f]{6}$/.match(color)
end

However, the A-f range matches every uppercase letter (as well as the characters [ \ ] ^ _ and `), and thus a "color" like #XXYYZZ is considered valid.

The fix is to use an uppercase A-F range instead.

def is_valid_hex_color(color)
    /^#[0-9a-fA-F]{6}$/.match(color)
end

References

@erik-krogh erik-krogh force-pushed the badRange branch 5 times, most recently from 09de3b8 to 7932be5 Jun 27, 2022
@erik-krogh erik-krogh removed WIP Awaiting evaluation labels Jun 28, 2022
@erik-krogh erik-krogh marked this pull request as ready for review Jun 28, 2022
@erik-krogh erik-krogh requested review from as code owners Jun 28, 2022
@esbena esbena commented Jun 28, 2022

Taking one step back: I think it is preferable to avoid having queries that surface our misparses to end-users.

Suggestions:

  • hold off on merging the Ruby/Java versions of the query, and move ahead with JS/Python for now.
  • avoid reporting results if we are able to detect that a misparse is likely

@erik-krogh erik-krogh commented Jun 29, 2022

  • avoid reporting results if we are able to detect that a misparse is likely

I've filtered out those results for Java/Ruby.

@esbena esbena left a comment

Partial review that I don't want to leave hanging while I'm on holiday.

@@ -0,0 +1,18 @@
/**
* @name Suspicious regexp range

@esbena esbena Jul 1, 2022

"Suspicious" is too much of a catch-all; I think we can mention the property of the range that makes it suspicious (also, regexp vs. regular expression):

  • "Too large regular expression range"
  • "Regular expression range with unintended content"

(I think it is OK to ignore the inverted ranges in the prose)

@@ -0,0 +1,18 @@
/**
* @name Suspicious regexp range
* @description Some ranges in regular expression might match more than intended.

@esbena esbena Jul 1, 2022

The conventional form for descriptions is if-then:

  • Overly permissive regular expression ranges may cause regular expressions to match more than anticipated
    • (security angle ends with "may allow an attacker to bypass ...")

// any non-alpha numeric as part of the range
not isAlphanumeric([low, high].toUnicode())
) and
// some cases I want to exclude from being flagged

@esbena esbena Jul 1, 2022

Suggested change
- // some cases I want to exclude from being flagged
+ // allowlist for known ranges

// the same with " " and "!". " " is the first printable character, and "!" is the first non-white-space printable character.
result.isRange([" ", "!"], _)
or
// I've seen this often enough, looks OK.

@esbena esbena Jul 1, 2022

Suggested change
- // I've seen this often enough, looks OK.
+ // the `[@-_]` range is intentional
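
For readers unfamiliar with QL, the fragments under review can be approximated in Python as below. This is only a loose sketch: every name is hypothetical, the exact shape of the `[@-_]` allowlist entry is inferred from the suggested comment, and the class-crossing check is an assumption standing in for the parts of the condition that are not visible in the diff.

```python
def is_alphanumeric(ch: str) -> bool:
    """ASCII letters and digits only."""
    return "0" <= ch <= "9" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def char_class(ch: str) -> str:
    """Coarse class of a character, used for the assumed crossing check."""
    if "0" <= ch <= "9":
        return "digit"
    if "a" <= ch <= "z":
        return "lower"
    if "A" <= ch <= "Z":
        return "upper"
    return "other"


def is_allowlisted(low: str, high: str) -> bool:
    """Allowlist for known ranges, mirroring the visible fragments.

    Ranges that start at NUL, " " or "!" are accepted whatever the upper
    bound is. The (@, _) entry is inferred from the review comment.
    """
    return low in {"\x00", " ", "!"} or (low, high) == ("@", "_")


def looks_suspicious(low: str, high: str) -> bool:
    if is_allowlisted(low, high):
        return False
    # Visible in the diff: any non-alphanumeric endpoint is suspicious.
    if not (is_alphanumeric(low) and is_alphanumeric(high)):
        return True
    # ASSUMPTION: endpoints from different classes (for example A-z).
    return char_class(low) != char_class(high)


print(looks_suspicious(",", "_"))  # True
print(looks_suspicious("A", "z"))  # True
print(looks_suspicious(" ", "~"))  # False
```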

result.isRange(0.toUnicode(), _)
}

/** Gets all chars between (and including) `low` and `high`. */

@esbena esbena Jul 1, 2022

Suggested change
- /** Gets all chars between (and including) `low` and `high`. */
+ /** Gets a char between (and including) `low` and `high`. */


/** Gets all chars between (and including) `low` and `high`. */
bindingset[low, high]
private string inRange(string low, string high) {

@esbena esbena Jul 1, 2022

Suggested change
- private string inRange(string low, string high) {
+ private string getInRange(string low, string high) {

(minor)
