JS/RB/PY/Java: add suspicious range query #9712
base: main
|
QHelp previews:
java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp
errors/warnings:
javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp
errors/warnings:
python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp
errors/warnings:
ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp
errors/warnings:
|
QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can by accident match more than was intended. For example, the regular expression [...]

On other occasions it can happen that the dash in a regular expression is not escaped, which will cause it to be interpreted as part of a range. For example in the character class [...]

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

    import java.util.regex.Pattern;

    public class Tester {
        public static boolean is_valid_hex_color(String color) {
            return Pattern.matches("#[0-9a-fA-f]{6}", color);
        }
    }

However, the [...]

The fix is to use an uppercase [...]

    import java.util.regex.Pattern;

    public class Tester {
        public static boolean is_valid_hex_color(String color) {
            return Pattern.matches("#[0-9a-fA-F]{6}", color);
        }
    }

javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can by accident match more than was intended. For example, the regular expression [...]

On other occasions it can happen that the dash in a regular expression is not escaped, which will cause it to be interpreted as part of a range. For example in the character class [...]

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

    function isValidHexColor(color) {
        return /^#[0-9a-fA-f]{6}$/i.test(color);
    }

However, the [...]

The fix is to use an uppercase [...]

    function isValidHexColor(color) {
        return /^#[0-9A-F]{6}$/i.test(color);
    }

python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can by accident match more than was intended. For example, the regular expression [...]

On other occasions it can happen that the dash in a regular expression is not escaped, which will cause it to be interpreted as part of a range. For example in the character class [...]

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

    import re

    def is_valid_hex_color(color):
        return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the [...]

The fix is to use an uppercase [...]

    import re

    def is_valid_hex_color(color):
        return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None

ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

Suspicious regexp range

A regexp range can by accident match more than was intended. For example, the regular expression [...]

On other occasions it can happen that the dash in a regular expression is not escaped, which will cause it to be interpreted as part of a range. For example in the character class [...]

Recommendation

Don't write character ranges where there might be confusion as to which characters are included in the range.

Example

The following example code checks whether a string is a valid 6-digit hex color.

    def is_valid_hex_color(color)
      /^#[0-9a-fA-f]{6}$/.match(color)
    end

However, the [...]

The fix is to use an uppercase [...]

    def is_valid_hex_color(color)
      /^#[0-9a-fA-F]{6}$/.match(color)
    end
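
To make the over-matching in the hex color example concrete, here is a minimal Python sketch. It only relies on the standard `re` module; the helper name `extra_chars` is made up for illustration and is not part of the query or the qhelp.

```python
import re

# Character classes copied from the qhelp example: the buggy one ends in
# `A-f`, the fixed one in `A-F`.
BUGGY = re.compile(r'^#[0-9a-fA-f]{6}$')
FIXED = re.compile(r'^#[0-9a-fA-F]{6}$')


def extra_chars(low, high):
    """Return the non-alphanumeric characters covered by the range low-high.

    Hypothetical helper for illustration only.
    """
    return [chr(c) for c in range(ord(low), ord(high) + 1)
            if not chr(c).isalnum()]


if __name__ == "__main__":
    # Punctuation that sits between 'Z' and 'a' in ASCII and is
    # therefore swallowed by `A-f`.
    print(extra_chars("A", "f"))
    # A string made only of that punctuation plus one hex digit.
    candidate = "#[]^_`a"
    print("buggy accepts:", BUGGY.match(candidate) is not None)
    print("fixed accepts:", FIXED.match(candidate) is not None)
```

The buggy pattern accepts the punctuation-only candidate because `A-f` spans the ASCII gap between the uppercase and lowercase letters; the fixed pattern rejects it.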
|
09de3b8 to 7932be5
|
Taking one step back: I think it is preferable to avoid having queries that surface our misparses to end-users. Suggestions:
|
I've filtered out those results for Java/Ruby. |
Partial review that I don't want to leave hanging while I'm on holiday.
@@ -0,0 +1,18 @@
/**
 * @name Suspicious regexp range
"Suspicious" is too much of a catch-all, I think we can mention the property of the range that makes it suspicious (also, regexp vs regular expression):
- "Too large regular expression range"
- "Regular expression range with unintended content"
(I think it is OK to ignore the inverted ranges in the prose)
@@ -0,0 +1,18 @@
/**
 * @name Suspicious regexp range
 * @description Some ranges in regular expression might match more than intended.
The conventional form for descriptions is if-then:
- Overly permissive regular expression ranges may cause regular expressions to match more than anticipated
- (security angle ends with "may allow an attacker to bypass ...")
// any non-alpha numeric as part of the range
not isAlphanumeric([low, high].toUnicode())
) and
// some cases I want to exclude from being flagged
Suggested change:
- // some cases I want to exclude from being flagged
+ // allowlist for known ranges
// the same with " " and "!". " " is the first printable character, and "!" is the first non-white-space printable character.
result.isRange([" ", "!"], _)
or
// I've seen this often enough, looks OK.
Suggested change:
- // I've seen this often enough, looks OK.
+ // the `[@-_]` range is intentional
result.isRange(0.toUnicode(), _)
}

/** Gets all chars between (and including) `low` and `high`. */
Suggested change:
- /** Gets all chars between (and including) `low` and `high`. */
+ /** Gets a char between (and including) `low` and `high`. */
/** Gets all chars between (and including) `low` and `high`. */
bindingset[low, high]
private string inRange(string low, string high) {
Suggested change:
- private string inRange(string low, string high) {
+ private string getInRange(string low, string high) {
(minor)
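
For readers unfamiliar with QL, the idea behind the range helper discussed in this thread can be sketched in Python. This is a loose illustration under stated assumptions, not a port of the query: `chars_in_range` mirrors what the doc comment on `inRange` describes, while `looks_suspicious` is a hypothetical, simplified check that ignores the allowlist and any other conditions the real query applies.

```python
def chars_in_range(low, high):
    """Yield each character between `low` and `high`, both included."""
    for code in range(ord(low), ord(high) + 1):
        yield chr(code)


def is_ascii_alnum(ch):
    """True for ASCII letters and digits only."""
    return ch.isascii() and ch.isalnum()


def looks_suspicious(low, high):
    """Hypothetical, simplified heuristic (assumption, not the query logic).

    A range is reported when an endpoint is not alphanumeric, or when
    alphanumeric endpoints still span non-alphanumeric characters.
    """
    if not (is_ascii_alnum(low) and is_ascii_alnum(high)):
        return True
    return any(not is_ascii_alnum(c) for c in chars_in_range(low, high))


if __name__ == "__main__":
    for low, high in [("a", "z"), ("0", "9"), ("A", "f"), (",", "_")]:
        print(f"[{low}-{high}] suspicious: {looks_suspicious(low, high)}")
```

In QL a predicate like `inRange` relates `low` and `high` to each result character one at a time, which is why the review suggests the singular wording "Gets a char"; the Python generator is the closest everyday analogue.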
CVE-2021-42740: TP/TN
See the example JS results to see what this query flags.
The issues found are not security related in the vast majority of cases, but they are still clearly bugs.
Example results (JS is the most interesting): JavaScript, Python, Ruby, Java.
Evaluations look fine: Ruby, Python, JavaScript, Java.
There is a slight slowdown, but I haven't been able to find a badly performing predicate in my new code.
Ruby: Some of the Ruby results are FPs due to the parser not parsing escapes as `RegExpEscape`. I haven't looked into why that happens, but I'm quite sure it's somehow a bug in the parser and not the query.
Ruby/Java: The parsing of nested char classes is wrong, e.g. `/[a-z&&[^a-c]]+/`. The nested `[` and `]` are parsed as literals instead of being parsed as a nested char class. You can see how it should be parsed here: https://regex101.com/r/X6q22R/1
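
As a rough illustration of how the nested class is meant to be read, the sketch below computes the intended character set by hand. It assumes the `&&` intersection semantics documented for Java's `java.util.regex.Pattern`; Python's `re` module has no such operator, so plain sets are used instead, and the function name is invented for this example.

```python
import string


def intersect_with_negation(base, excluded):
    """Characters in `base` that are not in `excluded`.

    Models `[base&&[^excluded]]` under the assumed Java-style semantics.
    Hypothetical helper for illustration only.
    """
    return sorted(set(base) - set(excluded))


# [a-z&&[^a-c]] : lowercase letters intersected with "anything but a-c".
matched = intersect_with_negation(string.ascii_lowercase, "abc")

if __name__ == "__main__":
    print("".join(matched))
```

Under that reading the class is equivalent to `[d-z]`, and neither `[` nor `]` is part of the matched set, which is what the misparse described above gets wrong.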