fix QL match-string pattern#11548

Closed
ginsbach wants to merge 1 commit into main from ginsbach/RegexFix2

@ginsbach
Contributor

@ginsbach ginsbach commented Dec 2, 2022

I am working on a fix for https://github.com/github/codeql-core/issues/2038. Is this the appropriate fix for the problem?

@ginsbach ginsbach added the no-change-note-required label (This PR does not need a change note) on Dec 2, 2022
@ginsbach ginsbach requested a review from a team as a code owner December 2, 2022 14:50
@github-actions github-actions Bot added the Java label Dec 2, 2022
@smowton
Contributor

smowton commented Dec 2, 2022

Looks highly plausible. Could you add or adapt a test to verify that this predicate in fact never worked?

@aschackmull
Contributor

Why does this need a QL-level fix? The backslash character isn't special in a matches context, so it shouldn't need to be escaped, right? To me, this fix looks wrong.

@aschackmull
Contributor

Or am I misremembering? Is backslash the escape character for _ and %?

@aschackmull
Contributor

aschackmull commented Dec 5, 2022

> The argument is a pattern that matches the receiver, in the same way as the LIKE operator in SQL. Patterns may include _ to match a single character and % to match any sequence of characters. A backslash can be used to escape an underscore, a percent, or a backslash. Otherwise, all characters in the pattern other than _ and % and \\ must match exactly.

Found it. However, the description doesn't state how single backslashes that aren't followed by backslash, underscore, or percent are interpreted. I'd expect them to match a single backslash, and not silently fail the entire match - if so, then the QL ought to have raised a warning. Given how few features a matches spec has, would it make sense to have all strings be valid input, i.e. allow such stray backslashes to match as if they were escaped?
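The quoted semantics can be illustrated with a minimal Python sketch that translates a matches pattern into an anchored regular expression. The names like_to_regex and matches are hypothetical; this is not the evaluator's code (which lives in RegexpUtils.java and is not reproduced here). Two behaviours are assumptions, not taken from the quoted documentation: a backslash not followed by _, %, or another backslash is treated as a literal backslash (the behaviour suggested in the comment above), and _ and % are allowed to match newlines.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style matches pattern into a Python regex string.

    _ matches exactly one character, % matches any sequence of characters,
    and a backslash escapes _, %, or another backslash.
    Assumption: any other backslash (including a trailing one) is literal.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1] if i + 1 < len(pattern) else None
            if nxt in ("_", "%", "\\"):
                # Valid escape: the next character matches itself literally.
                out.append(re.escape(nxt))
                i += 2
                continue
            # Stray backslash: match a literal backslash (assumed behaviour).
            out.append(re.escape("\\"))
        elif c == "_":
            out.append(".")
        elif c == "%":
            out.append(".*")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def matches(receiver, pattern):
    # Like SQL LIKE, the pattern must cover the whole receiver.
    # DOTALL (an assumption) lets _ and % match newline characters too.
    return re.fullmatch(like_to_regex(pattern), receiver, re.DOTALL) is not None
```

For example, matches("50%", r"50\%") holds while matches("500", r"50\%") does not, and under the stated assumption the stray backslash in r"a\b" matches the receiver a\b exactly as the escaped pattern r"a\\b" does.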

@ginsbach
Contributor Author

ginsbach commented Dec 5, 2022

> Or am I misremembering? Is backslash the escape character for _ and %?

Backslash is indeed the escape character (see inmemory/src/com/semmle/inmemory/ast/RegexpUtils.java).

@ginsbach
Contributor Author

ginsbach commented Dec 5, 2022

> The argument is a pattern that matches the receiver, in the same way as the LIKE operator in SQL. Patterns may include _ to match a single character and % to match any sequence of characters. A backslash can be used to escape an underscore, a percent, or a backslash. Otherwise, all characters in the pattern other than _ and % and \\ must match exactly.
>
> Found it. However, the description doesn't state how single backslashes that aren't followed by backslash, underscore, or percent are interpreted. I'd expect them to match a single backslash, and not silently fail the entire match - if so, then the QL ought to have raised a warning. Given how few features a matches spec has, would it make sense to have all strings be valid input, i.e. allow such stray backslashes to match as if they were escaped?

There are a couple of points that need clarification here:

  • The compiler does generate a warning, but only logs it in a very late compiler pass. Everybody agrees that "then the QL ought to have raised a warning" - that's what I'm working on.
  • The match is not failing. The evaluator behaves as you suggest and will continue to do so. Hence, this warning is just a genuine warning to the user along the lines of "are you sure you don't have a typo here?".
  • It would be perfectly possible to simply remove the warning if that's seen as preferable.
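The warning described in these points amounts to a lint over a constant pattern. Below is a minimal Python sketch of such a check, assuming the escape rules quoted earlier (only \_, \%, and \\ are escapes). The helper names stray_backslash_positions and check_pattern, and the message text, are hypothetical and not the CodeQL compiler's actual diagnostic. As with the compiler warning discussed here, the check only reports; it does not change how the pattern matches.

```python
def stray_backslash_positions(pattern):
    """Return the indices of backslashes that do not start a valid escape.

    In a matches pattern only \\_, \\% and \\\\ are escapes, so any other
    backslash (including a trailing one) is likely a typo.
    """
    positions = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            if i + 1 < len(pattern) and pattern[i + 1] in "_%\\":
                i += 2  # skip the whole two-character escape sequence
                continue
            positions.append(i)
        i += 1
    return positions


def check_pattern(pattern):
    # One warning per stray backslash; the pattern is still evaluated as-is.
    return [
        f"stray backslash at offset {pos} in matches pattern {pattern!r}; "
        "did you mean '\\\\'?"
        for pos in stray_backslash_positions(pattern)
    ]
```

Because the pattern may only be known at evaluation time, a check like this can only run early when the pattern is a compile-time constant.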

@aschackmull
Contributor

>   • The compiler does generate a warning, but only logs it in a very late compiler pass. Everybody agrees that "then the QL ought to have raised a warning" - that's what I'm working on.
>   • The match is not failing. The evaluator behaves as you suggest and will continue to do so. Hence, this warning is just a genuine warning to the user along the lines of "are you sure you don't have a typo here?".
>   • It would be perfectly possible to simply remove the warning if that's seen as preferable.

Ah, that makes more sense to me now. I guess the late compiler warning will then simply be removed? After all, the matches string could have come from somewhere where it wasn't possible to give an early warning.

I don't have a strong opinion on the matter, but I slightly prefer removing the warning completely and updating the documentation to state that a solitary backslash works the same as an escaped backslash (assuming it isn't followed by an underscore or a percent sign).
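The documentation rule proposed here can be stated as a rewrite: replacing every solitary backslash with an escaped backslash should not change what a pattern matches. Below is a minimal Python sketch of that normalisation, assuming the escape rules quoted earlier. The function name normalize_pattern is hypothetical and is not part of CodeQL.

```python
def normalize_pattern(pattern):
    """Rewrite every solitary backslash as an escaped backslash.

    Existing escapes (\\_, \\% and \\\\) are kept unchanged, so after
    normalisation every backslash in the result begins a valid escape.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 < len(pattern) and pattern[i + 1] in "_%\\":
                out.append(pattern[i:i + 2])  # keep the existing escape as-is
                i += 2
                continue
            out.append("\\\\")  # solitary backslash -> escaped backslash
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Under this rule, r"a\b" and r"a\\b" normalise to the same pattern, and applying the rewrite twice gives the same result as applying it once.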

@ginsbach
Contributor Author

ginsbach commented Dec 6, 2022

We have decided that it is easier to just support solitary backslashes in this context.

@ginsbach ginsbach closed this Dec 6, 2022

Labels

Java, no-change-note-required (This PR does not need a change note)

3 participants