C: refactor code to solve false positive #8739
I modified this query substantially to fix the false positives in the query https://lgtm.com/query/878361108346378150/. This version uses global taint tracking instead of the previous local taint tracking, and I changed the query kind to path-problem to make the results more readable.
Thanks for improving this query. Have you considered using the
Thank you for your work on this query!
Some suggestions and questions below.
I would recommend you think about adding tests for this query in the cpp/ql/test/experimental/query-tests/Security/CWE/CWE-020 directory. Though tests are not strictly required for queries in experimental, I think as you're doing a lot of work on this query they might have quite a lot of value.
this.getName().regexpMatch("C_SYSC_[a-zA-Z]+") or
this.getName().regexpMatch("SYSC_[a-zA-Z]+") or
this.getName().regexpMatch("compat_SyS_[a-zA-Z]+") or
this.getName().regexpMatch("SyS_[a-zA-Z]+")
Is there a reason for using the [a-zA-Z]+ pattern rather than .* (or .match("C_SYSC_%"))? I'm not saying one or other approach is better, just curious about the intended behaviour e.g. for functions potentially with numeric digits in their names.
Hello @geoffw0,
As I noted in the query's comments, SysCallFunction models Linux syscalls, and as far as I know no Linux syscall has digits in its name. Frankly, I don't think digits in function names are good coding style, haha :)
Per https://filippo.io/linux-syscall-table/ there seem to be some syscalls with digits such as pread64, dup2, etc.
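The effect of the character class can be checked outside CodeQL with a small Python sketch. It assumes that regexpMatch must match the whole name, which is modelled here with re.fullmatch. The RELAXED patterns and the helper is_syscall_name are hypothetical illustrations, not part of the query.

```python
import re

# The four name patterns from the query under review.
STRICT = [
    r"C_SYSC_[a-zA-Z]+",
    r"SYSC_[a-zA-Z]+",
    r"compat_SyS_[a-zA-Z]+",
    r"SyS_[a-zA-Z]+",
]

# Hypothetical relaxed variant: also accept digits and underscores.
RELAXED = [p.replace("[a-zA-Z]+", "[a-zA-Z0-9_]+") for p in STRICT]


def is_syscall_name(name, patterns):
    """Return True if the whole name matches at least one pattern."""
    return any(re.fullmatch(p, name) for p in patterns)


for name in ["SyS_read", "SyS_pread64", "SyS_dup2"]:
    print(name, is_syscall_name(name, STRICT), is_syscall_name(name, RELAXED))
```

With the letters-only class, wrappers for syscalls such as pread64 and dup2 are silently skipped, so the query would never treat their parameters as sources.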
Sorry for missing your reply, and thank you. I will try another way to model Linux syscalls :)
cpp/ql/src/experimental/Security/CWE/CWE-020/NoCheckBeforeUnsafePutUser.ql
 * parameter dummy will be regared as user-mode pointer used
 * in unsafe_put_user without security check using access_ok
 * but in fact dummy is only used to read memory otherwise
 * instead of wring user-mode memory.
Suggested change:
-  * instead of wring user-mode memory.
+  * instead of writing user-mode memory.
}

/*
 * Since there is no convenient way to indentify user mode pointer,
Suggested change:
-  * Since there is no convenient way to indentify user mode pointer,
+  * Since there is no convenient way to identify user mode pointer,
class UserModePtrNode extends DataFlow::Node {
  UserModePtrNode() {
    exists(SysCallParameter p | this.asParameter() = p) or
    exists(FunctionCall va | this.asExpr() = va)
Should this line match only a call to a SysCallFunction?
In this query I model taint sources as Linux syscall parameters and as return values of other functions that may be used as user-mode pointers. So on this line, SysCallParameter models parameters of Linux syscalls, and FunctionCall models other function return values.
This is still matching every function call in the program.
Suggested change:
-  exists(FunctionCall va | this.asExpr() = va)
+  exists(FunctionCall va | this.asExpr() = va and va.getTarget() instanceof SysCallFunction)
 * Track all UserModePtrNode that flow to UnSafePutUserMacro
 */

class UnsafePutUserConfig extends TaintTracking::Configuration {
Consider overriding the isSanitizer method of this TaintTracking::Configuration to block flow through a UserModePtrCheckMacro, instead of having a separate configuration to detect when one is reachable.
Note that the meaning would not be exactly the same. A barrier has to be on the path in question, whereas your UserModePtrCheckConfig taint tracking configuration looks for any reachable UserModePtrCheckMacro (even if, for example, it comes after the unsafe put).
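The difference in meaning between the two approaches can be sketched on a toy flow graph in Python. This is only an illustration under stated assumptions: the graph, the node names (param, put_arg, check_arg) and the helper functions are hypothetical and do not model CodeQL's actual data flow library.

```python
def reaches(edges, start, goal, barriers=frozenset()):
    """Depth-first search that refuses to step onto barrier nodes."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in barriers or node in seen:
            continue
        if node == goal:
            return True
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


# Hypothetical flow: the pointer reaches the unsafe put first and
# only afterwards reaches an access check.
edges = {"param": ["put_arg"], "put_arg": ["check_arg"]}
checks = {"check_arg"}


def flagged_with_barrier(edges, src, sink, checks):
    """Barrier style: report if some path to the sink avoids every check."""
    return reaches(edges, src, sink, barriers=checks)


def flagged_with_two_configs(edges, src, sink, checks):
    """Two-configuration style: report if the sink is reachable and no check is."""
    return reaches(edges, src, sink) and not any(
        reaches(edges, src, c) for c in checks
    )


print(flagged_with_barrier(edges, "param", "put_arg", checks))
print(flagged_with_two_configs(edges, "param", "put_arg", checks))
```

In this toy case the barrier style still reports the unchecked put, while the reachability style suppresses it because a check is reachable somewhere, even though it comes too late.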
Hello @geoffw0,
You are absolutely right: isSanitizer or isBarrier is the most natural way to check whether a user-mode pointer node flows through a UserModePtrCheckMacro, but it does not work and I don't know why.
I tried defining UnsafePutUserConfig this way:
class UnsafePutUserConfig extends TaintTracking::Configuration {
  UnsafePutUserConfig() { this = "UnsafePutUserConfig" }

  override predicate isSource(DataFlow::Node node) { node instanceof UserModePtrNode }

  override predicate isSink(DataFlow::Node node) {
    exists(UnSafePutUserMacro m | node.asExpr() = m.getExprOperand())
  }

  override predicate isSanitizer(DataFlow::Node node) {
    exists(UserModePtrCheckMacro m | node.asExpr() = m.getArgument())
  }
}
or like this:
class UnsafePutUserConfig extends DataFlow::Configuration {
  UnsafePutUserConfig() { this = "UnsafePutUserConfig" }

  override predicate isSource(DataFlow::Node node) { node instanceof UserModePtrNode }

  override predicate isSink(DataFlow::Node node) {
    exists(UnSafePutUserMacro m | node.asExpr() = m.getExprOperand())
  }

  override predicate isBarrier(DataFlow::Node node) {
    exists(UserModePtrCheckMacro m | node.asExpr() = m.getArgument())
  }
}
But neither isSanitizer nor isBarrier blocks the data flow from the source to unsafe_put_user, which causes many false positives. For example, the following code is not buggy, but my query reports that ptr is not validated before use:
void bar(void *ptr);

void foo(void)
{
    void *ptr = func_return_user_mode_pointer();
    bar(ptr);
}

void bar(void *ptr)
{
    /* simplified: the real access_ok and unsafe_put_user take more arguments */
    if (!access_ok(ptr))
        return;
    unsafe_put_user(0x41414141, ptr);
}
This is why I defined a second taint configuration, UserModePtrCheckConfig :(
@rdmarsh2 Thanks for your advice. Would you mind taking a look at my comments above? isSanitizer and isBarrier do not work for me, but I will try Configuration::isBarrierGuard.
Hello @rdmarsh2, I'm really confused; I even think I may have run into a bug in CodeQL :( I defined a DataFlow::BarrierGuard like this and overrode isSanitizerGuard in the taint tracking configuration, but there are still many obvious false positives, as you can see in the picture I attached.
I'm really confused: as you can see in my attached picture, the data flow does not go through user_access_begin, but the predicate checked holds for user_access_begin.
If I define a taint tracking config without any sanitizer or barrier, I find that data flow does not pass through any UserModePtrCheckMacro node that appears in an if statement. For the following code the taint flow path query result is Path:
The user_access_begin expanded in the if statement is ignored.
Really confused :( I don't know why UserModePtrCheckConfig can track taint from UserModePtrNode to UserModePtrCheckMacro.
That's expected - that use isn't part of the dataflow path here, since it's consumed by
Hello @rdmarsh2, thanks for your advice, but in fact I also tried
I ran a quick evaluation of isSanitizerGuard, and I get a lot of results for calls to user_access_begin, as my attached picture shows.
Hello @rdmarsh2, as you put it, user_access_begin is not part of the dataflow path here, so I think the predicate isSanitizerGuard is also useless here: isSanitizerGuard is intended to stop dataflow, but user_access_begin is not part of the dataflow.
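The observation that the checked use is not on the flow path can be pictured with a toy graph in Python. This sketch assumes that each use of the pointer is fed directly from its definition; that is a modelling assumption made for illustration, and the node names and the reaches helper are hypothetical.

```python
def reaches(edges, start, goal, barriers=frozenset()):
    """Depth-first search that refuses to step onto barrier nodes."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in barriers or node in seen:
            continue
        if node == goal:
            return True
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


# Hypothetical graph for the checked function: the definition of ptr
# feeds the use inside the check and the use inside the put separately.
edges = {"ptr_def": ["check_use", "put_use"]}

# Node-style barrier: block only the argument of the check.
node_barrier = {"check_use"}

# Guard-style barrier: block the uses that the check protects.
guarded_uses = {"put_use"}

print(reaches(edges, "ptr_def", "put_use", node_barrier))
print(reaches(edges, "ptr_def", "put_use", guarded_uses))
```

Under that assumption, blocking the argument of the check removes only a side branch, so flow to the put survives. A guard-style barrier is applied to the protected uses instead, which is why it can stop flow even though the check itself is off the path.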
Hello @geoffw0, I found that some recent versions of the Linux kernel define syscalls in another way, and my query fails to select them. I'll try some other approaches.
Hello @geoffw0, are you still working on this pull request? I don't think I can find a better way to solve the problem we discussed above.
As @rdmarsh2 suggested, I tried to use Configuration::isBarrierGuard to avoid running taint tracking twice, but it does not work.
Hi @4B5F5F4B, I've just had a good look at your query and I think your sources ( Try replacing
On to those barriers. I had a bit of trouble with It's not perfect yet, but I was able to get good results on the tests with these changes. Hope that helps!
Hello @geoffw0, so I think there is not much point in using something like isBarrier or BarrierGuard; two global taint flow configurations are good enough.
OK, I'll run the checks on the query as it is now then...
roll back to twice global taint tracking to ensure less false positive and efficiency
Hello @geoffw0, thank you, but I made another commit to roll back to running global taint tracking twice, and chose another way to model Linux syscalls.
I'm OK with rolling this back to the two dataflow configs solution, but the definition of UserModePtrNode still doesn't look right to me.
class UserModePtrNode extends DataFlow::Node {
  UserModePtrNode() {
    exists(SysCallParameter p | this.asParameter() = p) or
Check warning (Code scanning / CodeQL): Expression can be replaced with a cast
class UserModePtrNode extends DataFlow::Node {
  UserModePtrNode() {
    exists(SysCallParameter p | this.asParameter() = p) or
    exists(FunctionCall va | this.asExpr() = va)
Check warning (Code scanning / CodeQL): Expression can be replaced with a cast
…fePutUser.ql Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…fePutUser.ql Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…fePutUser.ql Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Hello @geoffw0, thank you again for reviewing my code, but I have to explain why I model UserModePtrNode as in the following code. As I noted in the comment, since there is no convenient way to identify a user-mode pointer, in this query any syscall parameter or function return value is regarded as a source node that is used as a user-mode pointer. Please consider the following example: void foo()
OK, I understand now. It looks like this branch needs merging with
I can't find any real-world results for this query, because it's really specialized for the Linux kernel, so there isn't a lot of code out there to try it on. We will have to depend on the tests to reveal how well it works.
OK
When will the query be merged into the main branch?
Ah, sorry, I thought it had merge conflicts (misreading the new interface). I've started the checks. Happy to merge if everything looks good after that.
The test is failing with the following difference:
That's the only result of the test, so it's saying we now get no results. This suggests to me that something is going wrong in the query (or perhaps the test isn't close enough to reality).
Was this closed on purpose @4B5F5F4B?




