C++: Improve the CleartextFileWrite query by geoffw0 · Pull Request #6273 · github/codeql

geoffw0 · 2021-07-13T17:47:46Z

Improve the CleartextFileWrite.ql query:

add test cases inspired by things I found on LGTM.
straightforward improvements to the heuristic rules in SensitiveExprs.qll (reduces FPs in this query and a couple of others that also use the same library).
add simple dataflow to CleartextFileWrite.ql (increases TPs).
promote the query to @precision high.

I'm open to discussion about whether this is really enough to justify @precision high. I've only spent a few hours on these improvements, but the results look good to me (https://lgtm.com/query/5235174286264674280/), and enabling it will modestly improve our SAMATE results as well as flag this bad practice to users.

rdmarsh2 · 2021-07-14T00:37:17Z

A lot of the remaining results look like false positives or wontfix.

The tpasswd results on gnutls/gnutls are all one variable that I think is a filename, not an actual password.
The git/git result is a credential daemon passing things over a local socket - I think it's OK to have an alert that gets suppressed here.
The systemd ones look like they're generated recovery keys, but again, I think it's OK.
The zeromq/libzmq result is in test code (and presumably filtered outside the query).

That leaves 5 projects with true positives, 2 where it's a wontfix (but presumably was thought about carefully), and one false positive. I think if we run it on a bigger set of projects, it will look better, but we should actually do that before raising the precision.

MathiasVP · 2021-07-14T07:52:00Z

The changes all LGTM! I agree with @rdmarsh2, though: We should either:

Run this PR on a bigger set of projects before we merge it, or
Revert the last commit that increases the precision to high and run the query on a bigger set of projects later / wait until the dist upgrade.

In the spirit of less friction, I say we just roll with it and run this PR on all of LGTM, and merge it if the results look good.

geoffw0 · 2021-07-14T08:10:25Z

Thanks for the detailed reviews. I'll run it on a bigger set of projects, then we can decide whether to merge with or without the precision change...

geoffw0 · 2021-07-15T13:20:15Z

135 of 11,601 projects have results for this query (a little over 1%; we can add more data flow support if we want to find more results).

Reviewing the first 20 projects with results:

there are plenty of TPs, e.g. where passwords or recovery keys are written out in plaintext to files or logs as they're created.
I saw several results in password crackers, which was unexpected! I don't think there's anything wrong with reporting these: the behaviour is intended, but it is also a security issue, though I doubt the maintainers of such projects would want to fix it. The query is also not noisy in those projects (or any others), so I don't anticipate it frustrating any white-hat efforts.
~~some FPs where the type is integer (i64 passwords_max, int password_count, int account_id, int passwds). This is the most common type of FP and should be fixed.~~ - now fixed.
~~I'm not convinced by many of the results featuring "account" / "accnt". Some of them may be TPs but many are not. In the interests of accuracy I'll remove those strings from SensitiveExprs.qll.~~ - now fixed
I'm not convinced "conf" should be an exclusion after all. The motivating case looks like a TP on closer inspection, and there are other cases with similar strings nearby that appear to be TPs as well. - "conf" is no longer a recognized exception.
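The type-based and name-based tuning described in this list can be pictured with a hypothetical, simplified C++ model. The pattern list and the `TypeKind` classification are assumptions of this sketch, not the actual contents of SensitiveExprs.qll. It shows two of the changes: integral variables are excluded outright, and "account"/"accnt" are no longer treated as sensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical classification of a variable's type for this sketch.
enum class TypeKind { Pointer, Array, Integral };

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Simplified name-based heuristic (illustrative patterns only).
bool looksSensitive(const std::string &name, TypeKind type) {
    // Integral variables such as passwords_max or account_id hold counts
    // or identifiers rather than secrets, so they are never reported.
    if (type == TypeKind::Integral) return false;

    // "account"/"accnt" are deliberately absent: on LGTM they produced
    // more false positives than true positives.
    static const std::vector<std::string> patterns = {"pass", "pwd", "secret",
                                                      "recovery"};
    const std::string lowered = toLower(name);
    for (const std::string &p : patterns) {
        if (lowered.find(p) != std::string::npos) return true;
    }
    return false;
}
```

Checking the type before the name keeps a matching substring like "pass" in `passwords_max` from producing a result on a plain integer.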

So I'm going to make a few changes, test, and then probably start another big run.

geoffw0 · 2021-07-15T14:44:28Z

I've made the changes, I'll wait for any immediate suggestions before I do another big run.

geoffw0 · 2021-07-22T11:00:00Z

Update: 105 out of 11,628 projects now have results for this query. I'd like to find more results, but precision is the priority for now.

Of the first 20 or so projects with results:

most of the results are looking good (at least superficially).
the i64 passwords_max / int password_count / int account_id / int passwds issue is fixed.
the "account" / "accnt" issue is fixed.
in steveathon/cups there are two results: an int *password_tries and a char *password, both output with %p for some reason. I'm not sure what the point of this is, but password_tries ought to be safe, and %p prints the pointer value rather than the password itself. - now fixed.
in rockdaboot/gnutls and gnutls / GnuTLS there are some fprintfs which I'm fairly sure are outputting the name of a password file, not the password itself. This is what the "%conf%" heuristic I tried was meant to exclude, but it turned out not to be terribly predictive of the issue. I'm tempted to add an exclusion for any variable that's also used in an fopen, though this seems a bit specific. - now fixed.
~~similarly in pmarkowsky/vulnerable-wu-ftpd-2.6.2 there's also a result on a password file name, in this case it's called passwdpath and is passed to stat and fopen.~~ - now fixed.
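The exclusions made in this round (report only arguments formatted with %s, skip names containing "path", skip variables also used as file names) can be modelled in a few lines of C++. This is a hypothetical sketch: the real checks are QL predicates over the program database, and `conversionForArg`, `shouldReport` and the simplifications noted in the comments are assumptions made here.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the printf conversion character used for the argIndex-th (0-based)
// variadic argument of fmt, or '\0' if there is no such conversion.
// Simplifications: '*' widths/precisions (which consume extra arguments) and
// POSIX positional arguments ("%1$s") are not handled.
char conversionForArg(const std::string &fmt, std::size_t argIndex) {
    static const std::string skipped = "-+ #0123456789.hlLqjzt";
    std::size_t seen = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%') continue;
        ++i;
        if (i < fmt.size() && fmt[i] == '%') continue;  // literal "%%"
        // Skip flags, field width, precision and length modifiers.
        while (i < fmt.size() && skipped.find(fmt[i]) != std::string::npos) ++i;
        if (i >= fmt.size()) break;
        if (seen == argIndex) return fmt[i];
        ++seen;
    }
    return '\0';
}

// Hypothetical reporting filter combining the three exclusions.
bool shouldReport(const std::string &fmt, std::size_t argIndex,
                  std::string varName, bool alsoUsedAsFilename) {
    // %p prints a pointer value, not the string it points to.
    if (conversionForArg(fmt, argIndex) != 's') return false;
    std::transform(varName.begin(), varName.end(), varName.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Names such as passwdpath usually hold a file name, not a password.
    if (varName.find("path") != std::string::npos) return false;
    // A variable also passed to fopen/stat is most likely a file name.
    if (alsoUsedAsFilename) return false;
    return true;
}
```

With these rules, the cups `%p` results and the gnutls/wu-ftpd password-file names described above would all be filtered, while a plain `fprintf(f, "%s", password)` is still reported.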

These are fairly rare FPs, but I think I'm going to make one more revision.

geoffw0 · 2021-07-22T17:33:22Z

I've pushed some more changes, and confirmed they work on the intended targets here: https://lgtm.com/query/1386328828902572629/

geoffw0 · 2021-07-22T17:47:09Z

CPP-Differences job in progress: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2168/ (I'm mainly interested in checking that any performance cost is modest).

rdmarsh2 · 2021-07-23T00:05:29Z

+/**
+ * An operation on a filename.
+ */
+predicate filenameOperation(FunctionCall op, Expr path) {


Could this be shared with the TOCTOU query somehow, or are they too different?

Yes, now that the TOCTOU changes are merged I should be able to do this...

On reflection I think the right way to do this is to add a new 'FileOperation' or 'Files' model to 'models', with methods such as getAFilenameParameter (and perhaps getAFileDescriptorParameter) exposing the information we need. I'd rather do this in a separate PR, preferably after we've had the public/private model implementations discussion as well.

Alternatively I could just move the filenameOperation, accessCheck and stat predicates into a common library, but I'm not sure where is appropriate, and we'll end up maintaining this interface.
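The proposed 'FileOperation' / 'Files' model would expose, for each known function, which parameters are file names. As a hypothetical C++ analogue (the proposal itself is a QL model; `filenameParameters` and `isFilenameParameter` here only mirror the idea behind `getAFilenameParameter`), a lookup table over a few standard C/POSIX functions might look like this. The argument positions follow the C and POSIX signatures of these functions.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical table: function name -> 0-based positions of file-name arguments.
const std::map<std::string, std::vector<int>> &filenameParameters() {
    static const std::map<std::string, std::vector<int>> table = {
        {"fopen", {0}},  {"freopen", {0}}, {"open", {0}},
        {"stat", {0}},   {"lstat", {0}},   {"access", {0}},
        {"remove", {0}}, {"unlink", {0}},  {"rename", {0, 1}},
    };
    return table;
}

// Analogue of getAFilenameParameter: is argument `index` of a call to
// `function` a file name?
bool isFilenameParameter(const std::string &function, int index) {
    const auto &table = filenameParameters();
    const auto it = table.find(function);
    if (it == table.end()) return false;
    return std::find(it->second.begin(), it->second.end(), index) !=
           it->second.end();
}
```

A shared model of this kind would let CleartextFileWrite.ql and the TOCTOU query ask the same question instead of each maintaining its own list.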

Ticket for this: https://github.com/github/codeql-c-team/issues/607

Let's wait for that discussion to finish and do it in a separate PR.

geoffw0 · 2021-07-26T10:53:21Z

According to CPP-Differences there are some possible slowdowns:

| Query | Before | After | Change |
|---|---|---|---|
| Security/CWE/CWE-190/ArithmeticUncontrolled.ql | 9005 | 9137 | +132 |
| Security/CWE/CWE-311/CleartextFileWrite.ql | 41 | 78 | +37 |
| Security/CWE/CWE-290/AuthenticationBypass.ql | 1238 | 1272 | +34 |
| Security/CWE/CWE-022/TaintedPath.ql | 2979 | 3005 | +26 |
| Security/CWE/CWE-191/UnsignedDifferenceExpressionComparedZero.ql | 267 | 290 | +23 |
| Security/CWE/CWE-311/CleartextBufferWrite.ql | 501 | 519 | +18 |

The changes for CleartextFileWrite.ql and CleartextBufferWrite.ql (which also uses SensitiveExprs.qll) are easily explained and look acceptable. The other changes don't seem to relate to this PR at all, and we've been seeing a lot of wobble lately so I'm inclined to dismiss them ... but @MathiasVP didn't you experience something odd with ArithmeticUncontrolled.ql lately?

There are also two result changes, both in kamailio/kamailio. They look good to me.

MathiasVP · 2021-07-26T11:05:38Z

> The other changes don't seem to relate to this PR at all, and we've been seeing a lot of wobble lately so I'm inclined to dismiss them ... but @MathiasVP didn't you experience something odd with ArithmeticUncontrolled.ql lately?

Yes. ArithmeticUncontrolled.ql will re-evaluate the IR unless your branch includes #6347. I don't think you have that PR included in either of your runs, so that performance problem should cancel out.

It does mean, however, that ArithmeticUncontrolled.ql computes quite a lot, so I would expect quite a lot of noise given how wobbly Jenkins has become. I don't think the slowdown in ArithmeticUncontrolled.ql should be a cause for concern.

geoffw0 · 2021-07-26T11:50:11Z

Thanks. Then I think this is ready to merge, assuming agreement with my plans for filenameOperation.

geoffw0 added 5 commits July 13, 2021 17:32

C++: More test cases.

1339533

C++: Fix some easy FPs.

7500d75

C++: Add simple dataflow to the query.

652f903

C++: Change note.

dd03828

C++: Increase the query precision.

9896339

geoffw0 added the C++ label Jul 13, 2021

geoffw0 requested a review from a team as a code owner July 13, 2021 17:47

github-actions Bot added the documentation label Jul 13, 2021

geoffw0 added 3 commits July 15, 2021 14:25

C++: Tune SensitiveExprs.qll based on real TP and FP results.

aabb2fc

C++: More test cases.

dd95c53

C++: Exclude integral types from SensitiveExprs.

e5e8a1b

geoffw0 added 5 commits July 22, 2021 15:47

C++: More test cases and correct an existing one.

86ee5fe

C++: Exclude 'path'.

1d58218

C++: Exclude results that are used as file names.

f8fed26

C++: Exclude results formatted with a character other than %s.

e9b96ad

C++: Autoformat.

d9682aa

rdmarsh2 reviewed Jul 23, 2021

rdmarsh2 approved these changes Jul 26, 2021

rdmarsh2 merged commit fbb3f2e into github:main Jul 26, 2021

3 participants
