C++: Improve the CleartextFileWrite query#6273

Merged
rdmarsh2 merged 13 commits into
github:mainfrom
geoffw0:cleartext-storage-file
Jul 26, 2021

@geoffw0
Contributor

@geoffw0 geoffw0 commented Jul 13, 2021

Improve the CleartextFileWrite.ql query:

  • add test cases inspired by things I found on LGTM.
  • straightforward improvements to the heuristic rules in SensitiveExprs.qll (reduces FPs in this query and a couple of others that also use the same library).
  • add simple dataflow to CleartextFileWrite.ql (increases TPs).
  • promote the query to @precision high.

I'm open to discussion about whether this is really enough to justify @precision high. I've only spent a few hours on these improvements, but the results look good to me (https://lgtm.com/query/5235174286264674280/) and enabling it will modestly improve our SAMATE results as well as flagging this bad practice to users.

@geoffw0 geoffw0 added the C++ label Jul 13, 2021
@geoffw0 geoffw0 requested a review from a team as a code owner July 13, 2021 17:47
@rdmarsh2
Contributor

A lot of the remaining results look like false positives or wontfix.

  • The tpasswd results on gnutls/gnutls are all one variable that I think is a filename, not an actual password.
  • The git/git result is a credential daemon passing things over a local socket - I think it's OK to have an alert that gets suppressed here.
  • The systemd ones look like they're generated recovery keys, but again, I think it's OK.
  • The zeromq/libzmq result is in test code (and presumably filtered outside the query).

That leaves 5 projects with true positives, 2 where it's a wontfix (but presumably was thought about carefully), and one with false positives. I think if we run it on a bigger set of projects, it will look better, but we should actually do that before raising the precision.

@MathiasVP
Contributor

The changes all LGTM! I agree with @rdmarsh2, though: We should either:

  • Run this PR on a bigger set of projects before we merge it, or
  • Revert the last commit that increases the precision to high and run the query on a bigger set of projects later / wait until the dist upgrade.

In the spirit of less friction, I say we just roll with it and run this PR on all of LGTM, and merge it if the results look good.

@geoffw0
Contributor Author

geoffw0 commented Jul 14, 2021

Thanks for the detailed reviews. I'll run it on a bigger set of projects then we can decide whether to merge with or without the precision change...

@geoffw0
Contributor Author

geoffw0 commented Jul 15, 2021

135 of 11,601 projects have results for this query (a little over 1%; we can add more data flow support if we want to find more results).

Reviewing the first 20 projects with results:

  • there are plenty of TPs, e.g. where passwords or recovery keys are written out plaintext to files or logs as they're created.
  • I saw several results in password crackers, which was unexpected! I don't think there's anything wrong with reporting these: the behaviour is intended, but it's still a security issue, though I doubt the maintainers of such projects would want to fix it. The query is also not noisy in those projects (or any others), so I don't anticipate this frustrating any white-hat efforts.
  • some FPs where the type is integer (i64 passwords_max, int password_count, int account_id, int passwds). This is the most common type of FP and should be fixed. - now fixed.
  • I'm not convinced by many of the results featuring "account" / "accnt". Some of them may be TPs but many are not. In the interests of accuracy I'll remove those strings from SensitiveExprs.qll. - now fixed
  • I'm not convinced "conf" should be an exclusion after all. The motivating case looks like a TP on closer inspection, and there are other cases with similar strings nearby that appear to be TPs as well. - "conf" is no longer a recognized exception.

So I'm going to make a few changes, test, and then probably start another big run.
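To make the integer-type and "account" points above concrete, here is a minimal C++ sketch of the kind of name-plus-type rule being described. It is not the QL in SensitiveExprs.qll: the `Candidate` struct, the `looksSensitive` function and the two trigger strings ("password", "passwd") are assumptions chosen purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical summary of what a checker knows about one variable.
struct Candidate {
    std::string name;    // identifier as written in the source
    bool isIntegerType;  // true for int, i64 and similar
};

// Lower-case copy, so that matching ignores case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Sketch of the revised rule: integer-typed variables are never treated as
// secrets, and "account" / "accnt" are no longer trigger strings.
// The two patterns below are assumptions, not the library's real list.
bool looksSensitive(const Candidate& c) {
    if (c.isIntegerType) return false;  // counts, limits and IDs
    const std::string n = toLower(c.name);
    return n.find("password") != std::string::npos ||
           n.find("passwd") != std::string::npos;
}

// The integer-typed names reported as FPs in the thread are all rejected.
void checkExamples() {
    assert(!looksSensitive({"passwords_max", true}));
    assert(!looksSensitive({"password_count", true}));
    assert(!looksSensitive({"account_id", true}));
    assert(!looksSensitive({"passwds", true}));
    assert(looksSensitive({"password", false}));  // a string-like password still matches
}
```

Note that a string-typed name containing "conf" gets no special treatment in this sketch, which mirrors the decision to drop "conf" as an exclusion.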

@geoffw0
Contributor Author

geoffw0 commented Jul 15, 2021

I've made the changes. I'll wait for any immediate suggestions before I do another big run.

@geoffw0
Contributor Author

geoffw0 commented Jul 22, 2021

Update: 105 out of 11,628 projects now have results for this query. I'd like to find more results, but precision is the priority for now.

Of the first 20 or so projects with results:

  • most of the results are looking good (at least superficially).
  • the i64 passwords_max / int password_count / int account_id / int passwds issue is fixed.
  • the "account" / "accnt" issue is fixed.
  • in steveathon/cups there are two results where an int *password_tries and a char *password are both output with %p for some reason. I'm not quite sure what the point of this is, but password_tries ought to be safe and %p will not output the password. - now fixed.
  • in rockdaboot/gnutls and gnutls / GnuTLS there are some fprintfs which I'm fairly sure are outputting the name of a password file, not the password itself. This is what the "%conf%" heuristic I tried was meant to exclude, but it turned out not to be terribly predictive of the issue. I'm tempted to add an exclusion for any variable that's also used in an fopen, though this seems a bit specific. - now fixed.
  • similarly in pmarkowsky/vulnerable-wu-ftpd-2.6.2 there's also a result on a password file name, in this case it's called passwdpath and is passed to stat and fopen. - now fixed.

These are fairly rare FPs, but I think I'm going to make one more revision.
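The file-name and %p cases above can be pictured with a few C++ functions written in the style of a query test case. This is only a sketch: the function names are hypothetical, and the comments about what should or should not be reported reflect the intent described in this thread, not verified query output.

```cpp
#include <cstdio>

// The secret itself is formatted into the file: the case the query is meant
// to report. Returns the number of characters written, as fprintf does.
int savePassword(std::FILE* out, const char* password) {
    return std::fprintf(out, "password=%s\n", password);
}

// passwdpath is named like a password but holds a path. It is also passed to
// fopen, which is the signal discussed above for treating it as a file name.
bool reportPasswordFile(std::FILE* log, const char* passwdpath) {
    std::FILE* f = std::fopen(passwdpath, "r");
    if (f == nullptr) {
        std::fprintf(log, "cannot open %s\n", passwdpath);  // prints the path only
        return false;
    }
    std::fclose(f);
    return true;
}

// %p prints the pointer values, not the bytes they point to, so neither the
// password nor the counter leaks here.
int debugPointers(std::FILE* log, const char* password, const int* password_tries) {
    return std::fprintf(log, "password=%p tries=%p\n",
                        static_cast<const void*>(password),
                        static_cast<const void*>(password_tries));
}
```

Only the first function writes sensitive data; the other two mention a password-like name without ever emitting the secret.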

@geoffw0
Contributor Author

geoffw0 commented Jul 22, 2021

I've pushed some more changes, and confirmed they work on the intended targets here: https://lgtm.com/query/1386328828902572629/

@geoffw0
Contributor Author

geoffw0 commented Jul 22, 2021

CPP-Differences job in progress: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2168/ (I'm mainly interested in checking that any performance cost is modest).

/**
 * An operation on a filename.
 */
predicate filenameOperation(FunctionCall op, Expr path) {
Contributor

Could this be shared with the TOCTOU query somehow, or are they too different?

Contributor Author

Yes, now that the TOCTOU changes are merged I should be able to do this...

Contributor Author

On reflection I think the right way to do this is to add a new 'FileOperation' or 'Files' model to 'models', with methods such as getAFilenameParameter (and perhaps getAFileDescriptorParameter) exposing the information we need. I'd rather do this in a separate PR, preferably after we've had the public/private model implementations discussion as well.

Alternatively I could just move the filenameOperation, accessCheck and stat predicates into a common library, but I'm not sure where is appropriate, and we'll end up maintaining this interface.

Contributor

Let's wait for that discussion to finish and do it in a separate PR.

@geoffw0
Contributor Author

geoffw0 commented Jul 26, 2021

According to CPP-Differences there are some possible slowdowns:

Security/CWE/CWE-190/ArithmeticUncontrolled.ql	9005	9137	+132
Security/CWE/CWE-311/CleartextFileWrite.ql	41	78	+37
Security/CWE/CWE-290/AuthenticationBypass.ql	1238	1272	+34
Security/CWE/CWE-022/TaintedPath.ql	2979	3005	+26
Security/CWE/CWE-191/UnsignedDifferenceExpressionComparedZero.ql	267	290	+23
Security/CWE/CWE-311/CleartextBufferWrite.ql	501	519	+18

The changes for CleartextFileWrite.ql and CleartextBufferWrite.ql (which also uses SensitiveExprs.qll) are easily explained and look acceptable. The other changes don't seem to relate to this PR at all, and we've been seeing a lot of wobble lately so I'm inclined to dismiss them ... but @MathiasVP didn't you experience something odd with ArithmeticUncontrolled.ql lately?

There are also two result changes, both in kamailio/kamailio. They look good to me.
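For readers comparing the timing rows quoted above, the following C++ sketch recomputes the differences and adds the relative change. It assumes the three numeric columns are a baseline figure, the figure with this PR, and their difference (each third column does equal the second minus the first); the units are not stated in the thread, and the `Timing` struct and helper names are hypothetical.

```cpp
#include <cstdio>

// One row of the quoted table: query path, baseline figure, figure with the PR.
struct Timing {
    const char* query;
    int before;
    int after;
};

// Figures copied from the rows quoted in the comment above.
const Timing kTimings[] = {
    {"Security/CWE/CWE-190/ArithmeticUncontrolled.ql", 9005, 9137},
    {"Security/CWE/CWE-311/CleartextFileWrite.ql", 41, 78},
    {"Security/CWE/CWE-290/AuthenticationBypass.ql", 1238, 1272},
    {"Security/CWE/CWE-022/TaintedPath.ql", 2979, 3005},
    {"Security/CWE/CWE-191/UnsignedDifferenceExpressionComparedZero.ql", 267, 290},
    {"Security/CWE/CWE-311/CleartextBufferWrite.ql", 501, 519},
};

// Absolute change, matching the third column of the table.
int delta(const Timing& t) { return t.after - t.before; }

// Change relative to the baseline, in percent.
double relativeChangePercent(const Timing& t) {
    return 100.0 * delta(t) / t.before;
}

// Prints one line per query with both the absolute and relative change.
void printReport(std::FILE* out) {
    for (const Timing& t : kTimings) {
        std::fprintf(out, "%-70s %+5d (%+.1f%%)\n",
                     t.query, delta(t), relativeChangePercent(t));
    }
}
```

Seen this way, CleartextFileWrite.ql has by far the largest relative change but a small absolute one, while the ArithmeticUncontrolled.ql difference is under 2% of its baseline.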

@MathiasVP
Contributor

MathiasVP commented Jul 26, 2021

The other changes don't seem to relate to this PR at all, and we've been seeing a lot of wobble lately so I'm inclined to dismiss them ... but @MathiasVP didn't you experience something odd with ArithmeticUncontrolled.ql lately?

Yes. ArithmeticUncontrolled.ql will re-evaluate the IR unless your branch includes #6347. I don't think you have that PR included in either of your runs, so that performance problem should cancel out.

It does mean, however, that the ArithmeticUncontrolled.ql query will compute quite a lot of stuff. So I would expect quite a lot of noise given how wobbly Jenkins has become. So I don't think the slowdown in ArithmeticUncontrolled.ql should be a cause for concern.

@geoffw0
Contributor Author

geoffw0 commented Jul 26, 2021

Thanks. Then I think this is ready to merge, assuming agreement with my plans for filenameOperation.

@rdmarsh2 rdmarsh2 merged commit fbb3f2e into github:main Jul 26, 2021