@MathiasVP
Contributor

This PR fixes two conflation issues that were giving us a bunch of false positives (FPs) on https://github.com/lief-project/lief.

The first fix is very simple (see 153df2c), and the second fix took a couple of days of debugging (see 7c32721) 😂.

Commit-by-commit review recommended.

@MathiasVP MathiasVP requested a review from jketema June 9, 2023 14:36
@MathiasVP MathiasVP requested a review from a team as a code owner June 9, 2023 14:36
@github-actions github-actions bot added the C++ label Jun 9, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jun 9, 2023
Comment on lines +724 to +736
void does_not_write_source_to_dereference(int *p) // $ ast-def=p ir-def=*p
{
  int x = source();
  p = &x;
  *p = 42;
}

void test_does_not_write_source_to_dereference()
{
  int x;
  does_not_write_source_to_dereference(&x);
  sink(x); // $ ast,ir=733:7 SPURIOUS: ast,ir=726:11
}
Contributor Author

MathiasVP commented Jun 9, 2023

This was a testcase I came up with while trying to further reduce the testcase added in 90ffb45. However, it looks like this is a different conflation problem, since this test wasn't fixed by this PR.

To be clear: this isn't a regression caused by this PR. I just included it here to avoid a merge conflict with this PR.
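A runtime view of why the flow from 726:11 is marked SPURIOUS: after `p = &x`, the callee's writes through `p` go to its own local, so nothing reaches the caller's `x`. Below is a minimal standalone sketch of the testcase; `source()` is a hypothetical stand-in for the test suite's function (it just returns a recognizable value), and `caller_value()` is a hypothetical wrapper that, unlike the test, initializes `x` so it can be read afterwards.

```cpp
#include <cassert>

// Hypothetical stand-in for the test suite's source(): returns a recognizable value.
int source() { return 1337; }

void does_not_write_source_to_dereference(int *p) {
  int x = source();  // the "tainted" value lives in a callee-local variable
  p = &x;            // rebinding p only changes the callee's copy of the pointer
  *p = 42;           // so this writes the callee's local x, not the caller's object
}

// Hypothetical wrapper: initializes x (unlike the test) so reading it is well defined.
int caller_value() {
  int x = 0;
  does_not_write_source_to_dereference(&x);
  return x;  // neither source() nor 42 reaches the caller's x
}
```

Calling `caller_value()` returns 0, which is what the analysis should conclude statically as well.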

@jketema
Contributor

jketema commented Jun 9, 2023

We seem to be missing some taint flow now (see missing DCA results).

Contributor

@jketema jketema left a comment

Some minor comments

@jketema
Contributor

jketema commented Jun 12, 2023

DCA nightly suite result changes:

  • Changes in SAMATE: some results now disappear that should have been blocked by isBarrierIn in cpp/cleartext-transmission, but weren't.
  • The lost vim results for cpp/command-line-injection and cpp/path-injection seem to be genuinely lost results. It's currently not clear to me why they are lost.
  • The new apache/httpd results look like genuine results.

DCA MCTV suite result changes:

  • Erlang OTP changes were FPs that have now disappeared. Previously, at some point along the path, it started confusing the allocated array with a constant one.
  • vim cpp/constant-array-overflow: we lose all but 69 results. It seems we might have broken something related to global variables?

@MathiasVP
Contributor Author

vim cpp/constant-array-overflow: we lose all but 69 results. It seems we might have broken something related to global variables?

That would explain the large performance improvement on vim 😅

@jketema
Contributor

jketema commented Jun 12, 2023

vim cpp/constant-array-overflow: we lose all but 69 results. It seems we might have broken something related to global variables?

That would explain the large performance improvement on vim 😅

The question is what. The vim dataflow paths are rather horrendous, as usual.

@MathiasVP
Contributor Author

MathiasVP commented Jun 12, 2023

If you want to get started on debugging this while I'm away, I suggest:

  1. Pick the simplest possible path we lose
  2. Make the paths as explicit as possible by setting castnode to any and hidden nodes to none.
  3. Do a "binary search" with a partial flow version of the query to see which step is gone. We already have a hypothesis that this is related to global variables, so maybe you can make this search slightly more clever
  4. Once we know which step is missing, we can write a small test case and debug it like we did last week

Steps 1-3 are probably a whole day's work already. So if you're too busy with other things feel free to leave it for me to do once I'm back
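Step 3 relies on reachability along the expected path being monotone: partial flow reaches some prefix of the path's nodes and none after the first broken step. Under that assumption, the search can be sketched as below. `first_unreached` is a hypothetical helper, and `reached(i)` is a hypothetical oracle standing in for one partial-flow run that checks whether node `i` of the path is reached; it is not a CodeQL API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the index of the first node on the expected path that is no longer reached,
// or n if every node is reached. Assumes reached() is true for a prefix and false after.
std::size_t first_unreached(std::size_t n,
                            const std::function<bool(std::size_t)>& reached) {
  // Invariant: every node below lo is reached; every node at or above hi is not.
  std::size_t lo = 0, hi = n;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (reached(mid)) {
      lo = mid + 1;  // the missing step is after mid
    } else {
      hi = mid;      // mid is already past the missing step
    }
  }
  return lo;  // the step from node lo - 1 to node lo is the one that disappeared
}
```

Each probe is one (expensive) query evaluation, so a path of around 1,000 nodes needs about 10 probes instead of one per step. If the monotonicity assumption doesn't hold (for example, if flow rejoins the path later through another route), the result is only a hint.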

@jketema
Contributor

jketema commented Jun 12, 2023

So if you're too busy with other things feel free to leave it for me to do once I'm back

I think I might just give it a go.

@jketema
Contributor

jketema commented Jun 13, 2023

So if you're too busy with other things feel free to leave it for me to do once I'm back

I think I might just give it a go.

Gave it a go, but wasn't able to make any progress. I tried restricting the source of one of the problematic queries as much as possible, and tried making it start later on a path that is disappearing. However, even with this, partial flow is basically not computable on my machine.

@MathiasVP
Contributor Author

MathiasVP commented Jun 18, 2023

Thanks for all the investigation on this, Jeroen! I've pushed a reduced testcase demonstrating the missing flow on vim (it turns out our model for strncpy has been broken for who-knows-how-long 😬). Until now, the accidental conflation that we've fixed here had been masking that.

Luckily, the fix was super easy. I'll start another DCA run to see what the impact of this is.
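The thread doesn't say what exactly was wrong with the strncpy model. As a reminder of the behaviour any model has to capture, here is a small sketch (`copy_name` and its 8-byte buffer are hypothetical): strncpy copies the characters `src` points to into the storage `dest` points to, and returns `dest` itself. So the data moves between the pointed-to buffers (the indirections), while the returned pointer only aliases `dest`. This is the same pointer-versus-indirection distinction that comes up elsewhere in this thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies a C string into a fixed 8-byte buffer and returns the buffer.
char* copy_name(char (&buf)[8], const char* src) {
  // Copies at most sizeof buf - 1 characters from *src into *buf; the return value is buf.
  char* ret = std::strncpy(buf, src, sizeof buf - 1);
  // If src is that long or longer, strncpy writes no terminator, so add one explicitly.
  buf[sizeof buf - 1] = '\0';
  return ret;
}
```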

@jketema
Contributor

jketema commented Jun 18, 2023

I don't think the strncpy fix is sufficient. Running cpp/constant-array-overflow with the updated branch on vim, I only get 69 results instead of the thousands that were there before. Or were those all FPs?

@MathiasVP
Contributor Author

Good point. Yeah, the strncpy fix was strictly a taint-flow fix, and that query only uses dataflow. So we still need to figure out whether those results were lost because we broke something or because we fixed conflation issues.

@MathiasVP
Contributor Author

Investigation so far: it looks like we're losing all the vim results that come from the uf_name field. These are indeed all FPs (because uf_name is a flexible array member). But I'm still trying to figure out why we lose them.

@MathiasVP
Contributor Author

MathiasVP commented Jun 19, 2023

I've done a spot check of some of the lost results, and it does look like they're caused by the now-fixed conflation 🎉. For example, the flow starts here and then moves to:

At this point we're suddenly tracking the indirection (instead of the pointer). After this PR, this no longer happens 🎉.

@MathiasVP MathiasVP force-pushed the fix-more-conflation-in-dataflow branch from 1ec3eb9 to 992af55 Compare June 22, 2023 09:59
@MathiasVP
Contributor Author

DCA looks great! We got the lost result on vim back, and we're still seeing a 43% performance improvement 😮. I'll add a testcase that demonstrates the effect of 992af55 (via a force-push), but then I think this PR is good to go 🎉

@jketema
Contributor

jketema commented Jun 22, 2023

I'll add a testcase

There is also a test regression that needs to be looked at.

@MathiasVP
Contributor Author

Thanks for the heads-up. It looks like 992af55 fixed a missing flow, but I'll double-check to verify that.

@MathiasVP MathiasVP force-pushed the fix-more-conflation-in-dataflow branch from 992af55 to aca4716 Compare June 22, 2023 16:40
@MathiasVP
Contributor Author

@jketema this PR should be all ready now. I've force-pushed a testcase that demonstrates the missing flow we saw on vim, but otherwise nothing has changed since the DCA run.

The query test changes are just changes to path explanations, so there's nothing major to see there.

@MathiasVP MathiasVP force-pushed the fix-more-conflation-in-dataflow branch from aca4716 to 79fb6a6 Compare June 22, 2023 18:35