Data flow: Add summary/return context to pruning stages 2-4 #11087

Open: wants to merge 4 commits into base: main

hvitved commented Nov 2, 2022

Description

We recently experienced a significant slowdown on a Ruby project, as a result of merely adding a few more flow summaries.

Since flow summaries are compiled down to synthesized callables, it means we are doing a lot of flow through (into + out of) callables, which made me think we should perhaps add more precision to the pruning stages (except the initial stage 1).

This PR does exactly that:

  • The first commit adds a new column for tracking the parameter from which data originates in forwards pruning, but only when the previous pruning stage suggests that there may be flow through that parameter. This extra information allows us to restrict flow through.
  • The second commit replaces the boolean toReturn column with a return context column in reverse pruning, but only when the previous pruning stage suggests that there may be flow through that return. This extra information allows us to restrict flow through.
  • The third commit accounts for return nodes with multiple return kinds (see commit message for an example). While the last pruning stage does handle multiple return kinds correctly, stages 2-4 did not, and we were potentially missing out on early pruning of disallowed self-flow through a parameter.
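To illustrate why remembering the originating parameter can restrict flow through, here is a minimal Python sketch. It is a toy model and not the CodeQL implementation: the callable `f`, its nodes `p0`, `p1`, `tmp`, `ret`, and the call sites `A` and `B` are all hypothetical. It compares a context that only records that data entered through some argument with one that records which parameter it entered through.

```python
from collections import deque

# Toy model (all names hypothetical): one callable f(p0, p1).
# Inside f, p0 only reaches a local `tmp`, while p1 reaches the return node.
LOCAL_EDGES = {"p0": ["tmp"], "p1": ["ret"]}
RETURN_NODE = "ret"

# Two call sites; each records which arguments carry tracked data.
CALLS = {
    "A": {"p0": True, "p1": False},
    "B": {"p0": False, "p1": True},
}


def reaches_return(param):
    """Breadth-first search over the local edges of f, starting at `param`."""
    seen, work = {param}, deque([param])
    while work:
        node = work.popleft()
        for succ in LOCAL_EDGES.get(node, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return RETURN_NODE in seen


def entered_params():
    """Parameters that receive tracked data from at least one call site."""
    return {p for args in CALLS.values() for p, tracked in args.items() if tracked}


def flow_out_boolean():
    """Only remember THAT flow entered via some argument, not which one."""
    if not any(reaches_return(p) for p in entered_params()):
        return set()
    # Every call site that passed tracked data gets flow out of f.
    return {call for call, args in CALLS.items() if any(args.values())}


def flow_out_with_param():
    """Remember WHICH parameter the data at the return node came from."""
    origins = {p for p in entered_params() if reaches_return(p)}
    return {call for call, args in CALLS.items()
            if any(args[p] for p in origins)}


print(sorted(flow_out_boolean()))     # ['A', 'B']
print(sorted(flow_out_with_param()))  # ['B']
```

In this toy, call site `A` only passes tracked data to `p0`, which never reaches the return node, so the parameter-aware variant drops the spurious flow out of `f` at `A`.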

Performance

Performance is mostly unchanged, except for the problematic project mentioned above, but also the C++ project vim seems to benefit a lot from this change.

Tuple counts for RegExpConfiguration on canvas-lms.

Before

| stage | nodes | fields | conscand | states | tuples | config |
|---|---|---|---|---|---|---|
| 1 Fwd | 1331887 | 8913 | -1 | 1 | 1371258 | RegExpConfiguration |
| 1 Rev | 455651 | 7350 | -1 | 1 | 470032 | RegExpConfiguration |
| 2 Fwd | 65545 | 1604 | 2104 | 1 | 110390 | RegExpConfiguration |
| 2 Rev | 48352 | 1288 | 1679 | 1 | 59156 | RegExpConfiguration |
| 3 Fwd | 24160 | 817 | 61964 | 1 | 34918517 | RegExpConfiguration |
| 3 Rev | 18629 | 749 | 52933 | 1 | 2901239 | RegExpConfiguration |
| 4 Fwd | 15222 | 486 | 5681 | 1 | 769663 | RegExpConfiguration |
| 4 Rev | 7969 | 259 | 3247 | 1 | 250867 | RegExpConfiguration |
| 5 Fwd | 7704 | 256 | 1400 | 1 | 68631 | RegExpConfiguration |
| 5 Rev | 203 | 12 | 14 | 1 | 211 | RegExpConfiguration |

After

| stage | nodes | fields | conscand | states | tuples | config |
|---|---|---|---|---|---|---|
| 1 Fwd | 1331978 | 8913 | -1 | 1 | 1371415 | RegExpConfiguration |
| 1 Rev | 455740 | 7351 | -1 | 1 | 470146 | RegExpConfiguration |
| 2 Fwd | 63344 | 1544 | 2005 | 1 | 104142 | RegExpConfiguration |
| 2 Rev | 46884 | 1237 | 1607 | 1 | 56454 | RegExpConfiguration |
| 3 Fwd | 23556 | 795 | 52179 | 1 | 19231076 | RegExpConfiguration |
| 3 Rev | 17992 | 730 | 43641 | 1 | 1216547 | RegExpConfiguration |
| 4 Fwd | 14924 | 474 | 2559 | 1 | 213137 | RegExpConfiguration |
| 4 Rev | 6109 | 231 | 1069 | 1 | 36373 | RegExpConfiguration |
| 5 Fwd | 5855 | 229 | 851 | 1 | 24881 | RegExpConfiguration |
| 5 Rev | 203 | 12 | 14 | 1 | 211 | RegExpConfiguration |

Tuple counts for ExecTaintConfiguration on vim.

Before

| stage | nodes | fields | conscand | states | tuples | config |
|---|---|---|---|---|---|---|
| 1 Fwd | 750805 | 1087 | -1 | 89 | 972795 | ExecTaintConfiguration |
| 1 Rev | 236960 | 647 | -1 | 49 | 308037 | ExecTaintConfiguration |
| 2 Fwd | 206142 | 430 | 563 | 42 | 17322804 | ExecTaintConfiguration |
| 2 Rev | 138396 | 271 | 309 | 42 | 8429954 | ExecTaintConfiguration |
| 3 Fwd | 40022 | 186 | 341 | 42 | 4484835 | ExecTaintConfiguration |
| 3 Rev | 36481 | 168 | 259 | 42 | 2628448 | ExecTaintConfiguration |
| 4 Fwd | 35660 | 163 | 466 | 42 | 4354790 | ExecTaintConfiguration |
| 4 Rev | 34818 | 159 | 457 | 42 | 4026259 | ExecTaintConfiguration |
| 5 Fwd | 28664 | 125 | 418 | 30 | 3396948 | ExecTaintConfiguration |
| 5 Rev | 18949 | 111 | 323 | 30 | 2696213 | ExecTaintConfiguration |

After

| stage | nodes | fields | conscand | states | tuples | config |
|---|---|---|---|---|---|---|
| 1 Fwd | 750805 | 1087 | -1 | 89 | 972795 | ExecTaintConfiguration |
| 1 Rev | 236960 | 647 | -1 | 49 | 308037 | ExecTaintConfiguration |
| 2 Fwd | 175570 | 339 | 413 | 34 | 11808959 | ExecTaintConfiguration |
| 2 Rev | 101983 | 179 | 198 | 34 | 5013273 | ExecTaintConfiguration |
| 3 Fwd | 26404 | 123 | 214 | 33 | 2269729 | ExecTaintConfiguration |
| 3 Rev | 24422 | 118 | 188 | 33 | 1362670 | ExecTaintConfiguration |
| 4 Fwd | 23043 | 116 | 390 | 33 | 2968707 | ExecTaintConfiguration |
| 4 Rev | 22575 | 113 | 384 | 33 | 2793791 | ExecTaintConfiguration |
| 5 Fwd | 21976 | 109 | 361 | 30 | 3744345 | ExecTaintConfiguration |
| 5 Rev | 18805 | 108 | 360 | 30 | 3504032 | ExecTaintConfiguration |
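For readers who want the per-stage effect as a relative change rather than raw counts, the following Python sketch computes it from the tuple columns above. The numbers are copied from the tables; the helper names `relative_change` and `report` are made up for this illustration.

```python
# Tuple counts copied from the tables above, ordered as
# 1 Fwd, 1 Rev, 2 Fwd, 2 Rev, ..., 5 Fwd, 5 Rev.
STAGES = [f"{n} {d}" for n in range(1, 6) for d in ("Fwd", "Rev")]

TUPLES = {
    "canvas-lms": {
        "before": [1371258, 470032, 110390, 59156, 34918517,
                   2901239, 769663, 250867, 68631, 211],
        "after": [1371415, 470146, 104142, 56454, 19231076,
                  1216547, 213137, 36373, 24881, 211],
    },
    "vim": {
        "before": [972795, 308037, 17322804, 8429954, 4484835,
                   2628448, 4354790, 4026259, 3396948, 2696213],
        "after": [972795, 308037, 11808959, 5013273, 2269729,
                  1362670, 2968707, 2793791, 3744345, 3504032],
    },
}


def relative_change(before, after):
    """Fractional change in tuple count; negative means fewer tuples."""
    return (after - before) / before


def report(project):
    """Map each stage label to its relative change for one project."""
    data = TUPLES[project]
    return {
        stage: relative_change(b, a)
        for stage, b, a in zip(STAGES, data["before"], data["after"])
    }


for project in TUPLES:
    print(project)
    for stage, change in report(project).items():
        print(f"  {stage}: {change:+.1%}")
```

Not every row shrinks: on vim, the stage 5 tuple counts are higher after the change, while stages 2-4 are lower in both projects.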

@hvitved force-pushed the dataflow/summary-ctx branch 4 times, most recently from 02437cc to 2fa9cae on Nov 4, 2022
@hvitved changed the title from "Data flow: Add summary context to pruning stages 2-4" to "Data flow: Add summary/return context to pruning stages 2-4" on Nov 4, 2022
@hvitved added 4 commits on Nov 7, 2022
…estricting flow through

For example, flow out via parameters allows for return nodes with multiple
return kinds:

```csharp
void SetXOrY(C x, C y, bool b)
{
    C c = x;
    if (b)
        c = y;
    c.Field = taint; // post-update node for `c` has two return kinds
}
```
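One way to picture the effect of multiple return kinds on self-flow pruning is the Python sketch below. It is a simplified model under stated assumptions, not the CodeQL library code: a return kind is modelled as a hypothetical `("param", i)` pair meaning "out via parameter i", and self-flow is assumed to mean entering and leaving through the same parameter.

```python
from typing import Set, Tuple

# Hypothetical encoding: ("param", i) means "flows out via parameter i".
ReturnKind = Tuple[str, int]


def flow_through_kinds(entry_param: int,
                       return_kinds: Set[ReturnKind]) -> Set[ReturnKind]:
    """Return kinds through which data entering at `entry_param` may leave,
    excluding self-flow (in via parameter i, out via the same parameter i)."""
    return {kind for kind in return_kinds if kind != ("param", entry_param)}


# Post-update node for `c` in SetXOrY: `c` may alias x (parameter 0)
# or y (parameter 1), so the node carries two return kinds.
post_update_c = {("param", 0), ("param", 1)}

print(flow_through_kinds(0, post_update_c))  # entering via x: only out via y
print(flow_through_kinds(1, post_update_c))  # entering via y: only out via x
print(sorted(flow_through_kinds(2, post_update_c)))  # entering via b: both
```

Treating the kinds as a set lets each (entry parameter, return kind) pair be checked individually, so the self-flow pair can be pruned without discarding the other kind.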
@hvitved added the no-change-note-required label ("This PR does not need a change note") on Nov 8, 2022
@hvitved marked this pull request as ready for review on Nov 8, 2022
@hvitved requested review from a team as code owners on Nov 8, 2022
MathiasVP commented Nov 8, 2022

> Performance is mostly unchanged, except for the problematic project mentioned above, but also the C++ project vim seems to benefit a lot from this change.

FWIW, we saw a rather large C/C++ performance regression on vim/vim when we added global dataflow (because vim apparently really likes global variables), and I think this PR brings back that performance 🎉.
