Data flow: Add summary/return context to pruning stages 2-4 #11087

hvitved · 2022-11-02T15:17:34Z

Description

We recently experienced a significant slowdown on a Ruby project, as a result of merely adding a few more flow summaries.

Since flow summaries are compiled down to synthesized callables, it means we are doing a lot of flow through (into + out of) callables, which made me think we should perhaps add more precision to the pruning stages (except the initial stage 1).

This PR does exactly that:

- The first commit adds a new column for tracking the parameter from which data originates in forwards pruning, but only when the previous pruning stage suggests that there may be flow through that parameter. This extra information allows us to restrict flow through.
- The second commit replaces the boolean `toReturn` column with a return context column in reverse pruning, but only when the previous pruning stage suggests that there may be flow through that return. This extra information allows us to restrict flow through.
- The third commit accounts for return nodes with multiple return kinds (see commit message for an example). While the last pruning stage does handle multiple return kinds correctly, stages 2-4 did not, and we were potentially missing out on early pruning of disallowed self-flow through a parameter.
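To illustrate why tracking the originating parameter can restrict flow through, here is a minimal toy sketch in Python. It is not the QL implementation: the function `calls_with_flow_through`, the node names, and the `"some-param"` marker are all hypothetical, and the model ignores access paths, states, and call contexts. It only shows that when tuples inside a callable do not remember which parameter they came from, every call site that entered the callable appears to get flow out at the return.

```python
from collections import deque


def calls_with_flow_through(entries, edges, return_node, track_param):
    """Return the call sites that get flow-through in a toy forward pass.

    entries: list of (call_site, param) pairs where data enters the callable.
    edges: local flow steps inside the callable, as node -> list of successors.
    track_param: if True, each tuple remembers the parameter it came from.
    """

    # Context recorded on entry: the parameter itself, or one shared marker
    # (hypothetical stand-in for "entered through some parameter").
    def ctx_of(param):
        return param if track_param else "some-param"

    seen = set()
    work = deque((param, ctx_of(param)) for _, param in entries)
    while work:
        node, ctx = work.popleft()
        if (node, ctx) in seen:
            continue
        seen.add((node, ctx))
        for succ in edges.get(node, []):
            work.append((succ, ctx))

    # Flow out at a call site only if that call entered with a context
    # that also reaches the return node.
    return {call for call, param in entries if (return_node, ctx_of(param)) in seen}


# Only p2 reaches the return node; p1 ends in a local use.
edges = {"p1": ["local_use"], "p2": ["ret"]}
entries = [("call1", "p1"), ("call2", "p2")]

coarse = calls_with_flow_through(entries, edges, "ret", track_param=False)
precise = calls_with_flow_through(entries, edges, "ret", track_param=True)
```

In this toy setup, `coarse` contains both call sites while `precise` contains only `call2`, which is the kind of extra tuple the new column is meant to rule out. The reverse direction with a return context can be pictured the same way, with returns in place of parameters.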

Performance

Performance is mostly unchanged, except for the problematic project mentioned above, but also the C++ project vim seems to benefit a lot from this change.

Tuple counts for `RegExpConfiguration` on canvas-lms.

Before

stage	nodes	fields	conscand	states	tuples	config
1 Fwd	1331887	8913	-1	1	1371258	RegExpConfiguration
1 Rev	455651	7350	-1	1	470032	RegExpConfiguration
2 Fwd	65545	1604	2104	1	110390	RegExpConfiguration
2 Rev	48352	1288	1679	1	59156	RegExpConfiguration
3 Fwd	24160	817	61964	1	34918517	RegExpConfiguration
3 Rev	18629	749	52933	1	2901239	RegExpConfiguration
4 Fwd	15222	486	5681	1	769663	RegExpConfiguration
4 Rev	7969	259	3247	1	250867	RegExpConfiguration
5 Fwd	7704	256	1400	1	68631	RegExpConfiguration
5 Rev	203	12	14	1	211	RegExpConfiguration

After

stage	nodes	fields	conscand	states	tuples	config
1 Fwd	1331978	8913	-1	1	1371415	RegExpConfiguration
1 Rev	455740	7351	-1	1	470146	RegExpConfiguration
2 Fwd	63344	1544	2005	1	104142	RegExpConfiguration
2 Rev	46884	1237	1607	1	56454	RegExpConfiguration
3 Fwd	23556	795	52179	1	19231076	RegExpConfiguration
3 Rev	17992	730	43641	1	1216547	RegExpConfiguration
4 Fwd	14924	474	2559	1	213137	RegExpConfiguration
4 Rev	6109	231	1069	1	36373	RegExpConfiguration
5 Fwd	5855	229	851	1	24881	RegExpConfiguration
5 Rev	203	12	14	1	211	RegExpConfiguration
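As a reading aid, the relative change in tuple counts can be computed directly from the two canvas-lms tables above. The snippet below is a small sketch that copies a few of the `tuples` values; the dictionary layout and the helper name `percent_change` are illustrative only.

```python
# Tuple counts copied from the "Before" and "After" tables for canvas-lms.
before = {"2 Fwd": 110390, "3 Fwd": 34918517, "4 Fwd": 769663, "5 Fwd": 68631, "5 Rev": 211}
after = {"2 Fwd": 104142, "3 Fwd": 19231076, "4 Fwd": 213137, "5 Fwd": 24881, "5 Rev": 211}


def percent_change(stage):
    """Relative change in tuple count for one stage, in percent."""
    return 100.0 * (after[stage] - before[stage]) / before[stage]


for stage in before:
    print(f"{stage}: {percent_change(stage):+.1f}%")
```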

Tuple counts for `ExecTaintConfiguration` on vim.

Before

stage	nodes	fields	conscand	states	tuples	config
1 Fwd	750805	1087	-1	89	972795	ExecTaintConfiguration
1 Rev	236960	647	-1	49	308037	ExecTaintConfiguration
2 Fwd	206142	430	563	42	17322804	ExecTaintConfiguration
2 Rev	138396	271	309	42	8429954	ExecTaintConfiguration
3 Fwd	40022	186	341	42	4484835	ExecTaintConfiguration
3 Rev	36481	168	259	42	2628448	ExecTaintConfiguration
4 Fwd	35660	163	466	42	4354790	ExecTaintConfiguration
4 Rev	34818	159	457	42	4026259	ExecTaintConfiguration
5 Fwd	28664	125	418	30	3396948	ExecTaintConfiguration
5 Rev	18949	111	323	30	2696213	ExecTaintConfiguration

After

stage	nodes	fields	conscand	states	tuples	config
1 Fwd	750805	1087	-1	89	972795	ExecTaintConfiguration
1 Rev	236960	647	-1	49	308037	ExecTaintConfiguration
2 Fwd	175570	339	413	34	11808959	ExecTaintConfiguration
2 Rev	101983	179	198	34	5013273	ExecTaintConfiguration
3 Fwd	26404	123	214	33	2269729	ExecTaintConfiguration
3 Rev	24422	118	188	33	1362670	ExecTaintConfiguration
4 Fwd	23043	116	390	33	2968707	ExecTaintConfiguration
4 Rev	22575	113	384	33	2793791	ExecTaintConfiguration
5 Fwd	21976	109	361	30	3744345	ExecTaintConfiguration
5 Rev	18805	108	360	30	3504032	ExecTaintConfiguration

…estricting flow through

For example, flow out via parameters allows for return nodes with multiple return kinds:

```csharp
void SetXOrY(C x, C y, bool b)
{
    C c = x;
    if (b)
        c = y;
    c.Field = taint; // post-update node for `c` has two return kinds
}
```
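One way to picture the effect of multiple return kinds on the self-flow check is the toy Python sketch below. It is an assumption-laden illustration, not the library's code: the `("param", name)` encoding of a return kind and the helper `flow_through_targets` are hypothetical. The idea is only that a return node carries a set of kinds, and the kind matching the entry parameter is filtered out individually instead of the node being treated as having a single kind.

```python
def flow_through_targets(entry_param, return_kinds_of_node):
    """Return kinds through which data entering via entry_param may leave."""
    # Self-flow (in via a parameter and out via the same parameter) is dropped;
    # the remaining kinds of the same return node are kept.
    return {kind for kind in return_kinds_of_node if kind != ("param", entry_param)}


# The post-update node for `c` in SetXOrY may update either `x` or `y`.
kinds = {("param", "x"), ("param", "y")}
from_x = flow_through_targets("x", kinds)
from_y = flow_through_targets("y", kinds)
```

Here data that entered through `x` can still leave through `y`, and vice versa, while the two self-flow combinations are discarded.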

MathiasVP · 2022-11-08T09:58:54Z

> Performance is mostly unchanged, except for the problematic project mentioned above, but also the C++ project vim seems to benefit a lot from this change.

FWIW, we saw a rather large C/C++ performance regression on vim/vim when we added global dataflow (because vim apparently really likes global variables), and I think this PR brings back that performance 🎉.

github-actions bot added C# C++ DataFlow Library Java Python Ruby Swift labels Nov 2, 2022

hvitved force-pushed the dataflow/summary-ctx branch 4 times, most recently from 02437cc to 2fa9cae Compare Nov 4, 2022

hvitved changed the title ~~Data flow: Add summary context to pruning stages 2-4~~ Data flow: Add summary/return context to pruning stages 2-4 Nov 4, 2022

hvitved added 4 commits Nov 7, 2022

Data flow: Add summary context to pruning stages 2-4

68c2ad3

Data flow: Add return context to pruning stages 2-4

1653e45

Data flow: Sync files

5010dbc

hvitved force-pushed the dataflow/summary-ctx branch from 2fa9cae to 5010dbc Compare Nov 7, 2022

hvitved added the no-change-note-required This PR does not need a change note label Nov 8, 2022

hvitved marked this pull request as ready for review Nov 8, 2022

hvitved requested review from a team as code owners Nov 8, 2022

hvitved assigned aschackmull Nov 8, 2022
