C++: Fix dataflow inconsistencies #15040

MathiasVP · 2023-12-07T23:07:42Z

For some reason I made the incorrect decision to have the post-update node for a field write, and the post-update node for an argument node, as two separate dataflow nodes. That doesn't make a lot of sense since the two aren't mutually exclusive. For example, you can have a field write to an argument node in a situation like:

void set_field(int*);
...
struct S {
  int* p;
} s;
set_field(s.p);

This PR cleans up this situation by merging those two dataflow nodes into a single IPA branch. This gets rid of a bunch of inconsistency errors 🎉

jketema

Two questions.

jketema · 2023-12-08T09:38:55Z

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll

@@ -84,7 +83,7 @@ private predicate parameterIsRedefined(Parameter p) {
 class FieldAddress extends Operand {
  FieldAddressInstruction fai;

-  FieldAddress() { fai = this.getDef() }
+  FieldAddress() { fai = this.getDef() and not Ssa::ignoreOperand(this) }


Why is this change needed?

Otherwise we end up generating dataflow nodes that aren't used anywhere: https://github.com/github/codeql/pull/15040/files#diff-e9e4c8dbaa54c9f16e1533c09bb66ff3ec456e747f98de139672cfa0412a463cR40-R42. Consider, for example, this:

void f(int*);
struct S { int x; };
void test() {
  S s;
  f(&s.x);
}

the IR will look like:

r1 = &s.x;
r2 = call to f : r1
m3 = WriteSideEffect : r1
...

And that WriteSideEffect isn't used for dataflow (and neither is its operand), so we shouldn't generate a PostUpdateNode for that r1 operand occurring on the WriteSideEffect.

Ah, it's because we now do operand = any(FieldAddress fa).getObjectAddressOperand() instead of having a node for the field address.

Exactly, yeah.

jketema · 2023-12-08T09:53:15Z

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll

-  override Expr getDefinedExpr() {
-    result = fieldAddress.getObjectAddress().getUnconvertedResultExpression()
+  final override Node getPreUpdateNode() { hasOperandAndIndex(result, operand, indirectionIndex) }
+
+  final override Expr getDefinedExpr() {
+    result = operand.getDef().getUnconvertedResultExpression()
  }
+}


Do I understand correctly that most of the test changes are due to the definition of getDefinedExpr now being different for fields?

I don't think so, no. The behavior of getDefinedExpr is basically never used in any queries or tests. The only place I can think of is to implement the asPartialDefinition predicate, which also really isn't used by any of our queries.

A few small things that are used by queries did change, though. In particular, the location of a PostFieldUpdateNode used to be given by the location of the field, whereas now it's given by the location of the qualifier.

And slightly more subtle: The post-update node used to be its own dataflow node (i.e., the PostUpdateFieldNode IPA branch), and the pre-update node of that used to be the qualifier of the field. Now, the post-update node is a PostUpdateNodeImpl IPA branch, and the pre-update node is the field itself. This shouldn't matter for queries if they're not touching internal stuff, though.

MathiasVP requested a review from a team as a code owner December 7, 2023 23:07

github-actions bot added the C++ label Dec 7, 2023

MathiasVP added the no-change-note-required This PR does not need a change note label Dec 7, 2023

MathiasVP added 2 commits December 7, 2023 23:11

C++: Merge 'PostUpdateFieldNode' and 'IndirectArgumentOutNode' into a single IPA branch.

d6871c7

C++: Accept test changes.

e648058

MathiasVP force-pushed the fewer-dataflow-branches branch from 9cc0853 to e648058 Compare December 7, 2023 23:11

MathiasVP added 2 commits December 8, 2023 09:29

C++: Accept more test changes.

1c73d43

Merge branch 'main' into fewer-dataflow-branches

7b83947

MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Dec 8, 2023

jketema reviewed Dec 8, 2023

Merge branch 'main' into fewer-dataflow-branches

90a62b2

jketema approved these changes Dec 8, 2023

MathiasVP merged commit 30c67ba into github:main Dec 8, 2023
13 of 14 checks passed
