C++: Fix dataflow inconsistencies #15040

Merged
merged 5 commits into from Dec 8, 2023

MathiasVP
Contributor

For some reason I made the incorrect decision to have the post-update node for a field write, and the post-update node for an argument node, as two separate dataflow nodes. That doesn't make a lot of sense since the two aren't mutually exclusive. For example, you can have a field write to an argument node in a situation like:

void set_field(int*);
...
struct S {
  int* p;
} s;
set_field(s.p);

This PR cleans up this situation by merging those two dataflow nodes into a single IPA branch. This gets rid of a bunch of inconsistency errors 🎉
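
To make the overlap concrete, here is a minimal runnable sketch of the snippet above. The body of `set_field`, the `storage` variable, and the `run_set_field_example` helper are assumptions added for illustration; the PR description only declares `set_field`. The point is that `s.p` is at once a field access and a call argument, and the callee can write through it, so both kinds of post-update reasoning apply to the same expression.

```cpp
#include <cassert>

// Hypothetical callee body: the PR description only declares set_field(int*).
void set_field(int* p) {
  *p = 42;  // writes through the pointer it was handed
}

struct S {
  int* p;
};

// Hypothetical driver: returns the value visible through s.p after the call.
int run_set_field_example() {
  int storage = 0;   // assumed backing storage so the pointer is valid
  S s{&storage};
  set_field(s.p);    // `s.p` is a field access *and* a call argument
  return *s.p;
}
```

Calling `run_set_field_example()` observes the callee's write through the field, which is the kind of update a post-update node is meant to capture.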

@MathiasVP MathiasVP requested a review from a team as a code owner December 7, 2023 23:07
@github-actions github-actions bot added the C++ label Dec 7, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Dec 7, 2023
@MathiasVP MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Dec 8, 2023
Contributor

@jketema jketema left a comment

Two questions.

@@ -84,7 +83,7 @@ private predicate parameterIsRedefined(Parameter p) {
 class FieldAddress extends Operand {
   FieldAddressInstruction fai;

-  FieldAddress() { fai = this.getDef() }
+  FieldAddress() { fai = this.getDef() and not Ssa::ignoreOperand(this) }
Contributor

Why is this change needed?

Contributor Author

Otherwise we end up generating dataflow nodes that aren't used anywhere: https://github.com/github/codeql/pull/15040/files#diff-e9e4c8dbaa54c9f16e1533c09bb66ff3ec456e747f98de139672cfa0412a463cR40-R42. Consider, for example, this:

void f(int*);

struct S { int x; };

void test() {
  S s;  
  f(&s.x);
}

The IR will look like:

r1 = &s.x;
r2 = call to f       : r1
m3 = WriteSideEffect : r1
...

And that WriteSideEffect isn't used for dataflow (and neither is its operand), so we shouldn't generate a PostUpdateNode for that r1 operand occurring on the WriteSideEffect.
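
For readers who want to run the C++ side of this example, here is a minimal sketch. The body of `f`, the initializer for `s`, and the `run_field_address_example` helper are assumptions; the comment above only declares `f`. It shows the write that the `WriteSideEffect` in the IR listing stands for: the callee may store through the address of `s.x` that it receives.

```cpp
#include <cassert>

struct S { int x; };

// Hypothetical body: the original comment only declares f(int*).
void f(int* p) {
  *p = 7;  // the callee writes through the address it receives
}

// Hypothetical driver: returns s.x as observed after the call.
int run_field_address_example() {
  S s{0};
  f(&s.x);     // the address of the field is passed as the argument
  return s.x;  // reflects the callee's write
}
```

In this sketch the caller sees the updated field after the call returns. Whether a separate node is needed for the operand on the side-effect instruction is an analysis design decision, which is what the discussion above is about.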

Contributor

Ah, it's because we now do operand = any(FieldAddress fa).getObjectAddressOperand() instead of having a node for the field address.

Contributor Author

Exactly, yeah.

Comment on lines -579 to +575
-  override Expr getDefinedExpr() {
-    result = fieldAddress.getObjectAddress().getUnconvertedResultExpression()
+  final override Node getPreUpdateNode() { hasOperandAndIndex(result, operand, indirectionIndex) }
+
+  final override Expr getDefinedExpr() {
+    result = operand.getDef().getUnconvertedResultExpression()
   }
 }
Contributor

Do I understand correctly that most of the test changes are due to the definition of getDefinedExpr now being different for fields?

Contributor Author

I don't think so, no. The behavior of getDefinedExpr is basically never used in any queries or tests. The only place I can think of is to implement the asPartialDefinition predicate, which also really isn't used by any of our queries.

A few small things that are used by queries did change, though. In particular, the location of a PostFieldUpdateNode used to be given by the location of the field, whereas now it's given by the location of the qualifier.

And slightly more subtle: the post-update node used to be its own dataflow node (i.e., the PostFieldUpdateNode IPA branch), and the pre-update node of that used to be the qualifier of the field. Now, the post-update node is a PostUpdateNodeImpl IPA branch, and the pre-update node is the field itself. This shouldn't matter for queries if they're not touching internal stuff, though.

@MathiasVP MathiasVP merged commit 30c67ba into github:main Dec 8, 2023
13 of 14 checks passed