C++: Ensure only one Variable exists for every global variable #9700

Open: 2 commits to be merged into base: main

Contributor

@jketema jketema commented Jun 24, 2022

Depending on the extraction order, before this change there might be multiple GlobalVariables per declared global variable. See the tests in cpp/ql/test/library-tests/variables/global. This change ensures that only one of those GlobalVariables is visible to the user if we can locate a unique definition. If not, the old situation persists.
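The rule in the paragraph above can be sketched as a small standalone C++ model. This is not the QL in `ResolveGlobalVariable.qll`; the `GlobalVar` struct, its fields, and the `resolve` function are hypothetical names used only for illustration, and the sketch ignores the template exception.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one extracted GlobalVariable entity.
struct GlobalVar {
  int id;                   // identity of the extracted entity
  std::string mangledName;  // name used to match entities across files
  bool isDefinition;        // true if this entity carries a definition
};

// Returns the id of the entity the user should see for `v`:
// the unique definition with the same mangled name if exactly one exists,
// otherwise `v` itself (the "old situation persists" case).
int resolve(const GlobalVar& v, const std::vector<GlobalVar>& all) {
  int definitionId = -1;
  int definitions = 0;
  for (const GlobalVar& other : all) {
    if (other.isDefinition && other.mangledName == v.mangledName) {
      definitionId = other.id;
      ++definitions;
    }
  }
  return definitions == 1 ? definitionId : v.id;
}
```

With one definition and one non-defining declaration of `a`, both resolve to the definition. With two competing definitions of `b`, every entity resolves to itself.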

Note that an exception needs to be made for templated variables. Here, the definition refers to the non-instantiated template, while a declaration that is not a definition refers to an instantiation. In case the instantiation refers to a template parameter, the mangled names of the template and the instantiation will be identical. This happens for example in the following case:

template <typename T>
T x = T(42);           // Uninstantiated templated variable

template <typename T>
class C {
  T y = x<T>;          // Instantiation using a template parameter
};

Since the uninstantiated template and the instantiation are two different entities, we do not unify them as described above.
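As a small illustration of why the template and its instantiations are different entities, the following sketch (C++14 or later) extends the example with two concrete instantiations. The helper functions `instantiationsAreDistinct` and `yForInt` are hypothetical and exist only to make the point observable; they say nothing about how the extractor names these entities.

```cpp
template <typename T>
T x = T(42);           // Uninstantiated templated variable

template <typename T>
struct C {
  T y = x<T>;          // Names x<T> through a template parameter
};

// x<int> and x<long> are separate objects generated from the one template x.
bool instantiationsAreDistinct() {
  const void* asInt = &x<int>;
  const void* asLong = &x<long>;
  return asInt != asLong;
}

// Instantiating C<int> is what makes x<int> a concrete variable here.
int yForInt() {
  return C<int>{}.y;
}
```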

To note:

  • ResolveGlobalVariable.qll was derived from ResolveClass.qll, so it's fairly aggressive with pragma[noinline]. I have not checked what happens performance-wise when I remove them. Let me know if it's worth checking this.
  • Depends on two internal PRs, so CI will currently fail on the updated test.
  • The above template case is covered by cpp/ql/test/library-tests/templates/variables.

@jketema jketema added the depends on internal PR label Jun 24, 2022
@jketema jketema requested a review from as a code owner Jun 24, 2022
@jketema
Contributor Author

@jketema jketema commented Jun 24, 2022

Note that this also maps the following declarations of a onto a single variable of type int:

// a.cpp 
int a;
// b.cpp
extern long a;

This is consistent with linker behaviour, but might not be desirable from a CodeQL perspective? Note that there will still be two VariableDeclarationEntries: one of type int and one of type long.

@sashabu
Contributor

@sashabu sashabu commented Jun 27, 2022

Note that this also maps the following declarations of a onto a single variable of type int:

// a.cpp 
int a;
// b.cpp
extern long a;

This is consistent with linker behaviour, but might not be desirable from a CodeQL perspective? Note that there will still be two VariableDeclarationEntries: one of type int and one of type long.

Assuming these are linked together, that's an ODR violation. I don't think we need to go to great lengths to analyse code with ODR violations considering it's UB and we don't even have enough information to guess at the symptoms (e.g. if int is 16 bits and long is 32 bits, we have no way of guessing what other object a write to extern long a; might clobber in practice, or indeed whether optimisations accidentally "unclobber" an overlapping object).

If we do want to "handle" such cases, I think the best thing to do would be to detect ODR violations in a dedicated query and assume they're not present in all other queries. Since as you say there are two VariableDeclarationEntry values, we have enough information for this.
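A dedicated check of this kind could be sketched as follows, as a standalone C++ model rather than a CodeQL query. The `DeclEntry` struct is a hypothetical stand-in for a VariableDeclarationEntry, and `conflictingVariables` is an invented name; the sketch assumes that declarations of the same variable share a mangled name and that types can be compared as strings.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a VariableDeclarationEntry.
struct DeclEntry {
  std::string mangledName;  // identifies the variable across files
  std::string file;         // file containing this declaration
  std::string type;         // declared type, as written
};

// Returns the mangled names of variables declared with more than one type.
std::vector<std::string> conflictingVariables(
    const std::vector<DeclEntry>& entries) {
  std::map<std::string, std::set<std::string>> typesByName;
  for (const DeclEntry& e : entries) {
    typesByName[e.mangledName].insert(e.type);
  }
  std::vector<std::string> result;
  for (const auto& kv : typesByName) {
    if (kv.second.size() > 1) {
      result.push_back(kv.first);
    }
  }
  return result;
}
```

For the a.cpp / b.cpp example, the two entries for `a` carry the types int and long, so `a` is reported.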

However, what happens if we tweak the example to be valid C++ by giving a.cpp's a internal linkage (e.g. const int a; or static int a;)? I'm hoping name mangling takes care of giving the two objects different names, but it might be good to add a test for this.

@MathiasVP
Contributor

@MathiasVP MathiasVP commented Jun 27, 2022

I think I agree with @sashabu that we shouldn't try to mitigate ODR issues like this on the extractor side. However, I do have a comment with regard to this:

If we do want to "handle" such cases, I think the best thing to do would be to detect ODR violations in a dedicated query and assume they're not present in all other queries.

ODR violations have given us plenty of performance issues in the past because people didn't know their code contained ODR violations, and it turned some analyses into awful exponential-time algorithms.

Luckily, they're normally quite simple fixes on the QL side (see for example the old GVN library and the new GVN library for some of the mitigation stuff we've done to guard ourselves against stuff like "a variable with multiple types", or "a field lookup returning multiple fields").

So we can't quite ignore ODR issues in the analyses because we've alerted about it in some other query, since the presence of an ODR violation can prevent the suite from completing.
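One plausible way such a blow-up can arise, shown as a toy calculation rather than a description of any actual CodeQL library: if an analysis considers every combination of types for the variables in an expression, the number of combinations is the product of the per-variable type counts. The function name `typeAssignments` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Number of distinct type assignments for an expression whose i-th variable
// has typesPerVariable[i] candidate types.
std::uint64_t typeAssignments(const std::vector<unsigned>& typesPerVariable) {
  std::uint64_t combinations = 1;
  for (unsigned count : typesPerVariable) {
    combinations *= count;
  }
  return combinations;
}
```

With one type per variable the product stays at 1, however many variables are involved. With two types for each of 20 variables it is 2^20 = 1048576, which is why guarding against "a variable with multiple types" on the QL side matters.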

@sashabu
Contributor

@sashabu sashabu commented Jun 27, 2022

So we can't quite ignore ODR issues in the analyses because we've alerted about it in some other query, since the presence of an ODR violation can prevent the suite from completing.

@MathiasVP - I think having the two VariableDeclarationEntry values is sufficient for this?

You're quite right that my comment was overly general. More precisely, what I was trying to say is that it's good to have a way to detect ODR violations, but we don't necessarily need to try and model the semantics of a C++-like language where ODR violations are well-defined (e.g. by defining differently-typed "overloads" of the variable). I think we're in agreement on that?

@MathiasVP
Contributor

@MathiasVP MathiasVP commented Jun 27, 2022

You're quite right that my comment was overly general. More precisely, what I was trying to say is that it's good to have a way to detect ODR violations, but we don't necessarily need to try and model the semantics of a C++-like language where ODR violations are well-defined (e.g. by defining differently-typed "overloads" of the variable). I think we're in agreement on that?

Yes, totally in agreement on that 😄.

@jketema
Contributor Author

@jketema jketema commented Jun 27, 2022

dedicated query

We have one internally.

@jketema
Contributor Author

@jketema jketema commented Jun 27, 2022

However, what happens if we tweak the example to be valid C++ by giving a.cpp's a internal linkage (e.g. const int a; or static int a;)?

Global variables with internal linkage have different name mangling. One of the internal fixes for this PR actually corrected some issues there and includes tests.

@MathiasVP
Contributor

@MathiasVP MathiasVP commented Jun 28, 2022

ResolveGlobalVariable.qll was derived from ResolveClass.qll, so it's fairly aggressive with pragma[noinline]. I have not checked what happens performance-wise when I remove them. Let me know if it's worth checking this.

I think the aggressive use of pragma[noinline] in ResolveClass.qll is due to the fact that this file was written back when the optimizer was dumber (and we didn't have pragma[only_bind_into] to force join orders). So I would guess that many of the noinlines could be removed or replaced with pragma[only_bind_into] for a slight performance win. But that's probably an issue for another PR. If the performance ends up looking good I think it makes sense to just copy the style into this PR.

jketema added 2 commits Jun 28, 2022
Depending on the extraction order, before this change there might be multiple
`GlobalVariable`s per declared global variable. See the tests in
`cpp/ql/test/library-tests/variables/global`. This change ensures that only one
of those `GlobalVariable`s is visible to the user if we can locate a unique
definition. If not, the old situation persists.

Note that an exception needs to be made for templated variables. Here, the
definition refers to the non-instantiated template, while a declaration that
is not a definition refers to an instantiation. In case the instantiation refers
to a template parameter, the mangled names of the template and the instantiation
will be identical. This happens for example in the following case:
```
template <typename T>
T x = T(42);           // Uninstantiated templated variable

template <typename T>
class C {
  T y = x<T>;          // Instantiation using a template parameter
};
```
Since the uninstantiated template and the instantiation are two different
entities, we do not unify them as described above.
Contributor

@MathiasVP MathiasVP left a comment

This LGTM :shipit:!

I'll approve the PR and leave the merging to whoever merges the internal PR.

Labels: C++, depends on internal PR, documentation

3 participants