Python: Add annotated call-graph tests#3790

Merged
tausbn merged 7 commits intogithub:masterfrom
RasmusWL:python-add-annotated-callgraph-tests
Jul 10, 2020

@RasmusWL
Member

Start off by reading the README added for explanation :)


The tests included here are by no means complete yet. I wanted to get approval from the rest of the team before investing more time in them. Please let me know @tausbn and @yoff 😊

I have made sure that all the added Python files are runnable, so we can manually inspect that the annotations are correct.

I would at least like to port all call-related cases from https://github.com/github/codeql/tree/master/python/ql/test/library-tests/PointsTo/regressions. Although not all of them are critical to handle right away, I think this should be the right place to track such instances going forward.

I also want to point out that I added tests showing that the error handling works well. I moved these out of the regular tests because they polluted everything. However, getting the error reporting right was not easy, so I wanted to keep them in there to show it actually works 😄

For further discussion:

Going forwards (if you all agree that this is nice), we need to figure out how to handle these 2 cases

1) "magic" methods in Python

How to handle special methods in Python. For example, if we have obj = MyClass(), then str(obj) would call __str__ if that is defined on MyClass. This can also happen for accessing/assigning/deleting object properties with @property, and many other cases, such as obj + 1 using __add__.

Our current setup of Value.getACall() returning a CallNode doesn't fully support this (since these aren't calls), so although I have included a test file for this in class_advanced.py, I have not annotated anything (so nothing is tested).
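
As a rough illustration of the kind of code in question (a sketch only, not the contents of class_advanced.py; the `log` attribute and the `value` property are hypothetical names added for the example), none of the four statements at the bottom is syntactically a call to `__str__`, `__add__`, or the property accessors, yet each one runs user-defined code:

```python
class MyClass:
    def __init__(self):
        self._value = 10
        self.log = []  # hypothetical helper: records which special methods ran

    def __str__(self):
        self.log.append("__str__")
        return "MyClass instance"

    def __add__(self, other):
        self.log.append("__add__")
        return self._value + other

    @property
    def value(self):
        self.log.append("value getter")
        return self._value

    @value.setter
    def value(self, new_value):
        self.log.append("value setter")
        self._value = new_value


obj = MyClass()
text = str(obj)      # dispatches to MyClass.__str__
total = obj + 1      # dispatches to MyClass.__add__
obj.value = 5        # dispatches to the property setter
current = obj.value  # dispatches to the property getter

print(obj.log)
```

Only `str(obj)` is a call expression at all, and its syntactic target is `str`, not `MyClass.__str__`; the other three are a binary operation, an attribute assignment, and an attribute read.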

2) Class construction

Currently, the set of edges that I consider points-to able to resolve is very simple:

class PointsToResolver extends CallGraphResolver, TPointsToResolver {
    override predicate callEdge(Call call, Function callable) {
        exists(PythonFunctionValue func_value |
            func_value.getScope() = callable and
            call = func_value.getACall().getNode()
        )
    }

    override string toString() { result = "PointsToResolver" }
}

so it doesn't report an edge from A() -> A.__init__ (see class_simple.py). I thought about adding support for this with the code in the following snippet, but I kinda feel like that is cheating ("if you need to write custom code to make it work, then it's not supported"). I want to know whether you agree.

exists(ClassValue cls |
    call = cls.getACall().getNode() and
    cls.lookup("__init__").(PythonFunctionValue).getScope() = callable
)

See the added README for in-depth details
@RasmusWL RasmusWL requested a review from a team as a code owner June 24, 2020 20:33
@yoff
Contributor

yoff commented Jun 26, 2020

On "Class Construction": In the upcoming data flow implementation, we currently use CallableValue which seems a bit more general than Function but explicitly excludes classes. It appears that calling a class is simply a second case. As such, having a second predicate or clause is not cheating.

@yoff
Contributor

yoff commented Jun 26, 2020

Actually, looks like you can just use Value, both CallableValue.getACall and ClassValue.getACall forward to Value.getACall.

@RasmusWL
Member Author

RasmusWL commented Jul 1, 2020

> On "Class Construction": In the upcoming data flow implementation, we currently use CallableValue which seems a bit more general than Function but explicitly excludes classes. It appears that calling a class is simply a second case. As such, having a second predicate or clause is not cheating.

> Actually, looks like you can just use Value, both CallableValue.getACall and ClassValue.getACall forward to Value.getACall.

Yep, using ClassValue.getACall will allow us to find the call A(), but it will not resolve to A.__init__ which is what I initially wanted. However, that is also a bit too simplistic a view of things, since A.__new__ is called first, which then calls A.__init__. Metaclasses can also "interfere" with this process.

So it seems like my initial idea of A() -> A.__init__ was flawed, and instead I should allow it to go to the class.
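
To make the construction order concrete, here is a small sketch (the classes `A`, `B`, and `SkipInit` and the `calls` list are hypothetical, written only for this illustration). In CPython, calling a class goes through the metaclass's `__call__`; the default `type.__call__` invokes `__new__` and then, if the result is an instance of the class, `__init__`. A custom metaclass can change that sequence:

```python
calls = []  # hypothetical helper: records the order in which methods run


class A:
    def __new__(cls, *args):
        calls.append("A.__new__")
        return super().__new__(cls)

    def __init__(self, *args):
        calls.append("A.__init__")


A()  # default type.__call__: A.__new__ first, then A.__init__


class SkipInit(type):
    """Metaclass that constructs instances without running __init__."""

    def __call__(cls, *args, **kwargs):
        calls.append("SkipInit.__call__")
        return cls.__new__(cls)  # __init__ is deliberately never invoked


class B(metaclass=SkipInit):
    def __init__(self):
        calls.append("B.__init__")


B()  # goes through SkipInit.__call__; B.__init__ does not run

print(calls)
```

For `B()`, an edge to `B.__init__` would be wrong, which is one reason an edge from the call to the class itself may be the safer annotation.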

Contributor

@tausbn tausbn left a comment

Overall: really good stuff. I've made a bunch of comments and suggestions, but these are mostly about the documentation. The code itself seems solid. 💪

}

/** There is an obvious problem with the annotation `name` */
predicate name_in_error_state(string name) {
Contributor

Is there a logic to whether the names of these methods are written using underscores or in mixedCase? (Ditto comment_for above. Also maybe some of these should be private.)

Member Author

nah, that's just me being inconsistent. I have transformed everything into camelCase now. I guess some of them could be private, but seeing as they're in a path that can't be imported from anywhere (since it contains -), I don't really see the point in thinking too much about this.

...
else:
...
```
Contributor

I'm not sure I understand the significance of this example (and by extension, the section in which it appears). 🤔

Member Author

yeah, I can see that it's a little empty 😄 tried to fix it up in a new commit :)

RasmusWL and others added 6 commits July 6, 2020 17:21
Co-authored-by: Taus <tausbn@gmail.com>
Adjusting test setup properly requires some deep thinking, and I don't think I'm
ready to do that right now. Added a TODO instead.
@RasmusWL RasmusWL requested a review from tausbn July 6, 2020 17:06
Contributor

@tausbn tausbn left a comment

I think this looks good. Regarding the whole __init__/__new__ issue, it may be better to verify correctness at the level of dataflow rather than just the call graph (i.e. testing that arguments to classes flow to the correct places for these two special methods).
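
A sketch of the runtime behaviour such a dataflow test would need to model (the `Point` class and the `seen` dictionary are hypothetical names for this example, not part of the PR): with the default metaclass, the arguments of the class call are passed to both `__new__` and `__init__`.

```python
seen = {}  # hypothetical helper: records the arguments each method received


class Point:
    def __new__(cls, x, y):
        seen["__new__"] = (x, y)
        return super().__new__(cls)

    def __init__(self, x, y):
        seen["__init__"] = (x, y)
        self.x = x
        self.y = y


p = Point(1, y=2)  # the same arguments reach __new__ and __init__

print(seen)
```

A dataflow-level test would then check that the values `1` and `2` at the call site flow to the `x` and `y` parameters of both methods.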

@tausbn tausbn merged commit df3eb9f into github:master Jul 10, 2020
@RasmusWL RasmusWL deleted the python-add-annotated-callgraph-tests branch July 13, 2020 08:41