Python: Add annotated call-graph tests by RasmusWL · Pull Request #3790 · github/codeql

RasmusWL · 2020-06-24T20:33:37Z

Start off by reading the README added for explanation :)

The tests included here are by no means complete yet. I wanted to get approval from the rest of the team before investing more time in them. Please let me know @tausbn and @yoff 😊

I have made sure that all the added Python files are runnable, so we can manually inspect that the annotations are correct.

I would at least like to port all call-related cases from https://github.com/github/codeql/tree/master/python/ql/test/library-tests/PointsTo/regressions, although not all of them are critical to handle right away. I think this should be the right place to track such instances going forward.

I also want to point out that I added tests showing that the error handling works well. I moved these out of the regular tests because they polluted everything. However, getting the error reporting right was not easy, so I wanted to keep them in to show that it actually works 😄

For further discussion:

Going forwards (if you all agree that this is nice), we need to figure out how to handle these 2 cases

1) "magic" methods in Python

The first question is how to handle special methods in Python. For example, if we have obj = MyClass(), then str(obj) would call __str__ if that is defined on MyClass. This can also happen when accessing, assigning, or deleting object attributes defined with @property, and in many other cases, such as obj + 1 using __add__.

Our current setup of Value.getACall() returning a CallNode doesn't fully support this (since these aren't calls), so although I have included a test file for this in class_advanced.py, I have not annotated anything (so nothing is tested).
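To make the cases above concrete, here is a minimal Python sketch of the three kinds of implicit dispatch mentioned. The `calls` list and the `value` property are hypothetical names added purely for illustration; they are not taken from class_advanced.py.

```python
class MyClass:
    def __init__(self):
        # Records which special methods ran (illustration only).
        self.calls = []

    def __str__(self):
        self.calls.append("__str__")
        return "MyClass instance"

    def __add__(self, other):
        self.calls.append("__add__")
        return other

    @property
    def value(self):
        self.calls.append("value.getter")
        return 42


obj = MyClass()
str(obj)    # syntactically a call to str, but it dispatches to MyClass.__str__
obj + 1     # no call syntax at all, yet MyClass.__add__ runs
obj.value   # plain attribute access, yet the property getter runs

print(obj.calls)  # ['__str__', '__add__', 'value.getter']
```

None of the last three statements contains a syntactic call to the method that actually runs, which is why a call graph keyed on call nodes has no obvious place to attach these edges.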

2) Class construction

Currently, the set of edges that I consider points-to able to resolve is very simple:

class PointsToResolver extends CallGraphResolver, TPointsToResolver {
    override predicate callEdge(Call call, Function callable) {
        exists(PythonFunctionValue func_value |
            func_value.getScope() = callable and
            call = func_value.getACall().getNode()
        )
    }

    override string toString() { result = "PointsToResolver" }
}

so it doesn't report an edge from A() to A.__init__ (see class_simple.py). I thought about adding support for this with the code in the following snippet, but I kinda feel like that is cheating ("if you need to write custom code to make it work, then it's not supported"). I want to know whether you agree.

exists(ClassValue cls |
    call = cls.getACall().getNode() and
    cls.lookup("__init__").(PythonFunctionValue).getScope() = callable
)

See the added README for in-depth details

yoff · 2020-06-26T14:36:28Z

On "Class Construction": In the upcoming data flow implementation, we currently use CallableValue which seems a bit more general than Function but explicitly excludes classes. It appears that calling a class is simply a second case. As such, having a second predicate or clause is not cheating.

yoff · 2020-06-26T14:42:00Z

Actually, looks like you can just use Value, both CallableValue.getACall and ClassValue.getACall forward to Value.getACall.

RasmusWL · 2020-07-01T10:04:46Z

> On "Class Construction": In the upcoming data flow implementation, we currently use CallableValue which seems a bit more general than Function but explicitly excludes classes. It appears that calling a class is simply a second case. As such, having a second predicate or clause is not cheating.

> Actually, looks like you can just use Value, both CallableValue.getACall and ClassValue.getACall forward to Value.getACall.

Yep, using ClassValue.getACall will allow us to find the call A(), but it will not resolve to A.__init__ which is what I initially wanted. However, that is also a bit too simplistic a view of things, since A.__new__ is called first, which then calls A.__init__. Metaclasses can also "interfere" with this process.

So it seems like my initial idea of A() -> A.__init__ was flawed, and instead I should allow it to go to the class.
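As a rough illustration of why A() does not map to a single target, the sketch below logs what runs when a class is called. The `events` list and the `Meta`, `SkipInit`, and `B` names are hypothetical and exist only for this example. In CPython it is the metaclass's __call__ (by default type.__call__) that invokes __new__ and then __init__, so a custom metaclass can change or skip either step.

```python
events = []


class Meta(type):
    def __call__(cls, *args, **kwargs):
        # Calling the class goes through the metaclass first.
        events.append("Meta.__call__")
        return super().__call__(*args, **kwargs)


class A(metaclass=Meta):
    def __new__(cls, *args, **kwargs):
        events.append("A.__new__")
        return super().__new__(cls)

    def __init__(self, x):
        events.append("A.__init__")
        self.x = x


a = A(1)
print(events)  # ['Meta.__call__', 'A.__new__', 'A.__init__']


class SkipInit(type):
    def __call__(cls, *args, **kwargs):
        # A metaclass can "interfere": here __init__ is never invoked.
        return cls.__new__(cls)


class B(metaclass=SkipInit):
    def __init__(self):
        events.append("B.__init__")


b = B()
print("B.__init__" in events)  # False
```

Under these assumptions, an edge from the call to the class itself stays correct in both cases, whereas an edge straight to __init__ would be wrong for B.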

tausbn

Overall: really good stuff. I've made a bunch of comments and suggestions, but these are mostly about the documentation. The code itself seems solid. 💪

tausbn · 2020-07-06T12:47:16Z

python/ql/test/experimental/library-tests/CallGraph/CallGraphTest.qll

+}
+
+/** There is an obvious problem with the annotation `name` */
+predicate name_in_error_state(string name) {


Is there a logic to whether the names of these methods are written using underscores or in mixedCase? (Ditto comment_for above. Also maybe some of these should be private.)

nah, that's just me being inconsistent. I have transformed everything into camelCase now. I guess some of them could be private, but seeing as they're in a path that can't be imported from anywhere (since it contains -), I don't really see the point in thinking too much about this.

tausbn · 2020-07-06T12:54:36Z

python/ql/test/experimental/library-tests/CallGraph/README.md

+    ...
+else:
+    ...
+```


I'm not sure I understand the significance of this example (and by extension, the section in which it appears). 🤔

yeah, I can see that it's a little empty 😄 tried to fix it up in a new commit :)

tausbn

I think this looks good. Regarding the whole __init__/__new__ issue, it may be better to verify correctness at the level of dataflow rather than just the call graph (i.e. testing that arguments to classes flow to the correct places for these two special methods).
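A small Python sketch of the runtime behaviour such a dataflow test would need to capture: the arguments at the class call site reach both __new__ and __init__. The `Point` and `NotAPoint` classes and the `seen` dictionary are hypothetical, chosen only to make the flow observable.

```python
seen = {}


class Point:
    def __new__(cls, x, y):
        # The call-site arguments arrive here first.
        seen["__new__"] = (x, y)
        return super().__new__(cls)

    def __init__(self, x, y):
        # The same arguments are then passed on to __init__.
        seen["__init__"] = (x, y)
        self.x = x
        self.y = y


p = Point(1, y=2)
print(seen)  # {'__new__': (1, 2), '__init__': (1, 2)}


class NotAPoint:
    def __new__(cls, value):
        # Returning something that is not an instance of cls skips __init__.
        return value

    def __init__(self, value):
        seen["NotAPoint.__init__"] = value


r = NotAPoint(5)
print(r, "NotAPoint.__init__" in seen)  # 5 False
```

The second class shows one reason a dataflow-level check may be more informative than a call-graph edge alone: whether the arguments reach __init__ depends on what __new__ returns.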

Python: Add annotated call-graph tests

155bbbd

See the added README for in-depth details

RasmusWL added the Python label Jun 24, 2020

RasmusWL requested a review from a team as a code owner June 24, 2020 20:33

tausbn requested changes Jul 6, 2020

RasmusWL and others added 6 commits July 6, 2020 17:21

Python: Fix grammar

acfc62c

Co-authored-by: Taus <tausbn@gmail.com>

Python: Unlimited import depth

849159b

Python: Explain random example

9e252d5

Python: Autoformat

cd8ea78

Python: Disable class instantiation annotation for now

65c4e6c

Adjusting test setup properly requires some deep thinking, and I don't think I'm ready to do that right now. Added a TODO instead.

Python: Consistently use camelCase in annotated call-graph tests

d00e739

RasmusWL requested a review from tausbn July 6, 2020 17:06

tausbn approved these changes Jul 10, 2020

tausbn merged commit df3eb9f into github:master Jul 10, 2020

RasmusWL deleted the python-add-annotated-callgraph-tests branch July 13, 2020 08:41
