Python: Improve various library modeling #6349
Intended for use with dca
Before, results from `dca` would look something like
## + py/meta/alerts/remote-flow-sources-reach
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48
reachable with taint-tracking from RemoteFlowSource
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12
reachable with taint-tracking from RemoteFlowSource
Now it should be easier to spot _what_ actually changed,
since we pretty-print the node.
So it matches the new style we're using in aiohttp/twisted/...
Such that the result of `request.FILES["key"].file.read()` is tainted
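As a rough illustration of the Python code this modeling targets, the sketch below walks the `request.FILES["key"].file.read()` chain one step at a time. `FakeRequest` and `FakeUploadedFile` are hypothetical stand-ins built on the standard library so the snippet runs without Django; they are not Django's real classes.

```python
import io


class FakeUploadedFile:
    """Hypothetical stand-in for one of Django's UploadedFile subclasses."""

    def __init__(self, content: bytes):
        # Like the real class, expose the underlying stream as `.file`.
        self.file = io.BytesIO(content)


class FakeRequest:
    """Hypothetical stand-in for django.http.HttpRequest."""

    def __init__(self, files):
        # In Django this attribute is a MultiValueDict; a plain dict is
        # enough to show the access pattern.
        self.FILES = files


def view(request):
    files = request.FILES      # attribute read on the request
    uploaded = files["key"]    # subscript on the dict-like object
    stream = uploaded.file     # attribute read on the uploaded file
    return stream.read()       # method call returning user-controlled bytes


request = FakeRequest({"key": FakeUploadedFile(b"attacker-controlled")})
result = view(request)
print(result)
```

Every intermediate value in `view` derives from the request, which is why each hop (attribute read, subscript, attribute read, method call) needs to be covered for the final result to count as tainted.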
Since `UploadedFile` is the abstract base class, all real usage would be of one of the subclasses, so I am removing this to avoid giving false hope that it actually works. I don't think investing the time to make this work would add any value, which is why I didn't do it ;)
Like before, omitted ClassInstantiation
Having the additional taint step just next to the other definitions, so everything is together.
InstanceSourceApiNode is a really good idea, but it just happened too soon. I can't do what I need if I have to supply an API-node. So to avoid confusion between deprecating to/from InstanceSource in those classes, I opted to do some major reorganizing as well 👍 Due to aliasing restrictions, I had to use a little trick with the `WerkzeugOld` module.
Also removed a misleading comment link to method on wrong class :D
I know that the TODO about not having the tools to handle `meth = obj.meth; meth()` is outdated now that we have `DataFlow::MethodCallNode`, but I'm planning to deal with that later on ;)
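For readers unfamiliar with the pattern the TODO refers to, here is a minimal Python sketch. The `Greeter` class and its `greet` method are hypothetical names chosen only for illustration.

```python
class Greeter:
    """Hypothetical class used only to show the call pattern."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"hello {self.name}"


obj = Greeter("world")

# Direct form: the attribute lookup and the call happen in one expression.
direct = obj.greet()

# Aliased form: the bound method is stored first and called later.
meth = obj.greet
indirect = meth()

print(direct, indirect)
```

Both calls run the same method on the same object, but a model that only matches the syntactic shape `obj.meth()` would see the first and miss the second.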
It would probably have been easier to do this as the _first_ thing... but that's too late now 😓
Such that it is next to the other class-related predicates (such as `instance()`), the class is called `AdditionalTaintStep`, and it is marked private. I also moved any modeling of attributes while I was at it.
A few stragglers that did not have the same TODO comments as the others
These were written way before the ones in DataFlowPrivate, but apparently didn't cover quite as much :|
I realized that if you ever wanted to change the way taint steps work again, you would have to go to all 117 places where they have been implemented and change EVERY ONE OF THEM :( so I am trying to solve that problem here. Not super happy with the name, but it was the best I could come up with :D
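The idea of defining the step logic once can be sketched loosely in Python. This is only an analogy under stated assumptions, not the QL helper from this PR: `InstanceTaintModel`, `is_taint_step`, and the attribute and method names listed are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceTaintModel:
    """Hypothetical per-class description: only names, no step logic."""

    tainted_attributes: frozenset = frozenset()
    tainted_methods: frozenset = frozenset()


def is_taint_step(model, kind, name):
    """The single place that decides what counts as a taint step."""
    if kind == "attribute":
        return name in model.tainted_attributes
    if kind == "method":
        return name in model.tainted_methods
    return False


# Each modeled class only declares names (assumed here for illustration).
uploaded_file = InstanceTaintModel(
    tainted_attributes=frozenset({"file", "name"}),
    tainted_methods=frozenset({"read"}),
)

print(is_taint_step(uploaded_file, "method", "read"))
```

With this shape, changing how steps work means editing `is_taint_step` once, while the per-class declarations stay untouched.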
yoff left a comment
This is excellent, thanks for shaping up our libraries!
I suppose a by now sort-of-standard name for InstanceTaintStepHelper would be InstanceTaintStepConfiguration, but I am not sure it sends a better signal.
/** A direct instantiation of `django.utils.datastructures.MultiValueDict`. */
private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode {
  override CallNode node;
This override should be unnecessary, given `extends DataFlow::CallCfgNode`.
/** An attribute read on a Django request that is a `MultiValueDict` instance. */
class DjangoHttpRequestMultiValueDictInstances extends Django::MultiValueDict::InstanceSource {
  DjangoHttpRequestMultiValueDictInstances() {
    this.(DataFlow::AttrRead).getObject() = django::http::request::HttpRequest::instance() and
    this.(DataFlow::AttrRead).getAttributeName() in ["GET", "POST", "FILES"]
  }
}
I suggest extending `DataFlow::AttrRead` here.
That would be nice, but `DataFlow::AttrRead` is an abstract class, so it doesn't work that well in practice 😞 We could have applied the `::Range` pattern, but that seems a bit late now 😞
Could we use the new-fangled extends ... instanceof ... instead? (I've still not fully internalised how that affects things.)
Ah, fair enough, then I do not require a change here, although the suggestion by @tausbn is interesting.
/**
 * Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related to `await`.
 */
Hm, reviewing commit-by-commit might not have been the best choice here...
As pointed out in review, we don't need this override any more!
Seeing |
Co-authored-by: Rasmus Wriedt Larsen <rasmuswl@github.com>
It got rather involved in the end, but I think the end result is both a more in-depth modeling, and a better way to model things moving forwards.
Performance test in https://github.com/dsp-testing/RasmusWL-dca/issues/29