-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: support pathlib
#5681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: support pathlib
#5681
Conversation
6cc8a5b to
52a9040
Compare
This because `resolve` accesses the file system, I am open to not include that fact in the modelling.
that were only listed as file sytem accesses.
| // Special handling of the `/` operator | ||
| exists(BinaryExprNode slash, DataFlow::Node left | | ||
| slash.getOp() instanceof Div and | ||
| left.asCfgNode() = slash.getLeft() and | ||
| left.getALocalSource() = pathlibPath() | ||
| | | ||
| nodeTo.asCfgNode() = slash and | ||
| ( | ||
| // type-preserving call | ||
| nodeFrom = left | ||
| or | ||
| // data injection | ||
| nodeFrom.asCfgNode() = slash.getRight() | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend this to also handle if the path is the right operator?
A demonstration:
In [1]: from pathlib import Path
In [2]: "bar" / Path("foo")
Out[2]: PosixPath('bar/foo')There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, thanks for catching this!
Will also support both operands being paths
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions for improvements. Apart from that I think this looks good. 👍
| // Type-preserving call | ||
| exists(DataFlow::AttrRead returnsPath | | ||
| returnsPath.getAttributeName() = pathlibPathMethod() | ||
| | | ||
| nodeTo.(DataFlow::CallCfgNode).getFunction() = returnsPath and | ||
| nodeFrom = returnsPath.getObject() | ||
| ) | ||
| or | ||
| // Type-preserving attribute | ||
| exists(DataFlow::AttrRead isPath | isPath.getAttributeName() = pathlibPathAttribute() | | ||
| nodeTo = isPath and | ||
| nodeFrom = isPath.getObject() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there's a lot of overlap between this and the type tracker. Could we perhaps factor out these steps appropriately, and avoid repeating ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, see if you like...it could potentially also be done for the injection part.
| ) | ||
| ) | ||
| or | ||
| // Data injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit, it's not immediately obvious to me what "Data injection" means here. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that we should (taint-)track data from a non-Path-object (in)to a Path-object.
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the refactoring helped a lot, but I still think it could be improved further. I have added a few suggestions.
| /** | ||
| * Flow for type presering mehtods. | ||
| */ | ||
| private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very generic-sounding name (and QLDoc comment) for something that's specific to pathlib. Perhaps something with pathlib in the name would be better?
(Also, two typos: preserving methods)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I had the generalisation of this in my head...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an issue for this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)
| /** | ||
| * Flow for type presering attributes. | ||
| */ | ||
| private predicate typePreservingAttribute(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my other comment (and also another typo).
| slash.getOp() instanceof Div and | ||
| ( | ||
| pathOperand.asCfgNode() = slash.getLeft() and | ||
| dataOperand.asCfgNode() = slash.getRight() | ||
| or | ||
| pathOperand.asCfgNode() = slash.getRight() and | ||
| dataOperand.asCfgNode() = slash.getLeft() | ||
| ) and | ||
| pathOperand.getALocalSource() = pathlibPath() | ||
| | | ||
| nodeTo.asCfgNode() = slash and | ||
| nodeFrom in [ | ||
| // type-preserving call | ||
| pathOperand, | ||
| // data injection | ||
| dataOperand | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not just boil down to the following?
slash.getOp() instanceof Div and
nodeFrom.asCfgNode() = slash.getAnOperand() and
nodeFrom.getALocalSource() = pathlibPath() and
nodeTo.asCfgNode() = slash(In which case, more sharing with the pathlibPath typetracker might be possible.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost, except the argument of type path lib.Path does not have to be nodeFrom. It still does become much shorter this way, though 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing is a bit tricky for data injection, because the taint step would prefer to refer to the type tracker to limit the set of nodes.
Co-authored-by: Taus <tausbn@github.com>
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the PR again. There are still some points I would like to see addressed, but I don't want this to turn into an eternal PR, so feel free to ignore my suggestions if you disagree with them. We can always clean this up later.
I believe the code is functionally correct.
With that in mind, I've marked this as approved.
| /** | ||
| * Flow for type presering mehtods. | ||
| */ | ||
| private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)
Co-authored-by: Taus <tausbn@github.com>
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's see if it floats. 🚢
Adds support for
pathlib.Paths by tracking these objects with a type tracker and adding appropriate taint steps.This enables a new extension of
FileSystemAccessbase onpathlib.Pathobjects.I included most of the API since the QLDoc for
FileSystemAccessis quite broad, but I am open to dropping calls such asresolveoris_fifowhich may be harder to exploit.Currently missing:
cwdandhome