Ruby: Track flow from splat arguments to positional parameters #13878
f417fca to 5eb2746
Great start! I have a few minor comments.
@@ -300,6 +308,10 @@ private module Cached {
    TSynthHashSplatParameterNode(DataFlowCallable c) {
      isParameterNode(_, c, any(ParameterPosition p | p.isKeyword(_)))
    } or
    TSynthSplatParameterNode(DataFlowCallable c) {
      exists(c.asCallable()) and // exclude library callables
why?
I initially thought it wouldn't be useful to have splat flow into library callables since they have no real method bodies, but of course they can still propagate flow from their positional params and so they shouldn't necessarily be excluded here.
But the other issue is I got some consistency test failures when I didn't exclude them - see here:
Cannot find DataFlowConsistency.expected file.
--- expected
+++ actual
@@ -1,1 +1,6 @@
-
+reverseRead
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
[5/8 comp 15.4s eval 168ms] FAILED(RESULT) /Users/hmac/src/codeql/ruby/ql/test/library-tests/dataflow/hash-flow/CONSISTENCY/DataFlowConsistency.ql
I don't intuitively understand why the origin of a read step would always need a PostUpdateNode - we're not updating anything? Particularly for these read steps, it doesn't seem relevant.
> I initially thought it wouldn't be useful to have splat flow into library callables since they have no real method bodies, but of course they can still propagate flow from their positional params and so they shouldn't necessarily be excluded here.
We want to make sure that library callables involving positional summaries also work when arguments are passed in through a splat argument. E.g. if returnFst is modeled to return its first positional parameter, then we want there to be flow from x to the result of returnFst(*[x,y]).
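A minimal runnable sketch of that scenario. Here `return_fst` is a hypothetical plain-Ruby stand-in for the modeled library method `returnFst`, and the string values are placeholders:

```ruby
# Hypothetical stand-in for a library method whose flow summary says
# "the first positional argument flows to the return value".
def return_fst(first, _second)
  first
end

x = "tainted"
y = "clean"

# The splat spreads the array over the positional parameters,
# so `first` is bound to x and the call returns x itself.
result = return_fst(*[x, y])
puts result.equal?(x)
```

At runtime the result is the very object `x`, which is the flow the analysis should also report when the argument arrives through a splat.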
> But the other issue is I got some consistency test failures when I didn't exclude them
Ok, it looks like we actually need to synthesize a post-update splat parameter node as well (the same goes for hash splat parameters). Consider:
def foo(x, y)
  x[0] = y
end
a = [0]
args = [a, taint]
foo(*args)
sink(args[0][0])
I suggest we do this as a follow-up, so let's keep the restriction for library callables for now.
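To make the need for a post-update node concrete, the example above can be run with placeholder values. This is only a sketch: `taint` is a hypothetical string standing in for a tainted value, and `sink` is replaced by a plain check.

```ruby
# `taint` is a hypothetical placeholder for a tainted value.
taint = "tainted"

def foo(x, y)
  x[0] = y # writes through the first positional parameter
end

a = [0]
args = [a, taint]
foo(*args)

# x is the same object as args[0], so the write inside foo is
# visible through the splatted array after the call returns.
puts args[0][0].equal?(taint)
```

The write inside `foo` reaches `args[0][0]`, so the updated state has to flow back out through the splat argument.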
@@ -1408,6 +1425,11 @@ predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) {
  ppos.isAnyNamed() and apos.isKeyword(_)
  or
  apos.isAnyNamed() and ppos.isKeyword(_)
  or
Perhaps move these disjuncts up after the ppos.isSplatAll() and apos.isSplatAll() case, to keep cases involving splats together.
or
exists(int n | n > 0 |
  parameter = callable.getParameter(n).(SplatParameter) and
  pos.isSplat(n)
Don't we need not exists(callable.getParameter(n+1)) here? Otherwise, I would think that in
def foo(a, *splats, b)
end
foo(0, *[1, 2])
both 1 and 2 get passed into the splat parameter.
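For comparison, this is how Ruby itself binds the arguments at runtime: required parameters on both sides are filled first, and the splat parameter receives whatever is left. The sketch returns the bindings so they can be inspected:

```ruby
# Returns the bindings so they can be inspected.
def foo(a, *splats, b)
  [a, splats, b]
end

a, splats, b = foo(0, *[1, 2])
p [a, splats, b] # => [0, [1], 2]
```

Only `1` ends up in `splats`, while `2` is bound to `b`, so a model that sends both values into the splat parameter over-approximates.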
The tradeoff there is that we will not get flow from 1 into splats[0] - we're effectively only handling flow into splat parameters that are the final parameter in the method. This might be fine, since splat parameters are rarely followed by positional params, and future work will improve this further. I suggest we add not exists(SimpleParameter p, int m | m > n | p = callable.getParameter(m)) to rule out cases where a positional parameter follows the splat parameter. What do you think?
Thinking about this some more - that makes sense, since we're only handling exact splat position matches at the moment. I'll push up a change that does this.
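The condition being agreed on here can be illustrated in plain Ruby using `Method#parameters`. This is only an analogy to the QL restriction, and `splat_is_last_positional?` is a hypothetical helper name:

```ruby
# Hypothetical helper, not part of the CodeQL library: true when the method
# has a splat parameter and no required positional parameter follows it.
def splat_is_last_positional?(meth)
  kinds = meth.parameters.map(&:first)
  rest_index = kinds.index(:rest)
  return false if rest_index.nil?

  !kinds.drop(rest_index + 1).include?(:req)
end

def trailing_splat(a, *splats); end
def middle_splat(a, *splats, b); end

puts splat_is_last_positional?(method(:trailing_splat)) # => true
puts splat_is_last_positional?(method(:middle_splat))   # => false
```

Under the restriction, only methods like `trailing_splat` would have their splat parameter matched against a splat argument.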
 *
 * Then `getAParameter(element 0) = x` and `getAParameter(element 1) = y`.
 */
ParameterNode getAParameter(ContentSet c) {
Don't we need to include unknowns here as well? I.e., if we have a splat argument with a value v at some unknown index, then v could in principle end up in all positional parameters of callable.
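A small sketch of the unknown-index case. The helper `call_with_value_at` and the symbols are hypothetical; `index` stands for a value that cannot be determined statically:

```ruby
def foo(x, y)
  [x, y]
end

# `index` stands for a value the analysis cannot determine statically.
def call_with_value_at(index, value)
  args = [:other, :other]
  args[index] = value
  foo(*args)
end

p call_with_value_at(0, :v) # => [:v, :other]
p call_with_value_at(1, :v) # => [:other, :v]
```

Depending on the index, the same value reaches either `x` or `y`, so content stored at an unknown index may need to flow to every positional parameter.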
I haven't considered unknowns at all yet - I'll take a look and see.
This models flow in the following case:
def foo(x, y)
  sink x # 1
  sink y # 2
end
args = [source(1), source(2)]
foo(*args)
We do this by introducing a SynthSplatParameterNode which accepts
content from the splat argument, if one is given at the callsite.
From this node we add read steps to each positional parameter.
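The idea can be pictured with a toy Ruby model. This is not the QL implementation, and all names in it are hypothetical: the synthesized node is represented as a hash from element index to value, and each positional parameter reads the element at its own index.

```ruby
# Toy model, not CodeQL: `splat_content` plays the role of the content
# stored in the synthesized splat parameter node, keyed by element index.
def read_steps(splat_content, positional_params)
  positional_params.each_with_index.map do |param, i|
    # One "read step": parameter i reads element i of the splat content.
    [param, splat_content[i]]
  end.to_h
end

splat_content = { 0 => "source 1", 1 => "source 2" }
bindings = read_steps(splat_content, [:x, :y])
p bindings
```

In this model `x` receives the value at element 0 and `y` the value at element 1, mirroring the sinks marked 1 and 2 in the example above.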
In cases where there are positional parameters after a splat parameter, don't attempt to match the splat parameter to a splat argument. We need more sophisticated modelling to handle these cases, which is future work.