Ruby: Track flow from splat arguments to positional parameters #13878

Open
wants to merge 6 commits into base: main

hmac
Contributor

@hmac hmac commented Aug 3, 2023

This models flow in the following case:

def foo(x, y)
  sink x # 1
  sink y # 2
end

args = [source(1), source(2)]
foo(*args)

We do this by introducing a SynthSplatParameterNode which accepts
content from the splat argument, if one is given at the callsite.
From this node we add read steps to each positional parameter.
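
As a runtime illustration of the flow being modelled, here is a minimal sketch. The `source` and `sink` helpers below are hypothetical stand-ins written only so the snippet runs; they are not the definitions used by the CodeQL test suite.

```ruby
# Hypothetical stand-ins for the source/sink markers in the description.
def source(n)
  "tainted-#{n}"
end

def sink(value)
  puts "sink reached by #{value.inspect}"
  value
end

def foo(x, y)
  # x is bound to element 0 of the splat argument, y to element 1.
  [sink(x), sink(y)]
end

args = [source(1), source(2)]
result = foo(*args)
```

Element 0 of `args` reaches the first sink and element 1 reaches the second, which is the per-index read step the description refers to.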

@github-actions github-actions bot added the Ruby label Aug 3, 2023
@hmac hmac force-pushed the splat-flow branch 4 times, most recently from f417fca to 5eb2746 August 8, 2023 08:05
@hmac hmac changed the title from "Ruby: Flow through splat arguments/params" to "Ruby: Track flow from splat arguments to positional parameters" Aug 8, 2023
@hmac hmac marked this pull request as ready for review August 8, 2023 15:26
@hmac hmac requested a review from a team as a code owner August 8, 2023 15:26
Contributor

@hvitved hvitved left a comment

Great start! I have a few minor comments.

@@ -300,6 +308,10 @@ private module Cached {
    TSynthHashSplatParameterNode(DataFlowCallable c) {
      isParameterNode(_, c, any(ParameterPosition p | p.isKeyword(_)))
    } or
    TSynthSplatParameterNode(DataFlowCallable c) {
      exists(c.asCallable()) and // exclude library callables

Contributor

why?

Contributor Author

I initially thought it wouldn't be useful to have splat flow into library callables since they have no real method bodies, but of course they can still propagate flow from their positional params and so they shouldn't necessarily be excluded here.

But the other issue is I got some consistency test failures when I didn't exclude them - see here:

Cannot find DataFlowConsistency.expected file.
--- expected
+++ actual
@@ -1,1 +1,6 @@
-
+reverseRead
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
+| file://:0:0:0:0 | synthetic *args | Origin of readStep is missing a PostUpdateNode. |
[5/8 comp 15.4s eval 168ms] FAILED(RESULT) /Users/hmac/src/codeql/ruby/ql/test/library-tests/dataflow/hash-flow/CONSISTENCY/DataFlowConsistency.ql

I don't intuitively understand why the origin of a read step would always need a PostUpdateNode - we're not updating anything? Particularly for these read steps, it doesn't seem relevant.

Contributor

> I initially thought it wouldn't be useful to have splat flow into library callables since they have no real method bodies, but of course they can still propagate flow from their positional params and so they shouldn't necessarily be excluded here.

We want to make sure that library callables involving positional summaries also work when arguments are passed in through a splat argument. E.g. if returnFst is modeled to return its first positional parameter, then we want there to be flow from x to the result of returnFst(*[x,y]).
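
A small runtime sketch of that expectation follows. `return_fst` is a hypothetical stand-in for a library method summarized as returning its first positional parameter; it is written in plain Ruby here only so the example runs.

```ruby
# Hypothetical stand-in for a library callable modeled as
# "returns its first positional parameter".
def return_fst(first, _second)
  first
end

x = "tainted"
y = "clean"

# The first element of the splatted array is bound to `first`,
# so the call returns the very object that `x` refers to.
result = return_fst(*[x, y])
```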

> But the other issue is I got some consistency test failures when I didn't exclude them

Ok, it looks like we actually need to synthesize a post-update splat parameter node as well (the same goes for hash splat parameters). Consider:

def foo(x, y)
  x[0] = y
end

a = [0]
args = [a, taint]
foo(*args)
sink(args[0][0])

I suggest we do this as a follow-up, so let's keep the restriction for library callables for now.
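
The example above can be run as plain Ruby to see why a post-update node is needed: the write inside `foo` is visible through `args` after the call. In this sketch `taint` is an ordinary string standing in for a tainted value, and `leaked` is an illustrative name.

```ruby
def foo(x, y)
  # Mutates the array that element 0 of the splat argument refers to.
  x[0] = y
end

taint = "tainted"
a = [0]
args = [a, taint]
foo(*args)

# The splat passes the same array object, so the update made inside
# foo can be observed through args once the call returns.
leaked = args[0][0]
```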

@@ -1408,6 +1425,11 @@ predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) {
  ppos.isAnyNamed() and apos.isKeyword(_)
  or
  apos.isAnyNamed() and ppos.isKeyword(_)
  or

Contributor

Perhaps move these disjuncts up after the ppos.isSplatAll() and apos.isSplatAll() case, to keep cases involving splats together.

  or
  exists(int n | n > 0 |
    parameter = callable.getParameter(n).(SplatParameter) and
    pos.isSplat(n)

Contributor

Don't we need not exists(callable.getParameter(n+1)) here? Otherwise, I would think that in

def foo(a, *splats, b)
end

foo(0, *[1, 2])

both 1 and 2 get passed into the splat parameter.
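
For comparison, this is how Ruby itself binds the arguments at run time: required parameters are filled first, and the splat parameter receives whatever is left over. The hash return value is only there to make the binding visible.

```ruby
def foo(a, *splats, b)
  # Return the bindings so they can be inspected.
  { a: a, splats: splats, b: b }
end

bound = foo(0, *[1, 2])
p bound
```

Here only `1` ends up in `splats`; `2` is bound to the trailing positional parameter `b`.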

Contributor Author

@hmac hmac Aug 9, 2023

The tradeoff there is that we will not get flow from 1 into splats[0] - we're effectively only handling flow into splat parameters that are the final parameter in the method. This might be fine, as rarely are splat parameters followed by positional params, and some future work will improve this further. I suggest we add not exists(SimpleParameter p, int m | m > n | p = callable.getParameter(m)) to rule out cases where a positional parameter follows the splat parameter. What do you think?

Contributor Author

Thinking about this some more - that makes sense, since we're only handling exact splat position matches at the moment. I'll push up a change that does this.
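
As a rough analogy for the proposed restriction, the same check can be sketched in plain Ruby with `Method#parameters`. This is not the QL predicate from the PR; `splat_is_last_positional?` and the three sample methods are hypothetical names used only for illustration.

```ruby
# True when the method has a splat parameter and no required positional
# parameter comes after it. Ruby does not allow optional parameters after
# a splat, so only :req entries need to be checked.
def splat_is_last_positional?(meth)
  params = meth.parameters
  rest_index = params.index { |kind, _name| kind == :rest }
  return false if rest_index.nil?

  params[(rest_index + 1)..].none? { |kind, _name| kind == :req }
end

def trailing_positional(a, *splats, b); end
def splat_last(a, *splats); end
def splat_then_keyword(a, *splats, key: nil); end

excluded     = splat_is_last_positional?(method(:trailing_positional))
handled      = splat_is_last_positional?(method(:splat_last))
keyword_only = splat_is_last_positional?(method(:splat_then_keyword))
```

Under this sketch, only methods like `trailing_positional` would be skipped; a keyword parameter after the splat does not count as positional.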

   *
   * Then `getAParameter(element 0) = x` and `getAParameter(element 1) = y`.
   */
  ParameterNode getAParameter(ContentSet c) {

Contributor

Don't we need to include unknowns here as well? I.e., if we have a splat argument with a value v at some unknown index, then v could in principle end up in all positional parameters of callable.
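
A minimal sketch of the unknown-index case, with hypothetical helper names: the position of the tainted value is only decided at run time, so the same call site can deliver it to either positional parameter.

```ruby
def foo(x, y)
  [x, y]
end

# The index is a run-time value, so a static analysis would have to treat
# the tainted element as sitting at an unknown position in the array.
def call_with_taint_at(index)
  args = ["clean", "clean"]
  args[index] = "tainted"
  foo(*args)
end

first  = call_with_taint_at(0)
second = call_with_taint_at(1)
```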

Contributor Author

@hmac hmac Aug 9, 2023

I haven't considered unknowns at all yet - I'll take a look and see.

This models flow in the following case:

    def foo(x, y)
      sink x # 1
      sink y # 2
    end

    args = [source(1), source(2)]
    foo(*args)

We do this by introducing a SynthSplatParameterNode which accepts
content from the splat argument, if one is given at the callsite.
From this node we add read steps to each positional parameter.
In cases where there are positional parameters after a splat parameter,
don't attempt to match the splat parameter to a splat argument. We need
more sophisticated modelling to handle these cases, which is future
work.