Add neighborhood scope token feature to ATM library #7158
base: main
Force-pushed from 7f6b9e4 to 748443a
Suggestions for names for this feature (please add to this)
A few thoughts from @tiferet:
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Force-pushed from a14a636 to 0dc2cef
This provides functionality for getting the token features associated with a neighborhood around an AST node. It is closely related to `FunctionBodies`.
Co-authored-by: Chris Smowton <smowton@github.com>
Co-authored-by: Tiferet Gazit <tiferet@github.com>
Force-pushed from 0dc2cef to 05460f6
// approximates the behavior of the classifier on non-generic body features where large body
// features are replaced by the absent token.
if count(DatabaseFeatures::AstNode node, string token | bodyTokens(rootNode, node, token)) > 256
if
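The comment in this excerpt says that large body features are replaced by the absent token once a body has more than 256 tokens. The following is a toy JavaScript model of that rule, not the QL code. `bodyFeature` and the constants are hypothetical names. The `<ABSENT>` spelling is taken from later in this thread. Joining tokens with single spaces is an assumption about how a body feature is rendered.

```javascript
// Toy model (hypothetical names) of the size cap described in the QL comment.
const MAX_BODY_TOKENS = 256; // threshold from the QL excerpt
const ABSENT_TOKEN = "<ABSENT>"; // spelling used in the review discussion

// Assumption: a body feature is its tokens joined by single spaces.
function bodyFeature(tokens) {
  // Bodies with more than MAX_BODY_TOKENS tokens are replaced by the absent token.
  if (tokens.length > MAX_BODY_TOKENS) {
    return ABSENT_TOKEN;
  }
  return tokens.join(" ");
}

console.log(bodyFeature(["if", "(", "x", ")"])); // prints "if ( x )"
console.log(bodyFeature(new Array(257).fill("t"))); // prints "<ABSENT>"
```

The comparison is a strict `>`, as in the QL, so a body of exactly 256 tokens is kept.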
Do we actually need getBodyTokenFeatureForNeighborhoodNode, or can we reuse getBodyTokenFeatureForEntity?
if getNumDescendents(node.getParentNode()) > maxNumDescendants()
if
  // `node` will always have a parent as we start at an endpoint
  node.getParentNode() = getOutermostEnclosingFunction(node) or
Doesn't this mean the neighborhood can never be the entire enclosing function? We could instead do:
if
node = getOutermostEnclosingFunction(node) or
...
Yep, and this does happen in this example: https://github.com/wilsto/BoardOS/blob/develop/client/app/KPI/KPI.controller.js#L96-L100
This emulates what happens in getTokenBodyFeatureForEntity, which also returns the function body but not the top-level function AST node (i.e. the function name and parameters, I think).
In the above example, neighborhoodBody is <ABSENT> but enclosingFunctionBody is also really short (and would be identical to neighborhoodBody).
I have a few questions about this:
- If this emulates what happens in `getTokenBodyFeatureForEntity`, then why does `neighborhoodBody` end up `<ABSENT>` while `enclosingFunctionBody` is short but not absent?
- Why would we want `neighborhoodBody` to be `<ABSENT>` rather than being short and identical to `enclosingFunctionBody`? There could still be useful signal in the short sequence. Different features have different paths through the network (different parameters are learned for each), so having identical values in some instances isn't redundant. Also, we're hoping we may be able to replace the full function body with these more localized features eventually.
then result = node
else result = getNeighborhoodAstNode(node.getParentNode())
}
/** Count number of descendants of an AST node */
int getNumDescendents(Raw::AstNode node) { result = count(node.getAChildNode*()) }
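The excerpts above walk up from an endpoint one parent at a time. The walk stops when the parent is the outermost enclosing function or, in the earlier revision, when the parent has more than `maxNumDescendants()` descendants. Below is a toy JavaScript model of that walk, not the library code. `AstNode`, `numDescendants` and `neighborhoodNode` are hypothetical names. The threshold is passed in as a parameter because its actual value isn't shown here. Combining the two stop conditions with `or` assumes that the elided continuation of the newer `if` is the size check.

```javascript
// Hypothetical toy AST node with parent and child links.
class AstNode {
  constructor(label, children = []) {
    this.label = label;
    this.parent = null;
    this.children = children;
    for (const child of children) {
      child.parent = this;
    }
  }
}

// Models count(node.getAChildNode*()): `*` is reflexive, so the node counts itself.
function numDescendants(node) {
  let total = 1;
  for (const child of node.children) {
    total += numDescendants(child);
  }
  return total;
}

// Models the recursive walk in getNeighborhoodAstNode. Assumption: the stop
// condition is "parent is the outermost function OR parent is too large".
function neighborhoodNode(node, outermostFunction, maxNumDescendants) {
  const parent = node.parent; // non-null, since we start at an endpoint inside a function
  if (parent === outermostFunction || numDescendants(parent) > maxNumDescendants) {
    return node;
  }
  return neighborhoodNode(parent, outermostFunction, maxNumDescendants);
}

// Example tree: function -> [params, body -> [if -> [endpoint, call, return], var, var, var]]
const endpoint = new AstNode("endpoint");
const ifStmt = new AstNode("if", [endpoint, new AstNode("call"), new AstNode("return")]);
const body = new AstNode("body", [ifStmt, new AstNode("var"), new AstNode("var"), new AstNode("var")]);
const fn = new AstNode("function", [new AstNode("params"), body]);

console.log(neighborhoodNode(endpoint, fn, 5).label); // prints "if"
console.log(neighborhoodNode(endpoint, fn, 100).label); // prints "body"
```

With a generous threshold, the walk stops at `body` and never returns the function node itself. This is the behavior questioned in the review thread on the stop condition.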
private ASTNode getContainer(ASTNode node) {
OOI, what's the difference between a container and a parent?
Containers skip up to an enclosing function, whereas parents step one level up the AST graph. For example, in
  f() {
    if(endpoint) {
      …
    }
  }
the container is f() but the parent is if(…).
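To illustrate the distinction, here is a toy JavaScript model, not the QL implementation. `AstNode`, `parentOf`, `containerOf` and `outermostFunction` are hypothetical names. Treating top-level code as having no container is a simplification. Whether the library's outermost-function lookup works this way is what the follow-up questions ask, so treat this as one possible reading.

```javascript
// Hypothetical toy AST node; `isFunction` marks function nodes.
class AstNode {
  constructor(label, isFunction, children = []) {
    this.label = label;
    this.isFunction = isFunction;
    this.parent = null;
    for (const child of children) {
      child.parent = this;
    }
  }
}

// Parent: exactly one step up the AST.
function parentOf(node) {
  return node.parent;
}

// Container: the nearest enclosing function, skipping non-function ancestors.
// Simplification: top-level code has no container here (returns null).
function containerOf(node) {
  let current = node.parent;
  while (current !== null && !current.isFunction) {
    current = current.parent;
  }
  return current;
}

// Following containers repeatedly (cf. getContainer*) ends at the outermost function.
function outermostFunction(node) {
  let result = null;
  for (let c = containerOf(node); c !== null; c = containerOf(c)) {
    result = c;
  }
  return result;
}

// The example from the comment, wrapped in a hypothetical outer function.
const endpoint = new AstNode("endpoint", false);
const ifStmt = new AstNode("if", false, [endpoint]);
const f = new AstNode("f", true, [ifStmt]);
const outer = new AstNode("outer", true, [f]);

console.log(parentOf(endpoint).label); // prints "if"
console.log(containerOf(endpoint).label); // prints "f"
console.log(outermostFunction(endpoint).label); // prints "outer"
```

One container step gives `f`, the nearest function. Iterating it, as a closure like `getContainer*` would, reaches `outer`. That may be why the closure is needed when functions are nested.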
But then why do we need getContainer*?
If an endpoint is enclosed in a function that's enclosed in a function, do the enclosingFunction features look at the outermost function?
@@ -8,6 +8,12 @@ import javascript
import CodeToFeatures
import EndpointScoring

/** Maximum number of descendants of an AST node to be considered to be in the "neighborhood" of that node */
Before running the dev pipeline, do we want to produce several different features with different values of maxNumDescendants, so we can experiment and see which give good signal?
That sounds sensible, yes.
codeql, rather than a fork. This makes testing these changes in `backend` much more straightforward.
This PR takes the work in https://github.com/github/ml-ql-adaptive-threat-modeling-backend/tree/annarailton/neighbourhood-features and moves it into the CodeQL library.
The feature `neighborhoodBody` is now a token feature extracted in `ExtractEndpointData.ql`, alongside features such as `enclosingFunctionBody`.
See github/ml-ql-adaptive-threat-modeling#1553