Dataflow: Refactor FlowState to be paired with Node #18633

aschackmull · 2025-01-30T11:02:06Z

This PR refactors the representation of FlowState in the data flow library. Stage 1 is extracted to its own file; subsequent stages then use either NodeEx directly or a (NodeEx, FlowState) pair as the node type, which eliminates the FlowState column.
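To make the pairing idea concrete, here is a minimal sketch and not the code from this PR. The classes NodeEx and FlowState below are toy stand-ins backed by int and string, and stage1Reachable, TNodeAndState, TMkNodeAndState and NodeAndState are hypothetical names chosen for illustration. It should compile as a standalone query, assuming no data flow library is imported alongside it.

```ql
// Toy stand-ins: the real NodeEx and FlowState come from the data flow
// library and the configuration, not from primitive types.
class NodeEx extends int {
  NodeEx() { this in [1 .. 3] }
}

class FlowState extends string {
  FlowState() { this = ["clean", "tainted"] }
}

// Hypothetical stand-in for "stage 1 kept `node` in `state`".
predicate stage1Reachable(NodeEx node, FlowState state) {
  node = 1 and state = "clean"
  or
  node = [2, 3] and state = "tainted"
}

// The pair type: one element per (node, state) combination kept by stage 1.
// A later stage can use this as its node type and drop its state column.
newtype TNodeAndState =
  TMkNodeAndState(NodeEx node, FlowState state) { stage1Reachable(node, state) }

class NodeAndState extends TNodeAndState {
  NodeEx getNodeEx() { this = TMkNodeAndState(result, _) }

  FlowState getState() { this = TMkNodeAndState(_, result) }

  string toString() {
    result = this.getNodeEx().toString() + " [" + this.getState() + "]"
  }
}

from NodeAndState n
select n, n.getNodeEx(), n.getState()
```

Restricting the newtype branch to pairs that stage 1 kept is one way to bound the number of constructed elements; whether the PR does exactly this is not stated in the description.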

shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll

+        )
+      }
+
+      private predicate partialPathOutOfCallable0(


shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll

+      }
+
+      pragma[noinline]
+      private predicate partialPathThroughCallable0(


aschackmull · 2025-01-30T13:03:29Z

Commit-by-commit review encouraged, but the second commit really needs to be viewed with git diff --minimal as that saves about 6000 lines of superfluous diff output.

hvitved

Nice refactor, trivial comments only. Also, performance needs to be fixed.

hvitved · 2025-01-30T13:33:57Z

shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll

+      /* End: Stage 1 logic. */
+    }
+
+    private module Stage1Common {


Should this be moved up before the Stage1 module, perhaps?

aschackmull

It contains a mix of input and output predicates relative to the Stage1 module. It's mostly output, though, which is why I put it after.

hvitved · 2025-01-30T13:34:23Z

shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll

+
+      private class Cc = boolean;
+
+      /* Begin: Stage 1 logic. */


This comment (and the end comment) is perhaps redundant now?

aschackmull

Yes, I'll remove them.

hvitved · 2025-01-31T09:22:56Z

shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll

@@ -169,4237 +171,2703 @@ module MakeImpl<LocationSig Location, InputSig<Location> Lang> {
  /**
   * Constructs a data flow computation given a full input configuration.


QL doc should probably also mention the Stage1 parameter.

asgerf · 2025-01-31T12:52:22Z

Wouldn't it be cheaper and simpler to pair it up with the access path instead? I imagine the TNil case storing the FlowState (with similar adaptions to the various AP approximations):

newtype TAccessPath =
  TNil(FlowState state) { ...} or
  TCons(Content head, TAccessPath tail) { ... }

aschackmull · 2025-01-31T13:52:09Z

> Wouldn't it be cheaper and simpler to pair it up with the access path instead?

I don't think so. It might be cheaper in terms of number of constructed elements of IPA types, but I definitely don't think it's simpler. Firstly, doing so would be a semantic change in all the cases where we aren't tracking an exact access path. Secondly, the tracking of FlowStates would then be different in each stage and that would mean that stages 2-6 would still need to care about and handle FlowState, whereas the point of the present refactor is that we can handle FlowState once and for all between stages 1 and 2, and then the bulk of the data flow library simply doesn't need to ever care about FlowState, which I think is a major improvement in terms of reducing the complexity of the library. In particular, it has become much easier to see that the stateful in- and out-barriers are now handled correctly. That's a huge pain in the current implementation.
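The "handle FlowState once, between stages 1 and 2" argument can be illustrated with a small sketch. This is not the library's implementation: NodeEx and FlowState are toy stand-ins, and localStep, stateStep, stateBarrier, pairStep and reach are hypothetical names. The single stateBarrier predicate is a simplification of the stateful in- and out-barriers mentioned above.

```ql
// Toy stand-ins for the library's node and state types.
class NodeEx extends int {
  NodeEx() { this in [1 .. 4] }
}

class FlowState extends string {
  FlowState() { this = ["a", "b"] }
}

// Hypothetical configuration: a state-preserving step, a state-changing
// step, and a barrier that only applies in one state.
predicate localStep(NodeEx n1, NodeEx n2) { n2 = n1 + 1 }

predicate stateStep(NodeEx n1, FlowState s1, NodeEx n2, FlowState s2) {
  n1 = 2 and s1 = "a" and n2 = 3 and s2 = "b"
}

predicate stateBarrier(NodeEx n, FlowState s) { n = 3 and s = "a" }

// Barred (node, state) combinations never become pair elements.
newtype TNodeAndState =
  TMkNodeAndState(NodeEx n, FlowState s) { not stateBarrier(n, s) }

class NodeAndState extends TNodeAndState {
  NodeEx getNodeEx() { this = TMkNodeAndState(result, _) }

  FlowState getState() { this = TMkNodeAndState(_, result) }

  string toString() {
    result = this.getNodeEx().toString() + " [" + this.getState() + "]"
  }
}

// The lifted step relation is the only place that still mentions FlowState.
predicate pairStep(NodeAndState p1, NodeAndState p2) {
  localStep(p1.getNodeEx(), p2.getNodeEx()) and p1.getState() = p2.getState()
  or
  stateStep(p1.getNodeEx(), p1.getState(), p2.getNodeEx(), p2.getState())
}

// A "later stage" written purely against pairs: no separate state column.
predicate reach(NodeAndState p) {
  p.getNodeEx() = 1 and p.getState() = "a"
  or
  exists(NodeAndState mid | reach(mid) and pairStep(mid, p))
}

from NodeAndState p
where reach(p)
select p
```

In this sketch, reach carries one column instead of two, and the barrier is enforced by construction of the pair type rather than by a check repeated in every stage.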

aschackmull · 2025-02-05T08:04:35Z

DCA looks good now.

hvitved · 2025-02-05T08:09:34Z

There is an impressive JS speedup on microsoft__vscode; is that because of db1ed67? (cc @asgerf).

aschackmull · 2025-02-05T08:43:19Z

> There is an impressive JS speedup on microsoft__vscode; is that because of db1ed67? (cc @asgerf).

I think it might simply be due to reduced memory pressure, since we've removed a column from most predicates. Impressive nonetheless.
Additionally, a few predicates now track state, which they didn't before, and that yields a very minor precision improvement that reduces the tuple count a little. Testing locally, I couldn't get the same speedup, but that may be because I simply have more RAM available.

asgerf · 2025-02-05T08:47:36Z

> There is an impressive JS speedup on microsoft__vscode; is that because of db1ed67?

That's great news! The stage timings seem to indicate js/insecure-randomness and to a lesser degree js/xss as the main contributors. Insecure randomness has long been a problematic query for that project, so it would be interesting to find out why it became faster.

aschackmull · 2025-02-05T10:25:37Z

I just checked js/insecure-randomness and looked at it with the "Compare Performance" feature in VS Code. The largest predicate, Stage4::fwdFlow1, computes fewer tuples for some reason, but otherwise it looks fairly similar, so I think reduced RAM pressure is the main factor in the speedup. That query computes a huge number of tuples in stage 4, even more than js/xss.

github-actions bot added the DataFlow Library label Jan 30, 2025

github-advanced-security bot found potential problems Jan 30, 2025

aschackmull force-pushed the dataflow/refactor-flowstate branch from 1a930cc to dd0a07e Compare January 30, 2025 11:43

aschackmull marked this pull request as ready for review January 30, 2025 12:06

aschackmull added the no-change-note-required This PR does not need a change note label Jan 30, 2025

hvitved requested changes Jan 31, 2025

aschackmull requested a review from a team as a code owner January 31, 2025 14:00

github-actions bot added the JS label Jan 31, 2025

aschackmull added 14 commits February 4, 2025 10:46

Dataflow: Rename signature to preempt name clash.

02a81a0

Dataflow: Move Stage1 to its own file. Stick flow exploration in there as well.

04db61a

Dataflow: Remove superfluous constraint.

3cbf8e5

Dataflow: Move definition of toNormalSinkNode.

d5759a7

Dataflow: Parameterise stages 2-6 over the node type.

1799bf9

Dataflow: Prepare a (node,state) pair type.

1166aa6

Dataflow: Use (node,state) pair as node type in stage 2+.

b4197b0

Dataflow: Minor cleanup.

e0cb70a

Dataflow: Rename two predicates to remove need for alias defs.

b2d42ee

Dataflow: Avoid duplication in fwdFlow1 disjunction.

2597ef6

Dataflow: Remove unused predicate.

e55130e

JS: Simplify config in PrototypePollutingFunction.ql.

db1ed67

Dataflow: Fixup some qldoc.

da34c0b

Dataflow: Fix join-order issue.

73d7250

aschackmull force-pushed the dataflow/refactor-flowstate branch from 72ca71e to 73d7250 Compare February 4, 2025 09:48

hvitved approved these changes Feb 5, 2025

aschackmull merged commit bcec7ee into github:main Feb 5, 2025
35 checks passed

aschackmull deleted the dataflow/refactor-flowstate branch February 5, 2025 08:43
