Traversal optimizations #7244

galvana · 2026-01-17T01:16:46Z

Description Of Changes

This PR significantly improves the performance of graph traversal operations by replacing O(N²) algorithms with O(N+E) alternatives.

Performance Improvements

Optimization	Before	After	Improvement
`traverse()` in traversal.py	O(N²)	O(N+E)	~100x faster at 3000 nodes
`compute_all_descendants()`	O(N²)	O(N+E)	~100x faster at 3000 nodes
Dataset-level `after` deps	O(N²)	O(N)	Linear lookup

Key Changes:

traverse() in traversal.py - O(N²) → O(N+E)
- Legacy used MatchingQueue.pop_first_match() which scans the queue on each iteration
- New version uses Kahn's algorithm with in-degree tracking for optimal complexity
compute_all_descendants() in create_request_tasks.py - O(N²) → O(N+E)
- Was calling networkx.descendants() for each node in a loop
- Now pre-computes all descendants in a single reverse topological pass
Dataset-level after dependencies - O(N²) → O(N)
- Was scanning all nodes to find collections in a dataset
- Now uses pre-built _collections_by_dataset index for O(1) lookups

Performance comparison

Collections	Legacy O(N²)	Optimized O(N+E)	Speedup
100	~12 ms	~1 ms	12x
1,000	~1.1 s	~7 ms	161x
5,000	~27 s	~30 ms	900x
10,000	~2.2 min	~80 ms	1,680x
25,000	~14 min*	~500 ms	1,688x
50,000	~57 min*	~1 s	3,415x

* Extrapolated from O(N²) growth curve

Code Changes

Added optimized traverse() method using Kahn's algorithm with in-degree tracking
Added compute_all_descendants() function for O(N+E) descendant computation
Pre-indexed edges by node address (edges_by_node) for O(1) edge lookups
Pre-built dataset-to-collections index (_collections_by_dataset) for O(1) lookups
Added skip_verification parameter to BaseTraversal to avoid redundant traversal during reachability checks
Retained _traverse_legacy() method for test comparison purposes

Tests

New test file test_traversal_optimization_comparison.py validates that the optimized algorithm produces identical results to the legacy implementation:

Test Class	Purpose
`TestTraversalComparison`	Validates equivalence on deterministic graph topologies (linear chains, star graphs, diamond graphs, graphs with `after` dependencies)
`TestRandomGraphEquivalence`	Validates equivalence on randomly generated graphs with varying densities and sizes
`TestTraversalErrorEquivalence`	Ensures both algorithms raise the same errors for invalid graphs (disconnected nodes, unreachable collections)

Steps to Confirm

Existing tests should pass.

Pre-Merge Checklist

vercel · 2026-01-17T01:16:52Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
fides-plus-nightly	Ready	Preview, Comment	Jan 29, 2026 1:01am

1 Skipped Deployment

Project	Deployment	Review	Updated (UTC)
fides-privacy-center	Ignored		Jan 29, 2026 1:01am

greptile-apps · 2026-01-24T07:05:43Z

Greptile Overview

Greptile Summary

This PR implements significant performance optimizations for graph traversal operations, achieving ~100x speedup at 3000 nodes by replacing O(N²) algorithms with O(N+E) alternatives.

Key optimizations:

traverse() method: Replaced MatchingQueue.pop_first_match() scanning with Kahn's algorithm variant using in-degree tracking for O(N+E) complexity
compute_all_descendants(): Added function to pre-compute all descendants in single reverse topological pass instead of calling networkx.descendants() per node
Dataset indexing: Pre-built _collections_by_dataset index for O(1) dataset-to-collections lookups
Edge indexing: Pre-indexed edges by node address (edges_by_node) for O(1) edge lookups
Verification optimization: Added skip_verification parameter to avoid redundant traversal during reachability checks

Code quality:

Comprehensive test suite validates equivalence between optimized and legacy implementations across multiple graph topologies
Legacy _traverse_legacy() method retained for reference and comparison
Clear documentation of algorithmic approach and complexity analysis

Confidence Score: 4/5

This PR is safe to merge with minimal risk - well-tested performance optimization
Score reflects thorough testing approach and algorithmic correctness. The optimized implementation is validated against the legacy implementation across multiple test scenarios. The only minor style issue is a single-character variable name in the legacy code (which is intentionally kept for reference). The changes are well-documented with clear complexity analysis.
No files require special attention - the implementation is solid and well-tested

Important Files Changed

Filename	Overview
src/fides/api/graph/traversal.py	Replaced O(N²) traversal algorithm with O(N+E) Kahn's algorithm variant using in-degree tracking; added pre-indexing optimizations for edges and dataset collections; retained legacy implementation for comparison
src/fides/api/task/create_request_tasks.py	Added `compute_all_descendants()` function for O(N+E) descendant computation; updated task creation functions to pre-compute descendants once instead of calling networkx.descendants per node
src/fides/api/graph/node_filters.py	Updated `OptionalIdentityFilter` to use `skip_verification` parameter and explicit `traverse()` call to avoid redundant verification overhead
tests/ops/graph/test_traversal_optimization_comparison.py	Comprehensive test suite comparing optimized and legacy traversal algorithms across deterministic topologies, random graphs, and error cases

greptile-apps

_{1 file reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-24T07:05:51Z

Additional Comments (1)

src/fides/api/graph/traversal.py
single-character variable name n - use descriptive name like current_node

Context Used: Rule from dashboard - Use full names for variables, not 1 to 2 characters (source)

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Traversal optimizations

364ae4b

galvana added 3 commits January 16, 2026 17:40

N² to N

11c4ded

Adding tests

e93cd7a

Cleaning up tests

f02bbbe

galvana marked this pull request as ready for review January 24, 2026 07:03

galvana requested a review from a team as a code owner January 24, 2026 07:03

galvana requested review from adamsachs and removed request for a team January 24, 2026 07:03

Adding change log

a658673

greptile-apps bot reviewed Jan 24, 2026

View reviewed changes

galvana added the run unsafe ci checks Runs fides-related CI checks that require sensitive credentials label Jan 24, 2026

Merge branch 'main' into traversal-optimizations

9a1c013

vercel bot deployed to Preview – fides-plus-nightly January 26, 2026 22:15 View deployment

Merge branch 'main' into traversal-optimizations

2fa342e

vercel bot deployed to Preview – fides-plus-nightly January 29, 2026 01:01 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Traversal optimizations #7244

Traversal optimizations #7244

galvana commented Jan 17, 2026 •

edited

Loading

Uh oh!

vercel bot commented Jan 17, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Jan 24, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Traversal optimizations #7244

Are you sure you want to change the base?

Traversal optimizations #7244

Conversation

galvana commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description Of Changes

Performance Improvements

Performance comparison

Code Changes

Tests

Steps to Confirm

Pre-Merge Checklist

Uh oh!

vercel bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Jan 24, 2026

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

galvana commented Jan 17, 2026 •

edited

Loading

vercel bot commented Jan 17, 2026 •

edited

Loading