Scalability notes
relationalstats.linkprediction uses sparse matrix operations where possible.
Sparse-friendly metrics
The following metrics can benefit from sparse adjacency matrix operations:
common_neighborsjaccardadamic_adarpreferential_attachmentresource_allocationsaltonsorensenhub_promotedhub_depressedlhn_locallocal_path
Potentially expensive metrics
The following metrics may require global matrix operations:
katzrwract
These should be used carefully on large graphs.
Katz
Katz uses the matrix expression:
This may become expensive when the graph has many nodes.
Random walk with restart
Random walk with restart uses:
where
Average commute time
Average commute time uses the pseudoinverse of the Laplacian:
This requires dense linear algebra in the current implementation.
Future optimization directions
Future versions may investigate:
- better sparse linear solvers;
- approximate global metrics;
- Numba acceleration;
- GraphBLAS-compatible backends;
- compiled backends for large-scale graph statistics.