Validation against R linkprediction
This document describes the planned validation strategy for relationalstats.linkprediction against R linkprediction::proxfun.
The goal is to validate equivalent metrics on small graphs where expected scores can be manually inspected and reproduced.
Scope
The first validation targets are local and semi-local metrics:
- common_neighbors
- jaccard
- adamic_adar
- preferential_attachment
- resource_allocation
- salton
- sorensen
- hub_promoted
- hub_depressed
- lhn_local
- shortest_path
- local_path
Global metrics may be validated later with tolerance-based comparisons:
- katz
- rwr
- act
These metrics depend on matrix conventions, numerical precision, parameter choices, and graph connectivity assumptions.
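
To illustrate why parameter choices and numerical precision matter for global metrics, the sketch below computes a Katz-style score matrix with NumPy. It assumes the common closed form S = (I - beta*A)^-1 - I and an arbitrary beta = 0.1. Neither the formula variant nor the default beta used by linkprediction::proxfun or relationalstats is confirmed here, and the helper name katz_matrix is hypothetical.

```python
import networkx as nx
import numpy as np


def katz_matrix(G, beta, nodelist):
    """Katz-style scores via the assumed closed form (I - beta*A)^-1 - I."""
    A = nx.to_numpy_array(G, nodelist=nodelist)
    # The underlying series sum_k beta^k A^k converges only when
    # beta is smaller than 1 / (spectral radius of A).
    spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
    if beta * spectral_radius >= 1:
        raise ValueError("beta is too large for the Katz series to converge")
    identity = np.eye(len(nodelist))
    return np.linalg.inv(identity - beta * A) - identity


nodes = [0, 1, 2]                 # explicit node ordering fixes the matrix layout
G = nx.path_graph(3)              # 0 -- 1 -- 2
S = katz_matrix(G, beta=0.1, nodelist=nodes)

# Cross-check against a truncated power series; agreement is only up to tolerance.
A = nx.to_numpy_array(G, nodelist=nodes)
S_series = sum((0.1 ** k) * np.linalg.matrix_power(A, k) for k in range(1, 60))
```

The closed form and the truncated series agree only within floating-point tolerance, which is why these metrics are candidates for tolerance-based rather than exact comparison.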
Required R outputs
For each validation fixture, export the following from R:
- graph_name
- edge_list
- directed flag
- node ordering
- pair list
- selected metrics
- proxfun output table
- parameter values
- seed, when applicable

The preferred exported format is CSV:

    tests/validation_against_r/fixtures/linkprediction/
        toy_graph_edges.csv
        toy_graph_pairs.csv
        toy_graph_r_proxfun_scores.csv

Python validation inputs
The Python validation should reconstruct the same graph using:

    import networkx as nx

Then run:

    from relationalstats.linkprediction import proxfun_full

    # G, pairs, and metrics are rebuilt from the exported fixture files.
    scores = proxfun_full(
        G,
        pairs=pairs,
        metrics=metrics,
        directed=False,
    )

Comparison levels
Validation should classify each metric as one of:
- exact
- near-exact
- tolerance-based
- conceptual
- not equivalent

Recommended rules:
| Level | Meaning |
|---|---|
| exact | Scores are equal for integer or deterministic local metrics. |
| near-exact | Scores match up to floating-point precision. |
| tolerance-based | Scores match within a documented numerical tolerance. |
| conceptual | Trends or rankings are similar, but values are not expected to match exactly. |
| not equivalent | The R and Python implementations use different definitions or assumptions. |
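
The first three levels can be checked mechanically. The following is a minimal sketch, assuming NumPy and hypothetical thresholds (rtol=1e-12 for near-exact, rtol=1e-6 for tolerance-based). The function name classify_agreement and the "needs review" fallback are illustrative, not part of either package. Deciding between conceptual and not equivalent is left to a human reviewer.

```python
import numpy as np


def classify_agreement(r_scores, py_scores, rtol=1e-6, atol=1e-9):
    """Return the strictest level at which two score vectors agree."""
    r = np.asarray(r_scores, dtype=float)
    p = np.asarray(py_scores, dtype=float)
    if r.shape != p.shape:
        raise ValueError("score vectors must cover the same pairs in the same order")
    # exact: every value is identical (NaN in the same position counts as equal)
    if np.array_equal(r, p, equal_nan=True):
        return "exact"
    # near-exact: differences are at floating-point rounding level (assumed threshold)
    if np.allclose(r, p, rtol=1e-12, atol=1e-15, equal_nan=True):
        return "near-exact"
    # tolerance-based: differences stay within the documented tolerance
    if np.allclose(r, p, rtol=rtol, atol=atol, equal_nan=True):
        return "tolerance-based"
    # conceptual vs. not equivalent cannot be decided numerically here
    return "needs review"
```

For example, `classify_agreement([0.1 + 0.2], [0.3])` falls into the near-exact level because the two values differ only by rounding error.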
Expected validation level by metric
| Metric | Expected validation level |
|---|---|
| common_neighbors | exact |
| jaccard | near-exact |
| adamic_adar | near-exact |
| preferential_attachment | exact |
| resource_allocation | near-exact |
| salton | near-exact |
| sorensen | near-exact |
| hub_promoted | near-exact |
| hub_depressed | near-exact |
| lhn_local | near-exact |
| shortest_path | near-exact |
| local_path | tolerance-based |
| katz | tolerance-based |
| rwr | tolerance-based |
| act | tolerance-based or conceptual |
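
The table above could be encoded so that a test fails whenever a metric agrees less strictly than expected. This is only a sketch: the names LEVELS, EXPECTED_LEVEL, and meets_expectation are hypothetical, and act is recorded with its loosest acceptable level (conceptual) as a simplifying assumption.

```python
# Levels ordered from strictest to loosest.
LEVELS = ["exact", "near-exact", "tolerance-based", "conceptual", "not equivalent"]

# Expected level per metric, copied from the table.
EXPECTED_LEVEL = {
    "common_neighbors": "exact",
    "jaccard": "near-exact",
    "adamic_adar": "near-exact",
    "preferential_attachment": "exact",
    "resource_allocation": "near-exact",
    "salton": "near-exact",
    "sorensen": "near-exact",
    "hub_promoted": "near-exact",
    "hub_depressed": "near-exact",
    "lhn_local": "near-exact",
    "shortest_path": "near-exact",
    "local_path": "tolerance-based",
    "katz": "tolerance-based",
    "rwr": "tolerance-based",
    "act": "conceptual",  # assumption: "tolerance-based or conceptual" stored as the looser level
}


def meets_expectation(metric, observed_level):
    """True when the observed level is at least as strict as the expected one."""
    return LEVELS.index(observed_level) <= LEVELS.index(EXPECTED_LEVEL[metric])
```

A stricter result than expected (for example, jaccard matching exactly) passes, while a looser one is flagged.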
Important assumptions
The validation must explicitly align:
- Node ordering.
- Directed versus undirected graph interpretation.
- Whether existing edges are scored.
- Whether only non-edges are scored.
- Parameter values for global metrics.
- Handling of disconnected node pairs.
- Floating-point tolerance.
- Treatment of zero-degree nodes.
- Treatment of self-loops.
- Whether the graph is simple, weighted, or multigraph-like.
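
Several of these assumptions can be recorded programmatically before any scores are compared. The sketch below uses only NetworkX calls and a hypothetical helper name, describe_graph_assumptions. It also shows one way to build a canonical list of non-edge pairs, assuming an undirected graph with integer node labels.

```python
import networkx as nx


def describe_graph_assumptions(G):
    """Summarize the structural properties the validation has to align."""
    return {
        "directed": G.is_directed(),
        "multigraph": G.is_multigraph(),
        "weighted": nx.is_weighted(G),
        "self_loops": nx.number_of_selfloops(G),
        "zero_degree_nodes": sorted(nx.isolates(G)),
        "node_order": list(G.nodes()),
    }


G = nx.Graph([(0, 1), (1, 2)])
G.add_node(3)  # isolated node, to exercise the zero-degree case

report = describe_graph_assumptions(G)

# Non-edges only; each pair is sorted so the list does not depend on
# the iteration order of nx.non_edges.
candidate_pairs = sorted(tuple(sorted(pair)) for pair in nx.non_edges(G))
```

Storing such a report next to each fixture would make mismatched conventions visible before scores are compared.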
Recommended toy graphs
Initial validation should use small graphs.
Path graph

    0 -- 1 -- 2

Useful for validating:
- common neighbors;
- Jaccard;
- Adamic-Adar;
- resource allocation;
- shortest path.
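
For the pair (0, 2) on this path graph, the expected values can be worked out by hand: node 1 is the only common neighbor and has degree 2. The sketch below reproduces those values with NetworkX's built-in link prediction functions as an independent reference. It does not call relationalstats or R, and whether either package reports raw shortest-path distance or a transformed value is not assumed here.

```python
import math
import networkx as nx

G = nx.path_graph(3)  # edges 0-1 and 1-2
pair = (0, 2)


def single_score(index_func, G, pair):
    """Evaluate a NetworkX link prediction index for one pair."""
    _, _, score = next(iter(index_func(G, [pair])))
    return score


common = len(list(nx.common_neighbors(G, *pair)))        # {1}
jaccard = single_score(nx.jaccard_coefficient, G, pair)  # |{1}| / |{1}|
adamic_adar = single_score(nx.adamic_adar_index, G, pair)             # 1 / ln(deg(1))
resource_alloc = single_score(nx.resource_allocation_index, G, pair)  # 1 / deg(1)
distance = nx.shortest_path_length(G, *pair)             # path 0 -> 1 -> 2
```

The hand-computed values are 1 common neighbor, Jaccard 1.0, Adamic-Adar 1/ln(2), resource allocation 0.5, and distance 2.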
Triangle graph

    0 -- 1
    |  /
    2

Useful for validating behavior when all pairs are already connected.
Star graph

         1
         |
    2 -- 0 -- 3
         |
         4

Useful for validating hub-sensitive metrics.
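
As a hand check on the star graph, the sketch below evaluates the degree-normalized indices directly from neighbor sets. The formulas are the textbook definitions (common neighbors divided by a function of the two degrees). Whether linkprediction::proxfun and relationalstats follow exactly these definitions is an assumption to be confirmed by the validation. The helper hub_sensitive_scores is hypothetical and raises ZeroDivisionError for zero-degree nodes, whose treatment has to be decided separately.

```python
import math
import networkx as nx


def hub_sensitive_scores(G, x, y):
    """Textbook degree-normalized indices for one node pair."""
    cn = len(set(G[x]) & set(G[y]))  # number of common neighbors
    kx, ky = G.degree(x), G.degree(y)
    return {
        "salton": cn / math.sqrt(kx * ky),
        "sorensen": 2 * cn / (kx + ky),
        "hub_promoted": cn / min(kx, ky),
        "hub_depressed": cn / max(kx, ky),
        "lhn_local": cn / (kx * ky),
        "preferential_attachment": kx * ky,
    }


G = nx.star_graph(4)  # center 0, leaves 1..4

leaf_pair = hub_sensitive_scores(G, 1, 2)  # two leaves sharing the hub
hub_pair = hub_sensitive_scores(G, 0, 1)   # hub and a leaf, no common neighbor
```

Under these definitions, every normalized index is 1.0 for two leaves and 0.0 for the hub-leaf pair, while preferential attachment is 1 and 4 respectively.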
Disconnected graph

    0 -- 1    2 -- 3

Useful for validating shortest path, ACT, and disconnected-pair behavior.
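
Disconnected pairs need an explicit convention on the Python side, because NetworkX raises an exception instead of returning a value. The sketch below maps unreachable pairs to infinity. This is one possible convention, chosen here for illustration, and the value proxfun reports for such pairs should be checked against the exported fixture rather than assumed. The helper name distance_or_inf is hypothetical.

```python
import math
import networkx as nx


def distance_or_inf(G, u, v):
    """Shortest-path length, or infinity when no path exists (assumed convention)."""
    try:
        return nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        return math.inf


G = nx.Graph([(0, 1), (2, 3)])  # two separate components

within = distance_or_inf(G, 0, 1)  # same component
across = distance_or_inf(G, 0, 2)  # different components
```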
R fixture generation sketch
A future R script should generate reproducible validation outputs.
Suggested location:

    scripts/export_r_validation_outputs.R

Suggested output folder:

    tests/validation_against_r/fixtures/linkprediction/

The script should:
- Build small graphs.
- Define a fixed pair list.
- Run linkprediction::proxfun.
- Export scores to CSV.
- Save graph edge lists and pair lists.
- Record package versions.
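
On the Python side, files exported by such a script could be read back roughly as follows. The column names (source, target, metric, score), the long-format score table, and integer node labels are all assumptions, since the fixture layout has not been fixed yet. The in-memory CSV text is placeholder data typed by hand, not real proxfun output.

```python
import csv
import io
import networkx as nx


def read_pairs(handle):
    """Read (source, target) rows; the column names are assumed."""
    return [(int(row["source"]), int(row["target"])) for row in csv.DictReader(handle)]


# Placeholder contents standing in for the exported fixture files.
edges_csv = io.StringIO("source,target\n0,1\n1,2\n")
pairs_csv = io.StringIO("source,target\n0,2\n")
scores_csv = io.StringIO(
    "source,target,metric,score\n"
    "0,2,common_neighbors,1\n"
    "0,2,jaccard,1.0\n"
)

G = nx.Graph()
G.add_edges_from(read_pairs(edges_csv))
pairs = read_pairs(pairs_csv)

# Index reference scores by (source, target, metric) for easy lookup.
r_scores = {
    (int(row["source"]), int(row["target"]), row["metric"]): float(row["score"])
    for row in csv.DictReader(scores_csv)
}
```

With real fixtures, the `io.StringIO` objects would be replaced by files opened from the output folder.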
Current status
The first package implementation includes Python-side unit tests for manually verifiable small graphs.
R validation fixtures are planned for a later validation release.