Skip to content

Link prediction metrics

This document summarizes the initial link prediction metrics supported by relationalstats.linkprediction.proxfun_full.

Notation

  • G=(V,E): graph.
  • u,vV: node pair to score.
  • Γ(u): neighborhood of node u.
  • ku=|Γ(u)|: degree of node u.
  • A: adjacency matrix.
  • (A)uv: number of walks of length from u to v.

For the first release, the recommended default is directed=False, which scores pairs on an undirected version of the graph.

Metric families

MechanismMetrics
Triadic closurecommon_neighbors, jaccard, adamic_adar
Popularity / degreepreferential_attachment, degree
Structural similaritysalton, sorensen, lhn_local
Hub dynamicshub_promoted, hub_depressed
Diffusion / flowresource_allocation
Geodesic proximityshortest_path
Extended short pathslocal_path
Global centralitykatz
Stochastic diffusionrwr
Random-walk global metricact

Common neighbors

CN(u,v)=|Γ(u)Γ(v)|

Counts the number of shared neighbors between two nodes.

Jaccard coefficient

J(u,v)=|Γ(u)Γ(v)||Γ(u)Γ(v)|

Measures neighborhood overlap normalized by neighborhood union.

Adamic-Adar index

AA(u,v)=wΓ(u)Γ(v)1log(kw)

Penalizes shared neighbors with high degree.

Preferential attachment

PA(u,v)=kukv

Scores pairs highly when both nodes have high degree.

Resource allocation index

RA(u,v)=wΓ(u)Γ(v)1kw

Similar to Adamic-Adar, but with a stronger penalty for high-degree shared neighbors.

Salton index / cosine similarity

Salton(u,v)=|Γ(u)Γ(v)|kukv

Cosine-like structural similarity between node neighborhoods.

Sorensen index

Sorensen(u,v)=2|Γ(u)Γ(v)|ku+kv

Another normalized neighborhood-overlap score.

Hub promoted index

HPI(u,v)=|Γ(u)Γ(v)|min(ku,kv)

Promotes similarity when the smaller-degree node shares many of its neighbors.

Hub depressed index

HDI(u,v)=|Γ(u)Γ(v)|max(ku,kv)

Penalizes similarity when one of the nodes is a high-degree hub.

Leicht-Holme-Newman local index

LHN(u,v)=|Γ(u)Γ(v)|kukv

Reduces degree-driven bias in common-neighbor scores.

Shortest path

d(u,v)=dist(u,v)

Lower values indicate closer nodes. Disconnected nodes receive infinite distance in the current implementation.

Local path index

LP(u,v)=(A2)uv+β(A3)uv

The default value is:

β=0.01

This combines length-2 and length-3 walks.

Katz index

Katz(u,v)==1β(A)uv

Matrix form:

Katz=(IβA)1I

The default value is:

β=0.005

Katz is a global metric and can be expensive for large graphs.

Random walk with restart

Random walk with restart estimates the diffusion probability of reaching a node from a source while repeatedly returning to the source.

The current implementation uses:

RWR=(I(1α)P)1

where:

  • P is the row-normalized transition matrix.
  • α is the restart probability.

The default value is:

α=0.15

RWR is a global matrix-based metric and can be expensive for large graphs.

Degree features

degree_source=kudegree_target=kv

These features expose the individual degree of each node in the pair.

Average commute time score

Average commute time is based on the Moore-Penrose pseudoinverse of the graph Laplacian.

C(u,v)=vol(G)(Luu++Lvv+2Luv+)

The implemented score is:

ACT_score(u,v)=1C(u,v)

Disconnected pairs receive score 0.0.

This metric requires dense linear algebra and is expensive for large graphs.

Division by zero policy

When a denominator is zero, the score is set to 0.0.

This affects metrics such as:

  • jaccard
  • salton
  • sorensen
  • hub_promoted
  • hub_depressed
  • lhn_local
  • act

Scalability notes

Sparse matrix operations are used where possible for local metrics.

The following metrics are potentially expensive:

  • katz
  • rwr
  • act

These should be used carefully on large graphs.

Built with VitePress and deployed with GitHub Pages.