
GraphEm Rapids: High-Performance Graph Embedding

Requires Python 3.8+ and PyTorch 2.0+. MIT licensed.


A high-performance GraphEm implementation built on PyTorch and RAPIDS cuVS. Its force-directed layout with geometric intersection detection produces embeddings that correlate strongly with centrality measures.

Features

  • Unified API: Scipy sparse adjacency matrices, sklearn-style parameters (n_components, n_neighbors)
  • Multiple Backends: PyTorch (1K-100K vertices), RAPIDS cuVS (100K+ vertices), automatic selection
  • GPU Acceleration: CUDA support, memory-efficient chunking, automatic CPU fallback
  • Graph Generators: Erdős-Rényi, scale-free, SBM, bipartite, Delaunay, and more
  • Influence Maximization: Fast embedding-based seed selection

Installation

pip install graphem-rapids              # PyTorch backend
pip install graphem-rapids[cuda]        # + CUDA support
pip install graphem-rapids[rapids]      # + RAPIDS cuVS
pip install graphem-rapids[all]         # Everything

Quick Start

import graphem_rapids as gr

# Generate graph (returns sparse adjacency matrix)
adjacency = gr.generate_er(n=1000, p=0.01)

# Create embedder (automatic backend selection)
embedder = gr.create_graphem(adjacency, n_components=3)

# Run layout
embedder.run_layout(num_iterations=50)

# Get positions and visualize
positions = embedder.get_positions()  # numpy array (n, d)
embedder.display_layout()             # 2D or 3D plot
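
As a rough check of the centrality claim above, you can compare each vertex's distance from the origin in the embedding against a standard centrality measure. This is an illustrative sketch only: it reuses adjacency and positions from the quick start, takes degree as the reference centrality, and assumes the radial norm is a reasonable one-number summary of a vertex's position.

import numpy as np
import networkx as nx
from scipy.stats import spearmanr

# Reuse `adjacency` and `positions` from the quick start above
radii = np.linalg.norm(positions, axis=1)       # distance of each vertex from the origin

# Degree as a simple reference centrality
G = nx.from_scipy_sparse_array(adjacency)
degrees = np.array([deg for _, deg in sorted(G.degree())])

rho, _ = spearmanr(radii, degrees)
print(f"Spearman correlation (radius vs. degree): {rho:.2f}")

The exact value depends on the graph and on the number of layout iterations.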

Backend Selection

Automatic (Recommended)

embedder = gr.create_graphem(adjacency, n_components=3)

Explicit PyTorch

embedder = gr.GraphEmbedderPyTorch(
    adjacency, n_components=3, device='cuda',
    L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10,
    batch_size=None  # Automatic (or manual: 1024)
)

Explicit RAPIDS cuVS

embedder = gr.GraphEmbedderCuVS(
    adjacency, n_components=3,
    index_type='auto',  # 'brute_force', 'ivf_flat', 'ivf_pq'
    sample_size=1024, batch_size=None
)

Index Types: brute_force (<100K), ivf_flat (100K-1M), ivf_pq (>1M vertices)
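
If you prefer to pin the index type rather than rely on 'auto', the thresholds above translate directly into a selection rule. A minimal sketch (the exact cutoffs used internally by 'auto' may differ):

# Illustrative selection based on the thresholds listed above;
# reuses `adjacency` and `gr` from the earlier examples
n_vertices = adjacency.shape[0]
if n_vertices < 100_000:
    index_type = 'brute_force'
elif n_vertices <= 1_000_000:
    index_type = 'ivf_flat'
else:
    index_type = 'ivf_pq'

embedder = gr.GraphEmbedderCuVS(adjacency, n_components=3, index_type=index_type)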

Check Backends

info = gr.get_backend_info()
print(f"CUDA: {info['cuda_available']}, Recommended: {info['recommended_backend']}")

Configuration

Environment Variables:

export GRAPHEM_BACKEND=pytorch        # Force backend
export GRAPHEM_PREFER_GPU=true        # Prefer GPU
export GRAPHEM_MEMORY_LIMIT=8         # GB
export GRAPHEM_VERBOSE=true
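
The same options can also be set from Python. Since it is not specified here whether the variables are read at import time or when an embedder is created, the safe order is to set them before doing either; a minimal sketch:

import os

# Set before importing/creating an embedder so backend selection sees these values
os.environ['GRAPHEM_BACKEND'] = 'pytorch'
os.environ['GRAPHEM_MEMORY_LIMIT'] = '8'   # GB

import graphem_rapids as gr

adjacency = gr.generate_er(n=1000, p=0.01)
embedder = gr.create_graphem(adjacency, n_components=3)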

Programmatic:

from graphem_rapids.utils.backend_selection import BackendConfig, get_optimal_backend

config = BackendConfig(n_vertices=50000, force_backend='cuvs', memory_limit=16.0)
backend = get_optimal_backend(config)
embedder = gr.create_graphem(adjacency, backend=backend)

Graph Generators

All generators return scipy sparse adjacency matrices:

# Random
gr.generate_er(n=1000, p=0.01, seed=42)
gr.generate_random_regular(n=100, d=3, seed=42)

# Scale-free & small-world
gr.generate_ba(n=300, m=3, seed=42)             # Barabási-Albert
gr.generate_ws(n=1000, k=6, p=0.3, seed=42)     # Watts-Strogatz
gr.generate_scale_free(n=100, seed=42)

# Community structures
gr.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, seed=42)
gr.generate_caveman(l=10, k=10)
gr.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=42)

# Bipartite
gr.generate_bipartite_graph(n_top=50, n_bottom=100, p=0.2, seed=42)
gr.generate_complete_bipartite_graph(n_top=50, n_bottom=100)

# Geometric
gr.generate_geometric(n=100, radius=0.2, dim=2, seed=42)
gr.generate_delaunay_triangulation(n=100, seed=42)
gr.generate_road_network(width=30, height=30)   # 2D grid

# Trees
gr.generate_balanced_tree(r=2, h=10)

Influence Maximization

adjacency = gr.generate_er(n=1000, p=0.01)
embedder = gr.create_graphem(adjacency, n_components=3)
embedder.run_layout(num_iterations=50)

# Fast: embedding-based selection
seeds = gr.graphem_seed_selection(embedder, k=10)

# Evaluate with Independent Cascade model
import networkx as nx
G = nx.from_scipy_sparse_array(adjacency)
influence, _ = gr.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=100)

# Compare with the greedy baseline (slower, near-optimal)
greedy_seeds, _ = gr.greedy_seed_selection(G, k=10, p=0.1)
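
To compare the two strategies head to head, run the same cascade estimate on the greedy seeds and print both results (a small extension of the snippet above):

# Same cascade settings as above, applied to the greedy seeds
greedy_influence, _ = gr.ndlib_estimated_influence(G, greedy_seeds, p=0.1, iterations_count=100)
print("GraphEm seeds:", influence, "| Greedy seeds:", greedy_influence)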

Advanced

Memory Management

from graphem_rapids.utils.memory_management import MemoryManager, get_gpu_memory_info

mem_info = get_gpu_memory_info()
print(f"GPU: {mem_info['free']:.1f}GB free / {mem_info['total']:.1f}GB total")

adjacency = gr.generate_er(n=1000, p=0.01)
with MemoryManager(cleanup_on_exit=True):
    embedder = gr.create_graphem(adjacency)
    embedder.run_layout(50)

Batch Size Tuning

from graphem_rapids.utils.memory_management import get_optimal_chunk_size

adjacency = gr.generate_er(n=1000, p=0.01)

# Automatic (recommended)
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=None)

# Manual
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=1024)

# Programmatic
optimal = get_optimal_chunk_size(n_vertices=1000000, n_components=3, backend='pytorch')
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=optimal)

Testing & Benchmarking

pytest                                          # Run all tests
pytest tests/test_pytorch_backend.py            # Specific backend
python benchmarks/run_benchmarks.py             # Performance tests
python benchmarks/compare_backends.py --sizes 1000,10000,100000

Contributing

See CONTRIBUTING.md for development setup, testing, and contribution guidelines.

Citation

arXiv

@misc{kolpakov-rivin-2025fast,
  title={Fast Geometric Embedding for Node Influence Maximization},
  author={Kolpakov, Alexander and Rivin, Igor},
  year={2025},
  eprint={2506.07435},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2506.07435}
}

License

MIT License - see LICENSE file.