We present Untamed, a novel framework that integrates unconstrained tensor decomposition using NASCAR to identify functional networks, with state-of-the-art graph node embedding to generate cortical parcellations. Our method produces near-homogeneous, spatially coherent regions aligned with large-scale functional networks, while avoiding strong assumptions such as the statistical independence required by ICA. Across multiple datasets, Untamed consistently demonstrates improved or comparable performance in functional connectivity homogeneity and task contrast alignment compared to existing atlases. The pipeline is fully automated, allowing for rapid adaptation to new datasets and the generation of custom parcellations. The atlases derived from the Genomics Superstruct Project (GSP) dataset, along with the code for generating parcellations with a customizable number of parcels, are publicly available at:
https://untamed-atlas.github.io
The Untamed pipeline consists of three main stages:
- Graph Construction: Build adjacency matrices from NASCAR spatial maps with spatial constraints
- Graph Embedding: Generate node embeddings using NetMF (Network Matrix Factorization)
- Clustering: Apply k-means clustering to produce the final parcellation
- Installation & Requirements
- Quick Start Guide
- Step-by-Step Tutorial
- Using with Custom Data
- Parameter Guide
- Output Description
- Troubleshooting
- Citation
- MATLAB (R2020b or later recommended)
- Sufficient memory (100GB+ RAM recommended)
- Clone or download this repository
    git clone https://github.com/snapfinger/Untamed.git
    cd Untamed
- Verify MATLAB installation
  - Open MATLAB and ensure it's properly configured
  - All required functions are included in the repository
- Optional Dependencies (for generating NASCAR networks from scratch)
  - TFOCS toolbox (http://cvxr.com/tfocs/): Required only if you need to generate NASCAR networks with constraints
  - N-way Toolbox (http://www.models.life.ku.dk/nwaytoolbox/download): Required only for CORCONDIA metric calculation
  Note: These are only needed if you're generating NASCAR networks from raw fMRI data. For using pre-computed networks (as in the quick start), they are not required.
- Navigate to the repository directory in MATLAB
    cd /path/to/Untamed
- Run the main script
    run_untamed   % executes run_untamed.m
- Find your results
  - Outputs will be saved in ./untamed_output/
  - The final parcellation is in parcel.mat
That's it! The script will automatically:
- Load the pre-computed NASCAR networks from artifact/nascar_GSP_rst.mat
- Construct graph adjacency matrices
- Generate embeddings
- Create a parcellation with 50 parcels per hemisphere (default)
This section provides a detailed walkthrough of the Untamed pipeline, explaining each step and how to customize it for your needs.
The pipeline requires NASCAR spatial maps as input. These are functional network spatial patterns derived from tensor decomposition of multi-subject fMRI data.
Pre-computed input (included):
- artifact/nascar_GSP_rst.mat: Contains NASCAR networks from the GSP dataset
  - Structure: rst_50{2, 1} contains the spatial maps
  - Format: [num_vertices × num_networks] matrix
  - Surface space: fsaverage6 (37,476 left + 37,471 right = 74,947 vertices)
What if you need to generate NASCAR networks from your own data?
- See the Using with Custom Data section below
- Refer to the NASCAR documentation in helper/NASCAR_v2/README
- Example code is available in helper/NASCAR_v2/demo.m
The first stage builds a graph representation of cortical connectivity.
What happens:
- Correlation computation: Spatial correlation matrix is computed from NASCAR maps
    spatial = rst_50{2, 1};
    nascar_corr = corr(spatial');
- Hemisphere separation: Left and right hemispheres are processed separately
    nascar_corr_L = nascar_corr(1:37476, 1:37476);
    nascar_corr_R = nascar_corr(37477:end, 37477:end);
- Gaussian kernel transformation: Converts correlations to similarity weights
    sigma = 0.5;
    nascar_corr_L = exp(nascar_corr_L ./ (2 * sigma^2));
- Spatial constraint addition: Ensures spatial coherence by restricting connections to neighboring vertices
    nascar_corrL_nb = addSpatialConstraintOneHem(nascar_corr_L, 'L', nb_L);
Output: Adjacency matrices saved to ./untamed_output/adj.mat
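As a rough illustration of the four graph-construction steps, the sketch below runs them on a small random matrix. The toy sizes, the random data, and the ring-shaped neighbor mask are assumptions made for illustration only. In particular, the mask is just a stand-in for whatever addSpatialConstraintOneHem does with the neighborhood file, which is not described here. corrcoef is used in place of corr so the sketch needs only base MATLAB.

```matlab
% Toy sizes (hypothetical): 6 left + 5 right vertices, 4 networks
rng(0);
nL = 6; nR = 5; R = 4;
spatial = rand(nL + nR, R);          % stand-in for rst_50{2, 1}

% 1. Vertex-by-vertex correlation of the network loadings
%    (corrcoef treats columns as variables, hence the transpose)
C = corrcoef(spatial');

% 2. Split into hemispheres
C_L = C(1:nL, 1:nL);
C_R = C(nL+1:end, nL+1:end);

% 3. Exponential kernel, same form as in the pipeline step
sigma = 0.5;
W_L = exp(C_L ./ (2 * sigma^2));

% 4. Hypothetical spatial constraint: keep only ring neighbors
%    (vertex i connects to i-1 and i+1, wrapping around)
idx = (1:nL)';
nbr = [mod(idx - 2, nL) + 1, mod(idx, nL) + 1];
mask = false(nL);
for i = 1:nL
    mask(i, nbr(i, :)) = true;
end
A_L = W_L .* mask;                   % zero out non-neighbor edges

disp(size(A_L));
disp(nnz(A_L));                      % number of retained edges (directed count)
```

Because the kernel output is always positive, every edge allowed by the mask survives, so the sparsity of the final graph is set entirely by the neighborhood definition.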
The second stage generates low-dimensional embeddings for each cortical vertex.
What happens:
- NetMF embedding: Applies Network Matrix Factorization to learn vertex representations
    embed_L = netmf_embed_wrapper(nascar_corrL_nb, negative_sampling, alpha, T, embed_dim);
- Parameters:
  - T = 7: Window length for random walk approximation
  - embed_dim = 128: Dimensionality of output embeddings
  - alpha = 0.5: Power scaling factor
  - negative_sampling = 1: Negative sampling flag
Output: Embeddings saved to ./untamed_output/embed.mat
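To make the roles of T, alpha, negative_sampling, and embed_dim more concrete, here is a minimal dense sketch of the small-window NetMF formulation on a toy ring graph. It is not the repository's netmf_embed_wrapper, whose internals may differ. The toy graph, the variable names, and the use of alpha as an exponent on the singular values are assumptions.

```matlab
% Toy graph: ring of 8 nodes with unit weights (hypothetical input)
n = 8;
A = zeros(n);
for i = 1:n
    j = mod(i, n) + 1;
    A(i, j) = 1; A(j, i) = 1;
end

T = 3; b = 1; alpha = 0.5; d = 4;    % window, negative samples, power, dim

deg = sum(A, 2);                     % weighted degrees
vol = sum(deg);                      % graph volume
P = diag(1 ./ deg) * A;              % random-walk transition matrix

% Sum of the first T powers of P
S = zeros(n); Pr = eye(n);
for r = 1:T
    Pr = Pr * P;
    S = S + Pr;
end

M = (vol / (b * T)) * S * diag(1 ./ deg);   % NetMF matrix (small-T form)
logM = log(max(M, 1));                       % truncated logarithm, entries >= 0

% Rank-d factorization; singular values scaled by the power alpha
[U, Sig, ~] = svd(logM);
embed = U(:, 1:d) * diag(diag(Sig(1:d, 1:d)) .^ alpha);   % n-by-d embedding
disp(size(embed));
```

A dense SVD is only practical for toy sizes. For tens of thousands of vertices a truncated solver such as svds would be the more natural choice.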
The final stage clusters embeddings to produce the parcellation.
What happens:
- Multiple k-means runs: Performs 500 independent k-means clustering runs with random initializations
    [parcels, sumd_exps] = kmeans_cluster(embed_L, embed_R, ...
        cluster_num_left, cluster_num_right, ...
        km_max_iter, exp_num);
- Best parcellation selection: Chooses the run with the smallest within-cluster sum of squares
    [parcel, ~] = kmeans_best_cluster(parcels, sumd_exps);
Output: Final parcellation saved to ./untamed_output/parcel.mat
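The run-many-times-and-keep-the-best idea can be sketched with MATLAB's built-in kmeans (Statistics and Machine Learning Toolbox). The toy embeddings, the small run count, and the choice to select the best run per hemisphere are assumptions. The repository's kmeans_cluster and kmeans_best_cluster may organize this differently.

```matlab
% Requires the Statistics and Machine Learning Toolbox (kmeans)
rng(1);
E_L = [randn(20, 3); randn(20, 3) + 8];   % toy "left" embeddings, 2 blobs
E_R = [randn(15, 3); randn(15, 3) - 8];   % toy "right" embeddings, 2 blobs
k_L = 2; k_R = 2; n_runs = 5; max_iter = 100;

best_L = []; best_R = [];
best_cost_L = inf; best_cost_R = inf;
for run = 1:n_runs
    % sumd holds the within-cluster sums of squared distances
    [idx, ~, sumd] = kmeans(E_L, k_L, 'MaxIter', max_iter);
    if sum(sumd) < best_cost_L
        best_cost_L = sum(sumd); best_L = idx;
    end
    [idx, ~, sumd] = kmeans(E_R, k_R, 'MaxIter', max_iter);
    if sum(sumd) < best_cost_R
        best_cost_R = sum(sumd); best_R = idx;
    end
end

% Offset right-hemisphere labels so IDs are unique across hemispheres
parcel = [best_L; best_R + k_L];
disp(unique(parcel)');
```

The built-in 'Replicates' option of kmeans achieves a similar effect in a single call. The explicit loop is shown only because it mirrors the two-step description above.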
If you have pre-computed brain networks:
- Prepare your brain network file:
  - Format: MATLAB .mat file
  - Variable: rst_50 (or adjust the variable name in run_untamed.m)
  - Structure: rst_50{2, 1} should contain [num_vertices × num_networks] spatial maps
  - Surface space: Must be in fsaverage6 space (or adjust vertex numbers accordingly)
- Update the script:
    % In run_untamed.m, line 27, change:
    load('artifact/nascar_GSP_rst.mat');
    % To:
    load('path/to/your/brain_network_file.mat');
- Adjust vertex numbers (if not using fsaverage6):
    % In run_untamed.m, lines 33-34, update:
    LEFT_VERT_NUM = your_left_vertex_count;
    RIGHT_VERT_NUM = your_right_vertex_count;
- Update spatial constraint files (if needed):
  - The script uses files/fsavg6_nb1_index.mat for spatial constraints
  - For different surface spaces, you'll need to generate corresponding neighborhood files
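Before running the pipeline on a custom file, it may help to check that the variable has the expected layout. The sketch below builds a synthetic rst_50 in place of a real load call, so the data are hypothetical. The checks themselves follow the format requirements listed above.

```matlab
% Hypothetical stand-in for load('path/to/your/brain_network_file.mat')
LEFT_VERT_NUM = 37476; RIGHT_VERT_NUM = 37471;
rst_50 = cell(2, 1);
rst_50{2, 1} = rand(LEFT_VERT_NUM + RIGHT_VERT_NUM, 3);   % 3 toy networks

spatial = rst_50{2, 1};

% Spatial maps must be a numeric [num_vertices x num_networks] matrix
assert(isnumeric(spatial) && ismatrix(spatial), ...
    'rst_50{2, 1} must be a numeric matrix');

% Row count must match the hemisphere vertex counts used by the script
assert(size(spatial, 1) == LEFT_VERT_NUM + RIGHT_VERT_NUM, ...
    'Expected %d vertices, found %d', ...
    LEFT_VERT_NUM + RIGHT_VERT_NUM, size(spatial, 1));

% Non-finite values would propagate into the correlation matrix
assert(all(isfinite(spatial(:))), 'Spatial maps contain NaN or Inf');

fprintf('%d vertices, %d networks\n', size(spatial, 1), size(spatial, 2));
```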
If you need to generate NASCAR networks from scratch:
- Prepare your fMRI data:
  - Format: 3D tensor [vertices × timepoints × subjects]
  - Preprocessing: Standard fMRI preprocessing (motion correction, registration to fsaverage6, etc.)
- Temporal alignment (recommended especially for non-task fMRI data):
    % Use GroupBrainSync for temporal alignment
    addpath('helper/GroupBrainSync');
    aligned_data = groupBrainSync(your_fmri_tensor);
- Run NASCAR decomposition:
    addpath('helper/NASCAR_v2');
    option = srscpd('opt');
    option.nonnegative = [0 0 1];  % Non-negative constraint on spatial mode
    option.rankOneOptALS.useTFOCS = false;
    option.optAlg.normRegParam = 0.001;
    option.optAlg.optSolver.maxNumIter = 2000;
    R = 50;  % Desired number of networks
    result = srscpd(aligned_data, R, option);
    % Extract spatial maps
    spatial_maps = result(R).U{1};  % [vertices × R]
- Save in the expected format:
    rst_50{2, 1} = spatial_maps;
    save('your_nascar_output.mat', 'rst_50');
- Use the saved file as described in Option 1 above.
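The 3D tensor layout required in the data-preparation step can be assembled from per-subject matrices with cat. The sketch below uses small random matrices. The cell array subj_data and the toy sizes are hypothetical, and real data would come from your own preprocessing output.

```matlab
% Hypothetical per-subject data: each cell is [vertices x timepoints]
n_vert = 10; n_time = 6; n_subj = 3;
subj_data = cell(1, n_subj);
for s = 1:n_subj
    subj_data{s} = randn(n_vert, n_time);
end

% All subjects must share the same vertex and timepoint counts
sizes = cellfun(@(x) size(x), subj_data, 'UniformOutput', false);
assert(all(cellfun(@(sz) isequal(sz, sizes{1}), sizes)), ...
    'Subject sizes differ');

% Stack along the third dimension: [vertices x timepoints x subjects]
your_fmri_tensor = cat(3, subj_data{:});
disp(size(your_fmri_tensor));
```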
These parameters control the behavior of the Untamed pipeline and can be adjusted in run_untamed.m:
- nb (default: 1 for testing purposes only, use 20-50 for real data)
  - Spatial neighborhood constraint level
  - Controls how many neighbors are considered for spatial constraints
  - Higher values allow connections to more distant neighbors
- sigma (default: 0.5)
  - Gaussian kernel bandwidth for similarity transformation
  - Controls the decay rate of similarity with correlation
  - Lower values create more sparse graphs
  - Range: 0.1-1.0
- T (default: 7)
  - NetMF embedding window length
  - Controls the context window for random walk approximation
  - Higher values capture longer-range dependencies
  - Range: 3-10
- embed_dim (default: 128)
  - Dimensionality of output embeddings
  - Higher dimensions capture more information but require more memory
  - Range: 128 / 256
- alpha (default: 0.5)
  - Power scaling factor for embeddings
  - Controls the influence of singular values
  - Range: 0.0-1.0
- negative_sampling (default: 1)
  - Flag for negative sampling in NetMF
- cluster_num_left (default: 50)
  - Desired number of parcels in the left hemisphere
  - Adjust based on your resolution needs
  - Range: 10-200+
- cluster_num_right (default: 50)
  - Desired number of parcels in the right hemisphere
  - Can be different from left hemisphere
- km_max_iter (default: 20000)
  - Maximum iterations for each k-means run
  - Increase if convergence issues occur
- exp_num (default: 500)
  - Number of independent k-means runs
  - Higher values improve robustness but increase computation time
  - Range: 100-1000
- For higher resolution parcellations: Increase cluster_num_left and cluster_num_right
- For faster computation: Reduce exp_num, embed_dim, or T
- For better spatial coherence: Increase nb or adjust sigma
- For different surface spaces: Update LEFT_VERT_NUM and RIGHT_VERT_NUM, and provide corresponding neighborhood files
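To get a feel for how sigma shapes the edge weights, the following sketch compares the weight of a perfectly correlated vertex pair (correlation 1) with that of an uncorrelated pair (correlation 0) under the kernel exp(c / (2 * sigma^2)) used in graph construction. The three sigma values are simply the ends and the default of the suggested range.

```matlab
% Ratio between the weight of a perfectly correlated pair (c = 1)
% and an uncorrelated pair (c = 0) under w = exp(c / (2 * sigma^2))
sigmas = [0.1 0.5 1.0];
ratio = exp(1 ./ (2 * sigmas.^2)) ./ exp(0 ./ (2 * sigmas.^2));
for k = 1:numel(sigmas)
    fprintf('sigma = %.1f -> weight ratio = %.4g\n', sigmas(k), ratio(k));
end
```

With the default sigma = 0.5 the ratio is exp(2), about 7.39. With sigma = 1.0 it drops to exp(0.5), about 1.65, and with sigma = 0.1 it is exp(50), so weakly correlated edges become negligible by comparison. This is the sense in which lower sigma values yield effectively sparser graphs.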
All outputs are saved in ./untamed_output/ directory:
- adj.mat
  - Contains: nascar_corrL_nb, nascar_corrR_nb
  - Description: Graph adjacency matrices for left and right hemispheres
  - Format: Sparse or full matrices [num_vertices × num_vertices]
- embed.mat
  - Contains: embed_L, embed_R
  - Description: Vertex embeddings for left and right hemispheres
  - Format: Cell arrays with embeddings [num_vertices × embed_dim]
  - Structure: embed_L{1, 2} contains the actual embedding matrix
- parcel.mat
  - Contains: parcel
  - Description: Final parcellation labels
  - Format: Integer vector [74947 × 1] (for fsaverage6)
  - Values: Parcel IDs (1 to cluster_num_left for left, cluster_num_left+1 to cluster_num_left+cluster_num_right for right)
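The parcel ID convention (left-hemisphere IDs first, right-hemisphere IDs continuing from cluster_num_left+1) can be checked on a toy label vector as follows. The vector and the small sizes are hypothetical. With real output, parcel would come from parcel.mat and the left-hemisphere length would be 37476 for fsaverage6.

```matlab
% Toy stand-in for the loaded parcel vector (hypothetical sizes)
n_left = 6; n_right = 5;
cluster_num_left = 2; cluster_num_right = 3;
parcel = [1 1 2 2 1 2, 3 4 5 5 3]';

% Split by hemisphere: left vertices come first
parcel_L = parcel(1:n_left);
parcel_R = parcel(n_left+1:end);

% Left IDs fall in 1..cluster_num_left, right IDs follow on from there
assert(all(parcel_L >= 1 & parcel_L <= cluster_num_left));
assert(all(parcel_R > cluster_num_left & ...
           parcel_R <= cluster_num_left + cluster_num_right));

% Number of vertices assigned to each parcel ID
counts = accumarray(parcel, 1);
disp(counts');
```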
Load and visualize the parcellation:
load('./untamed_output/parcel.mat');
% parcel is a [74947 × 1] vector with parcel labels
% First 37476 entries: left hemisphere
% Remaining entries: right hemisphere
Export for visualization tools:
% For FreeSurfer/Connectome Workbench
% Save as annotation file or gifti format
% (You may need additional scripts for format conversion)
Solution:
- Reduce embed_dim (e.g., from 128 to 64)
- Process hemispheres separately if not already doing so
- Close other MATLAB processes
- Use a machine with more RAM
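A quick back-of-the-envelope calculation shows where the memory goes. Assuming dense double-precision storage (8 bytes per element), the sketch below estimates the size of the full vertex-by-vertex correlation matrix and of a single left-hemisphere adjacency matrix in fsaverage6 space.

```matlab
% Memory for dense double-precision matrices (8 bytes per element)
n_left = 37476; n_total = 74947;
bytes_left = n_left^2 * 8;     % one left-hemisphere adjacency matrix
bytes_full = n_total^2 * 8;    % full vertex-by-vertex correlation matrix
fprintf('Left hemisphere adjacency: %.1f GB\n', bytes_left / 1e9);
fprintf('Full correlation matrix:   %.1f GB\n', bytes_full / 1e9);
```

This works out to roughly 11.2 GB per hemisphere matrix and 44.9 GB for the full correlation matrix, before counting any temporary copies made during the kernel transformation or embedding. That is consistent with the 100GB+ RAM recommendation.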
Solution:
- Increase km_max_iter (e.g., to 50000)
- Check if embeddings contain NaN or Inf values
- Try different random seeds
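The NaN/Inf check and the seed change can be done in a few lines. The sketch below uses a small synthetic matrix E with one deliberately corrupted entry. With real output you would apply the same check to the embedding matrix loaded from embed.mat.

```matlab
% Toy embedding with one corrupted entry (hypothetical data)
E = rand(5, 3);
E(2, 3) = NaN;

% Rows (vertices) whose embedding contains NaN or Inf
bad_rows = find(any(~isfinite(E), 2));
fprintf('%d of %d rows contain non-finite values\n', ...
    numel(bad_rows), size(E, 1));

% Fix the random seed before clustering so a run can be reproduced;
% change the value to try a different initialization
rng(42);
```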
Solution:
- Increase nb (spatial neighborhood constraint)
- Adjust sigma to create stronger spatial connections
- Check that spatial constraint files are correct for your surface space
Solution:
- Verify LEFT_VERT_NUM and RIGHT_VERT_NUM match your data
- Check that your networks are in the correct surface space
- Ensure hemisphere separation indices are correct
Solution:
- Ensure you're running from the repository root directory
- Check that files/fsavg6_nb1_index.mat exists
- Verify artifact/nascar_GSP_rst.mat is present (or your custom file)
Solution:
- Reduce exp_num (fewer k-means runs)
- Reduce embed_dim or T
- Use a machine with more CPU cores (some functions support parallel processing)
- Check if you can use sparse matrices for adjacency matrices
- Check the code comments in run_untamed.m for detailed explanations
- Review the NASCAR documentation in helper/NASCAR_v2/README
- Ensure all required files are in the correct locations
- Verify MATLAB version compatibility (R2016b+)
- run_untamed.m: Main script to generate parcellations
- artifact/nascar_GSP_rst.mat: Pre-computed NASCAR networks from GSP dataset
- helper/: Supporting functions and toolboxes
  - NASCAR_v2/: NASCAR tensor decomposition implementation
  - GroupBrainSync/: Temporal alignment for multi-subject data
  - NetMF embedding functions
  - Clustering utilities
- files/: Required data files (spatial constraints, vertex mappings)
- vis_spatial_corr.ipynb: Jupyter notebook for visualizing spatial correlations of NASCAR networks and ICA networks
- figs/untamed_framework.jpg: Pipeline diagram
If you use Untamed in your research, please cite:
@article{untamed2025,
title={Untamed: Unconstrained Tensor Decomposition and Graph Node Embedding for Cortical Parcellation},
author={Liu, Yijun and Li, Jian and Wisnowski, Jessica L and Leahy, Richard M},
journal={bioRxiv},
pages={2024.01.05.574423},
year={2025},
publisher={Cold Spring Harbor Laboratory}
}
If you use the NASCAR component, please also cite:
@article{li2023identification,
title={Identification of overlapping and interacting networks reveals intrinsic spatiotemporal organization of the human brain},
author={Li, Jian and Liu, Yijun and Wisnowski, Jessica L and Leahy, Richard M},
journal={NeuroImage},
volume={270},
pages={119944},
year={2023},
publisher={Elsevier}
}