A high-performance, CPU-first neural network framework written in Go, with experimental WebGPU compute shaders for GPU acceleration (in development; only select layers are supported) and WebAssembly export for browser deployment. Now with transformer inference support!
Models trained on one platform work instantly on all the others, with bit-for-bit identical results across Go, Python, C#, TypeScript, and browser WASM.
| Platform | Package | Install |
|---|---|---|
| Go | GitHub | `go get github.com/openfluke/loom` |
| Python | PyPI | `pip install welvet` |
| C#/.NET | NuGet | `dotnet add package Welvet` |
| TypeScript/Node | NPM | `npm install @openfluke/welvet` |
| Browser | WASM | `import { init } from "@openfluke/welvet"` |
Pre-compiled binaries for:
- Linux: x86_64, ARM64, ARMv7
- Windows: x86_64, x86, ARM64
- macOS: Apple Silicon (M1/M2/M3), Intel, Universal
- Android: ARM64, ARMv7
- iOS: ARM64 (XCFramework)
- True Embeddability: Single binary. Zero external dependencies. No Python runtime needed.
- Hybrid Gradient/Geometric Engine: Neural Tweening combines geometric gap-closing with backpropagation-guided momentum for real-time adaptation.
- Structural Parallelism: Native support for Inception, ResNeXt, Siamese, and MoE architectures via `LayerParallel` with 6 combine modes.
- Native Mixed-Precision: Generic tensor backend natively supports `int8`, `uint16`, `float32`, and `float64`.
- Complete Training Infrastructure: 7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants.
- Pure Go Tokenizer: HuggingFace-compatible BPE tokenizer for LLM inference.
- Step-Based Execution: Real-time inference with layer-by-layer control via the `StepForward` API (see the sketch after this list).
- Network Telemetry: Runtime introspection via `GetMethodsJSON()` and `ExtractNetworkBlueprint()`.
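To make the step-based execution model concrete, here is a self-contained sketch of the pattern, a hedged illustration only, not Loom's actual `StepForward` API (whose exact signature lives in the `nn` package docs): the host advances one layer at a time and can run telemetry or deadline checks between steps.

```go
package main

import "fmt"

// A layer is just a function from activations to activations.
type layer func([]float32) []float32

func main() {
	// Two toy layers: scale by 2, then ReLU.
	layers := []layer{
		func(x []float32) []float32 {
			out := make([]float32, len(x))
			for i, v := range x {
				out[i] = 2 * v
			}
			return out
		},
		func(x []float32) []float32 {
			out := make([]float32, len(x))
			for i, v := range x {
				if v > 0 {
					out[i] = v
				}
			}
			return out
		},
	}

	// Step-based execution: advance exactly one layer per iteration, with a
	// hook between steps (inspect activations, check deadlines, yield to UI).
	act := []float32{-1, 0.5, 3}
	for i, step := range layers {
		act = step(act)
		fmt.Printf("after layer %d: %v\n", i, act) // per-step hook
	}
}
```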
- Ecosystem Maturity: No central "Model Zoo" or pip-installable convenience; relies on loading external checkpoints.
- GPU Support: WebGPU acceleration is implemented (Dense, Conv2D, MHA) but remains beta/experimental and is less stable than cuDNN/CUDA-backed stacks.
- Operator Coverage: While "Deep" support is good (MHA, LSTM), "Broad" support (e.g., 3D Conv, Deformable Attn, FFTs) is missing compared to SciPy/JAX.
- Math Backend: Relies on custom explicit forward/backward passes rather than a general-purpose symbolic autograd graph.
🎉 Transformer Inference: SmolLM2-135M-Instruct runs entirely in browser WASM with pure Go implementation.
🤯 Grid Softmax = Native MoE: Mathematically proven equivalent to PyTorch MoE with 97.1% loss reduction. See `examples/moe_proof_demo.go` (a minimal gating illustration follows this list).
⚡ Grid Scatter Mode: Place parallel branch outputs at specific 2D/3D grid positions for multi-agent systems, hierarchical RL, and ensemble methods with explicit topology.
🧠 Neural Tweening: Trains and runs simultaneously, holds 100% accuracy on shallow networks, and never crashes to 0% during task changes. Benchmarks →
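For intuition on why a grid softmax acts as a native MoE router, here is a minimal, self-contained sketch (an illustration of the math, not Loom's implementation): applying softmax independently inside each grid cell turns every cell's scores into its own gating distribution over experts.

```go
package main

import (
	"fmt"
	"math"
)

// gridSoftmax applies a numerically stable softmax independently to each
// contiguous group of groupSize scores (assumes len(scores) is a multiple
// of groupSize). Each group then sums to 1 -- a per-cell gating
// distribution, which is exactly the role of an MoE router.
func gridSoftmax(scores []float64, groupSize int) []float64 {
	out := make([]float64, len(scores))
	for start := 0; start < len(scores); start += groupSize {
		end := start + groupSize
		max := math.Inf(-1)
		for _, v := range scores[start:end] {
			max = math.Max(max, v)
		}
		var sum float64
		for i, v := range scores[start:end] {
			out[start+i] = math.Exp(v - max) // shift by max for stability
			sum += out[start+i]
		}
		for i := start; i < end; i++ {
			out[i] /= sum
		}
	}
	return out
}

func main() {
	// Two cells of three "experts" each; each cell's weights sum to 1.
	fmt.Println(gridSoftmax([]float64{1, 2, 3, 0, 0, 5}, 3))
}
```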
| Feature Category | Feature | Loom (Go) | PyTorch (Py) | TF / TFLite | GoMLX (Go) | Spago (Go) | Core ML | TF.js | Candle (Rust) |
|---|---|---|---|---|---|---|---|---|---|
| Core | Primary Language | Go | Python | Python / C++ | Go | Go | Swift / ObjC | JS / TS | Rust |
| | Runtime Dependency | None (Binary) | Heavy (Pip) | Binary (Edge) | CGo / XLA | None | OS-Native | Browser | None |
| | Auto-Differentiation | ✅ Full | ✅ Full | ✅ Full | ✅ Full (XLA) | ✅ Manual | ❌ (Inference) | ✅ Full | ✅ Full |
| Loading | Safetensors | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| | ONNX Support | ❌ | ✅ (Export) | ✅ | ❌ | ✅ (Import) | ✅ | | |
| | Structure Inference | ✅ Auto-Detect | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training | Gradient Descent | ✅ Manual Chain | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | ✅ (On-device) | ✅ Standard | ✅ Standard |
| | Neural Tweening | ✅ Hybrid Engine | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | LR Schedulers | ✅ 7 Types | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| | Optimizers | ✅ 3 (SGD/AdamW/RMSprop) | ✅ Many | ✅ Many | ✅ | ✅ | ✅ | ✅ | |
| Layer Support | Dense (MLP) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Conv2D | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| | Conv1D | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| | RNN / LSTM | ✅ Full Gate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Transformer (MHA) | ✅ (Explicit) | ✅ | ✅ | ✅ | ✅ (BERT) | ✅ | ✅ | ✅ |
| | SwiGLU | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| | Parallel / MoE | ✅ Structure | ❌ (Manual) | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Sequential Layers | ✅ Native | ✅ | ✅ | ✅ | | | | |
| | Embeddings | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | Tokenizer | ✅ Pure Go | ❌ (Rust/C++) | ❌ (C++) | ❌ | ❌ | ✅ | ❌ | ✅ |
| Normalization | LayerNorm | ✅ Native | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | RMSNorm | ✅ Native | ✅ | ❌ | ❌ | ❌ | ✅ | | |
| | Residual/Skip | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Advanced | Stitch Layers | ✅ Native | ❌ (Manual) | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Dynamic Arch Gen | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | K-Means Clustering | ✅ Parallel | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Correlation Analysis | ✅ Pearson/Spearman | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Model Evaluation | ✅ Deviation/Metrics | ✅ | ✅ | | | | | |
| | Network Telemetry | ✅ Blueprint API | ❌ | ❌ | ❌ | ❌ | ❌ | | |
| | Runtime Introspection | ✅ Reflection | ❌ | ❌ | ❌ | ❌ | | | |
| Platform | WASM Training | ✅ Full | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ (Slow) | ✅ |
| | Cross-Lang ABI | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
| Ecosystem | HuggingFace Hub | ✅ Native | ✅ Native | ❌ | ✅ | ❌ | ✅ | ✅ | |
| | Pre-trained Zoo | ❌ | ✅ Massive | ✅ Massive | ❌ | ✅ (Small) | ✅ (Apple) | ✅ Large | |
| | Mobile/Web | ✅ WASM / C-ABI | ✅ (Mobile) | ✅ King | ❌ | ❌ | ✅ King (iOS) | ✅ King (Web) | ✅ (WASM) |
| Category | Feature | Loom | GoMLX | Gorgonia | Spago | Go-Deep | Gonum |
|---|---|---|---|---|---|---|---|
| Foundation | Primary implementation | Pure Go | CGo (XLA) | Pure Go + CGo | Pure Go | Pure Go | Pure Go |
| | Tensor Backend | Custom (Generic) | XLA (C++) | Custom | Custom (Dense) | Custom | Dense Matrix |
| | Autograd | ✅ Full | ✅ Symbolic | ✅ Dynamic | ✅ Backprop | ❌ | ❌ |
| Model | Load Safetensors | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Model Export | binary/json | XLA format | ONNX (Import) | Gob | JSON | ❌ |
| Architecture | Dense (MLP) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (Matrix Mul) |
| | Conv2D | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| | Conv1D | ✅ Native | ✅ | ❌ | ❌ | | |
| | RNN / LSTM | ✅ Full Gate | ✅ | | ✅ BiLSTM | ❌ | ❌ |
| | Transformer (MHA) | ✅ Explicit | ✅ | | ✅ (BERT) | ❌ | ❌ |
| | SwiGLU | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Embeddings | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| | Parallel / MoE | ✅ MoE + Gating | ❌ (Manual) | ❌ | ❌ | ❌ | ❌ |
| | Sequential Layers | ✅ Native + Nested | ❌ | ❌ | | | |
| | Tokenizer | ✅ Pure Go | ❌ (Deps) | ❌ | ✅ (WordPiece) | ❌ | ❌ |
| Training | Gradient Descent | ✅ Manual | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | ❌ |
| | Hybrid Tweening | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | LR Schedulers | ✅ 7 Types | ✅ | ✅ | ❌ | ❌ | |
| | Optimizers | ✅ SGD/AdamW/RMSprop | ✅ | ✅ | ✅ | ❌ | |
| | Softmax Variants | ✅ 10 Types | ❌ | | | | |
| Normalization | LayerNorm | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ |
| | RMSNorm | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Residual/Skip | ✅ Native | ✅ | ✅ | ❌ | ❌ | ❌ |
| Advanced | RoPE Embeddings | ✅ GQA Support | ✅ | ❌ | ❌ | ❌ | ❌ |
| | Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Dynamic Arch Gen | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| | K-Means Clustering | ✅ Parallel | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Correlation Analysis | ✅ Pearson/Spearman | ❌ | ❌ | ❌ | ❌ | ❌ |
| | Model Evaluation | ✅ Full Suite | ❌ | ❌ | | | |
| | Network Telemetry | ✅ Blueprint | ❌ | ❌ | ❌ | ❌ | |
| | Runtime Introspection | ✅ Reflection | ❌ | ❌ | ❌ | ❌ | |
| Platform | C-ABI (Polyglot) | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ |
| | WASM Training | ✅ Full | ❌ (XLA) | ❌ | ❌ | ❌ | ❌ |
| Ecosystem | HuggingFace | ✅ Native | ❌ | ❌ | ✅ (Load) | ❌ | ❌ |
| | Documentation | ✅ Good | ✅ Good | ✅ Good | ✅ Excellent | | |
| | Maintenance | 🔥 Active | 🔥 Active | ⏸️ Paused | 🔥 Active | | |
| Layer Type | Numerical Type | Loom | GoMLX | Gorgonia | Spago | PyTorch |
|---|---|---|---|---|---|---|
| All Layers | Float32 | ✅ | ✅ | ✅ | ✅ (Float64) | ✅ |
| (Dense, Conv, | Float64 (High Prec) | ✅ Native | ✅ | ✅ | ✅ | ✅ |
| RNN, Attn) | Float16 / BF16 | ❌ | ✅ (XLA) | ❌ | ❌ | ✅ |
| | Int8 (Training) | ✅ Native | ❌ | ❌ | ❌ | ❌ |
| | Int8 (Inference) | ✅ | ❌ | ❌ | ❌ | ✅ (Quant) |
| | Int16, Int32, Int64 | ✅ Native | ✅ (XLA) | ❌ | ❌ | ❌ (Tensor Only) |
| | Uint8, Uint16, Uint32 | ✅ Native | ✅ (XLA) | ❌ | ❌ | ✅ (Uint8 Only) |
> [!NOTE]
> Complete Type System: Unlike frameworks that treat integers primarily as storage formats for quantization, Loom's generics allow native training and inference on exotic types like `uint16` (common in medical imaging), `int32`, or `float64` (scientific simulation) across every layer type, without changes to the model code.
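As a concrete picture of what that buys you, here is a self-contained Go sketch of the technique (illustrative only, not Loom's actual tensor backend): a single generic dense forward pass that runs unchanged on `uint16` and `float64`.

```go
package main

import "fmt"

// Numeric lists the element types a generic layer can run on natively.
type Numeric interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 |
		~float32 | ~float64
}

// denseForward computes y = Wx + b for any Numeric element type, so the
// same layer code serves uint16 medical images and float64 simulations.
func denseForward[T Numeric](w [][]T, x, b []T) []T {
	y := make([]T, len(w))
	for i, row := range w {
		acc := b[i]
		for j, wij := range row {
			acc += wij * x[j]
		}
		y[i] = acc
	}
	return y
}

func main() {
	fmt.Println(denseForward([][]uint16{{1, 2}, {3, 4}}, []uint16{5, 6}, []uint16{0, 1}))
	fmt.Println(denseForward([][]float64{{0.5, -1}, {2, 0.25}}, []float64{4, 8}, []float64{0, 0}))
}
```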
- Choose PyTorch if you are doing Research, need the latest SOTA models, or rely on complex dynamic architectures.
- Choose TensorFlow / TFLite if you need robust Mobile/Edge Deployment.
- Choose GoMLX if you need High-Performance Training in Go and can tolerate CGo/C++ dependencies.
- Choose Core ML if you are targeting iOS/macOS exclusively.
- Choose Loom if you need Pure Go-Native Embedding (Cloud/CLI/Server), want a single binary with zero dependencies, need to experiment with the Neural Tweening training paradigm, or need unique features like Step-Based Forward Pass for real-time inference and Dynamic Architecture Generation for automated model exploration.
| Layer | Type String | Description |
|---|---|---|
| Dense | `dense` | Fully connected layer |
| LSTM | `lstm` | Long Short-Term Memory |
| RNN | `rnn` | Recurrent Neural Network |
| GRU | `gru` | Gated Recurrent Unit |
| Conv2D | `conv2d` | 2D Convolution |
| Conv1D | `conv1d` | 1D Convolution |
| Multi-Head Attention | `multi_head_attention` | Transformer attention |
| LayerNorm | `layer_norm` | Layer normalization |
| RMSNorm | `rms_norm` | RMS normalization |
| SwiGLU | `swiglu` | SwiGLU activation layer |
| Softmax | `softmax` | 10 variants (Standard, Grid, Hierarchical, Temperature, Gumbel, Masked, Sparsemax, Entmax, Adaptive, Mixture) |
| Embedding | `embedding` | Token embedding |
| Parallel | `parallel` | Branching with 6 combine modes (add, concat, multiply, average, grid_scatter, filter) |
| Sequential | `sequential` | Grouped sub-layers |
`relu`, `sigmoid`, `tanh`, `softmax`, `gelu`, `swish`, `mish`, `leaky_relu`, `elu`, `selu`, `linear`
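Putting the type strings and activations together, here is a sketch of how a network description might be assembled as JSON before being handed to `BuildNetworkFromJSON` (Go) or `create_network_from_json` (Python). The `layers`/`type`/`activation` field names are assumptions for illustration; check the `nn` package docs for Loom's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical config skeleton -- field names are assumed, but the
	// "type" and "activation" values come from the tables above.
	config := map[string]any{
		"batch_size": 1,
		"layers": []map[string]any{
			{"type": "dense", "activation": "relu"},
			{"type": "layer_norm"},
			{"type": "dense", "activation": "softmax"},
		},
	}

	blob, err := json.MarshalIndent(config, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(blob)) // the JSON you would pass to the create call
}
```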
```bash
# Clone the repository
git clone https://github.com/openfluke/loom.git
cd loom

# Install dependencies
go mod download
```

```go
package main

import (
	"fmt"

	"github.com/openfluke/loom/nn"
)

func main() {
	network := nn.NewNetwork(4096, 4, 4, 5) // 80 total layers

	if err := network.InitGPU(); err != nil {
		panic(err)
	}
	defer network.ReleaseGPU()

	input := make([]float32, 4096)
	output, gpuTime, _ := network.ForwardGPU(input)
	fmt.Printf("GPU Forward time: %v, Output size: %d\n", gpuTime, len(output))
}
```

```go
// Save a trained model
err := network.SaveModel("model.json", "my_model")

// Load it back - ONE LINE!
loadedNet, err := nn.LoadModel("model.json", "my_model")

// Or use strings (great for APIs/databases/WASM)
jsonString, err := network.SaveModelToString("my_model")
loadedNet, err := nn.LoadModelFromString(jsonString, "my_model")
```

| Function | Go | Python | TypeScript | C# | C |
|---|---|---|---|---|---|
| Create | `BuildNetworkFromJSON()` | `create_network_from_json()` | `createNetworkFromJSON()` | `CreateLoomNetwork()` | `CreateLoomNetwork()` |
| Forward | `ForwardCPU()` | `forward_simple()` | `forward()` | `LoomForward()` | `LoomForward()` |
| Train | `Train()` | `train_simple()` | `train()` | `LoomTrain()` | `LoomTrain()` |
| Save | `SaveModelToString()` | `save_model_simple()` | `saveModel()` | `LoomSaveModel()` | `LoomSaveModel()` |
| Load | `LoadModelFromString()` | `load_model_simple()` | `loadLoomNetwork()` | `LoomLoadModel()` | `LoomLoadModel()` |
| Evaluate | `EvaluateNetwork()` | `evaluate_network_simple()` | `evaluate()` | `LoomEvaluateNetwork()` | `LoomEvaluateNetwork()` |
```bash
pip install welvet
```

```python
import welvet

config = {"batch_size": 1, "layers": [...]}
welvet.create_network_from_json(config)
output = welvet.forward_simple([0.1, 0.2, 0.3, 0.4])
```

See python/README.md for complete documentation.
```bash
npm install @openfluke/welvet
```

```typescript
import { init, createNetworkFromJSON } from "@openfluke/welvet";

await init();
const network = createNetworkFromJSON(JSON.stringify(config));
const output = network.ForwardCPU(JSON.stringify([[0.1, 0.2, 0.3, 0.4]]));
```

See typescript/README.md for complete documentation.
```bash
dotnet add package Welvet
```

```csharp
using Welvet;

Network.CreateFromJson(config);
var output = NativeMethods.LoomForward(input, input.Length);
```

See csharp/README.md for complete documentation.
```
loom/
├── nn/                # Neural network package (core)
├── tokenizer/         # Pure Go BPE tokenizer
├── wasm/              # WebAssembly module
├── cabi/              # C ABI for FFI
├── python/            # Python package (welvet)
├── typescript/        # TypeScript/WASM package
├── csharp/            # C#/.NET package (Welvet)
├── fabric/            # Demo application
├── pods/              # GPU compute pods
├── model_conversion/  # HuggingFace model import
├── docs/              # Documentation
└── detector/          # GPU device detection
```
- Neural Network Package - Detailed API documentation
- Neural Tweening Benchmarks - 19-test comprehensive benchmark
- Python Bindings - PyPI package docs
- TypeScript Bindings - NPM package docs
- C# Bindings - NuGet package docs
- WASM Module - Browser deployment
- C ABI - FFI reference
- Model Conversion - HuggingFace import guide
More Examples: See github.com/openfluke/tva for additional examples and experiments.
- Go: 1.24 or higher
- GPU: WebGPU-compatible GPU (Vulkan, Metal, or D3D12) - optional
- OS: Linux, macOS, or Windows
Contributions are welcome! Please feel free to submit a Pull Request.
Apache License 2.0 - see LICENSE file for details.
Made with ❤️ by Openfluke