LOOM - Layered Omni-architecture Openfluke Machine

A high-performance, CPU-first neural network framework written in Go, with experimental WebGPU compute shaders for GPU acceleration (in development; only select layers supported). Features WebAssembly export for browser deployment. Now with transformer inference support!


🌍 Cross-Ecosystem Compatibility

Models trained on any platform work instantly on all others, with bit-for-bit identical results across Go, Python, C#, TypeScript, and browser WASM.

| Platform | Package | Install |
| --- | --- | --- |
| Go | GitHub | go get github.com/openfluke/loom |
| Python | PyPI | pip install welvet |
| C#/.NET | NuGet | dotnet add package Welvet |
| TypeScript/Node | NPM | npm install @openfluke/welvet |
| Browser | WASM | import { init } from "@openfluke/welvet" |

Supported Platforms

Pre-compiled binaries for:

  • Linux: x86_64, ARM64, ARMv7
  • Windows: x86_64, x86, ARM64
  • macOS: Apple Silicon (M1/M2/M3), Intel, Universal
  • Android: ARM64, ARMv7
  • iOS: ARM64 (XCFramework)

Key Strengths

  • True Embeddability: Single binary. Zero external dependencies. No Python runtime needed.
  • Hybrid Gradient/Geometric Engine: Neural Tweening combines geometric gap-closing with backpropagation-guided momentum for real-time adaptation.
  • Structural Parallelism: Native support for Inception, ResNeXt, Siamese, and MoE architectures via LayerParallel with 6 combine modes.
  • Native Mixed-Precision: Generic tensor backend supports int8, uint16, float32, float64 natively.
  • Complete Training Infrastructure: 7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants.
  • Pure Go Tokenizer: HuggingFace-compatible BPE tokenizer for LLM inference.
  • Step-Based Execution: Real-time inference with layer-by-layer control via the StepForward API (see the sketch after this list).
  • Network Telemetry: Runtime introspection via GetMethodsJSON() and ExtractNetworkBlueprint().
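
A minimal sketch of the step-based execution and telemetry features above. StepForward, GetMethodsJSON(), and ExtractNetworkBlueprint() are the documented entry points; the constructor arguments and exact call shapes shown here are assumptions, not confirmed signatures:

package main

import (
    "fmt"

    "github.com/openfluke/loom/nn"
)

func main() {
    // Constructor arguments follow the Quick Start example further below.
    network := nn.NewNetwork(4096, 4, 4, 5)

    // Runtime introspection / telemetry.
    fmt.Println(network.GetMethodsJSON())          // callable surface as JSON
    fmt.Println(network.ExtractNetworkBlueprint()) // architecture blueprint

    // Step-based execution (call shape assumed): drive the forward pass
    // one layer at a time instead of one monolithic Forward call, e.g.
    //   for !done { output, done = network.StepForward(input) }
}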

Key Limitations

  • Ecosystem Maturity: No central "Model Zoo" or one-line pretrained-model installs; relies on loading external checkpoints.
  • GPU Support: WebGPU acceleration is implemented (Dense, Conv2D, MHA) but remains beta/experimental and is less mature than cuDNN/CUDA-backed stacks.
  • Operator Coverage: While "Deep" support is good (MHA, LSTM), "Broad" support (e.g., 3D Conv, Deformable Attn, FFTs) is missing compared to SciPy/JAX.
  • Math Backend: Relies on custom explicit forward/backward passes rather than a general-purpose symbolic autograd graph.

What's New

🎉 Transformer Inference: SmolLM2-135M-Instruct runs entirely in the browser via WASM, using the pure Go implementation.

🤯 Grid Softmax = Native MoE: mathematically proven equivalent to a PyTorch MoE, with 97.1% loss reduction. See examples/moe_proof_demo.go.

Grid Scatter Mode: Place parallel branch outputs at specific 2D/3D grid positions for multi-agent systems, hierarchical RL, and ensemble methods with explicit topology.
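
To make the explicit topology concrete, here is a minimal pure-Go sketch of the grid-scatter idea, independent of Loom's actual LayerParallel API: each branch's output vector is copied into an assigned cell of a 2D grid, so the combined output keeps its spatial structure:

package main

import "fmt"

func main() {
    const rows, cols, width = 2, 2, 3 // hypothetical grid shape and branch width

    branches := [][]float32{ // one output vector per parallel branch
        {0.1, 0.2, 0.3},
        {0.4, 0.5, 0.6},
        {0.7, 0.8, 0.9},
        {1.0, 1.1, 1.2},
    }
    positions := [][2]int{{0, 0}, {0, 1}, {1, 0}, {1, 1}} // target cell per branch

    combined := make([]float32, rows*cols*width)
    for b, out := range branches {
        r, c := positions[b][0], positions[b][1]
        copy(combined[(r*cols+c)*width:], out) // scatter into the grid cell
    }
    fmt.Println(combined)
}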

🧠 Neural Tweening: Train and run simultaneously, reaching 100% accuracy on shallow networks; accuracy never collapses to 0% during task changes. Benchmarks →


Framework Comparison

Global AI Landscape

| Feature Category | Feature | Loom (Go) | PyTorch (Py) | TF / TFLite | GoMLX (Go) | Spago (Go) | Core ML | TF.js | Candle (Rust) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Core | Primary Language | Go | Python | Python / C++ | Go | Go | Swift / ObjC | JS / TS | Rust |
| | Runtime Dependency | None (Binary) | Heavy (Pip) | Binary (Edge) | CGo / XLA | None | OS-Native | Browser | None |
| | Auto-Differentiation | ⚠️ Hybrid/Manual | ✅ Full | ✅ Full | ✅ Full (XLA) | ✅ Manual | ❌ (Inference) | ✅ Full | ✅ Full |
| Loading | Safetensors | Native | | | | | | | |
| | ONNX Support | | ✅ (Export) | ⚠️ | | | ✅ (Import) | | ⚠️ |
| | Structure Inference | Auto-Detect | | | | | | | |
| Training | Gradient Descent | ✅ Manual Chain | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | ✅ (On-device) | ✅ Standard | ✅ Standard |
| | Neural Tweening | Hybrid Engine | | | | | | | |
| | LR Schedulers | 7 Types | | | ⚠️ Basic | | | | |
| | Optimizers | 3 (SGD/AdamW/RMSprop) | ✅ Many | ✅ Many | ⚠️ | | | | |
| Layer Support | Dense (MLP) | ✅ | | | | | | | |
| | Conv2D | ✅ | | | | | | | |
| | Conv1D | Native | | | | | | | |
| | RNN / LSTM | Full Gate | | | | | | | |
| | Transformer (MHA) | ✅ (Explicit) | | | | ✅ (BERT) | | | |
| | SwiGLU | Native | | | | | | | |
| | Parallel / MoE Structure | ✅ | ❌ (Manual) | ❌ (Manual) | | | | | |
| | Sequential Layers | Native | ⚠️ | ⚠️ | ⚠️ | ⚠️ | | | |
| | Embeddings | ✅ | | | | | | | |
| | Tokenizer | Pure Go | ❌ (Rust/C++) | ❌ (C++) | | | | | |
| Normalization | LayerNorm | Native | | | | | | | |
| | RMSNorm | Native | ⚠️ (Manual) | ⚠️ (Manual) | | | | | |
| | Residual/Skip | Native | | | | | | | |
| Advanced | Stitch Layers | Native | ❌ (Manual) | ❌ (Manual) | | | | | |
| | Dynamic Arch Gen | Built-in | | | | | | | |
| | Step-Based Forward | Unique | | | | | | | |
| | K-Means Clustering | Parallel | | | | | | | |
| | Correlation Analysis | Pearson/Spearman | | | | | | | |
| | Model Evaluation | Deviation/Metrics | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | | |
| | Network Telemetry | Blueprint API | ⚠️ | ⚠️ | | | | | |
| | Runtime Introspection | Reflection | ⚠️ (Python) | ⚠️ | ⚠️ | | | | |
| Platform | WASM Training | Full | | | | | | ✅ (Slow) | |
| | Cross-Lang ABI | Universal | ⚠️ | | | | | | |
| Ecosystem | HuggingFace Hub | ⚠️ (Read/Inspect) | ✅ Native | | | | | | ✅ Native |
| | Pre-trained Zoo | ❌ | ✅ Massive | ✅ Massive | | ✅ (Small) | ✅ (Apple) | ✅ Large | ⚠️ Growing |
| | Mobile/Web | WASM / C-ABI | ✅ (Mobile) | King | | | King (iOS) | King (Web) | ✅ (WASM) |

Go Ecosystem Comparison

| Category | Feature | Loom | GoMLX | Gorgonia | Spago | Go-Deep | Gonum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Foundation | Primary implementation | Pure Go | CGo (XLA) | Pure Go + CGo | Pure Go | Pure Go | Pure Go |
| | Tensor Backend | Custom (Generic) | XLA (C++) | Custom | Custom (Dense) | Custom | Dense Matrix |
| | Autograd | ⚠️ Hybrid | ✅ Full | ✅ Symbolic | ✅ Dynamic | ✅ Backprop | |
| | Model Load (Safetensors) | Native | | | | | |
| | Model Export | binary/json | XLA format | ONNX (Import) | Gob | JSON | |
| Architecture | Dense (MLP) | ✅ | | | | | ✅ (Matrix Mul) |
| | Conv2D | ✅ | | | | | |
| | Conv1D | Native | ⚠️ (via 2D) | ⚠️ (via 2D) | | | |
| | RNN / LSTM | Full Gate | ⚠️ Basic | | ✅ BiLSTM | | |
| | Transformer (MHA) | Explicit | | ⚠️ Hard | ✅ (BERT) | | |
| | SwiGLU | ✅ | | | | | |
| | Embeddings | ✅ | | | | | |
| | Parallel / MoE | MoE + Gating | ❌ (Manual) | | | | |
| | Sequential Layers | Native + Nested | ⚠️ (Manual) | ⚠️ (Manual) | ⚠️ (Manual) | | |
| | Tokenizer | Pure Go | ❌ (Deps) | | ✅ (WordPiece) | | |
| Training | Gradient Descent | ✅ Manual | ✅ Standard | ✅ Standard | ✅ Standard | ✅ Standard | |
| | Hybrid Tweening | Unique | | | | | |
| | LR Schedulers | 7 Types | ⚠️ Basic | | | | |
| | Optimizers | SGD/AdamW/RMSprop | | | | ⚠️ SGD | |
| | Softmax Variants | 10 Types | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | |
| Normalization | LayerNorm | Native | | ⚠️ Manual | | | |
| | RMSNorm | Native | | | | | |
| | Residual/Skip | Native | | | | | |
| Advanced | RoPE Embeddings | GQA Support | | | | | |
| | Network Grafting | Unique | | | | | |
| | Step-Based Forward | Unique | | | | | |
| | Dynamic Arch Gen | Unique | | | | | |
| | K-Means Clustering | Parallel | | | | | |
| | Correlation Analysis | Pearson/Spearman | | | | | |
| | Model Evaluation | Full Suite | ⚠️ | ⚠️ | ⚠️ | | |
| | Network Telemetry | Blueprint | ⚠️ | | | | |
| | Runtime Introspection | Reflection | ⚠️ | | | | |
| Platform | C-ABI (Polyglot) | Universal | | | | | |
| | WASM Training | Full | ❌ (XLA) | | | | |
| Ecosystem | HuggingFace | ⚠️ (Load) | | | ✅ (Load) | | |
| | Documentation | ⚠️ Growing | ✅ Good | ✅ Good | ✅ Good | ⚠️ Minimal | ✅ Excellent |
| | Maintenance | 🔥 Active | 🔥 Active | ⚠️ Slow | ⏸️ Paused | ⚠️ Slow | 🔥 Active |

Native Numerical Type & Precision Support

| Layer Type | Numerical Type | Loom | GoMLX | Gorgonia | Spago | PyTorch |
| --- | --- | --- | --- | --- | --- | --- |
| All Layers (Dense, Conv, RNN, Attn) | Float32 | ✅ | | | ✅ (Float64) | |
| | Float64 (High Prec) | Native | | | | |
| | Float16 / BF16 | ⚠️ (Storage) | ✅ (XLA) | | | |
| | Int8 (Training) | Native | | | | ⚠️ (QAT Wrapper) |
| | Int8 (Inference) | ✅ | | | | ✅ (Quant) |
| | Int16, Int32, Int64 | Native | ✅ (XLA) | ⚠️ (Tensor) | | ❌ (Tensor Only) |
| | Uint8, Uint16, Uint32 | Native | ✅ (XLA) | ⚠️ (Tensor) | | ✅ (Uint8 Only) |

Note

Complete Type System: Unlike frameworks that treat integers primarily as storage formats for quantization, Loom's generics allow native training and inference on exotic types such as uint16 (common in medical imaging), int32, or float64 (scientific simulation) across every layer type, with no changes to the model code.
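
As a standalone illustration of what Go generics make possible here (this is not Loom's actual tensor API), the same forward-pass code can run natively on uint16 or float64:

package main

import "fmt"

// Number mirrors the kind of constraint a generic tensor backend implies;
// Loom's real constraint and API may differ.
type Number interface {
    ~uint16 | ~int32 | ~float32 | ~float64
}

// dot is a toy forward step: integer types accumulate in-type, so value
// ranges must be chosen to avoid overflow.
func dot[T Number](w, x []T) T {
    var acc T
    for i := range w {
        acc += w[i] * x[i]
    }
    return acc
}

func main() {
    fmt.Println(dot([]uint16{1, 2, 3}, []uint16{4, 5, 6}))      // 32
    fmt.Println(dot([]float64{0.5, 0.25}, []float64{2.0, 4.0})) // 2
}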

Summary Verdict

  • Choose PyTorch if you are doing Research, need the latest SOTA models, or rely on complex dynamic architectures.
  • Choose TensorFlow / TFLite if you need robust Mobile/Edge Deployment.
  • Choose GoMLX if you need High-Performance Training in Go and can tolerate CGo/C++ dependencies.
  • Choose Core ML if you are targeting iOS/macOS exclusively.
  • Choose Loom if you need Pure Go-Native Embedding (Cloud/CLI/Server), want a single binary with zero dependencies, need to experiment with the Neural Tweening training paradigm, or need unique features like Step-Based Forward Pass for real-time inference and Dynamic Architecture Generation for automated model exploration.

Layer Types & Features

Supported Layer Types

| Layer | Type String | Description |
| --- | --- | --- |
| Dense | dense | Fully connected layer |
| LSTM | lstm | Long Short-Term Memory |
| RNN | rnn | Recurrent Neural Network |
| GRU | gru | Gated Recurrent Unit |
| Conv2D | conv2d | 2D Convolution |
| Conv1D | conv1d | 1D Convolution |
| Multi-Head Attention | multi_head_attention | Transformer attention |
| LayerNorm | layer_norm | Layer normalization |
| RMSNorm | rms_norm | RMS normalization |
| SwiGLU | swiglu | SwiGLU activation layer |
| Softmax | softmax | 10 variants (Standard, Grid, Hierarchical, Temperature, Gumbel, Masked, Sparsemax, Entmax, Adaptive, Mixture) |
| Embedding | embedding | Token embedding |
| Parallel | parallel | Branching with 6 combine modes (add, concat, multiply, average, grid_scatter, filter) |
| Sequential | sequential | Grouped sub-layers |

Activation Functions

relu, sigmoid, tanh, softmax, gelu, swish, mish, leaky_relu, elu, selu, linear


Quick Start

Installation

# Clone the repository
git clone https://github.com/openfluke/loom.git
cd loom

# Install dependencies
go mod download

Simple Example

package main

import (
    "fmt"
    "github.com/openfluke/loom/nn"
)

func main() {
    network := nn.NewNetwork(4096, 4, 4, 5)  // 80 total layers

    if err := network.InitGPU(); err != nil {
        panic(err)
    }
    defer network.ReleaseGPU()

    input := make([]float32, 4096)
    output, gpuTime, err := network.ForwardGPU(input)
    if err != nil {
        panic(err)
    }

    fmt.Printf("GPU Forward time: %v, Output size: %d\n", gpuTime, len(output))
}
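
Because Loom is CPU-first, the GPU calls above are optional. A minimal sketch of the CPU path, assuming ForwardCPU (listed in the Cross-Platform API table below) mirrors the shape of the GPU call:

// CPU path: no InitGPU/ReleaseGPU needed, zero external dependencies.
// The exact return values of ForwardCPU are assumed here.
cpuOut := network.ForwardCPU(input)
fmt.Printf("Output size: %d\n", len(cpuOut))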

Model Serialization

// Save a trained model
err := network.SaveModel("model.json", "my_model")

// Load it back - ONE LINE!
loadedNet, err := nn.LoadModel("model.json", "my_model")

// Or use strings (great for APIs/databases/WASM)
jsonString, err := network.SaveModelToString("my_model")
loadedFromString, err := nn.LoadModelFromString(jsonString, "my_model")

Cross-Platform API

| Function | Go | Python | TypeScript | C# | C |
| --- | --- | --- | --- | --- | --- |
| Create | BuildNetworkFromJSON() | create_network_from_json() | createNetworkFromJSON() | CreateLoomNetwork() | CreateLoomNetwork() |
| Forward | ForwardCPU() | forward_simple() | forward() | LoomForward() | LoomForward() |
| Train | Train() | train_simple() | train() | LoomTrain() | LoomTrain() |
| Save | SaveModelToString() | save_model_simple() | saveModel() | LoomSaveModel() | LoomSaveModel() |
| Load | LoadModelFromString() | load_model_simple() | loadLoomNetwork() | LoomLoadModel() | LoomLoadModel() |
| Evaluate | EvaluateNetwork() | evaluate_network_simple() | evaluate() | LoomEvaluateNetwork() | LoomEvaluateNetwork() |
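
A Go sketch of the shared entry point. BuildNetworkFromJSON and the layer type strings come from this README; the remaining config field names and the exact function signature are illustrative assumptions:

package main

import (
    "fmt"

    "github.com/openfluke/loom/nn"
)

func main() {
    // "dense" and "softmax" are documented layer type strings; the
    // other keys in this config are assumptions, not a schema.
    cfg := `{
      "batch_size": 1,
      "layers": [
        {"type": "dense", "activation": "relu"},
        {"type": "softmax"}
      ]
    }`

    network, err := nn.BuildNetworkFromJSON(cfg) // assumed signature
    if err != nil {
        panic(err)
    }
    fmt.Printf("built network: %T\n", network)
}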

Language Bindings

Python

pip install welvet

import welvet

config = {"batch_size": 1, "layers": [...]}
welvet.create_network_from_json(config)
output = welvet.forward_simple([0.1, 0.2, 0.3, 0.4])

See python/README.md for complete documentation.

TypeScript / Node.js

npm install @openfluke/welvet

import { init, createNetworkFromJSON } from "@openfluke/welvet";

await init();
const network = createNetworkFromJSON(JSON.stringify(config));
const output = network.ForwardCPU(JSON.stringify([[0.1, 0.2, 0.3, 0.4]]));

See typescript/README.md for complete documentation.

C# / .NET

dotnet add package Welvet

using Welvet;

Network.CreateFromJson(config);
var output = NativeMethods.LoomForward(input, input.Length);

See csharp/README.md for complete documentation.


Project Structure

loom/
├── nn/                  # Neural network package (core)
├── tokenizer/           # Pure Go BPE tokenizer
├── wasm/                # WebAssembly module
├── cabi/                # C ABI for FFI
├── python/              # Python package (welvet)
├── typescript/          # TypeScript/WASM package
├── csharp/              # C#/.NET package (Welvet)
├── fabric/              # Demo application
├── pods/                # GPU compute pods
├── model_conversion/    # HuggingFace model import
├── docs/                # Documentation
└── detector/            # GPU device detection

Documentation

More Examples: See github.com/openfluke/tva for additional examples and experiments.


Requirements

  • Go: 1.24 or higher
  • GPU: WebGPU-compatible GPU (Vulkan, Metal, or D3D12) - optional
  • OS: Linux, macOS, or Windows

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Apache License 2.0 - see LICENSE file for details.


Made with ❤️ by Openfluke