LOOM - Layered Omni-architecture Openfluke Machine

A high-performance CPU-first neural network framework written in Go, with experimental WebGPU compute shaders for GPU acceleration (in development, only select layers supported). Features WebAssembly export for browser deployment. Now with transformer inference support!

🌍 Cross-Ecosystem Compatibility

Models trained in any platform work instantly in all others. Bit-for-bit identical results across Go, Python, C#, TypeScript, and browser WASM.

Platform	Package	Install
Go	GitHub	`go get github.com/openfluke/loom`
Python	PyPI	`pip install welvet`
C#/.NET	NuGet	`dotnet add package Welvet`
TypeScript/Node	NPM	`npm install @openfluke/welvet`
Browser	WASM	`import { init } from "@openfluke/welvet"`

Supported Platforms

Pre-compiled binaries for:

Linux: x86_64, ARM64, ARMv7
Windows: x86_64, x86, ARM64
macOS: Apple Silicon (M1/M2/M3), Intel, Universal
Android: ARM64, ARMv7
iOS: ARM64 (XCFramework)

Key Strengths

True Embeddability: Single binary. Zero external dependencies. No Python runtime needed.
Hybrid Gradient/Geometric Engine: Neural Tweening combines geometric gap-closing with backpropagation-guided momentum for real-time adaptation.
Structural Parallelism: Native support for Inception, ResNeXt, Siamese, and MoE architectures via LayerParallel with 6 combine modes.
Native Mixed-Precision: Generic tensor backend supports int8, uint16, float32, float64 natively.
Complete Training Infrastructure: 7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants.
Pure Go Tokenizer: HuggingFace-compatible BPE tokenizer for LLM inference.
Step-Based Execution: Real-time inference with layer-by-layer control via StepForward API.
Network Telemetry: Runtime introspection via GetMethodsJSON() and ExtractNetworkBlueprint().

Key Limitations

Ecosystem Maturity: No central "Model Zoo" or pip-installable convenience; relies on loading external checkpoints.
GPU Support: WebGPU acceleration is implemented (Dense, Conv2D, MHA) but is beta/experimental and less stable than CuDNN/CUDA.
Operator Coverage: While "Deep" support is good (MHA, LSTM), "Broad" support (e.g., 3D Conv, Deformable Attn, FFTs) is missing compared to SciPy/JAX.
Math Backend: Relies on custom explicit forward/backward passes rather than a general-purpose symbolic autograd graph.

What's New

🎉 Transformer Inference: SmolLM2-135M-Instruct runs entirely in browser WASM with pure Go implementation.

🤯 Grid Softmax = Native MoE: Mathematically proven equivalent to PyTorch MoE with 97.1% loss reduction. See examples/moe_proof_demo.go.

⚡ Grid Scatter Mode: Place parallel branch outputs at specific 2D/3D grid positions for multi-agent systems, hierarchical RL, and ensemble methods with explicit topology.

🧠 Neural Tweening: Train and run simultaneously with 100% accuracy on shallow networks, never crashes to 0% during task changes. Benchmarks →

Framework Comparison

Global AI Landscape

Feature Category	Feature	Loom (Go)	PyTorch (Py)	TF / TFLite	GoMLX (Go)	Spago (Go)	Core ML	TF.js	Candle (Rust)
Core	Primary Language	Go	Python	Python / C++	Go	Go	Swift / ObjC	JS / TS	Rust
	Runtime Dependency	None (Binary)	Heavy (Pip)	Binary (Edge)	CGo / XLA	None	OS-Native	Browser	None
	Auto-Differentiation	⚠️ Hybrid/Manual	✅ Full	✅ Full	✅ Full (XLA)	✅ Manual	❌ (Inference)	✅ Full	✅ Full
Loading	Safetensors	✅ Native	✅	✅	✅	❌	❌	❌	✅
	ONNX Support	❌	✅ (Export)	✅	⚠️	❌	✅ (Import)	✅	⚠️
	Structure Inference	✅ Auto-Detect	❌	❌	❌	❌	❌	❌	❌
Training	Gradient Descent	✅ Manual Chain	✅ Standard	✅ Standard	✅ Standard	✅ Standard	✅ (On-device)	✅ Standard	✅ Standard
	Neural Tweening	✅ Hybrid Engine	❌	❌	❌	❌	❌	❌	❌
	LR Schedulers	✅ 7 Types	✅	✅	✅	⚠️ Basic	✅	✅	✅
	Optimizers	✅ 3 (SGD/AdamW/RMSprop)	✅ Many	✅ Many	✅	✅	⚠️	✅	✅
Layer Support	Dense (MLP)	✅	✅	✅	✅	✅	✅	✅	✅
	Conv2D	✅	✅	✅	✅	❌	✅	✅	✅
	Conv1D	✅ Native	✅	✅	✅	❌	✅	✅	✅
	RNN / LSTM	✅ Full Gate	✅	✅	✅	✅	✅	✅	✅
	Transformer (MHA)	✅ (Explicit)	✅	✅	✅	✅ (BERT)	✅	✅	✅
	SwiGLU	✅ Native	✅	✅	✅	❌	❌	❌	✅
	Parallel / MoE	✅ Structure	❌ (Manual)	❌ (Manual)	❌	❌	❌	❌	❌
	Sequential Layers	✅ Native	✅	✅	⚠️	⚠️	⚠️	✅	⚠️
	Embeddings	✅	✅	✅	✅	✅	✅	✅	✅
	Tokenizer	✅ Pure Go	❌ (Rust/C++)	❌ (C++)	❌	❌	✅	❌	✅
Normalization	LayerNorm	✅ Native	✅	✅	✅	✅	✅	✅	✅
	RMSNorm	✅ Native	⚠️ (Manual)	⚠️ (Manual)	✅	❌	❌	❌	✅
	Residual/Skip	✅ Native	✅	✅	✅	❌	✅	✅	✅
Advanced	Stitch Layers	✅ Native	❌ (Manual)	❌ (Manual)	❌	❌	❌	❌	❌
	Dynamic Arch Gen	✅ Built-in	❌	❌	❌	❌	❌	❌	❌
	Step-Based Forward	✅ Unique	❌	❌	❌	❌	❌	❌	❌
	K-Means Clustering	✅ Parallel	❌	❌	❌	❌	❌	❌	❌
	Correlation Analysis	✅ Pearson/Spearman	❌	❌	❌	❌	❌	❌	❌
	Model Evaluation	✅ Deviation/Metrics	✅	✅	⚠️	⚠️	⚠️	⚠️	⚠️
	Network Telemetry	✅ Blueprint API	❌	⚠️	❌	❌	❌	⚠️	❌
	Runtime Introspection	✅ Reflection	⚠️ (Python)	⚠️	❌	❌	❌	⚠️	❌
Platform	WASM Training	✅ Full	❌	❌	❌	❌	❌	✅ (Slow)	✅
	Cross-Lang ABI	✅ Universal	❌	❌	❌	❌	❌	❌	⚠️
Ecosystem	HuggingFace Hub	⚠️ (Read/Inspect)	✅ Native	✅ Native	❌	✅	❌	✅	✅
	Pre-trained Zoo	❌	✅ Massive	✅ Massive	❌	✅ (Small)	✅ (Apple)	✅ Large	⚠️ Growing
	Mobile/Web	✅ WASM / C-ABI	✅ (Mobile)	✅ King	❌	❌	✅ King (iOS)	✅ King (Web)	✅ (WASM)

Go Ecosystem Comparison

Category	Feature	Loom	GoMLX	Gorgonia	Spago	Go-Deep	Gonum
Foundation	Primary implementation	Pure Go	CGo (XLA)	Pure Go + CGo	Pure Go	Pure Go	Pure Go
	Tensor Backend	Custom (Generic)	XLA (C++)	Custom	Custom (Dense)	Custom	Dense Matrix
	Autograd	⚠️ Hybrid	✅ Full	✅ Symbolic	✅ Dynamic	✅ Backprop	❌
Model	Load Safetensors	✅ Native	✅	❌	❌	❌	❌
	Model Export	binary/json	XLA format	Onnx (Import)	Gob	Json	❌
Architecture	Dense (MLP)	✅	✅	✅	✅	✅	✅ (Matrix Mul)
	Conv2D	✅	✅	✅	✅	✅	❌
	Conv1D	✅ Native	✅	⚠️ (via 2D)	⚠️ (via 2D)	❌	❌
	RNN / LSTM	✅ Full Gate	✅	⚠️ Basic	✅ BiLSTM	❌	❌
	Transformer (MHA)	✅ Explicit	✅	⚠️ Hard	✅ (BERT)	❌	❌
	SwiGLU	✅	✅	❌	❌	❌	❌
	Embeddings	✅	✅	✅	✅	❌	❌
	Parallel / MoE	✅ MoE + Gating	❌ (Manual)	❌	❌	❌	❌
	Sequential Layers	✅ Native + Nested	⚠️ (Manual)	⚠️ (Manual)	⚠️ (Manual)	❌	❌
	Tokenizer	✅ Pure Go	❌ (Deps)	❌	✅ (WordPiece)	❌	❌
Training	Gradient Descent	✅ Manual	✅ Standard	✅ Standard	✅ Standard	✅ Standard	❌
	Hybrid Tweening	✅ Unique	❌	❌	❌	❌	❌
	LR Schedulers	✅ 7 Types	✅	✅	⚠️ Basic	❌	❌
	Optimizers	✅ SGD/AdamW/RMSprop	✅	✅	✅	⚠️ SGD	❌
	Softmax Variants	✅ 10 Types	⚠️ Standard	⚠️ Standard	⚠️ Standard	⚠️ Standard	❌
Normalization	LayerNorm	✅ Native	✅	⚠️ Manual	✅	❌	❌
	RMSNorm	✅ Native	✅	❌	❌	❌	❌
	Residual/Skip	✅ Native	✅	✅	❌	❌	❌
Advanced	RoPE Embeddings	✅ GQA Support	✅	❌	❌	❌	❌
	Network Grafting	✅ Unique	❌	❌	❌	❌	❌
	Step-Based Forward	✅ Unique	❌	❌	❌	❌	❌
	Dynamic Arch Gen	✅ Unique	❌	❌	❌	❌	❌
	K-Means Clustering	✅ Parallel	❌	❌	❌	❌	❌
	Correlation Analysis	✅ Pearson/Spearman	❌	❌	❌	❌	❌
	Model Evaluation	✅ Full Suite	⚠️	⚠️	⚠️	❌	❌
	Network Telemetry	✅ Blueprint	❌	⚠️	❌	❌	❌
	Runtime Introspection	✅ Reflection	❌	⚠️	❌	❌	❌
Platform	C-ABI (Polyglot)	✅ Universal	❌	❌	❌	❌	❌
	WASM Training	✅ Full	❌ (XLA)	❌	❌	❌	❌
Ecosystem	HuggingFace	⚠️ (Load)	❌	❌	✅ (Load)	❌	❌
	Documentation	⚠️ Growing	✅ Good	✅ Good	✅ Good	⚠️ Minimal	✅ Excellent
	Maintenance	🔥 Active	🔥 Active	⚠️ Slow	⏸️ Paused	⚠️ Slow	🔥 Active

Native Numerical Type & Precision Support

Layer Type	Numerical Type	Loom	GoMLX	Gorgonia	Spago	PyTorch
All Layers	Float32	✅	✅	✅	✅ (Float64)	✅
(Dense, Conv,	Float64 (High Prec)	✅ Native	✅	✅	✅	✅
RNN, Attn)	Float16 / BF16	⚠️ (Storage)	✅ (XLA)	❌	❌	✅
	Int8 (Training)	✅ Native	❌	❌	❌	⚠️ (QAT Wrapper)
	Int8 (Inference)	✅	❌	❌	❌	✅ (Quant)
	Int16, Int32, Int64	✅ Native	✅ (XLA)	⚠️ (Tensor)	❌	❌ (Tensor Only)
	Uint8, Uint16, Uint32	✅ Native	✅ (XLA)	⚠️ (Tensor)	❌	✅ (Uint8 Only)

Note

Complete Type System: Unlike frameworks that treat integers primarily as storage formats for quantization, Loom's Generics allow native training and inference on exotic types like uint16 (common in medical imaging), int32, or float64 (scientific sim) across every layer type without changes to the model code.

Summary Verdict

Choose PyTorch if you are doing Research, need the latest SOTA models, or rely on complex dynamic architectures.
Choose TensorFlow / TFLite if you need robust Mobile/Edge Deployment.
Choose GoMLX if you need High-Performance Training in Go and can tolerate CGo/C++ dependencies.
Choose Core ML if you are targeting iOS/macOS exclusively.
Choose Loom if you need Pure Go-Native Embedding (Cloud/CLI/Server), want a single binary with zero dependencies, need to experiment with the Neural Tweening training paradigm, or need unique features like Step-Based Forward Pass for real-time inference and Dynamic Architecture Generation for automated model exploration.

Layer Types & Features

Supported Layer Types

Layer	Type String	Description
Dense	`dense`	Fully connected layer
LSTM	`lstm`	Long Short-Term Memory
RNN	`rnn`	Recurrent Neural Network
GRU	`gru`	Gated Recurrent Unit
Conv2D	`conv2d`	2D Convolution
Conv1D	`conv1d`	1D Convolution
Multi-Head Attention	`multi_head_attention`	Transformer attention
LayerNorm	`layer_norm`	Layer normalization
RMSNorm	`rms_norm`	RMS normalization
SwiGLU	`swiglu`	SwiGLU activation layer
Softmax	`softmax`	10 variants (Standard, Grid, Hierarchical, Temperature, Gumbel, Masked, Sparsemax, Entmax, Adaptive, Mixture)
Embedding	`embedding`	Token embedding
Parallel	`parallel`	Branching with 6 combine modes (add, concat, multiply, average, grid_scatter, filter)
Sequential	`sequential`	Grouped sub-layers

Activation Functions

relu, sigmoid, tanh, softmax, gelu, swish, mish, leaky_relu, elu, selu, linear

Quick Start

Installation

# Clone the repository
git clone https://github.com/openfluke/loom.git
cd loom

# Install dependencies
go mod download

Simple Example

package main

import (
    "fmt"
    "github.com/openfluke/loom/nn"
)

func main() {
    network := nn.NewNetwork(4096, 4, 4, 5)  // 80 total layers

    if err := network.InitGPU(); err != nil {
        panic(err)
    }
    defer network.ReleaseGPU()

    input := make([]float32, 4096)
    output, gpuTime, _ := network.ForwardGPU(input)

    fmt.Printf("GPU Forward time: %v, Output size: %d\n", gpuTime, len(output))
}

Model Serialization

// Save a trained model
err := network.SaveModel("model.json", "my_model")

// Load it back - ONE LINE!
loadedNet, err := nn.LoadModel("model.json", "my_model")

// Or use strings (great for APIs/databases/WASM)
jsonString, err := network.SaveModelToString("my_model")
loadedNet, err := nn.LoadModelFromString(jsonString, "my_model")

Cross-Platform API

Function	Go	Python	TypeScript	C#	C
Create	`BuildNetworkFromJSON()`	`create_network_from_json()`	`createNetworkFromJSON()`	`CreateLoomNetwork()`	`CreateLoomNetwork()`
Forward	`ForwardCPU()`	`forward_simple()`	`forward()`	`LoomForward()`	`LoomForward()`
Train	`Train()`	`train_simple()`	`train()`	`LoomTrain()`	`LoomTrain()`
Save	`SaveModelToString()`	`save_model_simple()`	`saveModel()`	`LoomSaveModel()`	`LoomSaveModel()`
Load	`LoadModelFromString()`	`load_model_simple()`	`loadLoomNetwork()`	`LoomLoadModel()`	`LoomLoadModel()`
Evaluate	`EvaluateNetwork()`	`evaluate_network_simple()`	`evaluate()`	`LoomEvaluateNetwork()`	`LoomEvaluateNetwork()`

Language Bindings

Python

pip install welvet

import welvet

config = {"batch_size": 1, "layers": [...]}
welvet.create_network_from_json(config)
output = welvet.forward_simple([0.1, 0.2, 0.3, 0.4])

See python/README.md for complete documentation.

TypeScript / Node.js

npm install @openfluke/welvet

import { init, createNetworkFromJSON } from "@openfluke/welvet";

await init();
const network = createNetworkFromJSON(JSON.stringify(config));
const output = network.ForwardCPU(JSON.stringify([[0.1, 0.2, 0.3, 0.4]]));

See typescript/README.md for complete documentation.

C# / .NET

dotnet add package Welvet

using Welvet;

Network.CreateFromJson(config);
var output = NativeMethods.LoomForward(input, input.Length);

See csharp/README.md for complete documentation.

Project Structure

loom/
├── nn/                  # Neural network package (core)
├── tokenizer/           # Pure Go BPE tokenizer
├── wasm/                # WebAssembly module
├── cabi/                # C ABI for FFI
├── python/              # Python package (welvet)
├── typescript/          # TypeScript/WASM package
├── csharp/              # C#/.NET package (Welvet)
├── fabric/              # Demo application
├── pods/                # GPU compute pods
├── model_conversion/    # HuggingFace model import
├── docs/                # Documentation
└── detector/            # GPU device detection

Documentation

Neural Network Package - Detailed API documentation
Neural Tweening Benchmarks - 19-test comprehensive benchmark
Python Bindings - PyPI package docs
TypeScript Bindings - NPM package docs
C# Bindings - NuGet package docs
WASM Module - Browser deployment
C ABI - FFI reference
Model Conversion - HuggingFace import guide

More Examples: See github.com/openfluke/tva for additional examples and experiments.

Requirements

Go: 1.24 or higher
GPU: WebGPU-compatible GPU (Vulkan, Metal, or D3D12) - optional
OS: Linux, macOS, or Windows

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Apache License 2.0 - see LICENSE file for details.

Made with ❤️ by Openfluke

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LOOM - Layered Omni-architecture Openfluke Machine

🌍 Cross-Ecosystem Compatibility

Supported Platforms

Key Strengths

Key Limitations

What's New

Framework Comparison

Global AI Landscape

Go Ecosystem Comparison

Native Numerical Type & Precision Support

Summary Verdict

Layer Types & Features

Supported Layer Types

Activation Functions

Quick Start

Installation

Simple Example

Model Serialization

Cross-Platform API

Language Bindings

Python

TypeScript / Node.js

C# / .NET

Project Structure

Documentation

Requirements

Contributing

License

About

Uh oh!

Releases 7

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 244 Commits
arcagitesting		arcagitesting
cabi		cabi
csharp		csharp
detector		detector
docs		docs
internal_testing		internal_testing
nn		nn
pods		pods
python		python
tokenizer		tokenizer
tva		tva
typescript		typescript
wasm		wasm
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
collect_readmes.sh		collect_readmes.sh
go.mod		go.mod
go.sum		go.sum

License

openfluke/loom

Folders and files

Latest commit

History

Repository files navigation

LOOM - Layered Omni-architecture Openfluke Machine

🌍 Cross-Ecosystem Compatibility

Supported Platforms

Key Strengths

Key Limitations

What's New

Framework Comparison

Global AI Landscape

Go Ecosystem Comparison

Native Numerical Type & Precision Support

Summary Verdict

Layer Types & Features

Supported Layer Types

Activation Functions

Quick Start

Installation

Simple Example

Model Serialization

Cross-Platform API

Language Bindings

Python

TypeScript / Node.js

C# / .NET

Project Structure

Documentation

Requirements

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Languages

Packages