Expath

Lightning-fast XML parsing and XPath querying for Elixir, powered by Rust NIFs.

Expath provides fast XML processing through Rust's battle-tested sxd-document and sxd-xpath libraries. In the project's benchmarks it runs 2-10x faster than SweetXml, depending on document size.

✨ Key Features

  • 🚀 Blazing Fast: 2-10x faster than SweetXml with Rust-powered NIFs
  • 🔄 Parse-Once, Query-Many: Efficient document reuse for multiple XPath queries
  • 🛡️ Battle-Tested: Built on proven Rust XML libraries (sxd-document, sxd-xpath)
  • 🎯 Simple API: Clean, intuitive interface with comprehensive documentation
  • ⚡ Thread-Safe: Safe concurrent access to parsed documents
  • 🌐 Namespace Support: Full XML namespace support for SOAP, RSS, and complex XML
  • 🔧 Zero Dependencies: No external XML parsers required

🚀 Quick Start

Installation

Add expath to your list of dependencies in mix.exs:

def deps do
  [
    {:expath, "~> 0.2.0"}
  ]
end

Then run:

mix deps.get
mix deps.compile

Basic Usage

Simple XPath query

xml = """
<library>
  <book id="1">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
  </book>
  <book id="2">
    <title>1984</title>
    <author>George Orwell</author>
  </book>
</library>
"""

# Extract all book titles
{:ok, titles} = Expath.select(xml, "//title/text()")
# => ["The Great Gatsby", "1984"]

# Find specific book
{:ok, [title]} = Expath.select(xml, "//book[@id='1']/title/text()")
# title => "The Great Gatsby"

# Count books
{:ok, [count]} = Expath.select(xml, "count(//book)")
# count => "2" (numeric results are returned as strings)
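
The results above are lists of strings, so a numeric XPath result such as count(//book) arrives as "2" rather than as an integer. A minimal sketch of converting it, assuming the string format shown in the example; the module and function names (ResultHelpers, to_integer/1) are hypothetical and not part of Expath:

```elixir
defmodule ResultHelpers do
  # Hypothetical helper, not part of Expath.
  # Converts a single-element result list such as ["2"] into an integer.
  # Float.parse/1 is used so that both "2" and "2.0" are accepted,
  # since XPath 1.0 numbers are floating point.
  def to_integer([value]) when is_binary(value) do
    case Float.parse(value) do
      {number, ""} -> {:ok, trunc(number)}
      _ -> {:error, :not_a_number}
    end
  end

  # Anything other than a one-element list of strings is rejected.
  def to_integer(_other), do: {:error, :unexpected_result}
end

# Intended use with the sample document above:
#   {:ok, result} = Expath.select(xml, "count(//book)")
#   ResultHelpers.to_integer(result)
```

Returning a tagged tuple rather than raising keeps the helper consistent with the {:ok, _} | {:error, _} style used by the library's own functions.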

Parse-Once, Query-Many (Recommended for Multiple Queries)

# Parse document once
{:ok, doc} = Expath.new(xml)

# Run multiple queries efficiently
{:ok, titles} = Expath.query(doc, "//title/text()")
{:ok, authors} = Expath.query(doc, "//author/text()")
{:ok, book_count} = Expath.query(doc, "count(//book)")

# The parsed document is released automatically once it is garbage-collected
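
When several named values are needed from one document, the queries can be collected into a map in a single pass. The sketch below is one possible way to do that and is not part of Expath: MultiQuery and run/3 are hypothetical names, and the only library call assumed is Expath.query/2 as shown above. The query function is injectable so the helper can be exercised without a parsed document.

```elixir
defmodule MultiQuery do
  # Hypothetical helper, not part of Expath.
  # Runs each named XPath expression against one parsed document and
  # collects the results into a map. Stops at the first error and
  # reports which named query failed.
  # `query_fun` defaults to Expath.query/2 and can be replaced in tests.
  def run(doc, queries, query_fun \\ &Expath.query/2) do
    Enum.reduce_while(queries, {:ok, %{}}, fn {name, xpath}, {:ok, acc} ->
      case query_fun.(doc, xpath) do
        {:ok, results} -> {:cont, {:ok, Map.put(acc, name, results)}}
        {:error, reason} -> {:halt, {:error, {name, reason}}}
      end
    end)
  end
end

# Intended use with the document parsed above:
#   {:ok, doc} = Expath.new(xml)
#   MultiQuery.run(doc, titles: "//title/text()", authors: "//author/text()")
```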

📊 Performance Benchmarks

Real-world performance comparison with SweetXml across different document sizes:

| Document Size | Speed Improvement | Use Case |
|---|---|---|
| Small (644B) | 2-3x faster | API responses, config files |
| Medium (5.6KB) | 2.3x faster | RSS feeds, small datasets |
| Large (904KB) | 8-10x faster | Large documents, bulk processing |

Benchmark Results Summary

*** Large XML Performance ***
Expath (Rust NIFs)    78.27 iterations/sec (12.78 ms avg)
SweetXml               7.77 iterations/sec (128.64 ms avg)

Comparison: Expath is 10.07x faster
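
The 10.07x figure follows directly from the numbers in the output above. As a quick check, using only those reported values (the variable names are illustrative):

```elixir
# Figures copied from the benchmark output above.
expath_ms = 12.78
sweet_xml_ms = 128.64
expath_ips = 78.27
sweet_xml_ips = 7.77

# Ratio of average run times (lower time is better).
speedup_by_time = Float.round(sweet_xml_ms / expath_ms, 2)

# Ratio of iterations per second (higher rate is better).
speedup_by_ips = Float.round(expath_ips / sweet_xml_ips, 2)

IO.inspect({speedup_by_time, speedup_by_ips})
# prints {10.07, 10.07}
```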

Run your own benchmarks:

mix run bench/benchmark.exs

📖 API Reference

Core Functions

Expath.select/2 - Single Query

Perfect for one-off XPath queries. An optional namespace map can be passed as a third argument (see XML Namespace Support).

Expath.select(xml_string, xpath_expression)
# Returns: {:ok, results} | {:error, reason}

Expath.new/1 - Parse Document

Creates a reusable document for multiple queries.

{:ok, doc} = Expath.new(xml_string)
# Returns: {:ok, %Expath.Document{}} | {:error, reason}

Expath.query/2 - Query Parsed Document

Query a previously parsed document. As with select, an optional namespace map can be passed as a third argument.

{:ok, results} = Expath.query(document, xpath_expression)
# Returns: {:ok, results} | {:error, reason}

XPath Support

Expath supports the full XPath 1.0 specification:

# Node selection
Expath.select(xml, "//book")                    # All book elements
Expath.select(xml, "/library/book[1]")          # First book
Expath.select(xml, "//book[@id='1']")           # Book with id="1"

# Text extraction
Expath.select(xml, "//title/text()")            # All title text
Expath.select(xml, "//book/@id")                # All id attributes

# Functions
Expath.select(xml, "count(//book)")             # Count books
Expath.select(xml, "//book[position()=1]")     # First book
Expath.select(xml, "//book[contains(@class,'fiction')]") # Contains filter

# Complex expressions
Expath.select(xml, "//book[price > 10]/title/text()") # Conditional selection

XML Namespace Support

Expath provides full support for XML namespaces, essential for SOAP, RSS, and complex XML documents:

# XML with namespaces
xml = """
<library xmlns:book="http://example.com/book" xmlns:meta="http://example.com/metadata">
  <book:collection meta:id="sci-fi">
    <book:title>1984</book:title>
    <book:author>George Orwell</book:author>
  </book:collection>
</library>
"""

# Define namespace mappings
namespaces = %{
  "book" => "http://example.com/book",
  "meta" => "http://example.com/metadata"
}

# Query with namespace support
{:ok, titles} = Expath.select(xml, "//book:title/text()", namespaces)
# => ["1984"]

{:ok, ids} = Expath.select(xml, "//book:collection/@meta:id", namespaces)
# => ["sci-fi"]

# Multiple queries with namespace support
{:ok, doc} = Expath.new(xml)
{:ok, titles} = Expath.query(doc, "//book:title/text()", namespaces)
{:ok, authors} = Expath.query(doc, "//book:author/text()", namespaces)

For comprehensive namespace documentation, see NAMESPACE_GUIDE.md.

Error Handling

Expath returns errors as tagged tuples, with an atom identifying the kind of failure:

# Invalid XML (detected during query)
{:error, :invalid_xml} = Expath.select("<root><unclosed>", "/*")

# Invalid XPath expression
{:error, :invalid_xpath} = Expath.select(xml, "//[invalid")

# XPath evaluation errors
{:error, :xpath_error} = Expath.query(doc, "unknown-function()")
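
In application code these tuples are usually handled with pattern matching rather than asserted with `=`. A minimal sketch, assuming only the three error atoms listed above; XmlErrors and describe/1 are hypothetical names, and the message strings are illustrative:

```elixir
defmodule XmlErrors do
  # Hypothetical helper, not part of Expath.
  # Passes successful results through and maps the error atoms
  # shown above to human-readable messages.
  def describe({:ok, results}), do: {:ok, results}

  def describe({:error, :invalid_xml}),
    do: {:error, "the XML document is not well-formed"}

  def describe({:error, :invalid_xpath}),
    do: {:error, "the XPath expression could not be parsed"}

  def describe({:error, :xpath_error}),
    do: {:error, "the XPath expression failed during evaluation"}

  # Fallback for any reason not listed in this README.
  def describe({:error, other}),
    do: {:error, "unexpected error: #{inspect(other)}"}
end

# Intended use:
#   xml |> Expath.select("//title/text()") |> XmlErrors.describe()
```

The fallback clause is there because this README does not state that the three atoms are exhaustive.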

Performance

Expath is designed for high-performance XML processing:

  • Native Speed: Rust NIFs provide near-native performance
  • Zero-Copy: Efficient string handling between Elixir and Rust
  • Resource Caching: Parse once, query many times without re-parsing
  • Memory Efficient: Automatic memory management via Erlang garbage collection

Performance Example

# Large XML document
xml = File.read!("large_document.xml")

# Parse once (expensive operation)
{:ok, doc} = Expath.new(xml)

# Multiple queries (very fast - no re-parsing)
Enum.each(1..1000, fn _i ->
  {:ok, _results} = Expath.query(doc, "//some/xpath")
end)
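
To see the effect of reusing a parsed document on your own data, the two approaches can be timed side by side. The helper below is a rough sketch built on Erlang's :timer.tc/1, not a substitute for the benchmark suite; QuickTimer and total_microseconds/2 are hypothetical names.

```elixir
defmodule QuickTimer do
  # Hypothetical helper, not part of Expath.
  # Runs `fun` `iterations` times and returns the total elapsed
  # time in microseconds, measured with :timer.tc/1.
  def total_microseconds(fun, iterations)
      when is_function(fun, 0) and is_integer(iterations) and iterations > 0 do
    {elapsed, :ok} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> fun.() end)
      end)

    elapsed
  end
end

# Intended use with the xml and doc values from the example above:
#   reparse = QuickTimer.total_microseconds(fn -> Expath.select(xml, "//some/xpath") end, 1000)
#   reuse   = QuickTimer.total_microseconds(fn -> Expath.query(doc, "//some/xpath") end, 1000)
```

A single wall-clock measurement like this is noisy; for numbers worth reporting, use the benchmark script mentioned earlier.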

Platform Support

Expath supports all platforms where Rust and Erlang are available:

  • Linux (x86_64, aarch64)
  • macOS (Intel, Apple Silicon)
  • Windows (x86_64)

Apple Silicon (M1/M2) Setup

Expath includes special configuration for Apple Silicon Macs. If you encounter linking issues, ensure you have:

  1. Native Erlang installation (not x86_64 via Rosetta)
  2. Native Rust toolchain for aarch64-apple-darwin

The included Cargo configuration handles the necessary linker flags automatically.

Examples

RSS Feed Processing

defmodule RSSProcessor do
  def process_feed(rss_xml) do
    {:ok, doc} = Expath.new(rss_xml)

    {:ok, titles} = Expath.query(doc, "//item/title/text()")
    {:ok, links} = Expath.query(doc, "//item/link/text()")
    {:ok, descriptions} = Expath.query(doc, "//item/description/text()")

    # Zip the three result lists into {title, link, description} tuples
    [titles, links, descriptions]
    |> Enum.zip()
    |> Enum.map(fn {title, link, description} ->
      %{title: title, link: link, description: description}
    end)
  end
end

Configuration File Parsing

defmodule ConfigParser do
  def parse_config(xml_config) do
    {:ok, doc} = Expath.new(xml_config)

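    # Queries return lists; match the single value expected here
    # (this assumes the config contains exactly one <database> element)
    {:ok, [database_host]} = Expath.query(doc, "//database/@host")
    {:ok, [database_port]} = Expath.query(doc, "//database/@port")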
    {:ok, features} = Expath.query(doc, "//features/feature/@name")

    %{
      database: %{host: database_host, port: database_port},
      features: features
    }
  end
end

Data Extraction Pipeline

defmodule DataExtractor do
  def extract_products(xml_data) do
    {:ok, doc} = Expath.new(xml_data)

    # Extract in parallel using cached document
    tasks = [
      Task.async(fn -> Expath.query(doc, "//product/@id") end),
      Task.async(fn -> Expath.query(doc, "//product/name/text()") end),
      Task.async(fn -> Expath.query(doc, "//product/price/text()") end),
      Task.async(fn -> Expath.query(doc, "//product/category/text()") end)
    ]

    [ids, names, prices, categories] =
      tasks
      |> Enum.map(&Task.await/1)
      |> Enum.map(fn {:ok, results} -> results end)

    [ids, names, prices, categories]
    |> Enum.zip()
    |> Enum.map(fn {id, name, price, category} ->
      %{id: id, name: name, price: price, category: category}
    end)
  end
end

Development

Prerequisites

  • Elixir 1.18 or later
  • Erlang/OTP 27 or later
  • Rust 1.70 or later
  • C compiler (gcc, clang, or MSVC)

Building from Source

git clone https://github.com/wearecococo/expath.git
cd expath
mix deps.get
mix compile

Running Tests

mix test

Building Documentation

mix docs

Docker Development

For cross-platform testing or if you prefer containerized development, Expath includes comprehensive Docker support:

Quick Start with Docker

# Run all tests in Linux container
./scripts/docker-test.sh

# Or use docker-compose for specific tasks
docker-compose run test
docker-compose run benchmark
docker-compose run quality

Available Docker Services

  • dev: Development environment with all dependencies
  • test: Run the full test suite
  • benchmark: Execute performance benchmarks
  • quality: Run code quality checks (Credo)

Docker Commands

# Build and test everything
docker-compose up test

# Run interactive development shell
docker-compose run dev iex -S mix

# Execute benchmarks
docker-compose run benchmark

# Check code quality
docker-compose run quality

# Clean up containers
docker-compose down --volumes

Multi-Architecture Testing

The Docker setup supports testing on different architectures:

# Test on current architecture
docker-compose run test

# Build for specific platform (requires BuildKit)
DOCKER_PLATFORM=linux/amd64 docker-compose run test

This is particularly useful for ensuring your NIFs work correctly across different platforms before deployment.

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (mix test)
  5. Commit your changes (git commit -am 'Add some feature')
  6. Push to the branch (git push origin my-new-feature)
  7. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of the excellent sxd-document and sxd-xpath Rust crates
  • Uses Rustler for safe Elixir-Rust interoperability
  • Inspired by the need for high-performance XML processing in Elixir applications

Changelog

v0.1.0 (Initial Release)

  • High-performance XML parsing via Rust NIFs
  • Full XPath 1.0 support
  • Parse-once, query-many Document resource API
  • Comprehensive error handling
  • Apple Silicon support
  • Complete test suite and documentation
