
TDS v3 Internals Redesign#183

Draft
mjaric wants to merge 40 commits into master from next

Conversation


@mjaric mjaric commented Feb 27, 2026

TDS v3 Internals Redesign

Goal: Incrementally rewrite the TDS library internals to deliver a well-layered, well-tested codebase with
user-extensible types, clean protocol layers, correct TLS, and proper error handling, all without breaking
the public API.

Motivation

The current TDS library (v2.x) works but has fundamental architectural problems:

  • Monolithic type system - types.ex is 1800+ lines; adding one type requires 9-11 edits; no user extensibility
    without forking
  • TLS layer hacks - Tds.Tls GenServer has race conditions, unbounded buffers, and leaked processes
  • Implicit state machine - Process.put for hidden state, unguarded transitions, silent transaction corruption
  • No separation of concerns - packet framing, message construction, token parsing tangled together; binary
    macros in two modules with different endianness semantics
  • Ad-hoc error handling - multiple SQL errors silently discarded, unknown tokens crash, no error classification

Phase 1: Constants & Binary Macros - Single Source of Truth

  • Task 1.1 - Create Tds.Protocol.Constants (compile-time macros for all TDS protocol constants)
  • Task 1.2 - Create Tds.Protocol.Binary (unified binary macros with explicit endianness)
  • Task 1.3 - Migrate all modules to use Constants and Binary; fix prelogin :instopt decode bug
  • Task 1.4 - Remove old BinaryUtils and Grammar modules
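
A compile-time constants module in the spirit of Task 1.1 can be sketched as follows. Module and macro names are illustrative, not the actual `Tds.Protocol.Constants` API; the point is that macros expand to integer literals, so they can be used directly in binary patterns and guards:

```elixir
defmodule ConstantsSketch do
  # Each macro expands to an integer literal at compile time
  # (token values from MS-TDS; the module itself is illustrative).
  defmacro token_done, do: 0xFD
  defmacro pkt_sql_batch, do: 0x01
end

defmodule ConstantsDemo do
  import ConstantsSketch

  # The macro call in the pattern compiles down to <<0xFD, _::binary>>,
  # so searching for token_done() finds every usage of the constant.
  def done_token?(<<token_done(), _::binary>>), do: true
  def done_token?(_), do: false
end
```

This is also what enables the symbol-usage search mentioned later in the thread: grepping for `token_done` finds every site that touches that byte, which a bare `0xFD` cannot.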

Phase 2: Packet Framing Layer

  • Task 2.1 - Create Tds.Protocol.Packet (encode/decode TDS packet frames with bounded reassembly)
  • Task 2.2 - Migrate Messages and Protocol to use Packet

Phase 3: Extensible Type System

  • Task 3.1 - Define Tds.Type behaviour (type_codes, decode_metadata, decode, encode,
    param_descriptor, infer)
  • Task 3.2 - Create Tds.Type.Registry (TDS code <-> handler module mapping, user types checked first)
  • Task 3.3 - Implement built-in type handlers (integer, float, decimal, string, binary, boolean, datetime,
    uuid, money, xml)
  • Task 3.4 - Wire Type.Registry into Protocol, replacing the monolithic types.ex
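
The behaviour surface from Task 3.1 might look roughly like this. The callback signatures are assumptions derived from the task list, not the final API; a trivial bit/boolean handler shows how a type module plugs in:

```elixir
defmodule TypeSketch do
  # Assumed shapes for the six callbacks Task 3.1 names.
  @callback type_codes() :: [byte()]
  @callback decode_metadata(binary()) :: {map(), binary()}
  @callback decode(map(), binary()) :: {term(), binary()}
  @callback encode(term()) :: iodata()
  @callback param_descriptor(term()) :: String.t()
  @callback infer(term()) :: boolean()
end

defmodule BitHandler do
  @behaviour TypeSketch

  @impl true
  def type_codes, do: [0x32]

  @impl true
  def decode_metadata(rest), do: {%{type: :bit}, rest}

  @impl true
  def decode(_meta, <<0, rest::binary>>), do: {false, rest}
  def decode(_meta, <<1, rest::binary>>), do: {true, rest}

  @impl true
  def encode(true), do: <<1>>
  def encode(false), do: <<0>>

  @impl true
  def param_descriptor(_), do: "bit"

  @impl true
  def infer(value), do: is_boolean(value)
end
```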

Phase 4: Transport Abstraction

  • Task 4.1 - Create Tds.Transport behaviour and Transport.Tcp implementation
  • Task 4.2 - Implement Transport.Tls + TlsHandshake cb_info module (eliminates GenServer, fixes race
    conditions)
  • Task 4.3 - Migrate Protocol to use Transport
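
The `Tds.Transport` behaviour from Task 4.1 presumably mirrors the `:gen_tcp` surface the protocol already uses; a sketch under that assumption (callback names and arities are guesses, not the merged API):

```elixir
defmodule TransportSketch do
  # Assumed gen_tcp-shaped callbacks so Tcp and Tls implementations
  # are interchangeable behind one behaviour.
  @callback connect(charlist(), :inet.port_number(), keyword(), timeout()) ::
              {:ok, term()} | {:error, term()}
  @callback send(term(), iodata()) :: :ok | {:error, term()}
  @callback recv(term(), non_neg_integer(), timeout()) ::
              {:ok, binary()} | {:error, term()}
  @callback close(term()) :: :ok
end
```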

Phase 5: State Machine Cleanup

  • Task 5.1 - Create Protocol.State with explicit phase transitions (eliminates Process.put, guards
    invalid transitions)
  • Task 5.2 - Extract Protocol.Connection and Protocol.Execution modules

Phase 6: Error Handling

  • Task 6.1 - Create error hierarchy (Tds.Error with :reason, :errors, :context, :original)
  • Task 6.2 - Aggregate multiple SQL errors, add operation context, classify errors
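
The error struct from Task 6.1 might look like the sketch below. The field set comes from the task description; the defaults, the `aggregate/2` helper, and the `msg_text` key on SQL errors are assumptions:

```elixir
defmodule ErrorSketch do
  # Fields named in Task 6.1; :errors keeps every SQL error from a
  # response instead of discarding all but the first.
  defexception [:reason, :context, :original, errors: [], message: "tds error"]

  # Aggregate multiple SQL errors under one exception with operation context.
  def aggregate(sql_errors, context) do
    %__MODULE__{
      reason: :sql_errors,
      errors: sql_errors,
      context: context,
      message: Enum.map_join(sql_errors, "; ", & &1.msg_text)
    }
  end
end
```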

Phase 7: Login/Prelogin & Connection Init Cleanup

  • Task 7.1 - Fix Prelogin bugs (exhaustive encryption state match, replace inspect(self()) thread ID)
  • Task 7.2 - Fix Login7 (input validation, automatic offset calculation, FeatureExt groundwork)
  • Task 7.3 - Add session option validation with SQL injection prevention

Bonus

  • Task B.1 - Create mix tds.export_errors task for refreshing SQL Server error codes CSV

Quality Gates

  • All 269 existing tests pass after every commit
  • Code coverage analysis to identify blind spots (currently 68% overall)
  • Each new module has dedicated unit tests
  • Compile clean: mix compile --warnings-as-errors
  • Format clean: mix format --check-formatted
  • No references to old modules: grep -r "BinaryUtils\|Protocol.Grammar" lib/ returns empty
  • Public API unchanged: lib/tds.ex function signatures match v2.x
  • Full integration tests against SQL Server via Docker

This module will be used as the source of truth; it will also allow us to quickly search the code by symbol usage.
… feature extensions

New tokens added (not in the original implementation):

- `offset` (0x78), `altmetadata` (0x88), `dataclassification` (0xA3), `tabname` (0xA4), `colinfo` (0xA5), `featureextack` (0xAE), `altrow` (0xD3), `sessionstate` (0xE4), `sspi` (0xED), `fedauthinfo` (0xEE)
Consolidates little-endian macros from `Tds.BinaryUtils` and big-endian and parameterized macros from `Tds.Protocol.Grammar` into a single module.

Establishes clear conventions for byte order, with little-endian as the default and `_be` suffixes for big-endian variants. This centralizes common binary encoding and decoding utilities, improving consistency and maintainability across the TDS protocol implementation.
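
The little-endian-default, `_be`-suffix convention can be illustrated with the bitstring-modifier macro trick (macro names here are illustrative, not the actual `Tds.Protocol.Binary` API):

```elixir
defmodule BinarySketch do
  # Each macro quotes a bitstring type spec; `little - unsigned - 16` is the
  # AST the compiler accepts inside <<...>> as little-unsigned-16.
  defmacro uint16, do: quote(do: little - unsigned - 16)
  defmacro uint16_be, do: quote(do: big - unsigned - 16)
end

defmodule BinaryDemo do
  import BinarySketch

  # Little-endian by default, explicit _be variant for big-endian fields.
  def both(n), do: {<<n::uint16()>>, <<n::uint16_be()>>}
end
```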
Centralizes all TDS protocol constants (packet types, token types, data types, etc.) into `Tds.Protocol.Constants` and binary utility functions into `Tds.Protocol.Binary`. This refactor replaces hardcoded hexadecimal values with compile-time macros, enhancing maintainability and consistency across the codebase.

Corrects a bug in `Tds.Protocol.Prelogin` where the `:instopt` prelogin token was incorrectly decoded as an `:encryption` token.
@mjaric mjaric requested a review from josevalim February 27, 2026 11:48
Introduces `Tds.Protocol.Packet` for TDS packet framing.

Provides `encode/2` to split payloads into 4096-byte packets, each with an 8-byte header and up to 4088 bytes of data. Packet IDs start at 1 and wrap at 256.

Adds `decode_header/1` to parse 8-byte packet headers from incoming binaries.
Introduces `Packet.reassemble/2` to reconstruct complete TDS messages from a stream of packets. This function correctly handles message fragmentation by:

*   Concatenating data from multiple packets.
*   Validating sequential packet IDs to ensure correct ordering, including wrap-around behavior.
*   Enforcing a configurable maximum payload size to mitigate denial-of-service risks.
*   Gracefully managing partial `recv` operations and multiple packets within a single read from the socket.
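
The framing and reassembly described above can be condensed into a sketch. The 8-byte header layout follows MS-TDS (type, status, big-endian length, SPID, packet id, window), and the EOM bit in status marks the last packet; the module itself and its error-free happy path are illustrative, not the actual `Tds.Protocol.Packet` implementation:

```elixir
defmodule PacketSketch do
  import Bitwise

  @header_size 8
  @max_data 4088

  # Split a payload into framed packets; the last one carries the EOM
  # status bit (0x01). Packet ids start at 1 and wrap back to 1.
  def encode(type, payload, pkt_id \\ 1)

  def encode(type, payload, pkt_id) when byte_size(payload) <= @max_data do
    [header(type, 0x01, byte_size(payload) + @header_size, pkt_id), payload]
  end

  def encode(type, payload, pkt_id) do
    <<chunk::binary-size(@max_data), rest::binary>> = payload

    [
      header(type, 0x00, @max_data + @header_size, pkt_id),
      chunk | encode(type, rest, rem(pkt_id, 255) + 1)
    ]
  end

  # type, status, length (big-endian), spid, packet id, window
  defp header(type, status, len, pkt_id),
    do: <<type, status, len::16-big, 0::16, pkt_id, 0>>

  # Strip headers and concatenate data until a packet with the EOM bit.
  def reassemble(bin, acc \\ [])

  def reassemble(<<_type, status, len::16-big, _spid::16, _id, _win, rest::binary>>, acc) do
    data_len = len - @header_size
    <<data::binary-size(data_len), more::binary>> = rest

    if band(status, 0x01) == 1 do
      {:ok, IO.iodata_to_binary(:lists.reverse([data | acc]))}
    else
      reassemble(more, [data | acc])
    end
  end
end
```

A real implementation also needs the sequential-id validation, bounded payload size, and partial-`recv` handling the bullets above describe; this sketch only shows the framing math.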
Improves code quality and readability within the packet reassembly module.

- Relocates type and constant definitions to the module top for better organization.
- Removes an unused parameter from the `extract_and_continue` function.
- Enhances documentation for the `:max_payload_size` option to show a clear default value.
- Replaces a nested conditional with pattern matching in `finish_or_continue` for cleaner, more idiomatic code.
Migrates all calls to `encode_packets` and `encode_header` in `Messages`, `Prelogin`, and `Login7` to use the unified `Tds.Protocol.Packet.encode/2` function.

Removes the now-unused `encode_header/4` and `encode_packets/3` from `Messages`, centralizing packet serialization logic and reducing duplication.
Adjusts `login7_test.exs` to correctly handle `iodata` returned by `Login7.encode`, converting it to a binary for assertion.

Removes `messages_test.exs` as its functionality is now covered and superseded by `PacketTest`.
Replaces the internal `msg_recv` and `next_tds_pkg` functions with `Packet.reassemble/1`.

This centralizes the logic for receiving and reassembling TDS packets into a dedicated module. It simplifies the `Tds.Protocol` module and improves error handling by leveraging the new `Packet` module's more robust error reporting.
Provides a central lookup to map TDS type codes and Elixir type names to handler modules. This supports resolving handlers during decoding (by type code) and encoding (by name or value inference), while allowing user-defined handlers to override built-ins.
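
The lookup precedence ("user handlers override built-ins") is the interesting part; a minimal sketch with placeholder handler atoms standing in for handler modules:

```elixir
defmodule RegistrySketch do
  # Built-in table is illustrative; the real registry maps TDS type
  # codes to handler modules.
  @builtin %{0x26 => :builtin_integer, 0x24 => :builtin_uuid}

  # User-supplied handlers shadow built-ins for the same type code.
  def handler_for(type_code, user_types \\ %{}) do
    Map.get(user_types, type_code, Map.get(@builtin, type_code))
  end
end
```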
Handles conversion between MSSQL's mixed-endian wire format and standard RFC 4122 representation. Supports encoding from both binary and string UUID formats.
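
Concretely, the mixed-endian layout means the first three RFC 4122 fields (time_low, time_mid, time_hi) are stored little-endian on the wire while the remaining eight bytes keep their order. A sketch of the byte reordering (module name is illustrative):

```elixir
defmodule UuidSwap do
  # MSSQL wire format: first three fields little-endian, rest as-is.
  def wire_to_rfc4122(<<d1::little-32, d2::little-16, d3::little-16, rest::binary-size(8)>>),
    do: <<d1::big-32, d2::big-16, d3::big-16, rest::binary>>

  def rfc4122_to_wire(<<d1::big-32, d2::big-16, d3::big-16, rest::binary-size(8)>>),
    do: <<d1::little-32, d2::little-16, d3::little-16, rest::binary>>
end
```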
Implements a stub handler for the sql_variant (0x62) type. Decoding returns the raw binary data without inner-type dispatch, while encoding is explicitly disabled as TDS does not support sql_variant as an RPC parameter.
mjaric added 11 commits March 5, 2026 17:23
Replaces Types.decode_info/decode_data calls in tokens.ex with the
new Registry + DataReader + handler pipeline. Falls back to the old
path for type codes not yet covered by handlers.

Key changes:
- tokens.ex uses @registry module attribute and decode_type_metadata/1
  + decode_type_value/2 helpers for the new pipeline
- DateTime handler includes type_code in metadata, decodes
  datetimeoffset as UTC DateTime (matching old behavior)
- UUID handler returns raw wire bytes during transition (no reorder)
- Registry fixes Tds.Type.UDT -> Tds.Type.Udt case mismatch
- Tests updated for Elixir struct returns (NaiveDateTime, DateTime,
  Decimal, Date, Time) instead of tuple format
Wires up the new type handler system for the encode path
(RPC parameter encoding and sp_executesql param descriptors).

Key changes:
- protocol.ex: Adds :registry field to state struct, initializes
  it from config in connect/1, threads it to Parameter functions
- messages.ex: encode_rpc_param tries new handler pipeline first
  (Registry lookup -> handler.encode), falls back to old Types
  module for unhandled types (tuples, UUID during transition)
- parameter.ex: prepared_params accepts optional registry, uses
  handler.param_descriptor with fallback to Types module
- All handler encode functions now include the type code byte in
  meta_bin (colmetadata_binary), matching the wire format contract
  that the caller concatenates meta_bin directly after status flags
- UUID encode skipped during transition since the handler does
  byte reordering but decode does not yet

Handler meta_bin format standardized across all 12 handlers:
  {type_code, <<type_code, ...type_specific_meta...>>, value_bin}

The transition approach uses try/rescue FunctionClauseError to
gracefully fall back to old Types module when a handler cannot
encode a particular value format (e.g. tuple datetimes).
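
That fallback can be sketched as follows (function and module names are illustrative):

```elixir
defmodule FallbackSketch do
  # Try the new handler first; a FunctionClauseError means the handler has
  # no clause for this value shape, so fall back to the legacy encoder.
  def encode_param(handler, legacy_encode, value) do
    handler.encode(value)
  rescue
    FunctionClauseError -> legacy_encode.(value)
  end
end

defmodule NewIntHandler do
  # Only knows integers; tuples (e.g. legacy datetime tuples) won't match.
  def encode(v) when is_integer(v), do: {:new, v}
end
```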
Deletes lib/tds/types.ex and removes all fallback paths that
referenced it from tokens.ex, messages.ex, parameter.ex, and
query.ex. The new type system (Registry + individual handler
modules) now handles all encode/decode/param_descriptor paths
exclusively.

Key changes to handlers:
- Integer: handles null type code (0x1F) with fixed-0 reader
- Decimal: registers :numeric type name alongside :decimal
- Binary: registers :image type name alongside :binary
- UUID: encodes bytes as-is (no reorder), matching old behavior
- DateTime: adds tuple-format and legacy datetime/smalldatetime
  encode support for backward compatibility

Removes old Tds.Types references from all test files and
rewrites bench/type_system_bench.exs to use new handler APIs.
- Adds comprehensive v3.0.0 entry to CHANGELOG.md detailing breaking changes, new features, and performance gains.
- Updates README.md data mapping tables to reflect mandatory Elixir calendar structs and Decimal money types.
- Documents the new pluggable Tds.Type behavior and custom type registration via extra_types.
- Formalizes the deprecation of Tds.Types.UUID in favor of Ecto.UUID with protocol-level byte reordering.

mjaric commented Mar 7, 2026

@josevalim, @wojtekmach

Up to 3724dd1 was the biggest shift. Phase 3 is finished! 😌

This was the biggest milestone and there are some good metrics to share 📊. There are breaking changes, so this will go out as v3 (that was the plan anyway).

Type System Benchmark Results

Machine: Apple M4 Pro, 48 GB, macOS
Elixir 1.18.1, Erlang/OTP 27.2, JIT enabled

PLP Chunk Reassembly (iolist vs binary concat)

| Scenario | Before (concat) | After (iolist) | Speedup |
| --- | --- | --- | --- |
| 1 MB | 5.38 K ips (186 us) | 43.15 K ips (23 us) | 8.0x |
| 10 MB | 0.49 K ips (2026 us) | 3.44 K ips (291 us) | 7.0x |

Memory:

| Scenario | Before | After | Reduction |
| --- | --- | --- | --- |
| 1 MB | 35.44 KB | 18.13 KB | 1.96x less |
| 10 MB | 357.38 KB | 170.70 KB | 2.10x less |

Type Decode/Encode Throughput

Baseline (before refactor, monolithic Tds.Types)

| Benchmark | ips | average | median |
| --- | --- | --- | --- |
| decode integer | 7.12 M | 0.140 us | 0.125 us |
| decode string (500 chars) | 1.71 M | 0.59 us | 0.58 us |
| encode 1000 decimal params | 0.63 K | 1589.33 us | 1563.94 us |

Memory:

| Benchmark | Memory |
| --- | --- |
| decode integer | 72 B |
| decode string | 1.31 KB |
| encode 1000 decimal params | 1.99 MB |

Post-Refactor (behaviour handlers + DataReader + Registry)

| Benchmark | ips | average | median |
| --- | --- | --- | --- |
| decode integer | 10.96 M | 0.0912 us | 0.0830 us |
| decode string (500 chars) | 1.75 M | 0.57 us | 0.54 us |
| encode 1000 decimal params | 5.39 K | 185.50 us | 182.62 us |

Memory:

| Benchmark | Memory |
| --- | --- |
| decode integer | 48 B |
| decode string | 1.22 KB |
| encode 1000 decimal params | 226.56 KB |

Summary

| Metric | Improvement |
| --- | --- |
| Integer decode throughput | 54% faster (7.12M -> 10.96M ips) |
| String decode throughput | ~2% faster (within noise) |
| Decimal encode throughput | 8.5x faster (0.63K -> 5.39K ips) |
| Decimal encode memory | 8.8x less (1.99 MB -> 226 KB) |
| Integer decode memory | 33% less (72 B -> 48 B) |
| PLP reassembly | 8x faster, 2x less memory |

Key improvements

  1. Decimal encode: Eliminated Decimal.Context process dictionary mutation. Precision and scale computed from value metadata instead of set_decimal_precision/1 global side effects.

  2. PLP reassembly: Iolist accumulation ([chunk | acc] then :lists.reverse |> IO.iodata_to_binary) replaces quadratic buf <> :binary.copy(chunk) pattern.

  3. Integer decode: Direct handler dispatch via Registry avoids the monolithic case/cond chain in the old `Tds.Types`.

  4. Binary safety: All DataReader strategies sever sub-binary references via :binary.copy/1, preventing memory retention of entire TCP packet buffers when decoded values are held long-term.
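
The PLP fix in (2) combined with the sub-binary severing in (4) looks roughly like this (a sketch, not the actual implementation):

```elixir
defmodule PlpSketch do
  # Prepend each chunk (O(1)), reverse once, flatten once — instead of the
  # quadratic `buf <> chunk` rebuild on every chunk.
  def reassemble(chunks) do
    chunks
    # :binary.copy/1 severs the sub-binary's reference to the large TCP
    # buffer, so holding the decoded value doesn't retain the whole packet.
    |> Enum.reduce([], fn chunk, acc -> [:binary.copy(chunk) | acc] end)
    |> :lists.reverse()
    |> IO.iodata_to_binary()
  end
end
```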


@wojtekmach wojtekmach left a comment


Congratulations!

I took a quick look and dropped two comments about the public API and the end-to-end testing approach, in case it's something worth considering. I haven't gotten to the type extensibility part, which, besides the performance improvements, seems to be the biggest deal for v3.

@@ -0,0 +1,239 @@
defmodule Tds.Protocol.Binary do
@moduledoc """
Member


Double-checking, do you want to expose this and similar modules as part of the public API?

Neither MyXQL nor Postgrex does that.

I was thinking about maybe doing that: have a :gen_tcp-esque MyXQL.Socket (or :gen_mysql) and then probably expose protocol details in there somehow. But there was never a use case.

So yeah, unless you have a use case, I wouldn't make it public either; it's easier to open it up later than to change an API that's already public.

Member Author


The main driver behind this refactoring was to enable better Unit Testing for the protocol. By decoupling these modules, I can now take a message fragment from Wireshark and test it directly using binary fixtures. This makes the test setup much "cheaper" and more reliable.

In fact, this approach has already helped me catch several bugs where the type was correct but the endianness was wrong.

Regarding the module structure, Elixir lacks an "internal" scope, so these modules are technically public. However, I’ve organized them this way to make the code self-documenting by mirroring the ubiquitous language and terminology of the TDS spec. All Tds.Type.* modules now mirror the spec language, making it easier for contributors to follow the protocol logic (I hope 😄). The goal is internal clarity and spec compliance rather than expanding the library's public API.

Member Author


I just realized you are referring to the attribute, so the question is: why not `@moduledoc false`, right?

Member


Yes, I’d `@moduledoc false` it until further notice :)

@@ -0,0 +1,367 @@
defmodule Tds.TypeIntegrationTest do
Member


Another approach to maybe consider is full end-to-end tests. In MyXQL, they caught a bunch of protocol bugs and they serve as an important regression safety net.

It's maybe a little bit hard to see at a glance (and arguably the testing helpers could be significantly improved), but each assert_roundtrip inserts a value, and thus tests encoding, then reads the value back, and thus tests decoding. We do that for both the text and binary protocols:

https://github.com/elixir-ecto/myxql/blob/v0.8.1/test/myxql/protocol/values_test.exs#L14:L34

Member Author

@mjaric mjaric Mar 12, 2026


Not my happiest moment :) This is actually a TDD test that we can choose to throw away after I finish the implementation. I'm not sure why it is named "integration"; probably a copy-paste and bad-rename issue :)

I'll change the name.

BTW, until now TDD was my only friend; I could not replace protocol internals that easily while reusing the old integration tests. But in the end, the integration tests helped a lot in verifying that plugging in the new implementation didn't break anything. Plus, we want to respect the old public API, so I avoided changing the integration tests.

@mjaric mjaric requested a review from wojtekmach March 12, 2026 11:02