This module will be used as the source of truth; it will also allow us to quickly search the code via symbol usage.
… feature extensions. New tokens added (not in the original implementation): `offset` (0x78), `altmetadata` (0x88), `dataclassification` (0xA3), `tabname` (0xA4), `colinfo` (0xA5), `featureextack` (0xAE), `altrow` (0xD3), `sessionstate` (0xE4), `sspi` (0xED), `fedauthinfo` (0xEE)
Consolidates little-endian macros from `Tds.BinaryUtils` and big-endian and parameterized macros from `Tds.Protocol.Grammar` into a single module. Establishes clear conventions for byte order, with little-endian as the default and `_be` suffixes for big-endian variants. This centralizes common binary encoding and decoding utilities, improving consistency and maintainability across the TDS protocol implementation.
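The convention is easiest to see in byte form. Here is a minimal sketch in Python's `struct` notation (illustrative only: the real macros are Elixir bitstring modifiers, and `uint16`/`uint16_be` are hypothetical names standing in for the macro pair):

```python
import struct

# Little-endian is the default; the "_be" suffix marks the big-endian variant.
def uint16(n):
    return struct.pack("<H", n)     # little-endian (default convention)

def uint16_be(n):
    return struct.pack(">H", n)     # big-endian variant

assert uint16(0x1234) == b"\x34\x12"
assert uint16_be(0x1234) == b"\x12\x34"
```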
Centralizes all TDS protocol constants (packet types, token types, data types, etc.) into `Tds.Protocol.Constants` and binary utility functions into `Tds.Protocol.Binary`. This refactor replaces hardcoded hexadecimal values with compile-time macros, enhancing maintainability and consistency across the codebase. Corrects a bug in `Tds.Protocol.Prelogin` where the `:instopt` prelogin token was incorrectly decoded as an `:encryption` token.
Introduces `Tds.Protocol.Packet` for TDS packet framing. Provides `encode/2` to split payloads into 4096-byte packets, each with an 8-byte header and up to 4088 bytes of data. Packet IDs start at 1 and wrap at 256. Adds `decode_header/1` to parse 8-byte packet headers from incoming binaries.
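Assuming the standard TDS header layout (type, status, big-endian length including the header, SPID, packet ID, window), the splitting logic can be sketched in Python; `encode` here is a hypothetical stand-in for `Packet.encode/2`, not its actual implementation:

```python
import struct

PACKET_SIZE = 4096
HEADER_SIZE = 8
DATA_SIZE = PACKET_SIZE - HEADER_SIZE  # up to 4088 payload bytes per packet

def encode(pkt_type, payload):
    """Split a payload into framed TDS packets (illustrative sketch)."""
    chunks = [payload[i:i + DATA_SIZE]
              for i in range(0, len(payload), DATA_SIZE)] or [b""]
    packets = []
    for i, chunk in enumerate(chunks):
        status = 0x01 if i == len(chunks) - 1 else 0x00  # EOM flag on last packet
        pkt_id = (i + 1) % 256                           # IDs start at 1, wrap at 256
        header = struct.pack(">BBHHBB", pkt_type, status,
                             HEADER_SIZE + len(chunk), 0, pkt_id, 0)
        packets.append(header + chunk)
    return packets
```

A 5000-byte payload thus yields one full 4096-byte packet followed by a 920-byte final packet carrying the EOM status bit.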
Introduces `Packet.reassemble/2` to reconstruct complete TDS messages from a stream of packets. This function correctly handles message fragmentation by: * Concatenating data from multiple packets. * Validating sequential packet IDs to ensure correct ordering, including wrap-around behavior. * Enforcing a configurable maximum payload size to mitigate denial-of-service risks. * Gracefully managing partial `recv` operations and multiple packets within a single read from the socket.
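Under the assumptions above (single-byte packet IDs, a configurable size cap), the ordering and size validations reduce to modular arithmetic; a sketch with hypothetical function names, not the module's real API:

```python
def next_packet_id(prev_id):
    # Packet IDs are a single byte, so 255 wraps to 0.
    return (prev_id + 1) % 256

def accept_packet(prev_id, pkt_id, total_bytes, max_payload_size):
    """Validate ordering and the bounded payload size before appending data."""
    if pkt_id != next_packet_id(prev_id):
        return {"error": "unexpected_packet_id"}
    if total_bytes > max_payload_size:
        return {"error": "max_payload_size_exceeded"}
    return "ok"

assert next_packet_id(255) == 0                      # wrap-around behavior
assert accept_packet(1, 2, 100, 4096) == "ok"
```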
Improves code quality and readability within the packet reassembly module. - Relocates type and constant definitions to the module top for better organization. - Removes an unused parameter from the `extract_and_continue` function. - Enhances documentation for the `:max_payload_size` option to show a clear default value. - Replaces a nested conditional with pattern matching in `finish_or_continue` for cleaner, more idiomatic code.
Migrates all calls to `encode_packets` and `encode_header` in `Messages`, `Prelogin`, and `Login7` to use the unified `Tds.Protocol.Packet.encode/2` function. Removes the now-unused `encode_header/4` and `encode_packets/3` from `Messages`, centralizing packet serialization logic and reducing duplication.
Adjusts `login7_test.exs` to correctly handle `iodata` returned by `Login7.encode`, converting it to a binary for assertion. Removes `messages_test.exs` as its functionality is now covered and superseded by `PacketTest`.
Replaces the internal `msg_recv` and `next_tds_pkg` functions with `Packet.reassemble/1`. This centralizes the logic for receiving and reassembling TDS packets into a dedicated module. It simplifies the `Tds.Protocol` module and improves error handling by leveraging the new `Packet` module's more robust error reporting.
Provides a central lookup to map TDS type codes and Elixir type names to handler modules. This supports resolving handlers during decoding (by type code) and encoding (by name or value inference), while allowing user-defined handlers to override built-ins.
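A hypothetical sketch of that resolution order in Python (handler module names as strings; the real Registry is an Elixir module and its API may differ — 0x26 is INTNTYPE and 0x24 is GUIDTYPE in the TDS spec):

```python
# Built-in handlers keyed by TDS type code; names are illustrative.
BUILTINS = {0x26: "Tds.Type.Integer", 0x24: "Tds.Type.Uuid"}

def lookup(key, user_handlers=None):
    """Resolve a handler by type code (or type name); user types win."""
    user_handlers = user_handlers or {}
    if key in user_handlers:          # user-defined handlers checked first
        return user_handlers[key]
    return BUILTINS.get(key)          # None: caller falls back or errors

assert lookup(0x26) == "Tds.Type.Integer"
assert lookup(0x26, {0x26: "MyApp.IntHandler"}) == "MyApp.IntHandler"
```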
Handles conversion between MSSQL's mixed-endian wire format and standard RFC 4122 representation. Supports encoding from both binary and string UUID formats.
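The reorder touches only the first three GUID fields (4 + 2 + 2 bytes), which SQL Server stores little-endian; the remaining 8 bytes are already in network order. A language-agnostic sketch in Python (function names are illustrative):

```python
def rfc4122_to_wire(canonical: bytes) -> bytes:
    """Reorder an RFC 4122 GUID into MSSQL's mixed-endian wire layout."""
    return canonical[3::-1] + canonical[5:3:-1] + canonical[7:5:-1] + canonical[8:]

def wire_to_rfc4122(wire: bytes) -> bytes:
    # The reorder is its own inverse, so the same slicing works both ways.
    return wire[3::-1] + wire[5:3:-1] + wire[7:5:-1] + wire[8:]

canonical = bytes.fromhex("00112233445566778899aabbccddeeff")
wire = bytes.fromhex("33221100554477668899aabbccddeeff")
assert rfc4122_to_wire(canonical) == wire
assert wire_to_rfc4122(wire) == canonical
```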
Implements a stub handler for the sql_variant (0x62) type. Decoding returns the raw binary data without inner-type dispatch, while encoding is explicitly disabled as TDS does not support sql_variant as an RPC parameter.
Replaces Types.decode_info/decode_data calls in tokens.ex with the new Registry + DataReader + handler pipeline. Falls back to the old path for type codes not yet covered by handlers.

Key changes:
- tokens.ex uses the @registry module attribute and decode_type_metadata/1 + decode_type_value/2 helpers for the new pipeline
- DateTime handler includes type_code in metadata and decodes datetimeoffset as a UTC DateTime (matching old behavior)
- UUID handler returns raw wire bytes during the transition (no reorder)
- Registry fixes the Tds.Type.UDT -> Tds.Type.Udt case mismatch
- Tests updated for Elixir struct returns (NaiveDateTime, DateTime, Decimal, Date, Time) instead of the tuple format
Wires up the new type handler system for the encode path
(RPC parameter encoding and sp_executesql param descriptors).
Key changes:
- protocol.ex: Adds :registry field to state struct, initializes
it from config in connect/1, threads it to Parameter functions
- messages.ex: encode_rpc_param tries new handler pipeline first
(Registry lookup -> handler.encode), falls back to old Types
module for unhandled types (tuples, UUID during transition)
- parameter.ex: prepared_params accepts optional registry, uses
handler.param_descriptor with fallback to Types module
- All handler encode functions now include the type code byte in
meta_bin (colmetadata_binary), matching the wire format contract
that the caller concatenates meta_bin directly after status flags
- UUID encode skipped during transition since the handler does
byte reordering but decode does not yet
Handler meta_bin format standardized across all 12 handlers:
{type_code, <<type_code, ...type_specific_meta...>>, value_bin}
The transition approach uses try/rescue FunctionClauseError to
gracefully fall back to old Types module when a handler cannot
encode a particular value format (e.g. tuple datetimes).
Deletes lib/tds/types.ex and removes all fallback paths that referenced it from tokens.ex, messages.ex, parameter.ex, and query.ex. The new type system (Registry + individual handler modules) now handles all encode/decode/param_descriptor paths exclusively.

Key changes to handlers:
- Integer: handles the null type code (0x1F) with a fixed-0 reader
- Decimal: registers the :numeric type name alongside :decimal
- Binary: registers the :image type name alongside :binary
- UUID: encodes bytes as-is (no reorder), matching old behavior
- DateTime: adds tuple-format and legacy datetime/smalldatetime encode support for backward compatibility

Removes old Tds.Types references from all test files and rewrites bench/type_system_bench.exs to use the new handler APIs.
- Adds a comprehensive v3.0.0 entry to CHANGELOG.md detailing breaking changes, new features, and performance gains.
- Updates README.md data mapping tables to reflect mandatory Elixir calendar structs and Decimal money types.
- Documents the new pluggable Tds.Type behaviour and custom type registration via extra_types.
- Formalizes the deprecation of Tds.Types.UUID in favor of Ecto.UUID with protocol-level byte reordering.
Up to 3724dd1 was the biggest shift. Phase 3 is finished! 😌 This was the biggest milestone, there are some good metrics to share 📊, and there are breaking changes, so this will go out as v3 (that was the plan anyway).

Type System Benchmark Results

Machine: Apple M4 Pro, 48 GB, macOS

PLP Chunk Reassembly (iolist vs binary concat): see the summary below.

Type Decode/Encode Throughput

Baseline (before refactor, monolithic `Tds.Types`):
| Benchmark | ips | average | median |
|---|---|---|---|
| decode integer | 7.12 M | 0.140 us | 0.125 us |
| decode string (500 chars) | 1.71 M | 0.59 us | 0.58 us |
| encode 1000 decimal params | 0.63 K | 1589.33 us | 1563.94 us |
Memory:
| Benchmark | Memory |
|---|---|
| decode integer | 72 B |
| decode string | 1.31 KB |
| encode 1000 decimal params | 1.99 MB |
Post-Refactor (behaviour handlers + DataReader + Registry)
| Benchmark | ips | average | median |
|---|---|---|---|
| decode integer | 10.96 M | 0.0912 us | 0.0830 us |
| decode string (500 chars) | 1.75 M | 0.57 us | 0.54 us |
| encode 1000 decimal params | 5.39 K | 185.50 us | 182.62 us |
Memory:
| Benchmark | Memory |
|---|---|
| decode integer | 48 B |
| decode string | 1.22 KB |
| encode 1000 decimal params | 226.56 KB |
Summary
| Metric | Improvement |
|---|---|
| Integer decode throughput | 54% faster (7.12M -> 10.96M ips) |
| String decode throughput | ~2% faster (within noise) |
| Decimal encode throughput | 8.5x faster (0.63K -> 5.39K ips) |
| Decimal encode memory | 8.8x less (1.99 MB -> 226 KB) |
| Integer decode memory | 33% less (72 B -> 48 B) |
| PLP reassembly | 8x faster, 2x less memory |
Key improvements
- Decimal encode: Eliminated `Decimal.Context` process dictionary mutation. Precision and scale are computed from value metadata instead of `set_decimal_precision/1` global side effects.
- PLP reassembly: Iolist accumulation (`[chunk | acc]` then `:lists.reverse |> IO.iodata_to_binary`) replaces the quadratic `buf <> :binary.copy(chunk)` pattern.
- Integer decode: Direct handler dispatch via the Registry avoids the monolithic case/cond chain in the old `Tds.Types`.
- Binary safety: All DataReader strategies sever sub-binary references via `:binary.copy/1`, preventing memory retention of entire TCP packet buffers when decoded values are held long-term.
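The accumulation strategies behind the PLP reassembly win can be sketched in Python terms: appending chunks to a list and joining once is linear, while repeated buffer concatenation copies the growing buffer on every iteration (quadratic). Function names here are illustrative, not the library's:

```python
def reassemble_list(chunks):
    acc = []
    for chunk in chunks:
        acc.append(chunk)        # like [chunk | acc] in Elixir
    return b"".join(acc)         # like :lists.reverse |> IO.iodata_to_binary

def reassemble_concat(chunks):
    buf = b""
    for chunk in chunks:
        buf = buf + chunk        # copies buf each time: O(n^2) overall
    return buf

chunks = [b"ab"] * 1000
assert reassemble_list(chunks) == reassemble_concat(chunks)
```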
wojtekmach
left a comment
Congratulations!
I took a quick look and dropped two comments about the public API and the end-to-end testing approach, in case it's something worth considering. I haven't gotten to the type extensibility part, which besides the performance improvements seems to be the biggest deal for v3.
@@ -0,0 +1,239 @@
defmodule Tds.Protocol.Binary do
  @moduledoc """
Double-checking, do you want to expose this and similar modules as part of the public API?
Neither MyXQL nor Postgrex does that.
I was thinking about maybe doing that: have a :gen_tcp-esque MyXQL.Socket (or :gen_mysql) and then probably expose protocol details there somehow. But there was never a use case.
So yeah, unless you have a use case, I wouldn't make it public either; it's easier to open it up later than to change an API that's already public.
The main driver behind this refactoring was to enable better Unit Testing for the protocol. By decoupling these modules, I can now take a message fragment from Wireshark and test it directly using binary fixtures. This makes the test setup much "cheaper" and more reliable.
In fact, this approach has already helped me catch several bugs where the type was correct but the endianness was wrong.
Regarding the module structure, Elixir lacks an "internal" scope, so these modules are technically public. However, I've organized them this way to make the code self-documenting by mirroring the ubiquitous language and terminology of the TDS spec. All Tds.Type.* modules now mirror the spec language, making it easier for contributors to follow the protocol logic (I hope 😄). The goal is internal clarity and spec compliance rather than expanding the library's public API.
I just realized you are referring to the attribute, so the question is why not `@moduledoc false`, right?
Yes, I’d moduledoc false it until further notice :)
@@ -0,0 +1,367 @@
defmodule Tds.TypeIntegrationTest do
Another approach to maybe consider is full end-to-end tests. In MyXQL, they caught a bunch of protocol bugs and they serve as important regression safety net.
It's maybe a little bit hard to see at a glance (and arguably the testing helpers could be significantly improved), but each assert_roundtrip inserts a value, thus testing encoding, then reads it back, thus testing decoding. We do that for both the text and binary protocols:
https://github.com/elixir-ecto/myxql/blob/v0.8.1/test/myxql/protocol/values_test.exs#L14:L34
Not my happiest moment :) this is actually a TDD test that we can choose to throw away after I finish the implementation. I'm not sure why it is named "integration"; maybe a copy-paste and bad-rename issue :)
I'll change the name.
BTW, until now TDD was my only friend: the protocol internals were not easy to swap out while reusing the old integration tests. But in the end, the integration tests helped a lot to verify that plugging in the new implementation didn't break anything. Plus, we want to respect the old public API, so I avoided changing the integration tests.
TDS v3 Internals Redesign
Goal: Incrementally rewrite the TDS library internals to deliver a well-layered, well-tested codebase with user-extensible types, clean protocol layers, correct TLS, and proper error handling, without breaking the public API.
Motivation
The current TDS library (v2.x) works but has fundamental architectural problems:
- `types.ex` is 1800+ lines; adding one type requires 9-11 edits; no user extensibility without forking
- `Tds.Tls` GenServer has race conditions, unbounded buffers, and leaked processes
- `Process.put` for hidden state, unguarded transitions, silent transaction corruption
- Macros in two modules with different endianness semantics
Phase 1: Constants & Binary Macros - Single Source of Truth
- `Tds.Protocol.Constants` (compile-time macros for all TDS protocol constants)
- `Tds.Protocol.Binary` (unified binary macros with explicit endianness)
- Fix the `:instopt` decode bug
- Remove the `BinaryUtils` and `Grammar` modules

Phase 2: Packet Framing Layer
- `Tds.Protocol.Packet` (encode/decode TDS packet frames with bounded reassembly)
- Migrate `Messages` and `Protocol` to use `Packet`

Phase 3: Extensible Type System
- `Tds.Type` behaviour (`type_codes`, `decode_metadata`, `decode`, `encode`, `param_descriptor`, `infer`)
- `Tds.Type.Registry` (TDS code <-> handler module mapping, user types checked first)
- Per-type handler modules (uuid, money, xml)
- Wire `Type.Registry` into Protocol, replacing the monolithic `types.ex`

Phase 4: Transport Abstraction
- `Tds.Transport` behaviour and `Transport.Tcp` implementation
- `Transport.Tls` + `TlsHandshake` cb_info module (eliminates the GenServer, fixes race conditions)
- Migrate `Protocol` to use `Transport`

Phase 5: State Machine Cleanup
- `Protocol.State` with explicit phase transitions (eliminates `Process.put`, guards invalid transitions)
- `Protocol.Connection` and `Protocol.Execution` modules

Phase 6: Error Handling
- `Tds.Error` with `:reason`, `:errors`, `:context`, `:original`

Phase 7: Login/Prelogin & Connection Init Cleanup
- Fix the `inspect(self())` thread ID

Bonus
- `mix tds.export_errors` task for refreshing the SQL Server error codes CSV

Quality Gates
- `mix compile --warnings-as-errors`
- `mix format --check-formatted`
- `grep -r "BinaryUtils\|Protocol.Grammar" lib/` returns empty
- `lib/tds.ex` function signatures match v2.x