A comprehensive endurance racing database covering IMSA WeatherTech, WEC (including 24h Le Mans), ELMS, Asian Le Mans, and Le Mans Cup. Built with Ruby and DuckDB.
# Access the database directly from HuggingFace
duckdb "hf://datasets/tobil/imsa/imsa.duckdb"
# Example: Fastest LMDh lap times by manufacturer
SELECT manufacturer, chassis, MIN(lap_time) as fastest
FROM laps
WHERE homologation = 'LMDh' AND session = 'race'
GROUP BY manufacturer, chassis
ORDER BY fastest;| Series | Years | Events | Notable Races |
|---|---|---|---|
| IMSA WeatherTech | 2020-2026 | 60+ | 24h Daytona, 12h Sebring, Petit Le Mans |
| WEC | 2020-2025 | 50+ | 24 Hours of Le Mans, Spa, Bahrain |
| ELMS | 2020-2025 | 30+ | 4 Hours races across Europe |
| Asian Le Mans | 2020-2025 | 20+ | Dubai, Abu Dhabi |
| Le Mans Cup | 2020-2025 | 20+ | LMP3 & GT3 support series |
- Multi-Series Support: IMSA, WEC, ELMS, Asian Le Mans, Le Mans Cup
- Chassis & Homologation Data: Track LMDh vs LMH vs GT3 performance
- Weather Integration: Temperature, humidity, rain conditions per lap
- Driver Licensing: FIA license levels (Platinum/Gold/Silver/Bronze)
- Cross-Series Analysis: Normalized classes for comparing across series
motorsportdb/
βββ data/ # Raw CSV data by series/year/event
β βββ imsa/
β β βββ 2021/
β β β βββ 01-daytona-international-speedway/
β β β β βββ 202101301340-race-laps.csv
β β β β βββ 202101301340-race-results.csv
β β β β βββ 202101301340-race-weather.csv
β β β βββ events.json # Event manifest with names
β β βββ ...
β βββ wec/
β βββ elms/
β βββ alms/
β βββ lmc/
β
βββ output/ # Generated database and exports
β βββ imsa.duckdb # Main DuckDB database
β βββ laps.csv # Full lap data export
β βββ drivers.csv # Driver summary
β βββ seasons.csv # Season overview
β
βββ SQL Schema Files # Processed in order (000 β 008)
β βββ 000-settings.sql # Tracks, classes, macros
β βββ 001-event-drivers.sql # Driver extraction from results
β βββ 001b-chassis.sql # Chassis homologation lookup
β βββ 002-event-laps.sql # Lap parsing with stints
β βββ 003-event-weather.sql # Weather data processing
β βββ 004-laps.sql # Main laps table with weather join
β βββ 005-season-views.sql # Year/series-specific views
β βββ 006-bpillar.sql # BPillar performance filtering
β βββ 007-class-normalization.sql # Cross-series class mapping
β βββ 008-event-metadata.sql # Circuit details, race types
β
βββ Configuration Files
β βββ tracks.json # 45 circuits with coordinates & aliases
β βββ classes.json # Main series classes (filters support series)
β βββ chassis.json # Chassis β homologation/manufacturer mapping
β
βββ Quality Assurance
β βββ lint/
β β βββ check_database.rb # Schema and data integrity checks
β β βββ check_data_quality.rb # Race session analysis
β βββ test_database.rb # Database tests
β
βββ Skills (Claude Code)
β βββ skills/
β βββ imsa/ # IMSA analysis skill
β βββ marimo/ # Notebook generation skill
β
βββ import.rb # Multi-series data importer
βββ Rakefile # Build tasks
Every lap with driver, team, chassis, and weather data.
| Column | Description |
|---|---|
series_code |
Series identifier (imsa, wec, elms, alms, lmc) |
year, event, session |
When and where |
session_id |
Unique session identifier for joins |
car, class |
Car number and racing class |
driver_name, driver_id |
Driver info (id is stable across name variants) |
lap, lap_time |
Lap number and duration |
chassis |
Full chassis name (e.g., "Porsche 963") |
homologation |
Regulatory category: LMDh, LMH, DPi, LMP2, LMP3, GTE, GT3 |
manufacturer |
Car manufacturer |
license, license_rank |
FIA license level and numeric rank |
stint_number, stint_lap |
Stint tracking |
air_temp_f, track_temp_f, raining |
Weather conditions |
High-level overview of all sessions.
Aggregated driver info with latest license and team.
Lap-by-lap Elo ratings computed independently per class. Every driver starts at 1500.
| Column | Type | Description |
|---|---|---|
driver_id |
VARCHAR | Normalized driver identifier |
driver_name |
VARCHAR | Display name |
class |
VARCHAR | GTP, LMP2, GTD, etc. (independent pools) |
series_code |
VARCHAR | imsa, wec, elms |
year |
VARCHAR | Season year |
event |
VARCHAR | Event name |
session_date |
TIMESTAMP | Event date |
elo_before |
INTEGER | Elo before event (0 for first event) |
elo_after |
INTEGER | Elo after event |
delta |
INTEGER | Change (first event includes +1500 base) |
laps |
INTEGER | Laps driven in event |
cumulative_laps |
INTEGER | Career laps in class |
Key design: First event's delta includes +1500 base, so SUM(delta) always equals current Elo.
-- Current Elo for a driver
SELECT SUM(delta) as elo FROM driver_elo
WHERE driver_id = 'laurens vanthoor' AND class = 'GTP';
-- Elo leaderboard
SELECT driver_name, elo, total_laps FROM driver_elo_current
WHERE class = 'GTP' ORDER BY elo DESC LIMIT 10;
-- Elo at a point in time
SELECT SUM(delta) as elo FROM driver_elo
WHERE driver_id = 'laurens vanthoor' AND class = 'GTP'
AND session_date <= '2024-06-01';Pre-aggregated leaderboard with current Elo, total laps, and event counts per driver/class.
Track coordinates and metadata from tracks.json.
Maps chassis names to homologation categories.
| Category | Description | Example Chassis |
|---|---|---|
| LMDh | Le Mans Daytona hybrid | Porsche 963, Cadillac V-Series.R, BMW M Hybrid V8 |
| LMH | Le Mans Hypercar | Toyota GR010, Ferrari 499P, Peugeot 9X8 |
| DPi | Daytona Prototype international | Cadillac DPi, Acura DPi (discontinued) |
| LMP2 | Le Mans Prototype 2 | Oreca 07, Dallara LMP2 |
| LMP3 | Le Mans Prototype 3 | Ligier JS P320, Duqueine D08 |
| GTE | GT Endurance | Corvette C8.R, Ferrari 488 GTE |
| GT3 | GT3 specification | Ferrari 296 GT3, Porsche 911 GT3 R |
# Import a specific series/year
ruby import.rb imsa 2024
ruby import.rb wec 2024
ruby import.rb elms 2024
# Or use rake tasks
rake import # IMSA current year
rake import_wec # WEC current year
rake import_all[2024] # All series for 2024rake db:update # Build/rebuild database
rake db:open # Open in DuckDB CLIrake lint # Database integrity checks
rake lint_data # Data quality analysis
rake check # Full check (db:update + lint + lint_data)- Never compare lap times across different events - Track conditions, weather, and layouts vary too much
- Never compare lap times across different classes - GTP, LMP2, GTD are completely different cars
- Always filter by session_id for meaningful comparisons - This ensures same track/conditions
- Default to race sessions - Practice and qualifying have different strategies
The bpillar_quartile column (race sessions only) intelligently filters laps for representative pace:
| Quartile | Meaning | Use for |
|---|---|---|
| 1 | Fastest 25% | Pure pace analysis |
| 2 | 25-50% | Good representative pace |
| 3 | 50-75% | Traffic, marginal conditions |
| 4 | Slowest 25% | Pit laps, cautions, issues |
| NULL | Excluded | First lap, pit in/out, outliers |
Best practice: Filter to bpillar_quartile IN (1, 2) for pace analysis. This automatically excludes:
- First lap of the race
- Pit in/out laps
- Laps under caution
- Laps with traffic or issues
- Check
rainingcolumn for wet conditions - Wet and dry lap times are NOT comparable
- Temperature affects tire performance significantly
- Use
air_temp_fandtrack_temp_ffor analysis
SELECT
homologation,
manufacturer,
COUNT(DISTINCT driver_id) as drivers,
MIN(lap_time) as fastest_lap,
AVG(lap_time) as avg_lap
FROM laps
WHERE class IN ('GTP', 'HYPERCAR')
AND session = 'race'
AND bpillar_quartile IN (1, 2)
GROUP BY homologation, manufacturer
ORDER BY fastest_lap;SELECT
driver_name,
STRING_AGG(DISTINCT series_code, ', ') as series,
COUNT(DISTINCT event) as events
FROM laps
WHERE year = '2024'
GROUP BY driver_name
HAVING COUNT(DISTINCT series_code) > 1
ORDER BY events DESC;SELECT
CASE WHEN raining THEN 'Wet' ELSE 'Dry' END as conditions,
class,
AVG(lap_time) as avg_lap,
COUNT(*) as laps
FROM laps
WHERE session = 'race' AND event = 'Le Mans'
GROUP BY raining, class
ORDER BY class, conditions;All data is collected from Al Kamel Systems timing infrastructure used by ACO-sanctioned series.
MIT
- Add new tracks to
tracks.json - Add new chassis to
chassis.json - Run
rake checkto validate changes