GitHub - padieul/tactic-annot: Experimental Lean 4 proof automation system using LLM-guided, aesop-tactic-based strategies and logging successful hints for future annotations.

Aesop Annotation Discovery Agent

Description:

This is a prototype, developed during ItaLean2025, of a hybrid LLM-guided proof automation system with a meta-learning objective. More specifically:

Primary Function: An agent that attempts to automatically prove theorems using Lean 4's aesop tactic with varying configurations
Meta-Learning Goal: Extract implicit proof knowledge (which lemmas/hints make proofs work) to generate @[aesop] annotations for Mathlib (not yet implemented).
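Since annotation generation is not implemented yet, the following sketch only illustrates the intended output format. It renders one recorded hint as a Lean `attribute [aesop ...]` command, the form used to tag an existing lemma. The `Hint` record and `to_annotation` helper are hypothetical and not taken from the repository; the phases (safe, norm, unsafe with a success percentage) and builders (apply, simp) follow Aesop's attribute syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hint:
    """A lemma hint that helped aesop close a goal (hypothetical record format)."""
    lemma: str                             # Lean name of the lemma, e.g. "eval₂_dvd"
    phase: str                             # Aesop rule phase: "safe", "norm" or "unsafe"
    builder: str                           # Aesop rule builder, e.g. "apply" or "simp"
    success_percent: Optional[int] = None  # only used for unsafe rules


def to_annotation(hint: Hint) -> str:
    """Render a hint as a Lean command that adds an existing lemma to Aesop."""
    if hint.phase not in ("safe", "norm", "unsafe"):
        raise ValueError(f"unknown Aesop phase: {hint.phase}")
    parts = [hint.phase]
    if hint.phase == "unsafe":
        # Unsafe rules carry a success probability; 50% is an arbitrary default for this sketch.
        percent = hint.success_percent if hint.success_percent is not None else 50
        parts.append(f"{percent}%")
    parts.append(hint.builder)
    return f"attribute [aesop {' '.join(parts)}] {hint.lemma}"


print(to_annotation(Hint("eval₂_dvd", "norm", "simp")))
# attribute [aesop norm simp] eval₂_dvd
print(to_annotation(Hint("map_multiset_sum", "safe", "apply")))
# attribute [aesop safe apply] map_multiset_sum
```

Emitting `attribute [...]` commands rather than editing `@[...]` lines in place keeps the generated output separate from the Mathlib source, which is one possible design; the repository may choose differently.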

Strategy Pattern: Multiple proof approaches (naive aesop, LLM-assisted)
LLM Workflow: Uses an LLM to suggest aesop hints when simpler strategies fail
Iterative Refinement: Retries with Lean error feedback and a decaying sampling temperature
Knowledge Extraction: Captures successful proof patterns in a registry
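The workflow listed above can be sketched as a fallback loop: try each strategy in order, retry the LLM strategy with the previous Lean error as feedback and a decaying temperature, and record the first success in a registry. This is a minimal illustration under stated assumptions, not the repository's code: all class and function names are hypothetical, the starting temperature (0.8) and decay factor (0.5) are arbitrary, and both strategies are stubs so the example runs without Lean or an LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass
class Attempt:
    success: bool
    proof: str
    error: str = ""  # Lean error message, fed back to the next attempt


class Strategy(Protocol):
    name: str
    max_attempts: int

    def attempt(self, theorem: str, feedback: str, temperature: float) -> Attempt: ...


@dataclass
class Registry:
    """Maps a theorem name to the (strategy, proof) pair that closed it."""
    entries: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def record(self, theorem: str, strategy: str, proof: str) -> None:
        self.entries[theorem] = (strategy, proof)


def prove(theorem: str, strategies: List[Strategy], registry: Registry,
          t0: float = 0.8, decay: float = 0.5) -> Optional[Attempt]:
    """Try strategies in order, retrying each with error feedback and a decaying temperature."""
    for strategy in strategies:
        feedback, temperature = "", t0  # each strategy starts fresh
        for _ in range(strategy.max_attempts):
            result = strategy.attempt(theorem, feedback, temperature)
            if result.success:
                registry.record(theorem, strategy.name, result.proof)
                return result
            feedback = result.error  # pass the Lean error to the next attempt
            temperature *= decay     # sample more conservatively on each retry
    return None


class NaiveAesop:
    """Stub for plain `by aesop`: deterministic, so one attempt is enough. Always fails here."""
    name, max_attempts = "naive", 1

    def attempt(self, theorem: str, feedback: str, temperature: float) -> Attempt:
        return Attempt(False, "by aesop", "naive aesop failed")


class ScriptedLLM:
    """Stub LLM strategy that succeeds on a chosen attempt and logs its inputs."""
    name = "llm"

    def __init__(self, succeed_on: int, max_attempts: int = 4):
        self.succeed_on, self.max_attempts = succeed_on, max_attempts
        self.calls: List[Tuple[str, float]] = []

    def attempt(self, theorem: str, feedback: str, temperature: float) -> Attempt:
        self.calls.append((feedback, temperature))
        if len(self.calls) == self.succeed_on:
            return Attempt(True, "by simp")
        return Attempt(False, "", "unsolved goals")


registry = Registry()
llm = ScriptedLLM(succeed_on=2)
prove("eval₂_pow", [NaiveAesop(), llm], registry)
print(registry.entries)  # {'eval₂_pow': ('llm', 'by simp')}
print(llm.calls)         # [('', 0.8), ('unsolved goals', 0.4)]
```

Giving the deterministic naive strategy a single attempt avoids re-running an identical aesop call; only the sampled LLM strategy benefits from retries.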

Experiments:

All experiments are based on a single Mathlib file containing 124 theorems and lemmas.

Note: These experiments used non-reasoning LLMs (Qwen3-Coder models). Reasoning ("thinking") models, or simply stronger or larger models, could potentially improve performance significantly.

30B Model (Qwen3-Coder-30B-A3B-Instruct):

Run 1: 95/124 theorems proven (80 naive + 14 LLM)
Run 2: 2/28 theorems proven
Run 3: 0/26 theorems proven
Final Result: 97/124 theorems proven (78.2%)

480B Model (Qwen3-Coder-480B-A35B-Instruct):

Run 1: 103/124 theorems proven (80 naive + 22 LLM)
Run 2: 2/21 theorems proven
Run 3: 0/19 theorems proven
Final Result: 105/124 theorems proven (84.7%)
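The final results combine the runs: per-run successes are summed and divided by the 124 theorems in the file. A hypothetical helper (not part of the repository) that reproduces both percentages:

```python
from typing import List, Tuple


def cumulative_result(total: int, successes_per_run: List[int]) -> Tuple[int, float]:
    """Sum the theorems proven across runs; return (proven, percent of total, 1 decimal)."""
    proven = sum(successes_per_run)
    return proven, round(100 * proven / total, 1)


print(cumulative_result(124, [95, 2, 0]))   # 30B model:  (97, 78.2)
print(cumulative_result(124, [103, 2, 0]))  # 480B model: (105, 84.7)
```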

Example Successes (480B Model, Run 2):

Success: eval₂_pow

Original theorem:

theorem eval₂_pow (n : Nat) : (p ^ n).eval₂ f x = p.eval₂ f x ^ n :=
  (eval₂RingHom _ _).map_pow _ _

LLM-generated proof (induction with simp; no aesop call):

theorem eval₂_pow (n : Nat) : (p ^ n).eval₂ f x = p.eval₂ f x ^ n := by
  induction n with
  | zero => simp [pow_zero, eval₂_one]
  | succ n ih => 
      simp [pow_succ, eval₂_mul, ih]

Successfully proven after 2 LLM attempts using an induction strategy

Success: eval₂_eq_zero_of_dvd_of_eval₂_eq_zero

Original theorem:

theorem eval₂_eq_zero_of_dvd_of_eval₂_eq_zero (h : p ∣ q) (h0 : eval₂ f x p = 0) :
    eval₂ f x q = 0 :=
  zero_dvd_iff.mp (h0 ▸ eval₂_dvd f x h)

LLM-generated aesop proof:

theorem eval₂_eq_zero_of_dvd_of_eval₂_eq_zero (h : p ∣ q) (h0 : eval₂ f x p = 0) :
    eval₂ f x q = 0 := by
  aesop (add norm simp [eval₂_dvd, Dvd.dvd])

Successfully proven after 4 LLM attempts by adding the divisibility lemma eval₂_dvd and the definition Dvd.dvd to aesop's normalization simp set
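Both the original term proof and the aesop hints rest on the same chain of facts: eval₂ preserves divisibility (eval₂_dvd), rewriting with h0 turns this into 0 ∣ eval₂ f x q, and zero_dvd_iff converts that into eval₂ f x q = 0. A step-by-step version, written as a standalone sketch that assumes R and S are commutative semirings and f : R →+* S (the original file's own variable declarations may differ):

```lean
import Mathlib

open Polynomial

example {R S : Type*} [CommSemiring R] [CommSemiring S] (f : R →+* S) (x : S)
    {p q : R[X]} (h : p ∣ q) (h0 : eval₂ f x p = 0) : eval₂ f x q = 0 := by
  -- Evaluation preserves divisibility of polynomials.
  have hd : eval₂ f x p ∣ eval₂ f x q := eval₂_dvd f x h
  -- Substitute eval₂ f x p = 0, giving 0 ∣ eval₂ f x q.
  rw [h0] at hd
  -- Only 0 is divisible by 0.
  exact zero_dvd_iff.mp hd
```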

Example Failures (480B Model, Run 2):

Failure (Syntax Error): eval₂_multiset_sum

Theorem:

theorem eval₂_multiset_sum (s : Multiset R[X]) (x : S) :
    eval₂ f x s.sum = (s.map (eval₂ f x)).sum

LLM-suggested proof:

theorem eval₂_multiset_sum (s : Multiset R[X]) (x : S) :
    eval₂ f x s.sum = (s.map (eval₂ f x)).sum :=
  by aesop (add safe (apply map_multiset_sum)) (add norm (apply eval₂AddMonoidHom))

Error:

aesop: Unable to interpret 'apply map_multiset_sum' as a safe rule. Try specifying a builder.

Analysis: The LLM correctly identified relevant lemmas but used incorrect aesop syntax. The builder must follow the phase directly (e.g., safe apply map_multiset_sum rather than safe (apply map_multiset_sum)); in the parenthesized form aesop does not recognize apply as a builder, which is what the error reports.
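Errors of this kind could be repaired mechanically before calling Lean. Below is a hypothetical regex-based normalizer (not the repository's validators module) that removes the parentheses around a builder and its lemma name. The result must still be re-checked by Lean: it only fixes the syntax shape and does not check that each phase/builder combination is accepted or that the rules prove the goal.

```python
import re

# Aesop rule builders that an LLM may wrap in parentheses by mistake.
_BUILDERS = "apply|forward|destruct|cases|constructors"
# "(apply foo)" -> capture the builder and the single lemma name that follows it.
_PAREN_BUILDER = re.compile(rf"\(({_BUILDERS})\s+([^\s()]+)\)")


def unwrap_builders(tactic: str) -> str:
    """Rewrite "(add safe (apply foo))" to "(add safe apply foo)"; other text is unchanged."""
    return _PAREN_BUILDER.sub(r"\1 \2", tactic)


suggested = ("by aesop (add safe (apply map_multiset_sum)) "
             "(add norm (apply eval₂AddMonoidHom))")
print(unwrap_builders(suggested))
# by aesop (add safe apply map_multiset_sum) (add norm apply eval₂AddMonoidHom)
```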

Failure (Partial Progress): mul_X_add_natCast_comp

Theorem:

theorem mul_X_add_natCast_comp {n : Nat} :
    (p * (X + (n : R[X]))).comp q = p.comp q * (q + n)

LLM-suggested proof:

theorem mul_X_add_natCast_comp {n : Nat} :
    (p * (X + (n : R[X]))).comp q = p.comp q * (q + n) := by
  aesop (add norm simp [mul_add, add_comp, mul_X_comp, natCast_mul_comp, eval₂_natCast, coe_eval₂RingHom])

Error:

unsolved goals
R : Type u_1
inst : Semiring R
p q : R[X]
n : ℕ
⊢ p.comp q * ↑n + p.comp q * q = p.comp q * (q + ↑n)

Analysis: Aesop made partial progress by simplifying the left-hand side but could not complete the proof. The remaining goal is a simple semiring identity that follows from left distributivity (mul_add) and commutativity of addition (add_comm); the supplied rule set was not enough for aesop to close it.
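The residual goal can be checked on its own. This sketch restates it as a standalone example (assuming only `import Mathlib`) and closes it by distributing the right-hand side and then swapping the two summands on the left:

```lean
import Mathlib

open Polynomial

-- Standalone restatement of the residual goal from the failed run.
example {R : Type*} [Semiring R] (p q : R[X]) (n : ℕ) :
    p.comp q * (n : R[X]) + p.comp q * q = p.comp q * (q + (n : R[X])) := by
  -- mul_add rewrites the right-hand side to p.comp q * q + p.comp q * ↑n;
  -- add_comm then reorders the left-hand side, and rw closes the goal by rfl.
  rw [mul_add, add_comm]
```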

Folders and files:

Folders: aesop_agent, context, data, docker, experiments, images, logs, strategies, validators
Files: .gitignore, lean_cleaner.py, main.py, output.log, readme.md, requirements.txt, run_logger.py, theorem_extractor.py, theorem_registry.py, theorem_validator.py