A lightweight demonstration of controlled AI execution using the OpenAI API.
This project evaluates flat .txt files representing individual profiles.
Each file may contain:
- A stated occupation
- A job description
- Supporting (or conflicting) responsibilities
The agent must determine — based only on the document text — whether the person is truly a BAKER.
Importantly, some files include a false or misleading self-described label (e.g., someone claims “baker” but performs no baking duties).
This ensures the system is not performing a simple keyword search.
Instead, the model must:
- Analyze responsibilities
- Compare claims against duties
- Detect contradictions
- Provide verbatim evidence
- Return structured JSON under a strict schema
The real objective is to demonstrate a reproducible, auditable agent loop:
Agentic API calling is just a controlled loop: iterate files → call model → parse JSON → persist results.
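That loop can be sketched roughly as follows. The function names and the stubbed model call are illustrative, not taken from `run_role_check.py`; the real script would replace `evaluate_profile` with an actual OpenAI API call:

```python
from pathlib import Path

def evaluate_profile(text: str) -> dict:
    """Stand-in for the model call.

    The real script would send `text` to the OpenAI API with a
    role-check prompt and parse the JSON reply; it is stubbed here
    so the loop's shape is visible without network access.
    """
    return {"baker_status": "unknown", "evidence": [], "reason": "stub"}

def run_loop(folder: str) -> list[dict]:
    """Iterate files -> call model -> parse JSON -> persist results."""
    results = []
    for path in sorted(Path(folder).glob("*.txt")):
        record = evaluate_profile(path.read_text(encoding="utf-8"))
        record["_file"] = path.name  # keep provenance for auditability
        results.append(record)
    return results
```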
- Project Structure
- Installation
- Setting Up OpenAI API Access
- Environment Setup
- Agentic API Design Philosophy
- Biggest Issue with AI + API Calling
- Token Planning Guide
- Engineering Takeaway
- Running the Demo
- Why This Matters
Place your files in the following layout:
```text
people_demo/
    sample_people/
        01_amelia_hart_true_baker.txt
        02_marcus_lee_false_baker_label.txt
        03_priya_nair_data_analyst.txt
        04_jose_alvarez_chef_trap_overlap.txt
        05_sarah_kim_flight_attendant.txt
    run_role_check.py
    .env
```
Install required dependencies:
```bash
pip install openai python-dotenv
```
If you need help installing Python or running commands in PowerShell, see the setup guide.
To run this project, you need an OpenAI API key.
- Go to https://platform.openai.com
- Sign up or log in.
- Verify your email if prompted.
The OpenAI API is usage-based (pay per token).
- Navigate to Billing in the dashboard.
- Add a payment method (credit/debit card).
- Optionally set a monthly spending limit for safety.
For small demos like this, usage typically costs only a few cents.
- Go to API Keys in the dashboard.
- Click Create new secret key.
- Copy the key immediately (you won't be able to see it again).
It will look like:

```text
sk-xxxxxxxxxxxxxxxxxxxxxxxx
```
Create a .env file in the root directory:
```text
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini
```
Using python-dotenv keeps credentials out of source control and makes
the demo clean and portable.
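In code, reading those values might look like the sketch below. The `load_config` helper is hypothetical, not a function from the demo; the commented `load_dotenv()` call is how python-dotenv would populate the environment from `.env`:

```python
import os

def load_config() -> tuple[str, str]:
    """Read API settings from the environment.

    With python-dotenv installed, the real script would first call:
        from dotenv import load_dotenv
        load_dotenv()
    which populates os.environ from the .env file.
    """
    api_key = os.getenv("OPENAI_API_KEY", "")
    model = os.getenv("OPENAI_MODEL", "gpt-4o-mini")  # optional, with default
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return api_key, model
```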
This demo uses structured, agent-style prompting:
- Each file is evaluated independently
- Prompts are minimal and role-focused
- Output is structured JSON
- No long context windows
- No stuffing multiple documents into a single call
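A minimal, role-focused prompt for a single file might be assembled like this. The wording of the system message is illustrative, not the demo's actual prompt:

```python
def build_messages(profile_text: str) -> list[dict]:
    """Assemble a small, single-document prompt for one profile."""
    system = (
        "You are a strict evaluator. Decide whether the person described "
        "is truly a BAKER, based only on the document text. Reply with JSON "
        "containing: name, stated_occupation, baker_status, evidence, reason."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": profile_text},
    ]
```

Each call carries exactly one profile, so no document can dilute attention from another.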
This avoids a major hidden failure mode of large language models.
Once you stuff a large amount of text into the context window, model performance often degrades instead of improving.
This is counterintuitive.
Engineers frequently assume:
"More context = more accuracy."
In practice:
- Attention becomes diluted
- Signal-to-noise ratio drops
- Subtle contradictions get ignored
- The model begins inferring beyond the provided evidence
Large prompts increase:
- Hallucination risk
- Instruction drift
- Cost
- Latency
This demo intentionally keeps calls small and controlled.
For engineering estimation:
1,000 words ≈ 1,300–1,500 tokens

Tokens ≈ words × 1.3

| Tokens  | Approx. Words |
|---------|---------------|
| 1,000   | ~750          |
| 5,000   | ~3,800        |
| 10,000  | ~7,500        |
| 100,000 | ~75,000       |
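The rule of thumb translates directly into a rough estimator (a heuristic only; for exact counts, a real tokenizer such as tiktoken would be needed):

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate using the ~1.3 tokens-per-word heuristic."""
    return int(word_count * 1.3)
```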
For agentic document parsing systems:
- Keep prompts small
- Chunk larger documents
- Use retrieval (RAG) when needed
- Keep API calls under ~10–20k tokens
- Never assume bigger context equals better reasoning
Well-designed small calls often outperform massive context dumps.
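One simple way to enforce a per-call budget is a word-based chunker; a sketch, using the ~1.3 tokens-per-word heuristic (1,500 words ≈ 2,000 tokens), is below. A production system might count tokens exactly and split on paragraph boundaries instead:

```python
def chunk_by_words(text: str, max_words: int = 1500) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```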
Once your .env is configured and dependencies are installed:
```bash
python run_role_check.py
```
The script will:
- Iterate over the `sample_people` folder
- Evaluate each profile
- Output structured results as JSON
The agent extracts structured fields for each profile:
- `name`
- `stated_occupation`
- `baker_status`
- `evidence` (direct text excerpts from the source file)
- `reason` (concise model justification)
- `_file` (source filename reference)
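A hypothetical record in that shape (the field names come from the schema above, but the values and field types here are invented for illustration):

```json
{
  "name": "Amelia Hart",
  "stated_occupation": "baker",
  "baker_status": true,
  "evidence": ["prepares sourdough loaves each morning"],
  "reason": "Stated role matches described baking duties.",
  "_file": "01_amelia_hart_true_baker.txt"
}
```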
This demonstrates how the loop produces deterministic, structured outputs from unstructured text inputs.
This project demonstrates:
- Controlled AI execution
- Deterministic evaluation pipelines
- Structured output enforcement
- Practical token budgeting
- Real-world API design discipline
AI systems should be architected, not prompted casually.
This repository was created for educational and demonstration purposes only.
Created by M. Joseph Tomlinson IV
Contact: mjtiv@udel.edu
Feel free to use, modify, adapt, and build upon this project as you see fit.

