Official repository for the research paper "Supernova Event Dataset: Interpreting Large Language Models' Personality through Critical Event Analysis" by
Pranav Agarwal¹ · Ioana Ciucă²
¹Mila, Quebec AI Institute · ²Stanford University
Actionable Interpretability Workshop at ICML 2025
📄 Paper | 🌐 Project Page | 🤗 Dataset | 📊 Demo
In this work, we interpret the personality traits of Large Language Models (LLMs) using our proposed Supernova Event Dataset, which includes Wikipedia articles on historical events, biographies, news events, and scientific discoveries. We benchmark models on how they identify and rank the key events in a life or a discovery, a complex task that requires causal reasoning. A second LLM then acts as a judge, inferring each model's personality from the events it selects and how it interprets them. Our analysis shows distinct traits, such as emotional reasoning in Orca 2 and analytical framing in Qwen 2.5, which improves interpretability and trust.
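The two-stage setup described above (a model ranks key events, then a judge LLM reads that ranking) can be sketched roughly as follows. This is only an illustration: the prompt wording, the function names, and the default of five events are assumptions, not the prompts or code used in the paper. The `subject` and `judge` arguments are hypothetical callables that wrap whichever model API is in use.

```python
import re
from typing import Callable, List


def build_ranking_prompt(article: str, top_k: int = 5) -> str:
    """Ask the model under study for its top_k key events, one per line."""
    return (
        f"Read the article below and list the {top_k} most critical events, "
        "most important first, one per line.\n\n"
        f"Article:\n{article}"
    )


def parse_ranked_events(response: str, top_k: int = 5) -> List[str]:
    """Keep the first top_k non-empty lines, dropping list markers like '1.' or '-'."""
    events = []
    for line in response.splitlines():
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line).strip()
        if cleaned:
            events.append(cleaned)
    return events[:top_k]


def build_judge_prompt(model_name: str, events: List[str]) -> str:
    """Show the judge the ranked events and ask for a personality reading."""
    numbered = "\n".join(f"{i}. {event}" for i, event in enumerate(events, start=1))
    return (
        f"The model '{model_name}' ranked these events as most critical:\n"
        f"{numbered}\n\n"
        "Describe the personality traits this selection suggests."
    )


def infer_personality(
    article: str,
    model_name: str,
    subject: Callable[[str], str],  # hypothetical wrapper around the model under study
    judge: Callable[[str], str],    # hypothetical wrapper around the judge model
    top_k: int = 5,
) -> str:
    """Run stage one (ranking) and stage two (judging) and return the judge's text."""
    ranking = subject(build_ranking_prompt(article, top_k))
    events = parse_ranked_events(ranking, top_k)
    return judge(build_judge_prompt(model_name, events))
```

Keeping the model calls behind plain callables means the same sketch works with any provider, and it can be exercised offline with stub functions.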
- Python 3.8 or higher
- API keys for OpenAI, Anthropic, and/or Gemini (depending on models used)
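A quick way to confirm these prerequisites is a small check script along the following lines. It is a sketch, not part of the repository: the function names are hypothetical, the variable names come from the `.env` template in this README, and mapping Gemini to `GOOGLE_API_KEY` is an assumption based on that template. It uses only the standard library, so it does not depend on how the repository itself loads `.env`.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List

# Variable names taken from the .env template in this README.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GOOGLE_API_KEY",  # assumed mapping
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def available_providers(env: Dict[str, str]) -> List[str]:
    """Return providers whose key is set and is not a 'your_...' placeholder."""
    found = []
    for provider, variable in PROVIDER_KEYS.items():
        value = env.get(variable, "")
        if value and not value.startswith("your_"):
            found.append(provider)
    return found


def main() -> None:
    """Report the Python version check and which provider keys were found."""
    if sys.version_info < (3, 8):
        sys.exit("Python 3.8 or higher is required")
    env = dict(os.environ)
    env_file = Path(".env")
    if env_file.exists():
        env.update(parse_env_text(env_file.read_text()))
    print("Providers with keys:", available_providers(env) or "none")
```

Calling `main()` from the repository root reports which providers have usable keys; only the providers for the models you plan to run need to appear.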
# Clone the repository
git clone https://github.com/pranaval/supernova-event-dataset.git
cd supernova-event-dataset
# Create virtual environment
python3.8 -m venv myenv
source myenv/bin/activate # On Windows: myenv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a .env file in the root directory:
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GOOGLE_API_KEY=your_google_key_here
The Supernova Event Dataset contains 592 carefully curated Wikipedia articles:
| Domain | Count | Description |
|---|---|---|
| 🎭 Biographies | 192 | Life stories of influential figures |
| 📰 News Events | 200 | Major contemporary events |
| 📚 Historical Events | 200 | Significant historical occurrences |
| 🔬 Scientific Discoveries | 25* | Comprehensive discovery narratives |
*The scientific discovery articles are generated with Google Gemini 2.5 Pro with Deep Research to produce comprehensive narratives.
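The three Wikipedia domains in the table add up to the stated 592 articles (192 + 200 + 200). Once the archives are extracted, a sanity check like the one below can compare the files on disk against those counts. It is a sketch under assumptions: the folder names are guessed from the archive names and may not match the actual extracted layout, and it assumes one file per article.

```python
from pathlib import Path
from typing import Dict, Tuple

# Counts from the table above; folder names are assumed from the archive names.
EXPECTED_COUNTS = {
    "biographies": 192,
    "major-news-events": 200,
    "historical-events": 200,
}


def count_articles(root: Path) -> Dict[str, int]:
    """Count files under each assumed domain folder (0 if the folder is missing)."""
    counts = {}
    for name in EXPECTED_COUNTS:
        folder = Path(root) / name
        if folder.is_dir():
            counts[name] = sum(1 for path in folder.rglob("*") if path.is_file())
        else:
            counts[name] = 0
    return counts


def find_mismatches(counts: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each domain whose count differs to a (found, expected) pair."""
    return {
        name: (counts.get(name, 0), expected)
        for name, expected in EXPECTED_COUNTS.items()
        if counts.get(name, 0) != expected
    }
```

For example, `find_mismatches(count_articles(Path(".")))` returns an empty dictionary when every domain folder holds the expected number of files.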
# Extract all datasets
tar -xJvf Dataset/biographies.tar.xz
tar -xJvf Dataset/historical-events.tar.xz
tar -xJvf Dataset/major-news-events.tar.xz
Run event extraction for each domain:
# Biographies
python biography_dataset.py --model orca-2
# Historical Events
python history_dataset.py --model phi-4
# News Events
python news_dataset.py --model orca-2
# Movie Scripts (optional additional domain)
python movies_dataset.py --model qwen-2.5
Consolidate results and extract personality patterns:
# Generate personality analysis for all models
python extract_personality.py
Create personality visualizations:
# Generate radar plots and semantic space mapping
python plot_personality.py
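To run the steps above in order, a small driver along these lines may be convenient. It is a sketch rather than part of the repository: the script names and `--model` values are the ones shown in the commands above, while `build_commands` and `run_pipeline` are hypothetical helper names. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List


def build_commands(include_movies: bool = False) -> List[List[str]]:
    """Return the pipeline commands in order: extraction, analysis, plotting."""
    extraction = [
        ("biography_dataset.py", "orca-2"),
        ("history_dataset.py", "phi-4"),
        ("news_dataset.py", "orca-2"),
    ]
    if include_movies:
        # Optional additional domain.
        extraction.append(("movies_dataset.py", "qwen-2.5"))
    commands = [
        [sys.executable, script, "--model", model] for script, model in extraction
    ]
    commands.append([sys.executable, "extract_personality.py"])
    commands.append([sys.executable, "plot_personality.py"])
    return commands


def run_pipeline(include_movies: bool = False, dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, run it and stop on the first failure."""
    for command in build_commands(include_movies):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root, inside the activated virtual environment, executes each step with the same interpreter. With `check=True`, a failing extraction step raises an error before the personality analysis runs.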
If you find our work useful, please cite:
@article{agarwal2025supernova,
title={Supernova Event Dataset: Interpreting Large Language Models' Personality through Critical Event Analysis},
author={Agarwal, Pranav and Ciucă, Ioana},
journal={arXiv preprint arXiv:2506.12189},
year={2025}
}
This project is licensed under the MIT License; see the LICENSE file for details.
- Wikipedia for article content
- Model providers (OpenAI, Anthropic, Google) for API access
- Fundamental of Ollama
