Official Code Repo for the paper "Supernova Event Dataset: Interpreting Large Language Model's Personality through Critical Event Analysis" accepted at Actionable Interpretability Workshop at ICML 2025

errai34/Supernova-Event-Dataset

 
 

Supernova Event Dataset: Official Code Repository

Paper Dataset License: MIT Python 3.8+

Official repository for the research paper "Supernova Event Dataset: Interpreting Large Language Models' Personality through Critical Event Analysis" by

Pranav Agarwal¹ · Ioana Ciucă²

¹Mila, Quebec AI Institute · ²Stanford University

Actionable Interpretability Workshop at ICML 2025

📄 Paper | 🌐 Project Page | 🤗 Dataset | 📊 Demo

Overview of the Supernova Event Dataset methodology

Overview

In this work, we interpret the personality traits of Large Language Models (LLMs) using our proposed Supernova Event Dataset, which includes Wikipedia articles covering historical events, biographies, news events, and scientific discoveries. We benchmark models on how they identify and rank key life or discovery events, a complex task that requires causal reasoning. A second LLM acts as a judge and infers each model's personality from its event selection and interpretation. Our analysis shows distinct traits, such as emotional reasoning in Orca 2 and analytical framing in Qwen 2.5, enhancing interpretability and trust.
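The two-stage setup described above (one model ranks events, a second model judges the ranking) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `LLM` callable type, the function names `rank_critical_events` and `judge_personality`, and both prompt texts are hypothetical and are not taken from the repository's scripts or the paper.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]


def rank_critical_events(article: str, model: LLM, top_k: int = 5) -> List[str]:
    """Ask the model under study for its top_k critical events, one per line."""
    # Illustrative prompt only; the wording used in the paper may differ.
    prompt = (
        f"Read the article below and list the {top_k} most critical events, "
        "most important first, one per line.\n\n" + article
    )
    reply = model(prompt)
    # Keep non-empty lines and cap at top_k in case the model over-generates.
    events = [line.strip() for line in reply.splitlines() if line.strip()]
    return events[:top_k]


def judge_personality(events: List[str], judge: LLM) -> str:
    """Ask a second LLM to describe the traits implied by the chosen events."""
    numbered = "\n".join(f"{i}. {e}" for i, e in enumerate(events, start=1))
    # Illustrative prompt only.
    prompt = (
        "Another model ranked these events as the most critical in an article. "
        "Describe the personality traits this selection suggests.\n\n" + numbered
    )
    return judge(prompt).strip()
```

Passing the models in as plain callables keeps the sketch independent of any particular provider SDK, so a stub function can stand in for a real API client during testing.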

Quick Start

Prerequisites

  • Python 3.8 or higher
  • API keys for OpenAI, Anthropic, and/or Gemini (depending on models used)

Installation

# Clone the repository
git clone https://github.com/pranaval/supernova-event-dataset.git
cd supernova-event-dataset

# Create virtual environment (any Python 3.8+ interpreter works)
python3 -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

Create a .env file in the root directory:

OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GOOGLE_API_KEY=your_google_key_here
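Before running any script, it can help to check that the .env file parses and that the keys you need are filled in. The sketch below uses only the standard library. The helper names `read_env_file` and `missing_keys` are hypothetical, and how the repository's own scripts load the file is not shown here, so treat this as a pre-flight check and not as the project's loader.

```python
from pathlib import Path
from typing import Dict, List

# Key names as listed in the Configuration section.
EXPECTED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent or empty in the parsed file."""
    return [key for key in EXPECTED_KEYS if not values.get(key)]
```

For example, `missing_keys(read_env_file())` returns the names of the keys still to be set. Since the prerequisites say keys are only needed for the providers you use, a non-empty result is not necessarily an error.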

Dataset

The Supernova Event Dataset contains 592 curated Wikipedia articles (biographies, news events, and historical events) plus 25 scientific discovery articles:

| Domain | Count | Description |
|---|---|---|
| 🎭 Biographies | 192 | Life stories of influential figures |
| 📰 News Events | 200 | Major contemporary events |
| 📚 Historical Events | 200 | Significant historical occurrences |
| 🔬 Scientific Discoveries | 25* | Comprehensive discovery narratives |

*Scientific discoveries use Google Gemini 2.5 Pro with Deep Research for comprehensive articles

Download Dataset

# Extract all datasets (tar detects the xz compression automatically; -z is for gzip only)
tar -xvf Dataset/biographies.tar.xz
tar -xvf Dataset/historical-events.tar.xz
tar -xvf Dataset/major-news-events.tar.xz
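On systems without a `tar` command (for example a plain Windows shell), the same archives can be unpacked with Python's standard `tarfile` module. This is a minimal sketch. The archive paths are the ones listed above, the helper name `extract_archive` is hypothetical, and the output layout depends on how the archives were packed.

```python
import tarfile
from typing import List

# Archive paths as listed in the Download Dataset section.
ARCHIVES = [
    "Dataset/biographies.tar.xz",
    "Dataset/historical-events.tar.xz",
    "Dataset/major-news-events.tar.xz",
]


def extract_archive(archive: str, dest: str = ".") -> List[str]:
    """Extract one archive into dest and return the member names it contained."""
    # Mode "r:*" lets tarfile detect the compression (xz, gzip, ...) by itself.
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        # Only extract archives from a source you trust.
        tar.extractall(path=dest)
    return names
```

Calling `extract_archive(path)` for each entry in `ARCHIVES` from the repository root mirrors the shell commands.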

Usage

1️⃣ Extract Critical Events

Run event extraction for each domain:

# Biographies
python biography_dataset.py --model orca-2

# Historical Events  
python history_dataset.py --model phi-4

# News Events
python news_dataset.py --model orca-2

# Movie Scripts (optional additional domain)
python movies_dataset.py --model qwen-2.5

2️⃣ Analyze Model Personality

Consolidate results and extract personality patterns:

# Generate personality analysis for all models
python extract_personality.py

3️⃣ Visualize Results

Create personality visualizations:

# Generate radar plots and semantic space mapping
python plot_personality.py
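The three steps can also be chained from a small driver script. The sketch below only assembles the commands shown in this section. The helper names `build_pipeline` and `run_pipeline` are hypothetical, the optional movie-script step is left out, and using a single model across all domains is an assumption. The README does not say whether `extract_personality.py` expects results from several models first.

```python
import subprocess
import sys
from typing import List

# Extraction scripts named in step 1 (the optional movies script is omitted).
EXTRACTION_SCRIPTS = ["biography_dataset.py", "history_dataset.py", "news_dataset.py"]


def build_pipeline(model: str) -> List[List[str]]:
    """Return the commands for extraction, personality analysis, and plotting."""
    commands = [[sys.executable, script, "--model", model] for script in EXTRACTION_SCRIPTS]
    commands.append([sys.executable, "extract_personality.py"])
    commands.append([sys.executable, "plot_personality.py"])
    return commands


def run_pipeline(model: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_pipeline(model):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

`run_pipeline("orca-2")` prints the five commands without executing them. Pass `dry_run=False` from the repository root to actually run them.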

Citation

If you find our work useful, please cite:

@article{agarwal2025supernova,
  title={Supernova Event Dataset: Interpreting Large Language Models' Personality through Critical Event Analysis},
  author={Agarwal, Pranav and Ciucă, Ioana},
  journal={arXiv preprint arXiv:2506.12189},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Wikipedia for article content
  • Model providers (OpenAI, Anthropic, Google) for API access
  • Fundamental of Ollama
