CalmSet: A Domain-Specific Test Collection for Affective Music Retrieval for Children with ASD
CalmSet is a domain-specific test collection designed to support research in affective music retrieval, music emotion recognition, and human-in-the-loop annotation for therapeutic music contexts. The dataset and accompanying code formalize a modular therapeutic music library originally developed for interactive music therapy and repurpose it as a reproducible benchmarking resource.
This repository contains the end-to-end annotation, aggregation, and benchmarking pipeline used to construct CalmSet, corresponding directly to the algorithms and analyses described in the paper.
```
CalmSet/
├── CLAP Annotations (Algorithm 1)/
├── GPT Descriptions (Algorithm 2)/
├── CrowdSourced Human Annotations/
├── Borda Aggregation (Algorithm 3)/
├── Dataset Characterization (5.1–5.4)/
├── Benchmarking (Section 5.5)/
├── README.md
├── LICENSE
└── .gitignore
```
CalmSet is constructed using a hybrid human-in-the-loop annotation pipeline:
- CLAP proposes candidate affective intent labels for each audio track.
- GPT-based models generate auxiliary semantic descriptions conditioned on candidate labels.
- Crowd workers provide ranked judgments of therapeutic or affective intent without seeing model outputs.
- Human judgments are aggregated using a Borda count–based procedure to produce graded relevance annotations.
- The resulting dataset is evaluated using standard information retrieval benchmarks.
The `CLAP Annotations (Algorithm 1)/` folder implements Algorithm 1 from the paper.
- Runs zero-shot inference using CLAP (Contrastive Language–Audio Pretraining).
- Produces candidate affective or therapeutic intent labels.
- Computes similarity scores between audio tracks and label prompts.
Outputs from this stage are not shown to human annotators and are used only as candidate signals and baselines.
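As a rough illustration, zero-shot label scoring in the spirit of Algorithm 1 can be run through the Hugging Face `zero-shot-audio-classification` pipeline. The checkpoint, file name, prompt template, and label vocabulary below are assumptions for illustration, not necessarily the configuration used in the paper.

```python
# Minimal sketch of zero-shot affective label scoring with CLAP.
from transformers import pipeline

clap = pipeline(
    "zero-shot-audio-classification",
    model="laion/clap-htsat-unfused",  # assumed public CLAP checkpoint
)

# Hypothetical affective intent vocabulary.
candidate_labels = ["calming", "soothing", "stimulating", "focusing"]

# Scores each label prompt against the track's audio embedding.
results = clap(
    "track_001.wav",  # illustrative file name
    candidate_labels=candidate_labels,
    hypothesis_template="This music is intended to be {}.",
)
for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
```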
The `GPT Descriptions (Algorithm 2)/` folder implements Algorithm 2 from the paper.
- Generates natural-language semantic descriptions for each track.
- Conditions generation on the top-ranked CLAP labels.
- Produces auxiliary textual metadata used to contextualize affective intent.
These descriptions support analysis and benchmarking but are not treated as ground-truth labels.
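The sketch below shows how a description in the spirit of Algorithm 2 could be generated with the OpenAI Python SDK, conditioned on a track's top-ranked CLAP labels. The model name, prompt wording, and helper function are illustrative assumptions, not the repository's exact code.

```python
# Sketch of GPT-based auxiliary description generation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_track(track_id: str, top_labels: list[str]) -> str:
    """Generate an auxiliary semantic description for one track."""
    prompt = (
        f"A therapeutic music track ({track_id}) was tagged with the "
        f"affective intents: {', '.join(top_labels)}. Write a short, "
        "neutral description of the track's likely affective character."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the paper's choice may differ
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(describe_track("track_001", ["calming", "soothing"]))
```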
The `CrowdSourced Human Annotations/` folder contains scripts and processing logic for collecting ranked judgments from qualified crowd workers.
Key properties:
- Annotators rank affective or therapeutic intent labels.
- Annotators do not see CLAP predictions or GPT descriptions.
- Multiple independent judgments are collected per track.
This stage captures the raw human signal prior to aggregation.
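For concreteness, a ranked judgment can be represented roughly as below; the record fields and label set are hypothetical stand-ins for the repository's actual schema.

```python
# Sketch of a raw crowd judgment record with a basic validity check.
from dataclasses import dataclass

LABELS = ["calming", "soothing", "stimulating", "focusing"]  # hypothetical

@dataclass
class Judgment:
    track_id: str
    worker_id: str
    ranking: list[str]  # labels ordered from most to least applicable

def is_valid(j: Judgment) -> bool:
    # Each judgment must be a complete permutation of the label set.
    return sorted(j.ranking) == sorted(LABELS)

j = Judgment("track_001", "w_17", ["soothing", "calming", "focusing", "stimulating"])
assert is_valid(j)
```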
The `Borda Aggregation (Algorithm 3)/` folder implements Algorithm 3 from the paper.
- Aggregates ranked human judgments using a Borda count–based procedure.
- Converts rankings into graded relevance scores.
- Applies deterministic tie-breaking to ensure reproducibility.
The output of this stage forms the final gold relevance annotations.
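A minimal sketch of the aggregation idea, using the standard Borda rule with alphabetical tie-breaking; the paper's Algorithm 3 may differ in detail.

```python
# Borda-count aggregation over ranked judgments with deterministic tie-breaking.
from collections import defaultdict

def borda_aggregate(rankings: list[list[str]]) -> list[tuple[str, int]]:
    """Turn per-annotator rankings into an aggregate graded ordering.

    Each ranking lists the same labels from most to least applicable;
    a label in position i of an m-label ranking earns m - 1 - i points.
    """
    points: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for i, label in enumerate(ranking):
            points[label] += m - 1 - i
    # Sort by descending score, then alphabetically for reproducible ties.
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))

rankings = [
    ["calming", "soothing", "focusing"],
    ["soothing", "calming", "focusing"],
    ["calming", "focusing", "soothing"],
]
print(borda_aggregate(rankings))
# [('calming', 5), ('soothing', 3), ('focusing', 1)]
```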
The `Dataset Characterization (5.1–5.4)/` folder contains analysis scripts supporting Sections 5.1–5.4 of the paper.
Includes:
- Dataset statistics and distributions,
- Label frequency and co-occurrence analysis,
- Structural diversity of modular compositions,
- Agreement and consistency analyses.
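As one illustrative agreement statistic, mean pairwise Kendall's tau over annotator rankings can be computed as below; the paper's agreement and consistency analyses may use different measures.

```python
# Mean pairwise Kendall's tau as a simple inter-annotator agreement score.
from itertools import combinations
from scipy.stats import kendalltau

def mean_pairwise_tau(rankings: list[list[str]]) -> float:
    labels = sorted(rankings[0])
    # Represent each ranking as the rank position assigned to each label.
    positions = [[r.index(lbl) for lbl in labels] for r in rankings]
    taus = [kendalltau(a, b)[0] for a, b in combinations(positions, 2)]
    return sum(taus) / len(taus)

rankings = [
    ["calming", "soothing", "focusing"],
    ["calming", "focusing", "soothing"],
    ["soothing", "calming", "focusing"],
]
print(f"mean pairwise tau = {mean_pairwise_tau(rankings):.2f}")
```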
The `Benchmarking (Section 5.5)/` folder implements the retrieval benchmarks described in Section 5.5 of the paper.
Includes:
- Baseline retrieval methods (e.g., BM25, CLAP-based retrieval),
- Evaluation using graded relevance metrics (nDCG@k, MAP@k, Recall@k),
- Macro-averaged results across affective intent queries.
This folder enables reproducible benchmarking using CalmSet.
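For reference, nDCG@k over graded relevance can be computed directly from its standard definition; the relevance grades and system ranking below are illustrative, not taken from the dataset.

```python
# Self-contained sketch of graded-relevance evaluation with nDCG@k.
import math

def dcg_at_k(gains: list[float], k: int) -> float:
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids: list[str], rel: dict[str, float], k: int) -> float:
    gains = [rel.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(rel.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

rel = {"track_001": 3.0, "track_007": 2.0, "track_003": 1.0}  # graded relevance
system_ranking = ["track_007", "track_001", "track_042", "track_003"]
print(f"nDCG@3 = {ndcg_at_k(system_ranking, rel, k=3):.3f}")
```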
The audio files and final annotation CSVs are released separately via a public dataset repository.
This GitHub repository contains all code necessary to reproduce the annotation pipeline, aggregation procedure, and evaluation results.
CalmSet is intended for:
- Affective and therapeutic music retrieval research,
- Music emotion recognition and representation learning,
- Evaluation of zero-shot and weakly supervised audio–text models,
- Studies of human–AI agreement in semantic music annotation.
The dataset is designed for graded relevance evaluation rather than binary relevance judgments alone.
This repository is released under the MIT License.
Please see the LICENSE file for details.
If you use this code or the CalmSet dataset in your research, please cite the accompanying paper: TBD
For questions, issues, or collaboration inquiries, please open a GitHub issue or contact the authors directly at sensifylab@gmail.com.
The dataset can be found on Kaggle at https://www.kaggle.com/datasets/sensifylab/calmset-music-for-children-with-asd/data