
Releases: Agent-CI/embedsim

v0.1.1

05 Oct 22:27


What's Changed

  • More robust configuration, more explicit model names by @tcdent in #4

New Contributors

  • @tcdent made their first contribution in #4

Full Changelog: v0.1.0...v0.1.1

embedsim 0.1.0

04 Oct 01:22


Release Notes - embedsim v0.1.0

A Python library for measuring semantic similarity and detecting outliers in text collections using embeddings.

Features

Core Functionality:

  • pairsim() - Compare two texts using cosine similarity of their embeddings
  • groupsim() - Analyze text collections and identify outliers using centroid-based coherence scoring
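These notes do not spell out the exact scoring embsim uses, so the following is only a minimal, generic sketch of the two techniques named above. It is not embedsim's API or implementation. It applies cosine similarity to precomputed embedding vectors and scores each item by its cosine similarity to the collection's centroid. The function names, the toy 2-D vectors standing in for real embeddings, and the 0.5 outlier threshold are all hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def coherence_scores(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Cosine similarity of each vector to the centroid of the whole collection."""
    c = centroid(vectors)
    return [cosine_similarity(v, c) for v in vectors]


def find_outliers(vectors: Sequence[Sequence[float]], threshold: float = 0.5) -> list[int]:
    """Indices whose centroid similarity falls below a (hypothetical) threshold."""
    return [i for i, s in enumerate(coherence_scores(vectors)) if s < threshold]


# Toy "embeddings": three vectors point roughly the same way, one does not.
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
print(find_outliers(vectors))  # [3]
```

In real use, the vectors would come from one of the embedding models listed below. The threshold is a tuning choice that depends on the model and the data.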

Embedding Model Support:

  • OpenAI models (openai-3-small, openai-3-large) via the OpenAI API
  • Local sentence-transformer models (Jina v2, MiniLM, etc.) for privacy and offline use
  • Model selection configurable via function parameters or environment variables

Use Cases:

  • Content moderation and off-topic detection
  • Document clustering and outlier identification
  • Quality assurance for generated content
  • Search relevance scoring
  • Duplicate detection