
SceneIt: Video Semantic Search with CLIP and Pinecone

SceneIt is a system for semantic video search. Instead of scrubbing through footage manually, you can query your films with natural language (e.g., "when does a dog appear on the couch?") or with an image example (e.g., upload a still frame from another film to find visually similar scenes).

At its core, SceneIt extracts representative frames from videos, embeds them using OpenAI CLIP, and indexes the embeddings in Pinecone for efficient semantic search.
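
As a rough illustration of that core step, the sketch below embeds one keyframe with the HuggingFace Transformers CLIP API. The openai/clip-vit-base-patch32 checkpoint and the file name are assumptions, not necessarily what SceneIt uses:

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Checkpoint name is an assumption; any CLIP variant works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("shot_0003.jpg")  # one extracted keyframe
inputs = processor(images=image, return_tensors="pt")
embedding = model.get_image_features(**inputs)  # tensor of shape (1, 512)

Because CLIP places text and images in the same embedding space, a single index of frame vectors serves both the text and image query modes described below.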


Demo

Watch the video

Features

  • Upload videos directly to S3 via presigned URLs for scalability and resumability (see the boto3 sketch after this list).
  • Automatic shot boundary detection with PySceneDetect (FFmpeg backend).
  • Keyframe extraction: one representative frame per detected shot, stored in S3.
  • CLIP embeddings for both image frames and text queries.
  • Pinecone vector database for fast nearest-neighbor search.
  • Two query modes:
    • Text → Frame search: encode natural language queries and retrieve matching frames.
    • Image → Frame search: encode an uploaded still image and retrieve visually similar frames.
  • JSON manifest for each processed video, including metadata, thumbnails, and embedding status.
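
As a minimal sketch, the presigned-upload feature can be implemented with boto3 roughly as follows. The bucket and key echo the environment variables in the Setup section; the backend's actual endpoint logic may differ:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Presigned PUT URL that lets the frontend upload the video straight to S3,
# so the file never passes through the FastAPI backend.
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "sceneit", "Key": "uploads/myfilm.mp4"},
    ExpiresIn=3600,  # URL stays valid for one hour
)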

Architecture

  1. Upload

    • Frontend requests a presigned S3 URL from the FastAPI backend.
    • Video file is uploaded directly to S3.
    • Backend stores the S3 URI for further processing.
  2. Process

    • Backend downloads the video locally from S3.
    • PySceneDetect with FFmpeg detects shots (sketched after this list).
    • One keyframe per shot is extracted and uploaded to S3.
    • Each keyframe is embedded with CLIP.
    • Embeddings and metadata are inserted into Pinecone.
  3. Search

    • Text queries are encoded into CLIP text embeddings.
    • Image queries are encoded into CLIP image embeddings.
    • Pinecone performs k-NN search against the stored embeddings.
    • The system returns top-k matching frames with timestamps and S3 URLs.
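
Step 2 can be sketched with PySceneDetect's detect API and OpenCV. The ContentDetector settings and the middle-frame keyframe rule are assumptions, not necessarily SceneIt's exact configuration:

import cv2
from scenedetect import detect, ContentDetector

# Detect shot boundaries (default ContentDetector settings are an assumption).
scenes = detect("myfilm.mp4", ContentDetector())

cap = cv2.VideoCapture("myfilm.mp4")
for i, (start, end) in enumerate(scenes):
    # Take the middle frame of each shot as its representative keyframe.
    mid_frame = (start.get_frames() + end.get_frames()) // 2
    cap.set(cv2.CAP_PROP_POS_FRAMES, mid_frame)
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(f"shot_{i:04d}.jpg", frame)
cap.release()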

Tech Stack

  • Frontend: Next.js, React, styled-components
  • Backend: FastAPI, boto3 (S3 integration), PySceneDetect, FFmpeg, OpenCV (optional frame processing)
  • ML model: CLIP (via HuggingFace Transformers)
  • Vector DB: Pinecone
  • Storage: Amazon S3

Setup

Environment variables (.env in backend)

AWS_REGION=us-east-1
S3_BUCKET=sceneit
S3_PREFIX=uploads/
AWS_ACCESS_KEY_ID=your-key-id
AWS_SECRET_ACCESS_KEY=your-secret-key

PINECONE_API_KEY=your-pinecone-key
PINECONE_ENVIRONMENT=us-east1-gcp
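
One way the backend could read these values is with python-dotenv (whether SceneIt actually depends on python-dotenv is an assumption):

import os
from dotenv import load_dotenv  # python-dotenv; an assumed dependency

load_dotenv()  # reads backend/.env into the process environment
AWS_REGION = os.environ["AWS_REGION"]
S3_BUCKET = os.environ["S3_BUCKET"]
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]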

Install Dependencies

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Frontend

cd frontend 
npm install 

Root Project

npm install 

Run Services

Run Backend and Frontend

In root:

npm run dev

Run Only Frontend

npm run dev:frontend

Run Only Backend

npm run dev:backend 

Example Workflows

Text Query

POST /search_embeddings
{
  "filename": "s3://sceneit/uploads/myfilm.mp4",
  "text_search": "man holding a red umbrella",
  "top_k": 5
}

Response: the 5 frames whose CLIP embeddings best match the query, with timestamps and S3 URLs.
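
Internally, a text query is handled roughly like the sketch below. The index name "sceneit" is an assumption, and the classic pinecone-client init/environment API is assumed to match the PINECONE_ENVIRONMENT variable above:

import os
import pinecone
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Encode the query into CLIP's joint text/image embedding space.
inputs = tokenizer(["man holding a red umbrella"], return_tensors="pt", padding=True)
text_vec = model.get_text_features(**inputs)[0].tolist()

pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
              environment=os.environ["PINECONE_ENVIRONMENT"])
index = pinecone.Index("sceneit")  # index name is an assumption

# k-NN search over the stored frame embeddings.
results = index.query(vector=text_vec, top_k=5, include_metadata=True)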

Image Query

POST /search_embeddings
{
  "filename": "s3://sceneit/uploads/myfilm.mp4",
  "image_search": <upload-file>,
  "top_k": 5
}

Response: the 5 frames most visually similar to the uploaded image, with timestamps and S3 URLs.

Pinecone Data Model

Frame Embedding Record

{
  "frame_id": "uuid",
  "video_id": "uuid",
  "shot_index": 3,
  "time_sec": 42.8,
  "vector": [0.123, -0.456, 0.789,...],
  "modality": "image",
  "s3_key": "uploads/frames/myfilm/shot_0003.jpg"
}
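
A record in this shape could be upserted with a sketch like the following, where the frame_id doubles as the Pinecone vector id (again assuming the classic pinecone-client API and an index named "sceneit"):

import os
import pinecone

pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
              environment=os.environ["PINECONE_ENVIRONMENT"])
index = pinecone.Index("sceneit")  # index name is an assumption

vector = [0.0] * 512  # placeholder for the 512-d CLIP image embedding

index.upsert(vectors=[(
    "frame-uuid",  # the frame_id serves as the Pinecone vector id
    vector,
    {   # metadata stored alongside the vector for filtering and display
        "video_id": "video-uuid",
        "shot_index": 3,
        "time_sec": 42.8,
        "modality": "image",
        "s3_key": "uploads/frames/myfilm/shot_0003.jpg",
    },
)])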
