🤖 Automated AI paper podcast service - summarizes trending Hugging Face papers with Gemini Pro, converts them to speech with Google TTS, and delivers them as a podcast every morning

hanseungsoo13/papercast

PaperCast: AI Paper Podcast Platform

PaperCast is a fully automated, full-stack podcast service. Every morning it collects the top 3 trending papers on Hugging Face, summarizes them with Gemini Pro, converts the summaries to speech with Google TTS, and uploads the audio to Google Cloud Storage so that it can be played or downloaded from the web platform.
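
The flow described above (collect, summarize, synthesize, upload) can be pictured as a chain of stages applied to each paper. The following is only an illustrative sketch: the `Paper` fields, the stage function names, and the stub bodies are assumptions made for this example and are not taken from the project's source, where the real stages call Hugging Face, Gemini, Google TTS, and Google Cloud Storage.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Paper:
    """Hypothetical record passed between stages (not the project's real model)."""
    title: str
    abstract: str
    summary: str = ""
    audio_path: str = ""
    public_url: str = ""


Stage = Callable[[Paper], Paper]


def summarize(paper: Paper) -> Paper:
    # Stand-in for the Gemini call: keep only the first sentence of the abstract.
    paper.summary = paper.abstract.split(". ")[0].rstrip(".") + "."
    return paper


def synthesize(paper: Paper) -> Paper:
    # Stand-in for TTS: only decide where the MP3 would be written.
    slug = "-".join(paper.title.lower().split())
    paper.audio_path = f"data/podcasts/{slug}.mp3"
    return paper


def upload(paper: Paper) -> Paper:
    # Stand-in for the GCS upload: derive the object URL from the local file name.
    name = paper.audio_path.rsplit("/", 1)[-1]
    paper.public_url = f"https://storage.googleapis.com/papercast-podcasts/{name}"
    return paper


def run_pipeline(papers: Iterable[Paper], stages: List[Stage], limit: int = 3) -> List[Paper]:
    """Apply every stage, in order, to the first `limit` papers."""
    results = []
    for paper in list(papers)[:limit]:
        for stage in stages:
            paper = stage(paper)
        results.append(paper)
    return results
```

Keeping each stage as a plain function of one paper makes it straightforward to swap a stub for a real service call, or to test one stage in isolation.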

Features

  • 🤖 Automatic collection: collects the top 3 trending Hugging Face papers every morning at 6 AM (KST)
  • 📝 AI summarization: generates Korean summaries with Google Gemini Pro
  • 📄 3-line summary: condenses the key points of each paper into three lines
  • 🎙️ TTS conversion: generates high-quality audio with Google Cloud Text-to-Speech
  • ☁️ Cloud storage: uploads MP3 files to Google Cloud Storage
  • 🌐 Full-stack web platform: FastAPI backend + Next.js frontend
  • 📱 Responsive UI: user interface optimized for mobile and desktop
  • 🎵 Advanced audio player: play/pause, volume control, seeking
  • 📄 Paper viewer: direct ArXiv PDF links and embedded viewing
  • 🔗 Smart links: paper detail pages, links to the original paper, episode navigation
  • 🔄 Full automation: unattended operation through GitHub Actions
  • 📢 Slack notifications: success/failure alerts that include a link to the web page
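
As a rough illustration of the Slack notification feature, the sketch below builds and posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names and the message wording are assumptions for this example; the project's actual notification code is not shown in this README.

```python
import json
import urllib.request


def build_slack_payload(success: bool, episode_url: str, detail: str = "") -> dict:
    """Build the JSON body for a Slack incoming webhook (hypothetical wording)."""
    status = "succeeded" if success else "failed"
    text = f"PaperCast daily run {status}: {episode_url}"
    if detail:
        text += f"\n{detail}"
    return {"text": text}


def send_slack_notification(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Because SLACK_WEBHOOK_URL is optional, a caller would typically skip `send_slack_notification` when the variable is unset.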

Quick Start

Method 1: Automated Deployment with GitHub Actions (Recommended)

1. Configure GitHub Secrets

In Repository → Settings → Secrets and variables → Actions, set the following secrets:

| Secret Name | Description |
| --- | --- |
| GEMINI_API_KEY | Google Gemini API key |
| GCP_SERVICE_ACCOUNT_KEY | GCP service account JSON (Base64-encoded) |
| GCP_PROJECT_ID | Google Cloud project ID |
| GCS_BUCKET_NAME | Google Cloud Storage bucket name |
| DATABASE_URL | PostgreSQL database URL |
| VERCEL_TOKEN | Vercel API token |
| VERCEL_ORG_ID | Vercel organization ID |
| VERCEL_PROJECT_ID | Vercel project ID |
| SLACK_WEBHOOK_URL | Slack webhook URL (optional) |

2. Run the Automated Deployment

  • Daily at 6 AM KST: runs automatically
  • Manual run: Repository → Actions → Daily Podcast Generation → Run workflow

3. Check the Deployment Results

  • Frontend: https://papercast.vercel.app
  • Backend: https://papercast-backend-xxx-uc.a.run.app
  • API docs: https://papercast-backend-xxx-uc.a.run.app/docs

Method 2: Local Development Environment

Prerequisites

  • Python 3.12 or later
  • Node.js 18 or later
  • uv (Python package manager)
  • npm or yarn (Node.js package manager)
  • Google Cloud Platform account
  • GitHub account

Installation

  1. Clone the repository:
git clone https://github.com/hanseungsoo13/papercast.git
cd papercast
  2. Install uv (if not already installed):
# Linux/Mac
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv
  3. Install dependencies with uv:
# Create the virtual environment and install dependencies
uv sync

# Or install with development dependencies included
uv sync --dev
  4. Configure environment:

Create a .env file in the project root:

# Create the .env file
touch .env

Add the following to the .env file:

# Google Gemini API Key (required)
# Get one at: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_gemini_api_key_here

# Google Cloud Service Account (required)
# Create a service account in the GCP Console, then download its JSON key
GOOGLE_APPLICATION_CREDENTIALS=./credentials/service-account.json

# Google Cloud Storage Bucket Name (required)
GCS_BUCKET_NAME=papercast-podcasts

# Optional: other settings
TZ=Asia/Seoul
LOG_LEVEL=INFO
PAPERS_TO_FETCH=3
PODCAST_TITLE_PREFIX=Daily AI Papers

Save the service account JSON key:

# Create the credentials directory
mkdir -p credentials

# Save the JSON key downloaded from the GCP Console
# (e.g. service-account.json)
cp ~/Downloads/your-service-account-key.json credentials/service-account.json
  5. Validate the configuration (recommended):
# Validate the configuration with uv
uv run python check_config.py

# Or run it directly
python check_config.py
  6. Run locally:

Start the full-stack development servers:

# Combined launch script (recommended)
./scripts/run-fullstack.sh

# Or start each server separately
# API server (terminal 1)
uv run uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload

# Frontend server (terminal 2)
cd frontend && npm run dev

Run the podcast generation pipeline:

# Run with uv (recommended)
uv run python src/main.py

# Or run it as a module
uv run python -m src.main

💡 Development: use ./scripts/run-fullstack.sh for full-stack development
💡 Podcast generation: use uv run python src/main.py

Testing

Run Unit Tests

# Run unit tests with uv
uv run pytest tests/unit/ -v

# With coverage
uv run pytest tests/unit/ -v --cov=src --cov-report=html

Run Contract Tests

# Run contract tests (real API calls or mocks)
uv run pytest tests/contract/ -v --run-contract-tests

# Skip contract tests (default)
uv run pytest tests/contract/ -v
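
An opt-in flag such as --run-contract-tests is usually wired up in conftest.py with pytest's standard hooks. The sketch below follows the pattern from the pytest documentation and assumes that contract tests carry a `contract` marker; the project's actual conftest.py is not shown in this README and may differ.

```python
# conftest.py (sketch; the project's real file may differ)
import pytest


def pytest_addoption(parser):
    # Register the opt-in command-line flag; it is off unless passed explicitly.
    parser.addoption(
        "--run-contract-tests",
        action="store_true",
        default=False,
        help="run tests marked with @pytest.mark.contract",
    )


def pytest_collection_modifyitems(config, items):
    # With the flag present, contract tests run like any other test.
    if config.getoption("--run-contract-tests"):
        return
    # Otherwise every test carrying the assumed "contract" marker is skipped.
    skip_contract = pytest.mark.skip(reason="needs --run-contract-tests to run")
    for item in items:
        if "contract" in item.keywords:
            item.add_marker(skip_contract)
```

Skipped tests still appear in the report, so it stays visible that contract coverage was left out of a given run.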

Run Integration Tests

# 톡합 ν…ŒμŠ€νŠΈ μ‹€ν–‰
uv run pytest tests/integration/ -v

# 전체 νŒŒμ΄ν”„λΌμΈ ν…ŒμŠ€νŠΈλ§Œ μ‹€ν–‰
uv run pytest tests/integration/test_pipeline.py::TestPipelineIntegration::test_full_pipeline_end_to_end -v

Run All Tests

# λͺ¨λ“  ν…ŒμŠ€νŠΈ μ‹€ν–‰ (Contract μ œμ™Έ)
uv run pytest -v

# Contract ν…ŒμŠ€νŠΈ 포함 λͺ¨λ“  ν…ŒμŠ€νŠΈ
uv run pytest -v --run-contract-tests

Test Coverage Report

ν…ŒμŠ€νŠΈ μ‹€ν–‰ ν›„ htmlcov/index.html을 λΈŒλΌμš°μ €λ‘œ μ—΄μ–΄ 컀버리지 리포트λ₯Ό ν™•μΈν•˜μ„Έμš”.

Configuration

💡 Detailed setup guides:

Quick Setup

Use the automated setup script:

# Run the script with uv
uv run ./setup_env.sh

# Or run it directly
./setup_env.sh

This script automatically creates the .env file and the required directories.

Required Environment Variables

  • GEMINI_API_KEY: Google Gemini API key
  • GOOGLE_APPLICATION_CREDENTIALS: Path to GCP service account JSON
  • GCS_BUCKET_NAME: Google Cloud Storage bucket name

Optional Variables

  • TZ: Timezone (default: Asia/Seoul)
  • LOG_LEVEL: Logging level (default: INFO)
  • PODCAST_TITLE_PREFIX: Podcast title prefix
  • PAPERS_TO_FETCH: Number of papers to fetch (default: 3)
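
One way to read these variables is to fail fast on missing required values and fall back to the listed defaults for the optional ones. The sketch below is a hypothetical loader, not the project's check_config.py or config utility; the `Settings` class and `load_settings` function are names invented for this example, and the default for PODCAST_TITLE_PREFIX is taken from the sample .env because the README does not state one.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("GEMINI_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS", "GCS_BUCKET_NAME")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object holding the documented variables."""
    gemini_api_key: str
    google_application_credentials: str
    gcs_bucket_name: str
    tz: str
    log_level: str
    papers_to_fetch: int
    podcast_title_prefix: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required variables and apply defaults to optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        google_application_credentials=env["GOOGLE_APPLICATION_CREDENTIALS"],
        gcs_bucket_name=env["GCS_BUCKET_NAME"],
        tz=env.get("TZ", "Asia/Seoul"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        papers_to_fetch=int(env.get("PAPERS_TO_FETCH", "3")),
        # Assumed default, copied from the sample .env.
        podcast_title_prefix=env.get("PODCAST_TITLE_PREFIX", "Daily AI Papers"),
    )
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.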

GitHub Actions Setup

Required Secrets

In GitHub Repository → Settings → Secrets and variables → Actions, add the following secrets:

| Secret Name | Description | How to Get |
| --- | --- | --- |
| GEMINI_API_KEY | Google Gemini API key | Issued in Google AI Studio |
| GCP_SERVICE_ACCOUNT_KEY | GCP service account JSON (Base64-encoded) | Create a service account in the GCP Console, download its key, and encode it with base64 -w 0 < key.json |
| GCS_BUCKET_NAME | Google Cloud Storage bucket name | Example: papercast-podcasts |
| SLACK_WEBHOOK_URL | Slack webhook URL (optional) | Create an Incoming Webhook in the Slack API console |

Service Account Permissions

Grant the following roles to the GCP service account:

  • Cloud Storage Admin: uploading MP3 files and metadata
  • Cloud Text-to-Speech Admin: speech synthesis

Troubleshooting Credential Setup

If you get a JSON decoding error:

  1. Check the Base64 encoding:

    cat service-account-key.json | base64 -w 0
  2. Test locally:

    export GCP_SERVICE_ACCOUNT_KEY="your_base64_encoded_key"
    python test_credentials.py
  3. For more detail, see the credential troubleshooting guide
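
To narrow down where a JSON decoding error comes from, it can help to decode the secret by hand and check each step separately. The following is a minimal sketch and is not the project's test_credentials.py; the function name is hypothetical. It relies only on the fact that a GCP service account key file is a JSON object whose `type` field is `service_account`.

```python
import base64
import binascii
import json


def decode_service_account_key(encoded: str) -> dict:
    """Decode a Base64-encoded service account key and sanity-check it."""
    try:
        # Strip whitespace first so that line-wrapped Base64 is still accepted.
        raw = base64.b64decode("".join(encoded.split()), validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Value is not valid Base64: {exc}") from exc
    try:
        info = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"Decoded value is not valid JSON: {exc}") from exc
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError('Decoded JSON lacks "type": "service_account"')
    return info
```

A common cause of the first error is pasting the raw JSON into the secret instead of its Base64 encoding; a common cause of the second is a truncated or doubly encoded value.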

Running the Workflow

  1. Automatic run: runs automatically every day at 6:00 AM (KST)
  2. Manual run:
    • GitHub Repository → Actions → Daily Podcast Generation
    • Click the "Run workflow" button
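
GitHub Actions evaluates scheduled cron expressions in UTC, so a 6:00 AM KST run has to be written as a UTC time in the workflow file. The helper below is a small sketch of that conversion; the function name is hypothetical, and the cron expression actually used in daily-podcast.yml is not shown in this README.

```python
from datetime import date, datetime, time, timedelta, timezone

# KST is a fixed UTC+9 offset with no daylight saving time.
KST = timezone(timedelta(hours=9), name="KST")


def kst_to_utc_cron(hour: int, minute: int = 0) -> str:
    """Return a daily cron expression (UTC) for the given KST wall-clock time."""
    # The reference date is arbitrary: the offset is fixed and the schedule is
    # daily, so a shift to the previous UTC day does not change the expression.
    local = datetime.combine(date(2024, 1, 1), time(hour, minute), tzinfo=KST)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(kst_to_utc_cron(6))  # 0 21 * * *
```

In other words, a run at 6:00 AM KST corresponds to 21:00 UTC on the previous day.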

νŠΈλŸ¬λΈ”μŠˆνŒ…

Secrets μ„€μ • 확인

# GitHub CLIλ₯Ό μ‚¬μš©ν•˜λŠ” 경우
gh secret list

μ›Œν¬ν”Œλ‘œμš° 둜그 확인

  • Actions νƒ­μ—μ„œ μ‹€νŒ¨ν•œ μ›Œν¬ν”Œλ‘œμš° 클릭
  • 각 단계별 둜그 확인

일반적인 문제

  1. "API key not valid"

    • Gemini API ν‚€κ°€ μ˜¬λ°”λ₯Έμ§€ 확인
    • API ν‚€ μ œν•œ μ„€μ • 확인
  2. "Permission denied" (GCS)

    • Service Account κΆŒν•œ 확인
    • 버킷 이름이 μ˜¬λ°”λ₯Έμ§€ 확인
  3. "Quota exceeded"

    • API ν• λ‹ΉλŸ‰ 확인
    • 무료 ν‹°μ–΄ ν•œλ„ 확인
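
Transient failures such as a short-lived quota or rate-limit error are often handled by retrying with exponential backoff. The project tree lists a retry utility under src/utils, but its implementation is not shown here, so the following is a generic sketch with hypothetical names and parameters rather than the project's code.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

Retrying only helps with temporary limits; an exhausted daily or free-tier quota still has to be resolved in the provider's console.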

Development

Running Tests

# All tests with uv
uv run pytest

# Specific test types
uv run pytest tests/unit/ -m unit
uv run pytest tests/integration/ -m integration
uv run pytest tests/contract/ -m contract

# With coverage
uv run pytest --cov=src --cov-report=html

Code Quality

# Format code with uv
uv run black src/ tests/

# Lint with uv
uv run pylint src/

# Type check with uv
uv run mypy src/

# Or run the development tools from the dev dependency group
uv run --group dev black src/ tests/
uv run --group dev pylint src/
uv run --group dev mypy src/

Project Structure

papercast/
├── src/                    # Core Python modules
│   ├── models/            # Data models (Paper, Podcast, ProcessingLog)
│   ├── services/          # Core services
│   │   ├── collector.py   # Hugging Face paper collection
│   │   ├── summarizer.py  # Gemini Pro summarization
│   │   ├── tts.py        # Google TTS conversion
│   │   ├── uploader.py   # GCS upload
│   │   └── generator.py  # Static site generation
│   ├── utils/             # Utilities (logger, retry, config)
│   └── main.py           # Main pipeline
├── api/                   # FastAPI backend
│   ├── routes/           # API endpoints
│   │   ├── health.py     # Health check endpoints
│   │   └── episodes.py   # Episode endpoints
│   ├── schemas.py        # Pydantic response schemas
│   ├── repository.py     # Data access layer
│   ├── dependencies.py   # FastAPI dependencies
│   └── main.py          # FastAPI app
├── frontend/              # Next.js frontend
│   ├── src/
│   │   ├── components/   # React components
│   │   ├── pages/        # Next.js pages
│   │   ├── services/     # API client
│   │   └── styles/       # CSS styles
│   ├── package.json      # Node.js dependencies
│   └── next.config.js    # Next.js configuration
├── tests/                 # Test suite
│   ├── unit/            # Unit tests
│   ├── integration/     # Integration tests
│   ├── contract/        # Contract tests
│   └── api/             # API tests
├── scripts/              # Utility scripts
│   ├── run-fullstack.sh # Full-stack development server
│   ├── run-api.sh       # API server only
│   └── dev-regenerate.py # Site regeneration
├── .github/workflows/
│   └── daily-podcast.yml # GitHub Actions workflow
├── static-site/          # Generated static site
└── data/
    ├── papers/          # Collected papers
    └── podcasts/        # Generated podcasts

License

MIT

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details.

Documentation
