Objective
JamFinder is a proposed full-stack application that solves a specific problem: extracting music from hundreds of hours of mixed audio recordings (conversation + informal jam sessions) and providing a non-technical user with a polished dashboard to review, curate, and remaster the discovered music segments.
The system combines AI-powered audio classification (inaSpeechSegmenter, Spleeter) with a production-grade remastering pipeline (Demucs, noisereduce, pedalboard, matchering, pyloudnorm) behind a drag-and-drop Next.js dashboard. The target user never touches a terminal — they upload files, browse detected music segments on an interactive waveform, star their favorites, and click "Remaster" to get cleaned-up, loudness-normalized output files.
---
Subjective
This project originated from a conversation in yoga class. A fellow student mentioned he has roughly a hundred hours of iPhone Voice Memo recordings — casual hangouts where he and his friends would talk, jam on instruments, and play music intermittently. The music is buried in hours of conversation, and he has no way to find it without listening to everything end-to-end. He wants to surface the musical moments, decide which ones are worth keeping, and then clean them up for potential production or personal archives.
It's a perfect example of a real human need that sits at the intersection of AI audio processing and practical UX design — the kind of project that would have been impossibly tedious five years ago but is now tractable with open source ML models.
---
Initiation & Collaboration Process
The Opening Prompt
The project began with a single request to Claude Code:
"Someone in my yoga class has a hundred hours of audio clips of him and his friends sitting around talking and intermittently jamming on music and playing music. He wants to find out where the music is in those audio recordings and then put them into a dashboard so that he can go through them and determine what he wants to remaster and produce. Figure out what open source libraries would be able to identify the music in those recordings and then what open source library would be good for remastering them, and then enter plan mode and construct this project for me to review."
Research Phase
Claude deployed two parallel research agents — one investigating music detection/segmentation libraries, the other investigating remastering tools. Within about 45 seconds, both returned comprehensive assessments of the open source landscape.
Music Detection findings:
- inaSpeechSegmenter (INA, France) — CNN-based classifier that outputs timestamped `(label, start, end)` tuples for music, speech, noise, and silence. Best primary tool.
- Spleeter (Deezer) — 2-stem separation enables detecting music underneath conversation by measuring energy on the accompaniment track. Excellent second pass.
- pyAudioAnalysis — alternative classifier requiring more manual tuning.
- PANNs — 527-class AudioSet tagger for granular instrument identification if needed.
Remastering findings:
- Demucs (Meta/audiocraft) — state-of-the-art stem separation (vocals, drums, bass, other)
- noisereduce — spectral gating for hiss/hum removal
- pedalboard (Spotify) — production-grade audio effects chain (EQ, compression, reverb, limiting)
- matchering — automated mastering that matches a reference track's frequency profile
- pyloudnorm — ITU-R BS.1770 loudness normalization to streaming standards
Clarifying Questions
Before designing the system, Claude asked three targeted questions to narrow the solution space:
- Target user's technical level? → Non-technical — needs a polished GUI with drag-and-drop, no terminal usage
- Audio format? → iPhone Voice Memos (m4a) — compressed AAC recordings requiring FFmpeg conversion
- Tech stack preference? → Python + Next.js — Python backend for audio ML, Next.js dashboard for the review UI
These three answers shaped every subsequent architectural decision: the drag-and-drop upload flow, the async job queue (since m4a conversion + ML inference is slow), the preset-based remaster controls (no knob-twiddling), and the Docker Compose packaging (Brian can run it for his friend with a single command).
---
Project Outline
Architecture
┌─────────────────────────────────────────────────┐
│ Next.js Dashboard │
│ Upload files → View segments → Play/review │
│ Mark favorites → Trigger remaster → Download │
└──────────────────────┬──────────────────────────┘
│ REST API
┌──────────────────────┴──────────────────────────┐
│ Python FastAPI Backend │
│ /upload /segments /remaster /download │
└──────────┬────────────┬────────────┬────────────┘
│ │ │
┌─────┴─────┐ ┌───┴────┐ ┌────┴──────┐
│ Detection │ │ SQLite │ │ Remaster │
│ Pipeline │ │ DB │ │ Pipeline │
└───────────┘ └────────┘ └───────────┘

Stack: FastAPI + Celery + Redis (async jobs) | Next.js + Tailwind + wavesurfer.js (waveform UI) | SQLite (metadata) | Docker Compose (packaging)
Phase 1: Scaffold & Upload
- FastAPI app with Celery worker and Redis broker
- POST /api/upload — accepts m4a/wav/mp3, converts to wav via FFmpeg
- SQLite schema: `recordings` (file metadata + processing status), `segments` (timestamps + labels + favorites), `remaster_jobs` (queue + output paths)
- Next.js drag-and-drop upload zone with progress bars and recording list
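The three tables can be sketched with Python's built-in `sqlite3`. The column names here are illustrative assumptions drawn from the plan's descriptions, not a final schema:

```python
import sqlite3

# Illustrative schema sketch; exact columns are an assumption, not the final design.
conn = sqlite3.connect(":memory:")  # in-memory DB for demonstration
conn.executescript("""
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    duration_sec REAL,
    status TEXT DEFAULT 'uploaded'      -- uploaded | processing | done | error
);
CREATE TABLE segments (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER REFERENCES recordings(id),
    start_sec REAL NOT NULL,
    end_sec REAL NOT NULL,
    label TEXT NOT NULL,                -- music | speech | overlap
    favorite INTEGER DEFAULT 0,
    notes TEXT
);
CREATE TABLE remaster_jobs (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER REFERENCES segments(id),
    preset TEXT NOT NULL,               -- quick_clean | full | vocal | instrumental
    status TEXT DEFAULT 'queued',
    output_path TEXT
);
""")

# Example: one uploaded recording with one detected music segment
conn.execute("INSERT INTO recordings (filename, duration_sec) VALUES (?, ?)",
             ("jam_session.m4a", 3600.0))
conn.execute("INSERT INTO segments (recording_id, start_sec, end_sec, label) "
             "VALUES (1, 120.5, 245.0, 'music')")
rows = conn.execute("SELECT label, start_sec FROM segments").fetchall()
print(rows)  # → [('music', 120.5)]
```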
Phase 2: Music Detection Pipeline
Two-pass detection, triggered automatically after upload:
Pass 1 — inaSpeechSegmenter: Classifies every frame as music, speech, noise, or silence. Stores all music segments with timestamps. Merges adjacent segments with gaps under 3 seconds.
Pass 2 — Spleeter overlap detection: For segments Pass 1 labeled as speech, runs 2-stem separation and computes RMS energy on the accompaniment track. High energy = music playing under conversation. These get labeled as "overlap" segments.
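Pass 2's decision rule is just an energy threshold on the accompaniment stem. A pure-Python sketch of the RMS check — the threshold value and function names are illustrative assumptions; in practice the samples would come from Spleeter's accompaniment output:

```python
import math

def rms(samples):
    """Root-mean-square energy of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_overlap(accompaniment_samples, threshold=0.05):
    """Flag a speech-labeled segment as 'overlap' when the accompaniment
    stem carries real energy (music playing under the conversation)."""
    return rms(accompaniment_samples) > threshold

quiet = [0.001] * 1000    # near-silent accompaniment: just conversation
loud = [0.3, -0.3] * 500  # audible instruments under the talking
print(is_overlap(quiet), is_overlap(loud))  # → False True
```

The threshold would need tuning against real recordings; 0.05 is a placeholder, not a measured value.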
Dashboard — Segment Browser:
- Full waveform visualization with colored overlays (green = music, blue = overlap, gray = speech)
- Click-to-play any segment
- Star/favorite button and notes field per segment
- Filter by type: All / Music / Overlap / Favorites
Phase 3: Remastering Pipeline
Per-segment pipeline triggered by "Remaster" button:
Input segment (wav clip)
→ Demucs (separate into drums/bass/vocals/other)
→ noisereduce (clean each stem individually)
→ pedalboard (EQ, compression, light reverb per stem)
→ Remix stems (balanced levels)
→ matchering (master against reference track — optional)
→ pyloudnorm (normalize to -14 LUFS)
→ Output remastered wav + mp3

Presets for the non-technical user:
- Quick Clean — noise reduction + loudness normalization only (~10s)
- Full Remaster — complete pipeline (~2-5 min per segment)
- Vocal Focus — boost vocals, reduce instruments
- Instrumental Focus — remove/reduce vocals, enhance instruments
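The final normalization step amounts to a simple gain calculation: measure integrated loudness, then scale by the dB difference to the -14 LUFS target. A pure-Python sketch of just the arithmetic (the measured value is made up for illustration):

```python
def normalization_gain(measured_lufs, target_lufs=-14.0):
    """Linear gain factor that moves a signal from measured to target loudness."""
    gain_db = target_lufs - measured_lufs
    return 10 ** (gain_db / 20)

# A quiet jam measured at -23 LUFS needs +9 dB, i.e. roughly 2.8x amplitude
gain = normalization_gain(-23.0)
print(round(gain, 3))  # → 2.818
```

In the real pipeline, pyloudnorm supplies the measurement (`Meter.integrated_loudness`) and applies the gain (`pyloudnorm.normalize.loudness`); this sketch only shows the math underneath.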
Dashboard — Remaster UI:
- Preset selector dropdown, before/after A/B playback toggle
- Optional reference track upload for matchering
- Download button (wav + mp3), batch remaster for all favorites
Phase 4: Polish & Delivery
- Bulk upload (multiple files at once), processing queue with progress
- Batch export (zip all remastered tracks)
- Dark mode (music production aesthetic)
- Keyboard shortcuts: space = play/pause, arrows = navigate segments, f = favorite
- Mobile-responsive for phone review
- Docker Compose packaging: a single `docker compose up` runs everything
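A plausible Compose layout for the stack described above. Service names, build paths, images, and ports are assumptions for illustration, not the project's actual file:

```yaml
services:
  api:
    build: ./backend           # FastAPI app
    ports: ["8000:8000"]
    volumes: ["./data:/data"]  # uploads, wav conversions, remaster output
    depends_on: [redis]
  worker:
    build: ./backend           # same image, runs the Celery worker
    command: celery -A app.worker worker --loglevel=info
    volumes: ["./data:/data"]
    depends_on: [redis]
  redis:
    image: redis:7-alpine      # Celery broker
  web:
    build: ./frontend          # Next.js dashboard
    ports: ["3000:3000"]
    depends_on: [api]
```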
Hardware Estimates
- GPU recommended (CUDA or Apple Silicon MPS) but CPU fallback works
- 100 hours detection on M-series Mac: ~4-6 hours
- Storage budget: ~400GB (m4a source + wav conversion + stem separation)
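The ~400GB budget is easy to sanity-check with back-of-envelope arithmetic, assuming 16-bit/44.1kHz stereo WAV conversions and a rough ~64kbps source bitrate; exact figures depend on the recordings:

```python
# Back-of-envelope storage estimate for 100 hours of audio.
# Assumes 16-bit (2-byte) samples, 44.1 kHz, stereo WAV; all figures are rough.
HOURS = 100
wav_bytes_per_hour = 44_100 * 2 * 2 * 3600   # sample rate * bytes/sample * channels * seconds
wav_gb = HOURS * wav_bytes_per_hour / 1e9    # full-length WAV conversions

stems_gb = 4 * wav_gb     # Demucs writes 4 stems; worst case, everything gets remastered
source_gb = HOURS * 0.03  # compressed m4a sources at ~64 kbps are comparatively tiny

total_gb = source_gb + wav_gb + stems_gb
print(round(wav_gb), round(total_gb))  # → 64 321
```

That leaves the remainder of the ~400GB figure as headroom for remastered outputs, mp3 exports, and temporary files.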
---
As a Case Study
This memo documents the full arc of a collaboration: from a casual real-world need expressed in plain language, through structured research, targeted clarifying questions, and architectural planning — all before writing a single line of code.
What makes this a useful reference:
- Start with the problem, not the tech. The prompt described a human need ("find the music in these recordings") without prescribing tools. The research phase identified the right tools for the job.
- Parallel research agents. Two independent research threads ran simultaneously — one for detection, one for remastering — collapsing what would have been hours of manual research into under a minute.
- Three questions reshaped everything. Technical level, audio format, and stack preference eliminated entire categories of wrong solutions before any design work began.
- The plan is the deliverable. Before any implementation, there's a complete architecture, phased roadmap, file structure, dependency list, and verification plan. This document itself becomes the project's north star.
Anyone can use this pattern: describe your need in plain language → let research identify the tools → answer clarifying questions to narrow the design space → review the plan → then build.
