Objective
JamFinder is a proposed full-stack application that solves a specific problem: extracting music from hundreds of hours of mixed audio recordings (conversation + informal jam sessions) and providing a non-technical user with a polished dashboard to review, curate, and remaster the discovered music segments.
The system combines AI-powered audio classification (inaSpeechSegmenter, Spleeter) with a production-grade remastering pipeline (Demucs, noisereduce, pedalboard, matchering, pyloudnorm) behind a drag-and-drop Next.js dashboard. The target user never touches a terminal — they upload files, browse detected music segments on an interactive waveform, star their favorites, and click "Remaster" to get cleaned-up, loudness-normalized output files.
---
Subjective
This project originated from a conversation in yoga class. A fellow student mentioned he has roughly a hundred hours of iPhone Voice Memo recordings — casual hangouts where he and his friends would talk, jam on instruments, and play music intermittently. The music is buried in hours of conversation, and he has no way to find it without listening to everything end-to-end. He wants to surface the musical moments, decide which ones are worth keeping, and then clean them up for potential production or personal archives.
It's a perfect example of a real human need that sits at the intersection of AI audio processing and practical UX design — the kind of project that would have been impossibly tedious five years ago but is now tractable with open source ML models.
---
Initiation & Collaboration Process
The Opening Prompt
The project began with a single request to Claude Code:
"Someone in my yoga class has a hundred hours of audio clips of him and his friends sitting around talking and intermittently jamming on music and playing music. He wants to find out where the music is in those audio recordings and then put them into a dashboard so that he can go through them and determine what he wants to remaster and produce. Figure out what open source libraries would be able to identify the music in those recordings and then what open source library would be good for remastering them, and then enter plan mode and construct this project for me to review."
Research Phase
Claude deployed two parallel research agents — one investigating music detection/segmentation libraries, the other investigating remastering tools. Within about 45 seconds, both returned comprehensive assessments of the open source landscape.
Music Detection findings:
- inaSpeechSegmenter (INA, France) — CNN-based classifier that outputs timestamped `(label, start, end)` tuples for music, speech, noise, and silence. Best primary tool.
- Spleeter (Deezer) — 2-stem separation enables detecting music underneath conversation by measuring energy on the accompaniment track. Excellent second pass.
- pyAudioAnalysis — alternative classifier requiring more manual tuning.
- PANNs — 527-class AudioSet tagger for granular instrument identification if needed.
Remastering findings:
- Demucs (Meta/audiocraft) — state-of-the-art stem separation (vocals, drums, bass, other)
- noisereduce — spectral gating for hiss/hum removal
- pedalboard (Spotify) — production-grade audio effects chain (EQ, compression, reverb, limiting)
- matchering — automated mastering that matches a reference track's frequency profile
- pyloudnorm — ITU-R BS.1770 loudness normalization to streaming standards
Clarifying Questions
Before designing the system, Claude asked three targeted questions to narrow the solution space:
- Target user's technical level? → Non-technical — needs a polished GUI with drag-and-drop, no terminal usage
- Audio format? → iPhone Voice Memos (m4a) — compressed AAC recordings requiring FFmpeg conversion
- Tech stack preference? → Python + Next.js — Python backend for audio ML, Next.js dashboard for the review UI
These three answers shaped every subsequent architectural decision: the drag-and-drop upload flow, the async job queue (since m4a conversion + ML inference is slow), the preset-based remaster controls (no knob-twiddling), and the Docker Compose packaging (Brian can run it for his friend with a single command).
---
Project Outline
Architecture
┌─────────────────────────────────────────────────┐
│ Next.js Dashboard │
│ Upload files → View segments → Play/review │
│ Mark favorites → Trigger remaster → Download │
└──────────────────────┬──────────────────────────┘
│ REST API
┌──────────────────────┴──────────────────────────┐
│ Python FastAPI Backend │
│ /upload /segments /remaster /download │
└──────────┬────────────┬────────────┬────────────┘
│ │ │
┌─────┴─────┐ ┌───┴────┐ ┌────┴──────┐
│ Detection │ │ SQLite │ │ Remaster │
│ Pipeline │ │ DB │ │ Pipeline │
└───────────┘ └────────┘ └───────────┘

Stack: FastAPI + Celery + Redis (async jobs) | Next.js + Tailwind + wavesurfer.js (waveform UI) | SQLite (metadata) | Docker Compose (packaging)
Phase 1: Scaffold & Upload
- FastAPI app with Celery worker and Redis broker
- POST /api/upload — accepts m4a/wav/mp3, converts to wav via FFmpeg
- SQLite schema: `recordings` (file metadata + processing status), `segments` (timestamps + labels + favorites), `remaster_jobs` (queue + output paths)
- Next.js drag-and-drop upload zone with progress bars and recording list
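The three tables can be sketched with Python's built-in `sqlite3`. The column names here are illustrative assumptions drawn from the plan's descriptions, not a final schema:

```python
import sqlite3

# Illustrative schema sketch; exact columns are an assumption, not the final design.
conn = sqlite3.connect(":memory:")  # in-memory DB for demonstration
conn.executescript("""
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    duration_sec REAL,
    status TEXT DEFAULT 'uploaded'      -- uploaded | processing | done | error
);
CREATE TABLE segments (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER REFERENCES recordings(id),
    start_sec REAL NOT NULL,
    end_sec REAL NOT NULL,
    label TEXT NOT NULL,                -- music | speech | overlap
    favorite INTEGER DEFAULT 0,
    notes TEXT
);
CREATE TABLE remaster_jobs (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER REFERENCES segments(id),
    preset TEXT NOT NULL,               -- quick_clean | full | vocal | instrumental
    status TEXT DEFAULT 'queued',
    output_path TEXT
);
""")

# Example: one uploaded recording with one detected music segment
conn.execute("INSERT INTO recordings (filename, duration_sec) VALUES (?, ?)",
             ("jam_session.m4a", 3600.0))
conn.execute("INSERT INTO segments (recording_id, start_sec, end_sec, label) "
             "VALUES (1, 120.5, 245.0, 'music')")
rows = conn.execute("SELECT label, start_sec FROM segments").fetchall()
print(rows)  # → [('music', 120.5)]
```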
Phase 2: Music Detection Pipeline
Two-pass detection, triggered automatically after upload:
Pass 1 — inaSpeechSegmenter: Classifies every frame as music, speech, noise, or silence. Stores all music segments with timestamps. Merges adjacent segments with gaps under 3 seconds.
Pass 2 — Spleeter overlap detection: For segments Pass 1 labeled as speech, runs 2-stem separation and computes RMS energy on the accompaniment track. High energy = music playing under conversation. These get labeled as "overlap" segments.
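Pass 2's decision rule is just an energy threshold on the accompaniment stem. A pure-Python sketch of the RMS check — the threshold value and function names are illustrative assumptions; in practice the samples would come from Spleeter's accompaniment output:

```python
import math

def rms(samples):
    """Root-mean-square energy of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_overlap(accompaniment_samples, threshold=0.05):
    """Flag a speech-labeled segment as 'overlap' when the accompaniment
    stem carries real energy (music playing under the conversation)."""
    return rms(accompaniment_samples) > threshold

quiet = [0.001] * 1000    # near-silent accompaniment: just conversation
loud = [0.3, -0.3] * 500  # audible instruments under the talking
print(is_overlap(quiet), is_overlap(loud))  # → False True
```

The threshold would need tuning against real recordings; 0.05 is a placeholder, not a measured value.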
Dashboard — Segment Browser:
- Full waveform visualization with colored overlays (green = music, blue = overlap, gray = speech)
- Click-to-play any segment
- Star/favorite button and notes field per segment
- Filter by type: All / Music / Overlap / Favorites
Phase 3: Remastering Pipeline
Per-segment pipeline triggered by "Remaster" button:
Input segment (wav clip)
→ Demucs (separate into drums/bass/vocals/other)
→ noisereduce (clean each stem individually)
→ pedalboard (EQ, compression, light reverb per stem)
→ Remix stems (balanced levels)
→ matchering (master against reference track — optional)
→ pyloudnorm (normalize to -14 LUFS)
→ Output remastered wav + mp3

Presets for the non-technical user:
- Quick Clean — noise reduction + loudness normalization only (~10s)
- Full Remaster — complete pipeline (~2-5 min per segment)
- Vocal Focus — boost vocals, reduce instruments
- Instrumental Focus — remove/reduce vocals, enhance instruments
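The final normalization step amounts to a simple gain calculation: measure integrated loudness, then scale by the dB difference to the -14 LUFS target. A pure-Python sketch of just the arithmetic (the measured value is made up for illustration):

```python
def normalization_gain(measured_lufs, target_lufs=-14.0):
    """Linear gain factor that moves a signal from measured to target loudness."""
    gain_db = target_lufs - measured_lufs
    return 10 ** (gain_db / 20)

# A quiet jam measured at -23 LUFS needs +9 dB, i.e. roughly 2.8x amplitude
gain = normalization_gain(-23.0)
print(round(gain, 3))  # → 2.818
```

In the real pipeline, pyloudnorm supplies the measurement (`Meter.integrated_loudness`) and applies the gain (`pyloudnorm.normalize.loudness`); this sketch only shows the math underneath.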
Dashboard — Remaster UI:
- Preset selector dropdown, before/after A/B playback toggle
- Optional reference track upload for matchering
- Download button (wav + mp3), batch remaster for all favorites
Phase 4: Polish & Delivery
- Bulk upload (multiple files at once), processing queue with progress
- Batch export (zip all remastered tracks)
- Dark mode (music production aesthetic)
- Keyboard shortcuts: space = play/pause, arrows = navigate segments, f = favorite
- Mobile-responsive for phone review
- Docker Compose packaging: a single `docker compose up` runs everything
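A plausible Compose layout for the stack described above. Service names, build paths, images, and ports are assumptions for illustration, not the project's actual file:

```yaml
services:
  api:
    build: ./backend           # FastAPI app
    ports: ["8000:8000"]
    volumes: ["./data:/data"]  # uploads, wav conversions, remaster output
    depends_on: [redis]
  worker:
    build: ./backend           # same image, runs the Celery worker
    command: celery -A app.worker worker --loglevel=info
    volumes: ["./data:/data"]
    depends_on: [redis]
  redis:
    image: redis:7-alpine      # Celery broker
  web:
    build: ./frontend          # Next.js dashboard
    ports: ["3000:3000"]
    depends_on: [api]
```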
Hardware Estimates
- GPU recommended (CUDA or Apple Silicon MPS) but CPU fallback works
- 100 hours detection on M-series Mac: ~4-6 hours
- Storage budget: ~400GB (m4a source + wav conversion + stem separation)
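The ~400GB budget is easy to sanity-check with back-of-envelope arithmetic, assuming 16-bit/44.1kHz stereo WAV conversions and a rough ~64kbps source bitrate; exact figures depend on the recordings:

```python
# Back-of-envelope storage estimate for 100 hours of audio.
# Assumes 16-bit (2-byte) samples, 44.1 kHz, stereo WAV; all figures are rough.
HOURS = 100
wav_bytes_per_hour = 44_100 * 2 * 2 * 3600   # sample rate * bytes/sample * channels * seconds
wav_gb = HOURS * wav_bytes_per_hour / 1e9    # full-length WAV conversions

stems_gb = 4 * wav_gb     # Demucs writes 4 stems; worst case, everything gets remastered
source_gb = HOURS * 0.03  # compressed m4a sources at ~64 kbps are comparatively tiny

total_gb = source_gb + wav_gb + stems_gb
print(round(wav_gb), round(total_gb))  # → 64 321
```

That leaves the remainder of the ~400GB figure as headroom for remastered outputs, mp3 exports, and temporary files.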
---
As a Case Study
This memo documents the full arc of a collaboration: from a casual real-world need expressed in plain language, through structured research, targeted clarifying questions, and architectural planning — all before writing a single line of code.
What makes this a useful reference:
- Start with the problem, not the tech. The prompt described a human need ("find the music in these recordings") without prescribing tools. The research phase identified the right tools for the job.
- Parallel research agents. Two independent research threads ran simultaneously — one for detection, one for remastering — collapsing what would have been hours of manual research into under a minute.
- Three questions reshaped everything. Technical level, audio format, and stack preference eliminated entire categories of wrong solutions before any design work began.
- The plan is the deliverable. Before any implementation, there's a complete architecture, phased roadmap, file structure, dependency list, and verification plan. This document itself becomes the project's north star.
Anyone can use this pattern: describe your need in plain language → let research identify the tools → answer clarifying questions to narrow the design space → review the plan → then build.
