System Design: Static Audio Detection Pipeline
Context
Design an offline (non-live) audio detection system that processes static audio files (e.g., user-uploaded clips) for policy compliance and quality. The goal is to ingest files, extract signals (speech-to-text, spectral features, keywords), combine them via rules, classify outcomes, and support human review where needed.
Requirements
- Functional
  - Ingest audio files from object storage.
  - Preprocess (validation, transcoding, noise reduction, segmentation).
  - Extract features: spectral analysis, speech-to-text (STT), keyword/phrase detection.
  - Combine signals using a rule-based post-processor to classify each asset as: Clean, Problematic, or Needs Human Review.
  - Persist artifacts (features, transcript, decisions) and expose results via API/stream.
  - Discover new files automatically; support both event-driven and scheduled/batch discovery.
  - Provide a manual review workflow (assignment, labeling, consensus, requeueing, audit).
  - Support reprocessing/backfill when rules change.
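
To make the rule-based post-processor concrete, here is a minimal sketch of how extracted signals might be combined into the three outcomes. The `Signals` fields, the threshold values, and the rule order are all assumptions for illustration; the prompt does not prescribe any of them, and a real system would load them from versioned rule configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CLEAN = "Clean"
    PROBLEMATIC = "Problematic"
    NEEDS_HUMAN_REVIEW = "Needs Human Review"


@dataclass(frozen=True)
class Signals:
    """Per-asset signals from upstream extractors (all fields hypothetical)."""
    keyword_hits: int        # blocklisted keyword/phrase matches in the transcript
    stt_confidence: float    # mean STT confidence in [0, 1]
    spectral_anomaly: float  # anomaly score in [0, 1] from spectral analysis


# Illustrative thresholds only, not recommended values.
MIN_STT_CONFIDENCE = 0.6
ANOMALY_REVIEW = 0.5
ANOMALY_BLOCK = 0.9


def classify(s: Signals) -> Outcome:
    """Apply rules in priority order; the first matching rule decides."""
    # Strong spectral evidence decides on its own, regardless of the transcript.
    if s.spectral_anomaly >= ANOMALY_BLOCK:
        return Outcome.PROBLEMATIC
    # An unreliable transcript means keyword results cannot be trusted either way.
    if s.stt_confidence < MIN_STT_CONFIDENCE:
        return Outcome.NEEDS_HUMAN_REVIEW
    # Transcript is trustworthy, so any keyword hit is treated as a violation.
    if s.keyword_hits > 0:
        return Outcome.PROBLEMATIC
    # Moderate spectral anomaly without keyword evidence is ambiguous.
    if s.spectral_anomaly >= ANOMALY_REVIEW:
        return Outcome.NEEDS_HUMAN_REVIEW
    return Outcome.CLEAN
```

Keeping the rules as an ordered list of cheap predicates makes it straightforward to version them and to rerun them over stored signals during a backfill, without repeating STT or spectral extraction.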
- Non-Functional
  - Scalability: handle large daily volumes with predictable throughput.
  - Latency: near-real-time (minutes) for most files.
  - Reliability/fault tolerance: at-least-once processing, idempotent tasks, DLQs.
  - Cost efficiency: optimize storage/compute and third-party API usage.
  - Security/privacy: encryption at rest/in-transit, access controls, audit trails.
  - Observability: metrics, logs, traces; quality and health monitoring.
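
The reliability requirement (at-least-once delivery, idempotent tasks, DLQs) can be illustrated with a small in-memory sketch. Everything named here is hypothetical: the message fields (`object_key`, `etag`, `rules_version`), the retry budget, and the dictionaries standing in for a durable result store and a dead-letter queue. Including the rules version in the key is one possible way to let a backfill reprocess an asset when rules change while still deduplicating redeliveries.

```python
import hashlib
from typing import Callable, Dict, List, Optional

MAX_ATTEMPTS = 3  # illustrative retry budget


def idempotency_key(object_key: str, etag: str, rules_version: str) -> str:
    """Derive a stable key so a redelivered message maps to the same unit of work."""
    raw = f"{object_key}|{etag}|{rules_version}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class Worker:
    def __init__(self, handler: Callable[[dict], str]) -> None:
        self.handler = handler
        self.results: Dict[str, str] = {}    # stand-in for a durable result store
        self.dead_letters: List[dict] = []   # stand-in for a DLQ

    def handle(self, message: dict) -> Optional[str]:
        key = idempotency_key(
            message["object_key"], message["etag"], message["rules_version"]
        )
        # Duplicate delivery: return the stored result without redoing the work.
        if key in self.results:
            return self.results[key]

        last_error: Optional[Exception] = None
        for _attempt in range(MAX_ATTEMPTS):
            try:
                result = self.handler(message)
            except Exception as exc:  # a real worker would back off between retries
                last_error = exc
                continue
            self.results[key] = result
            return result

        # Retries exhausted: park the message for inspection instead of dropping it.
        self.dead_letters.append({"message": message, "error": repr(last_error)})
        return None
```

In a deployed pipeline the result lookup and write would need to be atomic (for example a conditional write in the result store) so that two workers receiving the same message concurrently do not both process it.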
- Out of Scope
  - Selecting or training ML models. Assume pluggable components.
Deliverables
- Functional and non-functional requirements.
- Key entities and data model.
- High-level architecture (storage, compute, orchestration).
- End-to-end processing flow from ingestion to output.
- Integration of STT, spectral analysis, keyword detection, noise reduction, and rule-based post-processing.
- File discovery strategy (event-driven vs cron/batch).
- Outcome classification scheme and manual review workflow.
- Scalability, throughput/latency targets, data retention, fault tolerance, backfill, and cost controls.
- Success metrics and monitoring/alerting for quality and system health.
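
For the manual review workflow named among the deliverables, the consensus and requeueing steps could look roughly like the sketch below. The minimum label count and the agreement threshold are assumed values chosen for illustration, and the function name `resolve` is hypothetical; the prompt leaves the consensus policy open.

```python
from collections import Counter
from typing import Optional, Sequence

MIN_LABELS = 2         # hypothetical: labels required before a decision is attempted
MIN_AGREEMENT = 2 / 3  # hypothetical: share of labels that must agree


def resolve(labels: Sequence[str]) -> Optional[str]:
    """Return the consensus label, or None if the asset should be requeued."""
    # Too few reviewer labels so far: requeue for another reviewer.
    if len(labels) < MIN_LABELS:
        return None
    # Take the most frequent label and check whether enough reviewers agree on it.
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= MIN_AGREEMENT else None
```

A `None` result would send the asset back to the review queue for an additional label, and each label and resolution would be written to the audit trail.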