Voice Assistant Knowledge Pipeline: Holidays and Animals
Context
You are designing an end-to-end knowledge pipeline for a global voice assistant (e.g., Alexa) to answer user questions about holidays worldwide and animal-related queries. Your design should support high accuracy, low latency, and continuous freshness across locales and languages.
Part 1 — Holidays
Design an end-to-end data pipeline that enables the assistant to answer holiday-related questions worldwide. Describe:
- Data sources (official/government, religious/lunar, public datasets) and how to evaluate reliability and licenses.
- Ingestion architecture (batch/stream, change detection, versioning, schema validation).
- Normalization and reconciliation across calendar systems (Gregorian, lunar/lunisolar, federal/country-specific), including:
  - Date rules (e.g., "first Monday in September"; observance shifts when a date falls on a weekend).
  - Multi-day holidays, regional variants, and time zones.
  - Conversion from lunar/lunisolar to Gregorian per year and locale.
- Storage layers (canonical knowledge graph, precomputed expansions, search index, cache) and data models.
- Query/answer layer (NLU intents, entity resolution, localization, latency budget, fallbacks) and how you will keep content current (freshness SLAs, monitoring, editorial overrides).
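The date-rule requirements above can be made concrete with a small rule evaluator. The following is a minimal sketch under several assumptions. It uses a hypothetical `NthWeekdayRule` type for rules such as "first Monday in September" (`n=1`) or "last Monday in May" (`n=-1`). It also includes a hypothetical `observed` helper with two example weekend-shift policies and a hypothetical `expand_multiday` helper. Real observance rules vary by jurisdiction. For example, some jurisdictions define how to assign substitute days when two holidays share a weekend, and this sketch does not resolve such collisions.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class NthWeekdayRule:
    """Hypothetical rule type: the n-th `weekday` of `month` (n=-1 means last)."""
    month: int
    weekday: int  # Monday=0 ... Sunday=6, matching date.weekday()
    n: int

    def resolve(self, year: int) -> date:
        if self.n > 0:
            first = date(year, self.month, 1)
            # Days from the 1st to the first matching weekday, then whole weeks.
            offset = (self.weekday - first.weekday()) % 7
            result = first + timedelta(days=offset + 7 * (self.n - 1))
        elif self.n < 0:
            last = date(year, self.month, calendar.monthrange(year, self.month)[1])
            # Days back from the month's last day to the last matching weekday.
            offset = (last.weekday() - self.weekday) % 7
            result = last - timedelta(days=offset + 7 * (-self.n - 1))
        else:
            raise ValueError("n must be non-zero")
        if result.month != self.month:
            raise ValueError(f"occurrence {self.n} does not exist in {year}-{self.month:02d}")
        return result


def observed(actual: date, policy: str = "nearest_weekday") -> date:
    """Shift a weekend holiday to its observed weekday under an example policy."""
    wd = actual.weekday()
    if wd < 5:  # Monday to Friday: no shift
        return actual
    if policy == "nearest_weekday":  # Saturday -> Friday, Sunday -> Monday
        return actual + timedelta(days=-1 if wd == 5 else 1)
    if policy == "next_weekday":  # Saturday or Sunday -> following Monday
        return actual + timedelta(days=7 - wd)
    raise ValueError(f"unknown policy: {policy}")


def expand_multiday(start: date, length_days: int) -> List[date]:
    """Expand a multi-day holiday into one entry per calendar day."""
    return [start + timedelta(days=i) for i in range(length_days)]


labor_day = NthWeekdayRule(month=9, weekday=0, n=1)      # first Monday in September
memorial_day = NthWeekdayRule(month=5, weekday=0, n=-1)  # last Monday in May
print(labor_day.resolve(2025))     # 2025-09-01
print(memorial_day.resolve(2025))  # 2025-05-26
print(observed(date(2026, 7, 4)))  # 2026-07-03 (Saturday shifted to Friday)
```

One possible design keeps rules like these in the canonical knowledge graph as the single source of truth. A batch job then expands them per year into the precomputed layer, so answers at query time become simple lookups.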
Part 2 — Animals
Extend the pipeline so the assistant can answer animal-related questions. Describe:
- Additional data elements and taxonomies (e.g., scientific classification, common names, habitats, conservation status).
- Required ML/NLP models (domain classification, entity linking, attribute extraction, summarization) and a feature store.
- How you will detect and correct misrouted queries such as "Peppa Pig" (a cartoon character) that are falsely labeled as animal questions. Include confidence thresholds, intent reclassification, and human-in-the-loop.
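For the misrouting requirement, one possible design combines a temperature-scaled domain score with an entity-type check against the knowledge graph. The sketch below rests on several assumptions. The thresholds (`ACCEPT`, `REVIEW`) are hypothetical. A toy `ENTITY_TYPES` dictionary stands in for real entity linking. The "Entertainment" target domain is also hypothetical. In practice, the temperature and thresholds would be fit on held-out data.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds, assumed to be tuned on calibrated held-out scores.
ACCEPT = 0.85
REVIEW = 0.55

# Toy stand-in for entity linking: surface form -> knowledge-graph entity type.
ENTITY_TYPES = {
    "peppa pig": "FictionalCharacter",
    "red panda": "Taxon",
}

# Entity types that contradict the Animals domain, mapped to a better domain.
REROUTE = {"FictionalCharacter": "Entertainment"}


@dataclass
class RoutingDecision:
    domain: str
    reason: str
    needs_review: bool = False


def calibrated_prob(logit: float, temperature: float = 1.5) -> float:
    """Temperature scaling of a binary logit; the temperature is assumed fit on validation data."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def route(entity_text: str, animal_logit: float) -> RoutingDecision:
    p = calibrated_prob(animal_logit)
    etype = ENTITY_TYPES.get(entity_text.lower())
    if etype in REROUTE:
        # The linked entity's type overrides the classifier. A confident
        # disagreement is also queued for review as a relabeling signal.
        return RoutingDecision(REROUTE[etype], f"entity type {etype}", needs_review=p >= REVIEW)
    if p >= ACCEPT:
        return RoutingDecision("Animals", f"p={p:.2f}")
    if p >= REVIEW:
        # Borderline band: answer, but queue for human review.
        return RoutingDecision("Animals", f"p={p:.2f}", needs_review=True)
    return RoutingDecision("Fallback", f"p={p:.2f}")


print(route("Peppa Pig", 4.0).domain)  # Entertainment
```

The review queue targets the cases most likely to be wrong, namely borderline scores and confident classifier outputs that contradict the knowledge graph. Annotated outcomes from that queue can feed back into training data for the domain classifier.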
Part 3 — Debugging Framework
You receive error logs where the assistant fails on specific animal questions. Propose a framework to:
- Categorize errors end-to-end (ASR, language detection, intent, entity linking, knowledge gaps, freshness, rendering).
- Trace root causes with observability and reproducible pipelines.
- Prioritize fixes using an impact-severity-effort framework.
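A debugging framework can attribute each failed utterance to the first pipeline stage whose output diverges from the expected result. It can then rank clusters of similar errors. The sketch below assumes a hypothetical per-stage pass/fail trace and a simple score of impact times severity divided by effort. The field names, scales, and sample numbers are illustrative and do not come from any real log.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    # Declared in pipeline order; iterating over the Enum preserves this order.
    ASR = "asr"
    LANGUAGE_DETECTION = "language_detection"
    INTENT = "intent"
    ENTITY_LINKING = "entity_linking"
    KNOWLEDGE_GAP = "knowledge_gap"
    FRESHNESS = "freshness"
    RENDERING = "rendering"


def first_failing_stage(trace: Dict[Stage, bool]) -> Optional[Stage]:
    """Return the earliest stage marked as failed (False) in a hypothetical trace.

    Stages absent from the trace are treated as not evaluated and skipped.
    """
    for stage in Stage:
        if trace.get(stage) is False:
            return stage
    return None


@dataclass
class ErrorCluster:
    stage: Stage
    description: str
    weekly_failures: int  # impact: affected utterances per week
    severity: int         # 1 (cosmetic) .. 5 (wrong or harmful answer)
    effort_weeks: float   # estimated engineering effort


def priority(c: ErrorCluster) -> float:
    # Hypothetical score: impact x severity / effort, with an effort floor of 0.5 weeks.
    return c.weekly_failures * c.severity / max(c.effort_weeks, 0.5)


def rank(clusters: List[ErrorCluster]) -> List[ErrorCluster]:
    return sorted(clusters, key=priority, reverse=True)


# Illustrative clusters, not real data.
clusters = [
    ErrorCluster(Stage.INTENT, "'Peppa Pig' routed to Animals", 1200, 3, 2.0),
    ErrorCluster(Stage.KNOWLEDGE_GAP, "missing lifespan for rare species", 300, 2, 1.0),
    ErrorCluster(Stage.ASR, "'axolotl' misrecognized", 800, 4, 4.0),
]
for c in rank(clusters):
    print(f"{priority(c):7.1f}  {c.stage.value:15}  {c.description}")
```

Attributing each failure to the earliest diverging stage avoids double counting. An ASR error that later causes an entity-linking miss is charged to ASR only, so fixes land where they remove the most downstream failures.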
Notes
- Calendars differ by locale: normalize date formats, offsets, time zones, and observance rules.
- Implement calibrated confidence thresholds and intent reclassification to handle ambiguous queries like "Peppa Pig".
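The sketch below illustrates the locale notes in two steps. First, it parses the same numeric date string under different locale conventions. Second, it computes the UTC interval that a local calendar day covers. It assumes a hypothetical `DATE_ORDER` table, whereas a production system would draw on CLDR locale data. It also uses fixed UTC offsets for simplicity. Fixed offsets ignore daylight saving time, so real code would use IANA time zone names, for example via Python's `zoneinfo` module.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# Hypothetical field-order table; a production system would derive this from CLDR.
DATE_ORDER = {"en-US": "MDY", "en-GB": "DMY", "ja-JP": "YMD"}


def parse_local_date(text: str, locale: str) -> date:
    """Parse a numeric date such as '04/07/2025' using the locale's field order."""
    parts = [int(p) for p in text.replace("-", "/").split("/")]
    fields = dict(zip(DATE_ORDER[locale], parts))
    return date(fields["Y"], fields["M"], fields["D"])


def utc_window(day: date, utc_offset_hours: float) -> Tuple[datetime, datetime]:
    """Return the [start, end) UTC interval covered by a local calendar day.

    Uses a fixed offset, so it ignores daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def is_holiday_now(now_utc: datetime, day: date, utc_offset_hours: float) -> bool:
    start, end = utc_window(day, utc_offset_hours)
    return start <= now_utc < end


# The same string names two different days depending on locale.
print(parse_local_date("04/07/2025", "en-US"))  # 2025-04-07
print(parse_local_date("04/07/2025", "en-GB"))  # 2025-07-04

# At 16:00 UTC on Dec 31, New Year's Day has begun in UTC+9 but not in UTC-5.
now = datetime(2024, 12, 31, 16, 0, tzinfo=timezone.utc)
print(is_holiday_now(now, date(2025, 1, 1), 9))   # True
print(is_holiday_now(now, date(2025, 1, 1), -5))  # False
```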