Event Instrumentation And Measurement Design
Asked of: Data Scientist

What's being tested
Ability to design reliable, debuggable telemetry for product measurement: choosing metrics/events, schema and sampling tradeoffs, handling loss/duplication/latency, and protecting privacy while enabling experiments.
Core knowledge
- Define numerator/denominator and unit-of-analysis (user, session, device) before choosing events or aggregations.
- Event schema: stable event_id, user_id (hashed), client_ts and server_ts, event_type, version, and typed properties.
- Ingestion semantics: at-least-once vs exactly-once, dedup via event_id, idempotent writes, and watermark/late-arrival handling.
- Sampling strategies: uniform, stratified, deterministic hashing; account for sampling in metric backfills and variance.
- Cardinality limits: avoid high-cardinality keys (full URLs, free-text) that explode joins and storage.
- Privacy/compliance: PII hashing/salting, PII removal at the SDK, consent flags, and retention/expiry policies.
- Pipeline tooling/tradeoffs: Kafka for high-throughput streams, Flink/Spark for streaming aggregation, columnar storage and warehouses (Parquet, BigQuery) for aggregates; monitor telemetry loss and pipeline SLAs.
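A minimal Python sketch of the schema, dedup, and sampling bullets above. It is an illustration under stated assumptions, not a reference implementation: the `Event` class, `hash_user_id`, `dedup`, `in_sample`, and the `SALT` value are all hypothetical names introduced here, and a production stream job would keep dedup state bounded (for example per time window) rather than in an unbounded in-memory set.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator

# Hypothetical salt for illustration only; a real system would load a managed secret.
SALT = b"example-salt"


def hash_user_id(raw_user_id: str, salt: bytes = SALT) -> str:
    """Keyed hash (HMAC-SHA256) so the raw identifier never leaves the client/SDK."""
    return hmac.new(salt, raw_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class Event:
    event_id: str      # stable ID, generated once on the client and reused on retries
    user_id: str       # hashed, never the raw identifier
    event_type: str
    client_ts: float   # seconds since epoch, device clock
    server_ts: float   # seconds since epoch, ingestion clock
    version: int = 1   # schema version
    properties: Dict[str, str] = field(default_factory=dict)


def dedup(events: Iterable[Event]) -> Iterator[Event]:
    """Keep the first event seen for each event_id (at-least-once delivery repeats IDs)."""
    seen = set()
    for event in events:
        if event.event_id in seen:
            continue
        seen.add(event.event_id)
        yield event


def in_sample(hashed_user_id: str, rate: float) -> bool:
    """Deterministic sampling: the same user always lands in the same bucket."""
    digest = hashlib.sha256(hashed_user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    return bucket < rate
```

When events are sampled at rate `rate`, each retained event would typically carry a weight of `1 / rate` so that downstream totals can be scaled back up; the variance of the resulting estimates grows accordingly.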
Worked example — typical interview: "Design instrumentation to measure a new feature's conversion"
Start by clarifying the primary metric (e.g., conversion rate = converted users / exposed users) and the unit of analysis (e.g., user-level with first conversion per user, or per-session). Identify required events: exposure (impression) and conversion, each with stable event_id, hashed user_id, client_ts, server_ts, feature_flag_id, and assignment token. State the sampling plan (deterministic hashing of user_id, so a given user is sampled and assigned to treatment consistently across sessions) and guardrail metrics (DAU, error rate, page load time). Outline the pipeline: the client emits events to Kafka with retries and idempotency; a stream job deduplicates by event_id, applies sampling weights, joins exposure→conversion within a time window, and writes daily aggregates for analysis. Note how you’ll validate (end-to-end tests, synthetic events) and monitor (ingestion loss, event rate vs expected).
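The exposure→conversion join and the user-level conversion rate can be sketched in a few lines of Python. This is a batch illustration, not the streaming job itself, and it rests on assumptions: rows are already deduplicated, the event types are literally named `"exposure"` and `"conversion"`, the window starts at each user's first exposure, and the function name `conversion_rate` and the 7-day default window are hypothetical choices.

```python
from typing import Dict, Iterable, List, Tuple

# One row per deduplicated event: (hashed_user_id, event_type, server_ts_seconds)
Row = Tuple[str, str, float]


def conversion_rate(
    rows: Iterable[Row], window_s: float = 7 * 24 * 3600
) -> Tuple[int, int, float]:
    """Return (converted_users, exposed_users, rate) at the user level.

    A user counts as converted at most once, and only if a conversion falls
    within window_s seconds after that user's first exposure.
    """
    first_exposure: Dict[str, float] = {}
    conversions: Dict[str, List[float]] = {}
    for user, event_type, ts in rows:
        if event_type == "exposure":
            if user not in first_exposure or ts < first_exposure[user]:
                first_exposure[user] = ts
        elif event_type == "conversion":
            conversions.setdefault(user, []).append(ts)

    exposed = len(first_exposure)  # denominator: unique exposed users
    converted = sum(
        1
        for user, t0 in first_exposure.items()
        if any(t0 <= t <= t0 + window_s for t in conversions.get(user, []))
    )
    rate = converted / exposed if exposed else float("nan")
    return converted, exposed, rate
```

Conversions from users with no exposure event are excluded from both numerator and denominator, which keeps the metric aligned with the stated definition (converted users / exposed users).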
A common pitfall
Focusing on raw event counts instead of defining denominators and the unit of analysis leads to misleading metrics (e.g., reporting total clicks when the question is how many unique users clicked). Another tempting mistake is ignoring telemetry loss and deduplication: assuming client events are complete will bias rates if the mobile SDK drops events or sampling is inconsistent. Also avoid adding high-cardinality properties by default; they blow up joins and inflate storage costs.
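A small Python sketch makes the first two pitfalls concrete. The function names `click_metrics` and `estimated_loss_rate` are hypothetical, and the loss estimate assumes the client can report how many events it attempted to send (for example via a periodic counter), which the text above does not specify.

```python
from typing import Iterable, Tuple


def click_metrics(
    click_user_ids: Iterable[str], exposed_user_ids: Iterable[str]
) -> Tuple[float, float]:
    """Return (clicks per exposed user, share of exposed users who clicked)."""
    clicks = list(click_user_ids)
    exposed = set(exposed_user_ids)
    if not exposed:
        raise ValueError("no exposed users: the denominator is undefined")
    clicks_per_exposed_user = len(clicks) / len(exposed)
    clicking_user_rate = len(set(clicks) & exposed) / len(exposed)
    return clicks_per_exposed_user, clicking_user_rate


def estimated_loss_rate(sent: int, received: int) -> float:
    """Share of client-reported events that never reached the pipeline."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return max(0.0, 1.0 - received / sent)


# One heavy user clicks 8 times, another clicks once, 10 users were exposed.
clicks = ["u1"] * 8 + ["u2"]
exposed = [f"u{i}" for i in range(1, 11)]
per_user, unique_rate = click_metrics(clicks, exposed)
# per_user is 0.9 clicks per exposed user, but only 2 of 10 users clicked.
```

The two numbers answer different questions, so the metric definition should say which one is meant before any event is instrumented.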
Further reading
- Martin Kleppmann, Designing Data-Intensive Applications (esp. chapters on streaming and consistency).
- Apache Avro schema evolution docs (best practices for schema versioning).
Related concepts
- Event Instrumentation And Data Quality
- Instrumentation, Logging, And Data Quality
- Instrumentation, Logging, Labeling, And Data Quality
- Product Metric Design And Diagnostic Deep Dives (Analytics & Experimentation)
- A/B Testing And Product Metric Diagnostics (Analytics & Experimentation)
- Product Metric Frameworks