PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Analytics & Experimentation/Other

Design metrics resilient to data quality

Last updated: Mar 29, 2026

Quick Overview

This question evaluates the ability to design robust data-quality metrics and apply SQL window-function techniques for deduplication and time-windowed completeness calculations, focusing on quantifying how ingestion delays, duplicates, and negative amounts can bias payment metrics.

  • Medium
  • Other
  • Analytics & Experimentation
  • Data Scientist

Design metrics resilient to data quality

Company: Other

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: Medium

Interview Round: Technical Screen

Design a robust metric and compute it using only window functions (no JOINs) to show how data-quality issues change conclusions. Schema: payments_raw(txn_id STRING, loan_id INT, event_time TIMESTAMP, amount DECIMAL(10,2), ingestion_time TIMESTAMP, source STRING). Sample: +--------+---------+---------------------+--------+---------------------+--------+ | txn_id | loan_id | event_time | amount | ingestion_time | source | +--------+---------+---------------------+--------+---------------------+--------+ | a1 | 1 | 2025-08-25 10:00:00 | 200.00 | 2025-08-25 10:00:05 | app | | a1_dup | 1 | 2025-08-25 10:00:00 | 200.00 | 2025-08-25 10:05:12 | web | | b1 | 1 | 2025-08-29 09:00:00 | 300.00 | 2025-09-03 12:00:00 | batch | | c1 | 2 | 2025-08-31 14:00:00 | 100.00 | 2025-08-31 14:00:03 | app | | d1 | 2 | 2025-09-01 08:00:00 | -50.00 | 2025-09-01 08:00:05 | app | +--------+---------+---------------------+--------+---------------------+--------+ Tasks: (a) Define precisely a '7-day arrival completeness' metric for the window [2025-08-25, 2025-09-01], where today = 2025-09-01: the share of unique payments with event_time on day D that have at least one ingested record within 7 days of event_time. State how you will de-duplicate payments using only window functions (e.g., partition by loan_id, amount, event_time within 2 minutes; keep the earliest ingestion_time). (b) Compute two daily time series for D in [2025-08-25, 2025-09-01]: naive_completeness (no dedup, only same-day arrivals) and robust_completeness (your dedup rule and 7-day wait; censor results for D where the 7-day window is incomplete as of 2025-09-01). (c) Output a summary table with D, naive_completeness, robust_completeness, bias = naive - robust, and a flag if negative amounts exist on D. Explain one real-world decision that would differ if you used the naive metric.

Quick Answer: This question evaluates the ability to design robust data-quality metrics and apply SQL window-function techniques for deduplication and time-windowed completeness calculations, focusing on quantifying how ingestion delays, duplicates, and negative amounts can bias payment metrics.

Related Interview Questions

  • Design and power an A/B on question mix - Other (medium)
  • Design an A/B test with guardrails - Other (hard)
  • Find and fix metric drops systematically - Other (medium)
  • Separate demand from supply for jeans - Other (medium)
  • Design an A/B test for a Celebrate reaction - Other (hard)
Other logo
Other
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Analytics & Experimentation
1
0

Design a robust metric and compute it using only window functions (no JOINs) to show how data-quality issues change conclusions. Schema: payments_raw(txn_id STRING, loan_id INT, event_time TIMESTAMP, amount DECIMAL(10,2), ingestion_time TIMESTAMP, source STRING). Sample: +--------+---------+---------------------+--------+---------------------+--------+ | txn_id | loan_id | event_time | amount | ingestion_time | source | +--------+---------+---------------------+--------+---------------------+--------+ | a1 | 1 | 2025-08-25 10:00:00 | 200.00 | 2025-08-25 10:00:05 | app | | a1_dup | 1 | 2025-08-25 10:00:00 | 200.00 | 2025-08-25 10:05:12 | web | | b1 | 1 | 2025-08-29 09:00:00 | 300.00 | 2025-09-03 12:00:00 | batch | | c1 | 2 | 2025-08-31 14:00:00 | 100.00 | 2025-08-31 14:00:03 | app | | d1 | 2 | 2025-09-01 08:00:00 | -50.00 | 2025-09-01 08:00:05 | app | +--------+---------+---------------------+--------+---------------------+--------+ Tasks: (a) Define precisely a '7-day arrival completeness' metric for the window [2025-08-25, 2025-09-01], where today = 2025-09-01: the share of unique payments with event_time on day D that have at least one ingested record within 7 days of event_time. State how you will de-duplicate payments using only window functions (e.g., partition by loan_id, amount, event_time within 2 minutes; keep the earliest ingestion_time). (b) Compute two daily time series for D in [2025-08-25, 2025-09-01]: naive_completeness (no dedup, only same-day arrivals) and robust_completeness (your dedup rule and 7-day wait; censor results for D where the 7-day window is incomplete as of 2025-09-01). (c) Output a summary table with D, naive_completeness, robust_completeness, bias = naive - robust, and a flag if negative amounts exist on D. Explain one real-world decision that would differ if you used the naive metric.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Analytics & Experimentation•More Other•More Data Scientist•Other Data Scientist•Other Analytics & Experimentation•Data Scientist Analytics & Experimentation
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.