PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates data quality auditing competencies, including validation of timestamps and timezones, detection of label leakage and inconsistent units, provenance and schema enforcement, and hands‑on use of SQL/Python for dataset inspection.

  • Medium
  • Capital One
  • Data Manipulation (SQL/Python)
  • Data Scientist

Audit flight data quality from metadata

Company: Capital One

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

You’re given an airline on‑time dataset and a one‑page “Metadata” slide that claims: flight_date (string, local time), dep_time/arr_time (HHMM local), dep_delay/arr_delay (minutes, negative allowed), cancelled (0/1), tail_num (aircraft id), origin, dest, aircraft_type (optional), weather_code (optional). The modeling target will be arrival delay in minutes. Audit data quality before modeling: 1) List at least 8 concrete, high‑risk issues you would proactively test for (e.g., timezone mixups, impossible HHMM values like 2460, clock changes around DST, negative delays beyond physical plausibility, cancelled flights with non‑null delays, duplicated tail_num semantics, leakage via “arr_delay” filled post‑facto, inconsistent units). 2) For each, specify an exact detection check (e.g., dep_time not matching ^\d{3,4}$ OR casted minutes >= 24*60; arr_delay < -60 OR > 600; origin/dest not in allowed IATA set) and how you’d quantify impact (prevalence, row counts, feature coverage). 3) Propose mitigation that avoids target leakage (e.g., remove post‑arrival features, impute within training folds, align to UTC with airport TZ map, drop DST transition minutes) and describe how you’d verify the fixes with before/after distributions and holdout data. 4) Define a lightweight data contract for upstream producers (types, ranges, allowed nulls, semantic rules) and how you’d enforce it in CI with schema tests and sampling.

Quick Answer: This question evaluates data quality auditing competencies, including validation of timestamps and timezones, detection of label leakage and inconsistent units, provenance and schema enforcement, and hands‑on use of SQL/Python for dataset inspection.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Clean and Merge Housing Data - Capital One (easy)
  • Find Lowest Prices for Highly Rated Categories - Capital One (medium)
  • Write SQL to compute campaign net revenue - Capital One (Medium)
  • Merge CSVs and build revenue pivot with pandas - Capital One (Medium)
  • Find top category per region in Aug 2025 - Capital One (Medium)