
Debug Validation Error Aggregation

Last updated: Jun 21, 2026

Quick Overview

This question evaluates debugging, error-handling, and message-normalization skills in Python schema-validation libraries, focusing on aggregation of Colander-style Invalid exceptions into dotted-path error dictionaries.

Company: Stripe

Role: Software Engineer

Category: Software Engineering Fundamentals

Difficulty: hard

Interview Round: Onsite

You are working in a Python schema-validation library modeled on [Colander](https://docs.pylonsproject.org/projects/colander/). A schema is a tree of nodes; each node can run multiple validators. When validation fails, validators raise an `Invalid` exception. A single failure can carry several messages, and failures from child nodes are attached as `children`. The library flattens this exception tree into a structured, dotted-path dictionary through an `asdict()` method (e.g. `{'account.balance': 'must be non-negative', 'items.0.qty': 'required'}`).

The current implementation has **three failing test groups**. You are given the exact failing test commands and assertions. Your task is to debug the code, identify the **root cause** of each group, and implement targeted fixes, not just patch the crash site.

Walk through how you would debug each failure (reproduce, trace, locate the first bad value) and describe the precise code-level change you would make. Address each of the three Parts below.

### Constraints & Assumptions

- This is a **debugging** exercise on an existing codebase, not a greenfield design: prefer the smallest correct change at the right boundary, and reuse one normalization path rather than scattering `if x is None` guards.
- The `Invalid` exception node exposes (Colander-style): `msg` (a string, a list of strings, or `None`), `children` (a list of child `Invalid` nodes), `node` (the schema node, with a `.name`), and `pos` (position of a child within a sequence/mapping).
- A sentinel value (Colander calls it `colander.null`) represents a missing/empty value during serialization; a node may carry a `default` to substitute when the value is missing.
- Fixes must be backward compatible: real (non-empty) messages and valid child errors must never be dropped, and existing passing tests must stay green.
- Assume CPython 3.x; you may add small private helper functions but should not pull in new third-party dependencies.
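To make the node shape described in the constraints concrete, here is a minimal stand-in. It is a sketch for illustration only, not the library's class: `FakeNode` and `FakeInvalid` are hypothetical names, and they model just the four attributes listed above (`msg`, `children`, `node`, `pos`). The tree built at the end is the kind of structure that would flatten to a key like `items.0.qty`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class FakeNode:
    """Hypothetical stand-in for a schema node; only `.name` is modeled."""
    name: str


@dataclass
class FakeInvalid:
    """Hypothetical stand-in for an Invalid exception node."""
    node: FakeNode
    # A message may be a string, a list (possibly holding None), or None.
    msg: Union[str, List[Optional[str]], None] = None
    pos: Optional[int] = None
    children: List["FakeInvalid"] = field(default_factory=list)

    def add(self, child: "FakeInvalid", pos: Optional[int] = None) -> "FakeInvalid":
        # Attach a child failure and record its position within this parent.
        child.pos = pos
        self.children.append(child)
        return child


# root -> items -> element at position 0 -> qty ("required")
root = FakeInvalid(FakeNode(""))
items = root.add(FakeInvalid(FakeNode("items")), pos=0)
element = items.add(FakeInvalid(FakeNode("item")), pos=0)
qty = element.add(FakeInvalid(FakeNode("qty"), msg="required"), pos=0)
```

Only the leaf carries a message here; the intermediate nodes have `msg=None`, which is exactly the situation Parts 1 and 2 ask you to handle.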
### Clarifying Questions to Ask

- What is the exact expected `asdict()` output shape: dotted keys mapping to a single joined string, or to a list of messages? What separator joins multiple messages on one node?
- For a node with no message of its own but with failing children, should its key appear in the output at all, or only the children's keys?
- Is `colander.null` (the missing sentinel) distinct from `None` and `''`, and should all three be treated as "null-like" for Part 3, or only the sentinel?
- When a sequence field is empty and has **no** `default`, what should serialize produce: `[]`, the sentinel, or an error?
- Are message strings ever intentionally whitespace-only or `'0'` / `'false'`-like values that I must *not* discard?
- Can the same `Invalid` node legitimately appear more than once in the tree (shared references / cycles) such that I need to guard against double-counting or infinite recursion?

### Part 1: Error aggregation crashes on null / empty messages

Some validators produce a message of `None`, an empty string `''`, or a list that contains null/empty entries (e.g. `[None, 'required']`). The aggregation logic that combines messages for a node currently assumes every message is a non-empty string, so it raises (typically a `TypeError` from joining `None`, or it emits empty fragments like `'; '`).

Fix the aggregation so it **filters out** null/empty messages before combining them, while **preserving** every real message.

```hint Where to start
Run the failing test and look at the exact line that raises. It is almost certainly a `'; '.join(messages)` (or equivalent) where `messages` contains a `None`. Patching that one line is the symptom fix; ask *where do these messages enter the system* so you fix it once.
```

```hint Normalization boundary
Instead of guarding the join, ask what shape the join *wants*: a flat list of clean, non-empty strings. If every message value were funneled through one normalization step at the point it's read, before any combining, the join would never see a `None`, an empty string, or a nested list again. Decide what that step does with each kind of input (scalar vs. collection, present vs. absent).
```

#### What This Part Should Cover

- Locates where the bad value *first enters* (a validator returning `None`/`''`/a mixed list), not just where `join` crashes.
- Introduces a reusable normalization helper applied at one boundary rather than scattering guards across the call sites.
- Preserves every real message and has a reasoned position on what to do with a node left with zero real messages (omit its key vs. emit `''`).

### Part 2: Nested validation errors are lost or crash

Validators nest across multiple schema levels, so an `Invalid` carries `children`, each of which may itself have messages and children. The flatten / `asdict()` walk currently handles the parent and the children inconsistently: nested values such as `[None, 'must be positive']` either crash the walk or cause valid child errors to be silently dropped (e.g. when a parent has no useful message of its own).

Make `asdict()` propagate and combine inner errors correctly: no crash, no null messages in the output, and **no lost valid child errors**.

```hint Apply the same rule everywhere
The bug is asymmetry: the parent's messages and the children's messages go through different code. Use the **same** `normalize_messages` from Part 1 at every level of the recursion so a `None` deep in a child list behaves identically to a `None` at the root.
```

```hint Don't gate descent on the parent
A parent node with an empty message must still contribute its children. Build the dotted key from each node's `.name` (and `.pos` for sequence children) as you descend; only emit a key when, after normalization, that node actually has messages, but always keep walking `children`.
```

#### What This Part Should Cover

- Treats parent and child uniformly: normalization runs at every recursion level, so nested nulls behave like root-level nulls.
- Ensures `children` are visited even when a parent carries no message of its own, so failing descendants are not silently dropped.
- Builds correct dotted paths, using `.name` for mappings and `.pos` for sequence positions, with no lost or duplicated leaves.

### Part 3: Sequence serialization defaults are applied inconsistently

During **serialization**, empty and null-like sequence inputs are treated differently. An empty list `[]`, the missing sentinel, and a list containing only null-like placeholders (e.g. `[None]`, `[None, None]`) currently take different branches, so the node's `default` is applied in some cases but not others.

Make default handling uniform: `[]`, the missing sentinel, and "only null-like items" should all resolve to the same default-application path for a sequence field.

```hint Define "empty" once
The bug is that "nothing was supplied" has three surface forms here and each hits a different branch. Collapse them: what single boolean question, asked once before the item loop, would be true for *all* of the empty/null-like forms and false for a sequence with real items? Branch on that one question instead of special-casing each form. (Watch what your test does for `[]`: make sure it lands on the empty side.)
```

#### What This Part Should Cover

- Defines a single "effectively empty" predicate that unifies the sentinel, `[]`, and all-null-like lists, then branches on it once instead of special-casing each form.
- Handles the `[]` case correctly (e.g. notes that `all(...)` over an empty list is `True`) so no separate special case is needed.
- States what serialization produces when the field is empty and has **no** default, and keeps the change to the smallest correct boundary.
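One way to express the single "effectively empty" question from Part 3 is sketched below. This is an illustration under stated assumptions, not the library's serializer: `null` is a hypothetical stand-in for the missing sentinel, `is_null_like`, `is_effectively_empty`, and `serialize_sequence` are made-up names, and treating `None` and `''` as null-like is one possible answer to the clarifying question above. Returning the sentinel when no default exists is likewise an assumption.

```python
class _Null:
    """Hypothetical stand-in for the library's missing-value sentinel."""

    def __repr__(self):
        return "<null>"


null = _Null()


def is_null_like(value):
    # Assumption: None and '' count as null-like alongside the sentinel.
    return value is null or value is None or value == ""


def is_effectively_empty(value):
    """True for the sentinel, [], and lists holding only null-like items."""
    if is_null_like(value):
        return True
    if isinstance(value, (list, tuple)):
        # all() over an empty sequence is True, so [] needs no special case.
        return all(is_null_like(item) for item in value)
    return False


def serialize_sequence(value, default=null):
    # Ask the question once, before the item loop.
    if is_effectively_empty(value):
        # With no default supplied this returns the sentinel (an assumption).
        return default
    # Placeholder item serialization: real items become strings,
    # null-like items inside a non-empty list pass through as the sentinel.
    return [null if is_null_like(item) else str(item) for item in value]


fallback = ["n/a"]
results = [
    serialize_sequence([], default=fallback),
    serialize_sequence(null, default=fallback),
    serialize_sequence([None, None], default=fallback),
]
```

All three entries in `results` take the same branch and resolve to `fallback`, while a list with at least one real item skips the default entirely.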
### What a Strong Answer Covers

These dimensions span all three Parts:

- **Test-first debugging discipline**: reproduce one failing test at a time, read the assertion, then trace to where the bad value *first* appears rather than where it crashes.
- **Root-cause vs symptom**: identifies that Parts 1 and 2 share one cause (un-normalized messages) and fixes it at a boundary with a reusable helper, not with scattered guards.
- **Preserving valid data**: filtering removes only genuinely empty messages; real messages and child errors survive, and existing passing tests stay green.
- **Communication**: states assumptions out loud, asks the clarifying questions above, and explains each change before writing it.

### Follow-up Questions

- The interviewer says "your normalize helper recurses into lists. What happens if a malformed `Invalid` tree contains a cycle, and how would you make `asdict()` safe against unbounded recursion?"
- "We now want `asdict()` to return a **list** of messages per key instead of a joined string. What changes, and how do you keep it backward compatible for existing callers?"
- "How would you add regression tests that lock in these fixes so a future refactor can't silently reintroduce the null-message crash?"
- "If two different child nodes flatten to the same dotted key (e.g. duplicate sequence positions), how should their messages be merged?"
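The shared root cause for Parts 1 and 2, plus the cycle guard raised in the first follow-up, can be sketched as one helper and one walk. This is a hedged illustration, not the library's implementation: `Err` is a hypothetical stand-in for an `Invalid` node (its `name` stands in for `node.name`, and its `positional` flag is an assumed way to mark sequence parents), `flatten` is a made-up name for the `asdict()` walk, and `'; '` is an assumed separator. The helper drops only `None` and `''`, so strings such as `'0'` survive. Nodes left with zero real messages are omitted from the output, and two nodes that map to the same key have their messages merged in visit order.

```python
from typing import Dict, List


def normalize_messages(msg) -> List[str]:
    """Flatten None / str / nested lists into a list of non-empty strings."""
    if msg is None or msg == "":
        return []
    if isinstance(msg, (list, tuple)):
        out: List[str] = []
        for item in msg:
            out.extend(normalize_messages(item))
        return out
    return [str(msg)]


class Err:
    """Hypothetical stand-in for an Invalid node (not the library's class)."""

    def __init__(self, name, msg=None, positional=False):
        self.name = name              # stands in for exc.node.name
        self.msg = msg
        self.pos = None
        self.positional = positional  # True when children are keyed by pos
        self.children = []

    def add(self, child, pos=None):
        child.pos = pos
        self.children.append(child)
        return child


def flatten(exc, sep="; ") -> Dict[str, str]:
    collected: Dict[str, List[str]] = {}

    def walk(node, parent, path, active):
        if id(node) in active:        # node is its own ancestor: stop the cycle
            return
        in_sequence = parent is not None and parent.positional
        key = str(node.pos) if in_sequence else node.name
        if key:
            path = path + [key]
        msgs = normalize_messages(node.msg)   # same rule at every level
        if msgs:
            collected.setdefault(".".join(path), []).extend(msgs)
        active.add(id(node))
        for child in node.children:           # descend even when msgs is empty
            walk(child, node, path, active)
        active.discard(id(node))

    walk(exc, None, [], set())
    return {key: sep.join(msgs) for key, msgs in collected.items()}


root = Err("")
account = root.add(Err("account", msg=None))
account.add(Err("balance", msg=[None, "must be non-negative", ""]))
items = root.add(Err("items", msg="", positional=True))
first = items.add(Err("item"), pos=0)
first.add(Err("qty", msg="required"))
result = flatten(root)
```

Here `result` contains the keys `account.balance` and `items.0.qty`, even though `account`, `items`, and the sequence element carry no message of their own. Tracking only the ancestors of the current path, rather than every node ever seen, means a node shared between two parents is still reported under both paths while a true cycle is cut.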

May 14, 2026, 12:00 AM