PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

Snowflake software-engineer onsite coding question: design a compact, robust wire format to serialize and deserialize a dictionary trie. Covers DFS vs BFS encodings, delimiter vs length-prefix framing and escaping, explicit end-of-word markers, streaming, malformed/partial-input handling, forward/backward compatibility, Unicode, compression, memory limits, and O(N) complexity.

  • Medium
  • Snowflake
  • Coding & Algorithms
  • Software Engineer

Serialize and deserialize a dictionary trie

Company: Snowflake

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

##### Question Design and implement serialization and deserialization for a **trie** (prefix tree) that stores a large dictionary of lowercase English words. The trie supports shared prefixes, arbitrary branching, and per-node end-of-word marking. Your design must produce a compact, parseable wire format and be able to faithfully reconstruct the original trie. Implement the two operations: ``` serialize(root) -> bytes / string deserialize(data) -> root ``` Address the following in your design and discussion: 1. **Wire format.** Specify a compact encoding that captures the trie structure: which characters branch from each node, the shared-prefix sharing inherent to the trie, and an explicit **end-of-word marker** on nodes that terminate a word. 2. **Traversal order.** Compare a **DFS (pre-order)** encoding against a **BFS (level-order)** encoding. Discuss the trade-offs for compactness, streaming, and reconstruction. 3. **Framing.** Compare a **delimiter-based** scheme against a **length-prefix** scheme for delimiting children/nodes. Discuss **escape-character handling** if delimiters can collide with payload bytes. 4. **Robustness.** Detect and handle **malformed or partial/truncated input** gracefully (raise a clear error rather than producing a corrupt trie or crashing). 5. **Streaming.** Make the encoding **streaming-friendly** so a producer can write and a consumer can read incrementally without buffering the whole structure in memory. 6. **Evolvability.** Provide **forward/backward compatibility** so future fields (e.g. word frequency, payloads) can be added without breaking old readers/writers. 7. **Extensions.** Discuss handling **Unicode** (beyond a–z), **optional compression**, and behavior under tight **memory limits**. 8. **Complexity.** Analyze the **time and space complexity** of both serialize and deserialize. Provide pseudocode for both `serialize` and `deserialize`.

Quick Answer: Snowflake software-engineer onsite coding question: design a compact, robust wire format to serialize and deserialize a dictionary trie. Covers DFS vs BFS encodings, delimiter vs length-prefix framing and escaping, explicit end-of-word markers, streaming, malformed/partial-input handling, forward/backward compatibility, Unicode, compression, memory limits, and O(N) complexity.

A **trie** (prefix tree) stores a dictionary of words. Each node carries an end-of-word flag and a map of labeled child edges (one character per edge), so words sharing a prefix share the path to their divergence point. Design a wire format and implement `serialize(root) -> string` and `deserialize(data) -> root` so the trie can be faithfully reconstructed. **Recommended format (DFS pre-order, length-prefixed):** emit each node as `<eow_flag '0'/'1'>` then `<num_children>` (decimal digits terminated by `.`), then for each child in sorted order its edge character followed by that child's subtree. Because every node announces its child count up front, decoding is a single self-describing pass with **no delimiters and no escaping** — which also makes it streaming-friendly and robust to truncation (a bounds-checked reader raises a clear error instead of building a corrupt trie). **Your task (to make this runnable & checkable):** implement a function `solution(words)` that builds a trie from the input list `words`, serializes it, deserializes the result back into a fresh trie, and returns the **sorted list of distinct words** reconstructed from the rebuilt trie. A correct serialize/deserialize round-trip must return exactly the set of input words (deduplicated, sorted), including the empty string if present. **Example:** `words = ["cat", "car", "card", "dog"]` returns `["car", "card", "cat", "dog"]`. **Follow-ups to discuss:** DFS vs BFS encoding; delimiter vs length-prefix framing (and escape handling); detecting malformed/partial input; forward/backward compatibility via a magic+version header; Unicode (UTF-8 edges), optional compression, and bounded-memory streaming.

Constraints

  • 0 <= len(words) <= 10^4
  • Each word consists of lowercase English letters 'a'-'z' (the empty string "" is allowed and marks the root as end-of-word).
  • 0 <= len(word) <= 100
  • Duplicate words may appear in the input; the reconstructed trie stores each distinct word once.
  • The returned list must be sorted in ascending (lexicographic) order.

Examples

Input: (["cat", "car", "card", "dog"],)

Expected Output: ['car', 'card', 'cat', 'dog']

Explanation: Shared prefixes 'ca' and 'car' are stored once in the trie; the round trip recovers all four words sorted.

Input: ([],)

Expected Output: []

Explanation: Empty dictionary: the trie is just a non-terminal root with zero children; serialize/deserialize round-trips to an empty word set.

Input: (["a"],)

Expected Output: ['a']

Explanation: Single one-character word: root has one child 'a' marked end-of-word.

Input: (["", "a", "ab"],)

Expected Output: ['', 'a', 'ab']

Explanation: The empty string marks the root itself as end-of-word; the format's per-node eow flag must preserve this on the root.

Input: (["zebra", "zebra", "ze"],)

Expected Output: ['ze', 'zebra']

Explanation: Duplicate 'zebra' is stored once; 'ze' is an internal node that is also a terminal word along the path to 'zebra'.

Input: (["banana", "band", "ban", "can"],)

Expected Output: ['ban', 'banana', 'band', 'can']

Explanation: Branching at 'ban' (to 'banana' and 'band') and a separate root branch 'can' exercise arbitrary branching and shared prefixes.

Hints

  1. Model the trie as a labeled n-ary tree: each node has an end-of-word boolean and a map from edge character to child node.
  2. For a self-describing format, write each node's child count BEFORE its children. The decoder then reads exactly that many (character, subtree) pairs — no delimiters, no escaping.
  3. Pre-order DFS pairs naturally with recursion: emit the current node, then recurse into each child. Deserialize is the mirror image driven by the counts in the stream.
  4. Make every read bounds-checked. If the stream ends mid-node, a flag is out of range, or there are trailing bytes after the root subtree, raise a clear error rather than returning a half-built trie.
  5. To verify a round trip, collect words by DFS from both the original and the rebuilt trie; the deduplicated, sorted lists must match.
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a JSON Parser - Snowflake (hard)
  • Solve Matrix and Array Distance Problems - Snowflake (medium)
  • Solve Array Distance and Wiki Navigation - Snowflake (medium)
  • Implement Document Predicate APIs - Snowflake (medium)
  • Find Shortest Wiki Click Path - Snowflake (medium)