Meta Software Engineer Interview Prep Guide
Everything Meta actually asks Software Engineer candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.
Last updated

Technical Screen
Coding & Algorithms
-
Binary Tree Traversals, Vertical Order, And Views — covered in depth under Take-home Project below.
-
BST Algorithms And Lowest Common Ancestor — covered in depth under Take-home Project below.
-
Graph, Grid, BFS/DFS, And Union-Find — covered in depth under Onsite below.
-
Top-K, Heaps, Quickselect, And Frequency Analysis — covered in depth under Take-home Project below.
-
Arrays, Intervals, Sliding Windows, And Prefix Sums — covered in depth under Take-home Project below.
-
String Parsing, Palindromes, And Normalization — covered in depth under Take-home Project below.
-
Linked Lists, Pointers, Caches, And In-Memory Stores — covered in depth under Take-home Project below.
System Design
-
Leaderboards And Real-Time Ranking — covered in depth under Take-home Project below.
-
Auctions, Ticketing, And Real-Time Messaging — covered in depth under Onsite below.
Behavioral & Leadership
- Behavioral Ownership, Conflict, Ambiguity, And Growth — covered in depth under Onsite below.
Onsite
Coding & Algorithms
-
Binary Tree Traversals, Vertical Order, And Views — covered in depth under Take-home Project below.
-
BST Algorithms And Lowest Common Ancestor — covered in depth under Take-home Project below.

What's being tested
These problems test graph modeling from strings, grids, and matrices, then choosing the right traversal: BFS for shortest path, DFS for connected components/topological ordering, and memoized DFS for DAG-style dynamic programming. Interviewers are probing correctness on edge cases, complexity discipline, and clean implementation under Meta-style time pressure.
Patterns & templates
-
Grid BFS shortest path — use
deque, mark visited on enqueue, explore 4 or 8 directions;O(mn)time and space. -
Island counting — scan every cell, launch
dfs(r,c)or iterative stack on unvisited land; mutate grid or maintainvisited. -
Longest increasing path — model matrix as DAG by value order;
dfs(r,c)withmemo[r][c]givesO(mn). -
Topological sort for unknown alphabet — build directed edges from first differing chars, then use
in_degree+ queue; detect cycles. -
Custom lexicographic validation — map
char -> rank, compare adjacent words only; handle prefix invalid case like"abc"before"ab". -
Union-Find alternative for components —
find,union, path compression, union by rank; useful when connections are streamed or repeated. -
Sparse representation — for sparse dot product, store
{index: value}or sorted pairs; iterate over smaller map forO(min(k1,k2)).
Common pitfalls
Pitfall: Marking grid nodes visited only when popped from BFS can enqueue the same cell many times and distort shortest-path logic.
Pitfall: In alien dictionary, adding constraints from every character position is wrong; only the first differing character between adjacent words matters.
Pitfall: Recursive DFS can hit recursion limits on large grids; mention iterative stack or recursion-limit handling if dimensions are large.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
-
Top-K, Heaps, Quickselect, And Frequency Analysis — covered in depth under Take-home Project below.
-
Arrays, Intervals, Sliding Windows, And Prefix Sums — covered in depth under Take-home Project below.
-
String Parsing, Palindromes, And Normalization — covered in depth under Take-home Project below.
-
Linked Lists, Pointers, Caches, And In-Memory Stores — covered in depth under Take-home Project below.
System Design
- Leaderboards And Real-Time Ranking — covered in depth under Take-home Project below.
What's being tested
These prompts test whether you can design low-latency, high-concurrency distributed systems where many users observe or mutate the same shared object in real time. For auctions and ticketing, the hard part is preserving correctness under contention: one winning bid, one ticket allocation, no oversells, deterministic ordering, and recoverable state transitions. For messaging, the emphasis shifts toward delivery semantics, fanout, presence, and multi-device synchronization. Meta cares because many product surfaces combine real-time updates, massive fanout, partial failures, abuse pressure, and user-visible latency; a strong SWE should reason clearly about APIs, storage, sharding, consistency, and operational tradeoffs.
Core knowledge
-
Clarify the consistency boundary early. An auction’s highest bid or a ticket seat allocation usually needs strong consistency per auction, listing, or event; global strong consistency is unnecessary. The common design is “single-writer per hot entity” using a shard leader, partition lock, or transactional row update.
-
Model the write path around invariants, not just entities. For auctions:
Auction(id, seller_id, status, end_time, current_price, version),Bid(id, auction_id, bidder_id, amount, ts, status). The invariant is: accepted bid amount must exceed current price and auction must be open. Useversionor compare-and-swap. -
Optimistic concurrency control works well for moderate contention:
UPDATE auctions SET current_price=?, version=version+1 WHERE id=? AND version=? AND status='OPEN'. If the update count is zero, retry or reject. Under extreme contention, move to a per-auction serialized command queue. -
Partition by contention domain. For auctions, shard by
auction_id; for chat, shard byconversation_id; for tickets, shard byevent_idorsection_id. This keeps ordering local. A single partition can handle thousands to tens of thousands of ops/sec, but celebrity auctions may need special handling. -
Ordering is scoped, not universal. Use a monotonically increasing sequence number per auction or conversation, e.g.
bid_seqormessage_seq. Wall-clock timestamps are useful for display but unsafe for correctness because clocks skew and retries reorder requests. -
Real-time push typically uses WebSockets, Server-Sent Events, or mobile push notifications.
WebSocketssupport bidirectional low-latency updates; SSE is simpler for server-to-client streams. For disconnected clients, store durable events and resume fromlast_seen_seq. -
Delivery semantics should be explicit. Most real systems provide at-least-once delivery plus client-side deduplication using
message_id,bid_id, or an idempotency key. Exactly-once end-to-end is usually too expensive; exactly-once effects are achieved by idempotent writes and sequence checks. -
Fanout strategy depends on group size. For small chats or auction watchers, fan out updates directly to connected sessions. For large rooms or popular auctions, publish once to a broker such as
Kafka,Pulsar, orRedis Streams, then have gateway nodes pull and push to local connections. -
Hot keys are the central scaling risk. A viral auction, concert ticket drop, or large group chat can overload one shard. Mitigations include queueing, rate limits, per-user throttles, read replicas for state snapshots, batching updates, or subdividing inventory by section/block when correctness allows.
-
End-of-auction logic must be deterministic. Do not rely on a best-effort timer alone. Store
end_time, reject bids with server receive time after close, and run a closing job that transitionsOPEN -> CLOSING -> CLOSED. Late bids should have clear semantics: rejected, grace-period accepted, or anti-sniping extension. -
Ticketing introduces reservation leases. A seat can move
AVAILABLE -> HELD -> PURCHASEDwith a TTL, often 5–10 minutes. Use transactional updates or conditional writes to prevent oversell, and have an expiration worker release abandoned holds. Payment completion must be idempotent. -
Backpressure and degradation are design features. Track
p50,p95,p99latency, queue depth, droppedWebSocketconnections, retry rate, and accepted/rejected bid counts. Under overload, degrade by slowing reads, batching notifications, or rejecting low-priority requests before corrupting state.
Worked example
For Design an online auction platform, start by asking: are bids ascending only, is there a fixed end time, do we support automatic proxy bidding, what is the target scale, and what latency is expected for bid visibility? A strong assumption set might be: millions of users, thousands of active auctions, p99 bid acceptance under 200 ms, strong correctness per auction, and eventual consistency for watcher notifications.
Organize the answer around four pillars: data model, bid acceptance path, real-time propagation, and failure handling. The write path could route POST /auctions/{id}/bids through an API service to the shard owning auction_id, where a transaction checks auction state, compares amount against current_price, writes a durable bid row, increments bid_seq, and updates the auction snapshot. After commit, the service publishes BidAccepted(auction_id, bid_seq, amount) to a stream consumed by WebSocket gateway nodes.
The key tradeoff to call out is optimistic concurrency versus serialized command processing. Optimistic updates are simple and fast when contention is low, but a celebrity auction may cause heavy retries and tail latency; a per-auction queue or actor model serializes commands and gives deterministic ordering at the cost of queueing delay. Close by saying that, with more time, you would cover anti-abuse rate limits, seller/buyer trust controls, payment escrow, and observability around hot auctions and delayed fanout.
A second angle
For Design a real-time messenger, the same real-time infrastructure appears, but the correctness boundary changes from “one highest bid” to “ordered message history per conversation.” Instead of rejecting conflicting writes, the system assigns each accepted message a per-conversation message_seq and persists it before fanout. Delivery can be at-least-once, because duplicate messages can be removed by message_id; auctions cannot tolerate duplicate accepted bids changing state. Messaging also needs multi-device sync, read receipts, offline storage, and presence, while auctions focus more on contention control, close-time semantics, and state transitions.
Common pitfalls
Pitfall: Designing the system as if
Kafkaordering solves bid correctness.
A broker can preserve order within a partition, but it does not by itself enforce “bid must be higher than current price” or “auction must still be open.” Correctness belongs in the state transition layer: a transaction, compare-and-swap, single-writer actor, or conditional write around the authoritative auction record.
Pitfall: Jumping straight into components without defining invariants.
A weak answer lists load balancer, cache, database, queue, and WebSocket but never says what must never happen. A stronger answer states invariants first: no accepted bid below current price, no bids after close, no ticket oversell, monotonic sequence numbers, idempotent retries, and recoverable event history.
Pitfall: Treating real-time push as the source of truth.
WebSocket messages are a notification path, not durable state. Clients should be able to reconnect with last_seen_seq, fetch missed events from storage, and reconcile their local view with the authoritative auction, ticket, or conversation state.
Connections
Interviewers may pivot from here into rate limiting, idempotency, distributed locking, event sourcing, leader election, or cache invalidation. Ticketing variants often test the same ideas through inventory reservations and payment idempotency, while messaging variants probe delivery semantics, offline replay, and fanout at very large scale.
Further reading
-
Designing Data-Intensive Applications — Martin Kleppmann’s book is the best single source for replication, partitioning, transactions, streams, and consistency tradeoffs.
-
The Log: What every software engineer should know about real-time data’s unifying abstraction — Jay Kreps’ essay explains why ordered logs are so useful for replay, fanout, and recovery.
-
RFC 6455: The WebSocket Protocol — useful background on the protocol commonly used for bid updates, chat delivery, and real-time client sessions.
Practice questions
Behavioral & Leadership
What's being tested
Meta behavioral interviews for Software Engineers probe whether you can create forward progress when ownership is unclear, priorities shift, or people disagree. The interviewer is not looking for “I worked hard”; they are testing judgment under ambiguity, technical ownership, conflict resolution, and whether you can turn failure or feedback into better engineering behavior. Strong answers show how you clarified constraints, made tradeoffs explicit, influenced without authority, protected system quality, and learned in a way that changed future execution. Meta cares because engineers often operate in large, fast-moving codebases where cross-team dependencies, legacy systems, launch pressure, and incomplete requirements are normal.
Core knowledge
-
STAR-L framing is the baseline: Situation, Task, Action, Result, Learning. For Meta, the “Action” section should be the longest and include concrete engineering moves: design review, rollout plan, debugging path, ownership boundary, or tradeoff analysis.
-
Ownership means driving the outcome, not personally doing every task. A strong SWE story distinguishes between what you owned directly—API design, migration plan, service reliability, code review quality—and what you influenced through partner teams, escalation, or written alignment.
-
Ambiguity management starts by reducing unknowns into decisions. Good clarifying questions include: “What is the user-visible failure?”, “What is the deadline tied to?”, “What correctness or latency bar matters?”, “Who owns upstream/downstream systems?”, and “What happens if we do nothing?”
-
Conflict resolution should be anchored in technical evidence, not personality. Instead of “the other engineer was wrong,” explain the competing proposals, such as synchronous RPC versus async queue, monolith patch versus service boundary, or fast launch versus reliability hardening, then show how you evaluated risks.
-
Tradeoff language is critical. For example, “I chose a feature-flagged incremental rollout over a full rewrite because it reduced blast radius, even though it temporarily increased code complexity.” Meta interviewers listen for explicit cost-benefit reasoning, not hindsight-perfect decisions.
-
Escalation is not failure when used responsibly. A mature engineer first aligns facts, documents options, and seeks peer review; escalation to a tech lead or manager is appropriate when impact, deadline, or ownership remains unresolved after reasonable attempts.
-
Cross-team collaboration often involves dependency risk. Strong examples mention interface contracts, API versioning, rollout sequencing, backwards compatibility, migration ownership, and fallback behavior. Avoid drifting into program management; keep the focus on engineering decisions and execution.
-
Failure stories should include a real mistake and a changed behavior. “The launch slipped because another team was late” is weak. Better: “I underestimated integration risk, so I now create an integration milestone before feature-complete and add contract tests around shared APIs.”
-
Feedback receptivity is judged by behavior change, not agreement. If leadership disagreed with your approach, show how you separated intent from wording, asked for concrete examples, tested the feedback against outcomes, and adjusted your communication or technical approach.
-
Impact measurement for SWE behavioral answers should use engineering and user-facing signals:
`p95`latency, error rate, crash-free sessions, on-call pages, deployment frequency, rollback rate, migration completion, code review cycle time, or reduction in manual operations. -
Bias toward action does not mean reckless execution. A good Meta-style answer shows you made a reversible decision when possible, used feature flags or staged rollout, instrumented key paths, and set a checkpoint to revisit the decision with fresh data.
-
Learning loop should be operationalized. Mention artifacts like postmortems, runbooks, design docs, regression tests, dashboards, code ownership updates, or review checklists. The strongest endings show how the lesson prevented a later bug, outage, delay, or misalignment.
Worked example
For “Describe an ambiguous project you handled,” a strong candidate would frame the first 30 seconds by stating the context, the unknowns, and the stakes: “I was asked to improve reliability for a service with rising `p99` latency, but there was no clear owner, no agreed target, and several suspected causes.” They would clarify assumptions: whether the ambiguity was technical, organizational, or requirement-driven; what success metric mattered; and what constraints existed around launch dates or backward compatibility. The answer skeleton should have four pillars: first, how they scoped the ambiguity; second, how they gathered evidence; third, how they aligned stakeholders; fourth, how they delivered incrementally.
A strong story might describe creating a short design note that listed candidate causes, owners, risks, and a proposed phased plan. The candidate could explain that they compared two options: a broad rewrite of the request path versus a narrower fix around a known cache-miss pattern. The explicit tradeoff: the rewrite might have produced a cleaner architecture, but the narrower fix was safer because it could be guarded by a feature flag, rolled out to 5%, then 25%, then 100%, and measured using `p95`/`p99` latency and error rate. They should also mention how they communicated uncertainty: “I told the team this would not solve every latency issue, but it would validate whether this path contributed materially to tail latency.” A good close would be: “If I had more time, I would have added earlier load testing and documented ownership for the dependency so the next incident had a clear responder.”
A second angle
For “Share different perspective from leadership feedback,” the same ownership principle applies, but the emphasis shifts from ambiguity in the work to ambiguity in interpersonal interpretation. The candidate should avoid sounding defensive; the goal is to show they could disagree with feedback while still extracting a useful signal. For example, leadership might say, “You are moving too slowly,” while the engineer believes they were protecting reliability during a risky migration. A strong answer would separate the facts from the label: they could ask what signals created that perception, show the risk analysis behind their rollout plan, and then adjust by communicating milestones more visibly. The lesson is not “I convinced leadership I was right”; it is “I learned that good technical judgment still needs proactive communication.”
Common pitfalls
Pitfall: Giving a generic teamwork story with no engineering substance.
A weak answer says, “I scheduled meetings, listened to everyone, and we compromised.” That sounds cooperative but not senior enough for a SWE interview. A better answer names the technical disagreement, such as data consistency versus availability, client-side versus server-side validation, or short-term patch versus long-term migration, then explains how you resolved it.
Pitfall: Presenting conflict as a personality problem.
Avoid “the other team was difficult,” “my manager did not understand,” or “leadership kept changing requirements.” Interviewers want evidence that you can operate inside messy organizations without blame. Reframe the issue as misaligned incentives, unclear ownership, insufficient data, or different risk tolerance, then show how you created alignment.
Pitfall: Ending with impact but no learning.
Many candidates finish with “we launched successfully” or “the bug was fixed.” For ownership and growth questions, that is incomplete. Add what changed afterward: a new pre-launch checklist, better design review practice, improved alerting, earlier dependency mapping, or a personal habit like writing decision logs for ambiguous projects.
Connections
Interviewers may pivot from this area into system design tradeoffs, especially if your story involves reliability, migrations, APIs, or scaling. They may also ask follow-ups on debugging, incident response, code quality, technical influence, or prioritization under constraints, so prepare stories that can go one layer deeper technically.
Further reading
-
Crucial Conversations — Practical framework for handling disagreement without avoiding the real issue.
-
The Staff Engineer’s Path — Strong treatment of technical leadership, influence without authority, and operating through ambiguity.
-
Google SRE Book: Postmortem Culture — Useful model for discussing failure, accountability, and learning without blame.
Practice questions
Take-home Project
Coding & Algorithms

What's being tested
These problems test binary tree traversal with positional state: tracking depth for side views, column index for vertical order, and sometimes row/order for tie-breaking. Interviewers are probing whether you can choose BFS vs DFS, preserve required ordering, handle empty/single-node trees, and explain O(n) or sorting-related complexity clearly.
Patterns & templates
-
Right side view via BFS — process level by level; append the last node seen per level;
O(n)time,O(w)space. -
Left and right views in one pass — during level-order traversal, record first and last node per level; avoid two separate traversals.
-
DFS depth tracking — visit right-first for right view or left-first for left view; record first value at each depth;
O(h)recursion space. -
Vertical order traversal — assign root column
0, leftcol - 1, rightcol + 1; group values by column in adict. -
BFS for vertical order tie-breaking — when ties are by breadth-first visitation order, use a queue of
(node, col)instead of DFS. -
Column output ordering — track
min_colandmax_colduring traversal forO(k)ordered output, or sort column keys forO(k log k). -
BST to doubly linked list — use in-order traversal to relink
leftasprevandrightasnext; preserve sorted order.
Common pitfalls
Pitfall: Using DFS for vertical order when the expected tie-break is BFS order; this can silently produce the wrong sequence within a column.
Pitfall: Appending every node at a depth for right view instead of only the last visible node per level.
Pitfall: Forgetting that recursion depth can be
O(n)on a skewed tree; call this out or use iterative traversal if stack overflow matters.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
BST algorithms test whether you exploit ordering instead of treating the tree as an arbitrary binary tree. Expect recursive/iterative traversal, subtree aggregation, in-order ordering, range pruning, pointer rewiring, and sometimes lowest common ancestor via split-point logic.
Patterns & templates
-
In-order traversal gives sorted order in a BST; use for
convertBSTToDoublyList, validation, kth element, and sorted accumulation. -
Post-order aggregation returns
(sum, count)or richer tuples from children; ideal for subtree-average checks inO(n)time. -
Range pruning for
rangeSumBST(root, low, high)skips left whennode.val < lowand skips right whennode.val > high. -
Iterative DFS stack avoids recursion-depth failures on skewed trees; same
O(h)space average,O(n)worst-case. -
BST LCA split point: if both targets are less, go left; if both greater, go right; otherwise current node is LCA in
O(h). -
Preorder BST construction uses bounds or a monotonic index; each value consumed once,
O(n)time,O(h)recursion stack. -
In-place pointer threading for doubly list conversion tracks
prevandhead; carefully setleft/rightasprev/next.
Common pitfalls
Pitfall: Doing full traversal for range sum when BST ordering allows pruning; interviewers expect the optimized branch-skipping version.
Pitfall: Returning only a subtree sum for average checks; you need both
sumandcount, and integer division semantics must be clarified.
Pitfall: Forgetting skewed-tree behavior: recursion can hit
O(n)depth, and “balanced tree” should not be assumed unless stated.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
Top-K selection tests whether you can avoid unnecessary full sorting when only a small subset or order statistic is needed. Interviewers probe your command of heaps, quickselect, frequency maps, and complexity tradeoffs under constraints like duplicates, ties, and large n.
Patterns & templates
-
K closest points — compare squared distance
x*x + y*y; use full sortO(n log n), max-heapO(n log k), or quickselect averageO(n). -
K-th largest element — convert to index
n - kin sorted ascending order; implementquickselect(nums, left, right, target)carefully. -
Min-heap for K largest — push elements, pop when size exceeds
k; final heap contains answer set inO(n log k)time. -
Max-heap for K smallest / closest — store negative priority or custom comparator; cap heap size at
kto avoidO(n)heap growth. -
Frequency analysis — build
Counter/ hashmap inO(n), then select topkby heap, bucket sort, or quickselect over unique keys. -
Bucket sort for frequencies — array of
n + 1buckets givesO(n)time for top frequent elements; space isO(n). -
Quickselect partitioning — average
O(n), worst-caseO(n^2); randomize pivot and be precise about<,>, and equal values.
Common pitfalls
Pitfall: Sorting everything by default is correct but may miss the expected optimization when the interviewer asks for
O(n)orO(n log k).
Pitfall: For K-th largest, confusing
kwith zero-based index causes off-by-one errors; usetarget = len(nums) - k.
Pitfall: Returning heap contents without considering order is usually fine for “top K elements,” but not for “sorted top K”; clarify output requirements.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
Meta interviewers are probing linear-time array reasoning: transforming arrays without extra passes, counting contiguous ranges with prefix sums, and maintaining interval/window invariants under edge cases. You need to state constraints, choose the right template quickly, and defend O(n) or O(n log n) tradeoffs.
Patterns & templates
-
Prefix/suffix products for
productExceptSelf— two passes,O(n)time,O(1)extra space excluding output; handle one or multiple zeros. -
Prefix sum + hashmap for target-sum subarrays — maintain
count[prefix - k]; initialize{0: 1}; works with negatives unlike sliding window. -
Range-sum precomputation — build
prefix[i + 1] = prefix[i] + nums[i]; answersum(l,r)asprefix[r+1] - prefix[l]inO(1). -
Sliding window for longest transformable consecutive segment — expand right, track violation budget, shrink left until valid; usually
O(n). -
Interval gap scan for missing ranges — track previous boundary, compare
prev + 1tocurr - 1; guard empty input and integer limits. -
Order statistics for k-th largest — use min-heap size
kforO(n log k)or Quickselect averageO(n); clarify mutation.
Common pitfalls
Pitfall: Using sliding window for target-sum subarrays when numbers can be negative; use prefix-sum counts instead.
Pitfall: Forgetting boundary sentinels in missing ranges, especially empty arrays,
lower,upper, and 32-bit overflow aroundINT_MIN/INT_MAX.
Pitfall: Claiming
O(1)space forproductExceptSelfwhile allocating separate left and right arrays; only the output array may be excluded.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
String parsing and palindrome reasoning are tested through two-pointer scans, dynamic programming, center expansion, and careful character normalization. Interviewers are probing whether you can turn ambiguous text rules into correct code with explicit complexity, clean edge-case handling, and readable implementation.
Patterns & templates
-
Two-pointer palindrome check —
isPal(l, r)runs inO(n)time,O(1)space; compare inward after normalization. -
One-deletion near-palindrome — on first mismatch, test
isPal(l+1, r)orisPal(l, r-1); avoid branching recursively. -
Palindrome permutation — track odd character counts with
Counteror a bitmask; valid iff odd count is<= 1. -
Center expansion for substrings — expand around
2n-1centers;O(n^2)time,O(1)space; count each successful expansion. -
K-deletion palindrome DP — compute longest palindromic subsequence, answer
n - LPS <= k;O(n^2)time, optimizable toO(n)space. -
Decimal string addition — scan right-to-left with carry; never cast to integer if arbitrary precision is required.
-
Expression parsing without stack — maintain
result,last_term,num, andop; handle precedence by adjusting the previous term.
Common pitfalls
Pitfall: Ignoring normalization rules: clarify case-folding, whitespace, punctuation, and Unicode before coding palindrome checks.
Pitfall: For near-palindromes, deleting repeatedly turns a one-deletion problem into exponential search; only branch once at the first mismatch.
Pitfall: In expression evaluation, integer division semantics vary; state whether truncation is toward zero, floor, or language default.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
These problems test pointer manipulation, hash map + linked list composition, and stateful in-memory data modeling under strict complexity guarantees. Interviewers are probing whether you can preserve invariants, reason about edge cases, and explain tradeoffs like O(1) average access versus extra memory.
Patterns & templates
-
Hash map deep copy for
copyRandomList— map original nodes to clones inO(n)time/space; handlenullrandom pointers cleanly. -
Interleaved linked-list cloning — weave clone nodes into the original list for
O(n)time andO(1)auxiliary space; restore original links. -
Character stream comparison across linked string chunks — implement
nextChar()iterators; compare lazily inO(total chars)time. -
LRU cache template — combine
dict[key] -> nodewith a doubly linked list;getandputmust both move nodes to front. -
TTL/versioned store design — store per key-field a sorted history of
(timestamp, value, expiry); use binary search for historical reads. -
Top-k selection for closest points — use max-heap size
kforO(n log k)or Quickselect averageO(n); avoid square roots. -
Invariant-first coding — define sentinel
head/tail, node ownership, expiration semantics, and tie-breaking before writing update logic.
Common pitfalls
Pitfall: Updating cache values without refreshing recency breaks LRU semantics; every successful
getand existing-keyputshould move the node.
Pitfall: Deep-copying only
nextpointers creates sharedrandomreferences; verify cloned nodes never point back into the original structure.
Pitfall: For TTL history, confusing “current value expired” with “historical value never existed” leads to incorrect
getAt(timestamp)behavior.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
System Design

What's being tested
Interviewers are probing whether you can design a low-latency ranked data system under frequent updates, large fanout reads, and correctness constraints. Strong answers combine data structures like heaps, balanced trees, and skip lists with distributed-system choices around sharding, caching, consistency, and failure recovery. Meta cares because ranking-like systems appear in feeds, games, payments, creator dashboards, messaging metadata, and abuse systems where users expect fast, fresh, and explainable ordering. The interviewer is not looking for one magic database; they want to see you identify query patterns, pick the right ranking representation, and reason through scale, ties, time windows, and real-time updates.
Core knowledge
-
Leaderboard query patterns determine the design. Top- queries, “rank of user,” “users around me,” percentile buckets, and time-windowed rankings need different indexes. A system optimized only for
GET top 100may fail badly onGET rank(user_id)at scale. -
Sorted sets are the canonical single-node structure.
RedisZSETuses a hash table plus skip list, supportingZADD,ZREVRANGE, andZREVRANKin roughly updates and range reads. This is excellent up to millions of members per shard. -
Tie-breaking must be explicit and stable. Common ordering is
(score DESC, updated_at ASC, user_id ASC)or(amount DESC, user_id ASC). If two users have equal scores and the API does not define ties, pagination becomes inconsistent and users may see rank flicker. -
Write path usually separates durable events from serving state. A score update can be written to a durable store like
MySQL,Postgres,Cassandra, orDynamoDB, then applied to a serving index such asRedis. For real-time systems, the serving index is optimized for reads, not treated as the only source of truth. -
Read path should match latency goals. For a game leaderboard with
p99 < 100ms, serve top- and neighbor queries from memory-backed indexes. Durable stores are better for history, reconciliation, and rebuilding, not for every rank query. -
Sharding rankings is hard because rank is global. Sharding by
user_idgives balanced writes but makes global top- require merging per-shard top lists. Sharding by score range helps range queries but creates hot shards near the top. A common approach is per-shard top- plus a merger service for global results. -
Top- can be maintained cheaply with a min-heap. For streaming “maintain top 100 payers,” keep a hash map
payer_id -> totaland a heap ordered by(total, tie_breaker). Updates are with lazy deletion or indexed heaps; full exact ranking still requires an ordered structure over all users. -
Rank computation can be exact or approximate. Exact rank requires knowing how many users have greater score: . Approximate rank can use histograms, quantile sketches, or bucketed score ranges when exact neighbor ordering is unnecessary.
-
Time windows multiply storage requirements. “Daily,” “weekly,” and “all-time” leaderboards are often separate indexes:
game_id:daily:2026-05-23,game_id:weekly:2026-W21, andgame_id:alltime. Sliding windows are harder than calendar windows because old events must expire or be subtracted. -
Idempotency matters for score updates. Retried client requests, duplicate transaction events, or replayed scheduled withdrawals can double-count unless each mutation has an idempotency key such as
event_id. Store applied event IDs or make updates derived from authoritative totals instead of blindly incrementing. -
Consistency tradeoffs should be named. Many leaderboards accept eventual consistency of a few seconds, but financial rankings or payment totals may require read-after-write correctness. State whether users need immediate rank visibility, monotonic score updates, or merely eventual convergence.
-
Operational recovery requires rebuildability. If
Redisloses state or a shard is corrupted, you need to reconstruct rankings from durable score snapshots plus event logs. A good design includes periodic snapshots, replay from a known offset, and comparison jobs to detect divergence.
Worked example
For Design a real-time game leaderboard, a strong candidate first clarifies: “Are rankings per game, per region, or global? What queries are required: top 100, my rank, friends around me, or historical seasons? What are update and read rates, acceptable staleness, and tie rules?” Then they declare reasonable assumptions, for example: 100M players, 1M daily active players per game, score updates during matches, p99 read latency under 100ms, and eventual consistency under 2 seconds acceptable.
The answer can be organized around four pillars: API design, data model, serving architecture, and scaling/failure handling. APIs might include POST /scores, GET /leaderboards/{game_id}/top?limit=100, GET /leaderboards/{game_id}/users/{user_id}/rank, and GET /leaderboards/{game_id}/users/{user_id}/neighbors. The data model stores authoritative player scores in a durable table keyed by (game_id, season_id, user_id) and keeps a serving index in Redis ZSET keyed by (game_id, season_id, region).
For scaling, the candidate should discuss per-partition leaderboards and a fan-in aggregator: each shard maintains top-, then a merger computes global top- by k-way merge. For rank(user), exact global rank across shards is expensive unless each shard can answer “count scores greater than X,” so the candidate should either propose ordered indexes per shard or state an approximation. A key tradeoff to flag is freshness versus global exactness: exact rank on every write may add cross-shard coordination, while eventual global rank gives much lower latency and higher availability.
A strong close would be: “If I had more time, I’d cover anti-cheat score validation, season rollover, backfills from durable logs, cache warmup after failover, and observability such as update lag, rank-query p99, and shard hotness.”
A second angle
For Maintain top N payers, the same ranking concept appears, but the constraints are more algorithmic and correctness-heavy. Instead of designing a full user-facing leaderboard with neighbor queries, you can focus on maintaining an aggregate payer_id -> cumulative_amount and returning the top payers after each transaction. If is small, a hash map plus min-heap is better than globally sorting after every payment; if arbitrary ranks are needed, switch to a balanced tree or sorted-set representation. Payment events also raise stricter idempotency and correctness requirements than games: duplicate transaction processing or inconsistent tie-breaking can create visibly wrong financial results.
Common pitfalls
Pitfall: Treating the problem as “just use
Redis” without matching data structure to query shape.
Redis ZSET is a strong component, not a complete design. If the interviewer asks for global rank, time windows, durability, or cross-region behavior, you need to explain how the sorted set is populated, partitioned, rebuilt, and reconciled.
Pitfall: Ignoring exactness and tie semantics.
A tempting answer is “sort by score and return the top users,” but real systems need deterministic ordering. Define whether equal scores share a rank, use dense ranking, or use unique positions; then carry that rule through storage, pagination, and API responses.
Pitfall: Over-indexing every possible leaderboard too early.
Maintaining all-time, daily, weekly, regional, friends-only, clan, and percentile leaderboards on every update can explode write amplification. A better answer starts with the core access patterns, precomputes the hot ones, and computes rare views asynchronously or from durable aggregates.
Connections
Interviewers may pivot from leaderboards into caching strategy, event-driven architecture, distributed counters, rate limiting, or pagination consistency. They may also ask for an in-memory version, where the focus shifts to heaps, balanced trees, hash maps, and complexity analysis rather than distributed storage.
Further reading
-
Redis Sorted Sets documentation — practical commands and complexity for the most common leaderboard primitive.
-
Designing Data-Intensive Applications — deeper treatment of replication, partitioning, consistency, and stream-derived materialized views.
-
The Log: What every software engineer should know about real-time data's unifying abstraction — useful mental model for rebuilding serving indexes from durable event history.
Practice questions