PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/Software Engineering Fundamentals/Google

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Last updated: Jul 1, 2026

Quick Overview

This question evaluates a candidate's understanding of concurrency primitives, specifically the internal mechanics of future/promise abstractions such as thread-pool execution, completion state machines, and thread-safe callback registration. It tests practical software engineering skill by requiring a non-blocking fan-out/fan-in implementation for parallel array processing built on that primitive, commonly used to assess depth of multi-threading knowledge beyond basic API usage.

  • Google
  • Software Engineering Fundamentals
  • Software Engineer

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Company: Google

Role: Software Engineer

Category: Software Engineering Fundamentals

Interview Round: Technical Screen

You are asked to design and implement, from scratch, a simplified asynchronous "future / promise" abstraction in the spirit of Java's `CompletableFuture` — call it `AsyncFuture<T>`. Go deep into the underlying mechanics, not just the public API: the thread-pool execution model, the completion state machine, how callbacks behave when they are registered *before* versus *after* completion, and how you guarantee thread safety and memory visibility across threads. Then use that primitive to build a parallel data-processing routine that splits a large input array into chunks, processes the chunks concurrently on the same primitive, and combines the per-chunk results into a single final result — without blocking a worker thread. The interviewer is probing two things: (1) your grasp of the multi-threading architecture and concurrency-control scheme behind a future, and (2) your ability to write correct array-splitting / fan-out-fan-in code on top of it. ### Constraints & Assumptions - Language is Java. You may **not** use `java.util.concurrent.CompletableFuture` itself, but you may use lower-level primitives (`ExecutorService`, `ReentrantLock`/`Condition`, `synchronized`, atomics, queues). - Single JVM, in-memory; no distributed concerns. - A future is completed **at most once** (either with a value or with a throwable). Every registered callback must run **exactly once**, regardless of whether it was registered before or after completion. - Composition must be **non-blocking**: transforming or combining futures must not block a pool thread waiting on `get()`. - The array-processing reduce operation is **associative**; element order is not significant to the final reduced value. - Input arrays can be large (think millions of elements); the per-chunk work may be CPU-bound or IO-bound. - The implementation must be correct under concurrent callback registration and completion from multiple threads. ### Clarifying Questions to Ask - Do we need cancellation and timeouts, or only completion plus composition (`thenApply` / `thenCompose` / combine)? - On which thread should callbacks run — the thread that completes the future, the pool, or a caller-supplied executor? Are there ordering guarantees among multiple callbacks on the same future? - Is the chunk reduce operation guaranteed associative, and does element order matter for the final result? - What is the expected input size, and is the per-chunk work CPU-bound or IO-bound (this drives pool sizing)? - If one chunk fails, should the whole computation fail fast, or should we collect partial results? - May I rely on JDK concurrency utilities (`ExecutorService`, `ReentrantLock`, atomics), or must this be built from raw threads and `synchronized` only? ### Part 1 — The core async future primitive Design and implement `AsyncFuture<T>` with at least: a static factory `supplyAsync(Supplier<T>, Executor)` that runs the supplier on the pool and completes the future; the composition operators `thenApply(Function<T,U>)`, `thenCompose(Function<T, AsyncFuture<U>>)`, and `thenCombine(AsyncFuture<U>, BiFunction<T,U,R>)`; a blocking `get()`; and exception propagation (`completeExceptionally`, with errors flowing through the composition chain). Explain the completion state machine and how you resolve the race between registering a callback and the future completing. ```hint State machine Model exactly two observable phases — PENDING and COMPLETED — backed by a single result slot that holds *either* a value *or* a throwable. The transition to COMPLETED must happen exactly once; gate it with a lock (or a CAS) so a second `complete*` call is a no-op. ``` ```hint The core race A callback may be added before *or* after completion. Resolve it under the same lock that guards the state: if still PENDING, enqueue the callback; if already COMPLETED, run it immediately with the stored result. Never lose a callback and never run one twice. Run user callbacks *outside* the lock to avoid holding it during arbitrary code. ``` ```hint Composition `thenApply` creates a fresh downstream `AsyncFuture`, then registers a callback on `this` that — on success — applies the function and completes the downstream, and on failure propagates the throwable. `thenCompose` does the same but the function returns a future, so you must *flatten*: register a second callback on the inner future to complete the downstream. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 2 — Array-splitting parallel processing Using `AsyncFuture`, implement a routine `parallelProcess(int[] data, int chunkSize, Function<int[], Long> mapChunk, BinaryOperator<Long> reduce, Executor pool)` that splits `data` into chunks of size `chunkSize`, processes each chunk asynchronously (one `supplyAsync` per chunk), and asynchronously folds the per-chunk results into a single `AsyncFuture<Long>`. Walk through how you split, dispatch, and combine, and how you choose `chunkSize` relative to the pool size. ```hint Splitting Compute chunk boundaries by index: chunk `start` covers `[start, min(start + chunkSize, n))`. Guard the last partial chunk, the empty-input case, and `chunkSize >= n`. Decide whether to copy each slice (self-contained tasks) or pass offsets to avoid copying. ``` ```hint Combine without blocking Launch one `supplyAsync` per chunk to get a `List<AsyncFuture<Long>>`, then fold that list with your `thenCombine` + `reduce` so the aggregate future completes only when every chunk is done. Do **not** call `get()` inside a pool task — a blocked worker waiting on work that needs another worker can starve or deadlock the pool. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### What a Strong Answer Covers ```premium-lock What a Strong Answer Covers ``` ### Follow-up Questions - How would you add cancellation and timeouts? What should happen to a callback that is already mid-execution when cancellation arrives? - Your callbacks currently run on whichever thread completes the future. What are the risks (deep recursion through a long chain, a slow callback blocking the completer), and how would you offload callbacks to an executor? - How exactly does `thenCompose` differ from `thenApply` when the function returns a future, and why does flattening matter for correctness? - How would you implement `allOf` over N futures using a single atomic counter instead of nesting `thenCombine` calls pairwise, and how does that change the critical-path depth?

Quick Answer: This question evaluates a candidate's understanding of concurrency primitives, specifically the internal mechanics of future/promise abstractions such as thread-pool execution, completion state machines, and thread-safe callback registration. It tests practical software engineering skill by requiring a non-blocking fan-out/fan-in implementation for parallel array processing built on that primitive, commonly used to assess depth of multi-threading knowledge beyond basic API usage.

Related Interview Questions

  • Process Sharded Login Logs - Google (medium)
  • Design an ads retrieval service using a heap - Google (easy)
  • Design a waitlist manager - Google (easy)
  • Design an editable sequence with marker - Google (medium)
  • Design a Dormitory Room-Assignment System (OOD) - Google (medium)
|Home/Software Engineering Fundamentals/Google

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Google logo
Google
Jun 29, 2026, 12:00 AM
Software EngineerTechnical ScreenSoftware Engineering Fundamentals
1
0

You are asked to design and implement, from scratch, a simplified asynchronous "future / promise" abstraction in the spirit of Java's CompletableFuture — call it AsyncFuture<T>. Go deep into the underlying mechanics, not just the public API: the thread-pool execution model, the completion state machine, how callbacks behave when they are registered before versus after completion, and how you guarantee thread safety and memory visibility across threads.

Then use that primitive to build a parallel data-processing routine that splits a large input array into chunks, processes the chunks concurrently on the same primitive, and combines the per-chunk results into a single final result — without blocking a worker thread.

The interviewer is probing two things: (1) your grasp of the multi-threading architecture and concurrency-control scheme behind a future, and (2) your ability to write correct array-splitting / fan-out-fan-in code on top of it.

Constraints & Assumptions

  • Language is Java. You may not use java.util.concurrent.CompletableFuture itself, but you may use lower-level primitives ( ExecutorService , ReentrantLock / Condition , synchronized , atomics, queues).
  • Single JVM, in-memory; no distributed concerns.
  • A future is completed at most once (either with a value or with a throwable). Every registered callback must run exactly once , regardless of whether it was registered before or after completion.
  • Composition must be non-blocking : transforming or combining futures must not block a pool thread waiting on get() .
  • The array-processing reduce operation is associative ; element order is not significant to the final reduced value.
  • Input arrays can be large (think millions of elements); the per-chunk work may be CPU-bound or IO-bound.
  • The implementation must be correct under concurrent callback registration and completion from multiple threads.

Clarifying Questions to Ask

  • Do we need cancellation and timeouts, or only completion plus composition ( thenApply / thenCompose / combine)?
  • On which thread should callbacks run — the thread that completes the future, the pool, or a caller-supplied executor? Are there ordering guarantees among multiple callbacks on the same future?
  • Is the chunk reduce operation guaranteed associative, and does element order matter for the final result?
  • What is the expected input size, and is the per-chunk work CPU-bound or IO-bound (this drives pool sizing)?
  • If one chunk fails, should the whole computation fail fast, or should we collect partial results?
  • May I rely on JDK concurrency utilities ( ExecutorService , ReentrantLock , atomics), or must this be built from raw threads and synchronized only?

Part 1 — The core async future primitive

Design and implement AsyncFuture<T> with at least: a static factory supplyAsync(Supplier<T>, Executor) that runs the supplier on the pool and completes the future; the composition operators thenApply(Function<T,U>), thenCompose(Function<T, AsyncFuture<U>>), and thenCombine(AsyncFuture<U>, BiFunction<T,U,R>); a blocking get(); and exception propagation (completeExceptionally, with errors flowing through the composition chain). Explain the completion state machine and how you resolve the race between registering a callback and the future completing.

What This Part Should Cover Premium

Part 2 — Array-splitting parallel processing

Using AsyncFuture, implement a routine parallelProcess(int[] data, int chunkSize, Function<int[], Long> mapChunk, BinaryOperator<Long> reduce, Executor pool) that splits data into chunks of size chunkSize, processes each chunk asynchronously (one supplyAsync per chunk), and asynchronously folds the per-chunk results into a single AsyncFuture<Long>. Walk through how you split, dispatch, and combine, and how you choose chunkSize relative to the pool size.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

  • How would you add cancellation and timeouts? What should happen to a callback that is already mid-execution when cancellation arrives?
  • Your callbacks currently run on whichever thread completes the future. What are the risks (deep recursion through a long chain, a slow callback blocking the completer), and how would you offload callbacks to an executor?
  • How exactly does thenCompose differ from thenApply when the function returns a future, and why does flattening matter for correctness?
  • How would you implement allOf over N futures using a single atomic counter instead of nesting thenCombine calls pairwise, and how does that change the critical-path depth?
Loading comments...

Browse More Questions

More Software Engineering Fundamentals•More Google•More Software Engineer•Google Software Engineer•Google Software Engineering Fundamentals•Software Engineer Software Engineering Fundamentals

Write your answer

Your first approved answer each day earns 20 XP.

Sign in to write your answer.
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.