How do I approach Software Engineering Fundamentals interview questions?

Software Engineering Fundamentals questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master software engineering fundamentals interviews.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Google during technical interviews.

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Quick Overview

This question evaluates a candidate's understanding of concurrency primitives, specifically the internal mechanics of future/promise abstractions such as thread-pool execution, completion state machines, and thread-safe callback registration. It tests practical software engineering skill by requiring a non-blocking fan-out/fan-in implementation for parallel array processing built on that primitive, commonly used to assess depth of multi-threading knowledge beyond basic API usage.

Company: Google

Role: Software Engineer

Category: Software Engineering Fundamentals

Interview Round: Technical Screen

You are asked to design and implement, from scratch, a simplified asynchronous "future / promise" abstraction in the spirit of Java's `CompletableFuture` — call it `AsyncFuture<T>`. Go deep into the underlying mechanics, not just the public API: the thread-pool execution model, the completion state machine, how callbacks behave when they are registered *before* versus *after* completion, and how you guarantee thread safety and memory visibility across threads. Then use that primitive to build a parallel data-processing routine that splits a large input array into chunks, processes the chunks concurrently on the same primitive, and combines the per-chunk results into a single final result — without blocking a worker thread. The interviewer is probing two things: (1) your grasp of the multi-threading architecture and concurrency-control scheme behind a future, and (2) your ability to write correct array-splitting / fan-out-fan-in code on top of it. ### Constraints & Assumptions - Language is Java. You may **not** use `java.util.concurrent.CompletableFuture` itself, but you may use lower-level primitives (`ExecutorService`, `ReentrantLock`/`Condition`, `synchronized`, atomics, queues). - Single JVM, in-memory; no distributed concerns. - A future is completed **at most once** (either with a value or with a throwable). Every registered callback must run **exactly once**, regardless of whether it was registered before or after completion. - Composition must be **non-blocking**: transforming or combining futures must not block a pool thread waiting on `get()`. - The array-processing reduce operation is **associative**; element order is not significant to the final reduced value. - Input arrays can be large (think millions of elements); the per-chunk work may be CPU-bound or IO-bound. - The implementation must be correct under concurrent callback registration and completion from multiple threads. ### Clarifying Questions to Ask - Do we need cancellation and timeouts, or only completion plus composition (`thenApply` / `thenCompose` / combine)? - On which thread should callbacks run — the thread that completes the future, the pool, or a caller-supplied executor? Are there ordering guarantees among multiple callbacks on the same future? - Is the chunk reduce operation guaranteed associative, and does element order matter for the final result? - What is the expected input size, and is the per-chunk work CPU-bound or IO-bound (this drives pool sizing)? - If one chunk fails, should the whole computation fail fast, or should we collect partial results? - May I rely on JDK concurrency utilities (`ExecutorService`, `ReentrantLock`, atomics), or must this be built from raw threads and `synchronized` only? ### Part 1 — The core async future primitive Design and implement `AsyncFuture<T>` with at least: a static factory `supplyAsync(Supplier<T>, Executor)` that runs the supplier on the pool and completes the future; the composition operators `thenApply(Function<T,U>)`, `thenCompose(Function<T, AsyncFuture<U>>)`, and `thenCombine(AsyncFuture<U>, BiFunction<T,U,R>)`; a blocking `get()`; and exception propagation (`completeExceptionally`, with errors flowing through the composition chain). Explain the completion state machine and how you resolve the race between registering a callback and the future completing. ```hint State machine Model exactly two observable phases — PENDING and COMPLETED — backed by a single result slot that holds *either* a value *or* a throwable. The transition to COMPLETED must happen exactly once; gate it with a lock (or a CAS) so a second `complete*` call is a no-op. ``` ```hint The core race A callback may be added before *or* after completion. Resolve it under the same lock that guards the state: if still PENDING, enqueue the callback; if already COMPLETED, run it immediately with the stored result. Never lose a callback and never run one twice. Run user callbacks *outside* the lock to avoid holding it during arbitrary code. ``` ```hint Composition `thenApply` creates a fresh downstream `AsyncFuture`, then registers a callback on `this` that — on success — applies the function and completes the downstream, and on failure propagates the throwable. `thenCompose` does the same but the function returns a future, so you must *flatten*: register a second callback on the inner future to complete the downstream. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 2 — Array-splitting parallel processing Using `AsyncFuture`, implement a routine `parallelProcess(int[] data, int chunkSize, Function<int[], Long> mapChunk, BinaryOperator<Long> reduce, Executor pool)` that splits `data` into chunks of size `chunkSize`, processes each chunk asynchronously (one `supplyAsync` per chunk), and asynchronously folds the per-chunk results into a single `AsyncFuture<Long>`. Walk through how you split, dispatch, and combine, and how you choose `chunkSize` relative to the pool size. ```hint Splitting Compute chunk boundaries by index: chunk `start` covers `[start, min(start + chunkSize, n))`. Guard the last partial chunk, the empty-input case, and `chunkSize >= n`. Decide whether to copy each slice (self-contained tasks) or pass offsets to avoid copying. ``` ```hint Combine without blocking Launch one `supplyAsync` per chunk to get a `List<AsyncFuture<Long>>`, then fold that list with your `thenCombine` + `reduce` so the aggregate future completes only when every chunk is done. Do **not** call `get()` inside a pool task — a blocked worker waiting on work that needs another worker can starve or deadlock the pool. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### What a Strong Answer Covers ```premium-lock What a Strong Answer Covers ``` ### Follow-up Questions - How would you add cancellation and timeouts? What should happen to a callback that is already mid-execution when cancellation arrives? - Your callbacks currently run on whichever thread completes the future. What are the risks (deep recursion through a long chain, a slow callback blocking the completer), and how would you offload callbacks to an executor? - How exactly does `thenCompose` differ from `thenApply` when the function returns a future, and why does flattening matter for correctness? - How would you implement `allOf` over N futures using a single atomic counter instead of nesting `thenCombine` calls pairwise, and how does that change the critical-path depth?

Quick Answer: This question evaluates a candidate's understanding of concurrency primitives, specifically the internal mechanics of future/promise abstractions such as thread-pool execution, completion state machines, and thread-safe callback registration. It tests practical software engineering skill by requiring a non-blocking fan-out/fan-in implementation for parallel array processing built on that primitive, commonly used to assess depth of multi-threading knowledge beyond basic API usage.

You are asked to design and implement, from scratch, a simplified asynchronous "future / promise" abstraction in the spirit of Java's CompletableFuture — call it AsyncFuture<T>. Go deep into the underlying mechanics, not just the public API: the thread-pool execution model, the completion state machine, how callbacks behave when they are registered before versus after completion, and how you guarantee thread safety and memory visibility across threads.

Then use that primitive to build a parallel data-processing routine that splits a large input array into chunks, processes the chunks concurrently on the same primitive, and combines the per-chunk results into a single final result — without blocking a worker thread.

The interviewer is probing two things: (1) your grasp of the multi-threading architecture and concurrency-control scheme behind a future, and (2) your ability to write correct array-splitting / fan-out-fan-in code on top of it.

Constraints & Assumptions

Language is Java. You may not use java.util.concurrent.CompletableFuture itself, but you may use lower-level primitives ( ExecutorService , ReentrantLock / Condition , synchronized , atomics, queues).
Single JVM, in-memory; no distributed concerns.
A future is completed at most once (either with a value or with a throwable). Every registered callback must run exactly once , regardless of whether it was registered before or after completion.
Composition must be non-blocking : transforming or combining futures must not block a pool thread waiting on get() .
The array-processing reduce operation is associative ; element order is not significant to the final reduced value.
Input arrays can be large (think millions of elements); the per-chunk work may be CPU-bound or IO-bound.
The implementation must be correct under concurrent callback registration and completion from multiple threads.

Clarifying Questions to Ask

Do we need cancellation and timeouts, or only completion plus composition ( thenApply / thenCompose / combine)?
On which thread should callbacks run — the thread that completes the future, the pool, or a caller-supplied executor? Are there ordering guarantees among multiple callbacks on the same future?
Is the chunk reduce operation guaranteed associative, and does element order matter for the final result?
What is the expected input size, and is the per-chunk work CPU-bound or IO-bound (this drives pool sizing)?
If one chunk fails, should the whole computation fail fast, or should we collect partial results?
May I rely on JDK concurrency utilities ( ExecutorService , ReentrantLock , atomics), or must this be built from raw threads and synchronized only?

Part 1 — The core async future primitive

Design and implement AsyncFuture<T> with at least: a static factory supplyAsync(Supplier<T>, Executor) that runs the supplier on the pool and completes the future; the composition operators thenApply(Function<T,U>), thenCompose(Function<T, AsyncFuture<U>>), and thenCombine(AsyncFuture<U>, BiFunction<T,U,R>); a blocking get(); and exception propagation (completeExceptionally, with errors flowing through the composition chain). Explain the completion state machine and how you resolve the race between registering a callback and the future completing.

What This Part Should Cover Premium

Part 2 — Array-splitting parallel processing

Using AsyncFuture, implement a routine parallelProcess(int[] data, int chunkSize, Function<int[], Long> mapChunk, BinaryOperator<Long> reduce, Executor pool) that splits data into chunks of size chunkSize, processes each chunk asynchronously (one supplyAsync per chunk), and asynchronously folds the per-chunk results into a single AsyncFuture<Long>. Walk through how you split, dispatch, and combine, and how you choose chunkSize relative to the pool size.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

How would you add cancellation and timeouts? What should happen to a callback that is already mid-execution when cancellation arrives?
Your callbacks currently run on whichever thread completes the future. What are the risks (deep recursion through a long chain, a slow callback blocking the completer), and how would you offload callbacks to an executor?
How exactly does thenCompose differ from thenApply when the function returns a future, and why does flattening matter for correctness?
How would you implement allOf over N futures using a single atomic counter instead of nesting thenCombine calls pairwise, and how does that change the critical-path depth?

Quick Overview

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Company: Google

Role: Software Engineer

Category: Software Engineering Fundamentals

Interview Round: Technical Screen

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Quick Overview

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — The core async future primitive

What This Part Should Cover Premium

Part 2 — Array-splitting parallel processing

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Write your answer

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Quick Overview

Build a Custom CompletableFuture: Async Primitive and Parallel Array Processing

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — The core async future primitive

What This Part Should Cover Premium

Part 2 — Array-splitting parallel processing

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Write your answer