Storage, Indexing, APIs, And Secure Execution
Asked of: Software Engineer
Last updated
What's being tested
This cluster tests whether you can design stateful backend systems where storage layout, indexing, API semantics, authorization, replication, and execution safety all interact. Google cares because many production services are not just CRUD APIs: they require global consistency tradeoffs, high-throughput storage, secure multi-tenancy, and clear failure behavior under partial outages. The interviewer is probing for your ability to turn requirements into data models, choose the right consistency guarantees, expose APIs that survive retries and concurrency, and defend the system against abuse or corruption. Strong answers make tradeoffs explicit instead of assuming one database, one cache, or one queue solves everything.
Core knowledge
-
Storage model selection should follow access patterns, not preference. Use
`Postgres`/`Spanner`for transactional metadata, object storage such as`GCS`/`S3`for blobs, log-structured storage for append-heavy workloads, and search indexes such as`Elasticsearch`/`Lucene`only when query shape justifies index maintenance cost. -
Content-addressable storage stores data by hash, for example
`sha256(content) -> blob`, enabling deduplication across files or tenants. Collision risk is negligible with cryptographic hashes, but authorization must not be based on “knowing the hash”; metadata ACLs still gate access to each logical file. -
Reference counting for deduplicated blobs works when increments/decrements are transactional with file metadata. The hard case is garbage collection: avoid deleting blobs immediately after count reaches zero unless you handle races with in-flight writers, usually via mark-and-sweep, tombstones, or delayed deletion windows.
-
Append-only logs are typically partitioned by key or stream, with offsets assigned monotonically per partition. Throughput scales roughly with number of partitions: but ordering is only guaranteed within one partition, not globally.
-
Replication choices define durability and latency. Leader-follower replication with quorum writes can require for strong reads in quorum systems, while asynchronous replication improves write latency but risks data loss or stale reads during failover.
-
Indexing is a write/read tradeoff. A primary key index supports point lookups, secondary indexes support alternate access paths, inverted indexes support text search, and sparse/time indexes reduce cost for logs. Every index increases write amplification and backfill complexity.
-
API idempotency is mandatory for unreliable networks. For mutating endpoints like
`POST /submissions`or`POST /notes/{id}/edits`, accept an`Idempotency-Key`and store request fingerprints so client retries do not create duplicate executions, files, or log segments. -
Concurrency control can be pessimistic locks, optimistic version checks, or operation-based merges. Collaborative editing usually needs CRDTs or operational transformation when multiple clients edit offline; simpler
`version`/`etag`compare-and-swap is acceptable for low-contention metadata. -
Consistency semantics must be stated per operation. A restaurant menu update may need read-your-writes for admins but eventual consistency to devices; a coding judge submission needs durable enqueue before returning success; a collaborative note may need low-latency local echo with later convergence.
-
Secure execution for untrusted code requires sandboxing, not just process separation. Use containers,
`seccomp`, namespaces, cgroups, read-only filesystems, network isolation, CPU/memory/time limits, and per-run scratch directories; assume submitted code is malicious, fork-bombs, exfiltrates, or exploits syscalls. -
Multi-tenant isolation spans data, compute, and quotas. Store tenant ownership in metadata rows, enforce authorization at every API boundary, rate-limit expensive operations, and avoid cross-tenant dedupe leaks such as revealing that another tenant already uploaded a file.
-
Observability and recovery need first-class design. Track
`p50`/`p99`latency, error rate, queue depth, replication lag, judge sandbox failures, index freshness, and garbage-collection backlog. Design replay paths from durable logs and use checksums to detect storage corruption.
Worked example
For Design an Online Coding Judge Platform, a strong candidate starts by clarifying the core flow: users submit code for a problem, the system compiles/runs it against test cases, returns verdicts, and stores submission history. In the first 30 seconds, ask about expected submissions per second, languages supported, maximum runtime, whether test cases are public/private, and whether verdict latency or throughput is more important. Declare assumptions such as “I’ll optimize for bursty contest traffic, untrusted code, and per-submission results within seconds.”
The answer skeleton should have four pillars: an API layer, durable submission storage, asynchronous execution workers, and sandboxed judging infrastructure. The API might expose `POST /submissions`, `GET /submissions/{id}`, and `GET /problems/{id}`, with idempotency keys on submission creation. Metadata can live in `Spanner` or `Postgres`, large test artifacts in object storage, and a queue such as `Pub/Sub` or `Kafka` can decouple request traffic from judge capacity.
The most important design decision to flag is that execution workers must not trust user code. Each run should execute inside an isolated sandbox with `cgroups` limits, `seccomp` syscall filtering, disabled outbound network, bounded filesystem, and strict wall-clock timeout. Another explicit tradeoff: pre-warming language runtimes improves latency but increases worker cost and isolation complexity. Close by saying that, with more time, you would cover contest-scale fairness, plagiarism detection hooks, regional capacity, and detailed replay/debug tooling for disputed verdicts.
A second angle
For Design deduplicated file storage on filesystem, the same storage and API thinking applies, but the center of gravity shifts from compute isolation to blob identity, metadata correctness, and lifecycle management. Instead of sandboxing untrusted programs, the risk is unauthorized data inference, cross-tenant leakage, and corrupted reference counts. A good design separates logical file metadata from physical chunks: `file_id -> owner, path, chunk_hashes` and `chunk_hash -> location, size, ref_count`. The API should make upload retries safe, perhaps by using resumable upload sessions and chunk-level checksums. The key tradeoff is dedupe granularity: whole-file dedupe is simpler and cheaper, while chunk-level dedupe saves more storage but increases metadata size, random reads, and garbage-collection complexity.
Common pitfalls
Pitfall: Treating storage as “just use a database.”
A tempting weak answer is to put notes, logs, files, menus, and submissions all in one relational table model. What lands better is mapping each object to its dominant access pattern: append logs need segment files and offset indexes, files need blob storage plus metadata, menus need versioned configs and rollout windows, and submissions need durable state transitions.
Pitfall: Hand-waving consistency instead of naming guarantees.
Saying “eventual consistency is fine” or “we need strong consistency” globally is usually too vague. Specify where guarantees matter: exactly-once user-visible submission creation via idempotency, at-least-once worker processing with idempotent state transitions, read-your-writes for admin menu edits, and convergence for collaborative note replicas.
Pitfall: Mentioning security only at the end.
For a coding judge or multi-tenant storage service, security is not an add-on. A better answer introduces threat models early: malicious code execution, tenant boundary violations, hash oracle attacks, quota abuse, and unauthorized reads through caches or indexes.
Connections
Interviewers may pivot from here into distributed consensus, cache invalidation, schema evolution, rate limiting, or disaster recovery. They may also ask for deeper treatment of CRDTs for collaborative editing, LSM trees and compaction for log storage, or capability-based authorization for secure APIs.
Further reading
-
The Google File System — classic paper on chunked distributed storage, replication, and operational tradeoffs.
-
Bigtable: A Distributed Storage System for Structured Data — useful for understanding tablets, sparse indexes, and large-scale storage design.
-
Operating Systems: Three Easy Pieces — strong foundation for processes, isolation, filesystems, and resource management relevant to secure execution.
Featured in interview prep guides
Practice questions
- Design an Online Coding Judge PlatformGoogle · Software Engineer · Onsite · medium
- How to host many domains on one IP?Google · Software Engineer · Technical Screen · medium
- Design a global restaurant menu update systemGoogle · Software Engineer · Onsite · hard
- Design lexicographic range query word storeGoogle · Software Engineer · Onsite · hard
- Design executable notebook service APIsGoogle · Software Engineer · Technical Screen · hard
- Design deduplicated file storage on filesystemGoogle · Software Engineer · Technical Screen · hard
- Design line-preserving file chunker pipelineGoogle · Software Engineer · Technical Screen · hard
- Design key management serviceGoogle · Software Engineer · Technical Screen · hard
- Design task scheduler with dependenciesGoogle · Software Engineer · Technical Screen · hard
- Design distributed log storage serviceGoogle · Software Engineer · Technical Screen · hard
- Design viewing history and resume serviceGoogle · Software Engineer · Technical Screen · hard
- Design relational-to-NoSQL migration pipelineGoogle · Software Engineer · Technical Screen · hard
Related concepts
- Scalable Backend Architecture And Data ModelingSystem Design
- Distributed Key-Value Storage And TransactionsSystem Design
- API Design, Data Modeling, and IndexingSystem Design
- Scalable Service And Distributed System DesignSystem Design
- Security, Multitenancy, And AuthorizationSystem Design
- Durable Key-Value Stores And CachesSystem Design