Design bookstore and chat messaging systems
Company: Databricks
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
You are given two independent system-design problems.
---
## Question A — Design an Online Bookstore Backend
Design the backend for a large-scale online bookstore, similar in spirit to a typical e-commerce storefront.
### Functional requirements
- Users can:
  - Browse books by category, author, etc.
  - Search for books by title, author, ISBN, or keywords.
  - View detailed book information (price, description, reviews, inventory status, etc.).
  - Add/remove books from a shopping cart.
  - Place orders and pay for them.
  - View order history and order status.
- Admins can:
  - Add/update/remove books and their metadata.
  - Manage inventory (stock counts per book, and per warehouse/region if needed).
### Non-functional requirements
- The system should support at least millions of users and a very large catalog (millions of books).
- Fast read performance for browsing and searching.
- Reasonable consistency for inventory and order placement (no significant overselling or double-charging).
- High availability and fault tolerance.
### Tasks
1. Clarify and state any additional assumptions you want to make (e.g., scale, traffic patterns, read/write ratio, regions).
2. Propose a **high-level architecture**, including:
   - Main services/components (e.g., catalog service, search service, cart service, order service, payment service, inventory service, user service).
   - Data storage choices for each component (e.g., SQL vs NoSQL, search index, caches).
3. Design:
   - Data model at a high level (key entities and relationships: Book, Author, Inventory, User, Cart, Order, etc.).
   - Key APIs (for example, search books, add to cart, checkout, update inventory).
4. Explain **how you will maintain inventory correctness** when many users place orders at the same time.
5. Discuss how you would **scale** the system:
   - Horizontal scaling of services.
   - Sharding or partitioning strategies for data.
   - Use of caching/CDN.
6. Address **fault tolerance and reliability**:
   - What happens if part of the system fails?
   - How to handle partial failures around payments and orders (idempotency, retries, etc.).
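For task 4, one common way to prevent overselling is to make the stock check and the decrement a single conditional `UPDATE`, so two concurrent orders cannot both take the last copy. The sketch below assumes a hypothetical `inventory(book_id, warehouse_id, stock)` table and uses Python's built-in `sqlite3` only as a stand-in for the production relational database; it illustrates the pattern, not a complete inventory service.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one stock row per (book, warehouse).
    conn.execute(
        "CREATE TABLE inventory ("
        " book_id TEXT NOT NULL,"
        " warehouse_id TEXT NOT NULL,"
        " stock INTEGER NOT NULL CHECK (stock >= 0),"
        " PRIMARY KEY (book_id, warehouse_id))"
    )


def reserve_stock(conn: sqlite3.Connection, book_id: str, warehouse_id: str, qty: int) -> bool:
    """Decrement stock only if at least `qty` units remain; return True on success."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            # The check and the write happen in one statement, so there is no
            # window between "read stock" and "write stock" for another order.
            "UPDATE inventory SET stock = stock - ? "
            "WHERE book_id = ? AND warehouse_id = ? AND stock >= ?",
            (qty, book_id, warehouse_id, qty),
        )
    return cur.rowcount == 1  # 0 rows updated: insufficient stock (or unknown book)


def release_stock(conn: sqlite3.Connection, book_id: str, warehouse_id: str, qty: int) -> None:
    """Compensating action, e.g. when payment fails after stock was reserved."""
    with conn:
        conn.execute(
            "UPDATE inventory SET stock = stock + ? WHERE book_id = ? AND warehouse_id = ?",
            (qty, book_id, warehouse_id),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    with conn:
        conn.execute("INSERT INTO inventory VALUES ('isbn-1', 'us-east', 2)")
    print(reserve_stock(conn, "isbn-1", "us-east", 1))  # True: 1 copy left
    print(reserve_stock(conn, "isbn-1", "us-east", 2))  # False: only 1 copy left
```

In databases such as PostgreSQL, concurrent `UPDATE`s on the same row are serialized by a row lock, and the waiting statement re-checks its `WHERE` condition against the newly committed row, so the `stock >= ?` guard holds under contention. A reservation taken this way would typically be undone by a compensating update if payment later fails.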
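For the payment and order failures in task 6, a typical pattern is for the client to send an idempotency key with each checkout and reuse it on every retry, while the server records the outcome under that key so a retry returns the original result instead of charging twice. This is a minimal in-memory sketch; `IdempotentCheckout`, `charge_fn`, and the status strings are hypothetical, and a real service would persist the key-to-result mapping in the order database.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CheckoutResult:
    order_id: str
    status: str  # "PAID" or "PAYMENT_FAILED"


class IdempotentCheckout:
    """Hypothetical checkout handler that makes client retries safe.

    The client generates one idempotency key per checkout attempt and reuses it
    on every retry; the server stores the outcome under that key.
    """

    def __init__(self, charge_fn: Callable[[str, int], bool]) -> None:
        # charge_fn(idempotency_key, amount_cents) -> success; a stand-in for a
        # payment-gateway client. Forwarding the key lets the gateway dedupe too.
        self._charge_fn = charge_fn
        self._results: Dict[str, CheckoutResult] = {}
        # One lock keeps the sketch short; a real service would rely on a
        # unique constraint on the key in the orders table instead.
        self._lock = threading.Lock()

    def checkout(self, idempotency_key: str, amount_cents: int) -> CheckoutResult:
        with self._lock:
            previous = self._results.get(idempotency_key)
            if previous is not None:
                return previous  # retry: return the original outcome, do not charge again
            order_id = str(uuid.uuid4())
            paid = self._charge_fn(idempotency_key, amount_cents)
            result = CheckoutResult(order_id, "PAID" if paid else "PAYMENT_FAILED")
            self._results[idempotency_key] = result
            return result
```

If the gateway call raises (for example on a timeout), this sketch stores nothing, so a retry calls the gateway again; that is only safe if the payment provider also dedupes by the forwarded key. A fuller design would first write a `PENDING` order row under the key and reconcile uncertain payments asynchronously.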
---
## Question B — Design a Real-Time Chat Application
Design the backend architecture for a large-scale, real-time chat application that supports both one-to-one and group messaging.
### Functional requirements
- Users can:
  - Send and receive 1:1 messages.
  - Participate in group chats.
  - See message history for their conversations.
  - See whether contacts are online/offline.
  - Optionally see typing indicators and read receipts.
- Messages should arrive in near real-time when users are online.
- Messages sent while a user is offline should be delivered when they come back online.
### Non-functional requirements
- The system should support at least millions of daily active users.
- Low latency for message delivery (ideally under 200–500 ms end-to-end under normal load).
- High availability: the service should tolerate machine and zone failures.
- Messages should not be lost once the sender gets a success acknowledgment.
- Message ordering should be well-defined within a conversation.
### Tasks
1. State reasonable **assumptions** about scale (number of users, QPS, message volume), message size, and whether you must support multi-device login.
2. Propose a **high-level architecture**, including:
   - Client connections (e.g., WebSocket/long-polling/HTTP streaming) and an edge/gateway layer.
   - Core backend services (e.g., chat service, message service/store, presence service, user service, notification service).
   - Any messaging/queueing infrastructure needed (e.g., pub/sub, message queues).
3. Design the **data model** at a high level:
   - Entities like User, Conversation (1:1 or group), Membership, Message.
   - How you store and query message history.
4. Describe the **message send/receive flow**:
   - How a message goes from Sender → Server(s) → Recipients.
   - How you ensure that each message is delivered **at least once** without showing duplicates to the user.
   - How you define and maintain ordering within a conversation.
5. Explain how you would **scale** the system:
   - How to shard conversations or messages.
   - How to handle a large number of long-lived connections.
6. Discuss **presence, read receipts, and typing indicators**:
   - How to track and propagate online/offline status.
   - How to design read receipts and typing indicators without overloading the system.
7. Address **reliability and failure scenarios**:
   - What happens if a server with active connections dies.
   - How clients recover (reconnect and resync missed messages).
   - Data durability and backups.
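For tasks 4 and 7, one widely used design has the server assign each message a per-conversation sequence number, dedupe sender retries by a client-generated message ID, and let a reconnecting client ask for everything after the last sequence number it has. The in-memory sketch below uses hypothetical names (`ConversationLog`, `client_msg_id`, `fetch_since`); a real message store would durably persist the message, for example in a database partitioned by conversation ID, before acknowledging the sender.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    conversation_id: str
    seq: int             # server-assigned; defines order within the conversation
    sender_id: str
    client_msg_id: str   # generated by the sender's client, reused on retries
    body: str


class ConversationLog:
    """Hypothetical in-memory message store keyed by conversation."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[Message]] = {}
        self._by_client_id: Dict[Tuple[str, str, str], Message] = {}
        self._lock = threading.Lock()

    def append(self, conversation_id: str, sender_id: str,
               client_msg_id: str, body: str) -> Message:
        with self._lock:
            key = (conversation_id, sender_id, client_msg_id)
            existing = self._by_client_id.get(key)
            if existing is not None:
                # Retried send (e.g. the ack was lost): return the stored copy
                # instead of creating a duplicate with a new seq.
                return existing
            log = self._messages.setdefault(conversation_id, [])
            msg = Message(conversation_id, len(log) + 1, sender_id, client_msg_id, body)
            log.append(msg)
            self._by_client_id[key] = msg
            return msg  # the ack to the sender would carry msg.seq

    def fetch_since(self, conversation_id: str, last_seen_seq: int) -> List[Message]:
        """Return messages with seq > last_seen_seq in order, e.g. after a reconnect."""
        with self._lock:
            log = self._messages.get(conversation_id, [])
            # seq values are 1, 2, 3, ... with no gaps, so seq n is at index n - 1.
            return log[max(last_seen_seq, 0):]
```

Because the sender retries with the same `client_msg_id`, a lost acknowledgment cannot create a second copy. Delivery to recipients is at least once, so the receiving client can drop any message whose `seq` it already holds; the same `seq` gives a well-defined display order.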
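For task 5, routing on the conversation ID keeps a conversation and all its messages on one shard. One option is a consistent-hash ring with virtual nodes, so adding a shard moves only about 1/N of the conversations (N being the new shard count). This is a generic sketch; the shard names and the `vnodes` default are assumptions, and many systems use range partitioning or a directory service instead.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Maps conversation IDs to shard names using virtual nodes."""

    def __init__(self, shards: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{shard}#{i}")
            if pos in self._owner:  # skip the (extremely unlikely) position collision
                continue
            self._owner[pos] = shard
            bisect.insort(self._ring, pos)

    def shard_for(self, conversation_id: str) -> str:
        if not self._ring:
            raise ValueError("ring has no shards")
        pos = _hash(conversation_id)
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, pos) % len(self._ring)
        return self._owner[self._ring[idx]]
```

A key only changes owner when a new virtual node lands between it and its previous owner, so every conversation that moves after `add_shard` moves to the new shard; the rest stay put.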
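For presence in task 6, a common approach is heartbeats with a timeout: the gateway holding a user's connection refreshes a last-seen timestamp, and the user counts as offline once heartbeats stop for longer than a TTL. Publishing only online/offline transitions, rather than every heartbeat, keeps fan-out to contacts small. This is a minimal single-process sketch with a hypothetical `PresenceTracker` and an injectable clock for testing; a real deployment might keep these entries in a shared store with per-key expiry, such as Redis.

```python
import time
from typing import Callable, Dict, List


class PresenceTracker:
    """Hypothetical presence service: a user is online while heartbeats keep arriving."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> bool:
        """Record a heartbeat; returns True only on an offline -> online transition."""
        was_online = self.is_online(user_id)
        self._last_seen[user_id] = self._clock()
        return not was_online  # caller publishes an "online" event only when True

    def is_online(self, user_id: str) -> bool:
        last = self._last_seen.get(user_id)
        return last is not None and self._clock() - last < self._ttl

    def expire(self) -> List[str]:
        """Drop users whose heartbeats lapsed; return them so "offline" events can be published."""
        now = self._clock()
        expired = [u for u, t in self._last_seen.items() if now - t >= self._ttl]
        for u in expired:
            del self._last_seen[u]
        return expired
```

This also covers a gateway crash from task 7: its heartbeats stop, so its users fall offline after one TTL without explicit cleanup, and the first heartbeat after they reconnect to another gateway marks them online again.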
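Read receipts and typing indicators from task 6 can be kept cheap with two small ideas: store one read watermark (the highest sequence number read) per member per conversation instead of a receipt per message, and throttle typing events so each user forwards at most one per interval and nothing is persisted. Both classes below are hypothetical, and the sketch assumes messages carry increasing per-conversation sequence numbers.

```python
import time
from typing import Callable, Dict, Tuple


class ReadWatermarks:
    """Read receipts stored as one 'last read seq' per (conversation, user)."""

    def __init__(self) -> None:
        self._marks: Dict[Tuple[str, str], int] = {}

    def mark_read(self, conversation_id: str, user_id: str, seq: int) -> bool:
        """Advance the watermark; returns True only if it moved (worth broadcasting)."""
        key = (conversation_id, user_id)
        if seq <= self._marks.get(key, 0):
            return False  # stale or duplicate receipt
        self._marks[key] = seq
        return True

    def has_read(self, conversation_id: str, user_id: str, seq: int) -> bool:
        # Under the watermark model, reading message n implies reading all earlier ones.
        return self._marks.get((conversation_id, user_id), 0) >= seq


class TypingThrottle:
    """Forward at most one typing event per (conversation, user) per interval."""

    def __init__(self, interval_seconds: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval_seconds
        self._clock = clock
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_forward(self, conversation_id: str, user_id: str) -> bool:
        now = self._clock()
        key = (conversation_id, user_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._interval:
            return False  # drop: recipients were told recently that this user is typing
        self._last_sent[key] = now
        return True
```

Recipients can simply hide the typing indicator if no new event arrives within a few intervals, so no explicit "stopped typing" message is strictly required.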
Quick Answer: This question evaluates a candidate's competency in large-scale system design, including distributed systems architecture, data modeling, consistency and concurrency control, scalability, availability, fault tolerance, API design, and real-time messaging patterns.