You are asked to design the backend for a large-scale hotel booking system that runs behind a very high-traffic consumer app (think a TikTok-like app where a hotel goes viral and suddenly millions of users click into the same property).
Users can:
- Browse hotels for a city and date range.
- View near real-time availability and prices for a specific hotel.
- Place a booking request and receive a confirmation or rejection.
The interviewer gives you the high-level requirements and asks you to reason carefully about business properties first, then derive the architecture and key trade-offs.
Functional requirements
- Search for hotels by city, date range, and basic filters.
- For a given hotel and date range, show up-to-date availability (rooms left) and price.
- Create a booking for a chosen hotel, room type, and date range.
- Guarantee that the same room-night is not double-booked.
- Optionally support cancellation (you can keep this high-level).
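The no-double-booking guarantee can be illustrated with a minimal sketch. It assumes inventory is tracked as a count per (hotel, room type, night). A booking succeeds only if every night of the stay still has a room, and the check and the decrement happen as one atomic step. The class and method names are hypothetical. The process-local `threading.Lock` stands in for what a real deployment would do with a database transaction or a conditional `UPDATE ... WHERE available >= 1`.

```python
import threading
from datetime import date, timedelta
from typing import Dict, List, Tuple

Key = Tuple[str, str, date]  # (hotel_id, room_type, night)


class InventoryStore:
    """Hypothetical in-memory stand-in for a per-room-night inventory table.

    The single lock makes check-and-decrement atomic; a real deployment would
    rely on a database transaction or a conditional UPDATE instead.
    """

    def __init__(self) -> None:
        self._available: Dict[Key, int] = {}
        self._lock = threading.Lock()

    def set_capacity(self, hotel_id: str, room_type: str, night: date, rooms: int) -> None:
        with self._lock:
            self._available[(hotel_id, room_type, night)] = rooms

    def reserve(self, hotel_id: str, room_type: str, check_in: date, check_out: date) -> bool:
        """Take one room for every night in [check_in, check_out), or nothing."""
        nights: List[date] = [
            check_in + timedelta(days=i) for i in range((check_out - check_in).days)
        ]
        keys = [(hotel_id, room_type, n) for n in nights]
        with self._lock:
            # Check all nights before touching any, so a failed request changes nothing.
            if not keys or any(self._available.get(k, 0) < 1 for k in keys):
                return False
            for k in keys:
                self._available[k] -= 1
            return True


# Usage: 50 concurrent attempts race for the last room on a 3-night stay.
store = InventoryStore()
start = date(2025, 7, 1)
for i in range(3):
    store.set_capacity("h1", "double", start + timedelta(days=i), 1)

results: List[bool] = []
results_lock = threading.Lock()


def attempt() -> None:
    ok = store.reserve("h1", "double", start, start + timedelta(days=3))
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=attempt) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True))  # 1
```

Checking every night before decrementing any of them matters for multi-night stays. Otherwise a request that fails on its last night would already have consumed rooms on the earlier nights.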
Non-functional requirements
- High concurrency:
  - Assume up to ~100k read requests/sec (availability checks) and ~10k booking attempts/sec at peak.
- Low latency:
  - P99 latency for availability checks and booking confirmation should be < 200 ms end-to-end from the client’s perspective.
- High availability:
  - The system must continue working if individual nodes or even a whole zone go down.
- Consistency characteristics:
  - Weak/eventual consistency is acceptable for displaying availability counts on the UI. A user may occasionally see an outdated count.
  - Strong correctness is required for final booking confirmation (no double-booking).
- Message loss tolerance:
  - For streaming/push-style availability updates to clients, it is acceptable if some update messages are lost (they will be refreshed soon anyway).
  - It is not acceptable to lose actual booking requests or confirmations.
- Hotspot handling:
  - Some hotels may become extremely hot (e.g., after a viral video), causing a huge, skewed load.
  - The system should avoid concentrating all traffic for one hot hotel on a single node.
- Latency vs reliability trade-off:
  - You should discuss protocol choices (e.g., HTTP vs WebSocket vs UDP or similar) and how they impact latency and reliability.
Specific discussion points the interviewer cares about
- Latency vs reliability:
  - When and why might you choose a low-overhead protocol such as UDP or WebSocket-style persistent connections instead of plain HTTP for certain flows?
  - Which parts of the system can tolerate message loss, and which cannot?
- Preventing double bookings for the same hotel/room:
  - Under high concurrency, how do you ensure two users cannot both successfully book the last available room-night?
  - Discuss techniques such as rate limiting, request coalescing/merging, throttling, and concurrency control (locks, atomic counters, queues, etc.).
- Sharding / horizontal scaling for hot hotels:
  - How would you horizontally partition the load for a very hot hotel?
  - Compare strategies like sharding by room (e.g., room ID or room type) versus bucketizing by user (e.g., hashing on user ID) and discuss pros/cons.
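One way to combine the two partitioning strategies above is sketched here under explicit assumptions; it is not the only answer. A hot room type's inventory for one night is split across K buckets, which partitions the rooms. A hash of each user's ID picks the bucket that user tries first, which spreads users across buckets. When the home bucket is empty, the request falls back to the other buckets, so no booking is rejected while rooms remain anywhere. The class and method names are hypothetical. The per-bucket `threading.Lock` stands in for counters living on separate shards.

```python
import hashlib
import threading
from typing import List, Optional


class BucketedInventory:
    """Hypothetical sketch: one hot (room type, night) split into K buckets.

    Each bucket has its own lock, standing in for a counter on a separate
    shard, so concurrent bookings do not all contend on one hot key.
    """

    def __init__(self, total_rooms: int, num_buckets: int) -> None:
        base, extra = divmod(total_rooms, num_buckets)
        # Spread rooms as evenly as possible across buckets.
        self._counts: List[int] = [base + (1 if i < extra else 0) for i in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def home_bucket(self, user_id: str) -> int:
        # A stable hash of the user ID picks the bucket this user tries first.
        digest = hashlib.sha256(user_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self._counts)

    def try_book(self, user_id: str) -> Optional[int]:
        """Return the bucket that granted a room, or None if every bucket is empty."""
        n = len(self._counts)
        start = self.home_bucket(user_id)
        for offset in range(n):
            i = (start + offset) % n
            with self._locks[i]:
                if self._counts[i] > 0:
                    self._counts[i] -= 1
                    return i
        # Only reached after seeing every bucket at 0; counts never increase here.
        return None

    def approx_available(self) -> int:
        # Unlocked sum: fine for display, where eventual consistency is allowed.
        return sum(self._counts)


inventory = BucketedInventory(total_rooms=10, num_buckets=4)
granted: List[Optional[int]] = []
granted_lock = threading.Lock()


def book(user_id: str) -> None:
    bucket = inventory.try_book(user_id)
    with granted_lock:
        granted.append(bucket)


threads = [threading.Thread(target=book, args=(f"user-{i}",)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(b is not None for b in granted), inventory.approx_available())  # 10 0
```

The trade-off appears near sell-out. Requests begin probing several buckets, so the load-spreading benefit shrinks, and the displayed total is a sum across buckets that may be momentarily stale. Hashing by user without splitting the inventory would still leave every bucket contending on one shared counter. Multi-night stays would need one bucket reservation per night, which this sketch does not cover.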
Assume you are free to pick any technologies (e.g., relational vs NoSQL databases, caches like Redis, message queues like Kafka, etc.).
Task:
Design the system at a high level:
- Identify the major components/services.
- Propose a data model for hotels, rooms, inventory, and bookings.
- Describe the end-to-end flows for availability lookup and booking confirmation.
- Explain how your design achieves:
  - Low latency with acceptable reliability trade-offs.
  - No double-booking despite high concurrency.
  - Good handling of hot hotels via sharding/partitioning.
- Explicitly call out the key trade-offs and why you made those choices.
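The data model interacts with the requirement that booking requests must never be lost. If clients retry whenever a confirmation does not arrive, the server must not book the same request twice. The sketch below assumes a client-generated idempotency key on every booking request; all type and field names are hypothetical. The service records one outcome per key and returns that same outcome on retries. The `reserve` callable stands in for the atomic inventory decrement. The single lock is a simplification: a real store would write the idempotency record and the inventory change in one transaction.

```python
import threading
import uuid
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hotel:
    hotel_id: str
    city: str          # search is keyed by city
    name: str


@dataclass(frozen=True)
class RoomType:
    hotel_id: str
    room_type_id: str
    total_rooms: int


@dataclass
class InventoryNight:
    # One row per (hotel, room type, night): the unit that must never be oversold.
    hotel_id: str
    room_type_id: str
    night: date
    available: int


class BookingStatus(Enum):
    CONFIRMED = "CONFIRMED"
    REJECTED = "REJECTED"


@dataclass(frozen=True)
class BookingRequest:
    idempotency_key: str   # generated by the client once, reused on every retry
    user_id: str
    hotel_id: str
    room_type_id: str
    check_in: date
    check_out: date        # exclusive


@dataclass(frozen=True)
class Booking:
    booking_id: str
    request: BookingRequest
    status: BookingStatus


class IdempotentBookingService:
    """Records one outcome per idempotency key so client retries are safe."""

    def __init__(self, reserve: Callable[[BookingRequest], bool]) -> None:
        self._reserve = reserve        # atomic inventory decrement, supplied by caller
        self._by_key: Dict[str, Booking] = {}
        self._lock = threading.Lock()  # simplification: serializes all submissions

    def submit(self, req: BookingRequest) -> Booking:
        with self._lock:
            existing = self._by_key.get(req.idempotency_key)
            if existing is not None:
                return existing        # retry: return the original outcome
            ok = self._reserve(req)
            status = BookingStatus.CONFIRMED if ok else BookingStatus.REJECTED
            booking = Booking(str(uuid.uuid4()), req, status)
            self._by_key[req.idempotency_key] = booking
            return booking


# Usage: a retry after a lost response does not consume a second room.
calls: List[BookingRequest] = []


def fake_reserve(req: BookingRequest) -> bool:
    calls.append(req)
    return True


svc = IdempotentBookingService(fake_reserve)
req = BookingRequest("key-123", "u1", "h1", "double", date(2025, 7, 1), date(2025, 7, 3))
first = svc.submit(req)
retry = svc.submit(req)
print(first.booking_id == retry.booking_id, len(calls))  # True 1
```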