Design Instagram (Feed, Photos, and Friend Recommendations)
Company: Anthropic
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
# Design Instagram (Feed, Photos, and Friend Recommendations)
Design the backend for a photo-sharing social network like Instagram. Users follow other users, upload posts (an image plus a caption), and open a home feed showing recent posts from the accounts they follow, newest first. The system must serve a very read-heavy workload at large scale and also surface "people you may want to follow" recommendations.
### Constraints & Assumptions
- Hundreds of millions of users; reads vastly outnumber writes (feed opens ≫ posts created), on the order of a 100:1 read/write ratio.
- A post is an image (stored in object storage / served via CDN) plus a caption and metadata; feeds show reverse-chronological recent posts (treat ranking as out of scope unless asked).
- The follow graph is directed (A follows B does not imply B follows A) and highly skewed: some accounts have tens of millions of followers.
- Targets: home-feed load p99 well under a few hundred milliseconds; high availability; a brief delay before a new post appears in followers' feeds is acceptable.
### Clarifying Questions to Ask
- What is the scale (DAU, posts/day, average and maximum follower counts) and the read:write ratio?
- Is the feed strictly reverse-chronological or ranked? Is eventual consistency (a few seconds before a post appears in feeds) acceptable?
- How skewed is the follow graph? Must we handle celebrity accounts with tens of millions of followers?
- What media sizes and formats must we support, and what are the latency targets for image load vs. feed metadata?
- Is the friend-recommendation requirement "good enough suggestions" or a precise quality bar with specific signals?
### Part 1 — Requirements, data model, and APIs
Lay out functional and non-functional requirements, the core data model (users, follow graph, posts, media), and the main APIs (create post with image upload, follow/unfollow, get home feed). Describe how images are uploaded and served separately from post metadata.
```hint Separate media from metadata
Upload the image to object storage (often via a pre-signed URL straight from the client) and store only a media URL/key on the post row; serve images through a CDN, not your app servers.
```
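The split between media and metadata can be sketched as follows. This is a minimal illustration, not a real cloud SDK: `SECRET`, `MEDIA_HOST`, `presign_upload`, `verify_upload`, and `create_post` are hypothetical names, and the HMAC signature only stands in for whatever pre-signing scheme the chosen object store provides.

```python
import hashlib
import hmac
from dataclasses import dataclass

SECRET = b"demo-signing-key"               # hypothetical; a real system uses the storage provider's signer
MEDIA_HOST = "https://media.example.com"   # hypothetical object-store endpoint


@dataclass
class Post:
    post_id: int
    author_id: int
    media_key: str      # key in object storage, never the image bytes
    caption: str
    created_at: float


def _sign(media_key: str, expires_at: int) -> str:
    msg = f"PUT:{media_key}:{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_upload(media_key: str, expires_at: int) -> str:
    """Return a time-limited URL the client can PUT the image to directly."""
    sig = _sign(media_key, expires_at)
    return f"{MEDIA_HOST}/{media_key}?expires={expires_at}&sig={sig}"


def verify_upload(media_key: str, expires_at: int, sig: str, now: int) -> bool:
    """Check the storage edge would run before accepting the upload."""
    return now <= expires_at and hmac.compare_digest(_sign(media_key, expires_at), sig)


def create_post(posts: dict, post_id: int, author_id: int,
                media_key: str, caption: str, now: float) -> Post:
    """Called once the upload has succeeded; stores metadata only."""
    post = Post(post_id, author_id, media_key, caption, created_at=now)
    posts[post_id] = post
    return post
```

The app server only issues the signed URL and writes the small metadata row; the image bytes travel from the client to object storage and later from the CDN to readers, so they never pass through the application tier.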
### Part 2 — Feed generation: fan-out on write vs. on read
Design how a user's home feed is produced. Compare **fan-out on write** (push a new post id into each follower's precomputed feed at post time) with **fan-out on read** (gather and merge followees' recent posts at read time). Recommend an approach, justify it for a read-heavy workload, and explain how it handles celebrity accounts and how it serves high read concurrency.
```hint Push vs pull
Fan-out on write makes reads cheap (read your precomputed feed) at the cost of expensive writes; fan-out on read makes writes cheap but reads expensive. With reads ≫ writes, default to push — then patch its weakness.
```
```hint The celebrity problem
Pushing one post to 50M follower feeds is a write storm. Use a hybrid: fan-out on write for normal accounts, fan-out on read for a small set of huge accounts, and merge the two at feed-read time.
```
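The hybrid described in the hints can be sketched in memory as below. It assumes a hypothetical `CELEBRITY_THRESHOLD` (set very low so the example is easy to trace) and a hypothetical `FEED_CAP` on each precomputed feed; a production system would back these structures with a cache or key-value store and run the fan-out asynchronously.

```python
import heapq
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff; real values would be far larger
FEED_CAP = 100            # hypothetical per-user cap on the precomputed feed


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.followees = defaultdict(set)         # user_id -> author ids
        self.posts_by_author = defaultdict(list)  # author_id -> [(ts, post_id)]
        # user_id -> pushed (ts, post_id) entries; oldest entries fall off at the cap
        self.inbox = defaultdict(lambda: deque(maxlen=FEED_CAP))

    def follow(self, user_id, author_id):
        self.followers[author_id].add(user_id)
        self.followees[user_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= CELEBRITY_THRESHOLD

    def publish(self, author_id, post_id, ts):
        self.posts_by_author[author_id].append((ts, post_id))
        if self.is_celebrity(author_id):
            return  # fan-out on read: followers pull this post later
        for follower in self.followers[author_id]:
            self.inbox[follower].append((ts, post_id))  # fan-out on write

    def home_feed(self, user_id, limit=10):
        # Start from the precomputed entries, then merge in celebrity posts.
        # A set of (ts, post_id) pairs drops duplicates for accounts that
        # crossed the threshold after some posts were already pushed.
        candidates = set(self.inbox[user_id])
        for author_id in self.followees[user_id]:
            if self.is_celebrity(author_id):
                candidates.update(self.posts_by_author[author_id][-limit:])
        newest_first = heapq.nlargest(limit, candidates)
        return [post_id for _ts, post_id in newest_first]
```

With this shape, a celebrity post costs one write at publish time, and a feed read costs one lookup of the precomputed feed plus one small read per celebrity followed.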
### Part 3 — Scaling: partitioning, media delivery, and recommendations
Scale the system. Cover how the data and feed stores are partitioned across many nodes (and why **consistent hashing** is used to add/remove nodes with minimal reshuffling), how images are stored and delivered globally, replication for read scale and availability, and a first-cut design for the friend/follow recommendation feature.
```hint Consistent hashing
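Sharding by `user_id` with a simple `hash % N` remaps almost every key when `N` changes. Consistent hashing moves only about 1/N of the keys when a node is added or removed, and virtual nodes spread each physical node's share around the ring so load stays roughly even. A single very hot key (for example, a celebrity account) still needs separate handling such as caching or replication.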
```
```hint Recommendations starting point
"People you may follow" is largely a graph problem: rank candidates by friends-of-friends (shared-connection count) plus signals like mutual follows, shared interests/affinity, and popularity; precompute offline and refresh periodically.
```
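The shared-connection signal alone can be sketched as a two-hop walk over the follow graph. The function name `recommend` and the in-memory `follows` dictionary are assumptions for illustration; at scale this would be an offline batch job over the sharded graph, and the count would be only one feature among the other signals listed above.

```python
from collections import Counter


def recommend(follows: dict, user: str, k: int = 5) -> list:
    """Rank accounts the user does not yet follow by how many of the
    user's followees already follow them."""
    already = follows.get(user, set())
    counts = Counter()
    for followee in already:
        for candidate in follows.get(followee, set()):
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest shared-connection count first; name breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


follows = {
    "ann": {"bob", "cat"},
    "bob": {"dan", "eve", "ann"},
    "cat": {"dan", "bob"},
    "dan": set(),
}
print(recommend(follows, "ann"))  # [('dan', 2), ('eve', 1)]
```

Here `dan` ranks first because both accounts `ann` follows also follow `dan`, while `eve` is reachable through only one of them.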
### Follow-up Questions
- A user with 5,000 followees opens their feed — how do you keep that read fast under fan-out-on-write, and what's the cost of fan-out-on-read for them?
- When a user unfollows someone, how do the already-pushed posts get removed (or do they), and what consistency guarantee do you offer?
- How would you evolve the strictly chronological feed into a ranked feed without rebuilding the whole pipeline?
- How do you keep the precomputed feed store from growing without bound (per-user feed length caps, TTLs, recomputation on read for inactive users)?
Quick Answer: This question evaluates a candidate's ability to design a large-scale, read-heavy social media backend, covering data modeling for follow graphs, media storage, and feed generation. It tests system design skills such as choosing fan-out strategies for feed delivery, handling highly skewed follower distributions, and reasoning about latency and availability trade-offs at scale.