System Design: End-to-End Web App for Interacting with a GPT-like Model
Context
You are designing a multi-tenant, browser-based SaaS application that allows users to interact with a GPT-like LLM. The app must support real-time token streaming, snapshotting conversations, search over saved artifacts, and sharing with role-based access. Assume you can call out to an external model provider and you control the rest of the stack.
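Since real-time token streaming is central to the design, here is a minimal sketch of how individual tokens could be framed as Server-Sent Events; the `event: token` name and the use of an `id` field for resume-after-reconnect are illustrative assumptions, not a fixed protocol.

```typescript
// Hypothetical SSE framing for streamed tokens (assumption: the client
// reconnects with Last-Event-ID so the server can resume from `id`).
function sseFrame(token: string, id: number): string {
  // Each SSE event: an id (enables resume), an event name, and a data
  // payload, terminated by a blank line.
  return `id: ${id}\nevent: token\ndata: ${JSON.stringify(token)}\n\n`;
}
```

JSON-encoding the payload keeps newlines inside tokens from breaking the SSE wire format.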
Requirements
- Real-time chat
  - Let users enter a prompt, submit it, establish a session/connection to the model, and stream tokens to the UI in real time.
- Snapshots
  - Save a snapshot of the current state, including conversation messages, system prompt, model/version, and tuning parameters (temperature, top_p), with timestamps.
- Search
  - Support full-text and metadata search over saved snapshots by content, tags, creator, and date.
- Sharing and access control
  - Share snapshots via links with role-based access: visibility levels are public/unlisted/team; roles include view/comment/duplicate.
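The snapshot requirement above can be sketched as a core entity; all field names here are illustrative assumptions, not a fixed schema, and `createSnapshot` is a hypothetical helper showing how the timestamp and id would be captured.

```typescript
// Assumed core entity for the "Snapshots" requirement (field names illustrative).
interface Snapshot {
  id: string;
  tenantId: string;               // multi-tenant isolation key
  creatorId: string;
  createdAt: string;              // ISO-8601 timestamp
  systemPrompt: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  model: string;                  // provider model identifier
  modelVersion: string;
  params: { temperature: number; topP: number };
  tags: string[];
  visibility: "public" | "unlisted" | "team";
}

// Hypothetical factory: freezes the live conversation state with a timestamp.
// The injectable idGen makes the function deterministic in tests.
function createSnapshot(
  partial: Omit<Snapshot, "id" | "createdAt">,
  idGen: () => string = () => crypto.randomUUID()
): Snapshot {
  return { ...partial, id: idGen(), createdAt: new Date().toISOString() };
}
```

Keeping `tenantId` on the entity itself (rather than only in a join table) simplifies row-level isolation filters later.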
Deliverables
Describe:
- High-level architecture (frontend, backend, data stores, search/indexing)
- API design (key endpoints and contracts)
- Data schemas (core entities and relationships)
- Real-time transport choice (SSE vs WebSocket), streaming backpressure, idempotency, retries, and error handling
- Authentication/authorization, multi-tenant isolation, rate limiting, auditing, cost tracking, and privacy/compliance
- Frontend state management for live streaming (pause/resume, partial updates, optimistic UI), snapshot versioning, and preventing data loss on network interruptions
- Scalability (stateless services, session affinity, caching), consistency choices, observability (logs, traces, metrics), deployment strategy, and capacity estimation
- Testing strategies (unit, contract, E2E) and how to enable offline drafts that later sync
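The retries/idempotency deliverable can be sketched as a client-side helper: retry a request with exponential backoff while reusing the same idempotency key so the backend (assumed to deduplicate on that key) never processes a submission twice. The function name and parameters are illustrative.

```typescript
// Hypothetical retry wrapper with exponential backoff. The caller generates
// one idempotency key up front and passes it on every attempt, so a retry
// after a network interruption cannot duplicate the request server-side.
async function withRetries<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastErr = err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastErr;
}
```

In practice `fn` would send the `Idempotency-Key` header (an assumed convention) with each attempt; jitter would be added to the backoff in production to avoid thundering herds.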