System Design: Input Capture, Ordering, and "What Did the User Input?"
Context
Design a client-side input SDK that runs inside a desktop application process (Windows/macOS/Linux). It must capture keyboard and mouse input from multiple devices, reconstruct what the user typed (including IME/composition), and expose APIs for state queries, replay, and persistence.
Assume the app has access to OS event streams (e.g., Raw Input on Windows, Quartz Event Services on macOS, evdev on Linux) and to the text input pipeline (key events, text/IME composition events). The SDK should add minimal latency and be thread-safe.
Requirements
- Input coverage
  - Keyboard: key press/release, modifiers, auto-repeat, dead keys.
  - Text input: IME/composition (start/update/end), committed text, deletions.
  - Mouse: button press/release, movement, scroll (wheel/touch), cursor position.
  - Multi-device: multiple keyboards/mice connected simultaneously.
- Ordering and interpretation
  - Preserve total ordering across devices (global timeline) with stable sequencing.
  - Debounce noisy hardware for keys/buttons.
  - Detect double-clicks with configurable time/position thresholds.
  - Reconstruct the typed text reliably across locales and IMEs.
- APIs
  - Query current state: which keys/modifiers are down, pointer state/position, active composition, last-click metadata.
  - Query derived text: "what did the user type" over a time/window or session.
  - Replay inputs deterministically.
  - Persist sessions and reload.
- Engineering
  - Specify data models (event schemas), classes/interfaces.
  - Event queues, buffering strategy, backpressure.
  - Concurrency and thread-safety model.
  - Latency targets and non-functional goals.
  - Failure handling and recovery.
Provide a detailed design covering the above, with clear assumptions and trade-offs.
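As one illustration of the ordering and double-click requirements above, here is a minimal Python sketch, not a reference answer. Every name in it (InputEvent, Sequencer, DoubleClickDetector, the field names, and the 500 ms / 4 px defaults) is an assumption made for the example. It stamps events from any device with a single global sequence number under a lock, and flags a double-click when two presses fall within configurable time and distance thresholds.

```python
import itertools
import math
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputEvent:
    """Hypothetical event record; field names are illustrative only."""
    seq: int             # global sequence number assigned at ingestion
    device_id: str       # which keyboard/mouse produced the event
    kind: str            # e.g. "key_down", "mouse_down" (assumed labels)
    timestamp_ms: float  # capture time in milliseconds
    x: float = 0.0       # pointer position, used by mouse events
    y: float = 0.0


class Sequencer:
    """Gives every event, from any device or thread, one global order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count(1)

    def stamp(self, device_id: str, kind: str, timestamp_ms: float,
              x: float = 0.0, y: float = 0.0) -> InputEvent:
        # The lock makes "take next number" atomic across producer threads.
        with self._lock:
            seq = next(self._counter)
        return InputEvent(seq, device_id, kind, timestamp_ms, x, y)


class DoubleClickDetector:
    """Pairs presses by time and distance. Simplification: the button and
    the originating device are not compared in this sketch."""

    def __init__(self, max_interval_ms: float = 500.0,
                 max_distance_px: float = 4.0) -> None:
        self.max_interval_ms = max_interval_ms
        self.max_distance_px = max_distance_px
        self._last: Optional[InputEvent] = None

    def on_mouse_down(self, event: InputEvent) -> bool:
        last = self._last
        is_double = (
            last is not None
            and event.timestamp_ms - last.timestamp_ms <= self.max_interval_ms
            and math.hypot(event.x - last.x, event.y - last.y)
            <= self.max_distance_px
        )
        # After a double-click, forget the press so a third click starts over.
        self._last = None if is_double else event
        return is_double


sequencer = Sequencer()
detector = DoubleClickDetector()
presses = [
    sequencer.stamp("mouse-1", "mouse_down", 1000.0, 10, 10),
    sequencer.stamp("mouse-1", "mouse_down", 1200.0, 11, 10),
    sequencer.stamp("mouse-1", "mouse_down", 1300.0, 11, 10),
]
results = [detector.on_mouse_down(p) for p in presses]
```

A single locked counter is the simplest way to get a stable total order, at the cost of one contention point on the ingestion path; a full design would weigh that against per-device queues merged by timestamp.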