You are designing a lightweight load balancer for a Python-based backend service that dispatches tasks to a pool of worker processes.
Describe how you would design the load balancer with the following requirements:
- Worker State Machine
  - Each worker can be in states such as `IDLE`, `BUSY`, `FAILED`, `DRAINING`, etc.
  - The load balancer must track each worker's state and only assign new tasks to eligible workers.
  - State transitions should be well-defined (e.g., `IDLE -> BUSY -> IDLE`, `BUSY -> FAILED`, etc.); a sketch of one possible encoding follows below.
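To make the state-machine requirement concrete, here is a minimal sketch of one possible encoding. Only the state names and the `Worker` class come from the requirements above; the `WorkerState` enum, the `ALLOWED_TRANSITIONS` table, and the `eligible` property are illustrative names chosen for this example.

```python
from enum import Enum, auto


class WorkerState(Enum):
    """Worker states tracked by the load balancer."""
    IDLE = auto()
    BUSY = auto()
    FAILED = auto()
    DRAINING = auto()


# Legal transitions; anything not listed here is rejected.
ALLOWED_TRANSITIONS = {
    WorkerState.IDLE: {WorkerState.BUSY, WorkerState.DRAINING, WorkerState.FAILED},
    WorkerState.BUSY: {WorkerState.IDLE, WorkerState.FAILED, WorkerState.DRAINING},
    WorkerState.FAILED: {WorkerState.IDLE},      # e.g., after a successful health check
    WorkerState.DRAINING: {WorkerState.FAILED},  # may still fail mid-drain; otherwise it is removed from the pool
}


class Worker:
    def __init__(self, worker_id: str):
        self.worker_id = worker_id
        self.state = WorkerState.IDLE

    def transition(self, new_state: WorkerState) -> None:
        """Move to new_state, raising if the transition is not well-defined."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    @property
    def eligible(self) -> bool:
        """Only IDLE workers may receive new tasks."""
        return self.state is WorkerState.IDLE


if __name__ == "__main__":
    w = Worker("worker-1")
    w.transition(WorkerState.BUSY)
    w.transition(WorkerState.IDLE)
    print(w.worker_id, w.state.name, "eligible:", w.eligible)
```

Keeping the legal transitions in one table makes every state change easy to validate, log, and unit-test in a single place.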
- Task Dispatching with a Priority Queue
  - Incoming tasks have priorities (e.g., higher-priority tasks should be processed first).
  - Use a priority queue (or similar) so that the dispatcher always assigns the highest-priority available task to a suitable worker.
  - Handle the case where tasks may expire or time out if not processed within a deadline (see the queue sketch below).
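As one way to read the dispatching requirement, the sketch below uses Python's standard `heapq` module as the priority queue and drops expired tasks lazily at pop time. The `Task` name comes from the prompt; `TaskQueue`, `pop_ready`, and the lower-number-means-higher-priority convention are assumptions made for this example.

```python
import heapq
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    priority: int    # lower number = higher priority in this sketch
    deadline: float  # absolute time.monotonic() value after which the task expires
    payload: dict = field(default_factory=dict)


class TaskQueue:
    """Min-heap keyed on (priority, insertion order); expired tasks are skipped on pop."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps equal priorities FIFO

    def push(self, task: Task) -> None:
        heapq.heappush(self._heap, (task.priority, next(self._counter), task))

    def pop_ready(self):
        """Return the highest-priority unexpired task, or None if nothing is available."""
        now = time.monotonic()
        while self._heap:
            _, _, task = heapq.heappop(self._heap)
            if task.deadline > now:
                return task
            # Expired: drop it here; a real system might record it or send it to a dead-letter queue.
        return None


if __name__ == "__main__":
    q = TaskQueue()
    q.push(Task("t-low", priority=5, deadline=time.monotonic() + 30))
    q.push(Task("t-high", priority=1, deadline=time.monotonic() + 30))
    q.push(Task("t-expired", priority=0, deadline=time.monotonic() - 1))
    print(q.pop_ready().task_id)  # prints "t-high": t-expired is discarded, t-high outranks t-low
```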
- Dynamic Scaling (Scale Up / Scale Down)
  - The system should automatically scale out (add workers) when load increases and scale in (remove workers) when load decreases.
  - Explain what metrics you would monitor (e.g., queue length, task latency, worker utilization) and how they drive scaling decisions.
  - Describe how to safely drain and remove workers without losing or duplicating tasks (a possible scaling rule is sketched below).
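A scaling policy can be kept as simple as a pure function over a metrics snapshot, which makes it easy to test and tune. Everything below (`PoolMetrics`, `scaling_decision`, the particular thresholds) is illustrative rather than prescribed; safe scale-in would be done by moving the chosen worker to a draining state so it finishes its current task but receives nothing new.

```python
from dataclasses import dataclass


@dataclass
class PoolMetrics:
    queue_length: int        # tasks currently waiting in the priority queue
    busy_workers: int
    total_workers: int
    avg_wait_seconds: float  # how long tasks sit in the queue before dispatch


def scaling_decision(m: PoolMetrics,
                     min_workers: int = 2,
                     max_workers: int = 32,
                     target_utilization: float = 0.7,
                     max_wait_seconds: float = 5.0) -> int:
    """Return how many workers to add (positive), drain (negative), or 0 for no change."""
    utilization = m.busy_workers / m.total_workers if m.total_workers else 1.0

    # Scale out when tasks are queuing up or the pool is saturated.
    if (m.avg_wait_seconds > max_wait_seconds or utilization > 0.9) and m.total_workers < max_workers:
        return min(max_workers - m.total_workers, max(1, m.queue_length // 4))

    # Scale in when utilization is well below target and nothing is waiting.
    if utilization < target_utilization / 2 and m.queue_length == 0 and m.total_workers > min_workers:
        return -1  # drain one worker at a time to avoid oscillation

    return 0


if __name__ == "__main__":
    overloaded = PoolMetrics(queue_length=40, busy_workers=8, total_workers=8, avg_wait_seconds=12.0)
    quiet = PoolMetrics(queue_length=0, busy_workers=1, total_workers=8, avg_wait_seconds=0.1)
    print(scaling_decision(overloaded))  # positive: add workers
    print(scaling_decision(quiet))       # -1: drain one worker
```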
- Timeouts and Reliability
  - If a worker does not complete a task within a configured timeout, the task should be retried or reassigned.
  - Workers can fail or become unreachable; the load balancer must detect this and transition their state appropriately.
  - Ensure at-least-once processing of tasks while minimizing duplicate processing (see the timeout-tracking sketch below).
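For the timeout and at-least-once requirements, one hedged sketch is a tracker that records every dispatch and reports tasks whose deadline has passed, so the scheduler can re-enqueue them and mark the offending worker failed. The names (`TimeoutTracker`, `InFlight`, `record_dispatch`) are invented for this example; duplicates are minimized by ignoring completions for tasks that are no longer tracked, and truly duplicate side effects would need idempotent task handlers.

```python
import time
from dataclasses import dataclass


@dataclass
class InFlight:
    task_id: str
    worker_id: str
    started_at: float
    attempts: int = 1


class TimeoutTracker:
    """Detects stuck tasks so they can be reassigned (at-least-once semantics)."""

    def __init__(self, task_timeout_seconds: float = 30.0, max_attempts: int = 3):
        self.task_timeout = task_timeout_seconds
        self.max_attempts = max_attempts
        self._in_flight = {}  # task_id -> InFlight record

    def record_dispatch(self, task_id: str, worker_id: str, attempts: int = 1) -> None:
        self._in_flight[task_id] = InFlight(task_id, worker_id, time.monotonic(), attempts)

    def record_completion(self, task_id: str) -> None:
        # Late or duplicate completions are simply ignored: the first result wins.
        self._in_flight.pop(task_id, None)

    def expired(self) -> list:
        """Tasks whose worker missed the deadline. The caller re-enqueues each one
        (or drops it after max_attempts) and marks the worker as failed."""
        now = time.monotonic()
        stuck = [rec for rec in self._in_flight.values()
                 if now - rec.started_at > self.task_timeout]
        for rec in stuck:
            del self._in_flight[rec.task_id]
        return stuck


if __name__ == "__main__":
    tracker = TimeoutTracker(task_timeout_seconds=0.01)
    tracker.record_dispatch("t-1", "worker-3")
    time.sleep(0.05)
    for rec in tracker.expired():
        print(f"reassigning {rec.task_id} (attempt {rec.attempts + 1}), worker {rec.worker_id} marked FAILED")
```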
- Implementation Considerations
  - Assume this system will be implemented in Python.
  - Discuss the core components/classes you would define (e.g., `Worker`, `Task`, `Scheduler`, a `PriorityQueue` abstraction).
  - Explain the data structures used to track workers, their states, and the tasks in the queue.
  - Clarify how concurrency is handled: threads vs. processes vs. async I/O (one possible wiring is sketched below).
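Finally, a rough sketch of how the pieces might be wired together with async I/O: a `Scheduler` (a name suggested above) that owns a worker-state table and an `asyncio.PriorityQueue` of pending work. Worker states are plain strings and execution is simulated with a sleep purely to keep the example short; a real implementation would hand tasks to worker processes behind `_run_on_worker` and layer the timeout and scaling logic from the earlier sketches on top.

```python
import asyncio
import itertools


class Scheduler:
    """Wires the pieces together: a worker-state table, a priority queue of
    pending tasks, and a dispatch loop, all owned by one asyncio event loop."""

    def __init__(self):
        self.worker_states = {}               # worker_id -> "idle" | "busy" | "failed" | "draining"
        self.queue = asyncio.PriorityQueue()  # entries: (priority, seq, task_id), lowest first
        self._seq = itertools.count()         # tie-breaker so equal priorities stay FIFO

    async def submit(self, task_id: str, priority: int) -> None:
        await self.queue.put((priority, next(self._seq), task_id))

    def _pick_idle_worker(self):
        return next((w for w, s in self.worker_states.items() if s == "idle"), None)

    async def dispatch_loop(self) -> None:
        while True:
            _priority, _, task_id = await self.queue.get()  # highest-priority task first
            worker_id = self._pick_idle_worker()
            while worker_id is None:                        # wait until a worker frees up
                await asyncio.sleep(0.05)
                worker_id = self._pick_idle_worker()
            self.worker_states[worker_id] = "busy"
            # A real system would keep a reference to this task for timeout tracking.
            asyncio.create_task(self._run_on_worker(worker_id, task_id))

    async def _run_on_worker(self, worker_id: str, task_id: str) -> None:
        # Stand-in for sending the task to a worker process and awaiting its result.
        await asyncio.sleep(0.1)
        print(f"{task_id} done on {worker_id}")
        self.worker_states[worker_id] = "idle"


async def main():
    scheduler = Scheduler()
    scheduler.worker_states = {"w1": "idle", "w2": "idle"}
    for i, prio in enumerate([5, 1, 3]):
        await scheduler.submit(f"task-{i}", prio)
    loop_task = asyncio.create_task(scheduler.dispatch_loop())
    await asyncio.sleep(0.5)  # long enough for the three demo tasks to finish
    loop_task.cancel()        # asyncio.run() gathers the cancelled task on shutdown


if __name__ == "__main__":
    asyncio.run(main())
```

Because a single event loop owns every mutable structure, this sketch needs no locks; choosing threads or multiple processes for the control plane instead would reintroduce the usual synchronization concerns.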
Explain your design end-to-end. Include how tasks enter the system, how they are scheduled and executed, how worker states are updated, and how the system remains consistent and resilient under failures and scaling events.