This question evaluates a candidate's ability to design end-to-end streaming machine learning systems, including online text preprocessing, tokenization, embedding generation, continuous model training, and low-latency classification serving.
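You are given a continuously arriving stream of text data for a classification task. Design an end-to-end machine learning system that ingests this stream, preprocesses and tokenizes the text online, generates embeddings, continuously trains the classifier as new data arrives, and serves predictions with low latency.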
Explain your choices for data preprocessing, tokenization, embedding generation, model architecture, training strategy, evaluation metrics, and deployment. Also discuss how you would handle large data volume, model updates, and consistency between training and serving.
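The sketch below is one possible starting point for an answer, written in Python under stated assumptions. It is not a reference solution. It assumes a binary task and uses the hashing trick for features, so no vocabulary has to be rebuilt as the stream evolves. A sparse logistic regression updated by per-example SGD stands in for whatever model a candidate would actually propose. All names (`preprocess`, `featurize`, `OnlineLogReg`, `classify`, `NUM_BUCKETS`) are hypothetical. The point it illustrates for training/serving consistency is that the trainer and the server call the exact same `preprocess` and `featurize` functions.

```python
import math
import re
import zlib
from collections import defaultdict

# Hypothetical size of the hashed feature space (assumption, not from the prompt).
NUM_BUCKETS = 2 ** 18

_TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize. The same function runs at training and serving time."""
    return _TOKEN_RE.findall(text.lower())


def featurize(tokens: list[str]) -> dict[int, float]:
    """Hash unigrams and bigrams into bucket counts (the hashing trick).

    zlib.crc32 is deterministic across processes. Python's built-in hash() is
    salted per process for strings, so it could map the same token to different
    buckets in the trainer and in the server.
    """
    feats: dict[int, float] = defaultdict(float)
    grams = tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]
    for g in grams:
        feats[zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS] += 1.0
    return dict(feats)


class OnlineLogReg:
    """Binary logistic regression with sparse weights, updated one example at a time."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: dict[int, float] = defaultdict(float)
        self.b = 0.0

    def predict_proba(self, feats: dict[int, float]) -> float:
        z = self.b + sum(self.w.get(i, 0.0) * v for i, v in feats.items())
        z = max(-30.0, min(30.0, z))  # clamp the logit so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats: dict[int, float], label: int) -> None:
        # For log loss, the gradient with respect to the logit is (p - y).
        g = self.predict_proba(feats) - label
        for i, v in feats.items():
            self.w[i] -= self.lr * g * v
        self.b -= self.lr * g


def classify(model: OnlineLogReg, text: str, threshold: float = 0.5) -> int:
    """Serving path: reuses preprocess/featurize so features match training."""
    return int(model.predict_proba(featurize(preprocess(text))) >= threshold)
```

In a fuller answer, hashed n-gram features could be replaced by learned embeddings or a pretrained encoder. The same rule would still apply: version the preprocessing and feature code as one artifact and deploy it to both the training and the serving path.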