Design a backend to score and classify tweets
Company: xAI
Role: Software Engineer
Category: System Design
Difficulty: Medium
Interview Round: Technical Screen
You are asked to build a small backend application that processes Twitter data and uses an external LLM-style API to score and classify each tweet.
### Requirements
1. **Data Source**
- Use a public Twitter dataset from Kaggle (assume it is provided as CSV/JSON files with fields like tweet id, user id, timestamp, text, etc.).
2. **Data Modeling and Ingestion**
- Define a strict data model for a tweet (e.g., using a schema-validation library like Pydantic in Python or an equivalent in your language of choice).
- Validate each record from the Kaggle dataset against this model.
- Ingest the validated tweets into a **local database** (you may choose a concrete technology, e.g., PostgreSQL, SQLite, or another relational/NoSQL store, but be prepared to justify your choice).
3. **Inference Worker**
- Implement an **inference worker service** that:
  - Fetches tweets from the database that have not yet been processed.
  - Calls a third-party LLM-like API (referred to as the **Grok API**) to:
    - Assign a numeric score to each tweet (e.g., sentiment score, relevance score, or toxicity score).
    - Classify each tweet into one or more categories (e.g., sentiment classes like positive/neutral/negative, or topical categories such as sports/politics/entertainment). You may define a reasonable category scheme.
  - Stores the score and classification results back into the database, associated with the corresponding tweet.
- Your worker should be robust:
  - Handle API failures, timeouts, rate limits, and retries.
  - Avoid double-processing the same tweet.
4. **Validation / Quality Check**
- Propose and implement at least one method to **validate or sanity-check** the quality of the scores and classifications. Examples:
  - Simple heuristic checks (e.g., tweets containing obvious positive/negative words vs. model sentiment).
  - Manual spot-checking tools (e.g., an endpoint or script that samples random tweets and displays their text and labels for review).
  - Aggregate statistics (e.g., distribution of sentiment scores; flag anomalies).
5. **Interface / Demo (Conceptual)**
- Assume you will record a short demo of your system. You do **not** need to design a full UI here, but you should:
  - Describe which API endpoints, CLI commands, or simple views you would expose to demonstrate the system working end-to-end (from raw Kaggle data to classified tweets stored in the DB).
6. **Non-Functional Requirements**
- Discuss how your design would handle:
  - **Scalability** if the Kaggle dataset is large (millions of tweets) or if you later switch from a one-time batch to continuous ingestion.
  - **Reliability** (e.g., preventing data loss, reprocessing after failures).
  - **Observability** (logging, metrics, and monitoring of ingestion and inference pipelines).
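
As a starting point for requirements 2 and 3, here is a minimal sketch of validated ingestion and a claim step that avoids double-processing. It assumes SQLite as the local store and uses a standard-library dataclass as a stand-in for a Pydantic model so the example has no dependencies. The field names (`tweet_id`, `user_id`, `created_at`, `text`), the `status` column, and the helpers `parse_row`, `ingest`, `claim_batch`, and `save_result` are illustrative assumptions, not part of the prompt or of any particular Kaggle dataset.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    """Strict tweet record. Field names are assumed for illustration."""
    tweet_id: int
    user_id: int
    created_at: datetime
    text: str

    def __post_init__(self):
        if self.tweet_id <= 0 or self.user_id <= 0:
            raise ValueError("ids must be positive")
        if not self.text.strip():
            raise ValueError("text must be non-empty")


def parse_row(row: dict) -> Tweet:
    """Convert one raw CSV/JSON record into a validated Tweet (raises on bad input)."""
    return Tweet(
        tweet_id=int(row["tweet_id"]),
        user_id=int(row["user_id"]),
        created_at=datetime.fromisoformat(row["created_at"]),
        text=str(row["text"]),
    )


# Inference results live on the same row here; a separate results table is an equally valid choice.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TEXT    NOT NULL,
    text       TEXT    NOT NULL,
    status     TEXT    NOT NULL DEFAULT 'pending',  -- pending | in_progress | done
    score      REAL,
    label      TEXT
);
"""


def ingest(conn, rows):
    """Insert valid rows and return (inserted, rejected). Duplicate ids are ignored."""
    inserted = rejected = 0
    for row in rows:
        try:
            t = parse_row(row)
        except (KeyError, ValueError, TypeError):
            rejected += 1
            continue
        cur = conn.execute(
            "INSERT OR IGNORE INTO tweets (tweet_id, user_id, created_at, text) "
            "VALUES (?, ?, ?, ?)",
            (t.tweet_id, t.user_id, t.created_at.isoformat(), t.text),
        )
        inserted += cur.rowcount  # 0 when the primary key already exists
    conn.commit()
    return inserted, rejected


def claim_batch(conn, limit=10):
    """Mark up to `limit` pending tweets as in_progress and return their ids."""
    claimed = []
    with conn:  # commits on success, rolls back on error
        ids = [r[0] for r in conn.execute(
            "SELECT tweet_id FROM tweets WHERE status = 'pending' "
            "ORDER BY tweet_id LIMIT ?", (limit,))]
        for tid in ids:
            # Conditional update acts as a compare-and-set: only one worker wins each row.
            cur = conn.execute(
                "UPDATE tweets SET status = 'in_progress' "
                "WHERE tweet_id = ? AND status = 'pending'", (tid,))
            if cur.rowcount == 1:
                claimed.append(tid)
    return claimed


def save_result(conn, tweet_id, score, label):
    """Store the score and label for a claimed tweet; returns True if a row was updated."""
    with conn:
        cur = conn.execute(
            "UPDATE tweets SET score = ?, label = ?, status = 'done' "
            "WHERE tweet_id = ? AND status = 'in_progress'",
            (score, label, tweet_id))
    return cur.rowcount == 1
```

The conditional `UPDATE ... WHERE status = 'pending'` is what prevents two workers from processing the same tweet. A fuller design would also need a way to reclaim rows stuck in `in_progress` after a worker crash, for example with a claim timestamp. With PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is a common alternative for the claim step.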
### What to Deliver (in the interview)
You should be prepared to:
- Describe the **overall architecture** of your system: components, their responsibilities, and data flows.
- Justify your choices of technologies (language, database, libraries, queueing mechanisms, etc.).
- Detail the **data model** for tweets and for storing inference results.
- Explain the design of the **inference worker**, including error handling and concurrency.
- Explain your **validation** approach and how you would use it in practice.
- Optionally, outline how you would extend this system if you needed to support:
  - Different scoring/classification tasks.
  - Multiple LLM/ML providers instead of just the Grok API.
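
One way to approach the multi-provider extension and the worker's retry handling is to hide each provider behind a small interface and wrap calls in exponential backoff. The sketch below is hypothetical throughout: `Provider`, `Inference`, `TransientError`, `call_with_retry`, `KeywordProvider`, and `agreement_rate` are names invented for illustration, and no real Grok API client is shown. The keyword stub also doubles as a crude heuristic baseline for the sanity check described in requirement 4.

```python
import random
import time
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Inference:
    score: float  # assumed range [-1.0, 1.0] for sentiment
    label: str    # assumed scheme: "positive" | "neutral" | "negative"


class TransientError(Exception):
    """Retryable failure, e.g. a timeout, a rate-limit response, or a server error."""


class Provider(Protocol):
    """Any scoring backend; a real API client would implement the same method."""
    def infer(self, text: str) -> Inference: ...


def call_with_retry(provider, text, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call provider.infer, retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return provider.infer(text)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller can mark the tweet as failed or requeue it
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


class KeywordProvider:
    """Stand-in provider for local testing and heuristic checks; not a real API client."""
    POSITIVE = {"love", "great", "good"}
    NEGATIVE = {"hate", "awful", "bad"}

    def infer(self, text: str) -> Inference:
        words = set(text.lower().split())
        raw = len(words & self.POSITIVE) - len(words & self.NEGATIVE)
        score = max(-1.0, min(1.0, float(raw)))
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        return Inference(score, label)


def agreement_rate(model, baseline, texts):
    """Fraction of texts where both providers assign the same label.

    This is a sanity check against a crude baseline, not an accuracy measure.
    """
    texts = list(texts)
    if not texts:
        return 0.0
    same = sum(model.infer(t).label == baseline.infer(t).label for t in texts)
    return same / len(texts)
```

Because the worker depends only on the `infer` method, swapping providers or adding a new scoring task means adding another class rather than changing the worker loop. A low agreement rate against the keyword baseline does not prove the model is wrong, but it is a cheap signal for which tweets to spot-check manually.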
Quick Answer: This question evaluates backend system design skills including data modeling and ingestion, integration with external LLM-style inference APIs, worker orchestration for scoring and classification, and operational competencies such as scalability, reliability, and observability.