Solve Five OA Coding Tasks
Company: Upstart
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
This is an online-assessment screen consisting of **five independent coding tasks**. Each task is self-contained — there is no shared state between them — and is graded against its own hidden test suite. Implement a function for every task below.
The tasks escalate in subtlety: the first three are warm-up string/list manipulations, while the last two (digit-comparison and robots-file parsing) hide most of the edge cases that decide whether you pass.
---
### Constraints & Assumptions
- Each task is graded independently; partial credit is awarded per task, so attempt all five even if one stumps you.
- Inputs are well-formed for their type (Part 1 receives a string, Part 2/3 receive lists, Part 4 receives non-negative integers, Part 5 receives a list of strings). You do not need to validate types.
- Part 3's input list is guaranteed non-empty. Coordinates may include negative or duplicate points.
- Part 4's integers are non-negative and fit in a standard 64-bit integer; `statedSum` need not equal `num1 + num2` (that is the whole point).
- Part 5's lines may contain leading/trailing whitespace, blank lines, comment-like lines, and `Disallow:`/`User-agent:` directives in any interleaving. Prefix matching (`"User-agent:"`, `"Disallow:"`) is exact and case-sensitive on the directive name.
- Aim for linear time in the size of each input; none of these tasks needs better than $O(n)$.
### Clarifying Questions to Ask
- General: Should every function **return** its result (not print it), and are there hidden tests for empty inputs (empty string, empty list, single line / single element)?
- Part 1: Should a `'.'` in the middle (or at the start/end) of a group still be replaced before the trailing `'!'` is appended, and do tabs / newlines count as whitespace separators (not just the space character)?
- Part 2: Is `0` treated as even, and may the input contain negative integers (which makes the language's modulo sign-convention relevant)?
- Part 3: Is the required output `[minX, minY, width, height]` rather than `[minX, minY, maxX, maxY]`, and what are `width` / `height` when all points coincide?
- Part 4: When `correct` and `statedSum` have different digit counts, do I pad the shorter conceptually with leading zeros and report **every** differing position — including high-order positions that exist only in the longer value?
- Part 5: Must dedup preserve first-occurrence order across the whole file, and is matching truly case-sensitive for both the directive prefixes and the URL values?
---
### Part 1 — Add drama to text
Given a string `s`, return a new string in which:
- every period `'.'` is replaced with an exclamation mark `'!'`, and
- one additional exclamation mark is appended after each **group**, where a *group* is a maximal contiguous run of non-whitespace characters.
Whitespace between groups must be preserved exactly (the kind and amount of whitespace is unchanged). Only the non-whitespace runs gain a trailing `'!'`.
Example: `"hi. there"` → `"hi!! there!"` (the first group `"hi."` becomes `"hi!"` after the period swap, then gains a trailing `!` → `"hi!!"`; the second group `"there"` gains a trailing `!`).
```hint Where to start
Tokenize while remembering the gaps. Split the string into alternating *non-whitespace runs* and *whitespace runs* so you can transform each run and then re-stitch them in order — a regex like `\S+` for groups (or a single left-to-right scan tracking "am I currently inside a group?") makes the boundaries explicit.
```
```hint Edge cases to nail
Decide the behavior for: leading/trailing whitespace, an empty string, a string that is all whitespace (no groups → no extra `!`), and consecutive whitespace. The `'.'`→`'!'` replacement applies to *every* period, including ones in the middle of a group, **before** you append the per-group `!`. A lone `"."` is a group and becomes `"!!"`.
```
#### What a Strong Answer Covers
- Tokenization that preserves whitespace runs verbatim (kind and amount) and only mutates the non-whitespace groups.
- Correct ordering of the two transforms: replace `'.'`→`'!'` across the whole group, then append exactly one trailing `'!'`.
- Boundary behavior: empty string, all-whitespace string, leading/trailing/consecutive whitespace, and a lone `"."`.
### Part 2 — Filter and reverse even numbers
Given a list of integers `nums`, remove every odd number and return the remaining even numbers in the **reverse** of their original relative order.
Example: `[1, 2, 3, 4, 6, 5]` → `[6, 4, 2]`.
```hint Approach
Two independent steps: keep only the evens, then reverse the surviving order. Watch the modulo of negatives in your language (in Python `-4 % 2 == 0`, so negatives behave; in C/Java/JS `%` can return a negative remainder, so test the result against `0` rather than against `1`, or use a bitwise parity check). Remember `0` is even.
```
#### What a Strong Answer Covers
- A correct parity test that survives negative inputs (compare `% 2 == 0`, never `== 1`, or use `(x & 1) == 0`) and treats `0` as even.
- The output is reversed relative to the original even-element order, not the full-list order.
- Empty / all-odd inputs return an empty list.
### Part 3 — Compute a bounding rectangle
Given a non-empty list of coordinates, where each coordinate is a two-element list `[x, y]`, return the four-element list `[minX, minY, width, height]` where:
- `minX` is the minimum x over all points,
- `minY` is the minimum y over all points,
- `width = maxX - minX`,
- `height = maxY - minY`.
```hint Approach
A single pass tracking `minX, maxX, minY, maxY` is enough — no need to materialize separate x/y lists. The output is `[minX, minY, maxX - minX, maxY - minY]`, **not** `[minX, minY, maxX, maxY]`; a single point (or all coincident points) yields width/height of `0`.
```
#### What a Strong Answer Covers
- The output **shape** is `[minX, minY, width, height]`, where width/height are spans (differences), not the max coordinates.
- A single-pass extrema scan with correct initialization (the non-empty guarantee makes the first point a safe seed).
- Negative, duplicate, and coincident points (zero-area rectangle) handled correctly.
### Part 4 — Find incorrect digit positions in a sum
Given three non-negative integers `num1`, `num2`, and `statedSum`:
- Compute the correct value `correct = num1 + num2`.
- Compare `statedSum` against `correct` **digit by digit, from the ones place upward**: index `0` is the ones digit, index `1` the tens, index `2` the hundreds, and so on.
- Return the list of zero-based indexes where the two digits differ (in ascending order).
- The two values may have different digit lengths. Treat any missing higher-order digit (a position present in one value but past the most-significant digit of the other) as `0`, and keep comparing until **every** digit position of **both** `correct` and `statedSum` has been examined.
Example: `num1 = 99`, `num2 = 1`, `statedSum = 190` → `correct = 100`. Comparing per position: index 0 → `0` vs `0` (match), index 1 → `0` vs `9` (differ), index 2 → `1` vs `1` (match). Result: `[1]`.
```hint Where to start
Don't convert to strings and zip — string indexing runs most-significant-digit-first, but the index here counts from the *ones* place. Think about how to read out the ones digit, then the tens, then the hundreds, of each number directly (peel with `% 10` and `// 10`), so position `0` is the first thing you compare.
```
```hint Loop bound
The two values can have different digit counts, so ask yourself when the comparison is actually finished — stopping as soon as the *shorter* number runs out silently drops real mismatches in the leading digits of the longer one (e.g. `correct = 100` vs `statedSum = 5` differs at index 2, the hundreds place, which you only see if you don't quit early). Treat a position that exists in one number but not the other as a digit `0` there. Also sanity-check the all-zero case `correct == statedSum == 0`.
```
#### What a Strong Answer Covers
- Comparison anchored at the **ones** place (index 0), not at the most-significant digit — i.e. it does not string-zip from the left.
- The loop runs to the **longer** of the two values, substituting `0` for the exhausted number's missing high-order digits.
- The all-equal case returns `[]`, the all-zero case returns `[]`, and results come out in ascending index order.
### Part 5 — Extract disallowed URLs for bot user agents
Given `lines`, a list of strings that together form the contents of a robots-style text file (one element per line), extract disallowed URLs as follows:
- The file is divided into **sections**. A new section begins at any line that starts with the exact prefix `"User-agent:"`. The remainder of that line (after the prefix), trimmed of surrounding spaces, is the section's **user-agent value**.
- A section is **relevant** if its user-agent value either is exactly `"*"` **or** ends with the exact suffix `"Bot"`.
- Inside a relevant section, for each line that starts with the exact prefix `"Disallow:"`, collect the URL value — the remainder after the prefix, trimmed of surrounding spaces.
- A section's scope ends when the next `"User-agent:"` line begins or the file ends. Lines belonging to a non-relevant section are ignored, as is any `"Disallow:"` that appears before the first `"User-agent:"` line.
- Return all collected URLs with duplicates removed, **preserving the order of first occurrence**. Matching is **case-sensitive**, so `"/A"` and `"/a"` are distinct URLs.
```hint State machine
This is a one-pass scan where your state is "am I currently inside a section I care about?". Each `"User-agent:"` line opens a new section, so it's the moment to re-evaluate that state from the section's value; each `"Disallow:"` line is only interesting while the state says the current section qualifies. Be careful applying the relevance rule precisely: `"*Bot"` ends with `"Bot"` but `"Bot*"` does not, and `"*"` must match the whole value, not merely contain it.
```
```hint Dedup with order
A plain `set` removes duplicates but loses order, which the output requires. You need a way to skip a URL you've already emitted while still keeping each URL's *first* appearance in order (e.g. a `seen` set paired with a result list, or `list(dict.fromkeys(...))`). Because matching is case-sensitive, do **not** lowercase anything before comparing or storing.
```
#### What a Strong Answer Covers
- A relevance flag recomputed on **every** `User-agent:` line, so a non-bot section suppresses its `Disallow:` lines and the next qualifying section re-enables harvesting.
- Precise relevance matching: exact `"*"` and exact `"Bot"` **suffix**, both case-sensitive; pre-first-section `Disallow:` ignored.
- Order-preserving, case-sensitive dedup of the collected URLs.
---
### What a Strong Answer Covers
These dimensions span all five parts.
- **Correctness on the stated examples** for all five tasks, plus the non-obvious edge cases each task hides (empty/whitespace-only input, single element, all-odd list, coincident points, length-mismatched numbers, interleaved/irrelevant robots sections).
- **The two decisive edge cases** that most often separate full from partial credit: Part 4 driving the loop on the *longer* of the two values (not the shorter), and Part 5 preserving first-occurrence order while staying case-sensitive.
- **Linear-time, single-pass** implementations where natural, with clear variable names and no quadratic re-scans.
- **Defensive handling** of language-specific gotchas (negative modulo, off-by-one in digit indexing, regex vs. manual tokenization for groups).
- **Self-tests / dry-runs** on the provided examples to catch mistakes before submitting.
### Follow-up Questions
- Part 1: How would you generalize "drama" so the appended punctuation and the replaced character are configurable, and how would Unicode whitespace (full-width spaces, non-breaking spaces) change your tokenizer?
- Part 4: If `num1`, `num2`, and `statedSum` were given as decimal *strings* far exceeding 64-bit range, how would your approach change, and how would you handle a comparison in a non-decimal base?
- Part 5: Real `robots.txt` allows `Allow:` directives, longest-match precedence, wildcards (`*`, `$`), and grouping where consecutive `User-agent` lines share one rule block. How would you extend the parser, and how would you decide whether a given URL is crawlable for a specific agent?
- General: How would you structure these five functions and their tests so the suite runs fast and each failure points unambiguously at the responsible task?
Quick Answer: This multi-part problem evaluates programming fundamentals including string manipulation, list filtering and reversal, parsing, and simple geometric computations, with emphasis on correctness, edge-case handling, and implementation efficiency.