Implement classes within an abstract Python framework
Company: Bloomberg
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's skills in object-oriented design, implementing concrete classes from abstract interfaces, file I/O and streaming data transformations, error handling, and integration with a registry and CLI within a Python codebase.
Constraints
- 0 <= len(rows) <= 100000
- 1 <= len(header) <= 50
- 1 <= chunk_size <= 10000
- 0 <= max_retries <= 5
- The normalized header contains the keys `id`, `name`, and `age`
- Normalized header names are unique
Examples
Input: ([' ID ', ' Name ', 'Age', ' Email '], [['1', ' Alice ', '30', 'ALICE@EXAMPLE.COM'], ['x2', 'Bob', '22', 'bob@example.com'], ['3', ' Carol Danvers ', '41', ''], ['4', ' ', '28', 'dave@example.com'], ['5', 'Eve', '0', 'EVE@EXAMPLE.COM']], 2, 1)
Expected Output: (['{"id":1,"name":"Alice","age":30,"email":"alice@example.com"}', '{"id":3,"name":"Carol Danvers","age":41,"email":null}', '{"id":5,"name":"Eve","age":0,"email":"eve@example.com"}'], [1, 3])
Explanation: Rows 1 and 3 fail because `id` is not an integer and `name` becomes empty after trimming. The other rows are normalized and emitted as compact JSON.
Input: (['id', 'name', 'age'], [], 3, 2)
Expected Output: ([], [])
Explanation: No data rows means no output lines and no failures.
Input: ([' ID ', ' Name ', ' AGE ', ' City ', ' Email '], [['7', ' Frank ', '52', ' New York ', ' FRANK@EXAMPLE.COM '], ['8', 'Grace', '33', ' ', ''], ['9', 'Henry']], 2, 0)
Expected Output: (['{"id":7,"name":"Frank","age":52,"city":"New York","email":"frank@example.com"}', '{"id":8,"name":"Grace","age":33,"city":null,"email":null}'], [2])
Explanation: The first two rows succeed. In row 2, `city` and `email` become `None` after trimming. The last row has the wrong number of columns and fails immediately.
Input: ([' ID', 'Name', ' Age ', ' Extra Note '], [['10', ' Ivy Jones ', '-1', ' hello '], ['11', 'Jack', '27', ' keep spaces inside ']], 1, 2)
Expected Output: (['{"id":11,"name":"Jack","age":27,"extra_note":"keep spaces inside"}'], [0])
Explanation: The first row fails because age is negative. The second row succeeds, and the optional `extra_note` field is trimmed but its internal multiple spaces are preserved.
Solution
def solution(headers, rows, chunk_size, max_retries):
import json
import re
if chunk_size <= 0:
raise ValueError("chunk_size must be positive")
if max_retries < 0:
raise ValueError("max_retries must be non-negative")
space_re = re.compile(r"\s+")
def normalize_header(header):
return space_re.sub("_", header.strip().lower())
normalized_headers = [normalize_header(h) for h in headers]
output_lines = []
failed_rows = []
attempts_per_row = max_retries + 1
for start in range(0, len(rows), chunk_size):
chunk = rows[start:start + chunk_size]
for offset, row in enumerate(chunk):
row_index = start + offset
success = False
for _ in range(attempts_per_row):
try:
if len(row) != len(normalized_headers):
raise ValueError("column count mismatch")
record = {}
for key, raw_value in zip(normalized_headers, row):
value = raw_value.strip()
if key == "id":
record[key] = int(value)
elif key == "name":
name = space_re.sub(" ", value)
if not name:
raise ValueError("empty name")
record[key] = name
elif key == "age":
age = int(value)
if age < 0:
raise ValueError("negative age")
record[key] = age
else:
if value == "":
record[key] = None
elif key == "email":
record[key] = value.lower()
else:
record[key] = value
output_lines.append(json.dumps(record, separators=(",", ":")))
success = True
break
except (ValueError, TypeError):
continue
if not success:
failed_rows.append(row_index)
return (output_lines, failed_rows)Time complexity: O(r * c * (max_retries + 1)), where r is the number of rows and c is the number of columns. Space complexity: O(c) auxiliary space, excluding the returned output.
Hints
- Precompute the normalized column names once, then use a helper to validate and transform a single row.
- Process rows chunk by chunk, but keep the original row index so failed rows can be logged correctly.