How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Bloomberg.

What role is this question designed for?

This question is commonly asked for Data Engineer candidates at Bloomberg during technical interviews.

Implement classes within an abstract Python framework

Quick Overview

This question evaluates a candidate's skills in object-oriented design, implementing concrete classes from abstract interfaces, file I/O and streaming data transformations, error handling, and integration with a registry and CLI within a Python codebase.

Company: Bloomberg

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given an existing Python codebase (~200 lines shown) that defines an abstract base class DataProcessor with abstract methods load(self), transform(self, record), and save(self, records); a registry/factory that instantiates processors by name; and a CLI that wires them together. Without editing the abstractions, implement a concrete CsvToJsonProcessor that: ( 1) reads CSV files in chunks; ( 2) transforms each row to a normalized JSON object; ( 3) writes line-delimited JSON to an output path; ( 4) handles bad rows with a retry/logging policy; and ( 5) integrates with the registry so the CLI can invoke it by the name 'csv_to_json'. Provide the class implementation outline and explain the control flow through the existing template methods.

Quick Answer: This question evaluates a candidate's skills in object-oriented design, implementing concrete classes from abstract interfaces, file I/O and streaming data transformations, error handling, and integration with a registry and CLI within a Python codebase.

You are working inside an existing Python framework that already defines an abstract base class `DataProcessor` with template methods `load`, `transform`, and `save`, plus a registry/factory and CLI wiring. Your task is to implement the core behavior of a concrete processor named `csv_to_json`. To keep the problem self-contained and testable, you will write a function that simulates what the concrete `CsvToJsonProcessor` would do: - `load`: read CSV records in contiguous chunks of size `chunk_size` - `transform`: normalize one row at a time into a JSON-ready object - `save`: emit line-delimited JSON records in the same order as successful rows - bad rows: retry transformation up to `max_retries` times after the first failure; if the row still fails, log it by recording its 0-based row index and skip it Normalization rules: 1. Normalize each header by trimming whitespace, lowercasing, and replacing runs of spaces with a single underscore. 2. The normalized header is guaranteed to contain `id`, `name`, and `age`, and normalized headers are unique. 3. A row is bad if its column count does not match the header length. 4. Required fields: `id`, `name`, `age`. - `id`: convert to integer - `name`: trim and collapse internal whitespace to a single space; it must not be empty - `age`: convert to integer and it must be non-negative 5. Optional fields: - empty string becomes `None` - `email`, if present and non-empty, is lowercased - all other optional fields are trimmed but otherwise unchanged 6. Output each successful record as a compact JSON string (one JSON object per line conceptually). Return a tuple `(output_lines, failed_rows)` where: - `output_lines` is a list of JSON strings for successfully processed rows - `failed_rows` is a list of 0-based row indices that still failed after all retries This captures the control flow of the framework's template methods without requiring actual file I/O.

Constraints

0 <= len(rows) <= 100000
1 <= len(header) <= 50
1 <= chunk_size <= 10000
0 <= max_retries <= 5
The normalized header contains the keys `id`, `name`, and `age`
Normalized header names are unique

Examples

Input: ([' ID ', ' Name ', 'Age', ' Email '], [['1', ' Alice ', '30', 'ALICE@EXAMPLE.COM'], ['x2', 'Bob', '22', 'bob@example.com'], ['3', ' Carol Danvers ', '41', ''], ['4', ' ', '28', 'dave@example.com'], ['5', 'Eve', '0', 'EVE@EXAMPLE.COM']], 2, 1)

Expected Output: (['{"id":1,"name":"Alice","age":30,"email":"alice@example.com"}', '{"id":3,"name":"Carol Danvers","age":41,"email":null}', '{"id":5,"name":"Eve","age":0,"email":"eve@example.com"}'], [1, 3])

Explanation: Rows 1 and 3 fail because `id` is not an integer and `name` becomes empty after trimming. The other rows are normalized and emitted as compact JSON.

Input: (['id', 'name', 'age'], [], 3, 2)

Expected Output: ([], [])

Explanation: No data rows means no output lines and no failures.

Input: ([' ID ', ' Name ', ' AGE ', ' City ', ' Email '], [['7', ' Frank ', '52', ' New York ', ' FRANK@EXAMPLE.COM '], ['8', 'Grace', '33', ' ', ''], ['9', 'Henry']], 2, 0)

Expected Output: (['{"id":7,"name":"Frank","age":52,"city":"New York","email":"frank@example.com"}', '{"id":8,"name":"Grace","age":33,"city":null,"email":null}'], [2])

Explanation: The first two rows succeed. In row 2, `city` and `email` become `None` after trimming. The last row has the wrong number of columns and fails immediately.

Input: ([' ID', 'Name', ' Age ', ' Extra Note '], [['10', ' Ivy Jones ', '-1', ' hello '], ['11', 'Jack', '27', ' keep spaces inside ']], 1, 2)

Expected Output: (['{"id":11,"name":"Jack","age":27,"extra_note":"keep spaces inside"}'], [0])

Explanation: The first row fails because age is negative. The second row succeeds, and the optional `extra_note` field is trimmed but its internal multiple spaces are preserved.

Solution

def solution(headers, rows, chunk_size, max_retries):
    import json
    import re

    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")

    space_re = re.compile(r"\s+")

    def normalize_header(header):
        return space_re.sub("_", header.strip().lower())

    normalized_headers = [normalize_header(h) for h in headers]
    output_lines = []
    failed_rows = []
    attempts_per_row = max_retries + 1

    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        for offset, row in enumerate(chunk):
            row_index = start + offset
            success = False

            for _ in range(attempts_per_row):
                try:
                    if len(row) != len(normalized_headers):
                        raise ValueError("column count mismatch")

                    record = {}
                    for key, raw_value in zip(normalized_headers, row):
                        value = raw_value.strip()

                        if key == "id":
                            record[key] = int(value)
                        elif key == "name":
                            name = space_re.sub(" ", value)
                            if not name:
                                raise ValueError("empty name")
                            record[key] = name
                        elif key == "age":
                            age = int(value)
                            if age < 0:
                                raise ValueError("negative age")
                            record[key] = age
                        else:
                            if value == "":
                                record[key] = None
                            elif key == "email":
                                record[key] = value.lower()
                            else:
                                record[key] = value

                    output_lines.append(json.dumps(record, separators=(",", ":")))
                    success = True
                    break
                except (ValueError, TypeError):
                    continue

            if not success:
                failed_rows.append(row_index)

    return (output_lines, failed_rows)

Time complexity: O(r * c * (max_retries + 1)), where r is the number of rows and c is the number of columns. Space complexity: O(c) auxiliary space, excluding the returned output.

Hints

Precompute the normalized column names once, then use a helper to validate and transform a single row.
Process rows chunk by chunk, but keep the original row index so failed rows can be logged correctly.

Quick Overview