PracHub

Quick Overview

This question evaluates a candidate's skills in object-oriented design, implementing concrete classes from abstract interfaces, file I/O and streaming data transformations, error handling, and integration with a registry and CLI within a Python codebase.

  • Medium
  • Bloomberg
  • Coding & Algorithms
  • Data Engineer

Implement classes within an abstract Python framework

Company: Bloomberg

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given an existing Python codebase (~200 lines shown) that defines an abstract base class DataProcessor with abstract methods load(self), transform(self, record), and save(self, records); a registry/factory that instantiates processors by name; and a CLI that wires them together. Without editing the abstractions, implement a concrete CsvToJsonProcessor that: (1) reads CSV files in chunks; (2) transforms each row to a normalized JSON object; (3) writes line-delimited JSON to an output path; (4) handles bad rows with a retry/logging policy; and (5) integrates with the registry so the CLI can invoke it by the name 'csv_to_json'. Provide the class implementation outline and explain the control flow through the existing template methods.
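The framework code itself is not reproduced here, so the outline below is only a sketch under stated assumptions. The `DataProcessor` base class, its `run` template method, the `REGISTRY` dict, and the `register` decorator are all hypothetical stand-ins for whatever the real codebase provides. The sketch assumes well-formed rows and leaves out the retry/logging policy to stay short.

```python
import csv
import json
from abc import ABC, abstractmethod
from itertools import islice


class DataProcessor(ABC):
    """Hypothetical stand-in for the framework's abstract base class."""

    @abstractmethod
    def load(self): ...

    @abstractmethod
    def transform(self, record): ...

    @abstractmethod
    def save(self, records): ...

    def run(self):
        # Assumed template method: load -> transform -> save, lazily.
        self.save(self.transform(r) for r in self.load())


REGISTRY = {}  # hypothetical name -> class registry


def register(name):
    """Hypothetical decorator that records a processor class under a name."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("csv_to_json")
class CsvToJsonProcessor(DataProcessor):
    def __init__(self, in_path, out_path, chunk_size=1000):
        self.in_path = in_path
        self.out_path = out_path
        self.chunk_size = chunk_size

    def load(self):
        # Read at most chunk_size rows at a time so memory stays bounded.
        with open(self.in_path, newline="") as f:
            reader = csv.DictReader(f)
            while True:
                chunk = list(islice(reader, self.chunk_size))
                if not chunk:
                    break
                yield from chunk

    def transform(self, record):
        # Placeholder normalization: trimmed, lowercased keys and trimmed values.
        return {k.strip().lower(): v.strip() for k, v in record.items()}

    def save(self, records):
        # One compact JSON object per line (line-delimited JSON).
        with open(self.out_path, "w") as f:
            for rec in records:
                f.write(json.dumps(rec, separators=(",", ":")) + "\n")
```

Under these assumptions a CLI would look up `REGISTRY["csv_to_json"]`, construct the processor with input and output paths, and call `run()`. Because `run` passes a generator to `save`, rows flow through `load`, `transform`, and `save` one at a time rather than being materialized all at once.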


You are working inside an existing Python framework that already defines an abstract base class `DataProcessor` with template methods `load`, `transform`, and `save`, plus a registry/factory and CLI wiring. Your task is to implement the core behavior of a concrete processor named `csv_to_json`.

To keep the problem self-contained and testable, you will write a function that simulates what the concrete `CsvToJsonProcessor` would do:

  • `load`: read CSV records in contiguous chunks of size `chunk_size`
  • `transform`: normalize one row at a time into a JSON-ready object
  • `save`: emit line-delimited JSON records in the same order as successful rows
  • bad rows: retry transformation up to `max_retries` times after the first failure; if the row still fails, log it by recording its 0-based row index and skip it

Normalization rules:

  1. Normalize each header by trimming whitespace, lowercasing, and replacing runs of spaces with a single underscore.
  2. The normalized header is guaranteed to contain `id`, `name`, and `age`, and normalized headers are unique.
  3. A row is bad if its column count does not match the header length.
  4. Required fields: `id`, `name`, `age`.
       • `id`: convert to integer
       • `name`: trim and collapse internal whitespace to a single space; it must not be empty
       • `age`: convert to integer and it must be non-negative
  5. Optional fields:
       • empty string becomes `None`
       • `email`, if present and non-empty, is lowercased
       • all other optional fields are trimmed but otherwise unchanged
  6. Output each successful record as a compact JSON string (one JSON object per line conceptually).

Return a tuple `(output_lines, failed_rows)` where:

  • `output_lines` is a list of JSON strings for successfully processed rows
  • `failed_rows` is a list of 0-based row indices that still failed after all retries

This captures the control flow of the framework's template methods without requiring actual file I/O.
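The header and optional-field rules can be sketched as two small helpers. The names `normalize_header` and `normalize_optional` are illustrative choices, and the regex treats any whitespace run (not only spaces) as a separator, which is an assumption that goes slightly beyond the wording of rule 1.

```python
import re

_WS = re.compile(r"\s+")  # assumption: tabs count as "spaces" too


def normalize_header(name: str) -> str:
    """Trim, lowercase, and replace whitespace runs with one underscore."""
    return _WS.sub("_", name.strip().lower())


def normalize_optional(key: str, raw: str):
    """Apply rule 5: empty -> None, email lowercased, others only trimmed."""
    value = raw.strip()
    if value == "":
        return None
    return value.lower() if key == "email" else value


print([normalize_header(h) for h in [" ID ", " Name ", "Age", " Extra Note "]])
# ['id', 'name', 'age', 'extra_note']
print(normalize_optional("email", " FRANK@EXAMPLE.COM "))
# frank@example.com
```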

Constraints

  • 0 <= len(rows) <= 100000
  • 1 <= len(header) <= 50
  • 1 <= chunk_size <= 10000
  • 0 <= max_retries <= 5
  • The normalized header contains the keys `id`, `name`, and `age`
  • Normalized header names are unique

Examples

Input: ([' ID ', ' Name ', 'Age', ' Email '], [['1', ' Alice ', '30', 'ALICE@EXAMPLE.COM'], ['x2', 'Bob', '22', 'bob@example.com'], ['3', ' Carol Danvers ', '41', ''], ['4', ' ', '28', 'dave@example.com'], ['5', 'Eve', '0', 'EVE@EXAMPLE.COM']], 2, 1)

Expected Output: (['{"id":1,"name":"Alice","age":30,"email":"alice@example.com"}', '{"id":3,"name":"Carol Danvers","age":41,"email":null}', '{"id":5,"name":"Eve","age":0,"email":"eve@example.com"}'], [1, 3])

Explanation: Row 1 fails because its `id` (`x2`) is not an integer, and row 3 fails because its `name` is empty after trimming. Both still fail on the single retry. The other rows are normalized and emitted as compact JSON.

Input: (['id', 'name', 'age'], [], 3, 2)

Expected Output: ([], [])

Explanation: No data rows means no output lines and no failures.

Input: ([' ID ', ' Name ', ' AGE ', ' City ', ' Email '], [['7', ' Frank ', '52', ' New York ', ' FRANK@EXAMPLE.COM '], ['8', 'Grace', '33', ' ', ''], ['9', 'Henry']], 2, 0)

Expected Output: (['{"id":7,"name":"Frank","age":52,"city":"New York","email":"frank@example.com"}', '{"id":8,"name":"Grace","age":33,"city":null,"email":null}'], [2])

Explanation: The first two rows succeed. In the second row (index 1), `city` and `email` become `None` because both are empty after trimming. The last row (index 2) has the wrong number of columns and, with `max_retries` set to 0, fails after a single attempt.

Input: ([' ID', 'Name', ' Age ', ' Extra Note '], [['10', ' Ivy Jones ', '-1', ' hello '], ['11', 'Jack', '27', ' keep spaces inside ']], 1, 2)

Expected Output: (['{"id":11,"name":"Jack","age":27,"extra_note":"keep spaces inside"}'], [0])

Explanation: The first row fails because age is negative. The second row succeeds, and the optional `extra_note` field is trimmed at both ends while its internal whitespace is left unchanged (only `name` has internal whitespace collapsed).

Solution

import json
import re


def solution(headers, rows, chunk_size, max_retries):

    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")

    space_re = re.compile(r"\s+")

    def normalize_header(header):
        return space_re.sub("_", header.strip().lower())

    normalized_headers = [normalize_header(h) for h in headers]
    output_lines = []
    failed_rows = []
    # One initial attempt plus up to max_retries retries per row.
    attempts_per_row = max_retries + 1

    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        for offset, row in enumerate(chunk):
            row_index = start + offset
            success = False

            for _ in range(attempts_per_row):
                try:
                    if len(row) != len(normalized_headers):
                        raise ValueError("column count mismatch")

                    record = {}
                    for key, raw_value in zip(normalized_headers, row):
                        value = raw_value.strip()

                        if key == "id":
                            record[key] = int(value)
                        elif key == "name":
                            name = space_re.sub(" ", value)
                            if not name:
                                raise ValueError("empty name")
                            record[key] = name
                        elif key == "age":
                            age = int(value)
                            if age < 0:
                                raise ValueError("negative age")
                            record[key] = age
                        else:
                            if value == "":
                                record[key] = None
                            elif key == "email":
                                record[key] = value.lower()
                            else:
                                record[key] = value

                    output_lines.append(json.dumps(record, separators=(",", ":")))
                    success = True
                    break
                except (ValueError, TypeError):
                    # Bad row: try again until the attempts are used up.
                    continue

            if not success:
                failed_rows.append(row_index)

    return (output_lines, failed_rows)

Time complexity: O(r * c * (max_retries + 1)) in the worst case, where r is the number of rows and c is the number of columns. Space complexity: O(chunk_size + c) auxiliary space (one chunk slice plus one record under construction), excluding the returned output.
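The solution slices an in-memory list, but the same chunking can be applied to a streaming source so that only one chunk is held at a time. The sketch below is one way to do that with `itertools.islice`; the helper name `iter_chunks` is an assumption, and `io.StringIO` stands in for a real file.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield (start_index, chunk) pairs from any iterable of rows."""
    it = iter(rows)
    start = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield start, chunk
        start += len(chunk)  # keeps 0-based row indices correct across chunks


# Usage: stream rows from a CSV source without loading them all.
reader = csv.reader(io.StringIO("id,name,age\n1,A,3\n2,B,4\n3,C,5\n"))
header = next(reader)
for start, chunk in iter_chunks(reader, 2):
    print(start, chunk)
# 0 [['1', 'A', '3'], ['2', 'B', '4']]
# 2 [['3', 'C', '5']]
```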

Hints

  1. Precompute the normalized column names once, then use a helper to validate and transform a single row.
  2. Process rows chunk by chunk, but keep the original row index so failed rows can be logged correctly.
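Both hints can be combined into a small refactor, sketched here under the assumption that header names are already normalized. `transform_row` and `process_chunk` are hypothetical helper names: the first validates and normalizes one row, and the second applies the retry policy while tracking the original row index.

```python
import json
import re

_WS = re.compile(r"\s+")


def transform_row(keys, row):
    """Validate and normalize one row; raise ValueError if the row is bad."""
    if len(row) != len(keys):
        raise ValueError("column count mismatch")
    record = {}
    for key, raw in zip(keys, row):
        value = raw.strip()
        if key == "id":
            record[key] = int(value)
        elif key == "name":
            name = _WS.sub(" ", value)
            if not name:
                raise ValueError("empty name")
            record[key] = name
        elif key == "age":
            age = int(value)
            if age < 0:
                raise ValueError("negative age")
            record[key] = age
        elif value == "":
            record[key] = None
        else:
            record[key] = value.lower() if key == "email" else value
    return record


def process_chunk(keys, chunk, start, max_retries, out, failed):
    """Transform each row with retries; log failures by original row index."""
    for offset, row in enumerate(chunk):
        for _ in range(max_retries + 1):
            try:
                record = transform_row(keys, row)
            except ValueError:
                continue  # retry until attempts run out
            out.append(json.dumps(record, separators=(",", ":")))
            break
        else:
            # No attempt succeeded: record the 0-based index and skip the row.
            failed.append(start + offset)
```

Passing `start` into `process_chunk` is what keeps the logged indices relative to the whole input rather than to the current chunk.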
Last updated: May 18, 2026


Related Coding Questions

  • Minimize Travel Assignment Cost - Bloomberg (medium)
  • Determine Balloon Popping Time - Bloomberg (medium)
  • Solve meeting and tree problems - Bloomberg (easy)
  • Minimize travel cost with two cities - Bloomberg (easy)
  • Check connectivity between two subway stations - Bloomberg (easy)