What difficulty level is this coding question?

This is a medium-difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Coreweave.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Coreweave during technical interviews.

Implement web data fetch and storage tool | Coreweave Coding Question

Quick Overview

This question evaluates proficiency in HTTP client communication with token-based authentication, parsing structured responses into data models, relational database interaction including handling duplicate records, configuration management, and robust error handling and testing.

Implement web data fetch and storage tool

Company: Coreweave

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

### Problem

You are asked to implement a small program that retrieves data from a remote web service protected by token-based authentication, parses the response, and stores the parsed data in a database.

### Requirements

1. **Remote request with token-based authentication**
   - The program should send an HTTP(S) request to a given URL (e.g., provided via a config file or command-line argument).
   - The remote service uses **token-based authentication** (e.g., a bearer token).
   - The program should attach the given token to the request in the appropriate HTTP header (for example, `Authorization: Bearer <token>`).
2. **Parse response content**
   - Assume the remote service returns a structured response (e.g., JSON) containing a list of items.
   - Each item has a few fields such as an `id`, a `timestamp`, and a `message` (you can assume reasonable names and types if not specified exactly).
   - The program should parse the response body and extract these fields into in-memory objects/records.
3. **Store into a database**
   - The program should connect to a relational database (e.g., PostgreSQL or MySQL) using a connection string or config.
   - Create (or assume the existence of) a table with appropriate columns to store the parsed fields (e.g., `id`, `timestamp`, `message`).
   - Insert the parsed records into the table.
   - Handle duplicates in a sensible way (for example, avoid inserting the same `id` more than once, or perform an upsert).
4. **Error handling and robustness**
   - Handle common errors such as:
     - Network failures or timeouts when calling the remote service.
     - Non-2xx HTTP status codes.
     - Invalid or unexpected response formats.
     - Database connection or insertion errors.
   - Log errors or print meaningful messages so that a user or operator can understand what went wrong.
5. **Execution interface**
   - The program can be a command-line tool.
   - It should accept at least:
     - The remote service URL.
     - The authentication token.
     - Database connection information.
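Requirements 1, 2, and part of 4 can be sketched as a small HTTP client and parser layer using only the Python standard library. This is a minimal sketch, assuming a JSON body of the form `{"items": [...]}`; the names `FetchError`, `Item`, `build_request`, and `fetch_items` are hypothetical, and the injectable `opener` parameter is a design choice that lets tests fake the remote service.

```python
import json
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List


class FetchError(Exception):
    """Network failure, non-2xx status, or a body that does not match the assumed schema."""


@dataclass(frozen=True)
class Item:
    id: int
    timestamp: int
    message: str


def build_request(url: str, token: str) -> urllib.request.Request:
    # Attach the token as a bearer credential in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}", "Accept": "application/json"}
    )


def fetch_items(url: str, token: str, timeout: float = 10.0,
                opener: Callable = urllib.request.urlopen) -> List[Item]:
    """Fetch and parse items; `opener` is injectable so tests can fake the service."""
    try:
        with opener(build_request(url, token), timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for non-2xx statuses; catch it before OSError
        # because HTTPError is a subclass of URLError, which derives from OSError.
        raise FetchError(f"HTTP {exc.code} from {url}") from exc
    except OSError as exc:
        # URLError, connection resets, and socket timeouts all derive from OSError.
        raise FetchError(f"network error calling {url}: {exc}") from exc
    if not 200 <= status <= 299:
        raise FetchError(f"HTTP {status} from {url}")
    try:
        body = json.loads(raw)
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        raise FetchError(f"response from {url} is not valid JSON") from exc
    if not isinstance(body, dict) or not isinstance(body.get("items"), list):
        raise FetchError("expected a JSON object with an 'items' list")
    items = []
    for entry in body["items"]:
        # Keep only entries that match the assumed schema; skip the rest.
        if (isinstance(entry, dict)
                and type(entry.get("id")) is int
                and type(entry.get("timestamp")) is int
                and isinstance(entry.get("message"), str)):
            items.append(Item(entry["id"], entry["timestamp"], entry["message"]))
    return items
```

In a unit test, passing a fake `opener` that returns a canned response object (or raises `urllib.error.URLError`) exercises every error path without any real network calls.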
Describe how you would design and implement this program, including:

- How you would structure the code (e.g., separation between the HTTP client, parser, and database layer).
- How you would manage configuration (URL, token, DB credentials).
- How you would test it (unit tests, integration tests, mocking the remote service and DB).
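For the configuration question, one common approach is to accept command-line flags and fall back to environment variables for secrets. The sketch below assumes that approach; the `Config` class, `load_config` function, and the variable names `FETCH_TOKEN` and `FETCH_DB_DSN` are hypothetical. Taking an explicit `env` mapping keeps the function easy to unit-test.

```python
import argparse
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Config:
    url: str
    token: str
    db_dsn: str
    timeout: float


def load_config(argv: Optional[Sequence[str]] = None,
                env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from CLI flags, falling back to environment variables for secrets."""
    parser = argparse.ArgumentParser(description="Fetch items and store them in a database.")
    parser.add_argument("--url", required=True, help="remote service URL")
    # Hypothetical variable names; reading secrets from the environment keeps
    # them out of shell history and process listings.
    parser.add_argument("--token", default=env.get("FETCH_TOKEN"),
                        help="bearer token (default: $FETCH_TOKEN)")
    parser.add_argument("--db", default=env.get("FETCH_DB_DSN"),
                        help="database connection string (default: $FETCH_DB_DSN)")
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="HTTP timeout in seconds")
    args = parser.parse_args(argv)
    if not args.token:
        parser.error("a token is required via --token or FETCH_TOKEN")
    if not args.db:
        parser.error("a database connection string is required via --db or FETCH_DB_DSN")
    return Config(url=args.url, token=args.token, db_dsn=args.db, timeout=args.timeout)
```

Missing settings make `argparse` print a usage message and exit with status 2, which gives an operator a clear error before any network or database work starts.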

In this simplified coding version of the web fetch-and-store tool, actual HTTP calls and database operations are simulated with Python data structures. You are given the token supplied by the caller, the token expected by the remote service, a list of fetch attempts, and a list of (id, timestamp, message) records already stored in the database. Process the input in three steps:

1. Authenticate. If the provided token does not equal the expected token, return immediately with result 'auth_error', errors set to 1, and the database unchanged.
2. Find a usable response. Each fetch attempt is a dictionary with keys 'status' and 'body'. An attempt is usable only if its status is an integer 2xx code, its body is a dictionary, and the body's 'items' field is a list. Process attempts in order, count each unusable attempt as one error, and stop at the first usable one. If no attempt is usable, return result 'fetch_error'.
3. Upsert the items. Each valid item is a dictionary containing an integer 'id', an integer 'timestamp', and a string 'message'; invalid items are ignored. Use 'id' as the primary key and apply items in order: an item with an unseen id is inserted, an item whose timestamp is newer than or equal to the stored record's timestamp overwrites it, and an older item is ignored.

Return a summary dictionary with 'result' ('ok', 'auth_error', or 'fetch_error'), 'records' (the final stored records sorted by id), and the counts 'inserted', 'updated', 'ignored', and 'errors'.
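Against a real relational database, the same "newer or equal timestamp wins" rule can be pushed into a single upsert statement. This sketch uses the standard-library `sqlite3` module as a stand-in for PostgreSQL or MySQL and assumes SQLite 3.24 or later (the first version with `ON CONFLICT ... DO UPDATE`); the table layout and the `store_items` helper are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id        INTEGER PRIMARY KEY,
    timestamp INTEGER NOT NULL,
    message   TEXT    NOT NULL
)
"""

# On an id collision, overwrite only when the incoming timestamp is newer or equal.
# "excluded" is the row that failed to insert; "items.timestamp" is the stored value.
UPSERT = """
INSERT INTO items (id, timestamp, message) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    timestamp = excluded.timestamp,
    message   = excluded.message
WHERE excluded.timestamp >= items.timestamp
"""


def store_items(conn: sqlite3.Connection, rows: Iterable[Tuple[int, int, str]]) -> None:
    # The connection context manager commits on success and rolls back on an exception,
    # so a failed batch leaves the table unchanged.
    with conn:
        conn.execute(SCHEMA)
        conn.executemany(UPSERT, rows)
```

Rows are applied in order, so a stale duplicate later in the same batch is skipped, matching the first example. This sketch does not report inserted/updated/ignored counts. PostgreSQL supports an equivalent `INSERT ... ON CONFLICT (id) DO UPDATE ... WHERE` form, with driver-specific placeholders (e.g., `%s` in psycopg2).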

Constraints

0 <= len(attempts) <= 10^5
0 <= len(existing_records) <= 10^5
The total number of items inside the chosen response is at most 10^5
IDs and timestamps are integers in the range [-10^9, 10^9]
Process fetch attempts in the given order and stop at the first valid 2xx response with a body of the form {'items': [...]}

Examples

Input: ('secret', 'secret', [{'status': 500, 'body': {'items': []}}, {'status': 200, 'body': {'items': [{'id': 1, 'timestamp': 120, 'message': 'new'}, {'id': 2, 'timestamp': 90, 'message': 'hello'}, {'id': 2, 'timestamp': 80, 'message': 'stale'}, {'id': 'x', 'timestamp': 5, 'message': 'bad'}, {'id': 3, 'timestamp': 50, 'message': 'same'}]}}], [(1, 100, 'old'), (3, 50, 'keep')])

Expected Output: {'result': 'ok', 'records': [(1, 120, 'new'), (2, 90, 'hello'), (3, 50, 'same')], 'inserted': 1, 'updated': 2, 'ignored': 2, 'errors': 1}

Explanation: The first attempt fails with HTTP 500, so errors becomes 1. The second attempt is valid. Record 1 is updated to timestamp 120, record 2 is inserted, the stale duplicate for record 2 is ignored, the malformed item with id='x' is ignored, and record 3 is overwritten because equal timestamps are allowed to replace the stored message.

Input: ('wrong', 'secret', [{'status': 200, 'body': {'items': [{'id': 2, 'timestamp': 1, 'message': 'ignored'}]}}], [(1, 10, 'a')])

Expected Output: {'result': 'auth_error', 'records': [(1, 10, 'a')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 1}

Explanation: Authentication fails before any fetch attempt is processed, so the database remains unchanged.

Input: ('secret', 'secret', [{'status': None, 'body': None}, {'status': 403, 'body': {}}, {'status': 200, 'body': ['bad']}], [])

Expected Output: {'result': 'fetch_error', 'records': [], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 3}

Explanation: The first attempt simulates a network failure, the second is a non-2xx response, and the third has an invalid body format. No usable response is found.

Input: ('token', 'token', [{'status': 200, 'body': {'items': []}}], [(2, 5, 'x')])

Expected Output: {'result': 'ok', 'records': [(2, 5, 'x')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 0}

Explanation: A valid response is found immediately, but it contains no items, so the stored data stays the same.

Input: ('abc', 'abc', [], [(5, 7, 'saved')])

Expected Output: {'result': 'fetch_error', 'records': [(5, 7, 'saved')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 0}

Explanation: There are no fetch attempts at all, so no valid response can be chosen and the database remains unchanged.

Solution

def solution(provided_token, expected_token, attempts, existing_records):
    def snapshot(db):
        # Return the stored rows as (id, timestamp, message) tuples sorted by id.
        return sorted((record_id, ts, message) for record_id, (ts, message) in db.items())

    # Simulated table: id (primary key) -> (timestamp, message).
    db = {}
    for record_id, ts, message in existing_records:
        db[record_id] = (ts, message)

    # Phase 1: reject a mismatched token before processing any fetch attempt.
    if provided_token != expected_token:
        return {
            'result': 'auth_error',
            'records': snapshot(db),
            'inserted': 0,
            'updated': 0,
            'ignored': 0,
            'errors': 1
        }

    errors = 0
    items = None

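    # Phase 2: scan attempts in order and keep the items of the first usable response;
    # every unusable attempt counts as one error.
    for attempt in attempts: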
        if not isinstance(attempt, dict):
            errors += 1
            continue

        status = attempt.get('status')
        body = attempt.get('body')

        # type(...) is int rejects None (a simulated network failure) and bools.
        if type(status) is not int or not (200 <= status <= 299):
            errors += 1
            continue

        if not isinstance(body, dict):
            errors += 1
            continue

        body_items = body.get('items')
        if not isinstance(body_items, list):
            errors += 1
            continue

        items = body_items
        break

    if items is None:
        return {
            'result': 'fetch_error',
            'records': snapshot(db),
            'inserted': 0,
            'updated': 0,
            'ignored': 0,
            'errors': errors
        }

    inserted = 0
    updated = 0
    ignored = 0

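    # Phase 3: validate each item and upsert by id. Later items see earlier writes,
    # and an equal timestamp overwrites the stored record.
    for item in items: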
        if not isinstance(item, dict):
            ignored += 1
            continue

        record_id = item.get('id')
        ts = item.get('timestamp')
        message = item.get('message')

        if type(record_id) is not int or type(ts) is not int or not isinstance(message, str):
            ignored += 1
            continue

        if record_id not in db:
            db[record_id] = (ts, message)
            inserted += 1
        else:
            current_ts, _ = db[record_id]
            if ts >= current_ts:
                db[record_id] = (ts, message)
                updated += 1
            else:
                ignored += 1

    return {
        'result': 'ok',
        'records': snapshot(db),
        'inserted': inserted,
        'updated': updated,
        'ignored': ignored,
        'errors': errors
    }

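Time complexity: O(A + N log N), where N = E + K, E is the number of existing records, A is the number of fetch attempts scanned until a usable one is found (or all attempts if none is usable), and K is the number of items in the chosen response. Building the map, scanning attempts, and applying items are linear; the final sort of up to E + K stored records dominates. Space complexity: O(E + K), since the in-memory database map and the sorted snapshot can hold every existing record plus every newly inserted one.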

Hints

A hash map keyed by record id gives average O(1) lookups, so each upsert is cheap.
Separate the problem into three phases: authenticate, find the first usable response, then parse and apply item updates.
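The three-phase split from the hint can be made explicit with one helper per phase. This is a sketch that reproduces the behavior of the solution above; the names `find_usable_items`, `apply_items`, and `solution_phased` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_usable_items(attempts: List[Any]) -> Tuple[Optional[list], int]:
    """Phase 2: return (items, errors) for the first 2xx attempt whose body is {'items': [...]}."""
    errors = 0
    for attempt in attempts:
        if isinstance(attempt, dict):
            status, body = attempt.get("status"), attempt.get("body")
            if (type(status) is int and 200 <= status <= 299
                    and isinstance(body, dict) and isinstance(body.get("items"), list)):
                return body["items"], errors
        errors += 1  # each unusable attempt counts as one error
    return None, errors


def apply_items(db: Dict[int, Tuple[int, str]], items: list) -> Tuple[int, int, int]:
    """Phase 3: upsert valid items into db in place; return (inserted, updated, ignored)."""
    inserted = updated = ignored = 0
    for item in items:
        if not isinstance(item, dict):
            ignored += 1
            continue
        rid, ts, msg = item.get("id"), item.get("timestamp"), item.get("message")
        if type(rid) is not int or type(ts) is not int or not isinstance(msg, str):
            ignored += 1
        elif rid not in db:
            db[rid] = (ts, msg)
            inserted += 1
        elif ts >= db[rid][0]:  # newer or equal timestamp overwrites
            db[rid] = (ts, msg)
            updated += 1
        else:
            ignored += 1
    return inserted, updated, ignored


def solution_phased(provided_token, expected_token, attempts, existing_records):
    db = {rid: (ts, msg) for rid, ts, msg in existing_records}

    def summary(result, inserted=0, updated=0, ignored=0, errors=0):
        records = sorted((rid, ts, msg) for rid, (ts, msg) in db.items())
        return {"result": result, "records": records, "inserted": inserted,
                "updated": updated, "ignored": ignored, "errors": errors}

    # Phase 1: authenticate before looking at any attempt.
    if provided_token != expected_token:
        return summary("auth_error", errors=1)
    items, errors = find_usable_items(attempts)
    if items is None:
        return summary("fetch_error", errors=errors)
    inserted, updated, ignored = apply_items(db, items)
    return summary("ok", inserted, updated, ignored, errors)
```

Each helper can then be unit-tested on its own; for instance, `find_usable_items([])` returns `(None, 0)`, which corresponds to the last example.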