Implement a web data fetch and storage tool
Company: Coreweave
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency in HTTP client communication with token-based authentication, parsing structured responses into data models, relational database interaction including handling duplicate records, configuration management, and robust error handling and testing.
Constraints
- 0 <= len(attempts) <= 10^5
- 0 <= len(existing_records) <= 10^5
- The total number of items inside the chosen response is at most 10^5
- IDs and timestamps are integers in the range [-10^9, 10^9]
- Process fetch attempts in the given order and stop at the first valid 2xx response with a body of the form {'items': [...]}
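The stopping rule in the last constraint can be sketched as a small predicate plus a scan. This is a minimal sketch, assuming each attempt is a dict with `status` and `body` keys as in the examples; `extract_items` and `first_usable` are hypothetical helper names, not part of the original statement.

```python
def extract_items(attempt):
    """Return the items list if the attempt is usable, otherwise None.

    Hypothetical helper: "usable" is assumed to mean an integer 2xx status
    and a dict body whose 'items' value is a list.
    """
    if not isinstance(attempt, dict):
        return None
    status = attempt.get('status')
    # `type(...) is int` rejects None (a simulated network failure) and booleans.
    if type(status) is not int or not (200 <= status <= 299):
        return None
    body = attempt.get('body')
    if not isinstance(body, dict):
        return None
    items = body.get('items')
    return items if isinstance(items, list) else None


def first_usable(attempts):
    """Scan attempts in order and return (items, error_count).

    items is None when no attempt is usable.
    """
    errors = 0
    for attempt in attempts:
        items = extract_items(attempt)
        if items is not None:
            return items, errors
        errors += 1
    return None, errors
```

Keeping the validity check separate from the scan makes each rejection reason easy to unit-test on its own.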
Examples
Input: ('secret', 'secret', [{'status': 500, 'body': {'items': []}}, {'status': 200, 'body': {'items': [{'id': 1, 'timestamp': 120, 'message': 'new'}, {'id': 2, 'timestamp': 90, 'message': 'hello'}, {'id': 2, 'timestamp': 80, 'message': 'stale'}, {'id': 'x', 'timestamp': 5, 'message': 'bad'}, {'id': 3, 'timestamp': 50, 'message': 'same'}]}}], [(1, 100, 'old'), (3, 50, 'keep')])
Expected Output: {'result': 'ok', 'records': [(1, 120, 'new'), (2, 90, 'hello'), (3, 50, 'same')], 'inserted': 1, 'updated': 2, 'ignored': 2, 'errors': 1}
Explanation: The first attempt fails with HTTP 500, so errors becomes 1. The second attempt is valid. Record 1 is updated to timestamp 120, record 2 is inserted, the stale duplicate for record 2 is ignored, the malformed item with id='x' is ignored, and record 3 is overwritten because equal timestamps are allowed to replace the stored message.
Input: ('wrong', 'secret', [{'status': 200, 'body': {'items': [{'id': 2, 'timestamp': 1, 'message': 'ignored'}]}}], [(1, 10, 'a')])
Expected Output: {'result': 'auth_error', 'records': [(1, 10, 'a')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 1}
Explanation: Authentication fails before any fetch attempt is processed, so the database remains unchanged.
Input: ('secret', 'secret', [{'status': None, 'body': None}, {'status': 403, 'body': {}}, {'status': 200, 'body': ['bad']}], [])
Expected Output: {'result': 'fetch_error', 'records': [], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 3}
Explanation: The first attempt simulates a network failure, the second is a non-2xx response, and the third has an invalid body format. No usable response is found.
Input: ('token', 'token', [{'status': 200, 'body': {'items': []}}], [(2, 5, 'x')])
Expected Output: {'result': 'ok', 'records': [(2, 5, 'x')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 0}
Explanation: A valid response is found immediately, but it contains no items, so the stored data stays the same.
Input: ('abc', 'abc', [], [(5, 7, 'saved')])
Expected Output: {'result': 'fetch_error', 'records': [(5, 7, 'saved')], 'inserted': 0, 'updated': 0, 'ignored': 0, 'errors': 0}
Explanation: There are no fetch attempts at all, so no valid response can be chosen and the database remains unchanged.
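The per-item rule used in the first example (insert new ids, replace on a newer or equal timestamp, ignore stale or malformed items) can be sketched as follows. This is an illustrative sketch only; `apply_item` is a hypothetical helper name, and the store is assumed to be a dict mapping id to `(timestamp, message)`.

```python
def apply_item(db, item):
    """Apply one fetched item to db (dict: id -> (timestamp, message)).

    Returns 'inserted', 'updated', or 'ignored'. Hypothetical helper.
    """
    if not isinstance(item, dict):
        return 'ignored'
    record_id = item.get('id')
    ts = item.get('timestamp')
    message = item.get('message')
    # Malformed fields: skip the item without touching the store.
    if type(record_id) is not int or type(ts) is not int or not isinstance(message, str):
        return 'ignored'
    if record_id not in db:
        db[record_id] = (ts, message)
        return 'inserted'
    # Newer or equal timestamps replace the stored row; older ones are stale.
    if ts >= db[record_id][0]:
        db[record_id] = (ts, message)
        return 'updated'
    return 'ignored'


# Replay the items from the first example against its existing records.
db = {1: (100, 'old'), 3: (50, 'keep')}
outcomes = [apply_item(db, item) for item in [
    {'id': 1, 'timestamp': 120, 'message': 'new'},
    {'id': 2, 'timestamp': 90, 'message': 'hello'},
    {'id': 2, 'timestamp': 80, 'message': 'stale'},
    {'id': 'x', 'timestamp': 5, 'message': 'bad'},
    {'id': 3, 'timestamp': 50, 'message': 'same'},
]]
print(outcomes)  # ['updated', 'inserted', 'ignored', 'ignored', 'updated']
```

Counting the outcomes gives 1 inserted, 2 updated, and 2 ignored, matching the expected output of the first example.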
Solution
def solution(provided_token, expected_token, attempts, existing_records):
    def snapshot(db):
        # Return stored records as (id, timestamp, message) tuples sorted by id.
        return sorted((record_id, ts, message) for record_id, (ts, message) in db.items())

    # Load the existing records into a map keyed by record id.
    db = {}
    for record_id, ts, message in existing_records:
        db[record_id] = (ts, message)

    # Phase 1: authenticate before looking at any fetch attempt.
    if provided_token != expected_token:
        return {
            'result': 'auth_error',
            'records': snapshot(db),
            'inserted': 0,
            'updated': 0,
            'ignored': 0,
            'errors': 1
        }

    # Phase 2: find the first 2xx attempt whose body is {'items': [...]}.
    errors = 0
    items = None
    for attempt in attempts:
        if not isinstance(attempt, dict):
            errors += 1
            continue
        status = attempt.get('status')
        body = attempt.get('body')
        # type(...) is int also rejects None (network failure) and booleans.
        if type(status) is not int or not (200 <= status <= 299):
            errors += 1
            continue
        if not isinstance(body, dict):
            errors += 1
            continue
        body_items = body.get('items')
        if not isinstance(body_items, list):
            errors += 1
            continue
        items = body_items
        break

    if items is None:
        return {
            'result': 'fetch_error',
            'records': snapshot(db),
            'inserted': 0,
            'updated': 0,
            'ignored': 0,
            'errors': errors
        }

    # Phase 3: apply each well-formed item as an upsert.
    inserted = 0
    updated = 0
    ignored = 0
    for item in items:
        if not isinstance(item, dict):
            ignored += 1
            continue
        record_id = item.get('id')
        ts = item.get('timestamp')
        message = item.get('message')
        if type(record_id) is not int or type(ts) is not int or not isinstance(message, str):
            ignored += 1
            continue
        if record_id not in db:
            db[record_id] = (ts, message)
            inserted += 1
        else:
            current_ts, _ = db[record_id]
            # Newer or equal timestamps replace the stored row; older ones are stale.
            if ts >= current_ts:
                db[record_id] = (ts, message)
                updated += 1
            else:
                ignored += 1

    return {
        'result': 'ok',
        'records': snapshot(db),
        'inserted': inserted,
        'updated': updated,
        'ignored': ignored,
        'errors': errors
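    }
Time complexity: O(E + A + K) to load the records, scan the attempts, and apply the items, plus O((E + K) log(E + K)) to sort the final snapshot. Here E is the number of existing records, A is the number of fetch attempts scanned until a valid one is found (or all attempts if none is valid), and K is the number of items in the chosen response.
Space complexity: O(E + K), because the in-memory database map starts with E records and can grow by up to K inserted records.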
Hints
- A hash map keyed by record id makes database upserts efficient.
- Separate the problem into three phases: authenticate, find the first usable response, then parse and apply item updates.
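When the store is a real relational database rather than an in-memory map, the same "newer or equal timestamp wins" upsert can be pushed into SQL. The following is a hedged sketch using Python's built-in `sqlite3` module; the table name `records`, its columns, and the use of SQLite are assumptions made for illustration, and the `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or later.

```python
import sqlite3

# Assumed schema for illustration: one row per record id.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE records ('
    'id INTEGER PRIMARY KEY, ts INTEGER NOT NULL, message TEXT NOT NULL)'
)
conn.executemany(
    'INSERT INTO records (id, ts, message) VALUES (?, ?, ?)',
    [(1, 100, 'old'), (3, 50, 'keep')],
)

# Insert new ids; on a duplicate id, overwrite only when the incoming
# timestamp is newer or equal. Stale rows leave the table unchanged.
UPSERT = (
    'INSERT INTO records (id, ts, message) VALUES (?, ?, ?) '
    'ON CONFLICT(id) DO UPDATE SET ts = excluded.ts, message = excluded.message '
    'WHERE excluded.ts >= records.ts'
)

# Already-validated items from the first example (malformed item dropped).
incoming = [(1, 120, 'new'), (2, 90, 'hello'), (2, 80, 'stale'), (3, 50, 'same')]
with conn:  # commit once, or roll back if any statement fails
    for row in incoming:
        conn.execute(UPSERT, row)

rows = conn.execute('SELECT id, ts, message FROM records ORDER BY id').fetchall()
print(rows)  # [(1, 120, 'new'), (2, 90, 'hello'), (3, 50, 'same')]
```

Letting the primary key detect duplicates avoids a separate read before each write, which plays the same role as the hash map lookup in the in-memory solution.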