PracHub

Fetch and parse JSON from REST API

Last updated: Jun 15, 2026

Quick Overview

A Motive Software Engineer onsite coding exercise: build a small, resilient script from scratch that performs an authenticated HTTP GET against a paginated REST API, parses the JSON, extracts and transforms records, and produces both serialized CSV/JSON output and per-category top-k aggregates. It evaluates HTTP/REST integration, rate-limit and retry handling with backoff, timeouts, malformed-record handling, unit testing, logging, and complexity analysis.

Company: Motive

Role: Software Engineer

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

##### Question

Set up a minimal coding environment from scratch and write runnable code that fetches and processes data from a REST API. You are given an endpoint that returns a JSON payload (for example, fleet/courier tracking records with fields such as `order_id`, `courier_id`, `latitude`, `longitude`, and `timestamp`). Perform an HTTP GET against the endpoint, parse the response, extract the specified fields, and produce the required output. Walk the interviewer through your approach as you go.

Your solution should address all of the following:

1. **HTTP integration.** Perform the HTTP GET and parse the JSON response. Handle authentication (e.g., bearer token / API key).
2. **Robustness.** Handle pagination, rate limiting (retry with exponential backoff), request timeouts, and malformed or partial records without crashing the run.
3. **Field extraction & transformation.** Extract the specified fields and transform the records into the required target schema.
4. **Output #1: serialized records.** Output the transformed records as CSV and/or JSON.
5. **Output #2: aggregation.** Compute aggregates per category and return the top-k items.
6. **Quality.** Include basic unit tests, plus error handling and logging.
7. **Analysis.** Explain your approach, the edge cases you considered, and the time/space complexity of your solution.

Solution

This is a hands-on coding exercise: the interviewer wants to watch you build a small, correct, and resilient API-ingestion script from an empty directory. There is no single "right" answer; they are grading code organization, error handling, and how you reason about edge cases.

A clean reference implementation in Python:

```python
import csv
import json
import logging
import os
import time
from collections import Counter
from typing import Iterator, Optional

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

BASE_URL = "https://api.example.com/v1/records"
MAX_RETRIES = 5
TIMEOUT = 10  # seconds


def fetch_page(session: requests.Session, url: str, params: dict) -> dict:
    """GET one page with timeout, retry/backoff, and rate-limit handling."""
    backoff = 1.0
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = session.get(url, params=params, timeout=TIMEOUT)
            if resp.status_code == 429:  # rate limited
                try:
                    # Retry-After is usually seconds, but may be an HTTP date.
                    wait = float(resp.headers.get("Retry-After", backoff))
                except ValueError:
                    wait = backoff
                log.warning("Rate limited; sleeping %.1fs (attempt %d)", wait, attempt)
                time.sleep(wait)
                backoff *= 2
                continue
            resp.raise_for_status()  # other 4xx/5xx errors are not retried
            return resp.json()
        except (requests.Timeout, requests.ConnectionError) as e:
            log.warning("Transient error %s; retrying in %.1fs", e, backoff)
            time.sleep(backoff)
            backoff *= 2
    raise RuntimeError(f"Failed to fetch {url} after {MAX_RETRIES} attempts")


def iter_records(token: str) -> Iterator[dict]:
    """Iterate all records across pages. Yields raw record dicts."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    cursor: Optional[str] = None
    while True:
        params = {"limit": 100}
        if cursor:
            params["cursor"] = cursor
        payload = fetch_page(session, BASE_URL, params)
        # Assumed response shape: {"data": [...], "next_cursor": "..."}
        for rec in payload.get("data", []):
            yield rec
        cursor = payload.get("next_cursor")
        if not cursor:
            break


REQUIRED = ("order_id", "courier_id", "latitude", "longitude", "timestamp")


def transform(rec: dict) -> Optional[dict]:
    """Project to target schema; return None for malformed/partial records."""
    try:
        # Treat an explicit null like a missing field instead of emitting "None".
        if any(rec[field] is None for field in REQUIRED):
            raise ValueError("null required field")
        return {
            "order_id": str(rec["order_id"]),
            "courier_id": str(rec["courier_id"]),
            "latitude": float(rec["latitude"]),
            "longitude": float(rec["longitude"]),
            "timestamp": str(rec["timestamp"]),
        }
    except (KeyError, TypeError, ValueError) as e:
        log.warning("Dropping malformed record: %s (%s)", rec, e)
        return None


def collect(token: str) -> list[dict]:
    rows, dropped = [], 0
    for raw in iter_records(token):
        row = transform(raw)
        if row is None:
            dropped += 1
        else:
            rows.append(row)
    log.info("Collected %d rows, dropped %d malformed", len(rows), dropped)
    return rows


def write_outputs(rows: list[dict]) -> None:
    # Output #1: serialized records as JSON and CSV.
    with open("out.json", "w") as f:
        json.dump(rows, f, indent=2)
    if rows:
        with open("out.csv", "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=list(REQUIRED))
            w.writeheader()
            w.writerows(rows)


def top_k_per_category(rows: list[dict], key: str, k: int) -> dict:
    """Output #2: count rows per category, then keep the top k.

    Example: count of orders per courier_id, top k busiest couriers."""
    counts: Counter = Counter(r[key] for r in rows)
    return dict(counts.most_common(k))


if __name__ == "__main__":
    rows = collect(os.environ["API_TOKEN"])
    write_outputs(rows)
    print(top_k_per_category(rows, key="courier_id", k=10))
```

**Key points to surface to the interviewer:**

- **Auth:** send the token on a `requests.Session` so it is reused across paginated calls.
- **Pagination:** drive it off whatever the API exposes: a cursor/`next_cursor`, a `page` parameter, or a `Link` header. Stop when the next token is absent.
- **Rate limiting & retries:** treat HTTP 429 specially (honor `Retry-After`), and use exponential backoff for transient timeouts/connection errors. Cap the number of retries so a dead endpoint can't hang the job forever.
- **Timeouts:** always pass an explicit `timeout=` to every request; a missing timeout can block indefinitely.
- **Malformed/partial data:** validate and coerce per record, drop (and log) the bad ones, and keep going rather than aborting the whole run. Count what you dropped.
- **Streaming:** generator-based iteration (`iter_records`) keeps memory bounded, so you don't have to hold every page in memory at once.
- **Outputs:** support both shapes the question asks for: the flat serialized rows (CSV/JSON) and the aggregation (per-category counts, then `most_common(k)` for top-k).
- **Testing:** unit-test `transform` (good/malformed records) and `fetch_page` (mock `requests` to assert 429 backoff and timeout-retry behavior) with `responses` or `unittest.mock`.

**Complexity:** O(n) time to fetch, transform, and aggregate n records. The per-category counter is O(c) extra space for c distinct categories, and `most_common(k)` is O(c log k). Iterating through `iter_records` alone holds roughly one page at a time; the reference `collect` keeps every row in a list, so its peak memory is O(n), which is only necessary if all outputs must be materialized at once.
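
The complexity note above says memory is O(n) only when all rows are materialized. As a rough sketch of the alternative, the snippet below writes each row to CSV as it arrives and tallies a per-category count in the same pass. The names `stream_to_csv_and_count`, `top_k`, and `FIELDS` are illustrative and not part of the reference solution; the input is assumed to be any iterable of already-transformed rows.

```python
import csv
import heapq
import io
from collections import Counter
from typing import Iterable, TextIO

# Assumed target schema, matching the fields named in the question.
FIELDS = ("order_id", "courier_id", "latitude", "longitude", "timestamp")


def stream_to_csv_and_count(rows: Iterable[dict], out: TextIO, key: str) -> Counter:
    """Write each row as it arrives and tally rows per category in one pass."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    counts: Counter = Counter()
    for row in rows:
        writer.writerow(row)      # row can be discarded after this line
        counts[row[key]] += 1     # O(c) state for c distinct categories
    return counts


def top_k(counts: Counter, k: int) -> list[tuple[str, int]]:
    """Return the k categories with the highest counts."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


# Small in-memory demo; a real run would pass an open file and a generator.
sample = [
    {"order_id": "1", "courier_id": "a", "latitude": 1.0, "longitude": 2.0, "timestamp": "t1"},
    {"order_id": "2", "courier_id": "b", "latitude": 3.0, "longitude": 4.0, "timestamp": "t2"},
    {"order_id": "3", "courier_id": "a", "latitude": 5.0, "longitude": 6.0, "timestamp": "t3"},
]
buffer = io.StringIO()
demo_counts = stream_to_csv_and_count(iter(sample), buffer, key="courier_id")
print(top_k(demo_counts, 1))  # prints [('a', 2)]
```

With this shape, live memory is about one page of records plus the counter. A single JSON array is harder to stream; a line-delimited JSON file would be one option if the interviewer accepts that format.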

Explanation

This is an open-ended implementation exercise, scored on code structure, defensive error handling, and clear reasoning rather than on one exact answer. The reference covers every requirement in the question: authenticated paginated GET, 429/`Retry-After` handling plus exponential backoff, explicit timeouts, per-record validation that drops malformed/partial rows, transformation to a target schema, serialized CSV/JSON output, per-category aggregation with top-k, unit tests, logging, and a time/space complexity discussion.
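
Unit tests are listed among the requirements but not shown, so here is a minimal sketch of what they could look like using only the standard library. To stay self-contained it uses a compact stand-in for `transform` and a hypothetical `fetch_with_retry` that takes the HTTP call and the sleep function as arguments; `RateLimited` is likewise an invented stand-in for a 429 response. In a real project the tests would import the actual functions and mock `requests` instead.

```python
import unittest
from typing import Callable, Optional
from unittest import mock


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def transform(rec: dict) -> Optional[dict]:
    """Compact stand-in for the reference transform; None means 'drop'."""
    try:
        return {
            "order_id": str(rec["order_id"]),
            "courier_id": str(rec["courier_id"]),
            "latitude": float(rec["latitude"]),
            "longitude": float(rec["longitude"]),
            "timestamp": str(rec["timestamp"]),
        }
    except (KeyError, TypeError, ValueError):
        return None


def fetch_with_retry(get: Callable[[], dict], sleep: Callable[[float], None],
                     max_retries: int = 5) -> dict:
    """Call get() until it succeeds, doubling the delay after each failure."""
    backoff = 1.0
    for _ in range(max_retries):
        try:
            return get()
        except (RateLimited, TimeoutError):
            sleep(backoff)  # injected so tests never really wait
            backoff *= 2
    raise RuntimeError(f"gave up after {max_retries} attempts")


class TransformTests(unittest.TestCase):
    GOOD = {"order_id": 1, "courier_id": "c9", "latitude": "37.77",
            "longitude": -122.42, "timestamp": "2025-01-01T00:00:00Z"}

    def test_good_record_is_coerced(self):
        row = transform(self.GOOD)
        self.assertEqual(row["order_id"], "1")
        self.assertEqual(row["latitude"], 37.77)

    def test_missing_field_is_dropped(self):
        partial = {k: v for k, v in self.GOOD.items() if k != "longitude"}
        self.assertIsNone(transform(partial))

    def test_non_numeric_coordinate_is_dropped(self):
        self.assertIsNone(transform({**self.GOOD, "latitude": "north"}))


class RetryTests(unittest.TestCase):
    def test_backs_off_then_succeeds(self):
        get = mock.Mock(side_effect=[RateLimited(), TimeoutError(), {"data": []}])
        sleeps = []
        self.assertEqual(fetch_with_retry(get, sleeps.append), {"data": []})
        self.assertEqual(sleeps, [1.0, 2.0])  # delay doubles between attempts

    def test_gives_up_after_max_retries(self):
        get = mock.Mock(side_effect=TimeoutError())
        with self.assertRaises(RuntimeError):
            fetch_with_retry(get, lambda _: None, max_retries=3)
        self.assertEqual(get.call_count, 3)
```

Saved as, say, `test_ingest.py` (a hypothetical file name), this would run with `python -m unittest test_ingest`. Injecting `sleep` is one design choice that keeps the retry tests instant; patching `time.sleep` with `unittest.mock.patch` would work as well.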
Motive, Aug 13, 2025, 12:00 AM
