PracHub

Compare list/dict; parse JSON/CSV at scale

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of Python data structures (list vs dict), algorithmic time and memory complexity, ordering guarantees in CPython, large-scale parsing and streaming of JSON/CSV, and robust data-cleaning and error-handling strategies.

Company: Thumbtack

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Compare Python list and dict precisely: for append/insert/lookup/update/delete, state average and worst-case time complexity, memory implications, and ordering guarantees in CPython 3. How would you store and retrieve values in each (show concise code for appending to a list and updating a dict)? Define JSON vs. CSV and when you would choose JSON over CSV (consider nesting, schema evolution, interoperability, compression). Show exact Python code to stream-read both formats: (a) JSON Lines file via iterating line-by-line and json.loads; (b) CSV via csv.DictReader; and (c) pandas read_csv with chunksize to compute the sum of a numeric column 'value' in data.csv without exceeding memory. Explain how you would handle malformed rows, missing/NaN values, bad encodings, and numeric overflow; propose chunk-size heuristics for a 10 GB file on a 16 GB RAM machine; and provide a non-pandas alternative that still streams safely.
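
One possible starting point for the standard-library parts of this question (list append, dict update, JSON Lines streaming, and a non-pandas CSV sum) is sketched below. It is not a reference answer. The helper names iter_jsonl and sum_csv_column, the in-memory demo data, and the policy of skipping and counting bad rows are assumptions made for illustration; an interviewer may expect a different error policy, such as logging or quarantining rows.

```python
import csv
import io
import json
import math

# list: append is amortized O(1); dict: insert/update by key is O(1) on average.
values = []
values.append(3.5)
totals = {}
totals["a"] = totals.get("a", 0.0) + 3.5


def iter_jsonl(lines):
    """Yield one parsed object per non-empty line, holding one line at a time.

    Assumed policy: lines that are not valid JSON are skipped silently.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def sum_csv_column(lines, column="value"):
    """Stream rows through csv.DictReader and return (total, bad_row_count).

    A row counts as bad when the column is missing, empty, non-numeric,
    NaN, or infinite (assumed policy for this sketch).
    """
    total = 0.0
    bad_rows = 0
    for row in csv.DictReader(lines):
        raw = row.get(column)
        try:
            number = float(raw)
        except (TypeError, ValueError):
            bad_rows += 1
            continue
        if math.isnan(number) or math.isinf(number):
            bad_rows += 1
            continue
        total += number
    return total, bad_rows


# Small in-memory stand-ins for real files (hypothetical data).
jsonl_demo = io.StringIO('{"value": 1}\n\nnot json\n{"value": 2.5}\n')
records = list(iter_jsonl(jsonl_demo))

csv_demo = io.StringIO("id,value\n1,10\n2,\n3,abc\n4,2.5\n5\n6,nan\n")
total, bad_rows = sum_csv_column(csv_demo)
print(records, total, bad_rows)
```

With a real file, the same functions accept an open file object, for example open("data.csv", encoding="utf-8", errors="replace", newline=""), where errors="replace" is one way to survive bad byte sequences at the cost of altering the affected characters. Part (c), pandas read_csv with chunksize, is not covered by this sketch.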

Related Interview Questions

  • Write monthly new-vs-returning requests SQL - Thumbtack (Medium)
  • Compute weekly 3-week rolling sums in SQL - Thumbtack (Medium)
  • Write complex joins and window functions - Thumbtack (Medium)
  • Compute weighted response rates by job category - Thumbtack (Medium)
Oct 13, 2025, 9:49 PM