Describe a project where you ingested and processed a dataset of at least 500 million rows or 1 TB end-to-end. Detail the storage formats and partitioning scheme, memory and compute constraints, schema evolution, data quality checks, indexing strategies, and which tools you chose (e.g., Spark SQL vs. Pandas vs. BigQuery) and why. Provide before/after run times and costs, and describe a code-level optimization you applied (e.g., vectorization, predicate pushdown, window functions, bucketing). How would your approach change if you were limited to a single machine with 32 GB of RAM?
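
Strong answers usually land on a columnar format with partition-aware reads. As a minimal, hypothetical sketch of what that can look like in PySpark (the bucket paths and the columns `event_ts`, `event_date`, `country` are invented for illustration, not taken from any particular project):

```python
# Hypothetical sketch only: paths and columns (event_ts, event_date, country) are invented.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Write once as Parquet, partitioned on a low-cardinality date column so that
# later reads can prune whole directories instead of scanning every file.
raw = spark.read.json("s3://my-bucket/raw/events/")        # placeholder path
(raw
 .withColumn("event_date", F.to_date("event_ts"))
 .write
 .partitionBy("event_date")
 .mode("overwrite")
 .parquet("s3://my-bucket/curated/events/"))

# Filters on the partition column and on ordinary columns are pushed down:
# partition pruning skips directories, Parquet min/max statistics skip row groups.
daily = (spark.read.parquet("s3://my-bucket/curated/events/")
         .filter(F.col("event_date") == "2024-01-15")
         .filter(F.col("country") == "DE"))

daily.groupBy("country").agg(F.count("*").alias("n_events")).show()
```

For the single-machine, 32 GB RAM variant, one reasonable answer is to stream the same partitioned Parquet layout in batches instead of loading it into a single Pandas DataFrame. A hedged sketch with PyArrow, reusing the invented layout above and assuming the hive-style partition value is discovered as a string:

```python
# Hypothetical sketch only: same invented layout, read out-of-core on one machine.
# Assumes the hive-style partition value event_date=2024-01-15 is discovered as a string.
from collections import Counter
import pyarrow.dataset as ds

dataset = ds.dataset("curated/events/", format="parquet", partitioning="hive")

# Scanning in batches keeps peak memory near batch_size rows rather than the full
# table, while the filter is still applied at scan time (partition and row-group skipping).
scanner = dataset.scanner(
    columns=["country"],
    filter=(ds.field("event_date") == "2024-01-15") & (ds.field("country") == "DE"),
    batch_size=1_000_000,
)

counts = Counter()
for batch in scanner.to_batches():
    counts.update(batch.column("country").to_pylist())

print(counts)
```

The point of both sketches is the same: push filters down to the storage layer so that only the relevant partitions and row groups are ever read, whether the compute runs on a cluster or on one 32 GB machine.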