Append country tables and rank salaries in USD
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You have separate country-level employee tables that must be appended and ranked by salary converted to USD using an exchange rate table. SQL schema and small samples:

employees_us(emp_id INT, name TEXT, salary DECIMAL, currency_code CHAR(3), country_code CHAR(2))
  rows: (1001, Alice, 120000, USD, US), (1002, Bob, 90000, USD, US)
employees_uk(emp_id INT, name TEXT, salary DECIMAL, currency_code CHAR(3), country_code CHAR(2))
  rows: (2001, Claire, 80000, GBP, GB), (2002, Dan, 95000, GBP, GB)
employees_jp(emp_id INT, name TEXT, salary DECIMAL, currency_code CHAR(3), country_code CHAR(2))
  rows: (3001, Emi, 12000000, JPY, JP), (3002, Fumi, 8500000, JPY, JP)
exchange_rates(currency_code CHAR(3), rate_to_usd DECIMAL(10,4), rate_date DATE)
  rows: (USD, 1.0000, 2025-08-31), (GBP, 1.2700, 2025-08-31), (JPY, 0.0068, 2025-08-31)

Tasks:
1) Write a single SQL query that appends the country tables (assume same columns) and returns the top 10 employees by salary_usd, with columns (country_code, emp_id, name, salary_original, currency_code, salary_usd), using the most recent exchange rate per currency on or before 2025-08-31.
2) Ensure the plan avoids double counting if an employee appears in multiple country tables (use the primary key (country_code, emp_id)).
3) Provide a Python (pandas) alternative that reads all CSVs from a parent directory (pattern employees_*.csv), concatenates them, joins to exchange_rates.csv, computes salary_usd, and returns the top 10 by salary_usd.
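Tasks 1 and 2 can be sketched as one query. This is a minimal illustration, not a reference answer: it assumes SQLite 3.25 or later (for window functions) so the query can run against the sample rows through Python's built-in sqlite3 module, and the names TOP_SALARIES_SQL, build_sample_db and top_salaries are hypothetical. The query appends the tables with UNION ALL, keeps one row per (country_code, emp_id) with ROW_NUMBER(), and picks the newest rate per currency on or before the cutoff. Which copy of a duplicated key survives (the highest salary here) is an assumption; the question does not specify it.

```python
import sqlite3

# Hypothetical names for illustration: TOP_SALARIES_SQL, build_sample_db, top_salaries.
TOP_SALARIES_SQL = """
WITH all_emp AS (
    -- Append the country tables; UNION ALL keeps every row so duplicates
    -- are resolved explicitly by the primary key in the next step.
    SELECT country_code, emp_id, name, salary, currency_code FROM employees_us
    UNION ALL
    SELECT country_code, emp_id, name, salary, currency_code FROM employees_uk
    UNION ALL
    SELECT country_code, emp_id, name, salary, currency_code FROM employees_jp
),
dedup AS (
    -- Number the copies of each (country_code, emp_id); rn = 1 is kept.
    -- Keeping the highest salary is an assumption, not part of the question.
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY country_code, emp_id
               ORDER BY salary DESC, name
           ) AS rn
    FROM all_emp
),
latest_rate AS (
    -- Rank each currency's rates on or before the cutoff, newest first.
    -- ISO 'YYYY-MM-DD' strings compare correctly as text in SQLite.
    SELECT currency_code,
           rate_to_usd,
           ROW_NUMBER() OVER (
               PARTITION BY currency_code
               ORDER BY rate_date DESC
           ) AS rn
    FROM exchange_rates
    WHERE rate_date <= :as_of
)
SELECT d.country_code,
       d.emp_id,
       d.name,
       d.salary AS salary_original,
       d.currency_code,
       ROUND(d.salary * r.rate_to_usd, 2) AS salary_usd
FROM dedup AS d
JOIN latest_rate AS r
  ON r.currency_code = d.currency_code
 AND r.rn = 1
WHERE d.rn = 1
ORDER BY salary_usd DESC, d.country_code, d.emp_id
LIMIT :n
"""


def build_sample_db() -> sqlite3.Connection:
    """Load the sample rows from the question into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    for table in ("employees_us", "employees_uk", "employees_jp"):
        conn.execute(
            f"CREATE TABLE {table} (emp_id INT, name TEXT, salary DECIMAL, "
            "currency_code CHAR(3), country_code CHAR(2))"
        )
    conn.execute(
        "CREATE TABLE exchange_rates "
        "(currency_code CHAR(3), rate_to_usd DECIMAL(10,4), rate_date DATE)"
    )
    samples = {
        "employees_us": [(1001, "Alice", 120000, "USD", "US"), (1002, "Bob", 90000, "USD", "US")],
        "employees_uk": [(2001, "Claire", 80000, "GBP", "GB"), (2002, "Dan", 95000, "GBP", "GB")],
        "employees_jp": [(3001, "Emi", 12000000, "JPY", "JP"), (3002, "Fumi", 8500000, "JPY", "JP")],
    }
    for table, rows in samples.items():
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)
    conn.executemany(
        "INSERT INTO exchange_rates VALUES (?, ?, ?)",
        [("USD", 1.0, "2025-08-31"), ("GBP", 1.27, "2025-08-31"), ("JPY", 0.0068, "2025-08-31")],
    )
    return conn


def top_salaries(conn: sqlite3.Connection, as_of: str = "2025-08-31", n: int = 10) -> list:
    """Run the ranking query with a cutoff date and row limit as bound parameters."""
    return conn.execute(TOP_SALARIES_SQL, {"as_of": as_of, "n": n}).fetchall()
```

On the sample data the ranking is Dan, Alice, Claire, Bob, Emi, Fumi. A plain UNION would only drop rows that are identical in every selected column, which is why deduplication partitions by the key instead. SQLite stores these DECIMAL values as integers or floats, so ROUND(..., 2) hides floating-point noise; an engine with fixed-point DECIMAL computes the products exactly.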
Quick Answer: This question tests data manipulation and integration: appending country-level tables, converting salaries to USD with the most recent exchange rate on or before a cutoff date, deduplicating on a composite primary key, and ranking the results, implemented in both SQL and pandas.
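For task 3, a minimal pandas sketch follows. It assumes exchange_rates.csv sits in the same parent directory as the employee files, that the CSV headers match the SQL column names, and that the glob is non-recursive; top_salaries_usd and OUTPUT_COLUMNS are hypothetical names. As with the SQL version, which copy of a duplicated (country_code, emp_id) survives (here the first in sorted file order) is an assumption.

```python
from pathlib import Path

import pandas as pd

# Hypothetical names for illustration: OUTPUT_COLUMNS, top_salaries_usd.
OUTPUT_COLUMNS = ["country_code", "emp_id", "name", "salary_original", "currency_code", "salary_usd"]


def top_salaries_usd(parent_dir, as_of="2025-08-31", n=10) -> pd.DataFrame:
    parent = Path(parent_dir)
    # Non-recursive match; parent.rglob(...) would also search subfolders.
    files = sorted(parent.glob("employees_*.csv"))
    if not files:
        raise FileNotFoundError(f"no employees_*.csv files in {parent}")

    # Append all country files (assumed to share the same columns).
    emp = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    # Primary key (country_code, emp_id): keep the first copy of any duplicate.
    emp = emp.drop_duplicates(subset=["country_code", "emp_id"], keep="first")

    # Assumption: exchange_rates.csv lives next to the employee files.
    rates = pd.read_csv(parent / "exchange_rates.csv", parse_dates=["rate_date"])
    # Latest rate per currency on or before the cutoff.
    rates = rates[rates["rate_date"] <= pd.Timestamp(as_of)]
    latest = rates.sort_values("rate_date").drop_duplicates("currency_code", keep="last")

    # Inner join, like the SQL JOIN: employees with no eligible rate are dropped.
    out = emp.merge(
        latest[["currency_code", "rate_to_usd"]],
        on="currency_code",
        how="inner",
        validate="many_to_one",  # fail loudly if a currency still has two rates
    )
    out["salary_usd"] = (out["salary"] * out["rate_to_usd"]).round(2)
    out = out.rename(columns={"salary": "salary_original"})
    return out.nlargest(n, "salary_usd")[OUTPUT_COLUMNS].reset_index(drop=True)
```

Filtering to the cutoff and keeping the last row per currency is enough here because there is a single as-of date; pandas.merge_asof would suit the case where each employee row carried its own effective date.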