Explain pandas and SQL basics
Company: RBC Royal Bank
Role: Data Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: Easy
Interview Round: Technical Screen
You are interviewing for a Data Engineer co-op/intern role. Answer the following short technical questions.
Python / pandas:
1. What is the difference between a pandas `Series` and a pandas `DataFrame`? Give one practical example of when you would use each.
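As a rough illustration of question 1, the sketch below builds a small `DataFrame` and pulls a `Series` out of it. The sample rows are invented for illustration; the column names are borrowed from the `customer_events` table in question 3.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
events = pd.DataFrame(
    {
        "event_id": [1, 2, 3],
        "customer_email": ["a@example.com", "b@example.com", "a@example.com"],
        "source_system": ["web", "crm", "web"],
    }
)

# Selecting a single column from a DataFrame yields a Series:
# a one-dimensional labeled array with a single dtype.
emails = events["customer_email"]

# The DataFrame is two-dimensional (rows x columns); each column has its own dtype.
print(type(events).__name__, events.shape)  # DataFrame (3, 3)
print(type(emails).__name__, emails.shape)  # Series (3,)

# Series use case: work on one variable, e.g. count occurrences of each value.
email_counts = emails.value_counts()

# DataFrame use case: filter or join rows where several columns matter together.
web_events = events[events["source_system"] == "web"]
```

A `Series` fits a single column or a single sequence of measurements. A `DataFrame` fits tabular data where each row carries several related fields.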
SQL concepts:
2. What is the difference between `WHERE` and `HAVING` in SQL, and when should each be used?
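One way to see the distinction in question 2: `WHERE` filters individual rows before grouping, while `HAVING` filters groups after aggregation. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption (the question does not name a database), and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_events ("
    "event_id INTEGER, customer_email TEXT, source_system TEXT, created_at TEXT)"
)

# Hypothetical sample rows, invented for illustration only.
rows = [
    (1, "a@example.com", "web", "2024-01-01 10:00:00"),
    (2, "a@example.com", "web", "2024-01-02 10:00:00"),
    (3, "a@example.com", "crm", "2024-01-03 10:00:00"),
    (4, "b@example.com", "web", "2024-01-04 10:00:00"),
    (5, "c@example.com", "crm", "2024-01-05 10:00:00"),
    (6, "c@example.com", "crm", "2024-01-06 10:00:00"),
]
conn.executemany("INSERT INTO customer_events VALUES (?, ?, ?, ?)", rows)

query = """
SELECT customer_email, COUNT(*) AS web_event_count
FROM customer_events
WHERE source_system = 'web'   -- row-level filter, applied before grouping
GROUP BY customer_email
HAVING COUNT(*) > 1           -- group-level filter, applied after aggregation
ORDER BY customer_email
"""
result = conn.execute(query).fetchall()
print(result)  # [('a@example.com', 2)]
```

Here `WHERE` discards the `crm` rows first, so only `a@example.com` still has more than one row when `HAVING` is evaluated. A condition on an aggregate such as `COUNT(*)` cannot go in `WHERE`, because the aggregate has not been computed at that stage.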
SQL query task:
3. You have a table:
`customer_events`
- `event_id` BIGINT
- `customer_email` STRING
- `source_system` STRING
- `created_at` TIMESTAMP
Assume `created_at` is stored in UTC. Write a SQL query that finds duplicate `customer_email` values. A `customer_email` counts as a duplicate if it appears in more than one row of the full table, regardless of `source_system` or `created_at`. Return these output columns:
- `customer_email`
- `duplicate_count`
Only include emails that appear more than once.
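A possible answer to question 3 groups by `customer_email` and keeps groups with more than one row. The sketch runs the query through Python's built-in `sqlite3` so it can be checked end to end. SQLite is an assumed stand-in for the real engine, and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (an assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customer_events ("
    "event_id INTEGER, customer_email TEXT, source_system TEXT, created_at TEXT)"
)

# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    (1, "a@example.com", "web", "2024-01-01 10:00:00"),
    (2, "a@example.com", "web", "2024-01-02 10:00:00"),
    (3, "a@example.com", "crm", "2024-01-03 10:00:00"),
    (4, "b@example.com", "web", "2024-01-04 10:00:00"),
    (5, "c@example.com", "crm", "2024-01-05 10:00:00"),
    (6, "c@example.com", "crm", "2024-01-06 10:00:00"),
]
db.executemany("INSERT INTO customer_events VALUES (?, ?, ?, ?)", sample_rows)

duplicate_query = """
SELECT
    customer_email,
    COUNT(*) AS duplicate_count
FROM customer_events
GROUP BY customer_email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC, customer_email
"""
duplicates = db.execute(duplicate_query).fetchall()
print(duplicates)  # [('a@example.com', 3), ('c@example.com', 2)]
```

The `ORDER BY` is optional and only makes the output deterministic. The query compares emails exactly as stored. Whether to normalize case or whitespace, or to exclude `NULL` emails with `WHERE customer_email IS NOT NULL`, is a judgment call the question leaves open.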
Quick Answer: This question tests familiarity with pandas data structures (`Series` vs `DataFrame`), SQL filtering semantics (`WHERE` vs `HAVING`), and SQL aggregation for detecting duplicate records. Together these assess data manipulation, aggregation, and basic data quality checks.