Differentiate pandas objects and SQL filters
Company: Rbcroyalbank
Role: Data Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
## Python (pandas)
1. What is the difference between a **pandas `Series`** and a **pandas `DataFrame`**?
- Discuss structure (1D vs 2D), indexing, column labels, and common use cases.
## SQL
2. What is the difference between **`WHERE`** and **`HAVING`**?
- When do you use each in queries that include `GROUP BY` and aggregates?
3. Write SQL to find duplicates.
### Table
Assume a table of users:
- `users` (
- `user_id` BIGINT PRIMARY KEY,
- `email` VARCHAR,
- `created_at` TIMESTAMP
)
### Tasks
A. Return **emails that appear more than once**, with their duplicate count.
- Output columns: `email`, `dup_count`
B. Return the **full rows** for users whose `email` is duplicated.
- Output columns: `user_id`, `email`, `created_at`
- If there are multiple users with the same email, return all of them.
### Assumptions
- Treat `NULL` emails as non-duplicates unless specified otherwise.
- SQL dialect can be ANSI SQL (window functions allowed).
Quick Answer: This question evaluates understanding of pandas data structures (Series vs DataFrame), SQL filtering and aggregation semantics (WHERE vs HAVING), and the ability to construct SQL queries to identify duplicate records.