Analyze survey with gender imbalance
Company: Pinterest
Role: Data Scientist
Category: Statistics & Math
Difficulty: Hard
Interview Round: Onsite
# Analyze survey with gender imbalance
## Scenario
You ran a user survey to measure satisfaction with a new product feature. Each respondent reports:
- `gender ∈ {female, male}`
- `satisfaction` (e.g., binary satisfied/not satisfied, or a 1–5 Likert score)
After data collection, you notice that the survey respondents are **not gender-balanced**: the female/male split among respondents differs from the true split in your user population.
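To make the concern concrete, the sketch below compares the unadjusted sample mean with an estimate reweighted to the population gender split. Every number here is hypothetical and chosen only for illustration (the per-gender satisfaction rates, the respondent counts, and the population shares are assumptions, not data from the scenario), and the function name `naive_and_weighted` is likewise made up for this example.

```python
def naive_and_weighted(sample_rates, sample_counts, population_shares):
    """Return (unadjusted sample mean, population-weighted mean)."""
    n = sum(sample_counts.values())
    # Unadjusted: each respondent counts equally, so the sample mix drives the result.
    naive = sum(sample_rates[g] * sample_counts[g] for g in sample_counts) / n
    # Reweighted: each gender's rate is weighted by its share of the user population.
    weighted = sum(sample_rates[g] * population_shares[g] for g in population_shares)
    return naive, weighted


# Hypothetical inputs (assumptions for illustration only).
sample_rates = {"female": 0.80, "male": 0.60}        # satisfied share per gender
sample_counts = {"female": 30, "male": 40}           # respondents per gender
population_shares = {"female": 0.60, "male": 0.40}   # assumed true user split

naive, weighted = naive_and_weighted(sample_rates, sample_counts, population_shares)
print(f"naive={naive:.3f} weighted={weighted:.3f}")
```

The two estimates coincide only when the satisfaction rates are equal across genders or the sample split matches the population split, which is why the imbalance matters only if satisfaction differs by gender.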
## Questions
1. **What are the key concerns** with using this survey to estimate overall satisfaction for the full user population?
2. **How would you validate** whether the imbalance is problematic (i.e., whether it biases your estimate)? What checks or additional data would you use?
3. **What distributional assumptions** might you make for:
   - the gender counts in the sample (female vs male), and
   - the satisfaction outcome within each gender?

   Explain when those assumptions are reasonable.
4. Compute the probability of observing exactly **30 female respondents out of 70 total respondents**:
   - (a) if each respondent is sampled independently from a population where the true female proportion is `p`, and
   - (b) if you sampled *without replacement* from a finite population of size `N` with `F` females.
5. How would **stratified sampling** help here, and how would you analyze the results if you used stratified sampling (including how to combine strata to estimate overall satisfaction)?
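For question 4, a minimal sketch of both probabilities follows. Case (a) is treated as binomial, `C(70, 30) * p^30 * (1 - p)^40`, and case (b) as hypergeometric, `C(F, 30) * C(N - F, 40) / C(N, 70)`. The prompt leaves `p`, `N`, and `F` symbolic, so the values plugged in at the bottom (`p = 0.5`, `N = 1000`, `F = 500`) are assumptions for illustration, and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k females in n independent draws with female probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, n, N, F):
    """P(exactly k females in n draws without replacement from N users, F female)."""
    # Outside this support the event is impossible.
    if k < max(0, n - (N - F)) or k > min(n, F):
        return 0.0
    return comb(F, k) * comb(N - F, n - k) / comb(N, n)


# Assumed illustrative values; the prompt does not specify p, N, or F.
p_binom = binomial_pmf(30, 70, 0.5)
p_hyper = hypergeom_pmf(30, 70, 1000, 500)
print(f"binomial={p_binom:.4f} hypergeometric={p_hyper:.4f}")
```

When `N` is large relative to the 70 respondents, the two answers are close, because sampling without replacement barely changes the female proportion from draw to draw.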
### Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
### Clarifying Questions to Ask
- Which random variables, distributional assumptions, and independence assumptions apply, and what output is desired?
- How much of the derivation should be shown for the interviewer to follow the reasoning?
- Should the result be validated with simulation or sensitivity checks?
### What a Strong Answer Covers
- A correct setup with definitions, formulas, and boundary conditions.
- A step-by-step derivation or estimation plan.
- Interpretation of the result, including uncertainty and practical limitations.
- Checks for assumptions, edge cases, and numerical stability.
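As one possible setup for the stratified analysis, the standard stratified estimator and its estimated variance are written out below. The notation is introduced here and is not from the prompt: for each stratum `h` (female, male), `N_h` is the population size, `n_h` the sample size, `\bar{y}_h` the sample mean satisfaction, and `s_h^2` the sample variance. This assumes simple random sampling within each stratum and known population counts `N_h`.

```latex
\hat{\mu}_{\text{st}} = \sum_{h} W_h \, \bar{y}_h,
\qquad W_h = \frac{N_h}{N}

\widehat{\operatorname{Var}}\!\left(\hat{\mu}_{\text{st}}\right)
  = \sum_{h} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}
```

For a binary satisfied/not satisfied outcome with sample proportion `\hat{p}_h`, the sample variance is `s_h^2 = \frac{n_h}{n_h - 1} \hat{p}_h (1 - \hat{p}_h)`. The finite population correction `1 - n_h / N_h` is close to 1 when the sample is a small fraction of the stratum.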
### Follow-up Questions
- How would the result change if the assumptions were relaxed?
- Can you verify the answer with a simulation?
- What is the most likely source of estimation error?
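One way to approach the simulation follow-up is a Monte Carlo check of the independent-sampling probability of exactly 30 females out of 70. This is a sketch under assumptions: `p = 0.5`, the trial count, and the seed are arbitrary illustrative choices, and `simulate_binomial_prob` is a hypothetical helper.

```python
import random
from math import comb


def simulate_binomial_prob(k, n, p, trials, seed=0):
    """Estimate P(exactly k females among n independent respondents) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        # Draw n respondents; each is female with probability p.
        females = sum(rng.random() < p for _ in range(n))
        hits += females == k
    return hits / trials


# Assumed illustrative value p = 0.5; the prompt leaves p unspecified.
exact = comb(70, 30) * 0.5**70
estimate = simulate_binomial_prob(30, 70, 0.5, trials=20_000)
print(f"exact={exact:.4f} simulated={estimate:.4f}")
```

The simulated value should agree with the closed-form value up to Monte Carlo error, which shrinks in proportion to one over the square root of the number of trials.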
Quick Answer: This question tests statistical assumptions, formulas, estimation strategy, uncertainty, edge cases, and interpretation in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows clearly how to validate the result.