Implement a Python function to clean and rank student scores.
You are given a table (or DataFrame) `students` with the following schema:
| column | type | notes |
|---|---|---|
| student_id | int | unique identifier |
| math_score | float | may be null |
| english_score | float | may be null |
| physics_score | float | may be null |
Task
Write a function (e.g., `select_top_students(students_df) -> pd.DataFrame`) that:
- Removes students with at least 2 missing scores across the three subjects.
- For the remaining students, fills missing scores with the subject median (computed per subject from the remaining students' non-missing values).
- Sorts the remaining students by:
  - `math_score` descending, then
  - `physics_score` descending, then
  - (optional tie-breaker) `student_id` ascending.
- Returns the top 5 students as a table with the same columns.
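The steps above could be sketched as follows. This is one possible approach rather than a reference solution. It assumes pandas is available and that an undefined subject median leaves the nulls in place. The `SCORE_COLS` constant and the `n` parameter are illustrative additions that do not appear in the task statement.

```python
import pandas as pd

# Hypothetical helper constant: the three score columns from the schema.
SCORE_COLS = ["math_score", "english_score", "physics_score"]


def select_top_students(students_df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Clean the scores, rank the students, and return the top n rows."""
    df = students_df.copy()

    # Coerce score columns to float so that None and NaN are both treated as missing.
    for col in SCORE_COLS:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Step 1: drop students with at least 2 missing scores.
    missing_count = df[SCORE_COLS].isna().sum(axis=1)
    df = df[missing_count < 2].copy()

    # Step 2: fill gaps with the per-subject median of the remaining students.
    # Assumption: if a median is undefined (no non-missing values remain),
    # the nulls are left as they are.
    for col in SCORE_COLS:
        median = df[col].median()  # NaN values are skipped by default
        if pd.notna(median):
            df[col] = df[col].fillna(median)

    # Step 3: math descending, physics descending, student_id ascending.
    df = df.sort_values(
        by=["math_score", "physics_score", "student_id"],
        ascending=[False, False, True],
    )

    # Step 4: head(n) returns every row when fewer than n remain.
    return df.head(n).reset_index(drop=True)
```

Computing the medians after the filtering step matters: medians taken over the full table would include scores from students who are later removed, which is not what the task asks for.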
Notes / edge cases to handle
- Missing values may be represented as `None` / `NaN`.
- If fewer than 5 students remain, return all remaining.
- If a subject median is undefined (e.g., all remaining values are null for that subject), specify and implement a reasonable behavior (e.g., leave as null or raise an error); state your assumption in comments.
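The undefined-median case can be made explicit with a small switch between the two behaviors mentioned above. The sketch below is illustrative only: the helper name `fill_with_subject_median` and its `on_undefined` parameter are hypothetical and do not come from the task statement.

```python
import pandas as pd


def fill_with_subject_median(df, score_cols, on_undefined="leave"):
    """Fill missing scores with each column's median.

    on_undefined="leave" keeps nulls when a column has no non-missing values;
    on_undefined="raise" raises ValueError instead. Both are assumptions
    about what counts as "reasonable" behavior.
    """
    out = df.copy()
    for col in score_cols:
        # Treat None and NaN uniformly by coercing to float.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        median = out[col].median()
        if pd.isna(median):
            # Median is undefined: the column has no non-missing values.
            if on_undefined == "raise":
                raise ValueError(f"median of {col!r} is undefined")
            continue  # leave the nulls in place
        out[col] = out[col].fillna(median)
    return out
```

Leaving nulls in place keeps the function total, since it always returns a table, while raising surfaces the data problem early. Either choice is defensible as long as it is documented in a comment.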