Detecting Dead Links: Build and Evaluate a Classifier
Scenario
You have a dataset of 1,000 URLs labeled as good (alive) or bad (dead). The classes are likely imbalanced (e.g., far fewer dead links than good ones).
Task
- Describe how you would build the classifier end-to-end (data prep, features, model, validation, and deployment considerations).
- Explain which evaluation metric(s) you would choose for imbalanced data.
- Clarify why AUROC might be preferred over accuracy when the classes are imbalanced.
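A minimal, dependency-free sketch of the validation and metric questions, under stated assumptions. Dead links are treated as the positive class (label 1). The 950 good / 50 dead label vector is hypothetical, not drawn from the scenario. The helper names (`stratified_split`, `accuracy`, `recall`, `auroc`) are illustrative. In practice, scikit-learn's `train_test_split(..., stratify=y)`, `recall_score`, and `roc_auc_score` cover the same ground. The example scores a degenerate model that always predicts "good" on a stratified hold-out split:

```python
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each class keeps (roughly) the same proportion in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_frac)
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly dead links (label 1) that the model flags."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0


def auroc(y_true, scores):
    """P(a random dead link scores above a random good link); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical label vector: 950 good links (0) and 50 dead links (1).
labels = [0] * 950 + [1] * 50
_, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]

# Degenerate baseline: always predicts "good" and gives every URL the same score.
preds = [0] * len(y_test)
scores = [0.0] * len(y_test)

print(len(y_test), sum(y_test))                       # 200 10
print(f"accuracy    = {accuracy(y_test, preds):.3f}")  # 0.950
print(f"dead recall = {recall(y_test, preds):.3f}")    # 0.000
print(f"AUROC       = {auroc(y_test, scores):.3f}")    # 0.500
```

The constant model reaches 95% accuracy while catching no dead links. Its AUROC of 0.5 exposes that it cannot rank dead links above good ones. AUROC is computed from the ranking of scores across all thresholds, so it does not reward simply predicting the majority class. With heavy imbalance, report dead-link precision and recall and the precision-recall curve (average precision) as well. AUROC can look optimistic when good links vastly outnumber dead ones.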
Hint: A strong baseline is logistic regression, evaluated with class-imbalance-aware metrics.
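A minimal, dependency-free sketch of the hinted baseline: logistic regression trained by gradient descent on a class-weighted log-loss, so the rare dead links are not drowned out. Several parts are hypothetical and made up for illustration:

- the URL-string features (length, path depth, query string, HTTPS);
- the eight example URLs and their labels;
- all function names.

A real pipeline would derive its features from its own data. In scikit-learn, `LogisticRegression(class_weight="balanced")` applies the same n / (n_classes * class_count) weighting.

```python
import math
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical hand-crafted features computed from the URL string alone."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return [
        len(url) / 100.0,                          # URL length, scaled
        len(segments) / 5.0,                       # path depth, scaled
        1.0 if parsed.query else 0.0,              # has a query string
        1.0 if parsed.scheme == "https" else 0.0,  # served over HTTPS
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, lr=0.5, epochs=2000, l2=1e-3):
    """Batch gradient descent on class-weighted log-loss; L2 penalty on weights, not bias."""
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    # "Balanced" weights: n / (2 * class_count), so both classes contribute equally.
    w_pos, w_neg = n / (2.0 * n_pos), n / (2.0 * (n - n_pos))
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            c = (w_pos if yi == 1 else w_neg) * (p - yi)  # weighted residual
            for j in range(d):
                grad_w[j] += c * xi[j]
            grad_b += c
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the URL is dead (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Tiny made-up sample, label 1 = dead. Illustrative only, not real data.
data = [
    ("https://example.com/", 0),
    ("https://example.com/about", 0),
    ("https://example.com/blog/post-1", 0),
    ("https://example.com/docs/guide", 0),
    ("https://example.org/news", 0),
    ("https://example.org/contact", 0),
    ("http://old.example.net/archive/2009/07/page.php?id=17&ref=a", 1),
    ("http://old.example.net/cgi-bin/tmp/x/y/z/view.cgi?session=42", 1),
]
X = [url_features(u) for u, _ in data]
y = [label for _, label in data]

w, b = fit_weighted_logreg(X, y)
for (url, label), x in zip(data, X):
    print(f"{predict_proba(w, b, x):.2f}  label={label}  {url}")
```

Weighting the minority class keeps the optimizer from settling on "always good". Score the resulting probabilities with AUROC on a stratified held-out split (or stratified cross-validation). Choose the decision threshold separately, based on the precision/recall trade-off the application needs.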