You have 1,000 URLs labeled good or bad, plus a much larger unlabeled pool; bad links are rare. Design features and train a logistic regression classifier. Explain your evaluation plan under class imbalance: stratified K-fold cross-validation, ROC-AUC vs. PR-AUC, calibration (reliability curves), and why accuracy is misleading. Choose a decision threshold that minimizes expected misclassification cost when false positives and false negatives carry different costs. Discuss class weighting or resampling, leakage checks, monitoring for dataset shift between labeled and production traffic, and an offline-to-online validation plan using shadow or canary deployment.
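One way to start on feature design is with lexical features computed from the URL string alone. They are cheap, available at prediction time, and cannot leak label information from later lookups. The following is a minimal sketch. The feature names, the `SUSPICIOUS_TOKENS` list, and the helper functions are hypothetical illustrations, not a prescribed feature set.

```python
import ipaddress
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical token list for illustration; in practice derive it from training data only.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; random-looking hosts score higher."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_ip_host(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Hypothetical lexical feature vector computed from the URL string (no network calls)."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path_and_query = parsed.path + ("?" + parsed.query if parsed.query else "")
    lower = url.lower()
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots_in_host": float(host.count(".")),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "num_special": float(len(re.findall(r"[@\-_~%=&]", url))),
        "has_at_sign": float("@" in url),
        "is_https": float(parsed.scheme == "https"),
        "host_is_ip": float(is_ip_host(host)),
        "path_depth": float(path_and_query.count("/")),
        "host_entropy": shannon_entropy(host),
        "num_suspicious_tokens": float(sum(tok in lower for tok in SUSPICIOUS_TOKENS)),
    }
```

Count and length features are heavy-tailed, so a log transform followed by standardization is a reasonable next step before fitting a logistic regression.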
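Class weighting can be folded directly into the logistic regression loss. The following is a minimal, dependency-free sketch of class-weighted, L2-regularized logistic regression trained by batch gradient descent. The "balanced" weights `n / (2 * n_c)` match the formula scikit-learn documents for `class_weight="balanced"` in the binary case. The function names and hyperparameter defaults are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                        l2: float = 1e-2, lr: float = 0.1, epochs: int = 2000,
                        balanced: bool = True) -> List[float]:
    """Batch gradient descent on class-weighted, L2-penalized log loss.

    Returns [bias, w_1, ..., w_d]. With balanced=True each class gets total weight n/2.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    if balanced:
        w_pos, w_neg = n / (2.0 * n_pos), n / (2.0 * n_neg)
    else:
        w_pos = w_neg = 1.0
    theta = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = theta[0] + sum(t * v for t, v in zip(theta[1:], xi))
            # Gradient of weighted log loss w.r.t. z is weight * (p - y).
            err = (sigmoid(z) - yi) * (w_pos if yi == 1 else w_neg)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        for j in range(d + 1):
            reg = l2 * theta[j] if j > 0 else 0.0  # the bias is not penalized
            theta[j] -= lr * (grad[j] / n + reg)
    return theta


def predict_proba(theta: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))
```

Weighting, like random oversampling of bad URLs or undersampling of good ones, mainly shifts the intercept. As a result, the weighted model's scores overstate the true probability of a bad link. Recalibrate those scores, or correct them for the production base rate, before applying a cost-based threshold.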
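Stratified K-fold keeps the rare class represented in every fold. With few positives this matters a great deal. For example, if an assumed 1% of the 1,000 labels were bad, each of 5 folds would hold only about 2 bad URLs, so per-fold metrics would be very noisy and repeated stratified splits are worth the cost. The following is a minimal sketch that deals each class's shuffled indices round-robin across folds. The function name and seeding scheme are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(y: Sequence[int], k: int = 5, seed: int = 0
                     ) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs; each test fold gets ~1/k of every class."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(offset + pos) % k].append(i)
        offset += len(idx)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        splits.append((train, test))
    return splits
```

Calling it with different `seed` values and averaging the fold metrics gives a repeated stratified K-fold estimate along with a spread.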
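The gap between accuracy, ROC-AUC, and PR-AUC is easiest to see on a small example. Suppose there are 2 bad URLs among 100. A model that flags nothing scores 98% accuracy. A ranking that puts one bad URL first and the other sixth has a ROC-AUC of about 0.98 but an average precision (a step-wise PR-AUC) of only 2/3, because four good URLs outrank the second bad one. A random ranking's average precision is about the prevalence, 0.02, while its ROC-AUC is 0.5. The following is a minimal sketch with hypothetical data. `average_precision` assumes untied scores.

```python
from typing import Sequence


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@rank taken at each positive's rank (assumes no tied scores)."""
    order = sorted(range(len(y)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / max(1, sum(y))


def accuracy(y: Sequence[int], preds: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(y, preds)) / len(y)


# Hypothetical ranking: bad URLs at ranks 1 and 6 out of 100.
y = [1, 0, 0, 0, 0, 1] + [0] * 94
scores = [100.0 - i for i in range(100)]  # index 0 gets the highest score
print(accuracy(y, [0] * 100))        # always predict "good"
print(roc_auc(y, scores))            # 192 / 196
print(average_precision(y, scores))  # (1/1 + 2/6) / 2
```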
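A reliability curve bins predicted probabilities and compares each bin's mean prediction with the observed rate of bad URLs. A well-calibrated model lies near the diagonal. The following is a minimal sketch using equal-width bins, together with the Brier score and a binned expected calibration error (ECE). With rare positives, most high-probability bins will hold only a handful of points, so fewer or quantile-based bins are often more informative. If the curve is off, Platt scaling or isotonic regression fitted on held-out data are common fixes. Function names here are illustrative.

```python
from typing import List, Sequence, Tuple


def reliability_curve(y: Sequence[int], probs: Sequence[float], n_bins: int = 10
                      ) -> List[Tuple[float, float, int]]:
    """(mean predicted prob, observed positive rate, count) for each non-empty bin."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, positives, count]
    for t, p in zip(y, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b][0] += p
        bins[b][1] += t
        bins[b][2] += 1
    return [(s / c, pos / c, c) for s, pos, c in bins if c > 0]


def brier_score(y: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 label."""
    return sum((p - t) ** 2 for t, p in zip(y, probs)) / len(y)


def expected_calibration_error(y: Sequence[int], probs: Sequence[float],
                               n_bins: int = 10) -> float:
    """Count-weighted mean |predicted - observed| gap over the reliability bins."""
    n = len(y)
    return sum(c / n * abs(mp - fr) for mp, fr, c in reliability_curve(y, probs, n_bins))
```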
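For a calibrated probability p, flagging a URL costs (1 - p) * C_FP in expectation, and letting it through costs p * C_FN. Flagging is therefore cheaper when p > C_FP / (C_FP + C_FN). For example, if a missed bad link is assumed to cost 19 times a false alarm, the threshold is 0.05. When calibration is imperfect, a direct search over held-out scores is a useful cross-check. The following is a minimal sketch of both approaches. The cost values and data in the usage lines are hypothetical.

```python
from typing import Sequence, Tuple


def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Cost-minimizing cutoff for *calibrated* probabilities at production prevalence."""
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(y: Sequence[int], probs: Sequence[float], t: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Average cost per example when flagging every score >= t as bad."""
    fp = sum(1 for yi, p in zip(y, probs) if p >= t and yi == 0)
    fn = sum(1 for yi, p in zip(y, probs) if p < t and yi == 1)
    return (cost_fp * fp + cost_fn * fn) / len(y)


def best_threshold(y: Sequence[int], probs: Sequence[float],
                   cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Try every distinct score (plus 'flag nothing') on held-out data; return (t, cost)."""
    candidates = sorted(set(probs)) + [float("inf")]
    return min(((t, expected_cost(y, probs, t, cost_fp, cost_fn)) for t in candidates),
               key=lambda tc: tc[1])


# Hypothetical held-out labels and scores, with a false negative 10x as costly as a false positive.
print(bayes_threshold(1.0, 19.0))
print(best_threshold([0, 0, 0, 1, 0, 1], [0.01, 0.02, 0.1, 0.3, 0.4, 0.9], 1.0, 10.0))
```

Choose the threshold on out-of-fold or validation predictions, not on the data used to fit the model, and report the cost at that threshold on a separate test split.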
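Class weighting, resampling, or a labeled set whose bad-link rate differs from production all change the effective prior the model was trained under. Under a pure label-shift assumption (the feature distribution within each class stays the same), a probability can be re-expressed for a new prior by adding the difference in log prior odds to its logit. The following is a minimal sketch. Treating a "balanced" model as trained at prior 0.5 is an approximation, and the production prior must itself be estimated, for example from audited samples.

```python
import math


def shift_prior(p: float, train_prior: float, deploy_prior: float) -> float:
    """Label-shift correction: logit(p) + log-odds(deploy_prior) - log-odds(train_prior)."""
    logit = math.log(p / (1.0 - p))
    logit += (math.log(deploy_prior / (1.0 - deploy_prior))
              - math.log(train_prior / (1.0 - train_prior)))
    return 1.0 / (1.0 + math.exp(-logit))


# A balanced-weighted model (effective prior ~0.5) applied where ~1% of URLs are bad (assumed).
print(shift_prior(0.5, 0.5, 0.01))
```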
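A common leak in URL data is near-duplicate pages from the same site landing in both the training and test folds, which lets the model memorize domains instead of learning patterns. The following is a minimal sketch of a per-split leakage report that checks for exact duplicate URLs and shared domains. `naive_domain` is a hypothetical helper that keeps the last two host labels, so it mishandles suffixes such as `co.uk`; a real check would use the Public Suffix List. If shared domains turn up, grouping folds by domain (scikit-learn provides `StratifiedGroupKFold` for this) is one fix.

```python
from typing import Dict, Iterable, List, Sequence
from urllib.parse import urlparse


def naive_domain(url: str) -> str:
    """Hypothetical domain key: last two host labels ('a.b.example.com' -> 'example.com')."""
    host = urlparse(url if "://" in url else "http://" + url).hostname or ""
    return ".".join(host.split(".")[-2:])


def leakage_report(urls: Sequence[str], train_idx: Iterable[int],
                   test_idx: Iterable[int]) -> Dict[str, List[str]]:
    """List exact URLs and domains that appear on both sides of a split."""
    train = [urls[i] for i in train_idx]
    test = [urls[i] for i in test_idx]
    dup_urls = sorted(set(train) & set(test))
    shared_domains = sorted({naive_domain(u) for u in train}
                            & {naive_domain(u) for u in test})
    return {"duplicate_urls": dup_urls, "shared_domains": shared_domains}
```

Other checks worth running are less mechanical. Confirm that no feature is derived from the labeling process itself, such as a blocklist flag that produced the label. Also confirm that any vocabulary, scaling, or token list is fit inside each training fold only.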
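To monitor shift between labeled and production traffic, one option is to compare each feature's distribution, and the model's score distribution, against the labeled reference set. The Population Stability Index (PSI) is one widely used summary. The following is a minimal sketch with bins taken from reference quantiles. The `eps` floor avoids log(0) for empty bins. A PSI above about 0.25 is a common rule of thumb for a major shift, a heuristic rather than a standard.

```python
import math
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior bin edges at (nearest-rank) quantiles of the reference sample."""
    s = sorted(reference)
    return [s[int(len(s) * k / n_bins)] for k in range(1, n_bins)]


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index of `current` relative to `reference`."""
    edges = quantile_edges(reference, n_bins)

    def shares(xs: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in xs:
            counts[sum(x > e for e in edges)] += 1  # bin = number of edges strictly below x
        return [max(c / len(xs), eps) for c in counts]

    r, c = shares(reference), shares(current)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))
```

Input-feature PSI can run on unlabeled production traffic. Label-dependent quantities such as precision and recall need a stream of audited production samples.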
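For the offline-to-online step, a shadow deployment scores live traffic with the candidate model without acting on its decisions, while a canary lets it act on a small, fixed slice of traffic. The following is a minimal sketch of two pieces: deterministic hash-based canary routing, so the same request ID always lands in the same arm, and a shadow summary that compares flag rates and disagreement with the incumbent. The function names and the choice of request ID are illustrative assumptions.

```python
import hashlib
from typing import Dict, Sequence


def in_canary(request_id: str, percent: float) -> bool:
    """Deterministically route ~percent% of request IDs to the canary model."""
    h = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest()[:8], 16)
    return (h % 10_000) < percent * 100


def shadow_summary(live_probs: Sequence[float], shadow_probs: Sequence[float],
                   live_t: float, shadow_t: float) -> Dict[str, float]:
    """Compare incumbent (live) and candidate (shadow) decisions on the same traffic."""
    n = len(live_probs)
    live_flags = [p >= live_t for p in live_probs]
    shadow_flags = [p >= shadow_t for p in shadow_probs]
    return {
        "live_flag_rate": sum(live_flags) / n,
        "shadow_flag_rate": sum(shadow_flags) / n,
        "disagreement_rate": sum(a != b for a, b in zip(live_flags, shadow_flags)) / n,
    }
```

Disagreements are a natural queue for manual labeling. They show where the two models differ, and they add fresh production labels for re-estimating precision, recall, and calibration before the canary share is widened.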