Design email to avoid Promotions without online tests

This question evaluates a data scientist's competency in offline risk‑minimization for email deliverability, covering probabilistic risk estimation, robust optimization, calibration, and covariate‑shift handling within a Machine Learning framework.

Question

Offline Design of a Transactional Email to Minimize Promotions/Spam Classification

Context

You must finalize the design of a single in‑game transactional email before any send. Your goal is to minimize the probability that mailbox providers (Gmail, Outlook, Yahoo, etc.) place it in Promotions or Spam. You cannot run any online A/B tests or get post‑send user feedback; the decision must be made entirely offline.

You have 12 months of historical email data with fields:

send_id, subject, body_html, num_links, link_domains (e.g., game.com, store.game.com, help.game.com, partner.com), anchor_text_types (CTA vs neutral), num_images, template_id
sender_reputation_metrics: complaint_rate_7d, bounce_rate_7d, domain_dkim_pass
send_time_utc, segment (region, platform), provider (gmail/outlook/yahoo)
label from seed inboxing tests or logs: folder_label in {primary, promotions, spam}

Design variables you control now:

number of links (1–5)
which domains are linked
anchor text style (CTA vs neutral)
subject tokens (subject length ≤ 60 characters)
presence of a hero image (binary)

Required business constraints (examples):

Must include at least one help link (help.game.com)
If predicted risk with partner.com links is above a threshold, do not include partner.com
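As a rough illustration, the design variables and the two example constraints above can be enumerated directly, since the discrete space is small. This is a minimal sketch under stated assumptions, not a prescribed method: the names `feasible_designs` and `apply_partner_rule`, the candidate subject list, and the `risk_fn` callable (a stand-in for whatever calibrated risk model is fitted) are all hypothetical, and it assumes each link points to exactly one domain, so a design cannot reference more distinct domains than it has links.

```python
from itertools import combinations, product

HELP_DOMAIN = "help.game.com"
PARTNER_DOMAIN = "partner.com"
DOMAINS = ["game.com", "store.game.com", HELP_DOMAIN, PARTNER_DOMAIN]


def feasible_designs(subjects, max_links=5, max_subject_len=60):
    """Yield every design dict that satisfies the hard constraints.

    `subjects` is a hypothetical list of candidate subject lines.
    """
    valid_subjects = [s for s in subjects if len(s) <= max_subject_len]
    for num_links in range(1, max_links + 1):
        # Assumption: one domain per link, so at most num_links distinct domains.
        for k in range(1, min(num_links, len(DOMAINS)) + 1):
            for domains in combinations(DOMAINS, k):
                if HELP_DOMAIN not in domains:
                    continue  # constraint: at least one help link
                for subject, anchor, hero in product(
                    valid_subjects, ("cta", "neutral"), (False, True)
                ):
                    yield {
                        "num_links": num_links,
                        "domains": domains,
                        "anchor_style": anchor,
                        "hero_image": hero,
                        "subject": subject,
                    }


def apply_partner_rule(designs, risk_fn, threshold):
    """Drop designs that link partner.com when their predicted risk exceeds threshold.

    `risk_fn` is a placeholder for a fitted, calibrated risk model.
    """
    for design in designs:
        if PARTNER_DOMAIN in design["domains"] and risk_fn(design) > threshold:
            continue
        yield design
```

Under these assumptions the space holds 112 feasible designs per candidate subject line, which is small enough to score exhaustively before resorting to a heuristic search.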

Tasks

(a) Formulate an offline risk‑minimization problem to choose the email design (decision vector) that minimizes P(Promotions/Spam), including objective, constraints, and any robustness term (e.g., worst‑case over providers or conformal upper bounds).

(b) Specify features and a modeling approach to estimate risk, including how to avoid leakage (e.g., time‑based splits, template_id handling) and how to calibrate probabilities.

(c) Describe how to address covariate shift between historical templates and the proposed new design (e.g., domain adaptation, monotonic constraints on num_links, or semi‑synthetic data generation).

(d) Propose a search/optimization strategy over the discrete design space (e.g., beam search with a learned surrogate, Bayesian optimization with mixed variables, or ILP with learned risk).

(e) Explain how to validate the chosen design offline without any new user feedback (e.g., off‑policy evaluation with inverse propensity weighting, stratified provider‑wise risk, and conformal prediction intervals).

(f) If historical labels are scarce or noisy for some providers, propose a fallback (e.g., weak labeling via an open‑source classifier plus a small hand‑labeled set) and how you would quantify added uncertainty in the final decision.

(g) Deliver a concrete decision rule (e.g., select the lowest‑risk design whose worst‑case provider risk upper bound at 90% confidence is below X%) and justify your chosen X.
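One way to make the decision rule in (g) concrete is sketched below. It is an illustration under assumptions, not the expected answer: it swaps in a one-sided Hoeffding bound in place of a conformal bound, assumes each provider's point estimate `p_hat` is an empirical Promotions/Spam rate over `n` comparable, independent held-out sends, and uses made-up numbers for two hypothetical designs. Under covariate shift the bound would be optimistic and would need widening.

```python
import math


def hoeffding_upper(p_hat, n, alpha=0.10):
    """One-sided (1 - alpha) upper bound on a rate estimated from n i.i.d. samples."""
    return min(1.0, p_hat + math.sqrt(math.log(1.0 / alpha) / (2.0 * n)))


def choose_design(candidates, x_max, alpha=0.10):
    """Pick the lowest mean-risk design whose worst-case provider bound is below x_max.

    candidates: {design_name: {provider: (p_hat, n)}}
    Returns (name, mean_risk, worst_upper_bound), or None if nothing qualifies.
    """
    best = None
    for name, per_provider in candidates.items():
        worst_upper = max(
            hoeffding_upper(p_hat, n, alpha) for p_hat, n in per_provider.values()
        )
        if worst_upper >= x_max:
            continue  # fails the robustness requirement for at least one provider
        mean_risk = sum(p_hat for p_hat, _ in per_provider.values()) / len(per_provider)
        if best is None or mean_risk < best[1]:
            best = (name, mean_risk, worst_upper)
    return best


# Illustrative (fabricated for the example) per-provider estimates and sample sizes.
example_candidates = {
    "A": {"gmail": (0.04, 2000), "outlook": (0.03, 1500), "yahoo": (0.05, 400)},
    "B": {"gmail": (0.02, 2000), "outlook": (0.02, 1500), "yahoo": (0.02, 50)},
}
selected = choose_design(example_candidates, x_max=0.12)
```

In this toy input, design B has the lower point risk everywhere, but its Yahoo estimate rests on only 50 samples, so its upper bound exceeds the threshold and design A is selected. That is the intended behavior of a worst-case rule: scarce or noisy provider data shows up as added uncertainty rather than being ignored. Returning `None` when no design qualifies leaves room for a fallback, such as dropping optional elements or collecting more seed-inbox labels.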
