PracHub

Design an email to avoid Promotions without online tests

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in offline risk‑minimization for email deliverability, covering probabilistic risk estimation, robust optimization, calibration, and covariate‑shift handling within a Machine Learning framework.


Company: Tencent

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen



Oct 13, 2025, 9:49 PM

Offline Design of a Transactional Email to Minimize Promotions/Spam Classification

Context

You must finalize the design of a single in‑game transactional email before any send. Your goal is to minimize the probability that mailbox providers (Gmail, Outlook, Yahoo, etc.) place it in Promotions or Spam. You cannot run any online A/B tests or get post‑send user feedback; the decision must be made entirely offline.

You have 12 months of historical email data with fields:

  • send_id, subject, body_html, num_links, link_domains (e.g., game.com, store.game.com, help.game.com, partner.com), anchor_text_types (CTA vs neutral), num_images, template_id
  • sender_reputation_metrics: complaint_rate_7d, bounce_rate_7d, domain_dkim_pass
  • send_time_utc, segment (region, platform), provider (gmail/outlook/yahoo)
  • label from seed inboxing tests or logs: folder_label in {primary, promotions, spam}

Design variables you control now:

  • number of links (1–5)
  • which domains are linked
  • anchor text style (CTA vs neutral)
  • subject tokens (subject length ≤ 60 characters)
  • presence of a hero image (binary)

Required business constraints (examples):

  • Must include at least one help link (help.game.com)
  • If predicted risk with partner.com links is above a threshold, do not include partner.com
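The design variables and business constraints above can be captured as a small feasibility check. This is a minimal sketch, not part of the original question: the `Design` class, the `is_feasible` function, and the default `partner_threshold` of 0.10 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Domains named in the question; treated here as the full set of allowed link targets.
ALLOWED_DOMAINS = frozenset(
    {"game.com", "store.game.com", "help.game.com", "partner.com"}
)


@dataclass(frozen=True)
class Design:
    """Hypothetical container for one candidate email design."""
    num_links: int
    link_domains: frozenset
    anchor_style: str  # "cta" or "neutral"
    subject: str
    hero_image: bool


def is_feasible(design, partner_risk=None, partner_threshold=0.10):
    """Return True if the design satisfies the stated constraints.

    partner_risk is the predicted risk of the design with partner.com links;
    partner_threshold is a placeholder value, not given in the question.
    """
    if not 1 <= design.num_links <= 5:
        return False
    # Each linked domain needs at least one link, and all must be known domains.
    if not design.link_domains or len(design.link_domains) > design.num_links:
        return False
    if not design.link_domains <= ALLOWED_DOMAINS:
        return False
    if design.anchor_style not in ("cta", "neutral"):
        return False
    if len(design.subject) > 60:
        return False
    if "help.game.com" not in design.link_domains:
        return False
    if "partner.com" in design.link_domains:
        # Without a risk estimate, conservatively reject partner links.
        if partner_risk is None or partner_risk > partner_threshold:
            return False
    return True
```

Rejecting partner.com links when no risk estimate is available is a conservative choice made for this sketch; the question only states the rule for the case where the predicted risk exceeds the threshold.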

Tasks

(a) Formulate an offline risk‑minimization problem to choose the email design (decision vector) that minimizes P(Promotions/Spam), including objective, constraints, and any robustness term (e.g., worst‑case over providers or conformal upper bounds).
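One possible way to write the problem in (a) is sketched below. The notation is introduced here and is not given in the question: $x$ is the decision vector, $\mathcal{X}$ the set of candidate designs, $\mathcal{P}$ the set of providers, $D(x)$ the set of linked domains, $r_p(x)$ the risk at provider $p$, $U_p^{1-\alpha}(x)$ an upper bound on $r_p(x)$ with target coverage $1-\alpha$ (for example a conformal or binomial bound), $x^{+\text{partner}}$ the design $x$ with a partner.com link added, and $\tau$ the partner-link risk threshold.

```latex
\begin{aligned}
x^\star \in \arg\min_{x \in \mathcal{X}} \;\; & \max_{p \in \mathcal{P}} \; U_p^{1-\alpha}(x) \\
\text{where}\quad
  & r_p(x) = \Pr\bigl(\texttt{folder\_label} \in \{\text{promotions}, \text{spam}\} \mid x,\ \text{provider} = p\bigr), \\
  & \Pr\bigl(r_p(x) \le U_p^{1-\alpha}(x)\bigr) \ge 1 - \alpha, \\
\text{s.t.}\quad
  & 1 \le n_{\text{links}}(x) \le 5, \\
  & \texttt{help.game.com} \in D(x), \\
  & \operatorname{len}\bigl(\text{subject}(x)\bigr) \le 60, \\
  & \texttt{partner.com} \notin D(x)
    \quad \text{if} \quad \max_{p \in \mathcal{P}} U_p^{1-\alpha}\bigl(x^{+\text{partner}}\bigr) > \tau .
\end{aligned}
```

Minimizing the worst provider's upper bound, rather than an average point estimate, is the robustness term: a design is only as good as its riskiest provider under a pessimistic estimate.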

(b) Specify features and a modeling approach to estimate risk, including how to avoid leakage (e.g., time‑based splits, template_id handling) and how to calibrate probabilities.
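Two pieces of (b) can be sketched with the standard library alone: a time-based split that also keeps any `template_id` from appearing on both sides, and histogram-binning calibration fitted on held-out scores. The function names and the choice of histogram binning (rather than, say, isotonic or Platt scaling) are assumptions made for illustration; rows are assumed to be dicts with `send_time_utc` and `template_id` keys and comparable timestamps.

```python
def time_template_split(rows, cutoff):
    """Train on sends before the cutoff; test on later sends of unseen templates.

    Dropping later sends of already-seen templates keeps the test set closer
    to the real task: scoring a design the model has never observed.
    """
    train = [r for r in rows if r["send_time_utc"] < cutoff]
    seen_templates = {r["template_id"] for r in train}
    test = [
        r for r in rows
        if r["send_time_utc"] >= cutoff and r["template_id"] not in seen_templates
    ]
    return train, test


def fit_histogram_binning(scores, labels, n_bins=10):
    """Map raw scores in [0, 1] to the empirical bad-folder rate in each bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for score, label in zip(scores, labels):
        b = min(int(score * n_bins), n_bins - 1)
        sums[b] += label
        counts[b] += 1
    # Empty bins fall back to the bin midpoint so every score stays mappable.
    return [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_means):
    """Look up the calibrated probability for one raw score."""
    n_bins = len(bin_means)
    return bin_means[min(int(score * n_bins), n_bins - 1)]
```

The calibrator should be fitted on a slice that the risk model never trained on, ideally the most recent one, so that calibration reflects current provider behavior.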

(c) Describe how to address covariate shift between historical templates and the proposed new design (e.g., domain adaptation, monotonic constraints on num_links, or semi‑synthetic data generation).
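The monotonic constraint on num_links mentioned in (c) can be illustrated with the pool-adjacent-violators algorithm, which forces the estimated risk to be non-decreasing in the number of links. This is a minimal sketch: the function name is hypothetical, and the premise that adding links never lowers risk is a modeling assumption that should be checked against the historical data.

```python
def pav_non_decreasing(values, weights):
    """Weighted isotonic fit: closest non-decreasing sequence to `values`.

    values[i] is the empirical risk at the i-th num_links level (ascending),
    weights[i] is the number of historical sends at that level (must be > 0).
    """
    blocks = []  # each block is [pooled mean, total weight, number of levels]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the newest block violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, n_levels in blocks:
        fitted.extend([mean] * n_levels)
    return fitted
```

Constraining the shape this way limits how far the model can extrapolate when the proposed design sits in a sparsely observed region, for example five links when most historical sends had one or two.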

(d) Propose a search/optimization strategy over the discrete design space (e.g., beam search with a learned surrogate, Bayesian optimization with mixed variables, or ILP with learned risk).
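For (d), note that the structured part of the design space is small once the subject is restricted to a shortlist, so plain exhaustive enumeration is a reasonable baseline before reaching for beam search or Bayesian optimization. The sketch below uses enumeration, not any of the strategies named in the task; `enumerate_designs`, `best_design`, and the dict layout are hypothetical, and `risk_fn` stands in for whatever learned risk model is available.

```python
from itertools import combinations, product

DOMAINS = ("game.com", "store.game.com", "help.game.com", "partner.com")


def enumerate_designs(subjects):
    """Yield every design that has 1-5 links, a help link, and a short subject."""
    for num_links in range(1, 6):
        # A design cannot link more distinct domains than it has links.
        for k in range(1, min(num_links, len(DOMAINS)) + 1):
            for domains in combinations(DOMAINS, k):
                if "help.game.com" not in domains:
                    continue
                for anchor, hero, subject in product(
                    ("cta", "neutral"), (False, True), subjects
                ):
                    if len(subject) <= 60:
                        yield {
                            "num_links": num_links,
                            "link_domains": domains,
                            "anchor_style": anchor,
                            "hero_image": hero,
                            "subject": subject,
                        }


def best_design(subjects, risk_fn):
    """Return the enumerated design with the lowest value of risk_fn."""
    return min(enumerate_designs(subjects), key=risk_fn)
```

With one candidate subject this yields 112 designs, so scoring all of them is cheap; search heuristics become relevant mainly when subject tokens are generated combinatorially rather than drawn from a shortlist.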

(e) Explain how to validate the chosen design offline without any new user feedback (e.g., off‑policy evaluation with inverse propensity weighting, stratified provider‑wise risk, and conformal prediction intervals).
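The inverse propensity weighting idea in (e) can be sketched as a self-normalized estimator computed separately per provider. This assumes that the probability with which the historical sending policy chose each design (the propensity) is known or has been estimated, which the question does not state; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def snips_by_provider(records, target_design):
    """Self-normalized IPW estimate of bad-folder risk for one target design.

    records: iterable of (provider, design, propensity, bad) where propensity
    is the logging policy's probability of choosing `design` (must be > 0)
    and bad is 1 for promotions/spam, 0 for primary.
    """
    weighted_bad = defaultdict(float)
    weight_sum = defaultdict(float)
    weight_sq_sum = defaultdict(float)
    for provider, design, propensity, bad in records:
        # A deterministic target policy puts all its mass on target_design.
        if design != target_design:
            continue
        w = 1.0 / propensity
        weighted_bad[provider] += w * bad
        weight_sum[provider] += w
        weight_sq_sum[provider] += w * w
    return {
        p: {
            "risk": weighted_bad[p] / weight_sum[p],
            # Effective sample size: small values mean the estimate is fragile.
            "ess": weight_sum[p] ** 2 / weight_sq_sum[p],
        }
        for p in weight_sum
    }
```

A provider that is missing from the result had no historical sends matching the target design, which is itself a signal that the estimate for that provider must come from the model rather than from reweighted history.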

(f) If historical labels are scarce or noisy for some providers, propose a fallback (e.g., weak labeling via an open‑source classifier plus a small hand‑labeled set) and how you would quantify added uncertainty in the final decision.
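One way to quantify the extra uncertainty in (f) is to measure the weak labeler's sensitivity and specificity on the small hand-labeled set, correct the observed rate with the Rogan-Gladen estimator, and report the range of corrected rates over plausible sensitivity and specificity values. This is a sketch under the assumption that the weak labeler's error rates do not depend on the design; the function names are hypothetical.

```python
def corrected_rate(observed_rate, sensitivity, specificity):
    """Rogan-Gladen correction of a rate measured with a noisy labeler.

    sensitivity = P(weak label is bad | truly bad)
    specificity = P(weak label is primary | truly primary)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("labeler is no better than chance; cannot correct")
    estimate = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # clip to a valid probability


def corrected_rate_range(observed_rate, sens_range, spec_range):
    """Lowest and highest corrected rate over the corners of the given ranges.

    The estimator is monotone in each argument while the denominator stays
    positive, so the extremes over the box are attained at its corners.
    """
    corners = [
        corrected_rate(observed_rate, sens, spec)
        for sens in sens_range
        for spec in spec_range
    ]
    return min(corners), max(corners)
```

The upper end of this range, not the point estimate, is the natural input to a conservative decision rule for providers that rely on weak labels.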

(g) Deliver a concrete decision rule (e.g., select the lowest‑risk design whose worst‑case provider risk upper bound at 90% confidence is below X%) and justify your chosen X.
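The decision rule in (g) can be sketched with a one-sided Wilson score bound per provider. Several things here are assumptions made for illustration: the bound is computed from held-out counts of bad-folder outcomes for comparable designs, those counts are treated as binomial, and `max_risk` (the X in the task) is left as a parameter because the question asks the candidate to choose and justify it.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper(bad, n, confidence=0.90):
    """One-sided Wilson score upper bound on a binomial proportion."""
    z = NormalDist().inv_cdf(confidence)
    p_hat = bad / n
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre + margin) / (1 + z * z / n)


def choose_design(candidates, max_risk, confidence=0.90):
    """Pick the design with the smallest worst-provider upper bound below max_risk.

    candidates: {design_name: {provider: (bad_count, n_sends)}}
    Returns (design_name, worst_case_bound), or (None, None) if nothing passes.
    """
    best_name, best_bound = None, None
    for name, per_provider in candidates.items():
        worst = max(
            wilson_upper(bad, n, confidence) for bad, n in per_provider.values()
        )
        if worst < max_risk and (best_bound is None or worst < best_bound):
            best_name, best_bound = name, worst
    return best_name, best_bound
```

Ranking by the worst-case bound means a design with few supporting observations is penalized even if its point estimate is zero, which is the intended behavior when no post-send correction is possible.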


