Explain RL policy types and modern policy gradients
TikTok
Jan 11, 2026, 12:00 AM
Software Engineer
Technical Screen
Machine Learning
Machine Learning Fundamentals (RL + Attention)
Part A — Reinforcement Learning
Define on-policy vs off-policy learning.
What makes an algorithm on-policy/off-policy?
Give examples of each.
Explain TRPO (Trust Region Policy Optimization).
What problem is it trying to solve compared to vanilla policy gradient?
What is the role of a KL-divergence “trust region” constraint?
Explain PPO (Proximal Policy Optimization).
How does PPO approximate TRPO’s trust-region behavior?
What is the clipped surrogate objective trying to prevent?
Explain GAE (Generalized Advantage Estimation).
What is the definition of the advantage function?
How does GAE trade off bias vs variance, and what does the λ parameter do?
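As a reference point for the GAE and PPO questions above, here is a minimal sketch in plain Python. It assumes a single finished trajectory, a `values` list with one extra bootstrap entry at the end, and per-step log-probabilities already computed under the old and new policies. The function names and the default values of `gamma`, `lam`, and `clip_eps` are illustrative choices, not taken from the question.

```python
import math

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE for one trajectory; len(values) must be len(rewards) + 1."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors, decay gamma * lam
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any gain from pushing the ratio
        # outside [1 - clip_eps, 1 + clip_eps] in the favorable direction
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With `lam=0.0` the advantages reduce to one-step TD errors (lower variance, more bias from the value estimate). With `lam=1.0` they become the discounted return minus the value baseline (less bias, higher variance).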
Part B — Attention Mechanisms
Explain linear attention.
Why is it called “linear” (in what variable does complexity become linear)?
What approximation or structural change is made vs. standard softmax attention?
What information can be lost compared with exact softmax attention?
Explain Grouped-Query Attention (GQA).
What is grouped/shared between heads?
Why does it help inference efficiency and KV-cache memory?
What are typical quality/accuracy trade-offs?
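For the two attention questions above, the following plain-Python sketch may help anchor the discussion. It assumes one common form of linear attention, in which the softmax kernel is replaced by a positive feature map (here `elu(x) + 1`) so the key-value summary can be computed once and reused for every query. It also includes a rough KV-cache size formula for comparing multi-head attention with GQA. All function names and the model dimensions in the usage note are hypothetical, and other linear-attention variants use different feature maps.

```python
import math

def phi(x):
    # Feature map elu(x) + 1, which keeps every feature strictly positive
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(Q, K, V):
    """Non-causal kernel attention in O(n * d * d_v) time for n tokens."""
    d, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d                        # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(d_v)])
    return out

def quadratic_kernel_attention(Q, K, V):
    """Same kernel, computed pairwise in O(n^2) time, for comparison."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total
                    for b in range(len(V[0]))])
    return out

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Factor 2 covers keys and values; only KV heads are cached, so sharing
    # each KV head across a group of query heads shrinks the cache
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
```

As a usage example under assumed dimensions (32 layers, head size 128, 4096 cached tokens, 2 bytes per element), `kv_cache_bytes(32, 32, 128, 4096)` is four times `kv_cache_bytes(32, 8, 128, 4096)`, which reflects sharing each KV head among four query heads. The two attention functions agree up to floating-point error, but neither reproduces exact softmax weights, which is where sharply peaked attention patterns can be lost.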