Camera Calibration and 3D Geometry for Autonomy
Asked of: Machine Learning Engineer
What's being tested
Interviewers probe whether you can connect image formation and calibration math to practical ML pipelines: how to convert pixels to rays, use intrinsics/extrinsics in training and inference, quantify geometric error, and design data/metrics that expose calibration drift. Tesla cares because learned perception models must consume geometrically correct inputs (undistorted images, registered 3D data) and because small calibration errors cascade into large depth and pose errors in the autonomy stack. Expect clarifying questions about coordinate frames, units, and where calibration lives in the stack.
Core knowledge
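- Pinhole camera model and intrinsic matrix: $K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$, with projection $[u, v, 1]^\top = \frac{1}{Z} K [X, Y, Z]^\top$ after dividing by the depth $Z$; this first-order mapping is used everywhere in reprojection and augmentations.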
- Camera intrinsics: focal lengths ($f_x, f_y$), principal point ($c_x, c_y$), and skew $s$ (usually 0); intrinsics are in pixels and must match the image resolution and rectification pipeline.
- Distortion models: Brown–Conrady radial ($k_1, k_2, k_3$) and tangential ($p_1, p_2$) terms; build undistortion maps with OpenCV's initUndistortRectifyMap. Unmodeled distortion biases learned features.
- Extrinsics: rigid transform (rotation $R$, translation $t$) between camera and vehicle/LiDAR frames; expressed as $T = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}$ and used to transform back-projected rays into world coordinates.
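- Stereo geometry & depth: disparity $d = u_L - u_R$, depth $Z = fB/d$ for rectified stereo (baseline $B$, focal length $f$ in pixels); depth sensitivity scales with $Z^2$, since $|\partial Z / \partial d| = Z^2/(fB)$, so long-range depth is fragile.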
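- Epipolar constraints & matrices: the Fundamental matrix $F$ (uncalibrated, pixel coordinates $x$) and the Essential matrix $E$ (calibrated, normalized coordinates $\hat{x}$) satisfy $x'^\top F x = 0$ and $\hat{x}'^\top E \hat{x} = 0$; used for outlier rejection and self-supervised losses.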
- PnP and pose estimation: given 3D-2D correspondences, solve Perspective-n-Point (PnP) for camera pose; use RANSAC for robust inlier selection. Accuracy depends on the distribution of 3D points (depth spread, non-coplanarity).
- Bundle adjustment & calibration refinement: joint optimization of poses, intrinsics, and 3D points; implemented with Ceres Solver or g2o; costly but gives global consistency, so it is used offline or as a refinement stage.
- Metrics: reprojection error in pixels (mean / median; below about 0.5 px is excellent), depth RMSE (meters) and % within thresholds; for stereo also report disparity error in pixels. Monitor per-camera, per-temperature, and per-lens.
- Differentiable reprojection: integrate camera transforms into training with losses like reprojection, photometric, or geometric consistency; ensure gradients flow through camera intrinsics if you learn them.
- Rolling-shutter & temporal sync: rolling shutter warps projection for moving platforms; timestamp alignment across sensors is critical, because sync errors appear as geometric residuals and should be instrumented in datasets.
- Data practices for MLEs: produce both undistorted and raw images; save K, distortion params, and T_cam_to_vehicle per-file; store calibration metadata in training manifests for reproducibility and drift analysis.
- Synthetic data & domain gap: simulate correct intrinsics/distortion and add realistic noise (motion blur, sensor noise) to narrow the sim2real gap; consider learning per-frame calibration offsets if cameras have small time-varying biases.
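The pixel-to-ray conversion that underlies several of the items above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production routine: it assumes the pixel has already been undistorted, the intrinsics are hypothetical values for a 1280x720 image, and the camera-to-vehicle rotation assumes a camera frame with x right, y down, z forward and a vehicle frame with x forward, y left, z up. The function names are made up for this example.

```python
import numpy as np

def pixel_to_ray(uv, K):
    """Back-project an undistorted pixel (u, v) to a unit ray in the camera frame."""
    u, v = uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

def ray_to_vehicle(ray_cam, R_cam_to_vehicle):
    """Rotate a camera-frame direction into the vehicle frame.

    Directions are unaffected by translation, so only R is applied.
    """
    return R_cam_to_vehicle @ ray_cam

# Hypothetical intrinsics: fx = fy = 640 px, principal point at the image center.
K = np.array([[640.0, 0.0, 640.0],
              [0.0, 640.0, 360.0],
              [0.0, 0.0, 1.0]])

# Assumed axis conventions: camera (x right, y down, z forward)
# to vehicle (x forward, y left, z up).
R_cam_to_vehicle = np.array([[0.0, 0.0, 1.0],
                             [-1.0, 0.0, 0.0],
                             [0.0, -1.0, 0.0]])

center_ray = pixel_to_ray((640.0, 360.0), K)    # principal point: the optical axis
edge_ray = pixel_to_ray((1280.0, 360.0), K)     # 640 px right of center: 45 degrees off-axis
center_ray_vehicle = ray_to_vehicle(center_ray, R_cam_to_vehicle)
```

A ray only fixes a direction; recovering a 3D point still needs depth from stereo, LiDAR, or a learned estimate.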
Worked example — common interview prompt: "Project 3D points into camera and compute reprojection error"
Frame it: ask whether points are in the same coordinate frame as the camera, whether intrinsics and distortion are already known, and whether to report mean or RMS pixel error. Skeleton: (1) transform 3D points into the camera frame using extrinsics: $\mathbf{X}_c = R\,\mathbf{X}_w + t$, with $\mathbf{X}_c = (X_c, Y_c, Z_c)$; (2) apply pinhole projection $u = f_x X_c / Z_c + c_x$, $v = f_y Y_c / Z_c + c_y$; (3) apply the distortion model or undistort observed pixels consistently; (4) compute per-point pixel residuals and summarise (mean, median, percent above 1px). Tradeoff to flag: whether to undistort points first or project then distort to match raw observations; both are valid but must be consistent with how ground-truth keypoints were measured. Close by noting practicalities: clip points with $Z_c \le 0$, robustify with RANSAC or Huber loss, and, if time allows, propose bundle adjustment to jointly refine pose and intrinsics.
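The skeleton above might look like the following in NumPy. This is a sketch under stated assumptions: the extrinsics follow the convention x_cam = R @ x_world + t, observed pixels are already undistorted (so step 3 is skipped), skew is zero, and the function names, intrinsics, and sample points are all made up for illustration.

```python
import numpy as np

def reprojection_errors(points_world, observed_px, K, R, t):
    """Per-point pixel error for points in front of the camera.

    Assumes x_cam = R @ x_world + t and undistorted observations.
    Returns (errors, in_front), where in_front marks points with Z_c > 0.
    """
    pts_cam = points_world @ R.T + t           # (N, 3) points in the camera frame
    in_front = pts_cam[:, 2] > 1e-6            # clip points at or behind the camera
    proj = pts_cam[in_front] @ K.T             # homogeneous pixel coordinates
    uv = proj[:, :2] / proj[:, 2:3]            # divide by depth Z_c
    errors = np.linalg.norm(uv - observed_px[in_front], axis=1)
    return errors, in_front

def summarise(errors, threshold_px=1.0):
    """Mean, median, and percentage of residuals above a pixel threshold."""
    return {
        "mean_px": float(np.mean(errors)),
        "median_px": float(np.median(errors)),
        "pct_over_threshold": float(100.0 * np.mean(errors > threshold_px)),
    }

# Hypothetical intrinsics and an identity pose, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

points_world = np.array([[0.0, 0.0, 5.0],
                         [1.0, 0.0, 5.0],
                         [0.0, 1.0, 10.0],
                         [0.0, 0.0, -2.0]])    # last point is behind the camera
observed_px = np.array([[320.0, 240.0],
                        [420.0, 243.0],
                        [320.5, 290.0],
                        [0.0, 0.0]])

errors, in_front = reprojection_errors(points_world, observed_px, K, R, t)
stats = summarise(errors)
```

If the observations were raw (distorted) pixels instead, the distortion model would be applied to the normalized coordinates before multiplying by K, so that both sides of the residual live in the same image space.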
A second angle — "Estimate depth to a lane marker using calibrated stereo while accounting for low texture"
Here the constraint changes: you must reason about disparity quantization, matching quality, and uncertainty propagation. Outline: rectify images using stereoRectify and initUndistortRectifyMap, compute disparity (block matcher or learned network), convert to depth $Z = fB/d$, and propagate disparity variance into depth variance $\sigma_Z^2 \approx \left(Z^2/(fB)\right)^2 \sigma_d^2$. Practical MLE moves: filter by confidence maps, fuse LiDAR when available, and train stereo networks with geometric consistency and photometric augmentation to handle low-texture regions. Emphasize baseline selection, subpixel refinement, and metrics that penalize long-range depth errors more.
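The depth conversion and first-order uncertainty propagation from the outline can be sketched as below. The focal length, baseline, and disparity noise level are hypothetical values chosen to make the arithmetic easy to follow, and the function name is made up; the disparity itself is assumed to come from whatever matcher the pipeline uses.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, sigma_d_px=0.25):
    """Convert rectified-stereo disparity to depth with a first-order 1-sigma.

    Z = f * B / d, and sigma_Z ~= Z**2 / (f * B) * sigma_d.
    Non-positive disparities are treated as invalid and returned as NaN.
    """
    d = np.asarray(disparity_px, dtype=float)
    valid = d > 0
    depth = np.full(d.shape, np.nan)
    sigma = np.full(d.shape, np.nan)
    depth[valid] = focal_px * baseline_m / d[valid]
    sigma[valid] = depth[valid] ** 2 / (focal_px * baseline_m) * sigma_d_px
    return depth, sigma

# Hypothetical rig: f = 1000 px, B = 0.3 m, quarter-pixel disparity noise.
depth_m, sigma_m = disparity_to_depth([30.0, 6.0, 0.0], focal_px=1000.0, baseline_m=0.3)
# 30 px of disparity maps to 10 m; 6 px maps to 50 m with a much larger sigma.
```

With these numbers, a fivefold increase in range multiplies the depth uncertainty by 25, which is the quadratic growth that makes long-range lane markers hard and motivates subpixel refinement or a wider baseline.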
Common pitfalls
Pitfall: Treating intrinsics as immutable constants. In practice, intrinsics drift (thermal, focus changes); a better answer explains monitoring, per-drive re-calibration triggers, or learning small per-frame intrinsics offsets during training.
Pitfall: Applying undistortion inconsistently. A tempting but wrong approach is to undistort only the training images while inference still runs on the raw pipeline. Always specify whether your model expects rectified/undistorted images, and document the conversion in the runtime pipeline.
Pitfall: Reporting only mean reprojection error. Mean hides heavy-tailed failures; report median, percentiles, and per-scene breakdowns and demonstrate robustness methods (RANSAC, Huber) you’d add.
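To make the last pitfall concrete, here is a small sketch with made-up residuals: nine well-behaved points and one gross outlier. It compares the mean against tail-aware statistics and shows how a Huber loss (with an assumed threshold of 1 px) grows linearly rather than quadratically on the outlier. The residual values and the `huber` helper are illustrative only.

```python
import numpy as np

def huber(residuals, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(residuals, dtype=float))
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

# Made-up reprojection errors in pixels; the last entry is a gross outlier.
errors_px = np.array([0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 0.2, 0.3, 0.25, 12.0])

report = {
    "mean_px": float(np.mean(errors_px)),
    "median_px": float(np.median(errors_px)),
    "p95_px": float(np.percentile(errors_px, 95)),
    "max_px": float(np.max(errors_px)),
}

squared_loss = 0.5 * errors_px ** 2     # the outlier contributes 72.0
huber_loss = huber(errors_px)           # the outlier contributes 11.5
```

Here the mean is pulled well above 1 px by a single point while the median stays at 0.3 px, which is why a per-scene breakdown with percentiles tells a more honest story than one averaged number.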
Connections
- Sensor fusion & state estimation (visual-inertial odometry, LiDAR-camera calibration): interviewers may pivot to fusing calibrated camera rays with IMU or LiDAR.
- Self-supervised geometry (depth/pose networks) and SLAM: expect pivots to end-to-end learning of depth with geometric losses and drift correction.
Further reading
- Zhang, “A flexible new technique for camera calibration” (2000): classic intrinsics estimation method.
- Hartley & Zisserman, “Multiple View Geometry”: deep reference for epipolar geometry, essential/fundamental matrices, and bundle adjustment.
- OpenCV calibration docs: practical functions (calibrateCamera, stereoRectify, initUndistortRectifyMap) you will use in pipelines.
Related concepts
- Autonomous Driving Perception Models (Machine Learning)
- Autonomy Data Engine and Active Learning (ML System Design)
- Model Evaluation and Calibration
- Safety, Alignment, Guardrails, and Responsible LLM Deployment
- Distributed Training and GPU Efficiency for Autonomy Models
- Model Evaluation, Calibration, And Thresholding