PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates skills in exact and approximate alphanumeric string matching, streaming pattern lookup, noise-tolerant comparison, and performance-oriented data structures and encodings.

  • Medium
  • Optiver
  • Coding & Algorithms
  • Data Scientist

Match alphanumeric patterns in a stream

Company: Optiver

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

Given a reference 7–8 character alphanumeric code and four candidate codes, select the exact match as quickly as possible. Design a system that: - Efficiently determines whether any candidate exactly matches the reference (consider case sensitivity and lookalike characters such as O/0 and I/ 1). - Supports a streaming setting where batches arrive at increasing rates and latency must remain low. - Handles input noise (e.g., OCR errors) by optionally allowing a configurable Hamming distance threshold without compromising throughput. Discuss your data structures (e.g., hashing, tries, SIMD-friendly encodings), algorithmic choices, complexity, and how you would benchmark and scale this for high QPS.

Quick Answer: This question evaluates skills in exact and approximate alphanumeric string matching, streaming pattern lookup, noise-tolerant comparison, and performance-oriented data structures and encodings.

You are building the matching core of a high-throughput stream processor at a trading firm. Each incoming batch contains a short alphanumeric reference code (typically 7-8 characters) and a list of candidate codes (e.g. captured via OCR). You must decide which candidates match the reference. Implement a function `matchCandidates(reference, candidates, normalizeLookalikes, maxHammingDistance)` that returns the list of indices (in ascending order) of the candidates that match the reference, where a candidate matches if: 1. It has the **same length** as the reference (different lengths never match). 2. Its **Hamming distance** to the reference (number of positions at which the characters differ) is **at most** `maxHammingDistance`. When `normalizeLookalikes` is true, before comparing you must canonicalize both the reference and each candidate as follows, in order: - Convert every character to **uppercase** (so matching becomes case-insensitive). - Map the digit `'0'` to the letter `'O'`. - Map the digit `'1'` to the letter `'I'`. When `normalizeLookalikes` is false, compare the strings exactly as given (case-sensitive, no lookalike folding). Return the indices of all matching candidates in increasing order. Return an empty list if none match.

Constraints

  • 1 <= len(reference) <= 64
  • 0 <= len(candidates) <= 10^4
  • Each candidate string has length 0 to 64
  • Codes contain ASCII letters and digits only
  • 0 <= maxHammingDistance <= len(reference)
  • Indices in the result must be in strictly ascending order

Examples

Input: ('A1B2C3D', ['A1B2C3D', 'X9Y8Z7Q', 'A1B2C3X', 'A1B2C3D'], False, 0)

Expected Output: [0, 3]

Explanation: Exact match, no normalization, threshold 0. Indices 0 and 3 are identical to the reference; index 2 differs in the last char; index 1 is entirely different.

Input: ('A1B2C3D', ['AIB2C3D', 'A1B2C30', 'a1b2c3d', 'A1B2C3D'], True, 0)

Expected Output: [0, 2, 3]

Explanation: With normalization, reference canonicalizes to 'AIB2C3D'. Index 0 ('AIB2C3D') and index 3 already match; index 2 ('a1b2c3d' -> 'AIB2C3D') matches via case+1->I folding. Index 1 ('A1B2C30' -> 'AIB2C3O') differs in the last position, so it is excluded.

Input: ('ABC1234', ['ABC1235', 'ABC1244', 'ABC9234', 'XYZ0000'], False, 1)

Expected Output: [0, 1, 2]

Explanation: Threshold 1: indices 0, 1, 2 each differ in exactly one position from 'ABC1234'. Index 3 differs in many positions and is excluded.

Input: ('CODE123', ['CODE124', 'CODX125', 'CODE123'], False, 2)

Expected Output: [0, 1, 2]

Explanation: Threshold 2: index 0 differs in 1 position, index 1 in 2 positions, index 2 in 0 positions; all are within the allowed distance.

Input: ('SHORT12', ['SHORT1', 'SHORT123', 'SHORT12'], False, 0)

Expected Output: [2]

Explanation: Length pre-filter: 'SHORT1' (6 chars) and 'SHORT123' (8 chars) differ in length from the 7-char reference and never match; only the exact 'SHORT12' at index 2 qualifies.

Input: ('Z9Z9Z9Z', [], False, 0)

Expected Output: []

Explanation: Empty candidate list yields an empty result.

Input: ('OOIIILOO', ['00IIIL00', 'OOI1ILOO', '12345678'], True, 0)

Expected Output: [0, 1]

Explanation: With normalization, '00IIIL00' folds 0->O to 'OOIIILOO' (match) and 'OOI1ILOO' folds 1->I to 'OOIIILOO' (match). '12345678' folds to 'I2345678', which does not match.

Hints

  1. Length is a cheap pre-filter: any candidate whose length differs from the reference can never match, regardless of the Hamming threshold.
  2. Do the lookalike/case canonicalization once per string up front, then compare canonical forms character by character.
  3. When counting differing positions, short-circuit as soon as the running count exceeds maxHammingDistance to stay fast on long batches.
  4. Order matters in the canonicalization: uppercase first (so '1'->I and '0'->O folding applies uniformly to lowercase input too).
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find missing numbers in sequences - Optiver (hard)
  • Design a circular queue data structure - Optiver (medium)
  • Optimize flight and cargo bookings for profit - Optiver (hard)
  • Design a balloon stability tracker - Optiver (Medium)
  • Compare two programs for equivalence - Optiver (Medium)