PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates proficiency in designing scalable, time-versioned key-value data structures and in graph-based shortest-path planning for costed type conversions, assessing algorithmic thinking, complexity analysis, and robustness under large-scale constraints in the Coding & Algorithms domain.

  • Medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Implement KV store and plan type conversions

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Part 1 — Versioned key-value store: Implement a data structure with set(key, value, t) and get_at(key, t) that returns the value for key whose timestamp is the greatest <= t (or None if none). Timestamps are integers and may arrive out of order. Optimize both time and space for up to 1e5 keys and 1e6 operations. Explain your data structures, complexity, and edge cases (duplicate timestamps per key, missing keys, very large t, memory). Provide working code for core methods. Part 2 — Type conversion planning: Given primitive types and directed conversion rules (from, to, cost), answer Q queries (source_type, target_type) by returning the minimum total cost to transform source to target, or -1 if impossible. Types <= 1e4, rules <= 1e5, queries <= 1e5. Explain algorithm choices (e.g., Dijkstra on demand vs preprocessing), complexity, path reconstruction, and how to handle dynamic updates to rules.

Quick Answer: This question evaluates proficiency in designing scalable, time-versioned key-value data structures and in graph-based shortest-path planning for costed type conversions, assessing algorithmic thinking, complexity analysis, and robustness under large-scale constraints in the Coding & Algorithms domain.

Part 1: Versioned Key-Value Store

Design a versioned key-value store that processes operations in order. Each 'set' operation stores a string value for a key at an integer timestamp. Each 'get' operation asks for the value of that key whose timestamp is the greatest value less than or equal to the query timestamp, considering only versions inserted by earlier operations. Timestamps may arrive out of order. If the same key is set multiple times at the same timestamp, the newest value overwrites the old one. Return the answers for all 'get' operations in order. To keep the return type simple, return the string 'None' when no version exists.

Constraints

  • 0 <= len(ops) <= 10^6
  • All four input lists have the same length
  • There can be up to 10^5 distinct keys
  • Timestamps are integers in the range [0, 10^9]
  • Stored values are non-empty strings and will never be the literal string 'None'

Examples

Input: (['set','set','get','set','get','get'], ['a','a','a','a','a','a'], ['x','y','','z','',''], [5,10,7,6,6,100])

Expected Output: ['x','z','y']

Explanation: The query at time 7 sees version (5,'x'). After setting timestamp 6 to 'z', the query at time 6 returns 'z'. The final query returns the latest version at or before 100, which is 'y' at timestamp 10.

Input: (['get','set','get','get'], ['m','m','m','m'], ['','v','',''], [4,5,4,5])

Expected Output: ['None','None','v']

Explanation: Before any set, the key is missing. After setting at timestamp 5, a query for time 4 still has no valid version, while time 5 returns 'v'.

Input: (['set','set','get','set','get','get'], ['a','a','a','b','b','a'], ['old','new','','x','',''], [3,3,3,2,2,10])

Expected Output: ['new','x','new']

Explanation: The second set for key 'a' uses the same timestamp and overwrites the old value. Key 'b' is independent.

Input: ([], [], [], [])

Expected Output: []

Explanation: Edge case: no operations means no output.

Solution

def solution(ops, keys, values, timestamps):
    class Node:
        __slots__ = ('ts', 'val', 'prio', 'left', 'right')
        def __init__(self, ts, val, prio):
            self.ts = ts
            self.val = val
            self.prio = prio
            self.left = None
            self.right = None

    seed = 2463534242

    def next_rand():
        nonlocal seed
        seed ^= (seed << 13) & 0xFFFFFFFF
        seed ^= (seed >> 17)
        seed ^= (seed << 5) & 0xFFFFFFFF
        return seed & 0xFFFFFFFF

    def rotate_right(root):
        new_root = root.left
        root.left = new_root.right
        new_root.right = root
        return new_root

    def rotate_left(root):
        new_root = root.right
        root.right = new_root.left
        new_root.left = root
        return new_root

    def insert(root, ts, val):
        if root is None:
            return Node(ts, val, next_rand())
        if ts == root.ts:
            root.val = val
            return root
        if ts < root.ts:
            root.left = insert(root.left, ts, val)
            if root.left.prio < root.prio:
                root = rotate_right(root)
        else:
            root.right = insert(root.right, ts, val)
            if root.right.prio < root.prio:
                root = rotate_left(root)
        return root

    def get_le(root, ts):
        ans = 'None'
        while root is not None:
            if root.ts <= ts:
                ans = root.val
                root = root.right
            else:
                root = root.left
        return ans

    roots = {}
    out = []
    for i, op in enumerate(ops):
        key = keys[i]
        ts = timestamps[i]
        if op == 'set':
            roots[key] = insert(roots.get(key), ts, values[i])
        else:
            out.append(get_le(roots.get(key), ts))
    return out

Time complexity: Expected O(log m) per operation, where m is the number of distinct timestamps stored for that key; O(n log m_max) expected total for n operations.. Space complexity: O(v), where v is the number of distinct (key, timestamp) versions stored..

Hints

  1. A hash map alone is not enough because each key needs fast predecessor queries by timestamp.
  2. Think of keeping one ordered structure per key, where insert and 'greatest timestamp <= t' can both be done in logarithmic time.

Part 2: Type Conversion Planning

You are given primitive types numbered from 0 to n - 1 and directed conversion rules (from_type, to_type, cost). For each query (source_type, target_type), return the minimum total cost required to convert the source to the target, or -1 if it is impossible. All costs are non-negative. The queries are given as a batch, so you may reuse work across queries.

Constraints

  • 1 <= n <= 10^4
  • 0 <= len(rules) <= 10^5
  • 0 <= len(queries) <= 10^5
  • 0 <= u, v, source, target < n
  • 0 <= cost <= 10^9

Examples

Input: (5, [(0,1,2),(1,2,3),(0,2,10),(2,3,1)], [(0,2),(0,3),(3,0),(4,4)])

Expected Output: [5,6,-1,0]

Explanation: 0->2 is cheaper through 1. 0->3 is 0->1->2->3. There is no path from 3 to 0. A type always converts to itself at cost 0.

Input: (3, [(0,1,5),(0,1,2),(1,2,4),(0,2,10)], [(0,1),(0,2),(2,2)])

Expected Output: [2,6,0]

Explanation: There are duplicate rules from 0 to 1; the cheapest one should be used. Then 0->1->2 costs 6, which beats the direct edge of cost 10.

Input: (2, [], [(0,1),(1,1)])

Expected Output: [-1,0]

Explanation: Edge case: with no rules, only source == target queries have cost 0.

Input: (6, [(0,5,7),(1,5,3),(2,1,1),(2,5,20),(3,2,2)], [(0,5),(2,5),(3,5),(4,5)])

Expected Output: [7,4,6,-1]

Explanation: For target 5, 2 reaches 5 more cheaply through 1, and 3 reaches 5 through 2 and 1.

Solution

def solution(n, rules, queries):
    from collections import defaultdict
    from heapq import heappush, heappop

    forward = [dict() for _ in range(n)]
    reverse = [dict() for _ in range(n)]

    for u, v, w in rules:
        prev = forward[u].get(v)
        if prev is None or w < prev:
            forward[u][v] = w
            reverse[v][u] = w

    graph = [list(nei.items()) for nei in forward]
    rev_graph = [list(nei.items()) for nei in reverse]

    by_source = defaultdict(list)
    by_target = defaultdict(list)
    for idx, (s, t) in enumerate(queries):
        by_source[s].append((t, idx))
        by_target[t].append((s, idx))

    def dijkstra(start, adj, needed_nodes):
        dist = {start: 0}
        heap = [(0, start)]
        remaining = set(needed_nodes)
        found = {}

        if start in remaining:
            found[start] = 0
            remaining.remove(start)
            if not remaining:
                return found

        INF = 10**30
        while heap and remaining:
            d, u = heappop(heap)
            if d != dist.get(u):
                continue
            if u in remaining:
                found[u] = d
                remaining.remove(u)
                if not remaining:
                    break
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, INF):
                    dist[v] = nd
                    heappush(heap, (nd, v))
        return found

    answers = [-1] * len(queries)

    if len(by_source) <= len(by_target):
        for s, items in by_source.items():
            needed = {t for t, _ in items}
            found = dijkstra(s, graph, needed)
            for t, idx in items:
                answers[idx] = found.get(t, -1)
    else:
        for t, items in by_target.items():
            needed = {s for s, _ in items}
            found = dijkstra(t, rev_graph, needed)
            for s, idx in items:
                answers[idx] = found.get(s, -1)

    return answers

Time complexity: Worst-case O(min(Us, Ut) * (E log V)), where Us is the number of distinct sources in queries and Ut is the number of distinct targets; early stopping often makes it faster in practice.. Space complexity: O(E + V + Q) for the graphs, query grouping, and answer array..

Hints

  1. If many queries share the same source, one shortest-path run can answer all of them; the same idea works with shared targets on the reversed graph.
  2. Because all costs are non-negative, Dijkstra's algorithm is appropriate, and you can stop early once all nodes needed for the current group of queries are settled.
Last updated: May 22, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement IP Address Arithmetic - OpenAI (hard)
  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)
  • Build a Compose Rating Card - OpenAI (medium)
  • Generate Data Labeling Schedules - OpenAI (medium)