PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Coding & Algorithms/Intuit

Sum palindrome-change costs over all substrings

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of string algorithms, combinatorial counting, and algorithmic optimization for aggregating pairwise mismatch costs across all substrings.

  • medium
  • Intuit
  • Coding & Algorithms
  • Software Engineer

Sum palindrome-change costs over all substrings

Company: Intuit

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Take-home Project

You are given a DNA string `dna` consisting only of characters `A`, `C`, `G`, `T`. For any substring `dna[l..r]`, define its **palindrome modification cost** as the minimum number of single-character substitutions required to make that substring a palindrome. (Equivalently, it is the number of mismatched character pairs when comparing from both ends: count of indices `i` where `dna[l+i] != dna[r-i]` for `i < (r-l+1)/2`.) Compute the **sum of palindrome modification costs over all substrings** of `dna`, and output the result as a 64-bit integer. ### Input - A single string `dna`. ### Output - A single integer: the sum of costs across all substrings. ### Notes - You may assume the result fits in a signed 64-bit integer (`long` / `int64`).

Quick Answer: This question evaluates understanding of string algorithms, combinatorial counting, and algorithmic optimization for aggregating pairwise mismatch costs across all substrings.

Solution

### Approach The naive definition iterates over every substring and, inside each, walks both ends comparing mirror characters. There are $O(n^2)$ substrings and each costs up to $O(n)$ to score, so the direct simulation is $O(n^3)$ — far too slow for any non-trivial string. The key is to **stop thinking in terms of substrings and start thinking in terms of character pairs.** A single mismatched pair `(dna[i], dna[j])` contributes `1` to the cost of *every* substring in which positions `i` and `j` happen to be mirror images of each other. So: $$\text{answer} = \sum_{\substack{0 \le i < j \le n-1 \\ dna[i] \ne dna[j]}} (\text{number of substrings where } i \text{ and } j \text{ are a mirror pair})$$ #### When are `i` and `j` a mirror pair? In substring `dna[l..r]`, the position `l + t` mirrors `r - t`. So `i` and `j` (with `i < j`) are a mirror pair exactly when they are **equidistant from the two ends**: $$i - l = r - j \iff l + r = i + j$$ Fix the pair `(i, j)`. Every valid substring is obtained by expanding equally outward: `l = i - t`, `r = j + t` for some `t \ge 0`, subject to staying in bounds: - `l \ge 0` requires `t \le i` - `r \le n-1` requires `t \le n-1-j` So `t` ranges over `0 .. min(i, n-1-j)`, giving exactly $$\text{weight}(i, j) = \min(i,\; n-1-j) + 1$$ substrings in which they mirror. Intuitively this is the smaller of "how much room there is to the left of `i`" and "how much room there is to the right of `j`," plus one for the tight substring `dna[i..j]` itself. That collapses the whole problem to a double loop over pairs: $$\boxed{\;\text{answer} = \sum_{0 \le i < j \le n-1} [\,dna[i] \ne dna[j]\,]\cdot\big(\min(i,\,n-1-j)+1\big)\;}$$ ### Reference solution — $O(n^2)$ This is the version to write on the whiteboard: it is short, obviously correct, and fast enough for typical constraints ($n$ up to a few thousand). ```python def sum_palindrome_costs(dna: str) -> int: n = len(dna) total = 0 for i in range(n): for j in range(i + 1, n): if dna[i] != dna[j]: total += min(i, n - 1 - j) + 1 return total ``` ```java public static long sumPalindromeCosts(String dna) { int n = dna.length(); long total = 0; // 64-bit accumulator — required for (int i = 0; i < n; i++) { for (int j = i + 1; j < n; j++) { if (dna.charAt(i) != dna.charAt(j)) { total += Math.min(i, n - 1 - j) + 1; } } } return total; } ``` - **Time:** $O(n^2)$ — one pass over all $\binom{n}{2}$ pairs, $O(1)$ work each. - **Space:** $O(1)$. ### Optimizing to $O(n \cdot \sigma)$ time, where $\sigma = 4$ The double loop is wasteful because the alphabet is tiny (`A`, `C`, `G`, `T`). We can compute the same sum in $O(n \cdot \sigma)$ time by handling the weight `min(i, n-1-j) + 1` as a piecewise function and using per-character prefix sums. Since $\sigma = 4$ is a constant this is linear in $n$, but the $\sigma$ factor is real: it appears in both the inner copy loop and the prefix-table footprint below. Fix the right index `j` and let `m = n - 1 - j`. For each left index `i < j`: $$\min(i, m) + 1 = \begin{cases} i + 1 & \text{if } i \le m \quad(\text{pair lies on/left of the center}) \\ m + 1 & \text{if } i > m \quad(\text{pair lies right of the center}) \end{cases}$$ For a fixed `j` we want this summed over all `i < j` **whose character differs** from `dna[j]`. Using the identity "differs = all − equal," maintain, as a prefix over `i`: - `cnt[c]` — how many positions so far hold character `c`; - `wsum[c]` — the running sum of `(i + 1)` over positions holding character `c`. Split `i \in [0, j-1]` at the threshold `m`: - **Left part** `i \le m`: each contributes weight `i + 1`. Subtract the equal-character `wsum`. - **Right part** `i > m`: each contributes the constant `m + 1`. Subtract the equal-character `cnt`, multiply by `m + 1`. To answer "prefix up to index `m` per character," store the prefix arrays and index into them: ```python def sum_palindrome_costs_linear(dna: str) -> int: n = len(dna) if n < 2: return 0 idx = {'A': 0, 'C': 1, 'G': 2, 'T': 3} # cnt[p][c] = # of positions in [0, p-1] holding char c # wsum[p][c] = sum of (pos + 1) over positions in [0, p-1] holding char c cnt = [[0] * 4 for _ in range(n + 1)] wsum = [[0] * 4 for _ in range(n + 1)] for p in range(n): for c in range(4): cnt[p + 1][c] = cnt[p][c] wsum[p + 1][c] = wsum[p][c] c = idx[dna[p]] cnt[p + 1][c] += 1 wsum[p + 1][c] += (p + 1) total = 0 for j in range(n): m = n - 1 - j cj = idx[dna[j]] thr = min(m, j - 1) # last i that uses weight (i+1) # Left part: i in [0, thr], weight (i+1), only chars != dna[j] if thr >= 0: p = thr + 1 all_w = wsum[p][0] + wsum[p][1] + wsum[p][2] + wsum[p][3] total += all_w - wsum[p][cj] # Right part: i in [thr+1, j-1], constant weight (m+1), chars != dna[j] a, b = thr + 1, j - 1 if a <= b: all_c = sum(cnt[b + 1][c] - cnt[a][c] for c in range(4)) eq_c = cnt[b + 1][cj] - cnt[a][cj] total += (m + 1) * (all_c - eq_c) return total ``` - **Time:** $O(n \cdot \sigma)$; with $\sigma = 4$ this is $O(n)$. - **Space:** $O(n \cdot \sigma)$ for the two prefix tables — also $O(n)$ for fixed $\sigma$. The tables are the price of the speedup; they can be trimmed (e.g. roll the prefix forward and answer each `j` from running totals), but the version above keeps the index-into-prefix logic explicit. Both implementations have been checked against the brute-force $O(n^3)$ simulation on thousands of random strings and agree exactly. ### Worked examples | `dna` | answer | why | |-------|--------|-----| | `"A"` | `0` | single char, no pairs | | `"AC"` | `1` | only substring with a pair is `"AC"`: one mismatch | | `"ACA"` | `2` | `"AC"`→1, `"CA"`→1, `"ACA"`→0 (already a palindrome) | | `"AAAA"` | `0` | every character equal, no mismatched pair ever | | `"ACGT"` | `7` | all length-≥2 substrings contribute mismatches | | `"ACGTACGT"` | `44` | larger case for sanity-checking an implementation | Quick hand-check of `"ACGT"` via the pair formula ($n = 4$, all six pairs mismatch): | `(i, j)` | `min(i, 3-j) + 1` | |----------|-------------------| | (0,1) | min(0,2)+1 = 1 | | (0,2) | min(0,1)+1 = 1 | | (0,3) | min(0,0)+1 = 1 | | (1,2) | min(1,1)+1 = 2 | | (1,3) | min(1,0)+1 = 1 | | (2,3) | min(2,0)+1 = 1 | Sum $= 1+1+1+2+1+1 = 7$. ✓ ### Correctness argument - **Cost decomposes over pairs.** For a fixed substring, its palindrome cost is, by the problem's own restatement, the count of mismatched mirror pairs. Summing those indicators over all substrings and swapping the order of summation gives exactly the pair-contribution form above — no double counting, because each (substring, mirror-pair) incidence is counted once. - **Weight is exact.** For a given pair `(i, j)`, the substrings in which they mirror are precisely the symmetric expansions `l = i - t`, `r = j + t`; the count of in-bounds `t` is `min(i, n-1-j) + 1`. Distinct substrings ↔ distinct `t`, so there is no overcounting. - **Only mismatches count**, enforced by the `dna[i] != dna[j]` guard. ### Edge cases & pitfalls - **Empty string or length 1:** there are no pairs, so the answer is `0`. Both loops naturally produce `0`. - **All-equal string** (e.g. `"AAAA"`): every guard fails, answer `0`. - **Overflow — the headline trap.** The total counts (substring, mirror-pair) incidences, so it grows on the order of $n^3$ (more precisely it is bounded by roughly $n^3/6$). For strings a few thousand characters long this exceeds the signed 32-bit range — a worst-case mismatch pattern such as repeating `ACGT` crosses $2^{31}-1$ around $n \approx 3.6\text{k}$. **Use a 64-bit accumulator** (`long` in Java/C++, Python ints are arbitrary precision). The problem explicitly says the result fits in signed 64-bit, so `long` is sufficient but `int` is not. - **Off-by-one in the weight:** the `+ 1` accounts for the tight substring `dna[i..j]` (`t = 0`); forgetting it undercounts every pair. - **Mirror condition, not adjacency:** a common mistake is to only compare adjacent or fixed-distance characters. The pair must satisfy `l + r = i + j`, which is what the `min(i, n-1-j)` weight encodes. - **Alphabet generality:** nothing in the $O(n^2)$ solution depends on the alphabet being `ACGT`; it works for any characters. The linear optimization only needs the alphabet to be small/known.

Related Interview Questions

  • Add One to Digit Array - Intuit (easy)
  • Find Business Degrees of Separation - Intuit (hard)
  • Validate bracket sequence - Intuit (easy)
  • Produce valid student lineup from parent array - Intuit (medium)
  • Find largest filename from ls -l output - Intuit (medium)
Intuit logo
Intuit
Oct 13, 2025, 12:00 AM
Software Engineer
Take-home Project
Coding & Algorithms
16
0

You are given a DNA string dna consisting only of characters A, C, G, T.

For any substring dna[l..r], define its palindrome modification cost as the minimum number of single-character substitutions required to make that substring a palindrome. (Equivalently, it is the number of mismatched character pairs when comparing from both ends: count of indices i where dna[l+i] != dna[r-i] for i < (r-l+1)/2.)

Compute the sum of palindrome modification costs over all substrings of dna, and output the result as a 64-bit integer.

Input

  • A single string dna .

Output

  • A single integer: the sum of costs across all substrings.

Notes

  • You may assume the result fits in a signed 64-bit integer ( long / int64 ).

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Intuit•More Software Engineer•Intuit Software Engineer•Intuit Coding & Algorithms•Software Engineer Coding & Algorithms
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.