Design Data Structure for Sparse Matrices Operations

Q: Design Data Structure for Sparse Matrices Operations

This question evaluates a candidate's ability to design efficient data structures and implement algorithms for sparse matrix storage and operations, emphasizing space-efficient representations, matrix arithmetic, and computational complexity reasoning.

Q: What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Pinterest.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Pinterest during technical interviews.

Question

##### Scenario

An analytics engine stores extremely sparse numeric matrices.

##### Question

Design a data structure to store two sparse matrices and implement print(), add(A, B), and multiply(A, B). Discuss the complexity of each operation.

##### Hints

Use a dictionary-of-keys (DOK) or compressed sparse row (CSR) representation; pre-index rows and columns to speed up multiplication.
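To make the CSR option in the hint concrete, here is a minimal sketch of building CSR arrays from [i, j, value] triplets. The helper names `to_csr` and `csr_row` are hypothetical and chosen for illustration; they are not part of the accepted answer, which uses nested dictionaries instead.

```python
def to_csr(n_rows, entries):
    """Build CSR arrays (indptr, indices, data) from [i, j, value] triplets."""
    # Sum duplicate coordinates and drop zeros so the arrays are canonical.
    merged = {}
    for i, j, v in entries:
        merged[(i, j)] = merged.get((i, j), 0) + v
    items = sorted((key, v) for key, v in merged.items() if v != 0)

    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    for (i, j), v in items:
        indptr[i + 1] += 1      # count the nonzeros in each row
        indices.append(j)       # column index of this nonzero
        data.append(v)          # value of this nonzero
    for i in range(n_rows):     # prefix sum turns counts into row offsets
        indptr[i + 1] += indptr[i]
    return indptr, indices, data


def csr_row(csr, i):
    """Return the (column, value) pairs stored for row i."""
    indptr, indices, data = csr
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


# Hypothetical 3-row example; the two (2, 0) entries cancel to zero.
csr = to_csr(3, [[0, 2, 5], [2, 0, 1], [0, 0, 3], [2, 0, -1], [1, 1, 4]])
print(csr)              # ([0, 2, 3, 3], [0, 2, 1], [3, 5, 4])
print(csr_row(csr, 0))  # [(0, 3), (2, 5)]
```

CSR is compact and gives fast row slices, but inserting a new nonzero means shifting the arrays, so it suits matrices that are built once and then read many times. A dictionary-based layout is easier to update incrementally.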

PracHub · Accepted Answer

def sparse_matrix_ops(dimA, dimB, A_entries, B_entries, op):
    def as_pair(x):
        if isinstance(x, (list, tuple)) and len(x) == 2:
            return int(x[0]), int(x[1])
        raise ValueError("dim must be a length-2 list/tuple")

    ra, ca = as_pair(dimA)
    rb, cb = as_pair(dimB)

    def normalize(entries, r, c):
        rows = {}
        if entries is None:
            return rows
        for e in entries:
            if not (isinstance(e, (list, tuple)) and len(e) == 3):
                raise ValueError("each entry must be [i,j,val]")
            i, j, v = e
            i = int(i); j = int(j); v = int(v)
            if not (0 <= i < r and 0 <= j < c):
                raise ValueError("index out of bounds")
            if v == 0:
                continue
            row = rows.get(i)
            if row is None:
                row = {}
                rows[i] = row
            row[j] = row.get(j, 0) + v
            if row[j] == 0:
                del row[j]
        return rows

    Arows = normalize(A_entries, ra, ca)
    Brows = normalize(B_entries, rb, cb)

    def to_triplets(rows):
        trip = []
        for i in sorted(rows):
            row = rows[i]
            for j in sorted(row):
                v = row[j]
                if v != 0:
                    trip.append([i, j, v])
        return trip

    if op == "printA":
        trip = to_triplets(Arows)
        return "\n".join(f"{i} {j} {v}" for i, j, v in trip)
    if op == "printB":
        trip = to_triplets(Brows)
        return "\n".join(f"{i} {j} {v}" for i, j, v in trip)

    if op == "add":
        if ra != rb or ca != cb:
            raise ValueError("dimension mismatch for add")
        res = {}
        def add_from(src):
            for i, row in src.items():
                rrow = res.get(i)
                if rrow is None:
                    rrow = {}
                    res[i] = rrow
                for j, v in row.items():
                    rrow[j] = rrow.get(j, 0) + v
                    if rrow.get(j, 0) == 0:
                        rrow.pop(j, None)
        add_from(Arows)
        add_from(Brows)
        return to_triplets(res)

    if op == "multiply":
        if ca != rb:
            raise ValueError("dimension mismatch for multiply")
        res = {}
        for i, arow in Arows.items():
            rrow = res.get(i)
            if rrow is None:
                rrow = {}
                res[i] = rrow
            for k, av in arow.items():
                brow = Brows.get(k)
                if not brow:
                    continue
                for j, bv in brow.items():
                    rrow[j] = rrow.get(j, 0) + av * bv
                    if rrow.get(j, 0) == 0:
                        rrow.pop(j, None)
        return to_triplets(res)

    raise ValueError("unsupported op")

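We store each matrix as a nested dictionary mapping row -> {col: value}, a row-indexed variant of dictionary-of-keys. During normalization we sum duplicate entries and drop zeros, so every operation works on canonical sparse rows.

Addition merges the two row dictionaries by key, preserving sparsity and removing entries whose sum is zero. Multiplication iterates over the nonzeros of each row i in A; for each nonzero A[i,k] we look up row k in B and accumulate its contributions into row i of the result (i.e., C[i,j] += A[i,k] * B[k,j]). The output is a sorted list of triplets for add/multiply, or a newline-joined string for printA/printB.

Complexity, with nnz(X) denoting the number of stored nonzeros in X and assuming O(1) expected-time dictionary operations: normalization is O(nnz); printing is O(nnz log nnz) because rows and columns are sorted; addition is O(nnz(A) + nnz(B)) before the output sort; multiplication costs time proportional to the sum, over each nonzero A[i,k], of the number of nonzeros in row k of B, and never touches zero entries. Space is O(nnz) for each stored matrix.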
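The same row -> {col: value} idea can also be packaged as a small class, which may read more naturally as "a data structure" in an interview. The sketch below is an illustration under assumptions, not the accepted answer: the class name `SparseMatrix` and its methods are hypothetical, and input validation is reduced to shape checks.

```python
class SparseMatrix:
    """Row-indexed sparse matrix: rows[i][j] = value; zeros are never stored."""

    def __init__(self, n_rows, n_cols, entries=()):
        self.shape = (n_rows, n_cols)
        self.rows = {}
        for i, j, v in entries:
            self._accumulate(i, j, v)

    def _accumulate(self, i, j, v):
        """Add v to cell (i, j), deleting the cell (and row) if it becomes zero."""
        row = self.rows.setdefault(i, {})
        total = row.get(j, 0) + v
        if total != 0:
            row[j] = total
        else:
            row.pop(j, None)
            if not row:
                del self.rows[i]

    def triplets(self):
        """Return nonzeros as [i, j, value], sorted by row and then column."""
        return [[i, j, self.rows[i][j]]
                for i in sorted(self.rows) for j in sorted(self.rows[i])]

    def add(self, other):
        if self.shape != other.shape:
            raise ValueError("dimension mismatch for add")
        out = SparseMatrix(*self.shape)
        for src in (self, other):          # touches each stored nonzero once
            for i, row in src.rows.items():
                for j, v in row.items():
                    out._accumulate(i, j, v)
        return out

    def multiply(self, other):
        if self.shape[1] != other.shape[0]:
            raise ValueError("dimension mismatch for multiply")
        out = SparseMatrix(self.shape[0], other.shape[1])
        for i, arow in self.rows.items():
            for k, av in arow.items():
                # Only row k of `other` can contribute to A[i, k] * B[k, j].
                for j, bv in other.rows.get(k, {}).items():
                    out._accumulate(i, j, av * bv)
        return out


# Hypothetical example matrices: A is 2x3, B is 3x2, D is 2x3.
A = SparseMatrix(2, 3, [[0, 0, 1], [0, 2, 2], [1, 1, 3]])
B = SparseMatrix(3, 2, [[0, 1, 4], [2, 0, 5], [2, 1, -2]])
D = SparseMatrix(2, 3, [[0, 0, -1], [1, 2, 7]])

print(A.multiply(B).triplets())  # [[0, 0, 10]]
print(A.add(D).triplets())       # [[0, 2, 2], [1, 1, 3], [1, 2, 7]]
```

In the product, C[0,1] receives 1*4 and 2*(-2), which cancel, so that cell is removed rather than stored as an explicit zero. Row 1 of A only has a nonzero in column 1, and row 1 of B is empty, so row 1 of the result is skipped entirely.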
