Invented schema and sample data below. Assume 'today' is 2025-09-01 and 'last 7 days' means 2025-08-26 through 2025-09-01 inclusive. Only consider lists whose created_at falls in that window.
Schema:
Sample tables: lists +---------+----------+------------+ | list_id | owner_id | created_at | +---------+----------+------------+ | 1 | 10 | 2025-08-30 | | 2 | 11 | 2025-08-31 | | 3 | 12 | 2025-09-01 | +---------+----------+------------+
list_items +---------+------+ | list_id | item | +---------+------+ | 1 | A | | 1 | B | | 1 | C | | 2 | A | | 2 | C | | 2 | D | | 3 | B | | 3 | C | | 3 | E | +---------+------+
Task: Write a single SQL query that returns exactly one row: (list_id_small, list_id_large, overlap_count, jaccard) for the unordered pair of lists with the maximum item overlap among eligible lists. Define overlap_count = |items(Li) ∩ items(Lj)| and jaccard = overlap_count / |items(Li) ∪ items(Lj)|. Break ties by higher jaccard, then by (list_id_small, list_id_large) ascending. For the sample data above, the expected row is (1, 2, 2, 0.5). Requirements: 1) Do not count duplicate items within a list more than once. 2) Use standard SQL constructs (CTEs/window functions allowed). 3) Explain indexes you would add to make this fast at scale. 4) Provide a variant that returns the top-5 pairs.