Validate CSV rows under multiple verification rules
Company: TikTok
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in CSV parsing, string normalization and multi-rule data validation, covering whitespace handling, length checks, case-insensitive substring filtering and set-based word overlap comparisons.
Constraints
- `0 <=` number of data rows `<= 10^4`
- Total length of `data` is at most `10^6` characters
- The CSV is simple: commas act only as separators and never appear inside field values
- Field values may have leading and trailing spaces, which should be trimmed
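Because commas never appear inside field values, a plain split is enough and no quoting-aware CSV parser is needed. A minimal sketch of the parsing step follows. The name `parse_rows` is hypothetical, and skipping blank lines (for example after a trailing newline) is an assumption the constraints do not address.

```python
def parse_rows(data: str) -> list[list[str]]:
    """Split raw CSV text into trimmed fields, one list per data row."""
    lines = data.split("\n")
    rows = []
    for line in lines[1:]:  # the first line is the header
        if not line.strip():
            continue  # assumption: blank lines are not data rows
        # Commas are pure separators, so str.split is safe here.
        rows.append([field.strip() for field in line.split(",")])
    return rows
```

For instance, `parse_rows("h1,h2\n a , b ")` yields one row whose fields have had the surrounding spaces removed. The work is linear in the length of `data`, which fits comfortably within the stated limit of `10^6` characters.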
Examples
Input: ('col1,col2,col3,col4,col5,col6\na,land water,c,d,land water LLC,f\na,Good Company,c,d,land water,f\na,b,c,d,e,f\n1,2,3,,5,6',)
Expected Output: ['VERIFIED: land water', 'NOT VERIFIED: Good Company', 'NOT VERIFIED: b', 'NOT VERIFIED: 2']
Explanation: The first row passes all checks. The second fails because `col2` contains the forbidden substring `company`. The third fails because `col5` is too short. The fourth fails because `col4` is empty.
Input: ('col1,col2,col3,col4,col5,col6\na,Alpha Inc Beta,c,alpha gamma,abcde,f\na,delta llc,c,omega delta,1234567890123456789012345678901,f',)
Expected Output: ['VERIFIED: Alpha Inc Beta', 'VERIFIED: delta llc']
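Explanation: This tests the inclusive boundary lengths for `col5` (5 and 31 characters). It also shows that `Inc` and `LLC` are removed before computing word overlap, and that the overlap comparison ignores case (`Alpha` in `col2` matches `alpha` in `col4`).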
Input: ('col1,col2,col3,col4,col5,col6\na,LLC Inc,c,llc inc,valid text,f\na, red red blue ,c, blue red , valid name ,f',)
Expected Output: ['NOT VERIFIED: LLC Inc', 'VERIFIED: red red blue']
Explanation: The first row fails because removing `LLC` and `Inc` leaves no words in `col2`. The second row passes after trimming spaces, and duplicate words do not matter because the overlap uses sets.
Input: ('col1,col2,col3,col4,col5,col6',)
Expected Output: []
Explanation: With only a header and no data rows, there is nothing to report.
Input: ('col1,col2,col3,col4,col5,col6\na,blue sky,c,green field,blue sky inc,f\na,Acme co.,c,acme,valid name,f',)
Expected Output: ['VERIFIED: blue sky', 'NOT VERIFIED: Acme co.']
Explanation: The first row is verified because `col2` overlaps enough with `col5` even though `col4` does not match. The second row fails because `col2` contains the forbidden substring `co.`.
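The overlap behaviour these explanations describe (case-insensitive matching, `Inc` and `LLC` dropped, duplicate words ignored) might be sketched as below. The helper names `significant_words` and `overlap_ratio` are hypothetical, and returning `0.0` when `col2` has no words left is an assumption chosen so that such rows cannot pass.

```python
SUFFIX_WORDS = {"inc", "llc"}  # compared in lowercase


def significant_words(value: str) -> set[str]:
    """Lowercased set of words, with the suffix words removed."""
    return {w for w in value.lower().split() if w not in SUFFIX_WORDS}


def overlap_ratio(col2: str, other: str) -> float:
    """Fraction of col2's distinct words that also appear in `other`."""
    col2_words = significant_words(col2)
    if not col2_words:
        return 0.0  # assumption: an empty word set never overlaps
    return len(col2_words & significant_words(other)) / len(col2_words)
```

With these helpers, `overlap_ratio("Alpha Inc Beta", "alpha gamma")` evaluates to `0.5`, and that row is expected to be verified, so the passing threshold appears to be at most one half.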
Hints
- Normalize each row first: split by commas, trim each field, and reject the row early if any basic rule fails.
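- For the overlap rule, convert words into sets and compare `len(col2_words & other_words) / len(col2_words)` against the required threshold. Check that `col2_words` is non-empty first, since a row such as `LLC Inc` leaves no words and would otherwise divide by zero.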
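Putting both hints together, one possible end-to-end sketch is shown below. It is inferred from the examples rather than from a full rule list, so several details are assumptions: the function name `validate_rows`, a forbidden list containing only `company` and `co.`, the requirement that all six fields be non-empty, and a passing overlap ratio of `0.5` against either `col4` or `col5`.

```python
# Assumptions (inferred from the examples, not confirmed): the forbidden list
# holds only the two substrings the examples name, every one of the six fields
# must be non-empty, and an overlap ratio of 0.5 is enough to pass.
FORBIDDEN_SUBSTRINGS = ("company", "co.")
SUFFIX_WORDS = {"inc", "llc"}
MIN_COL5_LEN, MAX_COL5_LEN = 5, 31
OVERLAP_THRESHOLD = 0.5


def _words(value: str) -> set[str]:
    """Lowercased word set with the suffix words removed."""
    return {w for w in value.lower().split() if w not in SUFFIX_WORDS}


def _is_verified(fields: list[str]) -> bool:
    # Basic rules first, so malformed rows are rejected early.
    if len(fields) != 6 or any(not f for f in fields):
        return False
    col2, col4, col5 = fields[1], fields[3], fields[4]
    if not MIN_COL5_LEN <= len(col5) <= MAX_COL5_LEN:
        return False
    lowered = col2.lower()
    if any(bad in lowered for bad in FORBIDDEN_SUBSTRINGS):
        return False
    col2_words = _words(col2)
    if not col2_words:  # nothing left after dropping Inc/LLC
        return False
    # Pass if col2 overlaps enough with either col4 or col5.
    return any(
        len(col2_words & _words(other)) / len(col2_words) >= OVERLAP_THRESHOLD
        for other in (col4, col5)
    )


def validate_rows(data: str) -> list[str]:
    results = []
    for line in data.split("\n")[1:]:  # skip the header line
        if not line.strip():
            continue  # assumption: blank lines are not data rows
        fields = [f.strip() for f in line.split(",")]
        label = fields[1] if len(fields) > 1 else ""
        status = "VERIFIED" if _is_verified(fields) else "NOT VERIFIED"
        results.append(f"{status}: {label}")
    return results
```

Each row is processed once and every check is linear in the row's length, so the total cost is proportional to the size of `data`. The thresholds and forbidden list are kept as module-level constants so they can be adjusted if the full problem statement specifies different values.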