PracHub

Quick Overview

This question evaluates skills in CSV parsing, string normalization and multi-rule data validation, covering whitespace handling, length checks, case-insensitive substring filtering and set-based word overlap comparisons.

  • medium
  • TikTok
  • Coding & Algorithms
  • Software Engineer

Validate CSV rows under multiple verification rules

Company: TikTok

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given a multi-line string representing a CSV dataset.

- The first line is a header.
- Each subsequent line contains **exactly 6 columns**: `col1,col2,col3,col4,col5,col6`.

For each data row (excluding the header), output one line in the same order:

- `VERIFIED: <col2_value>` or
- `NOT VERIFIED: <col2_value>`

A row is **VERIFIED** only if it satisfies **all** rules below.

## Rules

1. **No empty fields**: all 6 columns must be present and **non-empty**. A value that is only whitespace counts as empty.
2. **Column 5 length constraint**: `len(trim(col5))` must be between **5 and 31 inclusive**.
3. **Forbidden words in col2**: `col2` must **not** contain any of these substrings (case-insensitive):
   - `company`, `firm`, `co.`, `corporation`, `group`
4. **Word overlap requirement**:
   - Split `col2`, `col4`, and `col5` into lowercase words by whitespace.
   - Remove the words `LLC` and `Inc` (case-insensitive) from the comparison sets.
   - `col2` must share **at least 50%** of its (remaining) words with **either** `col4` **or** `col5`.

## Input

A single string `data` containing the CSV content.

## Output

Print one line per data row: `VERIFIED: ...` or `NOT VERIFIED: ...`.

## Example

Input:

```
col1,col2,col3,col4,col5,col6
a,land water,c,d,land water LLC,f
a,Good Company,c,d,land water,f
a,b,c,d,e,f
1,2,3,,5,6
```

Output:

```
VERIFIED: land water
NOT VERIFIED: Good Company
NOT VERIFIED: b
NOT VERIFIED: 2
```

## Notes

- Maintain row order in output.
- Assume the CSV does not contain escaped commas inside fields (simple split by commas is acceptable unless you choose to implement full CSV parsing).


Implement a function `solution(data)` that validates rows in a simple CSV dataset and returns one status string per data row.

The input is a single multi-line string. The first line is a header and should be ignored. Each remaining non-blank line should be split by commas into fields, and leading/trailing whitespace should be trimmed from every field before validation and before reporting `col2` in the output.

For each data row, return either `VERIFIED: <col2>` or `NOT VERIFIED: <col2>`. A row is VERIFIED only if all of the following are true:

1. It has exactly 6 columns, and all 6 trimmed fields are non-empty.
2. The trimmed value of `col5` has length between 5 and 31 inclusive.
3. The trimmed value of `col2` does not contain any of these substrings, case-insensitively: `company`, `firm`, `co.`, `corporation`, `group`.
4. For the word-overlap rule:
   - Split `col2`, `col4`, and `col5` into lowercase words using whitespace.
   - Remove the words `llc` and `inc` from each set.
   - Treat the remaining words as sets of unique words.
   - At least 50% of the remaining words in `col2` must appear in either `col4` or `col5`.
   - If no words remain in `col2` after removing `llc` and `inc`, the row is NOT VERIFIED.

Return the results in the same row order.
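The specification above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference solution. The helper names `_comparison_words` and `_is_verified` are hypothetical, and two behaviours are assumptions rather than stated requirements: the overlap is tested against `col4` and `col5` separately (not against their union), and a malformed row with fewer than two fields reports an empty `col2`.

```python
FORBIDDEN = ("company", "firm", "co.", "corporation", "group")
IGNORED = {"llc", "inc"}


def _comparison_words(field):
    """Unique lowercase words of a field, without the ignored words."""
    return set(field.lower().split()) - IGNORED


def _is_verified(fields):
    """Apply rules 1-4 to a list of already-trimmed fields."""
    # Rule 1: exactly 6 columns, none empty after trimming.
    if len(fields) != 6 or any(not f for f in fields):
        return False
    col2, col4, col5 = fields[1], fields[3], fields[4]
    # Rule 2: col5 length between 5 and 31 inclusive.
    if not 5 <= len(col5) <= 31:
        return False
    # Rule 3: no forbidden substring in col2, case-insensitively.
    lowered = col2.lower()
    if any(term in lowered for term in FORBIDDEN):
        return False
    # Rule 4: at least half of col2's remaining words appear in col4 or col5.
    target = _comparison_words(col2)
    if not target:
        return False
    # Assumption: col4 and col5 are checked one at a time.
    return any(
        2 * len(target & _comparison_words(other)) >= len(target)
        for other in (col4, col5)
    )


def solution(data):
    results = []
    # Skip the header; an empty input yields no rows at all.
    for line in data.splitlines()[1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split(",")]
        # Assumption: report an empty col2 if the row has fewer than 2 fields.
        name = fields[1] if len(fields) > 1 else ""
        status = "VERIFIED" if _is_verified(fields) else "NOT VERIFIED"
        results.append(f"{status}: {name}")
    return results
```

Each row is processed independently and in input order, so the running time is linear in the total length of `data`.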

Constraints

  • `0 <=` number of data rows `<= 10^4`
  • Total length of `data` is at most `10^6` characters
  • CSV is simple: commas are separators and do not appear inside field values
  • Leading and trailing spaces around field values may exist and should be trimmed

Examples

Input: ('col1,col2,col3,col4,col5,col6\na,land water,c,d,land water LLC,f\na,Good Company,c,d,land water,f\na,b,c,d,e,f\n1,2,3,,5,6',)

Expected Output: ['VERIFIED: land water', 'NOT VERIFIED: Good Company', 'NOT VERIFIED: b', 'NOT VERIFIED: 2']

Explanation: The first row passes all checks. The second fails because `col2` contains the forbidden substring `company`. The third fails because the trimmed `col5` value `e` has length 1, below the minimum of 5. The fourth fails because `col4` is empty.

Input: ('col1,col2,col3,col4,col5,col6\na,Alpha Inc Beta,c,alpha gamma,abcde,f\na,delta llc,c,omega delta,1234567890123456789012345678901,f',)

Expected Output: ['VERIFIED: Alpha Inc Beta', 'VERIFIED: delta llc']

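Explanation: This tests the boundary lengths for `col5` (5 and 31). It also shows that `Inc` and `LLC` are removed before computing word overlap: in the first row, `col2` reduces to the words `alpha` and `beta`, and sharing `alpha` with `col4` is exactly 50%, which is enough.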

Input: ('col1,col2,col3,col4,col5,col6\na,LLC Inc,c,llc inc,valid text,f\na, red red blue ,c, blue red , valid name ,f',)

Expected Output: ['NOT VERIFIED: LLC Inc', 'VERIFIED: red red blue']

Explanation: The first row fails because removing `LLC` and `Inc` leaves no words in `col2`. The second row passes after trimming spaces, and duplicate words do not matter because the overlap uses sets.
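The set behaviour described in this explanation can be illustrated with a small sketch. The function name `comparison_words` is hypothetical; it only assumes the splitting and filtering steps listed in the word-overlap rule.

```python
IGNORED = {"llc", "inc"}  # compared after lowercasing


def comparison_words(field):
    """Lowercase, split on whitespace, drop duplicates and ignored words."""
    return set(field.lower().split()) - IGNORED


# Duplicates collapse because the result is a set.
print(sorted(comparison_words(" red red blue ")))  # ['blue', 'red']
# Nothing remains once LLC and Inc are removed.
print(comparison_words("LLC Inc"))  # set()
```

Because `str.split()` with no argument splits on any run of whitespace, the surrounding and repeated spaces need no separate handling here.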

Input: ('col1,col2,col3,col4,col5,col6',)

Expected Output: []

Explanation: With only a header and no data rows, there is nothing to report.

Input: ('col1,col2,col3,col4,col5,col6\na,blue sky,c,green field,blue sky inc,f\na,Acme co.,c,acme,valid name,f',)

Expected Output: ['VERIFIED: blue sky', 'NOT VERIFIED: Acme co.']

Explanation: The first row is verified because `col2` overlaps enough with `col5` even though `col4` does not match. The second row fails because `col2` contains the forbidden substring `co.`.
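Since rule 3 is a substring test rather than a whole-word test, it can be sketched with Python's `in` operator on a lowercased copy of `col2`. The function name `has_forbidden_substring` and the extra sample values are illustrative assumptions, not part of the original test data.

```python
FORBIDDEN = ("company", "firm", "co.", "corporation", "group")


def has_forbidden_substring(col2):
    """True if any forbidden term occurs anywhere in col2, ignoring case."""
    lowered = col2.lower()
    return any(term in lowered for term in FORBIDDEN)


print(has_forbidden_substring("Acme co."))     # True
print(has_forbidden_substring("blue sky"))     # False
# Substring matching also flags terms embedded in longer words.
print(has_forbidden_substring("Affirm Labs"))  # True, contains "firm"
# "co." needs the trailing period to match.
print(has_forbidden_substring("Acme Co"))      # False
```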

Hints

  1. Normalize each row first: split by commas, trim each field, and reject the row early if any basic rule fails.
  2. For the overlap rule, convert words into sets and compare `len(col2_words & other_words) / len(col2_words)`.
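The ratio in hint 2 can also be checked without division, which sidesteps a division by zero when `col2` has no remaining words. The sketch below is one possible formulation; `overlap_ok` is a hypothetical name, and it compares `col2` with a single other column at a time.

```python
def overlap_ok(col2_words, other_words):
    """True if at least 50% of col2_words also appear in other_words."""
    if not col2_words:
        return False  # no remaining words means NOT VERIFIED
    shared = len(col2_words & other_words)
    # shared / len(col2_words) >= 0.5, rewritten with integers only
    return 2 * shared >= len(col2_words)


print(overlap_ok({"alpha", "beta"}, {"alpha", "gamma"}))  # True (1 of 2)
print(overlap_ok({"a", "b", "c"}, {"a"}))                 # False (1 of 3)
print(overlap_ok(set(), {"llc"}))                         # False
```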
Last updated: May 12, 2026



Related Coding Questions

  • Parse a nested list from a string - TikTok (medium)
  • Implement stacks, streaming median, and upward path sum - TikTok (easy)
  • Maximize sum with no adjacent elements - TikTok (medium)
  • Implement stack variants and path-sum check - TikTok (medium)
  • Find the longest palindromic substring - TikTok (easy)