This question evaluates proficiency in algorithmic string processing and implementation of foundational machine learning models, covering skills such as designing an efficient distinct-character substring algorithm, implementing binary logistic regression training and prediction, and building a multinomial Naive Bayes text classifier.

You are given three implementation tasks that mix algorithms and basic machine learning models.
Given a string `s` consisting of ASCII characters, find the length of the longest contiguous substring of `s` in which all characters are distinct (no character appears more than once within that substring).
Examples:
- `s = "abcabcbb"` → the answer is `3` (one such substring is `"abc"`).
- `s = "bbbbb"` → the answer is `1` (any single `"b"`).
- `s = ""` (empty string) → the answer is `0`.
Constraints (typical):
- `0 <= len(s) <= 10^5`.
- Target time complexity: `O(n)`, where `n = len(s)`.
- Target extra space: `O(min(n, alphabet_size))`.
Implement a function with a signature similar to:
- `int longestDistinctSubstringLength(const string& s)` (C++), or
- `longest_distinct_substring_length(s) -> int` (Python-style pseudocode).
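One common way to meet the `O(n)` target is a sliding window over `s` that remembers the last index at which each character was seen. Below is a minimal Python sketch; the function name follows the pseudocode signature above, and it is one possible solution, not the only acceptable one:

```python
def longest_distinct_substring_length(s: str) -> int:
    last_seen = {}       # character -> most recent index where it appeared
    best = 0
    window_start = 0     # left edge of the current all-distinct window
    for i, ch in enumerate(s):
        # If ch already occurs inside the current window, move the window
        # start just past its previous occurrence.
        if ch in last_seen and last_seen[ch] >= window_start:
            window_start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - window_start + 1)
    return best

# Checks matching the examples in the statement:
assert longest_distinct_substring_length("abcabcbb") == 3
assert longest_distinct_substring_length("bbbbb") == 1
assert longest_distinct_substring_length("") == 0
```

Each index enters and leaves the window at most once, so the loop is `O(n)`, and the dictionary holds at most one entry per distinct character, giving `O(min(n, alphabet_size))` extra space.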
Implement a binary logistic regression classifier without using any machine learning libraries.
Design a class `LogisticRegression` that supports:
- A constructor taking the hyperparameters `learning_rate: float` and `num_iterations: int`.
- A training method `fit(X, y)` (or equivalent), where:
  - `X` is an `n x d` numeric matrix of features (e.g., a list of length `n`, each element a length-`d` list of floats);
  - `y` is a length-`n` list/array of labels, each label either `0` or `1`;
  - for each sample `i`, compute `z_i = w · x_i + b`; the probability of class 1 is then `p_i = sigmoid(z_i) = 1 / (1 + exp(-z_i))`;
  - update `w` and `b` with gradient descent for `num_iterations` steps.
- `predict_proba(X)`: given a feature matrix `X`, return a list/array of the predicted probabilities `p_i` of class `1`, one per row.
- `predict(X)`: given a feature matrix `X`, first compute probabilities with `predict_proba`, then convert them to labels `0` or `1` using the threshold `0.5`.
Constraints (typical):
- `n` (number of samples) up to about `10^4`.
- `d` (number of features) up to about `10^2`.
- `num_iterations` up to about `10^4`.
You may implement in any language, but do not use built-in ML training utilities.
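As a point of reference, here is a minimal sketch of such a class in plain Python, assuming batch gradient descent on the average log-loss and the method names `fit` / `predict_proba` / `predict` described above (a vectorized NumPy version would follow the same updates):

```python
import math

class LogisticRegression:
    def __init__(self, learning_rate: float = 0.1, num_iterations: int = 1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.w = []    # one weight per feature
        self.b = 0.0

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Clamp z to keep exp() in a safe range for extreme inputs.
        z = max(-500.0, min(500.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w = [0.0] * d
        self.b = 0.0
        for _ in range(self.num_iterations):
            # Forward pass: p_i = sigmoid(w . x_i + b) for every sample.
            probs = self.predict_proba(X)
            # Batch gradients of the average log-loss.
            grad_w = [0.0] * d
            grad_b = 0.0
            for xi, yi, pi in zip(X, y, probs):
                err = pi - yi
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Gradient-descent update of w and b.
            for j in range(d):
                self.w[j] -= self.learning_rate * grad_w[j] / n
            self.b -= self.learning_rate * grad_b / n

    def predict_proba(self, X):
        return [self._sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)
                for x in X]

    def predict(self, X):
        return [1 if p >= 0.5 else 0 for p in self.predict_proba(X)]
```

The per-feature gradient used here is the average of `(p_i - y_i) * x_ij` over the batch, which is the standard log-loss gradient for the sigmoid model described above.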
Implement a multinomial Naive Bayes classifier for text classification.
Assume:
- `docs_train`, a list of documents, where each document is represented as a list of token strings, e.g. `[["this", "movie", "is", "great"], ["awful", "film"], ...]`.
- `y_train`, a list of integers where `y_train[i]` is the class index for `docs_train[i]` (values in `{0, 1, ..., C-1}`).
- `docs_test`, same format as `docs_train` but without labels.
Design a class `NaiveBayes` that supports:
- A training method `fit(docs_train, y_train)` (or equivalent). Let `C` be the number of classes.
- For each class `c` and token `w` in the vocabulary, estimate the conditional probabilities with Laplace smoothing (add-one smoothing): `P(w | y = c) = (count of w in documents of class c + 1) / (total number of tokens in documents of class c + V)`, where `V` is the vocabulary size.
- Work with log probabilities (`log P(y = c)` and `log P(w | y = c)`) to avoid numerical underflow.
- A prediction method `predict(docs_test)` (or equivalent): for each test document and each class `c`, compute the log posterior up to a constant, `score(c) = log P(y = c) + sum over tokens w in the document of log P(w | y = c)`, and predict the class `c` with the highest log score.
Constraints (typical):
- Number of documents up to about `10^4`.
- Vocabulary size up to about `10^4`.
Focus on correctly implementing the core training (fit) and prediction (predict) logic; you do not need to handle file I/O or text preprocessing (assume tokenization is already done).
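To make the expected shape of `fit` and `predict` concrete, here is one minimal sketch in Python, assuming the token-list input format above, add-one smoothing, log-space scoring, and the convention that tokens not seen during training are skipped at prediction time (an assumption; other reasonable choices exist):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs_train, y_train):
        self.classes = sorted(set(y_train))
        self.vocab = {w for doc in docs_train for w in doc}
        V = len(self.vocab)

        doc_counts = Counter(y_train)          # number of documents per class
        token_counts = defaultdict(Counter)    # class -> token -> occurrence count
        total_tokens = Counter()               # class -> total token count
        for doc, c in zip(docs_train, y_train):
            token_counts[c].update(doc)
            total_tokens[c] += len(doc)

        n_docs = len(docs_train)
        # log P(y = c), estimated from document frequencies.
        self.log_prior = {c: math.log(doc_counts[c] / n_docs) for c in self.classes}
        # log P(w | y = c) with add-one (Laplace) smoothing.
        self.log_likelihood = {
            c: {w: math.log((token_counts[c][w] + 1) / (total_tokens[c] + V))
                for w in self.vocab}
            for c in self.classes
        }
        return self

    def predict(self, docs_test):
        predictions = []
        for doc in docs_test:
            # Log posterior up to a constant: log P(y = c) + sum_w log P(w | y = c).
            scores = {}
            for c in self.classes:
                score = self.log_prior[c]
                for w in doc:
                    if w in self.vocab:          # unseen tokens are skipped here
                        score += self.log_likelihood[c][w]
                scores[c] = score
            predictions.append(max(scores, key=scores.get))
        return predictions
```

For instance, `NaiveBayes().fit([["this", "movie", "is", "great"], ["awful", "film"]], [1, 0]).predict([["great", "movie"]])` returns `[1]` under this sketch.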