You are given several short, independent probability and statistics questions similar to those in a data / ML screening test. Answer all sub-questions.
1. Ordering Poisson distributions by their rate parameter
You are told that three different Poisson-distributed random variables $X_A$, $X_B$, $X_C$ have their probability mass functions (PMFs) plotted on the same graph (support on the non-negative integers). The plots are described as follows:
- Distribution A: Most of its probability mass is concentrated on the values 0, 1, and 2. The mode (highest bar) is at 1. The probability at 0 is high, and probabilities drop off quickly after 3.
- Distribution B: The mode is at 3. The distribution is more spread out than A: there is still noticeable probability up to around 6 or 7, but very little after that.
- Distribution C: The mode is at 5. The distribution is the most spread out of the three, with noticeable probability from around 2 up to 10 or more.
All three are Poisson distributions with parameters $\lambda_A$, $\lambda_B$, and $\lambda_C$, respectively.
Question 1: Based on the qualitative description of the plots, order the three rate parameters from smallest to largest:
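- (a) $\lambda_A$, $\lambda_B$, $\lambda_C$
- (b) $\lambda_C$, $\lambda_B$, $\lambda_A$
- (c) $\lambda_A$, $\lambda_C$, $\lambda_B$
- (d) $\lambda_B$, $\lambda_A$, $\lambda_C$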
Pick the correct ordering and briefly justify your choice.
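One way to connect the plot descriptions to $\lambda$: for a Poisson variable both the mean and the variance equal $\lambda$, and for non-integer $\lambda$ the mode is $\lfloor \lambda \rfloor$. The sketch below computes the PMF for three illustrative rates and reports each mode and standard deviation. The rates 1.5, 3.5, and 5.5 and the helper names `poisson_pmf` and `summarize` are assumptions chosen here, not values read off the original plots.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def summarize(lam: float, kmax: int = 20) -> tuple[int, float]:
    """Return (mode, standard deviation) of Poisson(lam).

    The mode is found by scanning k = 0..kmax; the standard deviation
    uses the closed form sqrt(lam), since Var(X) = lam for a Poisson.
    """
    pmf = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    mode = max(range(kmax + 1), key=lambda k: pmf[k])
    return mode, math.sqrt(lam)

# Hypothetical rates picked so the modes match the descriptions (1, 3, 5).
for name, lam in [("A", 1.5), ("B", 3.5), ("C", 5.5)]:
    mode, sd = summarize(lam)
    print(f"{name}: lambda={lam}, mode={mode}, sd={sd:.2f}")
```

Because a larger $\lambda$ shifts the mode to the right and widens the spread at the same time, the mode and the spread in each description point the same way.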
2. Identifying the type of bias in a study
A startup incubator wants to understand “what makes startups successful.” They collect data only from companies that have already raised Series C or later funding and are still operating. They analyze features such as team size, prior founder experience, average age of founders, and industry, and then publish a report claiming: “These are the characteristics that make startups successful.”
You are asked: What is the primary type of bias in this study? Choose the best option and briefly explain.
- (a) Survivor (survivorship) bias
- (b) Sampling bias (non-representative sample)
- (c) Recall bias
- (d) No bias; the study design is appropriate
(You may mention more than one kind of bias if relevant, but identify the primary one by its standard statistical name.)
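A small simulation can illustrate why analyzing only surviving, late-stage companies is misleading. The setup is entirely hypothetical: 70% of founders are assumed to have prior experience, and "success" (reaching Series C and still operating) is assumed to be a 5% chance that is independent of experience by construction. The names `startups`, `success_rate`, and the seed are illustrative choices.

```python
import random

rng = random.Random(0)
N = 100_000

# Hypothetical population: 70% of founders have prior experience, and
# "success" is a 5% coin flip that ignores experience entirely.
startups = [
    {"experienced": rng.random() < 0.7, "succeeded": rng.random() < 0.05}
    for _ in range(N)
]

def success_rate(group):
    """Fraction of startups in `group` that succeeded."""
    return sum(s["succeeded"] for s in group) / len(group)

# What the incubator sees: survivors only.
survivors = [s for s in startups if s["succeeded"]]
share_experienced_survivors = sum(s["experienced"] for s in survivors) / len(survivors)

# What it cannot compute without the failed startups: success rate per group.
rate_experienced = success_rate([s for s in startups if s["experienced"]])
rate_inexperienced = success_rate([s for s in startups if not s["experienced"]])

print(f"Share of survivors with experienced founders: {share_experienced_survivors:.2f}")
print(f"Success rate, experienced founders:   {rate_experienced:.3f}")
print(f"Success rate, inexperienced founders: {rate_inexperienced:.3f}")
```

Among survivors, roughly 70% have experienced founders, which could be read as "experience drives success," yet both groups succeed at about the same rate. That comparison is only possible when the failed startups are included in the data.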
3. True/False questions on covariance and correlation
For each of the following statements about covariance and correlation between two real-valued random variables X and Y, answer True or False and provide a brief justification.
- Statement A: If $\operatorname{Cov}(X, Y) = 0$, then $X$ and $Y$ are independent.
- Statement B: The Pearson correlation coefficient $\rho_{XY}$ is always between $-1$ and $1$, inclusive.
- Statement C: The Pearson correlation coefficient between $X$ and $Y$ is given by
  $$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y},$$
  where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
- Statement D: If we rescale $X$ by a positive constant $a > 0$, i.e., define $X' = aX$, then the correlation between $X'$ and $Y$ is the same as the correlation between $X$ and $Y$.
- Statement E: A very high correlation between $X$ and $Y$ implies that $X$ causes changes in $Y$.
State True/False for each and justify in one or two sentences.
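The statements can also be probed numerically. The sketch below is a minimal pure-Python check under illustrative assumptions: the helper names `cov` and `corr`, the sample sizes, the seed, and the common-cause setup used for Statement E are all chosen here, not taken from the question. `corr` implements the formula in Statement C directly.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance of paired samples: mean of (x - x_bar) * (y - y_bar)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Pearson correlation, Cov(X, Y) / (sigma_X * sigma_Y), as in Statement C."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

rng = random.Random(0)

# Statement A: Y = X**2 is fully determined by X, yet Cov(X, Y) = 0
# because X is symmetric about 0.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [x * x for x in x_sym]
cov_dependent = cov(x_sym, y_sq)

# Statement B: correlations of random samples stay inside [-1, 1];
# exact linear relationships reach the endpoints.
random_corrs = []
for _ in range(1000):
    s = [rng.gauss(0, 1) for _ in range(20)]
    t = [rng.gauss(0, 1) for _ in range(20)]
    random_corrs.append(corr(s, t))
corr_pos_line = corr(x_sym, [3 * x + 1 for x in x_sym])
corr_neg_line = corr(x_sym, [-x for x in x_sym])

# Statement D: rescaling X by a > 0 (here a = 10) leaves the correlation unchanged.
u = [rng.gauss(0, 1) for _ in range(200)]
v = [ui + rng.gauss(0, 1) for ui in u]
corr_original = corr(u, v)
corr_rescaled = corr([10.0 * ui for ui in u], v)

# Statement E: a hidden common cause Z drives both X and Y; X never
# enters the formula for Y, yet the two are highly correlated.
z = [rng.gauss(0, 1) for _ in range(5000)]
x_conf = [zi + rng.gauss(0, 0.1) for zi in z]
y_conf = [zi + rng.gauss(0, 0.1) for zi in z]
corr_confounded = corr(x_conf, y_conf)

print("A: Cov(X, X^2) =", cov_dependent)
print("B: range of random corrs =", min(random_corrs), max(random_corrs))
print("B: exact lines give", corr_pos_line, corr_neg_line)
print("D:", corr_original, corr_rescaled)
print("E: corr with no causal link =", corr_confounded)
```

A simulation like this can only suggest, not prove: Statement B, for instance, follows from the Cauchy-Schwarz inequality $|\operatorname{Cov}(X, Y)| \le \sigma_X \sigma_Y$ rather than from any number of random trials.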