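// Slider for the sample size n (integer steps between 30 and 100)
viewof n = Inputs.range([30, 100], { step: 1, label: 'sample size' })
// Q-Q plot of n uniform random draws against the uniform theoretical quantiles
Plot.plot({
  style: { },
  grid: true,
  x: { label: `uniform quantiles →`, line: true },
  y: { label: `↑ observed quantiles`, line: true },
  marks: [
    // Reference line y = x from (0, 0) to (1, 1)
    Plot.link({ length: 1 }, {
      x1: 0,
      x2: 1,
      y1: 0,
      y2: 1,
    }),
    // The kth smallest draw is plotted against the k/(n+1) uniform quantile
    Plot.dot({ length: n }, {
      x: d3.range(n).map(i => (i + 1) / (n + 1)),
      y: d3.sort(Array.from({ length: n }, d3.randomUniform()))
    }),
  ]
})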
A quantile-quantile plot (also known as a Q-Q plot) is an extremely useful visual tool for exploratory data analysis (EDA). A Q-Q plot is not so much a summary of data as an informal assessment of goodness of fit, used to discern the disparity between two distributions. Quantiles from one distribution (usually from data) are plotted against those of another distribution (usually a known theoretical model). For more examples and discussions, see [1].
Quantiles
Since the concept of a Q-Q plot is based on quantiles, we begin with the definition of the quantiles of a probability distribution.
Theoretical Quantiles
Definition 1 (Theoretical Quantiles) For any p\in[0,1], a pth quantile of a random variable X is any value x_p\in\mathbb R such that \mathbb P(X\leq x_p)=p.
In other words, the probability that X takes a value not greater than a pth quantile is p. For p=\frac{1}{2}, x_p is commonly known as the median of X. If F(x) denotes the CDF of X, then x_p=F^{-1}(p), provided F is invertible near x_p (see the footnote).
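As a minimal sketch of the relation x_p=F^{-1}(p), consider the exponential distribution with rate \lambda, whose CDF F(x)=1-e^{-\lambda x} on x\geq 0 inverts in closed form to F^{-1}(p)=-\ln(1-p)/\lambda. The function names `exponentialCdf` and `exponentialQuantile` are hypothetical and chosen only for this illustration.

```javascript
// CDF of an exponential distribution with rate lambda (illustrative helper).
function exponentialCdf(x, lambda = 1) {
  return x < 0 ? 0 : 1 - Math.exp(-lambda * x);
}

// Quantile function: solve 1 - exp(-lambda * x) = p for x.
function exponentialQuantile(p, lambda = 1) {
  if (p < 0 || p >= 1) throw new RangeError("p must lie in [0, 1)");
  // log1p(-p) computes ln(1 - p) accurately for small p
  return -Math.log1p(-p) / lambda;
}

const median = exponentialQuantile(0.5); // equals ln 2 for lambda = 1
const roundTrip = exponentialCdf(median); // recovers 0.5 up to rounding error
console.log(median, roundTrip);
```

Feeding the quantile back into the CDF returns p, which is exactly the defining property \mathbb P(X\leq x_p)=p.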
Quantiles are not unique
In general, quantiles are not unique. Easy examples can be found when X is discrete. For an example on the continuous side, take X\sim\mathrm{unif}([0,1]): any number not less than 1 is a pth quantile for p=1. The CDF of X makes this clear, since F(x)=1 for every x\geq 1.
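The non-uniqueness can be checked numerically. The sketch below evaluates two CDFs at several candidate points: the uniform distribution on [0,1] from the example above, and a fair coin taking the values 0 and 1, which is an assumed discrete example added for illustration. The helper names are hypothetical.

```javascript
// CDF of the uniform distribution on [0, 1].
function uniformCdf(x) {
  if (x < 0) return 0;
  if (x > 1) return 1;
  return x;
}

// CDF of a fair coin with values 0 and 1 (assumed discrete example).
function fairCoinCdf(x) {
  if (x < 0) return 0;
  if (x < 1) return 0.5;
  return 1;
}

// Every point at or beyond 1 satisfies P(X <= x) = 1 for the uniform case,
// so each one qualifies as a pth quantile for p = 1.
const allAreQuantiles = [1, 1.5, 2, 10].every(x => uniformCdf(x) === 1);

// Every point in [0, 1) satisfies P(X <= x) = 1/2 for the fair coin,
// so each one qualifies as a median.
const allAreMedians = [0, 0.25, 0.9].every(x => fairCoinCdf(x) === 0.5);

console.log(allAreQuantiles, allAreMedians); // true true
```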
Moving forward, we assume that X is a continuous random variable and that its CDF F is a strictly increasing, continuous function, at least on an interval of the real line. As a consequence, F is invertible on that interval, and the pth quantile x_p=F^{-1}(p) is uniquely defined for p\in(0,1). Examples of such distributions include the exponential, \chi^2- and F-distributions on (0,\infty), the normal and Student’s t-distributions on \mathbb R, the Beta distribution on (0,1), etc.
Exercise 1 Find a continuous distribution with expected value 0 whose CDF is invertible only on a bounded interval of the real line.
Observed Quantiles
While the quantiles of a probability distribution are precisely defined (Definition 1), several conventions exist for assigning quantiles to a batch of observations or a dataset. For a large sample, however, the choice makes little to no difference in a descriptive analysis. We use the following convention:
For a random sample X_1, X_2, \ldots, X_n of size n, the order statistics are denoted by X_{(1)}\leq X_{(2)}\leq\ldots\leq X_{(n)}. The kth order statistic X_{(k)} is taken to be the k/(n+1) quantile of the data.
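This convention can be sketched in a few lines of plain JavaScript: sort the sample to obtain the order statistics, then attach the position k/(n+1) to the kth one. The names `observedQuantiles` and `qqPoints`, as well as the four data values, are hypothetical and used only for illustration; `quantileFn` stands for whichever theoretical F^{-1} is being compared against.

```javascript
// Pair each order statistic X_(k) with its position k / (n + 1).
function observedQuantiles(sample) {
  const n = sample.length;
  // Copy before sorting so the caller's array is left untouched
  const ordered = [...sample].sort((a, b) => a - b);
  return ordered.map((value, i) => ({ p: (i + 1) / (n + 1), value }));
}

// Coordinates of a Q-Q plot: theoretical quantile F^{-1}(p) vs. observed quantile.
function qqPoints(sample, quantileFn) {
  return observedQuantiles(sample).map(({ p, value }) => ({
    theoretical: quantileFn(p),
    observed: value,
  }));
}

const data = [0.62, 0.15, 0.88, 0.41]; // made-up observations
const quantiles = observedQuantiles(data);
// n = 4, so the positions are 1/5, 2/5, 3/5, 4/5 and the values are sorted:
// 0.15, 0.41, 0.62, 0.88

// For the uniform distribution on [0, 1], F^{-1}(p) = p.
const points = qqPoints(data, p => p);
console.log(quantiles, points);
```

With the identity as `quantileFn`, the `points` array holds the same coordinates that the dot mark in the plot at the top of this page draws for a uniform sample.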
Plotting and Studying Q-Q Plots
Observed vs Theoretical Quantiles
Let us consider a sample of size n from the uniform distribution on [0,1]. As proved in
Testing Uniform Random Generator
Two Observed Batches
Conclusion
References
[1] M. B. Wilk and R. Gnanadesikan, “Probability plotting methods for the analysis of data,” Biometrika, vol. 55, no. 1, pp. 1–17, 1968. Available: http://www.jstor.org/stable/2334448. [Accessed: Aug. 07, 2022]
Footnotes
1. A function f:A\to B is called invertible near a point x_0\in A if there is an interval I containing x_0 such that f is a bijective map when restricted to I.
Citation
BibTeX citation:
@online{majhi2024,
author = {Majhi, Sushovan},
title = {Quantile-Quantile {Plots}},
date = {2024-11-15},
url = {https://smajhi.com/tutorials/data-science/qq},
langid = {en}
}
For attribution, please cite this work as:
S. Majhi, “Quantile-Quantile Plots,” Nov. 15, 2024. Available: https://smajhi.com/tutorials/data-science/qq