Quantile-Quantile Plots

Visualization builds intuitions – math makes them concrete

Author

Sushovan Majhi

Published

November 15, 2024

The quantile-quantile plot (also known as a Q-Q plot) is an extremely useful visual tool for exploratory data analysis (EDA). A Q-Q plot is not so much a summary of data as an informal assessment of goodness of fit, used to discern the disparity between two distributions. Quantiles from one distribution (usually from data) are plotted against those of another distribution (usually a theoretical, known model). For more examples and discussions, see [1].

Quantiles

Since the concept of a Q-Q plot is based on quantiles, we begin with the definition of quantiles of a probability distribution.

Theoretical Quantiles

Definition 1 (Theoretical Quantiles) For any p\in[0,1], a pth quantile of a random variable X is defined to be that value x_p\in\mathbb R such that \mathbb P(X\leq x_p)=p.

In other words, the probability that X realizes a value not greater than a pth quantile is p. For p=\frac{1}{2}, x_p is commonly known as a median of X. If F(x) denotes the CDF of X, one notes that x_p=F^{-1}(p), provided the CDF F(x) is invertible near x_p (see footnote 1).
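As a small illustration of x_p=F^{-1}(p), here is a minimal sketch that assumes X is exponential with rate 1, a choice made only for this example. Its CDF F(x)=1-e^{-x} can be inverted by hand, giving x_p=-\ln(1-p). The function names `exp_cdf` and `exp_quantile` are hypothetical helpers, not part of any library.

```python
import math

def exp_cdf(x, rate=1.0):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exp_quantile(p, rate=1.0):
    """p-th quantile, found by solving F(x) = p for x; valid for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p) / rate

median = exp_quantile(0.5)      # the median, which is ln(2) when rate = 1
round_trip = exp_cdf(median)    # plugging the quantile back into F recovers p
```

Evaluating the CDF at the computed quantile should return the original p up to floating-point error, which is exactly the defining property in Definition 1.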

Quantiles are not unique

In general, quantiles are not unique. Easy examples can be found when X is discrete. For an example on the continuous side, take X\sim\mathrm{unif}([0,1]) to see that any number not less than 1 is a pth quantile for p=1. See the CDF of X below to convince yourself.

(a) PDF
(b) CDF
Figure 1: The density (left) and cumulative distribution (right) functions of uniform [0,1] are shown by the blue lines. The CDF is only invertible on the support.
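The non-uniqueness in the uniform example can also be checked numerically. The sketch below hard-codes the CDF of unif([0,1]) in a hypothetical helper `unif_cdf` and tests a few arbitrarily chosen candidate values against Definition 1 for p=1.

```python
def unif_cdf(x):
    """CDF of the uniform distribution on [0, 1]: 0 below 0, x on [0, 1], 1 above 1."""
    return min(max(x, 0.0), 1.0)

# Every candidate >= 1 satisfies P(X <= x) = 1, so each one is a quantile for p = 1.
candidates = [1.0, 1.5, 2.0, 10.0]
is_quantile_for_p1 = [unif_cdf(x) == 1.0 for x in candidates]

# Inside (0, 1) the CDF is strictly increasing, so F(x) = 0.3 holds only at x = 0.3.
unique_value = unif_cdf(0.3)
```

All entries of `is_quantile_for_p1` come out true, since the CDF is flat at 1 to the right of the support.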

Moving forward, we assume that X is a continuous random variable and that its CDF F is a strictly increasing, continuous function, at least on an interval of the real line. As a consequence, the CDF is invertible on that interval and the pth quantile x_p=F^{-1}(p) is uniquely defined for every p\in(0,1). Examples of such distributions include the exponential, \chi^2- and F-distribution on (0,\infty), normal and Student’s t-distribution on \mathbb R, the Beta distribution on (0,1), etc.

Exercise 1 Find a continuous distribution with expected value 0 whose CDF is only invertible on a bounded interval of the real line.

Observed Quantiles

While the quantiles of a probability distribution can be concretely defined (Definition 1), there are quite a few conventions for assigning quantiles to a batch of observations or a dataset. For a large sample, however, they make little to no difference for a descriptive analysis. We use the following convention:

For a random sample X_1, X_2, \ldots, X_n of size n, the order statistics are denoted by X_{(1)}\leq X_{(2)}\leq\ldots\leq X_{(n)}. The k/(n+1) quantile of the data is assigned to X_{(k)}, the kth order statistic.
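The convention above can be sketched in a few lines of Python. The helper name `observed_quantiles`, the seed, and the sample size are assumptions made for illustration; the sample is drawn with the standard library's `random.random`.

```python
import random

def observed_quantiles(sample):
    """Return (p_k, X_(k)) pairs with p_k = k / (n + 1) for k = 1, ..., n."""
    ordered = sorted(sample)          # the order statistics X_(1) <= ... <= X_(n)
    n = len(ordered)
    return [(k / (n + 1), x) for k, x in enumerate(ordered, start=1)]

random.seed(0)                        # arbitrary seed, only for reproducibility
sample = [random.random() for _ in range(9)]
pairs = observed_quantiles(sample)    # nine pairs with p_k = 0.1, 0.2, ..., 0.9
```

Note that with this convention the probabilities k/(n+1) never reach 0 or 1, so the smallest and largest observations are not treated as the extreme ends of the distribution.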

Plotting and Studying Q-Q Plots

Observed vs Theoretical Quantiles

Let us consider a sample of size n from the uniform distribution on [0,1]. As proved in

Testing Uniform Random Generator

Two Observed Batches

Conclusion

References

[1]
M. B. Wilk and R. Gnanadesikan, “Probability plotting methods for the analysis of data,” Biometrika, vol. 55, no. 1, pp. 1–17, 1968. Available: http://www.jstor.org/stable/2334448. [Accessed: Aug. 07, 2022]

Footnotes

  1. A function f:A\to B is called invertible near a point x_0\in A if there is an interval I containing the point x_0 such that f is a bijective map when restricted to I.

Citation

BibTeX citation:
@online{majhi2024,
  author = {Majhi, Sushovan},
  title = {Quantile-Quantile {Plots}},
  date = {2024-11-15},
  url = {https://smajhi.com/tutorials/data-science/qq},
  langid = {en}
}
For attribution, please cite this work as:
S. Majhi, “Quantile-Quantile Plots,” Nov. 15, 2024. Available: https://smajhi.com/tutorials/data-science/qq