3 Portfolio Theory
3.1 Introduction
TODO: Given some menu of possible investments, what mix should we hold? How should we hold value through time?
Mean-variance analysis is part of fundamental analysis, specifically within modern portfolio theory (MPT). Developed by Harry Markowitz in 1952 and later expanded by William Sharpe, it focuses on analyzing fundamental characteristics of securities, specifically their expected returns (mean) and risk (variance), to construct optimal portfolios.
While technical analysis looks at price patterns and trading volumes to predict future movements, mean-variance analysis examines the underlying statistical properties of assets and their relationships to each other through correlations and covariances. This approach aims to find portfolios that maximize expected return for a given level of risk, or minimize risk for a given expected return, creating what’s known as the efficient frontier.
The key distinction is that mean-variance analysis relies on fundamental properties of the assets (their return distributions and correlations) rather than chart patterns or technical indicators. It forms part of the theoretical foundation for quantitative fundamental analysis in portfolio management.
3.2 Modern portfolio theory
3.2.1 History and pedagogy
Keywords:
- Modern portfolio theory (MPT)
- Harry Markowitz (1927-2023)
- Markowitz model
- Sharpe ratio
Historical background:
- Markowitz, H.M. (1952). Portfolio selection. 1
- Roy, A.D. (1952). Safety first and the holding of assets. 2
- Markowitz, H.M. (1959). Portfolio Selection: Efficient Diversification of Investments. 3
- Merton, R.C. (1972). An analytic derivation of the efficient portfolio frontier. 4
- Levy, H. & Markowitz, H.M. (1979). Approximating expected utility by a function of mean and variance. 5
- In 1990, Harry Markowitz, Merton Miller, and William F. Sharpe were awarded the Nobel Prize in Economics “for their pioneering work in the theory of financial economics”.
- Markowitz, H.M. (1990). Nobel lecture: Foundations of portfolio theory. 6
- Markowitz, H.M. (2005). Market efficiency: A theoretical distinction and so what? 7
1 Markowitz (1952).
2 Roy (1952).
3 Markowitz (1959).
4 Merton (1972).
5 Levy & Markowitz (1979).
6 Markowitz (1990).
7 Markowitz (2005).
Lecture notes:
- Armerin, F. (2023). Lecture notes: More on mean-variance analysis.
- Caflisch, R. (2003). Lecture notes: Mathematics of Finance.
- Das, S.R. (2016). Data Science: Theories, Models, Algorithms, and Analytics. 8
- Also: Das, S.R. (2017). Being mean with variance: Markowitz optimization.
- Ireland, P. (2013). Lecture notes: Principles of Macroeconomics.
- Ireland, P. (2024). Lecture notes: Mathematics for Economists.
- Ireland, P. (2025). Lecture notes: Financial Economics.
- In particular, Lecture 6
- Kasa, K. (2023). Lecture notes by Ken Kasa (SFU)
- In particular, Lecture 7
- Kwok, Y.K. (2017). Lecture notes: Fundamentals of Mathematical Finance. 9
- In particular, Lecture 2
- Sigman, K. (2005). Notes on fund theorems.
- Tam, A.S. (2021). Lagrangians and portfolio optimization.
3.2.2 Markowitz portfolio problem
Return of a portfolio:
\[ r = \vec{w}^\intercal \, \vec{r} = \sum_i w_{i} \, r_{i} \]
Variance of a portfolio:
\[ \sigma^2 = \vec{w}^\intercal \, V \, \vec{w} = \sum_{ij} w_{i} \, V_{ij} \, w_{j} \]
TODO: Show above 10
10 Luenberger (1998), p. 150.
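As a quick check of the two formulas, here is a minimal plain-Python sketch that evaluates \(\vec{w}^\intercal \vec{r}\) and \(\vec{w}^\intercal V \vec{w}\) with nested lists. The function names and the two-asset numbers are hypothetical, chosen only for illustration.

```python
def portfolio_return(w, r):
    """Portfolio return: r = sum_i w_i r_i."""
    return sum(wi * ri for wi, ri in zip(w, r))


def portfolio_variance(w, V):
    """Portfolio variance: sigma^2 = sum_ij w_i V_ij w_j."""
    n = len(w)
    return sum(w[i] * V[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical two-asset example; the numbers are illustrative, not market data.
w = [0.5, 0.5]
mu = [0.08, 0.12]
V = [[0.04, 0.01],
     [0.01, 0.09]]
print(portfolio_return(w, mu), portfolio_variance(w, V))
```

The double sum makes the cross terms explicit: with equal weights, each off-diagonal covariance contributes \(2 \times 0.25 \, V_{12}\) to the variance.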
Given an \(n\)-dimensional vector of expected returns, \(\vec{\mu}\), an \(n\times{}n\)-dimensional expected covariance matrix, \(V\), an \(m\times{}n\)-dimensional constraint matrix, \(A\), an \(m\)-dimensional constraint vector, \(\vec{b}\), and a target return, \(r_{\ast}\), solve for the portfolio weights, \(\vec{w}_{\ast}\), an \(n\)-dimensional vector, that are efficient, i.e. those that minimize the standard deviation of the portfolio return, \(\sigma\), for a given target return. Return \((\vec{w}_{\ast}, \sigma_{\ast})\). 11
Solve
\[ \vec{w}_{\ast} = \underset{\vec{w}}{\mathrm{argmin}}\ \vec{w}^\intercal \, V \, \vec{w} \]
such that
\[ \vec{w} \cdot \vec{1} = 1 \]
\[ \vec{w} \cdot \vec{\mu} = r_{\ast} \]
and with further optional constraints
\[ A \, \vec{w} \geq \vec{b} \]
11 Markowitz (1959), p. 172.
There are a lot of topics to discuss about solving for the efficient frontier:
- How there is an analytic solution if you allow shorts
- Solving with Lagrange multipliers
- Solving with numerical convex optimization
TODO: Discuss the above more.
It can be shown12 that there is an analytic solution where:
12 Was Merton (1972) the first to show that there is an analytic solution to the Markowitz portfolio problem? For the analytic results discussed here, we generally follow Kwok (2017) and use Kwok’s variable names; to convert from Merton to Kwok: \(a_\mathrm{M} = b_\mathrm{K}\), \(b_\mathrm{M} = c_\mathrm{K}\), \(c_\mathrm{M} = a_\mathrm{K}\).
\[ a \equiv \vec{1}^\intercal \, V^{-1} \, \vec{1}, \qquad b \equiv \vec{1}^\intercal \, V^{-1} \, \vec{\mu}, \qquad c \equiv \vec{\mu}^\intercal \, V^{-1} \, \vec{\mu}, \qquad d \equiv a\,c - b^2 \]
There are two efficient portfolios of note: the minimum variance portfolio, \(\vec{w}_\mathrm{min}\), and the tangent portfolio, \(\vec{w}_\mathrm{tan}\).
The minimum variance portfolio is
\[ \vec{w}_\mathrm{min} = \frac{V^{-1} \, \vec{1}}{a} = \frac{V^{-1} \, \vec{1}}{\vec{1}^\intercal \, V^{-1} \, \vec{1}} \]
It has a return
\[ r_\mathrm{min} = \vec{w}_\mathrm{min} \cdot \vec{\mu} = \frac{\vec{1}^\intercal \, V^{-1} \, \vec{\mu}}{a} = \frac{b}{a} \]
and a variance
\[ \sigma_\mathrm{min}^2 = \vec{w}_\mathrm{min}^\intercal \, V \, \vec{w}_\mathrm{min} = \left( \frac{\vec{1}^\intercal \, V^{-1}}{a} \right) V \left( \frac{V^{-1} \, \vec{1}}{a} \right) = \frac{\vec{1}^\intercal \, V^{-1} \, \vec{1}}{a^2} = \frac{1}{a} \]
The tangent portfolio is
\[ \vec{w}_\mathrm{tan} = \frac{V^{-1} \, \vec{\mu}}{b} = \frac{V^{-1} \, \vec{\mu}}{\vec{1}^\intercal \, V^{-1} \, \vec{\mu}} \]
It has a return
\[ r_\mathrm{tan} = \vec{w}_\mathrm{tan} \cdot \vec{\mu} = \frac{\vec{\mu}^\intercal \, V^{-1} \, \vec{\mu}}{b} = \frac{c}{b} \]
and a variance
\[ \sigma_\mathrm{tan}^2 = \vec{w}_\mathrm{tan}^\intercal \, V \, \vec{w}_\mathrm{tan} = \left( \frac{\vec{\mu}^\intercal \, V^{-1}}{b} \right) V \left( \frac{V^{-1} \, \vec{\mu}}{b} \right) = \frac{\vec{\mu}^\intercal \, V^{-1} \, \vec{\mu}}{b^2} = \frac{c}{b^2} \]
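The closed forms for \(\vec{w}_\mathrm{min}\) and \(\vec{w}_\mathrm{tan}\) can be verified numerically. The sketch below is a minimal plain-Python version with hypothetical, well-conditioned inputs and hypothetical function names. It applies \(V^{-1}\) through a linear solve (a small Gaussian-elimination helper) rather than forming the inverse explicitly, which is the usual numerical practice.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(yi)] for row, yi in zip(A, y)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad_form(w, V):
    return sum(w[i] * V[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def min_variance_and_tangent(mu, V):
    """Return (w_min, w_tan, a, b, c) from the closed forms (no risk-free asset)."""
    ones = [1.0] * len(mu)
    Vinv_1 = solve(V, ones)   # V^{-1} 1
    Vinv_mu = solve(V, mu)    # V^{-1} mu
    a, b, c = dot(ones, Vinv_1), dot(ones, Vinv_mu), dot(mu, Vinv_mu)
    w_min = [x / a for x in Vinv_1]
    w_tan = [x / b for x in Vinv_mu]
    return w_min, w_tan, a, b, c


# Hypothetical three-asset inputs (illustrative only).
mu = [0.05, 0.08, 0.12]
V = [[0.010, 0.002, 0.001],
     [0.002, 0.040, 0.006],
     [0.001, 0.006, 0.090]]
w_min, w_tan, a, b, c = min_variance_and_tangent(mu, V)
print("w_min", w_min, "r_min", dot(w_min, mu), "var_min", quad_form(w_min, V))
print("w_tan", w_tan, "r_tan", dot(w_tan, mu), "var_tan", quad_form(w_tan, V))
```

The printed returns and variances should match \(b/a\), \(1/a\), \(c/b\), and \(c/b^2\) up to rounding.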
The efficient frontier can be written as a linear combination of any two efficient portfolios. This is discussed in more detail in the section on Fund theorems. Writing it as a combination of the minimum variance and tangent portfolios gives
\[ \vec{w}_{\ast} = \psi \, \vec{w}_\mathrm{min} + (1-\psi) \, \vec{w}_\mathrm{tan} \]
where
\[ \psi = (c - b \, r_{\ast}) \, a \, / \, d \]
The efficient frontier portfolio can be equivalently written
\[\begin{align} \vec{w}_{\ast} &= \psi \, \vec{w}_\mathrm{min} + (1-\psi) \, \vec{w}_\mathrm{tan} \\ &= \left( \frac{c - b \, r_{\ast}}{d} \right) a \, \vec{w}_\mathrm{min} + \left( \frac{a \, r_{\ast} - b}{d} \right) b \, \vec{w}_\mathrm{tan} \\ &= \left( \frac{c - b \, r_{\ast}}{d} \right) V^{-1} \, \vec{1} + \left( \frac{a \, r_{\ast} - b}{d} \right) V^{-1} \, \vec{\mu} \end{align}\]
Along the frontier, the return is
\[ r_{\ast} = \psi \, r_\mathrm{min} + (1-\psi) \, r_\mathrm{tan} \]
The variance is
\[ \sigma^2_{\ast} = \frac{a}{d} \, r_{\ast}^{2} - \frac{2 \, b}{d} \, r_{\ast} + \frac{c}{d} \]
TODO: Note calculation order of \(\vec{w}_\mathrm{min}(\mu, V)\) and \(\vec{w}_\mathrm{tan}(\mu, V, r_\mathrm{f})\), then calculate \(r_{\ast}(\sigma_{\ast})\), scanning from \(\sigma_\mathrm{min}\) to \(\sigma_\mathrm{max}\).
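A minimal plain-Python sketch, assuming hypothetical inputs and allowing shorts, that computes \(a, b, c, d\), builds \(\vec{w}_{\ast}\) both directly and as \(\psi \, \vec{w}_\mathrm{min} + (1-\psi) \, \vec{w}_\mathrm{tan}\), and scans the upper branch of the frontier. Solving the variance formula for \(r_{\ast}\) gives \(r_{\ast}(\sigma) = \big(b + \sqrt{d\,(a\,\sigma^2 - 1)}\big)/a\) for \(\sigma \geq \sigma_\mathrm{min} = 1/\sqrt{a}\). The function names and the choice of \(\sigma_\mathrm{max}\) are arbitrary.

```python
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(yi)] for row, yi in zip(A, y)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad_form(w, V):
    return sum(w[i] * V[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def frontier(mu, V, r_target):
    """Efficient weights for a target return (shorts allowed), computed two equivalent ways."""
    ones = [1.0] * len(mu)
    Vinv_1, Vinv_mu = solve(V, ones), solve(V, mu)  # V^{-1} 1 and V^{-1} mu
    a, b, c = dot(ones, Vinv_1), dot(ones, Vinv_mu), dot(mu, Vinv_mu)
    d = a * c - b * b
    # Direct form: w* = [(c - b r*) V^{-1} 1 + (a r* - b) V^{-1} mu] / d
    w = [((c - b * r_target) * x + (a * r_target - b) * y) / d
         for x, y in zip(Vinv_1, Vinv_mu)]
    # Two-fund form: psi * w_min + (1 - psi) * w_tan, with w_min = V^{-1} 1 / a, w_tan = V^{-1} mu / b
    psi = (c - b * r_target) * a / d
    w_mix = [psi * x / a + (1 - psi) * y / b for x, y in zip(Vinv_1, Vinv_mu)]
    var = (a * r_target**2 - 2 * b * r_target + c) / d
    return w, w_mix, var, (a, b, c, d)


def frontier_return(sigma, a, b, d):
    """Upper (efficient) branch r*(sigma); valid for sigma >= sigma_min = 1 / sqrt(a)."""
    return (b + math.sqrt(max(0.0, d * (a * sigma**2 - 1.0)))) / a


# Hypothetical inputs (illustrative only).
mu = [0.05, 0.08, 0.12]
V = [[0.010, 0.002, 0.001],
     [0.002, 0.040, 0.006],
     [0.001, 0.006, 0.090]]
w, w_mix, var, (a, b, c, d) = frontier(mu, V, r_target=0.10)

# Scan the frontier from sigma_min to an arbitrary sigma_max.
sigma_min, sigma_max = 1.0 / math.sqrt(a), 0.40
sigmas = [sigma_min + k * (sigma_max - sigma_min) / 20 for k in range(21)]
curve = [(s, frontier_return(s, a, b, d)) for s in sigmas]
```

The `max(0.0, ...)` guard only absorbs rounding at \(\sigma_\mathrm{min}\), where \(a\,\sigma^2 - 1\) is zero in exact arithmetic.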

In general, depending on the correlations of the assets, the efficient frontier portfolios will short various positions, indicated by having negative weights.
3.2.3 No-shorts frontier
If one adds an additional constraint to the Markowitz portfolio problem as stated, requiring that we don’t short any positions
\[ w_i \geq 0 \]
then the problem doesn’t have an analytic solution. TODO: Citation needed.
The no-shorts frontier can be solved numerically with quadratic programming. In general, the no-shorts frontier coincides with the unconstrained efficient frontier wherever the efficient portfolios involve no shorting, and pulls away from it toward lower returns (for a given standard deviation) wherever the efficient portfolios do involve shorting.
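Quadratic programming solvers are the practical route. For a handful of assets, though, the no-shorts problem can be solved exactly by a deliberately naive method, sketched below in plain Python: enumerate every subset of assets that could be held, solve the equality-constrained problem on that subset with the closed form of the unconstrained frontier, discard candidates with negative weights, and keep the one with the smallest variance. This is exact because, on the set of assets held by the true optimum, that optimum must also solve the equality-constrained subproblem. It is a brute-force active-set enumeration, not a QP solver, and its cost grows as \(2^n\); the function name and inputs are hypothetical.

```python
import itertools
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(yi)] for row, yi in zip(A, y)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad_form(w, V):
    return sum(w[i] * V[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def no_shorts_portfolio(mu, V, r_target):
    """Min-variance weights with sum(w) = 1, w . mu = r_target and w >= 0 (small n only)."""
    n = len(mu)
    best_w, best_var = None, math.inf
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):  # candidate set of held assets
            mu_S = [mu[i] for i in S]
            if k == 1:
                if abs(mu_S[0] - r_target) > 1e-12:
                    continue  # a single asset meets the target only if its return equals it
                w_S = [1.0]
            else:
                V_S = [[V[i][j] for j in S] for i in S]
                ones = [1.0] * k
                x, y = solve(V_S, ones), solve(V_S, mu_S)
                a, b, c = dot(ones, x), dot(ones, y), dot(mu_S, y)
                d = a * c - b * b
                if d <= 1e-12 * a * c:
                    continue  # degenerate subset (equal returns); skipped in this sketch
                w_S = [((c - b * r_target) * xi + (a * r_target - b) * yi) / d
                       for xi, yi in zip(x, y)]
            if min(w_S) < -1e-12:
                continue  # this subset would require a short position
            w = [0.0] * n
            for i, wi in zip(S, w_S):
                w[i] = wi
            var = quad_form(w, V)
            if var < best_var:
                best_w, best_var = w, var
    return best_w, best_var  # (None, inf) if r_target is outside [min(mu), max(mu)]


# Hypothetical inputs (illustrative only).
mu = [0.05, 0.08, 0.12]
V = [[0.010, 0.002, 0.001],
     [0.002, 0.040, 0.006],
     [0.001, 0.006, 0.090]]
w_ns, var_ns = no_shorts_portfolio(mu, V, 0.10)
print(w_ns, var_ns)
```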
An example of the efficient frontier and the no-shorts frontier is shown in Figure 3.2.

Quadratic programming and convex optimization are discussed in more detail in the section on Convex optimization.
3.2.4 Efficient-market hypothesis
- Efficient-market hypothesis
- Eugene Fama (b. 1939)
- Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. 13
13 Fama (1970).
3.2.5 Lessons of MPT
Markowitz:
[I]n trying to make variance small it is not enough to invest in many securities. It is necessary to avoid investing in securities with high covariances among themselves. We should diversify across industries because firms in different industries, especially industries with different economic characteristics, have lower covariances than firms within an industry. 14
14 Markowitz (1952), p. 89.
Dalio: “The Holy Grail”, see Figure 3.3.

3.3 Estimation of covariance matrices
3.3.1 Overview
This is how we estimate \(V\) (and \(\mu\)).
- Estimation of covariance matrices
- Marsaglia, G. (1964). Conditional means and covariances of normal variables with singular covariance matrix. 15
- Coqueret, G. & Milhau, V. (2014). Estimating covariance matrices for portfolio optimization 16
- Fan, J., Liao, Y., & Liu, H. (2015). An overview on the estimation of large covariance and precision matrices. 17
- Ayyala, D.N. (2020). High-dimensional statistical inference: Theoretical development to data analytics. 18

Software:
3.3.2 Sample mean and covariance
TODO:
- Sample mean and covariance
- Rolling mean and covariance
3.3.3 Online mean and covariance
- Algorithms for calculating variance
- Welford, B.P. (1962). Note on a method for calculating corrected sums of squares and products. 19
- Neely, P.M. (1966). Comparison of several algorithms for computation of means, standard deviations and correlation coefficients. 20
- Youngs, E.A. & Cramer, E.M. (1971). Some results relevant to choice of sum and sum-of-product algorithms. 21
- Ling, R.F. (1974). Comparison of several algorithms for computing sample means and variances. 22
- Chan, T.F., Golub, G.H., & LeVeque, R.J. (1979). Updating formulae and a pairwise algorithm for computing sample variances. 23
- Pébay, P. (2008). Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments. 24
- Finch, T. (2009). Incremental calculation of weighted mean and variance. 25
- Cook, J.D. (2014). Accurately computing running variance.
- Meng, X. (2015). Simpler online updates for arbitrary-order central moments. 26
- Pébay, P., Terriberry, T.B., Kolla, H. & Bennett, J. (2016). Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights. 27
- Schubert, E. & Gertz, M. (2018). Numerically stable parallel computation of (co-)variance. 28
- Chen, C. (2019). Welford algorithm for updating variance.
19 Welford (1962).
20 Neely (1966).
21 Youngs & Cramer (1971).
22 Ling (1974).
23 Chan, Golub, & LeVeque (1979).
24 Pébay (2008).
25 Finch (2009).
26 Meng (2015).
27 Pébay, Terriberry, Kolla, & Bennett (2016).
28 Schubert & Gertz (2018).
Univariate central moments:
\[ \mu_{p} = \mathbb{E}\left[(x - \mu)^{p}\right] \]
\[ M_{p} = \sum_{i=1}^{n} (x_i - \mu)^{p} \]
\[ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \,, \qquad \hat{\mu}_{p} = \frac{M_{p}}{n} \,, \qquad \sigma^2 = \frac{M_{2}}{n} \]
Online mean:
\[ \delta \equiv x_n - \mu_{n - 1} \]
\[ \hat{\mu}_{n} = \mu_{n-1} + \frac{\delta}{n} \]
Online variance:
\[ S_{n} = M_{2,n} \]
\[ \hat{\sigma}^2 = \frac{S_{n}}{n-1} \]
where dividing by \(n-1\) rather than \(n\) is Bessel’s correction for the sample variance.
Incrementally,
\[ S_{n} = S_{n-1} + (x_n - \mu_{n - 1}) (x_n - \mu_n) \]
Note that for \(n > 1\),
\[ (x_n - \mu_n) = \frac{n-1}{n} (x_n - \mu_{n - 1}) \]
Therefore,
\[ S_{n} = S_{n-1} + \frac{n-1}{n} (x_n - \mu_{n - 1}) (x_n - \mu_{n - 1}) \]
\[ S_{n} = S_{n-1} + \frac{n-1}{n} \delta^2 = S_{n-1} + \delta \left( \delta - \frac{\delta}{n} \right) \]
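The update above is Welford’s algorithm. A minimal plain-Python sketch, with a hypothetical function name, that returns the running mean and the Bessel-corrected sample variance in one pass:

```python
def welford(xs):
    """One-pass mean and Bessel-corrected sample variance (Welford's algorithm)."""
    n, mean, S = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean          # delta = x_n - mu_{n-1}
        mean += delta / n         # mu_n = mu_{n-1} + delta / n
        S += delta * (x - mean)   # S_n = S_{n-1} + (x_n - mu_{n-1}) (x_n - mu_n)
    if n < 2:
        raise ValueError("need at least two observations for a sample variance")
    return mean, S / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(welford(data))                      # expect mean 5 and sample variance 32/7, up to rounding
print(welford([1e9 + x for x in data]))   # essentially the same variance despite the large offset
```

Shifting the data by a large constant leaves the variance essentially unchanged here, whereas the one-pass textbook formula \(\left(\sum x_i^2 - n\,\mu^2\right)/(n-1)\) can lose most of its significant digits to cancellation.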
Online covariance (Welford algorithm):
\[ C_{n}(x, y) = C_{n-1} + (x_n - \bar{x}_{n - 1}) (y_n - \bar{y}_n) = C_{n-1} + \delta_{x} \delta_{y}^\prime \]
\[ C_{n}(x, y) = C_{n-1} + \frac{n-1}{n} (x_n - \bar{x}_{n - 1}) (y_n - \bar{y}_{n - 1}) = C_{n-1} + \frac{n-1}{n} \delta_{x} \delta_{y} \]
\[ \hat{V}_{xy} = \frac{C_{n}(x, y)}{n-1} \]
Matrix form:
\[ C_{n} = C_{n-1} + \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right) \left( \vec{x}_{n} - \vec{\mu}_{n} \right)^\intercal = C_{n-1} + \vec{\delta} \: \vec{\delta^\prime}^\intercal \]
\[ C_{n} = C_{n-1} + \frac{n-1}{n} \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right) \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right)^\intercal = C_{n-1} + \frac{n-1}{n} S(\vec{x}_{n}, \vec{\mu}_{n-1}) \]
\[ \hat{V} = \frac{C_{n}}{n-1} \]
Note that the update term for the online covariance is a term in a scatter matrix, \(S\), built from the currently observed data, \(\vec{x}_{n}\), and the previous means, \(\vec{\mu}_{n-1}\). The \(\vec{\delta} \: \vec{\delta^\prime}^\intercal\) form is also convenient because it carries the \((n-1)/n\) factor implicitly and generalizes readily to weighted updates.
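The matrix form of the update translates directly into code. The sketch below is a minimal plain-Python version (hypothetical function name; observations given as equal-length tuples) that accumulates \(C_{n}\) with the \(\vec{\delta} \: \vec{\delta^\prime}^\intercal\) outer product and returns \(\hat{V} = C_{n}/(n-1)\).

```python
def welford_cov(rows):
    """One-pass mean vector and sample covariance matrix (matrix form of Welford's update)."""
    n, mean, C = 0, None, None
    for x in rows:
        if mean is None:  # size the accumulators from the first observation
            k = len(x)
            mean = [0.0] * k
            C = [[0.0] * k for _ in range(k)]
        n += 1
        delta = [xi - mi for xi, mi in zip(x, mean)]         # x_n - mu_{n-1}
        mean = [mi + di / n for mi, di in zip(mean, delta)]  # mu_n
        delta_p = [xi - mi for xi, mi in zip(x, mean)]       # x_n - mu_n
        for i in range(k):
            for j in range(k):
                C[i][j] += delta[i] * delta_p[j]             # C_n = C_{n-1} + delta delta'^T
    if n < 2:
        raise ValueError("need at least two observations for a sample covariance")
    V_hat = [[cij / (n - 1) for cij in row] for row in C]    # Bessel's correction
    return mean, V_hat


rows = [(1.0, 2.0), (2.0, 1.0), (4.0, 5.0), (3.0, 7.0)]  # hypothetical observations
mean, V_hat = welford_cov(rows)
print(mean, V_hat)
```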
Weighted mean:
\[ \hat{\mu}_{n} = \mu_{n-1} + \frac{w_{n,n}}{W_n} \delta = \mu_{n-1} + \frac{w_{n,n}}{W_n} (x_n - \mu_{n - 1}) \]
where
\[ W_{n} = \sum_{i=1}^{n} w_{n,i} \]
Weighted covariance:
\[ C_{n} = \frac{W_n - w_{n,n}}{W_{n-1}} C_{n-1} + w_{n,n} \left( x_{n} - \bar{x}_{n-1} \right) \left( y_{n} - \bar{y}_{n} \right) \]
\[ C_{n} = \frac{W_n - w_{n,n}}{W_{n-1}} C_{n-1} + w_{n,n} \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right) \left( \vec{x}_{n} - \vec{\mu}_{n} \right)^\intercal \]
and the covariance estimate is
\[ \hat{V} = \frac{C_{n}}{W_{n}} \]
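A minimal sketch of the weighted update, assuming each observation carries a fixed positive weight that does not change as new data arrive, so \(w_{n,i} = w_i\) and the factor \((W_n - w_{n,n})/W_{n-1}\) equals 1. Time-varying weights would need that rescaling of \(C_{n-1}\), which this sketch omits. The function name is hypothetical.

```python
def weighted_welford_cov(rows, weights):
    """One-pass weighted mean vector and covariance for fixed positive weights.

    With fixed weights W_n - w_{n,n} = W_{n-1}, so C_{n-1} is not rescaled.
    """
    W, mean, C = 0.0, None, None
    for x, w in zip(rows, weights):
        if mean is None:
            k = len(x)
            mean = [0.0] * k
            C = [[0.0] * k for _ in range(k)]
        W += w                                                     # W_n
        delta = [xi - mi for xi, mi in zip(x, mean)]               # x_n - mu_{n-1}
        mean = [mi + (w / W) * di for mi, di in zip(mean, delta)]  # mu_n
        delta_p = [xi - mi for xi, mi in zip(x, mean)]             # x_n - mu_n
        for i in range(k):
            for j in range(k):
                C[i][j] += w * delta[i] * delta_p[j]
    V_hat = [[cij / W for cij in row] for row in C]                # V = C_n / W_n
    return mean, V_hat


rows = [(1.0, 2.0), (2.0, 1.0), (4.0, 5.0)]  # hypothetical observations
weights = [1.0, 2.0, 1.0]                    # hypothetical weights
mean_w, V_w = weighted_welford_cov(rows, weights)
print(mean_w, V_w)
```

With integer weights this reproduces the population covariance of the data with each observation repeated \(w_i\) times, since \(\hat{V} = C_n/W_n\).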
Exponential-weighted mean:
\[ \alpha = 1 - \mathrm{exp}\left( \frac{-\Delta{}t}{\tau} \right) \simeq \frac{\Delta{}t}{\tau} \]
\[ \hat{\mu}_{n} = \mu_{n-1} + \alpha (x_{n} - \mu_{n-1}) = (1 - \alpha) \mu_{n-1} + \alpha x_{n} \]
Exponential-weighted covariance:
\[ C_{n} = (1 - \alpha) C_{n-1} + \alpha \left( x_{n} - \bar{x}_{n-1} \right) \left( y_{n} - \bar{y}_{n} \right) \]
\[ C_{n} = (1 - \alpha) C_{n-1} + \alpha \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right) \left( \vec{x}_{n} - \vec{\mu}_{n} \right)^\intercal \]
where, by summing a geometric series, one can show that for exponential weighting with the recursion initialized at \(\mu_{1} = x_{1}\), \(W_{n} = 1\), so \(\hat{V} = C_{n}\).
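A minimal plain-Python sketch of the exponential-weighted recursions, assuming the recursion starts at \(\vec{\mu}_{1} = \vec{x}_{1}\) and \(C_{1} = 0\); the function names and the values of \(\Delta t\) and \(\tau\) are hypothetical. It computes \(\alpha\) with `math.expm1`, which avoids the cancellation in \(1 - e^{-\Delta t/\tau}\) when \(\Delta t \ll \tau\).

```python
import math


def ew_alpha(dt, tau):
    """alpha = 1 - exp(-dt / tau), approximately dt / tau when dt << tau."""
    return -math.expm1(-dt / tau)


def ew_mean_cov(rows, alpha):
    """Exponentially weighted mean vector and covariance, initialized with mu_1 = x_1, C_1 = 0."""
    it = iter(rows)
    mean = [float(v) for v in next(it)]
    k = len(mean)
    C = [[0.0] * k for _ in range(k)]
    for x in it:
        delta = [xi - mi for xi, mi in zip(x, mean)]             # x_n - mu_{n-1}
        mean = [mi + alpha * di for mi, di in zip(mean, delta)]  # (1 - alpha) mu_{n-1} + alpha x_n
        delta_p = [xi - mi for xi, mi in zip(x, mean)]           # x_n - mu_n
        C = [[(1.0 - alpha) * C[i][j] + alpha * delta[i] * delta_p[j]
              for j in range(k)] for i in range(k)]
    return mean, C  # with this initialization the weights sum to 1, so V_hat = C


rows = [(1.0, 2.0), (2.0, 1.0), (4.0, 5.0), (3.0, 7.0)]  # hypothetical observations
alpha = ew_alpha(dt=1.0, tau=3.0)
mean, V_hat = ew_mean_cov(rows, alpha)
print(alpha, mean, V_hat)
```

Unrolling the recursion shows the implied weights: \((1-\alpha)^{n-1}\) on the first observation and \(\alpha(1-\alpha)^{n-i}\) on observation \(i \geq 2\), which sum to 1.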
Rolling mean (reverse Welford algorithm):
\[ \vec{\mu}_{n-1} = \vec{\mu}_{n} - \frac{1}{n-1} \left( \vec{x}_{n} - \vec{\mu}_{n} \right) \]
Rolling covariance (reverse Welford algorithm):
\[ C_{n-1} = C_{n} - \left( \vec{x}_{n} - \vec{\mu}_{n-1} \right) \left( \vec{x}_{n} - \vec{\mu}_{n} \right)^\intercal = C_{n} - \vec{\delta} \: \vec{\delta^\prime}^\intercal \]
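Because the mean and covariance of a set of observations do not depend on the order in which they were added, the reverse update can remove any observation, not only the most recent one, by using that observation in place of \(\vec{x}_{n}\). A rolling window then alternates a forward Welford update for the newest observation with a reverse update for the oldest. A minimal plain-Python sketch with hypothetical names, assuming a window of at least two observations:

```python
def add_obs(state, x):
    """Forward Welford update: include observation x."""
    n, mean, C = state
    n += 1
    delta = [xi - mi for xi, mi in zip(x, mean)]           # x - mu_{n-1}
    mean = [mi + di / n for mi, di in zip(mean, delta)]    # mu_n
    delta_p = [xi - mi for xi, mi in zip(x, mean)]         # x - mu_n
    C = [[C[i][j] + delta[i] * delta_p[j] for j in range(len(x))] for i in range(len(x))]
    return n, mean, C


def remove_obs(state, x):
    """Reverse Welford update: drop observation x (any member of the current set)."""
    n, mean, C = state
    prev = [mi - (xi - mi) / (n - 1) for xi, mi in zip(x, mean)]  # mu_{n-1}
    delta = [xi - pm for xi, pm in zip(x, prev)]                  # x - mu_{n-1}
    delta_p = [xi - mi for xi, mi in zip(x, mean)]                # x - mu_n
    C = [[C[i][j] - delta[i] * delta_p[j] for j in range(len(x))] for i in range(len(x))]
    return n - 1, prev, C


def rolling_cov(rows, window):
    """Return a list of (mean, sample covariance) for each full window of consecutive rows."""
    if window < 2:
        raise ValueError("window must contain at least two observations")
    k = len(rows[0])
    state = (0, [0.0] * k, [[0.0] * k for _ in range(k)])
    out = []
    for t, x in enumerate(rows):
        state = add_obs(state, x)
        if t >= window:
            state = remove_obs(state, rows[t - window])  # drop the oldest observation
        if t >= window - 1:
            n, mean, C = state
            out.append((mean, [[cij / (n - 1) for cij in row] for row in C]))
    return out


rows = [(1.0, 2.0), (2.0, 1.0), (4.0, 5.0), (3.0, 7.0), (0.5, -1.0)]  # hypothetical data
for mean, V_hat in rolling_cov(rows, window=3):
    print(mean, V_hat)
```

Over long series the reverse updates can accumulate rounding error, so periodically recomputing the window from scratch is a cautious choice.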
3.3.4 Shrinkage estimators
- Shrinkage
- Ledoit, O. & Wolf, M. (2001). Honey, I shrunk the sample covariance matrix. 29
- Ledoit, O. & Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. 30
3.3.5 Precision matrices
- Precision matrices
- Galloway, M. (2019). Shrinking characteristics of precision matrix estimators: An illustration via regression. 31
- Bax, K., Taufer, E., & Paterlini, S. (2022). A generalized precision matrix for t-Student distributions in portfolio optimization. 32
- Dutta, S. & Jain, S. (2023). Precision versus shrinkage: A comparative analysis of covariance estimation methods for portfolio allocation. 33
TODO:
- Why are precision matrices sparse?
3.4 Convex optimization
This is how we minimize \(\sigma\).
- Linear programming
- George Dantzig (1914-2005)
- Quadratic programming
- No-shorts efficient frontier
- Karush-Kuhn-Tucker (KKT) conditions
- Jagannathan, R. & Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. 34
- Tam, A.S. (2021). Lagrangians and portfolio optimization.
- Markowitz’s Critical Line Algorithm (CLA)
- Markowitz, H.M. (1956). The optimization of a quadratic function subject to linear constraints. 35
- Bailey, D.H. & López de Prado, M. (2013). An open-source implementation of the critical-line algorithm for portfolio optimization. 36
- Markowitz, H.M., Starer, D., Fram, H., & Gerber, S. (2019). Avoiding the downside: A practical review of the Critical Line Algorithm for mean-semivariance portfolio optimization. 37
- Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. 38
- Software:
34 Jagannathan & Ma (2003).
35 Markowitz (1956).
36 Bailey & López de Prado (2013).
37 Markowitz, Starer, Fram, & Gerber (2019).
38 Boyd & Vandenberghe (2004).
TODO: Discuss optimizing the no-shorts frontier.
3.5 Fund theorems
3.5.1 Mutual fund separation theorem
- Mutual fund separation theorem
- Cass, D. & Stiglitz, J.E. (1970). The structure of investor preferences and asset returns, and separability in portfolio allocation: A contribution to the pure theory of mutual funds. 39
- Chamberlain, G. (1983). A characterization of the distributions that imply mean-variance utility functions. 40
- Owen, J. & Rabinovitch, R. (1983). On the class of elliptical distributions and their applications to the theory of portfolio choice. 41
Cass & Stiglitz:
[G]iven a market in which there are available \(n\) different assets, nonetheless all the opportunities relevant to the investor’s decision can be provided by a set of \(m\) (\(< n\)) “mutual funds,” i.e., a set of \(m\) linear combinations (with weights adding to one) of the available assets. 42
42 Cass & Stiglitz (1970), p. 122.
3.5.2 Two-fund theorem
We continue in the setting of a portfolio of risky assets only (the risk-free asset is considered in the next section).
Tobin43 is often credited as the first to note the Two-fund theorem, which Merton44 later set out more formally:
Merton:
Given \(m\) assets satisfying the conditions […], there are two portfolios (“mutual funds”) constructed from these \(m\) assets, such that all risk-averse individuals, who choose their portfolios so as to maximize utility functions dependent only on the mean and variance of their portfolios, will be indifferent in choosing between portfolios from among the original \(m\) assets or from these two funds. 45
45 Merton (1972), p. 1858.
Kasa:
Any portfolio on the efficient frontier can be written as a linear combination of two fixed efficient portfolios.
\[ \vec{w}_{\ast} = \psi \, \vec{w}_{1} + (1-\psi) \, \vec{w}_{2} \]
TODO: reparameterize? 46
46 TODO: Throughout this we have parameterized \(\psi\) such that as it goes from 0 to 1, we go from holding portfolio 2 to portfolio 1. Let’s reparameterize so that \(\psi \rightarrow (1-\psi)\).
3.5.3 One-fund theorem
Now we consider adding the possibility of holding a risk-free asset with a risk-free return, \(r_\mathrm{f}\).
One-fund theorem:
Kwok:
Any efficient portfolio can be expressed as a [linear] combination of the risk free asset and the portfolio (or fund) represented by \(M\).
Kasa:
Any portfolio on the efficient frontier can be written as a linear combination of one fixed efficient non-risk-free portfolio and the risk-free asset.
The portfolio weights are
\[ \vec{w}_{\ast} = \kappa \, \vec{w}_\mathrm{f} + (1-\kappa) \, \vec{w}_\mathrm{tan} \]
The portfolio return is
\[ r_{\ast} = \kappa \, r_\mathrm{f} + (1-\kappa) \, r_\mathrm{tan} \]
The portfolio standard deviation is
\[ \sigma_{\ast} = \left| 1-\kappa \right| \sigma_\mathrm{tan} \]
Since every efficient portfolio is a linear combination of the risk-free asset (“cash”) and a single portfolio of risky assets (“stocks”), the efficient frontier is a line in return-risk space from the risk-free asset to the tangent portfolio, continuing beyond the tangent portfolio if one can borrow at the risk-free rate to invest more in it. This line is called the Capital Allocation Line because it represents the portfolios available depending on how much cash is deployed into risky assets in the market.
The functional form of the Capital Allocation Line is
\[ r_\mathrm{CAL}(\sigma) = r_\mathrm{f} + \sigma \, \sqrt{ a \, r_\mathrm{f}^{2} - 2 \, b \, r_\mathrm{f} + c} \]
TODO: Double-check the expression and example values of this slope.
Note that while the shape of the efficient frontier of risky assets is unchanged by introducing or varying the risk-free rate of return, which portfolio on that frontier is the tangent portfolio depends on the risk-free rate of return.
The tangent portfolio with a risk-free asset is
\[ \vec{w}_\mathrm{tan} = \frac{V^{-1} \, (\vec{\mu} - r_\mathrm{f} \, \vec{1})}{\vec{1}^\intercal \, V^{-1} \, (\vec{\mu} - r_\mathrm{f} \, \vec{1})} \]
It has a return
\[ r_\mathrm{tan} = \vec{\mu} \cdot \vec{w}_\mathrm{tan} = \frac{c - b \, r_\mathrm{f}}{b - a \, r_\mathrm{f}} \]
and a variance
\[ \sigma_\mathrm{tan}^{2} = \frac{(\vec{\mu} - r_\mathrm{f} \, \vec{1})^\intercal \, V^{-1} \, (\vec{\mu} - r_\mathrm{f} \, \vec{1})}{\left( \vec{1}^\intercal \, V^{-1} \, (\vec{\mu} - r_\mathrm{f} \, \vec{1}) \right)^2} = \frac{a \, r_\mathrm{f}^2 - 2 \, b \, r_\mathrm{f} + c}{(b - a \, r_\mathrm{f})^2} \]
The tangent portfolio is the portfolio with the maximum Sharpe ratio, \(S_i\).
\[ S_i \equiv \frac{ r_i - r_\mathrm{f} }{ \sigma_i } \label{eq:sharpe_ratio} \]
The Sharpe ratio is a measure of how much excess return an asset had over a risk-free asset, adjusted for the risk as measured by the standard deviation of return.
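The tangent-portfolio formulas and the slope of the Capital Allocation Line can be checked against each other numerically: the Sharpe ratio of \(\vec{w}_\mathrm{tan}\) should equal \(\sqrt{a \, r_\mathrm{f}^{2} - 2 \, b \, r_\mathrm{f} + c}\) when \(r_\mathrm{f} < r_\mathrm{min} = b/a\), and no other portfolio of the risky assets should have a larger Sharpe ratio. A minimal plain-Python sketch with hypothetical inputs and function names:

```python
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(yi)] for row, yi in zip(A, y)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad_form(w, V):
    return sum(w[i] * V[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def tangent_portfolio(mu, V, rf):
    """Tangent portfolio for risk-free rate rf; assumes rf < r_min = b / a."""
    z = solve(V, [m - rf for m in mu])   # V^{-1} (mu - rf 1)
    total = sum(z)                       # 1^T V^{-1} (mu - rf 1) = b - a rf
    return [zi / total for zi in z]


def sharpe_ratio(w, mu, V, rf):
    return (dot(w, mu) - rf) / math.sqrt(quad_form(w, V))


def cal_slope(mu, V, rf):
    """Slope of the Capital Allocation Line: sqrt(a rf^2 - 2 b rf + c)."""
    ones = [1.0] * len(mu)
    x, y = solve(V, ones), solve(V, mu)
    a, b, c = dot(ones, x), dot(ones, y), dot(mu, y)
    return math.sqrt(a * rf**2 - 2 * b * rf + c)


# Hypothetical inputs (illustrative only).
mu = [0.05, 0.08, 0.12]
V = [[0.010, 0.002, 0.001],
     [0.002, 0.040, 0.006],
     [0.001, 0.006, 0.090]]
rf = 0.02
w_tan = tangent_portfolio(mu, V, rf)
print(w_tan, sharpe_ratio(w_tan, mu, V, rf), cal_slope(mu, V, rf))
```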
TODO:
- Citation needed for the one-fund theorem
- Related to the efficient-market hypothesis: in equilibrium, the tangent portfolio becomes the market portfolio
3.6 Capital asset pricing model
Keywords:
- Capital Asset Pricing Model (CAPM)
- William F. Sharpe (b. 1934)
- Beta
- Alpha
- Security Characteristic Line (SCL)
- Security Market Line (SML)
- Jensen’s alpha
- Treynor ratio
Background:
- Jensen, M. (1968). The performance of mutual funds in the period 1945-1964. 47
- Sharpe, W.F. (1963). A simplified model for portfolio analysis. 48
- Sharpe, W.F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. 49
- Sharpe, W.F. (1999). Portfolio Theory and Capital Markets. 50
- Sharpe, W.F. (1990). Nobel lecture: Capital asset prices with and without negative holdings. 51
\[ \beta_i = \frac{ \mathrm{Cov}(r_i, r_m) }{ \mathrm{Var}(r_m) } = \mathrm{Cor}(r_i, r_m) \: \frac{\sigma_i}{\sigma_m} \label{eq:sharpe_beta} \]
Plotting \(r_i\) against \(r_m\) and accumulating points over time, \(\alpha_{i}\) and \(\beta_{i}\) can be estimated by linear regression:
SCL:
\[ r_{it} - r_\mathrm{f} = \hat{\alpha}_i + \hat{\beta}_i \, (r_{mt} - r_\mathrm{f}) + \varepsilon_{it} \label{eq:alpha_beta_regression} \]
The Security Characteristic Line (SCL) is the line fit to a particular asset, \(i\), in the space of excess returns, \((r_i - r_\mathrm{f})\) vs \((r_m - r_\mathrm{f})\), with slope \(\hat{\beta}_{i}\) and \((r_i - r_\mathrm{f})\)-intercept \(\hat{\alpha}_i\).
Jensen’s alpha uses the same form evaluated at a particular time point, taking \(\hat{\beta}_{i}\) from a historical fit but computing \(\alpha_{i}\) from the returns at that time rather than from the fit.
\[ \alpha_{i} = (r_i - r_\mathrm{f}) - \hat{\beta}_{i} \, (r_m - r_\mathrm{f}) \label{eq:jensen_alpha} \]
TODO: Compare with this:
\[ \alpha_{i} = (r_i - r_\mathrm{f}) - \hat{\beta}_{i} \, (\mu_{m} - r_\mathrm{f}) \]
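A minimal plain-Python sketch of the SCL regression and of Jensen’s alpha, assuming a constant risk-free rate over the sample (in practice \(r_\mathrm{f}\) is usually its own time series). With a constant \(r_\mathrm{f}\), the OLS slope equals the sample \(\mathrm{Cov}(r_i, r_m)/\mathrm{Var}(r_m)\). The function names and return series are hypothetical.

```python
def scl_fit(r_i, r_m, rf):
    """OLS estimates (alpha_hat, beta_hat) of r_i - rf = alpha + beta (r_m - rf) + eps.

    Assumes a constant risk-free rate rf over the sample (a simplification).
    """
    y = [ri - rf for ri in r_i]
    x = [rm - rf for rm in r_m]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx            # slope of the SCL
    alpha = y_bar - beta * x_bar  # excess-return intercept
    return alpha, beta


def jensens_alpha(r_i, r_m, rf, beta_hat):
    """Jensen's alpha for a single period, given a historically fitted beta."""
    return (r_i - rf) - beta_hat * (r_m - rf)


# Hypothetical market returns; the asset is constructed with alpha = 0.002, beta = 1.3, no noise.
r_m = [0.010, -0.020, 0.030, 0.015, -0.005]
rf = 0.001
r_i = [rf + 0.002 + 1.3 * (rm - rf) for rm in r_m]
alpha_hat, beta_hat = scl_fit(r_i, r_m, rf)
print(alpha_hat, beta_hat, jensens_alpha(0.05, 0.03, rf, beta_hat))
```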
The Security Market Line (SML), plotted in \(r_i\) vs \(\beta_i\) space, passes through the risk-free asset at \((0, r_\mathrm{f})\) and the market portfolio at \((\beta_m, r_m) = (1, r_m)\).
SML:
\[ \mathbb{E}(r_i) = r_\mathrm{f} + \beta_i \left( \mathbb{E}(r_m) - r_\mathrm{f} \right) \]

Treynor ratio:
\[ T_i \equiv \frac{ r_i - r_\mathrm{f} }{ \beta_i } \label{eq:treynor_ratio} \]
- Gibbons, M., Ross, S., & Shanken, J. (1989). A test of the efficiency of a given portfolio. 52
- Luenberger, D.G. (1998). Investment Science. 53
3.7 Black-Litterman model
3.8 Factor models
3.8.1 Factor analysis
3.8.2 Fama-French model
- Fama-French three-factor model
- Fama, E.F. & French, K.R. (1992). The cross-section of expected stock returns. 56
56 Fama & French (1992).
3.8.3 Carhart four-factor model
- Carhart four-factor model
- Yontar, T. & Benham, F. (2016). US small cap equity: Which benchmark is best? 57
57 Yontar & Benham (2016).
3.9 Risk preferences
- Risk preferences
- Kelly criterion
- Kelly, J.L. (1956). A new interpretation of information rate. 58
- General consumption/investment problem
- Gambler’s ruin problem
- Merton’s portfolio problem
- Merton, R.C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. 59
- Karatzas, I., Lehoczky, J.P., Sethi, S.P., & Shreve, S.E. (1986). Explicit solution of a general consumption/investment problem. 60
- Conditional Value at Risk (CVaR) or Expected shortfall
- Rockafellar, R.T. & Uryasev, S. (2000). Optimization of conditional value-at-risk. 61
3.10 Postmodern portfolio theory
3.10.1 Criticisms of MPT
Criticisms of MPT:
- Sensitivity of portfolio weights to the estimates of \(\hat{\mu}\) and \(\hat{V}\).
- Error propagation
- Even assuming Gaussian distributed returns
- Problem of induction
- Past performance is no guarantee of future results
- Criticisms of using historical estimators of \(\hat{\mu}\) and \(\hat{V}\)
- Non-Gaussian distributed returns
- Heteroskedasticity
- Variance is not a good measure of risk
- Downside risk is better
- Criticisms of the Efficient Market Hypothesis
3.10.2 Error propagation
- Lo, A.W. (2002). The statistics of Sharpe ratios. 62
- Bodnar, T. & Schmid, W. (2011). On the exact distribution of the estimated expected utility portfolio weights: Theory and applications. 63
- Bodnar, T., Mazur, S., & Podgórski, K. (2016). Singular inverse Wishart distribution and its application to portfolio theory. 64
3.10.3 Heteroskedasticity
3.10.4 Downside risk
- Downside risk, semi-variance, semi-deviation, target semi-variance (TSV), target semi-deviation
- Markowitz, H.M., Starer, D., Fram, H., & Gerber, S. (2019). Avoiding the downside: A practical review of the Critical Line Algorithm for mean-semivariance portfolio optimization. 65
- Mean-Semivariance frontier in scikit-portfolio
65 Markowitz et al. (2019).
\[ \mathrm{TSV}(r_i, r_\mathrm{tan}) = \mathbb{E}\left[ (r_i - r_\mathrm{tan})^2 \: \mathbb{1}_{\{r_i < r_\mathrm{tan}\}} \right] \label{eq:target_semi_variance} \]
\[ \mathrm{TSD}(r_i, r_\mathrm{tan}) = \sqrt{\mathrm{TSV}(r_i, r_\mathrm{tan})} \label{eq:target_semi_deviation} \]
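A minimal sketch of the sample estimates, averaging over all observations so that returns at or above the target contribute zero, as the indicator in the definition implies. Here `target` plays the role of \(r_\mathrm{tan}\); the function names and returns are hypothetical.

```python
import math


def target_semivariance(returns, target):
    """Sample estimate of TSV = E[(r - target)^2 1{r < target}].

    Divides by the total number of observations, matching the expectation over all outcomes.
    """
    return sum((r - target) ** 2 for r in returns if r < target) / len(returns)


def target_semideviation(returns, target):
    """TSD = sqrt(TSV)."""
    return math.sqrt(target_semivariance(returns, target))


returns = [0.1, -0.1, 0.0, 0.2]  # hypothetical returns
print(target_semivariance(returns, 0.0), target_semideviation(returns, 0.0))
```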
3.10.5 Criticisms of the Efficient Market Hypothesis
- Bessembinder, H. & Chan, K. (1998). Market efficiency and the returns to technical analysis. 66
- TODO: Room for fundamentally-motivated indicator analysis — sits between fundamental analysis and technical analysis.
66 Bessembinder & Chan (1998).
3.10.6 More
- Rom, B.M. & Ferguson, K. (1993). Post-modern portfolio theory comes of age. 67
- Sortino, F. (2010). The Sortino Framework for Constructing Portfolios. 68
- Elton, E.J., Gruber, M.J., Brown, S.J., & Goetzmann, W.N. (2014). Modern Portfolio Theory and Investment Analysis. 69
- Low-volatility anomaly
3.11 Hierarchical risk analysis
- Asset trees
- Stock correlation network
- Mantegna, R.N. (1998). Hierarchical structure in financial markets. 70
- Onnela, J.P., Chakraborti, A., Kaski, K., Kertész, J., & Kanto, A. (2003). Dynamics of market correlations: Taxonomy and portfolio analysis. 71
- Onnela, J.P., Kaski, K., & Kertész, J. (2004). Clustering and information in correlation based financial networks. 72
- Hierarchical Risk Parity (HRP)
- López de Prado, M. (2016). Building diversified portfolios that outperform out-of-sample. 73
- López de Prado, M. (2018). Advances in Financial Machine Learning. 74
- Lohre, H., Rother, C., & Schäfer, K.A. (2020). Hierarchical Risk Parity: Accounting for tail dependencies in multi-asset multi-factor allocations. 75
- Raffinot, T. (2018). Hierarchical clustering-based asset allocation. (HCAA) 76
- Raffinot, T. (2018). The hierarchical equal risk contribution portfolio. (HERC) 77
- Hudson & Thames. (2024). The Modern Guide to Portfolio Optimization. 78
- Cotton, P. (2024). Hierarchical minimum variance portfolios: A unifying approach using Schur complements. 79
- Blogs:
70 Mantegna (1998).
71 Onnela et al. (2003).
72 Onnela, Kaski, & Kertész (2004).
73 López de Prado (2016).
74 López de Prado (2018).
75 Lohre, Rother, & Schäfer (2020).
76 Raffinot (2018a).
77 Raffinot (2018b).
78 Hudson & Thames (2024).
79 Cotton (2024).
Correlation distance between two assets:
\[ d_{ij} = \sqrt{\frac{1}{2} \left( 1 - \rho_{ij} \right)} \label{eq:hrp_distance} \]
Euclidean distance between two assets in \(n\)-asset space:
\[ \tilde{d}_{ij} = \sqrt{ \sum_{k=1}^{n} \left( d_{ki} - d_{kj} \right)^{2} } \label{eq:hrp_tilde_distance} \]
Note that \(\tilde{d}_{ij}\) is a function of the entire correlation matrix over all assets, whereas \(d_{ij}\) is defined for asset pairs.
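A minimal plain-Python sketch of both distances, assuming the correlation matrix is given as a nested list; the function names and the example matrix are hypothetical. The small clamp guards against correlations that exceed 1 by rounding.

```python
import math


def correlation_distance(rho):
    """d_ij = sqrt((1 - rho_ij) / 2), in [0, 1]."""
    return [[math.sqrt(max(0.0, 0.5 * (1.0 - r))) for r in row] for row in rho]


def distance_of_distances(d):
    """d~_ij = sqrt(sum_k (d_ki - d_kj)^2), using whole columns of d."""
    n = len(d)
    return [[math.sqrt(sum((d[k][i] - d[k][j]) ** 2 for k in range(n)))
             for j in range(n)] for i in range(n)]


rho = [[1.0, 0.6, -0.2],
       [0.6, 1.0, 0.1],
       [-0.2, 0.1, 1.0]]  # hypothetical correlation matrix
d = correlation_distance(rho)
d_tilde = distance_of_distances(d)
print(d, d_tilde)
```

Perfectly correlated assets get \(d_{ij} = 0\) and perfectly anti-correlated assets get \(d_{ij} = 1\).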
