Covariance Matrix
In portfolio theory, the sample covariance matrix \(\hat\Sigma\) is a critical input for both mean–variance optimization (the Markowitz portfolio) and risk‑parity approaches.
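When we speak of “covariance” in practice, we always mean the sample version, since the population covariance matrix is unknown and can only be estimated from data. The population covariance matrix is defined as
\[ \Sigma = \mathbb{E}\bigl[(\mathbf r_t - \boldsymbol\mu)(\mathbf r_t - \boldsymbol\mu)^\top\bigr], \]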
where:
\(\mathbf r_t \in \mathbb R^n\) is the random return vector at time \(t\).
\(\boldsymbol\mu = \mathbb{E}[\mathbf r_t]\) is the true mean return vector.
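Because \(\Sigma\) cannot be observed in real markets, a simulation is one way to see what "estimating" it means. You fix a covariance matrix yourself, draw returns from it, and average the estimates over many samples. The sketch below assumes normally distributed returns and a made‑up two‑asset \(\Sigma\); all names in it are illustrative. It compares two divisors for the sample covariance, \(N-1\) and \(N\), where \(N\) is the number of return observations per sample.

```python
import math
import random

# Made-up "true" population covariance for two assets (illustrative only).
TRUE_SIGMA = [[1.0, 0.5],
              [0.5, 2.0]]
# Lower-triangular Cholesky factor L with L L^T = TRUE_SIGMA.
CHOL = [[1.0, 0.0],
        [0.5, math.sqrt(1.75)]]


def draw_returns(n_obs, rng):
    """Draw n_obs vectors r_t = L z_t with z_t ~ N(0, I), so r_t ~ N(0, TRUE_SIGMA)."""
    rows = []
    for _ in range(n_obs):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append([CHOL[0][0] * z0, CHOL[1][0] * z0 + CHOL[1][1] * z1])
    return rows


def sample_cov(rows, ddof):
    """Sample covariance centered on the sample mean, divided by (N - ddof)."""
    n_obs, n = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n_obs for j in range(n)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n_obs - ddof)
             for j in range(n)] for i in range(n)]


def average_estimate(n_obs, trials, ddof, seed=0):
    """Average the estimator over many independent samples of size n_obs."""
    rng = random.Random(seed)
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(trials):
        est = sample_cov(draw_returns(n_obs, rng), ddof)
        for i in range(2):
            for j in range(2):
                total[i][j] += est[i][j] / trials
    return total


unbiased = average_estimate(n_obs=5, trials=20_000, ddof=1)  # divide by N - 1
biased = average_estimate(n_obs=5, trials=20_000, ddof=0)    # divide by N
print(unbiased)  # should be close to TRUE_SIGMA
print(biased)    # should be close to 0.8 * TRUE_SIGMA
```

With \(N = 5\), the \(N-1\) version should average close to the chosen \(\Sigma\). The \(N\) version should land near \(\tfrac45\Sigma\), which is the downward bias that Bessel's correction removes.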
We begin by computing the sample mean return vector of the \(n\) assets, \(\hat{\boldsymbol\mu}\in\mathbb R^n\), which is our estimate of the expected return vector \(\boldsymbol\mu\). It is the column‑wise average of each asset’s log‑returns over the \(T-1\) observations (\(T\) prices yield \(T-1\) returns). As detailed in the log‑returns section, “return” always means log‑return here; see Osborne’s work for why log‑returns are preferred.
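The following is a minimal sketch of the column‑wise average. It uses a tiny hypothetical return matrix of three observations for two assets, with values invented for illustration. Each entry of `mu_hat` is one column's sum divided by the number of rows, which is \(\frac{1}{N}R^\top\mathbf 1\) written element by element.

```python
# Hypothetical log-returns: 3 observations (rows) for 2 assets (columns).
R = [[0.010, -0.004],
     [-0.002, 0.006],
     [0.004, 0.001]]


def sample_mean(R):
    """Column-wise average: mu_hat[i] = (1/N) * sum over t of R[t][i]."""
    n_obs = len(R)
    return [sum(row[i] for row in R) / n_obs for i in range(len(R[0]))]


mu_hat = sample_mean(R)  # approximately [0.004, 0.001]
```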
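With \(\mathbf r_t\in\mathbb R^n\) denoting the vector of log‑returns at time \(t\), the sample covariance matrix is then
\[ \hat\Sigma = \frac{1}{T-2}\sum_{t=1}^{T-1}\bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr)\bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr)^\top, \]
where the factor \(1/(T-2)\) (Bessel's correction) is one less than the number of observations \(T-1\). This factor makes \(\hat\Sigma\) an unbiased estimator of \(\Sigma\). Dividing by \(T-1\) instead would, on average, understate the true covariance by a factor of \((T-2)/(T-1)\), because \(\hat{\boldsymbol\mu}\) is itself estimated from the same data.
Computation
To calculate the sample covariance matrix, we stack the log‑returns into a \((T-1)\times n\) return matrix
\[ R = \bigl[\,\mathbf r^{(1)}\;\; \mathbf r^{(2)}\;\; \cdots\;\; \mathbf r^{(n)}\,\bigr], \]
whose \(t\)-th row is \(\mathbf r_t^\top\) and whose \(i\)-th column
\[ \mathbf r^{(i)} = \bigl(r_1^{(i)},\, r_2^{(i)},\, \dots,\, r_{T-1}^{(i)}\bigr)^\top \]
contains the log‑returns of asset \(i\), computed from its prices \(p_t^{(i)}\):
\[ r_t^{(i)} = \ln\Bigl(\frac{p_{t+1}^{(i)}}{p_t^{(i)}}\Bigr), \qquad t = 1,\dots,T-1. \]
Then we center the return matrix by subtracting each column’s mean from its entries:
\[ \tilde R = R - \mathbf 1\,\hat{\boldsymbol\mu}^\top, \qquad \hat{\boldsymbol\mu} = \frac{1}{T-1}\,R^\top\mathbf 1, \]
where \(\mathbf 1\in\mathbb R^{T-1}\) is the all‑ones vector.
Finally, the sample covariance matrix is found as follows:
\[ \hat\Sigma = \frac{1}{T-2}\,\tilde R^\top\tilde R, \]
which is the sum of outer products above written in matrix form.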
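The steps above (stack, center, multiply) can be sketched end to end in plain Python. This is an illustrative implementation, not the code behind the numerical example. The helper names `log_returns` and `sample_covariance` are hypothetical, and the four days of prices are made up. Rows of `R` are the vectors \(\mathbf r_t\), and the code divides by one less than the number of return rows.

```python
import math


def log_returns(prices):
    """Build the (T-1) x n return matrix from a T x n price matrix.

    R[t][i] = ln(p_{t+1}^(i) / p_t^(i)); row t is the vector r_t.
    """
    return [[math.log(nxt / cur) for cur, nxt in zip(prices[t], prices[t + 1])]
            for t in range(len(prices) - 1)]


def sample_covariance(R):
    """Sigma_hat = R_tilde^T R_tilde / (N - 1), where N is the number of rows of R."""
    n_obs, n = len(R), len(R[0])
    if n_obs < 2:
        raise ValueError("need at least two return observations")
    mu_hat = [sum(row[i] for row in R) / n_obs for i in range(n)]
    # Center each column: R_tilde = R - 1 mu_hat^T.
    R_tilde = [[row[i] - mu_hat[i] for i in range(n)] for row in R]
    # (R_tilde^T R_tilde)[i][j] = sum over t of R_tilde[t][i] * R_tilde[t][j].
    return [[sum(row[i] * row[j] for row in R_tilde) / (n_obs - 1)
             for j in range(n)] for i in range(n)]


# Hypothetical closing prices: T = 4 days (rows) for n = 2 assets (columns).
prices = [[100.0, 50.0],
          [101.0, 49.5],
          [100.5, 50.5],
          [102.0, 50.0]]

R = log_returns(prices)           # 3 x 2 return matrix
Sigma_hat = sample_covariance(R)  # 2 x 2, divided by 3 - 1 = 2
```

In practice, `numpy.cov(R, rowvar=False)` computes the same quantity. With `rowvar=False`, columns are treated as variables, and the default normalization also divides by \(N-1\).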
Note: Both \(\hat{\boldsymbol\mu}\) and the centered matrix \(\tilde R\) are computed from log‑returns, so every entry of the sample covariance ultimately rests on log‑returns. Ensuring log‑returns are well justified (e.g., by the normality assumption in Osborne’s work) is therefore crucial, as it underpins every step of this construction.
Numerical Example: Sample Covariance Matrix
Using our nine days of log‑returns for six assets (AAPL, AMZN, GOOG, MSFT, TQQQ, TSLA) from the numerical example, we obtain:
| | AAPL | AMZN | GOOG | MSFT | TQQQ | TSLA |
|---|---|---|---|---|---|---|
| AAPL | 0.000299 | 0.000097 | -0.000012 | 0.000027 | -0.000035 | 0.000058 |
| AMZN | 0.000097 | 0.000285 | -0.000078 | 0.000254 | 0.000244 | 0.000202 |
| GOOG | -0.000012 | -0.000078 | 0.000851 | 0.000265 | 0.000189 | 0.000099 |
| MSFT | 0.000027 | 0.000254 | 0.000265 | 0.000587 | 0.000374 | 0.000011 |
| TQQQ | -0.000035 | 0.000244 | 0.000189 | 0.000374 | 0.000603 | 0.000328 |
| TSLA | 0.000058 | 0.000202 | 0.000099 | 0.000011 | 0.000328 | 0.000752 |
Diagonal entries are each asset’s daily variance (volatility squared).
Off‑diagonals are daily covariances, measuring how pairs of assets co‑move:
Positive values (e.g. AMZN‑MSFT) indicate returns tend to rise and fall together.
Negative values (e.g. AAPL‑TQQQ) indicate that returns tend to move in opposite directions.
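Covariances are hard to compare across pairs because they scale with each asset's volatility. Dividing by the two volatilities gives the correlation \(\rho_{ij} = \hat\sigma_{ij}/\sqrt{\hat\sigma_{ii}\hat\sigma_{jj}}\), which always lies between -1 and 1. The sketch below applies this to the table values, copied as printed, so their rounding carries through. The variable names are illustrative.

```python
import math

ASSETS = ["AAPL", "AMZN", "GOOG", "MSFT", "TQQQ", "TSLA"]
# Daily sample covariance matrix copied from the table above.
SIGMA = [
    [0.000299, 0.000097, -0.000012, 0.000027, -0.000035, 0.000058],
    [0.000097, 0.000285, -0.000078, 0.000254, 0.000244, 0.000202],
    [-0.000012, -0.000078, 0.000851, 0.000265, 0.000189, 0.000099],
    [0.000027, 0.000254, 0.000265, 0.000587, 0.000374, 0.000011],
    [-0.000035, 0.000244, 0.000189, 0.000374, 0.000603, 0.000328],
    [0.000058, 0.000202, 0.000099, 0.000011, 0.000328, 0.000752],
]


def correlation(sigma):
    """rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    n = len(sigma)
    vol = [math.sqrt(sigma[i][i]) for i in range(n)]
    return [[sigma[i][j] / (vol[i] * vol[j]) for j in range(n)] for i in range(n)]


rho = correlation(SIGMA)
for a, b in [("AMZN", "MSFT"), ("AAPL", "TQQQ")]:
    i, j = ASSETS.index(a), ASSETS.index(b)
    print(f"{a}-{b}: covariance {SIGMA[i][j]:+.6f}, correlation {rho[i][j]:+.2f}")
```

The sketch should report a correlation of about +0.62 for AMZN‑MSFT and about -0.08 for AAPL‑TQQQ. The negative AAPL‑TQQQ covariance therefore reflects only a weak tendency to move in opposite directions. With just nine observations behind each estimate, these numbers are noisy.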
Interpretation and Usage
In mean–variance optimization, \(\hat\Sigma\) enters as the risk (variance) term you minimize for a given expected return.
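This is a minimal sketch of how \(\hat\Sigma\) enters mean–variance optimization, using the table values. The helper `portfolio_variance` evaluates the risk term \(\mathbf w^\top\hat\Sigma\mathbf w\) for a hypothetical equal‑weight portfolio. As a special case with no return target, the sketch also finds the global minimum‑variance portfolio \(\mathbf w \propto \hat\Sigma^{-1}\mathbf 1\). That portfolio allows short positions and has weights summing to 1. It is computed with a small hand‑written linear solver, `solve`, which a library routine would replace in practice. All function names are illustrative.

```python
import math

# Daily sample covariance matrix copied from the table above.
# Order: AAPL, AMZN, GOOG, MSFT, TQQQ, TSLA.
SIGMA = [
    [0.000299, 0.000097, -0.000012, 0.000027, -0.000035, 0.000058],
    [0.000097, 0.000285, -0.000078, 0.000254, 0.000244, 0.000202],
    [-0.000012, -0.000078, 0.000851, 0.000265, 0.000189, 0.000099],
    [0.000027, 0.000254, 0.000265, 0.000587, 0.000374, 0.000011],
    [-0.000035, 0.000244, 0.000189, 0.000374, 0.000603, 0.000328],
    [0.000058, 0.000202, 0.000099, 0.000011, 0.000328, 0.000752],
]


def portfolio_variance(w, sigma):
    """Return w^T Sigma w, the risk term in mean-variance optimization."""
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Hypothetical equal-weight portfolio.
w_equal = [1.0 / 6] * 6
var_p = portfolio_variance(w_equal, SIGMA)
vol_p = math.sqrt(var_p)

# Global minimum-variance portfolio: w proportional to Sigma^{-1} 1, rescaled to sum to 1.
# No return target and no long-only constraint, so some weights may be negative.
x = solve(SIGMA, [1.0] * 6)
w_mv = [xi / sum(x) for xi in x]

print(f"equal weight: variance {var_p:.6f}, daily volatility {vol_p:.4f}")
print(f"minimum variance: variance {portfolio_variance(w_mv, SIGMA):.6f}")
```

For the equal‑weight portfolio, `var_p` is the average of all 36 entries, about 0.000206. That is well below the average single‑asset variance of about 0.000563, and this diversification effect is what the optimizer exploits.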
In risk‑parity, you assign weights so that each asset’s contribution to overall portfolio risk, computed via \(\hat\Sigma\), is equal.
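To make "equal risk contribution" concrete, consider a weight vector \(\mathbf w\). Asset \(i\)'s contribution to portfolio variance is \(w_i(\hat\Sigma\mathbf w)_i\), and these contributions sum to \(\mathbf w^\top\hat\Sigma\mathbf w\). The sketch below solves for equal contributions on the table's \(\hat\Sigma\) using cyclic coordinate descent on a log‑barrier formulation. That is one standard approach, assumed here for illustration, and not necessarily the method used elsewhere in this document. Function names are hypothetical, and the resulting weights are long‑only by construction.

```python
import math

# Daily sample covariance matrix copied from the table above.
# Order: AAPL, AMZN, GOOG, MSFT, TQQQ, TSLA.
SIGMA = [
    [0.000299, 0.000097, -0.000012, 0.000027, -0.000035, 0.000058],
    [0.000097, 0.000285, -0.000078, 0.000254, 0.000244, 0.000202],
    [-0.000012, -0.000078, 0.000851, 0.000265, 0.000189, 0.000099],
    [0.000027, 0.000254, 0.000265, 0.000587, 0.000374, 0.000011],
    [-0.000035, 0.000244, 0.000189, 0.000374, 0.000603, 0.000328],
    [0.000058, 0.000202, 0.000099, 0.000011, 0.000328, 0.000752],
]


def risk_contributions(w, sigma):
    """RC_i = w_i * (Sigma w)_i; the RC_i sum to the portfolio variance."""
    n = len(w)
    sw = [sum(sigma[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [w[i] * sw[i] for i in range(n)]


def equal_risk_weights(sigma, sweeps=1000, tol=1e-12):
    """Equal-risk-contribution weights via cyclic coordinate descent.

    Minimizes 0.5 * y^T Sigma y - b * sum(ln y_i) with b = 1/n, whose optimum
    satisfies y_i * (Sigma y)_i = b for every i. Each coordinate step takes the
    positive root of sigma_ii * y_i^2 + c_i * y_i - b = 0, where
    c_i = sum over j != i of sigma_ij * y_j. Rescaling y to sum to 1 keeps the
    contributions equal.
    """
    n = len(sigma)
    b = 1.0 / n
    y = [1.0 / math.sqrt(sigma[i][i]) for i in range(n)]  # inverse-volatility start
    for _ in range(sweeps):
        max_step = 0.0
        for i in range(n):
            c = sum(sigma[i][j] * y[j] for j in range(n) if j != i)
            new = (-c + math.sqrt(c * c + 4.0 * sigma[i][i] * b)) / (2.0 * sigma[i][i])
            max_step = max(max_step, abs(new - y[i]))
            y[i] = new
        if max_step < tol * max(y):  # relative convergence check
            break
    total = sum(y)
    return [v / total for v in y]


w_rp = equal_risk_weights(SIGMA)
print([round(v, 4) for v in w_rp])
print(risk_contributions(w_rp, SIGMA))
```

The second print should show six numerically equal contributions. When \(\hat\Sigma\) is positive definite, the objective is strictly convex, so each coordinate step is an exact one‑dimensional minimization and the iteration converges.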
Because both frameworks rely on the same \(\hat\Sigma\) built from log‑returns and the sample mean, the justification of log‑returns’ properties (e.g., approximate normality) propagates through to your portfolio construction rules.