Covariance Matrix

In portfolio theory, the sample covariance matrix $\hat\Sigma$ is a critical input for both mean–variance optimization (the Markowitz portfolio) and risk‑parity approaches.

In practice, “covariance” means the sample version, because the population covariance matrix is unobservable and has to be estimated from data. The population covariance matrix is defined as

\[ \Sigma = \mathbb{E}\bigl[(\mathbf r_t - \boldsymbol\mu)(\mathbf r_t - \boldsymbol\mu)^\top\bigr] \]

where:

- $\mathbf r_t \in \mathbb R^n$ is the random return vector at time $t$.
- $\boldsymbol\mu = \mathbb{E}[\mathbf r_t]$ is the true mean return vector.

We begin by computing the sample mean return vector of the $n$ assets, $\hat{\boldsymbol\mu}\in\mathbb R^n$, which serves as the estimate of the expected return vector. It is the column‑wise average of each asset’s log‑returns over the $T-1$ observations. As detailed under log‑returns, “return” here always means log‑return; see Osborne’s work for why log‑returns are preferred.

With $\mathbf r_t\in\mathbb R^n$ denoting the vector of log‑returns at time $t$, the sample covariance matrix is then

\[ \hat\Sigma = \frac{1}{T-1} \sum_{t=1}^{T-1} \bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr) \bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr)^\top, \]

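where the sum runs over the $T-1$ return observations. Strictly speaking, the unbiased estimator (Bessel’s correction) divides by the number of observations minus one, which here is $T-2$. With the factor $1/(T-1)$ shown, the expectation of $\hat\Sigma$ is $\tfrac{T-2}{T-1}\,\Sigma$ for i.i.d. returns, a difference that is negligible when $T$ is large.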

Computation

To calculate the sample covariance matrix, we first build the return matrix by stacking log‑returns into a $(T-1)\times n$ matrix

\[ R = \begin{bmatrix} \mathbf r^{(1)} & \mathbf r^{(2)} & \cdots & \mathbf r^{(n)} \end{bmatrix}, \]

where each column

\[ \mathbf r^{(i)} = \bigl(r_1^{(i)}, r_2^{(i)}, \dots, r_{T-1}^{(i)}\bigr)^\top \]

contains the log‑returns of asset $i$, computed from its prices $p_t^{(i)}$ for $t = 1, \dots, T-1$:

\[ r_t^{(i)} = \ln\bigl(\tfrac{p_{t+1}^{(i)}}{p_t^{(i)}}\bigr). \]

Then we center the return matrix by subtracting each column’s mean from its entries:

\[ \tilde R = R - \mathbf1\,\hat{\boldsymbol\mu}^\top, \quad \hat{\boldsymbol\mu} = \frac{1}{T-1}\,R^\top\mathbf1, \]

where $\mathbf1\in\mathbb R^{T-1}$ is the vector of ones.

Finally, the sample covariance matrix is found as follows:

\[ \hat\Sigma = \frac{1}{T-1}\,\tilde R^\top\,\tilde R = \frac{1}{T-1} \sum_{t=1}^{T-1} \bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr) \bigl(\mathbf r_t - \hat{\boldsymbol\mu}\bigr)^\top. \]
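The three steps above (build $R$, center it, form $\tilde R^\top \tilde R$) can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the price data is simulated, and the helper names `log_returns` and `sample_covariance` and the `ddof` parameter are hypothetical choices made for this example. With `ddof=0` the divisor is the number of return rows ($T-1$), matching the formula as written; `ddof=1` divides by one less, which is what `np.cov` does by default.

```python
import numpy as np


def log_returns(prices: np.ndarray) -> np.ndarray:
    """Turn a (T, n) price matrix into a (T-1, n) matrix of log-returns."""
    return np.diff(np.log(prices), axis=0)


def sample_covariance(R: np.ndarray, ddof: int = 0) -> np.ndarray:
    """Covariance of the columns of R, dividing by (number of rows - ddof)."""
    n_obs = R.shape[0]
    mu_hat = R.mean(axis=0)   # column-wise sample mean, shape (n,)
    R_tilde = R - mu_hat      # centering; broadcasting plays the role of 1 mu^T
    return R_tilde.T @ R_tilde / (n_obs - ddof)


# Hypothetical prices: T = 10 days, n = 3 assets, simulated as a random walk.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 0.01, size=(10, 3))
prices = 100.0 * np.exp(np.cumsum(steps, axis=0))

R = log_returns(prices)                      # shape (9, 3)
sigma_hat = sample_covariance(R)             # divisor T-1 = 9
sigma_bessel = sample_covariance(R, ddof=1)  # divisor T-2 = 8

print(sigma_hat.shape)
```

Comparing against `np.cov(R, rowvar=False, ddof=0)` is a cheap sanity check; `rowvar=False` is needed because each column of $R$, not each row, is one asset.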

Note: Both $\hat{\boldsymbol\mu}$ and $\hat\Sigma$ are computed directly from log‑returns. Ensuring that the use of log‑returns is well justified (e.g., by the normality assumption in Osborne’s work) is therefore crucial, as it underpins every step of this construction.

Numerical Example: Sample Covariance Matrix

Using our nine days of log‑returns for six assets (AAPL, AMZN, GOOG, MSFT, TQQQ, TSLA) from the numerical example, we obtain:

|      | AAPL | AMZN | GOOG | MSFT | TQQQ | TSLA |
|------|------|------|------|------|------|------|
| AAPL | 0.000299 | 0.000097 | -0.000012 | 0.000027 | -0.000035 | 0.000058 |
| AMZN | 0.000097 | 0.000285 | -0.000078 | 0.000254 | 0.000244 | 0.000202 |
| GOOG | -0.000012 | -0.000078 | 0.000851 | 0.000265 | 0.000189 | 0.000099 |
| MSFT | 0.000027 | 0.000254 | 0.000265 | 0.000587 | 0.000374 | 0.000011 |
| TQQQ | -0.000035 | 0.000244 | 0.000189 | 0.000374 | 0.000603 | 0.000328 |
| TSLA | 0.000058 | 0.000202 | 0.000099 | 0.000011 | 0.000328 | 0.000752 |

Diagonal entries are each asset’s daily variance (volatility squared).
Off‑diagonals are daily covariances, measuring how pairs of assets co‑move:
- Positive values (e.g. AMZN‑MSFT) indicate returns tend to rise and fall together.
- Negative values (e.g. AAPL‑TQQQ) indicate opposite movements.
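To make the reading of the table concrete, the sketch below takes the matrix values from the table, recovers each asset’s daily volatility from the diagonal, and rescales the off‑diagonals into correlations. The variable names (`daily_vol`, `corr`) are illustrative choices, and the conversion to correlations is an addition for interpretation rather than a step of the construction above.

```python
import numpy as np

assets = ["AAPL", "AMZN", "GOOG", "MSFT", "TQQQ", "TSLA"]

# Sample covariance matrix copied from the table above (daily log-returns).
sigma_hat = np.array([
    [ 0.000299,  0.000097, -0.000012, 0.000027, -0.000035, 0.000058],
    [ 0.000097,  0.000285, -0.000078, 0.000254,  0.000244, 0.000202],
    [-0.000012, -0.000078,  0.000851, 0.000265,  0.000189, 0.000099],
    [ 0.000027,  0.000254,  0.000265, 0.000587,  0.000374, 0.000011],
    [-0.000035,  0.000244,  0.000189, 0.000374,  0.000603, 0.000328],
    [ 0.000058,  0.000202,  0.000099, 0.000011,  0.000328, 0.000752],
])

# Diagonal = variances, so the square root gives daily volatilities.
daily_vol = np.sqrt(np.diag(sigma_hat))

# Dividing entry (i, j) by vol_i * vol_j turns covariances into correlations.
corr = sigma_hat / np.outer(daily_vol, daily_vol)

for name, vol in zip(assets, daily_vol):
    print(f"{name}: daily volatility {vol:.4f}")

i, j = assets.index("AMZN"), assets.index("MSFT")
print(f"corr(AMZN, MSFT) = {corr[i, j]:.3f}")
```

Correlations keep the sign of the corresponding covariance but are unit‑free, which makes pairs easier to compare than raw covariances of different magnitude.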

Interpretation and Usage

In mean–variance optimization, $\hat\Sigma$ enters as the risk (variance) term you minimize for a given expected return.
In risk‑parity, you assign weights so that each asset’s contribution to overall portfolio risk, computed via $\hat\Sigma$, is equal.
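Both uses can be illustrated with a few lines of NumPy. The sketch below is a simplified illustration under stated assumptions: it uses only the AAPL, AMZN, GOOG block of the example matrix, equal weights chosen arbitrarily, and hypothetical helper names. It measures risk contributions in variance terms, $w_i(\hat\Sigma\mathbf w)_i$, which give the same shares as the volatility‑based definition. It does not solve either optimization problem; it only evaluates the quantities those problems work with.

```python
import numpy as np

# Upper-left 3x3 block of the example matrix (AAPL, AMZN, GOOG).
sigma_hat = np.array([
    [ 0.000299,  0.000097, -0.000012],
    [ 0.000097,  0.000285, -0.000078],
    [-0.000012, -0.000078,  0.000851],
])


def portfolio_variance(w: np.ndarray, sigma: np.ndarray) -> float:
    """Risk term of mean-variance optimization: w^T Sigma w."""
    return float(w @ sigma @ w)


def risk_contributions(w: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Per-asset contribution to portfolio variance: w_i * (Sigma w)_i."""
    return w * (sigma @ w)


w = np.full(3, 1.0 / 3.0)  # equal weights, an arbitrary starting point

var_p = portfolio_variance(w, sigma_hat)
rc = risk_contributions(w, sigma_hat)
shares = rc / var_p        # fractions of total risk; they sum to 1

print("portfolio variance:", var_p)
print("risk shares:", shares)
```

Equal weights generally do not produce equal risk shares; a risk‑parity procedure would adjust `w` until the entries of `shares` coincide, while mean–variance optimization would minimize `var_p` subject to a target expected return.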

Because both frameworks rely on the same $\hat\Sigma$ built from log‑returns and the sample mean, the justification of log‑returns’ properties (e.g., approximate normality) propagates through to your portfolio construction rules.