For a $D$-dimensional vector $\mathbf x$, the multivariate Gaussian distribution takes the form
$\mathcal N(\mathbf x\mid\boldsymbol \mu,\mathbf \Sigma)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\mathbf \Sigma|^{1/2}}\exp\big\{ -\frac{1}{2}(\mathbf x-\boldsymbol\mu)^\mathrm T\mathbf\Sigma^{-1}(\mathbf x-\boldsymbol\mu) \big\}$
Expanding the exponent and grouping terms in $\mathbf x$, we get
$E=-\frac{1}{2}\mathbf x^\mathrm T\mathbf\Sigma^{-1}\mathbf x + \mathbf x^\mathrm T\mathbf\Sigma^{-1}\boldsymbol\mu -\frac{1}{2}\boldsymbol\mu^\mathrm T\mathbf\Sigma^{-1}\boldsymbol\mu$
This gives a pattern-matching recipe for reading off $\boldsymbol \mu$ and $\mathbf \Sigma$: the quadratic term identifies $\mathbf\Sigma^{-1}$, and the linear term identifies $\mathbf\Sigma^{-1}\boldsymbol\mu$.
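As a quick numerical sanity check, the sketch below (illustrative names and synthetic numbers, not taken from the text) builds the coefficients of the expanded exponent from a chosen $\boldsymbol\mu$ and $\mathbf\Sigma$, then recovers both by pattern matching.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 3
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)          # a valid (positive-definite) covariance

# Coefficients as they appear in E = -1/2 x^T P x + x^T b + const
P = np.linalg.inv(Sigma)                 # quadratic coefficient: Sigma^{-1}
b = P @ mu                               # linear coefficient:    Sigma^{-1} mu

# Pattern-match back: Sigma = P^{-1}, mu = P^{-1} b
Sigma_recovered = np.linalg.inv(P)
mu_recovered = np.linalg.solve(P, b)

print(np.allclose(Sigma_recovered, Sigma))  # True
print(np.allclose(mu_recovered, mu))        # True
```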
## Example: posterior distribution in BLR
When deriving the [[Bayesian Linear Regression#Posterior distribution|posterior distribution]] in Bayesian linear regression
$\begin{align} p(\mathbf w\mid \mathbf y) &\propto p(\mathbf w) \times p(\mathbf y\mid \mathbf w) \\&\propto \exp\big \{ -\frac{\alpha}{2}\mathbf w^\mathrm T \mathbf w \big\} \exp\big \{ -\frac{\beta}{2}(\mathbf y - \mathbf {\hat y})^\mathrm T (\mathbf y - \mathbf {\hat y}) \big \} \end{align} $
where $\mathbf{\hat y} = \mathbf{Xw}$ is the vector of model predictions.
We know $p(\mathbf w\mid \mathbf y)$ follows a Gaussian distribution, so one way to find $\boldsymbol \mu_\mathbf w$ and $\mathbf \Sigma_\mathbf w$ is to pattern-match the exponent:
$\begin{align}
p(\mathbf w\mid \mathbf y) &\propto \exp\big \{ -\frac{\alpha}{2}\mathbf w^\mathrm T \mathbf w \big\} \exp\big \{ -\frac{\beta}{2}(\mathbf y - \mathbf {Xw})^\mathrm T (\mathbf y - \mathbf {Xw}) \big \} \\
&\propto \exp\big \{ -\frac{\alpha}{2}\mathbf w^\mathrm T \mathbf w - \frac{\beta}{2}(\mathbf y^\mathrm T \mathbf y - 2\mathbf w^\mathrm T \mathbf X^\mathrm T \mathbf y + \mathbf w^\mathrm T \mathbf X^\mathrm T \mathbf {Xw}) \big \} \\
&\propto \exp\big \{ -\frac{\alpha}{2}\mathbf w^\mathrm T \mathbf I \mathbf w - \frac{\beta}{2}\mathbf w^\mathrm T \mathbf X^\mathrm T \mathbf {Xw} + \beta\mathbf w^\mathrm T \mathbf X^\mathrm T \mathbf y \big \} \\
&\propto \exp\big \{ -\frac{1}{2}\mathbf w^\mathrm T\big (\alpha\mathbf I + \beta\mathbf {X}^\mathrm T\mathbf {X}\big) \mathbf w + \mathbf w^\mathrm T\big(\beta\mathbf {X}^\mathrm T\mathbf y\big) \big \}
\end{align}$
Therefore
$\begin{align}\mathbf\Sigma^{-1}_\mathbf w &= \alpha\mathbf I + \beta\mathbf {X}^\mathrm T\mathbf {X} \\\mathbf\Sigma^{-1}_\mathbf w \boldsymbol\mu_\mathbf w &=\beta\mathbf {X}^\mathrm T\mathbf y\end{align}$
so that $\boldsymbol\mu_\mathbf w = \beta\,\mathbf\Sigma_\mathbf w\mathbf X^\mathrm T\mathbf y = \beta\big(\alpha\mathbf I + \beta\mathbf {X}^\mathrm T\mathbf {X}\big)^{-1}\mathbf X^\mathrm T\mathbf y$.
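A minimal numpy sketch of these identities, assuming synthetic $\mathbf X$, $\mathbf y$ and illustrative values for $\alpha$ and $\beta$ (none of which come from the note itself):

```python
import numpy as np

rng = np.random.default_rng(1)

N, D = 50, 4
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
alpha = 2.0                                   # assumed prior precision
beta = 25.0                                   # assumed noise precision
y = X @ w_true + rng.normal(scale=1 / np.sqrt(beta), size=N)

# Posterior precision and mean from the pattern-matched identities:
#   Sigma_w^{-1}      = alpha I + beta X^T X
#   Sigma_w^{-1} mu_w = beta X^T y
precision = alpha * np.eye(D) + beta * X.T @ X
mu_w = np.linalg.solve(precision, beta * X.T @ y)
Sigma_w = np.linalg.inv(precision)

print(mu_w)        # posterior mean, close to w_true for informative data
print(Sigma_w)     # posterior covariance
```

Note that the mean is computed with `np.linalg.solve` on the precision matrix rather than by forming $\mathbf\Sigma_\mathbf w$ explicitly, which is the more numerically stable choice.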