A Gaussian Process (GP) is a statistical model or, more precisely, a stochastic process. There are two ways I like to think about GPs, both of which are highly useful.
- An extension of a multivariate normal (MVN) distribution: a GP can be thought of as extending an MVN to infinitely many random variables. That is, a GP is an infinite collection of random variables, every finite subset of which is jointly distributed according to an MVN. What does this mean? Very simply, every finite set of observations (y) from the GP has a regular Gaussian distribution, so all the wonderful properties of the MVN apply to it (conditional distributions are Gaussian, marginal distributions are Gaussian, etc.).
- A distribution over functions: another useful way of thinking of GPs is as a probability distribution, but over functions rather than random variables. This is a really useful view, as much of what we try to do in machine learning is some form of function approximation. A GP allows us to derive posterior distributions over functions simply by observing function values at a finite set of points (see the sketch below).
What we can do with GPs is absolutely amazing. Basically, they are an extremely flexible tool for modelling functions. They are completely specified by a mean function μ(x) and a covariance function k(x, x′), in the same way an MVN is completely specified by its mean vector and covariance matrix. Here x ∈ ℝᵈ is some feature space the function is defined over.
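To make the two views concrete, here is a minimal sketch in Python (NumPy only; the squared-exponential kernel, the zero mean function and the input grid are my own choices for illustration). The function values at a finite grid of inputs are jointly MVN, and each draw from that MVN, plotted against x, looks like a function:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance function k(x, x')."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

# A finite set of inputs: by the GP definition, the function values at
# these points are jointly MVN with mean mu(x) and covariance k(x, x').
x = np.linspace(-5, 5, 100)
mu = np.zeros_like(x)                          # zero mean function
K = rbf_kernel(x, x) + 1e-9 * np.eye(len(x))   # jitter for numerical stability

# Each draw from this MVN is one "function" sampled from the GP prior.
samples = np.random.multivariate_normal(mu, K, size=3)
```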
Further, just as we can perform posterior Bayesian inference for the parameters of an MVN given some observations, we can perform posterior Bayesian inference for the mean and covariance function of a GP. The miraculous thing is that we don’t need to observe entire functions to do this – a finite set of observations suffices. And the posterior inference is analytically tractable, which is very rare in Bayesian modelling.
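As a sketch of that tractability, the standard closed-form GP regression equations fit in a few lines; the noise variance and the zero prior mean here are assumptions for illustration, not anything specific to a particular dataset:

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, kernel, noise_var=0.1):
    """Closed-form GP regression posterior (zero prior mean assumed)."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = kernel(x_train, x_test)    # cross-covariance, shape (n_train, n_test)
    K_ss = kernel(x_test, x_test)

    L = np.linalg.cholesky(K)        # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    post_mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)      # L^{-1} K_s
    post_cov = K_ss - v.T @ v        # K_ss - K_s^T K^{-1} K_s
    return post_mean, post_cov

# e.g. with the rbf_kernel sketched earlier:
# mean, cov = gp_posterior(x_obs, y_obs, x_grid, rbf_kernel)
```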
The GP also enables us to encode any assumptions we may have about our data, things like periodicity, smoothness (or lack thereof) and so forth, so that as a user/modeller you have a lot of flexibility to bring in your domain expertise.
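For instance, a periodic (exp-sine-squared) kernel bakes periodicity directly into the prior; the period and lengthscale below are placeholders. And since sums and products of valid kernels are themselves valid kernels, assumptions compose:

```python
import numpy as np

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Exp-sine-squared kernel: prior samples repeat with the given period."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

# Combining assumptions: "a periodic pattern on top of a smooth trend"
# (reusing the rbf_kernel from the first sketch).
def periodic_plus_trend(x1, x2):
    return periodic_kernel(x1, x2) + rbf_kernel(x1, x2, lengthscale=3.0)
```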
For an in-depth investigation of GPs, I would highly recommend “Gaussian Processes for Machine Learning” by Carl Rasmussen and Christopher Williams. It’s a short, readable book that gives a very thorough account of GPs, including more advanced aspects such as classification and sparse approximations. It also covers some very useful basic concepts in machine learning more generally.