
GP (3) Random Fourier Features Gaussian Process

Random Fourier Feature Gaussian Process

  • Random Fourier Feature Gaussian Process is a technique for approximating an infinite-dimensional function space spanned by kernel functions with a finite-dimensional function space spanned by real sinusoids.

  • nonparametric function:

    \[f(x)=\sum_{i=1}^{\infty}{\beta_{i}k(x,x_{i})}\]
  • nonparametric function space:

    \[\mathcal{F} =\mathrm{span}\left\{k(\cdot,x_{1}),k(\cdot,x_{2}),\cdots\right\}\]
  • Random Fourier Feature:

    \[\sum_{i=1}^{\infty}{\beta_{i}k(x,x_{i})} \approx \sum_{m=1}^{M}{\gamma_{m}\psi_{m}(x)}\]
  • the infinite-dimensional function space is approximated by a finite-dimensional space that takes a finite number of real sinusoids as basis functions:

    \[\mathcal{F} \approx\mathrm{span}\left\{\psi_{1}(\cdot),\psi_{2}(\cdot),\cdots,\psi_{M}(\cdot)\right\}\]
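
Concretely, any member of this finite-dimensional space is just a weighted sum of \(M\) cosines. A minimal numpy sketch (the frequencies \(\omega_m\), phases \(b_m\), and weights \(\gamma_m\) are arbitrary placeholders here; how to sample them properly is derived in the next section):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100  # number of random features

# placeholder frequencies, phases, and coefficients (illustrative only)
omega = rng.normal(size=M)
b = rng.uniform(0.0, 2.0 * np.pi, size=M)
gamma = rng.normal(size=M)

def psi(x):
    """Feature vector psi_m(x) = cos(omega_m * x + b_m), shape (M,)."""
    return np.cos(omega * x + b)

def f(x):
    """f(x) = sum_m gamma_m * psi_m(x): one member of the span."""
    return gamma @ psi(x)

print(f(0.5))
```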

how to approximate


  • by Bochner’s theorem, a shift-invariant and positive-definite kernel function can be expressed as an infinite-dimensional inner product of continuous complex sinusoids.

    \[\begin{aligned} k(x,x^{\prime}) &=\int_{\mathbb{R}^{d}}{\phi_{\omega}(x)\phi_{\omega}^{*}(x^{\prime})p(\omega)\mathrm{d}\omega}\\ \phi_{\omega}(x) &=\exp\left(i\omega^{T}x\right),\quad\omega\in\mathbb{R}^{d} \end{aligned}\]
  • Random Fourier Feature is a technique for approximating the infinite-dimensional inner product of continuous complex sinusoids with a finite-dimensional inner product of discrete real sinusoids.

    \[\begin{aligned} \int_{\mathbb{R}^{d}}{\phi_{\omega}(x)\phi_{\omega}^{*}(x^{\prime})p(\omega)\mathrm{d}\omega} &\approx\frac{2}{M}\sum_{m=1}^{M}{\psi_{m}(x)\psi_{m}(x^{\prime})}\\ \psi_{m}(x) &=\cos\left(\omega_{m}^{T}x+b_{m}\right),\quad\begin{cases} \omega_{m}\sim p(\omega)\\ b_{m}\sim\mathrm{Uniform}(0,2\pi] \end{cases} \end{aligned}\]
  • substituting this approximation into the kernel function, which serves as the basis of the nonparametric function space, the nonparametric function can be expressed as a linear combination of discrete real sinusoids.

    \[\begin{aligned} f(x) &=\sum_{i=1}^{\infty}{\beta_{i}k(x,x_{i})}\\ &\approx\sum_{i=1}^{\infty}{\beta_{i}}\left(\frac{2}{M}\sum_{m=1}^{M}{\psi_{m}(x)\psi_{m}(x_{i})}\right)\\ &=\sum_{m=1}^{M}{\underbrace{\frac{2}{M}\left(\sum_{i=1}^{\infty}{\beta_{i}\psi_{m}(x_{i})}\right)}_{=:\gamma_{m}}\psi_{m}(x)}\\ &=\sum_{m=1}^{M}{\gamma_{m}\psi_{m}(x)} \end{aligned}\]
  • therefore, the infinite-dimensional function space spanned by the kernel basis functions $k(\cdot,x)$, defined for all input values $x$, is approximated by the finite-dimensional function space spanned by the discrete real sinusoids $\psi_{m}(x)=\cos(\omega_{m}^{T}x+b_{m})$.

    \[\mathcal{F} \approx\mathrm{span}\left\{\psi_{1}(\cdot),\psi_{2}(\cdot),\cdots,\psi_{M}(\cdot)\right\}\]
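
The Monte Carlo approximation above can be checked numerically. A sketch for the RBF kernel \(k(x,x^{\prime})=\exp(-(x-x^{\prime})^{2}/2)\), whose spectral density \(p(\omega)\) is the standard normal by Bochner's theorem (the kernel choice and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5000  # number of random features

# for the unit-lengthscale RBF kernel, p(omega) is the standard normal
omega = rng.normal(size=M)
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def psi(x):
    # psi_m(x) = cos(omega_m * x + b_m)
    return np.cos(omega * x + b)

def k_rff(x, xp):
    # Monte Carlo estimate: k(x, x') ≈ (2/M) sum_m psi_m(x) psi_m(x')
    return (2.0 / M) * psi(x) @ psi(xp)

def k_exact(x, xp):
    return np.exp(-0.5 * (x - xp) ** 2)

for x, xp in [(0.0, 0.0), (0.0, 0.5), (0.0, 2.0)]:
    print(x, xp, k_exact(x, xp), k_rff(x, xp))
```

As \(M\) grows, the estimate concentrates around the exact kernel value at the usual Monte Carlo rate \(O(1/\sqrt{M})\).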

bayes rule


  • The basis functions \(\psi_{m}(x):=\cos(\omega_{m}^{T}x+b_{m})\) are deterministic after the Monte Carlo approximation. Therefore, the randomness of \(f(x):=\sum_{m=1}^{M}{\gamma_{m}\psi_{m}(x)}\) comes entirely from the coefficients \(\gamma_{m}\).

    \[\gamma_{m}:=\frac{2}{M}\sum_{i=1}^{\infty}{\beta_{i}\psi_{m}(x_{i})}\]
  • proxy variable of $f(x)$:

    The distribution of the function value vector \(\mathbf{f}_{X}:=\{f(x_{i})\}_{i=1}^{N}\) for the input set \(\{x_{i}\}_{i=1}^{N}\) can be expressed as the pushforward of the distribution \(p(\gamma)\) of \(\gamma\) through the linear transformation \(T_{X}:=\Psi_{X}^{T}\), where \((\Psi_{X})_{mi}=\psi_{m}(x_{i})\).

    \[\begin{gathered} \mathbf{f}_{X}=\Psi_{X}^{T}\gamma=T_{X}(\gamma)\quad\Rightarrow\quad p(\mathbf{f}_{X})=(T_{X})_{\#}p(\gamma) \end{gathered}\]
    • \(\mathbb{E}\left[\mathbf{f}_{X}\right]=T_{X}\mathbb{E}\left[\gamma\right]\) \(\quad\)
    • \(\mathrm{Cov}\left[\mathbf{f}_{X}\right]=T_{X}\mathrm{Cov}\left[\gamma\right]T_{X}^{T}\) \(\quad\)
  • prior:

    \[\begin{aligned} p(\gamma)=\mathcal{N}(\mathbf{0},\mathbf{I}) \end{aligned}\]
  • likelihood:

    \[\begin{gathered} y_{i} =\sum_{m=1}^{M}{\gamma_{m}\psi_{m}(x_{i})}+\epsilon_{i},\quad\epsilon_{i}\sim\mathcal{N}(0,\sigma^{2})\\ \therefore p(y\mid\gamma) =\mathcal{N}(\Psi_{X}^{T}\gamma,\sigma^{2}\mathbf{I}) \end{gathered}\]
  • posterior:

    \[\begin{aligned} p(\gamma\mid y) &\propto p(y\mid\gamma)p(\gamma)\\ &\propto \exp\left[-\frac{1}{2\sigma^{2}}\Vert y-\Psi_{X}^{T}\gamma\Vert^{2}-\frac{1}{2}\Vert\gamma\Vert^{2}\right]\\ &\propto \mathcal{N}(\mu_{\gamma}^{\prime},\Sigma_{\gamma}^{\prime}) \end{aligned}\]
    • $\mu_{\gamma}^{\prime}=(1/\sigma^{2})\Sigma_{\gamma}^{\prime}\Psi_{X}y$
    • $\Sigma_{\gamma}^{\prime}=\left[(1/\sigma^{2})\Psi_{X}\Psi_{X}^{T}+\mathbf{I}\right]^{-1}$
  • posterior predictive:

    \[\begin{aligned} \mathbf{f}_{*} &=\Psi_{*}^{T}\gamma\\ \therefore p(\mathbf{f}_{*}\mid y) &=\int{p(\mathbf{f}_{*}\mid\gamma)p(\gamma\mid y)\mathrm{d}\gamma}\\ &=\mathcal{N}(\Psi_{*}^{T}\mu_{\gamma}^{\prime},\Psi_{*}^{T}\Sigma_{\gamma}^{\prime}\Psi_{*}) \end{aligned}\]
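
Putting the pieces together, a sketch of the full posterior-predictive computation in numpy. Here the \(\sqrt{2/M}\) scaling is absorbed into the feature matrix so that the simple \(\mathcal{N}(\mathbf{0},\mathbf{I})\) prior on \(\gamma\) matches the GP prior; the unit-lengthscale RBF kernel, the \(\sin\) target, and all hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 300, 50, 0.1

# random features for the unit-lengthscale RBF kernel: omega ~ N(0, 1)
omega = rng.normal(size=M)
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def Psi(x):
    """Feature matrix of shape (M, len(x)), scaled so Psi.T @ Psi ≈ K."""
    return np.sqrt(2.0 / M) * np.cos(np.outer(omega, x) + b[:, None])

# noisy observations of a smooth target function
x_train = np.linspace(-3.0, 3.0, N)
y_train = np.sin(x_train) + sigma * rng.normal(size=N)

# posterior over gamma with prior N(0, I):
#   Sigma' = [(1/sigma^2) Psi Psi^T + I]^{-1}
#   mu'    = (1/sigma^2) Sigma' Psi y
P = Psi(x_train)
Sigma_post = np.linalg.inv(P @ P.T / sigma**2 + np.eye(M))
mu_post = Sigma_post @ P @ y_train / sigma**2

# posterior predictive at test inputs: N(Psi_*^T mu', Psi_*^T Sigma' Psi_*)
x_test = np.array([-1.0, 0.0, 1.0])
P_star = Psi(x_test)
mean_star = P_star.T @ mu_post
cov_star = P_star.T @ Sigma_post @ P_star
print(mean_star)  # close to sin(x_test) for this data
```

Note that the posterior requires inverting only an \(M\times M\) matrix, independent of the number of data points \(N\), which is the main computational appeal of the weight-space view.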
This post is licensed under CC BY 4.0 by the author.