Instead of directly targeting \(\min D_{KL}(q(\mathbf x) \,||\, p(\mathbf x)),\) latent variable models introduce a latent variable \(\mathbf z\) over which inference is performed. If the prior over \(\mathbf z\) can be sampled, the model can generate new data by first drawing \(\mathbf z\) from the prior and then \(\mathbf x\) from the conditional \(p(\mathbf x \mid \mathbf z)\).
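The generation procedure above is ancestral sampling. A minimal sketch, assuming a standard normal prior over \(\mathbf z\) and a hypothetical Gaussian conditional whose mean depends on \(\mathbf z\) (both choices are illustrative, not from the note):

```python
import random

random.seed(0)  # for reproducibility of the sketch

def sample_prior():
    # z ~ N(0, 1): standard normal prior over the latent variable
    return random.gauss(0.0, 1.0)

def sample_likelihood(z):
    # x ~ p(x | z): a hypothetical conditional; here a Gaussian
    # whose mean is a simple function of z (illustrative assumption)
    return random.gauss(2.0 * z, 0.5)

def generate():
    # Ancestral sampling: first z from the prior, then x given z
    z = sample_prior()
    return sample_likelihood(z)

samples = [generate() for _ in range(5)]
```

Any model whose prior is easy to sample and whose conditional is easy to sample admits this two-step generation scheme.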

The latent variable also enables unsupervised objectives such as clustering, and it can be easier to model the data with a more expressive conditional \(p(\mathbf x \mid \mathbf z)\) than to target the data likelihood directly. See [[Variational Autoencoder]].

The Variational Autoencoder (Kingma & Welling, 2013) is a [[Latent Variable Model]] that targets

\(\begin{aligned}
\min_{\phi, \theta} D_{KL}[q_\phi(\mathbf x, \mathbf z) || p_\theta(\mathbf x, \mathbf z)] &= \mathbb{E}_{q(\mathbf x)q_\phi(\mathbf z \mid \mathbf x)}[\log q_\phi(\mathbf z \mid \mathbf x) - \log p_\theta(\mathbf x \mid \mathbf z) - \log p_\theta(\mathbf z)] \\
&= \mathbb{E}_{q(\mathbf x)}\left[\mathbb{E}_{q_\phi(\mathbf z \mid \mathbf x)}[-\log p_\theta(\mathbf x \mid \mathbf z)] + D_{KL}[q_\phi(\mathbf z \mid \mathbf x) || p_\theta(\mathbf z)]\right]
\end{aligned}\)

where both equalities hold up to the additive term \(\mathbb{E}_{q(\mathbf x)}[\log q(\mathbf x)]\), the (negative) entropy of the data distribution, which does not depend on \(\phi\) or \(\theta\). This is the negative ELBO: a reconstruction term plus a KL regularizer pulling the approximate posterior toward the prior.
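The two terms of the objective can be sketched numerically. Assuming (as in the original VAE) a diagonal Gaussian posterior \(q_\phi(\mathbf z \mid \mathbf x) = \mathcal N(\boldsymbol\mu, \operatorname{diag}(\boldsymbol\sigma^2))\) and a standard normal prior \(p_\theta(\mathbf z) = \mathcal N(\mathbf 0, \mathbf I)\), the KL term has a closed form; the function names below are illustrative:

```python
import math

def gaussian_kl(mu, log_var):
    # Closed-form KL[N(mu, sigma^2) || N(0, I)] for a diagonal Gaussian,
    # summed over latent dimensions:
    #   0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def neg_elbo(recon_nll, mu, log_var):
    # Negative ELBO = reconstruction term E_q[-log p(x|z)]
    # (here passed in as a Monte Carlo estimate) + KL regularizer
    return recon_nll + gaussian_kl(mu, log_var)
```

When the posterior matches the prior (\(\boldsymbol\mu = \mathbf 0\), \(\boldsymbol\sigma^2 = \mathbf 1\)) the KL term vanishes and only the reconstruction term remains.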