Wasserstein GANs

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Courant Institute of Mathematical Sciences; Facebook AI Research. Cited 287 times since 2017.

GANs

GAN stands for Generative Adversarial Network.

A GAN is inspired by game theory: it has two nets, a Generator and a Discriminator, that play against each other and get stronger with each round. A GAN is an implicit generative model, since the Generator uses the signal (loss) from the Discriminator (a classifier) to implicitly approximate its intractable cost function.

Review GANs

• $x$: a real data example.
• $\hat{x}$: a fake data example from $G(z)$.
• $z$: noise input, usually from a uniform distribution.
• $y$: a label $\in \{\text{Real:1},\text{Fake:0}\}$.
• $D$: a discriminator net that estimates $p(y|x)$.
• $G$: a generator net that outputs a fake example ($\hat{x}$).
• $P_z$: assumed prior distribution over the noise input $z$.
• $P_g$: generator distribution over samples $\hat{x}$.
• $P_r$: 'real' data distribution over real samples $x$.

• Discriminator:
• Generator:
• GAN0:
• GAN1:
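
As a reference point, the standard GAN objectives can be written in the notation above as follows. This is a sketch of the usual formulation (the minimax discriminator objective, plus the saturating and non-saturating generator losses); mapping these onto the labels GAN0 and GAN1 is an assumption, not something the list above states.

```latex
% Discriminator: maximize the log-likelihood of assigning the correct labels
\max_D \; V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x)]
                  + \mathbb{E}_{\hat{x} \sim P_g}[\log(1 - D(\hat{x}))]

% Generator, saturating (minimax) form
\min_G \; \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]

% Generator, non-saturating form
\min_G \; -\mathbb{E}_{z \sim P_z}[\log D(G(z))]
```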

KL divergence and JS divergence

• KL divergence measures how one probability distribution diverges from a second, expected probability distribution.
• KL is not symmetric: the forward KL, $KL(P \| Q)$, and the reverse KL, $KL(Q \| P)$, generally differ.
• JS divergence is a symmetric and smoother measure of the difference between two probability distributions.
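
To make the asymmetry of KL and the symmetry of JS concrete, here is a minimal sketch for discrete distributions. The function names and the two example distributions are illustrative assumptions, not part of the original notes.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Terms with p_i == 0 contribute 0. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """JS(p || q): average KL of p and q to their mixture m = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Two arbitrary example distributions over three outcomes (assumed values).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

forward_kl = kl_divergence(p, q)   # KL(p || q)
reverse_kl = kl_divergence(q, p)   # KL(q || p), generally a different number
js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)        # same value as js_pq
```

Swapping the arguments changes the KL value but not the JS value, and JS stays between 0 and $\log 2$.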

Analyze loss function 1

• When we fix G, what is the optimal D?
• We find it by setting the partial derivative of the loss function w.r.t. $D(x)$ to 0.
• We can get:
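
The derivation can be sketched as follows, assuming the standard discriminator objective $V(D, G)$ in its maximization form and treating the densities $P_r(x)$ and $P_g(x)$ pointwise (both are assumptions of this sketch):

```latex
V(D, G) = \int_x \left[ P_r(x) \log D(x) + P_g(x) \log(1 - D(x)) \right] dx

% Set the derivative of the integrand w.r.t. D(x) to zero
\frac{\partial}{\partial D(x)} \left[ P_r(x) \log D(x) + P_g(x) \log(1 - D(x)) \right]
  = \frac{P_r(x)}{D(x)} - \frac{P_g(x)}{1 - D(x)} = 0

\Rightarrow \quad D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)}
```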

Analyze loss function 2

• When we have the optimal $D^*$, what is the loss for $\min_G$?
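
A sketch of the substitution, assuming the standard result $D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)}$ and the standard objective $V(D, G)$:

```latex
V(D^*, G)
  = \mathbb{E}_{x \sim P_r}\left[ \log \frac{P_r(x)}{P_r(x) + P_g(x)} \right]
  + \mathbb{E}_{x \sim P_g}\left[ \log \frac{P_g(x)}{P_r(x) + P_g(x)} \right]

  = \mathbb{E}_{x \sim P_r}\left[ \log \frac{P_r(x)}{\frac{1}{2}(P_r(x) + P_g(x))} \right]
  + \mathbb{E}_{x \sim P_g}\left[ \log \frac{P_g(x)}{\frac{1}{2}(P_r(x) + P_g(x))} \right]
  - 2 \log 2

  = 2 \, JS(P_r \| P_g) - 2 \log 2
```

Up to a constant, the generator's loss under the optimal discriminator is the JS divergence between $P_r$ and $P_g$.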

Problem 1: Gradient vanishing

• Now we know that with the optimal $D^*$, minimizing the generator loss is the same as minimizing $JS(P_r \| P_g)$.

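• There are 3 different cases to consider for a point $x$ when we plug it into the JS measure:

• $P_r(x) = 0$ and $P_g(x) = 0$: the first term -> 0 and the second term -> 0.

• $P_r(x) \neq 0, P_g(x) = 0$ or $P_r(x) = 0, P_g(x) \neq 0$: the log ratio -> $\log 2$, so JS -> $\log 2$, a constant that gives the generator no gradient.

• $P_r(x) \neq 0$ and $P_g(x) \neq 0$: this barely happens and is negligible.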
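
The constant-JS case can be checked numerically. A minimal sketch, assuming discrete distributions represented as lists and hypothetical helper names (`kl`, `js`, `point_mass`): when two distributions have disjoint supports, the JS divergence is $\log 2$ no matter how far apart they are, so it carries no information about how to move $P_g$ toward $P_r$.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """JS(p || q) via the mixture m = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def point_mass(position, size):
    """Distribution over `size` bins with all mass on bin `position`."""
    return [1.0 if i == position else 0.0 for i in range(size)]

size = 10
p_r = point_mass(0, size)  # stand-in for the real distribution

# Move the "generator" mass to bins 1..9: the supports never overlap.
js_values = [js(p_r, point_mass(shift, size)) for shift in range(1, size)]

# Only when the supports coincide does JS drop to 0.
js_same = js(p_r, p_r)
```

Every entry of `js_values` equals $\log 2$ regardless of the shift, while `js_same` is 0: the divergence jumps rather than decreasing smoothly as the distributions get closer.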

Manifold Assumption

The data distribution lies close to a low-dimensional manifold. Example: consider image data.

• Very high dimensional (1,000,000-D).
• A randomly generated image will almost certainly not look like any real-world scene.
• The images that occur in nature occupy almost none of this space.
• Hypothesis: real-world images lie on a smooth, low-dimensional manifold.

Assumption: the supports of $P_r$ and $P_g$ lie on low-dimensional manifolds

Support: the support of a real-valued function $f$ is the subset of the domain containing those elements which are not mapped to zero.

• We now assume the support of $P_r$ lives on a low-dimensional manifold embedded in a higher-dimensional space (the input space).
• Now think about what the generator net does: we first randomly generate $z$ with $\dim(z) \ll \dim(x)$, then use $G(z)$ as a non-linear mapping from $\dim(z)$ to $\dim(x)$.
• So what does $P_g$ represent in the end? Manifold learning is an approach to non-linear dimensionality reduction, and $P_g$ is the result of running it in reverse: a low-dimensional code is mapped back up into the high-dimensional space.
• We therefore assume the supports of both $P_r$ and $P_g$ lie on low-dimensional manifolds.
• This means each manifold hardly fills up any of the high-dimensional space, so the two are almost certainly disjoint; the case where they overlap is negligible.
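
The claim that a low-dimensional manifold "hardly fills up" the ambient space can be illustrated with a toy example. This is a minimal sketch under strong assumptions: the hypothetical `generator` below is a fixed map from 1-D noise to a circle in 2-D, standing in for a trained network, and "filling the space" is measured as the fraction of grid cells that contain at least one generated sample.

```python
import math
import random

def generator(z):
    """Hypothetical generator: maps 1-D noise z in [0, 1) onto a circle in 2-D."""
    angle = 2.0 * math.pi * z
    return math.cos(angle), math.sin(angle)

def occupied_fraction(samples, cells_per_side):
    """Fraction of grid cells over [-1, 1]^2 that contain at least one sample."""
    occupied = set()
    for x, y in samples:
        # Map each coordinate from [-1, 1] to a cell index, clamping the upper edge.
        col = min(int((x + 1.0) / 2.0 * cells_per_side), cells_per_side - 1)
        row = min(int((y + 1.0) / 2.0 * cells_per_side), cells_per_side - 1)
        occupied.add((row, col))
    return len(occupied) / cells_per_side ** 2

random.seed(0)
samples = [generator(random.random()) for _ in range(100_000)]

# Occupied fraction at increasingly fine grid resolutions.
fractions = {n: occupied_fraction(samples, n) for n in (10, 50, 250)}
```

As the grid gets finer, the occupied fraction keeps shrinking, because the samples live on a 1-D curve inside a 2-D space. The same effect is far more extreme when a low-dimensional $z$ is mapped into an image space with millions of dimensions, which is why two such supports are unlikely to overlap.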