Model Notes Revised - 2017/03/15


Learning:

$H^{word}_{enc,i,j} = \begin{cases} f^{GRU}_{\Theta}(x_{i,j}) & \quad i \in [1,n],\ j=1 \\ f^{rnn}_{\Theta}(x_{i,j},H^{word}_{enc,i,j-1}) & \quad i \in [1,n],\ j \in [2,m] \end{cases}$

$H^{con}_{enc,i} = \begin{cases} f^{GRU}_{\Phi}(H^{word}_{enc,i,-1}) & \quad i=1 \\ f^{GRU}_{\Phi}(H^{word}_{enc,i,-1},H^{con}_{enc,i-1}) & \quad i \in [2,n] \end{cases}$

$H^{enc2lat}_{i} = \begin{cases} \begin{cases} f^{rnn}_{\vec z_1}(init) & \quad i^{\prime}=n \\ f^{rnn}_{\vec z_1}(H^{con}_{enc,i^{\prime}+1},H^{con\prime}_{enc,i^{\prime}+1}) & \quad i^{\prime} \in [n-1,1] \end{cases} & \quad i^{\prime}=i=1 \\ f^{rnn}_{\vec z_i}(H^{con}_{enc,i},\vec z_{i-1}) \text{ or concat/sum}(H^{con}_{enc,i},\vec z_{i-1}) & \quad i \in [2,n] \end{cases}$

$\vec z_i \sim \mathcal{N}(\cdot \mid f^{mlp}_{\mu}(H^{enc2lat}_{i}),f^{mlp}_{\sigma}(H^{enc2lat}_{i}))$

$H^{con}_{dec,i} = f_{lat2dec}(\vec z_i)$

$H^{word}_{dec,i,j} = \begin{cases} f^{GRU}_{\Omega}(H^{con}_{dec,i}) & \quad i \in [1,n],\ j=1 \\ f^{GRU}_{\Omega}(H^{con}_{dec,i},\hat x_{i,j},H^{word}_{dec,i,j-1}) & \quad i \in [1,n],\ j \in [2,m] \end{cases}$
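The sampling step for $\vec z_i$ can be sketched in a few lines. This is only an illustration under assumptions: the function name `sample_latent` is hypothetical, the output of $f^{mlp}_{\sigma}$ is treated as a per-dimension standard deviation (the notes do not say whether it is a standard deviation, a variance, or a log-variance), and the draw is written in the reparameterized form $\vec z = \vec\mu + \vec\sigma \odot \epsilon$, which the notes do not mention explicitly.

```python
import random

def sample_latent(mu, sigma, rng=random):
    """Draw z ~ N(mu, diag(sigma^2)) as z = mu + sigma * eps with eps ~ N(0, I).

    Assumption: sigma holds per-dimension standard deviations.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

# Hypothetical outputs of f_mu and f_sigma for one sentence i.
rng = random.Random(0)
mu_i = [0.5, -1.0]
sigma_i = [0.1, 2.0]
z_i = sample_latent(mu_i, sigma_i, rng)
```

Writing the draw this way keeps the randomness in `eps`, so `mu` and `sigma` enter the sample through plain arithmetic.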

Reconstruct:

$H^{enc2lat}_{i} = f^{rnn}_{\vec z_i}(H^{con}_{dec,i-1},\vec z_{i-1}) \text{ or concat/sum}(H^{con}_{dec,i-1},\vec z_{i-1}) \quad i \in [2,n]$

$\vec{\mu_1}_{empirical} = avg(\vec{\mu_1}_{training})$

$\vec{\sigma_1}_{empirical} = avg(\vec{\sigma_1}_{training})$

$\vec{z_i} \sim \begin{cases} \mathcal{N}(\vec{\mu_1}_{empirical},\vec{\sigma_1}_{empirical}) & \quad i=1 \\ \mathcal{N}(f^{mlp}_{\mu}(H^{enc2lat}_{i}),f^{mlp}_{\sigma}(H^{enc2lat}_{i})) & \quad i \in [2,n] \end{cases}$

$H^{con}_{dec,i} = f_{lat2dec}(\vec z_i)$

$H^{word}_{dec,i,j} = \begin{cases} f^{GRU}_{\Omega}(H^{con}_{dec,i}) & \quad i \in [1,n],\ j=1 \\ f^{GRU}_{\Omega}(H^{con}_{dec,i},\hat x_{i,j},H^{word}_{dec,i,j-1}) & \quad i \in [1,n],\ j \in [2,m] \end{cases}$
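The order of dependencies at reconstruction time may be easier to follow as a loop. The sketch below is a toy under stated assumptions, not the model itself: `empirical_prior` and `generate_latents` are hypothetical names, the callables passed in are stand-ins for $f^{rnn}_{\vec z_i}$ (here the "sum" option), $f^{mlp}_{\mu}$, $f^{mlp}_{\sigma}$ and $f_{lat2dec}$, and $\vec\sigma$ is treated as a standard deviation.

```python
import random

def empirical_prior(mu1_training, sigma1_training):
    """Average the recorded mu_1 and sigma_1 vectors dimension by dimension."""
    n = len(mu1_training)
    mu = [sum(col) / n for col in zip(*mu1_training)]
    sigma = [sum(col) / n for col in zip(*sigma1_training)]
    return mu, sigma

def generate_latents(n, mu_emp, sigma_emp, f_enc2lat, f_mu, f_sigma, f_lat2dec, rng):
    """Sample z_1..z_n: z_1 from the empirical prior, z_i from the previous step."""
    z = [rng.gauss(m, s) for m, s in zip(mu_emp, sigma_emp)]    # i = 1
    h_dec = f_lat2dec(z)
    zs, h_decs = [z], [h_dec]
    for _ in range(2, n + 1):                                   # i = 2..n
        h = f_enc2lat(h_dec, z)  # uses H^{con}_{dec,i-1} and z_{i-1}
        z = [rng.gauss(m, s) for m, s in zip(f_mu(h), f_sigma(h))]
        h_dec = f_lat2dec(z)
        zs.append(z)
        h_decs.append(h_dec)
    return zs, h_decs

# Hypothetical posterior parameters of z_1 recorded on three training examples.
mu1_training = [[0.0, 1.0], [2.0, 1.0], [1.0, 4.0]]
sigma1_training = [[1.0, 0.5], [1.0, 1.5], [1.0, 1.0]]
mu_emp, sigma_emp = empirical_prior(mu1_training, sigma1_training)

# Toy stand-ins for the learned functions (assumptions, for illustration only).
def add(h, z):
    return [a + b for a, b in zip(h, z)]

def identity(v):
    return list(v)

def unit_sigma(h):
    return [1.0] * len(h)

zs, h_decs = generate_latents(3, mu_emp, sigma_emp, add, identity, unit_sigma,
                              identity, random.Random(0))
```

Each `h_decs[i]` would then seed the word-level decoder for sentence `i + 1`, which is omitted here.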

KL Objective:

$\begin{split} \log p_{\theta}(x) &= \log \int_{z} p_{\theta}(x,z)\,dz \\ &= \log \int_{z} q_\phi(z|x) \frac{p_{\theta}(x,z)}{q_\phi(z|x)}\,dz \\ &\ge \int_{z} q_\phi(z|x) \log \frac{p_\theta(x,z)}{q_\phi(z|x)}\,dz \quad \text{(Jensen's inequality)} \\ &= \mathbb E_{z\sim q_\phi(z|x)} [\log p_\theta(x,z)-\log q_\phi(z|x)] \\ &= {\cal L}(x,\theta,\phi) \end{split}$

The joint can be factored in two ways, and each gives a different reading of the bound. With $\log p(x,z)=\log p(x)+\log p(z|x)$:

$\begin{split} {\cal L}(x,\theta,\phi) &= \mathbb E_{z\sim q(z|x)} [\log p(x)+\log p(z|x)-\log q(z|x)] \\ &= \mathbb E_{z\sim q(z|x)} \left[\log p(x)-\log \frac{q(z|x)}{p(z|x)}\right] \\ &= \log p(x) - \mathbb E_{z\sim q(z|x)} \left[\log \frac{q(z|x)}{p(z|x)}\right] \\ &= \log p_\theta(x) - D_{KL}(q_\phi(z|x)\,\|\,p_\theta(z|x)) \end{split}$

With $\log p(x,z)=\log p(x|z)+\log p(z)$:

$\begin{split} {\cal L}(x,\theta,\phi) &= \mathbb E_{z\sim q(z|x)} [\log p(x|z)+\log p(z)-\log q(z|x)] \\ &= \mathbb E_{z\sim q(z|x)} \left[\log p(x|z)-\log \frac{q(z|x)}{p(z)}\right] \\ &= \mathbb E_{z\sim q(z|x)} [\log p(x|z)] - \mathbb E_{z\sim q(z|x)} \left[\log \frac{q(z|x)}{p(z)}\right] \\ &= \mathbb E_{z\sim q_\phi(z|x)} [\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,p_\theta(z)) \end{split}$
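As a sanity check, the two forms of ${\cal L}$, namely $\log p(x) - D_{KL}(q(z|x)\,\|\,p(z|x))$ and $\mathbb E_{q}[\log p(x|z)] - D_{KL}(q(z|x)\,\|\,p(z))$, can be compared numerically on a tiny discrete model where every quantity is a finite sum. The probabilities below are made up for illustration and have nothing to do with the actual model.

```python
import math

# Toy model: a binary latent z and one fixed observation x.
p_z = [0.6, 0.4]          # prior p(z)
p_x_given_z = [0.2, 0.9]  # likelihood p(x | z) of the observed x
q = [0.3, 0.7]            # an arbitrary approximate posterior q(z | x)

# Evidence p(x) and exact posterior p(z | x) by Bayes' rule.
p_x = sum(pz * px for pz, px in zip(p_z, p_x_given_z))
posterior = [pz * px / p_x for pz, px in zip(p_z, p_x_given_z)]

def kl(a, b):
    """KL divergence between two discrete distributions with full support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

# Form 1: log p(x) - KL(q || p(z|x)).
elbo_posterior_form = math.log(p_x) - kl(q, posterior)

# Form 2: E_q[log p(x|z)] - KL(q || p(z)).
expected_loglik = sum(qi * math.log(px) for qi, px in zip(q, p_x_given_z))
elbo_prior_form = expected_loglik - kl(q, p_z)

gap = math.log(p_x) - elbo_prior_form  # equals KL(q || p(z|x)), so it is >= 0
```

Both forms agree up to floating-point error, and `gap` shrinks to zero when `q` is set to `posterior`, which is when the bound is tight.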

• KL divergence between two multivariate Gaussians: $D_{KL}(q\,\|\,p)=\frac{1}{2} \left(\log \frac{|\Sigma_p|}{|\Sigma_q|} -d+tr(\Sigma^{-1}_p\Sigma_q)+(\mu_p-\mu_q)^{T}\Sigma^{-1}_p(\mu_p-\mu_q)\right)$
• If the dimensions of $z_i$ are mutually independent, so that $\Sigma$ is a diagonal matrix with the vector $\vec \sigma$ on its diagonal, then summing over the $d$ dimensions $k$ gives: $D_{KL}(q_i\,\|\,p_i)=\frac{1}{2} \left(\sum_{k}\log \frac{\sigma_{p_i,k}}{\sigma_{q_i,k}} -d+\sum_{k} \frac{\sigma_{q_i,k}}{\sigma_{p_i,k}}+\sum_{k}\frac{(\mu_{p_i,k}-\mu_{q_i,k})^{2}}{\sigma_{p_i,k}}\right)$
• $D_{KL}(q\,\|\,p)=D_{KL}(q\,\|\,p) \otimes M$, where $M$ is the mask cost matrix
• We want $p(z_1)\approx \begin{cases} \mathcal{N}(\vec{\mu_1}_{empirical},\vec{\sigma_1}_{empirical}) \text{ or } \mathcal{N}(0,I) & \quad \text{when Reconstruct} \\ \mathcal{N}(f^{mlp}_{\mu}(H^{enc2lat}_{1}),f^{mlp}_{\sigma}(H^{enc2lat}_{1})) & \quad \text{when Learning} \end{cases}$ and $p(z_i|z_{i-1}) \approx q(z_i|z_{i-1},x_i)=\mathcal{N}(f^{mlp}_{\mu}(H^{enc2lat}_{i}),f^{mlp}_{\sigma}(H^{enc2lat}_{i}))$
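The diagonal-covariance KL term and the masking step from the list above can be sketched as follows. The names `kl_diag_gaussians` and `masked_kl_total` are hypothetical, the `var_*` arguments are the diagonal entries of $\Sigma$ (variances), and the mask is assumed to be a 0/1 weight per latent step that zeroes out padded positions; the notes only call it a "mask cost matrix", so that reading is a guess.

```python
import math

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for Gaussians with diagonal covariances given as variance lists."""
    d = len(mu_q)
    log_det_ratio = sum(math.log(vp / vq) for vq, vp in zip(var_q, var_p))
    trace_term = sum(vq / vp for vq, vp in zip(var_q, var_p))
    mean_term = sum((mp - mq) ** 2 / vp for mq, mp, vp in zip(mu_q, mu_p, var_p))
    return 0.5 * (log_det_ratio - d + trace_term + mean_term)

def masked_kl_total(kl_per_step, mask):
    """Weight each step's KL by its mask entry and sum (assumed 0/1 mask)."""
    return sum(k * m for k, m in zip(kl_per_step, mask))

# Hypothetical posterior and prior parameters for two latent steps.
kl_steps = [
    kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]),            # means differ by 1
    kl_diag_gaussians([0.0, 0.0], [4.0, 1.0], [0.0, 0.0], [1.0, 1.0]),
]
total = masked_kl_total(kl_steps, [1.0, 0.0])  # second step is masked out
```

With the mask `[1.0, 0.0]` only the first step contributes, so padded sentences would not add to the KL cost under this reading.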