Chinese Song Ci (Iambics) Generation: From Overview to VAE

Presented by: Xinyu Liu Partner: Xinwei Chen Supervised by: Yongyi Mao 2017/03/17

Outline

- Introduction
- Related Work
- Model
- Apply VAE
- Future Work

Introduction —

What is Song Ci?

The Adagio of Resonance (声声慢)

++--,--++,++----。
寻寻觅觅，冷冷清清，凄凄惨惨戚戚。
I look for what I miss; I know not what it is. I feel so sad, so drear, so lonely, without cheer.

--+++-,-++-。
乍暖还寒时候，最难将息。
How hard is it to keep me fit in this lingering cold!

++----,--+、-++-?
三杯两盏淡酒，怎敌他、晚来风急？
Hardly warmed up by cup on cup of wine so dry, oh, how could I endure at dusk the drift of wind so swift?

---,-++、---++-。
雁过也，正伤心，却是旧时相识。
It breaks my heart, alas! To see the wild geese pass, for they are my acquaintances of old.

--+++-,+--、++-++-。
满地黄花堆积，憔悴损，如今有谁堪摘？
The ground is covered with yellow flowers, faded and fallen in showers. Who will pick them up now?

--++,---+--。
守着窗儿，独自怎生得黑？
Sitting alone at the window, how could I but quicken the pace of darkness which won't thicken?

++-+--,-++、----。
梧桐更兼细雨，到黄昏、点点滴滴。
On parasol-trees leaves a fine rain drizzles as twilight grizzles.

---,---+---。
这次第，怎一个愁字了得！
Oh! What can I do with a grief beyond belief?

Deep-Learning-Based Generation Methods

$P_{\theta}(w_1,\dots,w_N)=\displaystyle\prod_{n=1}^{N}P_{\theta}(w_n|w_{<n})$

RNNLM: given a sequence of words as input, a temporal (recurrent) model first produces a sentence encoding C; it then generates each next token conditioned on C and the previously generated token.

(figure: rnnlm)

SEQ2SEQ: the decoder is fed ground-truth tokens during training (teacher forcing).

(figure: basic_seq2seq)
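The left-to-right factorization above can be made concrete with a minimal numpy sketch of an RNN language model. All weights here are random and purely illustrative (the sizes `V`, `H` and the weight names are assumptions, not the model used in this work); a trained model would learn them by maximizing this log-probability.

```python
import numpy as np

# Hypothetical toy sizes; a real model would learn these weights.
rng = np.random.default_rng(0)
V, H = 10, 8                              # vocabulary size, hidden size
E  = rng.normal(scale=0.1, size=(V, H))   # token embeddings
Wh = rng.normal(scale=0.1, size=(H, H))   # recurrent weights
Wo = rng.normal(scale=0.1, size=(H, V))   # output projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sequence_log_prob(tokens):
    """log P(w_1..w_N) = sum_n log P(w_n | w_<n), via a plain RNN."""
    h = np.zeros(H)                       # hidden state summarizes w_<n
    logp = 0.0
    for w in tokens:
        probs = softmax(h @ Wo)           # P(w_n | w_<n)
        logp += np.log(probs[w])
        h = np.tanh(E[w] + h @ Wh)        # fold w_n into the context
    return logp

print(sequence_log_prob([1, 4, 2]))
```

The returned value is a sum of log-probabilities, so it is always negative; generation simply samples from `softmax(h @ Wo)` instead of scoring a given token.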

Deep-Learning-Based Generation Methods (cont'd)

$\begin{equation}\begin{split} P_{\theta}(U_1,\dots,U_M) &= \displaystyle\prod_{m=1}^{M}P_{\theta}(U_m|U_{<m}) \\ &= \displaystyle\prod_{m=1}^{M}\prod_{n=1}^{N_m}P_{\theta}(w_{m,n}|w_{m,<n},U_{<m}) \end{split}\end{equation}$

Dialogue: a more complicated setting; the model has a group of word-level encoders/decoders and a sequence of context-level representations built on top of them.

(figure: HRED)
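The two-level factorization can likewise be sketched in a few lines of numpy: a word-level recurrence scores tokens within an utterance, and a context-level recurrence carries information across utterances, HRED-style. Weights and sizes are again random, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 10, 8
E  = rng.normal(scale=0.1, size=(V, H))   # token embeddings
Ww = rng.normal(scale=0.1, size=(H, H))   # word-level recurrence
Wc = rng.normal(scale=0.1, size=(H, H))   # context-level recurrence
Wo = rng.normal(scale=0.1, size=(H, V))   # output projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dialogue_log_prob(utterances):
    """log P(U_1..U_M) = sum_m sum_n log P(w_{m,n} | w_{m,<n}, U_{<m})."""
    c = np.zeros(H)                       # context state summarizes U_<m
    logp = 0.0
    for tokens in utterances:
        h = c.copy()                      # word-level state starts from context
        for w in tokens:
            probs = softmax(h @ Wo)       # P(w_{m,n} | w_{m,<n}, U_{<m})
            logp += np.log(probs[w])
            h = np.tanh(E[w] + h @ Ww)
        c = np.tanh(h @ Wc + c)           # fold finished utterance into context
    return logp

print(dialogue_log_prob([[1, 4], [2, 3, 5]]))
```

Each inner loop is one factor of the inner product over $n$; updating `c` after the loop realizes the conditioning on $U_{<m}$.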

Our Model: Phrases & Formulation —

Training

Reconstruction/Generation

(figure: reconstruct)

Apply VAE: Purpose —

Apply VAE: Framework —

(figure: vae)

We force the approximate posterior q(z|x) to be as close as possible to the ground-truth prior p(z); at generation time we then sample from the prior p(z) to obtain an $\hat x$ that is close to, but not identical with, the original.

Apply VAE: Theory —

Apply VAE: Limitation —

- Assumptions on the data distribution
- Hard to train in a way that yields a meaningful latent representation

Future Work —

Questions and Thanks!