Chinese Song Ci (Iambics) Generation: From Overview to VAE

Presented by: Xinyu Liu
Partner: Xinwei Chen
Supervised by: Yongyi Mao
2017/03/17

Outline

Introduction
Related Work
Model
Apply VAE
Future Work

Introduction

What is Song Ci?
  • Ci is a poetic form that grew out of Shi (poem) and flourished in the Song dynasty, hence the name Song Ci.

  • Ci is a type of lyric, usually set to music under a named tune, such as 'The Adagio of Resonance' (声声慢).


Why choose Song Ci, and its potential difficulties
  • structure constraints: compared to traditional 5/7-character poems, Song Ci:

    • is written with uneven lengths of lines and verses

    • follows a format different from the ancient poetic style

    • has lyrics in a regular format, set to fixed tunes (vowels)

  • tone constraints:

    • once the Ci's name (tune, 词牌) is given, the number of words (characters) is fixed, and a tone constraint is set for each character position

    • every character has its own tone(s)

      • nī ní nǐ nì

      • 妮 倪 你 腻

    • $+\in\{\bar{\ },\ \acute{\ }\}$ denotes ping (平)

    • $-\in\{\check{\ },\ \grave{\ }\}$ denotes ze (仄)

  • rhyming constraints:

    • usually the last characters of the current and the next line must share the same or a similar vowel sound

The Adagio of Resonance (声声慢)

++--,--++,++----
寻寻觅觅,冷冷清清,凄凄惨惨戚戚。I look for what I miss: I know not what it is. I feel so sad, so drear, So lonely, without cheer.
--+++-,-++-
乍暖还寒时候,最难将息。How hard is it; To keep me fit; In this lingering cold!
++----,--+、-++-
三杯两盏淡酒,怎敌他、晚来风急?Hardly warmed up; By cup on cup; Of wine so dry, Oh, how could I; Endure at dusk the drift; Of wind so swift?
---,-++、---++-
雁过也,正伤心,却是旧时相识。It breaks my heart, alas! To see the wild geese pass, For they are my acquaintances of old.

--+++-,+--、++-++-
满地黄花堆积,憔悴损,如今有谁堪摘?The ground is covered with yellow flowers, Faded and fallen in showers. Who will pick them up now?
--++,---+--
守着窗儿,独自怎生得黑?Sitting alone at the window, how; Could I but quicken; The pace of darkness which won't thicken?
++-+--,-++、----
梧桐更兼细雨,到黄昏、点点滴滴。On parasol-trees leaves a fine rain drizzles As twilight grizzles.
---,---+---
这次第,怎一个愁字了得?Oh! What can I do with a grief; Beyond belief?

  • Given the Ci's name (The Adagio of Resonance, 声声慢):

    • every position has its prescribed tone, and the line lengths are uneven (a minimal validity check is sketched below)
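
As a concrete illustration, here is a minimal Python sketch of checking one segment against a tune's tone pattern; the `PINGZE` table is a hypothetical stand-in for a real character-to-tone dictionary, not part of the presented model:

```python
# A minimal sketch, not the presenters' code: PINGZE is a toy
# character-to-tone lookup ('+' = ping, '-' = ze).
PINGZE = {'寻': '+', '觅': '-', '冷': '-', '清': '+'}

def matches_tone_pattern(segment, pattern):
    """Check that `segment` has the length and tones demanded by `pattern`.

    `pattern` is a string over {'+', '-'}, one symbol per character,
    e.g. '++--' for a 4-character segment.
    """
    if len(segment) != len(pattern):
        return False  # the tune fixes the number of characters
    return all(PINGZE.get(ch) == tone for ch, tone in zip(segment, pattern))

print(matches_tone_pattern('寻寻觅觅', '++--'))  # True with the toy table
```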

Traditional Generation Methods

Work on machine poetry generation dates back to the 1970s; the major approaches are:

  • Word Salad: the earliest approach, based only on permuting phrases/words (no attention to grammar or semantics)

  • Template Model: like missing-word imputation or a cloze test, it removes some words from existing poems and fills them back in (lacks flexibility)

  • Genetic algorithm: treats poem generation as a search over a state space, using a pre-defined evaluation function to iterate over each sentence (ignores the relations between sentences)

  • Summarization: treats poem generation as abstract (summary) generation based on the user's intent

  • Machine Translation: treats the previous sentence as the source and generates the next sentence as the target (suffers from topic-shift/off-topic problems)

Deep-Learning Generation Methods

$P_{\theta}(w_1,\dots,w_N)=\displaystyle\prod_{n=1}^{N}P_{\theta}(w_n\mid w_{<n})$
RNNLM: given a sequence of words as encoder input, a temporal (recurrent) model first compresses the sentence into a representation $C$; then, given $C$ and the last token, it generates the next token.
[Figure: RNNLM]
Seq2Seq: during training, the decoder is fed the ground-truth tokens as inputs (teacher forcing).
[Figure: basic seq2seq]
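
As a sketch of the factorization above (not the presenters' implementation; the vocabulary size and dimensions are placeholders), a minimal PyTorch-style recurrent language model:

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Minimal recurrent LM: models P(w_n | w_<n) at every step."""
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        h, _ = self.rnn(self.embed(tokens))   # (batch, seq_len, hidden)
        return self.out(h)                    # next-token logits per step

model = RNNLM()
logits = model(torch.randint(0, 5000, (2, 7)))  # 2 sequences of length 7
# Training uses teacher forcing: inputs are the ground-truth tokens and
# the cross-entropy target at step n is token n+1.
```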

Deep-Learning Generation Methods (Cont'd)

$\begin{aligned}P_{\theta}(U_1,\dots,U_M)&=\displaystyle\prod_{m=1}^{M}P_{\theta}(U_m\mid U_{<m})\\&=\displaystyle\prod_{m=1}^{M}\prod_{n=1}^{N_m}P_{\theta}(w_{m,n}\mid w_{m,<n},U_{<m})\end{aligned}$
Dialogue generation is more complicated: it has a group of word-level encoders/decoders and a sequence of context-level representations built on top of them.
[Figure: HRED]
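
A rough sketch of the hierarchical idea under the same placeholder assumptions (a word-level encoder produces one vector per sentence, and a context-level RNN runs over those vectors); this is illustrative, not the HRED reference implementation:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Encode each sentence to one vector, then run a context-level RNN
    over the sequence of sentence vectors (the HRED idea, sketched)."""
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.ctx_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, sentences):
        # sentences: (batch, n_sent, n_words) integer ids
        b, m, n = sentences.shape
        _, h = self.word_rnn(self.embed(sentences.view(b * m, n)))
        sent_vecs = h.squeeze(0).view(b, m, -1)  # one vector per sentence
        ctx, _ = self.ctx_rnn(sent_vecs)         # context at each position
        return ctx

enc = HierarchicalEncoder()
ctx = enc(torch.randint(0, 5000, (2, 4, 6)))     # 2 texts, 4 lines, 6 words
```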

Our Model: Phases & Formulation

  • Phases:

    1. Intention Representation

    2. Ci Generation

  • Notations:

    • $W_v=\{w_1,w_2,\dots,w_n\}$: a set of keywords expressing the user's intention

    • $c\in C$: a Ci name/tune (词牌名) from the collection $C$

    • $D=\{w_1,w_2,\dots,w_n\}$: a sentence, i.e., a word sequence

    • $P=\{D_1,D_2,\dots,D_n\}$: a Song Ci, i.e., a sentence sequence

  • Formulation:

    • given the user-specified Ci name/tune and keywords as inputs,

    • generate a Ci as the output (a toy interface is sketched below):

    • $P=f(c,W_v)$

  • Summary:

    • hierarchical variational auto-encoder

      • word level

      • context/sentence level

    • latent representations (vectors)

      • core idea behind VAE
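
A purely illustrative sketch of the formulation $P=f(c,W_v)$; the `TEMPLATES` table and the decoder callback are hypothetical stand-ins, not the actual model:

```python
# TEMPLATES maps a tune name to its per-segment tone patterns (toy: only
# the first three segments of 声声慢); decode_line is a stand-in decoder.
TEMPLATES = {'声声慢': ['++--', '--++', '++----']}

def generate_ci(ci_name, keywords, decode_line):
    """P = f(c, W_v): tune name c plus keywords W_v -> a list of lines."""
    template = TEMPLATES[ci_name]            # the tune fixes lengths/tones
    return [decode_line(keywords, pattern) for pattern in template]

# Dummy decoder that only shows the interface shape.
dummy = lambda keywords, pattern: '?' * len(pattern)
print(generate_ci('声声慢', ['秋', '愁'], dummy))  # ['????', '????', '??????']
```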

Training

Reconstruction / Generation
[Figure: reconstruction/generation pipeline]

Apply VAE: Purpose

  • What is a variational auto-encoder (VAE)?

    • a generative auto-encoder framework

    • learns simple and meaningful feature representations ($\vec{z}$) via encoding and inference

    • generates a new output ($\hat{x}$) given $\vec{z}$

  • Why use a VAE?

    • it not only captures a more compact latent representation ($\vec{z}$) of the data,

    • but $\vec{z}$ is also interpretable

    • once we have a good and strong enough $\vec{z}$, we can:

      • build a connection between the user's intent and the latent representation

      • create/reconstruct vivid data without the source $x$

      • or, for more fun, play with and manipulate it

Apply VAE: Framework

[Figure: VAE framework]
We force the approximate posterior $q(z|x)$ to be as close to the ground-truth prior $p(z)$ as possible; we can then sample from the prior $p(z)$ to obtain a close, but not identical, $\hat x$.
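
A minimal sketch of the usual sampling step (the reparameterization trick), assuming a diagonal-Gaussian posterior; the 16-dimensional latent size is a placeholder:

```python
import torch

def reparameterize(mu, logvar):
    """Differentiable sample z ~ q(z|x) = N(mu, diag(exp(logvar)))."""
    eps = torch.randn_like(mu)                # noise from N(0, I)
    return mu + torch.exp(0.5 * logvar) * eps

# At generation time the encoder is dropped entirely: sample z from the
# prior p(z) = N(0, I) and decode it into a new, non-identical x-hat.
z_prior = torch.randn(1, 16)                  # 16-dim latent (placeholder)
```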

Apply VAE: Theory

  • Notation:
    $D_{KL}$: Kullback–Leibler divergence (a measure of how close two distributions are)

  • Maximize the log likelihood → marginalize the joint distribution over $z$; but this marginal is intractable → variational inference with $q(z|x)$

  • Objective: Maximize the lower bound of marginal log likelihood

    • $\text{Maximize } {\cal L}(x,\theta,\phi)=\mathbb E_{z\sim q_\phi(z|x)}[\log p_\theta(x|z)]-D_{KL}(q_\phi(z|x)\,\|\,p_\theta(z))$

      $\begin{aligned}\log p_{\theta}(x)&=\log\int_{z}p_{\theta}(x,z)\,dz\\&=\log\int_{z}q_\phi(z|x)\,\frac{p_{\theta}(x,z)}{q_\phi(z|x)}\,dz\\&\ge\int_{z}q_\phi(z|x)\log\frac{p_{\theta}(x,z)}{q_\phi(z|x)}\,dz\qquad\text{(Jensen's inequality)}\\&=\mathbb E_{z\sim q_\phi(z|x)}\big[\log p_{\theta}(x,z)-\log q_\phi(z|x)\big]\end{aligned}$

      Expanding $\log p_{\theta}(x,z)=\log p_{\theta}(x|z)+\log p_{\theta}(z)$ yields the bound itself:

      $\begin{aligned}&=\mathbb E_{z\sim q_\phi(z|x)}\big[\log p_{\theta}(x|z)+\log p_{\theta}(z)-\log q_\phi(z|x)\big]\\&=\mathbb E_{z\sim q_\phi(z|x)}[\log p_{\theta}(x|z)]-\mathbb E_{z\sim q_\phi(z|x)}\Big[\log\frac{q_\phi(z|x)}{p_{\theta}(z)}\Big]\\&=\mathbb E_{z\sim q_\phi(z|x)}[\log p_{\theta}(x|z)]-D_{KL}\big(q_\phi(z|x)\,\|\,p_{\theta}(z)\big)={\cal L}(x,\theta,\phi)\end{aligned}$

      Expanding instead with $\log p_{\theta}(x,z)=\log p_{\theta}(x)+\log p_{\theta}(z|x)$ shows the gap:

      $\begin{aligned}&=\mathbb E_{z\sim q_\phi(z|x)}\big[\log p_{\theta}(x)+\log p_{\theta}(z|x)-\log q_\phi(z|x)\big]\\&=\log p_{\theta}(x)-D_{KL}\big(q_\phi(z|x)\,\|\,p_{\theta}(z|x)\big)\end{aligned}$

      so $\log p_{\theta}(x)={\cal L}(x,\theta,\phi)+D_{KL}(q_\phi(z|x)\,\|\,p_{\theta}(z|x))\ge{\cal L}(x,\theta,\phi)$, with equality when $q_\phi(z|x)=p_{\theta}(z|x)$.
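
In code, this objective is typically implemented with the closed-form Gaussian KL term; a minimal PyTorch sketch, assuming a diagonal-Gaussian posterior and a standard-normal prior (the distributional assumption noted on the next slide):

```python
import torch
import torch.nn.functional as F

def negative_elbo(logits, targets, mu, logvar):
    """-L(x, theta, phi) = reconstruction loss + KL(q(z|x) || N(0, I)).

    The KL term is the closed form for a diagonal-Gaussian posterior
    against a standard-normal prior; minimizing this maximizes the bound.
    """
    recon = F.cross_entropy(logits, targets, reduction='sum')  # -log p(x|z)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```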

Apply VAE: Limitation

  • makes assumptions on the data distribution

  • hard to train in a way that yields a meaningful latent representation

Future Work

  • CVAE: conditioned on topics, phrases, or keywords

  • GAN (Generative Adversarial Nets):

    • no explicit assumption on the data distribution

    • representation decoupling (decomposing semantics): latent vector arithmetic, as sketched below
      [Figure: latent vector arithmetic]
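
A toy NumPy sketch of the latent-vector-arithmetic idea; the codes here are random placeholders, not learned representations:

```python
import numpy as np

# Toy illustration only: if directions in z-space carry semantics,
# adding and subtracting latent codes edits attributes before decoding.
z_sad_autumn = np.random.randn(16)     # placeholder 16-dim latents
z_sad = np.random.randn(16)
z_joyful = np.random.randn(16)
z_joyful_autumn = z_sad_autumn - z_sad + z_joyful  # decode this to generate
```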

Questions and Thanks!