trainer module

class trainer.Vrae(config, step)

Bases: torch.nn.modules.module.Module

Vrae model, the high-level abstract API of the SongCi project


initialize dimensions (hyper-parameters) and weights for every layer/unit

decode(decode_input, select_index, tone_index, vowel_index)

decoder workflow

  • decode_input (max(S), sum(C)): a 2d tensor of vocab indices after padding, with the max char sequence as the time major.
  • select_index: a 1d list containing the valid sentence indices on C.
  • tone_index (max(S), sum(C)): a 2d list of tone indices.
  • vowel_index (max(S), sum(C)): a 2d list of vowel indices.
each sentence's first hidden state for the decoder -> char_level_decoder -> fill max(S) for each sentence -> greedy_decode (argmax) -> best match index
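The final greedy step above can be sketched as follows; shapes and names here are illustrative, not the project's actual API. Each decode time step yields a row of vocab scores, and argmax over each row picks the best-match index:

```python
# Hedged sketch of greedy_decode: argmax over each time step's vocab scores.
def greedy_decode(logits):
    """logits: per-step lists of vocab scores, shape (max(S), vocab_size)."""
    return [max(range(len(step)), key=step.__getitem__) for step in logits]

print(greedy_decode([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))  # [1, 0]
```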


encode(input, padded_sentence, length_sequence)

encoder workflow

  • S: flattened 1D list containing the sentence length sequence over the batch (the length of each sentence in every ci of the batch).
  • C: 1D list containing the ci length sequence over the batch (the number of sentences in each ci of the batch).
  • max(S): max sentence length over the batch.
  • max(C): max ci length over the batch (the number of sentences in the ci with the most sentences).
  • sum(C): total number of sentences over the batch.
  • B: batch size (the number of ci in the batch).
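A toy batch of two ci (illustrative lengths only) shows how the notation above is derived:

```python
# batch of 2 ci: ci 1 has sentences of length [5, 7], ci 2 has [4, 6, 6]
batch = [[5, 7], [4, 6, 6]]

S = [l for ci in batch for l in ci]   # flattened sentence lengths
C = [len(ci) for ci in batch]         # number of sentences per ci
B = len(batch)                        # batch size

print(S, C, max(S), max(C), sum(C), B)  # [5, 7, 4, 6, 6] [2, 3] 7 3 5 2
```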

  • input (max(S), sum(C)): a 2d tensor of vocab indices after padding, with the max char sequence as the time major.
  • padded_sentence dict['forward', 'backward'] -> (max(C), B): a 2d tensor of sentence indices after padding, for both the forward and backward char sequences.
  • length_sequence dict['ci', 'sentence']: contains the 1d list S and the 1d list C.

input -> char_level_encoder -> each last char hidden state as each sentence representation -> sentence_level_encoder -> each sentence_sequence representation (H_enc) for the encoder -> first_sentence_level_encoder -> ci_level representation -> replace 'each first sentence_sequence representation' with 'ci_level representation' -> the 0th q_z_0 or p_z_0 is zero-initialized -> loop i from 0 to max(C) ->

concat q_z_i with H_enc and concat p_z_i with H_enc correspondingly -> encoder_to_latent_layer -> mu, log_var for q_z_i, p_z_i -> sample q_z_i (the inferred latent sentence sequence representation) sample_times times; sample p_z_i (the assumed ground-truth latent representation z) once ->

latent_to_decoder_layer -> each sentence_sequence representation (H_dec) for the decoder

if 'using_first_sentence' is true, p_z_1 reuses q_z_1, so the first sentence hidden output is exactly the same for q_z_1 and p_z_1
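The sampling step described above, producing z from mu and log_var, is the usual reparameterization trick. A hedged pure-Python sketch (names and shapes are illustrative, not the project's code):

```python
import math
import random

# Hedged sketch: draw z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1),
# repeated sample_times times as the docstring describes for q_z_i.
def sample_latent(mu, log_var, sample_times=1, seed=0):
    rng = random.Random(seed)
    return [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
            for _ in range(sample_times)]
```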
get_last_hidden_state(input, sequence_array, min_seq)
  • get the last char hidden state for each single sentence representation via S (applied after char_level_encoder)
  • get the first sentence hidden state (which is the last for the backward input) for each single ci representation via C (applied after sentence_level_encoder)
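The gather described above can be sketched in pure Python (shapes are illustrative): with time-major hidden states of shape (T, N, H) and each sequence's true length, the last valid step of sequence n sits at hidden[lengths[n] - 1][n].

```python
# Hypothetical sketch of picking the last valid hidden state per sequence.
def last_hidden(hidden, lengths):
    """hidden: (T, N, H) nested list; lengths: true length of each sequence."""
    return [hidden[l - 1][n] for n, l in enumerate(lengths)]

# two sequences (H = 1) with lengths 2 and 3 pick steps 1 and 2 respectively
hidden = [[[1], [10]], [[2], [20]], [[3], [30]]]
print(last_hidden(hidden, [2, 3]))  # [[2], [30]]
```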



initialize each layer's first hidden state, currently using zero initialization


load the model from sub_model_path, picking the latest checkpoint via a regex over file names of the form Vrae_{epochs}_{steps}.pth
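The lookup could be sketched as below; file names are illustrative, and only the Vrae_{epochs}_{steps}.pth pattern comes from the docstring. Epochs and steps are parsed as integers and the numeric maximum wins:

```python
import re

# Hedged sketch of resolving the newest checkpoint by (epochs, steps).
def latest_checkpoint(names):
    pattern = re.compile(r"Vrae_(\d+)_(\d+)\.pth$")
    found = [(int(m.group(1)), int(m.group(2)), n)
             for n in names if (m := pattern.search(n)) is not None]
    return max(found)[2] if found else None

print(latest_checkpoint(["Vrae_1_100.pth", "Vrae_2_50.pth", "notes.txt"]))
```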

loss_function(ground_truth_x, reconstruct_x, mask_weights)
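The body of loss_function is not documented here; a minimal sketch of what the masked reconstruction term could look like, assuming reconstruct_x holds per-position probability rows over the vocab and mask_weights zeroes out padded positions (the KL terms tracked in write_summary would be added separately):

```python
import math

# Hedged sketch, NOT the project's actual loss: masked negative
# log-likelihood of each ground-truth index under reconstruct_x.
def masked_nll(ground_truth_x, reconstruct_x, mask_weights):
    return sum(-w * math.log(probs[y])
               for y, probs, w in zip(ground_truth_x, reconstruct_x, mask_weights))

# second position is padding (weight 0), so only the first term contributes
print(masked_nll([0, 1], [[0.5, 0.5], [0.25, 0.75]], [1.0, 0.0]))
```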
padSentence(padded_sentence, out)

pad extra sentences, from (sum(C),sentence_hidden) to (max(C)*B,sentence_hidden)
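The reshape described above can be sketched in pure Python (hidden size and values are illustrative): out packs one row per real sentence (sum(C) rows), and each ci is padded with zero rows up to max(C), yielding max(C) * B rows:

```python
# Hedged sketch of padSentence: insert zero rows after each ci's sentences.
def pad_sentence(out, C):
    """out: (sum(C), hidden) nested list; C: sentences per ci."""
    max_c, hidden = max(C), len(out[0])
    padded, i = [], 0
    for c in C:
        padded.extend(out[i:i + c])
        padded.extend([[0.0] * hidden for _ in range(max_c - c)])
        i += c
    return padded

# two ci with 1 and 2 sentences (hidden = 1) -> max(C) * B = 4 rows
print(pad_sentence([[1.0], [2.0], [3.0]], C=[1, 2]))  # [[1.0], [0.0], [2.0], [3.0]]
```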

reconstruct(decode_input, select_index, tone_index, vowel_index)

same as the 'decode' function

tensor(inputs, is_float_type=True)

helper function turning an input into either a GPU or a CPU tensor of the specified type [long or float]
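Such a helper could look like the following; this is an assumed behavior sketch, not the exact source:

```python
import torch

# Hedged sketch: build a float or long tensor, move it to GPU if available.
def to_tensor(inputs, is_float_type=True):
    t = torch.tensor(inputs, dtype=torch.float if is_float_type else torch.long)
    return t.cuda() if torch.cuda.is_available() else t
```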


test/generating loop


training loop


validation during training

variable(inputs, is_tensor=True, is_float_type=True)

helper function wrapping a tensor into either a GPU or a CPU variable

write_summary(rl_per_char, kl_obj_per_char, kl_cost_per_char, l, p, y, is_train=True)

summary for tensorboardX visualization

trainer.one_hot(index, dim)

fills a one-hot tensor given a 2d index list and the dimension
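A pure-Python sketch of the behavior (the real function presumably returns a torch tensor): each index in the 2d list expands into a one-hot row of length dim:

```python
# Hedged sketch of one_hot: index is a 2d list of vocab indices.
def one_hot(index, dim):
    return [[[1.0 if j == i else 0.0 for j in range(dim)] for i in row]
            for row in index]

print(one_hot([[1, 0]], 3))  # [[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]]
```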

initialize weights for the whole model w.r.t. each layer or unit: currently the default init for every unit except GRU units, which use orthogonal init

m: an instance of a typical unit / layer
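Such a per-unit init hook, applied via model.apply, could be sketched as below; this is a hedged reconstruction of the described behavior (default init everywhere, orthogonal init for GRU weight matrices), not the project's exact code:

```python
import torch
import torch.nn as nn

# Hedged sketch: leave units at their default init, except GRU weights,
# which get orthogonal init as the docstring describes.
def weights_init(m):
    if isinstance(m, nn.GRU):
        for name, param in m.named_parameters():
            if name.startswith("weight"):
                nn.init.orthogonal_(param)
```

Usage: `model.apply(weights_init)` visits every submodule, so only GRU units are touched.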