trainer module

class trainer.Vrae(config, step)

Bases: torch.nn.modules.module.Module

Vrae model, the high-level, abstract API for the SongCi project

build_model()

initialize dimensions (hyper-parameters) and weights for every layer/unit in model.py

decode(decode_input, select_index, tone_index, vowel_index)

decoder workflow

Args:
  • decode_input (max(S),sum(C)): a 2d tensor of vocab indices after padding, with the max char sequence as the time-major dimension.
  • select_index: a 1d list containing the valid sentence indices over C.
  • tone_index (max(S),sum(C)): a 2d list of tone indices.
  • vowel_index (max(S),sum(C)): a 2d list of vowel indices.
Flow:
each sentence's first hidden state for the decoder -> char_level_decoder -> fill max(S) for each sentence -> greedy_decode (argmax) -> best match index
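
A minimal sketch of the greedy_decode step named above, assuming the decoder emits logits of shape (max(S), sum(C), vocab); the shape is inferred from the Args, not stated in the source:

    import torch

    def greedy_decode(logits):
        # logits: (max(S), sum(C), vocab) scores from the char_level_decoder
        # argmax over the vocab dimension yields the best-match index
        return logits.argmax(dim=-1)  # (max(S), sum(C))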

encode(input, padded_sentence, length_sequence)

encoder workflow

Notations:
  • S: flattened 1d list containing the sentence length sequence (in characters) of each ci over the batch.
  • C: 1d list containing the ci length sequence (number of sentences per ci) over the batch.
  • max(S): max sentence length over the batch.
  • max(C): max ci length over the batch (the number of sentences in the ci with the most sentences).
  • sum(C): total number of sentences over the batch.
  • B: batch size (the number of ci in the batch).
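
For example, a batch of B = 2 ci, where the first ci has 3 sentences of lengths 5, 7 and 5 characters and the second has 2 sentences of lengths 4 and 6, gives S = [5, 7, 5, 4, 6], C = [3, 2], max(S) = 7, max(C) = 3 and sum(C) = 5.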

Args:
  • input (max(S),sum(C)): a 2d tensor of vocab indices after padding, with the max char sequence as the time-major dimension.
  • padded_sentence dict['forward','backward'] -> (max(C),B): a 2d tensor of sentence indices after padding, for both the forward and the backward char sequence.
  • length_sequence dict['ci','sentence']: contains the 1d list S and the 1d list C.
Flow:

input -> char_level_encoder -> each last char hidden state as each sentence representation -> sentence_level_encoder -> each sentence_sequence representation (H_enc) for the encoder -> first_sentence_level_encoder -> ci_level representation -> replace 'each first sentence_sequence representation' with the 'ci_level representation' -> the 0th q_z_0 or p_z_0 is zero-initialized -> loop i from 0 to max(C) ->

concat q_z_i with H_enc and concat p_z_i with H_enc correspondingly -> encoder_to_latent_layer -> mu, log_var for q_z_i and p_z_i -> sample q_z_i (the inferred latent sentence_sequence representation) sample_times times; sample p_z_i (the assumed true latent representation z) once ->

latent_to_decoder_layer -> each sentence_sequence representation (H_dec) for the decoder

Notes:
if 'using_first_sentence' is true, p_z_1 reuses q_z_1, so the first sentence hidden output is exactly the same for q_z_1 and p_z_1
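
A minimal sketch of one iteration of the latent loop described in the flow, assuming encoder_to_latent_layer is a module returning (mu, log_var) and that sampling uses the standard reparameterization trick (both are assumptions; the real layer interface may differ):

    import torch

    def latent_step(q_z_prev, h_enc_i, encoder_to_latent_layer):
        # concat the previous latent with the current sentence-level
        # encoder state H_enc, then project to the latent parameters
        mu, log_var = encoder_to_latent_layer(
            torch.cat([q_z_prev, h_enc_i], dim=-1))
        # reparameterization trick: z = mu + sigma * eps
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * log_var) * eps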

get_last_hidden_state(input, sequence_array, min_seq)

  • get the last char hidden state for each single sentence representation via S (applied after char_level_encoder)
  • get the first sentence hidden state (which is the last for the backward input) for each single ci representation via C (applied after first_sentence_encoder)
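
A minimal sketch of the gather both bullets rely on, assuming time-major hidden states of shape (T, N, H) and a length list such as S (the shapes are assumptions):

    import torch

    def last_step_hidden(outputs, lengths):
        # outputs: (T, N, H) hidden states, time-major
        # lengths: true length of each of the N sequences (e.g. the list S)
        idx = torch.LongTensor(lengths) - 1            # last valid step, (N,)
        idx = idx.view(1, -1, 1).expand(1, outputs.size(1), outputs.size(2))
        return outputs.gather(0, idx).squeeze(0)       # (N, H)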

init_hidden(flatten_num_of_sentences)

initialize each layer’s first hidden state, currently using zero initialization

load_model()

load the model from sub_model_path, finding the latest checkpoint by a regex over file names of the form Vrae_{epochs}_{steps}.pth
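
A minimal sketch of the checkpoint lookup, assuming "latest" means the highest (epochs, steps) pair (the tie-breaking rule is an assumption):

    import os
    import re

    def find_latest_checkpoint(sub_model_path):
        pattern = re.compile(r'^Vrae_(\d+)_(\d+)\.pth$')
        found = []
        for fname in os.listdir(sub_model_path):
            m = pattern.match(fname)
            if m:
                # sort key: (epochs, steps)
                found.append((int(m.group(1)), int(m.group(2)), fname))
        return os.path.join(sub_model_path, max(found)[2]) if found else None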

loss_function(ground_truth_x, reconstruct_x, mask_weights)
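
The docstring gives no body for loss_function. A minimal sketch of the masked reconstruction term it might compute, under the assumption (suggested by the rl/kl quantities in write_summary) that the KL terms are handled elsewhere:

    import torch.nn.functional as F

    def loss_function(ground_truth_x, reconstruct_x, mask_weights):
        # reconstruct_x: (T, N, vocab) logits; ground_truth_x: (T, N) indices
        ce = F.cross_entropy(reconstruct_x.reshape(-1, reconstruct_x.size(-1)),
                             ground_truth_x.reshape(-1),
                             reduction='none')
        # mask_weights zeroes out padded positions
        mask = mask_weights.reshape(-1)
        return (ce * mask).sum() / mask.sum()
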
padSentence(padded_sentence, out)

pad extra sentences, from (sum(C),sentence_hidden) to (max(C)*B,sentence_hidden)
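
A minimal sketch of that reshaping, written with explicit C, max_C and B arguments for clarity (the real function derives this information from padded_sentence; these argument names are hypothetical):

    import torch

    def pad_sentences(out, C, max_C, B):
        # out: (sum(C), H) sentence hiddens; pad each ci to max(C) rows
        H = out.size(1)
        padded = out.new_zeros(max_C * B, H)
        offset = 0
        for b, n in enumerate(C):  # ci b has n sentences
            padded[b * max_C:b * max_C + n] = out[offset:offset + n]
            offset += n
        return padded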

reconstruct(decode_input, select_index, tone_index, vowel_index)

same as the 'decode' function

tensor(inputs, is_float_type=True)

helper function that converts an input into a GPU or CPU tensor with the specified type (long or float)
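
A minimal sketch of such a helper (the use_cuda flag is an assumption; the project presumably decides this from config):

    import torch

    def tensor(inputs, is_float_type=True, use_cuda=torch.cuda.is_available()):
        t = torch.FloatTensor(inputs) if is_float_type else torch.LongTensor(inputs)
        # move to GPU only when requested/available
        return t.cuda() if use_cuda else t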

test()

test/generation loop

train()

training loop

valid(is_init=False)

validation during training

variable(inputs, is_tensor=True, is_float_type=True)

helper function that wraps a tensor into a GPU or CPU Variable

write_summary(rl_per_char, kl_obj_per_char, kl_cost_per_char, l, p, y, is_train=True)

summary for tensorboardX visualization
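
A minimal sketch of the logging, assuming a tensorboardX SummaryWriter with per-character scalar tags (the tag names and step argument are hypothetical, and the undocumented l, p, y arguments are omitted):

    from tensorboardX import SummaryWriter

    writer = SummaryWriter('runs/vrae')

    def write_summary(rl_per_char, kl_obj_per_char, kl_cost_per_char, step,
                      is_train=True):
        prefix = 'train' if is_train else 'valid'
        writer.add_scalar(prefix + '/rl_per_char', rl_per_char, step)
        writer.add_scalar(prefix + '/kl_obj_per_char', kl_obj_per_char, step)
        writer.add_scalar(prefix + '/kl_cost_per_char', kl_cost_per_char, step)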

trainer.one_hot(index, dim)

fills a one-hot tensor given a 2d index list and the one-hot dimension

Args:
index: a 2d list of indices
dim: the dimension of the one-hot vectors
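
A minimal sketch using scatter_, assuming the time-major (T, N) index layout used elsewhere in this module (the layout is an assumption):

    import torch

    def one_hot(index, dim):
        idx = torch.LongTensor(index)                     # (T, N)
        out = torch.zeros(idx.size(0), idx.size(1), dim)  # (T, N, dim)
        # write 1.0 at each index position along the last dimension
        return out.scatter_(2, idx.unsqueeze(2), 1.0)
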
trainer.weights_init(m)

initialize weights for the whole model with respect to each layer or unit; currently the default initialization is kept for every unit except the GRU unit, for which orthogonal_init is used

Args:
m: an instance of a typical unit / layer
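
A minimal sketch of that policy (nn.init.orthogonal_ is assumed as the orthogonal initializer; this is a sketch, not the project's exact code):

    import torch.nn as nn

    def weights_init(m):
        # keep PyTorch defaults everywhere except GRU weight matrices,
        # which get orthogonal initialization
        if isinstance(m, nn.GRU):
            for name, param in m.named_parameters():
                if 'weight' in name:
                    nn.init.orthogonal_(param)

    # typically applied with model.apply(weights_init)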