cnlpt.HierarchicalTransformer module
Module containing the Hierarchical Transformer module, adapted from Xin Su.
- cnlpt.HierarchicalTransformer.set_seed(seed, n_gpu)
Set the random seeds for random, numpy, and PyTorch to a specific value.
- Parameters
seed – the seed to use
n_gpu – the number of GPUs being used
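The seeding behaviour described above can be sketched as follows. This is a minimal illustration and not the cnlpt source: the name set_seed_sketch is hypothetical, and the assumption that n_gpu only controls whether the CUDA generators are seeded is ours.

```python
import random


def set_seed_sketch(seed: int, n_gpu: int = 0) -> None:
    """Hypothetical stand-in for set_seed; not the cnlpt implementation."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy is not installed, so there is nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if n_gpu > 0:
            # Assumption: GPU generators are seeded only when GPUs are in use.
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch is not installed, so there is nothing to seed


# Seeding twice with the same value reproduces the same random draws.
set_seed_sketch(42, n_gpu=0)
first = [random.random() for _ in range(3)]
set_seed_sketch(42, n_gpu=0)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```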
- class cnlpt.HierarchicalTransformer.MultiHeadAttention
Bases: Module
Multi-Head Attention module
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters
n_head – the number of attention heads
d_model – the dimensionality of the input and output of the encoder
d_k – the size of the query and key vectors
d_v – the size of the value vector
- __init__(n_head, d_model, d_k, d_v, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(q, k, v, mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class cnlpt.HierarchicalTransformer.PositionwiseFeedForward
Bases: Module
A two-feed-forward-layer module
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters
d_in – the dimensionality of the input and output of the encoder
d_hid – the inner hidden size of the positionwise FFN in the encoder
dropout – the amount of dropout to use in training (default 0.1)
- __init__(d_in, d_hid, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
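To illustrate how d_in and d_hid relate in PositionwiseFeedForward, here is a dependency-free sketch of a position-wise feed-forward sub-layer with a residual connection. It is an assumption-laden illustration, not the module's code: the helper names and weights are hypothetical, and dropout and any layer normalization are omitted.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]


def linear(vec, weight, bias):
    """Apply y = W x + b, with weight stored as [out_features][in_features]."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def positionwise_ffn(tokens, w1, b1, w2, b2):
    """Apply the same two-layer network to every position independently."""
    out = []
    for x in tokens:
        hidden = relu(linear(x, w1, b1))             # d_in -> d_hid
        y = linear(hidden, w2, b2)                   # d_hid -> d_in
        out.append([a + b for a, b in zip(x, y)])    # residual connection
    return out


# Hypothetical weights with d_in = 2 and d_hid = 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0]

tokens = [[1.0, -2.0], [0.5, 0.5]]
result = positionwise_ffn(tokens, w1, b1, w2, b2)
print(result)  # [[2.0, -2.0], [1.0, 1.0]]
```

The output has the same shape as the input, which is why the parameter list describes d_in as the dimensionality of both the input and the output.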
- class cnlpt.HierarchicalTransformer.ScaledDotProductAttention
Bases: Module
Scaled Dot-Product Attention
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters
temperature – the temperature for scaled dot product attention
attn_dropout – the amount of dropout to use in training for scaled dot product attention (default 0.1, not tuned in the rest of the code)
- __init__(temperature, attn_dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(q, k, v, mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
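The role of temperature in ScaledDotProductAttention can be sketched in plain Python for a single head. This is an illustrative sketch only: the function below is hypothetical, dropout is omitted, and the mask convention (0 marks a blocked position) is an assumption. In the standard formulation the temperature is the square root of d_k.

```python
import math


def scaled_dot_product_attention(q, k, v, temperature, mask=None):
    """q: [n_q][d_k], k: [n_k][d_k], v: [n_k][d_v]; mask: [n_q][n_k] of 0/1."""
    d_v = len(v[0])
    outputs, weights = [], []
    for i, q_row in enumerate(q):
        # Dot product of the query with every key, divided by the temperature.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / temperature for k_row in k]
        if mask is not None:
            # Assumption: a 0 in the mask blocks that key position.
            scores = [s if m else -1e9 for s, m in zip(scores, mask[i])]
        # Numerically stable softmax over the key positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        weights.append(attn)
        # Weighted sum of the value vectors.
        outputs.append([sum(attn[j] * v[j][c] for j in range(len(v))) for c in range(d_v)])
    return outputs, weights


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
temperature = math.sqrt(2)  # sqrt(d_k) with d_k = 2

out, attn = scaled_dot_product_attention(q, k, v, temperature)
masked_out, masked_attn = scaled_dot_product_attention(q, k, v, temperature, mask=[[1, 0]])
print(masked_out)  # [[1.0, 2.0]]
```

With the second key masked out, all of the attention weight falls on the first key, so the output equals the first value vector.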
- class cnlpt.HierarchicalTransformer.EncoderLayer
Bases: Module
Encoder layer composed of two sub-layers: multi-head self-attention and a position-wise feed-forward network
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters
d_model – the dimensionality of the input and output of the encoder
d_inner – the inner hidden size of the positionwise FFN in the encoder
n_head – the number of attention heads
d_k – the size of the query and key vectors
d_v – the size of the value vector
dropout – the amount of dropout to use in training in both the attention and FFN steps (default 0.1)
- __init__(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(enc_input, slf_attn_mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class cnlpt.HierarchicalTransformer.HierarchicalTransformerConfig
Bases: object
Config object for hierarchical transformer’s document-level encoder layers
Original author: Xin Su (https://github.com/xinsu626/DocTransformer)
- Parameters
n_layers – number of encoder layers
d_model – the dimensionality of the input and output of the encoder
d_inner – the inner hidden size of the positionwise FFN in the encoder
n_head – the number of attention heads
d_k – the size of the query and key vectors
d_v – the size of the value vector
dropout – the amount of dropout to use in training in both the attention and FFN steps (default 0.1)
- __init__(n_layers, d_model, d_inner, n_head, d_k, d_v, dropout=0.1)
- class cnlpt.HierarchicalTransformer.HierarchicalModel
Bases: CnlpModelForClassification
Hierarchical Transformer model (https://arxiv.org/abs/2105.06752)
Adapted from Xin Su’s implementation (https://github.com/xinsu626/DocTransformer)
- Parameters
config –
transformer_head_config –
class_weights –
final_task_weight –
argument_regularization –
freeze –
- config_class
alias of CnlpConfig
- __init__(config, transformer_head_config, *, class_weights=None, final_task_weight=1.0, argument_regularization=-1, freeze=False)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, event_tokens=None)
Forward method.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – A batch of chunked documents as tokenizer indices.
attention_mask (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Attention masks for the batch.
token_type_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Token type IDs for the batch.
position_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Position IDs for the batch.
head_mask (torch.LongTensor of shape (num_heads,) or (num_layers, num_heads), optional) – Token encoder head mask.
inputs_embeds (torch.FloatTensor of shape (batch_size, num_chunks, chunk_len, hidden_size), optional) – A batch of chunked documents as token embeddings.
labels (torch.LongTensor of shape (batch_size, num_tasks), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, …, self.num_labels[task_ind] - 1]. If self.num_labels[task_ind] == 1, a regression loss is computed (mean squared error); if self.num_labels[task_ind] > 1, a classification loss is computed (cross-entropy).
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers.
output_hidden_states – not used.
event_tokens – not currently used (only relevant for token classification)
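The (batch_size, num_chunks, chunk_len) layout expected for input_ids and attention_mask can be illustrated with a small, dependency-free sketch. The helper chunk_document, the padding ID of 0, and the token IDs are all hypothetical; the actual cnlpt preprocessing may differ (for example, by adding special tokens to each chunk).

```python
def chunk_document(token_ids, chunk_len, num_chunks, pad_id=0):
    """Split a flat list of token IDs into num_chunks rows of chunk_len.

    Tokens beyond num_chunks * chunk_len are truncated; short chunks are
    padded with pad_id and marked with 0 in the attention mask.
    """
    input_ids, attention_mask = [], []
    for c in range(num_chunks):
        piece = token_ids[c * chunk_len:(c + 1) * chunk_len]
        pad = chunk_len - len(piece)
        input_ids.append(piece + [pad_id] * pad)
        attention_mask.append([1] * len(piece) + [0] * pad)
    return input_ids, attention_mask


doc = list(range(101, 108))  # seven hypothetical token IDs
ids, mask = chunk_document(doc, chunk_len=3, num_chunks=3)

# Wrapping one document in a list gives a batch of size 1,
# i.e. nested lists of shape (1, 3, 3).
batch_input_ids = [ids]
batch_attention_mask = [mask]
print(ids)   # [[101, 102, 103], [104, 105, 106], [107, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 1], [1, 0, 0]]
```

Nested lists of this shape could then be converted to torch.LongTensor objects before being passed to the model.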
Returns: