API Reference¶
Attention blocks for transformer models.
ModelConfigError
¶
Bases: Exception
Custom exception class for model configuration errors.
Source code in src/llmz/components/attention.py
MultiHeadAttention
¶
Bases: Module
Basic causal attention block.
Source code in src/llmz/components/attention.py
__init__(context_size, dim_in, dim_out, n_heads=1, dropout=0.6, qkv_bias=False)
¶
Initialise module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dim_in` | `int` | Dimension of input word embeddings. | required |
| `dim_out` | `int` | Dimension of output attention embeddings. | required |
| `context_size` | `int` | The number of input word embeddings in the sequence. | required |
| `n_heads` | `int` | The number of attention heads. Defaults to 1. | `1` |
| `dropout` | `float` | The dropout rate. Defaults to 0.6. | `0.6` |
| `qkv_bias` | `bool` | Whether or not to include bias in the linear layers used to compute W_query, W_key and W_value. Defaults to False. | `False` |
Raises:

| Type | Description |
|---|---|
| `ModelConfigError` | If `dim_out % n_heads != 0`. |
Source code in src/llmz/components/attention.py
forward(x)
¶
Execute the module's forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Batch of token embeddings. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Batch of attention weighted embeddings. |
Source code in src/llmz/components/attention.py
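A minimal usage sketch for this block, assuming the module can be imported from `llmz.components.attention` (per the source path above) and follows the documented signature; the tensor shapes are illustrative only:

```python
import torch

from llmz.components.attention import MultiHeadAttention

# Batch of 2 sequences, each with 8 token embeddings of dimension 32.
x = torch.randn(2, 8, 32)

# dim_out must be divisible by n_heads (see the Raises table above).
attn = MultiHeadAttention(
    context_size=8, dim_in=32, dim_out=64, n_heads=4, dropout=0.1
)
out = attn(x)
print(out.shape)  # expected: torch.Size([2, 8, 64])
```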
Normalisation operations.
LayerNormalisation
¶
Bases: Module
Layer normalisation.
Normalises batches of input tensors to approximately zero mean and unit variance. The module allows for some trained deviation from a mean of zero and a variance of one.
Source code in src/llmz/components/normalisation.py
__init__(dim_in)
¶
Initialise module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dim_in` | `int` | Dimension of the input batches. | required |
Source code in src/llmz/components/normalisation.py
forward(x)
¶
Forward pass of module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensors. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Tensor-by-tensor normalised version of the inputs. |
Source code in src/llmz/components/normalisation.py
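A short sketch of the behaviour described above, assuming the import path `llmz.components.normalisation`:

```python
import torch

from llmz.components.normalisation import LayerNormalisation

x = torch.randn(4, 16) * 3.0 + 5.0  # batch of 4 tensors with dim_in=16
layer_norm = LayerNormalisation(dim_in=16)
y = layer_norm(x)

# Each tensor should now be close to zero mean and unit variance
# (up to the module's trainable deviation from these values).
print(y.mean(dim=-1))
print(y.var(dim=-1))
```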
Activation functions for transformer models.
GELU
¶
Bases: Module
Gaussian Error Linear Unit (GELU).
Implemented using an approximation to x * F(x), where F is the cumulative
normal distribution function. See 'Build a LLM (from scratch)' by S. Raschka
(2024), p105.
Source code in src/llmz/components/activations.py
__init__()
¶
Initialise module.
Source code in src/llmz/components/activations.py
forward(x)
¶
Execute the module's forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Batch of input tensors. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Batch of output tensors that have been filtered on an element-by-element basis using the GELU activation function. |
Source code in src/llmz/components/activations.py
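For reference, the tanh-based approximation to x * F(x) mentioned above is commonly written as in the sketch below; whether `GELU` uses exactly these constants is an assumption, so the comparison is indicative only:

```python
import math

import torch

from llmz.components.activations import GELU


def gelu_approx(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation to x * F(x), where F is the standard normal CDF.
    return 0.5 * x * (
        1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3))
    )


x = torch.linspace(-3.0, 3.0, steps=7)
print(GELU()(x))
print(gelu_approx(x))  # should match closely if GELU uses the same approximation
```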
Transformer block for LLMs.
TransformerBlockGPT2
¶
Bases: Module
Basic transformer block with multi-head attention as used in GPT2.
Source code in src/llmz/components/transformers.py
__init__(context_size, dim_in, n_heads=1, dropout=0.6, qkv_bias=False)
¶
Initialise module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dim_in` | `int` | Dimension of input word embeddings. | required |
| `context_size` | `int` | The number of input word embeddings in the sequence. | required |
| `n_heads` | `int` | The number of attention heads. Defaults to 1. | `1` |
| `dropout` | `float` | The dropout rate. Defaults to 0.6. | `0.6` |
| `qkv_bias` | `bool` | Whether or not to include bias in the linear layers used to compute W_query, W_key and W_value. Defaults to False. | `False` |
Source code in src/llmz/components/transformers.py
forward(x)
¶
Execute the module's forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Batch of token embeddings. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Batch of attention weighted embeddings. |
Source code in src/llmz/components/transformers.py
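A usage sketch mirroring the attention example above, assuming the import path `llmz.components.transformers`; shapes are illustrative:

```python
import torch

from llmz.components.transformers import TransformerBlockGPT2

x = torch.randn(2, 8, 32)  # (batch, context_size, dim_in)

block = TransformerBlockGPT2(context_size=8, dim_in=32, n_heads=4, dropout=0.1)
out = block(x)
print(out.shape)  # expected: torch.Size([2, 8, 32])
```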
Datasets for LLMs.
GPTSmallTextDataset
¶
Bases: Dataset
GPT dataset interface for any 'small' text data.
This will tokenize all text in-memory using GPT-2's tokenization algorithm, which is a pre-trained Byte Pair Encoding (BPE).
Source code in src/llmz/datasets.py
__init__(text, max_length=256, stride=128)
¶
Initialise.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Raw text data to convert into tokens. | required |
| `max_length` | `int` | Number of tokens for each data instance. Defaults to 256. | `256` |
| `stride` | `int` | Separation (in tokens) between consecutive instances. Defaults to 128. | `128` |
Source code in src/llmz/datasets.py
create_data_loader(batch_size=4, shuffle=True, drop_last=True, num_workers=0)
¶
Create data loader.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `batch_size` | `int` | The batch size. Defaults to 4. | `4` |
| `shuffle` | `bool` | Whether to randomise instance order after each iteration. Defaults to True. | `True` |
| `drop_last` | `bool` | Drop the last batch if it has fewer than `batch_size` instances. Defaults to True. | `True` |
| `num_workers` | `int` | Number of CPU processes to use for pre-processing. Defaults to 0. | `0` |

Returns:

| Type | Description |
|---|---|
| `DataLoader` | A fully configured DataLoader. |
Source code in src/llmz/datasets.py
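A sketch of building a dataset and data loader from raw text, assuming the import path `llmz.datasets`; the text and window sizes are illustrative, and the assumption that each batch is an (input, target) pair of token tensors is noted in the comments:

```python
from llmz.datasets import GPTSmallTextDataset

text = "a small corpus of raw text used for next-token prediction " * 100

dataset = GPTSmallTextDataset(text, max_length=32, stride=16)
data_loader = dataset.create_data_loader(batch_size=8, shuffle=True)

# Assumed: each batch yields (input tokens, next-token targets).
X_batch, y_batch = next(iter(data_loader))
print(X_batch.shape)  # expected: (8, 32)
```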
Evaluation and metrics.
EvalResult
¶
Bases: NamedTuple
Container for evaluation results produced during training.
Source code in src/llmz/evaluate.py
Evaluator
¶
Model evaluator.
This class executes and stores all model evaluations during training.
Source code in src/llmz/evaluate.py
__init__(train_dataloader, val_dataloader, metrics_fn, scenarios_fn=None)
¶
Initialise.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `train_dataloader` | `DataLoader` | DataLoader for training data. | required |
| `val_dataloader` | `DataLoader` | DataLoader for validation data. | required |
| `metrics_fn` | `Callable[[Module, DataLoader], dict[str, Result]]` | Callable that returns a dictionary of metrics given a model and a dataloader. | required |
| `scenarios_fn` | `Callable[[Module], dict[str, Result]] \| None` | Optional callable that returns a dictionary of results/outputs given a model - e.g., generated text given an example prompt. Defaults to None. | `None` |
Source code in src/llmz/evaluate.py
evaluate(step, model, log=None)
¶
Evaluate model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `step` | `int` | The number of training steps applied to the model. | required |
| `model` | `Module` | The model to evaluate. | required |
| `log` | `Logger \| None` | Optional logger for logging results. Defaults to the custom llmz logger. | `None` |

Returns:
All evaluations for the model after the given number of training steps.
Source code in src/llmz/evaluate.py
basic_llm_metrics(model, dl)
¶
Compute basic LLM metrics for a dataloader.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | Model to use for inference. | required |
| `dl` | `DataLoader` | Dataloader with data batches for inference. | required |
Source code in src/llmz/evaluate.py
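A sketch of wiring Evaluator up with basic_llm_metrics, assuming both can be imported from `llmz.evaluate`; the dataset, model and hyper-parameters are placeholders chosen only to keep the example self-contained:

```python
from llmz.datasets import GPTSmallTextDataset
from llmz.evaluate import Evaluator, basic_llm_metrics
from llmz.gpt2 import GPT2

text = "a tiny corpus used purely for illustration " * 200
train_dl = GPTSmallTextDataset(text, max_length=16, stride=16).create_data_loader(batch_size=4)
val_dl = GPTSmallTextDataset(text, max_length=16, stride=16).create_data_loader(batch_size=4, shuffle=False)

# Untrained model used only so the sketch runs end-to-end.
model = GPT2(vocab_size=50257, embed_dim=64, context_size=16, n_attn_heads=2)

evaluator = Evaluator(train_dl, val_dl, metrics_fn=basic_llm_metrics)
results = evaluator.evaluate(step=0, model=model)
```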
Tools for text generation.
decode(token_logits, strategy='greedy', temperature=1.0, *, k=5)
¶
Decode generative model output using the specified strategy.
Source code in src/llmz/generate.py
format_generated_words(text, prompt)
¶
Format list of words into a readable paragraph.
Source code in src/llmz/generate.py
generate(model, prompt, tokenizer, strategy='greedy', output_length=60, temperature=1.0, random_seed=42, device=torch.device('cpu'), *, k=2)
¶
Generate new text conditional on a text prompt.
Source code in src/llmz/generate.py
print_wrapped(text, width=89)
¶
Print text with word wrapping.
Source code in src/llmz/generate.py
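A text-generation sketch using the documented signatures above; the untrained model is only there to make the example self-contained, and chaining generate's output straight into format_generated_words is an assumption:

```python
import torch

from llmz.generate import format_generated_words, generate, print_wrapped
from llmz.gpt2 import GPT2, GPT2Tokenizer

# Untrained model used purely so the sketch runs (see the GPT2 section below).
model = GPT2(vocab_size=50257, embed_dim=64, context_size=32, n_attn_heads=2)
tokenizer = GPT2Tokenizer()

prompt = "Once upon a time"
output = generate(
    model,
    prompt,
    tokenizer,
    strategy="greedy",
    output_length=20,
    device=torch.device("cpu"),
)

# Assumed: generate's output can be passed to format_generated_words with the prompt.
print_wrapped(format_generated_words(output, prompt))
```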
Implementation of GPT2.
GPT2
¶
Bases: Module
Implementation of OpenAI's GPT2 model.
Source code in src/llmz/gpt2.py
__init__(vocab_size, embed_dim, context_size, n_tsfmr_blocks=1, n_attn_heads=1, dropout=0.6, qkv_bias=False)
¶
Initialise model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vocab_size` | `int` | The number of unique tokens that the model expects to encounter. | required |
| `embed_dim` | `int` | Dimension of input word embeddings. | required |
| `context_size` | `int` | The number of input word embeddings in the sequence. | required |
| `n_tsfmr_blocks` | `int` | The number of transformer blocks stacked together. | `1` |
| `n_attn_heads` | `int` | The number of attention heads in every transformer block. Defaults to 1. | `1` |
| `dropout` | `float` | The dropout rate. Defaults to 0.6. | `0.6` |
| `qkv_bias` | `bool` | Whether or not to include bias in the linear layers used to compute W_query, W_key and W_value. Defaults to False. | `False` |
Source code in src/llmz/gpt2.py
forward(x)
¶
Execute the module's forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Batch of token embeddings. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Batch of attention weighted embeddings. |
Source code in src/llmz/gpt2.py
GPT2Config
dataclass
¶
Container class for GPT2 model hyper-parameters.
This class will validate parameters and then allow GPT2 objects to be created using keyword argument expansion - e.g.,

```python
config = GPT2Config(...)
model = GPT2(**config)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vocab_size` | `int` | The number of unique tokens that the model expects to encounter. | required |
| `embed_dim` | `int` | Dimension of input word embeddings. | required |
| `context_size` | `int` | The number of input word embeddings in the sequence. | required |
| `n_tsfmr_blocks` | `int` | The number of transformer blocks stacked together. | `1` |
| `n_attn_heads` | `int` | The number of attention heads in every transformer block. Defaults to 1. | `1` |
| `dropout` | `float` | The dropout rate. Defaults to 0.6. | `0.6` |
| `qkv_bias` | `bool` | Whether or not to include bias in the linear layers used to compute W_query, W_key and W_value. Defaults to False. | `False` |

Raises:

| Type | Description |
|---|---|
| `GPT2ConfigError` | If any int or float parameter is <= 0, or `embed_dim % n_attn_heads != 0`. |
Source code in src/llmz/gpt2.py
__getitem__(key)
¶
Get config value via its field name.
Part of Mapping protocol required to enable keyword argument expansion using the
** operator.
Source code in src/llmz/gpt2.py
__post_init__()
¶
Validate fields after initialisation.
Source code in src/llmz/gpt2.py
__repr__()
¶
Format config for the command line.
Source code in src/llmz/gpt2.py
__str__()
¶
Format config as a string.
Source code in src/llmz/gpt2.py
keys()
¶
Get iterator of field keys.
Part of Mapping protocol required to enable keyword argument expansion using the
** operator.
Source code in src/llmz/gpt2.py
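A sketch of the keyword-expansion pattern described above, with illustrative hyper-parameter values (note that `embed_dim` is divisible by `n_attn_heads`, as the validation requires):

```python
from llmz.gpt2 import GPT2, GPT2Config

config = GPT2Config(
    vocab_size=50257,
    embed_dim=256,
    context_size=128,
    n_tsfmr_blocks=4,
    n_attn_heads=4,
    dropout=0.1,
    qkv_bias=False,
)
print(config)           # formatted via __str__ / __repr__
model = GPT2(**config)  # keyword argument expansion via the Mapping protocol
```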
GPT2ConfigError
¶
Bases: Exception
Custom exception for GPT2 configuration errors.
Source code in src/llmz/gpt2.py
GPT2InferenceError
¶
Bases: Exception
Custom exception for GPT2 inference errors.
Source code in src/llmz/gpt2.py
GPT2Tokenizer
¶
Bases: _Tokenizer
Pre-trained version of GPT2's tokenizer.
Source code in src/llmz/gpt2.py
__init__()
¶
Initialise tokenizer.
Source code in src/llmz/gpt2.py
text2tokens(text)
¶
Map a string to a list of tokens.
Source code in src/llmz/gpt2.py
tokens2text(tokens)
¶
Map a list of tokens to a string.
Source code in src/llmz/gpt2.py
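A round-trip sketch with the pre-trained tokenizer, assuming text2tokens returns a list of integer token ids as described above:

```python
from llmz.gpt2 import GPT2Tokenizer

tokenizer = GPT2Tokenizer()

tokens = tokenizer.text2tokens("hello world")
text = tokenizer.tokens2text(tokens)
print(tokens, text)
```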
Functions for training LLMs.
GradientClipCallback
¶
Callable class that clips model gradients using a maximum norm.
Source code in src/llmz/train.py
__call__(model)
¶
Clip model gradients.
Source code in src/llmz/train.py
__init__(clip_grad_norm=torch.inf)
¶
Initialise.
Source code in src/llmz/train.py
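A small sketch of the intended usage, calling the callback after the backward pass; the stand-in linear model is only there to make the example runnable, and passing the callback via train's model_backward_callbacks argument (documented below) is the expected pattern:

```python
import torch

from llmz.train import GradientClipCallback

clip_grads = GradientClipCallback(clip_grad_norm=1.0)

model = torch.nn.Linear(4, 2)  # stand-in for a real model
loss = model(torch.randn(8, 4)).sum()
loss.backward()

clip_grads(model)  # clip gradients in-place before the optimiser step
```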
LinearWarmupCosineAnnealingLRSchedule
¶
LR schedule using cosine annealing with linear warmup.
Source code in src/llmz/train.py
__call__(step)
¶
Get learning rate for given step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `step` | `int` | The global training step. | required |

Returns:

| Type | Description |
|---|---|
| `float` | The learning rate for the global training step. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If step < 0. |
Source code in src/llmz/train.py
__init__(num_steps, warmup_steps, initial_lr, peak_lr)
¶
Initialise.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_steps` | `int` | The total number of steps for the schedule. | required |
| `warmup_steps` | `int` | Number of steps in the linear warmup phase. | required |
| `initial_lr` | `float` | Learning rate at first step. | required |
| `peak_lr` | `float` | Peak learning rate at end of warmup phase. | required |
Source code in src/llmz/train.py
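A sketch of inspecting the schedule's shape, with illustrative numbers:

```python
from llmz.train import LinearWarmupCosineAnnealingLRSchedule

lr_schedule = LinearWarmupCosineAnnealingLRSchedule(
    num_steps=1000, warmup_steps=100, initial_lr=1e-5, peak_lr=1e-3
)

# Linear ramp from initial_lr to peak_lr over the first 100 steps,
# then cosine annealing over the remaining 900.
for step in (0, 50, 100, 500, 999):
    print(step, lr_schedule(step))
```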
autoregressive_llm_loss(model, X_batch, y_batch)
¶
Compute loss for AR LLMs like GPTs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | The language model. | required |
| `X_batch` | `Tensor` | Batch of input tokens. | required |
| `y_batch` | `Tensor` | Batch of output tokens - i.e., next token from the input sequence. | required |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Mean cross-entropy loss for the batch. |
Source code in src/llmz/train.py
train(model, loss_calc, optimiser, lr_schedule, train_dataloader, train_epochs, eval_freq_steps, evaluator, model_backward_callbacks=None, log_freq_steps=100, device=torch.device('cpu'))
¶
Train the model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | The PyTorch model to train. | required |
| `loss_calc` | `Callable[[Module, Tensor, Tensor], Tensor]` | Function that calculates and returns loss for model and batch. | required |
| `optimiser` | `Optimizer` | The optimizer for updating model parameters. | required |
| `lr_schedule` | `Callable[[int], float] \| LRScheduler` | Function to compute learning rate for training step. | required |
| `train_dataloader` | `DataLoader` | DataLoader for training data. | required |
| `train_epochs` | `int` | Number of training epochs. | required |
| `eval_freq_steps` | `int` | Number of steps between evaluations. | required |
| `evaluator` | `Evaluator` | A handler for all model evaluations. | required |
| `model_backward_callbacks` | `list[Callable[[Module], None]] \| None` | Optional callbacks for model after backward pass. | `None` |
| `log_freq_steps` | `int` | Number of steps between basic progress logging to stdout. Defaults to 100. | `100` |
| `device` | `device` | The processor to use for training. Defaults to CPU. | `device('cpu')` |
Source code in src/llmz/train.py
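Putting the pieces together, an end-to-end sketch that wires the objects documented in this reference into train(); all hyper-parameters and the toy corpus are illustrative only, and reusing the same dataset for training and validation is done purely to keep the example short:

```python
import torch

from llmz.datasets import GPTSmallTextDataset
from llmz.evaluate import Evaluator, basic_llm_metrics
from llmz.gpt2 import GPT2, GPT2Config
from llmz.train import (
    GradientClipCallback,
    LinearWarmupCosineAnnealingLRSchedule,
    autoregressive_llm_loss,
    train,
)

# Toy corpus and data loaders (illustrative only).
raw_text = "a tiny example corpus for demonstration purposes " * 200
dataset = GPTSmallTextDataset(raw_text, max_length=64, stride=32)
train_dl = dataset.create_data_loader(batch_size=8)
val_dl = dataset.create_data_loader(batch_size=8, shuffle=False)

# Model configured via GPT2Config and keyword argument expansion.
config = GPT2Config(
    vocab_size=50257, embed_dim=256, context_size=64,
    n_tsfmr_blocks=4, n_attn_heads=4, dropout=0.1, qkv_bias=False,
)
model = GPT2(**config)

optimiser = torch.optim.AdamW(model.parameters(), lr=1e-3)
lr_schedule = LinearWarmupCosineAnnealingLRSchedule(
    num_steps=1000, warmup_steps=100, initial_lr=1e-5, peak_lr=1e-3
)
evaluator = Evaluator(train_dl, val_dl, metrics_fn=basic_llm_metrics)

train(
    model,
    loss_calc=autoregressive_llm_loss,
    optimiser=optimiser,
    lr_schedule=lr_schedule,
    train_dataloader=train_dl,
    train_epochs=1,
    eval_freq_steps=50,
    evaluator=evaluator,
    model_backward_callbacks=[GradientClipCallback(clip_grad_norm=1.0)],
    device=torch.device("cpu"),
)
```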