torchtune.modules
Modeling Components and Building Blocks
| Multi-headed grouped query self-attention (GQA) layer introduced in https://arxiv.org/pdf/2305.13245v1.pdf. | |
| This class implements the feed-forward network derived from Llama2. | |
| Standalone nn.Module containing a kv-cache to cache past key and values during inference. | |
| Create a learning rate schedule that linearly increases the learning rate from 0.0 to lr over num_warmup_steps, then decreases it to 0.0 on a cosine schedule over the remaining num_training_steps - num_warmup_steps steps (assuming num_cycles = 0.5). | |
| This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864. | |
| Implements Root Mean Square Normalization introduced in https://arxiv.org/pdf/1910.07467.pdf. | |
| Transformer layer derived from the Llama2 model. | |
| Transformer Decoder derived from the Llama2 architecture. | 
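The warmup-then-cosine schedule described in the table above can be sketched as a multiplier applied to the base learning rate. This is a minimal illustration in plain Python, not the library's implementation: the function name `lr_multiplier` and its argument order are assumptions made for this example, and only `num_warmup_steps`, `num_training_steps`, and `num_cycles` come from the description.

```python
import math


def lr_multiplier(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Return the factor to multiply the base lr by at a given step.

    Hypothetical helper for illustration only.
    """
    # Linear warmup: the factor grows from 0.0 to 1.0 over num_warmup_steps.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)

    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )

    # Cosine decay: with num_cycles = 0.5 the factor falls from 1.0 to 0.0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    return max(0.0, cosine)


if __name__ == "__main__":
    base_lr = 3e-4  # illustrative value, not taken from the documentation
    for step in (0, 5, 10, 55, 100):
        print(step, base_lr * lr_multiplier(step, 10, 100))
```

A function of this shape could be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned factor at each step.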
Tokenizers
| A wrapper around SentencePieceProcessor. | |
| A wrapper around tiktoken Encoding. | 
PEFT Components
| LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models. | |
| Interface for an nn.Module containing adapter weights. | |
| Return the subset of parameters from a model that correspond to an adapter. | |
| Set trainable parameters for an nn.Module based on a state dict of adapter parameters. | 
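To make the LoRA linear layer concrete, the sketch below computes `W x + (alpha / rank) * B (A x)` with plain Python lists. It is a toy illustration of the idea from the LoRA paper, not the library's class: the name `LoRALinearSketch`, its constructor arguments, and the `0.02` initialization scale are assumptions made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinearSketch:
    """Hypothetical toy layer: a frozen weight plus a low-rank update."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight        # frozen base weight, shape (out_dim, in_dim)
        self.scale = alpha / rank   # scaling applied to the low-rank update
        # A starts with small random values, shape (rank, in_dim).
        self.lora_a = [
            [rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(rank)
        ]
        # B starts at zero, shape (out_dim, rank), so the update is initially zero.
        self.lora_b = [[0.0] * rank for _ in range(out_dim)]

    def adapter_params(self):
        """Names of the parameters that would be trained; the base weight is not."""
        return ["lora_a", "lora_b"]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    layer = LoRALinearSketch([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))
```

Because B is initialized to zero, the layer reproduces the frozen base layer's output before any training. Only the two small matrices need gradients, which is the kind of parameter subset the adapter utilities in the table above select and mark as trainable.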
Module Utilities
These utilities are shared across modules and can be used with any of them.
| A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU. |