torchtune.data¶
Text templates¶
Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.
| A prompt template for grammar error correction tasks. | |
| A prompt template for summarization tasks. | |
| A prompt template for question answering tasks. | |
| Quickly define a custom prompt template by passing in a dictionary mapping role to the prepend and append tags. For example, to achieve the following prompt template::. | |
| Interface for prompt templates. | |
| OpenAI's Chat Markup Language used by their chat models. | 
Types¶
| This class represents individual messages in a fine-tuning dataset. | |
| alias of  | 
Message transforms¶
Converts data from common schema and conversation JSON formats into a list of torchtune Message.
| Message transform class that converts a single sample with "input" and "output" fields, (or equivalent fields specified in column_map) to user and assistant messages, respectively. This is useful for datasets that have two columns, one containing the user prompt string and the other containing the model response string::. | |
| Convert a single chat sample adhering to the ShareGPT JSON structure to torchtune's  | |
| Convert a single chat sample adhering to the OpenAI chat completion JSON structure to torchtune's  | |
| Transform for converting a single sample from datasets with "chosen" and "rejected" columns containing conversations to a list of chosen and rejected messages. For example::. | |
| Message transform class for Alpaca-style datasets with "instruction", "input", and "output" (or equivalent fields specified in column_map) columns. | 
Collaters¶
Collaters used to collect samples into batches and handle any padding.
| A generic padding collation function which pads  | |
| Pad a batch of text sequences, tiled image tensors, aspect ratios, and cross attention masks. | |
| Pad a batch of sequences to the longest sequence length in the batch, and convert integer lists to tensors. | |
| Pad a batch of sequences for Direct Preference Optimization (DPO). | |
| This function is identical to  | 
Helper functions¶
Miscellaneous helper functions used in modifying data.
| Given a list of messages, ensure that messages form a valid back-and-forth conversation. | |
| Truncate a list of tokens to a maximum length. | |
| Convenience method to load an image in PIL format from a local file path or remote source. | |
| Given a raw text string, split by the specified  |