Custom Service¶

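Custom handlers¶

Customize the behavior of TorchServe by writing a Python script and packaging it with the model when you use the model archiver. TorchServe executes this code when it runs.

Provide a custom script to:

Initialize the model instance
Preprocess input data before it is sent to the model for inference or Captum explanations
Customize how the model is invoked for inference or explanations
Postprocess the model's output before sending back a response

The following applies to all types of custom handlers:

data - The input data from the incoming request
context - The TorchServe context. You can use the following information from it for customization: model_name, model_dir, manifest, batch_size, gpu, etc.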
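
As a rough illustration of how a handler might read these values, the sketch below uses a stand-in context object built with `SimpleNamespace`. The attribute names `system_properties` and `manifest` follow the handler code in this document; the concrete values, the `batch_size` key, and the helper name `describe_context` are assumptions made only for this example.

```python
from types import SimpleNamespace

# Stand-in for the context object TorchServe passes to a handler.
# Values are made up; the "batch_size" key is an assumption for illustration.
fake_context = SimpleNamespace(
    system_properties={
        "model_dir": "/tmp/models/my_model",
        "gpu_id": None,
        "batch_size": 1,
    },
    manifest={"model": {"serializedFile": "model.pt"}},
)


def describe_context(context):
    """Collect the values a handler typically needs at model load time."""
    properties = context.system_properties
    return {
        "model_dir": properties.get("model_dir"),
        "gpu_id": properties.get("gpu_id"),
        "batch_size": properties.get("batch_size"),
        "serialized_file": context.manifest["model"]["serializedFile"],
    }


summary = describe_context(fake_context)
print(summary)
```

Reading everything through `.get()` keeps the handler tolerant of properties that are absent, such as `gpu_id` on a CPU-only host.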

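Start with BaseHandler!¶

BaseHandler implements most of the functionality you need. You can derive a new class from it, as shown in the examples and the default handlers. Most of the time, you only need to override preprocess or postprocess.

Custom handler with `module` level entry point¶

The custom handler file must define a module level function that acts as the entry point for execution. The function can have any name, but it must accept the following parameters and return prediction results.

The signature of an entry point function is: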

import os

import torch

# Model object, created on the first (load-time) call
model = None

def entry_point_function_name(data, context):
    """
    Works on data and context to create model object or process inference request.
    Following sample demonstrates how model object can be initialized for jit mode.
    Similarly you can do it for eager mode models.
    :param data: Input data for prediction
    :param context: context contains model server system properties
    :return: prediction output
    """
    global model

    if not data:
        manifest = context.manifest

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read model serialize/pt file
        serialized_file = manifest['model']['serializedFile']
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt file")

        model = torch.jit.load(model_pt_path)
    else:
        #infer and return result
        return model(data)

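This entry point is invoked in two cases:

(1) TorchServe is asked to scale a model out to increase the number of backend workers. This happens through a PUT /models/{model_name} request, a POST /models request with the initial-workers option, or at TorchServe startup when you use the --models option (torchserve --start --models {model_name=model.mar}), that is, when you provide the models to load.
(2) TorchServe receives a POST /predictions/{model_name} request.

(1) is used to scale the workers for a model up or down. (2) is the standard way to run inference against a model. (1) is also known as model load time. Typically, you want model initialization code to run at model load time. You can find out more about these and other TorchServe APIs in the TorchServe Management API and TorchServe Inference API documentation.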
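
The two cases can be simulated without PyTorch. The sketch below assumes, as the sample entry point above does, that the load-time call arrives with empty data; `DoublingModel` is a hypothetical stand-in for a real model and is not part of TorchServe.

```python
# Hypothetical stand-in for a loaded model, so the sketch runs without PyTorch.
class DoublingModel:
    def __call__(self, batch):
        return [2 * x for x in batch]


model = None


def handle(data, context):
    """Module-level entry point: load in case (1), predict in case (2)."""
    global model
    if not data:
        # Case (1): model load time. A real handler would read `context` here.
        model = DoublingModel()
        return None
    # Case (2): inference request.
    return model(data)


# Simulate a worker starting up and then serving one request.
load_result = handle(None, context=None)
prediction = handle([1, 2, 3], context=None)
print(load_result, prediction)
```

Keeping the model in a module-level variable means each worker process loads it once and reuses it for every later request.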

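Custom handler with `class` level entry point¶

You can create a custom handler as a class with any name, but the class must have an initialize method and a handle method.

NOTE - If you plan to have multiple classes in the same Python module/file, make sure that the handler class is the first one in the file.

The signature of an entry point class and its methods is: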

import os

import torch


class ModelHandler(object):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.device = None

    def initialize(self, context):
        """
        Invoked by TorchServe to load a model
        :param context: context contains model server system properties
        :return:
        """

        #  load the model
        self.manifest = context.manifest

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read model serialize/pt file
        serialized_file = self.manifest['model']['serializedFile']
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt file")

        self.model = torch.jit.load(model_pt_path)

        self.initialized = True


    def handle(self, data, context):
        """
        Invoked by TorchServe for a prediction request.
        Does pre-processing of data, prediction using the model, and post-processing of the prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        pred_out = self.model.forward(data)
        return pred_out

Advanced custom handlers¶

Returning custom error codes¶

To return a custom error code to the user from a custom handler with a module level entry point:

from ts.utils.util import PredictionException


def handle(data, context):
    # Some unexpected error - returning error code 513
    raise PredictionException("Some Prediction Error", 513)

To return a custom error code to the user from a custom handler with a class level entry point:

from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import PredictionException

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def handle(self, data, context):
        # Some unexpected error - returning error code 513
        raise PredictionException("Some Prediction Error", 513)
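
In practice the exception is usually raised conditionally. The following sketch rejects a request row that carries neither a "data" nor a "body" field. The fallback class exists only so the example runs where TorchServe is not installed; it mimics the (message, error_code) constructor used above and is an assumption, not the library's implementation. The validation rule itself is hypothetical.

```python
try:
    from ts.utils.util import PredictionException
except ImportError:
    # Fallback for environments without TorchServe (illustration only).
    class PredictionException(Exception):
        def __init__(self, message, error_code=500):
            super().__init__(message)
            self.message = message
            self.error_code = error_code


def handle(data, context):
    """Return each request payload, or fail with a custom error code."""
    if not data:
        return None  # load-time call: nothing to validate
    results = []
    for row in data:
        payload = row.get("data") or row.get("body")
        if payload is None:
            # Hypothetical validation rule; 513 is the code used in this document.
            raise PredictionException("Request has no 'data' or 'body' field", 513)
        results.append(payload)
    return results


ok_result = handle([{"body": "x"}], None)
try:
    handle([{"other": 1}], None)
    raised = False
except PredictionException:
    raised = True
print(ok_result, raised)
```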

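Writing a custom handler for prediction and explanation requests¶

You should generally derive from BaseHandler and override ONLY the methods whose behavior needs to change. As the examples show, most of the time you only need to override preprocess or postprocess.

Nevertheless, you can still write a class from scratch. Below is an example. It follows the typical Init-Pre-Infer-Post pattern for creating a maintainable custom handler.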

# custom handler file

# model_handler.py

"""
ModelHandler defines a custom model handler.
"""

from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        # Let BaseHandler set up its own attributes (for example self.model) first
        super().__init__()
        self._context = None
        self.initialized = False
        self.explain = False
        self.target = 0

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        self.initialized = True
        # Load the model here; see the class level handler above for details

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param data: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        # Take the input data and make it inference ready
        preprocessed_data = data[0].get("data")
        if preprocessed_data is None:
            preprocessed_data = data[0].get("body")

        return preprocessed_data


    def inference(self, model_input):
        """
        Internal inference methods
        :param model_input: transformed model input data
        :return: list of inference output in NDArray
        """
        # Do some inference call to engine here and return output
        model_output = self.model.forward(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        postprocess_output = inference_output
        return postprocess_output

    def handle(self, data, context):
        """
        Invoked by TorchServe for a prediction request.
        Does pre-processing of data, prediction using the model, and post-processing of the prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)

See waveglow_handler for more details.
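
To see the Init-Pre-Infer-Post flow end to end without PyTorch, the toy handler below can be driven by hand. Everything in it is hypothetical: the "model" is just Python's `len`, and the request rows imitate the list-of-dicts shape that the preprocess method above reads with `data[0].get("data")`.

```python
class EchoLengthHandler:
    """Toy handler following the Init-Pre-Infer-Post pattern (no PyTorch needed)."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # A real handler would load weights from the model directory here.
        self.model = len  # hypothetical "model": maps a string to its length
        self.initialized = True

    def preprocess(self, data):
        # Each request row carries its payload under "data" or "body".
        return [str(row.get("data") or row.get("body")) for row in data]

    def inference(self, model_input):
        return [self.model(item) for item in model_input]

    def postprocess(self, inference_output):
        # Produce one response entry per request in the batch.
        return [{"length": value} for value in inference_output]

    def handle(self, data, context):
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)


handler = EchoLengthHandler()
handler.initialize(context=None)
responses = handler.handle([{"body": "hello"}, {"data": "hi"}], context=None)
print(responses)
```

Keeping the three stages as separate methods lets a subclass replace one stage, for example only postprocess, without touching the others.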

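Captum explanations for custom handlers¶

TorchServe returns Captum explanations for image classification, text classification, and BERT models. To request them, send: POST /explanations/{model_name}

The explanations are produced by the explain_handle method of the base handler, which the base handler invokes. The arguments passed to explain_handle are the preprocessed data and the raw data. It calls the custom handler's get_insights function, which returns the Captum attributions. You should write your own get_insights implementation to produce the explanations.

To serve a custom handler, initialize the Captum algorithm in the handler's initialize function.

You can override the explain_handle function in a custom handler. You should define a get_insights method for the custom handler to obtain the Captum attributions.

The ModelHandler class above should have the following methods to support Captum.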

    def initialize(self, context):
        """
        Load the model and its artifacts
        """
        .....
        self.lig = LayerIntegratedGradients(
                captum_sequence_forward, self.model.bert.embeddings
            )

    def handle(self, data, context):
        """
        Invoke by TorchServe for prediction/explanation request.
        Do pre-processing of data, prediction using model and postprocessing of prediction/explanations output
        :param data: Input data for prediction/explanation
        :param context: Initial context contains model server system properties.
        :return: prediction/ explanations output
        """
        model_input = self.preprocess(data)
        if not self._is_explain():
            model_output = self.inference(model_input)
            model_output = self.postprocess(model_output)
        else:
            model_output = self.explain_handle(model_input, data)
        return model_output
    
    # Present in the base_handler, so override only when necessary
    def explain_handle(self, data_preprocess, raw_data):
        """Captum explanations handler

        Args:
            data_preprocess (Torch Tensor): Preprocessed data to be used for captum
            raw_data (list): The unprocessed data to get target from the request

        Returns:
            dict : A dictionary response with the explanations response.
        """
        output_explain = None
        inputs = None
        target = 0

        logger.info("Calculating Explanations")
        row = raw_data[0]
        if isinstance(row, dict):
            logger.info("Getting data and target")
            inputs = row.get("data") or row.get("body")
            target = row.get("target")
            if not target:
                target = 0

        output_explain = self.get_insights(data_preprocess, inputs, target)
        return output_explain

    def get_insights(self, tensor_data, raw_data, target=0):
        """
        Functionality to get the explanations.
        Called from the explain_handle method with the preprocessed data,
        the raw input, and the target.
        """
        pass

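Extend default handlers¶

TorchServe has the following default handlers.

If required, you can extend the above handlers to create a custom handler. You can also extend the abstract base_handler.

To import a default handler in a Python script, use the following import statement.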

from ts.torch_handler.<default_handler_name> import <DefaultHandlerClass>

The following is an example of a custom handler that extends the default image classifier handler.

from ts.torch_handler.image_classifier import ImageClassifier

class CustomImageClassifier(ImageClassifier):

    def preprocess(self, data):
        """
        Overriding this method for custom preprocessing.
        :param data: raw data to be transformed
        :return: preprocessed data for model input
        """
        # custom pre-procsess code goes here
        return data

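For more details, refer to the following examples:

Creating a model archive with an entry point¶

TorchServe identifies the entry point to the custom service from a manifest file. When you create the model archive, specify the location of the entry point with the --handler option.

The model-archiver tool enables you to create a model archive that TorchServe can serve.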

torch-model-archiver --model-name <model-name> --version <model_version_number> --handler model_handler[:<entry_point_function_name>] [--model-file <path_to_model_architecture_file>] --serialized-file <path_to_state_dict_file> [--extra-files <comma_separated_additional_files>] [--export-path <output-dir> --model-path <model_dir>] [--runtime python3]

NOTE -

Options in [] are optional.
entry_point_function_name can be omitted if the function is named handle in your handler module, or if the handler is a Python class.
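
The handler[:entry_point_function_name] convention in the note above can be pictured with a small parser. This is only a sketch of the naming rule; `split_handler_spec` is a hypothetical helper and does not reflect how TorchServe resolves the entry point internally.

```python
def split_handler_spec(spec, default_entry_point="handle"):
    """Split 'module[:function]' into (module_name, function_name).

    Falls back to `default_entry_point` when no function name is given.
    """
    module_name, separator, function_name = spec.partition(":")
    # Accept a file name such as "waveglow_handler.py" as well as a module name.
    if module_name.endswith(".py"):
        module_name = module_name[: -len(".py")]
    if not separator or not function_name:
        function_name = default_entry_point
    return module_name, function_name


print(split_handler_spec("model_handler:my_entry_point"))
print(split_handler_spec("model_handler"))
print(split_handler_spec("waveglow_handler.py"))
```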

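This creates the file <model-name>.mar in the directory <output-dir> for the Python 3 runtime. The --runtime parameter selects a specific Python version to use at runtime. By default, the system's default Python distribution is used.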

Example

torch-model-archiver --model-name waveglow_synthesizer --version 1.0 --model-file waveglow_model.py --serialized-file nvidia_waveglowpyt_fp32_20190306.pth --handler waveglow_handler.py --extra-files tacotron.zip,nvidia_tacotron2pyt_fp32_20190306.pth
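
When archives are built from a script, the same command can be assembled as an argument list. The sketch below only uses flags that appear in the usage line above; the function name `build_archiver_command` and its parameters are hypothetical, and the example builds and prints the command without running it.

```python
import shlex


def build_archiver_command(model_name, version, handler, serialized_file,
                           model_file=None, extra_files=None, export_path=None):
    """Assemble a torch-model-archiver invocation as an argv list."""
    command = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--handler", handler,
        "--serialized-file", serialized_file,
    ]
    # Optional flags are appended only when a value is supplied.
    if model_file:
        command += ["--model-file", model_file]
    if extra_files:
        command += ["--extra-files", ",".join(extra_files)]
    if export_path:
        command += ["--export-path", export_path]
    return command


command = build_archiver_command(
    model_name="waveglow_synthesizer",
    version="1.0",
    handler="waveglow_handler.py",
    serialized_file="nvidia_waveglowpyt_fp32_20190306.pth",
    model_file="waveglow_model.py",
    extra_files=["tacotron.zip", "nvidia_tacotron2pyt_fp32_20190306.pth"],
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it; building a list rather than a single string avoids shell quoting problems with file names.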

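Handling model execution on multiple GPUs¶

TorchServe scales backend workers across vCPUs or GPUs. When multiple GPUs are available, TorchServe selects the GPU device in round-robin fashion and passes that device ID to the model handler. Use this GPU ID to create the PyTorch device object so that the workers are not all created on the same GPU. The following snippet can be used in a model handler to create the PyTorch device object: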

import torch

class ModelHandler(object):
    """
    A base Model handler implementation.
    """

    def __init__(self):
        self.device = None

    def initialize(self, context):
        properties = context.system_properties
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
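
The effect of round-robin assignment can be illustrated without PyTorch. In the sketch below, `round_robin_gpu_ids` is a simple modulo scheme chosen for illustration and is not TorchServe's actual scheduling code; `device_string` is a hypothetical helper that also falls back to the CPU when no GPU ID was provided, a case the snippet above does not guard against.

```python
def device_string(gpu_id, cuda_available):
    """Return the device name a handler would pass to torch.device()."""
    if cuda_available and gpu_id is not None:
        return f"cuda:{gpu_id}"
    return "cpu"


def round_robin_gpu_ids(num_workers, num_gpus):
    """Illustrative assignment only: worker i gets GPU i modulo num_gpus."""
    return [worker % num_gpus for worker in range(num_workers)]


assigned = round_robin_gpu_ids(num_workers=5, num_gpus=2)
devices = [device_string(gpu_id, cuda_available=True) for gpu_id in assigned]
print(assigned)
print(devices)
```

In a real handler the resulting string would be wrapped as `torch.device(device_string(...))`, with `torch.cuda.is_available()` supplying the second argument.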

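Installing model specific Python dependencies¶

Custom models and handlers may depend on Python packages that are not installed as part of the TorchServe setup.

The following steps let you supply a list of custom Python packages for TorchServe to install, so that model serving works seamlessly.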
