TorchServe gRPC API

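Note: TorchServe gRPC does not currently support workflows.

In addition to its other APIs, TorchServe supports gRPC APIs for both inference and management calls.

TorchServe provides the following gRPC APIs: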

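Inference API
- Ping : Gets the health status of the running server
- Predictions : Gets predictions from the served model
- StreamPredictions : Gets server-side streaming predictions from a saved model

For all inference API requests, TorchServe requires either that the correct inference token be included or that token authorization be disabled. For more details, see the token authorization documentation.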
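
As a minimal sketch of how a Python client might attach the token: gRPC call metadata is a sequence of key/value pairs, and the helper below builds one. The helper name `build_call_metadata` is hypothetical, and the `authorization` key with a `Bearer ` prefix is an assumption modeled on the usual HTTP header convention; check the token authorization documentation and the bundled client script for the exact form.

```python
from typing import Optional, Tuple

Metadata = Tuple[Tuple[str, str], ...]


def build_call_metadata(auth_token: Optional[str] = None) -> Metadata:
    """Return gRPC call metadata, adding a bearer token only when one is given.

    Hypothetical helper: the "authorization" key and the "Bearer " prefix
    are assumptions, not confirmed by this page.
    """
    if not auth_token:
        # Token authorization disabled (--disable-token-auth): send no token.
        return ()
    # gRPC metadata keys must be lowercase.
    return (("authorization", f"Bearer {auth_token}"),)
```

The returned tuple would be passed as the `metadata=` keyword argument of a stub call, with the inference token for inference RPCs and the management token for management RPCs.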

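Management API
- RegisterModel : Serves a model or model version on TorchServe
- UnregisterModel : Frees system resources by unregistering a specific version of a model from TorchServe
- ScaleWorker : Dynamically adjusts the number of workers available to a model to better serve different inference request loads
- ListModels : Queries the default versions of the currently registered models
- DescribeModel : Gets the detailed runtime status of a model's default version
- SetDefault : Sets any registered version of a model as the default version

For all management API requests, TorchServe requires either that the correct management token be included or that token authorization be disabled. For more details, see the token authorization documentation.

By default, TorchServe listens on localhost port 7070 for the gRPC inference API and on port 7071 for the gRPC management API. To configure the gRPC APIs on different addresses and ports, see the configuration documentation.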
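
A client needs a `host:port` target for each API. The sketch below resolves those targets from the defaults stated above, optionally overridden by the text of a properties file. The function names are hypothetical, and the property names `grpc_inference_port` and `grpc_management_port` are assumptions to be verified against the configuration documentation.

```python
DEFAULT_HOST = "localhost"
DEFAULT_PORTS = {"inference": 7070, "management": 7071}

# Assumed property names; verify against the configuration documentation.
PORT_KEYS = {
    "inference": "grpc_inference_port",
    "management": "grpc_management_port",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def grpc_target(api: str, properties_text: str = "", host: str = DEFAULT_HOST) -> str:
    """Return the 'host:port' target for the 'inference' or 'management' API."""
    props = parse_properties(properties_text)
    port = int(props.get(PORT_KEYS[api], DEFAULT_PORTS[api]))
    return f"{host}:{port}"


print(grpc_target("inference"))   # localhost:7070
print(grpc_target("management"))  # localhost:7071
```

The resulting string is the form accepted by `grpc.insecure_channel(...)` when opening a channel to the server.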

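Python client example for gRPC APIs

Run the following commands to register the densenet161 model from the TorchServe model zoo, run inference on it, and unregister it, using the gRPC Python client.

Install TorchServe
Clone the serve repository to run this example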

git clone --recurse-submodules https://github.com/pytorch/serve
cd serve

Install the gRPC Python dependencies

pip install -U grpcio protobuf grpcio-tools googleapis-common-protos

Start TorchServe

mkdir models
torchserve --start --disable-token-auth --enable-model-api --model-store models/

Generate the Python gRPC client stubs from the proto files

python -m grpc_tools.protoc -I third_party/google/rpc --proto_path=frontend/server/src/main/resources/proto/ --python_out=ts_scripts --grpc_python_out=ts_scripts frontend/server/src/main/resources/proto/inference.proto frontend/server/src/main/resources/proto/management.proto
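
The same generation step can be driven from Python, which is convenient in build scripts. This is a sketch rather than part of the TorchServe tooling: `protoc_args` and `generate_stubs` are hypothetical names, while `grpc_tools.protoc.main` is the entry point that `python -m grpc_tools.protoc` itself runs. It assumes the current directory is the root of the cloned repository.

```python
PROTO_DIR = "frontend/server/src/main/resources/proto"
OUT_DIR = "ts_scripts"


def protoc_args(proto_dir: str = PROTO_DIR, out_dir: str = OUT_DIR) -> list:
    """Build an argument list mirroring the command-line invocation."""
    return [
        "grpc_tools.protoc",  # argv[0]: program name, not an option
        "-I", "third_party/google/rpc",
        f"--proto_path={proto_dir}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        f"{proto_dir}/inference.proto",
        f"{proto_dir}/management.proto",
    ]


def generate_stubs() -> int:
    """Run protoc in-process and return its exit code (0 on success)."""
    from grpc_tools import protoc  # imported lazily; requires grpcio-tools

    return protoc.main(protoc_args())
```

Following protoc's naming convention, a successful run should leave `inference_pb2.py`, `inference_pb2_grpc.py`, `management_pb2.py`, and `management_pb2_grpc.py` in `ts_scripts`.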

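Register the densenet161 model

Note: To use this API after TorchServe has started, model API control must be enabled. Add --enable-model-api to the command line when starting TorchServe to enable this API. For more details, see the model API control documentation.

If token authorization is disabled, use: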

python ts_scripts/torchserve_grpc_client.py register densenet161

If token authorization is enabled, use:

python ts_scripts/torchserve_grpc_client.py register densenet161 --auth-token <management-token>

Run inference

If token authorization is disabled, use:

python ts_scripts/torchserve_grpc_client.py infer densenet161 examples/image_classifier/kitten.jpg

If token authorization is enabled, use:

python ts_scripts/torchserve_grpc_client.py infer densenet161 examples/image_classifier/kitten.jpg --auth-token <inference-token>
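
The client script hides the RPC itself. As a rough sketch of what a Predictions call could look like with the generated stubs, the hypothetical function below takes the stub and the request class as parameters so it does not depend on where the generated modules live. The message field names (`model_name`, `input`, `prediction`) and the `"data"` input key are assumptions to be checked against inference.proto and the bundled client.

```python
def run_inference(stub, request_cls, model_name, payload: bytes, metadata=()):
    """Send one Predictions request and return the response decoded as text.

    `stub` is expected to be an InferenceAPIsServiceStub and `request_cls`
    the generated PredictionsRequest class (both assumed, see lead-in).
    """
    # Assumed layout: the input map carries the raw bytes under "data".
    request = request_cls(model_name=model_name, input={"data": payload})
    response = stub.Predictions(request, metadata=metadata)
    return response.prediction.decode("utf-8")


# Possible usage once the stubs are generated and TorchServe is running:
#
#   import grpc, inference_pb2, inference_pb2_grpc
#   channel = grpc.insecure_channel("localhost:7070")
#   stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
#   with open("examples/image_classifier/kitten.jpg", "rb") as f:
#       print(run_inference(stub, inference_pb2.PredictionsRequest,
#                           "densenet161", f.read()))
```

Injecting the stub also makes the function easy to exercise with a fake object in unit tests, without a running server.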

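Unregister the densenet161 model

Note: To use this API after TorchServe has started, model API control must be enabled. Add --enable-model-api to the command line when starting TorchServe to enable this API. For more details, see the model API control documentation.

If token authorization is disabled, use: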

python ts_scripts/torchserve_grpc_client.py unregister densenet161

If token authorization is enabled, use:

python ts_scripts/torchserve_grpc_client.py unregister densenet161 --auth-token <management-token>

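gRPC server-side streaming

The TorchServe gRPC API adds a server-side streaming inference API, "StreamPredictions", which allows a sequence of inference responses to be sent over the same gRPC stream. This API is recommended only when the inference latency of the full response is high and the intermediate inference results are sent to the client. One example is LLMs for generative applications, where generating "n" tokens can have high latency; in that case the user can receive each generated token as soon as it is ready, until the full response completes. This API automatically forces batchSize to 1.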

service InferenceAPIsService {
    // Check health status of the TorchServe server.
    rpc Ping(google.protobuf.Empty) returns (TorchServeHealthResponse) {}

    // Predictions entry point to get inference using default model version.
    rpc Predictions(PredictionsRequest) returns (PredictionResponse) {}

    // Streaming response for an inference request.
    rpc StreamPredictions(PredictionsRequest) returns (stream PredictionResponse) {}
}
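
On the client side, a server-streaming RPC in gRPC Python returns an iterator of response messages. The sketch below consumes StreamPredictions that way. The function name is hypothetical, and the field names (`model_name`, `input`, `prediction`) and the `"data"` input key are assumptions to be checked against inference.proto.

```python
from typing import Iterator


def stream_inference(stub, request_cls, model_name, payload: bytes,
                     metadata=()) -> Iterator[str]:
    """Yield each streamed chunk, decoded as text, as soon as it arrives.

    `stub` is expected to be an InferenceAPIsServiceStub and `request_cls`
    the generated PredictionsRequest class (both assumed, see lead-in).
    """
    request = request_cls(model_name=model_name, input={"data": payload})
    # Iterating blocks until the next message arrives or the stream ends.
    for response in stub.StreamPredictions(request, metadata=metadata):
        yield response.prediction.decode("utf-8")
```

A caller can then print or forward each chunk inside a plain `for` loop instead of waiting for the full response.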

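The backend handler calls "send_intermediate_predict_response" to send each intermediate result to the frontend, and returns the last result in the usual way, as the handler's return value. For example: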

from ts.handler_utils.utils import send_intermediate_predict_response

# Note: TorchServe v1.0.0 will deprecate
# "from ts.protocol.otf_message_handler import send_intermediate_predict_response".
# Please replace it with
# "from ts.handler_utils.utils import send_intermediate_predict_response".


def handle(data, context):
    # Only list input is handled; any other input falls through and returns None.
    if isinstance(data, list):
        # Send three intermediate responses before the final one.
        for _ in range(3):
            send_intermediate_predict_response(
                ["intermediate_response"], context.request_ids,
                "Intermediate Prediction success", 200, context,
            )
        # The last result is returned in the usual way.
        return ["hello world "]
