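ExecuTorch Vulkan Delegate

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch, built on the cross-platform Vulkan GPU API standard. It is primarily designed to use the GPU to accelerate model inference on Android devices, but it can be used on any platform that supports an implementation of Vulkan: laptops, servers, and edge devices.

Note

The Vulkan delegate is under active development, and its components are subject to change.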

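What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL. Compared to earlier specifications, it is designed to give developers more explicit control over the GPU, which reduces overhead and makes better use of modern graphics hardware.

Vulkan has been widely adopted by GPU vendors, and most modern GPUs on the market (both desktop and mobile) support it. Vulkan has also been included in Android since Android 7.0.

Note that Vulkan is a GPU API, not a GPU math library. That is, it provides a way to execute compute and graphics operations on a GPU, but it does not ship with a built-in library of performant compute kernels.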

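The Vulkan Compute Library

The ExecuTorch Vulkan delegate is a wrapper around a standalone runtime known as the Vulkan Compute Library. The aim of the Vulkan Compute Library is to provide GPU implementations of PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the PyTorch Vulkan backend. The core components of the PyTorch Vulkan backend were forked into ExecuTorch and adapted for AOT graph-mode model inference (as opposed to the eager execution mode used by PyTorch).

The components of the Vulkan Compute Library are contained in the executorch/backends/vulkan/runtime/ directory. The core components are listed and described below: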

runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp

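Features

The Vulkan delegate currently supports the following features:

Memory planning
- Intermediate tensors whose lifetimes do not overlap share memory allocations. This reduces the peak memory usage of model inference.
Capability-based partitioning
- A graph can be partially lowered to the Vulkan delegate via a partitioner, which identifies the nodes (i.e. operators) supported by the Vulkan delegate and lowers only the supported subgraphs.
Support for upper-bound dynamic shapes
- Tensors can change shape between inferences as long as the current shape stays within the bounds specified during lowering.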
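
The memory planning feature listed above can be illustrated with a small sketch. This is not the planner used by the Vulkan delegate; the `Tensor` record, the `plan_shared_buffers` function, and the greedy first-fit strategy are all assumptions made for illustration. Each intermediate tensor has a lifetime (the index of the op that produces it and the index of the last op that reads it), and a buffer is reused once its previous occupant is no longer live.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tensor:
    """Hypothetical record describing one intermediate tensor."""
    name: str
    nbytes: int
    first_use: int  # index of the op that produces the tensor
    last_use: int   # index of the last op that reads the tensor


def plan_shared_buffers(tensors: List[Tensor]) -> Tuple[Dict[str, int], List[int]]:
    """Greedily map tensors to buffers.

    Returns (tensor name -> buffer id, size in bytes of each buffer).
    A buffer is reused only when its occupant's lifetime ended before
    the new tensor's lifetime begins.
    """
    buffer_free_after: List[int] = []  # buffer id -> last_use of its occupant
    buffer_sizes: List[int] = []       # buffer id -> bytes reserved
    assignment: Dict[str, int] = {}

    for tensor in sorted(tensors, key=lambda t: t.first_use):
        for buf_id, free_after in enumerate(buffer_free_after):
            if free_after < tensor.first_use:  # lifetimes do not overlap
                buffer_free_after[buf_id] = tensor.last_use
                buffer_sizes[buf_id] = max(buffer_sizes[buf_id], tensor.nbytes)
                assignment[tensor.name] = buf_id
                break
        else:
            # No reusable buffer was found, so allocate a new one.
            buffer_free_after.append(tensor.last_use)
            buffer_sizes.append(tensor.nbytes)
            assignment[tensor.name] = len(buffer_sizes) - 1

    return assignment, buffer_sizes


tensors = [
    Tensor("a", 1024, first_use=0, last_use=1),
    Tensor("b", 2048, first_use=1, last_use=2),
    Tensor("c", 1024, first_use=2, last_use=3),
]
assignment, sizes = plan_shared_buffers(tensors)
print(assignment)  # {'a': 0, 'b': 1, 'c': 0}
print(sum(sizes), "bytes instead of", sum(t.nbytes for t in tensors))
```

In this example `a` and `c` never coexist, so they share one buffer and the planned footprint is 3072 bytes rather than 4096.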

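In addition to increasing operator coverage, the following features are currently in development:

Quantization support
- We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
Memory layout management
- Memory layout is an important factor in optimizing performance. We plan to introduce graph passes that insert memory layout transitions throughout a graph in order to optimize layout-sensitive operators such as convolution and matrix multiplication.
Selective build
- We plan to make it possible to control build size by selecting which operators/shaders to build.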

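End-to-end example

To further understand the features of the Vulkan delegate and how to use it, consider the following end-to-end example with a simple single-operator model.

Compile and lower a model to the Vulkan delegate

Once ExecuTorch has been set up and installed, the following script can be used to generate a simple model that adds two tensors, lower it to the Vulkan backend, and save it as vk_add.pte.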

# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)

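As with other ExecuTorch delegates, a model is lowered to the Vulkan delegate through the to_backend() API. The Vulkan delegate implements the VulkanPartitioner class, which identifies the nodes (i.e. operators) in the graph that the Vulkan delegate supports and separates the compatible parts of the model for execution on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it contains some unsupported operators. In that case, only parts of the graph will be executed on the GPU.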
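
The idea of partial lowering can be sketched in a few lines of plain Python. This is not how VulkanPartitioner is implemented: it treats the model as a linear sequence of operator names rather than a graph, and the `SUPPORTED_OPS` set and `partition` function are hypothetical names introduced only for this illustration. Consecutive supported operators are grouped into segments that would be delegated, and everything else stays on the default CPU path.

```python
from itertools import groupby
from typing import List, Set, Tuple

# Hypothetical supported-operator set, for illustration only.
SUPPORTED_OPS: Set[str] = {"add", "sub", "mul", "div"}


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Split a linear op sequence into ("vulkan" | "cpu", ops) segments."""
    segments = []
    # groupby collects runs of consecutive ops that share the same key,
    # i.e. runs that are all supported or all unsupported.
    for is_supported, group in groupby(ops, key=lambda op: op in supported):
        backend = "vulkan" if is_supported else "cpu"
        segments.append((backend, list(group)))
    return segments


model_ops = ["add", "mul", "softmax", "sub"]
for backend, segment in partition(model_ops, SUPPORTED_OPS):
    print(f"{backend}: {segment}")
# vulkan: ['add', 'mul']
# cpu: ['softmax']
# vulkan: ['sub']
```

A real partitioner also has to respect data dependencies between nodes, which a linear list does not capture.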

Note

The supported ops list in the Vulkan partitioner code can be inspected to see which ops are currently implemented in the Vulkan delegate.

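Build the Vulkan delegate libraries

The easiest way to build and test the Vulkan delegate is to build for Android and test on a local Android device. Android devices have built-in support for Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to compile the Vulkan Compute Library's GLSL compute shaders.

When building with CMake, the Vulkan delegate libraries can be built by setting -DEXECUTORCH_BUILD_VULKAN=ON.

First, make sure that the Android NDK is installed; any NDK version newer than r19c should work. Note that the examples in this document have been validated with NDK r27b. The Android SDK should also be installed so that you have access to adb.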
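
If you want to check an NDK release name against the r19c baseline in a script, a comparison along the following lines may help. It is only a sketch: the helper names `parse_ndk_release` and `is_newer_than` are hypothetical, the parser handles only release names of the form `r<number><optional letter>`, and treating the requirement as "strictly newer than r19c" is an assumption about the wording above.

```python
import re
from typing import Tuple

# Matches release names such as "r19c", "r27b" or "r20".
_NDK_RELEASE = re.compile(r"^r(\d+)([a-z]?)$")


def parse_ndk_release(name: str) -> Tuple[int, str]:
    """Turn a release name like "r27b" into a comparable (27, "b") tuple."""
    match = _NDK_RELEASE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized NDK release name: {name!r}")
    return int(match.group(1)), match.group(2)


def is_newer_than(release: str, baseline: str = "r19c") -> bool:
    """Return True if `release` is strictly newer than `baseline`."""
    return parse_ndk_release(release) > parse_ndk_release(baseline)


print(is_newer_than("r27b"))  # True
print(is_newer_than("r18b"))  # False
```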

The instructions on this page assume that the following environment variables are set.

export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
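
Before starting a long build, it can be useful to confirm that the environment described above is in place. The following is a minimal sketch, not part of ExecuTorch: `check_build_env` is a hypothetical helper that checks the `ANDROID_NDK` and `ANDROID_ABI` variables, looks for the toolchain file at the path the CMake command on this page uses, and checks that `adb` is on the PATH.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def check_build_env(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    ndk = env.get("ANDROID_NDK", "")
    if not ndk:
        problems.append("ANDROID_NDK is not set")
    else:
        # The CMake invocation on this page expects this toolchain file.
        toolchain = os.path.join(ndk, "build", "cmake", "android.toolchain.cmake")
        if not os.path.isfile(toolchain):
            problems.append(f"android.toolchain.cmake not found at {toolchain}")

    if not env.get("ANDROID_ABI"):
        problems.append("ANDROID_ABI is not set")

    if which("adb") is None:
        problems.append("adb was not found on PATH")

    return problems


for problem in check_build_env():
    print("warning:", problem)
```

The `env` and `which` parameters exist only so that the check can be exercised without touching the real environment.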

Build and install the ExecuTorch libraries with the Vulkan delegate (for Android):

# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)

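Run the Vulkan model on device

Note

Because operator support is currently limited, only binary arithmetic operators will run on the GPU. Expect inference to be slow, as the majority of operators are executed via portable operators.

Now, the partially delegated model can be executed (partially) on your device's GPU!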

# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
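
If you run these steps repeatedly, the push-and-run sequence above can be wrapped in a short script. The sketch below only reassembles the adb commands shown on this page; `runner_commands` and `run_all` are hypothetical helper names, and it assumes `adb` is on the PATH with a single device attached.

```python
import os
import subprocess
from typing import List

DEVICE_DIR = "/data/local/tmp"


def runner_commands(model_path: str, runner_path: str) -> List[List[str]]:
    """Build the adb commands that push a model and runner, then run it."""
    device_model = f"{DEVICE_DIR}/{os.path.basename(model_path)}"
    device_runner = f"{DEVICE_DIR}/runner_bin"
    return [
        ["adb", "push", model_path, device_model],
        ["adb", "push", runner_path, device_runner],
        ["adb", "shell", device_runner, "--model_path", device_model],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        print("$", " ".join(command))
        subprocess.run(command, check=True)
```

With a device attached, `run_all(runner_commands("vk_add.pte", "cmake-android-out/backends/vulkan/vulkan_executor_runner"))` would perform the same three steps as the shell commands above.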