Building and Running ExecuTorch with XNNPACK Backend
The following tutorial will familiarize you with leveraging the ExecuTorch XNNPACK Delegate to accelerate your ML models on CPU hardware. It covers exporting and serializing a model to a binary file targeting the XNNPACK Delegate backend, and running the model on a supported target platform. To get started quickly, use the scripts in the ExecuTorch repository, which include instructions for exporting and generating binary files for a few sample models demonstrating the flow.
In this tutorial, you will learn how to export an XNNPACK-lowered model and run it on a target platform.
Lowering a model to XNNPACK
import torch
import torchvision.models as models
from torch.export import export, ExportedProgram
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeProgramManager, ExecutorchProgramManager, to_edge_transform_and_lower
from executorch.exir.backend.backend_api import to_backend
mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )
exported_program: ExportedProgram = export(mobilenet_v2, sample_inputs)
edge: EdgeProgramManager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
)
We will go through this example with the MobileNetV2 pretrained model downloaded from the TorchVision library. The lowering flow starts after exporting the model with torch.export: we call the to_edge_transform_and_lower API and pass in the XnnpackPartitioner. The partitioner identifies the subgraphs suitable for the XNNPACK backend delegate to consume. The identified subgraphs are then serialized with the XNNPACK Delegate flatbuffer schema, and each subgraph is replaced with a call to the XNNPACK Delegate.
>>> print(edge.exported_program().graph_module)
GraphModule(
  (lowered_module_0): LoweredBackendModule()
  (lowered_module_1): LoweredBackendModule()
)

def forward(self, b_features_0_1_num_batches_tracked, ..., x):
    lowered_module_0 = self.lowered_module_0
    lowered_module_1 = self.lowered_module_1
    executorch_call_delegate_1 = torch.ops.higher_order.executorch_call_delegate(lowered_module_1, x); lowered_module_1 = x = None
    getitem_53 = executorch_call_delegate_1[0]; executorch_call_delegate_1 = None
    aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(getitem_53, [1, 1280]); getitem_53 = None
    aten_clone_default = executorch_exir_dialects_edge__ops_aten_clone_default(aten_view_copy_default); aten_view_copy_default = None
    executorch_call_delegate = torch.ops.higher_order.executorch_call_delegate(lowered_module_0, aten_clone_default); lowered_module_0 = aten_clone_default = None
    getitem_52 = executorch_call_delegate[0]; executorch_call_delegate = None
    return (getitem_52,)
We print the graph after lowering above to show the new nodes that were inserted to call the XNNPACK Delegate. The subgraph being delegated to XNNPACK is the first argument at each call site. It can be observed that the majority of the convolution-relu-add blocks and linear blocks were able to be delegated to XNNPACK. We can also see the operators which could not be lowered to the XNNPACK delegate, such as clone and view_copy.
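For larger models the printed graph can be long, so it can help to summarize the delegation programmatically. The snippet below is a minimal sketch that walks the FX graph produced above, counting delegate call sites and collecting the operators that stayed outside the delegate; it matches delegate calls by node name (as seen in the printout), which is one simple heuristic rather than an official API.

# Minimal sketch: summarize delegation in the lowered graph.
# Assumes `edge` is the EdgeProgramManager created by to_edge_transform_and_lower above.
graph_module = edge.exported_program().graph_module

delegate_calls = 0
non_delegated_ops = []
for node in graph_module.graph.nodes:
    if node.op != "call_function":
        continue  # skip get_attr/placeholder/output nodes
    if node.name.startswith("executorch_call_delegate"):
        delegate_calls += 1  # a subgraph handed off to the XNNPACK delegate
    else:
        non_delegated_ops.append(str(node.target))  # ops left in the edge dialect

print(f"delegate call sites: {delegate_calls}")
print(f"non-delegated ops: {non_delegated_ops}")
# For MobileNetV2 this reports two delegate calls and leaves ops such as
# view_copy and clone outside the delegate, matching the printed graph above.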
exec_prog = edge.to_executorch()
with open("xnnpack_mobilenetv2.pte", "wb") as file:
exec_prog.write_to_file(file)
After lowering to the XNNPACK program, we can then prepare it for ExecuTorch and save the model as a .pte file. .pte is the binary format that stores the serialized ExecuTorch graph.
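Before moving the .pte file onto a device, it can be handy to sanity-check it from Python. The sketch below assumes the installed ExecuTorch Python package was built with the XNNPACK backend (the prebuilt wheels are) and uses the pybindings loader; the exact pybindings module path and API have changed between releases, so treat this as illustrative rather than canonical.

import torch
# Optional sanity check via the ExecuTorch pybindings (module path may differ by version).
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load the serialized program and run it with a random input of the expected shape.
executorch_module = _load_for_executorch("xnnpack_mobilenetv2.pte")
outputs = executorch_module.forward([torch.randn(1, 3, 224, 224)])
print(outputs[0].shape)  # expect a (1, 1000) logits tensor for MobileNetV2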
Lowering a Quantized Model to XNNPACK
The XNNPACK delegate can also execute symmetrically quantized models. To understand the quantization flow and learn how to quantize models, refer to the custom quantization note. For the purposes of this tutorial, we will leverage the quantize() Python helper function added to the executorch/executorch/examples folder.
from torch.export import export_for_training
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower
mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )
mobilenet_v2 = export_for_training(mobilenet_v2, sample_inputs).module() # 2-stage export for quantization path
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
get_symmetric_quantization_config,
XNNPACKQuantizer,
)
def quantize(model, example_inputs):
    """This is the official recommended flow for quantization in pytorch 2.0 export"""
    print(f"Original model: {model}")
    quantizer = XNNPACKQuantizer()
    # if we set is_per_channel to True, we also need to add out_variant of quantize_per_channel/dequantize_per_channel
    operator_config = get_symmetric_quantization_config(is_per_channel=False)
    quantizer.set_global(operator_config)
    m = prepare_pt2e(model, quantizer)
    # calibration
    m(*example_inputs)
    m = convert_pt2e(m)
    print(f"Quantized model: {m}")
    # make sure we can export to flat buffer
    return m
quantized_mobilenetv2 = quantize(mobilenet_v2, sample_inputs)
Quantization requires a two-stage export. First, we use the export_for_training API to capture the model before handing it to the quantize utility function. After the quantization step, we can then leverage the XNNPACK delegate to lower the quantized exported model graph. From here, the procedure is the same as lowering a non-quantized model to XNNPACK.
# Continued from earlier...
edge = to_edge_transform_and_lower(
    export(quantized_mobilenetv2, sample_inputs),
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[XnnpackPartitioner()]
)
exec_prog = edge.to_executorch()
with open("qs8_xnnpack_mobilenetv2.pte", "wb") as file:
exec_prog.write_to_file(file)
Lowering with the aot_compiler.py script
We have also provided a script to quickly lower and export a few example models. You can run it to generate lowered fp32 and quantized models. The script is provided purely for convenience and performs exactly the same steps as those listed in the previous two sections.
python -m examples.xnnpack.aot_compiler --model_name="mv2" --quantize --delegate
Note in the example above:
- the --model_name flag specifies the model to use
- the --quantize flag controls whether the model should be quantized or not
- the --delegate flag controls whether we attempt to lower parts of the graph to the XNNPACK delegate
The generated model file will be named [model_name]_xnnpack_[qs8/fp32].pte depending on the arguments supplied.
Running the XNNPACK Model with CMake
After exporting the XNNPACK-delegated model, we can now try running it with example inputs using CMake. We can build and use the xnn_executor_runner, which is a sample wrapper around the ExecuTorch runtime and XNNPACK backend. We first begin by configuring the CMake build like so:
# cd to the root of executorch repo
cd executorch
# Get a clean cmake-out directory
./install_requirements.sh --clean
mkdir cmake-out
# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
Then you can build the runtime components with:
cmake --build cmake-out -j9 --target install --config Release
Now you should be able to find the executable built at ./cmake-out/backends/xnnpack/xnn_executor_runner. You can run it with your generated model like so:
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_fp32.pte
# or to run the quantized variant
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_q8.pte