TensorFlow Lite Code Analysis

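This section walks through the body of Interpreter::Invoke() in the TensorFlow Lite C++ source. The function runs model inference and takes care of a few performance and safety concerns along the way. The walkthrough below goes through it step by step: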

1. What the function does

Interpreter::Invoke() is the core entry point of TensorFlow Lite for running inference: calling it processes the input data and produces the model's outputs.

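2. Code walkthrough

ScopedRuntimeInstrumentationProfile

`ScopedRuntimeInstrumentationProfile scoped_runtime_event(root_profiler_.get(), "invoke");`

Purpose: creates a scoped profiling event that records performance metrics (such as run time) for this inference call.
**root_profiler_**: the profiler that tracks the model's inference performance.
**"invoke"**: the name of the event being recorded.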
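The scoped-profiling idea can be illustrated with a small RAII timer. This is only a minimal sketch of the pattern, not TFLite's profiler: `EventLog`, `ScopedTimer` and `TimedWork` are hypothetical names, and the timing relies on std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event sink; a real profiler interface is richer than this.
struct EventLog {
  struct Event {
    std::string name;
    long long micros;
  };
  std::vector<Event> events;
};

// RAII timer: starts timing on construction, records the event on destruction.
class ScopedTimer {
 public:
  ScopedTimer(EventLog* log, std::string name)
      : log_(log),
        name_(std::move(name)),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    if (log_ == nullptr) return;  // Profiling disabled: record nothing.
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    log_->events.push_back(
        {name_, std::chrono::duration_cast<std::chrono::microseconds>(elapsed)
                    .count()});
  }

 private:
  EventLog* log_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// The event is recorded when `timer` leaves scope, on every return path.
int TimedWork(EventLog* log) {
  ScopedTimer timer(log, "invoke");
  int sum = 0;
  for (int i = 0; i < 1000; ++i) sum += i;
  return sum;
}
```

Because the recording happens in the destructor, early returns from the timed function are covered without any extra bookkeeping.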

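Handling the cancellation flag

`if (cancellation_enabled_) (void)continue_invocation_.test_and_set();`

Purpose: when cancellation is supported, this sets the flag again so that a cancellation requested during an earlier run does not stop the current one.
**cancellation_enabled_**: whether cancellation support is enabled.
**continue_invocation_**: an atomic flag used to check whether inference has been cancelled.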
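One way an atomic flag can carry a cancellation request is sketched below. It is an illustration under assumptions, not the interpreter's code: `CancellableRunner` and its members are hypothetical, and the sketch assumes that cancelling means clearing the flag while the running loop polls it with test_and_set().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the interpreter's cancellation state.
class CancellableRunner {
 public:
  // Requests cancellation (possibly from another thread) by clearing the flag.
  void Cancel() { continue_flag_.clear(); }

  // Runs up to `steps` steps and returns how many completed.
  // `cancel_at_step` simulates an external Cancel() arriving at that step.
  int Run(int steps, int cancel_at_step = -1) {
    // Re-arm the flag so a stale Cancel() from an earlier run is ignored.
    (void)continue_flag_.test_and_set();
    int done = 0;
    for (int i = 0; i < steps; ++i) {
      if (i == cancel_at_step) Cancel();
      // test_and_set() returns the previous value: false means the flag was
      // cleared since the last check, i.e. cancellation was requested.
      if (!continue_flag_.test_and_set()) break;
      ++done;
    }
    return done;
  }

 private:
  std::atomic_flag continue_flag_ = ATOMIC_FLAG_INIT;
};
```

The re-arming at the top of `Run` corresponds to the role the text assigns to the `test_and_set()` call at the start of Invoke().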

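Suppressing denormal floating-point numbers

`ruy::ScopedSuppressDenormals suppress_denormals;`

Purpose: suppresses the effect of denormal (subnormal) floating-point numbers.
Reason: on some architectures (such as x86), arithmetic on denormals can be dramatically slower than on normal values.
**ruy::ScopedSuppressDenormals**: a scoped guard that keeps floating-point operations from producing denormal values for the duration of inference.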
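To make the term concrete, the sketch below classifies a value as denormal and models what flushing it to zero means. It is a software illustration only: the real suppression is done through CPU floating-point control state, and `IsDenormal` and `FlushToZero` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if x is denormal (subnormal): non-zero, but smaller in magnitude than
// the smallest normal float.
bool IsDenormal(float x) { return std::fpclassify(x) == FP_SUBNORMAL; }

// Software model of flush-to-zero: denormals become zero with the same sign,
// every other value is returned unchanged.
float FlushToZero(float x) {
  return IsDenormal(x) ? std::copysign(0.0f, x) : x;
}
```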

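Running inference on the primary subgraph

`TF_LITE_ENSURE_STATUS_WITH_SCOPED_INSTRUMENTATION(scoped_runtime_event, primary_subgraph().Invoke());`

**primary_subgraph().Invoke()**: runs inference on the model's primary subgraph.
**TF_LITE_ENSURE_STATUS_WITH_SCOPED_INSTRUMENTATION**: checks the returned status, returns early if it is not OK, and associates the status with the profiling event.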
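The early-return behaviour of such a status macro can be sketched in a few lines. `ENSURE_STATUS`, `Step` and `RunSteps` are hypothetical names for illustration; the instrumented TFLite macro additionally takes the profiling object, which is not modelled here.

```cpp
#include <cassert>

enum Status { kOk = 0, kError = 1 };

// Hypothetical macro: evaluate `expr` once and return early if it is not kOk.
#define ENSURE_STATUS(expr)             \
  do {                                  \
    const Status status_ = (expr);      \
    if (status_ != kOk) return status_; \
  } while (0)

// A step that counts how often it was called and succeeds or fails on demand.
Status Step(bool succeed, int* calls) {
  ++*calls;
  return succeed ? kOk : kError;
}

// Runs three steps; stops at the first failure and propagates its status.
Status RunSteps(bool second_ok, int* calls) {
  ENSURE_STATUS(Step(true, calls));
  ENSURE_STATUS(Step(second_ok, calls));
  ENSURE_STATUS(Step(true, calls));
  return kOk;
}
```

When the second step fails, the third one never runs, which is the same short-circuiting the text describes for Invoke().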

Handling output tensor data

if (!allow_buffer_handle_output_) {
    for (int tensor_index : outputs()) {
        TF_LITE_ENSURE_STATUS_WITH_SCOPED_INSTRUMENTATION(
            scoped_runtime_event,
            primary_subgraph().EnsureTensorDataIsReadable(tensor_index));
    }
}

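**allow_buffer_handle_output_**: controls whether outputs may be accessed directly through their buffer handles.
- **true**: skips the extra step on the output data.
- **false**: makes sure the data of every output tensor can be read correctly.
**EnsureTensorDataIsReadable**: checks that an output tensor's data is accessible and makes it so if needed.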
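The output-handling step can be pictured as a "copy back if stale" pass over the outputs. The sketch below is an assumption-laden model, not TFLite code: `FakeTensor`, `EnsureReadable` and `SyncOutputs` are hypothetical, with `device` standing in for an accelerator-owned buffer and `host` for the CPU-side copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tensor with a device-side buffer and a CPU-side copy.
struct FakeTensor {
  std::vector<float> device;
  std::vector<float> host;
  bool host_is_stale = false;  // True when the newest data is only in `device`.
};

// Copies device data to the host buffer if the host copy is out of date.
void EnsureReadable(FakeTensor& t) {
  if (!t.host_is_stale) return;
  t.host = t.device;
  t.host_is_stale = false;
}

// Mirrors the loop in the text: sync every output unless the caller opted out.
void SyncOutputs(std::vector<FakeTensor>& tensors,
                 const std::vector<int>& outputs,
                 bool allow_buffer_handle_output) {
  if (allow_buffer_handle_output) return;
  for (int index : outputs) EnsureReadable(tensors[index]);
}
```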

Returning the status

`return kTfLiteOk;`

Inference completed successfully, so the function returns kTfLiteOk.

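3. Key points

Performance:
- Uses ScopedRuntimeInstrumentationProfile to record performance data.
- Suppresses denormal floating-point numbers to speed up computation.
Cancellation support:
- Provides a way to cancel a running inference, which is useful for long-running models.
Safety checks:
- Makes sure output data is readable, which guards against invalid buffer access.
Modular design:
- Delegates to the primary subgraph's inference logic, which lets a model consist of multiple subgraphs and be extended modularly.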

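4. Where this applies

Mobile and embedded devices: suited to resource-constrained environments, with performance optimizations and support for runtime monitoring.
Large-scale inference systems: profiling and cancellation support help with efficient debugging and deployment.

This code shows how much design effort TensorFlow Lite puts into performance, safety, and feature support. If you need to modify this function (for example, to customize the inference logic), make sure you follow TensorFlow Lite's architecture and API conventions.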

TfLiteStatus Subgraph::Invoke() {
    auto status = InvokeImpl();
    telemetry::TelemetryReportEvent(&context_, "Invoke", status);
    return status;
}

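Subgraph::Invoke(), shown above, is a thin wrapper: it calls InvokeImpl() and reports the resulting status as a telemetry event. The actual work happens in Subgraph::InvokeImpl(), one of the core pieces of TensorFlow Lite, which runs inference for a subgraph (subgraphs are the structure that lets TensorFlow Lite run multi-subgraph models). A step-by-step analysis follows:

1. What the function does

Subgraph::InvokeImpl() is the low-level implementation in TFLite. It executes every node of the subgraph in the order given by the execution plan and manages the related resource allocation, error handling, and profiling.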

2. Code breakdown

2.1. Initial consistency check

if (!consistent_) {
    ReportError("Invoke called on model that is not consistent.");
    return kTfLiteError;
}

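Checks whether the subgraph is in a consistent state (consistent_).
If it is not, the model structure may have been corrupted, so the function returns an error immediately.

2.2. State checks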

if (state_ == kStateUninvokable) {
    ReportError("Invoke called on model that is not ready.");
    return kTfLiteError;
} else if (memory_planner_ && !memory_planner_->HasNonPersistentMemory()) {
    ReportError("Non-persistent memory is not available.");
    return kTfLiteError;
}

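Model state check: kStateUninvokable means the subgraph is not ready to run, possibly because memory allocation has not been completed.
Memory check: if a memory planner exists, it must currently hold its non-persistent (dynamic) memory; otherwise an error is returned.

2.3. Profiling

`TFLITE_SCOPED_TAGGED_DEFAULT_PROFILE(profiler_.get(), "Invoke");`

Starts a scoped profiling event that records performance data for this subgraph's execution.

2.4. Iterating over the execution plan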

for (int execution_plan_index = 0; 
     execution_plan_index < execution_plan_.size(); 
     execution_plan_index++) {

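Iterates over the subgraph's execution plan: execution_plan_ is a list of node indices that defines the order in which the model's operations run.
Each node is processed as follows:

2.4.1. Preparing the node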

if (execution_plan_index == next_execution_plan_index_to_prepare_) {
    TF_LITE_ENSURE_STATUS(PrepareOpsAndTensors());
}

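If execution has reached the first node whose resources (tensors or operations) still need to be prepared, that is, when execution_plan_index equals next_execution_plan_index_to_prepare_, PrepareOpsAndTensors() is called to prepare them.

2.4.2. Fetching the node and its registration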

int node_index = execution_plan_[execution_plan_index];
TfLiteNode& node = nodes_and_registration_[node_index].first;
const TfLiteRegistration& registration = nodes_and_registration_[node_index].second;

Uses node_index to look up the node and its operation registration.

2.4.3. Input data check

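for (int i = 0; i < node.inputs->size; ++i) {
    int tensor_index = node.inputs->data[i];
    // Optional inputs that were not provided have no tensor to check.
    if (tensor_index == kTfLiteOptionalTensor) continue;
    // Look up the tensor this input refers to.
    TfLiteTensor* tensor = &tensors_[tensor_index];
    // A tensor that should hold bytes but has no buffer cannot be read.
    if (tensor->data.raw == nullptr && tensor->bytes > 0) {
        ReportError("Input tensor %d lacks data", tensor_index);
        return kTfLiteError;
    }
}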

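Makes sure every input tensor of the node has its data in memory.
If an input tensor that should hold data has none, an error is reported and the function returns.

2.4.4. Invoking the node's operation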

if (auto s = OpInvoke(registration, &node); s != kTfLiteOk) {
    auto err = ReportOpError(&context_, node, registration, node_index, "failed to invoke");
    return s == kTfLiteCancelled ? s : err;
}

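Calls OpInvoke() to run the registered operation.
If the operation fails, the error is reported and a status is returned; a kTfLiteCancelled status is passed through unchanged.

2.4.5. Handling dynamic tensors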

if (tensor_resized_since_op_invoke_ && 
    HasDynamicTensor(context_, node.outputs, nullptr)) {
    next_execution_plan_index_to_prepare_ = execution_plan_index + 1;
}

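Detects whether a tensor was resized while the operation ran and whether the node has dynamic output tensors.
If so, next_execution_plan_index_to_prepare_ is set to the next node, so that the following nodes are prepared again before they run.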
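The interplay between preparation and execution can be sketched as "prepare as far as shapes are known, then resume later". The code below is a simplified model under that assumption; `Op`, `PrepareFrom` and `RunAndCountPrepares` are hypothetical names, and no real tensors are involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op: only records whether its output shape is known up front.
struct Op {
  bool has_dynamic_output;
};

// Prepares ops starting at `start` and stops right after the first op whose
// output shape is only known at run time. Returns the index to resume from.
int PrepareFrom(const std::vector<Op>& plan, int start) {
  const int n = static_cast<int>(plan.size());
  for (int i = start; i < n; ++i) {
    if (plan[i].has_dynamic_output) return i + 1;
  }
  return n;
}

// Walks the plan and returns how many times preparation was triggered.
int RunAndCountPrepares(const std::vector<Op>& plan) {
  int prepares = 0;
  int next_to_prepare = 0;
  const int n = static_cast<int>(plan.size());
  for (int i = 0; i < n; ++i) {
    if (i == next_to_prepare) {
      next_to_prepare = PrepareFrom(plan, i);
      ++prepares;
    }
    // ... the op at plan[i] would execute here ...
  }
  return prepares;
}
```

A plan without dynamic outputs is prepared once; each op with a dynamic output forces one more preparation pass for the ops after it.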

2.5. Cancellation support

if (check_cancelled_func_ != nullptr && check_cancelled_func_(cancellation_data_)) {
    ReportError("Client requested cancel during Invoke()");
    return kTfLiteError;
}

If the client has requested cancellation, the callback check_cancelled_func_ reports it and execution stops with an error.
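Putting the steps of section 2 together, the overall loop can be sketched on a toy graph. This is a self-contained illustration, not the Subgraph implementation: `Node`, `MiniGraph` and the status values are hypothetical, every tensor holds a single float, and the sketch returns a distinct kCancelled status for clarity, whereas the excerpt above returns kTfLiteError.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum Status { kOk, kError, kCancelled };

// Hypothetical node: an operation plus the tensors it reads and writes.
struct Node {
  std::vector<int> inputs;
  int output;
  std::function<float(const std::vector<float>&)> op;
};

struct MiniGraph {
  std::vector<bool> has_data;          // Whether each tensor holds data.
  std::vector<float> values;           // One scalar per tensor, for brevity.
  std::vector<Node> nodes;
  std::vector<int> execution_plan;     // Node indices in execution order.
  std::function<bool()> is_cancelled;  // Optional cancellation callback.

  Status Invoke() {
    for (int node_index : execution_plan) {
      const Node& node = nodes[node_index];
      // Input check: every input must have data before the op runs.
      std::vector<float> args;
      for (int t : node.inputs) {
        if (!has_data[t]) return kError;
        args.push_back(values[t]);
      }
      // Run the operation and publish its result.
      values[node.output] = node.op(args);
      has_data[node.output] = true;
      // Poll for cancellation between nodes.
      if (is_cancelled && is_cancelled()) return kCancelled;
    }
    return kOk;
  }
};
```

The execution plan, not the order of `nodes`, decides what runs when, which mirrors the role of execution_plan_ in the text.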

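3. Key features

Execution plan management: walks the execution plan and invokes each operation of the subgraph in order.
Dynamic memory management: supports resizing dynamic tensors and re-plans memory when that happens.
Error and cancellation handling: checks for errors and cancellation requests throughout inference.
Performance monitoring: built-in profiling helps developers optimize inference performance.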

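4. Where this applies

Inference on embedded devices: suited to efficient execution in resource-constrained environments.
Distributed inference frameworks: supports multiple subgraphs and flexible use of dynamic memory.
Debugging and optimization: built-in error reporting and profiling make it easier to find problems and tune performance.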

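Conv2D implementation

In TensorFlow Lite, the Conv2D operation is not defined in the Interpreter class itself; it is implemented in TensorFlow Lite's set of operations. The Interpreter class loads and runs .tflite models, and performs inference by executing the operations (such as Conv2D) described in the model.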

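1. Where the `Conv2D` operation lives in TensorFlow Lite

Conv2D op registration in TensorFlow core (the REGISTER_OP call is in tensorflow/core/ops/nn_ops.cc; the kernels are in tensorflow/core/kernels/conv_ops*.cc):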

REGISTER_OP("Conv2D")
    .Input("input: T")
    .Input("filter: T")
    .Output("output: T")
    .Attr("T: {half, bfloat16, float, double, int32}")
    .Attr("strides: list(int)")
    .Attr("use_cudnn_on_gpu: bool = true")
    .Attr(GetPaddingAttrStringWithExplicit())
    .Attr(GetExplicitPaddingsAttrString())
    .Attr(GetConvnetDataFormatAttrString())
    .Attr("dilations: list(int) = [1, 1, 1, 1]")
    .SetShapeFn(shape_inference::Conv2DShapeWithExplicitPadding);

Conv2D kernel registration

REGISTER_KERNEL_BUILDER(
    Name("Conv").Device(DEVICE_CPU).TypeConstraint<bfloat16>("T"),
    ConvOp<CPUDevice, bfloat16>);

tensorflow-master/tensorflow/core/kernels/conv_ops_bfloat16.cc:    Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<bfloat16>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_bfloat16.cc:    Name("Conv2D").Device(DEVICE_GPU).TypeConstraint<Eigen::bfloat16>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_double.cc:    Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<double>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_double.cc:    Name("Conv2D").Device(DEVICE_GPU).TypeConstraint<double>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_float.cc:    Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<float>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_float.cc:    Name("Conv2D").Device(DEVICE_GPU).TypeConstraint<float>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_half.cc:    Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<Eigen::half>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_half.cc:    Name("Conv2D").Device(DEVICE_GPU).TypeConstraint<Eigen::half>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_int32.cc:    Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<int32>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_int32.cc:    Name("Conv2D").Device(DEVICE_GPU).TypeConstraint<int32>("T"),
tensorflow-master/tensorflow/core/kernels/conv_ops_using_gemm.cc:      Name("Conv2D").Device(DEVICE_CPU).TypeConstraint<T>("T"), \

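In TensorFlow Lite, the Conv2D convolution is a built-in operation provided by the TFLite kernel library. TensorFlow Lite parses the convolution operations out of the .tflite model file and calls the matching low-level implementation. By default that is the built-in CPU kernel; a delegate (for example GPU or Edge TPU) can take over the operation when one is applied.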

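2. How the convolution is carried out

The Conv2D implementation is built around operation nodes: each node defines one computation to perform, and TensorFlow Lite executes these nodes one by one while parsing the model and running inference. For a convolution layer this involves the following steps:

When the .tflite model is parsed, TensorFlow Lite extracts one or more Conv2D operations and assigns each of them its input tensor (the image data) and its weights (the convolution filter).
The Interpreter class manages the execution flow. When Invoke() is called, it executes the convolutions in order according to each operation's definition; the kernels compute the convolution output with optimized numerical routines.
Low-level implementation: the concrete implementation and its optimizations (matrix multiplication, the Winograd algorithm, hand-tuned code, and so on) are defined in TensorFlow Lite's kernel functions, which are called when Invoke() runs the model.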

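3. Finding the `Conv2D` definition

In the TensorFlow Lite source tree, the concrete Conv2D implementation lives in the tensorflow/lite/kernels directory. For example:

conv.cc: contains the implementation of the built-in CONV_2D operation.
kernels directory: TensorFlow Lite defines its operations (including Conv2D) here; they perform the low-level tensor computations.

The kernel is organized around functions along the following lines:

// Simplified illustration of the Conv2D kernel's entry points. The names are
// illustrative; in conv.cc the corresponding functions are Prepare and Eval.
TfLiteStatus Conv2DPrepare(TfLiteContext* context, TfLiteNode* node) {
    // Preparation for the convolution layer
    return kTfLiteOk;
}

TfLiteStatus Conv2DInvoke(TfLiteContext* context, TfLiteNode* node) {
    // Actual execution of the convolution
    return kTfLiteOk;
}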
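How a pair of prepare/invoke functions gets attached to an operation name can be sketched as follows. This is a simplified, self-contained analogue rather than the TFLite API: `Context`, `Registration`, `OpResolver`, `ConvPrepare` and `ConvInvoke` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

enum Status { kOk, kError };

// Hypothetical context that just counts calls.
struct Context {
  int prepared = 0;
  int invoked = 0;
};

// Simplified analogue of a kernel registration: two function pointers, one
// called at planning time and one called at run time.
struct Registration {
  Status (*prepare)(Context*);
  Status (*invoke)(Context*);
};

Status ConvPrepare(Context* c) {
  ++c->prepared;
  return kOk;
}

Status ConvInvoke(Context* c) {
  ++c->invoked;
  return kOk;
}

// Hypothetical resolver: maps an op name found in the model to its kernel.
class OpResolver {
 public:
  void Add(const std::string& name, Registration reg) { ops_[name] = reg; }

  // Returns nullptr when no kernel is registered under `name`.
  const Registration* Find(const std::string& name) const {
    auto it = ops_.find(name);
    return it == ops_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Registration> ops_;
};
```

Keeping the lookup separate from the interpreter is what lets kernels be swapped or added without touching the execution loop.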

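4. How the kernel gets called

When you call Invoke() on the Interpreter, TensorFlow Lite walks through the nodes in order and identifies each operation (for example Conv2D). For a Conv2D node it calls the kernel's execution function (such as Conv2DInvoke) to perform the convolution. That function does the following:

Extracts the input tensor and the convolution filter.
Runs the convolution using an algorithm such as direct convolution, GEMM, or Winograd.
Computes and stores the output.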
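Of the algorithms listed, direct convolution is the simplest to write down. The sketch below handles one single-channel image with stride 1 and VALID padding; it is a teaching example, not the TFLite kernel, and `Conv2DValid` is a hypothetical name. As in most ML frameworks, "convolution" here means cross-correlation (the filter is not flipped).

```cpp
#include <cassert>
#include <vector>

// Direct 2-D convolution for one single-channel image, stride 1, VALID padding.
// `input` is h x w and `kernel` is kh x kw, both row-major.
// The result is (h - kh + 1) x (w - kw + 1), row-major.
std::vector<float> Conv2DValid(const std::vector<float>& input, int h, int w,
                               const std::vector<float>& kernel, int kh,
                               int kw) {
  assert(h >= kh && w >= kw);
  const int out_h = h - kh + 1;
  const int out_w = w - kw + 1;
  std::vector<float> output(out_h * out_w, 0.0f);
  for (int oy = 0; oy < out_h; ++oy) {
    for (int ox = 0; ox < out_w; ++ox) {
      // Accumulate the filter window anchored at (oy, ox).
      float acc = 0.0f;
      for (int ky = 0; ky < kh; ++ky) {
        for (int kx = 0; kx < kw; ++kx) {
          acc += input[(oy + ky) * w + (ox + kx)] * kernel[ky * kw + kx];
        }
      }
      output[oy * out_w + ox] = acc;
    }
  }
  return output;
}
```

GEMM-based and Winograd implementations compute the same result but reorganize the arithmetic to make better use of caches and vector units.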

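5. Summary

The Conv2D operation is not defined in the Interpreter class itself but in the implementation files under TensorFlow Lite's kernels directory (for example conv.cc).
The Interpreter class parses the .tflite model and executes its operations (such as Conv2D). Calling Invoke() triggers the convolution by calling the corresponding kernel function.
Convolutions in TensorFlow Lite run on optimized algorithms such as GEMM or Winograd, or on hardware accelerators (GPU, Edge TPU), to improve performance.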

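Registering your own Conv2D

To register your own Conv2D implementation as an op in TensorFlow, you can follow these steps:

1. Define the op interface

First, define a new op interface. This usually means writing a .cc file that calls the REGISTER_OP macro to declare the op's inputs, outputs, and attributes. For example: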

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

REGISTER_OP("CustomConv2D")
    .Input("input: T")
    .Input("filter: T")
    .Output("output: T")
    .Attr("T: {half, bfloat16, float, double, int32}")
    .Attr("strides: list(int)")
    .Attr("padding: {'SAME', 'VALID'}")
    .Attr("use_cudnn_on_gpu: bool = true")
    .Attr("data_format: {'NHWC', 'NCHW'} = 'NHWC'")
    .Attr("dilations: list(int) = [1, 1, 1, 1]")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
        // Implement shape inference here
        return Status::OK();
    });
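The shape function above is left as a placeholder. The arithmetic it would need for each spatial dimension can be sketched as a standalone helper. `ConvOutputSize` is a hypothetical name, and the formulas are the usual SAME/VALID conventions; wiring this into shape_inference::InferenceContext is not shown.

```cpp
#include <cassert>
#include <string>

// Output extent along one spatial dimension.
//   SAME:  ceil(in / stride)
//   VALID: ceil((in - eff_k + 1) / stride), eff_k = (k - 1) * dilation + 1
// Returns -1 for invalid arguments or an unknown padding string.
int ConvOutputSize(int in, int k, int stride, int dilation,
                   const std::string& padding) {
  if (in <= 0 || k <= 0 || stride <= 0 || dilation <= 0) return -1;
  const int eff_k = (k - 1) * dilation + 1;  // Filter extent with dilation.
  if (padding == "SAME") return (in + stride - 1) / stride;
  if (padding == "VALID") {
    if (in < eff_k) return 0;  // Window does not fit: treated as empty here.
    return (in - eff_k) / stride + 1;
  }
  return -1;
}
```

For the 28x28 input and 3x3 filter used later in the Python example, SAME padding with stride 1 keeps the spatial size at 28, while VALID would shrink it to 26.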

2. Implement the kernel

Next, implement the op's actual computation. This usually means writing a class that derives from OpKernel and overrides its Compute method. For example:

class CustomConv2DKernel : public OpKernel {
public:
    explicit CustomConv2DKernel(OpKernelConstruction* ctx) : OpKernel(ctx) {
        // Attributes are read once at construction time through
        // OpKernelConstruction; OpKernelContext has no GetAttr.
        OP_REQUIRES_OK(ctx, ctx->GetAttr("strides", &strides_));
        OP_REQUIRES_OK(ctx, ctx->GetAttr("padding", &padding_));
        OP_REQUIRES_OK(ctx, ctx->GetAttr("use_cudnn_on_gpu", &use_cudnn_on_gpu_));
        OP_REQUIRES_OK(ctx, ctx->GetAttr("data_format", &data_format_));
        OP_REQUIRES_OK(ctx, ctx->GetAttr("dilations", &dilations_));
    }

    void Compute(OpKernelContext* ctx) override {
        // Get the input tensors
        const Tensor& input = ctx->input(0);
        const Tensor& filter = ctx->input(1);

        // Create the output tensor; the shape must be computed from the
        // input shape, filter shape, strides, padding and dilations.
        Tensor* output = nullptr;
        OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({/* output shape */}), &output));

        // Perform the convolution operation and fill the output tensor.
        // This is where you would call your custom convolution implementation,
        // for example a hand-written CPU loop or a custom CUDA kernel.
        // ...
    }

private:
    // List attributes are read into vectors, not fixed-size arrays.
    std::vector<int32> strides_;
    std::string padding_;
    bool use_cudnn_on_gpu_ = true;
    std::string data_format_;
    std::vector<int32> dilations_;
};

REGISTER_KERNEL_BUILDER(Name("CustomConv2D").Device(DEVICE_CPU), CustomConv2DKernel);

3. Build a shared library

Compile the code above into a shared library (.so file). This usually means writing a BUILD file and building with Bazel. For example:

cc_binary(
    name = "custom_conv2d_op.so",
    srcs = ["custom_conv2d_op.cc", "custom_conv2d_kernel.cc"],
    copts = ["-fPIC"],
    linkshared = 1,
    deps = [
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
    ],
)

4. Load the shared library

In Python, load the compiled shared library with tf.load_op_library:

import tensorflow as tf

custom_conv2d_module = tf.load_op_library('./custom_conv2d_op.so')
custom_conv2d = custom_conv2d_module.custom_conv2d

5. Use the custom op

The custom Conv2D op can now be used in a TensorFlow graph:

input_tensor = tf.random.normal([1, 28, 28, 1])
filter_tensor = tf.random.normal([3, 3, 1, 32])
output_tensor = custom_conv2d(input=input_tensor, filter=filter_tensor, strides=[1, 1, 1, 1], padding='SAME')

Summary

With the steps above you can register and use your own Conv2D implementation: define the op interface, implement the kernel, build a shared library, load it, and call the custom op in a TensorFlow graph.

http://blog.uanet.cn/AI/TensorFlow Lite代码分析.html

Author: dnsnat

Published: February 13, 2025
