i.MX TensorFlow Lite on Android User's Guide

User guide for deploying TensorFlow Lite models on NXP i.MX 8 series processors using the Android NNAPI and eIQ software stack for hardware-accelerated NPU/GPU inference.

Overview

This document provides technical guidance for implementing TensorFlow Lite on NXP i.MX 8 series platforms running Android. It details the NXP eIQ software stack and the Neural Network Runtime (NNRT) middleware, which facilitates communication between inference frameworks and hardware accelerators via the Android NN HAL. The guide covers TensorFlow Lite v2.10.1 features, including support for ARM Neon SIMD instructions and both per-tensor and per-channel quantized models. It includes instructions for building and running benchmark applications using Bazel, the Android NDK, and ADB to evaluate performance on NPU and GPU hardware units.
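
As a concrete illustration of that flow, the sketch below routes a TensorFlow Lite interpreter through the NNAPI delegate, which NNRT then maps onto the i.MX NPU or GPU via the Android NN HAL. It uses the public TensorFlow Lite 2.x C++ API; the model filename and the allow_fp16 setting are illustrative assumptions rather than values taken from the guide.

    // Sketch: run a quantized .tflite model through the NNAPI delegate so
    // NNRT can dispatch supported operations to the NPU/GPU.
    #include <memory>

    #include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"
    #include "tensorflow/lite/interpreter.h"
    #include "tensorflow/lite/kernels/register.h"
    #include "tensorflow/lite/model.h"

    int main() {
      // Model path is a placeholder; any per-tensor or per-channel
      // quantized model would work here.
      auto model = tflite::FlatBufferModel::BuildFromFile(
          "mobilenet_v1_1.0_224_quant.tflite");
      if (!model) return 1;

      tflite::ops::builtin::BuiltinOpResolver resolver;
      std::unique_ptr<tflite::Interpreter> interpreter;
      tflite::InterpreterBuilder(*model, resolver)(&interpreter);
      if (!interpreter) return 1;

      // The NNAPI delegate hands supported parts of the graph to the
      // Android NN HAL, behind which NNRT selects the accelerator;
      // unsupported operations stay on the CPU kernels.
      tflite::StatefulNnApiDelegate::Options options;
      options.allow_fp16 = true;  // permit reduced precision where supported
      tflite::StatefulNnApiDelegate nnapi_delegate(options);

      if (interpreter->ModifyGraphWithDelegate(&nnapi_delegate) != kTfLiteOk)
        return 1;  // delegation failed entirely
      if (interpreter->AllocateTensors() != kTfLiteOk) return 1;

      // Fill input tensors here, then run inference on the accelerator.
      return interpreter->Invoke() == kTfLiteOk ? 0 : 1;
    }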

Use Cases

  • Deploying machine learning models on i.MX 8 series processors
  • Benchmarking neural network inference performance on Android (see the latency sketch after this list)
  • Optimizing TensorFlow Lite models for NPU and GPU hardware acceleration
  • Building Android-based edge AI applications using the NXP eIQ software stack
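
For the benchmarking item above, a minimal latency measurement can be built from repeated Invoke() calls, assuming an interpreter already delegated to NNAPI as in the previous sketch; the warm-up and iteration counts below are illustrative choices, not values from the guide. This is the same quantity the TensorFlow Lite benchmark application reports when built with Bazel and the Android NDK and pushed to the board over ADB.

    #include <chrono>

    #include "tensorflow/lite/interpreter.h"

    // Average inference latency in milliseconds over a number of timed
    // runs. Warm-up runs give the NNAPI driver time to finish graph
    // compilation and caching, so they are excluded from the measurement.
    double AverageLatencyMs(tflite::Interpreter* interpreter,
                            int warmup_runs = 5, int timed_runs = 50) {
      for (int i = 0; i < warmup_runs; ++i) interpreter->Invoke();

      auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < timed_runs; ++i) interpreter->Invoke();
      auto end = std::chrono::steady_clock::now();

      return std::chrono::duration<double, std::milli>(end - start).count() /
             timed_runs;
    }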

Topics

NXP
i.MX 8
TensorFlow Lite
Android
NNAPI
eIQ
NPU
GPU acceleration
Neural Network Runtime
NNRT
OpenVX
Machine Learning

Referenced Parts

  • i.MX 8M Plus (NXP): cited in Table 1, "Comparison of inference time between CPU and NPU on i.MX 8M Plus EVK"
  • i.MX 8 series (NXP): cited in the passage "NNRT also acts as the heterogeneous compute platform for further distributing workloads efficiently across i.MX 8 series compute devices, such as NPU, GPU, and CPU."