From patchwork Thu Mar 26 07:26:51 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pratham Deshmukh <p-deshmukh@ti.com>
X-Patchwork-Id: 84395
X-Patchwork-Delegate: reatmon@ti.com
From: Pratham Deshmukh <p-deshmukh@ti.com>
To: meta-arago@lists.yoctoproject.org
Subject: [meta-arago][master][PATCH v2 1/3] tensorflow-lite: Add armv7 support with fp16 optimizations for v2.20.0
Date: Thu, 26 Mar 2026 12:56:51 +0530
Message-ID: <20260326072653.1025506-2-p-deshmukh@ti.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260326072653.1025506-1-p-deshmukh@ti.com>
References: <20260326072653.1025506-1-p-deshmukh@ti.com>
MIME-Version: 1.0
X-Groupsio-URL: https://lists.yoctoproject.org/g/meta-arago/message/17440

Added three patches to enable TensorFlow Lite on armv7 platforms:

- 0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch
  Fix xnnpack-delegate CMake configuration failures when XNNPACK is
  disabled for armv7 by guarding target operations with existence checks.

- 0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch
  Introduce comprehensive fp16 data type infrastructure, replacing the
  Eigen::half dependency with a native TFLite implementation for improved
  performance on armv7.

- 0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch
  Add float16 support to the EMBEDDING_LOOKUP kernel with full test
  coverage across various quantization modes.

Signed-off-by: Pratham Deshmukh <p-deshmukh@ti.com>
---
Change Log:
- No changes

 ...delegate-target-operations-for-armv7.patch |  38 ++
 ...pe-infrastructure-to-TensorFlow-Lite.patch | 552 ++++++++++++++++++
 ...6-support-to-EMBEDDING_LOOKUP-kernel.patch | 447 ++++++++++++++
 .../tensorflow-lite/tensorflow-lite_2.20.0.bb |   3 +
 4 files changed, 1040 insertions(+)
 create mode 100644 meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch
 create mode 100644 meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch
 create mode 100644 meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch

diff --git a/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch
new file mode 100644
index 00000000..428c7849
--- /dev/null
+++ b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch
@@ -0,0 +1,38 @@
+From ba13240ebc53a572edd984b8c223e39480bf45ee Mon Sep 17 00:00:00 2001
+From: Pratham Deshmukh <p-deshmukh@ti.com>
+Date: Fri, 20 Mar 2026 11:45:44 +0530
+Subject: [PATCH 4/6] Disable xnnpack-delegate target operations for armv7
+
+The xnnpack-delegate target is not built when XNNPACK is disabled
+for armv7, but CMake still tries to set compile options on it,
+causing configuration failures.
+
+Guard the target operations with target existence checks.
+
+Upstream-Status: Backport from 4f5e199a87e11ef4bb44992a3ccb22ea7e9fe983
+
+Signed-off-by: Pratham Deshmukh <p-deshmukh@ti.com>
+---
+ tensorflow/lite/CMakeLists.txt | 8 +++++---
+ 1 file changed, 5 insertions(+), 3 deletions(-)
+
+diff --git a/tensorflow/lite/CMakeLists.txt b/tensorflow/lite/CMakeLists.txt
+index 8c43fdac..4dbd519a 100644
+--- a/tensorflow/lite/CMakeLists.txt
++++ b/tensorflow/lite/CMakeLists.txt
+@@ -854,7 +854,9 @@ target_compile_options(_pywrap_tensorflow_interpreter_wrapper
+   PRIVATE ${TFLITE_TARGET_PRIVATE_OPTIONS}
+ )
+ 
+-target_compile_options(xnnpack-delegate
+-  PUBLIC ${TFLITE_TARGET_PUBLIC_OPTIONS}
+-  PRIVATE ${TFLITE_TARGET_PRIVATE_OPTIONS}
++if(TARGET xnnpack-delegate)
++  target_compile_options(xnnpack-delegate
++    PUBLIC ${TFLITE_TARGET_PUBLIC_OPTIONS}
++    PRIVATE ${TFLITE_TARGET_PRIVATE_OPTIONS}
+ )
++endif()
+-- 
+2.34.1
+
diff --git a/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch
new file mode 100644
index 00000000..9a8a91d6
--- /dev/null
+++ b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch
@@ -0,0 +1,552 @@
+From 5f745084227b1caf38a0d12954925fe63c074e9c Mon Sep 17 00:00:00 2001
+From: Arian Arfaian
+Date: Tue, 24 Mar 2026 17:52:07 +0530
+Subject: [PATCH 5/6] Add fp16 data type infrastructure to TensorFlow Lite
+
+This commit introduces comprehensive half-precision floating point (fp16)
+data type support to the TensorFlow Lite runtime, replacing the previous
+Eigen::half dependency with a native TFLite implementation. This
+infrastructure enables fp16 support throughout the TFLite runtime
+and serves as the foundation for fp16-optimized kernel implementations.
+
+Upstream-Status: Backport from 304986569a459c9f8ab3b9d922249e796553e5ea
+
+Signed-off-by: Pratham Deshmukh <p-deshmukh@ti.com>
+---
+ tensorflow/lite/BUILD               |   2 +-
+ tensorflow/lite/interpreter_test.cc |   4 +-
+ tensorflow/lite/types/BUILD         |  31 ++++
+ tensorflow/lite/types/bit_cast.h    |  36 +++++
+ tensorflow/lite/types/fp16.h        | 219 ++++++++++++++++++++++++++++
+ tensorflow/lite/types/half.h        | 169 +++++++++++++++++++++
+ 6 files changed, 458 insertions(+), 3 deletions(-)
+ create mode 100644 tensorflow/lite/types/BUILD
+ create mode 100644 tensorflow/lite/types/bit_cast.h
+ create mode 100644 tensorflow/lite/types/fp16.h
+ create mode 100644 tensorflow/lite/types/half.h
+
+diff --git a/tensorflow/lite/BUILD b/tensorflow/lite/BUILD
+index 2bb98382..dc0f03da 100644
+--- a/tensorflow/lite/BUILD
++++ b/tensorflow/lite/BUILD
+@@ -1041,8 +1041,8 @@ cc_test(
+         "//tensorflow/lite/kernels:kernel_util",
+         "//tensorflow/lite/kernels/internal:compatibility",
+         "//tensorflow/lite/testing:util",
++        "//tensorflow/lite/types:half",
+         "@com_google_googletest//:gtest_main",
+-        "@eigen_archive//:eigen3",
+     ],
+ )
+
+diff --git a/tensorflow/lite/interpreter_test.cc b/tensorflow/lite/interpreter_test.cc
+index 19a36f4b..e8074f01 100644
+--- a/tensorflow/lite/interpreter_test.cc
++++ b/tensorflow/lite/interpreter_test.cc
+@@ -29,7 +29,6 @@ limitations under the License.
+ 
+ #include
+ #include
+-#include "Eigen/Core"  // from @eigen_archive
+ #include "tensorflow/lite/core/c/builtin_op_data.h"
+ #include "tensorflow/lite/core/c/c_api_types.h"
+ #include "tensorflow/lite/core/c/common.h"
+@@ -42,6 +41,7 @@ limitations under the License.
+ #include "tensorflow/lite/kernels/kernel_util.h"
+ #include "tensorflow/lite/string_util.h"
+ #include "tensorflow/lite/testing/util.h"
++#include "tensorflow/lite/types/half.h"
+ #include "tensorflow/lite/util.h"
+ 
+ #ifdef __APPLE__
+@@ -272,7 +272,7 @@ TEST(BasicInterpreter, CheckResize) {
+   const uint8_t uint8s[] = {3, 4};
+   const int64_t int64s[] = {6, -7};
+   const int16_t int16s[] = {8, -9};
+-  const Eigen::half float16s[] = {Eigen::half(-3.f), Eigen::half(-4.f)};
++  const half float16s[] = {half(-3.f), half(-4.f)};
+ 
+   struct {
+     TfLiteType type;
+diff --git a/tensorflow/lite/types/BUILD b/tensorflow/lite/types/BUILD
+new file mode 100644
+index 00000000..c00aadb6
+--- /dev/null
++++ b/tensorflow/lite/types/BUILD
+@@ -0,0 +1,31 @@
++# Copyright 2025 The TensorFlow Authors. All Rights Reserved.
++#
++# Licensed under the Apache License, Version 2.0 (the "License");
++# you may not use this file except in compliance with the License.
++# You may obtain a copy of the License at
++#
++#     http://www.apache.org/licenses/LICENSE-2.0
++#
++# Unless required by applicable law or agreed to in writing, software
++# distributed under the License is distributed on an "AS IS" BASIS,
++# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
++# See the License for the specific language governing permissions and
++# limitations under the License.
++# ==============================================================================
++
++load("@rules_cc//cc:cc_library.bzl", "cc_library")
++
++package(
++    # copybara:uncomment default_applicable_licenses = ["//tensorflow:license"],
++    default_visibility = ["//visibility:public"],
++    licenses = ["notice"],
++)
++
++cc_library(
++    name = "half",
++    hdrs = [
++        "bit_cast.h",
++        "fp16.h",
++        "half.h",
++    ],
++)
+diff --git a/tensorflow/lite/types/bit_cast.h b/tensorflow/lite/types/bit_cast.h
+new file mode 100644
+index 00000000..77d97726
+--- /dev/null
++++ b/tensorflow/lite/types/bit_cast.h
+@@ -0,0 +1,36 @@
++/* Copyright 2025 The TensorFlow Authors. All Rights Reserved.
++
++Licensed under the Apache License, Version 2.0 (the "License");
++you may not use this file except in compliance with the License.
++You may obtain a copy of the License at
++
++    http://www.apache.org/licenses/LICENSE-2.0
++
++Unless required by applicable law or agreed to in writing, software
++distributed under the License is distributed on an "AS IS" BASIS,
++WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
++See the License for the specific language governing permissions and
++limitations under the License.
++==============================================================================*/
++
++#ifndef TENSORFLOW_LITE_TYPES_BIT_CAST_H_
++#define TENSORFLOW_LITE_TYPES_BIT_CAST_H_
++
++#include <string.h>
++
++namespace tflite {
++
++// Unfortunately, std::bit_cast is C++20, which we can't use. More unfortunately
++// it seems impossible to hack together a constexpr bit_cast without compiler
++// support.
++template <typename To, typename From>
++To bit_cast(From x) {
++  static_assert(sizeof(To) == sizeof(From), "");
++  To result;
++  memcpy(&result, &x, sizeof(result));
++  return result;
++}
++
++}  // namespace tflite
++
++#endif  // TENSORFLOW_LITE_TYPES_BIT_CAST_H_
+diff --git a/tensorflow/lite/types/fp16.h b/tensorflow/lite/types/fp16.h
+new file mode 100644
+index 00000000..cc63fe7d
+--- /dev/null
++++ b/tensorflow/lite/types/fp16.h
+@@ -0,0 +1,219 @@
++/* Copyright 2025 The TensorFlow Authors. All Rights Reserved.
++
++Licensed under the Apache License, Version 2.0 (the "License");
++you may not use this file except in compliance with the License.
++You may obtain a copy of the License at
++
++    http://www.apache.org/licenses/LICENSE-2.0
++
++Unless required by applicable law or agreed to in writing, software
++distributed under the License is distributed on an "AS IS" BASIS,
++WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
++See the License for the specific language governing permissions and
++limitations under the License.
++==============================================================================*/
++
++#ifndef TENSORFLOW_LITE_TYPES_FP16_H_
++#define TENSORFLOW_LITE_TYPES_FP16_H_
++
++#include <stdint.h>
++
++// This file is an excerpt from
++// https://github.com/Maratyszcza/FP16/blob/master/include/fp16/fp16.h,
++// including only the minimal functionality we need in XNNPACK. This works
++// around some issues that we haven't been able to fix upstream
++// (https://github.com/Maratyszcza/FP16/pull/32). See also:
++// - https://github.com/microsoft/onnxruntime/pull/22294/files
++// - https://github.com/google/XNNPACK/issues/6989
++// We also don't need a lot of the functionality in the upstream library.
++
++static inline float fp32_from_bits(uint32_t w) {
++  union {
++    uint32_t as_bits;
++    float as_value;
++  } fp32 = {w};
++  return fp32.as_value;
++}
++
++static inline uint32_t fp32_to_bits(float f) {
++  union {
++    float as_value;
++    uint32_t as_bits;
++  } fp32 = {f};
++  return fp32.as_bits;
++}
++
++/*
++ * Convert a 16-bit floating-point number in IEEE half-precision format, in bit
++ * representation, to a 32-bit floating-point number in IEEE single-precision
++ * format.
++ *
++ * @note The implementation relies on IEEE-like (no assumption about rounding
++ * mode and no operations on denormals) floating-point operations and bitcasts
++ * between integer and floating-point variables.
++ */
++static inline float fp16_ieee_to_fp32_value(uint16_t h) {
++  /*
++   * Extend the half-precision floating-point number to 32 bits and shift to the
++   * upper part of the 32-bit word:
++   *      +---+-----+------------+-------------------+
++   *      | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
++   *      +---+-----+------------+-------------------+
++   * Bits  31  26-30    16-25            0-15
++   *
++   * S - sign bit, E - bits of the biased exponent, M - bits of the mantissa, 0
++   * - zero bits.
++   */
++  const uint32_t w = (uint32_t)h << 16;
++  /*
++   * Extract the sign of the input number into the high bit of the 32-bit word:
++   *
++   *      +---+----------------------------------+
++   *      | S |0000000 00000000 00000000 00000000|
++   *      +---+----------------------------------+
++   * Bits  31                 0-31
++   */
++  const uint32_t sign = w & UINT32_C(0x80000000);
++  /*
++   * Extract mantissa and biased exponent of the input number into the high bits
++   * of the 32-bit word:
++   *
++   *      +-----+------------+---------------------+
++   *      |EEEEE|MM MMMM MMMM|0 0000 0000 0000 0000|
++   *      +-----+------------+---------------------+
++   * Bits  27-31    17-26            0-16
++   */
++  const uint32_t two_w = w + w;
++
++  /*
++   * Shift mantissa and exponent into bits 23-28 and bits 13-22 so they become
++   * mantissa and exponent of a single-precision floating-point number:
++   *
++   *       S|Exponent |          Mantissa
++   *      +-+---+-----+------------+----------------+
++   *      |0|000|EEEEE|MM MMMM MMMM|0 0000 0000 0000|
++   *      +-+---+-----+------------+----------------+
++   * Bits   | 23-31   |           0-22
++   *
++   * Next, there are some adjustments to the exponent:
++   * - The exponent needs to be corrected by the difference in exponent bias
++   *   between single-precision and half-precision formats (0x7F - 0xF = 0x70)
++   * - Inf and NaN values in the inputs should become Inf and NaN values after
++   *   conversion to the single-precision number. Therefore, if the biased
++   *   exponent of the half-precision input was 0x1F (max possible value), the
++   *   biased exponent of the single-precision output must be 0xFF (max possible
++   *   value). We do this correction in two steps:
++   *   - First, we adjust the exponent by (0xFF - 0x1F) = 0xE0 (see exp_offset
++   *     below) rather than by 0x70 suggested by the difference in the exponent bias
++   *     (see above).
++   *   - Then we multiply the single-precision result of exponent adjustment by
++   *     2**(-112) to reverse the effect of exponent adjustment by 0xE0 less the
++   *     necessary exponent adjustment by 0x70 due to difference in exponent bias.
++   *     The floating-point multiplication hardware would ensure that Inf and
++   *     NaN would retain their value on at least partially IEEE754-compliant
++   *     implementations.
++   *
++   * Note that the above operations do not handle denormal inputs (where biased
++   * exponent == 0). However, they also do not operate on denormal inputs, and
++   * do not produce denormal results.
++   */
++  const uint32_t exp_offset = UINT32_C(0xE0) << 23;
++#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) || \
++    defined(__GNUC__) && !defined(__STRICT_ANSI__)
++  const float exp_scale = 0x1.0p-112f;
++#else
++  const float exp_scale = fp32_from_bits(UINT32_C(0x7800000));
++#endif
++  const float normalized_value =
++      fp32_from_bits((two_w >> 4) + exp_offset) * exp_scale;
++
++  /*
++   * Convert denormalized half-precision inputs into single-precision results
++   * (always normalized). Zero inputs are also handled here.
++   *
++   * In a denormalized number the biased exponent is zero, and mantissa has
++   * non-zero bits. First, we shift mantissa into bits 0-9 of the 32-bit word.
++   *
++   *                  zeros           |  mantissa
++   *      +---------------------------+------------+
++   *      |0000 0000 0000 0000 0000 00|MM MMMM MMMM|
++   *      +---------------------------+------------+
++   * Bits             10-31                0-9
++   *
++   * Now, remember that denormalized half-precision numbers are represented as:
++   *    FP16 = mantissa * 2**(-24).
++   * The trick is to construct a normalized single-precision number with the
++   * same mantissa and the half-precision input and with an exponent which would
++   * scale the corresponding mantissa bits to 2**(-24). A normalized
++   * single-precision floating-point number is represented as: FP32 = (1 +
++   * mantissa * 2**(-23)) * 2**(exponent - 127) Therefore, when the biased
++   * exponent is 126, a unit change in the mantissa of the input denormalized
++   * half-precision number causes a change of the constructed single-precision
++   * number by 2**(-24), i.e. the same amount.
++   *
++   * The last step is to adjust the bias of the constructed single-precision
++   * number. When the input half-precision number is zero, the constructed
++   * single-precision number has the value of FP32 = 1 * 2**(126 - 127) =
++   * 2**(-1) = 0.5 Therefore, we need to subtract 0.5 from the constructed
++   * single-precision number to get the numerical equivalent of the input
++   * half-precision number.
++   */
++  const uint32_t magic_mask = UINT32_C(126) << 23;
++  const float magic_bias = 0.5f;
++  const float denormalized_value =
++      fp32_from_bits((two_w >> 17) | magic_mask) - magic_bias;
++
++  /*
++   * - Choose either results of conversion of input as a normalized number, or
++   *   as a denormalized number, depending on the input exponent. The variable
++   *   two_w contains input exponent in bits 27-31, therefore if it's smaller than
++   *   2**27, the input is either a denormal number, or zero.
++   * - Combine the result of conversion of exponent and mantissa with the sign
++   *   of the input number.
++   */
++  const uint32_t denormalized_cutoff = UINT32_C(1) << 27;
++  const uint32_t result =
++      sign | (two_w < denormalized_cutoff ? fp32_to_bits(denormalized_value)
++                                          : fp32_to_bits(normalized_value));
++  return fp32_from_bits(result);
++}
++
++/*
++ * Convert a 32-bit floating-point number in IEEE single-precision format to a
++ * 16-bit floating-point number in IEEE half-precision format, in bit
++ * representation.
++ *
++ * @note The implementation relies on IEEE-like (no assumption about rounding
++ * mode and no operations on denormals) floating-point operations and bitcasts
++ * between integer and floating-point variables.
++ */
++static inline uint16_t fp16_ieee_from_fp32_value(float f) {
++#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) || \
++    defined(__GNUC__) && !defined(__STRICT_ANSI__)
++  const float scale_to_inf = 0x1.0p+112f;
++  const float scale_to_zero = 0x1.0p-110f;
++#else
++  const float scale_to_inf = fp32_from_bits(UINT32_C(0x77800000));
++  const float scale_to_zero = fp32_from_bits(UINT32_C(0x08800000));
++#endif
++  const uint32_t w = fp32_to_bits(f);
++  const float abs_f = fp32_from_bits(w & UINT32_C(0x7FFFFFFF));
++  float base = (abs_f * scale_to_inf) * scale_to_zero;
++
++  const uint32_t shl1_w = w + w;
++  const uint32_t sign = w & UINT32_C(0x80000000);
++  uint32_t bias = shl1_w & UINT32_C(0xFF000000);
++  if (bias < UINT32_C(0x71000000)) {
++    bias = UINT32_C(0x71000000);
++  }
++
++  base = fp32_from_bits((bias >> 1) + UINT32_C(0x07800000)) + base;
++  const uint32_t bits = fp32_to_bits(base);
++  const uint32_t exp_bits = (bits >> 13) & UINT32_C(0x00007C00);
++  const uint32_t mantissa_bits = bits & UINT32_C(0x00000FFF);
++  const uint32_t nonsign = exp_bits + mantissa_bits;
++  return (sign >> 16) |
++         (shl1_w > UINT32_C(0xFF000000) ? UINT16_C(0x7E00) : nonsign);
++}
++
++#endif  // TENSORFLOW_LITE_TYPES_FP16_H_
+diff --git a/tensorflow/lite/types/half.h b/tensorflow/lite/types/half.h
+new file mode 100644
+index 00000000..13e8662d
+--- /dev/null
++++ b/tensorflow/lite/types/half.h
+@@ -0,0 +1,169 @@
++/* Copyright 2025 The TensorFlow Authors. All Rights Reserved.
++
++Licensed under the Apache License, Version 2.0 (the "License");
++you may not use this file except in compliance with the License.
++You may obtain a copy of the License at
++
++    http://www.apache.org/licenses/LICENSE-2.0
++
++Unless required by applicable law or agreed to in writing, software
++distributed under the License is distributed on an "AS IS" BASIS,
++WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
++See the License for the specific language governing permissions and
++limitations under the License.
++==============================================================================*/
++
++#ifndef TENSORFLOW_LITE_TYPES_HALF_H_
++#define TENSORFLOW_LITE_TYPES_HALF_H_
++
++#include <stdint.h>
++
++// We want to use _Float16 if the compiler supports it fully, but it's
++// tricky to do this detection; there are compiler versions that define the
++// type in broken ways. We're only going to bother using it if the support is
++// known to be at least a robust f16<->f32 conversion, which generally means a
++// recent version of Clang or GCC, x86 or ARM or RISC-V architectures, and
++// (in some cases) the right architecture flags specified on the command line.
++
++#ifndef TFLITE_ARCH_FLOAT16
++
++// Some non-GCC compilers define __GNUC__, but we only want to detect the Real
++// Thing
++#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && \
++    !defined(__INTEL_LLVM_COMPILER)
++#define TFLITE_GNUC_ACTUAL __GNUC__
++#else
++#define TFLITE_GNUC_ACTUAL 0
++#endif
++
++#if (defined(__i386__) || defined(__x86_64__)) && defined(__SSE2__) && \
++    defined(__FLT16_MAX__) && defined(__F16C__) && \
++    ((__clang_major__ >= 15 && !defined(_MSC_VER)) || \
++     (TFLITE_GNUC_ACTUAL >= 12))
++#define TFLITE_ARCH_FLOAT16 1
++#endif
++
++#if ((defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || \
++      defined(_M_ARM64) || defined(_M_ARM64EC)) && \
++     !defined(_MSC_VER)) && \
++    defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
++#define TFLITE_ARCH_FLOAT16 1
++#endif
++
++#if defined(__riscv) && defined(__riscv_zvfh) && __clang__ >= 1600
++#define TFLITE_ARCH_FLOAT16 1
++#endif
++
++#ifndef TFLITE_ARCH_FLOAT16
++#define TFLITE_ARCH_FLOAT16 0
++#endif
++
++#endif  // TFLITE_ARCH_FLOAT16
++
++#if TFLITE_ARCH_FLOAT16
++
++#include <math.h>
++
++#include "tensorflow/lite/types/bit_cast.h"
++
++namespace tflite {
++
++class half {
++ public:
++  half() = default;
++  constexpr half(float x) : value_(static_cast<_Float16>(x)) {}  // NOLINT
++  constexpr half(int x)
++      : value_(static_cast<_Float16>(static_cast<float>(x))) {}  // NOLINT
++
++  constexpr operator float() const { return value_; }  // NOLINT
++
++  static half from_bits(uint16_t bits) {
++    half result;
++    result.value_ = bit_cast<_Float16>(bits);
++    return result;
++  }
++
++  uint16_t to_bits() const { return bit_cast<uint16_t>(value_); }
++
++  bool is_zero() const { return value_ == 0.0f; }
++
++  // These definitions are imprecise because we want them to be constexpr, and
++  // the various tools for doing that are not constexpr (bit_cast,
++  // std::numeric_limits, etc.).
++  static constexpr half epsilon() { return 0.0009765625f; }
++  static constexpr half infinity() { return INFINITY; }
++  static constexpr half min() { return -65504.0f; }
++  static constexpr half max() { return 65504.0f; }
++  static constexpr half smallest_normal() { return 0.00006103515625f; }
++  static constexpr half min_identity() { return INFINITY; }
++  static constexpr half max_identity() { return -INFINITY; }
++  static constexpr half sum_identity() { return 0.0f; }
++
++  // Not private due to -Werror=class-memaccess, which can't be disabled:
++  // - via a --copt, because it seems to have no effect.
++  // - via .bazelrc, because it then applies to C code, and the compiler says
++  //   this flag is not valid in C.
++  _Float16 value_;
++};
++
++}  // namespace tflite
++
++#else  // TFLITE_ARCH_FLOAT16
++
++#include "tensorflow/lite/types/fp16.h"
++
++namespace tflite {
++
++class half {
++ private:
++  // We need this hoop jumping to enable implementing a constexpr `from_bits`.
++  struct zero_initializer {};
++  explicit constexpr half(zero_initializer) : bits_(0) {}
++
++ public:
++  half() = default;
++  half(float x) : bits_(fp16_ieee_from_fp32_value(x)) {}  // NOLINT
++  explicit half(int x)
++      : bits_(fp16_ieee_from_fp32_value(static_cast<float>(x))) {}
++
++  operator float() const { return fp16_ieee_to_fp32_value(bits_); }  // NOLINT
++
++  static constexpr half from_bits(uint16_t bits) {
++    half result{zero_initializer{}};
++    result.bits_ = bits;
++    return result;
++  }
++
++  constexpr uint16_t to_bits() const { return bits_; }
++
++  bool is_zero() const {
++    // Check for +/- zero (0x0000/0x8000). uint16 overflow is well defined to
++    // wrap around.
++    return static_cast<uint16_t>(bits_ * 2) == 0;
++  }
++
++  static constexpr half epsilon() {
++    return half::from_bits(0x1400);  // 2^-10 = 0.0009765625
++  }
++  static constexpr half infinity() { return from_bits(0x7c00); }
++  static constexpr half min() { return from_bits(0xfbff); }
++  static constexpr half max() { return from_bits(0x7bff); }
++  static constexpr half smallest_normal() {
++    return from_bits(0x0400);  // 2^-14
++  }
++  static constexpr half min_identity() { return from_bits(0x7c00); }
++  static constexpr half max_identity() { return from_bits(0xfc00); }
++  static constexpr half sum_identity() { return from_bits(0); }
++
++  // Not private due to -Werror=class-memaccess, which can't be disabled:
++  // - via a --copt, because it seems to have no effect.
++  // - via .bazelrc, because it then applies to C code, and the compiler says
++  //   this flag is not valid in C.
++  uint16_t bits_;
++};
++
++}  // namespace tflite
++
++#endif  // TFLITE_ARCH_FLOAT16
++
++#endif  // TENSORFLOW_LITE_TYPES_HALF_H_
+--
+2.34.1
+
diff --git a/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch
new file mode 100644
index 00000000..ac333931
--- /dev/null
+++ b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite/0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch
@@ -0,0 +1,447 @@
+From 62b578947645562bd902b6a36e1841fb8c136aeb Mon Sep 17 00:00:00 2001
+From: Dillon Sharlet
+Date: Tue, 24 Mar 2026 20:41:27 +0530
+Subject: [PATCH 6/6] Add float16 support to EMBEDDING_LOOKUP kernel
+
+This commit adds comprehensive float16 (half precision) support to the
+TensorFlow Lite EMBEDDING_LOOKUP operation, enabling more efficient
+inference on hardware that supports 16-bit floating point operations.
+
+Upstream-Status: Backport from dfc2c904c7ca3ea6749b1604bdda5877855e0582
+
+Signed-off-by: Pratham Deshmukh
+---
+ tensorflow/lite/kernels/embedding_lookup.cc   |  92 ++++++----
+ .../lite/kernels/embedding_lookup_test.cc     | 172 ++++++++++++++++--
+ 2 files changed, 210 insertions(+), 54 deletions(-)
+
+diff --git a/tensorflow/lite/kernels/embedding_lookup.cc b/tensorflow/lite/kernels/embedding_lookup.cc
+index e5ee8610..a54a3d93 100644
+--- a/tensorflow/lite/kernels/embedding_lookup.cc
++++ b/tensorflow/lite/kernels/embedding_lookup.cc
+@@ -33,11 +33,11 @@ limitations under the License.
+ #include
+ #include
+
+-#include "fp16/fp16.h"  // from @FP16
+ #include "tensorflow/lite/c/c_api_types.h"
+ #include "tensorflow/lite/core/c/common.h"
+ #include "tensorflow/lite/kernels/internal/tensor_ctypes.h"
+ #include "tensorflow/lite/kernels/kernel_util.h"
++#include "tensorflow/lite/types/half.h"
+
+ namespace tflite {
+ namespace ops {
+@@ -75,7 +75,8 @@ TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
+   TF_LITE_ENSURE(context, value->type == kTfLiteUInt8 ||
+                               value->type == kTfLiteInt8 ||
+                               value->type == kTfLiteInt4);
+-  TF_LITE_ENSURE(context, output->type == kTfLiteFloat32);
++  TF_LITE_ENSURE(context, output->type == kTfLiteFloat32 ||
++                              output->type == kTfLiteFloat16);
+   // Per-axis quantization must have quantized_dimension == 0 and correct
+   // sizes for scale and zero_point.
+   TF_LITE_ENSURE(context, qparams->quantized_dimension == 0);
+@@ -128,8 +129,12 @@ TfLiteStatus EvalSimple(TfLiteContext* context, TfLiteNode* node,
+   return kTfLiteOk;
+ }
+
+-void Unpack4Bit(double scaling_factor, int col_size, const int8_t* value_ptr,
+-                float* output_ptr) {
++template <typename T>
++void Unpack4Bit(float scaling_factor, int col_size, const int8_t* value_ptr,
++                T* output_ptr) {
++  float scaling_factor0 = scaling_factor / 16;
++  int j = 0;
++  int i4_idx = 0;
+   for (int j = 0; j < col_size; j++) {
+     int i8_idx = j;
+     int i4_idx = i8_idx / 2;
+@@ -163,7 +168,10 @@ TfLiteStatus EvalBlockwise(TfLiteContext* context, TfLiteNode* node,
+     col_size *= SizeOfDimension(value, i);
+   }
+
+-  float* output_ptr = GetTensorData<float>(output);
++  float* output_fp32_ptr =
++      output->type == kTfLiteFloat32 ? GetTensorData<float>(output) : nullptr;
++  half* output_fp16_ptr =
++      output->type == kTfLiteFloat16 ? GetTensorData<half>(output) : nullptr;
+   const int8_t* value_ptr = GetTensorData<int8_t>(value);
+   const int32_t* lookup_data = GetTensorData<int32_t>(lookup);
+
+@@ -191,14 +199,17 @@
+       return kTfLiteError;
+     }
+     for (int j = 0; j < num_blocks; ++j) {
+-      uint16_t raw_scaling_factor =
+-          GetTensorData<uint16_t>(&scale)[j + idx * num_blocks];
+-      uint32_t fp32_scaling_factor = fp16_ieee_to_fp32_bits(raw_scaling_factor);
+-      double scaling_factor = *reinterpret_cast<float*>(&fp32_scaling_factor);
+-
+-      Unpack4Bit(scaling_factor, blocksize,
+-                 &value_ptr[(j * blocksize + idx * col_size) / 2],
+-                 &output_ptr[j * blocksize + i * col_size]);
++      float scaling_factor = GetTensorData<half>(&scale)[j + idx * num_blocks];
++
++      if (output_fp32_ptr) {
++        Unpack4Bit(scaling_factor, blocksize,
++                   &value_ptr[(j * blocksize + idx * col_size) / 2],
++                   &output_fp32_ptr[j * blocksize + i * col_size]);
++      } else {
++        Unpack4Bit(scaling_factor, blocksize,
++                   &value_ptr[(j * blocksize + idx * col_size) / 2],
++                   &output_fp16_ptr[j * blocksize + i * col_size]);
++      }
+     }
+   }
+   return kTfLiteOk;
+@@ -207,9 +218,6 @@ TfLiteStatus EvalBlockwise(TfLiteContext* context, TfLiteNode* node,
+ TfLiteStatus EvalHybrid(TfLiteContext* context, TfLiteNode* node,
+                         const TfLiteTensor* lookup, const TfLiteTensor* value,
+                         TfLiteTensor* output) {
+-  if (value->quantization.type == kTfLiteBlockwiseQuantization) {
+-    return EvalBlockwise(context, node, lookup, value, output);
+-  }
+   const int row_size = SizeOfDimension(value, 0);
+
+   // col_size after we flatten tensor into 2D.
+@@ -218,7 +226,23 @@
+     col_size *= SizeOfDimension(value, i);
+   }
+
+-  float* output_ptr = GetTensorData<float>(output);
++  auto copy_row = [&](float scaling_factor, auto output_ptr, auto value_ptr,
++                      int idx, int i) {
++    if (value->type == kTfLiteInt4) {
++      Unpack4Bit(scaling_factor, col_size, &value_ptr[idx * col_size / 2],
++                 &output_ptr[i * col_size]);
++    } else {
++      for (int j = 0; j < col_size; j++) {
++        output_ptr[j + i * col_size] =
++            value_ptr[j + idx * col_size] * scaling_factor;
++      }
++    }
++  };
++
++  float* output_fp32_ptr =
++      output->type == kTfLiteFloat32 ? GetTensorData<float>(output) : nullptr;
++  half* output_fp16_ptr =
++      output->type == kTfLiteFloat16 ? GetTensorData<half>(output) : nullptr;
+   const int8_t* value_ptr = GetTensorData<int8_t>(value);
+   const int32_t* lookup_data = GetTensorData<int32_t>(lookup);
+
+@@ -234,7 +258,7 @@
+     // Dequantize embedding values.
+     // TODO(alanchiao): refactor scalar multiply into separate function
+     // for ease of adding a neon equivalent if ever necessary.
+-    double scaling_factor = value->params.scale;
++    float scaling_factor = value->params.scale;
+     if (value->quantization.type == kTfLiteAffineQuantization) {
+       const auto qparams = static_cast<const TfLiteAffineQuantization*>(
+           value->quantization.params);
+@@ -244,14 +268,10 @@
+       }
+     }
+
+-    if (value->type == kTfLiteInt4) {
+-      Unpack4Bit(scaling_factor, col_size, &value_ptr[idx * col_size / 2],
+-                 &output_ptr[i * col_size]);
++    if (output_fp32_ptr) {
++      copy_row(scaling_factor, output_fp32_ptr, value_ptr, idx, i);
+     } else {
+-      for (int j = 0; j < col_size; j++) {
+-        output_ptr[j + i * col_size] =
+-            value_ptr[j + idx * col_size] * scaling_factor;
+-      }
++      copy_row(scaling_factor, output_fp16_ptr, value_ptr, idx, i);
+     }
+   }
+ }
+@@ -266,21 +286,13 @@
+   TF_LITE_ENSURE_OK(context, GetInputSafe(context, node, 1, &value));
+   TfLiteTensor* output;
+   TF_LITE_ENSURE_OK(context, GetOutputSafe(context, node, 0, &output));
+-  switch (value->type) {
+-    case kTfLiteFloat32:
+-      return EvalSimple(context, node, lookup, value, output);
+-    case kTfLiteInt4:
+-      return EvalHybrid(context, node, lookup, value, output);
+-    case kTfLiteUInt8:
+-    case kTfLiteInt8:
+-      if (output->type == kTfLiteFloat32) {
+-        return EvalHybrid(context, node, lookup, value, output);
+-      } else {
+-        return EvalSimple(context, node, lookup, value, output);
+-      }
+-    default:
+-      TF_LITE_KERNEL_LOG(context, "Type not currently supported.");
+-      return kTfLiteError;
++  if (value->quantization.type == kTfLiteBlockwiseQuantization) {
++    return EvalBlockwise(context, node, lookup, value, output);
++  } else if (value->type != output->type && (output->type == kTfLiteFloat32 ||
++                                             output->type == kTfLiteFloat16)) {
++    return EvalHybrid(context, node, lookup, value, output);
++  } else {
++    return EvalSimple(context, node, lookup, value, output);
+   }
+ }
+
+diff --git a/tensorflow/lite/kernels/embedding_lookup_test.cc b/tensorflow/lite/kernels/embedding_lookup_test.cc
+index 14091ab1..8530e629 100644
+--- a/tensorflow/lite/kernels/embedding_lookup_test.cc
++++ b/tensorflow/lite/kernels/embedding_lookup_test.cc
+@@ -27,11 +27,13 @@ License.
+ #include "tensorflow/lite/kernels/internal/tensor_ctypes.h"
+ #include "tensorflow/lite/kernels/test_util.h"
+ #include "tensorflow/lite/schema/schema_generated.h"
++#include "tensorflow/lite/types/half.h"
+
+ namespace tflite {
+ namespace {
+
+-float kTestTolerance = 7.41e-03;
++constexpr float kTestTolerance = 7.41e-03;
++constexpr float kFp16TestTolerance = 1e-02;
+
+ using ::testing::ElementsAreArray;
+
+@@ -125,8 +127,10 @@ class HybridEmbeddingLookupOpModel : public BaseEmbeddingLookupOpModel {
+  public:
+   HybridEmbeddingLookupOpModel(std::initializer_list<int> index_shape,
+                                std::initializer_list<int> weight_shape,
+-                               TensorType type)
+-      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, type) {}
++                               TensorType weight_type,
++                               TensorType output_type = TensorType_FLOAT32)
++      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, weight_type,
++                                   output_type) {}
+
+   void SetWeight(std::initializer_list<float> data) {
+     SymmetricQuantizeAndPopulate(weight_, data);
+   }
+@@ -143,9 +147,9 @@ class PerAxisHybridEmbeddingLookupOpModel : public BaseEmbeddingLookupOpModel {
+       std::initializer_list<int> index_shape,
+       std::initializer_list<int> weight_shape,
+       const std::vector<float>& per_channel_quantization_scales,
+-      TensorType type)
+-      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, type,
+-                                   TensorType_FLOAT32,
++      TensorType weights_type, TensorType output_type = TensorType_FLOAT32)
++      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, weights_type,
++                                   output_type,
+                                    per_channel_quantization_scales) {}
+
+   void SetSignedWeight(std::initializer_list<float> data) {
+@@ -155,12 +159,13 @@ class PerAxisHybridEmbeddingLookupOpModel : public BaseEmbeddingLookupOpModel {
+
+ class PerBlockHybridEmbeddingLookupOpModel : public BaseEmbeddingLookupOpModel {
+  public:
+-  PerBlockHybridEmbeddingLookupOpModel(std::initializer_list<int> index_shape,
+-                                       std::initializer_list<int> weight_shape,
+-                                       TensorType type, int blocksize,
+-                                       std::vector<float> scales)
+-      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, type,
+-                                   TensorType_FLOAT32, scales, blocksize) {}
++  PerBlockHybridEmbeddingLookupOpModel(
++      std::initializer_list<int> index_shape,
++      std::initializer_list<int> weight_shape, TensorType weights_type,
++      int blocksize, std::vector<float> scales,
++      TensorType output_type = TensorType_FLOAT32)
++      : BaseEmbeddingLookupOpModel(index_shape, weight_shape, weights_type,
++                                   output_type, scales, blocksize) {}
+   void SetSignedWeight(std::initializer_list<float> data) {
+     PerBlockSymmetricQuantizeAndPopulate(weight_, data);
+   }
+@@ -168,8 +173,9 @@ class PerBlockHybridEmbeddingLookupOpModel : public BaseEmbeddingLookupOpModel {
+
+ // TODO(ahentz): write more tests that exercise the details of the op, such as
+ // lookup errors and variable input shapes.
+-TEST(EmbeddingLookupOpTest, SimpleTest) {
+-  EmbeddingLookupOpModel m({3}, {3, 2, 4});
++TEST(EmbeddingLookupOpTest, Float32) {
++  EmbeddingLookupOpModel m({3}, {3, 2, 4}, TensorType_FLOAT32,
++                           TensorType_FLOAT32);
+   m.SetInput({1, 0, 2});
+   m.Set3DWeightMatrix(
+       [](int i, int j, int k) -> float { return i + j / 10.0f + k / 100.0f; });
+@@ -184,6 +190,25 @@
+           })));
+ }
+
++TEST(EmbeddingLookupOpTest, Float16) {
++  EmbeddingLookupOpModel m({3}, {3, 2, 4}, TensorType_FLOAT16,
++                           TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.Set3DWeightMatrix(
++      [](int i, int j, int k) -> half { return i + j / 10.0f + k / 100.0f; });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(m.GetOutput(),
++              ElementsAreArray(ArrayFloatNear(
++                  {
++                      1.00, 1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++                      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,  // Row 0
++                      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,  // Row 2
++                  },
++                  kTestTolerance)));
++}
++
+ #if !defined(MEMORY_SANITIZER) && !defined(GOOGLE_UNSUPPORTED_OS_LOONIX) && \
+     defined(__LP64__)
+ TEST(EmbeddingLookupOpTest, LargeTableTest) {
+@@ -269,6 +294,28 @@ TEST(HybridEmbeddingLookupHybridOpTest, Simple4DTestUint8) {
+                      kTestTolerance)));
+ }
+
++TEST(HybridEmbeddingLookupHybridOpTest, Simple4DTestUint8Float16) {
++  HybridEmbeddingLookupOpModel m({3}, {3, 2, 2, 2}, TensorType_UINT8,
++                                 TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.SetWeight({
++      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,  // Row 0
++      1.00, 1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,  // Row 2
++  });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(m.GetOutput(),
++              ElementsAreArray(ArrayFloatNear(
++                  {
++                      1.00, 1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++                      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,  // Row 0
++                      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,  // Row 2
++                  },
++                  kFp16TestTolerance)));
++}
++
+ TEST(HybridEmbeddingLookupHybridOpTest, Simple2DTestInt8) {
+   HybridEmbeddingLookupOpModel m({3}, {3, 8}, TensorType_INT8);
+   m.SetInput({1, 0, 2});
+@@ -332,6 +379,28 @@ TEST(HybridEmbeddingLookupHybridOpTest, Simple4DTestInt8) {
+                      kTestTolerance)));
+ }
+
++TEST(HybridEmbeddingLookupHybridOpTest, Simple4DTestInt8Float16) {
++  HybridEmbeddingLookupOpModel m({3}, {3, 2, 2, 2}, TensorType_INT8,
++                                 TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.SetSignedWeight({
++      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,   // Row 0
++      1.00, -1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,   // Row 2
++  });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(m.GetOutput(),
++              ElementsAreArray(ArrayFloatNear(
++                  {
++                      1.00, -1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++                      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,   // Row 0
++                      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,   // Row 2
++                  },
++                  kFp16TestTolerance)));
++}
++
+ TEST(EmbeddingLookupHybridOpTest, Simple3DTestQuantized) {
+   EmbeddingLookupOpModel m({3}, {3, 2, 4}, TensorType_UINT8, TensorType_INT8);
+   m.SetInput({1, 0, 2});
+@@ -414,6 +483,29 @@ TEST(PerAxisHybridEmbeddingLookupHybridOpTest, PerAxisSimple4DTestInt8) {
+                      kTestTolerance)));
+ }
+
++TEST(PerAxisHybridEmbeddingLookupHybridOpTest, PerAxisSimple4DTestInt8Float16) {
++  PerAxisHybridEmbeddingLookupOpModel m({3}, {3, 2, 2, 2},
++                                        {0.00102, 0.0089, 0.016772},
++                                        TensorType_INT8, TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.SetSignedWeight({
++      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,   // Row 0
++      1.00, -1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,   // Row 2
++  });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(m.GetOutput(),
++              ElementsAreArray(ArrayFloatNear(
++                  {
++                      1.00, -1.01, 1.02, 1.03, 1.10, 1.11, 1.12, 1.13,  // Row 1
++                      0.00, 0.01, 0.02, 0.03, 0.10, 0.11, 0.12, 0.13,   // Row 0
++                      2.00, 2.01, 2.02, 2.03, 2.10, 2.11, 2.12, 2.13,   // Row 2
++                  },
++                  kFp16TestTolerance)));
++}
++
+ TEST(PerBlockHybridEmbeddingLookupHybridOpTest, PerBlockSimple2DTestInt4) {
+   PerBlockHybridEmbeddingLookupOpModel m(
+       /*index_shape=*/{3},
+@@ -441,6 +533,35 @@ TEST(PerBlockHybridEmbeddingLookupHybridOpTest, PerBlockSimple2DTestInt4) {
+                      kTestTolerance)));
+ }
+
++TEST(PerBlockHybridEmbeddingLookupHybridOpTest,
++     PerBlockSimple2DTestInt4Float16) {
++  PerBlockHybridEmbeddingLookupOpModel m(
++      /*index_shape=*/{3},
++      /*weight_shape=*/{3, 8},
++      /*weights_type=*/TensorType_INT4,
++      /*blocksize=*/4,
++      /*scales=*/{0.001, 0.001, 0.02, 0.02, 0.3, 0.3},
++      /*output_type=*/TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.SetSignedWeight({
++      0.00, 0.007, 0.006, 0.005, 0.004, 0.003, 0.002, 0.001,  // Row 0
++      0.02, -0.02, 0.04, 0.06, 0.08, -0.04, -0.08, -0.06,     // Row 1
++      0.3, 0.6, 0.9, 1.2, 1.5, -0.3, -0.6, -0.9,              // Row 2
++  });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(
++      m.GetOutput(),
++      ElementsAreArray(ArrayFloatNear(
++          {
++              0.02, -0.02, 0.04, 0.06, 0.08, -0.04, -0.08, -0.06,     // Row 1
++              0.00, 0.007, 0.006, 0.005, 0.004, 0.003, 0.002, 0.001,  // Row 0
++              0.3, 0.6, 0.9, 1.2, 1.5, -0.3, -0.6, -0.9,              // Row 2
++          },
++          kFp16TestTolerance)));
++}
++
+ TEST(PerAxisHybridEmbeddingLookupHybridOpTest, PerAxisSimple2DTestInt4) {
+   PerAxisHybridEmbeddingLookupOpModel m(
+       /*index_shape=*/{3}, /*weight_shape=*/{3, 8},
+@@ -512,5 +633,28 @@ TEST(PerAxisHybridEmbeddingLookupHybridOpTest, PerAxisSimple4DTestInt4) {
+                      kTestTolerance)));
+ }
+
++TEST(PerAxisHybridEmbeddingLookupHybridOpTest, PerAxisSimple4DTestInt4Float16) {
++  PerAxisHybridEmbeddingLookupOpModel m({3}, {3, 2, 2, 2}, {0.001, 0.02, 0.3},
++                                        TensorType_INT4, TensorType_FLOAT16);
++  m.SetInput({1, 0, 2});
++  m.SetSignedWeight({
++      0.00, 0.007, 0.006, 0.005, 0.004, 0.003, 0.002, 0.001,  // Row 0
++      0.02, -0.02, 0.04, 0.06, 0.08, -0.04, -0.08, -0.06,     // Row 1
++      0.3, 0.6, 0.9, 1.2, 1.5, -0.3, -0.6, -0.9,              // Row 2
++  });
++
++  ASSERT_EQ(m.Invoke(), kTfLiteOk);
++
++  EXPECT_THAT(
++      m.GetOutput(),
++      ElementsAreArray(ArrayFloatNear(
++          {
++              0.02, -0.02, 0.04, 0.06, 0.08, -0.04, -0.08, -0.06,     // Row 1
++              0.00, 0.007, 0.006, 0.005, 0.004, 0.003, 0.002, 0.001,  // Row 0
++              0.3, 0.6, 0.9, 1.2, 1.5, -0.3, -0.6, -0.9,              // Row 2
++          },
++          kFp16TestTolerance)));
++}
++
+ }  // namespace
+ }  // namespace tflite
+--
+2.34.1
+
diff --git a/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb
index ee445e75..559ec5ef 100644
--- a/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb
+++ b/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb
@@ -17,6 +17,9 @@ SRC_URI = " \
     file://0001-Update-CMakeLists-for-building.patch \
     file://0002-Update-CMakeLists-for-building-shared-object.patch \
     file://0003-Fix-GStreamer-TensorFlow-Lite-pipeline-failures-due-.patch \
+    file://0004-Disable-xnnpack-delegate-target-operations-for-armv7.patch \
+    file://0005-Add-fp16-data-type-infrastructure-to-TensorFlow-Lite.patch \
+    file://0006-Add-float16-support-to-EMBEDDING_LOOKUP-kernel.patch \
     file://tensorflow2-lite.pc.in \
 "

From patchwork Thu Mar 26 07:26:52 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pratham Deshmukh
X-Patchwork-Id: 84394
X-Patchwork-Delegate: reatmon@ti.com
From: Pratham Deshmukh
Subject: [meta-arago][master][PATCH v2 2/3] onnx: fix 32-bit ARM build failure due to LONG_BIT mismatch
Date: Thu, 26 Mar 2026 12:56:52 +0530
Message-ID: <20260326072653.1025506-3-p-deshmukh@ti.com>
In-Reply-To: <20260326072653.1025506-1-p-deshmukh@ti.com>
References: <20260326072653.1025506-1-p-deshmukh@ti.com>
X-Groupsio-URL: https://lists.yoctoproject.org/g/meta-arago/message/17441

Switch from python3native to python3targetconfig and update CMake include
paths to point to the target sysroot. The build was incorrectly pulling
Python headers from the native (x86_64) sysroot, leading to a "LONG_BIT
definition appears wrong" error when cross-compiling for 32-bit ARMv7.
Using python3targetconfig ensures the compiler uses the target's
architecture-specific configuration, making the recipe compatible with
both architectures.
Signed-off-by: Pratham Deshmukh --- Change Logs: - No Changes meta-arago-extras/recipes-framework/onnx/onnx_1.18.0.bb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/meta-arago-extras/recipes-framework/onnx/onnx_1.18.0.bb b/meta-arago-extras/recipes-framework/onnx/onnx_1.18.0.bb index fc565dec..7d27119d 100644 --- a/meta-arago-extras/recipes-framework/onnx/onnx_1.18.0.bb +++ b/meta-arago-extras/recipes-framework/onnx/onnx_1.18.0.bb @@ -61,7 +61,7 @@ EXTRA_OECMAKE:append = " \ --log-level=VERBOSE \ " -inherit python3native cmake +inherit python3targetconfig cmake python do_build_version_file() { import os From patchwork Thu Mar 26 07:26:53 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pratham Deshmukh X-Patchwork-Id: 84396 X-Patchwork-Delegate: reatmon@ti.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B1BB109C038 for ; Thu, 26 Mar 2026 07:27:20 +0000 (UTC) Received: from BN1PR04CU002.outbound.protection.outlook.com (BN1PR04CU002.outbound.protection.outlook.com [52.101.56.25]) by mx.groups.io with SMTP id smtpd.msgproc01-g2.42208.1774510032114900034 for ; Thu, 26 Mar 2026 00:27:12 -0700 Authentication-Results: mx.groups.io; dkim=fail reason="dkim: body hash did not verify" header.i=@ti.com header.s=selector1 header.b=AVO4PsN9; spf=permerror, err=parse error for token &{10 18 spf.protection.outlook.com}: limit exceeded (domain: ti.com, ip: 52.101.56.25, mailfrom: p-deshmukh@ti.com) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; 
b=gVZSbYJGLglUCmHiv+yY6vz4RK5LrDfURf1+q645TBS9ORbYbjfLWrDoSIsmgVjfTvJcOQqGa0GlVk0oN336v19qLydo/e3xQ/IMO2k5SuYbUnbBgYew+yfJP+gDO1TtTfMShR9lYJZ+J9FkramBAHbmdBgAcaxLyZbMgZ8YL1nR1scjMh1vgryL7EZ/TmsSGNWqiim80kQRGSpgg3VKk4dqWDmI3IvTd6/rAbq08ENjQ80FdRi1QKTrEzTCEspl1xNAd27NTml7XoaBYl5KtZtZgPhPwsyzhAP5+euaiDrYGvOHzhVtqZVDBQLcaNOtyClez5ajvaixSvkSfX8lmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=osGjbOV7X2CvU1ueW+3yddJ6pAqnWYbOJDatBuadNlQ=; b=OOJXufwFuviT4hEVKebHIGXiSqvQRt2nVWoYrCpA6w1Uh+ZqAKS+oiQj/xsA4y3E1xSvve0zH4SUOfRgjjAgsYRc1RVfJOx538D3RFj1xmn/vI/6EvJ8CI/4Ei2YoAY+ypulaPkUBSD6UYG9uXE7oz34NNyewJuZJrwW4Pwk92GqwkGFkxhpAt6DwPKBAv6NE2S7HybkSFdlcxh+dW0PbAOBe2LbHZFc0u4PAhmq+aXGFL0nj/Oe+RQY4ZuZpYfnkmIk5h7v6J85P83/I5RIx5onCNemaFpzIppv8m1z2HvnHEfc/IPY1tJAvFzQbOtttUUMMwBkN8FvJ+37E8EvJA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 198.47.23.195) smtp.rcpttodomain=lists.yoctoproject.org smtp.mailfrom=ti.com; dmarc=pass (p=quarantine sp=none pct=100) action=none header.from=ti.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ti.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=osGjbOV7X2CvU1ueW+3yddJ6pAqnWYbOJDatBuadNlQ=; b=AVO4PsN9VR7rp4zJrJSowJqC8ZaA28Wbz+EG4L137EyFRtxymtuGS+jWBZXdejZP/CHyhnlHNtisHxKdO+vBQmSQZ1oEidmCGaNTuMIADIiDbebL7Szyinsbhj/wNn06D8edD90+szZXW0CHYTPP+42glu87Xox3CFw7kUb8jdU= Received: from PH8PR07CA0024.namprd07.prod.outlook.com (2603:10b6:510:2cd::8) by DS7PR10MB4991.namprd10.prod.outlook.com (2603:10b6:5:38e::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9745.22; Thu, 26 Mar 2026 07:27:09 +0000 Received: from 
From: Pratham Deshmukh
Subject: [meta-arago][master][PATCH v2 3/3] arm-compute-library: Exclude AARCH64 KleiDiAI kernels from armv7
Date: Thu, 26 Mar 2026 12:56:53 +0530
Message-ID: <20260326072653.1025506-4-p-deshmukh@ti.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260326072653.1025506-1-p-deshmukh@ti.com>
References: <20260326072653.1025506-1-p-deshmukh@ti.com>
MIME-Version: 1.0
X-Groupsio-URL: https://lists.yoctoproject.org/g/meta-arago/message/17442

Prevent AARCH64-specific KleiDiAI kernels from being included in armv7
builds, resolving build failures on 32-bit ARM platforms.
Added patch:
- 0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch

Signed-off-by: Pratham Deshmukh
---
Change Logs:
- Added shorter commit message

 ...RCH64-specific-KleiDiAI-kernels-from.patch | 37 +++++++++++++++++++
 .../arm-compute-library_52.7.0.bb             |  1 +
 2 files changed, 38 insertions(+)
 create mode 100644 meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library/0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch

diff --git a/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library/0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch b/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library/0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch
new file mode 100644
index 00000000..fd72189b
--- /dev/null
+++ b/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library/0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch
@@ -0,0 +1,37 @@
+From 4e2bdad07af1a4a16264ef1373488f41fadd62ab Mon Sep 17 00:00:00 2001
+From: Pratham Deshmukh
+Date: Wed, 18 Mar 2026 20:55:08 +0530
+Subject: [PATCH] cmake: Exclude AARCH64-specific KleiDiAI kernels from armv7
+ builds
+
+The KleiDiAI matrix multiplication kernels use AARCH64-specific instructions
+that are incompatible with the armv7 architecture.
+
+Update CMake and add generator expressions to conditionally exclude these
+files when ACL_ARCH_ISA is set to armv7, preventing build failures on
+32-bit ARM platforms.
+
+Upstream-Status: Pending
+
+Signed-off-by: Pratham Deshmukh
+---
+ src/CMakeLists.txt | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
+index 0867ba81ad..45046fd0f1 100644
+--- a/src/CMakeLists.txt
++++ b/src/CMakeLists.txt
+@@ -361,8 +361,8 @@ set(ARM_COMPUTE_SVE2_SOURCES
+ )
+ 
+ set(ARM_COMPUTE_SOURCES
+- ../third_party/kleidiai/kai/ukernels/matmul/matmul_clamp_f32_f32_f32p/kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla.c
+- ../third_party/kleidiai/kai/ukernels/matmul/pack/kai_rhs_pack_kxn_f32p8x1biasf32_f32_f32_neon.c
++ $<$>:../third_party/kleidiai/kai/ukernels/matmul/matmul_clamp_f32_f32_f32p/kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla.c>
++ $<$>:../third_party/kleidiai/kai/ukernels/matmul/pack/kai_rhs_pack_kxn_f32p8x1biasf32_f32_f32_neon.c>
+ c/AclContext.cpp
+ c/AclOperator.cpp
+ c/AclQueue.cpp
+-- 
+2.34.1
+

diff --git a/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library_52.7.0.bb b/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library_52.7.0.bb
index 02155d20..f3b67a66 100644
--- a/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library_52.7.0.bb
+++ b/meta-arago-extras/recipes-devtools/arm-compute-library/arm-compute-library_52.7.0.bb
@@ -13,6 +13,7 @@ SRC_URI = " \
     file://0006-Remove-TARGET-dependency.patch \
     file://0007-cmake-Generate-generic-library-name-instead-of.patch \
     file://0008-Add-FP16-source-path.patch \
+    file://0009-cmake-Exclude-AARCH64-specific-KleiDiAI-kernels-from.patch \
 "

SRCREV = "c9a1fff898abd5109b759e8e16616519dc758fdd"
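For readers unfamiliar with the technique the patch relies on: CMake generator
expressions of the form `$<condition:value>` expand to `value` only when the
condition is true, so wrapping a source path in one drops that file from the
target on non-matching configurations. The sketch below illustrates the pattern
in isolation; the variable name `ACL_ARCH_ISA` is taken from the commit message,
but the file names and the exact condition are illustrative, not the actual ACL
build logic (the archived patch's inner condition was mangled in transit).

```cmake
# Minimal sketch of excluding a source file by configured ISA, assuming a
# cache variable ACL_ARCH_ISA selects the target architecture. Hypothetical
# file names; not the real Arm Compute Library build.
cmake_minimum_required(VERSION 3.13)
project(genexpr_demo C)

set(ACL_ARCH_ISA "armv7" CACHE STRING "Target ISA (e.g. armv7, armv8-a)")

add_library(demo STATIC
    # Compiled only when ACL_ARCH_ISA is NOT "armv7"; on armv7 the
    # generator expression expands to nothing and the file is skipped.
    $<$<NOT:$<STREQUAL:${ACL_ARCH_ISA},armv7>>:aarch64_only_kernel.c>
    common.c
)
```

Because the expression expands to an empty string on armv7, no conditional
`if()`/`else()` duplication of the source list is needed.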