Message ID | 20240215212613.57012-1-afd@ti.com |
---|---|
State | Superseded |
Delegated to: | Ryan Eatmon |
Series | [meta-ti,master/kirkstone] conf: machine: k3: Use Cortex-A53/A72 CPU tune |
Hi Andrew,

I was testing this patch locally and wanted to see whether we get any performance improvement on the benchmarks available in the default image. What benchmark do you recommend testing this against?

I ran '/runLinpack/' on the tisdk-default-image on j7200 and did not see any difference in the reported performance with and without this patch. Is this expected?

With the default image built on the latest SDK, WITHOUT the patch:
Unrolled Single Precision 1845878 Kflops ; 10 Reps

With the default image built WITH the patch:
Unrolled Single Precision 1857362 Kflops ; 10 Reps

Thanks,
Aniket

On 2/16/2024 2:56 AM, Andrew Davis via lists.yoctoproject.org wrote:
> All current K3 devices use either A53 or A72. Use the compile tune
> configuration specific for these to allow the compiler to make
> better optimizations.
>
> Signed-off-by: Andrew Davis <afd@ti.com>
> ---
>  meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
> index 2415f0ba..7c3579af 100644
> --- a/meta-ti-bsp/conf/machine/include/k3.inc
> +++ b/meta-ti-bsp/conf/machine/include/k3.inc
> @@ -3,7 +3,7 @@
>  require conf/machine/include/ti-soc.inc
>  SOC_FAMILY:append = ":k3"
>
> -require conf/machine/include/arm/arch-arm64.inc
> +require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
>
>  BBMULTICONFIG += "k3r5"
>
> --
> 2.39.2
Unfortunately, NAK.

This is considered an antisocial behavior for a BSP in the Yocto Project world. And the performance benefit is questionable with 1%-2%, if at all.

The proper place for any extra optimization tunes is in a distro config. Maybe even by end customer's final product, not a reference distro.

Consider a distro that supports multiple HW platforms and uses multiple BSPs besides meta-ti - YoE, AGL, etc. You do want a common denominator tunes in order to get the most binary re-use across the platforms.

For example, AGL goes to some extreme lengths to override such custom tunes set by misbehaving BSPs and it's quite ugly.

And moreover, we've gone through this motion in the past many years ago when we had our ARMv7 platforms set to their corresponding cortex-a8/a9/a15 tunes by default, but eventually ended up setting a common ARMv7 tune:

DEFAULTTUNE ?= "armv7athf-neon"

So, you should either leave the current arch-arm64.inc inclusion as is, or if you insist on including tune-cortexa72-cortexa53.inc, set the default tune back to plain aarch64:

DEFAULTTUNE ?= "aarch64"
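
As a rough illustration of the second option in the message above (keep the CPU-specific include but default to the generic tune), k3.inc might look like the fragment below. This is a sketch under one assumption, not the patch that was eventually sent: it assumes that tune-cortexa72-cortexa53.inc assigns its own DEFAULTTUNE with `?=`, which is worth checking against the OE-core release in use. Because `?=` only takes effect while the variable is still unset, the generic assignment would then have to be parsed before the `require` line.

```bitbake
# Sketch of meta-ti-bsp/conf/machine/include/k3.inc under the
# "include the tune file, default to generic aarch64" option.
require conf/machine/include/ti-soc.inc
SOC_FAMILY:append = ":k3"

# Assumption: the tune include below sets DEFAULTTUNE with "?=".
# A "?=" assignment is a no-op once the variable has a value, so the
# generic default is placed before the include to take precedence.
DEFAULTTUNE ?= "aarch64"

require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc

BBMULTICONFIG += "k3r5"
```

With this ordering, a distro or a local configuration can still pick a different tune, since a weak default does not block a later plain assignment.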
On 2/16/24 2:23 PM, Denys Dmytriyenko wrote:
> Unfortunately, NAK.
>
> This is considered an antisocial behavior for a BSP in the Yocto Project
> world. And the performance benefit is questionable with 1%-2%, if at all.
>

This started when a potential customer noticed that some benchmarks (linpack, for instance), built and run on our SDK, were being outperformed by those of some other vendors, even though on paper our platforms should have been the better-performing ones.

After investigating, it turns out these other vendors have these tune options in their BSP layers, causing the performance discrepancy.

So the performance here, even a couple of percent, is very important.

> The proper place for any extra optimization tunes is in a distro config. Maybe
> even by end customer's final product, not a reference distro.
>
> Consider a distro that supports multiple HW platforms and uses multiple BSPs
> besides meta-ti - YoE, AGL, etc. You do want a common denominator tunes in
> order to get the most binary re-use across the platforms.
>

If one wants binary re-use, they can override the tune. Otherwise, maybe they should be using Debian or some other binary distro. The main selling point for Yocto, IMHO, is customizing like this. The best part of rebuilding everything from scratch every time for every machine is that we can have these machine-specific tunings.

> For example, AGL goes to some extreme lengths to override such custom tunes
> set by misbehaving BSPs and it's quite ugly.
>

Then we should work to make it easier to override for those folks, not simply leave this performance on the table.

> And moreover, we've gone through this motion in the past many years ago when
> we had our ARMv7 platforms set to their corresponding cortex-a8/a9/a15 tunes
> by default, but eventually ended up setting a common ARMv7 tune:
>
> DEFAULTTUNE ?= "armv7athf-neon"
>
> So, you should either leave the current arch-arm64.inc inclusion as is, or if
> you insist on including tune-cortexa72-cortexa53.inc, set the default tune
> back to plain aarch64:
>
> DEFAULTTUNE ?= "aarch64"
>

I see our friends over in meta-xilinx are doing machine-specific DEFAULTTUNEs. I was thinking of matching that to keep our BSP performance competitive. But as a compromise, and to avoid "antisocial behavior" as you say, I think I can live with DEFAULTTUNE ?= "aarch64".

Will resend with that.

Andrew
On 2/20/2024 8:31 AM, Andrew Davis wrote:
> I see our friends over in meta-xilinx are doing machine specific
> DEFAULTTUNEs. I was thinking of matching that to keep our BSP
> performance competitive. But as a compromise and to avoid "antisocial
> behavior" as you say, I think I can live with DEFAULTTUNE ?= "aarch64".
>
> Will resend with that.

So, if we include the more targeted tuning file but set DEFAULTTUNE to the generic tune, how do our builds use the more targeted tuning? Is that something we have to set in local.conf as part of our builds? Or is there some sort of magic that picks the correct thing?
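
One possible answer to the question above, offered as a sketch rather than as anything the thread confirms: if k3.inc only sets a weak `DEFAULTTUNE ?= "aarch64"`, nothing selects the CPU-specific tune automatically, and a build that wants it would have to opt in from local.conf or a distro config. The fragment below assumes that `cortexa72-cortexa53` is among the tunes made available by tune-cortexa72-cortexa53.inc, and that `k3` is active as a machine override through SOC_FAMILY, as the patch context suggests; both should be verified in the actual layers.

```bitbake
# conf/local.conf, or a distro .conf file (placement is illustrative).
#
# Opt back in to the Cortex-A72/A53 tune for K3 machines only.
# Assumption: "cortexa72-cortexa53" is a valid tune name provided by
# tune-cortexa72-cortexa53.inc, and "k3" is a machine override that
# comes from SOC_FAMILY. Scoping the assignment to that override keeps
# it away from other configurations in the build, such as the k3r5
# multiconfig, which targets a different core.
DEFAULTTUNE:k3 = "cortexa72-cortexa53"
```

The value that a build actually ends up with can be inspected in the output of `bitbake -e`, by searching for the line that starts with `DEFAULTTUNE=`.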
diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
index 2415f0ba..7c3579af 100644
--- a/meta-ti-bsp/conf/machine/include/k3.inc
+++ b/meta-ti-bsp/conf/machine/include/k3.inc
@@ -3,7 +3,7 @@
 require conf/machine/include/ti-soc.inc
 SOC_FAMILY:append = ":k3"
 
-require conf/machine/include/arm/arch-arm64.inc
+require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
 
 BBMULTICONFIG += "k3r5"
 
All current K3 devices use either A53 or A72. Use the compile tune
configuration specific for these to allow the compiler to make
better optimizations.

Signed-off-by: Andrew Davis <afd@ti.com>
---
 meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)