
CI: restrict compression threading

Message ID 20221011130710.1638676-1-ross.burton@arm.com
State New
Series: CI: restrict compression threading

Commit Message

Ross Burton Oct. 11, 2022, 1:07 p.m. UTC
On large systems, using all of the CPUs and 50% of the RAM when xz
compressing packages is actively harmful because it will happily use up
to that limit.

Signed-off-by: Ross Burton <ross.burton@arm.com>
---
 ci/base.yml | 3 +++
 1 file changed, 3 insertions(+)
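For illustration, the effect of the new limits can be tried from the command line; a minimal sketch (the file name and size are made up, not taken from the CI setup):

```shell
# Compress a throwaway file with the same caps the patch sets: 16 threads
# and a compression memory limit of 25% of RAM. Without such flags, a
# multi-threaded xz (-T0) will use every core it can see.
dd if=/dev/urandom of=/tmp/sample.bin bs=1M count=8 2>/dev/null
xz --keep --force --threads=16 --memlimit-compress=25% /tmp/sample.bin
ls -l /tmp/sample.bin.xz
```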

Comments

Jérôme Forissier Oct. 11, 2022, 1:26 p.m. UTC | #1
On 10/11/22 15:07, Ross Burton wrote:
> On large systems, using all of the CPUs and 50% of the RAM when xz
> compressing packages is actively harmful because it will happily use up
> to that limit.
> 
> Signed-off-by: Ross Burton <ross.burton@arm.com>
> ---
>  ci/base.yml | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/ci/base.yml b/ci/base.yml
> index 3f644c1e..3619b8f2 100644
> --- a/ci/base.yml
> +++ b/ci/base.yml
> @@ -30,6 +30,9 @@ local_conf_header:
>      CONF_VERSION = "2"
>      BB_NUMBER_THREADS = "16"
>      PARALLEL_MAKE = "-j16"

That is arguably wrong already, because it restricts performance on >16-core
systems. I, for example, routinely build on 32-core AArch64 and 36-core
x86_64 machines. Yocto builds are painful enough that I don't want to spend
more time than necessary due to wrong settings ;-)

> +    XZ_MEMLIMIT = "25%"
> +    XZ_THREADS = "16"
> +    ZSTD_THREADS = "16"

Another hard-coded limit which can't be optimal on all systems obviously.
I am talking about the thread limits -- the memory limit is relative so it's
probably reasonable.
What's the problem with the default? You said "actively harmful", what
does it mean? Performance drop? By how much? What happens if you introduce
only the XZ_MEMLIMIT?

Thanks,
Jon Mason Oct. 11, 2022, 2:29 p.m. UTC | #2
On Tue, 11 Oct 2022 14:07:10 +0100, Ross Burton wrote:
> On large systems, using all of the CPUs and 50% of the RAM when xz
> compressing packages is actively harmful because it will happily use up
> to that limit.

Applied, thanks!

[1/1] CI: restrict compression threading
      commit: cbcb1bf39de2cd7205447af9a64442402fbdc6f2

Best regards,
Jon Mason Oct. 11, 2022, 3:09 p.m. UTC | #3
On Tue, Oct 11, 2022 at 03:26:36PM +0200, Jerome Forissier wrote:
> 
> 
> On 10/11/22 15:07, Ross Burton wrote:
> > On large systems, using all of the CPUs and 50% of the RAM when xz
> > compressing packages is actively harmful because it will happily use up
> > to that limit.
> > 
> > Signed-off-by: Ross Burton <ross.burton@arm.com>
> > ---
> >  ci/base.yml | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/ci/base.yml b/ci/base.yml
> > index 3f644c1e..3619b8f2 100644
> > --- a/ci/base.yml
> > +++ b/ci/base.yml
> > @@ -30,6 +30,9 @@ local_conf_header:
> >      CONF_VERSION = "2"
> >      BB_NUMBER_THREADS = "16"
> >      PARALLEL_MAKE = "-j16"
> 
> That is arguably wrong already, because it restricts performance on >16-core
> systems. I, for example, routinely build on 32-core AArch64 and 36-core
> x86_64 machines. Yocto builds are painful enough that I don't want to spend
> more time than necessary due to wrong settings ;-)

Are you running Gitlab CI on meta-arm too?  I thought I was the only
one running it outside of Arm internal.

I dislike this for the same reasoning.  We're discussing making this a
Gitlab CI variable, with a default that doesn't do the limiting.
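One possible shape for that, purely as a hypothetical sketch (the variable name and the wiring into local.conf are invented here, not the actual meta-arm change):

```yaml
# Hypothetical .gitlab-ci.yml fragment: a CI variable that defaults to
# "no limit" and can be overridden per project, pipeline, or runner
# where capping is known to help.
variables:
  THREAD_LIMIT: ""        # empty = write no limits into local.conf

.build:
  script:
    - |
      if [ -n "$THREAD_LIMIT" ]; then
        echo "BB_NUMBER_THREADS = \"$THREAD_LIMIT\"" >> conf/local.conf
        echo "PARALLEL_MAKE = \"-j$THREAD_LIMIT\""   >> conf/local.conf
        echo "XZ_THREADS = \"$THREAD_LIMIT\""        >> conf/local.conf
        echo "ZSTD_THREADS = \"$THREAD_LIMIT\""      >> conf/local.conf
      fi
```

Setups that know capping helps them would set THREAD_LIMIT in their CI settings, while everyone else keeps the unlimited default.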

 
> > +    XZ_MEMLIMIT = "25%"
> > +    XZ_THREADS = "16"
> > +    ZSTD_THREADS = "16"
> 
> Another hard-coded limit which can't be optimal on all systems obviously.
> I am talking about the thread limits -- the memory limit is relative so it's
> probably reasonable.
> What's the problem with the default? You said "actively harmful", what
> does it mean? Performance drop? By how much? What happens if you introduce
> only the XZ_MEMLIMIT?

This was using 256 cores on our internal build servers, which was
slowing our CI pipelines significantly.  I'll investigate doing the
gitlab ci variables sooner now.

Thanks,
Jon

> 
> Thanks,
> 
> -- 
> Jerome
> 
> >      LICENSE_FLAGS_ACCEPTED += "Arm-FVP-EULA"
> >    setup: |
> >      PACKAGE_CLASSES = "package_ipk"
> > 
>
Jérôme Forissier Oct. 11, 2022, 3:36 p.m. UTC | #4
On 10/11/22 17:09, Jon Mason wrote:
> On Tue, Oct 11, 2022 at 03:26:36PM +0200, Jerome Forissier wrote:
>>
>>
>> On 10/11/22 15:07, Ross Burton wrote:
>>> On large systems, using all of the CPUs and 50% of the RAM when xz
>>> compressing packages is actively harmful because it will happily use up
>>> to that limit.
>>>
>>> Signed-off-by: Ross Burton <ross.burton@arm.com>
>>> ---
>>>  ci/base.yml | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/ci/base.yml b/ci/base.yml
>>> index 3f644c1e..3619b8f2 100644
>>> --- a/ci/base.yml
>>> +++ b/ci/base.yml
>>> @@ -30,6 +30,9 @@ local_conf_header:
>>>      CONF_VERSION = "2"
>>>      BB_NUMBER_THREADS = "16"
>>>      PARALLEL_MAKE = "-j16"
>>
>> That is arguably wrong already, because it restricts performance on >16-core
>> systems. I, for example, routinely build on 32-core AArch64 and 36-core
>> x86_64 machines. Yocto builds are painful enough that I don't want to spend
>> more time than necessary due to wrong settings ;-)
> 
> Are you running Gitlab CI on meta-arm too?  I thought I was the only
> one running it outside of Arm internal.

No, not meta-arm directly, but some layer [1] which used meta-arm as a template
[2] and includes meta-arm as well as other things.

[1] https://gitlab.com/Linaro/trustedsubstrate/meta-ts
[2] https://gitlab.com/Linaro/trustedsubstrate/meta-ts/-/commit/52985d29fb0332d9479d838a86e098f54be1bef6

> 
> I dislike this for the same reasoning.  We're discussing making this a
> Gitlab CI variable, with a default that doesn't do the limiting.

Sounds like a good idea.
  
>>> +    XZ_MEMLIMIT = "25%"
>>> +    XZ_THREADS = "16"
>>> +    ZSTD_THREADS = "16"
>>
>> Another hard-coded limit which can't be optimal on all systems obviously.
>> I am talking about the thread limits -- the memory limit is relative so it's
>> probably reasonable.
>> What's the problem with the default? You said "actively harmful", what
>> does it mean? Performance drop? By how much? What happens if you introduce
>> only the XZ_MEMLIMIT?
> 
> This was using 256 cores on our internal build servers,

256 cores!? I'm jealous :) And don't tell how much RAM you have! :D
 
> which was
> slowing our CI pipelines significantly.  I'll investigate doing the
> gitlab ci variables sooner now.

Cool, thanks.
Ross Burton Oct. 11, 2022, 3:53 p.m. UTC | #5
On 11 Oct 2022, at 14:26, Jerome Forissier <jerome.forissier@linaro.org> wrote:
> That is arguably wrong already, because it restricts performance on >16-core
> systems. I, for example, routinely build on 32-core AArch64 and 36-core
> x86_64 machines. Yocto builds are painful enough that I don't want to spend
> more time than necessary due to wrong settings ;-)

Hypothetically, sure.  Do you have numbers to prove that setting all of those to 16 is noticeably slower than setting them to 32?

Ross
Ross Burton Oct. 11, 2022, 6:43 p.m. UTC | #6
On 11 Oct 2022, at 16:53, Ross Burton via lists.yoctoproject.org <ross.burton=arm.com@lists.yoctoproject.org> wrote:
> 
> On 11 Oct 2022, at 14:26, Jerome Forissier <jerome.forissier@linaro.org> wrote:
>> That is arguably wrong already, because it restricts performance on >16 core
>> systems. I for example routinely build on a 32-core Aarch64 and 36-core
>> x86_64 machines. Yocto builds are painful enough that I don't want to spend
>> more time than necessary due to wrong settings ;-)
> 
> Hypothetically, sure.  Do you have numbers to prove that setting all of those to 16 is noticeably slower than setting them to 32?

I just did a couple of builds on a workstation we’re trialing: 32-core Threadripper, 64GB RAM, NVMe storage.  The test case was the tip of poky kirkstone, building core-image-sato from a populated DL_DIR but no sstate.  With all threading set to 32 it built in 30 minutes, and with all threading set to 16 it built in 32 minutes.

So in this test, there was a marginal gain.  Looking at the buildstats, it looks like the bulk of that came from rust-llvm-native.

However, the flip side of “just use the number of cores” is that it can sometimes be a terrible thing to do.  I’ve already added a change to oe-core to cap the number of processors returned at 64, a number chosen as high enough that it still seems like “a lot” but not the real number that some systems can give.  Concrete example: a ThunderX2 has 64 physical cores with 4-way hardware threading, so lscpu says 256 cores.  I could run a benchmark to see how that behaves compared to 32 on the same hardware, but I suspect it’s actually slower.
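The capping idea can be sketched in a few lines of shell (the 64 figure comes from the paragraph above; `nproc` stands in here for whatever the real oe-core code queries):

```shell
# Report at most 64 CPUs, however many the hardware actually exposes
# (e.g. 256 on a 4-way SMT ThunderX2).
cores=$(nproc)
capped=$(( cores < 64 ? cores : 64 ))
echo "capped CPU count: $capped"
```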

So yeah, it’s a balance.  But at the end of the day, this is our CI setup. We should let the top-level “thread count” be tunable per-runner via a variable because some people (looks at Jon) like using inappropriate hardware for their personal lab, but reducing the number of threads and memory usage of XZ should help even out the load when compressing large files.

Oh, and:
> 256 cores!? I'm jealous :) And don't tell how much RAM you have! :D

256GB. :)

Right now, 200GB of that is disk cache.

$ cat /proc/meminfo
MemTotal:       263480312 kB
MemFree:        11203792 kB
MemAvailable:   253022088 kB
Buffers:        25498024 kB
Cached:         200004092 kB

Ross

Patch

diff --git a/ci/base.yml b/ci/base.yml
index 3f644c1e..3619b8f2 100644
--- a/ci/base.yml
+++ b/ci/base.yml
@@ -30,6 +30,9 @@  local_conf_header:
     CONF_VERSION = "2"
     BB_NUMBER_THREADS = "16"
     PARALLEL_MAKE = "-j16"
+    XZ_MEMLIMIT = "25%"
+    XZ_THREADS = "16"
+    ZSTD_THREADS = "16"
     LICENSE_FLAGS_ACCEPTED += "Arm-FVP-EULA"
   setup: |
     PACKAGE_CLASSES = "package_ipk"