
[kirkstone] expat: fix CVE-2023-52425

Message ID 20240809123902.17701-1-mail2szahir@gmail.com
State Superseded, archived
Commit 1bdcd10930a2998f6bbe56b3ba4c9b6c91203b39
Delegated to: Steve Sakoman
Series [kirkstone] expat: fix CVE-2023-52425

Commit Message

aszh07 Aug. 9, 2024, 12:39 p.m. UTC
libexpat through 2.5.0 allows a denial of service
(resource consumption) because many full reparsings
are required in the case of a large token for which
multiple buffer fills are needed.

References:
https://security-tracker.debian.org/tracker/CVE-2023-52425
https://ubuntu.com/security/CVE-2023-52425
https://packages.ubuntu.com/mantic/expat
https://github.com/libexpat/libexpat/pull/789

Signed-off-by: Bindu Bhabu <bhabu.bindu@kpit.com>
---
 .../expat/expat/CVE-2023-52425.patch          | 581 ++++++++++++++++++
 meta/recipes-core/expat/expat_2.5.0.bb        |   1 +
 2 files changed, 582 insertions(+)
 create mode 100644 meta/recipes-core/expat/expat/CVE-2023-52425.patch

Comments

Steve Sakoman Aug. 9, 2024, 2:20 p.m. UTC | #1
On Fri, Aug 9, 2024 at 5:39 AM aszh07 via lists.openembedded.org
<mail2szahir=gmail.com@lists.openembedded.org> wrote:
>
> libexpat through 2.5.0 allows a denial of service
> (resource consumption) because many full reparsings
> are required in the case of a large token for which
> multiple buffer fills are needed.
>
> References:
> https://security-tracker.debian.org/tracker/CVE-2023-52425
> https://ubuntu.com/security/CVE-2023-52425
> https://packages.ubuntu.com/mantic/expat
> https://github.com/libexpat/libexpat/pull/789
>
> Signed-off-by: Bindu Bhabu <bhabu.bindu@kpit.com>
> ---
>  .../expat/expat/CVE-2023-52425.patch          | 581 ++++++++++++++++++
>  meta/recipes-core/expat/expat_2.5.0.bb        |   1 +
>  2 files changed, 582 insertions(+)
>  create mode 100644 meta/recipes-core/expat/expat/CVE-2023-52425.patch
>
> diff --git a/meta/recipes-core/expat/expat/CVE-2023-52425.patch b/meta/recipes-core/expat/expat/CVE-2023-52425.patch
> new file mode 100644
> index 0000000000..00bc464173
> --- /dev/null
> +++ b/meta/recipes-core/expat/expat/CVE-2023-52425.patch
> @@ -0,0 +1,581 @@
> +Backport of https://github.com/libexpat/libexpat/pull/789
> +
> +From 9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1 Mon Sep 17 00:00:00 2001
> +From: Snild Dolkow <snild@sony.com>
> +Date: Thu, 17 Aug 2023 16:25:26 +0200
> +Subject: [PATCH] Skip parsing after repeated partials on the same token
> +MIME-Version: 1.0
> +Content-Type: text/plain; charset=UTF-8
> +Content-Transfer-Encoding: 8bit
> +
> +When the parse buffer contains the starting bytes of a token but not
> +all of them, we cannot parse the token to completion. We call this a
> +partial token.  When this happens, the parse position is reset to the
> +start of the token, and the parse() call returns. The client is then
> +expected to provide more data and call parse() again.
> +
> +In extreme cases, this means that the bytes of a token may be parsed
> +many times: once for every buffer refill required before the full token
> +is present in the buffer.
> +
> +Math:
> +  Assume there's a token of T bytes
> +  Assume the client fills the buffer in chunks of X bytes
> +  We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=)
> +  That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression)
> +  While it is alleviated by larger refills, this amounts to O(T²)
> +
> +Expat grows its internal buffer by doubling it when necessary, but has
> +no way to inform the client about how much space is available. Instead,
> +we add a heuristic that skips parsing when we've repeatedly stopped on
> +an incomplete token. Specifically:
> +
> + * Only try to parse if we have a certain amount of data buffered
> + * Every time we stop on an incomplete token, double the threshold
> + * As soon as any token completes, the threshold is reset
> +
> +This means that when we get stuck on an incomplete token, the threshold
> +grows exponentially, effectively making the client perform larger buffer
> +fills, limiting how many times we can end up re-parsing the same bytes.
> +
> +Math:
> +  Assume there's a token of T bytes
> +  Assume the client fills the buffer in chunks of X bytes
> +  We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger)
> +  That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X
> +  This is equal to 2T-X, which amounts to O(T)
> +
> +We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to
> +increase performance further, at the cost of further increasing the
> +risk of growing the buffer more than necessary. This can easily be
> +adjusted in the future, if desired.
> +
> +This is all completely transparent to the client, except for:
> +1. possible delay of some callbacks (when our heuristic overshoots)
> +2. apps that never do isFinal=XML_TRUE could miss data at the end
> +
> +For the affected testdata, this change shows a 100-400x speedup.
> +The recset.xml benchmark shows no clear change either way.
> +
> +Before:
> +benchmark -n ../testdata/largefiles/recset.xml 65535 3
> +  3 loops, with buffer size 65535. Average time per loop: 0.270223
> +benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 15.033048
> +benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.018027
> +benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 11.775362
> +benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 11.711414
> +benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.019362
> +
> +After:
> +./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3
> +  3 loops, with buffer size 65535. Average time per loop: 0.269030
> +./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.044794
> +./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.016377
> +./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.027022
> +./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.099360
> +./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
> +  3 loops, with buffer size 4096. Average time per loop: 0.017956
> +
> +CVE: CVE-2023-52425
> +Upstream-Status: Backport [http://archive.ubuntu.com/ubuntu/pool/main/e/expat/expat_2.5.0-2ubuntu0.1.debian.tar.xz]

Not sure why this was sent three times! But all three have the same
issue: Ubuntu is not the upstream for libexpat. Please reference the
actual upstream commit.

Thanks!

Steve

> +Comments: Hunks refreshed.
> +          Patch difference/summary has also been modified.
> +Signed-off-by: Bhabu Bindu <bhabu.bindu@kpit.com>
> +
> +---
> + lib/expat.h            | 5   +++++
> + lib/libexpat.def.cmake | 2   ++
> + lib/xmlparse.c         | 228 ++++++++++++++++++++++++++++++++++++++++++----------------------------------
> + xmlwf/xmlwf.c          | 20  ++++++++++++++++++++
> + xmlwf/xmlwf_helpgen.py | 4   ++++
> + 5 files changed, 156 insertions(+), 103 deletions(-)
> +
> +--- a/lib/expat.h
> ++++ b/lib/expat.h
> +@@ -16,6 +16,7 @@
> +    Copyright (c) 2016      Thomas Beutlich <tc@tbeu.de>
> +    Copyright (c) 2017      Rhodri James <rhodri@wildebeest.org.uk>
> +    Copyright (c) 2022      Thijs Schreijer <thijs@thijsschreijer.nl>
> ++   Copyright (c) 2023      Sony Corporation / Snild Dolkow <snild@sony.com>
> +    Licensed under the MIT license:
> +
> +    Permission is  hereby granted,  free of charge,  to any  person obtaining
> +@@ -1050,6 +1051,10 @@ XML_SetBillionLaughsAttackProtectionActi
> +     XML_Parser parser, unsigned long long activationThresholdBytes);
> + #endif
> +
> ++/* Added in Expat 2.6.0. */
> ++XMLPARSEAPI(XML_Bool)
> ++XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled);
> ++
> + /* Expat follows the semantic versioning convention.
> +    See http://semver.org.
> + */
> +--- a/lib/libexpat.def.cmake
> ++++ b/lib/libexpat.def.cmake
> +@@ -77,3 +77,5 @@ EXPORTS
> + ; added with version 2.4.0
> + @_EXPAT_COMMENT_DTD_OR_GE@ XML_SetBillionLaughsAttackProtectionActivationThreshold @69
> + @_EXPAT_COMMENT_DTD_OR_GE@ XML_SetBillionLaughsAttackProtectionMaximumAmplification @70
> ++; added with version 2.6.0
> ++  XML_SetReparseDeferralEnabled @71
> +--- a/lib/xmlparse.c
> ++++ b/lib/xmlparse.c
> +@@ -36,6 +36,7 @@
> +    Copyright (c) 2022      Samanta Navarro <ferivoz@riseup.net>
> +    Copyright (c) 2022      Jeffrey Walton <noloader@gmail.com>
> +    Copyright (c) 2022      Jann Horn <jannh@google.com>
> ++   Copyright (c) 2023      Sony Corporation / Snild Dolkow <snild@sony.com>
> +    Licensed under the MIT license:
> +
> +    Permission is  hereby granted,  free of charge,  to any  person obtaining
> +@@ -73,6 +74,7 @@
> + #  endif
> + #endif
> +
> ++#include <stdbool.h>
> + #include <stddef.h>
> + #include <string.h> /* memset(), memcpy() */
> + #include <assert.h>
> +@@ -196,6 +198,8 @@ typedef char ICHAR;
> + /* Do safe (NULL-aware) pointer arithmetic */
> + #define EXPAT_SAFE_PTR_DIFF(p, q) (((p) && (q)) ? ((p) - (q)) : 0)
> +
> ++#define EXPAT_MIN(a, b) (((a) < (b)) ? (a) : (b))
> ++
> + #include "internal.h"
> + #include "xmltok.h"
> + #include "xmlrole.h"
> +@@ -617,6 +621,9 @@ struct XML_ParserStruct {
> +   const char *m_bufferLim;
> +   XML_Index m_parseEndByteIndex;
> +   const char *m_parseEndPtr;
> ++  size_t m_partialTokenBytesBefore; /* used in heuristic to avoid O(n^2) */
> ++  XML_Bool m_reparseDeferralEnabled;
> ++  int m_lastBufferRequestSize;
> +   XML_Char *m_dataBuf;
> +   XML_Char *m_dataBufEnd;
> +   XML_StartElementHandler m_startElementHandler;
> +@@ -948,6 +955,46 @@ get_hash_secret_salt(XML_Parser parser)
> +   return parser->m_hash_secret_salt;
> + }
> +
> ++static enum XML_Error
> ++callProcessor(XML_Parser parser, const char *start, const char *end,
> ++              const char **endPtr) {
> ++  const size_t have_now = EXPAT_SAFE_PTR_DIFF(end, start);
> ++
> ++  if (parser->m_reparseDeferralEnabled
> ++      && ! parser->m_parsingStatus.finalBuffer) {
> ++    // Heuristic: don't try to parse a partial token again until the amount of
> ++    // available data has increased significantly.
> ++    const size_t had_before = parser->m_partialTokenBytesBefore;
> ++    // ...but *do* try anyway if we're close to causing a reallocation.
> ++    size_t available_buffer
> ++        = EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer);
> ++#if XML_CONTEXT_BYTES > 0
> ++    available_buffer -= EXPAT_MIN(available_buffer, XML_CONTEXT_BYTES);
> ++#endif
> ++    available_buffer
> ++        += EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd);
> ++    // m_lastBufferRequestSize is never assigned a value < 0, so the cast is ok
> ++    const bool enough
> ++        = (have_now >= 2 * had_before)
> ++          || ((size_t)parser->m_lastBufferRequestSize > available_buffer);
> ++
> ++    if (! enough) {
> ++      *endPtr = start; // callers may expect this to be set
> ++      return XML_ERROR_NONE;
> ++    }
> ++  }
> ++  const enum XML_Error ret = parser->m_processor(parser, start, end, endPtr);
> ++  if (ret == XML_ERROR_NONE) {
> ++    // if we consumed nothing, remember what we had on this parse attempt.
> ++    if (*endPtr == start) {
> ++      parser->m_partialTokenBytesBefore = have_now;
> ++    } else {
> ++      parser->m_partialTokenBytesBefore = 0;
> ++    }
> ++  }
> ++  return ret;
> ++}
> ++
> + static XML_Bool /* only valid for root parser */
> + startParsing(XML_Parser parser) {
> +   /* hash functions must be initialized before setContext() is called */
> +@@ -1129,6 +1176,9 @@ parserInit(XML_Parser parser, const XML_
> +   parser->m_bufferEnd = parser->m_buffer;
> +   parser->m_parseEndByteIndex = 0;
> +   parser->m_parseEndPtr = NULL;
> ++  parser->m_partialTokenBytesBefore = 0;
> ++  parser->m_reparseDeferralEnabled = XML_TRUE;
> ++  parser->m_lastBufferRequestSize = 0;
> +   parser->m_declElementType = NULL;
> +   parser->m_declAttributeId = NULL;
> +   parser->m_declEntity = NULL;
> +@@ -1298,6 +1348,7 @@ XML_ExternalEntityParserCreate(XML_Parse
> +      to worry which hash secrets each table has.
> +   */
> +   unsigned long oldhash_secret_salt;
> ++  XML_Bool oldReparseDeferralEnabled;
> +
> +   /* Validate the oldParser parameter before we pull everything out of it */
> +   if (oldParser == NULL)
> +@@ -1342,6 +1393,7 @@ XML_ExternalEntityParserCreate(XML_Parse
> +      to worry which hash secrets each table has.
> +   */
> +   oldhash_secret_salt = parser->m_hash_secret_salt;
> ++  oldReparseDeferralEnabled = parser->m_reparseDeferralEnabled;
> +
> + #ifdef XML_DTD
> +   if (! context)
> +@@ -1394,6 +1446,7 @@ XML_ExternalEntityParserCreate(XML_Parse
> +   parser->m_defaultExpandInternalEntities = oldDefaultExpandInternalEntities;
> +   parser->m_ns_triplets = oldns_triplets;
> +   parser->m_hash_secret_salt = oldhash_secret_salt;
> ++  parser->m_reparseDeferralEnabled = oldReparseDeferralEnabled;
> +   parser->m_parentParser = oldParser;
> + #ifdef XML_DTD
> +   parser->m_paramEntityParsing = oldParamEntityParsing;
> +@@ -1848,55 +1901,8 @@ XML_Parse(XML_Parser parser, const char
> +     parser->m_parsingStatus.parsing = XML_PARSING;
> +   }
> +
> +-  if (len == 0) {
> +-    parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
> +-    if (! isFinal)
> +-      return XML_STATUS_OK;
> +-    parser->m_positionPtr = parser->m_bufferPtr;
> +-    parser->m_parseEndPtr = parser->m_bufferEnd;
> +-
> +-    /* If data are left over from last buffer, and we now know that these
> +-       data are the final chunk of input, then we have to check them again
> +-       to detect errors based on that fact.
> +-    */
> +-    parser->m_errorCode
> +-        = parser->m_processor(parser, parser->m_bufferPtr,
> +-                              parser->m_parseEndPtr, &parser->m_bufferPtr);
> +-
> +-    if (parser->m_errorCode == XML_ERROR_NONE) {
> +-      switch (parser->m_parsingStatus.parsing) {
> +-      case XML_SUSPENDED:
> +-        /* It is hard to be certain, but it seems that this case
> +-         * cannot occur.  This code is cleaning up a previous parse
> +-         * with no new data (since len == 0).  Changing the parsing
> +-         * state requires getting to execute a handler function, and
> +-         * there doesn't seem to be an opportunity for that while in
> +-         * this circumstance.
> +-         *
> +-         * Given the uncertainty, we retain the code but exclude it
> +-         * from coverage tests.
> +-         *
> +-         * LCOV_EXCL_START
> +-         */
> +-        XmlUpdatePosition(parser->m_encoding, parser->m_positionPtr,
> +-                          parser->m_bufferPtr, &parser->m_position);
> +-        parser->m_positionPtr = parser->m_bufferPtr;
> +-        return XML_STATUS_SUSPENDED;
> +-        /* LCOV_EXCL_STOP */
> +-      case XML_INITIALIZED:
> +-      case XML_PARSING:
> +-        parser->m_parsingStatus.parsing = XML_FINISHED;
> +-        /* fall through */
> +-      default:
> +-        return XML_STATUS_OK;
> +-      }
> +-    }
> +-    parser->m_eventEndPtr = parser->m_eventPtr;
> +-    parser->m_processor = errorProcessor;
> +-    return XML_STATUS_ERROR;
> +-  }
> +-#ifndef XML_CONTEXT_BYTES
> +-  else if (parser->m_bufferPtr == parser->m_bufferEnd) {
> ++#if XML_CONTEXT_BYTES == 0
> ++  if (parser->m_bufferPtr == parser->m_bufferEnd) {
> +     const char *end;
> +     int nLeftOver;
> +     enum XML_Status result;
> +@@ -1907,12 +1913,15 @@ XML_Parse(XML_Parser parser, const char
> +       parser->m_processor = errorProcessor;
> +       return XML_STATUS_ERROR;
> +     }
> ++    // though this isn't a buffer request, we assume that `len` is the app's
> ++    // preferred buffer fill size, and therefore save it here.
> ++    parser->m_lastBufferRequestSize = len;
> +     parser->m_parseEndByteIndex += len;
> +     parser->m_positionPtr = s;
> +     parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
> +
> +     parser->m_errorCode
> +-        = parser->m_processor(parser, s, parser->m_parseEndPtr = s + len, &end);
> ++        = callProcessor(parser, s, parser->m_parseEndPtr = s + len, &end);
> +
> +     if (parser->m_errorCode != XML_ERROR_NONE) {
> +       parser->m_eventEndPtr = parser->m_eventPtr;
> +@@ -1939,23 +1948,25 @@ XML_Parse(XML_Parser parser, const char
> +                       &parser->m_position);
> +     nLeftOver = s + len - end;
> +     if (nLeftOver) {
> +-      if (parser->m_buffer == NULL
> +-          || nLeftOver > parser->m_bufferLim - parser->m_buffer) {
> +-        /* avoid _signed_ integer overflow */
> +-        char *temp = NULL;
> +-        const int bytesToAllocate = (int)((unsigned)len * 2U);
> +-        if (bytesToAllocate > 0) {
> +-          temp = (char *)REALLOC(parser, parser->m_buffer, bytesToAllocate);
> +-        }
> +-        if (temp == NULL) {
> +-          parser->m_errorCode = XML_ERROR_NO_MEMORY;
> +-          parser->m_eventPtr = parser->m_eventEndPtr = NULL;
> +-          parser->m_processor = errorProcessor;
> +-          return XML_STATUS_ERROR;
> +-        }
> +-        parser->m_buffer = temp;
> +-        parser->m_bufferLim = parser->m_buffer + bytesToAllocate;
> ++      // Back up and restore the parsing status to avoid XML_ERROR_SUSPENDED
> ++      // (and XML_ERROR_FINISHED) from XML_GetBuffer.
> ++      const enum XML_Parsing originalStatus = parser->m_parsingStatus.parsing;
> ++      parser->m_parsingStatus.parsing = XML_PARSING;
> ++      void *const temp = XML_GetBuffer(parser, nLeftOver);
> ++      parser->m_parsingStatus.parsing = originalStatus;
> ++      // GetBuffer may have overwritten this, but we want to remember what the
> ++      // app requested, not how many bytes were left over after parsing.
> ++      parser->m_lastBufferRequestSize = len;
> ++      if (temp == NULL) {
> ++        // NOTE: parser->m_errorCode has already been set by XML_GetBuffer().
> ++        parser->m_eventPtr = parser->m_eventEndPtr = NULL;
> ++        parser->m_processor = errorProcessor;
> ++        return XML_STATUS_ERROR;
> +       }
> ++      // Since we know that the buffer was empty and XML_CONTEXT_BYTES is 0, we
> ++      // don't have any data to preserve, and can copy straight into the start
> ++      // of the buffer rather than the GetBuffer return pointer (which may be
> ++      // pointing further into the allocated buffer).
> +       memcpy(parser->m_buffer, end, nLeftOver);
> +     }
> +     parser->m_bufferPtr = parser->m_buffer;
> +@@ -1966,16 +1977,15 @@ XML_Parse(XML_Parser parser, const char
> +     parser->m_eventEndPtr = parser->m_bufferPtr;
> +     return result;
> +   }
> +-#endif /* not defined XML_CONTEXT_BYTES */
> +-  else {
> +-    void *buff = XML_GetBuffer(parser, len);
> +-    if (buff == NULL)
> +-      return XML_STATUS_ERROR;
> +-    else {
> +-      memcpy(buff, s, len);
> +-      return XML_ParseBuffer(parser, len, isFinal);
> +-    }
> ++#endif /* XML_CONTEXT_BYTES == 0 */
> ++  void *buff = XML_GetBuffer(parser, len);
> ++  if (buff == NULL)
> ++    return XML_STATUS_ERROR;
> ++  if (len > 0) {
> ++    assert(s != NULL); // make sure s==NULL && len!=0 was rejected above
> ++    memcpy(buff, s, len);
> +   }
> ++  return XML_ParseBuffer(parser, len, isFinal);
> + }
> +
> + enum XML_Status XMLCALL
> +@@ -2015,8 +2025,8 @@ XML_ParseBuffer(XML_Parser parser, int l
> +   parser->m_parseEndByteIndex += len;
> +   parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
> +
> +-  parser->m_errorCode = parser->m_processor(
> +-      parser, start, parser->m_parseEndPtr, &parser->m_bufferPtr);
> ++  parser->m_errorCode = callProcessor(parser, start, parser->m_parseEndPtr,
> ++                                      &parser->m_bufferPtr);
> +
> +   if (parser->m_errorCode != XML_ERROR_NONE) {
> +     parser->m_eventEndPtr = parser->m_eventPtr;
> +@@ -2061,8 +2071,12 @@ XML_GetBuffer(XML_Parser parser, int len
> +   default:;
> +   }
> +
> +-  if (len > EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd)) {
> +-#ifdef XML_CONTEXT_BYTES
> ++  // whether or not the request succeeds, `len` seems to be the app's preferred
> ++  // buffer fill size; remember it.
> ++  parser->m_lastBufferRequestSize = len;
> ++  if (len > EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd)
> ++      || parser->m_buffer == NULL) {
> ++#if XML_CONTEXT_BYTES > 0
> +     int keep;
> + #endif /* defined XML_CONTEXT_BYTES */
> +     /* Do not invoke signed arithmetic overflow: */
> +@@ -2083,10 +2097,11 @@ XML_GetBuffer(XML_Parser parser, int len
> +       return NULL;
> +     }
> +     neededSize += keep;
> +-#endif /* defined XML_CONTEXT_BYTES */
> +-    if (neededSize
> +-        <= EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer)) {
> +-#ifdef XML_CONTEXT_BYTES
> ++#endif /* XML_CONTEXT_BYTES > 0 */
> ++    if (parser->m_buffer && parser->m_bufferPtr
> ++        && neededSize
> ++               <= EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer)) {
> ++#if XML_CONTEXT_BYTES > 0
> +       if (keep < EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer)) {
> +         int offset
> +             = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer)
> +@@ -2099,19 +2114,17 @@ XML_GetBuffer(XML_Parser parser, int len
> +         parser->m_bufferPtr -= offset;
> +       }
> + #else
> +-      if (parser->m_buffer && parser->m_bufferPtr) {
> +-        memmove(parser->m_buffer, parser->m_bufferPtr,
> +-                EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr));
> +-        parser->m_bufferEnd
> +-            = parser->m_buffer
> +-              + EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr);
> +-        parser->m_bufferPtr = parser->m_buffer;
> +-      }
> +-#endif /* not defined XML_CONTEXT_BYTES */
> ++      memmove(parser->m_buffer, parser->m_bufferPtr,
> ++              EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr));
> ++      parser->m_bufferEnd
> ++          = parser->m_buffer
> ++            + EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr);
> ++      parser->m_bufferPtr = parser->m_buffer;
> ++#endif /* XML_CONTEXT_BYTES > 0 */
> +     } else {
> +       char *newBuf;
> +       int bufferSize
> +-          = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferPtr);
> ++          = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer);
> +       if (bufferSize == 0)
> +         bufferSize = INIT_BUFFER_SIZE;
> +       do {
> +@@ -2208,7 +2221,7 @@ XML_ResumeParser(XML_Parser parser) {
> +   }
> +   parser->m_parsingStatus.parsing = XML_PARSING;
> +
> +-  parser->m_errorCode = parser->m_processor(
> ++  parser->m_errorCode = callProcessor(
> +       parser, parser->m_bufferPtr, parser->m_parseEndPtr, &parser->m_bufferPtr);
> +
> +   if (parser->m_errorCode != XML_ERROR_NONE) {
> +@@ -2576,6 +2576,15 @@ XML_SetBillionLaughsAttackProtectionActi
> + }
> + #endif /* XML_GE == 1 */
> +
> ++XML_Bool XMLCALL
> ++XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled) {
> ++  if (parser != NULL && (enabled == XML_TRUE || enabled == XML_FALSE)) {
> ++    parser->m_reparseDeferralEnabled = enabled;
> ++    return XML_TRUE;
> ++  }
> ++  return XML_FALSE;
> ++}
> ++
> + /* Initially tag->rawName always points into the parse buffer;
> +    for those TAG instances opened while the current parse buffer was
> +    processed, and not yet closed, we need to store tag->rawName in a more
> +@@ -4497,15 +4506,15 @@ entityValueInitProcessor(XML_Parser pars
> +       parser->m_processor = entityValueProcessor;
> +       return entityValueProcessor(parser, next, end, nextPtr);
> +     }
> +-    /* If we are at the end of the buffer, this would cause XmlPrologTok to
> +-       return XML_TOK_NONE on the next call, which would then cause the
> +-       function to exit with *nextPtr set to s - that is what we want for other
> +-       tokens, but not for the BOM - we would rather like to skip it;
> +-       then, when this routine is entered the next time, XmlPrologTok will
> +-       return XML_TOK_INVALID, since the BOM is still in the buffer
> ++    /* XmlPrologTok has now set the encoding based on the BOM it found, and we
> ++       must move s and nextPtr forward to consume the BOM.
> ++
> ++       If we didn't, and got XML_TOK_NONE from the next XmlPrologTok call, we
> ++       would leave the BOM in the buffer and return. On the next call to this
> ++       function, our XmlPrologTok call would return XML_TOK_INVALID, since it
> ++       is not valid to have multiple BOMs.
> +     */
> +-    else if (tok == XML_TOK_BOM && next == end
> +-             && ! parser->m_parsingStatus.finalBuffer) {
> ++    else if (tok == XML_TOK_BOM) {
> + #  if XML_GE == 1
> +       if (! accountingDiffTolerated(parser, tok, s, next, __LINE__,
> +                                     XML_ACCOUNT_DIRECT)) {
> +@@ -4500,7 +4522,7 @@ entityValueInitProcessor(XML_Parser pars
> + #  endif
> +
> +       *nextPtr = next;
> +-      return XML_ERROR_NONE;
> ++      s = next;
> +     }
> +     /* If we get this token, we have the start of what might be a
> +        normal tag, but not a declaration (i.e. it doesn't begin with
> +--- a/xmlwf/xmlwf.c
> ++++ b/xmlwf/xmlwf.c
> +@@ -914,6 +914,9 @@ usage(const XML_Char *prog, int rc) {
> +       T("  -a FACTOR     set maximum tolerated [a]mplification factor (default: 100.0)\n")
> +       T("  -b BYTES      set number of output [b]ytes needed to activate (default: 8 MiB)\n")
> +       T("\n")
> ++      T("reparse deferral:\n")
> ++      T("  -q             disable reparse deferral, and allow [q]uadratic parse runtime with large tokens\n")
> ++      T("\n")
> +       T("info arguments:\n")
> +       T("  -h            show this [h]elp message and exit\n")
> +       T("  -v            show program's [v]ersion number and exit\n")
> +@@ -967,6 +970,8 @@ tmain(int argc, XML_Char **argv) {
> +   unsigned long long attackThresholdBytes;
> +   XML_Bool attackThresholdGiven = XML_FALSE;
> +
> ++  XML_Bool disableDeferral = XML_FALSE;
> ++
> +   int exitCode = XMLWF_EXIT_SUCCESS;
> +   enum XML_ParamEntityParsing paramEntityParsing
> +       = XML_PARAM_ENTITY_PARSING_NEVER;
> +@@ -1089,6 +1094,11 @@ tmain(int argc, XML_Char **argv) {
> + #endif
> +       break;
> +     }
> ++    case T('q'): {
> ++      disableDeferral = XML_TRUE;
> ++      j++;
> ++      break;
> ++    }
> +     case T('\0'):
> +       if (j > 1) {
> +         i++;
> +@@ -1134,6 +1144,16 @@ tmain(int argc, XML_Char **argv) {
> + #endif
> +     }
> +
> ++    if (disableDeferral) {
> ++      const XML_Bool success = XML_SetReparseDeferralEnabled(parser, XML_FALSE);
> ++      if (! success) {
> ++        // This prevents tperror(..) from reporting misleading "[..]: Success"
> ++        errno = EINVAL;
> ++        tperror(T("Failed to disable reparse deferral"));
> ++        exit(XMLWF_EXIT_INTERNAL_ERROR);
> ++      }
> ++    }
> ++
> +     if (requireStandalone)
> +       XML_SetNotStandaloneHandler(parser, notStandalone);
> +     XML_SetParamEntityParsing(parser, paramEntityParsing);
> +--- a/xmlwf/xmlwf_helpgen.py
> ++++ b/xmlwf/xmlwf_helpgen.py
> +@@ -81,6 +81,10 @@ billion_laughs.add_argument('-a', metava
> +                             help='set maximum tolerated [a]mplification factor (default: 100.0)')
> + billion_laughs.add_argument('-b', metavar='BYTES', help='set number of output [b]ytes needed to activate (default: 8 MiB)')
> +
> ++reparse_deferral = parser.add_argument_group('reparse deferral')
> ++reparse_deferral.add_argument('-q', metavar='FACTOR',
> ++                            help='disable reparse deferral, and allow [q]uadratic parse runtime with large tokens')
> ++
> + parser.add_argument('files', metavar='FILE', nargs='*', help='file to process (default: STDIN)')
> +
> + info = parser.add_argument_group('info arguments')
> diff --git a/meta/recipes-core/expat/expat_2.5.0.bb b/meta/recipes-core/expat/expat_2.5.0.bb
> index 31e989cfe2..09bc7a0a0b 100644
> --- a/meta/recipes-core/expat/expat_2.5.0.bb
> +++ b/meta/recipes-core/expat/expat_2.5.0.bb
> @@ -22,6 +22,7 @@ SRC_URI = "https://github.com/libexpat/libexpat/releases/download/R_${VERSION_TA
>            file://CVE-2023-52426-009.patch \
>            file://CVE-2023-52426-010.patch \
>            file://CVE-2023-52426-011.patch \
> +           file://CVE-2023-52425.patch \
>             "
>
>  UPSTREAM_CHECK_URI = "https://github.com/libexpat/libexpat/releases/"
> --
> 2.17.1
>
>
>
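
The two "Math:" blocks in the quoted commit message can be checked with a
small stand-alone model. The sketch below is not code from libexpat: the
function name bytes_scanned() is hypothetical, it models only the
"have_now >= 2 * had_before" test from callProcessor(), it ignores the
"close to causing a reallocation" clause, and it assumes the chunk that
completes the token is always parsed (as it would be with isFinal set).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not libexpat code.
 *
 * Returns how many bytes the tokenizer would be handed in total while a
 * single token of token_len bytes arrives in chunks of chunk_len bytes.
 * Every parse attempt before the token is complete restarts at the first
 * byte of the token, so it costs have_now bytes.
 *
 * deferral == 0: parse after every chunk (behaviour before the patch).
 * deferral != 0: skip the attempt unless the buffered amount has at least
 *                doubled since the last failed attempt.
 */
static unsigned long long
bytes_scanned(size_t token_len, size_t chunk_len, int deferral) {
  unsigned long long total = 0;
  size_t have_now = 0;   /* bytes of the token buffered so far */
  size_t had_before = 0; /* bytes buffered at the last failed attempt */

  assert(chunk_len > 0);

  while (have_now < token_len) {
    const size_t room = token_len - have_now;
    have_now += (chunk_len < room) ? chunk_len : room;

    /* Assumption: the chunk that completes the token is always parsed. */
    const int token_complete = (have_now == token_len);
    if (deferral && ! token_complete && have_now < 2 * had_before)
      continue; /* deferred: wait for more data */

    total += have_now;
    had_before = have_now;
  }
  return total;
}
```

With T = 8X (token_len 32768, chunk_len 4096) the model gives 36X bytes
without deferral and 15X with it, matching the (m²+m)X/2 and 2T-X figures
in the commit message.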

Patch

diff --git a/meta/recipes-core/expat/expat/CVE-2023-52425.patch b/meta/recipes-core/expat/expat/CVE-2023-52425.patch
new file mode 100644
index 0000000000..00bc464173
--- /dev/null
+++ b/meta/recipes-core/expat/expat/CVE-2023-52425.patch
@@ -0,0 +1,581 @@ 
+Backport of https://github.com/libexpat/libexpat/pull/789
+
+From 9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1 Mon Sep 17 00:00:00 2001
+From: Snild Dolkow <snild@sony.com>
+Date: Thu, 17 Aug 2023 16:25:26 +0200
+Subject: [PATCH] Skip parsing after repeated partials on the same token
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When the parse buffer contains the starting bytes of a token but not
+all of them, we cannot parse the token to completion. We call this a
+partial token.  When this happens, the parse position is reset to the
+start of the token, and the parse() call returns. The client is then
+expected to provide more data and call parse() again.
+
+In extreme cases, this means that the bytes of a token may be parsed
+many times: once for every buffer refill required before the full token
+is present in the buffer.
+
+Math:
+  Assume there's a token of T bytes
+  Assume the client fills the buffer in chunks of X bytes
+  We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=)
+  That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression)
+  While it is alleviated by larger refills, this amounts to O(T²)
+
+Expat grows its internal buffer by doubling it when necessary, but has
+no way to inform the client about how much space is available. Instead,
+we add a heuristic that skips parsing when we've repeatedly stopped on
+an incomplete token. Specifically:
+
+ * Only try to parse if we have a certain amount of data buffered
+ * Every time we stop on an incomplete token, double the threshold
+ * As soon as any token completes, the threshold is reset
+
+This means that when we get stuck on an incomplete token, the threshold
+grows exponentially, effectively making the client perform larger buffer
+fills, limiting how many times we can end up re-parsing the same bytes.
+
+Math:
+  Assume there's a token of T bytes
+  Assume the client fills the buffer in chunks of X bytes
+  We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger)
+  That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X
+  This is equal to 2T-X, which amounts to O(T)
+
+We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to
+increase performance further, at the cost of further increasing the
+risk of growing the buffer more than necessary. This can easily be
+adjusted in the future, if desired.
+
+This is all completely transparent to the client, except for:
+1. possible delay of some callbacks (when our heuristic overshoots)
+2. apps that never do isFinal=XML_TRUE could miss data at the end
+
+For the attr, comment and tag testdata, this change shows a roughly 100-400x
+speedup. The cdata, text and recset.xml benchmarks show no clear change.
+
+Before:
+benchmark -n ../testdata/largefiles/recset.xml 65535 3
+  3 loops, with buffer size 65535. Average time per loop: 0.270223
+benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 15.033048
+benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.018027
+benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 11.775362
+benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 11.711414
+benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.019362
+
+After:
+./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3
+  3 loops, with buffer size 65535. Average time per loop: 0.269030
+./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.044794
+./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.016377
+./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.027022
+./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.099360
+./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
+  3 loops, with buffer size 4096. Average time per loop: 0.017956
+
+CVE: CVE-2023-52425
+Upstream-Status: Backport [http://archive.ubuntu.com/ubuntu/pool/main/e/expat/expat_2.5.0-2ubuntu0.1.debian.tar.xz]
+Comments: Hunks refreshed.
+          Patch difference/summary has also been modified.
+Signed-off-by: Bhabu Bindu <bhabu.bindu@kpit.com>
+
+---
+ lib/expat.h            | 5   +++++
+ lib/libexpat.def.cmake | 2   ++
+ lib/xmlparse.c         | 228 ++++++++++++++++++++++++++++++++++++++++++----------------------------------
+ xmlwf/xmlwf.c          | 20  ++++++++++++++++++++
+ xmlwf/xmlwf_helpgen.py | 4   ++++
+ 5 files changed, 156 insertions(+), 103 deletions(-)
+
+--- a/lib/expat.h
++++ b/lib/expat.h
+@@ -16,6 +16,7 @@
+    Copyright (c) 2016      Thomas Beutlich <tc@tbeu.de>
+    Copyright (c) 2017      Rhodri James <rhodri@wildebeest.org.uk>
+    Copyright (c) 2022      Thijs Schreijer <thijs@thijsschreijer.nl>
++   Copyright (c) 2023      Sony Corporation / Snild Dolkow <snild@sony.com>
+    Licensed under the MIT license:
+ 
+    Permission is  hereby granted,  free of charge,  to any  person obtaining
+@@ -1050,6 +1051,10 @@ XML_SetBillionLaughsAttackProtectionActi
+     XML_Parser parser, unsigned long long activationThresholdBytes);
+ #endif
+ 
++/* Added in Expat 2.6.0. */
++XMLPARSEAPI(XML_Bool)
++XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled);
++
+ /* Expat follows the semantic versioning convention.
+    See http://semver.org.
+ */
+--- a/lib/libexpat.def.cmake
++++ b/lib/libexpat.def.cmake
+@@ -77,3 +77,5 @@ EXPORTS
+ ; added with version 2.4.0
+ @_EXPAT_COMMENT_DTD_OR_GE@ XML_SetBillionLaughsAttackProtectionActivationThreshold @69
+ @_EXPAT_COMMENT_DTD_OR_GE@ XML_SetBillionLaughsAttackProtectionMaximumAmplification @70
++; added with version 2.6.0
++  XML_SetReparseDeferralEnabled @71
+--- a/lib/xmlparse.c
++++ b/lib/xmlparse.c
+@@ -36,6 +36,7 @@
+    Copyright (c) 2022      Samanta Navarro <ferivoz@riseup.net>
+    Copyright (c) 2022      Jeffrey Walton <noloader@gmail.com>
+    Copyright (c) 2022      Jann Horn <jannh@google.com>
++   Copyright (c) 2023      Sony Corporation / Snild Dolkow <snild@sony.com>
+    Licensed under the MIT license:
+ 
+    Permission is  hereby granted,  free of charge,  to any  person obtaining
+@@ -73,6 +74,7 @@
+ #  endif
+ #endif
+ 
++#include <stdbool.h>
+ #include <stddef.h>
+ #include <string.h> /* memset(), memcpy() */
+ #include <assert.h>
+@@ -196,6 +198,8 @@ typedef char ICHAR;
+ /* Do safe (NULL-aware) pointer arithmetic */
+ #define EXPAT_SAFE_PTR_DIFF(p, q) (((p) && (q)) ? ((p) - (q)) : 0)
+ 
++#define EXPAT_MIN(a, b) (((a) < (b)) ? (a) : (b))
++
+ #include "internal.h"
+ #include "xmltok.h"
+ #include "xmlrole.h"
+@@ -617,6 +621,9 @@ struct XML_ParserStruct {
+   const char *m_bufferLim;
+   XML_Index m_parseEndByteIndex;
+   const char *m_parseEndPtr;
++  size_t m_partialTokenBytesBefore; /* used in heuristic to avoid O(n^2) */
++  XML_Bool m_reparseDeferralEnabled;
++  int m_lastBufferRequestSize;
+   XML_Char *m_dataBuf;
+   XML_Char *m_dataBufEnd;
+   XML_StartElementHandler m_startElementHandler;
+@@ -948,6 +955,46 @@ get_hash_secret_salt(XML_Parser parser)
+   return parser->m_hash_secret_salt;
+ }
+ 
++static enum XML_Error
++callProcessor(XML_Parser parser, const char *start, const char *end,
++              const char **endPtr) {
++  const size_t have_now = EXPAT_SAFE_PTR_DIFF(end, start);
++
++  if (parser->m_reparseDeferralEnabled
++      && ! parser->m_parsingStatus.finalBuffer) {
++    // Heuristic: don't try to parse a partial token again until the amount of
++    // available data has increased significantly.
++    const size_t had_before = parser->m_partialTokenBytesBefore;
++    // ...but *do* try anyway if we're close to causing a reallocation.
++    size_t available_buffer
++        = EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer);
++#if XML_CONTEXT_BYTES > 0
++    available_buffer -= EXPAT_MIN(available_buffer, XML_CONTEXT_BYTES);
++#endif
++    available_buffer
++        += EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd);
++    // m_lastBufferRequestSize is never assigned a value < 0, so the cast is ok
++    const bool enough
++        = (have_now >= 2 * had_before)
++          || ((size_t)parser->m_lastBufferRequestSize > available_buffer);
++
++    if (! enough) {
++      *endPtr = start; // callers may expect this to be set
++      return XML_ERROR_NONE;
++    }
++  }
++  const enum XML_Error ret = parser->m_processor(parser, start, end, endPtr);
++  if (ret == XML_ERROR_NONE) {
++    // if we consumed nothing, remember what we had on this parse attempt.
++    if (*endPtr == start) {
++      parser->m_partialTokenBytesBefore = have_now;
++    } else {
++      parser->m_partialTokenBytesBefore = 0;
++    }
++  }
++  return ret;
++}
++
+ static XML_Bool /* only valid for root parser */
+ startParsing(XML_Parser parser) {
+   /* hash functions must be initialized before setContext() is called */
+@@ -1129,6 +1176,9 @@ parserInit(XML_Parser parser, const XML_
+   parser->m_bufferEnd = parser->m_buffer;
+   parser->m_parseEndByteIndex = 0;
+   parser->m_parseEndPtr = NULL;
++  parser->m_partialTokenBytesBefore = 0;
++  parser->m_reparseDeferralEnabled = XML_TRUE;
++  parser->m_lastBufferRequestSize = 0;
+   parser->m_declElementType = NULL;
+   parser->m_declAttributeId = NULL;
+   parser->m_declEntity = NULL;
+@@ -1298,6 +1348,7 @@ XML_ExternalEntityParserCreate(XML_Parse
+      to worry which hash secrets each table has.
+   */
+   unsigned long oldhash_secret_salt;
++  XML_Bool oldReparseDeferralEnabled;
+ 
+   /* Validate the oldParser parameter before we pull everything out of it */
+   if (oldParser == NULL)
+@@ -1342,6 +1393,7 @@ XML_ExternalEntityParserCreate(XML_Parse
+      to worry which hash secrets each table has.
+   */
+   oldhash_secret_salt = parser->m_hash_secret_salt;
++  oldReparseDeferralEnabled = parser->m_reparseDeferralEnabled;
+ 
+ #ifdef XML_DTD
+   if (! context)
+@@ -1394,6 +1446,7 @@ XML_ExternalEntityParserCreate(XML_Parse
+   parser->m_defaultExpandInternalEntities = oldDefaultExpandInternalEntities;
+   parser->m_ns_triplets = oldns_triplets;
+   parser->m_hash_secret_salt = oldhash_secret_salt;
++  parser->m_reparseDeferralEnabled = oldReparseDeferralEnabled;
+   parser->m_parentParser = oldParser;
+ #ifdef XML_DTD
+   parser->m_paramEntityParsing = oldParamEntityParsing;
+@@ -1848,55 +1901,8 @@ XML_Parse(XML_Parser parser, const char
+     parser->m_parsingStatus.parsing = XML_PARSING;
+   }
+ 
+-  if (len == 0) {
+-    parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
+-    if (! isFinal)
+-      return XML_STATUS_OK;
+-    parser->m_positionPtr = parser->m_bufferPtr;
+-    parser->m_parseEndPtr = parser->m_bufferEnd;
+-
+-    /* If data are left over from last buffer, and we now know that these
+-       data are the final chunk of input, then we have to check them again
+-       to detect errors based on that fact.
+-    */
+-    parser->m_errorCode
+-        = parser->m_processor(parser, parser->m_bufferPtr,
+-                              parser->m_parseEndPtr, &parser->m_bufferPtr);
+-
+-    if (parser->m_errorCode == XML_ERROR_NONE) {
+-      switch (parser->m_parsingStatus.parsing) {
+-      case XML_SUSPENDED:
+-        /* It is hard to be certain, but it seems that this case
+-         * cannot occur.  This code is cleaning up a previous parse
+-         * with no new data (since len == 0).  Changing the parsing
+-         * state requires getting to execute a handler function, and
+-         * there doesn't seem to be an opportunity for that while in
+-         * this circumstance.
+-         *
+-         * Given the uncertainty, we retain the code but exclude it
+-         * from coverage tests.
+-         *
+-         * LCOV_EXCL_START
+-         */
+-        XmlUpdatePosition(parser->m_encoding, parser->m_positionPtr,
+-                          parser->m_bufferPtr, &parser->m_position);
+-        parser->m_positionPtr = parser->m_bufferPtr;
+-        return XML_STATUS_SUSPENDED;
+-        /* LCOV_EXCL_STOP */
+-      case XML_INITIALIZED:
+-      case XML_PARSING:
+-        parser->m_parsingStatus.parsing = XML_FINISHED;
+-        /* fall through */
+-      default:
+-        return XML_STATUS_OK;
+-      }
+-    }
+-    parser->m_eventEndPtr = parser->m_eventPtr;
+-    parser->m_processor = errorProcessor;
+-    return XML_STATUS_ERROR;
+-  }
+-#ifndef XML_CONTEXT_BYTES
+-  else if (parser->m_bufferPtr == parser->m_bufferEnd) {
++#if XML_CONTEXT_BYTES == 0
++  if (parser->m_bufferPtr == parser->m_bufferEnd) {
+     const char *end;
+     int nLeftOver;
+     enum XML_Status result;
+@@ -1907,12 +1913,15 @@ XML_Parse(XML_Parser parser, const char
+       parser->m_processor = errorProcessor;
+       return XML_STATUS_ERROR;
+     }
++    // though this isn't a buffer request, we assume that `len` is the app's
++    // preferred buffer fill size, and therefore save it here.
++    parser->m_lastBufferRequestSize = len;
+     parser->m_parseEndByteIndex += len;
+     parser->m_positionPtr = s;
+     parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
+ 
+     parser->m_errorCode
+-        = parser->m_processor(parser, s, parser->m_parseEndPtr = s + len, &end);
++        = callProcessor(parser, s, parser->m_parseEndPtr = s + len, &end);
+ 
+     if (parser->m_errorCode != XML_ERROR_NONE) {
+       parser->m_eventEndPtr = parser->m_eventPtr;
+@@ -1939,23 +1948,25 @@ XML_Parse(XML_Parser parser, const char
+                       &parser->m_position);
+     nLeftOver = s + len - end;
+     if (nLeftOver) {
+-      if (parser->m_buffer == NULL
+-          || nLeftOver > parser->m_bufferLim - parser->m_buffer) {
+-        /* avoid _signed_ integer overflow */
+-        char *temp = NULL;
+-        const int bytesToAllocate = (int)((unsigned)len * 2U);
+-        if (bytesToAllocate > 0) {
+-          temp = (char *)REALLOC(parser, parser->m_buffer, bytesToAllocate);
+-        }
+-        if (temp == NULL) {
+-          parser->m_errorCode = XML_ERROR_NO_MEMORY;
+-          parser->m_eventPtr = parser->m_eventEndPtr = NULL;
+-          parser->m_processor = errorProcessor;
+-          return XML_STATUS_ERROR;
+-        }
+-        parser->m_buffer = temp;
+-        parser->m_bufferLim = parser->m_buffer + bytesToAllocate;
++      // Back up and restore the parsing status to avoid XML_ERROR_SUSPENDED
++      // (and XML_ERROR_FINISHED) from XML_GetBuffer.
++      const enum XML_Parsing originalStatus = parser->m_parsingStatus.parsing;
++      parser->m_parsingStatus.parsing = XML_PARSING;
++      void *const temp = XML_GetBuffer(parser, nLeftOver);
++      parser->m_parsingStatus.parsing = originalStatus;
++      // GetBuffer may have overwritten this, but we want to remember what the
++      // app requested, not how many bytes were left over after parsing.
++      parser->m_lastBufferRequestSize = len;
++      if (temp == NULL) {
++        // NOTE: parser->m_errorCode has already been set by XML_GetBuffer().
++        parser->m_eventPtr = parser->m_eventEndPtr = NULL;
++        parser->m_processor = errorProcessor;
++        return XML_STATUS_ERROR;
+       }
++      // Since we know that the buffer was empty and XML_CONTEXT_BYTES is 0, we
++      // don't have any data to preserve, and can copy straight into the start
++      // of the buffer rather than the GetBuffer return pointer (which may be
++      // pointing further into the allocated buffer).
+       memcpy(parser->m_buffer, end, nLeftOver);
+     }
+     parser->m_bufferPtr = parser->m_buffer;
+@@ -1966,16 +1977,15 @@ XML_Parse(XML_Parser parser, const char
+     parser->m_eventEndPtr = parser->m_bufferPtr;
+     return result;
+   }
+-#endif /* not defined XML_CONTEXT_BYTES */
+-  else {
+-    void *buff = XML_GetBuffer(parser, len);
+-    if (buff == NULL)
+-      return XML_STATUS_ERROR;
+-    else {
+-      memcpy(buff, s, len);
+-      return XML_ParseBuffer(parser, len, isFinal);
+-    }
++#endif /* XML_CONTEXT_BYTES == 0 */
++  void *buff = XML_GetBuffer(parser, len);
++  if (buff == NULL)
++    return XML_STATUS_ERROR;
++  if (len > 0) {
++    assert(s != NULL); // make sure s==NULL && len!=0 was rejected above
++    memcpy(buff, s, len);
+   }
++  return XML_ParseBuffer(parser, len, isFinal);
+ }
+ 
+ enum XML_Status XMLCALL
+@@ -2015,8 +2025,8 @@ XML_ParseBuffer(XML_Parser parser, int l
+   parser->m_parseEndByteIndex += len;
+   parser->m_parsingStatus.finalBuffer = (XML_Bool)isFinal;
+ 
+-  parser->m_errorCode = parser->m_processor(
+-      parser, start, parser->m_parseEndPtr, &parser->m_bufferPtr);
++  parser->m_errorCode = callProcessor(parser, start, parser->m_parseEndPtr,
++                                      &parser->m_bufferPtr);
+ 
+   if (parser->m_errorCode != XML_ERROR_NONE) {
+     parser->m_eventEndPtr = parser->m_eventPtr;
+@@ -2061,8 +2071,12 @@ XML_GetBuffer(XML_Parser parser, int len
+   default:;
+   }
+ 
+-  if (len > EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd)) {
+-#ifdef XML_CONTEXT_BYTES
++  // whether or not the request succeeds, `len` seems to be the app's preferred
++  // buffer fill size; remember it.
++  parser->m_lastBufferRequestSize = len;
++  if (len > EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferEnd)
++      || parser->m_buffer == NULL) {
++#if XML_CONTEXT_BYTES > 0
+     int keep;
+ #endif /* defined XML_CONTEXT_BYTES */
+     /* Do not invoke signed arithmetic overflow: */
+@@ -2083,10 +2097,11 @@ XML_GetBuffer(XML_Parser parser, int len
+       return NULL;
+     }
+     neededSize += keep;
+-#endif /* defined XML_CONTEXT_BYTES */
+-    if (neededSize
+-        <= EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer)) {
+-#ifdef XML_CONTEXT_BYTES
++#endif /* XML_CONTEXT_BYTES > 0 */
++    if (parser->m_buffer && parser->m_bufferPtr
++        && neededSize
++               <= EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer)) {
++#if XML_CONTEXT_BYTES > 0
+       if (keep < EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer)) {
+         int offset
+             = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferPtr, parser->m_buffer)
+@@ -2099,19 +2114,17 @@ XML_GetBuffer(XML_Parser parser, int len
+         parser->m_bufferPtr -= offset;
+       }
+ #else
+-      if (parser->m_buffer && parser->m_bufferPtr) {
+-        memmove(parser->m_buffer, parser->m_bufferPtr,
+-                EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr));
+-        parser->m_bufferEnd
+-            = parser->m_buffer
+-              + EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr);
+-        parser->m_bufferPtr = parser->m_buffer;
+-      }
+-#endif /* not defined XML_CONTEXT_BYTES */
++      memmove(parser->m_buffer, parser->m_bufferPtr,
++              EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr));
++      parser->m_bufferEnd
++          = parser->m_buffer
++            + EXPAT_SAFE_PTR_DIFF(parser->m_bufferEnd, parser->m_bufferPtr);
++      parser->m_bufferPtr = parser->m_buffer;
++#endif /* XML_CONTEXT_BYTES > 0 */
+     } else {
+       char *newBuf;
+       int bufferSize
+-          = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_bufferPtr);
++          = (int)EXPAT_SAFE_PTR_DIFF(parser->m_bufferLim, parser->m_buffer);
+       if (bufferSize == 0)
+         bufferSize = INIT_BUFFER_SIZE;
+       do {
+@@ -2208,7 +2221,7 @@ XML_ResumeParser(XML_Parser parser) {
+   }
+   parser->m_parsingStatus.parsing = XML_PARSING;
+ 
+-  parser->m_errorCode = parser->m_processor(
++  parser->m_errorCode = callProcessor(
+       parser, parser->m_bufferPtr, parser->m_parseEndPtr, &parser->m_bufferPtr);
+ 
+   if (parser->m_errorCode != XML_ERROR_NONE) {
+@@ -2576,6 +2576,15 @@ XML_SetBillionLaughsAttackProtectionActi
+ }
+ #endif /* XML_GE == 1 */
+ 
++XML_Bool XMLCALL
++XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled) {
++  if (parser != NULL && (enabled == XML_TRUE || enabled == XML_FALSE)) {
++    parser->m_reparseDeferralEnabled = enabled;
++    return XML_TRUE;
++  }
++  return XML_FALSE;
++}
++
+ /* Initially tag->rawName always points into the parse buffer;
+    for those TAG instances opened while the current parse buffer was
+    processed, and not yet closed, we need to store tag->rawName in a more
+@@ -4497,15 +4506,15 @@ entityValueInitProcessor(XML_Parser pars
+       parser->m_processor = entityValueProcessor;
+       return entityValueProcessor(parser, next, end, nextPtr);
+     }
+-    /* If we are at the end of the buffer, this would cause XmlPrologTok to
+-       return XML_TOK_NONE on the next call, which would then cause the
+-       function to exit with *nextPtr set to s - that is what we want for other
+-       tokens, but not for the BOM - we would rather like to skip it;
+-       then, when this routine is entered the next time, XmlPrologTok will
+-       return XML_TOK_INVALID, since the BOM is still in the buffer
++    /* XmlPrologTok has now set the encoding based on the BOM it found, and we
++       must move s and nextPtr forward to consume the BOM.
++
++       If we didn't, and got XML_TOK_NONE from the next XmlPrologTok call, we
++       would leave the BOM in the buffer and return. On the next call to this
++       function, our XmlPrologTok call would return XML_TOK_INVALID, since it
++       is not valid to have multiple BOMs.
+     */
+-    else if (tok == XML_TOK_BOM && next == end
+-             && ! parser->m_parsingStatus.finalBuffer) {
++    else if (tok == XML_TOK_BOM) {
+ #  if XML_GE == 1
+       if (! accountingDiffTolerated(parser, tok, s, next, __LINE__,
+                                     XML_ACCOUNT_DIRECT)) {
+@@ -4500,7 +4522,7 @@ entityValueInitProcessor(XML_Parser pars
+ #  endif
+ 
+       *nextPtr = next;
+-      return XML_ERROR_NONE;
++      s = next;
+     }
+     /* If we get this token, we have the start of what might be a
+        normal tag, but not a declaration (i.e. it doesn't begin with
+--- a/xmlwf/xmlwf.c
++++ b/xmlwf/xmlwf.c
+@@ -914,6 +914,9 @@ usage(const XML_Char *prog, int rc) {
+       T("  -a FACTOR     set maximum tolerated [a]mplification factor (default: 100.0)\n")
+       T("  -b BYTES      set number of output [b]ytes needed to activate (default: 8 MiB)\n")
+       T("\n")
++      T("reparse deferral:\n")
+      T("  -q            disable reparse deferral, and allow [q]uadratic parse runtime with large tokens\n")
++      T("\n")
+       T("info arguments:\n")
+       T("  -h            show this [h]elp message and exit\n")
+       T("  -v            show program's [v]ersion number and exit\n")
+@@ -967,6 +970,8 @@ tmain(int argc, XML_Char **argv) {
+   unsigned long long attackThresholdBytes;
+   XML_Bool attackThresholdGiven = XML_FALSE;
+ 
++  XML_Bool disableDeferral = XML_FALSE;
++
+   int exitCode = XMLWF_EXIT_SUCCESS;
+   enum XML_ParamEntityParsing paramEntityParsing
+       = XML_PARAM_ENTITY_PARSING_NEVER;
+@@ -1089,6 +1094,11 @@ tmain(int argc, XML_Char **argv) {
+ #endif
+       break;
+     }
++    case T('q'): {
++      disableDeferral = XML_TRUE;
++      j++;
++      break;
++    }
+     case T('\0'):
+       if (j > 1) {
+         i++;
+@@ -1134,6 +1144,16 @@ tmain(int argc, XML_Char **argv) {
+ #endif
+     }
+ 
++    if (disableDeferral) {
++      const XML_Bool success = XML_SetReparseDeferralEnabled(parser, XML_FALSE);
++      if (! success) {
++        // This prevents tperror(..) from reporting misleading "[..]: Success"
++        errno = EINVAL;
++        tperror(T("Failed to disable reparse deferral"));
++        exit(XMLWF_EXIT_INTERNAL_ERROR);
++      }
++    }
++
+     if (requireStandalone)
+       XML_SetNotStandaloneHandler(parser, notStandalone);
+     XML_SetParamEntityParsing(parser, paramEntityParsing);
+--- a/xmlwf/xmlwf_helpgen.py
++++ b/xmlwf/xmlwf_helpgen.py
+@@ -81,6 +81,10 @@ billion_laughs.add_argument('-a', metava
+                             help='set maximum tolerated [a]mplification factor (default: 100.0)')
+ billion_laughs.add_argument('-b', metavar='BYTES', help='set number of output [b]ytes needed to activate (default: 8 MiB)')
+ 
++reparse_deferral = parser.add_argument_group('reparse deferral')
+reparse_deferral.add_argument('-q', action='store_true',
+                            help='disable reparse deferral, and allow [q]uadratic parse runtime with large tokens')
++
+ parser.add_argument('files', metavar='FILE', nargs='*', help='file to process (default: STDIN)')
+ 
+ info = parser.add_argument_group('info arguments')
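
The two "Math" blocks in the commit message of the patch above can be checked numerically. The sketch below is a simplified model, not Expat code. It assumes a single token of token_bytes bytes and a client that always supplies chunk_bytes per fill, and it ignores the "close to causing a reallocation" escape clause in callProcessor(). Both function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the old behaviour: every refill triggers a parse attempt that
 * rescans everything buffered so far, until the whole token is present.
 * Returns the total number of bytes scanned. */
size_t reparse_cost_naive(size_t token_bytes, size_t chunk_bytes) {
  assert(token_bytes > 0 && chunk_bytes > 0);
  size_t buffered = 0;
  size_t scanned = 0;
  while (buffered < token_bytes) {
    buffered += chunk_bytes; /* client refills the buffer */
    scanned += buffered;     /* parser rescans from the token start */
  }
  return scanned;
}

/* Model of the deferral heuristic: after a fruitless attempt with
 * had_before bytes buffered, skip parsing until at least twice that
 * amount is available. Returns the total number of bytes scanned. */
size_t reparse_cost_deferred(size_t token_bytes, size_t chunk_bytes) {
  assert(token_bytes > 0 && chunk_bytes > 0);
  size_t buffered = 0;
  size_t scanned = 0;
  size_t had_before = 0; /* bytes buffered at the last failed attempt */
  for (;;) {
    buffered += chunk_bytes;
    if (buffered < 2 * had_before)
      continue; /* deferred: not enough new data yet */
    scanned += buffered;
    if (buffered >= token_bytes)
      return scanned; /* token completes on this attempt */
    had_before = buffered;
  }
}
```

Under these assumptions, a 32768-byte token fed in 4096-byte chunks (T = 8X) gives 36X = 147456 scanned bytes for the naive model and 15X = 61440 for the deferred model, which agrees with the 2T-X figure in the commit message.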
diff --git a/meta/recipes-core/expat/expat_2.5.0.bb b/meta/recipes-core/expat/expat_2.5.0.bb
index 31e989cfe2..09bc7a0a0b 100644
--- a/meta/recipes-core/expat/expat_2.5.0.bb
+++ b/meta/recipes-core/expat/expat_2.5.0.bb
@@ -22,6 +22,7 @@  SRC_URI = "https://github.com/libexpat/libexpat/releases/download/R_${VERSION_TA
 	   file://CVE-2023-52426-009.patch \
 	   file://CVE-2023-52426-010.patch \
 	   file://CVE-2023-52426-011.patch \
+           file://CVE-2023-52425.patch \
            "
 
 UPSTREAM_CHECK_URI = "https://github.com/libexpat/libexpat/releases/"
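
The decision that callProcessor() makes in the lib/xmlparse.c hunk can be isolated as a pure function, which may make the heuristic easier to review. This is a minimal sketch under stated assumptions: struct deferral_state, should_attempt_parse() and record_attempt() are hypothetical names invented here, and the struct fields stand in for the parser members and pointer differences that the patch reads (m_partialTokenBytesBefore, m_lastBufferRequestSize, and the buffer pointers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the values the heuristic looks at. */
struct deferral_state {
  bool deferral_enabled; /* models m_reparseDeferralEnabled */
  bool final_buffer;     /* models m_parsingStatus.finalBuffer */
  size_t had_before;     /* models m_partialTokenBytesBefore */
  size_t last_request;   /* models m_lastBufferRequestSize */
  size_t consumed_front; /* models m_bufferPtr - m_buffer */
  size_t context_bytes;  /* models XML_CONTEXT_BYTES (0 if disabled) */
  size_t free_tail;      /* models m_bufferLim - m_bufferEnd */
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Returns true when a parse attempt should be made with have_now bytes. */
bool should_attempt_parse(const struct deferral_state *st, size_t have_now) {
  assert(st != NULL);
  if (!st->deferral_enabled || st->final_buffer)
    return true; /* the heuristic only applies to non-final buffers */
  /* Space that can hold new data without growing the buffer: the consumed
   * front part (minus retained context) plus the unused tail. */
  const size_t available
      = st->consumed_front - min_size(st->consumed_front, st->context_bytes)
        + st->free_tail;
  return have_now >= 2 * st->had_before /* data has at least doubled */
         || st->last_request > available; /* next fill would reallocate */
}

/* Bookkeeping after an attempt that returned without error: remember the
 * buffered amount if nothing was consumed, otherwise reset the threshold. */
void record_attempt(struct deferral_state *st, size_t have_now,
                    bool consumed_any) {
  assert(st != NULL);
  st->had_before = consumed_any ? 0 : have_now;
}
```

In this model, a failed attempt at 4096 buffered bytes defers the next attempt until 8192 bytes are available, unless the free space drops below the client's last request size or the final buffer arrives.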