diff mbox series

[meta-python,kirkstone] python3-cbor2: fix CVE-2025-68131

Message ID 20260218120306.51096-1-hprajapati@mvista.com
State New

Commit Message

Hitendra Prajapati Feb. 18, 2026, 12:03 p.m. UTC
Added a read-ahead buffer to the C decoder.

Upstream-Status: Backport from https://github.com/agronholm/cbor2/commit/fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824

Signed-off-by: Hitendra Prajapati <hprajapati@mvista.com>
---
 .../python/python3-cbor2/CVE-2025-68131.patch | 494 ++++++++++++++++++
 .../python/python3-cbor2_5.4.2.bb             |   1 +
 2 files changed, 495 insertions(+)
 create mode 100644 meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch

Comments

Gyorgy Sarvari Feb. 18, 2026, 12:27 p.m. UTC | #1
On 2/18/26 13:03, Hitendra Prajapati wrote:
> Added a read-ahead buffer to the C decoder.
>
> Upstream-Status: Backport from https://github.com/agronholm/cbor2/commit/fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824

Is this the correct fix for this vulnerability? I see that the NVD
advisory refers to it, but I wonder whether that report has correct data.
This commit looks unrelated to the description of the vulnerability,
but please correct me if I'm wrong about anything in this mail.
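
For context, the read path this commit adds can be modelled in a few lines
of Python. This is a rough sketch under stated assumptions, not the patched
C code: ReadAheadReader and its attribute names are hypothetical, and only
the control flow (serve from the buffer, bypass it for large reads,
otherwise refill in read_size chunks) follows the fp_read() hunk in the
quoted patch.

```python
import io


class ReadAheadReader:
    """Toy Python model of the buffered fp_read() logic (hypothetical class)."""

    def __init__(self, fp, read_size=4096):
        if read_size < 1:
            raise ValueError("read_size must be at least 1")
        self._read = fp.read
        self._read_size = read_size
        self._buf = b""  # plays the role of the readahead buffer
        self._pos = 0    # plays the role of read_pos

    def read(self, size):
        out = bytearray()
        while len(out) < size:
            remaining = size - len(out)
            available = len(self._buf) - self._pos
            if available > 0:
                # Serve as much as possible from the buffer.
                take = min(available, remaining)
                out += self._buf[self._pos:self._pos + take]
                self._pos += take
                continue
            if remaining >= self._read_size:
                # Large request: read straight from the stream, bypass the buffer.
                chunk = self._read(remaining)
                out += chunk
            else:
                # Small request: refill the buffer with one read_size chunk.
                chunk = self._read(self._read_size)
                self._buf, self._pos = chunk, 0
            if not chunk:
                raise EOFError(
                    f"premature end of stream (expected to read {size} bytes, "
                    f"got {len(out)} instead)"
                )
        return bytes(out)


stream = io.BytesIO(b"\x01\x02\x03\x04\x05\x06")
reader = ReadAheadReader(stream, read_size=4)
first = reader.read(1)
position_after_first = stream.tell()
```

With read_size=4, reading a single byte already moves the underlying stream
to offset 4, which matches the caveat in the read_size docstring about the
stream position advancing beyond the bytes actually decoded.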

The vulnerability is about shareable values persisting in a CBORDecoder
object's memory and leaking information between uses, but this patch
seems to be a performance improvement without security implications (at
least none that are obvious to me).
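
A toy model of that kind of leak may make the distinction clearer. This is
only a sketch: ToyDecoder and its tuple-based message format are
hypothetical stand-ins and not the cbor2 API. The "shareable" and
"sharedref" kinds loosely model the CBOR value-sharing tags 28 and 29.

```python
class ToyDecoder:
    """Hypothetical stand-in for a reusable decoder; not the cbor2 API."""

    def __init__(self):
        # State that survives across decode() calls on the same object.
        self.shareables = []

    def decode(self, message):
        kind, payload = message
        if kind == "shareable":
            # Models a value marked as shareable: remember it by index.
            self.shareables.append(payload)
            return payload
        if kind == "sharedref":
            # Models a reference to a previously shared value by index.
            return self.shareables[payload]
        return payload


decoder = ToyDecoder()
decoder.decode(("shareable", "secret-from-message-1"))

# A second, unrelated message can still reach the first message's value.
leaked = decoder.decode(("sharedref", 0))
```

Nothing in this model depends on how bytes are read from the stream, which
is why a buffering change on its own would not be expected to affect it.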

Looking at the recent code changes, two commits look relevant:
https://github.com/agronholm/cbor2/commit/f1d701cd2c411ee40bb1fe383afe7f365f35abf0
https://github.com/agronholm/cbor2/commit/403c2ce3d61ce5ddc2d2143127baf31b7ab4a75c

The first one seems to implement the solution that was recommended in
the related GitHub advisory [1], some magic with the share-indicator
flags, and the second one fixes a bug introduced by the first one.
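
The decode_from_bytes() hunk in the quoted patch references a decode_depth
counter and a clear_shareable_state() call that runs once the depth is back
at zero. A minimal sketch of that reset-at-depth-zero idea follows. It is
an assumption-laden toy and not the upstream implementation: ToyDecoder,
the tuple message format, and the "nested" kind (standing in for a hook
that calls back into the decoder) are all hypothetical.

```python
class ToyDecoder:
    """Hypothetical decoder that clears shared state after each top-level decode."""

    def __init__(self):
        self.shareables = []
        self.decode_depth = 0

    def decode(self, message):
        self.decode_depth += 1
        try:
            return self._decode_item(message)
        finally:
            self.decode_depth -= 1
            if self.decode_depth == 0:
                # Only the outermost call resets the shared state.
                self.shareables.clear()

    def _decode_item(self, message):
        kind, payload = message
        if kind == "shareable":
            self.shareables.append(payload)
            return payload
        if kind == "sharedref":
            if payload >= len(self.shareables):
                raise ValueError(f"shared reference {payload} not found")
            return self.shareables[payload]
        if kind == "nested":
            # Models a hook re-entering the decoder for embedded items.
            return [self.decode(item) for item in payload]
        return payload


decoder = ToyDecoder()
within_one_message = decoder.decode(
    ("nested", [("shareable", "a"), ("sharedref", 0)])
)
```

References still resolve inside a single top-level decode, including through
nested calls, but a later message on the same decoder object no longer sees
values shared by an earlier one.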

What do you think? Or am I going in the wrong direction?

[1]:
https://github.com/agronholm/cbor2/security/advisories/GHSA-wcj4-jw5j-44wh

>
> Signed-off-by: Hitendra Prajapati <hprajapati@mvista.com>
> ---
>  .../python/python3-cbor2/CVE-2025-68131.patch | 494 ++++++++++++++++++
>  .../python/python3-cbor2_5.4.2.bb             |   1 +
>  2 files changed, 495 insertions(+)
>  create mode 100644 meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch
>
> diff --git a/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch b/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch
> new file mode 100644
> index 0000000000..38cb04b14f
> --- /dev/null
> +++ b/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch
> @@ -0,0 +1,494 @@
> +From fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824 Mon Sep 17 00:00:00 2001
> +From: Andreas Eriksen <andreer@vespa.ai>
> +Date: Mon, 29 Dec 2025 14:01:52 +0100
> +Subject: [PATCH] Added a read-ahead buffer to the C decoder (#268) 
> +
> +CVE: CVE-2025-68131
> +Upstream-Status: Backport [https://github.com/agronholm/cbor2/commit/fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824]
> +Signed-off-by: Hitendra Prajapati <hprajapati@mvista.com>
> +---
> + source/decoder.c      | 250 ++++++++++++++++++++++++++++++++++++------
> + source/decoder.h      |   9 ++
> + tests/test_decoder.py |  86 +++++++++++++++
> + 3 files changed, 310 insertions(+), 35 deletions(-)
> +
> +diff --git a/source/decoder.c b/source/decoder.c
> +index d4da393..34d07a6 100644
> +--- a/source/decoder.c
> ++++ b/source/decoder.c
> +@@ -41,6 +41,7 @@ enum DecodeOption {
> + typedef uint8_t DecodeOptions;
> + 
> + static int _CBORDecoder_set_fp(CBORDecoderObject *, PyObject *, void *);
> ++static int _CBORDecoder_set_fp_with_read_size(CBORDecoderObject *, PyObject *, Py_ssize_t);
> + static int _CBORDecoder_set_tag_hook(CBORDecoderObject *, PyObject *, void *);
> + static int _CBORDecoder_set_object_hook(CBORDecoderObject *, PyObject *, void *);
> + static int _CBORDecoder_set_str_errors(CBORDecoderObject *, PyObject *, void *);
> +@@ -98,6 +99,13 @@ CBORDecoder_clear(CBORDecoderObject *self)
> +     Py_CLEAR(self->shareables);
> +     Py_CLEAR(self->stringref_namespace);
> +     Py_CLEAR(self->str_errors);
> ++    if (self->readahead) {
> ++        PyMem_Free(self->readahead);
> ++        self->readahead = NULL;
> ++        self->readahead_size = 0;
> ++    }
> ++    self->read_pos = 0;
> ++    self->read_len = 0;
> +     return 0;
> + }
> + 
> +@@ -139,6 +147,10 @@ CBORDecoder_new(PyTypeObject *type, PyObject *args, PyObject *kwargs)
> +         self->str_errors = PyBytes_FromString("strict");
> +         self->immutable = false;
> +         self->shared_index = -1;
> ++        self->readahead = NULL;
> ++        self->readahead_size = 0;
> ++        self->read_pos = 0;
> ++        self->read_len = 0;
> +     }
> +     return (PyObject *) self;
> + error:
> +@@ -148,21 +160,27 @@ error:
> + 
> + 
> + // CBORDecoder.__init__(self, fp=None, tag_hook=None, object_hook=None,
> +-//                      str_errors='strict')
> ++//                      str_errors='strict', read_size=4096)
> + int
> + CBORDecoder_init(CBORDecoderObject *self, PyObject *args, PyObject *kwargs)
> + {
> +     static char *keywords[] = {
> +-        "fp", "tag_hook", "object_hook", "str_errors", NULL
> ++        "fp", "tag_hook", "object_hook", "str_errors", "read_size", NULL
> +     };
> +     PyObject *fp = NULL, *tag_hook = NULL, *object_hook = NULL,
> +              *str_errors = NULL;
> ++    Py_ssize_t read_size = CBOR2_DEFAULT_READ_SIZE;
> + 
> +-    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOO", keywords,
> +-                &fp, &tag_hook, &object_hook, &str_errors))
> ++    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOOn", keywords,
> ++                &fp, &tag_hook, &object_hook, &str_errors, &read_size))
> +         return -1;
> + 
> +-    if (_CBORDecoder_set_fp(self, fp, NULL) == -1)
> ++    if (read_size < 1) {
> ++        PyErr_SetString(PyExc_ValueError, "read_size must be at least 1");
> ++        return -1;
> ++    }
> ++
> ++    if (_CBORDecoder_set_fp_with_read_size(self, fp, read_size) == -1)
> +         return -1;
> +     if (tag_hook && _CBORDecoder_set_tag_hook(self, tag_hook, NULL) == -1)
> +         return -1;
> +@@ -190,11 +208,12 @@ _CBORDecoder_get_fp(CBORDecoderObject *self, void *closure)
> + }
> + 
> + 
> +-// CBORDecoder._set_fp(self, value)
> ++// Internal: set fp with configurable read size
> + static int
> +-_CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
> ++_CBORDecoder_set_fp_with_read_size(CBORDecoderObject *self, PyObject *value, Py_ssize_t read_size)
> + {
> +     PyObject *tmp, *read;
> ++    char *new_buffer = NULL;
> + 
> +     if (!value) {
> +         PyErr_SetString(PyExc_AttributeError, "cannot delete fp attribute");
> +@@ -207,13 +226,43 @@ _CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
> +         return -1;
> +     }
> + 
> ++    if (self->readahead == NULL || self->readahead_size != read_size) {
> ++        new_buffer = (char *)PyMem_Malloc(read_size);
> ++        if (!new_buffer) {
> ++            Py_DECREF(read);
> ++            PyErr_NoMemory();
> ++            return -1;
> ++        }
> ++    }
> ++
> +     // See notes in encoder.c / _CBOREncoder_set_fp
> +     tmp = self->read;
> +     self->read = read;
> +     Py_DECREF(tmp);
> ++
> ++    self->read_pos = 0;
> ++    self->read_len = 0;
> ++
> ++    // Replace buffer (size changed or was NULL)
> ++    if (new_buffer) {
> ++        PyMem_Free(self->readahead);
> ++        self->readahead = new_buffer;
> ++        self->readahead_size = read_size;
> ++    }
> ++
> +     return 0;
> + }
> + 
> ++// CBORDecoder._set_fp(self, value) - property setter uses default read size
> ++static int
> ++_CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
> ++{
> ++    // Use existing readahead_size if already allocated, otherwise use default
> ++    Py_ssize_t read_size = (self->readahead_size > 0) ?
> ++                           self->readahead_size : CBOR2_DEFAULT_READ_SIZE;
> ++    return _CBORDecoder_set_fp_with_read_size(self, value, read_size);
> ++}
> ++
> + 
> + // CBORDecoder._get_tag_hook(self)
> + static PyObject *
> +@@ -340,38 +389,123 @@ _CBORDecoder_get_immutable(CBORDecoderObject *self, void *closure)
> +         Py_RETURN_FALSE;
> + }
> + 
> +-
> + // Utility functions /////////////////////////////////////////////////////////
> + 
> ++static void
> ++raise_from(PyObject *new_exc_type, const char *message) {
> ++    // This requires the error indicator to be set
> ++    PyObject *cause;
> ++#if PY_VERSION_HEX >= 0x030c0000
> ++    cause = PyErr_GetRaisedException();
> ++#else
> ++    PyObject *exc_type, *exc_traceback;
> ++    PyErr_Fetch(&exc_type, &cause, &exc_traceback);
> ++    PyErr_NormalizeException(&exc_type, &cause, &exc_traceback);
> ++    Py_XDECREF(exc_type);
> ++    Py_XDECREF(exc_traceback);
> ++#endif
> ++
> ++    PyObject *msg_obj = PyUnicode_FromString(message);
> +    if (msg_obj) {
> ++        PyObject *new_exception = PyObject_CallFunctionObjArgs(
> ++            new_exc_type, msg_obj, NULL);
> ++        if (new_exception) {
> ++            PyException_SetCause(new_exception, cause);
> ++            PyErr_SetObject(new_exc_type, new_exception);
> ++        }
> ++        Py_DECREF(msg_obj);
> ++    }
> ++}
> ++
> ++// Read directly into caller's buffer (bypassing readahead buffer)
> ++static Py_ssize_t
> ++fp_read_bytes(CBORDecoderObject *self, char *buf, Py_ssize_t size)
> ++{
> ++    PyObject *size_obj = PyLong_FromSsize_t(size);
> ++    if (!size_obj)
> ++        return -1;
> ++
> ++    PyObject *obj = PyObject_CallFunctionObjArgs(self->read, size_obj, NULL);
> ++    Py_DECREF(size_obj);
> ++    if (!obj)
> ++        return -1;
> ++
> ++    assert(PyBytes_CheckExact(obj));
> ++    Py_ssize_t bytes_read = PyBytes_GET_SIZE(obj);
> ++    if (bytes_read > 0)
> ++        memcpy(buf, PyBytes_AS_STRING(obj), bytes_read);
> ++
> ++    Py_DECREF(obj);
> ++    return bytes_read;
> ++}
> ++
> ++// Read into caller's buffer using the readahead buffer
> + static int
> + fp_read(CBORDecoderObject *self, char *buf, const Py_ssize_t size)
> + {
> +-    PyObject *obj, *size_obj;
> +-    char *data;
> +-    int ret = -1;
> +-
> +-    size_obj = PyLong_FromSsize_t(size);
> +-    if (size_obj) {
> +-        obj = PyObject_CallFunctionObjArgs(self->read, size_obj, NULL);
> +-        if (obj) {
> +-            assert(PyBytes_CheckExact(obj));
> +-            if (PyBytes_GET_SIZE(obj) == (Py_ssize_t) size) {
> +-                data = PyBytes_AS_STRING(obj);
> +-                memcpy(buf, data, size);
> +-                ret = 0;
> ++    Py_ssize_t available, to_copy, remaining, total_copied;
> ++
> ++    remaining = size;
> ++    total_copied = 0;
> ++
> ++    while (remaining > 0) {
> ++        available = self->read_len - self->read_pos;
> ++
> ++        if (available > 0) {
> ++            // Copy from buffer
> ++            to_copy = (available < remaining) ? available : remaining;
> ++            memcpy(buf + total_copied, self->readahead + self->read_pos, to_copy);
> ++            self->read_pos += to_copy;
> ++            total_copied += to_copy;
> ++            remaining -= to_copy;
> ++        } else {
> ++            Py_ssize_t bytes_read;
> ++
> ++            if (remaining >= self->readahead_size) {
> ++                // Large remaining: read directly into destination, bypass buffer
> ++                bytes_read = fp_read_bytes(self, buf + total_copied, remaining);
> ++                if (bytes_read > 0) {
> ++                    total_copied += bytes_read;
> ++                    remaining -= bytes_read;
> ++                }
> +             } else {
> +-                PyErr_Format(
> +-                    _CBOR2_CBORDecodeEOF,
> +-                    "premature end of stream (expected to read %zd bytes, "
> +-                    "got %zd instead)", size, PyBytes_GET_SIZE(obj));
> ++                // Small remaining: refill buffer
> ++                self->read_pos = 0;
> ++                self->read_len = 0;
> ++                bytes_read = fp_read_bytes(self, self->readahead, self->readahead_size);
> ++                if (bytes_read > 0)
> ++                    self->read_len = bytes_read;
> ++            }
> ++
> ++            if (bytes_read <= 0) {
> ++                if (bytes_read == 0)
> ++                    PyErr_Format(
> ++                        _CBOR2_CBORDecodeEOF,
> ++                        "premature end of stream (expected to read %zd bytes, "
> ++                        "got %zd instead)", size, total_copied);
> ++                return -1;
> +             }
> +-            Py_DECREF(obj);
> +         }
> +-        Py_DECREF(size_obj);
> +     }
> +-    return ret;
> ++
> ++    return 0;
> + }
> + 
> ++// Read and return as PyBytes object
> ++static PyObject *
> ++fp_read_object(CBORDecoderObject *self, const Py_ssize_t size)
> ++{
> ++    PyObject *ret = PyBytes_FromStringAndSize(NULL, size);
> ++    if (!ret)
> ++        return NULL;
> ++
> ++    if (fp_read(self, PyBytes_AS_STRING(ret), size) == -1) {
> ++        Py_DECREF(ret);
> ++        return NULL;
> ++    }
> ++
> ++    return ret;
> ++}
> + 
> + // CBORDecoder.read(self, length) -> bytes
> + static PyObject *
> +@@ -1760,21 +1894,59 @@ static PyObject *
> + CBORDecoder_decode_from_bytes(CBORDecoderObject *self, PyObject *data)
> + {
> +     PyObject *save_read, *buf, *ret = NULL;
> ++    bool is_nested = (self->decode_depth > 0);
> ++    Py_ssize_t save_read_pos = 0, save_read_len = 0;
> ++    char *save_buffer = NULL;
> + 
> +     if (!_CBOR2_BytesIO && _CBOR2_init_BytesIO() == -1)
> +         return NULL;
> + 
> +-    save_read = self->read;
> +     buf = PyObject_CallFunctionObjArgs(_CBOR2_BytesIO, data, NULL);
> +-    if (buf) {
> +-        self->read = PyObject_GetAttr(buf, _CBOR2_str_read);
> +-        if (self->read) {
> +-            ret = decode(self, DECODE_NORMAL);
> +-            Py_DECREF(self->read);
> ++    if (!buf)
> ++        return NULL;
> ++
> ++    self->decode_depth++;
> ++    save_read = self->read;
> ++    Py_INCREF(save_read);  // Keep alive while we use a different read method
> ++    save_read_pos = self->read_pos;
> ++    save_read_len = self->read_len;
> ++
> ++    // Save buffer pointer if nested
> ++    if (is_nested) {
> ++        save_buffer = self->readahead;
> ++        self->readahead = NULL;  // Prevent setter from freeing saved buffer
> ++    }
> ++
> ++    // Set up BytesIO decoder - setter handles buffer allocation
> ++    if (_CBORDecoder_set_fp_with_read_size(self, buf, self->readahead_size) == -1) {
> ++        if (is_nested) {
> ++            PyMem_Free(self->readahead);
> ++            self->readahead = save_buffer;
> +         }
> ++        Py_DECREF(save_read);
> +         Py_DECREF(buf);
> ++        self->decode_depth--;
> ++        return NULL;
> ++    }
> ++
> ++    ret = decode(self, DECODE_NORMAL);
> ++
> ++    Py_XDECREF(self->read);  // Decrement BytesIO read method
> ++    self->read = save_read;  // Restore saved read (already has correct refcount)
> ++    Py_DECREF(buf);
> ++    self->decode_depth--;
> ++
> ++    if (is_nested) {
> ++        PyMem_Free(self->readahead);
> ++        self->readahead = save_buffer;
> ++    }
> ++    self->read_pos = save_read_pos;
> ++    self->read_len = save_read_len;
> ++
> ++    assert(self->decode_depth >= 0);
> ++    if (self->decode_depth == 0) {
> ++        clear_shareable_state(self);
> +     }
> +-    self->read = save_read;
> +     return ret;
> + }
> + 
> +@@ -1920,6 +2092,14 @@ PyDoc_STRVAR(CBORDecoder__doc__,
> + "    dictionary. This callback is invoked for each deserialized\n"
> + "    :class:`dict` object. The return value is substituted for the dict\n"
> + "    in the deserialized output.\n"
> ++":param read_size:\n"
> ++"    the size of the read buffer (default 4096). The decoder reads from\n"
> ++"    the stream in chunks of this size for performance. This means the\n"
> ++"    stream position may advance beyond the bytes actually decoded. For\n"
> ++"    large values (bytestrings, text strings), reads may be larger than\n"
> ++"    ``read_size``. Code that needs to read from the stream after\n"
> ++"    decoding should use :meth:`decode_from_bytes` instead, or set\n"
> ++"    ``read_size=1`` to disable buffering (at a performance cost).\n"
> + "\n"
> + ".. _CBOR: https://cbor.io/\n"
> + );
> +diff --git a/source/decoder.h b/source/decoder.h
> +index 6bb6d52..8a3393a 100644
> +--- a/source/decoder.h
> ++++ b/source/decoder.h
> +@@ -3,6 +3,9 @@
> + #include <stdbool.h>
> + #include <stdint.h>
> + 
> ++// Default readahead buffer size for streaming reads
> ++#define CBOR2_DEFAULT_READ_SIZE 4096
> ++
> + typedef struct {
> +     PyObject_HEAD
> +     PyObject *read;    // cached read() method of fp
> +@@ -13,6 +16,12 @@ typedef struct {
> +     PyObject *str_errors;
> +     bool immutable;
> +     Py_ssize_t shared_index;
> ++
> ++    // Readahead buffer for streaming
> ++    char *readahead;            // allocated buffer
> ++    Py_ssize_t readahead_size;  // size of allocated buffer
> ++    Py_ssize_t read_pos;        // current position in buffer
> ++    Py_ssize_t read_len;        // valid bytes in buffer
> + } CBORDecoderObject;
> + 
> + extern PyTypeObject CBORDecoderType;
> +diff --git a/tests/test_decoder.py b/tests/test_decoder.py
> +index d03e288..b153971 100644
> +--- a/tests/test_decoder.py
> ++++ b/tests/test_decoder.py
> +@@ -717,3 +717,89 @@ def test_decimal_payload_unpacking(impl, data, expected):
> +     with pytest.raises(impl.CBORDecodeValueError) as exc_info:
> +         impl.loads(unhexlify(data))
> +     assert exc_info.value.args[0] == f"Incorrect tag {expected} payload"
> ++
> ++
> ++def test_decode_from_bytes_in_hook_preserves_buffer(impl):
> ++    """Test that calling decode_from_bytes from a hook preserves stream buffer state.
> ++
> ++    This is a documented use case from docs/customizing.rst where hooks decode
> ++    embedded CBOR data. Before the fix, the stream's readahead buffer would be
> ++    corrupted, causing subsequent reads to fail or return wrong data.
> ++    """
> ++
> ++    def tag_hook(decoder, tag):
> ++        if tag.tag == 999:
> ++            # Decode embedded CBOR (documented pattern)
> ++            return decoder.decode_from_bytes(tag.value)
> ++        return tag
> ++
> ++    # Test data: array with [tag(999, embedded_cbor), "after_hook", "final"]
> ++    # embedded_cbor encodes: [1, 2, 3]
> ++    data = unhexlify(
> ++        "83"  # array(3)
> ++        "d903e7"  # tag(999)
> ++        "44"  # bytes(4)
> ++        "83010203"  # embedded: array [1, 2, 3]
> ++        "6a"  # text(10)
> ++        "61667465725f686f6f6b"  # "after_hook"
> ++        "65"  # text(5)
> ++        "66696e616c"  # "final"
> ++    )
> ++
> ++    # Decode from stream (not bytes) to use readahead buffer
> ++    stream = BytesIO(data)
> ++    decoder = impl.CBORDecoder(stream, tag_hook=tag_hook)
> ++    result = decoder.decode()
> ++
> ++    # Verify all values decoded correctly
> ++    assert result == [[1, 2, 3], "after_hook", "final"]
> ++
> ++    # First element should be the decoded embedded CBOR
> ++    assert result[0] == [1, 2, 3]
> ++    # Second element should be "after_hook" (not corrupted)
> ++    assert result[1] == "after_hook"
> ++    # Third element should be "final"
> ++    assert result[2] == "final"
> ++
> ++
> ++def test_decode_from_bytes_deeply_nested_in_hook(impl):
> ++    """Test deeply nested decode_from_bytes calls preserve buffer state.
> ++
> ++    This tests tag(999, tag(888, tag(777, [1,2,3]))) where each tag value
> ++    is embedded CBOR that triggers the hook recursively.
> ++
> ++    Before the fix, even a single level would corrupt the buffer. With multiple
> ++    levels, the buffer would be completely corrupted, mixing data from different
> ++    BytesIO objects and the original stream.
> ++    """
> ++
> ++    def tag_hook(decoder, tag):
> ++        if tag.tag in [999, 888, 777]:
> ++            # Recursively decode embedded CBOR
> ++            return decoder.decode_from_bytes(tag.value)
> ++        return tag
> ++
> ++    # Test data: [tag(999, tag(888, tag(777, [1,2,3]))), "after", "final"]
> ++    # Each tag contains embedded CBOR
> ++    data = unhexlify(
> ++        "83"  # array(3)
> ++        "d903e7"  # tag(999)
> ++        "4c"  # bytes(12)
> ++        "d9037848d903094483010203"  # embedded: tag(888, tag(777, [1,2,3]))
> ++        "65"  # text(5)
> ++        "6166746572"  # "after"
> ++        "65"  # text(5)
> ++        "66696e616c"  # "final"
> ++    )
> ++
> ++    # Decode from stream to use readahead buffer
> ++    stream = BytesIO(data)
> ++    decoder = impl.CBORDecoder(stream, tag_hook=tag_hook)
> ++    result = decoder.decode()
> ++
> ++    # With the fix: all three levels of nesting work correctly
> ++    # Without the fix: buffer corruption at each level, test fails
> ++    assert result == [[1, 2, 3], "after", "final"]
> ++    assert result[0] == [1, 2, 3]
> ++    assert result[1] == "after"
> ++    assert result[2] == "final"
> +-- 
> +2.50.1
> +
> diff --git a/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb b/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
> index bbdeca7adb..5aeb82b992 100644
> --- a/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
> +++ b/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
> @@ -10,6 +10,7 @@ inherit pypi python_setuptools_build_meta ptest
>  
>  SRC_URI += " \
>          file://run-ptest \
> +        file://CVE-2025-68131.patch \
>  "
>  
>  # not vulnerable yet, vulnerability was introduced in v5.6.0

Patch

diff --git a/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch b/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch
new file mode 100644
index 0000000000..38cb04b14f
--- /dev/null
+++ b/meta-python/recipes-devtools/python/python3-cbor2/CVE-2025-68131.patch
@@ -0,0 +1,494 @@ 
+From fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824 Mon Sep 17 00:00:00 2001
+From: Andreas Eriksen <andreer@vespa.ai>
+Date: Mon, 29 Dec 2025 14:01:52 +0100
+Subject: [PATCH] Added a read-ahead buffer to the C decoder (#268) 
+
+CVE: CVE-2025-68131
+Upstream-Status: Backport [https://github.com/agronholm/cbor2/commit/fb4ee1612a8a1ac0dbd8cf2f2f6f931a4e06d824]
+Signed-off-by: Hitendra Prajapati <hprajapati@mvista.com>
+---
+ source/decoder.c      | 250 ++++++++++++++++++++++++++++++++++++------
+ source/decoder.h      |   9 ++
+ tests/test_decoder.py |  86 +++++++++++++++
+ 3 files changed, 310 insertions(+), 35 deletions(-)
+
+diff --git a/source/decoder.c b/source/decoder.c
+index d4da393..34d07a6 100644
+--- a/source/decoder.c
++++ b/source/decoder.c
+@@ -41,6 +41,7 @@ enum DecodeOption {
+ typedef uint8_t DecodeOptions;
+ 
+ static int _CBORDecoder_set_fp(CBORDecoderObject *, PyObject *, void *);
++static int _CBORDecoder_set_fp_with_read_size(CBORDecoderObject *, PyObject *, Py_ssize_t);
+ static int _CBORDecoder_set_tag_hook(CBORDecoderObject *, PyObject *, void *);
+ static int _CBORDecoder_set_object_hook(CBORDecoderObject *, PyObject *, void *);
+ static int _CBORDecoder_set_str_errors(CBORDecoderObject *, PyObject *, void *);
+@@ -98,6 +99,13 @@ CBORDecoder_clear(CBORDecoderObject *self)
+     Py_CLEAR(self->shareables);
+     Py_CLEAR(self->stringref_namespace);
+     Py_CLEAR(self->str_errors);
++    if (self->readahead) {
++        PyMem_Free(self->readahead);
++        self->readahead = NULL;
++        self->readahead_size = 0;
++    }
++    self->read_pos = 0;
++    self->read_len = 0;
+     return 0;
+ }
+ 
+@@ -139,6 +147,10 @@ CBORDecoder_new(PyTypeObject *type, PyObject *args, PyObject *kwargs)
+         self->str_errors = PyBytes_FromString("strict");
+         self->immutable = false;
+         self->shared_index = -1;
++        self->readahead = NULL;
++        self->readahead_size = 0;
++        self->read_pos = 0;
++        self->read_len = 0;
+     }
+     return (PyObject *) self;
+ error:
+@@ -148,21 +160,27 @@ error:
+ 
+ 
+ // CBORDecoder.__init__(self, fp=None, tag_hook=None, object_hook=None,
+-//                      str_errors='strict')
++//                      str_errors='strict', read_size=4096)
+ int
+ CBORDecoder_init(CBORDecoderObject *self, PyObject *args, PyObject *kwargs)
+ {
+     static char *keywords[] = {
+-        "fp", "tag_hook", "object_hook", "str_errors", NULL
++        "fp", "tag_hook", "object_hook", "str_errors", "read_size", NULL
+     };
+     PyObject *fp = NULL, *tag_hook = NULL, *object_hook = NULL,
+              *str_errors = NULL;
++    Py_ssize_t read_size = CBOR2_DEFAULT_READ_SIZE;
+ 
+-    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOO", keywords,
+-                &fp, &tag_hook, &object_hook, &str_errors))
++    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOOn", keywords,
++                &fp, &tag_hook, &object_hook, &str_errors, &read_size))
+         return -1;
+ 
+-    if (_CBORDecoder_set_fp(self, fp, NULL) == -1)
++    if (read_size < 1) {
++        PyErr_SetString(PyExc_ValueError, "read_size must be at least 1");
++        return -1;
++    }
++
++    if (_CBORDecoder_set_fp_with_read_size(self, fp, read_size) == -1)
+         return -1;
+     if (tag_hook && _CBORDecoder_set_tag_hook(self, tag_hook, NULL) == -1)
+         return -1;
+@@ -190,11 +208,12 @@ _CBORDecoder_get_fp(CBORDecoderObject *self, void *closure)
+ }
+ 
+ 
+-// CBORDecoder._set_fp(self, value)
++// Internal: set fp with configurable read size
+ static int
+-_CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
++_CBORDecoder_set_fp_with_read_size(CBORDecoderObject *self, PyObject *value, Py_ssize_t read_size)
+ {
+     PyObject *tmp, *read;
++    char *new_buffer = NULL;
+ 
+     if (!value) {
+         PyErr_SetString(PyExc_AttributeError, "cannot delete fp attribute");
+@@ -207,13 +226,43 @@ _CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
+         return -1;
+     }
+ 
++    if (self->readahead == NULL || self->readahead_size != read_size) {
++        new_buffer = (char *)PyMem_Malloc(read_size);
++        if (!new_buffer) {
++            Py_DECREF(read);
++            PyErr_NoMemory();
++            return -1;
++        }
++    }
++
+     // See notes in encoder.c / _CBOREncoder_set_fp
+     tmp = self->read;
+     self->read = read;
+     Py_DECREF(tmp);
++
++    self->read_pos = 0;
++    self->read_len = 0;
++
++    // Replace buffer (size changed or was NULL)
++    if (new_buffer) {
++        PyMem_Free(self->readahead);
++        self->readahead = new_buffer;
++        self->readahead_size = read_size;
++    }
++
+     return 0;
+ }
+ 
++// CBORDecoder._set_fp(self, value) - property setter uses default read size
++static int
++_CBORDecoder_set_fp(CBORDecoderObject *self, PyObject *value, void *closure)
++{
++    // Use existing readahead_size if already allocated, otherwise use default
++    Py_ssize_t read_size = (self->readahead_size > 0) ?
++                           self->readahead_size : CBOR2_DEFAULT_READ_SIZE;
++    return _CBORDecoder_set_fp_with_read_size(self, value, read_size);
++}
++
+ 
+ // CBORDecoder._get_tag_hook(self)
+ static PyObject *
+@@ -340,38 +389,123 @@ _CBORDecoder_get_immutable(CBORDecoderObject *self, void *closure)
+         Py_RETURN_FALSE;
+ }
+ 
+-
+ // Utility functions /////////////////////////////////////////////////////////
+ 
++static void
++raise_from(PyObject *new_exc_type, const char *message) {
++    // This requires the error indicator to be set
++    PyObject *cause;
++#if PY_VERSION_HEX >= 0x030c0000
++    cause = PyErr_GetRaisedException();
++#else
++    PyObject *exc_type, *exc_traceback;
++    PyErr_Fetch(&exc_type, &cause, &exc_traceback);
++    PyErr_NormalizeException(&exc_type, &cause, &exc_traceback);
++    Py_XDECREF(exc_type);
++    Py_XDECREF(exc_traceback);
++#endif
++
++    PyObject *msg_obj = PyUnicode_FromString(message);
+    if (msg_obj) {
++        PyObject *new_exception = PyObject_CallFunctionObjArgs(
++            new_exc_type, msg_obj, NULL);
++        if (new_exception) {
++            PyException_SetCause(new_exception, cause);
++            PyErr_SetObject(new_exc_type, new_exception);
++        }
++        Py_DECREF(msg_obj);
++    }
++}
++
++// Read directly into caller's buffer (bypassing readahead buffer)
++static Py_ssize_t
++fp_read_bytes(CBORDecoderObject *self, char *buf, Py_ssize_t size)
++{
++    PyObject *size_obj = PyLong_FromSsize_t(size);
++    if (!size_obj)
++        return -1;
++
++    PyObject *obj = PyObject_CallFunctionObjArgs(self->read, size_obj, NULL);
++    Py_DECREF(size_obj);
++    if (!obj)
++        return -1;
++
++    assert(PyBytes_CheckExact(obj));
++    Py_ssize_t bytes_read = PyBytes_GET_SIZE(obj);
++    if (bytes_read > 0)
++        memcpy(buf, PyBytes_AS_STRING(obj), bytes_read);
++
++    Py_DECREF(obj);
++    return bytes_read;
++}
++
++// Read into caller's buffer using the readahead buffer
+ static int
+ fp_read(CBORDecoderObject *self, char *buf, const Py_ssize_t size)
+ {
+-    PyObject *obj, *size_obj;
+-    char *data;
+-    int ret = -1;
+-
+-    size_obj = PyLong_FromSsize_t(size);
+-    if (size_obj) {
+-        obj = PyObject_CallFunctionObjArgs(self->read, size_obj, NULL);
+-        if (obj) {
+-            assert(PyBytes_CheckExact(obj));
+-            if (PyBytes_GET_SIZE(obj) == (Py_ssize_t) size) {
+-                data = PyBytes_AS_STRING(obj);
+-                memcpy(buf, data, size);
+-                ret = 0;
++    Py_ssize_t available, to_copy, remaining, total_copied;
++
++    remaining = size;
++    total_copied = 0;
++
++    while (remaining > 0) {
++        available = self->read_len - self->read_pos;
++
++        if (available > 0) {
++            // Copy from buffer
++            to_copy = (available < remaining) ? available : remaining;
++            memcpy(buf + total_copied, self->readahead + self->read_pos, to_copy);
++            self->read_pos += to_copy;
++            total_copied += to_copy;
++            remaining -= to_copy;
++        } else {
++            Py_ssize_t bytes_read;
++
++            if (remaining >= self->readahead_size) {
++                // Large remaining: read directly into destination, bypass buffer
++                bytes_read = fp_read_bytes(self, buf + total_copied, remaining);
++                if (bytes_read > 0) {
++                    total_copied += bytes_read;
++                    remaining -= bytes_read;
++                }
+             } else {
+-                PyErr_Format(
+-                    _CBOR2_CBORDecodeEOF,
+-                    "premature end of stream (expected to read %zd bytes, "
+-                    "got %zd instead)", size, PyBytes_GET_SIZE(obj));
++                // Small remaining: refill buffer
++                self->read_pos = 0;
++                self->read_len = 0;
++                bytes_read = fp_read_bytes(self, self->readahead, self->readahead_size);
++                if (bytes_read > 0)
++                    self->read_len = bytes_read;
++            }
++
++            if (bytes_read <= 0) {
++                if (bytes_read == 0)
++                    PyErr_Format(
++                        _CBOR2_CBORDecodeEOF,
++                        "premature end of stream (expected to read %zd bytes, "
++                        "got %zd instead)", size, total_copied);
++                return -1;
+             }
+-            Py_DECREF(obj);
+         }
+-        Py_DECREF(size_obj);
+     }
+-    return ret;
++
++    return 0;
+ }
+ 
++// Read and return as PyBytes object
++static PyObject *
++fp_read_object(CBORDecoderObject *self, const Py_ssize_t size)
++{
++    PyObject *ret = PyBytes_FromStringAndSize(NULL, size);
++    if (!ret)
++        return NULL;
++
++    if (fp_read(self, PyBytes_AS_STRING(ret), size) == -1) {
++        Py_DECREF(ret);
++        return NULL;
++    }
++
++    return ret;
++}
+ 
+ // CBORDecoder.read(self, length) -> bytes
+ static PyObject *
+@@ -1760,21 +1894,59 @@ static PyObject *
+ CBORDecoder_decode_from_bytes(CBORDecoderObject *self, PyObject *data)
+ {
+     PyObject *save_read, *buf, *ret = NULL;
++    bool is_nested = (self->decode_depth > 0);
++    Py_ssize_t save_read_pos = 0, save_read_len = 0;
++    char *save_buffer = NULL;
+ 
+     if (!_CBOR2_BytesIO && _CBOR2_init_BytesIO() == -1)
+         return NULL;
+ 
+-    save_read = self->read;
+     buf = PyObject_CallFunctionObjArgs(_CBOR2_BytesIO, data, NULL);
+-    if (buf) {
+-        self->read = PyObject_GetAttr(buf, _CBOR2_str_read);
+-        if (self->read) {
+-            ret = decode(self, DECODE_NORMAL);
+-            Py_DECREF(self->read);
++    if (!buf)
++        return NULL;
++
++    self->decode_depth++;
++    save_read = self->read;
++    Py_INCREF(save_read);  // Keep alive while we use a different read method
++    save_read_pos = self->read_pos;
++    save_read_len = self->read_len;
++
++    // Save buffer pointer if nested
++    if (is_nested) {
++        save_buffer = self->readahead;
++        self->readahead = NULL;  // Prevent setter from freeing saved buffer
++    }
++
++    // Set up BytesIO decoder - setter handles buffer allocation
++    if (_CBORDecoder_set_fp_with_read_size(self, buf, self->readahead_size) == -1) {
++        if (is_nested) {
++            PyMem_Free(self->readahead);
++            self->readahead = save_buffer;
+         }
++        Py_DECREF(save_read);
+         Py_DECREF(buf);
++        self->decode_depth--;
++        return NULL;
++    }
++
++    ret = decode(self, DECODE_NORMAL);
++
++    Py_XDECREF(self->read);  // Decrement BytesIO read method
++    self->read = save_read;  // Restore saved read (already has correct refcount)
++    Py_DECREF(buf);
++    self->decode_depth--;
++
++    if (is_nested) {
++        PyMem_Free(self->readahead);
++        self->readahead = save_buffer;
++    }
++    self->read_pos = save_read_pos;
++    self->read_len = save_read_len;
++
++    assert(self->decode_depth >= 0);
++    if (self->decode_depth == 0) {
++        clear_shareable_state(self);
+     }
+-    self->read = save_read;
+     return ret;
+ }
+ 
+@@ -1920,6 +2092,14 @@ PyDoc_STRVAR(CBORDecoder__doc__,
+ "    dictionary. This callback is invoked for each deserialized\n"
+ "    :class:`dict` object. The return value is substituted for the dict\n"
+ "    in the deserialized output.\n"
++":param read_size:\n"
++"    the size of the read buffer (default 4096). The decoder reads from\n"
++"    the stream in chunks of this size for performance. This means the\n"
++"    stream position may advance beyond the bytes actually decoded. For\n"
++"    large values (bytestrings, text strings), reads may be larger than\n"
++"    ``read_size``. Code that needs to read from the stream after\n"
++"    decoding should use :meth:`decode_from_bytes` instead, or set\n"
++"    ``read_size=1`` to disable buffering (at a performance cost).\n"
+ "\n"
+ ".. _CBOR: https://cbor.io/\n"
+ );
+diff --git a/source/decoder.h b/source/decoder.h
+index 6bb6d52..8a3393a 100644
+--- a/source/decoder.h
++++ b/source/decoder.h
+@@ -3,6 +3,9 @@
+ #include <stdbool.h>
+ #include <stdint.h>
+ 
++// Default readahead buffer size for streaming reads
++#define CBOR2_DEFAULT_READ_SIZE 4096
++
+ typedef struct {
+     PyObject_HEAD
+     PyObject *read;    // cached read() method of fp
+@@ -13,6 +16,12 @@ typedef struct {
+     PyObject *str_errors;
+     bool immutable;
+     Py_ssize_t shared_index;
++
++    // Readahead buffer for streaming
++    char *readahead;            // allocated buffer
++    Py_ssize_t readahead_size;  // size of allocated buffer
++    Py_ssize_t read_pos;        // current position in buffer
++    Py_ssize_t read_len;        // valid bytes in buffer
+ } CBORDecoderObject;
+ 
+ extern PyTypeObject CBORDecoderType;
+diff --git a/tests/test_decoder.py b/tests/test_decoder.py
+index d03e288..b153971 100644
+--- a/tests/test_decoder.py
++++ b/tests/test_decoder.py
+@@ -717,3 +717,89 @@ def test_decimal_payload_unpacking(impl, data, expected):
+     with pytest.raises(impl.CBORDecodeValueError) as exc_info:
+         impl.loads(unhexlify(data))
+     assert exc_info.value.args[0] == f"Incorrect tag {expected} payload"
++
++
++def test_decode_from_bytes_in_hook_preserves_buffer(impl):
++    """Test that calling decode_from_bytes from a hook preserves stream buffer state.
++
++    This is a documented use case from docs/customizing.rst where hooks decode
++    embedded CBOR data. Before the fix, the stream's readahead buffer would be
++    corrupted, causing subsequent reads to fail or return wrong data.
++    """
++
++    def tag_hook(decoder, tag):
++        if tag.tag == 999:
++            # Decode embedded CBOR (documented pattern)
++            return decoder.decode_from_bytes(tag.value)
++        return tag
++
++    # Test data: array with [tag(999, embedded_cbor), "after_hook", "final"]
++    # embedded_cbor encodes: [1, 2, 3]
++    data = unhexlify(
++        "83"  # array(3)
++        "d903e7"  # tag(999)
++        "44"  # bytes(4)
++        "83010203"  # embedded: array [1, 2, 3]
++        "6a"  # text(10)
++        "61667465725f686f6f6b"  # "after_hook"
++        "65"  # text(5)
++        "66696e616c"  # "final"
++    )
++
++    # Decode from stream (not bytes) to use readahead buffer
++    stream = BytesIO(data)
++    decoder = impl.CBORDecoder(stream, tag_hook=tag_hook)
++    result = decoder.decode()
++
++    # Verify all values decoded correctly
++    assert result == [[1, 2, 3], "after_hook", "final"]
++
++    # First element should be the decoded embedded CBOR
++    assert result[0] == [1, 2, 3]
++    # Second element should be "after_hook" (not corrupted)
++    assert result[1] == "after_hook"
++    # Third element should be "final"
++    assert result[2] == "final"
++
++
++def test_decode_from_bytes_deeply_nested_in_hook(impl):
++    """Test deeply nested decode_from_bytes calls preserve buffer state.
++
++    This tests tag(999, tag(888, tag(777, [1,2,3]))) where each tag value
++    is embedded CBOR that triggers the hook recursively.
++
++    Before the fix, even a single level would corrupt the buffer. With multiple
++    levels, the buffer would be completely corrupted, mixing data from different
++    BytesIO objects and the original stream.
++    """
++
++    def tag_hook(decoder, tag):
++        if tag.tag in [999, 888, 777]:
++            # Recursively decode embedded CBOR
++            return decoder.decode_from_bytes(tag.value)
++        return tag
++
++    # Test data: [tag(999, tag(888, tag(777, [1,2,3]))), "after", "final"]
++    # Each tag contains embedded CBOR
++    data = unhexlify(
++        "83"  # array(3)
++        "d903e7"  # tag(999)
++        "4c"  # bytes(12)
++        "d9037848d903094483010203"  # embedded: tag(888, tag(777, [1,2,3]))
++        "65"  # text(5)
++        "6166746572"  # "after"
++        "65"  # text(5)
++        "66696e616c"  # "final"
++    )
++
++    # Decode from stream to use readahead buffer
++    stream = BytesIO(data)
++    decoder = impl.CBORDecoder(stream, tag_hook=tag_hook)
++    result = decoder.decode()
++
++    # With the fix: all three levels of nesting work correctly
++    # Without the fix: buffer corruption at each level, test fails
++    assert result == [[1, 2, 3], "after", "final"]
++    assert result[0] == [1, 2, 3]
++    assert result[1] == "after"
++    assert result[2] == "final"
+-- 
+2.50.1
+
diff --git a/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb b/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
index bbdeca7adb..5aeb82b992 100644
--- a/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
+++ b/meta-python/recipes-devtools/python/python3-cbor2_5.4.2.bb
@@ -10,6 +10,7 @@  inherit pypi python_setuptools_build_meta ptest
 
 SRC_URI += " \
         file://run-ptest \
+        file://CVE-2025-68131.patch \
 "
 
 # not vulnerable yet, vulnerability was introduced in v5.6.0
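
Note on the read_size caveat documented in the patch above: the new docstring
says the stream position may advance beyond the bytes actually decoded. The
sketch below is a minimal pure-Python model of the buffered fp_read() loop,
meant only to illustrate that behaviour. ReadAheadReader and its attributes
are hypothetical names, not part of cbor2, and the real C code differs in
detail (error reporting, buffer allocation).

```python
from io import BytesIO


class ReadAheadReader:
    """Hypothetical model of the patch's buffered read loop (not cbor2 API)."""

    def __init__(self, fp, read_size=4096):
        self.fp = fp
        self.read_size = read_size
        self.buf = b""   # stands in for self->readahead
        self.pos = 0     # stands in for self->read_pos

    def read(self, size):
        out = bytearray()
        while len(out) < size:
            remaining = size - len(out)
            available = len(self.buf) - self.pos
            if available > 0:
                # Serve the request from the buffer first.
                take = min(available, remaining)
                out += self.buf[self.pos:self.pos + take]
                self.pos += take
            elif remaining >= self.read_size:
                # Large request: read directly, bypassing the buffer.
                chunk = self.fp.read(remaining)
                if not chunk:
                    raise EOFError("premature end of stream")
                out += chunk
            else:
                # Small request: refill the buffer with up to read_size bytes.
                self.buf = self.fp.read(self.read_size)
                self.pos = 0
                if not self.buf:
                    raise EOFError("premature end of stream")
        return bytes(out)


payload = b"\x01" + b"trailing data"

# Default buffering: asking for one byte pulls the whole short stream in.
stream = BytesIO(payload)
first = ReadAheadReader(stream).read(1)
pos_buffered = stream.tell()

# read_size=1: every request takes the direct path, so nothing is over-read.
stream_unbuffered = BytesIO(payload)
ReadAheadReader(stream_unbuffered, read_size=1).read(1)
pos_unbuffered = stream_unbuffered.tell()

print(first, pos_buffered, pos_unbuffered)
```

In this model, the buffered reader leaves the underlying stream at its end
after a one-byte read, while read_size=1 leaves it just past the byte that
was consumed. That is the trade-off the docstring describes.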