[1/1] overview-manual: initial documentation for hash equivalence

Message ID 20211215091631.1989314-2-michael.opdenacker@bootlin.com
State New
Series Initial documentation for hash equivalence

Commit Message

Michael Opdenacker Dec. 15, 2021, 9:16 a.m. UTC
Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
---
 documentation/overview-manual/concepts.rst | 67 ++++++++++++++++++++++
 1 file changed, 67 insertions(+)

Comments

Khem Raj Dec. 15, 2021, 4:40 p.m. UTC | #1
On 12/15/21 1:16 AM, Michael Opdenacker wrote:
> +-  Shared library updates, for example to fix a security vulnerability.
> +   For sure, the programs using such a library should be rebuilt, but
> +   their new binaries should remain identical. The corresponding tasks should
> +   have a different output hash because of the change in the hash of their
> +   library dependency, but thanks to their output being identical, hash
> +   equivalence will stop the propagation down the dependency chain.
> +

I wonder whether this case actually works. If it does, it would be good to
expand on what the limitations are, since not all shared libraries are built
the same way: some use symbol versioning and some don't, so it can quickly get murky.


Patch

diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index bfd54208af..873f43a66e 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
@@ -1938,6 +1938,73 @@ another reason why a task-based approach is preferred over a
 recipe-based approach, which would have to install the output from every
 task.
 
+Hash Equivalence
+----------------
+
+The previous section explained how BitBake skips the execution of tasks
+whose output can already be found in the Shared State Cache.
+
+During a build, the output of a task may well be unchanged despite
+changes in the task's input values, for example whitespace changes in some
+input C code. In project terms, this is what we call "equivalence". Each task
+is represented by a hash of its inputs, and two input task hashes are said to
+be equivalent if the hashes of the output they generate (as stored and
+restored by sstate) are identical.
+
+Once BitBake knows that two input hashes for a task have equivalent output,
+this has important and useful implications for all the tasks depending on it.
+
+Thanks to this equivalence, a change in one of the tasks in BitBake's run queue
+doesn't have to propagate to all the downstream tasks that depend on its
+output, which would otherwise trigger a rebuild of those tasks, then of the
+tasks depending on them, and so on. Instead, BitBake can safely retrieve the
+output of all the downstream tasks from the Shared State Cache.
+
+This applies to multiple scenarios:
+
+-  A "trivial" change to a recipe that doesn't impact its generated output,
+   such as whitespace changes, modifications to unused code paths, or
+   changes in the ordering of variables.
+
+-  Shared library updates, for example to fix a security vulnerability.
+   Of course, the programs using such a library should be rebuilt, but
+   their new binaries should remain identical. The corresponding tasks will
+   have a different input hash because of the change in the hash of their
+   library dependency, but since their output is identical, hash
+   equivalence will stop the propagation down the dependency chain.
+
+-  Native tool updates. Though the dependent tasks need to be rebuilt,
+   they are likely to generate the same output and to be marked
+   as equivalent.
+
+This mechanism is enabled by default in Poky, and is controlled by two
+variables:
+
+-  :term:`bitbake:BB_HASHSERVE`, specifying a local or remote hash
+   equivalence server to use.
+
+-  :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
+
+The default configuration in Poky therefore corresponds to the
+following settings::
+
+   BB_HASHSERVE = "auto"
+   BB_SIGNATURE_HANDLER = "OEEquivHash"
+
+Alternatively, a hash equivalence server can be shared over a network
+by setting::
+
+   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
+
+.. note::
+
+   The hash equivalence server needs to be maintained together with the
+   shared state cache. Otherwise, the server could report shared state hashes
+   that do not exist.
+
+   We therefore recommend that one hash equivalence server be set up to
+   correspond with a given shared state cache.
+
 Automatically Added Runtime Dependencies
 ========================================
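To make the mechanism described in the patch more concrete, here is a minimal Python sketch of how equivalent task hashes can stop a change from propagating downstream. It rests on simplifying assumptions: task output is modelled as an in-memory mapping of file names to contents, and `output_hash`, `task_hash`, `EquivalenceTable` and `build` are hypothetical names used only for illustration. They are not BitBake's API, and the real hash equivalence server and OpenEmbedded's output hashing differ in many details.

```python
import hashlib
from typing import Dict, List, Tuple


def output_hash(files: Dict[str, bytes]) -> str:
    """Hash a task's output, modelled as a mapping of file name -> contents.

    File names are sorted so that the result does not depend on the order
    in which the files were produced.
    """
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def task_hash(task: str, inputs: str, dep_unihashes: List[str]) -> str:
    """Hash a task's inputs, including the unified hashes of its dependencies."""
    h = hashlib.sha256()
    h.update(task.encode() + b"\0" + inputs.encode())
    for dep in dep_unihashes:
        h.update(b"\0" + dep.encode())
    return h.hexdigest()


class EquivalenceTable:
    """Hypothetical in-memory stand-in for a hash equivalence server."""

    def __init__(self) -> None:
        # (task name, output hash) -> unified hash
        self._by_output: Dict[Tuple[str, str], str] = {}

    def report(self, task: str, taskhash: str, outhash: str) -> str:
        """Record that `taskhash` produced `outhash`; return the unified hash.

        In this sketch, the first task hash reported for a given output
        becomes the unified hash, so any later task hash producing the same
        output maps to it.
        """
        return self._by_output.setdefault((task, outhash), taskhash)


def build(table: EquivalenceTable, lib_source: str, lib_binary: bytes) -> str:
    """Build a library, then an application depending on it.

    Returns the application's task hash.
    """
    lib_task = "libfoo:do_compile"
    lib_th = task_hash(lib_task, lib_source, [])
    lib_uni = table.report(lib_task, lib_th, output_hash({"libfoo.so": lib_binary}))
    # The downstream task hash uses the library's unified hash, not lib_th.
    return task_hash("app:do_compile", "app.c", [lib_uni])


if __name__ == "__main__":
    table = EquivalenceTable()
    first = build(table, "int f(){return 1;}", b"\x7fELF v1")
    # Whitespace-only change: new library task hash, identical output.
    second = build(table, "int f() { return 1; }", b"\x7fELF v1")
    # Real change: different output, so the change propagates.
    third = build(table, "int f(){return 2;}", b"\x7fELF v2")
    print(first == second)  # True
    print(first == third)   # False
```

Because `app:do_compile` is hashed from its dependency's unified hash rather than from the raw library task hash, the whitespace-only change leaves the application's task hash untouched, so its previous output could be reused from the shared state cache. Only the change that actually alters the library output propagates.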