Why do patches appear to compose?

Let a patch system be a set of states, a set of patches, a total diff function -, and a partial patch function +. Suppose our code is in its initial state, A. We apply some notional change to A to get B, and apply some other notional change to A to get C. Composition is this: we want to get the state that has both notional changes applied to it. So we take a diff, B - A, and apply it to C: C + (B - A). The composability property of a patch system refers to its capacity to ensure that C + (B - A) is what we wanted--a state which is like A, but has the notional changes of both B and C with respect to it.

The more _complete_ a composability property is, the more cases there are in which + is defined (i.e., there are no 'merge conflicts', or they can be resolved automatically). The more _sound_ a composability property is, the more cases there are in which, if + is well-defined, then its result is actually what we want. There are many trivially complete patch systems (e.g. A + P = A), and a trivially sound one (+ is never defined), but neither of these is useful, and we can never be totally sound and complete (because some changes are incommensurable). We could try to formalise these, but I'm treating them mostly as fuzzy, informal properties (especially soundness, which is the more interesting one), not least because I haven't formally defined what 'what we want' means.

Obviously, existing patch systems (such as diff(1)/patch(1)) are neither sound nor complete. A simple demonstration of unsoundness, which is very easy to reproduce in every patch system I know of: suppose that A contained some behaviour which was replicated across two subroutines; B changed the behaviour in both places where it appeared; and C added two more subroutines in which the behaviour was also supposed to be duplicated. The desired result is that C + (B - A) should have the new behaviour replicated across all four subroutines, but existing patch systems will report no conflicts while producing a final state in which the first two subroutines have the new behaviour and the second two have the old behaviour.

Nevertheless, in practice, patches _do_ very frequently appear to compose. This is important because our software development practices tend to assume some degree of composability--software is not developed in lockstep.

I have the following questions:

- Why are existing patch systems so often sound?
  - Is it a product of the way we tend to write code?
    - (Like how inline caching exploits the fact that, even in heavily dynamic and polymorphic languages, most code is fairly monomorphic and lower-order.)
  - Similarly, even though code is not per se developed in lockstep, because ultra-fine-grained coordination is difficult, interfaces tend to be more-or-less developed in lockstep and isolation, with inter-interface changes happening more rarely, and either in lockstep as well or otherwise with explicit coordination between the parties involved.
- In what situations are patch systems unsound?
  - Can we exploit unsoundness in order to smuggle vulnerabilities past open-source maintainers into open-source projects?
    - Look for 'unsoundness gadgets'--which probably generally require some sort of redundancy?
    - Contribute correct code which leaves itself vulnerable to patch composition problems in the future, when other people try to contribute?
      - (Like the warning trick--todo: explain this, since I haven't been able to find a link to the original.)
      - In this case we are trying to create unsoundness gadgets, or the opportunity for them, so probably we just want redundancy => poorly designed interfaces in general?
- How can we increase the soundness and completeness of our patch systems? Some obvious but important observations:
  - Increasing soundness at the expense of completeness is probably good, but not enough on its own--the patch system has to give good feedback about what the problems actually are when it rejects a patch. In the above example, where a behavioural change was supposed to be replicated across two newly-added subroutines, if the patch function only flags one of them, that is probably better than extant patch systems (and might cause a human auditor to notice that the other also needs fixing), but we would like to do better.
  - A maintainer audits a patch; they should ideally have an understanding of the code changed by the patch and of why the changes made by the patch do what they purport to. Put another way: forget C and C + (B - A); B itself could be buggy. Writing and reading a set of code changes are different, but they aren't _that_ different. So the difference between composability soundness problems and straight-up ... bugs seems very subtle.
    - (E.g. in the example, somebody producing B could have erroneously made their change to only one of the attendant subroutines, forgetting the other.)
    - That's a good thing, because it means that, ideally, we can use the same tools to help deal with composability soundness bugs and other sorts of bugs. But I think it is worth giving some special attention to patch composability, because [oh. apparently I never finished writing this sentence. I have no idea what I was going to say.]

Idempotence (real idempotence, not fake idempotence) seems like a useful property. E.g., if the source code contains a set S = {X, Y}, and one patch says 'add Z' and another patch says 'remove Y and add Z', it's not difficult to see that, after applying both patches, the result should be S = {X, Z}; existing patch systems will reject this, but one with semantic knowledge of what sets are would be able to produce the desired result, because both adding and removing a given element from a set are idempotent operations. (On the other hand, if one patch says 'add X' and another says 'remove X', there is a conflict.) (There's a toy sketch of this below.)

Similarly, suppose we have a formal specification. If two patches contain only interface additions and removals, then it is highly likely that composing them is sound; but if two patches both _change_ the same component of an interface, then it is not unlikely that there were some unwanted interactions (of course, we have to decide what counts as 'same'). At the same time, having both a specification and an implementation provides valuable redundancy, because interfaces have to be both specified and implemented, and the specification and the implementation constrain behaviour in the same ways; if we can find a way to apply both patches such that the specification and the implementation agree--i.e., there are no type errors--then it is more likely that the result is correct. (And if there are type errors, then of course the patch should be rejected, and the type errors are then likely the feedback we are interested in.) (Of course, one putative advantage of such type systems is that definitions can be supplied automatically from the specification, removing this redundancy--program synthesis.) Most people are not willing to formally specify their code.
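As a concrete aside, here is a toy sketch of the set example in Python. Everything in it (SetPatch, merge, and so on) is made up for illustration, not taken from any existing tool; the point is just that a merge with semantic knowledge of sets can exploit idempotence where a line-based merge cannot.

```python
# Toy patch system whose states are sets: a patch records which elements
# it adds and which it removes.  All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class SetPatch:
    adds: frozenset
    removes: frozenset


def diff(new: frozenset, old: frozenset) -> SetPatch:
    """B - A: total diff between two set states."""
    return SetPatch(adds=frozenset(new - old), removes=frozenset(old - new))


def apply_patch(state: frozenset, patch: SetPatch) -> SetPatch:
    """C + P: apply a patch to a state."""
    return (state - patch.removes) | patch.adds


def merge(p: SetPatch, q: SetPatch) -> SetPatch:
    """Compose two patches made against the same base state.

    Adding or removing a given element is idempotent, so both patches
    adding Z (or both removing Y) is fine; the only conflict is one patch
    adding an element that the other removes.
    """
    conflicts = (p.adds & q.removes) | (q.adds & p.removes)
    if conflicts:
        raise ValueError(f"conflict on {set(conflicts)}")
    return SetPatch(adds=p.adds | q.adds, removes=p.removes | q.removes)


A = frozenset({"X", "Y"})
B = frozenset({"X", "Y", "Z"})  # 'add Z'
C = frozenset({"X", "Z"})       # 'remove Y and add Z'

# The desired result {X, Z}, with no conflict reported:
print(apply_patch(A, merge(diff(B, A), diff(C, A))) == frozenset({"X", "Z"}))  # True

# Whereas explicitly saying 'add X' in one patch and 'remove X' in the other
# is a genuine conflict:
try:
    merge(SetPatch(adds=frozenset({"X"}), removes=frozenset()),
          SetPatch(adds=frozenset(), removes=frozenset({"X"})))
except ValueError as exc:
    print(exc)  # conflict on {'X'}
```

A purely textual patch system looking at the serialised form of S would see two overlapping edits and reject them; the semantic one composes them, while still surfacing the genuinely conflicting add/remove pair.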
But perhaps we can substitute strong static analyses for full formal specifications. Extremely fine-grained incrementality and also modularity are critically important and overlooked features for optimising compilers (half-written-about elsewhere). In a very rough and general sense, this means the following: some properties will hold for a component Q, and another component P which uses Q will assume that those properties hold. When we make a change to the codebase, we check whether it maintains the assumptions we made about Q; if it does not, then we must revise our understanding of P. Now, observe: in general, we want the assumptions that we make about Q to be as general as possible while still being useful. If we assume more than we have to, then changes to Q are more likely to invalidate those assumptions, and we will needlessly end up re-analysing P.

Thinking back to the stronger specification, and the 'way we tend to write code' note, it seems likely that some components will naturally have to make a lot of assumptions about each other, while others will not have to make so many assumptions about each other's internal implementation details. If we have a clump of components that make a lot of assumptions about each other, and are only weakly connected to other components, then perhaps this clump 'implements an interface', and we can be relaxed about changes internal to the clump, while being more aggressive about flagging changes that extend outside of it. ('Modular analyses' (Cousot et Cousot, of course) is the paper everyone cites for modular analysis, but it is mostly obvious and very boring and has far too many Greek letters.) The existence of strong abstraction boundaries as a language-level feature is significant and can be exploited (insert Nock rant), but is not enough--not least because abstractions can--and, on occasion, should--be low-level.

Another way to define a patch system is that a patch is always against a particular state, and can only be applied to that state--for instance, B - A can only be applied to A--and we have a transfer function (call it -> or something); so we cannot say C + (B - A), but must say C + ((B - A) -> C). (Or maybe it's C + ((B - A) -A-> C) or C + ((B - A) -(C - A)-> C) or something; whatever.) This seems to afford some separation of concerns, but I'm not sure of the implications. There is a question of what should actually happen when a patch fails to apply. Perhaps, rather than making + partial, we can make it total, but say that it sometimes produces an 'invalid' state (whatever that means). There is likely inspiration to be taken from Hazel.

Another hint comes from commutativity: it seems like it would be nice to guarantee A'' = (A + (A'' - A')) + (A' - A), even if A + (A'' - A') is invalid (because A'' depends on A'). That suggests that maybe, if B and C conflict, then merging them involves creating a patch P such that our result is (C + (B - A)) + P, where P 'could have gone' before B - A. But perhaps something more general is needed, since in the former case we had A' as an ostensibly coherent state, whereas there might not even be a sensibly coherent C + P.
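To pin down the shape of that alternative definition, here is a rough Python sketch (all names are mine, and purely illustrative): a patch is against a particular state, application is only legal on that state, and transfer does the rebasing. The commutativity wish is then written out as a property one could check against a concrete instance--with transfers inserted wherever the bases would otherwise not line up, which is only one possible reading of the formula.

```python
# Hypothetical sketch of the 'patch against a particular state' formulation:
# a patch can only be applied to the state it was made against, and a
# transfer function (the -> above) re-expresses it against a new base.
from typing import Protocol, TypeVar

State = TypeVar("State")
Patch = TypeVar("Patch")


class PatchSystem(Protocol[State, Patch]):
    def diff(self, new: State, old: State) -> Patch:
        """B - A.  Total; the resulting patch is against old."""
        ...

    def apply(self, state: State, patch: Patch) -> State:
        """C + P.  Only legal when patch is against state--or, in the
        total variant, it may return an 'invalid' state instead of failing."""
        ...

    def transfer(self, patch: Patch, target: State) -> Patch:
        """(B - A) -> C: re-express a patch against target, so that
        C + ((B - A) -> C) is well-formed.  All of the interesting
        soundness questions live here."""
        ...


def commutativity_holds(ps: PatchSystem[State, Patch],
                        a: State, a1: State, a2: State) -> bool:
    """One reading of the desideratum A'' = (A + (A'' - A')) + (A' - A),
    with transfers inserted so that each application is well-formed."""
    step1 = ps.apply(a, ps.transfer(ps.diff(a2, a1), a))         # A + (A'' - A')
    step2 = ps.apply(step1, ps.transfer(ps.diff(a1, a), step1))  # ... + (A' - A)
    return step2 == a2
```

Whether a given system can satisfy this, and what an 'invalid' intermediate state means for the final equality, is exactly the open question above.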