An ABI, or Application Binary Interface, defines a way for binaries to talk to each other on a given platform, and it includes (among other things) a calling convention. Most (all?) ABIs have a design flaw that harms performance.
Let’s start by looking at the System V ABI, for x86-style CPUs. The ABI classifies function arguments into a number of different categories; we’re just going to consider two:
INTEGER: This class consists of integral types that fit into one of the general purpose registers
MEMORY: This class consists of types that will be passed and returned in memory via the stack
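I won’t go over the classification rules in detail; suffice it to say that, in a general sense, scalars and small structures (up to 16 bytes on x86-64) are classified as INTEGER and passed in registers, while larger structures are classified as MEMORY and copied onto the stack.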
In other words, passing large structures by value entails large copies and This Makes Me Sad.
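To make the cost concrete, here is a minimal sketch in C. The struct big type and both function names are made up for illustration. Because the callee is entitled to scribble on its argument, the caller has to hand it a full copy of all eight longs, whether or not the callee ever writes to them.

```c
#include <stddef.h>

/* Hypothetical structure, far too large to be classified INTEGER. */
struct big {
    long v[8];
};

/* The callee receives its own copy and is free to modify it. */
long sum_and_clobber(struct big b)
{
    long total = 0;
    for (size_t i = 0; i < 8; i++) {
        total += b.v[i];
        b.v[i] = 0; /* touches only the callee's copy */
    }
    return total;
}

/* Returns 1 if the caller's object is unchanged after the call. */
int caller_object_survives(void)
{
    struct big b = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    long total = sum_and_clobber(b); /* the whole struct is copied for this call */
    return total == 36 && b.v[0] == 1 && b.v[7] == 8;
}
```

The copy is what lets b stay intact in the caller, but it is paid for at every call site, including calls to functions that only ever read their argument.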
Well, what’s wrong with that? Surely we can just do what we did in the days of dumb compilers and pass structures by pointer. Unfortunately, that doesn’t work anymore; compilers are smart now, and they don’t like it when objects alias.
    void foo(int*);
    void bar(void);

    int baz(void) {
        int x = 5;
        foo(&x);  // for all we know, foo could have stored &x in a global variable
        x = 7;
        bar();    // through which bar could modify x
        return x; // meaning that this needs to turn into an actual load; it cannot be constant-folded
        // (This would not have happened if x had been passed by value, but this is not viable for large structs, as we have seen.)
    }
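For the sceptical, here is a runnable sketch of the worst case the compiler has to plan for. The bodies of foo and bar, the global escaped, and the wrapper function baz are assumptions for illustration; the point is only that definitions like these are perfectly legal, so a compiler that cannot see them must assume they exist.

```c
#include <stddef.h>

static int *escaped = NULL; /* hypothetical global through which the address leaks */

/* foo keeps the address of its argument around. */
void foo(int *p)
{
    escaped = p;
}

/* bar later writes through the saved address. */
void bar(void)
{
    if (escaped != NULL)
        *escaped = 42;
}

int baz(void)
{
    int x = 5;
    foo(&x);
    x = 7;
    bar();
    return x; /* must be reloaded from memory: the value is 42, not 7 */
}
```

Note that escaped dangles once baz returns, so calling bar again afterwards would be a bug; the sketch avoids doing so.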
Restrict to the rescue! If foo’s parameter had been annotated with restrict, foo would not be allowed to alias it (C11 §6.7.3.1p4,11). Unfortunately, compilers do not seem to generally be aware of this fact. More to the point, because there is no type-level enforcement of restrict in C, the veracity of the attribute cannot be counted upon in a general sense, even though it can be correct in cases when C’s ABI is used to communicate between languages with stronger type systems.
And really, the ABI should do the right thing by default. void foo(struct bla) is much easier to read than void foo(const struct bla *restrict), not to mention it does a better job of conveying intent and actually provides a stronger semantic guarantee.
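For comparison, here is roughly what the restrict-annotated version looks like in practice. This is a sketch only: the layout of struct bla, the counter calls, and both function names are assumptions. The comment describes what the restrict contract promises, not what any particular compiler will actually do with that promise.

```c
/* Hypothetical layout; the name struct bla comes from the text above. */
struct bla {
    long v[8];
};

static long calls; /* unrelated global state */

long first_plus_last(const struct bla *restrict p)
{
    long first = p->v[0];
    calls++; /* by the restrict contract, this write does not touch *p */
    return first + p->v[7];
}

long restrict_demo(void)
{
    struct bla b = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    return first_plus_last(&b); /* 1 + 8 */
}
```

Nothing stops a caller from breaking the promise, which is exactly the enforcement problem: the annotation is a claim made by the programmer, not something the type system checks.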
Well, that’s System V. How do other ABIs fare? Microsoft’s is similar, but it passes structs with a pointer:
Structs or unions of [not small] sizes are passed as a pointer to memory allocated by the caller.
This gains you a bit of flexibility (though it also probably confuses the memory renamer a bit), but it doesn’t solve the actual problem. The ‘memory allocated by the caller’ is owned by the callee, who can modify it at will, so the caller still needs to spuriously copy.
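Written out by hand, the lowering looks something like the following sketch. The type and function names are hypothetical, and the explicit memcpy stands in for the copy a compiler would emit at the call site.

```c
#include <string.h>

/* Hypothetical structure, too large to pass in registers. */
struct big {
    long v[8];
};

/* The callee after lowering: it owns *b and may modify it at will. */
long sum_and_clobber_lowered(struct big *b)
{
    long total = 0;
    for (int i = 0; i < 8; i++) {
        total += b->v[i];
        b->v[i] = 0;
    }
    return total;
}

/* What every caller must do to preserve by-value semantics. */
long call_by_value(const struct big *original)
{
    struct big tmp;
    memcpy(&tmp, original, sizeof tmp); /* the spurious copy */
    return sum_and_clobber_lowered(&tmp);
}

/* Returns 1 if the result is right and the caller's object is untouched. */
int lowered_demo(void)
{
    struct big b = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    long total = call_by_value(&b);
    return total == 36 && b.v[0] == 1;
}
```

The pointer saves nothing here: the copy has only moved from the argument area into a temporary.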
More ABIs! ARM (sorry, AAA arch 64):
If the argument type is a Composite Type that is larger than 16 bytes, then the argument is copied to memory allocated by the caller and the argument is replaced by a pointer to the copy.
RISC-V:
Aggregates larger than 2×XLEN bits [side note: why the hell are you talking about bits?] are passed by reference and are replaced in the argument list with the address
Arguments passed by reference may be modified by the callee.
PowerPC:
All [non-homogeneous] aggregates are passed in consecutive GPRs, in GPRs and in memory, or in memory
MIPS n32:
Structs, unions, or other composite types are treated as a sequence of doublewords, and are passed in integer or floating point registers as though they were simple scalar parameters to the extent that they fit, with any excess on the stack packed according to the normal memory layout of the object
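All of these are repetitions of the same two mistakes: either the structure is copied onto the stack, or it is passed by pointer to a copy that the callee owns.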
A correctly-specified ABI should pass large structures by immutable reference, usually obviating the copy. In the event that a copy is needed, it will generally happen only once, in the callee, rather than needing to be repeated by every caller. The callee also has more flexibility, and can copy only those portions of the structure that are actually modified.
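Emulated by hand in C, the proposed convention might look like the sketch below. All names are hypothetical, and a const pointer is only a stand-in: C does not guarantee that the pointee is immutable, which is the guarantee the ABI itself would have to provide.

```c
/* Hypothetical structure, too large to pass in registers. */
struct big {
    long v[8];
};

/* Read-only callee: nobody copies anything. */
long sum_ref(const struct big *b)
{
    long total = 0;
    for (int i = 0; i < 8; i++)
        total += b->v[i];
    return total;
}

/* Callee that wants to change one field: it copies just that field. */
long sum_with_first_doubled(const struct big *b)
{
    long first = b->v[0]; /* private copy of the only part that changes */
    first *= 2;
    return first + sum_ref(b) - b->v[0];
}

/* Returns 1 if both results are right and the caller's object is untouched. */
int immutable_ref_demo(void)
{
    struct big b = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    return sum_ref(&b) == 36
        && sum_with_first_doubled(&b) == 37
        && b.v[0] == 1;
}
```

Neither call site copies the structure, and the one callee that needs a mutable value pays for a single long rather than all eight.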
Take heed, future ABI-makers, lest the angel of neuralgia take you!