Unfortunately many compilers simply aren't up to the task of offering the efficiency advantages. Writing code with abstraction instead results in extra layers of function calling that slow down execution and, more importantly, separate blocks of code which would be optimised away if only they were brought together.
What follows is a short investigation of the compilation of a short piece of code that arose as a result of some real development.
Here is the function under consideration:
(See below for links to the full code.) operator*() has been overriden to provide the usual matrix/vector multiply operation and operator[]() accesses the elements of a vector or matrix in the way you might expect. A quick computation shows that this routine in fact returns 5*z. But because of these overrides the above code implicitly contains many function calls. For example there are 22 calls to operator[](). As a result it is a highly non-trivial task for the compiler to unpack all of these function calls, and perform the required algebra, to spot that the same result can be found with a single multiplication.
int test1(int z) { Matrix<Static<int,3,3> > m; Vector<Static<int,3> > x; m[0][0] = 1; m[0][1] = 0; m[0][2] = 0; m[1][0] = 0; m[1][1] = z; m[1][2] = 0; m[2][0] = 0; m[2][1] = z; m[2][2] = z; x[0] = 1; x[1] = 2; x[2] = 3; return (m*x)[2]; }
That's about 150 instrcutions. But actually it's much worse than it looks. Notice that it's actually calling a separate matrix times vector routine that hasn't been inlined. And that is where the actual guts of the code lie. What you've seen above is merely the code to set up a call to this routine. It's astonishingly bad.
g++ -O5 -S -o main.S main.cpp pushl %ebp movl $1, %eax movl %esp, %ebp pushl %esi leal -56(%ebp), %ecx pushl %ebx subl $240, %esp ... 120 lines deleted... movl %edx, 4(%esp) call __ZmlI6StaticIiLi3ELi3EES0_IiLi3ELi0EEE6VectorI17TimesMatrixVectorIT_T0_EERK6MatrixIS5_ERKS3_IS6_E movl %eax, -224(%ebp) movl %edx, -220(%ebp) movl -224(%ebp), %edx movl -220(%ebp), %ecx addl $24, %edx .p2align 4,,15 L56: movl (%ecx), %eax addl $4, %ecx imull (%edx), %eax addl $4, %edx addl %eax, %esi decl %ebx jns L56 addl $240, %esp movl %esi, %eax popl %ebx popl %esi popl %ebp ret
We can give gcc some clues by asking it to inline functions and unroll all loops. This trims about 5% off the length of this function.
Note that the core of this routine is the single lea instruction that performs an integer multiply by five.
cl /GX /Ox /Fa main.cpp mov eax, DWORD PTR _z$[esp-4] lea eax, DWORD PTR [eax+eax*4] ret 0