
For example, the fused multiply-add instruction increases the performance and accuracy of dense linear algebra, the collision detection instruction is suited to binning operations in statistical calculations, and bit-masked instructions are designed for handling branches in vector calculations.
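The accuracy benefit of fused multiply-add can be illustrated with a minimal sketch in standard C++. It is not taken from the benchmarks in this paper: the helper names `mul_then_add` and `fused_mul_add` are hypothetical, and `std::fma` from `<cmath>` stands in for the hardware instruction (the compiler may or may not map it to one, depending on the target flags). The point is only that the fused form rounds once, while a separate multiply and add rounds twice.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round the product to double, then add.
// The volatile temporary forces the product to be rounded and stored,
// so the compiler cannot contract the two steps into a single FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding happens here
}

// Hypothetical helper: computes a*b + c with a single rounding,
// as specified for std::fma in the C++ standard library.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27`, and `c = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0` in double precision. Under IEEE 754 double arithmetic, `mul_then_add` therefore returns `0`, while `fused_mul_add` keeps the residual `-2^-54`. In dense linear algebra, where such products are accumulated many times, avoiding the intermediate rounding is what yields the accuracy gain described above.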

Modern vector extensions to the x86-64 architecture, such as AVX2 and AVX-512, have instructions developed to handle common computational kernels. Modern x86-64 CPUs are highly complex CISC architecture machines.

The tests are performed on an Intel Xeon Platinum processor featuring the Skylake architecture with AVX-512 vector instructions. In addition to measuring the performance, we interpret the results by examining the assembly instructions produced by each compiler.


This paper reports a performance-based comparison of six state-of-the-art C/C++ compilers: AOCC, Clang, G++, Intel C++ compiler, PGC++, and Zapcc.
