Greetings,
I use the Intel compiler (ICC) inside MSVC with /QxHOST on Haswell (256-bit AVX).
My code uses the __m256 type for my own memcpy; ICC generates correct results and it works well.
But looking at the assembler output: is it really sufficient to unroll ONLY by 2?! This is what I have:
    #define PACKET_SIZE_MIN 128
    #define PACKET_SIZE_AVG 512
    #define PACKET_SIZE_MAX 2048
    ...
    #if defined(__INTEL_COMPILER)
    # pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
    #endif
    # pragma unroll
and the assembler output reads:
    .B1.8::                         ; Preds .B1.6 .B1.8
    L4::
                                    ; optimization report
                                    ; LOOP WAS UNROLLED BY 2
                                    ; %s was not vectorized: operation cannot be vectorized
    $LN15:
      00022 48 ff c1                inc rcx                                   ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
    $LN16:
      00025 c5 fe 6f 04 10          vmovdqu ymm0, YMMWORD PTR [rax+rdx]       ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
    $LN17:
      0002a c5 fe 6f 4c 10 20       vmovdqu ymm1, YMMWORD PTR [32+rax+rdx]    ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
    $LN18:
      00030 c4 a1 7e 7f 04 08       vmovdqu YMMWORD PTR [rax+r9], ymm0        ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
    $LN19:
      00036 c4 a1 7e 7f 4c 08 20    vmovdqu YMMWORD PTR [32+rax+r9], ymm1     ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
    $LN20:
      0003d 48 83 c0 40             add rax, 64                               ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
    $LN21:
      00041 49 3b c8                cmp rcx, r8                               ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
    $LN22:
      00044 72 dc                   jb .B1.8 ; Prob 63%                       ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
    $LN23:
                                    ; LOE rax rdx rcx rbx rbp rsi rdi r8 r9 r10 r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
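For context, the loop behind a listing like this is a plain copy of 32 bytes per iteration. Here is a minimal sketch of that shape, not the actual code from my_frame.h. The function name copy_blocks_avx and the helper macro AVX_TARGET are hypothetical, and the use of _mm256_loadu_si256 / _mm256_storeu_si256 is only a guess based on the vmovdqu instructions above. The pragma uses the loop_count spelling from Intel's documentation and is guarded so that other compilers simply skip it.

```c
#include <immintrin.h>
#include <stddef.h>

#define PACKET_SIZE_MIN 128
#define PACKET_SIZE_AVG 512
#define PACKET_SIZE_MAX 2048

/* Hypothetical helper: on GCC/Clang, enable AVX for this one function so the
   file builds without a command-line AVX switch. ICC gets AVX via /QxHOST. */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#  define AVX_TARGET __attribute__((target("avx")))
#else
#  define AVX_TARGET
#endif

/* Assumed shape of the copy loop: moves `blocks` unaligned 32-byte blocks
   from src to dst, one YMM load and one YMM store per iteration. */
AVX_TARGET
static void copy_blocks_avx(void *dst, const void *src, size_t blocks)
{
    const __m256i *s = (const __m256i *)src;
    __m256i *d = (__m256i *)dst;
    size_t i;

#if defined(__INTEL_COMPILER)
    /* Trip-count hints plus an unroll request with no explicit factor,
       so the compiler chooses the factor itself. */
#  pragma loop_count min(PACKET_SIZE_MIN), avg(PACKET_SIZE_AVG), max(PACKET_SIZE_MAX)
#  pragma unroll
#endif
    for (i = 0; i < blocks; ++i) {
        _mm256_storeu_si256(d + i, _mm256_loadu_si256(s + i));
    }
}
```

With the unroll factor left to the compiler, the two loads and two stores per pass in the listing correspond to two iterations of this loop.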
PS: I need to #undef "min" and "max" before the pragma, because the headers under MSVC define these names as macros, which clashes with the min(...) and max(...) keywords of the pragma.
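One way to handle the clash without losing the macros for the rest of the file is to save and restore them around the loop. This is a minimal sketch, assuming the compiler accepts the push_macro / pop_macro pragmas (MSVC, GCC and Clang do; I assume ICC does as well). The min/max definitions at the top are stand-ins for what the Windows headers provide, and sum_first and smaller are hypothetical names used only for illustration.

```c
/* Stand-ins for the function-like macros the Windows headers define
   (unless NOMINMAX is set), so this sketch is self-contained. */
#ifndef min
#  define min(a, b) (((a) < (b)) ? (a) : (b))
#endif
#ifndef max
#  define max(a, b) (((a) > (b)) ? (a) : (b))
#endif

#define PACKET_SIZE_MIN 128
#define PACKET_SIZE_AVG 512
#define PACKET_SIZE_MAX 2048

/* Hypothetical loop carrying the trip-count hint. */
static long sum_first(const int *v, unsigned n)
{
    long total = 0;
    unsigned i;

/* Save the macros, then remove them so the pragma sees plain min/max. */
#pragma push_macro("min")
#pragma push_macro("max")
#undef min
#undef max
#if defined(__INTEL_COMPILER)
#  pragma loop_count min(PACKET_SIZE_MIN), avg(PACKET_SIZE_AVG), max(PACKET_SIZE_MAX)
#endif
    for (i = 0; i < n; ++i) {
        total += v[i];
    }
/* Restore the macros for the code that follows. */
#pragma pop_macro("min")
#pragma pop_macro("max")

    return total;
}

/* The min macro is usable again after the pop. */
static int smaller(int a, int b)
{
    return min(a, b);
}
```

A plain #undef works too; the push/pop pair only matters if later code in the same file still relies on the macros.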
TIA, best