Greetings,
I use ICC under MSVC with /QxHOST on Haswell (256-bit AVX).
I have code that uses the __m256 type for my own memcpy; ICC generates the correct result, and it works well.
But when I look at the assembler output, I wonder: is it really sufficient to unroll ONLY by 2?! I have:
#define PACKET_SIZE_MIN 128
#define PACKET_SIZE_AVG 512
#define PACKET_SIZE_MAX 2048
...
#if defined(__INTEL_COMPILER)
# pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
#endif
# pragma unroll
and the assembler output reads:
.B1.8:: ; Preds .B1.6 .B1.8
L4:: ; optimization report
; LOOP WAS UNROLLED BY 2
; %s was not vectorized: operation cannot be vectorized
$LN15:
00022 48 ff c1 inc rcx ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN16:
00025 c5 fe 6f 04 10 vmovdqu ymm0, YMMWORD PTR [rax+rdx] ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN17:
0002a c5 fe 6f 4c 10
20 vmovdqu ymm1, YMMWORD PTR [32+rax+rdx] ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN18:
00030 c4 a1 7e 7f 04
08 vmovdqu YMMWORD PTR [rax+r9], ymm0 ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN19:
00036 c4 a1 7e 7f 4c
08 20 vmovdqu YMMWORD PTR [32+rax+r9], ymm1 ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN20:
0003d 48 83 c0 40 add rax, 64 ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN21:
00041 49 3b c8 cmp rcx, r8 ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN22:
00044 72 dc jb .B1.8 ; Prob 63% ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN23:
; LOE rax rdx rcx rbx rbp rsi rdi r8 r9 r10 r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
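For reference, a minimal sketch of the kind of loop that could produce a listing like the one above. The function name copy_blocks32 and its signature are hypothetical, since the actual code in my_frame.h is not shown here. It also shows the explicit-factor form of the Intel pragma, #pragma unroll(4), as one thing to try; whether a larger factor actually helps on this loop is an open question, not something the listing answers. The memcpy fallback is only there so the sketch builds when AVX is not enabled.

```c
#include <stddef.h>
#include <string.h>
#if defined(__AVX__)
#  include <immintrin.h>
#endif

/* Hypothetical stand-in for the copy loop: copies n_blocks * 32 bytes
 * from src to dst using unaligned 256-bit loads and stores. */
static void copy_blocks32(void *dst, const void *src, size_t n_blocks)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i;

    /* Explicit unroll factor for the Intel compiler. A plain
     * "#pragma unroll" leaves the factor to the compiler's heuristics. */
#if defined(__INTEL_COMPILER)
#  pragma unroll(4)
#endif
    for (i = 0; i < n_blocks; ++i) {
#if defined(__AVX__)
        /* One 32-byte block per iteration (vmovdqu load, vmovdqu store). */
        __m256i v = _mm256_loadu_si256((const __m256i *)(s + 32 * i));
        _mm256_storeu_si256((__m256i *)(d + 32 * i), v);
#else
        /* Portable fallback when AVX code generation is not enabled. */
        memcpy(d + 32 * i, s + 32 * i, 32);
#endif
    }
}
```

With the unroll-by-2 shown in the listing, each pass through the loop body moves 64 bytes (two loads, two stores, add rax, 64).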
PS: I need to "#undef" both "min" and "max" first, because the MSVC headers already define these symbols as macros, which gets in the way of the min(...) and max(...) in the pragma...
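A minimal sketch of one way to do that "#undef" without losing the macros for the rest of the file, assuming the clash really is with function-like min/max macros. It uses #pragma push_macro / pop_macro, which MSVC, ICC, GCC and Clang all accept. The two #define lines are only a stand-in for what the headers would provide, and sum_bytes is a hypothetical placeholder loop; the pragma spelling is copied from the snippet above.

```c
#include <stddef.h>

/* Stand-in for the min/max macros that the headers would define. */
#ifndef min
#  define min(a, b) (((a) < (b)) ? (a) : (b))
#endif
#ifndef max
#  define max(a, b) (((a) > (b)) ? (a) : (b))
#endif

/* Save the current definitions, then remove them for this region only. */
#pragma push_macro("min")
#pragma push_macro("max")
#undef min
#undef max

/* Hypothetical placeholder loop carrying the loop-count hint. */
static unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned total = 0;
    size_t i;
#if defined(__INTEL_COMPILER)
#  pragma loop count min(128) avg(512) max(2048)
#endif
    for (i = 0; i < n; ++i)
        total += p[i];
    return total;
}

/* Restore the saved definitions so later code can still use min/max. */
#pragma pop_macro("max")
#pragma pop_macro("min")
```

After the pop_macro lines, min(a, b) and max(a, b) expand as they did before the region.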
TIA, best