Channel: Intel® C++ Compiler

loop was unrolled by 2: is it sufficient?


Greetings,

I use the Intel C++ compiler (ICC) inside MSVC, with /QxHOST, on Haswell (256-bit AVX).

I have code that uses the __m256 type for my own memcpy. ICC generates correct code, and it works well.

But looking at the assembler output, I wonder: is unrolling by ONLY 2 sufficient, given that I have:

#define PACKET_SIZE_MIN             128
#define PACKET_SIZE_AVG             512
#define PACKET_SIZE_MAX             2048

...

#if defined(__INTEL_COMPILER)
#   pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
#endif
#   pragma unroll

and the assembler output reads:

.B1.8::                         ; Preds .B1.6 .B1.8
L4::            ; optimization report
                ; LOOP WAS UNROLLED BY 2
                ; %s was not vectorized: operation cannot be vectorized
$LN15:
  00022 48 ff c1         inc rcx                                ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN16:
  00025 c5 fe 6f 04 10   vmovdqu ymm0, YMMWORD PTR [rax+rdx]    ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN17:
  0002a c5 fe 6f 4c 10
        20               vmovdqu ymm1, YMMWORD PTR [32+rax+rdx] ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.14
$LN18:
  00030 c4 a1 7e 7f 04
        08               vmovdqu YMMWORD PTR [rax+r9], ymm0     ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN19:
  00036 c4 a1 7e 7f 4c
        08 20            vmovdqu YMMWORD PTR [32+rax+r9], ymm1  ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:149.9
$LN20:
  0003d 48 83 c0 40      add rax, 64                            ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN21:
  00041 49 3b c8         cmp rcx, r8                            ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN22:
  00044 72 dc            jb .B1.8 ; Prob 63%                    ;c:\Users\vdmn1.vdmn\Documents\develop\Recorder7.1\Recorder7_Processor\src\wav/my_frame.h:145.5
$LN23:
                                ; LOE rax rdx rcx rbx rbp rsi rdi r8 r9 r10 r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
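
For reference, the loop body in the listing does two 32-byte loads, two 32-byte stores, and then advances the offset by 64 bytes. Below is a minimal portable sketch of that shape in C++, not my actual code: `Block32` and `copy_blocks_unrolled2` are hypothetical names, and plain `std::memcpy` of 32-byte blocks stands in for the `vmovdqu` loads and stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical 32-byte block: one YMM register's worth of data.
struct Block32 {
    unsigned char bytes[32];
};

// Copies n blocks, two per iteration, mirroring a loop "unrolled by 2":
// load, load, store, store, then advance by 64 bytes.
inline void copy_blocks_unrolled2(Block32* dst, const Block32* src, std::size_t n) {
    assert((dst != nullptr && src != nullptr) || n == 0);

    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        Block32 a;
        Block32 b;
        std::memcpy(&a, &src[i], sizeof a);      // first 32-byte load
        std::memcpy(&b, &src[i + 1], sizeof b);  // second 32-byte load
        std::memcpy(&dst[i], &a, sizeof a);      // first 32-byte store
        std::memcpy(&dst[i + 1], &b, sizeof b);  // second 32-byte store
    }

    // Remainder: at most one block is left when n is odd.
    if (i < n) {
        std::memcpy(&dst[i], &src[i], sizeof(Block32));
    }
}
```

With PACKET_SIZE_MIN at 128 bytes, the shortest packet would be four such blocks, so the unrolled body runs at least twice per packet.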

 

PS: I need to "#undef" "min" and "max" before the pragma, because the MSVC headers define these names as macros, which clashes with the min(...) and max(...) clauses of the pragma.
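
A minimal sketch of that workaround follows, assuming the clash comes from function-like `min`/`max` macros such as the ones `<windows.h>` defines when `NOMINMAX` is not set. The two `#define` lines only simulate those macros so the example stands alone, and `sum_bytes` is a hypothetical function used to give the pragma a loop to attach to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simulated Windows-style macros (assumption, for illustration only).
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Remove the macros so that "min(...)" and "max(...)" in the pragma are
// read as pragma clauses instead of being macro-expanded.
#undef min
#undef max

#define PACKET_SIZE_MIN 128
#define PACKET_SIZE_AVG 512
#define PACKET_SIZE_MAX 2048

// Hypothetical loop carrying the trip-count hint from the post.
inline std::size_t sum_bytes(const unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::size_t total = 0;
#if defined(__INTEL_COMPILER)
#   pragma loop count min(PACKET_SIZE_MIN) avg(PACKET_SIZE_AVG) max(PACKET_SIZE_MAX)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}
```

Defining `NOMINMAX` before including `<windows.h>` stops that header from defining the macros in the first place, which would make the `#undef` lines unnecessary.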

TIA, best

