Hi guys,
I'm working on a compiler abstraction layer that passes more loop information to compilers. To get better-optimized code through pragmas, we implemented the following for the TI compiler:
#define PRAGMA(x) _Pragma(#x)
#define LOOP_COUNT_INFO(_min_n, _multiple) \
    PRAGMA(MUST_ITERATE(_min_n, , _multiple))
With this macro, we can annotate the following loop when we know that the minimum loop count is 32 and that the count is a multiple of 4:
void test_loop_count(float *__restrict a, float *__restrict b, float *__restrict c, int n)
{
    int i;
    LOOP_COUNT_INFO(32, 4)
    for (i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}
Now we are trying to implement something similar for Intel icc/icl. Using the same PRAGMA macro, so far we have only implemented the following:
#define DLB_LOOP_COUNT_INFO(_min_n, _multiple) \
    PRAGMA(loop_count(_multiple))
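To make the goal concrete, here is a minimal sketch of how one macro name could cover both toolchains by selecting the expansion per compiler. It assumes that the predefined macros `__TI_COMPILER_VERSION__` and `__INTEL_COMPILER` identify the two compilers; the names `PORTABLE_LOOP_COUNT_INFO` and `test_loop_count_portable` are hypothetical, and the Intel branch only mirrors our current choice, which is the part we are unsure about.

```c
#include <assert.h>

/* Stringify the argument so a pragma can be emitted from inside a macro. */
#define PRAGMA(x) _Pragma(#x)

#if defined(__TI_COMPILER_VERSION__)
/* TI: minimum trip count and multiple; the maximum is left empty. */
#define PORTABLE_LOOP_COUNT_INFO(_min_n, _multiple) \
    PRAGMA(MUST_ITERATE(_min_n, , _multiple))
#elif defined(__INTEL_COMPILER)
/* Intel: mirrors our current choice; the right argument is the open question. */
#define PORTABLE_LOOP_COUNT_INFO(_min_n, _multiple) \
    PRAGMA(loop_count(_multiple))
#else
/* Any other compiler: expand to nothing so the code still builds. */
#define PORTABLE_LOOP_COUNT_INFO(_min_n, _multiple)
#endif

/* Hypothetical variant of test_loop_count using the portable macro. */
void test_loop_count_portable(const float *__restrict a, const float *__restrict b,
                              float *__restrict c, int n)
{
    int i;
    /* Debug-build check of the contract that the hint below promises. */
    assert(n >= 32 && n % 4 == 0);
    PORTABLE_LOOP_COUNT_INFO(32, 4)
    for (i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}
```

The assert is only there to catch callers that break the promised minimum and multiple in debug builds; it is not part of either compiler's pragma.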
However, there has been a lot of discussion in the team about whether it should be PRAGMA(loop_count(_multiple)) or PRAGMA(loop_count(_min_n)).
From http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizati...
#pragma loop count (n) may be used to advise the compiler of the typical trip
count of the loop. This may help the compiler to decide whether vectorization is
worthwhile, or whether or not it should generate alternative code paths for the loop.
and:
https://software.intel.com/sites/products/documentation/doclib/iss/2013/...
#pragma loop_count(n)
#pragma loop_count=n
(n) or =n
Non-negative integer value. The compiler will attempt to iterate the next loop the number of times specified in n; however, the number of iterations is not guaranteed.
Could you give us a definitive answer on whether it should be PRAGMA(loop_count(_multiple)) or PRAGMA(loop_count(_min_n))?
From the code:
#pragma loop_count min(n),max(n),avg(n)
#pragma loop_count min=n, max=n, avg=n
There seems to be no way to specify a multiple in this pragma. If that's true, the compiler has to generate extra code to cover the cases where the loop count is not a multiple of n. Is there a way to pass the multiple through a pragma?
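For reference, this is roughly what the min() form would look like in our macro. It is only a sketch: it assumes the min(n) clause may be given on its own, the names `LOOP_COUNT_MIN_INFO` and `multiply_with_min_hint` are hypothetical, and the `_multiple` argument is accepted but has nowhere to go, which is exactly the gap described above.

```c
#include <assert.h>

/* Stringify the argument so a pragma can be emitted from inside a macro. */
#define PRAGMA(x) _Pragma(#x)

#if defined(__INTEL_COMPILER)
/* Only the minimum is forwarded; _multiple is deliberately unused. */
#define LOOP_COUNT_MIN_INFO(_min_n, _multiple) \
    PRAGMA(loop_count min(_min_n))
#else
/* Other compilers: expand to nothing so the code still builds. */
#define LOOP_COUNT_MIN_INFO(_min_n, _multiple)
#endif

/* Hypothetical example loop annotated with the min() form. */
void multiply_with_min_hint(const float *__restrict a, const float *__restrict b,
                            float *__restrict c, int n)
{
    int i;
    /* The multiple-of-4 promise is checked here but never reaches the compiler. */
    assert(n >= 32 && n % 4 == 0);
    LOOP_COUNT_MIN_INFO(32, 4)
    for (i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}
```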
Thanks,
Richard