Fun with Intel® Transactional Synchronization Extensions
By now, many of you have heard of Intel® Transactional Synchronization Extensions (Intel® TSX). If you have not, I encourage you to check out this page (http://www.intel.com/software/tsx) before you...
Improving Discrete Cosine Transform performance using Intel(R) Cilk(TM) Plus
DCT and quantization are the first two steps in the JPEG compression standard. This article demonstrates how the DCT and quantization stages can be implemented to run faster using Intel® Cilk™ Plus. In order to...
Hands-on Lab: Optimizing Monte Carlo on Intel(R) Xeon Phi(tm) Coprocessor
Introduction: This lab was developed for the Intel(R) Xeon Phi(tm) Technology Conference held on May 8-9, 2013 in the United Kingdom, which was attended by multiple financial services institutions. In this...
Best Known Method: Avoid heterogeneous precision in control flow calculations
Best Known Method: Running an MPI program in symmetric mode on an Intel® Xeon® host and an Intel Xeon Phi™ coprocessor may deadlock in specific cases due to the heterogeneous precision in replicated...
_mm256_hadd_pd
Adds horizontal pairs of float64 elements of two vectors. The corresponding Intel® AVX instruction is VHADDPD. Syntax: extern __m256d _mm256_hadd_pd(__m256d m1, __m256d m2); Arguments: m1: float64 vector used...
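The "horizontal pairs" wording is easy to misread, so here is a minimal scalar sketch of the lane layout that VHADDPD is generally documented to produce. The function name `hadd_pd_model` and the use of plain arrays in place of `__m256d` are assumptions made for illustration; the sketch models the result ordering only and does not call the intrinsic.

```c
#include <stddef.h>

/* Scalar model (hypothetical helper, not part of any Intel header) of the
 * result layout of _mm256_hadd_pd(m1, m2). The 256-bit value is treated as
 * two 128-bit halves; within each half the adjacent pair from m1 is summed
 * first, then the adjacent pair from m2. out must not alias m1 or m2. */
void hadd_pd_model(const double m1[4], const double m2[4], double out[4])
{
    for (size_t half = 0; half < 2; ++half) {
        size_t base = 2 * half;
        out[base]     = m1[base] + m1[base + 1];  /* pair from m1 */
        out[base + 1] = m2[base] + m2[base + 1];  /* pair from m2 */
    }
}
```

With m1 = {1, 2, 3, 4} and m2 = {10, 20, 30, 40}, the model yields {3, 30, 7, 70}: the sums from the two inputs interleave rather than being grouped by input.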
_mm256_addsub_ps
Adds odd float32 elements and subtracts even float32 elements of vectors. The corresponding Intel® AVX instruction is VADDSUBPS. Syntax: extern __m256 _mm256_addsub_ps(__m256 m1, __m256...
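As a rough illustration of the odd/even rule described above, the following scalar sketch applies it element by element. The name `addsub_ps_model` and the array-based signature are assumptions for illustration; element 0 is taken to be the lowest element of the vector, and the real intrinsic operates on `__m256` values rather than arrays.

```c
#include <stddef.h>

/* Scalar model (hypothetical helper) of _mm256_addsub_ps(m1, m2):
 * even-indexed elements are subtracted (m1 - m2),
 * odd-indexed elements are added (m1 + m2). */
void addsub_ps_model(const float m1[8], const float m2[8], float out[8])
{
    for (size_t i = 0; i < 8; ++i) {
        out[i] = (i % 2 == 0) ? m1[i] - m2[i]   /* even index: subtract */
                              : m1[i] + m2[i];  /* odd index: add */
    }
}
```

With m1 = {1, 2, 3, 4, 5, 6, 7, 8} and m2 filled with 1, the model yields {0, 3, 2, 5, 4, 7, 6, 9}.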
_mm256_addsub_pd
Adds odd float64 elements and subtracts even float64 elements of vectors. The corresponding Intel® AVX instruction is VADDSUBPD. Syntax: extern __m256d _mm256_addsub_pd(__m256d m1, __m256d...
_mm256_add_ps
Adds float32 vectors. The corresponding Intel® AVX instruction is VADDPS. Syntax: extern __m256 _mm256_add_ps(__m256 m1, __m256 m2); Arguments: m1: float32 vector used for the operation; m2: float32 vector also...
_mm256_add_pd
Adds float64 vectors. The corresponding Intel® AVX instruction is VADDPD. Syntax: extern __m256d _mm256_add_pd(__m256d m1, __m256d m2); Arguments: m1: float64 vector used for the operation; m2: float64 vector also...
Intrinsics for Arithmetic Operations
Parent topic: Intrinsics for Intel® Advanced Vector Extensions. _mm256_add_pd: Adds float64 vectors. The corresponding Intel® AVX instruction is VADDPD. _mm256_add_ps: Adds float32 vectors. The...
Details of Intel® Advanced Vector Extensions Intrinsics
Intel® Advanced Vector Extensions (Intel® AVX) intrinsics map directly to Intel® AVX instructions and other enhanced 128-bit single-instruction multiple data processing (SIMD) instructions. Intel® AVX...
Overview: Intrinsics for Intel® Advanced Vector Extensions Instructions
Intel® Advanced Vector Extensions (Intel® AVX) intrinsics are assembly-coded functions that call on Intel® AVX instructions, which are new vector SIMD instruction extensions for IA-32 and Intel® 64...
Intrinsics for Intel® Advanced Vector Extensions
Parent topic: Intrinsics. Overview: Intrinsics for Intel® Advanced Vector Extensions Instructions; Details of Intel® Advanced Vector Extensions Intrinsics; Intrinsics for Arithmetic Operations; Intrinsics for...
Function Prototype and Macro Definitions
Function Prototype and Macro Definitions for RTM: The following function prototypes are included in the immintrin.h header file: unsigned int _xbegin(void); void _xend(void); void _xabort(const unsigned...
HLE Release _Store Functions
Stores the specified value at the specified address and releases pending active HLE transaction. This intrinsic function applies to C/C++ applications for Windows* OS only. Syntax: void...
HLE Release _InterlockedExchangeAdd Functions
Performs an atomic addition of two values and releases pending active HLE transaction. This intrinsic function applies to C/C++ applications for Windows* OS only. Syntax: long...
HLE Release _InterlockedCompareExchange Functions
Performs an atomic compare-and-exchange operation on the specified values and releases pending active HLE transaction. This intrinsic function applies to C/C++ applications for Windows* OS...
Intrinsics for Hardware Lock Elision Operations
Parent topic: Intrinsics for Intel® Transactional Synchronization Extensions (Intel® TSX). Hardware Lock Elision Overview; HLE Acquire _InterlockedCompareExchange Functions: Performs an atomic...
Analyse the single-threaded Stream benchmark's behaviour on Intel® Xeon®...
The STREAM benchmark (http://www.cs.virginia.edu/stream/) is a synthetic benchmark program, written in standard Fortran 77 (with a corresponding version in C). It measures the performance of four long...
Large Page Considerations
Compiler Methodology for Intel® MIC Architecture: Large Page Considerations. Use THP enabled by default in the MPSS Operating System: MPSS versions later than 2.1.4982-15 support “Transparent Huge Pages...