I looked around and couldn't find a better forum for this, so I'll ask here. If there is a better one, please let me know and I'll move my question.
I'm quite new to developing for the GPU in general, so some topics are still a bit new/confusing for me. One of them is the notion of divergence when it comes to branch instructions: how the SIMD EU can become "stalled" when different kernel instances take different paths (please check my terminology here too). It's my understanding that, with NVIDIA, there's the notion of a "warp" (AMD calls it a "wavefront") across the EU, whereby the individual threads are fine as long as they're each executing the same instruction; once that's no longer the case, some threads are masked off and effectively stalled. This means the SIMD lanes aren't maximally used, and you can run into situations where performance suffers.
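To make sure I'm describing the right phenomenon, here's a toy OpenCL C illustration of what I mean by a divergent branch. This is just an illustration, not my actual code: lanes in the same SIMD group can take different sides of the data-dependent branch, so (as I understand it) the hardware executes both paths with some lanes masked off.

/* Toy example: data-dependent branch that can diverge across SIMD lanes. */
__kernel void divergent_example(__global const float *in, __global float *out)
{
    const int i = get_global_id(0);
    if (in[i] > 0.0f)            /* some lanes take this path...            */
        out[i] = sqrt(in[i]);
    else                         /* ...others take this one, so the two     */
        out[i] = 0.0f;           /* paths end up serialized                 */
}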
My high-level question is this: when programming for Intel's GPGPU, is this still the case? Or does it become more an issue of the instruction cache not being used optimally? Or both? Or neither? This article seems to indicate that divergence is an "issue" on Intel GPUs as well: https://software.intel.com/en-us/node/540425. And I understand it's not really an issue, but just the way GPUs work, possibly.
I ask because I am trying to put a decision tree (a machine learning technique) on the GPU: not the training, but the evaluation of the trained tree. I'm using it for computer vision, so I run each pixel in an image down the decision tree. The tree is binary, and at each node a question is asked of the pixel; the answer causes slightly different logic to be executed, and then we branch left or right until we reach a leaf node. I figured I would gain massive speed increases by putting this on the GPU. I was wrong; I see little speed improvement. The problem, as far as I can tell, is that this is precisely a worst-case scenario for the GPU, because different threads execute slightly different logic depending on the particular pixel they're working on, causing massive stalling as divergence and re-convergence occur. But, again, the worlds of CPU and GPU seem to be blurring more every day, so perhaps I'm wrong and this isn't the reason my code is running much slower than expected.
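For concreteness, here is a minimal OpenCL C sketch of the kind of per-pixel traversal I mean. It is not my actual kernel; the tree layout (flattened into node_kind, node_threshold, node_left, node_right, node_label arrays) and all the names are placeholders just to show the shape of the branching:

/* Sketch only: each work-item walks one pixel down a flattened binary tree.
 * A node index of -1 in node_left marks a leaf.  node_kind selects which
 * "question" is asked of the pixel, which is where neighbouring work-items
 * can end up running different logic and diverging. */
__kernel void classify_pixels(__global const uchar *image,
                              __global const int   *node_kind,      /* 0 or 1: which test to run */
                              __global const float *node_threshold, /* per-node threshold        */
                              __global const int   *node_left,      /* left child index, -1=leaf */
                              __global const int   *node_right,     /* right child index         */
                              __global const int   *node_label,     /* class label at leaves     */
                              __global       int   *labels,
                              const int width, const int height)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    int node = 0;                              /* start at the root          */
    while (node_left[node] >= 0) {             /* until we hit a leaf        */
        float response;
        /* Data-dependent branch: different work-items may run different
         * tests at the same tree depth, so SIMD lanes diverge here.    */
        if (node_kind[node] == 0)
            response = (float)image[y * width + x];
        else
            response = (float)image[y * width + x]
                     - (float)image[y * width + min(x + 1, width - 1)];

        node = (response < node_threshold[node]) ? node_left[node]
                                                 : node_right[node];
    }
    labels[y * width + x] = node_label[node];
}

My real kernel is along these lines, just with more node types and deeper trees, which is why I suspect divergence is what's hurting me.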
Thoughts? Again, I apologize if this is the wrong forum.