One of the themes that ran through this year’s Intel Software Conference in EMEA was programmer productivity. The event took place in Seville in April and gave invited resellers and journalists an opportunity to learn more about Intel’s tools for high-performance computing (HPC), parallel programming, cross-platform development, and video processing.
“Scaling is a big deal and power consumption is talked about a lot,” said James Reinders, chief evangelist of Intel Software products, opening the day. “But one challenge that isn’t talked about enough is programmer productivity.”
This means not only making it easier for programmers to get things done, but also preserving their investment in skills and knowledge as the technology evolves. The Intel® Xeon Phi™ product family, for example, offers up to 61 processor cores and is designed specifically to run parallel programs well. Yet it still uses the same programming tools and models as the Intel® Xeon® products, so programmers do not need to learn a whole new technology. This is also why Intel works hard at standards compliance, together with other companies and standards bodies, to ensure that code is portable between architectures.
Throughout the course of the event, there were opportunities to hear about several tools that can help to increase productivity. Laurent Duhem, senior application engineer, presented the new Intel® Advisor XE 2016 Beta for vectorization. This helps to identify where programs can use single instruction multiple data (SIMD) code, which can run the same calculation across a number of different data items simultaneously. The tool helps to ensure correctness by simulating vectorized loops and checking for any memory conflicts, and enables developers to more quickly identify where the program is spending most of its time (including the number of times a loop is called), so that these hot sections can be optimized. The tool offers hints for improving vectorization, and advice on where vectorization might be inefficient because of non-contiguous memory accesses. Vectorization is a difficult challenge, but this new tool provides guidance at each step to make it as easy as possible. You can download the beta now.
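To make the distinction between contiguous and non-contiguous memory access concrete, here is a minimal C++ sketch. It is not taken from the conference material: the function names `saxpy` and `scale_strided` are illustrative, and the use of the OpenMP `simd` directive as the vectorization hint is an assumption. Without OpenMP enabled, the pragma is simply ignored and the loops run as ordinary scalar code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]. Every iteration is independent and walks memory
// with unit stride, the access pattern that SIMD hardware handles best.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    const float* xp = x.data();
    float* yp = y.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Scales every `stride`-th element. The arithmetic is just as simple, but
// the touched elements are not adjacent in memory, so a vectorized version
// would have to gather and scatter them, which is typically less efficient.
void scale_strided(std::vector<float>& data, std::size_t stride, float a) {
    assert(stride > 0);
    for (std::size_t i = 0; i < data.size(); i += stride) {
        data[i] *= a;
    }
}
```

The first loop is the kind a vectorization tool would be expected to report as a good SIMD candidate; the second is the kind it might flag for non-contiguous access.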
Tackling the multiplatform challenge
In the case of vectorization, productivity challenges might be said to arise from hardware complexity. In consumer software development, productivity is more likely to be challenged by the diverse range of operating systems, form factors and processor architectures that make up the device landscape. Intel® Integrated Native Developer Experience (INDE) is a suite of tools that enables programmers to write fast C++ code that targets multiple operating systems and architectures, making it easier to ship applications more quickly. Alex Weggerle, technical consulting engineer, explained how it integrates with your existing developer environment and introduced its key features. For example, it includes Intel® Hardware Accelerated Execution Manager (Intel HAXM), which uses virtualization technology to run a full-speed Android emulation. That enables developers to more quickly test a wide range of device sizes and types. The Graphics Frame Debugger eliminates the need to push updated OpenGL code to the target device for testing each time a change is made (a process of 5 to 10 minutes). Instead, you can take a screenshot and instantly see any code changes applied to that screenshot. Weggerle also presented the Intel® XDK, a free HTML5 cross-platform development tool that includes templates to help you get started quickly, and the Apache Cordova* APIs to enable cross-platform access to phone hardware features.
Parallel programming more effectively
Intel® Parallel Studio XE 2016 Composer Edition is available now as an open beta. Heinz Bast, technical consulting engineer, introduced some of the new features in this tool suite, which is designed to support programmers as they develop parallel programs to make optimal use of the hardware. It offers improved vectorization using Intel® Cilk™ Plus and OpenMP* 4.0, with some features from the upcoming OpenMP* 4.1 already implemented. Reinders said that one of the things that excites him about OpenMP is that it helps obtain vectorization while leaving the actual code relatively intact, making it an efficient way to improve performance while keeping the code looking like the original science of the application. The new Intel Parallel Studio XE 2016 tool suite introduces loop blocking, so that data can be chunked for processing to avoid cache misses, and array reductions to avoid the bottleneck of turning off SIMD where there are data dependencies within a loop. The new annotated source listing inserts compiler diagnostics after the corresponding lines, making it easier to see what the compiler has done.
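As a rough illustration of the two ideas named above, the following C++ sketch shows loop blocking written by hand and a reduction expressed with an OpenMP directive. It is an assumption-laden example, not the compiler feature itself: the function names and the default block size of 32 are hypothetical, the blocking is done manually rather than by the compiler, and the reduction is a scalar one using the OpenMP 4.0 `simd` directive rather than an array reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes an n x n row-major matrix one tile at a time. Working on a
// small block keeps the rows being read and the columns being written in
// cache, instead of striding across the whole matrix on every iteration.
void transpose_blocked(const std::vector<double>& in, std::vector<double>& out,
                       std::size_t n, std::size_t block = 32) {
    assert(in.size() == n * n && out.size() == n * n);
    assert(block > 0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            const std::size_t i_end = std::min(ii + block, n);
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                for (std::size_t j = jj; j < j_end; ++j) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}

// Dot product. Each iteration updates `sum`, a loop-carried dependency that
// would normally block SIMD. Declaring it as a reduction tells the compiler
// it may keep partial sums per lane and combine them at the end.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    const double* ap = a.data();
    const double* bp = b.data();
    double sum = 0.0;
    #pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}
```

Note that the loop bodies are unchanged from what a scientist might write first; the tiling and the directive are layered around them, which is the property Reinders highlighted.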
Offloading to GPUs
As more and more sophisticated graphics capabilities have been added to Intel® processors, they have become a key compute resource, with performance exceeding that of the CPU cores by up to 8 times. Bast explained how work can be offloaded to the Intel® HD Graphics execution units using annotations in the Intel® C/C++ Compiler. He said a number of customers have been asking for this capability, and that Intel had chosen to support standards rather than building its own proprietary language extensions.
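The article does not say which annotations were shown. As an assumption, the sketch below uses the standard OpenMP 4.0 `target` directive to show what annotation-based offload can look like in C++; the function name `brighten` and its parameters are hypothetical. If the compiler has no offload support, or OpenMP is disabled, the loop simply runs on the host CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies every sample by `gain`. The directive asks the compiler to run
// the loop on an attached device if one is available: `map(to: ...)` copies
// the input over, and `map(from: ...)` copies the result back.
void brighten(const std::vector<float>& in, std::vector<float>& out, float gain) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    const float* src = in.data();
    float* dst = out.data();
    #pragma omp target teams distribute parallel for map(to : src[0:n]) map(from : dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * gain;
    }
}
```

The loop itself stays ordinary C++, so the same source can be built for the CPU alone or with offload enabled.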
Faster compilation
There are some changes that make the compilers more time-efficient too. Intel® Fortran Compiler XE 2016 has been improved with the implementation of submodules. Previously, if you made changes to a module you had to recompile not just that module but also any other modules that call it. In a project of three million lines of code, that could cause a significant delay. With submodules, that’s no longer necessary as long as the interface between the submodule and other modules is unchanged. Intel® C/C++ Compiler 16.0 implements a number of compile time improvements, including disabling intrinsics for prototypes (which were rarely used, Bast said) by default.
Accelerating video processing
The conference’s final session considered a different challenge: the rise of video streaming and download. Starting with the 5th Generation Intel® Core™ processor, Intel has included hardware acceleration for video, with built-in functions that enable accelerated encoding and decoding. The Intel® Media SDK enables application developers to use those capabilities in their software, making it easier to build applications for visual analysis, media transcoding, and graphics in the cloud (including hosted desktops and cloud gaming). Intel® Media Server Studio can be used to generate random Intel® Stress Bitstreams for testing the architecture and also includes tools for analyzing, encoding and decoding video. As the resolution of video increases (4K is expected to be widespread by the time of the next World Cup in 2018), hardware-accelerated encoding and decoding will become increasingly important to deliver a good user experience.
To find out more about Intel tools for software developers, visit the Intel Developer Zone.