


Our results using the Altera Stratix V 5SGXA7 FPGA indicate that, with FPGA-specific optimizations, it is possible to achieve up to 3.9x better power efficiency than an Nvidia K20C GPU. The results are presented for multiple versions of each benchmark, each with a varying degree of optimization for FPGAs, ranging from direct ports of the initial OpenCL implementation to loop-pipelined kernels specifically optimized for FPGAs. Specifically, we find that multithreaded kernels typically used for GPUs do not perform as efficiently as those using FPGA-specific optimizations such as sliding windows; by exploiting such optimizations, OpenCL on the FPGA shows promising performance. Our results show that, while it is possible to use a common programming language available for other, more widely used accelerators in HPC, the implementation method that is optimal for FPGAs is significantly different from those for other accelerators such as GPUs.
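To make the contrast between GPU-style multithreaded kernels and the sliding-window optimization concrete, here is a minimal sketch in plain C, assuming a hypothetical 3-point stencil (out[i] = in[i-1] + in[i] + in[i+1]). It is host code that only mimics the two kernel styles; the function names are illustrative and this is not Altera SDK code. In the first function each output stands for one work-item that loads its own neighbours; in the second, a single loop (as in a single work-item, loop-pipelined kernel) streams every input exactly once through a small shift register.

```c
#include <assert.h>
#include <stddef.h>

#define WIN 3 /* hypothetical window size for a 3-point stencil */

/* "GPU-style": each output (one work-item in an NDRange kernel)
 * independently loads its three neighbours, so neighbouring
 * outputs re-read overlapping inputs. */
static void stencil_naive(const int *in, int *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = in[i - 1] + in[i] + in[i + 1];
}

/* "FPGA-style": one loop streams the input once through a shift
 * register (sliding window) holding the last WIN values. */
static void stencil_sliding(const int *in, int *out, size_t n) {
    assert(in != NULL && out != NULL);
    int win[WIN] = {0};
    for (size_t i = 0; i < n; i++) {
        /* shift the window left by one and append the new element */
        for (int k = 0; k < WIN - 1; k++)
            win[k] = win[k + 1];
        win[WIN - 1] = in[i];
        /* once full, the window holds in[i-2], in[i-1], in[i]:
         * the neighbourhood of output index i-1 */
        if (i >= WIN - 1)
            out[i - 1] = win[0] + win[1] + win[2];
    }
}
```

Both functions fill out[1] to out[n-2] with identical values. The usual rationale for the second form on an FPGA is that the compiler can keep `win` in registers and pipeline the loop so each iteration issues a single global-memory read.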

We evaluate the performance of a subset of the benchmarks available in the Rodinia suite, using Altera’s OpenCL SDK and the Terasic DE5-Net FPGA board, equipped with an Altera Stratix V GXA7 FPGA, and present timing and power estimation results and a comparison with a modern CPU and GPU.

I intentionally didn't write anything about the other use cases of OpenCL. This field is much wider, as you have different OSes, even different OpenCL SDKs for the same processors (e.g. Apple and Intel), and lots of ways to program in parallel without OpenCL (to compare to). So CUDA and OpenCL definitely play in the same league, and even if one is 5% faster than the other for some specific problem at the moment, I think it would not make a difference in the general view.
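Power efficiency in comparisons like the FPGA-vs-GPU one above is usually read as work per joule. For a fixed workload this reduces to a ratio of energies, E = t × P_avg, assuming the average power draw over the run is known. The sketch below (function names hypothetical) shows why a device can be slower and still more power-efficient.

```c
#include <assert.h>

/* Energy (joules) for one run of a fixed workload, assuming the
 * average power draw (watts) over the whole run is known. */
static double energy_joules(double seconds, double avg_watts) {
    assert(seconds > 0.0 && avg_watts > 0.0);
    return seconds * avg_watts;
}

/* Power-efficiency advantage of device A over device B for the same
 * workload: (work/J on A) / (work/J on B) = E_B / E_A.
 * A result of 2.0 means A needs half the energy B needs. */
static double efficiency_ratio(double t_a, double p_a, double t_b, double p_b) {
    return energy_joules(t_b, p_b) / energy_joules(t_a, p_a);
}

/* Plain speedup of A over B, ignoring power. */
static double speedup(double t_a, double t_b) {
    assert(t_a > 0.0 && t_b > 0.0);
    return t_b / t_a;
}
```

With purely illustrative numbers (not measurements from the study): if device A runs a kernel in 2.0 s at 25 W and device B in 1.0 s at 150 W, then `speedup(2.0, 1.0)` is 0.5, yet `efficiency_ratio(2.0, 25.0, 1.0, 150.0)` is 3.0, because A uses 50 J against B's 150 J.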
OPENCL BENCHMARK CODE
There are the NVIDIA Code Examples, done in both CUDA and OpenCL. You could choose a few and compare your results, though I don't think that would be time well spent. Maybe you should approach this problem from another angle: what can you do with one of the frameworks that you can't do with the other? They both use the same drivers, so both will support fancy technologies that come out with new hardware. Thread scheduling is done in hardware, so they have the same performance there. What remains to be tested are things like: Will the compiler create efficient code? Will optimal code use all available memory bandwidth? Are you able to make use of all the compute units? From my tests, the answer to these questions (will my code use the hardware optimally?) is yes for both frameworks.
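If you do port a few of the code examples to compare, questions like "will optimal code use all available memory bandwidth" come down to careful timing. Below is a minimal host-side harness in C, assuming a POSIX system with `clock_gettime`; it takes the median of several runs and converts a byte count into effective bandwidth. The memcpy workload and all names are hypothetical stand-ins. For a GPU kernel you would time around the launch plus a synchronization call (e.g. `clFinish` in OpenCL or `cudaDeviceSynchronize` in CUDA), or use the frameworks' event profiling, so that an asynchronous launch is not mistaken for finished work.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Runs fn(arg) `reps` times and returns the median duration in
 * seconds; the median is less sensitive than the mean to one-off
 * outliers such as first-run warm-up or OS noise. */
static double median_time(void (*fn)(void *), void *arg, int reps) {
    assert(reps > 0);
    double *t = malloc((size_t)reps * sizeof *t);
    assert(t != NULL);
    for (int r = 0; r < reps; r++) {
        double start = now_seconds();
        fn(arg);
        t[r] = now_seconds() - start;
    }
    qsort(t, (size_t)reps, sizeof *t, cmp_double);
    double med = (reps % 2) ? t[reps / 2]
                            : 0.5 * (t[reps / 2 - 1] + t[reps / 2]);
    free(t);
    return med;
}

/* Effective bandwidth in GB/s (10^9 bytes per second). */
static double effective_gbps(double bytes_moved, double seconds) {
    assert(seconds > 0.0);
    return bytes_moved / seconds / 1e9;
}

/* Hypothetical workload: a buffer copy, which reads n bytes and
 * writes n bytes, so 2*n bytes cross the memory bus. */
struct copy_job { char *dst; const char *src; size_t n; };

static void copy_kernel(void *p) {
    struct copy_job *j = p;
    memcpy(j->dst, j->src, j->n);
}
```

For the copy job, `effective_gbps(2.0 * job.n, median_time(copy_kernel, &job, 10))` counts both the read and the write; comparing that figure against the device's peak bandwidth is one way to answer the bandwidth question for each framework.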
OPENCL BENCHMARK PLUS
Just to stop speculating about this or that, I'd like to check everything by myself, but I need you to help me! Is there a universally accepted OpenCL benchmark set I can use to compare with native code? Is there an analogue of the CUDA SDK written in OpenCL?

Not being a performance/benchmarking expert, I can only try to give you a few general thoughts on OpenCL vs. CUDA. Fair warning, though: I might get some stuff wrong. The problem with benchmarks is obviously that you can only objectively evaluate very specific things, say, the same program done in CUDA and OpenCL on the same hardware (as you named a source). But you won't be able to deduce from that experiment that you'll get similar results with another program or with different hardware. Results will differ, so you would have to have a big test suite. This is what you ask for, but I don't know of anything like that in existence: people will choose either technology for their bigger projects and won't write everything twice.

A Geekbench OpenCL benchmark score for Intel's Iris Plus Graphics has been unearthed, and it seems the Ice Lake variant of the iGPU has performed very …
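If you do build a larger suite, one common way to condense it into a single number is the geometric mean of per-benchmark speedups, which treats a 2x win and a 2x loss symmetrically. Here is a minimal C sketch, assuming you have timings `t_a[i]` and `t_b[i]` for the same benchmark `i` in two versions (say, OpenCL and CUDA); the names are hypothetical, and the n-th root is found by bisection only so that the example needs no libm.

```c
#include <assert.h>
#include <stddef.h>

/* x^n by repeated multiplication. */
static double pow_int(double x, size_t n) {
    double r = 1.0;
    for (size_t i = 0; i < n; i++)
        r *= x;
    return r;
}

/* Geometric mean of per-benchmark speedups s_i = t_b[i] / t_a[i].
 * The n-th root of the product is found by bisection; with <math.h>
 * one would instead use exp(mean of log(s_i)), which also avoids
 * overflowing the product for very large suites. */
static double geomean_speedup(const double *t_a, const double *t_b, size_t n) {
    assert(n > 0);
    double prod = 1.0;
    double lo = t_b[0] / t_a[0], hi = lo;
    for (size_t i = 0; i < n; i++) {
        assert(t_a[i] > 0.0 && t_b[i] > 0.0);
        double s = t_b[i] / t_a[i];
        prod *= s;
        if (s < lo) lo = s;
        if (s > hi) hi = s;
    }
    /* the geometric mean lies between the smallest and largest ratio */
    for (int it = 0; it < 200; it++) {
        double mid = 0.5 * (lo + hi);
        if (pow_int(mid, n) < prod)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For speedups {2.0, 0.5} it returns 1.0 (no net difference), whereas the arithmetic mean would report 1.25; for {2.0, 8.0} it returns 4.0.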
OPENCL BENCHMARK WINDOWS
I can read a lot about OpenCL, and it seems to be the most promising (the only one?) multi-architecture library. OpenCL should be the first parallel architecture programming standard, and it will eventually be adopted by most programmers. That is good, OK, but is there a loss of performance when migrating from a native programming library to OpenCL? In the case of nVidia GeForces, I've already found an article where two realizations of the same program (CUDA vs. OpenCL code) were compared, and the first one seemed to perform better. In the case of Pthreads or Windows threads, I really have no idea, but I think that "generality" and a multi-architecture approach will always have something to "pay".
