When eInfochips was commissioned by a client to build a next-generation portable endoscopy system, the company had to invest considerable thought in choosing the appropriate image processing engine. Our client had recommended a general-purpose processor that would double as the device’s primary control unit, multi-threading the image processing functions alongside its control tasks. The choice seemed obvious to our client because that is how the legacy versions of the product were architected. But we quickly realized there was a major challenge ahead: getting the processor to work with UltraHD/4K image resolution.

Similar analysis of other general-purpose processors and microcontrollers led us to the same conclusion: they may no longer be good enough for imaging applications. As medical imaging advances, engineers and designers will have to move image processing functions to specialized processors to offer advanced features and disruptive performance. We therefore had three options for the road ahead: a DSP, an FPGA, and a GPU.
Digital Signal Processors (DSPs) are designed to measure, filter, and/or compress continuous real-world signals. They can fetch data and instructions simultaneously, with low latency and excellent power efficiency. DSP-based products typically do not require specialized cooling or large batteries, so DSPs are best suited to portable devices. Examples include the Texas Instruments K2E, Texas Instruments Hawking, Freescale i.MX6, and Analog Devices Blackfin. (See Figure 1)
Field Programmable Gate Arrays (FPGAs) are configured using a specialized Hardware Description Language (HDL). Because FPGA designs use high-speed I/O buses, verifying valid-data timing is difficult, expensive, and time-consuming without proper floor planning to allocate resources well. FPGAs are flexible, though, and can be reprogrammed after deployment. FPGA vendors include Altera, Microsemi, and Xilinx.
Graphics Processing Units (GPUs) use their parallel structure to accelerate the creation of images in a frame buffer for output. They support programmable shaders, which can manipulate vertices and textures, oversampling and interpolation techniques, and very high-precision color spaces, making them ideal for constructing 3D models. (See Figure 2) Examples include the NVIDIA Tesla, NVIDIA Tegra, and AMD Radeon.
Some chips, like the Qualcomm Snapdragon platform, incorporate multiple capabilities: a central processing unit, a DSP, and a GPU. Performance and efficiency were the most important parameters to consider when selecting the best platform.
Performance

The GPU topped the performance metric because of its multicore, parallel architecture, which can handle large parallel data operations with very low latency in real time. Although GPUs have not traditionally been used for real-time, low-latency work, they have had great success in computed tomography (CT) and neuroimaging, and technologies now exist that further enhance their real-time, low-latency capabilities. The latest GPUs, with in excess of 3,000 cores, are being considered for use in servers performing big data analytics for telemedicine and remote monitoring solutions.
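The data-parallel model that makes a GPU fast can be illustrated in miniature: the same small kernel is applied independently to every tile of a frame, so tiles can be processed concurrently. The sketch below is plain Python standing in for GPU kernels; the tile size and the threshold "kernel" are illustrative assumptions, not part of any actual device design.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # illustrative tile height, in rows

def threshold_kernel(tile):
    """Per-tile 'kernel': binarize pixel values (a stand-in for real filtering)."""
    return [[1 if px >= 128 else 0 for px in row] for row in tile]

def split_rows(frame, tile_rows):
    """Split a frame (a list of pixel rows) into independent tiles."""
    return [frame[i:i + tile_rows] for i in range(0, len(frame), tile_rows)]

def process_parallel(frame):
    tiles = split_rows(frame, TILE)
    # Each tile is processed independently -- the GPU analogy: one core per tile.
    with ThreadPoolExecutor() as pool:
        done = list(pool.map(threshold_kernel, tiles))
    # Stitch the processed tiles back into a full frame.
    return [row for tile in done for row in tile]

# Tiny 8x8 synthetic "frame" with a gradient of pixel values.
frame = [[(r * 16 + c * 8) % 256 for c in range(8)] for r in range(8)]
out = process_parallel(frame)
```

Because the tiles share no state, the work scales with the number of cores available, which is exactly the property that lets a GPU with thousands of cores handle large frames in real time.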
DSPs trailed on performance because of resource sharing among different process threads. Even DSPs with multicore, VLIW (Very Long Instruction Word) architectures could not surpass the performance of the GPU.
Standalone FPGAs could match the performance of GPUs, but they require a large investment in development time for the project, simulation, and synthesis. Based on existing reference implementation time frames, FPGA development takes close to four or five times as long as DSP development. (See Figure 3)
Efficiency

DSP power consumption typically ranges from 0.2 to 3 watts, and FPGA power consumption from 0.007 to 10 watts. A GPU consumes 2 to 10 watts at idle and can hit 30 watts under heavier operating conditions. One reason FPGA power consumption can be lower is that sections of the chip not in use can be shut down, although how much this helps depends primarily on how efficiently the design is coded.
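These power envelopes translate directly into battery life for a portable device. A rough sketch of the arithmetic, using the wattage ranges quoted above; the 38 Wh battery capacity is a hypothetical figure chosen for illustration, not a spec from the actual design:

```python
# Rough battery-life estimate from the power envelopes cited in the text.
BATTERY_WH = 38.0  # hypothetical portable pack (~10 Ah at 3.8 V)

platforms = {
    "DSP":  (0.2, 3.0),    # typical active range, watts
    "FPGA": (0.007, 10.0),
    "GPU":  (2.0, 30.0),   # idle to high-load
}

def runtime_hours(capacity_wh, watts):
    """Hours of operation at a constant power draw."""
    return capacity_wh / watts

best_case  = {name: runtime_hours(BATTERY_WH, lo) for name, (lo, hi) in platforms.items()}
worst_case = {name: runtime_hours(BATTERY_WH, hi) for name, (lo, hi) in platforms.items()}
```

Even at its worst-case 3 W, a DSP would run such a pack for over 12 hours, while a GPU at 30 W would drain it in well under 2, which is why power alone can rule a GPU out of handheld designs.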
If users need high data throughput with latencies on the order of nanoseconds, an FPGA can be a better option than multiple DSPs. This not only improves efficiency but also reduces the complexity of having multiple processors on a single PCB.
Decision Point

The downside of a GPU is that it must be coupled with a CPU, an FPGA, or a controller to orchestrate data processing, which adds to the overall bill of materials for the system.
Pricing may be an important factor, especially when the device is to address emerging markets. DSPs are the least expensive platforms to adopt, followed by GPUs, even without considering the additional cost of the controlling device. FPGAs tend to be the most expensive of the lot, owing to the in-field flexibility they offer.
If the application is expected to be portable and battery operated, a DSP is the ideal choice; DSPs offer sufficient performance for most medical imaging equipment. A GPU is better suited to powerful, high-throughput, often server-grade applications. In a telemedicine application where X-ray machines upload images to the cloud for processing and analytics, a GPU would be ideal because it can process them in near-real time.
An FPGA is the most flexible option and can be provisioned as a controller, a GPU, or a DSP, which makes it ideal when a custom mix of GPU, DSP, and microprocessor capabilities is needed. However, its extended debug cycles can delay the product to the point that competitors reach the market first. (See Table 1)
Selection Summary
For UltraHD/4K imaging, we initially looked at a GPU because it offered the best performance. We soon realized, however, that we would need to split each frame into four HD images and process them independently, utilizing only 4 of the 32 cores available on the GPU, which ruled it out as the best option. Alternatively, if we were to split the UltraHD image into 32 parts for parallel execution on all 32 GPU cores, we would need a high-end FPGA to act as a controller, a significant hit on the overall device cost.
A high-end FPGA-based design by itself could encompass all the required functionality, but the design and verification cycle was too long for the client’s comfort. So we opted for a simpler FPGA paired with a DSP to reduce design time: the FPGA splits the UltraHD image into four HD images, which a DSP can easily process.
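The FPGA's splitting step amounts to cropping the 3840×2160 UltraHD frame into four 1920×1080 quadrants that the downstream DSP can process independently. A minimal sketch of the geometry (plain Python for illustration; in the actual design this is done in FPGA fabric on the pixel stream):

```python
UHD_W, UHD_H = 3840, 2160            # UltraHD/4K frame dimensions
HD_W, HD_H = UHD_W // 2, UHD_H // 2  # each quadrant is 1920 x 1080 (HD)

def split_quadrants(frame):
    """Split a frame (a list of pixel rows) into four half-resolution quadrants."""
    h, w = len(frame), len(frame[0])
    hh, hw = h // 2, w // 2
    return [
        [row[:hw] for row in frame[:hh]],  # top-left
        [row[hw:] for row in frame[:hh]],  # top-right
        [row[:hw] for row in frame[hh:]],  # bottom-left
        [row[hw:] for row in frame[hh:]],  # bottom-right
    ]

# A tiny 4x4 stand-in frame keeps the demo fast; real frames are 3840x2160.
frame = [[r * 10 + c for c in range(4)] for r in range(4)]
quads = split_quadrants(frame)
```

Each quadrant is a plain crop, with no overlap and no resampling, so the four streams can be filtered independently and stitched back together without any cross-quadrant bookkeeping.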
Note that we would have selected either a top-of-the-line FPGA or a combination of an FPGA and a GPU had the solution needed to scale to 4 × 4K resolution or even higher.
The application defines the processor core you choose. For example, 3D imaging for medical applications is an ideal case for a GPU, which can combine multiple 2D images into a 3D image in real time as a physician examines the patient. With growing data-throughput requirements and 3D panels readily available, a GPU is an ideal platform for developing such imaging-intensive applications.
This article was written by Renjith Ponnappan, a product manager at eInfochips, Sunnyvale, CA. For more information, visit http://info.hotims.com/49750-160.

