When eInfochips was commissioned by a client to build a next-generation portable endoscopy system, the company had to invest considerable thought in choosing the appropriate image processing engine. Our client had recommended a general purpose processor that would double as the device’s primary control unit and run the image processing functions as additional threads. The choice seemed obvious to our client because that is how the legacy versions of the product were architected. But we quickly realized there was a major challenge ahead: getting the processor to work with UltraHD/4K image resolution.
4K video content has been a revelation in terms of quality and image clarity, and the consumer electronics industry has led technology adoption when it comes to displays. The problem with designing 4K video systems around general purpose processors is two-pronged. The first issue was that few vendors offered processors that support 4K resolution imaging; in fact, there was just one. The second issue was that this processor was targeted at consumer applications, with a market life of three years. It would be imprudent to choose this device and then have to redesign the product three years down the line, given the tedious, expensive, and time-consuming regulatory approval cycles.
Similar analysis of other general purpose processors and microcontrollers led us to the same conclusion: they may no longer be good enough for imaging applications. As medical imaging advances, engineers and designers will have to move image processing functions to specialized processors to offer advanced features and disruptive performance. We therefore had three options for the road ahead: a DSP, an FPGA, or a GPU.
Digital Signal Processors (DSPs) are designed to measure, filter, and compress continuous real-world signals. They can fetch data and instructions simultaneously, with low latency and excellent power efficiency. DSP-based products typically do not require specialized cooling or large batteries, which makes DSPs well suited to portable devices. Some examples include the Texas Instruments K2E, Texas Instruments Hawking, Freescale i.MX 6, and Analog Devices Blackfin. (See Figure 1)
Field Programmable Gate Arrays (FPGAs) are configured using a specialized Hardware Description Language (HDL). FPGA designs use high-speed I/O buses, so it is difficult, expensive, and time consuming to verify the timing of valid data without proper floor planning to best allocate resources. FPGAs are flexible, though, and can be reprogrammed after deployment. FPGA vendors include Altera, Microsemi, and Xilinx.
Graphics Processing Units (GPUs) use their parallel structure to accelerate the creation of images in a frame buffer for output. They support programmable shaders, which can manipulate vertices and textures, as well as oversampling and interpolation techniques and very high-precision color spaces. They are ideal for constructing 3D models. (See Figure 2) Examples include the NVIDIA Tesla, NVIDIA Tegra, and AMD Radeon.
Some chips, like the Qualcomm Snapdragon platform, incorporate multiple capabilities: a central processing unit, a DSP, and a GPU. Performance and efficiency were the most important parameters to consider when selecting the best platform.
3D imaging, imaging analytics, and 4K content are among the emerging trends in the medical devices industry. These trends point not only towards applications that are performance-hungry, but also towards hardware support for flawless operations.
The GPU topped the performance metric because of its multicore, parallel architecture, which can handle large parallel data operations with very low latency. Traditionally, GPUs have not been used for real-time, low-latency applications, but they have had great success in computed tomography (CT) and neuroimaging, and there are technologies that can enhance their real-time, low-latency capabilities. The latest GPUs, which have in excess of 3,000 cores, are being considered for use in servers that perform big data analytics for telemedicine and remote monitoring solutions.
DSPs lagged on performance because of resource sharing among different process threads. Even DSPs with multicore and VLIW (Very Long Instruction Word) architectures could not surpass the performance of the GPU.
Standalone FPGAs could match the performance of GPUs, but they require a large investment in development time for design, simulation, and synthesis. Going by the time frames of existing reference implementations, FPGA development takes close to four or five times as long as DSP development. (See Figure 3)
On the efficiency front, DSPs score considerably higher than GPUs because of their architecture. Moreover, they use a threading mechanism to run multiple processes on shared resources, which makes them more efficient than GPUs and FPGAs. DSPs have traditionally been used in image processing and medical instruments because of their ease of development, the availability of an ARM controller on the system-on-a-chip (SoC) that lets the device work standalone, and their real-time processing capability.
DSP power consumption typically ranges from 0.2 to 3 watts, and FPGA power consumption from 0.007 to 10 watts. The idle-state power consumption of a GPU varies from 2 to 10 watts and, under heavier operating conditions, can reach 30 watts. One of the reasons FPGAs’ power consumption can be lower is that they can shut down sections of the chip that are not in use. How much power this saves depends primarily on how efficiently your programmers build the code.
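To put these ranges in perspective for a battery-powered device, the following minimal sketch converts the quoted power figures into rough runtimes. The 20 Wh battery capacity and the function name `runtime_hours` are assumptions made purely for illustration; they are not values from the project, and the calculation ignores the display, the image sensor, and power conversion losses.

```python
# Rough runtime estimate: hours = battery energy (Wh) / constant power draw (W).
# BATTERY_WH is a hypothetical capacity chosen only for illustration.
BATTERY_WH = 20.0

# Power ranges in watts as quoted in the text: (low, high).
POWER_RANGES_W = {
    "DSP": (0.2, 3.0),
    "FPGA": (0.007, 10.0),
    "GPU (idle)": (2.0, 10.0),
    "GPU (peak)": (30.0, 30.0),
}


def runtime_hours(battery_wh: float, power_w: float) -> float:
    """Return the hours a battery lasts at a constant power draw."""
    if power_w <= 0:
        raise ValueError("power draw must be positive")
    return battery_wh / power_w


if __name__ == "__main__":
    for device, (low_w, high_w) in POWER_RANGES_W.items():
        best = runtime_hours(BATTERY_WH, low_w)    # lowest quoted draw
        worst = runtime_hours(BATTERY_WH, high_w)  # highest quoted draw
        print(f"{device:11s} {worst:8.1f} h (worst) to {best:8.1f} h (best)")
```

At the top of each quoted range, this hypothetical battery would last a little under 7 hours with a DSP, 2 hours with an FPGA, and 40 minutes with a GPU drawing 30 watts.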
If users are looking for large data throughput and latencies on the order of nanoseconds, an FPGA could be a better option than multiple DSPs. This will not only improve efficiency but also reduce the complexity that arises from having multiple processors on a single PCB.
Our evaluation showed GPUs to be the best option for high-end performance followed by FPGAs and DSPs, in that order. On the efficiency front, microprocessors are the most energy-efficient devices on the table. Typically, FPGAs have higher energy consumption compared to DSPs, but they offer a high degree of flexibility in terms of throughput. The overall energy consumption for FPGAs depends on the application being ported. It may sometimes be less than that for DSPs.
The downside to using a GPU is that it needs to be coupled with a CPU, an FPGA, or a controller for data processing execution. This would add to the overall bill-of-materials for the system.
Pricing may be an important factor, especially when the device is to address emerging markets. DSPs are the least expensive platforms to adopt, followed by GPUs, even without considering the additional cost of the controlling device. FPGAs tend to be the most expensive of the lot, owing to the in-field flexibility they offer.
If the application is expected to be portable or battery operated, a DSP would be the ideal choice. DSPs offer sufficient performance for most medical imaging equipment. A GPU is more suited to powerful, high-throughput applications, often server-grade. In a telemedicine application, where X-ray machines upload images to the cloud for processing and analytics, a GPU would be ideal as it would process them in near-real time.
An FPGA is the most flexible option and can be provisioned as a controller, a GPU, or a DSP. It is ideal when a design needs a custom mix of GPU, DSP, and microprocessor capabilities. However, the extended debug cycles can delay the product to the point that competitors might bring their products to market first. (See Table 1)
For the UltraHD 4K resolution imaging, we initially looked at a GPU as it offered the best performance. Soon we realized that we would need to split each frame into four HD images and process them independently. That would use only 4 of the 32 cores available on the GPU, which ruled it out as the best option. Also, if we were to split the UltraHD image into 32 parts and process them in parallel on the 32 GPU cores, we would need a high-end FPGA to act as a controller. That would mean a significant hit on the overall device cost.
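The quadrant split described above can be sketched as follows. An UltraHD frame is 3840 × 2160 pixels, so each quadrant is exactly one 1920 × 1080 HD frame. The `Tile` type and the function name `split_into_tiles` are hypothetical, and the snippet only computes tile rectangles; it is an illustration under those assumptions, not the project's actual implementation.

```python
from typing import List, NamedTuple

UHD_WIDTH, UHD_HEIGHT = 3840, 2160  # UltraHD (4K) frame size in pixels
GPU_CORES = 32                      # core count quoted in the text


class Tile(NamedTuple):
    x: int       # left edge of the tile within the full frame
    y: int       # top edge of the tile within the full frame
    width: int
    height: int


def split_into_tiles(width: int, height: int, cols: int, rows: int) -> List[Tile]:
    """Divide a frame into a cols x rows grid of equally sized tiles."""
    if width % cols or height % rows:
        raise ValueError("frame size must divide evenly into the grid")
    tile_w, tile_h = width // cols, height // rows
    # Tiles are listed row by row, left to right.
    return [
        Tile(col * tile_w, row * tile_h, tile_w, tile_h)
        for row in range(rows)
        for col in range(cols)
    ]


if __name__ == "__main__":
    quadrants = split_into_tiles(UHD_WIDTH, UHD_HEIGHT, cols=2, rows=2)
    for tile in quadrants:
        print(tile)
    # With one tile per core, four tiles keep only 4 of the 32 cores busy.
    print(f"core utilization: {len(quadrants) / GPU_CORES:.1%}")
```

The 32-way split considered in the text could be modeled with `cols=8, rows=4`, which gives 480 × 540 pixel tiles; other grid shapes that divide the frame evenly would work just as well.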
A high-end FPGA-based design by itself could encompass all the functionality required, but the design and verification cycle was too long for the client’s comfort. So, we opted for a simpler FPGA and a DSP to reduce design time. The FPGA could split each 4K image into four HD images, which could easily be processed by a DSP.
Note that we would have selected either a top-of-the-line FPGA or a combination of an FPGA and a GPU if the solution had to be made scalable to 4 × 4K resolution or even higher.
The application defines the processor core that you choose. For example, 3D imaging for medical applications is an ideal case for using a GPU to transform multiple 2D images into a 3D image in real time, as a physician examines the patient. With larger data throughput requirements and 3D panels being easily available, a GPU would be an ideal platform for developing such imaging-intensive applications.
This article was written by Renjith Ponnappan, a product manager at eInfochips, Sunnyvale, CA. For more information, visit http://info.hotims.com/49750-160.