Major medical imaging vendors are currently ramping up their artificial intelligence (AI) activities, and the market is expected to grow at a compound annual growth rate (CAGR) of 40 percent or more over the next few years.1 AI can, for example, improve stroke detection and diagnosis software as well as analysis tools that measure blood flow in non-invasive coronary exams.2

Because of AI’s capability to detect cancer at an early stage, further application areas can be found in tumor development tracking to improve patients’ life expectancy. The increasing application of AI in diagnostics and medical imaging will drive growth and change the user experience to a more phenotypic characterization of images, including medical inference workflows that are prioritized by AI.

Executing AI on Standard Hospital IT

Quite a few of these applications can be executed on hospitals’ standard picture archiving and communication systems (PACS) as well as radiology information systems (RIS), and the integration is simple and does not require significant IT time.3 Often, no additional hardware is required for AI-supported hospital workflows that analyze standard radiology data from heads, chests, spines, abdomens, etc. to detect acute abnormalities across the body, helping radiologists prioritize life-threatening cases and expedite patient care.

Most of these day-to-day tasks can be handled on conventional hospital IT client/server infrastructures with their central servers and powerful diagnostic workstations that already provide plenty of performance and can easily be scaled up.
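The prioritization of life-threatening cases described above can be pictured as a priority queue over the radiology worklist. The following is a minimal sketch, not a description of any vendor's product: the accession numbers, the urgency scores, and the function name prioritize are hypothetical, and a real system would obtain the scores from an inference engine attached to the PACS.

```python
import heapq


def prioritize(studies):
    """Return accession numbers ordered from most to least urgent.

    `studies` is a list of (accession, urgency) pairs, where urgency is a
    hypothetical AI score in [0, 1]. Ties keep their arrival order.
    """
    heap = []
    for arrival_index, (accession, urgency) in enumerate(studies):
        # heapq is a min-heap, so the score is negated to pop the most
        # urgent study first; arrival_index breaks ties deterministically.
        heapq.heappush(heap, (-urgency, arrival_index, accession))

    ordered = []
    while heap:
        _, _, accession = heapq.heappop(heap)
        ordered.append(accession)
    return ordered


# Illustrative worklist with made-up identifiers and scores.
worklist = [
    ("CT-1001", 0.12),
    ("CT-1002", 0.97),
    ("XR-1003", 0.55),
    ("CT-1004", 0.97),
]
print(prioritize(worklist))
```

Because only the reading order changes, a scheme like this can sit on top of existing infrastructure without touching the image data itself.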

Medical Real-Time Equipment Needs Own Intelligence


But what do system designers need if they want to execute the same learned inference algorithms within the strictly limited power and performance headroom of medical imaging devices to also utilize the inferences for augmented reality (AR) in real time? Possible applications for this include anything from medical ultrasound and minimally invasive procedures to AI-assisted real-time magnetic resonance, fluoroscopic, and nuclear imaging.

On-device machine learning, of course, also requires massive performance for all the various tasks such as classification, localization, and segmentation, as well as image filtering and denoising without omitting any irregularities. Will those applications need their own server room next to operating theaters or onboard mobile emergency ambulances?

Raw Data Processing Tasks. First and foremost, medical imaging applications have always needed massive parallel preprocessing performance supported by the appropriate technologies. In this respect, nothing has changed, except that the raw data may now have to be preprocessed for machine learning in a different way than for the human eye.
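To make the difference between preprocessing for the eye and preprocessing for machine learning concrete, here is a minimal sketch. It assumes a common pairing that the article does not specify: intensity windowing to 8-bit gray levels for display, and zero-mean, unit-variance normalization for a network input. The sample values and the window settings are purely illustrative.

```python
from statistics import mean, pstdev


def window_for_display(pixels, center, width):
    """Clip raw intensities to a window and map them to 8-bit gray levels."""
    low = center - width / 2
    high = center + width / 2
    out = []
    for p in pixels:
        clipped = min(max(p, low), high)
        out.append(round((clipped - low) / (high - low) * 255))
    return out


def normalize_for_model(pixels):
    """Scale raw intensities to zero mean and unit variance."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A constant image carries no contrast; return all zeros.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]


# Made-up raw intensities standing in for one row of a scan.
raw = [-1000, -200, 0, 40, 80, 400, 1200]

print(window_for_display(raw, center=40, width=400))
print(normalize_for_model(raw))
```

The display path deliberately discards everything outside the window, whereas the model path keeps the full dynamic range, which is one reason the two preprocessing chains can diverge.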

Machine Learning and Deep Learning Algorithms. Now, however, machine learning (ML) and deep learning (DL) algorithms have to be implemented as well, and they need additional highly parallel processing capabilities. So, in addition to real-time imaging for high-resolution screens with brilliant quality, massive parallel processing capabilities are needed. These capabilities should be provided by the same processing architecture that the original DL processes use on massive GPGPU-based server farms, which enables highly efficient reuse of software and algorithms.

Therefore, x86-based imaging systems with excellent GPU support are ideal candidates as visualization and ML algorithms are made for general purpose GPU processing, and their entire ecosystem comes with comprehensive heterogeneous computing system support. This is particularly significant, because preprocessing of medical imaging data can also be executed on those GPGPU machines, enabling heterogeneous computing experts to even shift their algorithms from FPGAs or ASICs to the highly flexible GPGPUs.

Scalable AI Performance with GPUs and CPUs from AMD

Fig. 1 - AMD processor technology offers an end-to-end neural network solution including all required software support for the inference engines that balance the performance between CPU and GPU, including MIOpen and ROCm for Linux, or PAL for Windows for the GPGPU.

AMD offers both GPUs and CPUs from a single source and, with its latest AMD Ryzen Embedded V1000 and AMD EPYC Embedded 3000 processors, the company has launched benchmark processors in the embedded computing and embedded server sectors. The AMD Ryzen Embedded V1000 processors, for example, combine the performance of the Zen and Vega architectures in a single accelerated processing unit (APU) and provide a 52 percent IPC uplift on the CPU and a 200 percent throughput/clock improvement on the GPU.

This AMD solution focuses on the unique data throughput, image transformation, and post-processing requirements of medical imaging applications - including mobile and cart-based ultrasound systems, endoscopy systems, and high-end MRI and CT scanners - while also supporting the highest possible medical image resolution and accuracy.

Fig. 2 - Example workflows merging computer vision and neural network flow.

If that performance is not enough, engineers can harness the same Zen and Vega architectures by using discrete graphics cards alongside the even higher-performing AMD EPYC Embedded processors. These provide up to 1 TB of DDR4 memory capacity (across 4 channels); 4, 8, 12, or 16 high-performance cores with simultaneous multithreading (SMT); and up to 64 PCIe Gen 3 lanes, enough, for example, to connect 4 fully featured PEG graphics cards using 16 lanes each.
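The lane arithmetic behind the four-card figure is simple enough to check in a few lines. This is only a back-of-the-envelope sketch: the function name max_x16_cards is hypothetical, and the reserved_lanes parameter is an assumption added to show how lanes set aside for other peripherals would reduce the count.

```python
def max_x16_cards(total_lanes, lanes_per_card=16, reserved_lanes=0):
    """Return how many graphics cards fit into a PCIe lane budget.

    reserved_lanes is a hypothetical allowance for storage, networking,
    or other peripherals that also need lanes.
    """
    usable = total_lanes - reserved_lanes
    if usable < 0:
        raise ValueError("reserved_lanes exceeds total_lanes")
    return usable // lanes_per_card


# 64 lanes with 16 lanes per card, as stated in the text.
print(max_x16_cards(64))

# Same budget with 8 lanes reserved for other devices (illustrative).
print(max_x16_cards(64, reserved_lanes=8))
```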

Open Ecosystem. Both platforms also include an open ecosystem for heterogeneous system programming. It provides all the capabilities needed to execute the AI inference logic on computers embedded in medical devices, and it also offers the right platforms to execute the actual DL processes on data center-grade server platforms with massive parallel processing on GPGPUs.

The recently launched AMD Radeon Instinct MI60 and MI50 data center GPUs, for example, offer a supercharged compute performance, high-speed connectivity, fast memory bandwidth, and the latest ROCm (Radeon Open Compute) open software platform to power the most demanding DL applications.

ROCm, TensorFlow and MIOpen

Besides pure performance benefits, open systems development is another key argument that leads designers of medical imaging applications toward the AMD ecosystem. The ROCm software is an example of such an open software platform for GPU-enabled heterogeneous computing with a holistic computational approach on a system level - not focusing only on the GPU. It was created with developers in mind to accommodate manifold future GPGPU technologies including ML and AI. As an open platform, the ROCm ecosystem provides a rich foundation of modern programming languages including a heterogeneous C and C++ single-source compiler designed to speed development of high-performance, energy-efficient heterogeneous computing systems.

Updated Math Libraries for the New DLOPS. One of the most exciting ROCm developments over the past year is the integration and progression of the various ML frameworks. For example, ROCm 2.0 provides updated math libraries for the new DLOPS and supports 64-bit Linux operating systems including CentOS, RHEL, and Ubuntu as well as the latest versions of the most popular DL frameworks, including TensorFlow 1.11 (merging code upstream into the main repository), PyTorch (Caffe2), and others.

TensorFlow with Strong Support for ML and DL. TensorFlow support is essential because it is one of the most important open source software libraries for high-performance numerical computation needed for AI. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, APUs), and from desktops to clusters of servers to embedded mobile and edge devices. The latter two are essential for medical imaging OEMs requiring real-time visualization including AI-assisted AR in the operating theater. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for ML and DL, and the flexible numerical computation core is used across many other scientific domains.

Convolutional Neural Network Acceleration with ROCm 2.0. ROCm 2.0 also supports the MIOpen library, which was developed for convolutional neural network (CNN) acceleration and runs on top of the ROCm software stack. CNNs need relatively little preprocessing compared with other image classification algorithms because the network learns the filters that were hand-engineered in traditional algorithms. This independence from prior knowledge and human effort in feature design is a major advantage that ultimately leads to faster execution of inferences on medical imaging devices.
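To illustrate what a hand-engineered filter is, the sketch below applies a classic Sobel edge kernel to a tiny synthetic image in plain Python. It does not use MIOpen, and the image is made up. The point is only that in a CNN the nine kernel weights would be learned during training instead of being written down by an engineer, while the sliding-window operation stays the same.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in CNN layers, the kernel is not flipped, so strictly speaking
    this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-engineered kernel that responds to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Synthetic 4x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

for row in conv2d_valid(image, SOBEL_X):
    print(row)
```

The response is zero over the flat region and large where the window straddles the dark-to-bright transition, which is the kind of feature a trained network discovers on its own.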

Fig. 3 - To start, data need to be analyzed on central server farms with massive GPGPUs. On the inference system, the footprint required to execute all tasks is then much smaller.

Runtime to Convert CUDA to Portable C++ Code

Working with AMD platforms opens the path to this rich ecosystem of open source software. With HIP, there is even a runtime to convert CUDA to portable C++ code with little or no performance impact compared with coding directly in CUDA or hcc “HC” mode. This makes it possible to migrate existing code from competitive platforms to the AMD ecosystem without having to switch engineering environments, which is essential for application programming engineers who need to be experts within their frameworks. To try out the newest ROCm packages, develop an application, and easily deploy a ROCm solution, engineers can get the most recent Docker images, which saves the time of collecting all the libraries and building them specifically for a dedicated platform.
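Much of such a port consists of mapping CUDA runtime names onto their HIP counterparts. The toy translator below is only meant to convey that idea. It is not AMD's HIPIFY tooling, it covers just a handful of runtime calls, and the function name translate and the sample source snippet are hypothetical. The HIP identifiers on the right-hand side of the table are the real counterparts of the CUDA names on the left.

```python
import re

# Small, deliberately incomplete mapping of CUDA runtime names to HIP.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Word boundaries keep cudaMemcpy from matching inside
# cudaMemcpyHostToDevice; longest names are tried first as a safeguard.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def translate(source):
    """Replace known CUDA runtime identifiers with their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


# Hypothetical host-side fragment; d_img, h_img and nbytes are placeholders.
cuda_source = """#include <cuda_runtime.h>
cudaMalloc(&d_img, nbytes);
cudaMemcpy(d_img, h_img, nbytes, cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_img);
"""

print(translate(cuda_source))
```

Because the translated code is ordinary C++ against the HIP API, the same source can in principle be built for either vendor's GPUs, which is what makes the migration path attractive.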

Mentor Embedded and AMD. Support is provided not only via open source but also by commercial vendors. Mentor Embedded has joined forces with AMD to make several software enablement products available for the latest AMD EPYC Embedded 3000 and Ryzen Embedded V1000 processor families. For example, Mentor Embedded Linux (MEL), in combination with the Sourcery CodeBench IDE tools for hardware-accelerated applications, enables the development of high-performance machine vision and ML applications under exceptional integration conditions. This is achieved by enabling OpenCV vision libraries and TensorFlow libraries with security enabled throughout the device, which is essential for OEMs’ IP protection.

Regular Security Updates and Easy-to-Use Profiling Tools. As the leading commercial embedded Linux supplier, Mentor also provides regular security updates and easy-to-use profiling tools, enabling clients to achieve shorter development cycles and smoother integrations. The MEL platform includes, for example, GNU compiler collection (GCC), GNU project debugger (GDB), and Yocto™ Project 4.9 Long-Term Support (LTS) Linux kernel plus, of course, comprehensive support of the latest AMD GPUs. Such an out-of-the-box support package enables customers to get started faster while leveraging the proven reliability of MEL.

For real-time systems, developers can even take advantage of the small footprint and low power capable Nucleus® RTOS. If the hardware platform is also ready for the real-time hypervisor support from Real-Time Systems, engineers have everything they need from a software point of view to design the embedded computing intelligence for AI-assisted real-time imaging within their medical devices, which of course requires medical IoT connectivity to OEMs’ servers for the continual improvement of artificial intelligence, or CI of AI in short.

Computer-on-Modules for Optimum Performance Balancing

The congatec evaluation carrier boards plus computer modules with either AMD Ryzen Embedded V1000 or AMD EPYC Embedded 3000 processors are AI inference ready for augmented reality in medical real-time imaging applications. (Credit: congatec)

But what can engineers use from a hardware perspective to optimally balance the performance needs in their medical imaging devices? It all depends on the OEM’s medical imaging application. When it comes to small-sized devices, OEMs should consider implementing COM Express Type 6 computer-on-modules, which are available with AMD Ryzen Embedded V1000 processors for all-in-one systems that include the monitors as output devices, or COM Express Type 7 server-on-modules with AMD EPYC Embedded 3000 processors for backend processing on the embedded server level.

Both processors are based on the same microarchitecture and can be accompanied by the required AMD Radeon GPUs from a single source. This way, embedded system designs can be scaled from just 12 or 25 W for AMD Ryzen APUs to server-on-modules with 100 W and more if discrete GPGPUs are added to the system design.

Closed Loop Engineering with Computer-on-Modules. The benefit of a computer-on-module building block approach - engineered on the basis of vendor independent standards - is that developers can scale performance as required and always leverage the latest technologies by simply switching modules, which is important for continuous improvement processes and closed loop engineering. Modules also offer long-term availability, which is significant for medical life cycles, and come with comprehensive software support, so that OEM customers get an extensive collection of drivers and libraries in a board support package for all the standard interfaces supported by the modules.

The best practice is to leave the entire embedded system design to the embedded module vendor. If medical imaging engineers can also get a COM Express carrier board platform with an easy-to-use GPU interface that is well suited to embedded systems design, such as an MXM connector, then they have access to an entire ecosystem for their own GPGPU-based AI inference platforms.

Embedded computing vendors like congatec offer such comprehensive services so that medical imaging engineers can concentrate on their application engineering and leave the entire design based on standard embedded computer technologies to the embedded computing vendor - including all integration support required for third-party hardware components such as Basler medical and life science imaging cameras. Congatec, for example, has set up a cooperation with Basler to facilitate the fusion of embedded computer technologies and vision applications. But services are not limited to a single partner. The overriding aim of any cooperation is to simplify the integration of embedded technologies for medical engineers.


  1. Artificial Intelligence in Healthcare Market by Offering (Hardware, Software, Services), Technology (Machine Learning, NLP, Context-Aware Computing, Computer Vision), End-Use Application, End User, and Geography – Global Forecast to 2025
  2. Michael Walter, “Global market for AI in medical imaging expected to top $2B by 2023,” Radiology Business, Aug. 2, 2018.
  3. AIDOC

This article was written by Zeljko Loncaric, Marketing Engineer, congatec, Deggendorf, Germany, and Dan Demers, Director of Sales and Marketing at congatec Americas, San Diego, CA.