As NASA develops plans for increasingly ambitious human missions, including a return to the Moon and, eventually, exploration of Mars, more advanced medical risk assessment is necessary to keep astronauts healthy. Many aspects of spaceflight can contribute to risk, including altered gravity (which affects blood distribution and vascular biology, muscles, and bones), confinement, changes in sleep patterns, and challenges related to pharmaceutical administration and nutrition. Perhaps the aspect of the space environment that poses the greatest risk is space radiation. Space radiation consists of energetic protons and helium ions from the Sun, as well as galactic cosmic rays (high-energy protons and energetic heavy ions) from outside our solar system.
For current NASA missions to the International Space Station (ISS), located in low Earth orbit (LEO), astronauts receive minimal exposure to space radiation because they are protected by Earth’s magnetic field. For missions to the Moon and beyond, however, astronauts will not be protected by Earth’s magnetosphere, so space radiation exposure will be much higher. Space radiation can cause oxidative stress and can induce DNA mutations, which could lead to leukemia and other cancers in astronauts, as well as other types of medical problems. Some problems may become apparent during a space mission, while others related to DNA mutations may not become apparent until after a crew member has returned to Earth. NASA’s Human Research Program is currently funding studies that seek to identify specific types of medical risk, as well as studies that focus on mitigating those risks.
High-Performance Computing and Genomics Research
NASA Ames Research Center has been using high-performance computing for genomics studies since 2004, when investigators were involved in creating the first map of human genome activity, as well as studies of genetic variation in humans and model organisms. These efforts demonstrated the value of transcriptome analysis, which involves identification of both expressed and non-expressed sequences across the entire genome. Because of the huge amount of data required and the complexity of genome sequences, the Pleiades supercomputer at NASA Ames was critical for these studies and remains an essential resource for current genome science investigations.
Transcriptomics continues to be an important genomics research tool at NASA and is one of the key modalities supported by NASA’s GeneLab. GeneLab is the first comprehensive space-related omics open-access database, where scientists can upload, download, share, store, and analyze spaceflight and spaceflight-relevant data from experiments using model organisms. The widespread use of next-generation sequencing (NGS) has resulted in an enormous increase in genomics data available to investigators. As the amount of data requiring analysis grows, so does the need for high-performance computing to glean the maximum insight about spaceflight effects on physiology from the induced sequence variation data.
Development of a Novel Bioinformatics Pipeline on NASA’s Pleiades Supercomputer to Meet Space Biology and Human Research Program Needs
Bioinformatics pipeline development continues to be one of the most challenging aspects of genomics analysis using high-performance computing. The Genome Analysis Toolkit 4.1 (GATK4) is a collection of command-line tools for analyzing high-throughput sequencing data with a primary focus on variant discovery. GATK4 can reveal spaceflight-induced single nucleotide polymorphisms (SNPs), insertions, and deletions, as well as other variants that can affect spaceflight performance and risks. Additional tools based on the pipeline management software Toil and the programming language R are needed as well. Integrating tools from multiple sources is a challenge, since they are written in different programming languages (Java, Python, and R).
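To make the variant categories mentioned above concrete, the following is a minimal Python sketch that sorts variant calls into SNPs, insertions, and deletions by comparing the lengths of the reference and alternate alleles. It is an illustration only and is not part of GATK4 or the NASA Ames pipeline; the function names `classify_variant` and `summarize_variants` and the example allele pairs are hypothetical.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify one reference/alternate allele pair by allele length.

    Hypothetical helper for illustration; real variant callers apply
    more detailed rules (for example, for multi-allelic sites).
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    # Equal-length alleles longer than one base (multi-nucleotide change).
    return "other"


def summarize_variants(calls):
    """Count variant types in an iterable of (ref, alt) allele pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in calls)


if __name__ == "__main__":
    # Made-up allele pairs, not data from any spaceflight study.
    example_calls = [("A", "G"), ("C", "CTT"), ("GAT", "G"), ("AC", "GT")]
    print(summarize_variants(example_calls))
```

Comparing allele lengths is the simplest way to separate these classes. A production pipeline would read the alleles from variant call files and handle sites with more than one alternate allele.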
Recently, a study of mutation rates was performed in mice flown aboard the ISS. The study revealed that even a brief stay aboard the ISS induced very rapid tissue-specific mutations at unique and commonly varied non-synonymous sites (affecting protein sequence) in actively transcribed mouse genes that encode functions for many of the known negative spaceflight effects in astronauts, including bone and muscle loss, fatigue, vision impairment, and intraocular pressure elevations. Discovery of the increased DNA mutation rate aboard the ISS was enabled by the development of a high-performance computational pipeline for NGS analytics.
Specialized Computational Algorithms and High-Performance Computing Provide for Highly Efficient DNA Sequence Analysis
The computational algorithms that NASA Ames has developed support the parallelization and scaling that genomics investigators need to take full advantage of the NASA Advanced Supercomputing (NAS) facility. In contrast, conventional workstations are limited by the number of CPUs and cannot scale the required analytics, as shown in Figure 1. The combination of custom-written algorithms and the Pleiades supercomputer at Ames is critical to the analysis efficiency of NASA’s GeneLab platform.
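The general idea behind this kind of parallelization can be sketched as a scatter-gather pattern: a sequence is split into independent intervals, each interval is analyzed by a separate worker, and the partial results are combined. The Python example below is a simplified sketch under stated assumptions, not the algorithms developed at NASA Ames. The function names are hypothetical, the "analysis" is a simple count of mismatched bases, and a thread pool stands in for the many nodes of a supercomputer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_intervals(length, chunk_size):
    """Return half-open (start, end) intervals that cover [0, length)."""
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


def count_mismatches(task):
    """Count positions in one interval where the sample differs from the reference."""
    ref, sample, start, end = task
    return sum(1 for i in range(start, end) if ref[i] != sample[i])


def parallel_mismatch_count(ref, sample, chunk_size, workers=4):
    """Scatter intervals to workers, then gather and sum the partial counts.

    Assumes ref and sample are equal-length, already-aligned strings.
    A thread pool keeps this sketch portable; CPU-bound work at scale
    would use separate processes or cluster jobs instead.
    """
    tasks = [(ref, sample, start, end)
             for start, end in split_into_intervals(len(ref), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_mismatches, tasks))


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    print(parallel_mismatch_count("ACGTACGTAC", "ACGAACGTTC", chunk_size=4))
```

Because each interval is processed independently, adding workers shortens the run time until the number of intervals or the available CPUs becomes the limit, which is the constraint a single workstation reaches much sooner than a supercomputer.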
In 2020, researchers at NASA Ames Research Center embarked on a study of COVID-19 using the Pleiades supercomputer. This study seeks to unravel one of the most perplexing aspects of COVID-19: some patients experience a relatively mild form of the disease, while others become gravely ill with complications such as acute respiratory distress syndrome. Pleiades is playing a critical role in this investigation, which involves an analysis of genetic variation in the general population and its correlation with COVID-19 disease severity. Insight from this study will influence clinical management of the disease. This study is supported by the COVID-19 HPC Consortium.