Current and future grand challenges in bioinformatics data visualization are outlined, and the first publication venue dedicated to this subdiscipline is announced.
Increasingly, the life sciences rely on data science, an emerging discipline in which visualization plays a critical role. Visualization is particularly important with challenging data from cutting-edge experimental techniques, such as 3D genomics, spatial transcriptomics, 3D proteomics, epiproteomics, high-throughput imaging, and metagenomics. Data visualization also plays an increasing role in how research is communicated. Some scientists still think of data visualization as optional; however, as more realize it is an essential tool for revealing insights buried in complex data, bioinformatics visualization is emerging as a subdiscipline. This article outlines current and future grand challenges in bioinformatics data visualization and announces the first publication venue dedicated to this subdiscipline.

Over the past two decades, life science data have increased rapidly in volume and complexity, with the result that data analysis is often the major bottleneck (O'Donoghue et al., 2010a). For example, "All major genomics breakthroughs so far have been accompanied by the development of groundbreaking statistical and computational methods" (Green et al., 2020). Thus, in the remaining decades of the 21st century, life scientists will become increasingly reliant on the emerging tools and methods of data science (Blei and Smyth, 2017; Altman and Levitt, 2018).

One of these methods is data visualization (a.k.a. DataVis), which plays a critical role in transforming data and analysis outcomes into insight (Card et al., 1999). Data visualization involves analysis, design, and rendering, as well as observation and cognitive processing (Figure 1). Some scientists think of DataVis as an optional step aimed mostly at aesthetics; however, there is growing recognition that it is an essential tool in the analysis of complex data. Two indicators of this recognition are the recent sales of the DataVis companies Looker and Tableau for US$3B and US$16B, respectively.

Currently, however, most attention is focused on another aspect of data science, namely the use of machine learning to develop artificial intelligence systems. Such systems have recently led to exciting advances in the life sciences (e.g., Callaway, 2020a), but also to some hyperbole. Clearly, machine learning methods are increasingly critical for research; however, these methods also have limitations (Challen et al., 2019; Heaven, 2019; Yu and Kohane, 2019). More fundamentally, automated methods alone are insufficient, since analysis outcomes must be observed and understood by an analyst before insight can occur (Figure 1). Most analysts use data visualization as an integral part of their cognitive processes. Especially important is manual validation, which involves checking for errors and outliers in raw data, and for wrong assumptions used in automated analysis methods (Anscombe, 1973).

Automated data analysis (including machine learning) and data visualization are just components of the larger goal of data science, which the eminent computer scientist Fred Brooks argues should focus on 'Intelligence Amplification' (a.k.a. I.A.), i.e., on amplifying our abilities to manage more