Deep learning for bioimage analysis in developmental biology.

Deep learning has transformed the way large and complex image datasets can be processed, reshaping what is possible in bioimage analysis. As the complexity and size of bioimage data continue to grow, this new analysis paradigm is becoming increasingly ubiquitous. In this Review, we begin by introducing the concepts needed for beginners to understand deep learning. We then review how deep learning has impacted bioimage analysis and explore the open-source resources available to integrate it into a research project. Finally, we discuss the future of deep learning applied to cell and developmental biology. We analyze how state-of-the-art methodologies have the potential to transform our understanding of biological systems through new image-based analysis and modelling that integrate multimodal inputs in space and time.

In the past decade, deep learning (DL) has revolutionized biology and medicine through its ability to automate repetitive tasks and integrate complex collections of data to produce reliable predictions. Among its many uses, DL has been fruitfully exploited for image analysis. Although the first DL approaches that were successfully used for the analysis of medical and biological data were initially developed for computer vision applications, such as image database labelling (Krizhevsky et al., 2012), many research efforts have since focused on tailoring DL for medical and biological image analysis. Bioimages, in particular, exhibit a large variability due to the countless different possible combinations of phenotypes of interest, sample preparation protocols, imaging modalities and acquisition parameters. DL is thus a particularly appealing strategy to design general algorithms that can easily adapt to specific microscopy data with minimal human input. For this reason, the successes and promises of DL in bioimage analysis applications have been the topic of a number of recent review articles.

Here, we expand upon a recent Spotlight article and tour the practicalities of the use of DL for image analysis in the context of developmental biology. We first provide a primer on key machine learning (ML) and DL concepts. We then review the use of DL in bioimage analysis and outline success stories of DL-enabled bioimage analysis in developmental biology experiments. For readers wanting to further experiment with DL, we compile a list of freely available resources, most requiring little to no coding experience. Finally, we discuss more advanced DL strategies that are still under active investigation but are likely to become routinely used in the future.

What is machine learning?

The term machine learning defines a broad class of statistical models and algorithms that allow computers to perform specific data analysis tasks.

Examples of tasks include, but are not limited to, classification, regression, ranking, clustering or dimensionality reduction, and are usually performed on datasets collected with or without prior human annotations. Three main ML paradigms can be distinguished: supervised, unsupervised and reinforcement learning. The overwhelming majority of established bioimage analysis algorithms rely on supervised and unsupervised ML paradigms, and we therefore focus on these two in the rest of the article. In supervised learning, existing human knowledge is used to obtain a ‘ground truth’ label for each element in a dataset. The resulting data-label pairs are then split into a ‘training’ and a ‘testing’ set. Using the training set, the ML algorithm is ‘trained’ to learn the relationship between ‘input’ data and ‘output’ labels by minimizing a ‘loss’ function, and its performance is assessed on the testing set.
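The supervised workflow described above (label, split, minimize a loss, evaluate on held-out data) can be sketched in a few lines of plain Python. This is only an illustrative toy, not a method from the article: the one-feature dataset, the logistic-regression model, the learning rate and all function names are assumptions chosen to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_dataset(n, rng):
    """Toy data-label pairs: one feature per sample, drawn around +2 (label 1) or -2 (label 0)."""
    pairs = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(2.0 if label else -2.0, 1.0)
        pairs.append((feature, label))
    return pairs


def log_loss(w, b, pairs):
    """Mean cross-entropy loss of the model p = sigmoid(w * x + b)."""
    eps = 1e-12
    total = 0.0
    for x, y in pairs:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(pairs)


def train(pairs, lr=0.1, epochs=200):
    """Minimize the loss on the training set by plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in pairs:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(pairs)
        b -= lr * grad_b / len(pairs)
    return w, b


def accuracy(w, b, pairs):
    correct = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in pairs)
    return correct / len(pairs)


rng = random.Random(0)
dataset = make_dataset(200, rng)
train_set, test_set = dataset[:150], dataset[150:]  # 'training' and 'testing' sets

loss_before = log_loss(0.0, 0.0, train_set)
w, b = train(train_set)
loss_after = log_loss(w, b, train_set)
test_accuracy = accuracy(w, b, test_set)  # performance on data never seen in training
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, test accuracy {test_accuracy:.2f}")
```

The testing set plays no role in fitting `w` and `b`, which is what makes the reported accuracy an estimate of performance on unseen but related data.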

Once training is complete, the ML model can be applied to unseen, but related, input data in order to predict output labels. Classical supervised ML methods include random forests, gradient boosting and support vector machines. In contrast, unsupervised learning deals with unlabelled data: ML is then employed to uncover patterns in input data without human-provided examples. Examples of unsupervised learning tasks include clustering and dimensionality reduction, routinely used in the analysis of single-cell ‘-omics’ data.

Neural networks and deep learning

DL designates a family of ML models based on neural networks (NN). Formally, an NN aims to learn non-linear maps between inputs and outputs. An NN is a network of processing ‘layers’ composed of simple, but non-linear, units called artificial neurons. When composed of several layers, an NN is referred to as a deep NN. Layers of artificial neurons transform inputs at one level (starting with input data) into outputs at the next level, such that the data becomes increasingly abstract as it progresses through the different layers, encapsulating in the process the complex non-linear relationship that usually exists between input and output data. This process allows a sufficiently deep NN to learn, during training, higher-level features contained in the data. For example, for a classification problem such as the identification of cells contained in a fluorescence microscopy image, this would typically involve learning features correlated with cell contours while ignoring the noisy variation of pixel intensity in the background of the image.

Intuitively, a DL model can be viewed as a machine with many tuneable knobs, which are connected to one another through links. Tuning a knob changes the mathematical function that transforms the inputs into outputs. This transformation depends on the strength of the links between the knobs and on the importance of the knobs, known together as ‘weights’. A model with randomly set weights will make many mistakes, but the so-called ‘winning lottery hypothesis’ assumes that an optimal configuration of knobs and weights exists.
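To make the 'knobs' picture concrete, the following minimal sketch runs the forward pass of a tiny two-layer network in plain Python. The weights are set by hand (an assumption for illustration, not the result of training) so that the network computes the XOR function, a map that no purely linear model can represent. Changing a single bias shows how turning one knob changes the function computed. All names are hypothetical.

```python
def relu(values):
    """Non-linear activation applied by each artificial neuron."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each neuron outputs a weighted sum of its inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, params):
    """Input -> hidden layer (with ReLU) -> single output neuron."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    return dense(hidden, params["W2"], params["b2"])[0]


# Hand-set weights (illustrative, not learned) that implement XOR.
xor_params = {
    "W1": [[1.0, 1.0], [1.0, 1.0]],
    "b1": [0.0, -1.0],
    "W2": [[1.0, -2.0]],
    "b2": [0.0],
}

# Turning one 'knob': the same network with a single bias changed.
detuned_params = dict(xor_params, b1=[0.0, 0.0])

inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
xor_outputs = [forward(x, xor_params) for x in inputs]
detuned_outputs = [forward(x, detuned_params) for x in inputs]
print(xor_outputs)      # [0.0, 1.0, 1.0, 0.0]
print(detuned_outputs)  # no longer XOR
```

Training replaces the hand-setting step: the weights start from random values and are adjusted automatically to reduce the loss.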

This optimal configuration is searched for during training, in which the knobs of the DL model are reconfigured by minimizing the loss function. Although prediction with trained networks is generally fast, training deep NN de novo proves to be more challenging. A main difficulty in DL lies in finding an appropriate numerical scheme that allows, with limited computational power, tuning of the tens of thousands of weights contained in each layer of the networks and obtaining high ‘accuracy’. Although the idea (McCulloch and Pitts, 1943) and the first implementations of NN date back to the dawn of digital computing, it took several decades for the development of computing infrastructure and efficient optimization algorithms to allow implementations of practical interest, such as handwritten-digit recognition.

Convolutional NN (CNN) are a particular type of NN architecture specifically designed to be trained on input data in the form of multidimensional arrays, such as images. CNN attracted particular interest in image processing when, in the 2012 edition of the ImageNet challenge on image classification, the AlexNet model outperformed other ML algorithms by a comfortable margin. In bioimage analysis applications, the U-net architecture has become predominant, as discussed below.
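The operation that gives CNN their name can be illustrated without any DL library. The sketch below slides a small kernel over a toy image and returns the resulting feature map. The image, the hand-written edge kernel and the function name are assumptions for illustration; in a real CNN the kernel values are weights learned during training. As in most DL libraries, the operation shown is technically a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k_rows) for j in range(k_cols))
             for c in range(out_cols)]
            for r in range(out_rows)]


# Toy 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the response peaks at the edge
```

Because the same small kernel is reused at every position, a convolutional layer needs far fewer weights than a fully connected layer applied to the same image.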

Deep learning for bioimage analysis

DL in bioimage analysis tackles three main kinds of tasks: (1) image restoration, in which an input image is transformed into an enhanced output image; (2) image partitioning, whereby an input image is divided into regions and/or objects of interest; and (3) image quantification, whereby objects are classified, tracked or counted. Here, we illustrate each class of application with examples of DL-enabled advances in cell and developmental biology.

Image restoration

Achieving a high signal-to-noise ratio (SNR) when imaging an object of interest is a ubiquitous challenge when working with developmental systems. Noise in microscopy can arise from several sources (e.g. the optics of the microscope and/or its associated detectors or camera). Live imaging, in particular, usually involves compromises between SNR, acquisition speed and imaging resolution. In addition, regions of interest in developing organisms are frequently located inside the body, far from the microscope objective. Therefore, because of scattering, light travelling from fluorescent markers can be distorted and less intense when it reaches the objective. Photobleaching and phototoxicity are also increasingly problematic deeper into the tissue, leading to low SNR as one mitigates their effects through decreased laser power and increased camera exposure or detector voltage.

DL has been successful at overcoming these challenges when used in the context of image restoration algorithms, which transform input images into output images with improved SNR. Although algorithms relying on theoretical knowledge of imaging systems have made image restoration possible since the early days of bioimage analysis, the competitive performance of both supervised and unsupervised forms of DL has introduced a paradigm shift. Despite lacking theoretical guarantees, several purely data-driven DL-based approaches outperform non-DL strategies at restoring images accurately. One challenge in applying supervised DL to image restoration is the need for high-quality training sets of ground truth images exhibiting a reduced amount of noise. A notable example of a DL-based image restoration algorithm requiring a relatively small training set [200 image patches, size 64×64×16 pixels] is content-aware image restoration (CARE). To train CARE, pairs of registered low-SNR and high-SNR images must first be acquired. The high-SNR images serve as ground truth for training a DL model based on the U-net architecture. The trained network can then be used to restore noiseless, higher-resolution images from unseen noisier datasets. Often, however, high-SNR ground truth image data cannot be easily generated experimentally. In such cases, synthetic high-SNR images generated by non-DL deconvolution algorithms can be used to train the network. For example, CARE has been trained to resolve sub-diffraction structures in low-SNR brightfield microscopy images using synthetically generated super-resolution data. More recently, the DECODE method uses a U-net architecture to address the related challenge of computationally increasing resolution in the context of single-molecule localization microscopy. The U-net model takes into account multiple image frames, as well as their temporal context. DECODE can localize single fluorophore emitters in 3D for a wide range of emitter brightnesses and densities, making it more versatile than previous CNN-based methods.

Unsupervised methods for image restoration offer an alternative to the generation of dedicated or synthetic training sets. Some recent denoising approaches exploit DL to learn how to best separate signal (e.g. the fluorescent reporter from a protein of interest) from noise, in some cases without the need for any ground truth. Noise2Noise, for example, uses a U-net model to restore noiseless images after training on pairs of independent noisy images, and was demonstrated to accurately denoise biomedical image data. Going further, Noise2Self modifies Noise2Noise to only require noisy images split into the input and target sets. In these algorithms, training is carried out on noisy images under the assumption that noise is statistically independent in image pairs, whereas the signal present is more structured. Alternatively, Noise2Void proposes a strategy to train directly on the dataset that needs to be denoised. The Noise2 model family is ideal for biological applications, in which it can be challenging to obtain noise-free images.
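The statistical assumption behind training on pairs of noisy images can be checked numerically with a deliberately simple stand-in for a network. In the sketch below, the 'denoiser' is a single scale factor fitted by least squares, the signal is a list of random pixel values, and the noise is zero-mean Gaussian. These are all illustrative assumptions, and this is not the Noise2Noise implementation. The point is only that fitting against an independent noisy target gives nearly the same parameter as fitting against the clean target, because independent zero-mean noise in the target averages out of the squared-error minimizer.

```python
import random


def fit_scale(inputs, targets):
    """Closed-form least-squares fit of a one-parameter 'denoiser' y = a * x."""
    return sum(x * t for x, t in zip(inputs, targets)) / sum(x * x for x in inputs)


rng = random.Random(1)
n_pixels = 20000
sigma = 0.2  # assumed noise level

clean = [rng.uniform(0.0, 1.0) for _ in range(n_pixels)]
noisy_a = [s + rng.gauss(0.0, sigma) for s in clean]  # first noisy acquisition
noisy_b = [s + rng.gauss(0.0, sigma) for s in clean]  # second, independent acquisition

a_clean_target = fit_scale(noisy_a, clean)    # supervised: noisy input, clean target
a_noisy_target = fit_scale(noisy_a, noisy_b)  # Noise2Noise-style: noisy input, noisy target

print(f"fit with clean targets: {a_clean_target:.3f}")
print(f"fit with noisy targets: {a_noisy_target:.3f}")
```

The two fitted values agree more closely as the number of pixels grows, which is the sense in which clean ground truth becomes unnecessary when noise is independent between the two images.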

Neural networks and convolutional neural networks for bioimage analysis. (A) Schematic of a typical NN composed of an input layer (green), hidden layers (blue) and an output layer (red). Each layer is composed of neurons connected to each other. (B) Schematic of a U-net architecture for the segmentation of cells and nuclei in mouse epithelial tissues. U-net is amongst the most popular and efficient CNN models used for bioimage analysis and is designed using ‘convolutional’, ‘pooling’ and ‘dense’ layers as key building blocks. U-net follows a symmetric encoder-decoder architecture resulting in a characteristic U-shape. Along the encoder path, the first branch of the U, the input image is progressively compacted, leading to a representation with reduced spatial information but increased feature information. Along the decoder path, the second branch of the U, feature and spatial information are combined with information from the encoder path, forcing the model to learn image characteristics at various spatial scales. (C) Schematic of an Inception V1 architecture, also called ‘GoogLeNet’. Inception V1 is a typical CNN architecture for image classification tasks. For example, it has been used to classify early human embryo images with very high accuracy. It is designed around a repetitive architecture made of so-called ‘inception blocks’, which apply several ‘convolutional’ and max ‘pooling’ layers to their input before concatenating together all generated feature maps.
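The encoder-decoder idea in panel B can be sketched with three elementary operations on a toy image: max pooling (compaction along the encoder path), upsampling (expansion along the decoder path) and concatenation of encoder and decoder maps (the skip connection). The sketch below is a simplification under stated assumptions: it omits the learned convolutions that a real U-net applies between these steps, and the function names are hypothetical.

```python
def max_pool_2x2(image):
    """Encoder step: keep the maximum of each 2x2 block, halving both dimensions."""
    return [[max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, len(image[0]), 2)]
            for r in range(0, len(image), 2)]


def upsample_2x(image):
    """Decoder step: nearest-neighbour upsampling, doubling both dimensions."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_concat(encoder_map, decoder_map):
    """Skip connection: stack encoder and decoder values as two channels per pixel."""
    return [[(e, d) for e, d in zip(enc_row, dec_row)]
            for enc_row, dec_row in zip(encoder_map, decoder_map)]


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 2, 7, 8],
]

pooled = max_pool_2x2(image)           # coarse 2x2 representation
restored = upsample_2x(pooled)         # back to 4x4, fine detail lost
merged = skip_concat(image, restored)  # fine detail reintroduced from the encoder

print(pooled)     # [[4, 1], [2, 8]]
print(merged[0])  # [(1, 4), (2, 4), (0, 1), (0, 1)]
```

The merged map carries both the fine-scale values of the input and the coarse-scale summary, which is how information at several spatial scales reaches the later layers.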

Deep learning methods applied to developmental biology applications. (A) A simulated ground truth cell membrane image is artificially degraded with noise. Denoised outputs obtained using Noise2Noise and Noise2Void are shown at the bottom, along with their average peak signal-to-noise values (PSNR; higher values translate to sharper, less-noisy images). Image adapted from. (B) Fluorescence microscopy cell nuclei image from the Kaggle 2018 Data Science Bowl BBBC038v1 segmented with StarDist, in which objects are represented as star-convex polygons, and with SplineDist, in which objects are described as a planar spline curve.
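The PSNR values mentioned in panel A follow the standard definition PSNR = 10 log10(MAX² / MSE), where MSE is the mean squared error between a reference image and a test image and MAX is the largest possible pixel value. The sketch below computes it in plain Python for a toy example. The image values, the `max_value` default of 1.0 and the function name are assumptions for illustration and are not taken from the figure.

```python
import math


def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two equally sized 2D images."""
    squared_errors = [(r - t) ** 2
                      for ref_row, test_row in zip(reference, test)
                      for r, t in zip(ref_row, test_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [
    [0.0, 1.0],
    [1.0, 0.0],
]
# Every pixel is off by 0.1, so the MSE is 0.01.
degraded = [
    [0.1, 0.9],
    [0.9, 0.1],
]

psnr_degraded = psnr(ground_truth, degraded)
print(f"{psnr_degraded:.2f} dB")  # about 20 dB
```

Because the MSE sits in the denominator, a restored image that is closer to the ground truth yields a higher PSNR.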

Hallou, A., Yevick, H. G., Dumitrascu, B. & Uhlmann, V. (2021). Deep learning for bioimage analysis in developmental biology. Development, 148 (18). doi:10.1242/dev.199616