The "stochasters" approach assumes that multiple large sea ice loss events, such as those that occurred in 2007 and 2012, will recur at random times in the future. This method estimates that several more such events would be needed to reach a nearly sea ice-free state in the summer. Combined with the likelihood of such events, this approach suggests a nearly sea ice-free Arctic by about 2030, but with large uncertainty in timing.
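The timing implied by this idea can be explored with a small Monte Carlo simulation, sketched below. Every number used here (initial extent, event probability, loss per event, background trend) is a hypothetical placeholder chosen for illustration, not a value from the literature:

```python
import random

def years_to_ice_free(extent=4.5, threshold=1.0, event_prob=0.1,
                      event_loss=1.2, trend_loss=0.05, seed=0):
    """Simulate years until September extent (10^6 km^2) falls below a
    'nearly ice-free' threshold. Large loss events (like 2007/2012) occur
    randomly with probability event_prob per year, on top of a slow
    background decline. All parameter values are illustrative."""
    rng = random.Random(seed)
    year = 0
    while extent > threshold:
        extent -= trend_loss            # gradual background loss
        if rng.random() < event_prob:   # occasional large loss event
            extent -= event_loss
        year += 1
    return year

# Distribution of timing across many random realizations
samples = sorted(years_to_ice_free(seed=s) for s in range(1000))
median_years = samples[len(samples) // 2]
print(median_years)
```

Because the large events arrive at random times, repeated runs give a spread of crossing dates rather than a single year, which is exactly the source of the "large uncertainty in timing" noted above.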
The development of computational methods to predict three-dimensional (3D) protein structures from the protein sequence has proceeded along two complementary paths that focus on either the physical interactions or the evolutionary history. The physical interaction programme heavily integrates our understanding of molecular driving forces into either thermodynamic or kinetic simulation of protein physics16 or statistical approximations thereof17. Although theoretically very appealing, this approach has proved highly challenging for even moderate-sized proteins, owing to the computational intractability of molecular simulation, the context dependence of protein stability and the difficulty of producing sufficiently accurate models of protein physics. The evolutionary programme has provided an alternative in recent years, in which the constraints on protein structure are derived from bioinformatics analysis of the evolutionary history of proteins, homology to solved structures18,19 and pairwise evolutionary correlations20,21,22,23,24. This bioinformatics approach has benefited greatly from the steady growth of experimental protein structures deposited in the Protein Data Bank (PDB)5, the explosion of genomic sequencing and the rapid development of deep learning techniques to interpret these correlations. Despite these advances, contemporary physical and evolutionary-history-based approaches produce predictions that fall far short of experimental accuracy in the majority of cases in which a close homologue has not been solved experimentally, and this has limited their utility for many biological applications.
The approach that we have taken in designing AlphaFold is a combination of the bioinformatics and physical approaches: we use a physical and geometric inductive bias to build components that learn from PDB data with minimal imposition of handcrafted features (for example, AlphaFold builds hydrogen bonds effectively without a hydrogen bond score function). This results in a network that learns far more efficiently from the limited data in the PDB but is also able to cope with the complexity and variety of structural data.
Derived GMT scenarios offer a way for the public and policymakers to understand the impacts for any given temperature threshold, as many physical changes and impacts have been shown to scale with global mean surface temperature, including shifts in average precipitation, extreme heat, runoff, drought risk, wildfire, temperature-related crop yield changes, and even risk of coral bleaching (e.g., NRC 2011;38 Collins et al. 2013;3 Frieler et al. 2013;39 Swain and Hayhoe 201540). They also allow scientists to highlight the effect of global mean temperature on projected regional change by de-emphasizing the uncertainty due to both climate sensitivity and future scenarios.40,41 This approach is less useful for those impacts that vary based on rate of change, such as species migrations, or where equilibrium changes are very different from transient effects, such as sea level rise.
Statistical models are generally flexible and less computationally demanding than RCMs. A number of databases using a variety of methods, including the LOcalized Constructed Analogs method (LOCA), provide statistically downscaled projections for a continuous period from 1960 to 2100 using a large ensemble of global models and a range of higher and lower future scenarios to capture uncertainty due to human activities. ESDMs are also effective at removing biases in historical simulated values, leading to a good match between the average (multidecadal) statistics of observed and statistically downscaled climate at the spatial scale and over the historical period of the observational data used to train the statistical model. Unless methods can simultaneously downscale multiple variables, however, statistical downscaling carries the risk of altering some of the physical interdependences between variables. ESDMs are also limited in that they require observational data as input; the longer and more complete the record, the greater the confidence that the ESDM is being trained on a representative sample of climatic conditions for that location. Application of ESDMs to remote locations with sparse temporal and/or spatial records is challenging, though in many cases reanalysis84 or even monthly satellite data85 can be used in lieu of in situ observations. Lack of data availability can also limit the use of ESDMs in applications that require more variables than temperature and precipitation. Finally, statistical models are based on the key assumption that the relationship between large-scale weather systems and local climate or the spatial pattern of surface climate will remain stationary over the time horizon of the projections. This assumption may not hold if climate change alters local feedback processes that affect these relationships.
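One common building block of statistical downscaling, the bias correction step described above, can be sketched with empirical quantile mapping on synthetic data. The `quantile_map` helper and all distributions below are illustrative assumptions, not the LOCA method itself:

```python
import numpy as np

def quantile_map(obs, model_hist, model_fut):
    """Empirical quantile mapping: for each future model value, find its
    quantile in the historical simulation, then read off the observed
    value at that same quantile. A simplified sketch of one ESDM step."""
    q = np.searchsorted(np.sort(model_hist), model_fut) / len(model_hist)
    q = np.clip(q, 0.0, 1.0)
    return np.quantile(obs, q)

rng = np.random.default_rng(0)
obs = rng.normal(15.0, 2.0, 5000)    # observed temperature (deg C), synthetic
hist = rng.normal(13.5, 2.5, 5000)   # cold-biased historical simulation
fut = rng.normal(16.5, 2.5, 5000)    # future simulation (3 deg C warmer)
corrected = quantile_map(obs, hist, fut)
print(corrected.mean() - obs.mean())  # warming signal survives, bias is removed
```

Note that the correction is trained entirely on the historical period, which is exactly why the stationarity assumption mentioned above matters: if the model-to-observation relationship changes in the future, the mapping learned from history no longer applies.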
The reconstruction and analysis of genome-scale metabolic models constitutes a powerful systems biology approach, with applications ranging from basic understanding of genotype-phenotype mapping to solving biomedical and environmental problems. However, the biological insight obtained from these models is limited by multiple heterogeneous sources of uncertainty, which are often difficult to quantify. Here we review the major sources of uncertainty and survey existing approaches developed for representing and addressing them. A unified formal characterization of these uncertainties through probabilistic approaches and ensemble modeling will facilitate convergence towards consistent reconstruction pipelines, improved data integration algorithms, and more accurate assessment of predictive capacity.
Despite the numerous reconstructions and wide range of applications, GEMs have important limitations. In this review, we focus on one major factor that currently limits the successful application of GEMs: the inherent uncertainty in GEM predictions that arises from degeneracy in both model structure (reconstruction) and simulation results (analysis). While GEM reconstructions typically yield only one specific metabolic network as the final outcome, this network is in fact one of many possible networks that could have been constructed through different choices of algorithms and availability of information (Fig. 1). The process of GEM reconstruction is divided into (1) genome annotation, (2) environment specification, (3) biomass formulation, and (4) network gap-filling. Different choices in these first four steps can lead to reconstructed networks with different structures (reactions and constraints). On top of these choices, the final phenotypic prediction and biological interpretation are significantly affected by (5) the choice of flux simulation method. This review moves through these five aspects of GEM reconstruction and analysis, outlining the key sources of uncertainty in each. In addition, we review various approaches that have been developed to deal with this uncertainty. We emphasize approaches that utilize probabilities or an ensemble of models to represent uncertainty. A table associated with each section outlines the different approaches that have been summarized and the sources of uncertainty that they address (Tables 1, 2, 3, 4 and 5).
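Step (5), the choice of flux simulation method, can be made concrete with a toy flux balance analysis (FBA): a linear program that maximizes a biomass flux subject to steady-state mass balance S·v = 0 and flux bounds. The three-reaction network below is an invented example for illustration, not drawn from any real reconstruction:

```python
import numpy as np
from scipy.optimize import linprog

# Toy network: R1 (uptake -> A), R2 (A -> B), R3 (B -> biomass).
# Rows of S are metabolites A and B; columns are reactions R1-R3.
S = np.array([
    [1, -1,  0],   # metabolite A: produced by R1, consumed by R2
    [0,  1, -1],   # metabolite B: produced by R2, consumed by R3
])
c = np.array([0.0, 0.0, -1.0])      # linprog minimizes, so negate biomass flux
bounds = [(0, 10), (0, 10), (0, 10)]  # illustrative flux bounds

# Maximize biomass subject to steady state S @ v = 0
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print(res.x)  # one optimal flux distribution
```

Even in this trivial case the optimum need not be unique in general networks; alternative optima are one face of the degeneracy in simulation results that this review highlights, and different simulation methods (e.g., parsimonious or flux-variability variants) resolve it differently.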
A general progression for genome-scale metabolic model reconstruction and analysis is represented by five major steps. The central black arrows illustrate a standard approach, which yields a single output from each step. The gray arrows represent the uncertainty in this process, with the output of each step shown as an ensemble of possible results. The new additions to the model at each step are shown in red: circles represent metabolites, stars represent biomass components, arrows represent metabolic reactions, and bold arrows represent a specific flux distribution.
Another approach is to incorporate uncertainty in functional annotation directly by assigning several likely annotations to each gene rather than picking only the single most likely one. In one likelihood-based approach, metabolic reactions are annotated probabilistically by taking into account the overall homology score and BLAST e-value, and by keeping track of suboptimal annotations. In this approach, metabolic reactions are assigned a probability of being present in a GEM based on both the strength and the uniqueness of the annotation. This approach has been developed into the ProbAnnoPy and ProbAnnoWeb pipelines, which provide probabilistic annotations in the ModelSEED framework. Beyond using only homology from BLAST to inform annotation probabilities, the CoReCo algorithm additionally includes homology scores based on global trace graphs, which have been proposed as an improved approach for identifying distant homologs. The CoReCo algorithm also utilizes phylogenetic information to improve the probabilistic annotation of GEMs for multiple organisms simultaneously. Additional context information has been incorporated into a probabilistic metabolic reaction annotation approach in the GLOBUS algorithm. Context-based information includes gene correlations from transcriptomics, co-localization of genes on the chromosome, and phylogenetic profiles, all of which are complementary to gene-sequence homology for inferring functional protein annotations. The probabilistic metabolic reaction annotations generated with these methods serve as a good starting point for subsequent reconstruction steps. For example, the likelihood-based approach mentioned here is used to implement a probabilistic gap-filling algorithm, discussed further in the gap-filling section.
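To make the likelihood idea concrete, the sketch below turns BLAST-style e-values into a reaction presence probability. The scoring and combination rules here are simplified assumptions for illustration only and are not the actual ProbAnnoPy or CoReCo formulas:

```python
import math

def annotation_likelihood(evalue, best_evalue):
    """Toy homology-based likelihood: smaller e-values score higher,
    normalized against the best hit in the search (illustrative only)."""
    score = -math.log10(max(evalue, 1e-300))
    best = -math.log10(max(best_evalue, 1e-300))
    return min(score / best, 1.0) if best > 0 else 0.0

def reaction_probability(gene_hits):
    """Probability a reaction is present, given several candidate genes:
    one minus the probability that every candidate annotation is wrong."""
    p_absent = 1.0
    for evalue, best_evalue in gene_hits:
        p_absent *= 1.0 - annotation_likelihood(evalue, best_evalue)
    return 1.0 - p_absent

# Two candidate genes for the same reaction: a strong hit and a weak one
hits = [(1e-40, 1e-50), (1e-5, 1e-50)]
print(round(reaction_probability(hits), 3))  # -> 0.82
```

Keeping the weak suboptimal hit raises the reaction probability modestly instead of being discarded, which is the essential difference from a single-best-hit annotation pipeline.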