03 October, 2022

Genomic Prediction of Cassava Mosaic Disease

Article overview

The diets of 800 million people depend on cassava, yet Cassava Mosaic Disease (CMD) regularly destroys over 30 million tonnes of the world's crop. Read how incorporating newly identified SNP networks into breeding prediction models can lead to more disease-resistant cassava plants.

Executive Summary

Cassava (Manihot esculenta) is one of the most nutritionally important starchy root crops in the world, providing sustenance for around 800 million people globally. Its desirable characteristics, such as high starch content, drought resistance and high yield per area, make it ideally suited to challenging environments.

In the production of cassava, cassava mosaic disease (CMD) is a major threat, causing losses between 12% and 82%, equating to about 30 million tons annually. Therefore, new insights into the drivers responsible for the disease development are an especially important aspect in cassava breeding.

Cassava provides sustenance for around 800m people

Synomics’ novel technology platform, DISCOVER™, models the complexity of biology through explicit statistical testing of multi-dimensional feature interactions using proprietary, computationally efficient algorithms and GPU implementations. Here we identified SNPs and SNP-networks involved in both alcohol and cadmium-ion related pathways, which are known to be involved in CMD resistance. Not only did these discoveries map to potentially causative biology, but when the SNP networks were incorporated into breeding prediction models, they materially outperformed the current best approach, suggesting that selecting cassava clones based on predictions using Synomics’ technology leads to higher CMD-resistance.
“Results suggest that selection of cassava clones based on predictions using Synomics’ technology leads to higher CMD-resistance.”



Cassava (Manihot esculenta) is the most important starchy root crop in the tropics. Its drought resistance, wide harvest windows and high yield per area make it an ideal crop in the developing world. As such, it is the primary food staple for millions of people in Africa, Asia, and Latin America (Parmar et al., 2017), making cassava an economically important crop.

Cassava mosaic disease (CMD) is a viral infection causing losses ranging from 12% to 82% depending on variety and infection type, leading to annual economic losses of between $1.9 and $2.7 billion USD in East and Central Africa alone (Patil and Fauquet, 2009). Due to cassava’s global importance and susceptibility to biological risks, breeding for CMD resistance has been of key importance.

Breeding resistant plants is also a more sustainable and deployable solution than agrochemical approaches.

$1.9-$2.7 billion USD: annual economic losses in East and Central Africa alone caused by cassava mosaic disease (CMD)

Cassava breeding

While breeding is an excellent strategy for combating CMD, traditional cassava breeding comes with challenges such as asynchronous flowering, a small number of seeds per cross, and a long cropping cycle (Rabbi et al., 2022). These, coupled with low multiplication of planting material for multi-environment screening and high variation in the performance of cuttings used for propagation, have slowed cassava’s breeding progress.

“Through genomic breeding, we can drastically reduce the length of breeding cycles.”

Reasons for slowed breeding

  • Asynchronous flowering*
  • Small number of seeds per cross*
  • Long cropping cycle*
  • Low multiplication of planting material for multi-environment screening
  • High variation in performance of cuttings used for propagation

*Rabbi et al., 2022

The most important tool for circumventing such limitations is the application of genomic breeding. Through genomic breeding, we can drastically reduce the length of breeding cycles, as decisions on which plants to advance can be made before the collection of phenotypic data. Genomic Selection strategies to improve key traits will be very valuable and will be fundamental to food security.

Study Design


Biological systems are complex, and nonadditive effects constitute a large part of the underlying processes. Although this is widely acknowledged, models that incorporate these effects remain underutilized, because searching such a large space of possible interactions is computationally demanding and makes statistical inference difficult.

The Synomics DISCOVER™ platform models traits and diseases not only through individual SNPs, but also through complex multiway genomic interactions that relate to a disease or a trait. Synomics’ novel methodology can lead to discoveries of functionally co-dependent genetic interactions.

Our discoveries can be used to understand and predict phenotypic outcomes using the same genetic information other industry standard tools would use, but with a much more complex computational model.

To demonstrate the general applicability of our discoveries, we apply a standard machine learning technique of leaving a portion of the data out of our discovery process and using it as a blinded test-set.

“Synomics’ novel methodology can lead to discoveries of functionally co-dependent genetic interactions.”


The dataset used in this study was taken from Rabbi et al. (2022). As our trait, we used the mean of the CMD scores observed at 1, 3 and 6 months after planting. The original data were discrete scores ranging from 1 to 5, recorded over 4 years with 3 measurements per year (at 1, 3 and 6 months after planting). Table 1 describes the score values.

Furthermore, the mean CMD trait was adjusted for the effects of year, location and study design before it was entered into the Synomics DISCOVER™ platform.
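To make the trait definition concrete, the following is a minimal sketch of averaging the three scores per observation and then removing year and location differences. The records, clone names and site names are hypothetical, and the cell-mean centring shown here is an assumed simplification; it is not the adjustment actually used in the study, which is not described in detail in this report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (clone, year, location, CMD scores at 1, 3 and 6 months).
records = [
    ("clone_A", 2013, "site_1", (1, 2, 2)),
    ("clone_B", 2013, "site_1", (3, 4, 5)),
    ("clone_A", 2014, "site_2", (2, 2, 3)),
    ("clone_C", 2014, "site_2", (1, 1, 2)),
]


def adjusted_mean_scores(records):
    """Average the three scores per record, then centre each value on the
    mean of its year-by-location cell (an assumed, simplified adjustment)."""
    raw = [(clone, year, loc, mean(scores)) for clone, year, loc, scores in records]
    grand_mean = mean(value for *_, value in raw)

    # Collect the raw means belonging to each year-by-location cell.
    cells = defaultdict(list)
    for _, year, loc, value in raw:
        cells[(year, loc)].append(value)
    cell_means = {cell: mean(values) for cell, values in cells.items()}

    # Remove the cell effect and add the grand mean back to keep the 1-5 scale.
    return [
        (clone, year, value - cell_means[(year, loc)] + grand_mean)
        for clone, year, loc, value in raw
    ]


for clone, year, score in adjusted_mean_scores(records):
    print(clone, year, round(score, 3))
```

A full analysis would more likely fit year, location and design terms jointly in a linear or mixed model; the centring above only illustrates the idea of removing non-genetic effects before discovery.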

As the prediction of the next generation is a major goal in breeding, the clones of the year 2015 served as a test-set while all years before the last one served for model training. Prediction accuracy was evaluated using the Pearson correlation coefficient between phenotypes of 2015 and their predictions.
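The evaluation scheme described above can be sketched as follows. The records and the predicted values are hypothetical placeholders, and the helper names `split_by_year` and `pearson` are our own for illustration; only the split by year and the use of the Pearson correlation come from the study design.

```python
from math import sqrt


def split_by_year(records, test_year):
    """Train on all years before the test year; hold out the test year."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical adjusted CMD phenotypes per clone and year.
records = [
    {"clone": "c1", "year": 2012, "cmd": 1.2},
    {"clone": "c2", "year": 2013, "cmd": 2.8},
    {"clone": "c3", "year": 2014, "cmd": 1.9},
    {"clone": "c4", "year": 2015, "cmd": 3.1},
    {"clone": "c5", "year": 2015, "cmd": 1.4},
    {"clone": "c6", "year": 2015, "cmd": 2.2},
]

train, test = split_by_year(records, test_year=2015)

# Placeholder predictions; in practice these come from a model fitted on `train`.
hypothetical_predictions = {"c4": 2.7, "c5": 1.6, "c6": 2.0}

observed = [r["cmd"] for r in test]
predicted = [hypothetical_predictions[r["clone"]] for r in test]
accuracy = pearson(observed, predicted)
print(f"prediction accuracy: {accuracy:.2f}")
```

Splitting by year rather than at random mimics the breeding use case, where the model must predict clones it has never seen phenotyped.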

Table 1: Description of CMD scoring criteria

Score  Description
1      Clean, no infection
2      Up to 25% leaf area chlorotic, mild leaf distortion, no stunting
3      25-50% leaf area chlorotic, moderate leaf distortion, no stunting
4      50-75% leaf area chlorotic, severe leaf distortion, moderate stunting
5      75-100% leaf area chlorotic, severe leaf distortion, small leaflets (almost no lamina), severe stunting

Breeding Value Estimation

The networks identified by DISCOVER™ were taken further into the framework of breeding value estimation, predicting the best individuals to select for further breeding.

In this framework, the industry standard GBLUP approach served as a reference benchmark. The incorporation of networks identified by DISCOVER™ into BLUP models is achieved by a range of unique, proprietary approaches developed by Synomics. These approaches are further defined in Table 2.

Table 2: Description of Synomics’ novel BLUP models.

Model 1: Models main effects for the SNPs of the top networks
Model 2: Models effects for the top networks
Model 3: Models main effects for all SNPs together with network effects
Model 4: Models effects for all networks
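For readers unfamiliar with the GBLUP benchmark, the sketch below shows one common textbook formulation: a VanRaden-style genomic relationship matrix followed by a ridge-type solve. It illustrates the reference method only, not the Synomics models. The genotypes and phenotypes are hypothetical, the variance ratio `lam` is assumed known, and the overall mean is estimated by the training average for simplicity.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden-style)."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def gblup_predict(genotypes, phenotypes, train_idx, test_idx, lam=1.0):
    """Predict test phenotypes as mu + G[test, train] (G[train, train] + lam I)^-1 (y - mu).

    lam is the assumed ratio of residual to genetic variance.
    """
    g = vanraden_g(genotypes)
    mu = sum(phenotypes[i] for i in train_idx) / len(train_idx)
    lhs = [[g[i][k] + (lam if i == k else 0.0) for k in train_idx] for i in train_idx]
    rhs = [phenotypes[i] - mu for i in train_idx]
    alpha = solve(lhs, rhs)
    return [mu + sum(g[t][i] * a for i, a in zip(train_idx, alpha)) for t in test_idx]


# Hypothetical data: 5 clones x 4 SNPs; the last clone has no phenotype yet.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 0],
    [1, 2, 1, 1],
    [0, 1, 1, 2],
]
phenotypes = [1.5, 2.0, 3.5, 2.5, None]

print(gblup_predict(genotypes, phenotypes, train_idx=[0, 1, 2, 3], test_idx=[4]))
```

In practice the variance components would be estimated from the data (for example by REML) and a numerical library would replace the hand-written solver; the plain-Python version is used here only to keep the example self-contained.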


Prediction accuracies

Prediction accuracies were calculated on predictions made for plants excluded from model training. Table 3 shows that all Synomics models predicted CMD more accurately than the GBLUP model. The highest accuracy, 69%, was observed when all networks were modelled (Model 4), while Models 2 and 3 reached accuracies of 67% and 68% with far fewer SNPs and networks than Model 4.

CMD prediction accuracies for the GBLUP model and the Synomics BLUP models

Table 3: CMD prediction accuracies for the GBLUP model and the different Synomics BLUP models that are based on DISCOVER™. The last three columns represent (a) the number of individual SNPs, (b) the number of SNP-networks, and (c) the number of SNPs within those networks.

Columns: Prediction accuracy; # SNPs modelled individually; # SNP-networks modelled; # SNPs in those networks
Rows: Model 1; Model 2; Model 3; Model 4

Gene Enrichment

Beyond predictive breeding applications, we also extract biological knowledge from the SNPs and SNP-networks found by DISCOVER™. Where genome annotation is poor, as in cassava, the analysis automatically draws on orthologous genes from well-annotated species. Using our automated pipelines, we identified multiple biological processes that were enriched among the genes associated with the most predictive SNPs from our network analysis.

We found that alcohol pathway GO terms were enriched in the CMD-associated gene set. Cellular response to alcohol has been shown to be upregulated in CMD-tolerant cassava landraces (Allie et al. 2014). Interestingly, we also found enrichment in cadmium-ion related pathways, and cadmium has been shown to provide viral resistance in several studies (Ghoshroy et al., 1998; DalCorso et al., 2010).
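As an illustration of what "enriched" means here, the sketch below computes a one-sided hypergeometric p-value, a test commonly used for GO term over-representation. Whether the pipeline described above uses this particular test is an assumption on our part, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """Probability of seeing at least n_overlap annotated genes when
    n_selected genes are drawn at random from n_background genes, of
    which n_annotated carry the GO term (hypergeometric upper tail)."""
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1,000 background genes, 40 carrying the GO term,
# 50 genes linked to predictive SNPs, 8 of which carry the term.
p = enrichment_p_value(n_background=1000, n_annotated=40, n_selected=50, n_overlap=8)
print(f"enrichment p-value: {p:.2e}")
```

When many GO terms are tested, the resulting p-values would normally be corrected for multiple testing before any term is reported as enriched.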

“We identified enrichment of genes in alcohol pathways and in cadmium-ion related pathways.”

Conclusion and Discussions

Here we have shown that Synomics’ unique proprietary technology, DISCOVER™, is able to identify SNPs and SNP-networks that both:

  1. Provide insight into the genes involved in the development of CMD; and
  2. Predict the phenotype with a high accuracy.

Providing insight into genes involved in CMD

Synomics Models 1 and 2 show that a small number of SNPs (101) predict the phenotypes more accurately than all SNPs (40,397) in a traditional GBLUP model. This indicates that a small number of SNPs strongly affect CMD.

The presence of quantitative trait loci that regulate CMD susceptibility of cassava was reported previously (Lokko et al., 2005; Okogbenin et al., 2012). Similarly, transgenic cassava plants have already been developed to be more resistant to CMD (Zhang et al., 2005). Therefore, the small number of SNPs found to affect CMD here provides further potential targets to improve CMD-resistance in cassava.

Further, we were able to identify enriched pathways within those 101 SNPs that have high biological relevance to disease resistance.

“The small number of SNPs found to affect CMD provides further potential targets to improve CMD-resistance in cassava.”

Predicting the phenotype with high accuracy

The prediction accuracies in Table 3 show that all of Synomics’ novel BLUP models outperform the standard genetic evaluation GBLUP model.

Synomics’ unique proprietary approach includes identifying networks (combinations of SNP genotypes across the genome) that affect a disease or trait. Our network-based BLUP models (Models 2 to 4) achieved higher prediction accuracies than the purely SNP-based models (GBLUP and Model 1). As network-based models allow us to model higher-order epistatic effects, these findings indicate that CMD is controlled not only by a small number of SNPs with simple additive effects, but also by interactions between those SNPs.
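The way Synomics incorporates networks into BLUP models is proprietary and is not reproduced here. Purely as a generic illustration of how a combination of SNP genotypes could enter a model as a single covariate, the sketch below builds a 0/1 indicator for carriers of a given combination. The function name, the genotype matrix and the example network are all hypothetical.

```python
def network_feature(genotypes, network):
    """Return 1 for each individual whose genotypes match every
    (SNP index -> genotype) pair in the network, and 0 otherwise."""
    return [
        int(all(row[snp] == genotype for snp, genotype in network.items()))
        for row in genotypes
    ]


# Hypothetical genotypes: 4 clones x 3 SNPs, coded as 0/1/2 allele counts.
genotypes = [
    [2, 1, 0],
    [2, 0, 1],
    [0, 1, 0],
    [2, 2, 0],
]

# Hypothetical network: genotype 2 at SNP 0 combined with genotype 0 at SNP 2.
network = {0: 2, 2: 0}

print(network_feature(genotypes, network))  # prints [1, 0, 0, 1]
```

Such an indicator captures an effect that appears only when the genotypes co-occur, which a sum of per-SNP additive effects cannot represent.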

“Results indicate that CMD is not only controlled by a small number of SNPs that have simple additive effects, but also by interactions between those SNPs.”

As the prediction accuracy using the SNPs from DISCOVER™ was higher than that of GBLUP, it can be concluded that a selection based on Synomics’ technology can be expected to lead to a higher genetic gain than a selection based on GBLUP.
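The link between prediction accuracy and genetic gain follows from the standard breeder's equation, in which expected gain per year is proportional to selection accuracy. The sketch below applies it with hypothetical values for selection intensity, genetic standard deviation, cycle length and the baseline accuracy; only the 0.69 figure echoes the Model 4 result reported above, and treating it directly as selection accuracy is a simplification.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation: expected genetic gain per year."""
    return intensity * accuracy * sigma_a / cycle_years


# Assumed values for illustration only.
intensity = 1.4        # roughly the top 20% of candidates selected
sigma_a = 1.0          # additive genetic standard deviation, in trait units
cycle_years = 4.0      # hypothetical length of one breeding cycle

gain_network = genetic_gain_per_year(intensity, 0.69, sigma_a, cycle_years)
gain_baseline = genetic_gain_per_year(intensity, 0.60, sigma_a, cycle_years)  # hypothetical baseline accuracy

print(f"relative gain: {gain_network / gain_baseline:.2f}x")
```

With all other terms held equal, the ratio of gains equals the ratio of accuracies, and shortening the cycle length increases gain per year in the same proportion.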

Finally, in a multi-environment trial study, interactions between SNPs and environments can be used to breed varieties adapted to specific environments, which is essential for a successful crop breeding program. Such SNP-environment interactions were found, for example, for cassava bacterial blight (Sedano et al., 2017), so similar interactions can be expected for CMD.

“A selection based on Synomics’ technology can be expected to lead to higher genetic gain when compared to a selection based on GBLUP.”


P. Zhang, et al. (2005). Resistance to cassava mosaic disease (…). Plant Biotechnology Journal. Vol. 3, pp. 385–397. doi: 10.1111/j.1467-7652.2005.00132.x

E. Okogbenin, et al. (2012). Molecular Marker Analysis and Validation of Resistance to Cassava Mosaic Disease in Elite Cassava Genotypes in Nigeria. Crop Breeding & Genetics, Vol. 52 (6), pp. 2576-2586. https://doi.org/10.2135/cropsci2011.11.0586

Y. Lokko, et al. (2005). Molecular markers associated with a new source of resistance to the cassava mosaic disease. African Journal of Biotechnology, Vol. 4 (9), pp. 873-881

F. Allie, et al. (2014). Transcriptional analysis of South African cassava mosaic virus-infected (…). BMC Genomics, Vol. 15. https://doi.org/10.1186/1471-2164-15-1006

S. Ghoshroy, et al. (1998). Inhibition of plant viral systemic infection by non-toxic concentrations of cadmium. The Plant Journal, Vol.13 (5), pp. 591–602. https://doi.org/10.1046/j.1365-313X.1998.00061.x

G. DalCorso, et al. (2010). Regulatory networks of cadmium stress in plants. Plant Signaling & Behavior, 5:6, 663-667, DOI: 10.4161/psb.5.6.11425

J.C.S. Sedano, et al. (2017). Major Novel QTL for Resistance to Cassava Bacterial Blight Identified through a Multi-Environmental Analysis. Frontiers in Plant Science, Vol. 8. https://doi.org/10.3389/fpls.2017.01169

I.Y. Rabbi, et al. (2022). Genome-wide association analysis reveals new insights into the genetic architecture of defensive(…). Plant Molecular Biology, Vol.109, pp. 195–213. https://doi.org/10.1007/s11103-020-01038-3

B. Owor et al. (2004). The effect of cassava mosaic geminiviruses on symptom severity, growth and root yield of a cassava (…). Annals of Applied Biology, Vol.145, pp.331 – 337. DOI:10.1111/j.1744-7348.2004.tb00390.x

A. Parmar, et al. (2017). Crops that feed the world: Production and improvement of cassava for food, feed, and industrial uses. Food Security, Vol.9, pp.907–927

B.L. Patil and C.M. Fauquet (2009). Cassava mosaic geminiviruses: actual knowledge and perspectives. Molecular Plant Pathology, Vol. 10 (5), pp. 685-701.
