
Make Sense of Science: Interpretable machine learning decodes soil microbiome’s response to drought stress

The Soil Microbiome and Drought Stress

Welcome to Make Sense of Science, where we present a publication in a short, concise form. Today it's Computomics' machine learning-driven feature importance analysis for the identification of marker taxa, published in BMC Environmental Microbiome:

Interpretable machine learning decodes soil microbiome’s response to drought stress
Michelle Hagen, Rupashree Dass, Cathy Westhues, Jochen Blom, Sebastian J. Schultheiss & Sascha Patz


Michelle Hagen
Bioinformatics Analyst

Sascha Patz
Product Mgr Microbiome Solutions

We are proud to share the essence of this publication, which shows that interpretable machine learning holds great potential for classifying drought stress in soil based on marker taxa. This is Michelle Hagen's first publication.


Climate change threatens food security globally, especially in regions with limited food resources. More frequent and severe droughts are expected to damage crops and make food scarcer. Drought also changes the composition of microbial communities in soil, which affects plant growth and ecosystem stability. Machine learning can analyze complex microbiome data, helping farmers manage soil better and choose crops that can survive droughts. Understanding how a machine learning model arrives at its drought predictions is tricky but important. SHAP (SHapley Additive exPlanations) values, an interpretable machine learning method, help with this by showing which features, in this case taxa, contribute most to the predictions. These methods are new to the study of soil microbes, but they offer hope for finding ways to help crops survive droughts and improve agriculture.

Which methods were used?

We studied how drought affects soil microbes using two datasets. The first dataset, 'Grass-Drought,' included 623 samples from soil, roots, and rhizosphere of various grass species under drought and regular watering conditions. The second dataset, 'Sorghum-Drought,' focused on the drought effects on Sorghum bicolor plants, and was used to test the model predictions on a new unseen dataset. Both datasets were analyzed using DNA sequencing to understand the differences in microbial communities.

A Random Forest Classifier (RFC) was used to analyze the data. To understand the RFC's predictions, we used SHAP values, which help identify which microbial features are most important in predicting drought stress.
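To make the idea behind SHAP values more concrete, here is a minimal sketch that computes exact Shapley values for a single sample. It is not the pipeline used in the publication: the scoring function `toy_model`, the taxon names, and all abundance values are hypothetical, and a hand-written formula stands in for a trained Random Forest. Features that are "absent" from a coalition are set to a single baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, sample, baseline):
    """Exact Shapley values for one sample relative to a single baseline."""
    names = list(sample)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the sample's value,
        # all other features stay at their baseline value.
        mixed = {k: (sample[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of `name` when added to coalition s.
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(x):
    # Hypothetical "drought score" (assumption, not a trained classifier).
    return (0.2 + 1.5 * x["taxon_A"] - 1.0 * x["taxon_B"]
            + 2.0 * x["taxon_A"] * x["taxon_C"])


# Hypothetical relative abundances: an average sample and one sample to explain.
baseline = {"taxon_A": 0.10, "taxon_B": 0.30, "taxon_C": 0.05}
sample = {"taxon_A": 0.30, "taxon_B": 0.10, "taxon_C": 0.20}

phi = shapley_values(toy_model, sample, baseline)
for taxon, contribution in phi.items():
    print(f"{taxon}: {contribution:+.4f}")
```

The contributions add up to the difference between the model output for the sample and for the baseline, which is the property that makes Shapley values useful for attributing a prediction to individual taxa. Exact enumeration is only feasible for a handful of features; for real microbiome data with many taxa, a dedicated library with tree-specific approximations would be the practical choice.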

To compare the effectiveness of the different methods, the SHAP results (the importance of each taxon for predicting drought or control) were compared with the results of commonly used Differential Abundance Analysis (DAA) tools.
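One simple way to picture such a comparison is to rank taxa by their mean absolute SHAP value and check how many of the top-ranked taxa were also flagged by a DAA tool. The sketch below illustrates this with a Jaccard overlap. The SHAP values, taxon names, the DAA result set, and the choice of overlap measure are all assumptions for illustration, not the procedure or data from the publication.

```python
def mean_abs_importance(shap_rows):
    """Global importance per taxon: mean absolute SHAP value over samples."""
    taxa = shap_rows[0].keys()
    return {t: sum(abs(row[t]) for row in shap_rows) / len(shap_rows) for t in taxa}


def top_k(importance, k):
    """Return the k taxa with the highest importance as a set."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical per-sample SHAP values (one dict per sample).
shap_rows = [
    {"taxon_A": 0.30, "taxon_B": -0.05, "taxon_C": 0.10, "taxon_D": 0.01},
    {"taxon_A": -0.20, "taxon_B": 0.02, "taxon_C": 0.15, "taxon_D": -0.02},
    {"taxon_A": 0.25, "taxon_B": -0.01, "taxon_C": -0.20, "taxon_D": 0.00},
]
# Hypothetical set of taxa reported as differentially abundant by a DAA tool.
daa_significant = {"taxon_A", "taxon_B"}

importance = mean_abs_importance(shap_rows)
shap_top = top_k(importance, k=2)
overlap = jaccard(shap_top, daa_significant)
print(sorted(shap_top), round(overlap, 3))
```

Taxa that appear in both sets are the candidates supported by both approaches, while taxa found by only one method may deserve a closer look.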

We evaluated the performance of the models using several measures:

  • Accuracy: The fraction of all predictions that are correct.
  • Precision: The fraction of positive predictions that are actually positive.
  • Recall: The fraction of actual positive instances that the model identifies.
  • F1 Score: The harmonic mean of precision and recall, which balances the two.
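As a small worked example, the four measures can be computed directly from the counts of true and false predictions. The labels below are made up for illustration, and treating "drought" as the positive class is an assumption.

```python
def classification_metrics(y_true, y_pred, positive="drought"):
    """Accuracy, precision, recall and F1 for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions for eight soil samples.
y_true = ["drought", "drought", "drought", "control",
          "control", "control", "control", "drought"]
y_pred = ["drought", "drought", "control", "control",
          "control", "drought", "control", "drought"]

metrics = classification_metrics(y_true, y_pred)
for name, score in metrics.items():
    print(f"{name}: {score:.2f}")
```

In this toy example one drought sample is missed and one control sample is wrongly flagged, so all four measures come out at 0.75. With unbalanced classes the measures would diverge, which is why reporting several of them is useful.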

What was discovered?

The comparison of SHAP values with DAA results showed that both interpretable machine learning and traditional statistical methods in metagenomics produced similar outcomes. Therefore, these approaches can be used together to identify key microbial markers for drought stress, enhancing the accuracy and reliability of detecting important taxa.

This study employed machine learning to predict whether soil samples were irrigated or drought-stressed by analyzing the types and relative abundances of soil microbes. The model relied on shifts in the relative abundance of taxa rather than on taxa appearing or disappearing entirely. It achieved high accuracy, and its predictions could be explained at the level of individual taxa. It also performed well across different soil samples, indicating its potential to identify signs of drought stress under various conditions. This capability is particularly valuable for detecting marker taxa for drought stress, which helps in selecting microbial strains for targeted bioinoculation approaches.

How does this information help?

With extreme weather events like droughts becoming more common, it's essential to find new ways to understand how soil microbes affect agriculture and ecosystems. This study developed a method to identify drought stress in soil based on its microbial content, which works well across different drought levels and various grass types. While this approach is focused on drought stress, it can also be adapted to analyze other types of stress with the right data and a suitable machine learning model. However, this model is location-specific, meaning that farmers should create machine-learning models tailored to their fields. For instance, using a model trained on data from California on soil samples from Siberia would be ineffective due to different microbiomes.

The study's machine learning model not only detects drought stress but can also inform farming practices such as irrigation. By identifying key microbes linked to drought stress, this research helps farmers and breeders select microbial strains that support plant growth. Understanding these beneficial microbes can help plants adapt better to drought, strengthen their immune responses, and protect them from diseases, ultimately leading to better crop yields and resilience.

Read the full publication

Read about Microbiome Solutions for Agriculture

or contact Michelle Hagen or Sascha Patz directly!
