Erik Amézquita

Quantifying the shape of barley grains

Barley is more than just beer. It is the fourth most cultivated grain around the world, behind maize, rice, and wheat. Barley is probably the most adaptable crop out there. There are barley land races that are originary from the warm, low-altituted, humid climate of the Moroccan Meditarrenean coast; others adapted to the dry, cold, high-altitude terrain of the Tibetan plateau. And then there are barley land races for every other climate in-between, across the whole Eurasian continent, from the Scottish Orkney Islands, to Urals in Russia; from the Norweigian Artic circle to the Rajasthan praires in India.

Even looking at the seeds, we can already see that some land races have small, rounded seeds; while others have long, elongated shapes. It is natural then to ask How does shape correspond to this climate versatility? And in turn How do we comprehensively capture shape nuances across barley grains? For shape quantification, as you might guess, we drew tools from Topological Data Analysis. Especifically, we used the Euler Characteristic Transform.

The Euler Characteristic Transform (ECT)

The Euler Characteristic (EC) of an object is a topological invariant. That is, it is a quantity that remains unchanged if we stretch, shrink, twist, or smoothly deform the object. However, it might change if we cut, paste, or pierce the object. The EC is attractive to our purposes, as it is both easy to compute and intrisically connected to the topological features of an object. From the Euler-Poincaré formula, we know that:

EC = #(connected components) - #(loops) + #(voids)
EC = #(vertices) - #(edges) + #(faces).

Following this formula, we see why all Platonic Solids share the same EC of 2. Which in turn is the same EC of a sphere.

Euler Characteristic
Credits: Wikipedia

The EC on its own is too simple, though. We cannot tell apart a pyramid from a cube based solely on their ECs. We thus turn to the Euler Characteristic Transform (ECT) of objects. The ECT of barley seeds in our case. First, we fix a direction, say top to bottom. Then, we slice our seed across that direction and we compute how the EC changes as we "add slices". This gives us a topological signature, a time series on how the topology changes as we traverse the seed vertically.

Euler Characteristic Curve

How do we choose the right direction? We do not. We simply take all possible directions. For every direction, we slice and dice our seeds across such direction. That gives us different topological signatures, that we concatenate in a single, very long vector. This whole procedure is refered to as the ECT of the seed. For illustrative purposes, the animation below shows the same seed being sliced across three different directions and we record its ECT below.

Euler Characteristic Curve

The ECT is mathematically quite interesting. In 2014, Turner, Mukherjee, and Boyer mathematically proved that the ECT is injective and that it effectively summarizes all the morphological information of an object. In other words, two different shapes will produce two different ECTs; and if we are given two different ECTs, these must come from two different shapes.

An important caveat is that for the mathematical theorem to hold, we need to slice our seeds across all possible directions. We cannot perform infinite computations, though. As a proof of concept, we simply took a large number of directions and ran with it. About 150 directions seemed to be enough for our purposes. We observed that taking 200, 300, and 400 directions did not improve our results.

The barley data

In collaboration with the University of California—Riverside, we got access to barley panicle samples corresponding to 28 different land races from 28 very different climates. These panicles were later X-ray CT scanned at Michigan State University. But we want the grains. After plenty of trial and error with image processing, a script was developed to automatically extract digitally individual grains from all the panicles.

Once we have individual 3D grains, we measure barley grain shape with two different sets of shape descriptors:

Traditional: length, width, volume, surface area, etc. for each grain. We used 11 traditional descriptors in total.
Topological: the ECT computed across 158 directions across 16 slices per direction This produced a very large vector of more than 2500 dimensions per seed (!). To avoid shenanigans from the curse of dimensionality, this large vector was later aggresively reduced to just 12 dimensions using UMAP.

Barley Classification Results

How descriptive is our shape quantification? As a sanity check, we first test whether shape information alone is enough to tell apart seeds from different accessions. Now that we describe numerically the shape of every seed, we can employ many machine learning classification algorithms. We chose a support vector machine (SVM) due to its simplicity and the fact that we do not require large amounts of training data (unlike neural networks). Following a supervised learning pipeline, we chose 75% of the seeds from each accession as training data. With this training, the machine tries to deduce a pattern that characterizes the numerical descriptors corresponding to each accession. Next, we verify how good of job the machine did for pattern recognition. We test such pattern recognition asking the machine to predict the accession of the 25% of grains remaining based solely on numerical shape description. Finally, we compute the machine accuracy. Below is an illustration of supervised learning with 3 accessions. In our experiment, we worked with 28 accessions.

Supervised Learning

We trained 3 different SVMs. One used only traditional shape descriptors. The other used solely topological shape descriptors. And the last one used both sets of descriptors. The classification accuracy results were quite encouraging.

Shape descriptors	No. of descriptors	F1
Traditional	11	0.55 ± 0.019
Topological	12	0.74 ± 0.016
Combined	23	0.86 ± 0.010

Just using traditional descriptors, the machine does a decent job already. It correctly predicts the accession 55% of the time. For comparison, if the machine were to guess randomly out of 28 possible accessions, it would give the correct answer less than 4% of the time. If we use topological shape descriptors, the machine accuracy improves, which suggests that the ECT is capturing shape nuances that are not readily seen by the naked eye. Finally, combining both sources of morphological information yields the best results.

A closer examination of the results shows that traditional shape descriptors highlight shape similarities across barley accessions. Topological shape descriptors highlight similarites across individual panicles. Thus, their combination yields exciting results at both population and individual levels (!) Compare Figures 5 and 6 from our paper.

But what does topology really measure?

One key advantage of the ECT over other TDA methods is that it is relatively easy to observe how physical, shape features are topologically encoded. In the ECT case, we can use a Kruskal-Wallis One Way Analysis of Variance to determine which directions and slices were topologically different across seeds corresponding to different barley varieties. Turns out that the most significant shape differences across accessions correspond to the top and bottom sides of the seeds!

Seed ANOVA

Future directions

Now that we are convinced that our shape descriptors do a good job of telling apart different barley accessions, the potential future directions are exciting. To name a few:

Identify specific molecular markers corresponding to morphological differences across the diverse barley population.
Develop a high-throughput pipeline to produce 3D images of individual barley seeds and quantify comprehensively their morphology.
Formalize a method for promising seedling selection to further crop breeeding.
Extend such pipeline and population genetics studies to other crops and grains.

Stay tuned for updates!

¡Published article: Amézquita et al. (2021)!

DOI: 10.1093/insilicoplants/diab033

—

A general-audience brief piece was written about our paper in Botany One.

Lee un resumen de nuestro paper para público general en Botany One.

—

The Euler Characteristic Transform is our main workhorse to measure and compare grain shape. Watch my following video below for more details.

This is a 4min spiel on the barley project I recorded a while ago.

—

As slides: Presented at JMM 2022. April 2022.

As a static poster: Presented at IPPS 2022. September 2022.

As a dynamic poster: Updated from AATRN. October 2021.

——————————

Go to additional resources

Other research projects

Additional Resources

Regarding the Euler Characteristic Transform (ECT)

For background material, you can first watch this video on the Euler Characteristic. You can skip the "Deformation Retrack" section.

My PhD advisor, Liz Munch, does a pretty good job at explaining the ECT and this barley project. Her seminar was prepared to include a non-topology audience.

The ECT was originally proposed as the cheap cousin of the Persistent Homology Transform (PHT) in the same Turner et al. (2014) paper. In fact, Turner et al first prove that the PHT is injective. Since the proof relies on keeping track of EC changes, the proof of injectivity extends immediately to the ECT. Personally, I found reading Turner et al quite challenging at first. Here are some slides I made with plenty of figures to ease the reading.

ECT in the math literature

As I mentioned above, the ECT is a very interesting mathematical technique. There are plenty of mathematical advances that have been developed recently.

Curry, Mukherjee, and Turner (2022) extend the original 2014 paper. They show that the injectivity of the PHT and ECT extends to all dimensions. Moreover, they prove the existence of finite bounds when it comes to choose directions. Good news: we do not need infinite directions to mathematically guarantee the injectivity of the ECT (yay). Bad news: the bound is still absurdly large, and right now only works for theoretical purposes.
If we are handed a topological signature, a string of numbers that we are told come from an ECT, can we reconstruct the original shape this ECT signature comes from? The dissertation of Leo Betthauser provides some insight into how we can reconstruct 2D images based solely on their ECT signatures.
Things get way trickier in 3D. Fasy et al (2022) provide some exciting initial insights for small test cases. A computationally-feasible algorithm for real-life data remamins elusive.
Crawford et al (2020) propose an alternative: the Smooth Euler Characteristic Transform (SECT). This is obtained by first computing the ECT as explained before, and then this piecewise linear signal is integrated. This seemingly simple extra step guarantees that the SECT is now a continuous function that lives in the L2 space, the space of all real-valued continuous functions. This in turn opens the door to plenty of tools from the functional analysis realm.
Maybe not all slices and directions should count the same. If we have additional information of the data we are trying to characterize morphologically, then we can tune the ECT to give more weight to certain directions. This Weighted ECT is explored by Jiang, Kurtek and Needham (2020).
Wang et al. (2021) combine the ECT with SINATRA, a statistical pipeline that takes in two classes of shapes and highlights the physical features that best describe the variation between them.

Barley is more than just beer

Barley is the fourth most cultivated crop, behind rice, maize, and wheat.
There are barley land races that have adapted to virtually every climated across the whole Eurasian continent.
In fact, this wonderful adaptability paved the way to the first permanent human settlements in the Tibetan plateau. Barley provided a reliable food source to the first inhabitants of the region.
In general, this historical and natural adaptability of barley provides important insights into paleobotany. For instance, charred and fossilized barley grains give important clues on how the prehistoric human populations moved across Central Asia and Southern France.
The versatility of barley makes it a crucial crop to understand better climate resilience. The concern is even more pressing given the rapidly climate change.

Assistant Professor @ University of Missouri

Division of Plant Science and Department of Mathematics