Erik Amézquita

PFFIE Postdoctoral Fellow

Plant Science and Mathematics @ University of Missouri

If life gives you lemons, do directional statistics

Citrus work like Lego blocks. Roughly speaking, any two citrus can hybridize and potentially produce new citrus varieties. In fact, all citrus that you see in the produce section of the market are hybrids. A grapefruit is actually a cross of a pummelo with a sweet orange. And a sweet orange is a cross of a pummelo with a sweet mandarin. And a sweet mandarin is a cross of a pummelo with a pure mandarin. A pure mandarin crossed with a pummelo can also produce a sour orange. And a sour orange crossed with a citron yields a lemon. You get the picture. Citrus are as promiscuous as it gets.

Citrus Genealogy
Credits: Wu et al. (2018)

This large variety of hybridization possibilities corresponds to a variety of citrus fruit shapes. Can we quantify such shape diversity? If we can mathematically describe the shape of both a pummelo and a sweet orange, could we predict that their shape combination yields a grapefruit?

We are especially interested in being able to quantify and characterize the distribution of the oil glands on the citrus fruits. Citrus essential oils are important for the food and perfume industries. Oil glands also play a fundamental role in citrus fruit development. There are plenty of unknowns going forward.

The setup

In collaboration with the Givaudan Citrus Variety Collection at the University of California, Riverside, we got access to 158 individual fruit samples comprising 64 citrus varieties. These included all the fundamental citrus (citrons, pure mandarins, pummelos), close relatives (trifoliates, kumquats, microcitrus), and important hybrids (sweet oranges, lemons, etc.). These were X-ray CT scanned at Michigan State University. After a lot of image processing fiddling, we managed to segment out the central column, flesh, rind, skin, and oil glands for each citrus fruit.



We focus on the oil glands. We can represent each oil gland as a point in space whose x,y,z coordinates are the center of mass of the gland. That is, each citrus fruit can now be thought of as a point cloud in space (!). As a sanity check, we verified that our counts of individual oil glands agree with the established literature.
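To make the gland-to-point step concrete, here is a minimal sketch. It assumes the segmentation has already produced a 3D integer array in which every gland carries its own positive label and 0 is background; the function name `gland_centers` and the toy volume are made up for illustration, and the centroid of the voxels is used as the center of mass (i.e., uniform density is assumed).

```python
import numpy as np

def gland_centers(labels):
    """Centroid (in voxel units) of every positive label in a 3D integer array."""
    ids = np.unique(labels)
    ids = ids[ids > 0]  # assumption: label 0 is background
    # Average the voxel coordinates belonging to each gland
    return np.array([np.argwhere(labels == i).mean(axis=0) for i in ids])

# Toy volume with two cubic "glands" (hypothetical data)
volume = np.zeros((10, 10, 10), dtype=int)
volume[1:3, 1:3, 1:3] = 1
volume[6:9, 6:9, 6:9] = 2

centers = gland_centers(volume)  # one (x, y, z) row per gland
```

Each row of `centers` is one point of the fruit's point cloud.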


Modeling citrus fruits as ellipsoids

It seemed natural to model citrus as ellipsoids, that is, as affine transformations of a sphere. This was done by simply performing ordinary least squares regression to find the best algebraic parameters of the general ellipsoid formula. Next, the point cloud made of all oil gland centers was projected onto the best-fit ellipsoid. Finally, we reparameterized these centers in terms of geodetic coordinates (latitude and longitude). But latitude and longitude coordinates can be thought of as lying on a unit sphere as well. We thus have a size-independent common framework to compare all the oil glands for all the citrus fruit varieties. We visualized the oil glands in 2D via two Lambert cylindrical equal-area projections from the north and south poles.
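As a rough illustration of the last step, the standard (equatorial) Lambert cylindrical equal-area projection maps longitude to x and the sine of the latitude to y. The sketch below shows only this textbook form; how the two pole-based views were actually constructed is not spelled out here, and the function name and the `lon0` parameter are assumptions.

```python
import numpy as np

def lambert_cylindrical(lat, lon, lon0=0.0):
    """Equatorial Lambert cylindrical equal-area projection (angles in radians)."""
    # Wrap the longitude difference into [-pi, pi)
    x = (np.asarray(lon) - lon0 + np.pi) % (2 * np.pi) - np.pi
    # Taking the sine of the latitude is what preserves areas
    y = np.sin(lat)
    return x, y

x, y = lambert_cylindrical(np.array([0.0, np.pi / 6]), np.array([0.0, 1.0]))
```

Because areas are preserved, regions that look denser in the flat map really do contain more glands per unit of sphere surface.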


Directional statistics

Now that all our oil gland data can be represented as points on a common unit sphere, we turn to directional statistics. Directional statistics allows us to characterize distributions specifically on circles, spheres, and related surfaces. We can also test whether a collection of points on a sphere follows a known distribution. In our case, there was no statistical evidence supporting the hypothesis that the glands are uniformly distributed. Nor was there evidence in favor of rotational symmetry.
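For a flavor of what such a test looks like, here is a minimal sketch of the classical Rayleigh test of uniformity on the sphere. This is one textbook option and not necessarily the test used in this project. Under uniformity, the statistic 3n times the squared mean resultant length is asymptotically chi-squared with 3 degrees of freedom; the p-value below uses the closed-form survival function of that distribution. Keep in mind that this test has no power against antipodally symmetric alternatives. The function name is an assumption.

```python
import math
import numpy as np

def rayleigh_test_sphere(points):
    """Rayleigh test of uniformity for unit vectors given as an (n, 3) array."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    rbar = np.linalg.norm(points.mean(axis=0))  # mean resultant length
    stat = 3.0 * n * rbar**2                    # ~ chi-squared(3) under uniformity
    # Survival function of a chi-squared variable with 3 degrees of freedom
    pvalue = (math.erfc(math.sqrt(stat / 2))
              + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    return stat, pvalue

# Hypothetical data: a tight cluster around the north pole
rng = np.random.default_rng(0)
cluster = np.array([0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=(100, 3))
cluster /= np.linalg.norm(cluster, axis=1, keepdims=True)
stat, pvalue = rayleigh_test_sphere(cluster)
```

A tiny p-value, as for the clustered toy data, is evidence against uniformity.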


What is the distribution then?

We can compute an empirical distribution via kernel density estimation (KDE). As expected, there is a sphere-specific KDE that we can use. As in the linear case, our KDE depends on a bandwidth parameter that determines how smooth the empirical distribution is. We can play around with varying bandwidth parameters and observe which regions show the most dramatic distribution changes.
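A minimal sketch of a spherical KDE, assuming a von Mises-Fisher kernel, which is a common choice but not necessarily the one used here. The bandwidth `h` is tied to the kernel concentration through kappa = 1/h², a convention assumed for this sketch; smaller `h` gives a spikier estimate. The function name is also an assumption.

```python
import numpy as np

def vmf_kde(query, data, h):
    """Density estimate at unit vectors `query` (m, 3) from unit vectors `data` (n, 3)."""
    kappa = 1.0 / h**2  # assumed bandwidth-to-concentration convention
    # von Mises-Fisher normalizing constant on the 2-sphere, in a stable form:
    # kappa / (4 pi sinh(kappa)) * exp(kappa) = kappa / (2 pi (1 - exp(-2 kappa)))
    norm = kappa / (2 * np.pi * -np.expm1(-2 * kappa))
    dots = np.asarray(query) @ np.asarray(data).T  # cosines of the angles, shape (m, n)
    return norm * np.exp(kappa * (dots - 1.0)).mean(axis=1)

# Hypothetical data: one gland at the north pole, evaluated at both poles
data = np.array([[0.0, 0.0, 1.0]])
query = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
density = vmf_kde(query, data, h=0.5)
```

Re-running this over a grid of `h` values is one way to watch which regions of the sphere change the most as the bandwidth varies.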


Future directions

Now that we are convinced that our pipeline enables us to quantify and compare citrus fruit shape, the potential future directions are exciting. To name a few:

  • Locate, segment, and phenotype seed tissue.
  • Explore normal diffusion mechanics further, along with their possible relationship to oil gland distribution.
  • Define a measure of similarity of oil gland distributions and compute a pairwise distance matrix for all citrus fruits.
  • Compare such distances between distributions to phylogenetic distances.
  • Explore alternative ellipsoid-to-sphere algorithms to minimize distortion.

Stay tuned for updates!

Published article: Amézquita et al. (2022)!

Amezquita et al 2022

DOI: 10.1002/ppp3.10333

As slides: Presented at CMSE Brown Bag Seminar. October 2022.

As a static poster: Presented at OSUPSS. April 2022.

As a dynamic poster: Presented at OSUPSS. April 2022.

Amezquita et al 2021


Additional Resources

Ellipsoid approximations

Given a point cloud, that is, a set of 3D x,y,z coordinates, what are the parameters of the best-fit ellipsoid? I was surprised there was no straightforward, widely accepted answer. I also realized how rusty my pre-calculus math is by now. There are a number of papers out there.

I chose Li and Griffiths (2004) as it was both mathematically and computationally the most straightforward answer I could find. They simply perform an OLS to find the algebraic parameters that minimize the algebraic residuals for the general quadric surface. In principle, this approach could produce parameters that approximate the points with a paraboloid or hyperboloid instead, so you must know beforehand that your point cloud indeed looks like an ellipsoid. Supposedly they use Lagrange multipliers to guarantee that you always get ellipsoid parameters, but I personally did not verify it.
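To give an idea of what the algebraic fit involves, here is a minimal sketch of a plain, unconstrained least squares fit of the general quadric. It is not the constrained formulation of Li and Griffiths (2004), so nothing forces the result to be an ellipsoid. The assumptions: the points are first centered at their mean, and the constant term of the quadric is fixed to -1, which works when the mean lies inside the surface. The function name and return layout are made up for illustration.

```python
import numpy as np

def fit_quadric_ols(points):
    """Fit q^T A q + 2 b^T q = 1, where q = point - shift, by least squares."""
    points = np.asarray(points, dtype=float)
    shift = points.mean(axis=0)  # center the cloud for numerical stability
    x, y, z = (points - shift).T
    # One column per algebraic parameter of the quadric
    design = np.column_stack([x * x, y * y, z * z,
                              2 * y * z, 2 * x * z, 2 * x * y,
                              2 * x, 2 * y, 2 * z])
    v, *_ = np.linalg.lstsq(design, np.ones(len(points)), rcond=None)
    A = np.array([[v[0], v[5], v[4]],
                  [v[5], v[1], v[3]],
                  [v[4], v[3], v[2]]])
    b = v[6:9]
    return A, b, shift

# Hypothetical data: points lying exactly on an ellipsoid with semi-axes (3, 2, 1)
rng = np.random.default_rng(1)
u = rng.normal(size=(200, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
cloud = np.array([1.0, -2.0, 0.5]) + u * np.array([3.0, 2.0, 1.0])
A, b, shift = fit_quadric_ols(cloud)
```

If all the eigenvalues of `A` come out positive, the fitted quadric is indeed an ellipsoid; otherwise the fit has drifted toward a hyperboloid or another quadric.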

Ellipsoid Projection
Credits: Li and Griffiths (2004)

This approach will give you the algebraic parameters of the general quadric surface equation. To translate these into more intuitive geometric parameters (semi-axis lengths, origin, rotation, etc.), you can follow Section 2.4 of Panou et al. (2020).
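The translation boils down to standard linear algebra. The sketch below is my own shorthand rather than a transcription of Panou et al. (2020). It assumes the quadric is written as xᵀAx + 2bᵀx + d = 0 with a symmetric 3×3 matrix A: the center is c = -A⁻¹b, the equation becomes (x - c)ᵀA(x - c) = cᵀAc - d, and the eigendecomposition of the rescaled matrix gives the semi-axes and their directions. The function name is an assumption.

```python
import numpy as np

def quadric_to_ellipsoid(A, b, d):
    """Center, semi-axes and axis directions of x^T A x + 2 b^T x + d = 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    center = -np.linalg.solve(A, b)
    scale = center @ A @ center - d          # right-hand side after centering
    evals, evecs = np.linalg.eigh(A / scale)  # eigenvalues in ascending order
    if np.any(evals <= 0):
        raise ValueError("the quadric is not an ellipsoid")
    semiaxes = 1.0 / np.sqrt(evals)           # longest semi-axis first
    return center, semiaxes, evecs            # columns of evecs are the axis directions

# Hypothetical example: semi-axes (3, 2, 1), rotated about z, centered at c_true
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
c_true = np.array([1.0, -2.0, 0.5])
M = R @ np.diag(1.0 / np.array([3.0, 2.0, 1.0]) ** 2) @ R.T
k = -2.5  # the overall scale and sign of the algebraic parameters is arbitrary
center, semiaxes, axes = quadric_to_ellipsoid(k * M, -k * M @ c_true,
                                              k * (c_true @ M @ c_true - 1.0))
```

The positivity check on the eigenvalues is also a quick way to detect the paraboloid or hyperboloid fits mentioned above.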

  • I personally found Chojnacki et al. (2000) too convoluted. I admit stats and numerical analysis are not exactly my strength.
  • I remember I had trouble fully following and implementing Yu et al. (2009).
  • I tried my best to implement Reza and Sengupta (2017). I was unable to get sensible results.
  • I was also a bit confused by Sivapalan et al. (2011).
  • Panou et al. (2020) is a relatively easy read for someone with a basic math background. They propose a two-step approximation to obtain the ellipsoid. The first step is an algebraic parameter approximation like in Li and Griffiths (2004). The second is a geometric parameter approximation. However, I was unable to get meaningful results from this second step. Maybe it only works with geoids, that is, with very large ellipsoids the size of a planet, and not with ellipsoids the size of a lemon.

Ellipsoid coordinates

Ellipsoid Projection
Credits: Diaz-Toca et al. (2020)

Once we have the ellipsoid parameters, we have to project our original point cloud onto such ellipsoid. This projection can be either:

  • Geocentric: By drawing a ray from the ellipsoid center to the point and noting where the ray intersects the ellipsoid surface.
  • Geodetic: By projecting the point perpendicularly to the ellipsoid surface, i.e., by minimizing the distance from the point to the surface.

The former projection can be computed immediately. The latter requires a much more elaborate computation. Diaz-Toca et al. (2020) is a very well-written breakdown of the computations needed, and they even provide a link to C code that works out of the box.
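The geocentric case really is a one-liner. The sketch below assumes the points have already been translated and rotated into the ellipsoid's own frame (center at the origin, semi-axes along x, y, z); the function name is made up. Each point is rescaled along its ray from the center until it satisfies the ellipsoid equation.

```python
import numpy as np

def geocentric_projection(points, semiaxes):
    """Project (n, 3) points onto an axis-aligned, origin-centered ellipsoid along rays from the center."""
    points = np.asarray(points, dtype=float)
    semiaxes = np.asarray(semiaxes, dtype=float)
    # A point t*p lies on the ellipsoid when t * ||p / semiaxes|| = 1
    t = 1.0 / np.linalg.norm(points / semiaxes, axis=1, keepdims=True)
    return points * t

projected = geocentric_projection([[6.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [3.0, 2.0, 1.0])
```

The geodetic projection has no such closed form and needs an iterative or root-finding scheme, which is why it is so much more involved.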

Regardless of the projection used, we can then reparameterize the original point cloud in (latitude, longitude, height) coordinates. We can then translate the (latitude, longitude) coordinates to a unit sphere. This unit sphere will be the common ground that allows us to compare all citrus at once.
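Moving from (latitude, longitude) to the unit sphere is the usual spherical-to-Cartesian conversion. A minimal sketch, assuming angles in radians with latitude measured from the equator (the function name is an assumption):

```python
import numpy as np

def latlon_to_unit_sphere(lat, lon):
    """Unit vectors for latitude and longitude arrays given in radians."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])

xyz = latlon_to_unit_sphere([0.0, np.pi / 2], [0.0, 1.0])
```

The height coordinate is simply dropped at this point, which is what makes the comparison independent of fruit size.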

Directional statistics

Directional statistics is a relatively new branch of statistics that focuses on data whose domain is not a Euclidean space, as in regular statistics, but a circle, sphere, torus, or cylinder.

The 1999 seminal textbook Directional Statistics by Mardia and Jupp is pretty good. It is seminal for a reason. It basically compiles everything that was known on the subject until that point. The textbook is quite comprehensive, the index is very fleshed out, and most of the chapters are self-contained. You can jump straight into the relevant content for the application you have in mind. It also comes with plenty of citations so you can dive deeper into the relevant literature.

More than 20 years later, Mardia and Jupp are still relevant. Their textbook is still one of the best ways to get familiar with the foundational ideas. Naturally the discipline has grown, and there have been plenty of advances since 1999. Ley and Verdebout's Modern Directional Statistics (2017) aims to be an update. They still provide some basic definitions and concepts, albeit very succinctly, and refer the reader back to Mardia and Jupp plenty of times. Ley and Verdebout is still a pretty good reference for figuring out where to start a Google Scholar search on how to do a particular task in directional statistics. Applications are fleshed out in their companion volume, Applied Directional Statistics.

A brief review, update, and historical overview of the discipline is provided by Pewsey and García-Portugués (2021). Their last section covers some of the available software to do actual computations. Fortunately, most of the software comes as R packages, works out of the box, and is easy to use.

Citrus are intrinsically linked to human history

I love the Gastropod podcast. The food that we consume is more than just food. It is also a reflection of human culture and history. Food shaped our society. Citrus are no exception. As Cynthia Graber and Nicola Twilley say, «not only were these [citrus] fruits so precious that they inspired both museums and the Mafia, they are also under attack by an incurable immune disease that is decimating citrus harvests around the world.» Listen to their whole citrus episode!

Also, I highly recommend following the @WeirdExplorer YouTube channel for a foodie's insight into the wide, global variety of fruit. He has a number of citrus-specific episodes. In particular, one of his first episodes roughly explains how most citrus are hybrids.

And this one is pretty good as well. It also contains an important message that fruit naming matters. There is a reason why the Makrut lime should be always called Makrut lime and not something else.

Citrus are quite fascinating. Their history is intrinsically linked to ours. To name a few fun facts:

  • Citrus paved the way to the first modern medical trials in Western medicine. Scurvy was the biggest scourge of the seas. It is estimated that more sailors died of scurvy than of all other diseases and sea battles combined. One day, the British tested the anecdotes of sailors being scurvy-resistant due to regular consumption of citrus. Several ships were sent on long voyages. One was provided with citrus juice. Others were provided with no fruit, only fresh water or various elixirs. This was the setup of the first modern medical trial.
  • Very specific citrus are key to various South and Southeast Asian cuisines. There are anecdotes of Thai immigrants trespassing on the Citrus Variety Collection of UC Riverside to obtain key ingredients back in the 60s. Otherwise, people would smuggle citrus from Asia to California to complete important dishes. Smuggling naturally comes with a high risk of importing diseases, which can be particularly catastrophic for commercial citrus fields, as most commercially grown citrus fruits are essentially clones. It is important to develop citrus varieties that can grow in other climates, both to supply people with fruit and to eliminate the need for dangerous smuggling.
  • Qu Yuan, probably the most important classical Chinese poet, depicted an orange tree as a symbol of steadfastness and resilience in the beautiful poem Ju song. Although we do not know for certain whether Qu Yuan is the actual author of the poem, it is beautiful nonetheless. For context, Qu Yuan was a renowned advisor to a king circa 300 BC. Conflicts with other advisors and court members led to rumors and backstabbing, which forced Qu Yuan into exile from the kingdom. According to the legend, Qu Yuan was absolutely distraught, profoundly hurt that his king believed others' words and not his. Qu Yuan was roaming when he noticed an orange tree. He was captivated, since orange trees were not supposed to grow, let alone flourish, in that region, where the climate was colder. If an orange tree was able to grow in a harsh climate, he as a poet should be able to withstand terrible setbacks.
  • Citrons, especially etrogs, are key in some Jewish festivities. The relationship between etrogs and the Jewish community goes back millennia. In fact, modern evidence suggests that as the Jewish community moved westward towards modern Europe, they brought citrus cultivation and citrus breeding with them. It was thanks to their migration that Romans tasted citrus for the first time, and the fruits made quite an impact, especially across Italy and Spain.