Day 22: In-Class Assignment

University of Missouri

✅  Put your name here

✅  Put your group member names here

Bootstrap and confidence intervals: urban birds and seed rains

Picture of several pigeons sitting on a steel barrier in a city. Various high-rises can be seen in the background.

Credits: PBS

Learning goals of today’s assignment

  • Recap how to use bootstrap to compute confidence intervals whenever a formula is not immediately available

  • Use SciPy to do bootstrap confidence intervals under the hood

Assignment instructions

Work with your group to complete this assignment. Instructions for submitting this assignment are at the end of the Notebook. The assignment is due at the end of class.


Importing the modules that we will need

Before we start anything, it is good practice to put all our imports in the first Python cell.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
from sklearn import metrics

1. Predicting bird diversity in urban settings

(This is the same text as in Homework 04)

There is a wealth of satellite data out there. With it, we can compare large swaths of land and environments at once, which can improve our biodiversity predictions. The normalized difference vegetation index (NDVI)—a widely used index captured by satellite cameras to determine “plant green-ness”—often shows positive relationships with bird diversity. However, its reliability as a predictor across contrasting urban contexts remains uncertain.

You will examine the relationships between the NDVI and bird species richness across three cities—Columbia MO (USA), Xalapa (Mexico), and Medellín (Colombia)—using Landsat-8 satellite imagery with buffer distances of 200m. Think of “camera focus settings” if you’re unfamiliar with buffer distances.

All the data is taken from:

MacGregor-Fors, I, Garizábal-Carmona, JA, García-Arroyo, M, Fishel, E, Nilon, CH, Lemoine-Rodríguez, R (2026) Does NDVI explain patterns of urban bird diversity? Insights from temperate to tropical cities. Urban Forestry & Urban Greening 116: 129211.

Relationships between bird species richness and NDVI values highlighting the extreme scenarios (weakest and strongest) for visualization purposes. We selected these extreme scenarios based on the coefficient of determination at the 50 m buffer using Sentinel-2. We also included Medellín, with intermediate values, to provide the full comparative picture.

Credits: MacGregor-Fors et al (2026)

✅  Question 1

  • What is reflected on the x-axis?

  • What is reflected on the y-axis?

  • In your own words, what information do you get out of this figure?

Put your answer here

1.1 Data loading and wrangling (2pt)

✅  Task 2

  • Load the 'Bird_Richness_NDVI_Woody_Pages7-20.xlsx' file (attached in Canvas). Notice that it is an Excel file, not a CSV.

  • Display the first few rows to make sure it looks good. It should be 405 rows and 15 columns long.

# Load with pandas

Things to keep in mind: Each row represents a different geographical Point:

  • City: The city where the data comes from.

  • BR: Bird-richness index. Higher values mean a larger diversity of bird species.

  • L8: NDVI data from the Landsat-8 satellite.

  • For both satellites (Landsat-8 and Sentinel-2), we have 4 buffer distances (camera resolutions): 50m, 100m, 200m, and 400m.

✅  Task 2 (continued)

  • Mask and make a subDataFrame looking only at the data from Columbia.

  • We will work with this sub-DataFrame from here on.
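As a minimal sketch of masking, the example below uses a toy DataFrame whose values are invented (it is not the Canvas data). Only the column names City and BR are taken from the description above.

```python
import pandas as pd

# Toy data: invented values, only to illustrate boolean masking
toy = pd.DataFrame({
    "City": ["Columbia", "Xalapa", "Columbia", "Medellín"],
    "BR": [12, 20, 9, 17],
})

# A boolean mask is a Series of True/False values, one per row
mask = toy["City"] == "Columbia"

# Indexing with the mask keeps only the rows where it is True
toy_columbia = toy[mask].reset_index(drop=True)
print(toy_columbia)
```

Resetting the index is optional here; it simply renumbers the surviving rows from 0.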

# Your subdataframe

1.2 Correlations

Let’s check if bird richness in Columbia is correlated to NDVI according to Landsat-8 data with 200 meters buffer distance.

✅  Task 3

  • Make a scatter plot of NDVI versus bird richness.

    • It will be easier down the road if you use Axes objects instead of pyplot (which is what we used for Days 20 and onward).

  • Remember to add x- and y-axes labels.

# Complete the scatterplot
fig, ax = plt.subplots(1,1, figsize=(3,4), sharex=True, sharey=True) # Use Axes `ax` instead of pyplot
ax = np.atleast_1d(ax) # Make sure ax is an array (by default, it is NOT when making a 1 x 1 subplot)

# Color the background ever so slightly
ax[0].set_facecolor('snow')

# Uncomment and complete
#ax[0].scatter()

✅  Question 4

  • From the scatter plot alone, do you think NDVI is a good predictor of bird richness in Columbia?

Put your answer here.

✅  Task 5

Now compute and print both the Pearson and Spearman correlation coefficients along with their respective p-values.
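For reference, here is a small sketch of the two SciPy calls on synthetic data. The arrays x and y are made-up stand-ins, not the NDVI and bird-richness columns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for two measured variables (not the real data)
x = rng.uniform(0.1, 0.8, size=60)
y = 20 * x + rng.normal(0, 3, size=60)

pearson = stats.pearsonr(x, y)
spearman = stats.spearmanr(x, y)

# Both results unpack as (coefficient, p-value)
r, p_r = pearson
rho, p_rho = spearman
print(f"Pearson r    = {r:.3f} (p = {p_r:.3g})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
```

Pearson measures linear association, while Spearman works on ranks and so responds to any monotonic trend.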

# Your code

✅  Question 6

  • Are Pearson and Spearman coefficients quite different or quite similar?

  • What conclusion do you draw by looking at both of them? Does it match your intuition from Q4?

Put your answer here.

1.3 Confidence intervals for Pearson’s correlation: the PearsonRResult object

Confidence intervals are not limited to means: we can also compute confidence intervals for Pearson’s r. The interpretation is the same as for the CI of a mean:

A 95% confidence interval for Pearson’s r is an interval built by a procedure that captures the true Pearson’s r 95% of the time.

Said another way: if we were to repeat our experiment 100 times (that is, grab 100 samples) and compute a 95% CI from each one, we would expect about 95 of those intervals to contain the true r.

The formula to compute a confidence interval for Pearson’s r is quite complicated, though. This is because, unlike a normally distributed variable, Pearson’s r can never go outside the [-1, 1] range.

The good news is that SciPy already does all the math for us. stats.pearsonr actually returns a PearsonRResult object. Check the Returns section of the stats.pearsonr documentation.

# x_variable and y_variable are placeholders for your two data columns
pearsonr = stats.pearsonr(x_variable, y_variable)
print(pearsonr)

# PearsonRResult(statistic=<r value>, pvalue=<p value>)
# As you have done before, we can then get the r coefficient by doing
# pearsonr.statistic
#
# And the p-value by doing
# pearsonr.pvalue

# But we can also get a confidence interval by doing
alpha = 0.95  # confidence level
ci = pearsonr.confidence_interval(confidence_level=alpha)
print(ci.low, ci.high)
# If confidence_level is omitted, SciPy computes the 95% CI by default
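To see where the bounded range comes in, the sketch below computes the interval by hand with the Fisher transformation and compares it to SciPy's result. It assumes, as SciPy's documentation describes, that confidence_interval uses this transformation. The data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic data (not the real NDVI/BR data)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)

res = stats.pearsonr(x, y)
r, n = res.statistic, len(x)

# Fisher transformation: z = arctanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3)
z = np.arctanh(r)
se = 1 / np.sqrt(n - 3)
crit = stats.norm.ppf(0.975)  # two-sided 95% critical value

# Transforming back with tanh keeps the interval inside [-1, 1]
manual_low = np.tanh(z - crit * se)
manual_high = np.tanh(z + crit * se)

ci = res.confidence_interval(confidence_level=0.95)
print("manual:", manual_low, manual_high)
print("SciPy: ", ci.low, ci.high)
```

Note that the resulting interval is generally not symmetric around r, which is one reason a simple "estimate plus or minus margin" formula does not apply.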

✅  Task 7

  • Compute and print the Pearson’s correlation 95% confidence interval for the NDVI versus bird richness data above.

  • Also print the 99% CI.

# Your code

✅  Question 8

  • Do the confidence interval values change or reinforce your hypothesis from Q4?

  • If you were a consultant for Columbia’s bird conservancy group, would you advise them to invest in NDVI data collection and analysis?

Put your answer here.

1.4 Confidence intervals for Spearman’s correlation: bootstrapping

But what about Spearman’s ρ coefficient? We have two pieces of bad news.

But we have one good news: we can bootstrap. Recall the steps:

  1. Generate a new sample from our original measurements by sampling with replacement

    • In this case, we consider a (ndvi_value, br_value) pair as a single point.

  2. Compute and save the Spearman’s coefficient

  3. Repeat steps 1-2 a lot of times (like N = 1000)

  4. Get the (1 - α)/2 and (1 + α)/2 quantiles of all the Spearman’s coefficients
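The four steps above can be sketched as follows. This is only an illustration on synthetic paired data (x and y are invented, not the NDVI/BR columns), and the variable names are my own choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic paired measurements (stand-ins, not the real data)
n = 50
x = rng.uniform(0, 1, size=n)
y = x + rng.normal(0, 0.3, size=n)

N = 1000      # number of bootstrap resamples
alpha = 0.95  # confidence level, as in the steps above
boot_rho = np.zeros(N)

for i in range(N):
    # Step 1: resample row indices with replacement so pairs stay together
    idx = rng.integers(0, n, size=n)
    # Step 2: Spearman's coefficient of the resampled pairs
    boot_rho[i] = stats.spearmanr(x[idx], y[idx])[0]

# Step 4: percentile interval from the bootstrap distribution
low, high = np.quantile(boot_rho, [(1 - alpha) / 2, (1 + alpha) / 2])
print(f"{alpha:.0%} bootstrap CI for rho: [{low:.3f}, {high:.3f}]")
```

Sampling indices rather than values is what keeps each x matched with its own y in every resample.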

✅  Task 9

The tricky part is to bootstrap a pair of measurements instead of a single one, as we did back in Day 19. A strategy is to sample (with replacement) a set of indices instead. Then we use those indices to get both NDVI and BR values.

  • Use the rng.integers function to generate a random array of integers between 0 and <length of NDVI data>. It must have size of <length of NDVI data>.

  • Save this array of random numbers as a random_indices variable.

  • Use the array to get the corresponding NDVI and BR values

    • For example, you might want to do ndvi[random_indices] if ndvi is an array

    • Or ndvi.iloc[random_indices] if it is a Series

  • Compute the Spearman’s correlation coefficient with your new sampled values

# Your code

rng = np.random.default_rng(42)

✅  Task 10

Now put things on a loop:

  • Make an array with np.zeros where you’ll save Spearman coefficients.

  • In a loop:

    • Make a new random set of indices like in T9

    • With the indices, get a sample of NDVI values and their corresponding BR values

    • Compute and save the spearman correlation

    • Repeat N = 1000 times

# Your code

N = 1000

✅  Task 11

  • Now use np.quantile to get and print the quantiles corresponding to the 95% confidence interval (α = 0.95).

  • Print the 99% CI as well.
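As a possible cross-check, SciPy also provides stats.bootstrap, which can do the paired resampling and the quantile step internally. The sketch below runs it on synthetic data; the helper spearman_rho is a name introduced here for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic paired data (not the real NDVI/BR data)
x = rng.uniform(0, 1, size=50)
y = x + rng.normal(0, 0.3, size=50)

def spearman_rho(a, b):
    """Spearman's coefficient for one (re)sample of pairs."""
    return stats.spearmanr(a, b)[0]

res = stats.bootstrap(
    (x, y),               # two samples, resampled jointly
    spearman_rho,
    paired=True,          # keep each (x, y) pair together
    vectorized=False,     # the statistic handles one resample at a time
    n_resamples=1000,
    confidence_level=0.95,
    method="percentile",  # quantile-based interval, like a manual loop
    random_state=rng,
)
ci = res.confidence_interval
print(ci.low, ci.high)
```

The default method in stats.bootstrap is "BCa", which applies a bias correction, so leaving method unset would give a slightly different interval from the plain quantile approach.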

# Your code

✅  Question 12

  • Do the confidence interval values make sense?

  • How do they compare to the CIs from Pearson? Is this like in Q6?

Put your answer here.


2. Seed rainfall in Northern Missouri prairies [Time-permitting]

(This is the same text from Homework 04)

Let’s switch gears. For this part, the research quantifies patterns of seed rain—the process of seeds falling from plants to the soil—in a remnant tallgrass prairie and various stages of restored prairies. It reveals that young prairies produce significantly more seed rain than previously understood. The study found distinct changes in seed rain metrics as prairies aged, with the oldest restorations showing comparable quantities to the remnant prairie, but lacking in species composition. The findings suggest that increasing seeding rates of desirable species may be necessary to enhance restoration outcomes.

More details and data on:

Wynne, KC, Parker-Smith, MJ, Murdock, EM, & Sullivan, LL (2024) Quantifying seed rain patterns in a remnant and a chronosequence of restored tallgrass prairies in north central Missouri. Journal of Applied Ecology, 61: 3017–3027

The goal is to reproduce Figure 1b:

(a) The number of seeds, (b) seed rain biomass (mg) and (c) CWM seed mass (mg) captured per transect in the remnant prairie in 2019 (dark green), young (2-year-old, dark blue), middle-aged (5-6-year-old, light blue) and old (15-year-old, light green) reconstructed prairies. Error bars represent 95% confidence intervals around model estimates (black). Asterisks indicate sites with significant differences from the remnant prairie (p < 0.05).

Credits: Wynne et al (2024)

✅  Question 13: Just looking at Figure 1b

  • What is reflected on the x-axis?

  • What is reflected on the y-axis?

  • In your own words, what information do you get out of this figure?

Put your answer here

2.1 Data loading

✅  Task 14

  • Load the 'Seed_Rain_Traits_Dataset.csv' file (attached in Canvas). Notice that this time it is a CSV file, not an Excel file.

  • Display the first few rows to make sure it looks good. It should be 4275 rows and 8 columns long.

# Your code

Important columns and information:

  • Plot: Prairie type

    • PFCA 1 = Young-aged restored prairie

    • PFCA 2 = Middle-aged restored prairie

    • PFCA 3 = Old-aged restored prairie

    • TP = remnant prairie in 2019

  • Transect: Corresponding transect number at a site (10 per site).

  • Number: The number of seeds collected from traps.

  • TotWeight: Total seed rain biomass (mg)

✅  Task 15

  • With your original DataFrame, group the rows by prairie age (Plot) and transect

  • Compute the total biomass of seeds for each prairie age–transect combination. Remember pandas has a .sum function.

  • Use the as_index=False parameter to obtain a DataFrame instead of a Series.

  • The first five rows should read something like:

   Plot    Transect  TotWeight
0  PFCA 1  1         2973.886640
1  PFCA 1  2         4920.950468
2  PFCA 1  3         6435.186945
3  PFCA 1  4         4836.356308
4  PFCA 1  5         6052.586332
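A minimal sketch of a two-key groupby on a toy DataFrame follows. The numbers are invented; only the column names Plot, Transect, and TotWeight come from the description above.

```python
import pandas as pd

# Toy rows: several seed records per (Plot, Transect) pair; values invented
toy = pd.DataFrame({
    "Plot": ["PFCA 1", "PFCA 1", "PFCA 1", "TP", "TP"],
    "Transect": [1, 1, 2, 1, 1],
    "TotWeight": [1.5, 2.5, 4.0, 3.0, 1.0],
})

# Group by two keys, select the column to aggregate, then sum.
# as_index=False keeps Plot and Transect as regular columns.
totals = toy.groupby(["Plot", "Transect"], as_index=False)["TotWeight"].sum()
print(totals)
```

Each row of the result corresponds to one Plot and Transect combination, with all its records added together.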
# Your code

✅  Task 16

Perform three t-tests to check if the total seed biomass from restored prairies (PFCA 1, PFCA 2, PFCA 3 for young, middle-aged, and old, respectively) differs from that of the remnant prairie (TP).

  • Mask your DataFrame from Task 15 appropriately

  • Print your t-test results
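One way to run such a comparison is stats.ttest_ind, sketched here on synthetic per-transect totals. Using equal_var=False (Welch's test) is my assumption, not something the assignment specifies; the default assumes equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic per-transect totals, 10 transects per site (invented values)
restored = rng.normal(5000, 800, size=10)
remnant = rng.normal(3000, 800, size=10)

# Independent two-sample t-test; equal_var=False gives Welch's version,
# which does not assume both groups share the same variance
result = stats.ttest_ind(restored, remnant, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A positive t statistic here means the first group has the larger sample mean.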

# Your tests

✅  Question 17

  • Based on your results above, what do you think is the relationship, if any, between prairie restoration age and seed biomass?

Put your answer here

✅  Task 18

But the p-values alone might not tell the full picture. Now make a jitterplot to confirm or reject your thoughts. Before that, make a summary of your summary.

Make a new DataFrame that is a summary of the one in T15: you want the seed biomass mean, SE, and sample size for each prairie age

  • Group rows by Plot

  • Get the seed biomass mean, SE, and sample size (count) for each age

  • Display your new summary
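As a sketch of a one-line summary, pandas can compute several statistics at once with agg. The toy values are invented, and reset_index is used here as one way to get Plot back as a regular column.

```python
import pandas as pd

# Toy per-transect totals (invented values)
toy = pd.DataFrame({
    "Plot": ["PFCA 1"] * 3 + ["TP"] * 3,
    "TotWeight": [10.0, 12.0, 14.0, 4.0, 5.0, 6.0],
})

# "sem" is the standard error of the mean; "count" is the sample size
summary = (
    toy.groupby("Plot")["TotWeight"]
       .agg(["mean", "sem", "count"])
       .reset_index()
)
print(summary)
```

The resulting columns are named after the statistics (mean, sem, count), which makes them easy to reference by name later.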

# Your code here

✅  Task 19

  • Using the values in your T18 DataFrame, append two new columns with the lower and upper bounds of the 95% confidence interval of the mean.

  • Remember to set as_index = False
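A possible sketch for the interval columns, assuming a t-based interval with n - 1 degrees of freedom (the course may use a different critical value). The summary table is hypothetical; the column names low_ci and high_ci follow the commented plotting lines in Task 20.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical summary table (invented numbers)
summary = pd.DataFrame({
    "Plot": ["PFCA 1", "TP"],
    "mean": [12.0, 5.0],
    "sem": [1.2, 0.6],
    "count": [10, 10],
})

alpha = 0.95  # confidence level

# Critical t value with n - 1 degrees of freedom, one per row
t_crit = stats.t.ppf((1 + alpha) / 2, df=summary["count"].to_numpy() - 1)

# Interval: mean plus or minus (critical value times standard error)
summary["low_ci"] = summary["mean"] - t_crit * summary["sem"]
summary["high_ci"] = summary["mean"] + t_crit * summary["sem"]
print(summary)
```

With only about 10 transects per site, the t critical value is noticeably larger than the familiar 1.96 from the normal distribution.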

# Your code

alpha = 0.95

✅  Task 20

  • Like Figure 1(b), make a jitterplot where different columns correspond to different restoration ages

  • Uncomment and edit the lines to plot the mean biomass along with its 95% CI, like in Wynne et al (2024)

  • Remember that the indices of your summary-summary DataFrame (from T19) are conveniently 0, 1, 2, 3

  • Label the x- and y-axes
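The idea behind a jitterplot can be sketched with synthetic groups: each group sits at an integer x position, and small random offsets spread the points so they do not overlap. The data and the groups dictionary below are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Synthetic groups (invented values), one array of transect totals per site
groups = {"PFCA 1": rng.normal(5, 1, 10), "TP": rng.normal(3, 1, 10)}

fig, ax = plt.subplots(figsize=(4, 4))
for i, (label, values) in enumerate(groups.items()):
    # Jitter: small random horizontal offsets around the integer position i
    nudge = rng.uniform(-0.15, 0.15, size=len(values))
    ax.scatter(i + nudge, values, alpha=0.6, zorder=1)
    # Group mean drawn on top in black
    ax.scatter([i], [values.mean()], c="k", s=75, zorder=2)

# Replace the integer positions with the group names
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(list(groups.keys()))
ax.set_xlabel("Prairie type")
ax.set_ylabel("Seed rain biomass (mg)")
```

The zorder values make sure the black mean markers are drawn above the jittered points.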

# Your plot

fs = 12
rng = np.random.default_rng(42)
nudge = rng.uniform(-0.15, .15, 10000)

# for i in range(len(<summary dataframe>)):
    #ax.scatter([i], [summary.loc[i, 'mean']], c='k', s=75, zorder=2)
    #ax.plot([i,i], [summary.loc[i, 'low_ci'], summary.loc[i, 'high_ci']], c='k', marker='_', ms=10)

✅  Question 21

  • Does the visualization match your thoughts from Q17?

  • If you were a consultant for a conservancy NGO, what would you tell them about waiting times to see some true restoration taking place?

Put your answer here


Congratulations, you’re done!

Submit this assignment by uploading it to the course Canvas web page. Go to the “In-class assignments” folder, find the appropriate submission link, and upload it there.

See you next class!

© Copyright 2026, Division of Plant Science & Technology—University of Missouri

References
  1. MacGregor-Fors, I., Garizábal-Carmona, J. A., García-Arroyo, M., Fishel, E., Nilon, C. H., & Lemoine-Rodríguez, R. (2026). Does NDVI explain patterns of urban bird diversity? Insights from temperate to tropical cities. Urban Forestry & Urban Greening, 116, 129211. 10.1016/j.ufug.2025.129211
  2. Wynne, K. C., Parker‐Smith, M. J., Murdock, E. M., & Sullivan, L. L. (2024). Quantifying seed rain patterns in a remnant and a chronosequence of restored tallgrass prairies in north central Missouri. Journal of Applied Ecology, 61(12), 3017–3027. 10.1111/1365-2664.14806