Day 17: In-Class Assignment

University of Missouri

✅  Put your name here

✅  Put your group member names here

Can we control for rice weevil infestations with wasp vibes?

Close-up picture of rice grains covered with rice weevils.

Credits: Rice Array

Learning goals of today’s assignment

  • Compute the standard error of the mean via bootstrap and direct formula

  • Practice more pandas wrangling and column datatypes

Assignment instructions

Work with your group to complete this assignment. Instructions for submitting this assignment are at the end of the notebook. The assignment is due at the end of class.


Background

The rice weevil (Sitophilus oryzae L.) and the lesser grain borer (Rhyzopertha dominica F.) are cosmopolitan pests that commonly infest stored cereals, causing substantial damage. Biological control is a complementary management strategy that may expand the integrated pest management (IPM) toolset in stored cereals. Theocolax elegans reduced R. dominica and S. oryzae populations in grain bins by >90% compared to controls. The mere presence of natural enemies has been shown to influence patterns of insect habitat use, feeding, oviposition and dispersal, ultimately impacting the prey damage and fecundity.

A large hindrance to the implementation of biological control in food facilities in North America is the cultural preoccupation with 'clean' facilities and the perception that biocontrol agents contribute to tolerance limits in the food supply. However, it could be beneficial if food facilities could exploit the 'ecology of fear' by deploying natural enemy cues to reduce stored product activity and damage to commodities, without contributing to the perception of the facility being 'unclean' by fostering or releasing beneficial insects.

Table 1: Summary of percentage Sitophilus oryzae consumed by and nonconsumptive effects elicited by natural enemies collected adjacent to a food facility in 2022.

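| Family | n | Mean % consumed ± SE | % Self-aggregation |
|---|---|---|---|
| Miridae | 3 | 0 ± 0 | 33 |
| Thomisidae | 5 | 0 ± 0 | 60 |
| Coreidae | 10 | 2.0 ± 2.0 | 10 |
| Chrysopidae | 6 | 3.3 ± 3.3 | 67 |
| Unknown Araneae | 3 | 6.7 ± 3.3 | 0 |
| Salticidae | 7 | 8.6 ± 5.5 | 14 |
| Acrididae | 22 | 17 ± 5.7 | 36 |
| Reduviidae | 3 | 20 ± 5.8 | 33 |
| Gryllidae | 24 | 30 ± 7.8 | 25 |
| Carabidae | 3 | 37 ± 32 | 33 |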

Hetherington, M.C., Sakka, M.K., Abshire, J., Maille, J.M., Stoll, I., Athanassiou, C.G., Scully, E.D., Gerken, A.R. and Morrison, W.R., III (2025) Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field-collected predators on the foraging of lesser grain borer and rice weevil. Pest Manag Sci, 81: 7529–7541.

✅  Question 1

  • Based on the table above, which family of predators seems to be the best and most reliable at eating weevils?

Put your answer here


1. Setting everything up

✅  Task 2

Import the usual libraries: NumPy, matplotlib, pandas, and stats (from SciPy).

# Import the usual libraries
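One common way to write these imports is sketched below. The aliases np, pd, and plt are conventions rather than requirements, and we assume here that stats refers to the scipy.stats module, since stats.bootstrap is used later on.

```python
# Conventional aliases; any other names would work as long as you use them consistently
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats  # assumed source of `stats.bootstrap`
```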

✅  Task 3

  • Load the data set Natural Predator Data_combined.csv

  • Display the first few rows of the DataFrame to make sure it looks ok.
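As a rough sketch of what loading and inspecting looks like, the example below reads a tiny made-up CSV from memory instead of the real file. Only the column names are taken from this assignment; the rows and the name toy are invented for illustration, and we assume the real file sits in your working directory.

```python
import io
import pandas as pd

# Made-up rows standing in for the real file (values are NOT from the study)
toy_csv = io.StringIO(
    "Year,Family,Avg Weevils Consumed,NCE\n"
    "2021,Carabidae,20,0\n"
    "2022,Carabidae,10,0\n"
    "2022,Acrididae,0,1\n"
)

# With the real data you would pass the filename instead:
# pd.read_csv('Natural Predator Data_combined.csv')
toy = pd.read_csv(toy_csv)

print(toy.head())   # first rows (up to five) of the DataFrame
print(toy.dtypes)   # datatype pandas inferred for each column
```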

# Load with pandas

2. Means and standard errors

Columns that we may need:

| Column Name | Description |
|---|---|
| Year | year when the sample was collected |
| Family | family of the natural predator |
| Avg Weevils Consumed | % of weevils eaten by the predator |
| NCE | 1 = Evidence of weevils’ self-aggregation (nonconsumptive effect observed). 0 = otherwise. |

✅  Task 4

  • With masking, get a new DataFrame corresponding to Carabidae predators (ground beetles) in the 2022 experiments.

  • Print the number of such samples (n).

  • Print the mean % consumed.

  • Print the SE of the mean using the .sem() function from pandas.

Do all three printed values match those in Table 1?

Hint: Define the variables year and family and use them for masking.
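The masking idea in the hint can be sketched on a small made-up DataFrame. The values below are invented (they are not the study data), and the names toy, mask, and subset are placeholders.

```python
import pandas as pd

# Made-up values, only to show the mechanics of masking
toy = pd.DataFrame({
    "Year": [2021, 2022, 2022, 2022, 2022],
    "Family": ["Carabidae", "Carabidae", "Carabidae", "Acrididae", "Carabidae"],
    "Avg Weevils Consumed": [50.0, 0.0, 10.0, 20.0, 20.0],
})

year = 2022
family = "Carabidae"

# Combine two boolean conditions with & (each condition needs its own parentheses)
mask = (toy["Year"] == year) & (toy["Family"] == family)
subset = toy[mask]

n = len(subset)                                # number of rows that passed the mask
mean = subset["Avg Weevils Consumed"].mean()   # sample mean
se = subset["Avg Weevils Consumed"].sem()      # standard error of the mean
print(n, mean, se)
```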

# Your code

2.2 Standard Errors for ground beetles

Now we will compute the SE of the mean % consumed manually, using bootstrapping.

✅  Task 5: Manual bootstrap

  • Plot all the sample values in a horizontal line and draw a vertical line segment through the mean à la StatQuest.

  • Then do N = 10 bootstrapped samples. Plot each of them under the actual sample values and draw a vertical line segment through their means.

  • Finally, compute the SE with your 10 bootstrapped samples. How different is it from the SE you got before?
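The numerical part of a manual bootstrap might look like the sketch below (the plotting is left out). The sample values are made up, the seed is arbitrary, and using ddof=1 for the standard deviation of the bootstrapped means is one common choice rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility

# Made-up sample values (NOT the study data)
sample = np.array([0.0, 10.0, 20.0, 40.0, 30.0])

N = 10                           # number of bootstrapped samples
boot_means = np.empty(N)
for i in range(N):
    # Draw as many values as the original sample has, WITH replacement
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[i] = resample.mean()

# Bootstrap estimate of the SE: the spread of the bootstrapped means
se_boot = boot_means.std(ddof=1)
print(boot_means)
print(se_boot)
```

With only 10 resamples the estimate changes noticeably from run to run, so try a few different seeds.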

# Your code

✅  Task 6: stats.bootstrap

Maybe we need more bootstrapped samples. In that case, stats.bootstrap comes in handy. Consider a list Ns with various numbers of bootstrapped samples.

  • For each element Ns[i], bootstrap Ns[i] samples and compute the standard error using stats.bootstrap.

  • Save this estimated standard error in another array.

  • Looking at the values of this last array, are you getting closer to or further from the SE you got in Task 4?
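A minimal sketch of this loop is shown below, assuming stats is scipy.stats. The sample is made up, toy_Ns is a shortened stand-in for Ns, and method="percentile" is chosen here only because the confidence interval is not needed; the standard_error attribute of the result does not depend on that choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed

# Made-up sample values (NOT the study data)
sample = np.array([0.0, 10.0, 20.0, 40.0, 30.0, 0.0, 10.0])

toy_Ns = [10, 100, 1000]         # shortened stand-in for the list Ns
se_estimates = np.zeros(len(toy_Ns))

for i, n_resamples in enumerate(toy_Ns):
    # The data must be passed as a sequence of samples, hence the tuple (sample,)
    res = stats.bootstrap((sample,), np.mean, n_resamples=n_resamples,
                          method="percentile", random_state=rng)
    se_estimates[i] = res.standard_error

print(se_estimates)
print(stats.sem(sample))         # formula-based SE, for comparison
```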

Ns = [10, 100, 500, 1000, 5000, 10000, 50000]

# Your code

✅  Task 7

Remember that the standard error is “the standard deviation of the means taken for multiple samples from the same population.”
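For comparison with this definition, the direct formula estimates the standard error from a single sample as s/√n, where s is the sample standard deviation (with ddof=1) and n is the sample size. The sketch below, on made-up values, checks that this matches what the pandas .sem() function returns by default.

```python
import numpy as np
import pandas as pd

# Made-up values (NOT the study data)
values = pd.Series([0.0, 10.0, 20.0])

# Direct formula: sample standard deviation divided by sqrt(n)
se_formula = values.std(ddof=1) / np.sqrt(len(values))

# pandas shortcut; its default is also ddof=1
se_pandas = values.sem()

print(se_formula, se_pandas)
```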

With all that in mind: which value do you think is closer to the true Standard Error? The one from Task 4 or from Task 6?

Explain your answer.

Put your answer here.

2.3 Standard Errors for grasshoppers

We only had 3 samples for Carabidae predators. In general, it is impossible to come up with any precise statistics (either by formulas or bootstrapping) with such a limited amount of data. To get a better sense of this, let’s examine the Acrididae predators (grasshoppers).

✅  Task 8

  • Repeat Task 4, but this time looking at the family = Acrididae.

# Your code

✅  Task 9

  • Repeat Task 5 (plot and estimate SE with 10 bootstrapped samples), but this time looking at the family = Acrididae.

  • How close is this estimate to the one you got from the formula in Task 8?

# Your code

✅  Task 10

  • Now repeat Task 6, but this time looking at the family = Acrididae. How do the values in the resulting array vary? Are they getting closer to the value you got in Task 8?

# Your code

✅  Question 11

Like in Task 7: which SE value do you think is closer to the true Standard Error?

  • The one from Task 8 or from Task 10?

  • Are they truly different in the first place?

  • How does increasing the original sample size from 3 to 22 affect the standard error estimation?

Explain your answer.

Put your answer here.


3. Wrangling in pandas: reproducing the full Table 1

This second half is mostly for you to keep practicing data wrangling with pandas (because you can never have enough wrangling experience).

3.1 Start with one year and one family

Remember that a good strategy to wrangle data is to first limit yourself to a single case (here, a single predator family). Once you figure out how to deal with one family, it will be much easier to write a loop that deals with the rest of the families.

✅  Task 12

Below are a couple of variables for year and insect family to focus on. Similar to Task 4 (but not the same!):

  • Make a sub-DataFrame for all the data collected in year 2022.

  • Use the sub-DataFrame to then make a sub-sub-DataFrame for all the Carabidae family entries.

With this sub-sub-DataFrame, calculate:

  • Number of entries (n)

  • Mean % of weevils consumed (mean of Avg Weevils Consumed)

  • Standard error of the mean (using the pandas function)

$$\text{self-aggregation \%} = 100\times\frac{\text{\# entries reporting NCE}}{\text{\# total entries}}.$$

Do your results match those reported in Table 1 of Hetherington et al. (2025)?
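Since NCE is coded as 1 or 0, the self-aggregation percentage can be computed either by counting the ones or by taking the mean of the column. The sketch below uses a made-up NCE column to show that both routes agree.

```python
import pandas as pd

# Made-up 0/1 column (NOT the study data): 2 of 5 entries report an NCE
nce = pd.Series([1, 0, 0, 1, 0])

# Literal version of the formula: 100 * (# entries reporting NCE) / (# total entries)
pct = 100 * nce.sum() / len(nce)

# Shortcut: the mean of a 0/1 column is the fraction of ones
pct_via_mean = 100 * nce.mean()

print(pct, pct_via_mean)
```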

# Your code

year = 2022
family = 'Carabidae'

3.2 Summarizing all families

Now that we’ve figured out how to summarize a single family, it is time to put everything together with a loop and a summary DataFrame.

✅  Question 13

  • Uncomment the two lines below and run the cell. You may need to replace the subdata variable with the name of your sub-DataFrame from Task 12 (the one where you mask only the 2022 entries).

  • What are the indices and columns of the weevils summary DataFrame?

  • What does the .unique() function do?

#Uncomment the two lines below. What are the index and column names of the `weevils` DataFrame

#weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])
#weevils

Put your answer here

✅  Task 14

  • Print the column names of the weevils DataFrame. What is the pandas attribute that returns column names?

  • Print the index (row) names of the weevils DataFrame. What is the pandas attribute that returns index names?

# Your code here

Now fill in weevils using a loop!

✅  Task 15

  • The start of the loop is already given to you

  • You already have most of the code ready from Task 12

  • Verify your results by displaying weevils at the end. Do your results match Table 1 from Hetherington et al.?

  • Remember that we can modify a DataFrame entry with .loc by specifying the column and index names:

# modify the dataframe entry corresponding to a specific index (row) and column
my_dataframe.loc[ 'index name' , 'column name' ] = some_value
# Your code

# for family in weevils.index:
   # Your code mostly ready from Task 12

# weevils
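The .loc pattern can be sketched on a made-up example with two placeholder families. Only two of the four summary columns are filled in here; the names toy, summary, fam, and rows are invented for illustration.

```python
import pandas as pd

# Made-up data with placeholder family names (NOT the study data)
toy = pd.DataFrame({
    "Family": ["Family A", "Family A", "Family B", "Family B", "Family B"],
    "Avg Weevils Consumed": [0.0, 20.0, 10.0, 10.0, 40.0],
})

# Summary table initialized with float zeros, one row per family
summary = pd.DataFrame(0., index=toy["Family"].unique(),
                       columns=["n", "mean %consumed"])

for fam in summary.index:
    rows = toy[toy["Family"] == fam]      # mask for the current family
    summary.loc[fam, "n"] = len(rows)     # .loc[row name, column name]
    summary.loc[fam, "mean %consumed"] = rows["Avg Weevils Consumed"].mean()

print(summary)
```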

When we initialized weevils, we did it with all float zeroes:

# Initializing `weevils` with float zeros
# Notice it is `0.` , with a point at the end
weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])

✅  Question 16

  • Go back to Question 13 and replace 0. with 0 (just remove the dot).

  • 0. is a float zero while 0 is an int zero.

  • Now re-run Task 15. Did it work again?

Put your answer here (yes or no)

In Python, floats and ints can often be treated interchangeably, but not always. True division with / (as when computing a mean) always returns a float. On the other hand, len always returns an int.

Recent versions of pandas do not like it when float values go into int columns. However, pandas is fine with int values going into float columns.

By initializing the weevils columns with float zeroes 0., we can store both ints and floats. This is not the case when initializing the dataframe with int zeroes 0.
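A small sketch of the difference, using placeholder row and column names: the fill value decides the column datatype, and storing an int in a float column simply converts it to a float. The failing case (a float going into an int column) is deliberately left out, because how pandas reacts to it depends on the installed version.

```python
import pandas as pd

as_float = pd.DataFrame(0., index=["a", "b"], columns=["n", "mean"])
as_int = pd.DataFrame(0, index=["a", "b"], columns=["n", "mean"])

print(as_float.dtypes)   # float columns
print(as_int.dtypes)     # integer columns

# Division gives a float; len gives an int
print(type(7 / 2), type(len([1, 2, 3])))

# An int stored in a float column is converted to a float (3 becomes 3.0)
as_float.loc["a", "n"] = len([1, 2, 3])
print(as_float)
```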


4. Pretty-fying the display (Time-permitting)

As discussed above, all the columns in weevils are floats. As a result, the numbers can have lots of decimal places, which makes the table less human-readable. Pandas, as you might expect, has functions to easily round values and change specific columns from floats to ints.

✅  Task 17

  • Use the .round function to round

    • the n and %selfaggregation columns to 0 decimal places.

    • the mean and SE columns to 1 decimal place.

  • Important: Remember to save the modified DataFrame

You’ll need either a Series or a dictionary:

# To make a SERIES specifying a value (number of decimal places) for each column
# How can we retrieve the name of the dataframe columns to use them as row names for the Series?
# Make sure the list of decimal places has the same length as the number of columns
dec_places = pd.Series([4,3,2,1], index=dataframe_columns)

# To make a DICTIONARY specifying a value (number of decimal places) for each column
# Manually: helpful if we just need to modify one or two columns
dec_places = {'column name 1': 4, 'column name 3': 2}
# Automatically: helpful if we need a dictionary with lots of entries
dec_places = dict(zip( list_of_keys , list_of_values ) )
# In this case:
dec_places = dict(zip( dataframe_columns , [4,3,2,1] ) )
# round the values of `weevils`
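Here is how .round behaves with both kinds of input, sketched on a made-up two-column DataFrame (the column names and values are placeholders). Columns that are not listed in the dictionary are left untouched, and the original DataFrame is not modified.

```python
import pandas as pd

# Made-up values (NOT the study data)
toy = pd.DataFrame({"n": [3.0, 22.0], "SE": [5.7735, 3.14159]})

# Dictionary: only the listed columns are rounded
rounded_dict = toy.round({"SE": 1})

# Series: one number of decimal places per column, indexed by column name
dec_places = pd.Series([0, 2], index=toy.columns)
rounded_series = toy.round(dec_places)

print(rounded_dict)
print(rounded_series)
print(toy)   # unchanged: .round returns a new DataFrame
```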

✅  Task 18

  • Now use the .astype function to change the datatype of the n and %selfaggregation columns to int

  • We want to keep mean and SE columns as floats, so no need to change those.

  • Important: Remember to save the modified DataFrame

As in Task 17, you’ll need either a dictionary or a Series.
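A minimal sketch of .astype with a dictionary, on a made-up DataFrame with placeholder column names: only the columns named in the dictionary change datatype.

```python
import pandas as pd

# Made-up values (NOT the study data)
toy = pd.DataFrame({"n": [3.0, 22.0], "SE": [5.8, 3.1]})

# Keys are column names, values are the target datatypes
converted = toy.astype({"n": int})

print(toy.dtypes)         # both columns are floats
print(converted.dtypes)   # `n` is now an integer column, `SE` is still float
print(converted)
```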

# change datatypes

✅  Task 19

  • Finally, use the .sort_values function to sort the rows in weevils by their mean consumption values.

  • Check how the by argument works

  • Important: Remember to save the modified DataFrame
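The by argument takes the name of the column to sort on. The sketch below uses a made-up DataFrame with placeholder row names; note that .sort_values returns a new DataFrame and sorts in ascending order by default.

```python
import pandas as pd

# Made-up values with placeholder row names (NOT the study data)
toy = pd.DataFrame({"mean %consumed": [30.0, 0.0, 17.0]},
                   index=["Family C", "Family A", "Family B"])

# Rows are reordered by the values in the chosen column; the index labels travel with them
sorted_toy = toy.sort_values(by="mean %consumed")

print(sorted_toy)
print(toy)   # unchanged unless you save the result back
```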

Display the resulting weevils DataFrame. How does it look compared to Table 1 from Hetherington et al. (2025)?

# Sort entries by mean consumption

Put your answer here


Congratulations, you’re done!

Submit this assignment by uploading it to the course Canvas web page. Go to the “In-class assignments” folder, find the appropriate submission link, and upload it there.

See you next class!

© Copyright 2026, Division of Plant Science & Technology—University of Missouri

References
  1. Hetherington, M. C., Sakka, M. K., Abshire, J., Maille, J. M., Stoll, I., Athanassiou, C. G., Scully, E. D., Gerken, A. R., & Morrison, W. R., III (2025). Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field‐collected predators on the foraging of lesser grain borer and rice weevil. Pest Management Science, 81(11), 7529–7541. 10.1002/ps.70230