Can we control for rice weevil infestations with wasp vibes?

Credits: Rice Array
Learning goals of today’s assignment
Compute the standard error of the mean via bootstrap and direct formula
Practice more pandas wrangling and column datatypes
Assignment instructions
Work with your group to complete this assignment. Instructions for submitting this assignment are at the end of the notebook. The assignment is due at the end of class.
Background
The rice weevil (Sitophilus oryzae L.) and the lesser grain borer (Rhyzopertha dominica F.) are cosmopolitan pests that commonly infest stored cereals, causing substantial damage. Biological control is a complementary management strategy that may expand the integrated pest management (IPM) toolset in stored cereals. For example, the parasitoid wasp Theocolax elegans reduced R. dominica and S. oryzae populations in grain bins by >90% compared to controls. The mere presence of natural enemies has also been shown to influence patterns of insect habitat use, feeding, oviposition and dispersal, ultimately affecting the damage prey cause and their fecundity.
A large hindrance to the implementation of biological control in food facilities in North America is the cultural preoccupation with 'clean' facilities and the perception that biocontrol agents contribute to tolerance limits in the food supply. However, it could be beneficial if food facilities could exploit the 'ecology of fear' by deploying natural enemy cues to reduce stored-product pest activity and damage to commodities, without contributing to the perception of the facility being 'unclean' by fostering or releasing beneficial insects.
Table 1: Summary of percentage Sitophilus oryzae consumed by and nonconsumptive effects elicited by natural enemies collected adjacent to a food facility in 2022.
| Family | n | Mean % consumed ± SE | % Self-aggregation |
|---|---|---|---|
| Miridae | 3 | 0 ± 0 | 33 |
| Thomisidae | 5 | 0 ± 0 | 60 |
| Coreidae | 10 | 2.0 ± 2.0 | 10 |
| Chrysopidae | 6 | 3.3 ± 3.3 | 67 |
| Unknown Araneae | 3 | 6.7 ± 3.3 | 0 |
| Salticidae | 7 | 8.6 ± 5.5 | 14 |
| Acrididae | 22 | 17 ± 5.7 | 36 |
| Reduviidae | 3 | 20 ± 5.8 | 33 |
| Gryllidae | 24 | 30 ± 7.8 | 25 |
| Carabidae | 3 | 37 ± 32 | 33 |
Hetherington, M.C., Sakka, M.K., Abshire, J., Maille, J.M., Stoll, I., Athanassiou, C.G., Scully, E.D., Gerken, A.R. and Morrison, W.R., III (2025) Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field-collected predators on the foraging of lesser grain borer and rice weevil. Pest Manag Sci, 81: 7529–7541.
✅ Question 1
Based on the table above, which family of predators seems to be the best and most reliable at eating weevils?
✎ Put your answer here
# Import the usual libraries
✅ Task 3
Load the data set Natural Predator Data_combined.csv.
Display the first few rows of the DataFrame to make sure it looks OK.
# Load with pandas
2. Means and standard errors
Columns that we may need:
| Column Name | Description |
|---|---|
| Year | year when the sample was collected |
| Family | family of the natural predator |
| Avg Weevils Consumed | % of weevils eaten by the predator |
| NCE | 1 = Evidence of weevils’ self-aggregation. Nonconsumptive effect observed. 0 = otherwise. |
✅ Task 4
With masking, get a new DataFrame corresponding to Carabidae predators (ground beetles) in the 2022 experiments.
Print the number of such samples (n).
Print the mean % consumed.
Print the SE of the mean using the .sem() function from pandas.
Do all three printed values match those in Table 1?
Hint: Define variables year and family and use them for masking.
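To illustrate the masking pattern from the hint, here is a minimal sketch on made-up numbers (not the assignment data). The toy DataFrame and the family label 'A' are hypothetical; only the column names mimic the ones listed above. It also compares .sem() with the direct formula, the sample standard deviation divided by the square root of n.

```python
import numpy as np
import pandas as pd

# Made-up toy data; the column names only mimic the assignment's CSV
toy = pd.DataFrame({
    'Year': [2021, 2022, 2022, 2022, 2022],
    'Family': ['A', 'A', 'A', 'B', 'A'],
    'Avg Weevils Consumed': [10.0, 20.0, 40.0, 5.0, 60.0],
})

year = 2022
family = 'A'

# Combine two boolean masks with & (each comparison in its own parentheses)
mask = (toy['Year'] == year) & (toy['Family'] == family)
subset = toy[mask]

values = subset['Avg Weevils Consumed']
n = len(values)
mean = values.mean()
se_pandas = values.sem()                      # pandas uses ddof=1 by default
se_formula = values.std(ddof=1) / np.sqrt(n)  # direct formula: s / sqrt(n)
print(n, mean, se_pandas, se_formula)
```

The two SE values should agree, since .sem() applies the same formula.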
# Your code
2.2 Standard Errors for ground beetles
Now we will compute the SE of the mean % consumed manually, using bootstrapping.
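As a rough sketch of what a manual bootstrap can look like, the block below resamples a made-up array with NumPy. The values, the seed, and the variable names are assumptions for illustration only, and the plotting part of the task is left out.

```python
import numpy as np

rng = np.random.default_rng(42)                   # seeded only for reproducibility
sample = np.array([12.0, 30.0, 45.0, 8.0, 27.0])  # made-up values

N = 10  # number of bootstrapped samples
boot_means = np.empty(N)
for i in range(N):
    # Resample WITH replacement, same size as the original sample
    resample = rng.choice(sample, size=len(sample), replace=True)
    boot_means[i] = resample.mean()

# The bootstrap SE is the standard deviation of the bootstrapped means
se_boot = boot_means.std(ddof=1)
se_formula = sample.std(ddof=1) / np.sqrt(len(sample))
print(se_boot, se_formula)
```

With only 10 resamples, the bootstrap estimate can land noticeably above or below the formula value from one run to the next.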
✅ Task 5: Manual bootstrap
Plot all the sample values in a horizontal line and draw a vertical line segment through the mean, à la StatQuest.
Then do N = 10 bootstrapped samples. Plot each of them under the actual sample values and draw a vertical line segment through their means.
Finally, compute the SE with your 10 bootstrapped samples. How different is it from the SE you got before?
# Your code
✅ Task 6: stats.bootstrap
Maybe we need more bootstrapped samples. In that case, stats.bootstrap comes in handy. Consider a list Ns of various numbers of bootstrapped samples.
For each element Ns[i], bootstrap Ns[i] samples and compute the standard error using stats.bootstrap.
Save this estimated standard error in another array.
Looking at the values of this last array, are you getting closer to or further from the SE you got in Task 4?
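For reference, a single call to scipy.stats.bootstrap might look like the sketch below. The sample values are made up, and the choice of method='percentile' is an assumption made here only to keep the confidence-interval step simple; the quantity of interest is the standard_error attribute of the result.

```python
import numpy as np
from scipy import stats

# Made-up values, not the assignment data
sample = np.array([12.0, 30.0, 45.0, 8.0, 27.0, 33.0, 19.0, 51.0])

res = stats.bootstrap(
    (sample,),            # data is passed as a sequence of samples
    np.mean,              # statistic to bootstrap
    n_resamples=1000,     # number of bootstrapped samples
    method='percentile',  # how the confidence interval is built
)

# standard_error is the standard deviation of the bootstrap distribution
se_boot = res.standard_error
se_formula = sample.std(ddof=1) / np.sqrt(len(sample))
print(se_boot, se_formula)
```

Changing n_resamples inside a loop over Ns would give one estimate per entry of the list.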
Ns = [10, 100, 500, 1000, 5000, 10000, 50000]
# Your code
✅ Task 7
Remember that the standard error is “the standard deviation of the means taken for multiple samples from the same population.”
With all that in mind: which value do you think is closer to the true Standard Error? The one from Task 4 or from Task 6?
Explain your answer.
✎ Put your answer here.
2.3 Standard Errors for grasshoppers
We only had 3 samples for Carabidae predators. In general, it is impossible to come up with any precise statistics (either by formulas or bootstrapping) with such a limited amount of data. To get a better sense of this, let’s examine the Acrididae predators (grasshoppers).
✅ Task 8
Repeat Task 4, but this time looking at family = Acrididae.
# Your code
✅ Task 9
Repeat Task 5 (plot and estimate SE with 10 bootstrapped samples), but this time looking at family = Acrididae.
How close is this estimate to the one from the formula?
# Your code
✅ Task 10
Now repeat Task 6 for Acrididae. How do the values in the resulting array vary? Are they getting closer to the value you got in Task 8?
# Your code
✅ Question 11
Like in Task 7: which SE value do you think is closer to the true Standard Error?
The one from Task 8 or from Task 10?
Are they truly different in the first place?
How does increasing the original sample size from 3 to 22 affect the standard error estimation?
Explain your answer.
✎ Put your answer here.
3. Wrangling in pandas: reproducing the full Table 1
This second half will be mostly for you to keep practicing data wrangling with pandas (because you can never have enough wrangling experience.)
3.1 Start with one year and one family
Remember that a good strategy to wrangle data is to first limit yourself to a single case (here, a single predator family). Once you figure out how to deal with one family, it will be much easier to make a loop to deal with the rest of the families.
✅ Task 12
Below are a couple of variables for the year and insect family to focus on. Similar to Task 4 (but not the same!):
Make a sub-DataFrame for all the data collected in year 2022.
Use the sub-DataFrame to then make a sub-sub-DataFrame for all the Carabidae family entries.
With this sub-sub-DataFrame, calculate:
Number of entries (n)
Mean % of weevils consumed (mean of Avg Weevils Consumed)
Standard error of the mean (using the pandas function)
Do your results match those reported in Table 1 of Hetherington et al. (2025)?
# Your code
year = 2022
family = 'Carabidae'
3.2 Summarizing all families
Now that we’ve figured out how to summarize a single family, it is time to put everything together with a loop and a summary DataFrame.
✅ Question 13
Uncomment the two lines below and run the cell. You may need to change the subdata variable to the name of your sub-DataFrame from Task 12, where you mask only the 2022 entries.
What are the indices and columns of the weevils summary DataFrame?
What does the .unique() function do?
#Uncomment the two lines below. What are the index and column names of the `weevils` DataFrame
#weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])
#weevils
✎ Put your answer here
✅ Task 14
Print the column names of the weevils DataFrame. What is the pandas function that returns column names?
Print the index (row) names of the weevils DataFrame. What is the pandas function that returns index names?
# Your code here
Now fill in weevils using a loop!
✅ Task 15
The start of the loop is already given to you.
You already have most of the code ready from Task 12.
Verify your results by displaying weevils at the end. Do your results match Table 1 from Hetherington et al.?
Remember that we can modify a DataFrame entry with .loc by specifying the index and column names:
# modify the dataframe entry corresponding to a specific index (row) and column
my_dataframe.loc[ 'index name' , 'column name' ] = some_value
# Your code
# for family in weevils.index:
# Your code mostly ready from Task 12
# weevils
When we initialized weevils, we did it with all float zeroes:
# Initializing `weevils` with float zeros
# Notice it is `0.` , with a point at the end
weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])
✅ Question 16
Go back to Question 13 and change 0. to 0 (just remove one dot). 0. is a float zero while 0 is an int zero.
Now re-run Task 15. Did it work again?
✎ Put your answer here (yes or no)
In Python, floats and ints can often be treated interchangeably, but not always. When dividing with / (as when computing a mean), Python always returns a float. On the other hand, lengths computed with len are always ints.
Recent versions of pandas do not like it when float values go into int columns. However, pandas is OK with int values going into float columns.
By initializing the weevils columns with float zeroes 0., we can store both ints and floats. This is not the case when initializing the DataFrame with int zeroes 0.
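The difference between the two initializations can be checked on a small, made-up DataFrame. The index and column names below are hypothetical; the sketch only inspects dtypes and stores an int in a float column, which is the safe direction.

```python
import pandas as pd

cols = ['n', 'mean']
idx = ['a', 'b']

ints = pd.DataFrame(0, index=idx, columns=cols)     # int zeros
floats = pd.DataFrame(0., index=idx, columns=cols)  # float zeros

print(ints.dtypes)    # integer columns
print(floats.dtypes)  # float columns

# An int stored in a float column is converted to a float
floats.loc['a', 'n'] = len([1, 2, 3])
print(floats.loc['a', 'n'], floats['n'].dtype)

# Division with / gives a float, len gives an int
print(type(7 / 2), type(len([1, 2, 3])))
```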
4. Prettifying the display (time permitting)
As discussed above, all the columns in weevils are floats. This can give the numbers lots of decimal places, which makes the table less human-readable. Pandas, as you might expect, has functions to easily round values and change specific columns from floats to ints.
✅ Task 17
Use the .round function to round:
the n and %selfaggregation columns to 0 decimal places.
the mean and SE columns to 1 decimal place.
Important: Remember to save the modified DataFrame
You’ll need either a Series or a dictionary
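As a small illustration on made-up numbers (the column names 'count' and 'avg' are hypothetical, not the weevils columns), .round accepts either form and returns a new DataFrame:

```python
import pandas as pd

toy = pd.DataFrame({'count': [3.0, 22.0], 'avg': [36.6667, 17.2727]},
                   index=['x', 'y'])

# Dictionary: column name -> number of decimal places
rounded_dict = toy.round({'count': 0, 'avg': 1})

# Series: same information, indexed by the column names
dec_places = pd.Series([0, 1], index=toy.columns)
rounded_series = toy.round(dec_places)

# .round does not modify toy in place, so the result must be saved
print(rounded_dict)
print(rounded_series)
```

Rounding changes the values but not the datatype, so both columns remain floats here.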
# To make a SERIES specifying a value (number of decimal places) for each column
# How can we retrieve the name of the dataframe columns to use them as row names for the Series?
# Make sure the list of decimal places has the same length as the number of columns
dec_places = pd.Series([4,3,2,1], index=dataframe_columns)
# To make a DICTIONARY specifying a value (number of decimal places) for each column
# Manually: helpful if we just need to modify one or two columns
dec_places = {'column name 1': 4, 'column name 3': 2}
# Automatically: helpful if we need a dictionary with lots of entries
dec_places = dict(zip( list_of_keys , list_of_values ) )
# In this case:
dec_places = dict(zip( dataframe_columns , [4,3,2,1] ) )
# round the values of `weevils`
✅ Task 18
Now use the .astype function to change the datatype of the n and %selfaggregation columns to int.
We want to keep the mean and SE columns as floats, so no need to change those.
Important: Remember to save the modified DataFrame.
As in Task 17, you’ll need either a dictionary or a Series.
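A minimal sketch of the dictionary form of .astype, again on a made-up DataFrame with hypothetical column names:

```python
import pandas as pd

toy = pd.DataFrame({'count': [3.0, 22.0], 'avg': [36.7, 17.3]})

# Only the columns named in the dictionary change type
converted = toy.astype({'count': int})

print(toy.dtypes)        # both columns are floats
print(converted.dtypes)  # 'count' is now an integer column
```

Columns left out of the dictionary keep their original datatype.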
# change datatypes
✅ Task 19
Finally, use the .sort_values function to sort the rows in weevils by their mean consumption values.
Check how the by argument works.
Important: Remember to save the modified DataFrame.
Display the resulting weevils DataFrame. How does it look compared to Table 1 from Hetherington et al. (2025)?
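For orientation, this is how the by argument behaves on a made-up DataFrame (the column name 'avg' and the index labels are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({'avg': [30.0, 3.3, 17.0]}, index=['g', 'c', 'a'])

# by names the column whose values decide the row order (ascending by default)
sorted_toy = toy.sort_values(by='avg')
print(sorted_toy)
```

The index labels travel with their rows, so the sorted table stays correctly labeled.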
# Sort entries by mean consumption
✎ Put your answer here
Congratulations, you’re done!
Submit this assignment by uploading it to the course Canvas web page. Go to the “In-class assignments” folder, find the appropriate submission link, and upload it there.
See you next class!
© Copyright 2026, Division of Plant Science & Technology—University of Missouri
- Hetherington, M. C., Sakka, M. K., Abshire, J., Maille, J. M., Stoll, I., Athanassiou, C. G., Scully, E. D., Gerken, A. R., & Morrison, W. R. (2025). Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field‐collected predators on the foraging of lesser grain borer and rice weevil. Pest Management Science, 81(11), 7529–7541. 10.1002/ps.70230