Day 17: In-Class Assignment

✅ Put your name here¶

✅ Put your group member names here¶

Can we control for rice weevil infestations with wasp vibes?¶

Close-up picture of rice grains covered with rice weevils.

Learning goals of today’s assignment¶

Compute the standard error of the mean via bootstrap and direct formula
Practice more pandas wrangling and column datatypes

Assignment instructions¶

Work with your group to complete this assignment. Instructions for submitting this assignment are at the end of the notebook. The assignment is due at the end of class.

Background¶

The rice weevil (Sitophilus oryzae L.) and the lesser grain borer (Rhyzopertha dominica F.) are cosmopolitan pests that commonly infest stored cereals, causing substantial damage. Biological control is a complementary strategy for pest management. The mere presence of natural enemies has been shown to influence patterns of insect habitat use, feeding, oviposition and dispersal, ultimately impacting the prey damage and fecundity.

A large hindrance to the implementation of biological control in food facilities in North America is the cultural preoccupation with 'clean' facilities. However, it could be beneficial if food facilities could exploit the 'ecology of fear' by deploying natural enemy cues to reduce stored product activity and damage to commodities, without contributing to the perception of the facility being 'unclean' by fostering or releasing beneficial insects.

Table 1: Summary of percentage Sitophilus oryzae consumed by and nonconsumptive effects elicited by natural enemies collected adjacent to a food facility in 2022.

Family	$n$	Mean % consumed ± SE	% Self-aggregation
Miridae	3	0 ± 0	33
Thomisidae	5	0 ± 0	60
Coreidae	10	2.0 ± 2.0	10
Chrysopidae	6	3.3 ± 3.3	67
Unknown Araneae	3	6.7 ± 3.3	0
Salticidae	7	8.6 ± 5.5	14
Acrididae	22	17 ± 5.7	36
Reduviidae	3	20 ± 5.8	33
Gryllidae	24	30 ± 7.8	25
Carabidae	3	37 ± 32	33

Hetherington, M.C., Sakka, M.K., Abshire, J., Maille, J.M., Stoll, I., Athanassiou, C.G., Scully, E.D., Gerken, A.R. and Morrison, W.R., III (2025) Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field-collected predators on the foraging of lesser grain borer and rice weevil. Pest Manag Sci, 81: 7529–7541.

✅ Question 1

Based on the table above, which family of predators seems to be the best and most reliable at eating weevils?

✎ Put your answer here

1. Setting everything up¶

✅ Task 2

Import the usual libraries: NumPy, matplotlib, pandas, and scipy.stats (as stats).

# Import the usual libraries

✅ Task 3

Load the data set Natural Predator Data_combined.csv
Display the first few rows of the DataFrame to make sure it looks ok.

# Load with pandas

2. Means and standard errors¶

Columns that we may need:

Column Name	Description
`Year`	year when the sample was collected
`Family`	family of the natural predator
`Avg Weevils Consumed`	% of weevils eaten by the predator
`NCE`	1 = Evidence of weevils’ self-aggregation. Nonconsumptive effect observed. 0 = otherwise.

✅ Task 4

With masking, get a new DataFrame corresponding to Carabidae predators (ground beetles) in the 2022 experiments.
Print the number of such samples (n).
Print the mean % consumed.
Print the SE of the mean using the .sem() function from pandas.

Do all three printed values match those in Table 1?

Hint: Define two variables, year and family, and use them for masking.
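As a rough illustration of the masking pattern in the hint, here is a minimal sketch on a small made-up DataFrame. The values are invented for illustration and are not the assignment data; only the column names follow the table above.

```python
import pandas as pd

# Toy data, invented for illustration only (not the assignment data set)
toy = pd.DataFrame({
    'Year': [2021, 2022, 2022, 2022],
    'Family': ['Gryllidae', 'Gryllidae', 'Salticidae', 'Gryllidae'],
    'Avg Weevils Consumed': [10.0, 20.0, 0.0, 40.0],
})

year = 2022
family = 'Gryllidae'

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (toy['Year'] == year) & (toy['Family'] == family)
subset = toy[mask]

print(len(subset))                             # number of samples, n
print(subset['Avg Weevils Consumed'].mean())   # mean % consumed
print(subset['Avg Weevils Consumed'].sem())    # standard error of the mean
```

By default, .sem() uses the sample standard deviation (ddof=1) divided by the square root of n.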

# Your code

2.2 Standard Errors for ground beetles (Carabidae)¶

Now compute the SE of the mean % consumed, but this time manually, using bootstrapping.

✅ Task 5: Manual bootstrap

Plot all the sample values on a horizontal line and draw a vertical line segment through the mean, à la StatQuest.
Then do N = 10 bootstrapped samples. Plot each of them under the actual sample values and draw a vertical line segment through their means.
Finally, compute the SE with your 10 bootstrapped samples. How different is it from the SE you got before?
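The resampling part of the steps above (without the plotting) might look roughly like the sketch below. It uses made-up sample values and a seeded NumPy random generator; the variable names are placeholders, not names from the pre-class notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable

# Made-up sample values, for illustration only
sample = np.array([5.0, 20.0, 0.0, 35.0, 10.0])
N = 10  # number of bootstrapped samples

boot_means = np.zeros(N)
for i in range(N):
    # Resample WITH replacement, same size as the original sample
    resample = rng.choice(sample, size=len(sample), replace=True)
    boot_means[i] = resample.mean()

# The bootstrap SE is the standard deviation of the bootstrapped means
boot_se = boot_means.std(ddof=1)
print(boot_se)
```

With only 10 resamples, this estimate changes noticeably from run to run if the seed is changed.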

# Copy/pasted from (3.2) in the pre-class
# You'll need to remove the ''' and tinker with it

'''
fig, ax = plt.subplots(figsize=(10,1))
ax.set_ylim(-0.3, 0.3)
ax.set_yticks([0], '')
ax.set_title(site)
ax.set_xlabel('Pollen contamination %')
ax.axhline(0, c='k', lw=2, zorder=1)
ax.plot([contamination.mean(), contamination.mean()], [0.2, -0.2], c='r', lw=3, zorder=2)
ax.scatter(contamination , np.zeros(len(contamination)), marker='o', s=100, fc='r', ec = 'k', zorder=3);
'''

✅ Task 6: stats.bootstrap

Maybe we need more bootstrapped samples. In that case, stats.bootstrap comes in handy. Consider a list Ns with various numbers of bootstrapped samples.

For each element Ns[i], bootstrap Ns[i] samples and compute the standard error using stats.bootstrap.
Save this estimated standard error in another array.
Looking at the values of this last array, are you getting closer to or further from the SE you got in Task 4?
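For reference, a single call to stats.bootstrap might look like the sketch below. It runs on made-up values rather than the weevil data, and method='percentile' is an assumption made here to keep the example simple (the function's default method is 'BCa').

```python
import numpy as np
from scipy import stats

# Made-up sample values, for illustration only
sample = np.array([5.0, 20.0, 0.0, 35.0, 10.0])

# data must be a sequence of samples, hence the one-element tuple (sample,)
boot = stats.bootstrap((sample,), np.mean, n_resamples=1000,
                       method='percentile')

# The result object stores the bootstrap estimate of the standard error
print(boot.standard_error)
```

The n_resamples argument is where a value such as Ns[i] would go.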

Ns = [10, 100, 500, 1000, 5000, 10000, 50000]

# estimated_SE = ... 

# for i in ... :
    
    # boot = something  Ns[i]  something
    #
    # estimated_SE[i] = ...

# print the estimated SEs

✅ Task 7

Remember that the standard error is “the standard deviation of the means taken for multiple samples from the same population.”

With all that in mind: which value do you think is closer to the true Standard Error? The one from Task 4 or from Task 6?

Explain your answer.

✎ Put your answer here.
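The definition quoted in Task 7 can be checked numerically. The sketch below assumes a hypothetical normal population with a known standard deviation of 10, draws many independent samples of size n from it, and compares the standard deviation of their means with the direct formula σ/√n.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 25             # size of each sample
n_samples = 20000  # number of independent samples drawn

# Hypothetical population: normal with mean 50 and standard deviation 10
samples = rng.normal(loc=50, scale=10, size=(n_samples, n))

# SE by its definition: standard deviation of the sample means
sd_of_means = samples.mean(axis=1).std(ddof=1)

# SE by the direct formula: sigma / sqrt(n) = 10 / 5 = 2.0
formula_se = 10 / np.sqrt(n)

print(sd_of_means, formula_se)
```

In a real experiment we only have one sample, so σ is replaced by the sample standard deviation, which is what .sem() does.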

2.3 Standard Errors for grasshoppers (Acrididae)¶

We only had 3 samples for Carabidae predators. In general, it is impossible to come up with any precise statistics (either by formulas or bootstrapping) with such a limited amount of data. To get a better sense of this, let’s examine the Acrididae predators (grasshoppers).

✅ Task 8

Repeat Task 4, but this time looking at the family = Acrididae.

# Your code

✅ Task 9

Repeat Task 5 (plot and estimate SE with 10 bootstrapped samples), but this time looking at the family = Acrididae.
How close is this estimate to the value from the formula?

# Your code

✅ Task 10

Now repeat Task 6, but for Acrididae. How do the values in the resulting array vary? Are they getting closer to the value you got in Task 8?

# Your code

✅ Question 11

Like in Task 7: which SE value do you think is closer to the true Standard Error?

The one from Task 8 or from Task 10?
Are they truly different in the first place?
How does increasing the original sample size from 3 to 22 affect the standard error estimation?

Explain your answer.

✎ Put your answer here.

3. Wrangling in pandas: reproducing the full Table 1¶

This second half is mostly for you to keep practicing data wrangling with pandas (because you can never have enough wrangling experience).

3.1 Start with one year and one family¶

Remember that a good strategy for wrangling data is to first limit yourself to a single case: here, a single predator family. Once you figure out how to deal with one family, it will be much easier to write a loop that deals with the rest of the families.

✅ Task 12

Below are a couple of variables for year and insect family to focus on. Similar to Task 4 (but not the same!):

Make a sub-DataFrame for all the data collected in year 2022.
Use the sub-DataFrame to then make a sub-sub-DataFrame for all the Carabidae family entries.

With this sub-sub-DataFrame, calculate:

Number of entries (n)
Mean % of weevils consumed (mean of Avg Weevils Consumed)
Standard error of the mean (using the pandas function)
Self-aggregation %, computed from the NCE column as:

$$\text{self-aggregation \%} = 100\times\frac{\text{\# entries reporting NCE}}{\text{\# total entries}} \tag{1}$$

Do your results match those reported in Table 1 of Hetherington et al. (2025)?
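The self-aggregation percentage is the number of entries with NCE equal to 1, divided by the total number of entries, times 100. A minimal sketch with made-up flags (not the assignment data):

```python
import pandas as pd

# Made-up NCE flags: 1 = self-aggregation observed, 0 = otherwise
nce = pd.Series([1, 0, 0, 1, 0])

# Because the flags are 0/1, their sum counts the entries reporting NCE
pct = 100 * nce.sum() / len(nce)
print(pct)  # 40.0
```

Since the column only contains 0s and 1s, 100 * nce.mean() gives the same number.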

# Your code

year = 2022
family = 'Carabidae'

3.2 Summarizing all families¶

Now that we’ve figured out how to summarize a single family, it is time to put everything together with a loop and a summary DataFrame.

✅ Question 13

Uncomment the two lines below and run the cell. You may need to replace the subdata variable with the name of your sub-DataFrame from Task 12 (the one where you mask only the 2022 entries).
What are the indices and columns of the weevils summary DataFrame?
What does the .unique() function do?
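To experiment with these ideas away from the real data, the same pattern can be tried on a tiny made-up Series. The names families and summary are placeholders chosen for this sketch.

```python
import pandas as pd

# Made-up family labels with one repeat
families = pd.Series(['Gryllidae', 'Salticidae', 'Gryllidae', 'Acrididae'])

# .unique() keeps each distinct value once, in order of first appearance
names = families.unique()

# One row per distinct family, every cell initialized to the float 0.
summary = pd.DataFrame(0., index=names, columns=['n', 'SE'])

print(summary.index.tolist())
print(summary.columns.tolist())
```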

#Uncomment the two lines below. What are the index and column names of the `weevils` DataFrame

#weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])
#weevils

✎ Put your answer here

✅ Task 14

Print the column names of the weevils DataFrame. Which pandas attribute returns the column names?
Print the index (row) names of the weevils DataFrame. Which pandas attribute returns the index names?

# Your code here

Now fill in weevils using a loop!

✅ Task 15

The start of the loop is already given to you
You already have most of the code ready from Task 12
Verify your results by displaying weevils at the end. Do your results match Table 1 from Hetherington et al.?
Remember that we can modify a DataFrame entry with .loc by specifying the column and index names:

# modify the dataframe entry corresponding to a specific index (row) and column
my_dataframe.loc[ 'index name' , 'column name' ] = some_value
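A small self-contained sketch of filling a summary table row by row with .loc, using hypothetical groups a and b instead of predator families:

```python
import pandas as pd

# Hypothetical groups and their values, for illustration only
values = {'a': [1.0, 3.0], 'b': [2.0, 4.0, 6.0]}

# Float zeros, one row per group
table = pd.DataFrame(0., index=list(values), columns=['n', 'mean'])

for name in table.index:
    group = values[name]
    # .loc[row label, column label] selects exactly one cell to overwrite
    table.loc[name, 'n'] = len(group)
    table.loc[name, 'mean'] = sum(group) / len(group)

print(table)
```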

# Your code

# for family in weevils.index:
   # Your code mostly ready from Task 12

# weevils

When we initialized weevils, we did it with all float zeroes:

# Initializing `weevils` with float zeros
# Notice it is `0.` , with a point at the end
weevils = pd.DataFrame(0., index=subdata['Family'].unique(), columns=['n', 'mean %consumed', 'SE', '%selfaggregation'])

✅ Question 16

Go back to Question 13 and replace 0. with 0 (just remove the dot).
0. is a float zero while 0 is an int zero.
Now re-run Task 15. Did it work again?

✎ Put your answer here (yes or no)

In Python, floats and ints can often be treated interchangeably, but not always. True division with / (as when computing a mean) always returns a float. On the other hand, lengths returned by len are always ints.

Recent versions of pandas do not like it when float values go into int columns: depending on the version, this triggers a warning or an error. However, pandas is fine with int values going into float columns.

By initializing the weevils columns with float zeroes 0., we can store both ints and floats. This is not the case when initializing the dataframe with int zeroes 0.
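The difference between the two initializations can be seen by inspecting the column dtypes. This sketch uses a one-cell toy DataFrame and only performs the safe direction (an int stored in a float column).

```python
import pandas as pd

as_float = pd.DataFrame(0., index=['a'], columns=['n'])  # float zeros
as_int = pd.DataFrame(0, index=['a'], columns=['n'])     # int zeros

# The dtype of each column follows the type of the initial value
print(as_float.dtypes['n'], as_int.dtypes['n'])

# An int stored in a float column is simply converted to a float
as_float.loc['a', 'n'] = 3
print(as_float.loc['a', 'n'])
```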

4. Pretty-fying the display (Time-permitting)¶

As discussed above, all the columns in weevils are floats. This can make the numbers have lots of decimal places that make the Table less human-readable. Pandas, as you might expect, has functions to easily round values and change specific columns from floats to ints.

✅ Task 17

Use the .round function to round
- the n and %selfaggregation columns to 0 decimal places.
- the mean and SE columns to 1 decimal place.
Important: Remember to save the modified DataFrame

You’ll need either a Series or a dictionary

# To make a SERIES specifying a value (number of decimal places) for each column
# How can we retrieve the name of the dataframe columns to use them as row names for the Series?
# Make sure the list of decimal places has the same length as the number of columns
dec_places = pd.Series([4,3,2,1], index=dataframe_columns)

# To make a DICTIONARY specifying a value (number of decimal places) for each column
# Manually: helpful if we just need to modify one or two columns
dec_places = {'column name 1': 4, 'column name 3': 2}
# Automatically: helpful if we need a dictionary with lots of entries
dec_places = dict(zip( list_of_keys , list_of_values ) )
# In this case:
dec_places = dict(zip( dataframe_columns , [4,3,2,1] ) )
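As a quick check of how the dictionary form behaves, here is a sketch on a toy table with invented numbers. Note that .round returns a new DataFrame and leaves the original untouched, which is why the result has to be saved.

```python
import pandas as pd

# Toy table with invented numbers
table = pd.DataFrame({'n': [3.0, 22.0], 'SE': [31.7977, 5.6631]},
                     index=['x', 'y'])

# Keys are column names, values are the number of decimal places
rounded = table.round({'n': 0, 'SE': 1})

print(rounded)
print(table)  # the original still has all its decimals
```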

# round the values of `weevils`

✅ Task 18

Now use the .astype function to change the datatype of the n and %selfaggregation columns to int.
We want to keep mean and SE columns as floats, so no need to change those.
Important: Remember to save the modified DataFrame

As in Task 17, you’ll need either a dictionary or a Series.
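A minimal sketch of the dictionary form of .astype on a toy table (invented values). Columns that are not listed in the dictionary keep their datatype.

```python
import pandas as pd

# Toy table with invented, already-rounded numbers
table = pd.DataFrame({'n': [3.0, 22.0], 'SE': [31.8, 5.7]})

# Only the 'n' column is converted; 'SE' stays a float column
converted = table.astype({'n': int})

print(converted.dtypes)
```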

# change datatypes

✅ Task 19

Finally, use the .sort_values function to sort the rows in weevils by their mean consumption values.
Check how the by argument works
Important: Remember to save the modified DataFrame

Display the resulting weevils DataFrame. How does it look compared to Table 1 from Hetherington et al. (2025)?
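For the by argument, a sketch on a toy table with hypothetical row labels: by takes the name of the column whose values decide the row order, and the sort is ascending by default.

```python
import pandas as pd

# Toy table with hypothetical row labels and invented means
table = pd.DataFrame({'mean': [37.0, 0.0, 17.0]}, index=['c', 'a', 'b'])

# Returns a new, sorted DataFrame; the original order is unchanged
ordered = table.sort_values(by='mean')

print(ordered)
```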

# Sort entries by mean consumption

✎ Put your answer here

Congratulations, you’re done!¶

Submit this assignment by uploading it to the course Canvas web page. Go to the “In-class assignments” folder, find the appropriate submission link, and upload it there.

See you next class!

References¶

Hetherington, M. C., Sakka, M. K., Abshire, J., Maille, J. M., Stoll, I., Athanassiou, C. G., Scully, E. D., Gerken, A. R., & Morrison, W. R. (2025). Nonconsumptive effects of parasitoids and predators in stored products: the impact of Theocolax elegans and other field‐collected predators on the foraging of lesser grain borer and rice weevil. Pest Management Science, 81(11), 7529–7541. 10.1002/ps.70230
