✅ Put your name here
Introduction to Data Ethics: Data and Algorithmic Bias
The following is a series of important points to keep in mind whenever you are thinking in data science terms, regardless of whether you are focused on biology.

Credits: xkcd
Learning goals for today’s pre-class assignment
Identify how bias occurs in data and algorithms
Understand the impact data and algorithmic bias has on people
Apply practices to look for and minimize bias in your work and others
Construct your personal academic integrity statement
Practice with lists and loops
Assignment instructions
Read this notebook, watch the videos below and complete the assigned programming problems. Please get started early, and come to office hours if you have any questions!
This assignment is due by 11:59 p.m. the day before class and should be uploaded into the “Pre-Class Assignments” dropbox folder for Day 3. Submission instructions can be found at the end of the Notebook.
1. Introduction
Data and algorithms are everywhere, and we encounter them every day. Streaming services use information about the shows and movies you’ve previously watched to give more accurate recommendations. Advertisements are customized to us based on our search histories. As students dealing with data and constructing your own algorithms, you’ll have even more involvement in these processes.
✅ Question:
Give an example when you’ve interacted with data or algorithms outside of this class:
✎ Write your response here
We give a lot of power to data and algorithms. Perhaps you’ve heard someone say, “look at the data/numbers,” or “it’s just fact.” Data-driven and evidence-based thinking is a very important skill that can lead to well-informed, insightful decisions. However, we must also recognize limitations data may have.
While using data, it is important to ask ourselves:
Who collected this data, and do they have a motivation to highlight a certain perspective?
This is similar to watching the news: each network has its own bias and leanings. Several news channels might tell the same news story very differently. It is our job to tease out the most complete story we can. Data is not neutral.
Who/What does this data exclude?
People, regions, and species that have been historically marginalized are often erased from data.
For example, in plant science, an analysis of ~300,000 papers published between 2000 and 2020 found striking geographical biases that are correlated with national affluence. Gender imbalances were also evident, with far more papers led by authors with masculine names than by authors with feminine names. Finally, there are substantial taxonomic sampling gaps: the vast majority of surveyed studies focused on major crop and model species, and the remaining biodiversity accounted for only a fraction of publications (Marks et al., 2023).
2. What is data bias?

Credits: sketchplanations
Bias is defined to be “prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair.” Data bias occurs when parts of a dataset are overemphasized, underemphasized, or are completely nonexistent.
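One rough way to look for over- or under-represented groups is to count how often each group appears in a dataset. The sketch below uses only lists and loops on a small made-up list of country labels. The values and the names `samples`, `groups`, `shares`, `expected_groups`, and `missing` are hypothetical and are not taken from any real dataset.

```python
# Hypothetical toy data: the country recorded for each sample in a made-up dataset
samples = ['USA', 'USA', 'China', 'USA', 'Brazil', 'USA', 'China', 'USA']

# Build a list of the distinct groups that appear in the data
groups = []
for country in samples:
    if country not in groups:
        groups.append(country)

# Compute the share of the dataset that each group makes up
shares = []
for country in groups:
    share = samples.count(country) / len(samples)
    shares.append(share)
    print(country, 'makes up', round(100 * share, 1), '% of the samples')

# Counting alone cannot reveal groups that are absent entirely,
# so compare against a (hypothetical) list of groups we expected to see
expected_groups = ['USA', 'China', 'Brazil', 'Kenya']
missing = []
for country in expected_groups:
    if country not in samples:
        missing.append(country)
print('Groups with no samples at all:', missing)
```

A lopsided share does not prove that a dataset is biased, but it is a prompt to ask who collected the data and who was left out.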
✅ Task 1
Watch the video below and answer the reflection questions below.
# Imports the functionality that we need to display YouTube videos in a Jupyter Notebook.
# You need to run this cell before you run ANY of the YouTube videos.
from IPython.display import YouTubeVideo
# Video on how algorithms spread bias
YouTubeVideo("1z9KsNoAmFA",width=640,height=360)
Write a paragraph reflecting on the video. Be prepared to discuss these videos and your reflections in class. Consider answering the questions below, but you are not limited to them:
Which example of data and algorithmic bias was most impactful and surprising to you? Why?
If you were explaining this video to a friend who hadn’t watched it, what would you tell them? What are the major takeaways?
Will this change how you engage with data and algorithms going forward? If so, why and how? If not, why not?
What is something you can do to fight against bias in algorithms? Both as a user of algorithms, and someone who could help create algorithms in the future.
✎ Write your response here
Data bias can lead well-intentioned algorithms to output biased results. When those results are fed back into the algorithm alongside the already biased data, they perpetuate a cycle of bias.
✅ Task 2
Choose at least one of the examples below of real algorithmic bias. Some of them were actually referred to in the TEDx Talk above.
Based on what you read, try addressing at least three of the questions below:
How is data being used?
How does the actual usage of data relate to its intended usage?
Who owns and/or controls the data?
Who benefits from the data usage?
How is the data usage related to bias?
✎ Write your response here
3. Craft Your Personal Academic Integrity Statement
After spending some time thinking about data and algorithmic bias and some of the ethical implications of that bias, it’s worth spending some time thinking about how you, personally, are going to approach your development as a scientist, especially in the context of your work in this course.
As you work to develop your computational skills and learn to write ever more complex code for ever more complex experimental setups, you will likely find yourself searching the internet for help. This is a completely authentic part of data science. However, it is important that you use the resources you find on the internet in transparent and honest ways. This includes being thoughtful about how to give credit to the code authors and websites you lean on when you need to figure out something new.
Along these lines, Mizzou has enacted a Standard of Conduct for Academic Integrity. It is important for you to be aware of this standard as it acknowledges some of our shared code of ethics as members of MU. Academic integrity is the foundation for university success and future success. Learning how to express original ideas, cite works, work independently, and report results accurately and honestly are skills that carry students beyond their academic career.
Mizzou’s Academic Integrity Honor Pledge:
I strive to uphold the University values of respect, responsibility, discovery, and excellence. On my honor, I pledge that I have neither given nor received unauthorized assistance on this work.
✅ Task 3: Your pledge
In the cell below, craft a personal statement of commitment to academic honesty and integrity. As part of this statement, address the following components:
Why is integrity important to you?
What values motivate the work that you do?
Commitment to conducting yourself with integrity
Acknowledgement that you are aware of Mizzou’s ethical standards for integrity.
IMPORTANT: This personal integrity statement will be placed on all of your major homework assignments and exams. (You will be asked to paste your statement into each assignment as an acknowledgement of your commitment to ethical behavior.)
Your personal statement may share elements with those of your peers, but it should also, ideally, be unique to you. Try to make something that has personal meaning!
✎ Put your statement here
I, _________, commit to _______
If you need a starting place, here is a sample personal statement:
Integrity Pledge:
I, [name], value the opportunity to receive a collegiate education. Because of this value and the sacrifices of people who have made this possible for me, I commit to studying to the best of my ability, submitting work that is my own, and citing sources when I receive help. I acknowledge I am aware of the University of Missouri policy concerning academic honesty, plagiarism, and cheating.
Additionally, there are numerous examples of such statements on the internet. You may find them useful for inspiration.
4. More Practice With Variables, Lists, and Loops
(Not required, but could be useful for building your skills and providing additional preparation for class)
If you have some extra time and want some extra practice building on your new Python skills, you are encouraged to work through the following examples and exercises. It is not required that you complete this section to get credit for this pre-class assignment. However, you will be writing more lists and loops in class for Day 4, so if you feel like you need to spend some time practicing this, you may wish to do so.
REMINDER on Generative AI Usage
To ensure that you are starting to build a strong basis in foundational concepts, please do not use Generative AI (ChatGPT, DALL-E, Claude, Copilot, etc.) at this time. We will introduce how to use them in support of your learning soon!
However, feel free to post in Slack, talk with your peers and instructors, and use the resources below:
4.1 Variables
Review the following code for examples of how variables can be defined, used, and manipulated.
int_var = 3 # Integer variable
float_var = 15.75 # floating point variable
str_var = 'Truman the Tiger' # string variable
print('1:', 'An integer plus a float works in python:',int_var+float_var)
# You cannot do math with strings, but you can concatenate strings (if you turn your variables into strings first)
new_str_var = str_var +' has won '+str(int_var)+' Best Mascot National Championships.'
print('2:',new_str_var)
# or you can just use a print statement with commas to make meaningful debugging and result statements
print('3:',str_var,'is',float_var+int_var,'times better than Big Jay.')
print('4: The value of int_var:', int_var)
1: An integer plus a float works in python: 18.75
2: Truman the Tiger has won 3 Best Mascot National Championships.
3: Truman the Tiger is 18.75 times better than Big Jay.
4: The value of int_var: 3
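To see why the `str()` call in statement 2 matters, here is a minimal sketch using made-up names (`count`, `label`, `message`). It checks the type before and after conversion, then concatenates. The line that would fail is left commented out so the cell still runs.

```python
count = 3                     # hypothetical integer value
label = 'Count: '             # hypothetical string value

print(type(count))            # the original variable is an int
print(type(str(count)))       # str() returns a new string version of the number

message = label + str(count)  # works because both pieces are strings
print(message)

# label + count               # uncommenting this line would raise a TypeError (str + int)
```

Note that `str(count)` does not change `count` itself; it only produces a string copy to use in the concatenation.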
✅ Task 4
Write a print statement that concatenates all of the following strings to show the complete quote
q1 = "including those at MU games, hospitals, schools, community events and campus gatherings."
q2 = 'Truman was first acclaimed the "Best Mascot in the Nation" in 2004'
q3 = "In a typical year, Truman makes more than 400 appearances, "
q4 = "and repeated the honor in 2014 and 2024."
# put your code here
4.2 Lists
A list stores a series of items in a particular order. You access items using an index, or with a for loop (ex: for val in list:)
list_ex = [] # initialize an empty list
list_ex.append('Truman the Tiger') # append an item to a list
list_ex.append('The Columns')
list_ex.append('Mizzou')
list_ex.append('Columbia')
print('Print 1:',list_ex) # print contents of variable or whole list
list_ex.remove('Mizzou') # remove specific entry from list, but only first entry with this value
print('Print 2:',list_ex) # print contents of variable or list
list_ex.append('Show-Me State')
print('Print 3:',list_ex)
print('Print 4:',list_ex[3]) # print the 4th value in the list 'list_ex'
Print 1: ['Truman the Tiger', 'The Columns', 'Mizzou', 'Columbia']
Print 2: ['Truman the Tiger', 'The Columns', 'Columbia']
Print 3: ['Truman the Tiger', 'The Columns', 'Columbia', 'Show-Me State']
Print 4: Show-Me State
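The comment on `remove` above says that only the first matching entry is removed. The short sketch below illustrates this with a hypothetical list, `towns`, that contains a repeated value. The `while` loop at the end is just one possible way to remove every copy.

```python
towns = ['Columbia', 'Rolla', 'Columbia', 'Joplin']  # hypothetical list with a repeated value

towns.remove('Columbia')   # removes only the first 'Columbia'
print(towns)               # the second 'Columbia' is still in the list

# One way to remove every copy: keep calling remove() while the value is still present
while 'Columbia' in towns:
    towns.remove('Columbia')
print(towns)
```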
Note: An important concept with lists is that they have values stored at specific indexes. It is important to remember the idea of an Index (which is the location) and the Value (which is the value of the single variable at that index).

Credits: railsware.com
To access an element by its index we need to use square brackets.
# Example of Values and Indexes
index = 1
print(list_ex[index],'is the value at the', index, 'index.')
The Columns is the value at the 1 index.
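Because indexes start at 0, the last valid index is always one less than the length of the list. The sketch below is a small illustration of this using a hypothetical list, `places`, with the same values as `list_ex`. The names `n_items` and `last_index` are made up for this example.

```python
places = ['Truman the Tiger', 'The Columns', 'Columbia', 'Show-Me State']  # same values as list_ex above

n_items = len(places)        # number of values in the list
last_index = n_items - 1     # indexes start at 0, so the last index is the length minus 1

print('The list has', n_items, 'values; valid indexes run from 0 to', last_index)
print('Value at index 0:', places[0])
print('Value at the last index:', places[last_index])
print('Index of Columbia:', places.index('Columbia'))  # look up the index of a given value
```

Asking for `places[n_items]` would raise an `IndexError`, because that index is one past the end of the list.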
4.3 Loops
So far, we have learned:
for loops (repeats a block of code the number of times described in the “for” statement)
while loops (repeats a block of code as long as a certain condition is true)
# First Loop Type
for value1 in list_ex: # loop through all the entries in list "list_ex"
    print('Current entry in variable value is:', value1) # for each iteration, variable named "value1"
    # will be assigned the next entry in "list_ex"
Current entry in variable value is: Truman the Tiger
Current entry in variable value is: The Columns
Current entry in variable value is: Columbia
Current entry in variable value is: Show-Me State
# Second Loop Type
for index1 in range(len(list_ex)): # loop through integers from 0 up to (but not including) the length of list "list_ex"
    # for each iteration, variable named "index1"
    # will be assigned the next integer in that range
    str_now = list_ex[index1] # assign a variable the content of the index1-th entry of list "list_ex"
    print('The',index1,'entry in list_ex is',str_now)
The 0 entry in list_ex is Truman the Tiger
The 1 entry in list_ex is The Columns
The 2 entry in list_ex is Columbia
The 3 entry in list_ex is Show-Me State
# Third Loop Type
index1 = 0
while index1 < len(list_ex): # keep looping until index1 is equal to or greater than the length of list "list_ex"
    str_now = list_ex[index1]
    print('The',index1,'entry in list_ex is',str_now)
    index1 += 1 # increment whatever is in index1 by +1
# Note: this gives the identical result to the for loop in the cell above
The 0 entry in list_ex is Truman the Tiger
The 1 entry in list_ex is The Columns
The 2 entry in list_ex is Columbia
The 3 entry in list_ex is Show-Me State
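The first loop type gives you each value and the second gives you each index. When both are needed, Python's built-in `enumerate` provides them together. This is an optional extra that has not been covered in class. The sketch uses a hypothetical list, `places`, with the same values as `list_ex`, and collects the printed text in a made-up list called `lines`.

```python
places = ['Truman the Tiger', 'The Columns', 'Columbia', 'Show-Me State']  # same values as list_ex above

lines = []
for index1, value1 in enumerate(places):   # enumerate yields (index, value) pairs, starting at index 0
    line = 'The ' + str(index1) + ' entry is ' + value1
    lines.append(line)                     # keep each line so it can be inspected later
    print(line)
```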
✅ Task 5
Write a loop using one of the types above that prints the entries in list_ex in reverse order. There is more than one way to tackle this problem!
# Put your code here
✅ Question 6
If you were able to successfully print the list in reverse order, describe how you came up with your solution. If not, describe where you are stuck and what you have tried so far.
✎ Write your response here
Follow-up Questions
Copy and paste the following questions into the appropriate box in the assignment survey included below and answer them there. (Note: You’ll have to fill out the section number and the assignment number and go to the “NEXT” section of the survey to paste in these questions.)
In your own words, how would you define algorithmic bias?
What is one example of something we can do as either users or creators of algorithms and data to help avoid algorithmic bias?
How are you feeling about your ability to work with lists and loops in Python?
Congratulations, you’re done!
Submit this assignment by uploading it to the course Canvas web page. Go to the “Pre-class assignments” folder, find the appropriate submission folder link, and upload it there.
See you in class!
Material drawn with permission from:
© Copyright 2023. Department of Computational Mathematics, Science and Engineering at Michigan State University
Adapted for:
© Copyright 2026, Division of Plant Science & Technology—University of Missouri
- Marks, R. A., Amézquita, E. J., Percival, S., Rougon-Cardoso, A., Chibici-Revneanu, C., Tebele, S. M., Farrant, J. M., Chitwood, D. H., & VanBuren, R. (2023). A critical analysis of plant science literature reveals ongoing inequities. Proceedings of the National Academy of Sciences, 120(10). 10.1073/pnas.2217564120