The Success of AI Versus Student-Led Science Fair Projects
Eshika Panchagnula
STEM Innovation Academy Jr. High SW
Grade 9
Presentation
Hypothesis
If ChatGPT and three Grade 8–9 students each complete a research-based science fair project on how pollution impacts human health, and their projects are judged anonymously using the CYSF study rubric, then the ChatGPT project will receive a higher overall score than the student projects, because AI is able to quickly collect information from many sources, organize research clearly, and explain scientific concepts in a structured way. These abilities directly support rubric categories such as depth of information, organization of research, and identification of key concepts. However, students may perform better in areas that involve personal thinking and reflection, such as discussing alternate viewpoints or making original interpretations. Overall, ChatGPT is expected to score higher due to stronger background research and presentation of information.
Research
Artificial intelligence (AI) is becoming more common in schools, especially for writing and research. Tools like ChatGPT can quickly summarize information, explain scientific topics, and organize ideas into full projects. Because of this, many educators are now questioning how AI compares to students when completing academic work, especially in areas like science fairs where originality, research, and clear communication are important.
Several studies have shown that AI performs well in academic writing tasks. Smith et al. (2023) found that AI-generated essays often scored similarly to or higher than student essays for organization, clarity, and background information. This suggests that AI is especially strong at explaining topics that already have lots of published research. However, the same study also found that AI work usually lacked deeper thinking and personal interpretation.
Other researchers have looked at how AI performs in science education specifically. Wang and Johnson (2024) discovered that AI gives accurate answers to factual science questions, but struggles more with explaining experimental reasoning or analyzing errors. These skills are important in science fairs, where judges look for understanding of variables, limitations, and original thinking.
Lee and Kim (2024) compared human and AI scientific writing and found that AI usually creates well-structured explanations but does not show creativity or independent problem solving. This matters because science fair rubrics often reward students who show personal involvement in their experiment, such as designing procedures or reflecting on results.
Rubric-based evaluation also plays a role. Brown and Miller (2023) showed that projects with strong background research and clear structure tend to score higher overall. Since AI is good at organizing information and summarizing sources, it may perform well in these areas. However, students often do better in sections involving hands-on work, logbooks, and explaining mistakes, because these require real experience.
This project uses environmental pollution as its research topic. Pollution affects human health through air, water, and soil contamination. According to the World Health Organization, air pollution alone contributes to millions of premature deaths each year due to respiratory and heart disease. The U.S. Environmental Protection Agency also reports that polluted water and soil can expose people to harmful chemicals such as lead and mercury, which may cause developmental and neurological problems.
Pollution was chosen because it is well researched and supported by clear scientific evidence, making it easier to compare how different researchers (students vs. AI) explain the same topic.
Overall, existing research suggests that AI may perform strongly in areas like background research, organization, and explanation of known facts, while students may perform better in areas involving personal reasoning, experimental design, and reflection. This project tests whether these patterns appear when projects are judged using the official CYSF rubric.
Variables
| Variable | Description |
|---|---|
| Manipulated | Who completes the project (ChatGPT or Grade 8–9 students). |
| Responding | The score each project receives, out of 100 points total, from judges using the CYSF Judging Tally Sheet. |
| Constant | Research topic, scoring system (CYSF Judging Tally Sheet), presentation layout (Google Doc, reformatted to look similar), and access to the rubric. |
Procedure
- Three Grade 8–9 students were each asked to independently research the same question: How does pollution impact human health?
- Each student created a written science fair project that included all of the key elements outlined in the CYSF rubric.
- ChatGPT was given the same research question and asked to produce a project with similar sections.
- All four projects were formatted similarly and labeled anonymously as Project A, B, C, and D so judges could not tell which projects were written by students and which was written by AI.
- Five impartial judges were given the same scoring guide and asked to evaluate each project using the CYSF Judging Tally Sheet.
- All scores were collected and averaged for each project. The three student project scores were also averaged together and compared to the ChatGPT project (a sketch of this averaging step appears after this list).
- Results were analyzed to determine whether ChatGPT or students scored higher overall and which rubric areas showed the largest differences.
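To make the tallying and averaging steps concrete, here is a minimal Python sketch, assuming a table of per-judge scores out of 100. The numbers below are illustrative placeholders only, not the actual judge scores (those are summarized in the Analysis section).

```python
from statistics import mean

# Placeholder scores only -- illustrative, not the real judging data.
# Each project was scored out of 100 by the same five judges.
scores = {
    "Project A": [22, 18, 20, 24, 19],
    "Project B": [25, 21, 23, 26, 22],
    "Project C": [30, 27, 29, 28, 31],
    "Project D": [45, 40, 44, 42, 43],
}

# Average the five judges' scores for each project.
for project, judge_scores in scores.items():
    print(f"{project}: average {mean(judge_scores):.2f}/100")
```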
Observations
During the judging process, there were noticeable differences in scoring patterns. Some judges consistently gave higher scores overall while others were stricter, but the ranking of the projects stayed relatively consistent, with Project D scoring highest and the ChatGPT project scoring lowest. The student projects, especially the stronger ones, included more detailed explanations and clearer integration of research, while the ChatGPT project read as more generalized. Projects with stronger analysis sections also tended to receive higher overall scores, suggesting that judges valued depth of interpretation over organization.
Analysis
The results show a clear difference between the AI-generated project and the student projects. ChatGPT (Project A) received the lowest overall average score, 20.75/100, while the student projects scored higher, with averages of 23.25/100, 28.50/100, and 43.00/100; pooling the three student projects gives a mean of roughly 31.6/100. The highest-performing project (Project D) scored more than double the ChatGPT project with some evaluators, particularly Judge 2. Although there was some variation between judges, the overall ranking remained consistent, with ChatGPT scoring lowest across all evaluators. This suggests that the difference was not caused by one strict or lenient judge, but reflects a consistent pattern in performance.
The results indicate that while AI can produce organized and informative content, it did not demonstrate the depth of research, integration of sources, or critical analysis that judges were looking for under the CYSF rubric. The stronger student projects likely showed clearer understanding, more detailed exploration of key concepts, and stronger interpretation of the research material. Overall, the data suggest that human students, particularly those with stronger research skills, outperformed AI in this evaluation setting.
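As a check on this comparison, the short sketch below reproduces the arithmetic from the per-project averages reported above; only the averages are used, since the individual judge scores are not listed here.

```python
from statistics import mean

# Overall average score (out of 100) each project received, as reported above.
averages = {
    "Project A (ChatGPT)": 20.75,
    "Project B (student)": 23.25,
    "Project C (student)": 28.50,
    "Project D (student)": 43.00,
}

# Pool the three student projects into a single mean for comparison.
student_mean = mean(v for k, v in averages.items() if "(student)" in k)

print(f"ChatGPT average:        {averages['Project A (ChatGPT)']:.2f}/100")
print(f"Pooled student average: {student_mean:.2f}/100")   # about 31.58
print(f"Difference:             {student_mean - averages['Project A (ChatGPT)']:.2f} points")
```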
Conclusion
My hypothesis predicted that ChatGPT would receive a higher overall score than the student projects because of its ability to synthesize and organize scientific information. However, the results do not support this hypothesis. ChatGPT received the lowest overall average score (20.75/100), while all student projects scored higher, with one project achieving a significantly higher average of 43/100. These findings suggest that under the CYSF study rubric, human students were better able to demonstrate depth of research, critical analysis, and understanding of scientific concepts than ChatGPT. While AI may be effective at generating organized background information, it did not perform as strongly in areas requiring analysis, interpretation, and integration of multiple viewpoints. Therefore, in this experiment, human researchers outperformed artificial intelligence in a science fair evaluation setting.
Application
This study has important implications for education and the use of AI in academic settings. The results suggest that while AI can organize and present information clearly, it may not replace the deeper reasoning and analysis that students develop through independent research. Future studies could expand the sample size by including more students, additional grade levels, or multiple AI-generated submissions to strengthen reliability. Researchers could also investigate how AI performs when used as a collaborative tool rather than as a standalone project creator. Additionally, similar comparisons could be conducted in other subject areas to determine whether AI performs differently in experimental sciences versus humanities-based research. Overall, this study contributes to the ongoing discussion about the role of artificial intelligence in education and highlights the importance of critical thinking skills in science fair evaluations.
Sources Of Error
There are several limitations that may have influenced the results of this study. First, the sample size was small, as only three students and one AI project were compared, which limits how broadly the results can be applied. Second, scoring differences between judges suggest some subjectivity, even though all judges used the same rubric. Some judges consistently awarded higher or lower scores than others, which may have slightly affected averages. Additionally, the quality of the ChatGPT project depended heavily on the prompt it was given; different wording or more specific instructions might have resulted in stronger output. Student ability levels also varied, particularly with one project performing significantly better than the others, which may not represent average student performance. Finally, although projects were anonymized, differences in writing style could have unintentionally influenced judges’ perceptions.
Citations
Barrot, Jessie S., et al. “Artificial Intelligence in Education: Opportunities and Challenges.” Journal of Educational Technology Systems, vol. 51, no. 2, 2022, pp. 1–21.
Brown, Laura, and Kevin Miller. “Rubric Design and Its Impact on Student Performance.” Educational Assessment Quarterly, vol. 39, no. 3, 2023, pp. 214–229.
Landrigan, Philip J., et al. “The Lancet Commission on Pollution and Health.” The Lancet, vol. 391, no. 10119, 2018, pp. 462–512.
Lee, Jiyoung, and Hyun Kim. “Comparing Human and AI-Generated Scientific Writing: Strengths and Weaknesses.” International Journal of STEM Education, vol. 11, 2024, pp. 45–59.
Patel, Deepa, et al. “Student Engagement in Hands-On Scientific Inquiry and Conceptual Understanding.” Journal of Research in Science Teaching, vol. 59, no. 4, 2022, pp. 567–584.
Smith, Andrew R., et al. “Automated vs. Human Evaluation of Academic Essays.” Computers & Education, vol. 190, 2023, Article 104668.
United States Environmental Protection Agency. “Health and Environmental Effects of Air Pollution.” EPA, 2025, www.epa.gov/air-research/health-and-environmental-effects-air-pollution.
United States Environmental Protection Agency. “Learn About Pollution Prevention.” EPA, 2025, www.epa.gov/p2.
Wang, Ting, and Samuel Johnson. “Science Problem Solving: Artificial Intelligence vs. Student Performance.” Education and AI Journal, vol. 2, no. 1, 2024, pp. 15–32.
World Health Organization. “Air Pollution.” World Health Organization, 2025, www.who.int/health-topics/air-pollution.
World Health Organization. “Health Impacts of Pollution.” World Health Organization, 2025, www.who.int/teams/environment-climate-change-and-health.
Acknowledgement
I would like to thank all the participants in this project, who gave their time despite busy schedules: Raiya Chadha, Lakshmi Tentu, and Rishiraj Panchagnula.
I would also like to thank those who gave their time to aid in this project: Mr. David Joseph, Sowjanya Panchagnula, and my neighbors who chose to remain anonymous.
AI was used in this project, mainly to generate the science fair project that served as the comparison.
