Using Bayesian statistics to interpret DNA fingerprinting evidence

Our research project uses Bayesian statistics to interpret DNA fingerprint matches more accurately. This method uses analysis, prior information, and probability to determine how we can strengthen the interpretation of DNA fingerprinting evidence. By appl
Rishitha Shivamurthy, Shirin Bingi
Henry Wise Wood High School
Grade 10

Presentation

No video provided

Problem

Question: If everyone has unique DNA, how can DNA evidence still be wrong, and how can statistics help us explain this?

DNA fingerprinting tends to be seen as a perfect form of evidence because everyone has a unique form of DNA. However, forensic DNA tests do not analyze all of a person's DNA, and false positives can sometimes arise, especially when a large database of DNA profiles is searched. This creates a risk that DNA evidence will be misunderstood in criminal cases. The main problem is to discover how reliable a DNA match really is when errors and probability are involved.

Method

First, we described what DNA fingerprinting and Bayesian statistics are. Then we discussed how each has evolved, along with some real-life examples. Next, we looked at accuracy and reliability, as well as the use of these methods in forensics and cold cases. Then we explored ethics and privacy, and techniques and technologies. Next, we discussed some of the challenges and errors, as well as how AI could help. We then covered visualizations and simulations, statistics, and some of the pros and cons. Lastly, we ended with our conclusion.

Research

  1. What is DNA Fingerprinting?

    • DNA fingerprinting is a technique used to identify a person from specific regions of human DNA whose nucleotide sequences vary between individuals. Taken together, these regions are, in practice, unique to each person (identical twins excepted).

    • DNA is made of two long strands wound into a double helix. Each strand is built from repeating units called nucleotides, and each nucleotide has three parts: a phosphate group, a deoxyribose sugar, and one of four nitrogenous bases, namely adenine (A), thymine (T), guanine (G), or cytosine (C).
    • Polynucleotide chains form when nucleotides are linked by bonds between the sugar of one nucleotide and the phosphate of the next, creating a sugar-phosphate backbone.
    • Purines and pyrimidines are the two classes of nitrogenous bases found in DNA and RNA.
    • Purines (adenine and guanine) are double-ring compounds found in the body's cells and in many foods. They are building blocks of DNA and RNA, and adenine is also part of ATP, the molecule cells use to carry energy.
    • Pyrimidines are single-ring, nitrogen-containing organic compounds. They form the bases cytosine and thymine in DNA, and cytosine and uracil in RNA.
  2. What is Bayesian Statistics?
    • Bayesian statistics is a method of analyzing data in which you begin with an initial belief (the prior) and update that belief as you get new information. It combines what you already know with new evidence to make more accurate decisions.
    • Bayes’ theorem is the mathematical formula used in probability to update the probability of a hypothesis when new evidence is given.
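Bayes’ theorem can be written in its standard form for a hypothesis $H$ (for example, "this person is the source of the DNA") and evidence $E$ (for example, "the profiles match"). The denominator is expanded with the law of total probability, where $\neg H$ means the hypothesis is false:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}$$

Here $P(H)$ is the prior (the initial belief) and $P(H \mid E)$ is the posterior (the updated belief after seeing the evidence).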
  3. Evolution
    Bayesian Statistics Evolution:
    • Thomas Bayes is credited as the first person known to use probability inductively, and he established a mathematical model for probabilistic inference.
    • He set down his findings in a paper called “An Essay towards Solving a Problem in the Doctrine of Chances,” which was published in 1763, two years after his death.
    • Pierre-Simon Laplace expanded the theory developed by Bayes. He made major contributions to statistics and probability, including work on continuous probability and proving the first general form of the central limit theorem.
    DNA Fingerprinting Evolution:
    • Sir Alec Jeffreys discovered a technique in 1984 that could identify an individual based on their unique DNA.
    • He used his technique to resolve an immigration dispute in 1985 and, starting in 1986, to help identify a murderer.
    • Kary Mullis developed the polymerase chain reaction (PCR), a technique for amplifying specific DNA segments.
    • PCR uses heat to separate the two DNA strands and then copies a target segment over repeated cycles, producing millions of copies.
  4. How accurate is this method, and can it be relied on?
    • Errors in sample handling or test interpretation can lead to false positives in DNA testing. A Bayesian model can show how the possibility of such errors affects the value of DNA evidence in a legal case, including whether it is strong enough to convict someone.
    • Using this model, we can estimate both the probability of a coincidental (random) DNA match and the probability of a false positive caused by error when evaluating DNA evidence.
    • Ignoring or underestimating the risk of a false positive can lead to serious mistakes, especially when the suspect was found through a DNA database search. Not knowing the true error rate adds uncertainty to how much we can trust DNA evidence.
    • Bayesian analysis, typically carried out with probabilistic genotyping (PG) software, can reliably distinguish between contributors to complex DNA mixtures in cases where traditional manual analysis, which often relies on the Combined Probability of Inclusion (CPI) statistic, fails.
    • Probabilistic genotyping is a method for analyzing complex DNA samples. It uses mathematical models and computer algorithms to evaluate DNA mixtures and report how strongly the evidence supports a match, usually as a likelihood ratio.
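The effect of false positives described in this section can be sketched with the likelihood-ratio formula discussed by Thompson, Taroni, and Aitken (2003). This sketch rests on simplifying assumptions. A true source is assumed always to produce a reported match, and the random match probability (RMP) and false positive probability (FPP) are treated as known numbers. The function name and the input values are hypothetical. They were chosen only to show that a small laboratory error rate can outweigh a tiny random match probability.

```python
def likelihood_ratio(rmp: float, fpp: float) -> float:
    """LR for a reported DNA match, allowing for laboratory false positives.

    Assumes P(reported match | suspect is the source) = 1, and that a
    non-source is reported as matching either by coincidence (rmp) or,
    failing that, through an error (fpp):
        P(reported match | not the source) = rmp + fpp * (1 - rmp)
    """
    p_match_if_not_source = rmp + fpp * (1 - rmp)
    return 1.0 / p_match_if_not_source


# Hypothetical inputs: a one-in-a-billion random match probability,
# combined with three different false positive rates.
rmp = 1e-9
for fpp in (0.0, 1e-6, 1e-3):
    print(f"false positive rate {fpp:g}: LR = {likelihood_ratio(rmp, fpp):,.0f}")
```

With these inputs the LR falls from about a billion with no lab error to about a thousand at a 1-in-1,000 false positive rate. At that point the error rate, not the rarity of the profile, limits how strong the evidence can be.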
  5. How is Bayesian statistics used in forensics and cold cases?
    • Bayesian statistics is used in forensics to infer hidden or uncertain biological information from patterns in DNA.
    • It can also be used to model high-throughput data, mainly in studies of DNA repair and of the kinetics of replication origin firing.
    • The kinetics of replication is the study of the rate, timing, and mechanism of how DNA is copied.
    • Once a model of an individual's DNA pattern is established, it can be used as prior information to infer further patterns. This allows a hypothesis about a disease or genetic status to be evaluated using family information.
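Using family information as a prior can be illustrated with a textbook carrier-risk calculation of the kind covered in the genetic-counseling literature. This is a hypothetical example and is not taken from the project. A woman whose mother is a known carrier of an X-linked recessive condition starts with a prior carrier probability of 1/2. Each unaffected son is then evidence against her being a carrier. The sketch ignores new mutations and assumes that each son of a carrier is affected with probability 1/2. The function name is illustrative.

```python
from fractions import Fraction


def carrier_posterior(prior: Fraction, unaffected_sons: int) -> Fraction:
    """Posterior probability of being a carrier after observing unaffected sons.

    Hypothetical model: a carrier's son is unaffected with probability 1/2,
    a non-carrier's son is always unaffected (new mutations ignored).
    """
    like_carrier = Fraction(1, 2) ** unaffected_sons  # P(data | carrier)
    like_noncarrier = Fraction(1)                     # P(data | not carrier)
    numerator = like_carrier * prior
    return numerator / (numerator + like_noncarrier * (1 - prior))


for n in range(4):
    p = carrier_posterior(Fraction(1, 2), n)
    print(f"{n} unaffected sons: P(carrier) = {p} ({float(p):.3f})")
```

Each unaffected son lowers the probability. For example, two unaffected sons bring it from 1/2 down to 1/5.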
  6. Ethics and Privacy / Reliable Information
    ACCURACY AND RELIABILITY
    • “Apart from internal inspections and audits to assess compliance with the QMS and the ISO standards, the QMS and testing methods must be exposed to regular external peer review through accreditation” (QMS: quality management system)
    • DNA analysis using Bayesian Inference
    • Fingerprint Comparison and Identification
      • This is used to evaluate the likelihood that a fingerprint found at a crime scene matches a particular individual
    • Understanding Likelihood Ratios and Posterior Probabilities
      • The likelihood ratio (LR) is the probability of the evidence given one hypothesis divided by the probability of the evidence given an alternative hypothesis
      • The posterior probability is the probability of a hypothesis after the evidence has been taken into account
      • LR > 1: the evidence supports the hypothesis
      • LR = 1: the evidence is neutral
      • LR < 1: the evidence contradicts the hypothesis (it supports the alternative)
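The likelihood ratio is linked to the posterior probability through the odds form of Bayes’ theorem: posterior odds = LR × prior odds. The minimal sketch below uses a hypothetical function name and a made-up LR of 1,000. It applies that same LR with several different priors, which also shows why the choice of prior matters so much.

```python
def posterior_probability(prior: float, lr: float) -> float:
    """Update a prior probability with a likelihood ratio via the odds form."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Same evidence (hypothetical LR = 1,000), different starting beliefs.
lr = 1_000.0
for prior in (0.5, 0.1, 1 / 1_000, 1 / 50_000):
    print(f"prior {prior:.6f} -> posterior {posterior_probability(prior, lr):.4f}")
```

An LR of 1,000 turns a 50% prior into a posterior above 99%. The same LR turns a 1-in-50,000 prior into a posterior of only about 2%.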
    ETHICS AND PRIVACY

    • Familial searches can violate privacy rights
    • The privacy critique has several dimensions
      • “Concerns for those tested for the many family members who are caught in the larger crime response dragnet, and a distinct concern for the innocents whose lives might be destroyed by being under a cloud of suspicion”
    • Familial searches may harm people who are falsely accused
      • Because of social media, a false accusation may cause deep reputational damage

  7. Techniques and Technologies
    • Short Tandem Repeats (STR)
      • DNA sequences with repeating units of 2-6 base pairs
      • “They vary between individuals making them excellent for differentiation”
      • Use fluorescent dyes for automated detection and analysis
      • Able to handle small samples
      • Used in databases like the Combined DNA Index System (CODIS), which helps law enforcement identify suspects by comparing “crime scene DNA with database entries”
    • Restriction Fragment Length Polymorphism (RFLP)
      • One of the first DNA fingerprinting techniques
      • Uses enzymes to cut DNA into fragments of varying lengths, which are separated by gel electrophoresis
      • The resulting bands are compared between samples to determine genetic similarities and differences
      • Requires large amounts of high-quality DNA
    • Polymerase Chain Reaction (PCR)
      • Amplifies specific DNA sequences, generating millions of copies from a small starting sample
      • Crucial when dealing with small or degraded samples
      • Uses repeated cycles of heating and cooling to separate the DNA strands and copy the target sequence
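The "millions of copies" figure for PCR comes from repeated doubling. In an idealized reaction where every cycle copies every target molecule, n cycles turn one molecule into 2^n copies. Real reactions are less than 100% efficient. The `efficiency` parameter and the 90% value used below are therefore illustrative assumptions, not measured figures.

```python
def pcr_copies(start: int, cycles: int, efficiency: float = 1.0) -> float:
    """Approximate copy number after PCR.

    Each cycle multiplies the number of target molecules by (1 + efficiency);
    efficiency = 1.0 is the idealized case of perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start * (1.0 + efficiency) ** cycles


for cycles in (10, 20, 30):
    ideal = pcr_copies(1, cycles)
    realistic = pcr_copies(1, cycles, efficiency=0.9)  # hypothetical 90% efficiency
    print(f"{cycles} cycles: ideal {ideal:,.0f}, at 90% efficiency {realistic:,.0f}")
```

With perfect doubling, 20 cycles give about a million copies (2^20 = 1,048,576), and 30 cycles give over a billion.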

  8. Challenges and Errors
    • “DNA casework” is an important source of forensic evidence, and an error in the evidence can harm innocent suspects and lead to a failure to catch the right suspect.
    • These issues include human error, such as linking the wrong people to the crime, as well as violations of privacy rights and an increase in racial disparities
    • People and their families can be wrongly accused
    • Some downfalls of Bayesian statistics include:
      • Misunderstanding the prior probability
      • Misinterpreting the likelihood ratio
      • Failing to account for uncertainty
      • Computational complexity
        • Many Bayesian models require complex numerical methods such as MCMC (Markov chain Monte Carlo), which can be “computationally expensive”
      • Choice of prior [P(A)]
        • The prior you choose can greatly affect the results. Picking a good prior is hard when you have little prior knowledge about the problem
      • Model selection
        • It can be difficult to determine the best model
        • Bayesian methods require assumptions about both the model and the prior
    • Sample contamination, faulty preparation procedures, and mistakes in interpreting results
    • Whether the specimen came from a living or dead person

  9. AI Involvement
    • AI is used to improve haplogroup classification
      • “Haplogroups are groups of individuals who share a common ancestor through their maternal or paternal lineage. Haplogroup classification is used to help determine the ancestral origins of unidentified remains or individuals in criminal investigations.”
      • AI makes the classification process more efficient and accurate
      • Previously, people had to analyze the genetic data manually, which took a lot of time and effort
    • AI can be very useful in creating automated DNA profile interpretation systems
      • “ML algorithms are used to analyze DNA profiles generated by STR typing” (ML: machine learning)
      • These systems analyze vast amounts of data and identify patterns that humans may not easily recognize
    • Bayesian AI: the critic, logical, careful, and grounded
      • “It doesn’t just ask what is possible but how sure we are that it’s true”

  10. Visualization and Simulations
    Situation:
    • A crime occurs, and DNA evidence is found near the scene (for example, on a knife or gun).
    • DNA can be recovered because skin cells contain nuclear DNA (the genetic blueprint found inside the nucleus of eukaryotic cells), which can be used for DNA fingerprinting.
    • The DNA on the knife could belong to the victim, the suspect, or someone else in the area, so this piece of evidence alone does not prove who committed the crime.

    DNA fingerprinting:
    • The lab examines the evidence and analyzes STR (Short Tandem Repeat) markers to produce a DNA fingerprint.
    • DNA fingerprinting looks at only a small number of regions within the entire genome, so rare coincidental matches can still occur.
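One common way to estimate how rare an STR profile is uses the product rule. At each locus, the genotype frequency is estimated from allele frequencies, assuming Hardy-Weinberg equilibrium. That frequency is 2pq when the two alleles differ and p² when they are the same. The per-locus values are then multiplied together, which assumes the loci are independent. The locus names and allele frequencies below are made-up placeholders, not real population data. Real casework also adds corrections, for example for population substructure, that this sketch leaves out.

```python
from math import prod

# Hypothetical allele frequencies for three made-up STR loci (not real data).
ALLELE_FREQS = {
    "LOCUS_A": {"12": 0.10, "13": 0.25, "14": 0.30},
    "LOCUS_B": {"8": 0.05, "9": 0.20, "10": 0.40},
    "LOCUS_C": {"15": 0.15, "16": 0.35},
}


def genotype_frequency(freqs: dict[str, float], a1: str, a2: str) -> float:
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq otherwise."""
    p, q = freqs[a1], freqs[a2]
    return p * p if a1 == a2 else 2 * p * q


def random_match_probability(profile: dict[str, tuple[str, str]]) -> float:
    """Product rule: multiply per-locus genotype frequencies (loci assumed independent)."""
    return prod(
        genotype_frequency(ALLELE_FREQS[locus], a1, a2)
        for locus, (a1, a2) in profile.items()
    )


crime_scene_profile = {"LOCUS_A": ("12", "14"), "LOCUS_B": ("8", "8"), "LOCUS_C": ("15", "16")}
rmp = random_match_probability(crime_scene_profile)
print(f"random match probability ~ {rmp:.2e} (about 1 in {1 / rmp:,.0f})")
```

Each added locus makes the product smaller, which is why profiles built from many STR loci are rarely shared by chance. A rare profile is still not the same as an error-free match.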

    DNA database searches:
    • The DNA fingerprint from the crime scene is compared with a large database of people’s DNA profiles. In this example, the total population in the database is 50,000 people, and the false positive rate is 1 in 1,000.
    • Even if only one person committed the crime, testing many people means that, although each innocent person has only a small chance of matching, someone else could also match.
    • Suppose the search finds one person whose DNA matches the crime scene sample. This does not mean it was 100% them, because DNA could have been transferred accidentally or the test could have involved human error.
    • Bayes’ theorem gives the probability that the matched person is guilty:

    $$P(\text{Guilty} \mid \text{Match}) = \frac{P(\text{Match} \mid \text{Guilty}) \times P(\text{Guilty})}{P(\text{Match})}$$

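    • P(Guilty | Match) = the probability that the person is guilty (the true source of the DNA) given the DNA match
    • P(Match | Guilty) = the probability of a match if the person really is the source (taken as 1 here)
    • P(Guilty) = the probability that the person was guilty BEFORE DNA testing (the prior)
    • P(Match) = the overall probability that a person from the population matches, either as the true source or through a false positive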

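    a) True match: assuming each of the 50,000 people is equally likely to be the source before testing, P(Guilty) = 1/50,000, and the numerator of the formula is

    $$P(\text{Match} \mid \text{Guilty}) \times P(\text{Guilty}) = 1 \times \frac{1}{50{,}000} = \frac{1}{50{,}000}$$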

    b) False matches: the other 49,999 people in the population are innocent.
    • The false positive rate is 1 in 1,000, so about 49,999 × 1/1,000 ≈ 50 innocent people are expected to match.
    • That gives 1 true match and about 50 false matches, for about 51 matches in total, so P(Match) ≈ 51/50,000.
    • Putting these values into the formula gives the probability of guilt given a match:

    $$P(\text{Guilty} \mid \text{Match}) = \frac{1 \times \frac{1}{50{,}000}}{\frac{51}{50{,}000}} = \frac{1}{51} \approx 0.0196 \approx 2\%$$
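The calculation in parts a) and b) can be reproduced in a few lines of Python. This sketch uses exact fractions and keeps the unrounded figure of 49.999 expected false matches instead of rounding to 50. It therefore gives 1000/50,999 rather than 1/51, and both come to about 0.0196. The variable names are our own. Like the example, the sketch assumes three things: the true source is in the database, everyone is equally likely to be the source beforehand, and the true source always matches. The last line additionally assumes that each innocent person's false match happens independently.

```python
from fractions import Fraction

population = 50_000                     # people in the database
false_positive_rate = Fraction(1, 1_000)

p_guilty = Fraction(1, population)      # prior: each person equally likely
p_match_given_guilty = Fraction(1)      # the true source always matches
p_match_given_innocent = false_positive_rate

# Law of total probability: a match comes from the true source or a false positive.
p_match = p_match_given_guilty * p_guilty + p_match_given_innocent * (1 - p_guilty)

# Bayes' theorem.
p_guilty_given_match = p_match_given_guilty * p_guilty / p_match

expected_false_matches = (population - 1) * false_positive_rate
# Chance that at least one innocent person matches (independence assumed).
p_any_false_match = 1 - (1 - float(false_positive_rate)) ** (population - 1)

print(f"expected false matches: {float(expected_false_matches):.3f}")
print(f"P(Guilty | Match) = {p_guilty_given_match} ~ {float(p_guilty_given_match):.4f}")
print(f"P(at least one false match) ~ {p_any_false_match:.6f}")
```

The last line shows why database size matters. With 49,999 innocent profiles and a 1-in-1,000 error rate, at least one false match is almost certain.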

    Bayes’ theorem was used to determine the probability that the DNA match truly came from the suspect. When false positives and database size were taken into account, the probability that the matched person was the true source of the DNA was only about 2%, much lower than expected. This demonstrates why Bayesian statistics are important when interpreting DNA fingerprinting evidence.

  11. Pros and Cons
    • PROS:
      • Prior knowledge can be built into the analysis
      • Results are stated directly as probabilities
      • Works with smaller sample sizes
      • Beliefs can be updated iteratively as new evidence arrives
    • CONS:
      • Subjectivity (results depend on the chosen prior)
      • Complexity
      • Limited popularity

Data

Image

In the Bayesian approach, the new estimate lies between the old information (the prior) and the new data. The image compares frequentist statistics and Bayesian statistics.

This will help with the next graph
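The idea that the Bayesian estimate lands between the old information and the new data can be checked with a conjugate Beta-Binomial model. This is a standard textbook example and is not necessarily the model behind the image. The prior, Beta(2, 2), and the data, 8 successes in 10 trials, are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction


def beta_binomial_update(alpha: int, beta: int, successes: int, trials: int) -> tuple[int, int]:
    """Return the posterior Beta(alpha', beta') after observing binomial data."""
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha: int, beta: int) -> Fraction:
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


prior_alpha, prior_beta = 2, 2          # hypothetical prior belief centred on 0.5
successes, trials = 8, 10               # hypothetical new data

post_alpha, post_beta = beta_binomial_update(prior_alpha, prior_beta, successes, trials)
prior_mean = beta_mean(prior_alpha, prior_beta)
data_estimate = Fraction(successes, trials)   # frequentist estimate from the data alone
posterior_mean = beta_mean(post_alpha, post_beta)

print(f"prior mean {float(prior_mean):.3f}, data {float(data_estimate):.3f}, "
      f"posterior mean {float(posterior_mean):.3f}")
```

The posterior mean (5/7, about 0.714) sits between the prior mean (0.5) and the data estimate (0.8). More data pulls it toward the data estimate, while a stronger prior (larger alpha and beta) keeps it closer to the prior.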

Conclusion

In conclusion, our project demonstrates how Bayesian statistics can be used with DNA fingerprinting to provide a framework for calculating the strength of the evidence collected in a case. Bayes’ theorem can change how people view DNA evidence: using probability, it shows the gap between the common assumption that a matched person is guilty and the actual probability that the person is guilty. This could help reduce the number of cases wrongly brought against innocent people because of inaccurate data collection methods.

Citations

Works Cited

“Advancements in DNA Fingerprinting Techniques and Their Applications.” BiologyInsights, 21 Oct. 2024, biologyinsights.com/advancements-in-dna-fingerprinting-techniques-and-their-applications/.
Ahmed, Aziza. “Ethical Concerns of DNA Databases Used for Crime Control - Petrie-Flom Center.” Petrie-Flom Center - the Blog of the Petrie-Flom Center at Harvard Law School, 14 Jan. 2019, petrieflom.law.harvard.edu/2019/01/14/ethical-concerns-of-dna-databases-used-for-crime-control/.
Alton, Paddy. “An Introduction to Bayesian Statistics.” Medium, ITNEXT, 6 June 2023, itnext.io/an-introduction-to-bayesian-statistics-fc1fbdadab80. Accessed 4 Mar. 2026.
Anil. “When AI Learns to Doubt: How Bayesian Reasoning Is Making Generative AI Smarter and Safer.” Medium, 11 Oct. 2025, medium.com/@shuklaks/when-ai-learns-to-doubt-how-bayesian-reasoning-is-making-generative-ai-smarter-and-safer-a548c95d55f6. Accessed 4 Mar. 2026.
“Bayesian Analysis and Risk Assessment in Genetic Counseling and Testing.” The Journal of Molecular Diagnostics, vol. 6, no. 1, 2004, pp. 1–9, https://doi.org/10.1016/S1525-1578(10)60484-9.
Chadwick, Lisa. “DNA Fingerprinting.” Genome.gov, National Human Genome Research Institute, 2019, www.genome.gov/genetics-glossary/DNA-Fingerprinting.
Coble, Michael D., and Jo-Anne Bright. “Probabilistic Genotyping Software: An Overview.” Forensic Science International: Genetics, vol. 38, Jan. 2019, pp. 219–24, https://doi.org/10.1016/j.fsigen.2018.11.009.
Editor Editor. “Pros and Cons of Bayesian and Frequentist Statistics in Biomedical Research | Editage.” Educational Articles for Researchers, Students and Authors - Editage Blog, 10 Aug. 2023, www.editage.com/blog/pros-and-cons-of-bayesian-and-frequentist-statistics/.
Finger. “Identity, Conceptual Image. Finger Print and DNA (Deoxyribonucleic Acid) Scan Illustrating Revealing Physical Evidence of Human Identity Stock Illustration | Adobe Stock.” Adobe Stock, 2026, stock.adobe.com/ca/images/identity-conceptual-image-finger-print-and-dna-deoxyribonucleic-acid-scan-illustrating-revealing-physical-evidence-of-human-identity/944624598. Accessed 4 Mar. 2026.
Forterre, Patrick, et al. “Origin and Evolution of DNA and DNA Replication Machineries.” Nih.gov, Landes Bioscience, 2013, www.ncbi.nlm.nih.gov/books/NBK6360/.
Gulati, Jayita. “A Complete Guide to Bayesian Statistics.” Statology, 11 June 2025, www.statology.org/a-complete-guide-to-bayesian-statistics/.
Helmenstine, Anne. “Purines and Pyrimidines.” Science Notes and Projects, 16 Sept. 2023, sciencenotes.org/purines-and-pyrimidines/.
Lee, Sarah. “Bayesian Inference in Forensic Biology.” Numberanalytics.com, 2025, www.numberanalytics.com/blog/bayesian-inference-forensic-biology-ultimate-guide. Accessed 5 Mar. 2026.
Optimonk.com, 2026, cdn-sales.optimonk.com/wp-content/uploads/bayesian-statistics-vs-frequentiest-statistics.png. Accessed 4 Mar. 2026.
Perlin, Mark William. “Inclusion Probability for DNA Mixtures Is a Subjective One-Sided Match Statistic Unrelated to Identification Information.” Journal of Pathology Informatics, vol. 6, no. 1, 2015, p. 59, https://doi.org/10.4103/2153-3539.168525.
“Pierre Simon de Laplace.” Usu.edu, 2025, www.usu.edu/math/schneit/StatsHistory/Probabilists/Laplace.
“Polynucleotide - an Overview | ScienceDirect Topics.” ScienceDirect, www.sciencedirect.com/topics/medicine-and-dentistry/polynucleotide.
Saad, Rana. “Discovery, Development, and Current Applications of DNA Identity Testing.” Baylor University Medical Center Proceedings, vol. 18, no. 2, 2005, pp. 130–33, https://doi.org/10.1080/08998280.2005.11928051.
Schoot, Rens van de, et al. “Bayesian Statistics and Modelling.” Nature Reviews Methods Primers, vol. 1, no. 1, 2021, pp. 1–26, https://doi.org/10.1038/s43586-020-00001-2.
Sessa, Francesco, et al. “Artificial Intelligence and Forensic Genetics: Current Applications and Future Perspectives.” Applied Sciences (Basel), vol. 14, no. 5, 4 Mar. 2024, p. 2113, https://doi.org/10.3390/app14052113.
Smith, J. H., and M. Singh. “Forensic DNA Profiling: Legal and Ethical Considerations.” Journal of Scientific Research and Reports, vol. 30, no. 5, 15 Mar. 2024, pp. 141–144, https://doi.org/10.9734/jsrr/2024/v30i51929.
Team, The BioPharma. “Bayesian Methods in Clinical Trials - BioPharma Services.” BioPharma Services, 27 Nov. 2024, biopharmaservices.com/blog/phase-1-using-bayesian-statistical-methods-in-clinical-trials-across-different-phases/.
“Thomas Bayes | English Theologian and Mathematician.” Encyclopædia Britannica, 2019, www.britannica.com/biography/Thomas-Bayes.
Thompson, William C., et al. “How the Probability of a False Positive Affects the Value of DNA Evidence.” Journal of Forensic Sciences, vol. 48, no. 1, 2003, pp. 47–54, pubmed.ncbi.nlm.nih.gov/12570198/.
University of Leicester. “Biography of Professor Sir Alec Jeffreys | DNA Fingerprinting | University of Leicester.” Le.ac.uk, le.ac.uk/dna-fingerprinting/biography.
University of Leicester. “DNA Fingerprinting | University of Leicester.” Le.ac.uk, le.ac.uk/dna-fingerprinting.
“What Are Some Concerns about the Use of DNA Fingerprinting? | Britannica.” Encyclopædia Britannica, www.britannica.com/question/What-are-some-concerns-about-the-use-of-DNA-fingerprinting.

Acknowledgement

We would like to thank all our friends and family, as well as our teachers, for helping and supporting us with our project. Thanks to Mr. Manias for reminding us and supporting us when we fell behind. Thanks to our family for encouraging us when we needed help.