How Good Is AI at Catching Phishing Emails?
Sohaib Ehtisham
Horizon Leadership Academy
Grade 8
Presentation
Hypothesis
I believe that ChatGPT will give more efficient answers than Gemini, Claude, and Llama because it is used more. It is the more popular AI agent, and more people know of ChatGPT than of the others, as it is considered to be “more advanced.”
Research
Cloudflare https://www.cloudflare.com/learning/access-management/phishing-attack/
(An image of an example phishing email was taken from this website.)
Wikipedia https://en.wikipedia.org/wiki/Phishing#
Cyber criminals might use urgent or threatening language or make false promises in phishing messages to tempt you to click on the links inside them. They may also send just enough information to make you curious about a link, or make you feel you’re missing out on something. These links contain malicious content that can risk your cyber security. They can include the following threats:
- Downloading malware: clicking a link in a phishing message can download malware onto your device. Malware is harmful software that can damage your device, spy on you or steal your information.
- Stealing information: links in phishing messages often lead to spoofed websites, which are disguised to look like legitimate websites you might recognize, like your bank or social media. These sites might ask you to log in or enter personal or sensitive information for scammers to steal.
- Losing money: clicking on a phishing link can cause you to lose money. Cyber criminals can use your personal information to make purchases or steal your credit card information.
Urgent or threatening language
Real emergencies don’t happen over email.
Look out for:
- Pressure to respond quickly
- Threats of closing your account or taking legal action
Requests for sensitive information
Anyone asking for personal information over email or text probably shouldn’t be trusted with it, anyway.
Look out for:
- Links directing you to login pages
- Requests to update your account information
- Demands for your financial information, even from your bank
Anything too good to be true
Winning a lottery is unlikely. Winning a lottery you didn’t enter is impossible!
Look out for:
- Winnings from contests you’ve never entered
- Prizes you have to pay to receive
- Inheritance from long-lost relatives
Unexpected emails
Expect the unexpected, and then send it right to the trash.
Look out for:
- Receipts for items you didn’t purchase
- Updates on deliveries for things you didn’t order
Information mismatches
Searching for clues in phishing email puts your love of true crime podcasts to good use.
Look out for:
- Incorrect (but maybe similar) sender email addresses
- Links that don’t go to official websites
- Spelling or grammar errors, beyond the odd typo, that a legitimate organization wouldn’t miss
Suspicious attachments
Attachments might seem like gifts for your inbox. But just like real gifts, they’re not always good…
Look out for:
- Attachments you didn’t ask for
- Weird file names
- Uncommon file types
Unprofessional design
For some reason, hiring a graphic designer isn’t on a cyber criminal’s priority list.
Look out for:
- Incorrect or blurry logos
- Company emails with little, poor or no formatting
- Image-only emails (no highlightable text)
Government of Canada https://www.getcybersafe.gc.ca/en/blogs/why-you-shouldnt-click-links-suspicious-messages https://www.getcybersafe.gc.ca/en/resources/7-red-flags-phishing
Variables
Independent Variable: The independent variable is the one that I intentionally change, so that would be the AI I give the emails to.
Dependent Variable: The dependent variable is the one that is recorded and depends on the independent variable, so that would be the response the AI gives me.
Controlled Variable: The controlled variable is the one that always stays the same, so that would be the training given to the AIs.
Procedure
- Create a spreadsheet where you will insert your data
- Add 6 columns and 22 rows
- Leave the first 2 columns blank and label the others with the names of the different AIs you will be testing
- Leave the first 2 rows blank and label the next ten rows attempt 1, attempt 2, attempt 3, and so on; label the last ten rows the same way
- Color each row either red or green: red for an attempt where you give the AIs a phishing email, and green for an attempt where you don’t. Both groups of attempts must follow the same color pattern; for example, attempt 1 must be the same color in both groups
- Note that the 1st and 2nd groups of attempts use the same emails: in the first group you give the email to the AI with no training, and in the second group you give the AI training before the email
- Now that you have completed your data table, open all four AIs: ChatGPT, Gemini, Claude, and Llama
- Collect 10 phishing emails and 10 emails that look like phishing but give hints that they are not
- Go through each row of the first attempt group, giving every AI the same email with no training and recording the percentage each AI gives you
- For the second group, you may choose the type of training you give the AIs. The training I gave in my experiment was the 7 red flags of phishing method: tell the AIs to use only the red flag method on all the emails you already gave them, then record the data
- After that, you can determine how accurate each AI was, or which one can be trusted most to give you an accurate percentage
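The last step of the procedure can be sketched in a few lines of Python. This is only one possible way to score the table, not the method used in the experiment: the function name `accuracy`, the 50% cut-off, and the sample numbers are all assumptions made for illustration, and the percentages are read as how likely the AI thinks the email is phishing.

```python
# Hypothetical helper: the 50% cut-off is an assumption, not part of the procedure.
def accuracy(percentages, is_phishing, threshold=50.0):
    """Return the fraction of emails that one AI classified correctly.

    percentages: the AI's reported phishing likelihood (0-100) for each attempt.
    is_phishing: True for red rows (phishing), False for green rows (not phishing).
    """
    if len(percentages) != len(is_phishing):
        raise ValueError("each percentage needs a matching label")
    if not percentages:
        raise ValueError("at least one attempt is required")
    correct = 0
    for pct, label in zip(percentages, is_phishing):
        predicted_phishing = pct >= threshold  # at or above the cut-off counts as "phishing"
        if predicted_phishing == label:
            correct += 1
    return correct / len(percentages)


# Placeholder values for illustration only; these are NOT results from the experiment.
example_percentages = [90.0, 10.0, 60.0, 40.0]
example_labels = [True, False, False, True]
print(accuracy(example_percentages, example_labels))  # prints 0.5
```

Running the function once per AI column, and once per attempt group, would give one accuracy number for each AI with and without training.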
Observations
During my experiment, I noticed many spikes and drops in the percentages across the 4 AIs. For example, in the trained responses for attempt 10, ChatGPT gave 35-45%, while Gemini gave 100%, Claude 71%, and Llama 86%. Because the differences between the AIs were so drastic, I have labeled each AI based on its characteristics in terms of finding phishing emails.
- ChatGPT: Most unpredictable in responses + most inaccurate
- Gemini: Most predictable in responses + most accurate
- Claude: Most consistent between trained and untrained responses + average
- Llama: Most used for factors + average
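The size of a spike or drop can be put into a single number by subtracting the lowest response from the highest. The sketch below does this for the trained responses in attempt 10. The helper name `to_percent` is hypothetical, and treating a range such as 35-45% as its midpoint is an assumption, not something decided in the experiment.

```python
def to_percent(response):
    """Convert a response such as "71%" or "35-45%" into a single number.

    Using the midpoint of a range is an assumption made for this sketch.
    """
    text = response.strip().rstrip("%")
    if "-" in text:
        low, high = text.split("-")
        return (float(low) + float(high)) / 2
    return float(text)


# Trained responses for attempt 10, as recorded in the observations.
attempt_10 = {"ChatGPT": "35-45%", "Gemini": "100%", "Claude": "71%", "Llama": "86%"}

values = {name: to_percent(response) for name, response in attempt_10.items()}
spread = max(values.values()) - min(values.values())
print(values)
print(spread)  # prints 60.0
```

A large spread means the AIs disagreed strongly about the same email.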
Analysis
Through my observations, I have found that my hypothesis was incorrect. Even though ChatGPT is the most popular AI, Gemini was the most accurate. ChatGPT was very unpredictable, while Gemini gave responses such as 99.9% and 100%, and in the case of real emails, 5% or even 0%.
Conclusion
Throughout this project, I have been trying to answer the question the entire project revolves around: how good is AI at catching phishing emails? I combined the main results to find the most reliable AI of the four, and I found that Gemini was the most accurate across every email given to it. This means that my hypothesis was not supported by the experiment. My data shows that ChatGPT, despite being the most popular AI, was not the strongest or most accurate at catching phishing emails or at spotting normal ones. The data table shows the percentage each AI gave for each email. Looking at these responses, we can see how the different AIs judge the same email, which also helps in understanding the topic of the project. If I were to expand this project in the future, I would add more AIs to the experiment, maybe including different versions of the same AI, and I would add more attempts. I would also make the training I gave the AIs more detailed because, as the data shows, some of the AIs improved with the training, some did not, and some performed worse.
Application
This experiment’s results can be applied by anyone around the globe, because they show which AIs performed better than others at catching phishing emails in this test. The same kind of comparison could also be used to test AIs on other tasks.
Sources Of Error
In this particular experiment, the main source of error is the mode or model each AI is set to. For example, an AI might be in a fast, less detailed mode when you want a slow, detailed one, and each AI also offers different models. Either of these could affect the responses you get. You should therefore check which mode and model each AI is using before you start, so that the experiment runs the way you intend.
Citations
Cloudflare https://www.cloudflare.com/learning/access-management/phishing-attack/
Wikipedia https://en.wikipedia.org/wiki/Phishing#
Government of Canada https://www.getcybersafe.gc.ca/en/blogs/why-you-shouldnt-click-links-suspicious-messages https://www.getcybersafe.gc.ca/en/resources/7-red-flags-phishing
Acknowledgement
I acknowledge and appreciate the people who have supported me throughout this project:
- My science teacher - Ms. Sara Behairy
- My family
- Shaaf Babar and his father - Babar Shehzad
