CYSF

Presentation

Problem

Question

Can a garbage classifier positively affect the environment by ensuring that waste gets sorted properly?

Hypothesis

If I make a web-based garbage classifier, it will have a positive impact on the waste problems our planet is facing. People often put trash in the wrong bin, so waste that could be recycled or composted ends up in landfills. This harms the environment and makes the waste difficult to sort at the final stage. Making sure our waste is disposed of properly would decrease the amount of waste in landfills.

Why this is a problem

Trash today

In Canada, too many people put trash in the wrong bin, either out of laziness or because they do not know which bin it belongs in. A survey in the UK found that 84% of households there put things in the wrong bin. In Canada, this kind of contamination is costing recycling programs millions of dollars. One in three pounds (0.45 kg in 1.36 kg) of what is put in the recycling bin shouldn't be there. Canadian cities with very dirty recycling, such as Edmonton and Toronto, can have contamination rates of about 25 percent. Processing contaminated material as recycling is very expensive; it can cost a city up to $4 million. All this has become a big issue because China, the biggest importer of recyclables, banned imports of 24 different types of waste to prevent environmental disasters in the country. One of the banned items was paper, and this became a problem for the rest of the world. "Something as simple as a piece of paper with a coffee stain on it, that piece of paper a year ago would have been recyclable. Today that's actually garbage." These are the words of Jim McKay, describing the effect of China's import ban.

Trash Segregation problems today

The main reason I am doing this project is the problem of trash segregation. Dangerous items like needles, dead animals, and bear spray are put in the recycling. It costs Canada millions of dollars to decontaminate and segregate this material, and much of that money is wasted because the contaminated material ends up in the landfill either way. Calgary's contamination rate for residential recycling is 13%, and the only Canadian cities with a higher contamination rate are Edmonton, Toronto, Halifax, and Fredericton. Toronto’s contamination rate is the highest at 26% because trash segregation is the worst there. All of this happens because people put the wrong things in the recycling, and sorting and decontaminating those items costs a lot more than we would expect. In Toronto alone, it costs about $600,000 to $1 million per year. This is why better trash segregation is needed; otherwise, Canada is going to keep wasting money on sorting out contamination.

Method

Before I describe the method, we first need to understand what Python and Streamlit are.

Python is a programming language. It is very powerful and already has a lot of functions built into it. I will be using it to write the code that tells my classifier what to do and when to do it. I will also use classification, where the program decides which category a piece of trash belongs in. Streamlit is a Python framework that delivers data apps with a few lines of code. It provides what shows on the front end, which is controlled by Python code in the back end. Streamlit is mainly used to create data science and machine learning apps. It is a Python library that lets apps be built quickly without knowledge of web development.

Method

Importing TensorFlow in VS Code

What is tensorflow?

TensorFlow is an open-source platform that software developers use heavily. It is a mature deep learning framework that can be used to develop advanced models. It can be used to train models on datasets so that items are easy to classify. TensorFlow is built for machine learning, and a model's accuracy increases as it is trained. The framework takes in data as multi-dimensional arrays (arrays with more than one dimension) called tensors. It is used for many tasks, such as image recognition, language processing, handwriting recognition, and solving certain equations.
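As a small illustration of what a tensor is, the sketch below stores a tiny colour image as nested Python lists and works out its shape. The helper name tensor_shape and the pixel values are made up for this example, and it assumes every nested list is non-empty and regular. TensorFlow itself stores tensors far more efficiently than this.

```python
def tensor_shape(data):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]  # step one level deeper into the nesting
    return tuple(shape)


# A hypothetical 2 x 2 image with 3 colour channels (red, green, blue) per pixel.
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

print(tensor_shape(tiny_image))  # (2, 2, 3): height, width, channels
```

A full-size photo works the same way, only with much larger height and width.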

Data Importing

I used machine learning in the Python code for my project. I used PIL (Python Imaging Library), a library for opening and working with my images. TensorFlow and Keras are also used. Keras works with TensorFlow and simplifies the work of building deep neural networks.

Data Visualization

This shows how the data is visualized. The code lists all the image files in a directory and calculates the number of rows and columns needed to fit the images in a grid. It loops over the images and puts each one in the correct subplot. The code hides the axes of each subplot for a cleaner display. The result is a final grid that is organized for a good display.
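The grid calculation described above can be sketched roughly as follows. This is not the project's actual code: the function names and the choice of 4 columns are assumptions made for illustration, and the plotting itself is left out so that the arithmetic is easy to follow.

```python
import math


def grid_shape(num_images, cols=4):
    """Return (rows, cols) for a grid large enough to hold num_images."""
    rows = math.ceil(num_images / cols)  # round up so the last images still fit
    return rows, cols


def grid_position(index, cols=4):
    """Return the (row, column) of the subplot for image number index (from 0)."""
    return divmod(index, cols)


rows, cols = grid_shape(10)
print(rows, cols)        # 3 4
print(grid_position(9))  # (2, 1)
```

With 10 images and 4 columns, the last row is only partly filled, which is why the row count is rounded up.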

Preparing the data (Transforming raw data to be read accurately when analyzed)

This prepares the data by rescaling it and grouping it into batches of the right size. "ImageDataGenerator" is a Keras class that handles image data for training deep learning models. The batch size sets how many images are processed at a time during training, and rescaling puts the images in the form needed for training. The "class_indices" attribute maps each class name to a number. All of this is for the train directory; the same code is repeated for the test directory.

This is what the labels look like after using "class_indices". Every class is labeled with an integer, which is helpful when analyzing the model's predictions.
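The mapping from class names to integers can be imitated in a few lines of plain Python. This sketch assumes the labels follow the alphabetical order of the six category names, which is how Keras orders class folders by default; the actual numbers in the project may differ if the folders are named differently.

```python
# The six categories named in this project.
class_names = ["trash", "metal", "cardboard", "glass", "paper", "plastic"]

# Assumption: labels are assigned in alphabetical order of the names.
class_indices = {name: index for index, name in enumerate(sorted(class_names))}
print(class_indices)
# {'cardboard': 0, 'glass': 1, 'metal': 2, 'paper': 3, 'plastic': 4, 'trash': 5}

# Reverse lookup, useful for turning a predicted number back into a name.
index_to_class = {index: name for name, index in class_indices.items()}
print(index_to_class[3])  # paper
```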

Importing OneDNN

I am using oneDNN, an advanced deep learning library that optimizes the performance of deep neural network computations.

Model Creation

This code is a deep learning model that extracts features and details from the image data through convolution and pooling layers. It classifies the images into 6 different categories: trash, metal, cardboard, glass, paper, and plastic. It converts 3D feature maps into 1D vectors. The fully connected layers learn complex representations from the features extracted by the convolutional layers, which is useful when making a final prediction. Overfitting is when the model learns the training images too closely, picking up unnecessary details that do not help when it analyzes new data. To prevent overfitting, I used a layer called "Dropout", which randomly switches off some neurons during training so that the model does not rely on those unnecessary details.
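To show the idea behind dropout, here is a minimal sketch in plain Python. It is not the Keras layer itself: the function name, the example values, and the rate of 0.5 are assumptions for illustration. Each value is set to zero with probability rate, and the values that survive are scaled up by 1 / (1 - rate) so that the average stays about the same, which is also what Keras does during training.

```python
import random


def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * keep_scale
            for value in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
layer_output = [0.5, 1.2, 0.8, 2.0, 0.1, 0.9]  # made-up neuron outputs
print(dropout(layer_output, rate=0.5, rng=rng))
```

Dropout is only applied while training. When the model makes real predictions, all neurons are used.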

Model Compilation

The optimizer I used is called Adam. Adam is short for Adaptive Moment Estimation and is an advanced optimization algorithm. The "sparse_categorical_crossentropy" is a loss function used for multi-class classification with integer labels. The loss function calculates how well the predicted class probabilities match the actual class labels. "accuracy" is the metric I used to assess the model during the training and testing phases. "model.summary()" prints a summary of the model architecture, here it is:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 300, 300, 32)   │           896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 150, 150, 32)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 150, 150, 64)   │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 75, 75, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 75, 75, 32)     │        18,464 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_2 (MaxPooling2D)  │ (None, 37, 37, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 43808)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │     2,803,776 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 6)              │           198 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
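The numbers in the "Param #" column can be checked by hand. The sketch below assumes 3 x 3 convolution kernels and a 300 x 300 colour (3-channel) input. These values are not stated in the text, but they are consistent with every count in the summary. The helper names are made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights for every kernel position and input channel, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input for each unit, plus one bias per unit."""
    return (inputs + 1) * units


# Assumed: 3 x 3 kernels, RGB input with 3 channels.
print(conv2d_params(3, 3, 3, 32))    # 896     (conv2d)
print(conv2d_params(3, 3, 32, 64))   # 18496   (conv2d_1)
print(conv2d_params(3, 3, 64, 32))   # 18464   (conv2d_2)

flattened = 37 * 37 * 32             # last pooling output turned into a 1D vector
print(flattened)                     # 43808   (flatten)
print(dense_params(flattened, 64))   # 2803776 (dense)
print(dense_params(64, 32))          # 2080    (dense_1)
print(dense_params(32, 6))           # 198     (dense_2)
```

Pooling, flatten, and dropout layers have 0 parameters because they only reshape or filter values and do not learn any weights. Almost all of the parameters sit in the first dense layer, because it connects all 43,808 flattened values to 64 neurons.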

Train the Model (batch_size = 32, epochs = 10)

I have already prepared the data and created the model, so now I need to train it. An epoch is one complete pass through the data set, so 10 epochs means that the model is trained for 10 rounds. The "steps_per_epoch" is equal to 2184//32. This is floor division, which rounds down to a whole number: 2184 / 32 = 68.25, so the result is 68. This means that 68 batches are trained in one epoch, with 32 images in each batch. Right beside it is the accuracy and loss of each epoch.
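The batch arithmetic can be checked directly in Python. The numbers come from the text above; the variable names are chosen for this example only.

```python
train_images = 2184  # number of training images
batch_size = 32
epochs = 10

steps_per_epoch = train_images // batch_size  # floor division rounds down
leftover_images = train_images % batch_size   # images that do not fill a whole batch

print(train_images / batch_size)  # 68.25
print(steps_per_epoch)            # 68
print(leftover_images)            # 8
print(steps_per_epoch * epochs)   # 680 batches over the whole training run
```

Because floor division always rounds down, the 8 leftover images do not form a full batch in each epoch.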

Testing Predictions

Successfully classified as paper

Successfully classified as metal

Save the model

Saving the model makes sure that everything is stored and ready to work with.

Analysis

My Code (utils, same as method)

My code (for application)

These are some components of machine learning. To understand what they are, we first need to understand what machine learning is and what it does.

Machine learning is a branch of computer science that uses data and algorithms to enable AI. A machine learning model gradually learns from its mistakes and improves its accuracy. Based on input data, it finds patterns and produces a prediction. In my case, it uses an error function, which measures how far the model's predictions are from the correct labels of the pictures in my image library. Machine learning tries to imitate the way humans learn. It uses many example images to decide which category a piece of trash belongs to, much like how we work: we see the trash with our eyes and, using previous knowledge, put it in the correct bin. It also learns from mistaken attempts. When it makes a mistake during training, it adjusts itself to fix it, just like humans!

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays. It covers mathematical and logical operations, shape manipulation, sorting, selecting, basic linear algebra, basic statistical operations, etc. I also used urllib.request. This module defines functions that make it easier to open URLs; opening a URL is one of the 2 options for uploading an image into Streamlit. TensorFlow and Keras are also used. Keras works with TensorFlow and simplifies the work of building deep neural networks. TensorFlow and Keras help train my model.

This loads the integer labels into the application.

These are the links and information for the display of the application. They include the colour, font size, picture, title, font, etc.

This is the outcome of putting all of that information on the display. My title is at the top with my picture under it, each with the font, size, and colour I set.

This displays a drop-down box when clicked on. The options are "Upload image via link" and "Upload image from device".

This is what it looks like in the application.

This elif statement says that if a user chooses to upload an image from the device, then the file must be a jpg, png, or jpeg. If the user chooses to upload an image through a link, then the address (URL) of the image is required. This is where urllib.request comes into play. If the URL is not valid, then "Invalid image URL! Please enter a valid URL" is shown to the user.
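The two upload checks described above could look roughly like the sketch below. It leaves out the Streamlit widgets and only shows the checking logic. The function names, the 10-second timeout, and the constant names are assumptions made for illustration; only the allowed file types and the error message come from the project.

```python
import urllib.error
import urllib.request

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")
INVALID_URL_MESSAGE = "Invalid image URL! Please enter a valid URL"


def is_allowed_upload(filename):
    """Check that an uploaded file has one of the accepted image extensions."""
    return filename.lower().endswith(ALLOWED_EXTENSIONS)


def fetch_image_bytes(url, timeout=10):
    """Return (image_bytes, None) on success or (None, error_message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except (ValueError, urllib.error.URLError, OSError):
        # ValueError: the text is not a URL at all; URLError/OSError: it cannot be opened.
        return None, INVALID_URL_MESSAGE


print(is_allowed_upload("bottle.PNG"))       # True
print(is_allowed_upload("notes.txt"))        # False
print(fetch_image_bytes("not a valid url"))  # (None, 'Invalid image URL! Please enter a valid URL')
```

Returning the message instead of printing it inside the function lets the application decide how to show the error on the page.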

First, the image is displayed. Then, when the "Predict" button is clicked, the application processes the image and loads the model. The model makes a prediction, and the category with the highest percentage is the result it reports. If an error occurs during prediction, it prints "An error occurred during prediction" followed by the error.
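Picking the category with the highest percentage can be sketched in plain Python as follows. The probabilities are made up, the function name is hypothetical, and the label order is assumed to be alphabetical; the real application gets these values from the trained model.

```python
# Assumption: the model's six outputs are in alphabetical order of the class names.
LABELS = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]


def top_prediction(probabilities, labels=LABELS):
    """Return the most likely label and its probability as a percentage."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index] * 100


# Made-up model output for one image; the six values sum to 1.
example_output = [0.05, 0.10, 0.05, 0.60, 0.15, 0.05]
label, percent = top_prediction(example_output)
print(f"{label}: {percent:.1f}%")  # paper: 60.0%
```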

Recycling analysis today

Recycling stats

In Calgary, 80 percent of what's in the recycling bin is recyclable, and the other 20 percent is contamination. Everything that is put in the recycling goes to a facility to get sorted. It goes through many stages before finishing. 70 percent is sorted by machine and 30 percent is sorted manually. To remove metal, a large magnet picks up all the metal and removes it from the lines. Machines also sort out the other recyclables like glass, paper, and cardboard. Whatever the machines leave behind is then sorted manually, leaving only materials that aren't recyclable. There are some flaws, though: things like styrofoam, large scrap metal, or garbage can interfere with the machines and cause them to stop working. In Toronto, 70 percent of items in residential recycling bins are recyclable, while 30 percent are contamination. That is 10 percentage points more contamination than in Calgary.

What can and can’t be recycled

Plastic usually cannot be recycled, but if it is stretchy like grocery bags or bubble wrap, then it can! Containers made of plastic are also recyclable. Plastic can't be recycled if it's crinkly like a chip bag. Containers made of tin can be recycled, as well as tin foil and pop cans. Glass can also be recycled; in the facility, it is crushed and shipped to companies to make new products. Household items such as small appliances, trays, furniture, or lightbulbs can't be recycled. Toys and sports equipment cannot be recycled either, along with clothing or shoes. Styrofoam damages the machines in the facilities, so it cannot be recycled. Lastly, hazardous materials are not accepted, as they are expensive to decontaminate.

Analysis

I ran 10 rounds for each category to see the accuracy. Here are the results:

Paper 8/10
Plastic 9/10
Glass 8/10
Cardboard 9/10
Metal 8/10
Trash 7/10

Plastic and cardboard did the best. Cardboard looks very similar across most images, and the only time the model got it wrong, it confused cardboard with paper. This isn't the biggest deal, though, because both paper and cardboard are recyclable. Plastic items have similar shapes, especially plastic bottles. Paper got mixed up with cardboard a couple of times, glass got mistaken for plastic, and metal got confused with glass and plastic. Trash did the worst because it covers such a wide variety of items and I only have around 150 photos for it. Overall, the model scored 49/60, meaning that over 60 rounds it had an accuracy of 81.67%.
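The overall accuracy can be worked out from the per-category results listed above. The variable names are chosen for this example only.

```python
# Correct predictions out of 10 rounds for each category.
results = {"Paper": 8, "Plastic": 9, "Glass": 8, "Cardboard": 9, "Metal": 8, "Trash": 7}
rounds_per_category = 10

correct = sum(results.values())
total = rounds_per_category * len(results)
accuracy = 100 * correct / total

print(f"{correct}/{total} = {accuracy:.2f}%")  # 49/60 = 81.67%

# Accuracy for each category on its own.
for category, score in results.items():
    print(f"{category}: {100 * score / rounds_per_category:.0f}%")
```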

Extension

Before the CYSF fair, I will extend one thing that will affect this analysis: I will add more pictures to my dataset. This gives the model more images to learn from, so the accuracy should increase.

Conclusion

My hypothesis was supported: if expanded, my project would almost always correctly segregate trash at the source. This would reduce the cost of sorting later on and the amount of waste that reaches landfills. So this would indeed benefit the environment, human health, and the economy. Proper waste management can also reduce pollution, prevent the spread of diseases, and conserve natural resources.

Sources of Error

It wasn't completely accurate - I have many photos in my dataset, but there will always be new angles or orientations that I do not have, so the model is not 100% accurate.

I could have made a physical model - I made it web-based, meaning I need to use images from online or already saved images. It would be more useful if it could take a picture in real time and classify it.

Every time I needed to load my application, I had to type "streamlit run app.py". After that, it took a long time, approximately 2 minutes, to load, so it isn't very fast or efficient to work with.

Real World Applications

Would decrease the amount of unwanted trash in landfills - So many people put trash in the wrong bins. This project makes it so that even if people are lazy, everything will be segregated properly. With the cost of decontaminating recycling going so high (up to $4 million), I think that at this point we have to try something new. The problem is not only landfills: in countries with poverty and no access to proper trash disposal, waste ends up on the streets, so if this project is expanded, it must be accessible to all countries. It would make a huge difference in these people's lives.

Cost efficiency - Costs would go way down. If this project is expanded, very little trash would be segregated incorrectly, which means less decontamination and less money wasted. Even so, the cost would not disappear completely. There will still be some people who are too lazy to put trash in a machine that does the work for them, or who are not near a machine.

To show change - If this can help the world, people many years from now will see the change and realize how much damage and harm has been done to the planet, and the difference between then and now. Maybe then, people will become better people in general.

Relief to recycling facility workers who knew how bad things were - These workers are probably among the only people who know exactly how bad things were. This would give them relief, knowing that the world could be in better hands.

Citations

Avery, R. (2022, October 16). 84% of UK households are unintentionally contaminating their recycling bins. WRAP. https://www.wrap.ngo/media-centre/press-releases/84-uk-households-are-unintentionally-contaminating-their-recycling-bins

Chung, E. (2018, April 9). Many Canadians are recycling wrong, and it’s costing US Millions | CBC News. CBCnews. https://www.cbc.ca/news/science/recycling-contamination-1.4606893

Norman, H. (2018, April 11). Managing contaminated recyclables will cost Toronto Millions by end of year. The Globe and Mail. https://www.theglobeandmail.com/canada/toronto/article-changing-recycling-market-could-cost-toronto-92-million-this-year/#:~:text=Managing%20the%20increasing%20contamination%20will,to%20half%20of%20the%20cost.

Petricic, S. (2018, March 28). China looks to turn ban on foreign trash into growth opportunity | CBC news. CBCnews. https://www.cbc.ca/news/world/pollution-recycling-china-petricic-1.4593078

City of Calgary. (2025). Calgary Recycling Facts. https://www.calgary.ca. https://www.calgary.ca/waste/residential/recycling-facts.html#go

Zettler, M. (2019, March 27). Toronto recycling: Why so much material still goes to landfill - toronto. Global News. https://globalnews.ca/news/5099574/toronto-recycling-packaging-landfills/#:~:text=Each%20year%2C%20Toronto%20manages%20approximately,%2C%20and%20non%2Drecyclable%20materials.

City of Calgary. (2025a). What can go in your blue cart. https://www.calgary.ca. https://www.calgary.ca/waste/residential/what-can-go-in-blue-cart.html

City of Calgary. (2025b). What can’t go in your blue cart. https://www.calgary.ca. https://www.calgary.ca/waste/residential/what-cannot-go-in-blue-cart.html

Brown, S. (2021, April 21). Machine Learning, explained. MIT Sloan. https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained

IBM. (2025, January 24). What is machine learning (ML)?. IBM. https://www.ibm.com/topics/machine-learning

mhadhbi, N. (2024, September 29). Python tutorial: Streamlit. DataCamp. https://www.datacamp.com/tutorial/streamlit

NVIDIA. (2025). What is tensorflow?. NVIDIA Data Science Glossary. https://www.nvidia.com/en-au/glossary/tensorflow/#:~:text=Heavily%20used%20by%20data%20scientists,tensors

Python, U. (2025). Urllib.request - extensible library for opening urls. Python documentation. https://docs.python.org/3/library/urllib.request.html

Cchangcs. (2018, November 24). Garbage classification. Kaggle. https://www.kaggle.com/datasets/asdasdasasdas/garbage-classification

raison024, R. (2022). RAISON024/smart-garbage-segregation: Image classification for recycling refers to the use of machine learning algorithms to automatically classify images of waste materials, such as plastic, paper, and metal, into their respective categories. GitHub. https://github.com/raison024/Smart-Garbage-Segregation

NumPy, T. (2020). What is numpy?#. What is NumPy? - NumPy v1.26 Manual. https://numpy.org/doc/stable/user/whatisnumpy.html

Python, R. (2023, January 25). Image processing with the python pillow library. Real Python. https://realpython.com/image-processing-with-the-python-pillow-library/

Stratvert, K. (2021, March 25). 👩‍💻 python for beginners tutorial. YouTube. https://www.youtube.com/watch?v=b093aqAZiPU&t=1134s

Acknowledgement

I would like to thank my science fair teacher, Ms. Burkell, for looking through my project and giving me tips on how to make my project better. It would have been difficult to find my mistakes and errors if Ms. Burkell wasn't there to help me through the process. I would also like to give a big thanks to my parents for always being there and providing extra support. Finally, I would like to thank the CYSF for giving me this chance and allowing me to show my project to others.


Garbage Classifier

Anirudh Vijayan

Grade 9
