CHESSTER - The Chess-Playing Robot
Owen Skinner, Paavan Ahuja
Grade 9
Presentation
Problem
Can we feasibly build a proficient chess engine using a machine learning algorithm based on deep reinforcement learning?
Method
Our neural network will be built in Python using the Keras, TensorFlow, python-chess, and NumPy libraries. First, we will build the network using Deep Q-Learning with the following layers: an input layer, a 12x8x8 matrix of ones and zeros (one 8x8 plane for each of the 12 piece types, six per color, laid over the 8x8 grid), followed by 3 dense (fully interconnected) layers. We will train the network using the mean squared error (MSE) loss function and a dataset of chess positions; a sketch of the encoding and network follows below. To put together the physical board, we will place reed switches under an 8x8 board to detect the magnets in the pieces, and scan all of the squares to get a matrix of ones and zeros to feed to the neural network. We plan to have the network running on a Raspberry Pi 4.
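Below is a minimal sketch of the board encoding and network described above, assuming TensorFlow/Keras and python-chess are installed. The layer widths are illustrative assumptions, not the exact values from our final model.

    import chess
    import numpy as np
    from tensorflow import keras

    def board_to_planes(board):
        """Encode a python-chess board as a 12x8x8 matrix of ones and
        zeros: one 8x8 plane per (piece type, color) combination."""
        planes = np.zeros((12, 8, 8), dtype=np.float32)
        for square, piece in board.piece_map().items():
            plane = (piece.piece_type - 1) + (6 if piece.color == chess.BLACK else 0)
            planes[plane, square // 8, square % 8] = 1.0
        return planes

    # Deep Q-Learning value network: flatten the 12x8x8 input,
    # then three dense (fully interconnected) layers.
    model = keras.Sequential([
        keras.Input(shape=(12, 8, 8)),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(1),  # predicted value of the position
    ])
    model.compile(optimizer="adam", loss="mse")  # mean squared error loss

A similarly hedged sketch of scanning the reed-switch matrix on the Raspberry Pi; the GPIO pin numbers and wiring here are hypothetical, not our actual layout.

    import RPi.GPIO as GPIO

    ROW_PINS = [2, 3, 4, 17, 27, 22, 10, 9]    # one output pin per rank (hypothetical wiring)
    COL_PINS = [11, 5, 6, 13, 19, 26, 20, 21]  # one input pin per file (hypothetical wiring)

    GPIO.setmode(GPIO.BCM)
    for pin in ROW_PINS:
        GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)
    for pin in COL_PINS:
        GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

    def scan_board():
        """Drive each row high in turn and read the columns, producing
        an 8x8 matrix of ones (magnet present) and zeros (empty)."""
        occupancy = [[0] * 8 for _ in range(8)]
        for r, row_pin in enumerate(ROW_PINS):
            GPIO.output(row_pin, GPIO.HIGH)
            for c, col_pin in enumerate(COL_PINS):
                occupancy[r][c] = 1 if GPIO.input(col_pin) else 0
            GPIO.output(row_pin, GPIO.LOW)
        return occupancy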
Analysis
From the data collected while developing the network, the more we trained, the better the model performed. The network still does not play at a high level, but we are happy with the result. From the data gathered, we can tell that over time both the training loss (the error between the Stockfish evaluation and our bot's evaluation) and the overall skill (measured using the Elo rating system) improved. We found the most success using a convolutional neural network; a sketch of that variant follows below. Convolutional networks are good at recognizing spatial patterns, which applies well to a chess board, where piece structures repeat across the grid.
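Below is a minimal sketch of the convolutional variant that worked best for us, with illustrative filter counts and kernel sizes (not our exact hyperparameters). Keras Conv2D expects channels-last input by default, so the 12x8x8 planes are assumed to be transposed to 8x8x12 first (e.g. np.transpose(planes, (1, 2, 0))).

    from tensorflow import keras

    cnn = keras.Sequential([
        keras.Input(shape=(8, 8, 12)),  # 8x8 board, 12 piece planes (channels-last)
        keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(1),  # predicted evaluation, trained toward Stockfish's
    ])
    cnn.compile(optimizer="adam", loss="mse")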
Ultimately, this project is aimed at researching and applying Deep Reinforcement Learning (DRL) and its components, such as the Markov Decision Process (MDP), function approximation, and DRL algorithms. This study shows that DRL algorithms, specifically Deep Q-Learning (DQN), are viable and effective methods for building a proficient chess engine. By applying the MDP framework and using the Bellman equation (sketched below), the agent learns to optimize its moves based on long-term rewards. This study also highlights the advantages of DRL, such as reducing human intervention and handling complex decision-making processes, while acknowledging its challenges and limitations, like long training times, high computational costs, and difficulty of debugging. The use of function approximation through neural networks enables the agent to manage the large state and action spaces inherent in chess. Future research into improving the efficiency and optimization of DRL algorithms could unlock even more advanced and adaptive AI systems.
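The Bellman equation mentioned above gives the target value the network is trained toward. A minimal sketch follows, with illustrative names (GAMMA, next_state_values, and done are assumptions for this example, not code from our project).

    import numpy as np

    GAMMA = 0.99  # discount factor: how heavily long-term rewards count

    def bellman_target(reward, next_state_values, done):
        """Q-learning target: r + gamma * max Q(s', a'), or just r when
        the game has ended. The network's prediction Q(s, a) is pulled
        toward this target through the MSE loss."""
        if done:
            return reward
        return reward + GAMMA * np.max(next_state_values)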
Conclusion
In conclusion, Deep Reinforcement Learning has shown great promise as a powerful approach for developing intelligent systems like chess engines. Challenges remain, such as the need for vast amounts of training data and computational power, which makes it a continually evolving field with significant potential for future advancements.
References
- Cevallos, J. F., Rizzardi, A., Sicari, S., & Porisini, A. C. (2023, June 15). Deep Reinforcement Learning for intrusion detection in the Internet of Things: Best practices, lessons learnt, and open challenges. ScienceDirect. Retrieved November 4, 2024, from https://www.sciencedirect.com/science/article/abs/pii/S1389128623004619
- Coraci, D., Brandi, S., & Cappozzoli, A. (2023, September 1). Effective pre-training of a deep reinforcement learning agent by means of long short-term memory models for thermal energy management in buildings. ScienceDirect. https://www.sciencedirect.com/science/article/pii/S0196890423006490
- Deep Reinforcement Learning: What It Is and How to Benefit. (2023, August 3). LinkedIn. Retrieved November 4, 2024, from https://www.linkedin.com/advice/1/how-can-you-benefit-from-deep-reinforcement
- Ibrahim, S. (2024, July 21). Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications. arXiv. Retrieved March 5, 2025, from https://arxiv.org/html/2408.10215v1
- Kanade, V. (2022, December 20). Markov Decision Process Definition, Working, and Examples. Spiceworks. Retrieved March 5, 2025, from https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-markov-decision-process/
- Luo, G. (n.d.). Human-level control through deep reinforcement learning. Retrieved October 9, 2024, from https://courses.grainger.illinois.edu/cs546/sp2018/Slides/Apr05_Minh.pdf
- Rao, P. V., B., V., Manjeet, Kumar, A., Mittal, M., Verma, A., & Dhabliya, D. (2024). Deep Reinforcement Learning: Bridging the Gap with Neural Networks. International Journal of Intelligent Systems and Applications in Engineering. https://www.ijisae.org/index.php/IJISAE/article/view/4792
- Rouse, M. (2019, October 31). Function Approximation. Techopedia. https://www.techopedia.com/definition/34061/function-approximation
- Rouse, M. (2020, July 2). Markov Decision Process. Techopedia. https://www.techopedia.com/definition/34061/function-approximation
- Varnavas, L. (2022, December 14). Challenges and limitations of Deep Reinforcement Learning for your business. LinkedIn. Retrieved November 3, 2024, from https://www.linkedin.com/pulse/challenges-limitations-deep-reinforcement-learning-your-varnavas
- Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Springer. Retrieved October 9, 2024, from https://link.springer.com/content/pdf/10.1007/BF00992698.pdf
Acknowledgement
We would like to acknowledge Mr. Mark Skinner, a cousin, for his help in developing and troubleshooting the neural network.