Instantaneous NACA Airfoil Optimization Based on Customizable Inputs Using a Deep Q Network
Josh Doran
West Island College
Grade 9
Presentation
Problem
Airfoil optimization techniques are continuously evolving using artificial intelligence. The problem is that training an agent to optimize an airfoil can take large amounts of time and computational power, so even a small adjustment to your inputs can mean waiting many hours for a result. This limitation is caused by the nature of how Deep Q Networks train, as well as the time it takes to simulate each airfoil. In this project, I plan to address those issues and develop a model that gives instant results based on user inputs, instead of having to train a model each time you want a different airfoil.
The underlying question of this project is: "Can Reinforcement Learning Models be trained to instantaneously optimize airfoils based on user input, and how well can they do it?"
Method
My goal for this project was to create a Deep Q Network that could optimize airfoils at near-instant speeds using user inputs. In other words, the user would put in certain parameters, and then the network would create an airfoil that suits them. Here was the plan:
1. Create an ANN to predict 2D NACA airfoil simulation results.
2. Create a Reinforcement Learning model (Deep Q Network) for the program.
3. Create the environment which would train the model, using the previous ANN to simulate the model's prediction.
4. Modify and improve the network and environment.
Starting off:
Before I could start making the models for this project, I had to do a lot of research and work through many tutorials. I watched and read many tutorials teaching the basics of PyTorch, before coding my first simple neural network on the iris dataset (https://gist.github.com/curran/a08a1080b88344b0c8a7/). This dataset is a great starting dataset for coding your first ANN.
1. Creating the ANN to predict airfoil simulation results
On my first attempt, I used a similar model to the Iris Dataset model and used a GitHub NACA airfoil dataset. I had to change the loss function from a Cross Entropy Loss function (used for classifiers) to a Mean Squared Error loss function (the loss function I will use for all my neural networks going forward). The accuracy of this model was good, but it was extremely limited by the dataset (only containing around a hundred lines of data). On my second attempt, I programmed a much larger ANN using a dataset I made myself. I made the dataset using XFOIL, a 2D airfoil simulator developed at MIT. The dataset contained over 400 lines of data, which the model trained on for 20,000 epochs (cycles). The model itself had 6 hidden layers with 320 neurons each.
Model structure:
The data was split into training and test data to ensure the model isn’t simply memorizing the training data. The average test loss was approximately 0.002. After I compared the model's predictions to the actual values, it was evident that the ANN performed well when predicting the coefficient of drag, but it needed some work for predicting the coefficient of lift. One of my experts, Karanjot from the University of Calgary, suggested I divide the NACA values up into max camber, max camber position, and thickness instead of just giving the model the 4-digit number. In the third version, I applied his feedback and also made two separate models, one for predicting lift and the other for drag. Keep in mind, at this point, the models only had the inputs of the NACA (which was divided up into separate parts) and the angle of attack. I didn’t think of using the Reynolds number as an input, so the Reynolds number was kept constant at 1.5 million. To help visualize this Reynolds number, here is an example:
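Formula for Reynolds number:
$$Re = \frac{V L}{\nu}$$
where V is the airspeed, L is the chord length, and ν is the kinematic viscosity.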
In this example, I used the average chord length of 1.5m, which is around the average for a Cessna 172. The kinematic viscosity (the fluid’s resistance to flow) was set to that of 20°C air.
As you can see, a Reynolds number of 1.5 million (which I used in my dataset) is equivalent to a Cessna flying at 54 kph. A Cessna’s stall speed is 74 kph.
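The speed in this comparison can be checked with a few lines of Python. This is only a minimal sketch: the kinematic viscosity of 1.5e-5 m²/s is an assumed round value for air at about 20°C, and the function name is made up for illustration.

```python
# Re = V * L / nu, rearranged to V = Re * nu / L.
KINEMATIC_VISCOSITY_AIR = 1.5e-5  # m^2/s, assumed approximate value for 20 °C air


def speed_for_reynolds(reynolds, chord_m, nu=KINEMATIC_VISCOSITY_AIR):
    """Return the airspeed in m/s that gives the requested Reynolds number."""
    return reynolds * nu / chord_m


speed_ms = speed_for_reynolds(1.5e6, chord_m=1.5)
speed_kph = speed_ms * 3.6  # 1 m/s = 3.6 km/h
print(f"{speed_ms:.1f} m/s = {speed_kph:.0f} kph")
```

With a 1.5 m chord this gives about 15 m/s, or 54 kph, which is well below the 74 kph stall speed quoted above.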
At the time, I didn’t realise this, so I used this ANN for the first version of the Deep Q Network before reworking it so the ANN could predict simulation results from datasets with multiple Reynolds numbers.
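The idea of splitting the 4-digit NACA number into separate inputs, described earlier in this section, could look something like the sketch below. The function names are hypothetical and this is not the project's actual preprocessing code; it only shows how a code such as 2412 breaks into max camber, max camber position, and thickness.

```python
def split_naca(code):
    """Split a 4-digit NACA code into (max camber, camber position, thickness)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit NACA code, got {code!r}")
    max_camber = int(code[0])       # first digit: max camber, % of chord
    camber_position = int(code[1])  # second digit: position of max camber, tenths of chord
    thickness = int(code[2:])       # last two digits: max thickness, % of chord
    return max_camber, camber_position, thickness


def model_inputs(code, angle_of_attack):
    """Build the input list: three NACA parts plus the angle of attack."""
    return [*split_naca(code), angle_of_attack]


print(model_inputs("2412", 4.0))  # [2, 4, 12, 4.0]
```

Giving the network three small numbers instead of one large one means that similar airfoils end up with similar inputs, which is probably why the split helped the lift predictions.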
2. Creating the Reinforcement Learning model (Deep Q Network)
Though a Deep Q Network is much more complex than the previous ANNs I created, there are lots of similarities between them. In a Deep Q Network, there are 3 main parts:
1. The Replay Network
This is just like any other ANN (like the previous ones I made), and it trains on data the Agent gathers. The catch is that it needs a ‘correct’ answer, like how the airfoil ANNs needed the correct lift or drag values. So, the Replay Network takes a state that the Agent has already experienced (a state in its memory) and predicts what actions will receive what rewards. (Note that the Agent has already picked an action and knows what reward it got.) Then it compares its prediction with the actual result. This allows the network to learn how the game works, and learn what actions receive certain rewards.
2. The Target Network
The Target Network provides a secondary reward for the model. This is because actions that are leading up to the reward might not give an actual reward from the environment but are essential for getting that reward in the future. The Target Network is a direct copy of the replay network except it only gets updated every once in a while, to provide stability and prevent reward from spiralling into infinity.
As you can see, the target network only copies the replay network (Q_eval) every 1000 times the learn function is called.
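The periodic copy described above can be sketched without any deep learning library. In the sketch below a plain dictionary of weights stands in for each network, and the class name and attribute names are made up for illustration. With PyTorch models, the copy step would typically be written as `q_target.load_state_dict(q_eval.state_dict())` instead of `copy.deepcopy`.

```python
import copy


class TargetSyncSketch:
    """Keeps a target copy of the online weights and refreshes it periodically."""

    def __init__(self, q_eval_weights, replace_every=1000):
        self.q_eval = q_eval_weights                   # stand-in for the online network
        self.q_target = copy.deepcopy(q_eval_weights)  # frozen copy used for targets
        self.replace_every = replace_every
        self.learn_step_counter = 0

    def learn(self):
        # Refresh the target copy on every `replace_every`-th call.
        if self.learn_step_counter % self.replace_every == 0:
            self.q_target = copy.deepcopy(self.q_eval)
        # Stand-in for a gradient step that changes the online network only.
        self.q_eval["w"] = [w + 0.01 for w in self.q_eval["w"]]
        self.learn_step_counter += 1


agent = TargetSyncSketch({"w": [0.0, 0.0]}, replace_every=1000)
for _ in range(1500):
    agent.learn()
```

After 1500 calls the online weights have moved 1500 times, while the target copy still holds the values from the sync at call 1000. That lag is what keeps the training targets steady.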
Technically, a Deep Q Network can lack a Target Network and use the Replay Network but the model will not work as well, especially on more complex environments.
3. The Agent
The Agent is the biggest part of the Deep Q Network. The agent is the one that chooses the action, uses the ANNs, writes to the memory, and does everything else. The agent uses the Replay Network to make decisions.
Before I made the airfoil optimization model, I first created a reinforcement learning model for a spaceship game by following a tutorial. This tutorial had one large limitation, however: it lacked a target network. This led to very unstable training and a mediocre result in the spaceship game. The spaceship game was a pre-coded environment provided by Gymnasium, a Python library which is widely used to train reinforcement learning models. Because the library took care of most of the reinforcement learning actions and observations, coding the environment for the spaceship game was extremely simple.
As you can see, the environment script is very small, and the majority of the lines are just defining variables. For the model I was coding for airfoil optimization, the environment was completely from scratch and involved much more code.
Overall, the model from the tutorial plateaued at about a score of 150, but the results were unstable due to the lack of a target network. I decided I would recode this model, except with a target network to see if I could improve the results.
Implementing a target network into the second version of this model was fairly straightforward.
All you need to do is duplicate the eval network and make it so that the target network is the one giving the target value, which goes into the loss function so the main ANN can train. You also have to make the target network update only every certain number of episodes. After this large change, the network gained a lot of stability and the average score rose by 100 to around 250. The difference between a score of 100 and 250 is the difference between not being able to land and landing perfectly every time. Overall, the implementation of the target network was a large success.
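To make the "target value" idea concrete, here is a minimal sketch of how a target is usually built in Deep Q learning and fed into a mean squared error loss. The function names and the discount factor `gamma` are assumptions for illustration; the project's actual values are not stated here.

```python
def td_target(reward, next_q_values_target, done, gamma=0.99):
    """Target value: the reward plus the discounted best next-state value.

    `next_q_values_target` are the target network's predictions for the
    next state, one per action.
    """
    if done:
        return reward  # no future reward after the final step
    return reward + gamma * max(next_q_values_target)


def mse_loss(predictions, targets):
    """Mean squared error between eval-network predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# One remembered step: the eval network predicted 1.0 for the chosen action.
target = td_target(reward=0.5, next_q_values_target=[0.2, 1.0, -0.3],
                   done=False, gamma=0.9)
loss = mse_loss([1.0], [target])
```

Here the target works out to 0.5 + 0.9 × 1.0 = 1.4, so the eval network is pushed to raise its prediction from 1.0 towards 1.4. Because the next-state values come from the slowly updated target network, the target does not shift every time the eval network changes.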
Version 1 of RL model:
I coded a new RL model with 3 hidden layers of 320 neurons each. I completely modified this model later on, but I will get to that soon.
3. Creating the environment
Before coding the first version of the environment, I made a plan:
Create an environment that gives two randomized values (x, y). x is the AoA, and y is the minimum lift.
The model then needs to find the NACA (which can be broken down into 4 numbers) that has sufficient lift at that AoA, as well as having the lowest drag.
Then the model is rewarded using the formula: reward = -(abs(lift_target-c_l)*2) - (c_d*1)
(Note, coefficients can be changed to prioritize differently)
Here is another way to write the reward function (I used https://editor.codecogs.com/ to create the image):
$$r = -2\,\lvert C_{l,\text{target}} - C_l \rvert - C_d$$
Each episode is 25 actions from the model.
Use the two ANNs for the environment calculations.
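The reward formula from the plan above translates directly into Python. This is a small sketch, with the weights exposed as parameters (the parameter names are made up) so the priorities can be changed as the note in the plan suggests.

```python
def reward_v1(lift_target, c_l, c_d, lift_weight=2.0, drag_weight=1.0):
    """reward = -(abs(lift_target - c_l) * 2) - (c_d * 1)"""
    return -(abs(lift_target - c_l) * lift_weight) - (c_d * drag_weight)


on_target = reward_v1(lift_target=0.5, c_l=0.5, c_d=0.01)
too_low = reward_v1(lift_target=0.5, c_l=0.4, c_d=0.01)
too_high = reward_v1(lift_target=0.5, c_l=0.6, c_d=0.01)
print(on_target, too_low, too_high)
```

The reward is always negative, so the best the model can do is get close to zero. Because of the absolute value, missing the lift target by 0.1 costs the same whether the lift is too low or too high.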
The model performed decently, but I realised the model was being punished for having a higher lift than the target. Also, I realised the issue about Reynolds number I talked about earlier. I would need to make some changes to the model.
4. Modifying and improving the network and environment
I created a new 4000-line dataset, this time with varying Reynolds numbers from 1.5 million to 10 million. With the new input, I had to modify the two previous ANNs to accommodate it, and I had to change the environment to give 3 random values instead of 2. I also changed the reward function to punish the difference between the lift target and the lift only if the lift was less than the target.
This actually backfired: the model sometimes ended up being lazy and choosing an airfoil with a lift coefficient of around 1, which it could use for everything.
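A short sketch shows why a one-sided lift penalty invites this behaviour. It assumes the second reward function kept the same weights as the first one (2 for lift, 1 for drag), which is a guess, and the drag value used is made up.

```python
def reward_v2(lift_target, c_l, c_d, lift_weight=2.0, drag_weight=1.0):
    """Penalise lift only when it falls short of the target (weights assumed)."""
    shortfall = max(0.0, lift_target - c_l)  # zero when lift meets or exceeds the target
    return -(shortfall * lift_weight) - (c_d * drag_weight)


# The same airfoil (Cl = 1.0, made-up Cd) scored against three different targets.
rewards = [reward_v2(target, c_l=1.0, c_d=0.008) for target in (0.2, 0.5, 1.0)]
print(rewards)
```

All three rewards are identical, so one airfoil with a lift coefficient near 1 scores equally well for every target at or below 1. The model has no reason to tailor the airfoil to the request.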
Also, the model's performance was only mediocre and I felt it could be improved.
For version 3, I made a lot of changes.
List of changes: changed eps_dec, model size and parameters, reward function, added a feature to only call learn() every 5 times, doubled batch_size, doubled max mem size, decreased learning rate dividing it by 10, added timer.
I didn’t change these all at once, however. I first increased the model size, which meant that the learn function was taking a long time, so I decided to only call it every 5 episodes, which led to another change, and so on. A large difference is the decreased learning rate, which requires the model to train for more time but gives better results. I also changed the reward function to still punish the airfoils for having lift that is too high, just with a slight buffer and less punishment.
Reward function:
With the new and improved version 3 completed, I finished the task. Before I share the results, I’ll explain how the environment works.
The environment starts each round by picking three values (these would be the values the user picks): the target lift, the AoA (angle of attack), and the Reynolds number. The airfoil that the model starts with is a NACA 0110 airfoil, which is a symmetrical airfoil. Then the model can edit one of the 3 numbers in the airfoil (0, 1, or 10) by +1 or -1. Once it chooses that, the environment simulates the new airfoil using the ANNs I made previously. Based on the results, it gives the agent a reward. The agent does this 100 times (decreasing to 50 times later in training) before a new ‘round’ starts and new values are created.
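The loop described above can be sketched as a small environment class. Everything here is illustrative: the class and function names are made up, the surrogate function returns invented numbers in place of the two trained ANNs, the parameter limits and the lift and AoA ranges are assumptions, and the reward is the simple formula from the first plan rather than the final version.

```python
import random

# The six actions: (index of the NACA part to edit, change applied).
ACTIONS = [(0, +1), (0, -1), (1, +1), (1, -1), (2, +1), (2, -1)]
# Assumed limits for (max camber, camber position, thickness); not from the project.
LIMITS = [(0, 9), (1, 9), (1, 40)]


def placeholder_surrogate(naca, aoa, reynolds):
    """Stand-in for the two trained ANNs; returns made-up (c_l, c_d) values."""
    camber, position, thickness = naca
    c_l = 0.11 * aoa + 0.1 * camber
    c_d = 0.004 + 0.0002 * thickness + 0.0005 * camber
    return c_l, c_d


class AirfoilEnvSketch:
    def __init__(self, surrogate=placeholder_surrogate, max_steps=100, seed=None):
        self.surrogate = surrogate
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # Pick the three "user" values for this round (lift and AoA ranges assumed).
        self.lift_target = self.rng.uniform(0.0, 2.0)
        self.aoa = self.rng.uniform(0.0, 10.0)
        self.reynolds = self.rng.uniform(1.5e6, 10e6)
        self.naca = [0, 1, 10]  # the NACA 0110 starting airfoil
        self.steps = 0
        return self._state()

    def _state(self):
        return [*self.naca, self.lift_target, self.aoa, self.reynolds]

    def step(self, action):
        index, change = ACTIONS[action]
        low, high = LIMITS[index]
        # Apply the +1 or -1 edit, keeping the value inside its assumed limits.
        self.naca[index] = min(high, max(low, self.naca[index] + change))
        c_l, c_d = self.surrogate(self.naca, self.aoa, self.reynolds)
        # Placeholder reward: lift error plus drag, as in the first plan.
        reward = -2.0 * abs(self.lift_target - c_l) - c_d
        self.steps += 1
        done = self.steps >= self.max_steps
        return self._state(), reward, done


env = AirfoilEnvSketch(seed=0)
state = env.reset()
state, reward, done = env.step(4)  # action 4: thickness + 1
```

After this single step the airfoil has changed from 0110 to 0111. Because the surrogate is just a function call, each step takes almost no time, which is what makes training on thousands of rounds practical.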
I trained the model for 10,000 games and it took 1726 seconds which is just under 30 minutes. The model performed extremely well and I will compare the different models in the analysis section.
I also created the script for the user inputs using the saved model. This means you don’t need a good computer to run the program.
Analysis
Model Comparison
Since the first Deep Q Network didn’t use Reynolds number as an input, I will just be comparing the second and third models. Here are some of the results:
In this example, my inputs were an AoA of 1.3, a lift target of 0.3, and a Reynolds number of 5 million. As you can see, the updated reward function of the newer model keeps the lift much closer to the target, and the other changes help the model to perform better regarding drag.
Here are a graph and table of values comparing the models based off of 6 other tests:
Overall, V3 performed much better than V2, but something really interesting occurred with the inputs AoA = 2, Lift Target = 0.3, Reynolds = 10 million. On drag, the V3 model performed very slightly worse. This was most likely because of the changed reward function of the model, where it gets punished if the lift is too high. Being forced to stick to the lift target might impact the drag performance slightly, but it is better for practical purposes. These are the airfoils the models generated:
V2: AoA = 2, Lift Target = 0.3, Cl = 0.517…, Cd = 0.00456…
V3: AoA = 2, Lift Target = 0.3, Cl = 0.326…, Cd = 0.00499…
I find it amazing that even though the airfoils are very different, they have very similar properties.
High Lift Optimization Test (using V3 model)
I wanted to test the limits of this model by inputting a high lift target and seeing what airfoil it produces, so here are the results:
As you can see, I used an AoA of 10, a target lift coefficient of 2, and a Reynolds number of 5 million. The model outputted a 9212 airfoil. Before I show what the airfoil looks like, I did some calculations to help visualize a lift coefficient of 2. Here are the calculations:
I used a speed of 50 m/s, which corresponds to a Reynolds number of around 5 million. If this airfoil shape were on a Cessna 172, it would produce 48600 N of lift, which is enough to hold up approximately 5000 kg. Here is an online calculator to check my work (I needed to do the work by hand because the calculator couldn’t work backwards):
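The hand calculation can be reproduced with the standard lift equation. In this sketch the air density of 1.2 kg/m³ and the wing area of 16.2 m² are assumed values, chosen because they reproduce the 48600 N figure; the function name is made up.

```python
def lift_force(c_l, air_density, speed, wing_area):
    """Lift in newtons: L = 0.5 * rho * V^2 * S * C_l."""
    return 0.5 * air_density * speed ** 2 * wing_area * c_l


lift_n = lift_force(c_l=2.0, air_density=1.2, speed=50.0, wing_area=16.2)
lift_kg_equivalent = lift_n / 9.81  # mass whose weight equals this lift
print(f"{lift_n:.0f} N, about {lift_kg_equivalent:.0f} kg")
```

This gives roughly 48600 N, which is the weight of a little under 5000 kg.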
This is what the NACA 9211 the model generated looks like:
Now we can actually simulate it in XFOIL to see the true results:
Predicted Cl (from model): 1.929
Actual Cl (from XFOIL): 1.962
Predicted Cd (from model): 0.01086
Actual Cd (from XFOIL): 0.01156
As you can tell from the results, the ANNs used for the environment are very accurate (the predicted Cl is within about 2% of the XFOIL value and the predicted Cd is within about 6%) and allow the Deep Q Network to see the Cl and Cd without actually simulating them.
Low Lift Optimization Test (using V3 model)
Some aerobatic planes like the Extra 330 have symmetrical airfoils to ensure inverted flight characteristics are similar to normal flight. Since the airfoil is symmetrical, it has no lift at an AoA of 0. Though this is an extreme case, many planes have very low lift coefficients at low angles of attack. In this test, I inputted the AoA and lift target as 0, with the Reynolds number at 10 million to simulate high speeds. Since the V3 model gets punished for the lift being too high, the model will optimize the airfoil to have a Cl of 0.
The negative lift value is only a slight prediction error from the ANN, because a symmetrical airfoil always has 0 lift at an AoA of 0. Here is the NACA 0314 airfoil:
I felt like the model wasn’t fully optimizing the drag, so I tried simulating the airfoil myself except with a thickness of 5, not 14.
With the thinner airfoil, the drag did decrease by about 0.009, but I’m not too upset because the airfoil the model generated seems much more practical anyway.
Here is another low lift scenario, this time with the lift target at 0.1 instead of zero.
Overall, the model performed very well in high lift and low lift scenarios and I am quite pleased with the result.
Conclusion
In conclusion, I’ve created a Deep Q Network which can instantly generate an optimized airfoil tailored to user inputs. The model performs extremely well in all scenarios it was tested in. This answers the question, "Can Reinforcement Learning Models be trained to instantaneously optimize airfoils based on user input, and how well can they do it?"
Applications
This model could be used for aircraft manufacturing and design to provide accurate and quick results regarding airfoil performance and optimization, and could also create novel airfoil designs we haven’t thought of. It could also be used to optimize helicopter blades at Reynolds numbers under 10 million.
Limitations and Future Extensions
One of the biggest limitations of this model is the fact that it can only choose 2D NACA airfoils. A huge improvement would be to change the airfoil generation process to a 3D point cloud where it could edit individual coordinates. Another limitation is that it is only trained on Reynolds numbers under 10 million. There would be many more use cases for this model if it was trained on greater Reynolds numbers. Another thing I would change in the future is moving towards a better simulation software when training the initial ANNs. This could provide more accurate results the Deep Q Network could use to better optimize airfoils. In a future model, it could also be beneficial to use other inputs and outputs like stall speed or pressure to provide more customizability.
Citations
References
CodeCogs, Z. L. (n.d.). Equation Editor for online mathematics. Codecogs.com. Retrieved February 20, 2026, from https://editor.codecogs.com/
Lift Coefficient. (n.d.). vCalc. Retrieved February 20, 2026, from https://www.vcalc.com/wiki/Lift-Coefficient
Codemy.com [@Codemycom]. (n.d.). Create a basic neural network model - deep learning with PyTorch 5 [Video]. Youtube. Retrieved February 20, 2026, from https://www.youtube.com/watch?v=JHWqWIoac2I&list=PL-3CbDd49hveE2z0BI-EPZXyzFA9SFM5S&index=3
Huang, Z. [@ZacharyLLM]. (n.d.). PyTorch in 1 hour [Video]. Youtube. Retrieved February 20, 2026, from https://www.youtube.com/watch?v=r1bquDz5GGA&list=PL-3CbDd49hveE2z0BI-EPZXyzFA9SFM5S
Machine Learning with Phil [@MachineLearningwithPhil]. (n.d.). Deep Q learning is simple with PyTorch | full tutorial 2020 [Video]. Youtube. Retrieved February 20, 2026, from https://www.youtube.com/watch?v=wc-FxNENg9U&list=PL-3CbDd49hveE2z0BI-EPZXyzFA9SFM5S&index=4
Mukherjee, S. (2025, June 6). Vanishing gradient problem in deep learning: Explained. Digitalocean.com; DigitalOcean. https://www.digitalocean.com/community/tutorials/vanishing-gradient-problem
Palt, K. (n.d.). Cessna 172 skyhawk - specifications - technical data / description. Flugzeuginfo.net. Retrieved February 20, 2026, from https://www.flugzeuginfo.net/acdata_php/acdata_cessna172_en.php/
Reynolds number calculator. (n.d.). Airfoiltools.com. Retrieved February 20, 2026, from http://airfoiltools.com/calculator/reynoldsnumber
Visualizing models, data, and training with TensorBoard - PyTorch tutorials 2.10.0+cu128 documentation. (2023, January 1). Pytorch.org. https://docs.pytorch.org/tutorials/intermediate/tensorboard_tutorial.html
Winslow, J., Otsuka, H., Govindarajan, B., & Chopra, I. (2018). Basic understanding of airfoil characteristics at low Reynolds numbers (10^4–10^5). Journal of Aircraft, 55(3), 1050–1061. https://doi.org/10.2514/1.c034415
Woodford, C. (2011, March 5). How neural networks work - A simple introduction. Explain That Stuff. https://www.explainthatstuff.com/introduction-to-neural-networks.html
(N.d.-a). Githubusercontent.com. Retrieved February 20, 2026, from https://raw.githubusercontent.com/Andre-AH/Airfoil-Performance-Analysis/refs/heads/main/airfoil_performance_table.csv/
(N.d.-b). Stackexchange.com. Retrieved February 20, 2026, from https://aviation.stackexchange.com/questions/93116/why-xfoil-predicts-a-lower-drag-coefficient-at-higher-re
Acknowledgement
I would like to thank Karanjot Klair for all his advice throughout my project, my uncle Eric Doran for his guidance and ideas, my supervisor Dr. Sumner for all her help, as well as my parents for supporting me through this whole process.
