GOLD

Multi-Agent AI System for Emergency Vehicle Preemption

Creating traffic controllers that utilize the Shout-Ahead Multi-Agent Architecture and a hybrid reinforcement-evolutionary machine learning algorithm to reduce emergency vehicle response times.
Michael Xu
Grade 11

Problem

Tri-Fold Highlights

EV Preemption: traffic light controllers prioritizing emergency vehicles (EVs)

Current EV Preemption Systems

1. Isolated Instances

  • Traffic controllers at various intersections operate independently

    • Scalable

  • The isolated nature of these systems can be detrimental to overall traffic flow

    • Preemption at one intersection can lead to delays at other ones

    • Short-sighted, so they can only begin preemption when an EV is within close proximity

2. Centralized Networks

  • A central computer operates a network of traffic light controllers

  • Highly effective in EV preemption due to its holistic approach

  • A centralized architecture limits scalability and introduces a single point of failure

    • Limited real-world applicability, especially in dense metropolitan traffic networks

    • Decreased reliability

Therefore, current EV preemption systems are insufficient to meet the needs of urban traffic networks.

Multi-Agent Traffic Management Systems

  • A distributed, cooperative network of communicating traffic light controllers

  • Has been proven to be a highly effective architecture for regular traffic management

    • Similar effectiveness as centralized systems, but still scalable and not vulnerable to a single point of failure

  • Currently, there is minimal research into distributed multi-agent EV preemption systems

  • And little to no research into multi-agent EV preemption systems that use AI

Research Question: How effective is a distributed multi-agent AI system in reducing emergency vehicle response times and minimizing the delay to regular traffic in comparison to fixed-time and actuated traffic light systems?

 


Conference Paper Style (More Depth)

Artificial intelligence (AI) is designed for resolving large-scale, multi-faceted problems that are unwieldy for humans—and managing traffic control signals to prioritize emergency vehicle (EV) traffic flow (known as EV preemption) is one such problem. An effective EV preemption system is capable of reducing EV response times, which can significantly improve an ambulance patient’s survival rate [1], enhance a police officer’s ability to make an arrest [2], minimize the risk for firefighters who are unable to wear seat belts due to their bulky gear [3], and, broadly, make roads safer for the community [4]. Importantly, an ideal EV preemption system will be able to achieve all of this while simultaneously minimizing the delay for regular traffic. Thus, research into EV preemption systems is exceptionally necessary.

 

Current EV Preemption Systems

Current EV preemption systems can be classified into two broad categories: isolated instances and centralized systems [5].

1. Isolated Instances

The majority of existing EV preemption systems operate individually, utilizing a wireless sensor network (WSN) at each intersection to detect an oncoming EV and switch to a green light sequence that allows the EV to pass through [5]. The isolated nature of these systems is detrimental to overall traffic flow, as preemption at one intersection can result in tremendous amounts of congestion at others, leading to significant delays [6]. Additionally, these isolated systems are oftentimes less effective in preemption, as they can only begin to ease congested traffic when an EV is within close proximity [6].

 

2. Centralized Systems

Other EV preemption systems, such as [7], are connected through a centralized architecture. The route of the EV is predetermined and, using GPS tracking, a central computer intelligently (often using AI) communicates with traffic lights to perform EV preemption. Although this approach is effective in preempting EV traffic flow on a small scale [8], it has numerous critical drawbacks, namely limited scalability [9] and a central point of failure [10]. Scalability is a vital characteristic of an ideal EV preemption system because urban areas—which are the most in need of intelligent traffic control systems due to their high traffic density—are composed of vast, intricate networks of roads and, as a consequence, traffic signals. Moreover, having a central point of failure is a fatal flaw, as it immensely decreases reliability [11]. For an intelligent EV preemption system to be viable, it must be reliable: if an entire network of traffic lights were to malfunction, catastrophic implications would ensue, wreaking significant havoc on the economy and public health [12].

 

Multi-Agent Traffic Management Systems

Although research into EV preemption systems has been primarily limited to these two approaches, a more diverse range of research has been conducted into general intelligent traffic control systems that are optimized for reducing traffic congestion. For instance, in [13], [14], and [15], a distributed multi-agent system (MAS) approach was used to establish a scalable network of traffic controllers. The authors of these studies concluded that a MAS is highly effective in resolving congestion. More specifically, in [13], France and Ghorbani conclude that disturbances to a traffic network (caused by morning traffic or an accident) are effectively handled by the MAS. In [14], a reinforcement learning multi-agent structure is determined to be at least as effective as actuated traffic controllers in managing small, medium, and large-scale traffic flows. Based on these conclusions, a MAS approach appears promising for preempting EVs. The architecture of a MAS EV preemption system would be far more scalable than current centralized EV preemption systems and may prove to be more effective as well [13]. Despite this potential, there is currently a limited amount of research into EV preemption systems in which each traffic controller acts as an agent within a larger MAS [16].

Therefore, the existing solutions for EV preemption are insufficient to meet the needs of metropolitan areas and further research, particularly into the use of MASs, is necessary. As such, this research aims to create a more effective, distributed method of prioritizing EV traffic flow by leveraging a multi-agent system (MAS) and a hybrid, layered machine learning technique consisting of reinforcement learning coupled with an evolutionary algorithm.

 

Research Question

How effective is a distributed multi-agent AI system in reducing emergency vehicle response times and minimizing the delay to regular traffic in comparison to actuated traffic light systems?

Method

Tri-Fold Highlights

Shout-Ahead Agent Architecture

Overview

  • Was shown in a recent study to be highly effective in managing regular traffic [15]
    • Promising foundation for creating an EV preemption system

Rule-Based Cooperative Agents

  • Each traffic controller is an independent agent
  • An agent possesses rule sets for responding to the local environment (the intersection) and the cooperation environment (communication with other agents)
    • Rules consist of a condition (made up of predicates), an action, and a weight
  • Agents cooperate with each other through communication
    • Agents will first pick an initial rule from their local environment rule set
    • Then, they communicate that intention (along with attributes of their local environment) with their communication partners
    • Based on the communication received, if a better rule in the cooperation environment rule set exists, then the agent will replace their intended rule

Agent Learning Using A Hybrid Learning Technique

  • Rule weights are updated using reinforcement learning, more specifically, the on-policy SARSA algorithm
    • An agent picks an applicable rule, applies its action, and based on a reward function, the weight of the rule is updated
      • A good outcome increases the weight and a bad one decreases it
      • Higher weighted rules are more likely to be chosen in the future
  • Rule sets are diversified with an evolutionary algorithm
    • Agents have pools of individuals
    • Within a generation, different combinations of individuals are placed in runs together to manage traffic and preempt for EVs
      • After each run, individuals receive a fitness score depending on how well they performed
      • Runs continue until all individuals have participated in a minimum number of runs
    • At the end of a generation, evolution occurs
      • Based on the fitness scores, the best individuals survive and breed with each other (by combining and mutating their rule sets)
      • The worst individuals don't make it on to the next generation

Application to Traffic Controller

  • The Python programming language was used to program the architecture using the Simulation of Urban Mobility (SUMO) simulator

Creating an EV Preemption System by Extending the Shout-Ahead Agent Architecture

Local Environment EV Rule Set

  • Is used by the agent to decide what to do at any given instance to preempt for an EV approaching its local intersection
  • Contains rules that have the following predicates relating to EVs
    • Traffic density ahead of leading EV
    • Distance of leading EV to intersection
    • Current lane of leading EV
  • As well as the following predicates relating to regular traffic
    • Longest vehicle wait time
    • Number of cars waiting in queue
    • Time spent in current phase
  • Predicates pertain to the leading EV in order to manage conflicts when more than one EV is approaching an intersection at one time

Cooperation Environment EV Rule Set

  • Is used by the agent to manage traffic and preempt for EVs at other parts of the agent network
    • Guides how agents should work cooperatively to preempt for EVs
  • Contains rules that have the following predicates relating to EVs
    • The time since an EV last passed through a partner intersection
    • Whether or not an EV is currently approaching a partner intersection
  • As well as the following predicates relating to regular traffic
    • Time since communication
    • The partner's intended action

EV Reinforcement Learning

  • The reinforcement learning reward function was modified to take into account the following EV parameters
    • Change in leading EV speed
    • Change in queue length ahead of leading EV
    • Whether or not the leading EV stops
  • As well as the following parameters relating to regular traffic
    • Throughput
    • Queue difference
  • Is used by agents to learn rules that are effective at EV preemption
  • User-defined coefficients (factors) are used to regulate the relative importance of each of the parameters

EV Evolutionary Algorithm

  • The evolutionary algorithm fitness function was modified to take into account the following EV parameters
    • Average EV speed
    • Number of EV stops
  • As well as the following parameters relating to regular traffic
    • Simulation time
    • Aggregate vehicle wait time
  • Is used by agent pools to evolve individuals that are capable of EV preemption
  • User-defined coefficients (factors) are used to regulate the relative importance of each of the parameters

Layered Learning 

  • To learn a balance between EV preemption and regular traffic management, a layered learning approach was used
  • First, regular rules were learned using the base shout-ahead agent architecture
    • Includes the local and cooperation regular rule sets
  • Then, EV rules were learned 
    • Includes the local and cooperation EV rules sets 
    • The regular rule sets were used as a fallback during the learning process
      • If there are no EV rules applicable, the agent will fall back on the learned rules to (hopefully) apply an action and change the state so that more rules are applicable

Instantiating the Shout-Ahead EV Preemption System

Simulator 

  • All extensions were programmed using the Python programming language and using the Simulation of Urban Mobility (SUMO) simulator

Training Parameters

  • Rule learning
    • Max rules per rule set: 10
    • Max EV rule predicates: 4
    • Max regular rule predicates: 3
    • Probability of choosing EV predicate over regular one when making a new rule: 0.5
  • Simulation constraints
    • Max simulation time (gen 0 - 5): 7000 steps
    • Max simulation time (gen 5 - 15): 5000 steps
    • Max simulation time (gen 15 - 50): 4000 steps
  • SARSA reinforcement learning
    • Learning rate (α): 0.5
    • Discount rate (γ): 0.5
  • Reward function
    • Change in EV speed multiplier: 0.1
    • Change in EV queue multiplier: 1
    • Stopped EV penalty: 1
  • Evolutionary algorithm
    • Generations learned: 50
    • Minimum number of runs per individual in a generation: 3
    • Mutation rate: 1/6
    • Breeding rate: 1/3
    • Maximum rule mutations: 1
  • Fitness Function
    • Penalty for each stopped EV: 1
    • Number of EV stops multiplier: 0.001
    • Average EV speed multiplier: 10

Code

 


 

Conference Paper Style (More Depth)

To address the shortcomings of existing isolated and centralized EV preemption architectures, a distributed multi-agent system (MAS) is proposed to create a refined EV preemption system. A MAS is a computational technique employed to resolve complex problems involving multiple intelligent components, known as agents [17]. Each agent is responsible for solving a local problem that is a subset of the overall goal of the system; through communication and reactions to other agents, the agent network is able to work collectively [10]. MASs are distributed in nature, meaning that processes are allocated to numerous agents rather than a central computer. Since agents operate independently of one another, the network is not susceptible to a single point of failure, which proves advantageous over a centralized network. In this respect, a MAS inherits the scalability of current isolated EV preemption systems. However, since agents are able to communicate with one another, a MAS does not sacrifice the quality of EV preemption performed by centralized networks. In essence, a MAS is capable of highly effective EV preemption while simultaneously being scalable and reliable, making it ideal for large-scale intelligent traffic network applications.

 

Shout-Ahead Agent Architecture

Overview

This section outlines the state of the shout-ahead agent architecture established in [15] to preface the extensions made by this research project.

To create the proposed MAS EV preemption system, the shout-ahead agent architecture developed by Paskaradevan and Denzinger in 2012 [18] was chosen as a base system to extend upon. In [15], the architecture was adapted to instantiate cooperating traffic light controllers aimed at alleviating general traffic congestion. The experimental results of this study indicate that the shout-ahead agent architecture is more effective than established traffic control algorithms at managing small, medium, and large-scale traffic flows. This conclusion exemplifies the suitability of a MAS for establishing a refined EV preemption system and also serves as a solid foundation for EV extensions.

 

Rule-Based Cooperative Agents

Within a traffic network, each traffic light controller is characterized as an agent within the broader MAS [15]. Each agent consists of two rule sets: one that contains rules regarding the local environment of the intersection (RSlocal) and one that contains rules regarding the cooperation environment (RScoop). Rules consist of a condition, which is made up of a number of predicates, an action, and a weight. A rule is considered applicable if all predicates within its condition evaluate as true. An agent first chooses an applicable rule from RSlocal and then notifies its communication partners of its intended action, effectively shouting ahead. Based on the intentions an agent receives from its communication partners, an applicable rule from RScoop is then chosen. Among the chosen RSlocal and RScoop rules, the higher weighted rule is applied [15] [18]. If no rules are applicable, the agent remains in its current state for the simulation step.
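As a concrete illustration, the rule representation and shout-ahead selection procedure can be sketched in Python. The class names, state dictionary, and predicate encoding below are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    predicates: list   # callables over an observed state dict; all must hold
    action: str        # e.g. a traffic light phase to switch to
    weight: float = 0.0

    def applicable(self, state):
        return all(p(state) for p in self.predicates)

def choose_rule(rules, state):
    """Highest-weighted applicable rule, or None if nothing applies."""
    applicable = [r for r in rules if r.applicable(state)]
    return max(applicable, key=lambda r: r.weight, default=None)

def shout_ahead_step(rs_local, rs_coop, state):
    """Pick a local intention, then let a heavier cooperation rule override it."""
    candidates = [r for r in (choose_rule(rs_local, state),
                              choose_rule(rs_coop, state)) if r is not None]
    # If neither rule set yields an applicable rule, the agent keeps its state.
    return max(candidates, key=lambda r: r.weight, default=None)
```

In this sketch the "shout" is implicit: the cooperation rules would be evaluated against a state that already includes the partners' communicated intentions.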

 

Agent Learning Using A Hybrid Learning Technique

The hybrid machine learning technique consists of two key components. 

Firstly, reinforcement learning is used to provide immediate incentives for traffic light agents to act in a desirable manner throughout a simulation run [15] [19]. After the agent applies the action of a particular rule, a reward function analyzes parameters that indicate a desirable action, such as the throughput at an intersection, and updates the rule weight using the SARSA algorithm [20]. Rules that result in a better simulation state earn a higher weight, which increases the probability of their being applied in the future.
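The weight update follows the standard SARSA form; a minimal sketch, assuming rule weights stand in for Q-values and using the α = γ = 0.5 defaults listed in the training parameters:

```python
def sarsa_update(weight, reward, next_weight, alpha=0.5, gamma=0.5):
    """One SARSA-style update of a rule weight:
    w <- w + alpha * (reward + gamma * w' - w),
    where w' is the weight of the rule chosen at the next step (on-policy)."""
    return weight + alpha * (reward + gamma * next_weight - weight)
```

A positive reward pulls the weight up, a penalty pulls it down, and the gamma term lets the value of the follow-up rule propagate backwards.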

Secondly, the evolutionary algorithm evolves and breeds (with a certain factor of randomness) individuals within agent pools [15] [21]. Since each traffic controller has a distinct action set and communication partners, it has a unique agent pool as well. An agent pool consists of various individuals, which, throughout a generation, perform reinforcement learning while participating in simulation runs. At the end of each simulation run, a fitness value for each involved individual is calculated and saved. At the end of each generation, the average fitness of an individual is obtained and used to guide the evolution process. The best individuals proceed to the next generation and are bred with one another. The process of breeding simply involves creating new RSlocal and RScoop rule sets from the rule sets of two existing individuals. Some individuals are mutated, whereby a random number of rules within RSlocal and RScoop have their predicates altered [15] [18].
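A minimal sketch of one such generation step; the list-of-strings encoding of an individual's rules, the single splice point, and the mutation marker are illustrative assumptions:

```python
import random

def evolve(population, fitness, survive_frac=0.5, mutation_rate=1/6, seed=0):
    """One generation of the evolutionary step: rank by fitness, keep the best,
    refill the pool by splicing survivors' rule sets, then mutate a fraction.
    Individuals are modeled as plain lists of rule names (length >= 2)."""
    rng = random.Random(seed)
    ranked = sorted(population, key=fitness, reverse=True)
    keep = [list(ind) for ind in ranked[: max(2, int(len(ranked) * survive_frac))]]
    next_gen = list(keep)
    while len(next_gen) < len(population):
        a, b = rng.sample(keep, 2)
        cut = rng.randrange(1, min(len(a), len(b)))  # breed: splice two rule sets
        next_gen.append(a[:cut] + b[cut:])
    for ind in next_gen:
        if rng.random() < mutation_rate:
            i = rng.randrange(len(ind))
            ind[i] = ind[i] + "'"                    # mutate: perturb one rule
    return next_gen
```

The real system combines RSlocal and RScoop rule sets and alters predicates during mutation; this sketch only shows the survive/breed/mutate cycle.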

 

Application to Traffic Controller

To apply the architecture to cooperative traffic controllers, the Simulation of Urban MObility (SUMO) [22] simulator was utilized [15]. SUMO is an open source, portable, microscopic and continuous multi-modal traffic simulation package designed to handle large networks. Traffic lights can be individually controlled through the Python Traffic Control Interface (TraCI) [23], which links to SUMO. 

 

Creating an EV Preemption System by Extending the Shout-Ahead Agent Architecture

Overview

This research project extended the base system (described in the previous section) in five key ways. All extensions strive to create an EV preemption system that effectively prioritizes EVs while simultaneously mitigating the disturbance to non-emergency traffic. Nearly every aspect of the base system was modified to accommodate EV preemption functionality for this project. These extensions include implementing:

  1. Local Environment EV Rule Set (RSevlocal)
  2. Cooperation Environment EV Rule Set (RSevcoop)
  3. EV Reinforcement Learning
  4. EV Evolutionary Algorithm
  5. Layered Learning

 

Local Environment EV Rule Set (RSevlocal)

In order for the system to create dynamic, adaptive rules that are capable of preempting for EVs, it must be able to identify the state of EVs within the local environment. This was achieved by adding three new EV predicate types with which the system can create rules. By knowing the state of the local environment, the system can learn, through reinforcement learning, the best actions to apply under certain circumstances.

The first of these predicate types was the distance of an EV from the intersection. This predicate was implemented because the system should learn different EV preemption behaviours depending on how far away an EV is from an intersection.

The second predicate type was the traffic density in front of the leading EV. Traffic density is calculated by dividing the queue length (number of vehicles) ahead of the EV by the distance that the EV is from the intersection. As a result, the system is provided the data to learn how to behave according to the amount of EV preemption required by the EV. For instance, the system should behave differently if there are no vehicles ahead of the leading EV (traffic density = 0) compared to when there are several vehicles ahead and they are all densely packed together. 

The final predicate type describes the lane of the leading EV. This parameter directly affects the action that a traffic controller should take because certain actions (e.g. setting the traffic light for the lane that the EV is currently in to green) will be conducive to high EV throughput, whereas other ones (e.g. setting the traffic light for the lane that the EV is currently in to red) will hinder it.

RSevlocal is essentially an exception rule set to RSlocal in the sense that a rule regarding the local environment will normally be picked from RSlocal, but when an EV is approaching an intersection, the system will turn to RSevlocal instead.

Naturally, at any given moment, more than one EV can potentially be approaching an intersection controlled by a shout-ahead traffic controller. To accommodate this, all of the EV predicates within RSevlocal are with respect to the leading EV (i.e. the EV closest to the intersection). The leading EV was chosen to be the focus of preemption because the closer an EV is to an intersection, the more urgent preemption for it becomes. By this logic, the leading EV requires the most preemption and consequently, it is the focus of the predicates.

The EV's distance to the intersection and the traffic density ahead of the EV are common predicates across all agent pools. However, the leading EV lane predicates are agent pool-specific, as different intersections control different lanes.

Predicates are best described as boolean conditions, which means that they must evaluate as either true or false. Therefore, the raw values of an EV's distance to the intersection and traffic density must be split into ranges, formally known as bins, for them to be used as predicates. As a result, the predicates become whether or not the EV's distance to the intersection and traffic density are within a certain bin. The exact values of the bins are a user-defined parameter. On the other hand, the leading EV lane predicate is far simpler, as it merely evaluates as true if the current lane of the leading EV is strictly equal to the lane defined in the predicate.
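The binned predicates can be sketched as follows; the bin edges, lane names, and state keys are hypothetical, since the real bin values are user-defined:

```python
def bin_predicate(key, low, high):
    """Predicate: true when the raw value state[key] falls within [low, high)."""
    return lambda state: low <= state[key] < high

def lane_predicate(lane):
    """Predicate: true when the leading EV's current lane equals `lane`."""
    return lambda state: state["ev_lane"] == lane

# Hypothetical bin edges; the actual edges are user-defined parameters.
ev_near = bin_predicate("ev_distance", 0, 50)        # metres to the intersection
low_density = bin_predicate("ev_density", 0.0, 0.1)  # vehicles per metre ahead
on_ns_lane = lane_predicate("NS_0")
```

A rule's condition would then be a conjunction of such predicates, evaluated against the observed intersection state each simulation step.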

Importantly, the rules within RSevlocal do not merely consist of EV predicates. Regular vehicle predicates described in [15] are also included in an effort to guide the system towards learning a balance between EV preemption and managing non-emergency traffic. There exists a parameter that determines the probability of choosing an EV predicate compared to a regular vehicle predicate, which is user-defined. Similar to the structure of RSlocal described in [15], other user-defined parameters include the maximum number of predicates per rule and the maximum number of rules in the rule set.

 

Cooperation Environment EV Rule Set (RSevcoop)

Note that the predicates for the cooperation environment do not deal with the state of an agent's local intersection; but rather, that of its communication partners. Since each traffic light controller agent is at a unique position within the traffic network, its communication partners are unique as well. As a result, the predicates for the cooperation environment differ for each agent pool. However, the same predicate types are mutual among all agent pools. For traffic light controller agents to learn the best actions to perform under certain conditions within the cooperation environment, two new cooperation EV predicate types were added for rules in RSevcoop.

The first cooperation EV predicate type was the time since the last EV passed through a communication partner's intersection. This value indicates whether or not an intersection should expect an EV to approach soon: once an EV passes through an intersection, it is bound to be on its way to another one (unless it has reached its destination). By knowing when an EV last passed through a communication partner's intersection, an agent is able to apply actions accordingly.

The second type of cooperation EV predicate was whether or not an EV is currently approaching a communication partner's intersection. Through knowing the value of this predicate, the system is able to learn how to help other intersections preempt for EVs.

In a similar manner to RSevlocal, RSevcoop is an exception rule set to RScoop. Under ordinary circumstances, a rule regarding the cooperation environment will be chosen from RScoop, but when an EV is approaching any intersection, the system will choose rules from RSevcoop.

Mimicking RSevlocal, the time since the last EV passed through a communication partner's intersection predicate is binned into certain ranges to construct predicates and RSevcoop rules may also contain RScoop predicates, so as to learn a balance between EV preemption and non-emergency traffic management.

 

EV Reinforcement Learning

In order for the system to learn desirable behaviour (effective EV preemption), the reward function for the reinforcement learning of rules was updated to take into account three EV parameters. Changes to these parameters determine the magnitude and polarity of the reward that is given to the applied rule, which is used to update its weight according to the SARSA reinforcement learning technique [24]. Rules that result in a better state after their application will receive a greater positive reward, while rules resulting in a worse state will receive a smaller positive reward. Penalties subtract from the reward.

The first parameter was the speed of the leading EV. An effective EV preemption system is able to reduce the response times of EVs; a precise predictor of this is the speed of the leading EV. A rule applied that increases the speed of the leading EV will receive a positive reward, as the rule is considered to be conducive towards achieving the objective of the system. The exact value of the reward is the raw change in speed multiplied by a user-defined factor.

The second parameter was the queue length ahead of the EV. The primary disturbance to EV traffic flow is traffic congestion ahead of EVs. For this project, the amount of congestion is modeled by the queue length ahead of the EV. If a traffic controller applies a rule that reduces the queue length ahead of the EV, it will receive a greater reward, as it is effectively helping to reduce EV response times. The exact value of the reward is the raw change in queue length ahead of the EV multiplied by a user-defined factor.

The third parameter checks whether or not an EV stops at an intersection after the application of the rule. If an EV is stopped at the intersection, the rule applied will receive a penalty, as stopped EVs are indicative of a poor EV preemption system. The exact value of the penalty is a user-defined static value multiplied by a user-defined factor.

To allow for the fine-tuning of rewards, each parameter is multiplied by a user-defined factor, which can be used to change the range of certain parameters to match others, or to place emphasis on certain parameters over other ones.  
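Putting the three parameters together, the EV component of the reward can be sketched as follows; the sign conventions and the default factors mirror the description above, but the exact implementation details are assumptions:

```python
def ev_reward(d_speed, d_queue, ev_stopped,
              speed_factor=0.1, queue_factor=1.0, stop_penalty=1.0):
    """EV component of the reward: a speed gain and a shrinking queue ahead
    of the leading EV (negative d_queue) raise the reward, while a stopped
    EV subtracts a flat penalty. Factors mirror the user-defined multipliers."""
    reward = speed_factor * d_speed        # change in leading-EV speed
    reward -= queue_factor * d_queue       # a shrinking queue is rewarded
    if ev_stopped:
        reward -= stop_penalty
    return reward
```

The factor values (0.1, 1, 1) match the training parameters listed later; in the real system this EV component is combined with the regular-traffic reward terms from [15].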

 

EV Evolutionary Algorithm

In order to guide the evolutionary algorithm towards creating agent pools that are effective at EV preemption, an additional EV component—consisting of two parameters—was added to the fitness function. The EV fitness function is applied after the initial fitness calculation defined in [15], which uses parameters that are aimed at managing regular traffic flow. The two parameters in the EV fitness function include the average speed of EVs and the number of EV stops throughout the simulation run.

At each simulation step, the average speed of all EVs (not just the leading one) approaching the individual traffic controller agent is calculated and stored in a list. Subsequently, at the end of the simulation run, the total average speed of EVs for the individual is calculated by taking the average of the values stored in the list of speeds. A higher average EV speed is desirable, so this parameter is multiplied by a positive user-defined factor before being added to the total fitness value. 

The number of EV stops throughout a simulation refers to the number of EVs that were stationary (stopped and waiting in a queue) at an intersection for every simulation step. That is to say, this value does not refer to the number of instances when an EV went from moving to a stationary state; but rather, the total number of EVs that were stopped across all simulation steps. The intent was to punish an individual for the time in which an EV was stopped, as in the case where two individuals both caused the same number of EVs to reach a stationary state from a moving one, the individual that kept the EV stopped for the longest period of time should receive the greatest deduction to their fitness. An effective EV preemption system ideally reduces the number of EV stops, so this parameter is multiplied by a negative user-defined factor to effectively punish individuals with a greater number of EV stops.

The average speed of EVs and the number of EV stops were chosen to guide the evolutionary algorithm because they are excellent indicators of an effective EV preemption system, as can be inferred from the fact that various other studies, such as [16], used these two parameters to evaluate the effectiveness of their EV preemption systems.

Other parameters, such as the total number of generations (50) were not modified from the base system in [15].

 

Layered Learning

Modeled after Stone's layered learning paradigm [25], the system learns EV preemption after it learns regular traffic management. That is, the rule sets RSevlocal and RSevcoop are learned on top of established RSlocal and RScoop rule sets. This learning technique is exceptionally well suited to this application area because it facilitates the learning of distinct sub-tasks, which accumulate towards a larger task [25]. In the context of EV preemption systems, the two primary objectives are to prioritize EV flow and also manage regular traffic. Since these objectives are distinct and oftentimes contradictory to one another, a layered learning approach was chosen.

First, the RSlocal and RScoop rule sets were learned using the base system established in [15]. After the learning finished, the RSlocal and RScoop rule sets of the individual with the best fitness value within the last 5 generations of the learning were saved.

Then, using the RSlocal and RScoop rule sets of the best individuals learned using the base system, EV preemption learning subsequently ensues. Since the RSlocal and RScoop rule sets are not learned during this stage, reinforcement learning is not performed on their rules, nor are they modified by the evolutionary algorithm. As such, the RSlocal and RScoop rule sets remain static for all individuals throughout the entire process.

The rule selection process during EV preemption learning is as follows: if there is an EV approaching, the agent first finds the best applicable rule from RSevlocal using a probabilistic process (in order to balance exploration and exploitation). It then communicates its intended EV rule to its communication partners. Using the received intentions, the agent then chooses an applicable cooperation rule from RSevcoop using the same probabilistic process used for RSevlocal. The higher weighted rule between the local and cooperation rules is then applied and reinforcement learning is performed on it.

If an EV is not approaching the intersection, or no rules from either RSevlocal or RSevcoop are applicable, then the agent will select the best rule from RSlocal and RScoop (the static rules). Since the rules within RSlocal and RScoop are not being learned, the agent can simply exploit the best rule, meaning the selection process is not probabilistic.
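The layered selection logic above can be sketched as follows; the dictionary encoding of rules and the weight-proportional choice standing in for the probabilistic process are assumptions:

```python
import random

def weighted_choice(rules, rng):
    """Probabilistic pick biased toward heavier rules (a stand-in for the
    exploration/exploitation scheme used during learning)."""
    weights = [max(r["weight"], 0.01) for r in rules]  # keep every rule reachable
    return rng.choices(rules, weights=weights, k=1)[0]

def select_rule(ev_approaching, rs_ev, rs_regular, seed=0):
    """Layered selection: learned EV rules when an EV is near; otherwise fall
    back to the static regular rules (pure exploitation, no learning)."""
    rng = random.Random(seed)
    if ev_approaching:
        applicable = [r for r in rs_ev if r["applicable"]]
        if applicable:
            return weighted_choice(applicable, rng), True   # True: update via RL
    best = max((r for r in rs_regular if r["applicable"]),
               key=lambda r: r["weight"], default=None)
    return best, False                                       # static fallback
```

The boolean flag marks whether the chosen rule participates in reinforcement learning, mirroring the fact that only the EV rule sets are learned in this stage.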

 

Instantiating the Shout-Ahead EV Preemption System

Simulator

The EV preemption system was directly extended upon the base system, so the Simulation of Urban MObility (SUMO) [22] simulator was utilized [15]. All extensions were created using the Python Traffic Control Interface (TraCI) [23], which links to SUMO.

 

Training Parameters

For the learning of the rules, a particular set of user-defined parameters was chosen. The maximum number of rules per rule set was limited to 10, so as not to create an overwhelming search space that results in astronomical learning times. The rules within RSevlocal and RSevcoop contained EV predicates as well as the regular traffic predicates created in [15] in order to balance EV preemption with minimizing the delay to regular traffic. Since rules within the EV rule sets have more possible predicate types, they were allowed a maximum of 4 predicates, compared to the 3 predicates for the regular rule sets. When constructing rules within RSevlocal and RSevcoop, the probability of choosing an EV predicate over a regular traffic predicate was 0.5.

During the learning process, some individuals are expected to be unable to allow all vehicles to reach their destinations (i.e., they get stuck in a certain state), so the maximum simulation time was capped at 7000 steps. As generations progressed, the fitness of individuals was expected to improve, so the maximum simulation time was reduced to 5000 steps after generation 5 and again to 4000 steps after generation 15.
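The step-cap schedule can be written as a small helper. Interpreting "after generation 5/15" as generation numbers greater than 5 and 15 is an assumption about the numbering convention:

```python
def max_simulation_steps(generation):
    """Maximum simulation steps allowed for an evaluation run, tightened
    as later generations are expected to manage traffic more quickly."""
    if generation > 15:
        return 4000
    if generation > 5:
        return 5000
    return 7000
```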

For the SARSA reinforcement learning, the learning rate (alpha) hyperparameter was set to 0.5 (a moderate learning rate) and the discount rate hyperparameter was set to 0.5 (a moderate valuation of future rewards). The change in EV speed factor was set to 0.1 in order to limit its magnitude, so that the other reinforcement learning parameters are not neglected. The EV change in queue factor was set to 1, and a penalty of 1 was applied for a stopped EV.
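A minimal sketch of how these hyperparameters could enter the SARSA update and the EV reward terms. Only the constants come from the text; the exact reward composition and sign conventions are assumptions:

```python
ALPHA = 0.5            # learning rate
GAMMA = 0.5            # discount rate
EV_SPEED_FACTOR = 0.1  # scales the change-in-EV-speed term
EV_QUEUE_FACTOR = 1.0  # scales the change-in-EV-queue term
STOPPED_EV_PENALTY = 1.0

def ev_reward(d_ev_speed, d_ev_queue, ev_stopped):
    """EV portion of the reward (assumed composition): changes in EV speed
    and queue contribute with their factors, and a flat penalty is applied
    for a stopped EV."""
    reward = EV_SPEED_FACTOR * d_ev_speed + EV_QUEUE_FACTOR * d_ev_queue
    if ev_stopped:
        reward -= STOPPED_EV_PENALTY
    return reward

def sarsa_update(weight, reward, next_weight):
    """Standard SARSA temporal-difference update applied to a rule weight."""
    return weight + ALPHA * (reward + GAMMA * next_weight - weight)
```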

For the evolutionary algorithm, 50 generations were learned; before advancing to the next generation, each individual within an agent pool was required to take part in at least 3 simulation runs. Each agent pool contained a maximum of 30 individuals. When advancing from one generation to the next, 1/6 of the individuals were mutated, 1/3 of the individuals in a generation were bred, and each rule was mutated at most once.
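The generation-advancement parameters can be sketched as follows. Only the fractions and pool size come from the text; the selection and replacement details here are assumptions:

```python
POOL_SIZE = 30           # maximum individuals per agent pool
MUTATE_FRACTION = 1 / 6  # fraction of individuals mutated
BREED_FRACTION = 1 / 3   # fraction of individuals bred
MIN_RUNS = 3             # simulation runs required before advancing

def next_generation(pool, mutate, breed):
    """Advance an agent pool one generation (simplified sketch).

    `pool` is a list of (individual, fitness) pairs in which each individual
    is assumed to have completed at least MIN_RUNS simulation runs; `mutate`
    and `breed` are user-supplied genetic operators. Fitness-ranked
    selection is an assumption.
    """
    ranked = [ind for ind, _ in sorted(pool, key=lambda p: p[1], reverse=True)]
    n_mutate = int(len(ranked) * MUTATE_FRACTION)
    n_breed = int(len(ranked) * BREED_FRACTION)
    offspring = [mutate(ind) for ind in ranked[:n_mutate]]
    parents = ranked[:n_breed]
    for a, b in zip(parents[0::2], parents[1::2]):
        offspring.append(breed(a, b))
    # Keep the pool within its maximum size.
    return (ranked + offspring)[:POOL_SIZE]
```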

For the fitness function of the evolutionary algorithm, a penalty of 1 was applied to an individual for every stopped EV. The number of EV stops was reduced by a factor of 1000 in order to normalize its value in relation to the other parameters. The average EV speed factor was multiplied by a factor of 10 in order to bring its range closer to that of the regular traffic parameters within the fitness function.
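One plausible way these factors could combine into a fitness value (higher is better, as in Fig. 1). The 1/1000 and ×10 factors come from the text, while the regular-traffic terms and the overall composition are assumptions:

```python
def fitness(agg_wait_time, longest_wait_time, num_ev_stops, avg_ev_speed):
    """Hypothetical fitness composition: reward high average EV speed,
    penalize EV stops and regular-traffic waiting."""
    ev_stop_term = num_ev_stops / 1000.0  # normalize to the other terms
    ev_speed_term = 10.0 * avg_ev_speed   # bring into range of traffic terms
    return ev_speed_term - ev_stop_term - agg_wait_time - longest_wait_time
```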

 

Code

The source code for this project can be found on GitHub at https://github.com/themicklepickle/shout-ahead-EV-preemption.

Analysis

Tri-Fold Highlights

Learning Results

Fig. 1: Top Individual Fitness vs. Generation Number
  • Above graph depicts the top individual fitness over each generation
    • The higher the value, the better
  • Positive slope of trendline in graph indicates that agents are becoming better at EV preemption over time
    • Validates the AI aspect of this project
  • The top individual fitness does not increase consistently due to the random nature of the evolutionary algorithm
    • Mutations during breeding are entirely random, so it is possible for the fitnesses to peak and then level off

Performance Results

System Configurations

  • Local EV + Coop EV: complete multi-agent AI EV preemption system
    • EV and regular rule sets
    • Trained for EV preemption and regular traffic management
  • Local EV: multi-agent AI EV preemption system without cooperation
    • EV and regular local rule sets
    • Trained for EV preemption and regular traffic management
  • Local Regular + Coop Regular: base traffic management system
    • Only regular rule sets
    • Trained for regular traffic management
  • SUMO ATL: actuated traffic light system built-in to SUMO
    • Manages traffic based on the queue length at each intersection

Evaluation Methodology

  • Tested each system 50 times on varying traffic flows
    • Each flow contained 225 vehicles in total, 36 of which were EVs
  • The confidence intervals in the graphs were computed at a 99% confidence level

Performance Metrics

  • Average EV Speed (Fig. 2): mean EV speed throughout the simulation run
    • A key indicator of EV preemption effectiveness
    • The higher the better
  • Total Number of EV Stops (Fig. 3): total amount of time during which EVs were stopped
    • A key indicator of EV preemption effectiveness
    • The lower the better
  • Simulation Time (Fig. 4)
    • Indicates the disturbance to regular traffic
    • The lower the better

Graphs

Fig. 2: Average Emergency Vehicle Speed for Various System Configurations
Fig. 3: Total Number of EV Stops for Various System Configurations
Fig. 4: Simulation Time for Various System Configurations

Tables

Fig. 5: Simulation Time, EV Stops and Average EV Speed for Various System Configurations

System Configuration          Simulation Time (s)   Number of EV Stops   Average EV Speed (km/h)
Local EV + Coop EV            1235                  887                  3.530
Local EV                      2000                  2468                 0.792
Local Regular + Coop Regular  1484                  3873                 1.970
SUMO ATL                      1687                  3582                 2.080

 

Analysis

  • There is quite a clear distinction between each of the system configurations, as there was little to no overlap of the confidence intervals in each of the graphs
  • The complete multi-agent AI EV preemption system outperformed all of the other systems
    • EV preemption
      • Best average EV speed
      • Best number of EV stops
    • Minimize the disturbance to regular traffic
      • Best simulation time
  • Agent cooperation is shown to be extremely important, as the EV preemption system without agent cooperation performed the worst overall (mixed results for EV preemption, but definitive results for the disturbance to regular traffic)
    • EV preemption
      • Worst average EV speed
      • Second best number of EV stops
    • Minimize the disturbance to regular traffic
      • Worst simulation time

 


Conference Paper Style (More Depth)

Learning Results

Below, Fig. 1 illustrates the progression of the sum of the top fitness values of agent pools as they progressed across generations. An agent pool contains numerous individuals, each with a fitness value obtained by participating in simulation runs. The fitness function consists of regular traffic parameters, including the aggregate vehicle wait time and longest vehicle wait time, as well as EV traffic parameters, including the average EV speed and the number of EV stops. Therefore, the top fitness value refers to the fitness of the best individual within a generation for a particular agent pool.

The positive slope of the trendline indicates that as generations progress, individuals become better at EV preemption and minimizing the disturbance to regular traffic. This validates the AI component of the system, as it allows the system to become better over time.

 

Performance Results

To answer the guiding research question, the performance of the multi-agent AI EV preemption system was compared against several other variations of EV preemption systems. Three metrics were used to determine the effectiveness of the EV preemption systems in achieving the objectives of optimizing EV flow and minimizing the disturbance to regular traffic.

Each system was evaluated on the same traffic flow of 225 total vehicles (36 of the vehicles were EVs) within the SUMO [22] urban mobility simulator. All vehicles had randomly generated start times, starting positions, final positions, and routes.

The first metric was the simulation time (in seconds), which is the amount of time it took for a particular traffic system configuration to allow all vehicles to arrive at their final destination from their starting position.  Since the simulation time describes the overall ability of the traffic light controllers to manage traffic, this metric was used to gauge the disturbance to non-emergency traffic. A system with a relatively low simulation time is a system that has a minimal disturbance to non-emergency traffic.

The second metric was the number of EV stops, which is calculated in the same manner as within the fitness function of the evolutionary algorithm. At every simulation step, all EVs approaching an intersection are checked to see if they are stopped. The number of EV stops at that step is added to the total number of EV stops. As such, this value does not indicate the instances when an EV transitioned from moving to a stationary state; more accurately, it reflects the amount of time EVs spent waiting at an intersection. This metric directly indicates the effectiveness of EV preemption, as a primary indicator of effective EV prioritization is the fluid flow of EVs. As such, the lower the number of EV stops, the more effective the EV preemption system is at optimizing EV flow.

The third metric was the average EV speed, which represents the average speed of all EVs throughout the entire simulation. In a similar manner to the number of EV stops, at each simulation step, the average speed among all EVs approaching an intersection is determined and stored in a list of EV speeds. At the end of the simulation run, the average out of all of the EV speeds stored for each simulation step is calculated to obtain the final average EV speed value. A key indication of a reduction in EV response times is an increase in the speed of EVs, so the higher this metric is, the better the system is at EV preemption.
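The two EV metrics described above can be computed from per-step observations as sketched below; the observation format (a list of EV speeds per simulation step) is an assumption:

```python
def evaluate_ev_metrics(steps):
    """Compute the total number of EV stops and the average EV speed.

    `steps` is a list of per-step observations, each a list of the speeds of
    EVs currently approaching an intersection. A speed of 0 counts as a
    stopped EV for that step, so the stop total reflects time spent waiting
    rather than moving-to-stationary transitions.
    """
    total_ev_stops = 0
    per_step_avg_speeds = []
    for ev_speeds in steps:
        if not ev_speeds:
            continue  # no EVs approaching any intersection this step
        total_ev_stops += sum(1 for v in ev_speeds if v == 0)
        per_step_avg_speeds.append(sum(ev_speeds) / len(ev_speeds))
    avg_ev_speed = (sum(per_step_avg_speeds) / len(per_step_avg_speeds)
                    if per_step_avg_speeds else 0.0)
    return total_ev_stops, avg_ev_speed
```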

Alongside the complete multi-agent AI preemption system (described in the methods section), three additional system configurations were evaluated. 

The SUMO ATL system configuration is an algorithm built into SUMO, and it is the simplest of all the configurations. It operates by giving right of way to one direction until a maximum phase time is reached or there is a large gap in the traffic flow, whichever occurs first [22]. If an emergency vehicle approaches the intersection, the system automatically changes to a phase that allows the leading EV to pass through the intersection. Since this is the simplest system, it acts as the negative control. Most EV preemption systems in the real world operate in this manner, so for another system to be considered effective, it needs to have better results than this one.

The base system (prior to the extensions made by this project and displayed as “Local Regular + Coop Regular”) is also evaluated to gauge the difference made by the various modifications outlined in the methods section. There are two rule sets used in this system configuration: one contains rules with predicates that provide observations about regular traffic within the local environment, and the other contains rules with predicates that provide observations about the intended actions of communication partners. Neither rule set contains rules with EV predicates. Note that the evaluation of this system tests the already learned rules; the system is not learning as it is being evaluated. The rules used during the evaluation of this system were retrieved from the best individuals in the 49th generation, as it was in this generation that the top individual fitnesses peaked.

Next, a modified version of the multi-agent EV preemption system was tested (shown as “Local EV”). The shout-ahead aspect of the agent architecture was disabled, so that only the local EV rule set (RSevlocal) was learned. As a result, agents in this system configuration could only retrieve information from their immediate local environment (i.e. they were made unaware of the state of their communication partners). This configuration closely mimics isolated instance EV preemption systems, which were described in the problem section.

Lastly, the complete multi-agent EV preemption system was evaluated, consisting of RSevlocal as well as RSevcoop (represented as “Local EV + Coop EV”).

The results for the average EV speed, number of EV stops, and simulation time for each of the system configurations can be found in Fig. 2, Fig. 3, and Fig. 4, respectively.

From Fig. 2, Fig. 3, and Fig. 4, it can be observed that the complete multi-agent EV preemption system is the most effective in optimizing EV flow (as it has the lowest number of EV stops and the highest average EV speed) and minimizing the delay to regular traffic (as it has the lowest simulation time). Moreover, the multi-agent EV preemption system is better by a significant margin, as in comparison to SUMO ATL (the negative control group), its simulation time was 27% lower, the number of EV stops was 75% lower, and the average EV speed was 70% higher.

The system configuration with only local EV predicates had the worst average EV speed and highest simulation time, which indicates that the cooperation aspect of the multi-agent architecture is key for effective EV preemption as well as minimizing the delay to regular traffic. In [15], a similar result was found. For managing regular traffic flow, if the shout-ahead aspect of the traffic controller system was disabled, it performed significantly worse than the system with shout-ahead enabled [15].

From these results, it is evident that a multi-agent architecture is an exceptionally effective approach for establishing an EV preemption system. It is far better at preemption than an isolated version of the same AI system. At the same time, since it is a distributed, cooperative agent network, it does not suffer from the limited scalability and single point of failure that centralized systems do.

It would have been ideal to run a comparison between the multi-agent preemption system and a state-of-the-art centralized system as another way to gauge the effectiveness of a multi-agent approach. However, there were a few reasons why this comparison was not made. Firstly, state-of-the-art centralized systems are immensely complex, so implementing them in SUMO [22] would have required a significant amount of time, which unfortunately was not available for this project. Secondly, even if a centralized system had been implemented, the implementation may not have been entirely accurate, tainting the results of the comparison. Thirdly, result values from other studies could not simply be reused because those studies used different EV traffic flows and traffic network configurations, and these aspects significantly affect evaluation performance [15].

In [16], a similar multi-agent EV preemption system was implemented, except traffic controller agents did not use machine learning to learn rules. Instead, they relied on the static Longest Queue First – Maximal Weight Matching (LQF-MWM) algorithm. Louati et al. concluded that the multi-agent approach was more effective in improving the speed of EVs and reducing the number of EV stops than non-cooperating static and dynamic algorithms. The results of this research project echo the results in [16]. However, without a direct comparison, it cannot be concluded whether the hybrid, layered machine learning technique is superior to the LQF-MWM algorithm.

It can also be observed that the base system performed similarly to SUMO ATL with respect to optimizing EV traffic flow, but was better at minimizing the disturbance to regular traffic. These results coincide with the experimental results in [15], in which it was determined that the shout-ahead agent architecture was able to create effective cooperative traffic controllers capable of reducing the simulation time. However, since EV traffic is not prioritized in the base configuration, it is unsurprising that it performed slightly worse than SUMO ATL in the two EV preemption metrics.

All in all, the results of this experiment align with other similar studies that indicate that a distributed multi-agent AI traffic controller system is effective at both EV preemption and minimizing the delay to non-emergency traffic.

Conclusion

Tri-Fold Highlights

Summary

  • Shout-ahead agent architecture was extended to create a distributed multi-agent EV preemption system
    • Two additional emergency vehicle rule sets were added
      • One pertains to a traffic light’s local environment
      • The other referring to the agent-cooperation environment
    • These rule sets were learned using a hybrid machine learning technique
      • Reinforcement learning
        • Used to update the weights of rules throughout a simulation run
      • Evolutionary learning
        • Employed to evolve agent pools, effectively generating new rule sets
    • To learn a balance between EV preemption and minimizing the impact to regular traffic, the EV rule sets were learned using a layered learning technique
      • EV rules sets were learned on top of established rules aimed at easing general traffic congestion

Significance

  • This research provides insights that can help reduce EV response times, allowing for everyone in the community to benefit
    • Reducing EV response times has been shown to increase a patient's survival rate and improve a police officer's ability to make an arrest
    • By minimizing the impact on regular traffic, productivity improves and environmental degradation is lessened

Future Work

  • Investigate how the number of iterations of layered learning can affect system quality
    • Learn regular rules on top of EV rules
  • Study the impact that various user-defined parameters have on the learned rules
  • Test and learn the system on more maps and traffic flows
  • Create a multi-agent system of centralized traffic controllers.

 


 

Conference Paper Style (More Depth)

Summary

The shout-ahead agent architecture was extended to create a distributed multi-agent EV preemption system. Two additional rule sets containing predicates describing the state of EVs were added: one pertains to a traffic light’s local environment and the other to the agent-cooperation environment. These rule sets were learned using a hybrid machine learning technique, with reinforcement learning used to update the weights of rules throughout a simulation run and evolutionary learning employed to evolve agent pools, effectively generating new rule sets. To learn a balance between EV preemption and minimizing the impact on regular traffic, the EV rule sets were learned using a layered learning technique on top of established rules aimed at easing general traffic congestion.

The experimental results indicate that a distributed multi-agent AI EV preemption system is significantly better than an actuated traffic light system at both optimizing EV flow (reducing the number of EV stops and increasing the average speed) as well as minimizing the disturbance to regular traffic (lowering the simulation time). The results further indicate that cooperation between agents is key to effective EV preemption, as mere local EV predicates without communication result in worse preemption than actuated traffic light systems.

 

Significance

Effective EV preemption systems are capable of significantly improving public safety, as reducing EV response times allows ambulances to reach a patient in a shorter span of time, which can improve a patient’s survival rate [1]. Firefighters will also be able to arrive at the scene of a fire faster, enabling them to do their jobs better, which has the potential to save countless lives. By expediting the movement of police cruisers, police officers are able to keep communities safer, as reducing their response time improves their ability to make an arrest [2]. Furthermore, EVs will be able to reach their destinations far more safely, as they will travel through less congested traffic, which also makes roads safer for the entire community [3]. Considering that EVs often travel at high speeds, allowing EVs to travel through uncongested traffic is especially important.

A study conducted by Harriet et al. [26] concludes that an average of 9% of productive hours a day are lost due to heavy traffic congestion. Therefore, an EV preemption system that has a minimally negative impact on regular traffic has tremendous implications for the economy.

Environmentally, less congested roads means higher vehicular efficiency, which reduces the amount of harmful pollutants emitted by cars [27]. Although various measures have been taken to reduce air pollution, including Li-ion battery powered electric cars [28], there are still numerous barriers preventing widespread electric car adoption [29]. As a result, the prime method for improving the environmental sustainability of urban traffic networks is to reduce traffic congestion, which can be achieved by implementing an effective EV preemption system.

 

Future Work

Future work includes investigating the impact that more iterations of the layered learning process have on the effectiveness of an EV preemption system, as well as whether first learning EV rules or first learning regular rules plays a role in learning quality. Additionally, more experimentation is expected to be completed using different user-defined parameters, such as the bins of the predicates, the weight factor of EV parameters in the reward and fitness functions, and the maximum number of rules per rule set. Finally, it is planned to test the system on different sized traffic flows to evaluate the generality and adaptiveness of the learned rules.

Citations

 [1]  M. Badjonski, M. Ivanovic, and Z. Budimac, “Agent oriented programming language lass,” Object-Oriented Technology and Computing Systems Re-engineering, p. 111–121, Mar 2014.

  [2]  T. Pate, A. Ferrara, A. Bowers, and J. Lorence, “Police response time: Its determinants and effects,” 1976.

  [3]  U. D. of Health, H. Services et al., “Strategic plan, fiscal years 2010–2015,” Washington, DC: US Department of Health and Human Services, 2010.

  [4]  H. Hsiao, J. Chang, and P. Simeonov, “Preventing emergency vehicle crashes: status and challenges of human factors issues,” Human factors, vol. 60, no. 7, pp. 1048–1072, 2018.

  [5]  K. Nellore and G. P. Hancke, “Traffic management for emergency vehicle priority based on visual sensing,” Sensors, vol. 16, no. 11, p. 1892, 2016.

  [6]  D. Bullock and E. Nelson, “Impact evaluation of emergency vehicle preemption on signalized corridor operation,” in TRB Annual Meeting, Transportation Research Board, Washington, DC, vol. 2, 2000.

  [7]  J. R. Bycraft, “Green light preemption of traffic signals for emergency vehicles: Richmond, British Columbia’s approach,” City of Richmond Traffic Signal Control System, 2013.

  [8]  H. R. Al-Zoubi, B. A. Mohammad, S. Z. Shatnawi, and A. I. Kalaf, “A simple and efficient traffic light preemption by emergency vehicles using cellular phone wireless control,” in Proceedings of the 13th WSEAS international conference on Mathematical and computational methods in science and engineering. World Scientific and Engineering Academy and Society (WSEAS), 2011, pp. 167–170.

  [9]  P. Grandinetti, C. Canudas-de Wit, and F. Garin, “Distributed optimal traffic lights design for large-scale urban networks,” IEEE Transactions on Control Systems Technology, vol. 27, no. 3, pp. 950–963, 2018.

[10]  C. Kray, “The benefits of multi-agent systems in spatial reasoning.” in FLAIRS Conference, 2001, pp. 552–556.

[11]  N. Mc Donnell, E. Howley, and J. Duggan, “Dynamic virtual machine consolidation using a multi-agent system to optimise energy efficiency in cloud computing,” Future Generation Computer Systems, vol. 108, pp. 288–301, 2020.

[12]  M. Sweet, “Traffic congestion’s economic impacts: Evidence from us metropolitan regions,” Urban Studies, vol. 51, no. 10, pp. 2088–2110, 2014.

[13]  J. France and A. A. Ghorbani, “A multiagent system for optimizing urban traffic,” in IEEE/WIC International Conference on Intelligent Agent Technology, 2003. IAT 2003. IEEE, 2003, pp. 411–414.

[14]  D. Houli, L. Zhiheng, and Z. Yi, “Multi Objective Reinforcement Learning For Traffic Signal Control Using Vehicular Ad Hoc Network,” EURASIP Journal on Advances in Signal Processing, vol. 2010, no. 1, p. 724035, 2010. [Online]. Available: https://doi.org/10.1155/2010/724035

[15]  C. Roatis and J. Denzinger, “Extending the learning shout-ahead architecture with user-defined exception rules – a case study for traffic light controls,” in 2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI), 2020, pp. 9–16.

[16]  A. Louati, S. Elkosantini, S. Darmoul, and H. Louati, “Multi-agent preemptive longest queue first system to manage the crossing of emergency vehicles at interrupted intersections,” European Transport Research Review, vol. 10, no. 2, p. 52, 2018.

[17]  K. P. Sycara, “Multiagent systems,” AI Magazine, vol. 19, no. 2, pp. 79–79, 1998.

[18]  S. Paskaradevan and J. Denzinger, “A hybrid cooperative behavior learning method for a rule-based shout-ahead architecture,” in 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, vol. 2. IEEE, 2012, pp. 266–273.

[19]  R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.

[20]  R. S. Sutton, “Learning to predict by the methods of temporal differences,” in MACHINE LEARNING. Kluwer Academic Publishers, 1988, pp. 9–44

[21] K. Deb, Multi-objective optimization using evolutionary algorithms. John Wiley & Sons, 2001, vol. 16.

[22]  P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y.-P. Flötteröd, R. Hilbrich, L. Lücken, J. Rummel, P. Wagner, and E. Wießner, “Microscopic traffic simulation using sumo,” in The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE, 2018. [Online]. Available: https://elib.dlr.de/124092/

[23]  A. Wegener, M. Piórkowski, M. Raya, H. Hellbrück, S. Fischer, and J.-P. Hubaux, “Traci: An interface for coupling road traffic and network simulators,” in Proceedings of the 11th Communications and Networking Simulation Symposium, ser. CNS ’08. New York, NY, USA: Association for Computing Machinery, 2008, p. 155–163. [Online]. Available: https://doi.org/10.1145/1400713.1400740

[24]  N. Sprague and D. Ballard, “Multiple-goal reinforcement learning with modular sarsa (o),” in Proceedings of the 18th International Joint Conference on Artificial Intelligence, ser. IJCAI ’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003, p. 1445–1447.

[25]  P. Stone and M. Veloso, “Layered learning,” in Machine Learning: ECML 2000 (Proceedings of the Eleventh European Conference on Machine Learning), R. L. de Mántaras and E. Plaza, Eds. Barcelona, Catalonia, Spain: Springer Verlag, May/June 2000, pp. 369–381.

[26]  T. Harriet, K. Poku, and A. K. Emmanuel, “An assessment of traffic congestion and its effect on productivity in urban ghana,” International Journal of Business and Social Science, vol. 4, no. 3, 2013.

[27]  C. Dobre, “Using intelligent traffic lights to reduce vehicle emissions,” Int J Innov Comput Inf Control, vol. 8, no. 9, 2012.

[28]  D. A. Notter, M. Gauch, R. Widmer, P. Wäger, A. Stamp, R. Zah, and H.-J. Althaus, “Contribution of li-ion batteries to the environmental impact of electric vehicles,” 2010.

[29]  O. Egbue and S. Long, “Barriers to widespread adoption of electric vehicles: An analysis of consumer attitudes and perceptions,” Energy Policy, vol. 48, pp. 717–729, 2012.

Acknowledgement

I would like to express my sincerest gratitude to all the mentors who have helped me complete this project and taught me everything that I know about scientific research. Thank you, Dr. Denzinger, for introducing me to the shout-ahead agent architecture and for always guiding me in the right direction. Christian, I truly appreciate your support in helping me navigate SUMO, walking me through your implementation of SATLO, and your constant reassurance. Thank you, Dr. Garcia, for being an excellent teacher! I have learned so much this year through working on this project in your class; I have developed skills that I will carry with me for the rest of my life. And of course, thank you very much, Ms. Gierus, for providing us all this opportunity; we would not have been able to do any of this without your hard work and dedication.