Cyberphysical Architecture
The figure above shows the cyber-physical architecture of our project: a system capable of performing closed-loop self-driving demonstrations with the reinforcement learning (RL) agent. To achieve this, we have broken the architecture down into three major subsystems: User Interface, Simulation, and Full Planner.
A high-level description of the information flow between the subsystems can be found in the function architecture section of the website. This section describes the technologies used in each subsystem in detail.
Simulation
The physics and graphics of the simulator are handled by Unreal Engine. On top of Unreal Engine, the simulator uses CARLA for additional self-driving features. Starting from the bottom of the Simulation subsystem in the cyber-physical architecture, the PID controller is built in-house, while the other agents are managed through the CARLA interface. Once the actions of all agents have been determined, the simulator triggers the physics update and rendering of all agents together through a time synchronizer, so that every agent advances by the same simulation step. The simulator then uses the CARLA interface again to extract the state of the environment and send it to the Full Planner subsystem. Finally, the CARLA global planner is used to compute a high-level route for the Full Planner to follow.
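The in-house PID controller mentioned above can be sketched as a discrete PID loop. This is a minimal illustration, not the project's controller: the gains, the control period `dt`, the output limits (normalized to [-1, 1], e.g. full brake to full throttle), and the anti-windup rule (integrate only while the output is not saturated) are all assumptions, and `PIDController` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDController:
    """Hypothetical discrete PID controller; gains and limits are illustrative only."""
    kp: float
    ki: float
    kd: float
    dt: float                  # control period [s]
    output_min: float = -1.0   # e.g. full brake
    output_max: float = 1.0    # e.g. full throttle
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def reset(self) -> None:
        """Clear the controller state, e.g. when a new behavior starts."""
        self._integral = 0.0
        self._prev_error = None

    def step(self, target: float, measured: float) -> float:
        """Return a control output clamped to [output_min, output_max]."""
        error = target - measured
        # No derivative kick on the first call: there is no previous error yet.
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / self.dt
        self._prev_error = error
        candidate_integral = self._integral + error * self.dt
        u = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        clamped = max(self.output_min, min(self.output_max, u))
        # Anti-windup (conditional integration): keep the new integral only when not saturated.
        if clamped == u:
            self._integral = candidate_integral
        return clamped


if __name__ == "__main__":
    # Toy longitudinal plant (assumption): acceleration = 3 m/s^2 * control output.
    pid = PIDController(kp=0.5, ki=0.1, kd=0.0, dt=0.05)
    speed = 0.0
    for _ in range(1200):  # 60 s of simulated time
        speed += 3.0 * pid.step(10.0, speed) * pid.dt
    print(f"speed after 60 s: {speed:.3f} m/s")
```

In CARLA, lockstep stepping of the kind the time synchronizer performs is commonly done with synchronous mode: set `synchronous_mode = True` and a `fixed_delta_seconds` on the `carla.WorldSettings` returned by `world.get_settings()`, apply them with `world.apply_settings(...)`, and call `world.tick()` once per step after all controls are applied. Whether the project's synchronizer works exactly this way is an assumption; using the same `dt` for the controller as `fixed_delta_seconds` keeps the integral and derivative terms consistent with simulated time.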
Full Planner
Given the high-level route and the environment state, the Full Planner first uses a Neural Network Selector to decide which RL agent is appropriate for the current situation. Each RL agent handles a single situation, and the agents are designed following the Double Deep Q-Learning method, in which the online network selects the next action and a separate target network evaluates it, reducing the overestimation of Q-values. The output of an agent is a discrete behavior that the vehicle should perform (e.g., accelerate for 0.5 seconds or perform a lane change). The spline generation algorithm converts this behavior decision into a detailed path plan made of waypoints. Next, the velocity profile generator turns the path plan into a trajectory plan by assigning a desired speed to each waypoint. Finally, a single tracking pair of pose and speed is selected from the trajectory based on how long the current behavior decision has been running.
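The three downstream steps above (spline generation, velocity profile generation, and tracking-point selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the lane-change path is a cubic Hermite ("smoothstep") curve with zero heading at both ends, the velocity profile ramps toward a target speed under a constant acceleration limit `a_max`, and the tracking point is the first waypoint whose planned arrival time is at or after the elapsed behavior time. All names (`TrajectoryPoint`, `lane_change_path`, `add_velocity_profile`, `select_tracking_point`) are hypothetical.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrajectoryPoint:
    x: float      # longitudinal position [m]
    y: float      # lateral position [m]
    yaw: float    # heading [rad]
    speed: float  # desired speed at this waypoint [m/s]
    t: float      # planned time since the behavior started [s]


def lane_change_path(length: float, lateral_offset: float, n: int) -> List[Tuple[float, float, float]]:
    """Hypothetical spline step: cubic Hermite lane change with zero heading at both ends."""
    if n < 2 or length <= 0:
        raise ValueError("need n >= 2 and length > 0")
    path = []
    for i in range(n):
        x = length * i / (n - 1)
        s = x / length                                # normalized progress in [0, 1]
        y = lateral_offset * (3 * s**2 - 2 * s**3)    # y(0) = 0, y(1) = lateral_offset
        dy_dx = lateral_offset * (6 * s - 6 * s**2) / length
        path.append((x, y, math.atan(dy_dx)))         # yaw from the path slope
    return path


def add_velocity_profile(path, v0: float, v_target: float, a_max: float) -> List[TrajectoryPoint]:
    """Hypothetical velocity-profile step: constant-acceleration ramp from v0 toward v_target."""
    if v_target <= 0 or a_max <= 0:
        raise ValueError("v_target and a_max must be positive")
    x0, y0, yaw0 = path[0]
    traj = [TrajectoryPoint(x0, y0, yaw0, v0, 0.0)]
    for x, y, yaw in path[1:]:
        prev = traj[-1]
        ds = math.hypot(x - prev.x, y - prev.y)       # arc length of this segment
        if prev.speed < v_target:
            v = min(v_target, math.sqrt(prev.speed**2 + 2 * a_max * ds))
        else:
            v = max(v_target, math.sqrt(max(0.0, prev.speed**2 - 2 * a_max * ds)))
        # Segment time under constant acceleration (approximate when v was clamped).
        dt = 2 * ds / (prev.speed + v)
        traj.append(TrajectoryPoint(x, y, yaw, v, prev.t + dt))
    return traj


def select_tracking_point(traj: List[TrajectoryPoint], elapsed: float) -> TrajectoryPoint:
    """Pick the first waypoint whose planned time is at or after the elapsed behavior time."""
    times = [p.t for p in traj]
    i = bisect.bisect_left(times, elapsed)
    return traj[min(i, len(traj) - 1)]
```

For example, `select_tracking_point(add_velocity_profile(lane_change_path(30.0, 3.5, 31), 10.0, 15.0, 2.0), 1.0)` returns the waypoint the vehicle should be tracking one second into a 3.5 m lane change; once the elapsed time passes the end of the trajectory, the last waypoint is held.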
User Interface
The User Interface subsystem is connected directly to the simulator so that it can send the desired destination pose to the simulator.
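The destination pose passed from the User Interface to the simulator could be carried in a small message like the one below. This is a hypothetical sketch: the `DestinationPose` name, the JSON encoding, and the field set are assumptions, since the transport between the two subsystems is not specified here. The fields mirror CARLA's conventions (location in meters, yaw in degrees), so on the simulator side the message maps directly onto `carla.Transform(carla.Location(x=..., y=..., z=...), carla.Rotation(yaw=...))`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DestinationPose:
    """Hypothetical UI -> simulator message: a goal pose in CARLA world coordinates."""
    x: float          # [m]
    y: float          # [m]
    z: float = 0.0    # [m]
    yaw: float = 0.0  # heading [deg], matching carla.Rotation

    def to_json(self) -> str:
        """Serialize the pose for sending to the simulator."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "DestinationPose":
        """Parse a pose; z and yaw fall back to their defaults if omitted."""
        data = json.loads(payload)
        return cls(**{k: float(data[k]) for k in ("x", "y", "z", "yaw") if k in data})
```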