Modeling Subsystem


The modeling subsystem provides a model for generating realistic agent behavior at a road intersection inside the simulation. It learns realistic behaviors from a combination of real-world and hand-crafted simulated data, replicating the traffic behavior present in the trajectories extracted by the preprocessing subsystem. In the imitation and reinforcement learning literature, this model corresponds to the optimal policy. The subsystem also exposes tunable parameters that make certain behaviors occur more often, making the simulation platform more suitable for autonomous vehicle testing. The system is not "learning how to drive"; rather, it models a traffic scenario at an intersection for testing a self-driving vehicle. We therefore assume that the entire world state is known to the subsystem.

System Implementation Details

Before going into the details of the modeling piece, we define a few terms:

  • Ego Vehicle – The vehicle actor under consideration, whose behavior is observed with respect to the environment in a particular training episode.
  • Actor – A dynamic entity in the traffic that influences the decisions of the vehicle behavioral model. Currently, the following entities are treated as actors in our system:
    • Vehicles
    • Pedestrians
    • Traffic Lights
  • Environment – The environment comprises the following aspects of the simulation:
    • All actor states except the ego vehicle's.
    • Lane boundaries. These inherently account for the locations of buildings and static obstacles in the town map.
    • Road area
  • World State – The ego vehicle state and the environment state together constitute the world state.
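The world state defined above can be sketched as a simple data structure. The class and field names below are illustrative, not the subsystem's actual types:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

class ActorType(Enum):
    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"
    TRAFFIC_LIGHT = "traffic_light"

@dataclass
class ActorState:
    actor_type: ActorType
    position: Tuple[float, float]  # (x, y) in world coordinates
    heading: float                 # orientation in radians
    speed: float                   # m/s (always 0 for traffic lights)

@dataclass
class WorldState:
    # Ego vehicle state plus every other actor's state together
    # constitute the world state; lane boundaries and road area
    # live in the map and are omitted from this sketch.
    ego: ActorState
    actors: List[ActorState] = field(default_factory=list)
```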

Conditional Imitation Learning Model

The current learning model is based upon the Conditional Imitation Learning framework introduced in the Learning by Cheating (LBC) paper. Our model is a privileged agent that has access to a map M containing the entire world state. We train the privileged agent on a set of expert demonstrations. It predicts a series of waypoints, from which the vehicle computes its steering and throttle commands using a low-level PID controller. The learned model governs a single actor; the simulator runs one instance of the model per actor. For example, all vehicles run an instance of the car actor model.
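The waypoint-to-control step can be illustrated with a minimal PID loop on the heading error to the next predicted waypoint. The gains and helper names here are a hypothetical sketch, not the controller actually used in LBC:

```python
import math

class PIDController:
    """Minimal PID controller; gains are illustrative, not tuned values."""
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def steering_from_waypoint(ego_xy, ego_heading, waypoint, pid):
    """Heading error toward the next predicted waypoint drives the steering PID."""
    dx = waypoint[0] - ego_xy[0]
    dy = waypoint[1] - ego_xy[1]
    target_heading = math.atan2(dy, dx)
    # Wrap the error into [-pi, pi] so the controller takes the short way around.
    error = (target_heading - ego_heading + math.pi) % (2 * math.pi) - math.pi
    return pid.step(error)
```

A second PID on the distance to the waypoint would produce the throttle command in the same fashion.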

We use the privileged agent of LBC, which has access to all privileged information: it directly observes the environment through a ground-truth map matrix M of dimensions W × H × 7, anchored at the ego vehicle's current position. The task of the model is to predict K waypoints w = w1, w2, …, wK that the vehicle should travel to. The model also observes the speed of the vehicle and a high-level command c. The model is parameterized such that its convolutional neural network outputs a series of heatmaps, one for each waypoint k and high-level command c. The model then converts each heatmap to a waypoint using a soft-argmax. This representation has the advantage that the input M and the intermediate outputs are in perfect spatial alignment, exploiting the spatial structure of the CNN.
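The soft-argmax step takes a softmax over the heatmap and then the expectation of the cell coordinates, yielding a differentiable, sub-pixel waypoint estimate. A plain-Python sketch (the real model does this with tensor operations over the CNN output):

```python
import math

def soft_argmax_2d(heatmap):
    """Convert a 2D heatmap of logits into a sub-pixel (x, y) waypoint.

    Softmax turns the logits into a probability map; the expected column
    and row indices under that distribution are the predicted coordinates.
    Unlike a hard argmax, this is differentiable, so the waypoint loss can
    be backpropagated through it.
    """
    flat = [v for row in heatmap for v in row]
    m = max(flat)  # shift logits for numerical stability
    exps = [[math.exp(v - m) for v in row] for row in heatmap]
    z = sum(sum(row) for row in exps)
    x = sum(p * j for row in exps for j, p in enumerate(row)) / z
    y = sum(p * i for i, row in enumerate(exps) for p in row) / z
    return x, y
```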

The model is trained using behavior cloning from a set of real-world driven trajectories {τ0, τ1, …}. For each trajectory τi = {(M0, c0, v0, x0, R0), (M1, c1, v1, x1, R1), …}, the data collation unit stores the ground-truth map Mt, the high-level navigation command ct, and the agent's velocity vt, position xt, and orientation Rt in world coordinates. We generate the ground-truth waypoints from the future locations of the agent's vehicle. Given a set of ground-truth trajectories and waypoints, our training objective is to imitate the training trajectories as closely as possible by minimizing the L1 distance between the future waypoints and the agent's predictions. This is also shown in Figure 11. The CNN architecture-specific details can be found in the LBC paper [7].
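The behavior-cloning objective reduces to an L1 regression over the K predicted waypoints. A minimal sketch of the loss for one sample (in training this runs as a differentiable loss over batches):

```python
def l1_waypoint_loss(pred, target):
    """Mean L1 distance between K predicted and K ground-truth waypoints,
    each given as an (x, y) pair in the map frame."""
    assert len(pred) == len(target), "prediction and target need the same K"
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += abs(px - tx) + abs(py - ty)
    return total / len(pred)
```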


Finally, our baseline behavioral model is based upon Conditional Imitation Learning (CIL) and is capable of demonstrating general traffic behaviors, namely:

  • Responding to traffic lights
  • Driving within lane boundaries
  • Following lead vehicle
  • Collision Avoidance

Below we showcase a rare scenario: on the left, a car runs a red light in the real world, and we have replicated a similar situation in the simulator. We can see that the LBC agent reacts to the vehicle in front by decelerating.


The training was done using simulated data generated inside CARLA. The expert trajectories were generated by CARLA's fine-tuned rule-based autopilot. Hence, the current model can only imitate the handful of behaviors produced by the autopilot.

Future Work

Firstly, we want to augment the current behavioral model with more complex behaviors such as running a red light, maintaining a variable distance to the lead vehicle, overtaking, etc. We plan to use real-world data from the Data Capture subsystem to fine-tune our CIL-based model. Secondly, since the collected data may never exhibit all such behaviors, we plan to implement an additional probabilistic rule-based model. This makes our system capable of producing on-demand behaviors, which are crucial for testing various aspects of a self-driving car. A prior estimate of the probabilistic distribution's parameters will be learned from the real data generated by the Data Capture unit. We will use behavior trees to control when to take input from the CIL-based model and when to rely on the rule-based model. In a future version, our Modeling subsystem will have two branches, as shown in the figure below.
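A behavior tree can arbitrate between the two branches with a simple selector node: rule-based children fire only when their trigger condition holds (e.g. an on-demand red-light run is scheduled), and the learned CIL model is the fallback. The node and callback names below are a hypothetical sketch, not our actual implementation:

```python
class Selector:
    """Minimal behavior-tree selector: ticks children in priority order
    and returns the first non-None action."""
    def __init__(self, children):
        self.children = children

    def tick(self, world_state):
        for child in self.children:
            action = child(world_state)
            if action is not None:
                return action
        return None

# Hypothetical children: a rule-based behavior that fires only on demand,
# and the learned CIL policy as the fallback.
def forced_red_light(world_state):
    if world_state.get("force_red_light_run"):
        return "ignore_traffic_light"
    return None  # trigger not set; defer to lower-priority children

def cil_policy(world_state):
    return "follow_cil_waypoints"  # stand-in for the learned model's output

controller = Selector([forced_red_light, cil_policy])
```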

Below are some rule-based behaviors currently being incorporated into our system:

  • Distance from Lead Car
  • Lane Following – Lateral Distance
  • Running a Red Light
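As an example of the first rule, "distance from lead car" can be expressed as a constant-time-headway check: brake when the gap falls below a desired distance that grows with speed. The constants below are purely illustrative; the headway is the kind of tunable parameter whose distribution the probabilistic model would learn from real data:

```python
def lead_gap_throttle(ego_speed, gap, standstill_gap=10.0, time_headway=1.5):
    """Hypothetical 'distance from lead car' rule.

    ego_speed: ego vehicle speed in m/s
    gap: current distance to the lead vehicle in m
    Returns a (throttle, brake) pair, each in [0, 1].
    """
    # Desired gap = fixed standstill distance plus a speed-dependent headway.
    desired = standstill_gap + time_headway * ego_speed
    if gap < desired:
        # Too close: release throttle and brake proportionally to the deficit.
        return 0.0, min(1.0, (desired - gap) / desired)
    return 0.5, 0.0  # gap is comfortable: cruise throttle, no brake
```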