Polynomial Regression
The results from the Social LSTM network prompted us to break the problem down to its most basic level and reconsider what a predictor is supposed to do. In essence, trajectory prediction comes down to analyzing a sequence to find a pattern and then using that pattern to extrapolate to future time steps. In this vein, we took a more statistical approach: we fit a polynomial regression to the positions in the observation frames and use the fitted curve to predict the trajectory in future frames.
Polynomial regression still requires tuning a few parameters, such as the degree of the polynomial and the length of the observation window. The preliminary results were promising, and the simplicity of the model gave us a more intuitive sense of how it could be improved.
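As a minimal sketch of the idea described above (not the project's actual code), the fit can be written with NumPy by treating x and y as independent polynomials of the frame index. The function name `predict_trajectory`, the default degree of 2, and the choice of frame index as the time variable are assumptions made for illustration. The default horizon of 12 frames corresponds to 1.2 seconds at 10 Hz.

```python
import numpy as np

def predict_trajectory(observed_xy, degree=2, horizon=12):
    """Fit x(t) and y(t) with polynomials and extrapolate `horizon` frames.

    observed_xy: sequence of (x, y) positions, one per observed frame.
    degree:      polynomial degree (a tunable parameter; 2 is an assumption).
    horizon:     number of future frames to predict.
    """
    observed_xy = np.asarray(observed_xy, dtype=float)
    n_obs = len(observed_xy)
    t_obs = np.arange(n_obs)                      # frame indices of the observations
    t_future = np.arange(n_obs, n_obs + horizon)  # frame indices to extrapolate to

    # One independent least-squares fit per coordinate.
    coeff_x = np.polyfit(t_obs, observed_xy[:, 0], degree)
    coeff_y = np.polyfit(t_obs, observed_xy[:, 1], degree)

    return np.column_stack([
        np.polyval(coeff_x, t_future),
        np.polyval(coeff_y, t_future),
    ])

# Usage: a pedestrian walking in a straight line at constant speed.
observed = [(0.1 * t, 0.05 * t) for t in range(8)]
predicted = predict_trajectory(observed, degree=1, horizon=12)
```

The length of `observed` plays the role of the observation length mentioned above. A shorter window reacts faster to turns, at the cost of a noisier fit.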
We tried several variants of this approach, such as a weighted average of polynomials of different orders, to handle a more diverse range of test cases. This gave a significant improvement, particularly in cases where the pedestrian changes direction. We also attempted to use velocity information to better anticipate a change in the pedestrian's direction. However, this was only slightly better than the previous model.
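The weighted-average variant could look roughly like the sketch below. The specific degrees, the equal weights, and the function name `blended_prediction` are illustrative assumptions, since the text does not state which orders or weights were used.

```python
import numpy as np

def blended_prediction(observed_xy, degrees=(1, 2), weights=(0.5, 0.5), horizon=12):
    """Weighted average of polynomial extrapolations of different degrees.

    degrees and weights are hypothetical defaults; weights are normalized
    so that they sum to 1.
    """
    observed_xy = np.asarray(observed_xy, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    n_obs = len(observed_xy)
    t_obs = np.arange(n_obs)
    t_future = np.arange(n_obs, n_obs + horizon)

    blended = np.zeros((horizon, 2))
    for degree, weight in zip(degrees, weights):
        for axis in range(2):  # 0 -> x, 1 -> y
            coeffs = np.polyfit(t_obs, observed_xy[:, axis], degree)
            blended[:, axis] += weight * np.polyval(coeffs, t_future)
    return blended
```

The intuition is that a low-order fit is stable on straight paths, while a higher-order fit follows curvature more closely. Blending the two trades off between these behaviours.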
An added advantage of polynomial regression is the ease and speed of real-time implementation. For initial unit testing of the model's accuracy, we had a pedestrian walk in front of the LiDAR along a number of paths of different curvatures. The video below shows very promising results based on the calculated error. The green line shows the actual trajectory of the pedestrian, and the red line shows the predicted trajectory 1.2 seconds into the future at a frame rate of 10 Hz (12 frames).
We also experimented with a deep-learning approach, hoping to make better use of the temporal information in the sequence. However, deep-learning approaches can be very difficult to transfer from a dataset setting to the real world, and the regression-based model already met our performance requirements, so we stuck with the more classic method explained above. We did, however, attempt to implement the deep-learning approach, as described in the Social LSTM section below.
Social LSTM
Deep Learning Framework
Selecting the right framework is important, since both the complexity of implementation and the speed of computation are crucial to a successful project. We chose PyTorch for our system because its dynamic computation graph makes it easier to debug.
Datasets
The algorithm was initially trained and tested on the ETH and UCY datasets, which contain manually annotated videos of pedestrians crossing a road or walking on the pavement. We also used part of the Stanford Drone Dataset to increase the variety of data on which the model is trained.
Algorithm
Social LSTM is, in essence, an LSTM with an additional social pooling layer that accounts for the interactions between pedestrians. The input to the prediction module is a list of pedestrian coordinates in a 2D plane, updated at the frame rate of the sensor. The system outputs the predicted trajectory of each pedestrian for a predefined length of time.
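To illustrate what the social pooling layer computes, the following is a simplified NumPy sketch based on the published description of Social LSTM: for each pedestrian, the hidden states of nearby pedestrians are summed into the cells of a small grid centred on that pedestrian. It is not the implementation used in this project, which was in PyTorch. The function name `social_pooling`, the neighbourhood size, and the grid size are assumptions chosen for illustration.

```python
import numpy as np

def social_pooling(positions, hidden, neighborhood=2.0, grid_size=2):
    """Pool neighbours' hidden states into a grid around each pedestrian.

    positions:    (N, 2) array of pedestrian coordinates.
    hidden:       (N, D) array of per-pedestrian LSTM hidden states.
    neighborhood: side length of the square region around each pedestrian
                  (hypothetical value).
    grid_size:    number of cells per side of the pooling grid
                  (hypothetical value).
    Returns an (N, grid_size * grid_size * D) array.
    """
    positions = np.asarray(positions, dtype=float)
    hidden = np.asarray(hidden, dtype=float)
    n_peds, hidden_dim = hidden.shape
    cell = neighborhood / grid_size

    pooled = np.zeros((n_peds, grid_size, grid_size, hidden_dim))
    for i in range(n_peds):
        for j in range(n_peds):
            if i == j:
                continue  # a pedestrian is not its own neighbour
            # Offset of neighbour j relative to i, shifted so that the
            # corner of the grid sits at the origin.
            offset = positions[j] - positions[i] + neighborhood / 2.0
            col, row = np.floor(offset / cell).astype(int)
            if 0 <= row < grid_size and 0 <= col < grid_size:
                pooled[i, row, col] += hidden[j]
    return pooled.reshape(n_peds, -1)
```

In the full model, this pooled tensor is embedded and fed into each pedestrian's LSTM together with the embedded position at every time step, which is how information about neighbours influences the predicted trajectory.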
Results
Based on the preliminary results we obtained, the neural network has not performed well. We attempted to improve the model by tuning some of the hyperparameters, following a discussion with people familiar with the algorithm. The results improved only minimally, which prompted us to delve deeper into the model implementation in order to debug it. We also used the Stanford Drone Dataset, as mentioned previously, to help the model generalize better. The results of this were also unsatisfactory.