Development of an object detection system for autonomous vehicles using LiDAR technology
Author: Peter Fekonja
Mentor: Matej Rojc
Degree: Level 2 study program in Electrical Engineering, Electronics course
Date: February 2021
DKUM: PETER FEKONJA
Abstract
In this master’s thesis, we present the use of LiDAR systems and deep learning in the context of autonomous vehicles. The thesis consists of a theoretical and an experimental part. In the theoretical part, we present state-of-the-art solutions in the development of LiDAR systems, the most commonly used deep learning approaches, and the methods used to process LiDAR point clouds with neural networks. We also present the sensor systems used on the current generation of autonomous vehicles, datasets for training neural networks for autonomous vehicles, and the current generation of low-cost LiDAR sensors. In the experimental part, we give a detailed presentation of the capabilities of the Livox Mid-40 LiDAR system and its use in our solution for object detection in traffic. We also describe in detail the development of our neural network classifier, the development of our own approach to object localization, and the comparison of our solutions with existing approaches. Our object localization approach achieved similar or better results than existing methods, but, combined with our classifier, it performed worse than current end-to-end neural network models that use transfer learning.
Today, the problem of environmental perception is solved using various sensors, such as video cameras, LiDAR sensors, radars and ultrasonic sensors. An autonomous vehicle can use one or more of these sensors, operating separately or together. To enable an autonomous vehicle to handle its surroundings reliably, the captured data must first be processed and understood, so that decisions can then be made based on them. Environmental data is processed using conventional approaches and, increasingly, using deep learning. Deep learning has already proven to be a successful and innovative approach in several fields, solving problems that in the past seemed too complex for computer systems (such as chess). In this master’s thesis, we are particularly interested in the operation and use of LiDAR technology, specifically the Livox Mid-40 sensor, as well as in deep learning processes and techniques applied to such sensor data. The key advantage of LiDAR sensors over cameras is that they are active sensors: they emit their own light, so they are not limited by changes in ambient light and can operate equally well by day and by night. In addition, their output is a 3-D representation of the observed environment. The additional dimension of depth adds significant value for awareness and understanding of the environment. Compared to radar and ultrasonic sensors, LiDAR has the important advantage of capturing the shape of the observed object, not only its presence in the environment.
LiDAR is a method for measuring distance by illuminating a target with a laser beam and measuring the reflected light with a sensor. Because the speed of light in vacuum is constant, measuring the round-trip time of the light with a very accurate timer lets us determine the distance from the sensor to the target with very high accuracy. By using a large number of lasers and sensors, or by redirecting a single beam, we can measure distances to a large number of points on surfaces in space in a very short time. If these points are properly accumulated, they form a digital 3-D representation of the target. LiDAR is widely used for high-resolution surface mapping, 3-D models of historic buildings and sites, measurements in construction and geodesy, and, more recently, in the automotive industry and even in mobile devices. The development of low-cost LiDAR devices is in full swing, as these sensors are likely to be used on virtually all autonomous vehicles in the near future. Their usually very high price is one of the major obstacles to bringing autonomous and semi-autonomous vehicles to the market and into real environments. It is therefore not uncommon for the sensor set on a prototype autonomous vehicle to be worth ten times more than the vehicle on which it is mounted.
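The distance measurement described above follows the time-of-flight relation d = c·Δt/2, where Δt is the measured round-trip time and the factor 1/2 accounts for the light travelling to the target and back. The short Python sketch below (the function names are ours, chosen for illustration) converts between the two quantities and shows why a very accurate timer is needed: a target at 5 m returns the pulse after roughly 33 ns, and resolving a 2 cm range difference requires timing of roughly 133 ps. It assumes propagation at the vacuum speed of light and ignores the refractive index of air and any sensor-specific calibration.

```python
# Time-of-flight ranging: the pulse travels to the target and back,
# so the one-way distance is half of the total path c * t.
SPEED_OF_LIGHT = 299_792_458.0  # m/s, speed of light in vacuum (exact by definition)


def tof_distance(round_trip_time_s: float) -> float:
    """Distance to the target in metres for a measured round-trip time in seconds."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0


def round_trip_time(distance_m: float) -> float:
    """Round-trip time in seconds for a target at the given distance in metres.

    Applied to a range difference, it also gives the timer resolution needed
    to resolve that difference.
    """
    return 2.0 * distance_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    for d in (5, 40, 60, 90, 130):  # test distances mentioned in the thesis
        print(f"{d:>4} m -> round trip {round_trip_time(d) * 1e9:8.2f} ns")
    print(f"2 cm range resolution needs ~{round_trip_time(0.02) * 1e12:.0f} ps timing")
```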
In this master’s thesis, we decided to use the Livox Mid-40 automotive LiDAR system, which came on the market in 2019 at a price of 600 USD. This is significantly lower than the price of established automotive LiDAR systems, which are usually a few tens of times more expensive. The Livox Mid-40 was developed by Livox primarily for the automotive industry. In this chapter, we present the practical value of such a low-cost LiDAR system for autonomous vehicles in real environments. The field of view of the sensor is cone-shaped, with an angle of 38.4 degrees. In this respect, it is similar to early terrestrial laser scanners, known as window scanners. Details about the actual scanning mechanism are very sparse. The manufacturer states that the LiDAR has no moving electronic components, but this should not be taken to mean that it is a semiconductor (solid-state) LiDAR device. It is also stated that the scanner uses a non-repeating pattern, which increases the density of the captured point cloud over time. This pattern and its scanning speed are fixed and cannot be changed or adjusted by the user. The manufacturer does not disclose further details about the construction of the device, so the design described in [40], in which the beam is steered by two rotating prisms, is only a conjecture. Depending on the ratio of the rotational speeds of the two prisms, several different patterns can be achieved, such as stable, repeating curves or non-repeating, space-filling curves.
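To illustrate how two rotating prisms (a Risley prism pair) can produce a non-repeating pattern whose coverage grows with integration time, the Python sketch below uses a first-order (thin-prism, small-angle) model in which each prism deflects the beam by a fixed angle and the two deflections add as 2-D vectors. This is only an illustration of the mechanism conjectured in [40], not the Livox design: the deflection angles (9.6° each, chosen so the maximum deflection is 19.2°, i.e. a 38.4° cone), the rotation speeds, the irrational speed ratio and the 100,000 samples per second are all hypothetical values. The `coverage` helper is a crude density measure of our own: the fraction of grid cells over the field of view that contain at least one sample.

```python
import numpy as np

# Hypothetical parameters: each prism deflects the beam by 9.6 degrees, so the
# maximum combined deflection is 19.2 degrees, i.e. a 38.4-degree cone.
DELTA1_DEG = 9.6
DELTA2_DEG = 9.6
FOV_DEG = 38.4


def risley_pattern(t, omega1, omega2, delta1=DELTA1_DEG, delta2=DELTA2_DEG):
    """First-order (thin-prism, small-angle) model of a two-prism Risley scanner.

    Each prism deflects the beam by a fixed angle in the direction of its current
    rotation angle; to first order the two deflections add as 2-D vectors.
    t: sample times in seconds; omega1, omega2: angular speeds in rad/s.
    Returns an (N, 2) array of beam directions (x, y) in degrees.
    """
    x = delta1 * np.cos(omega1 * t) + delta2 * np.cos(omega2 * t)
    y = delta1 * np.sin(omega1 * t) + delta2 * np.sin(omega2 * t)
    return np.column_stack([x, y])


def coverage(directions, fov_deg=FOV_DEG, cells=50):
    """Fraction of cells in a cells x cells grid over the square bounding the
    field of view that contain at least one sample."""
    half = fov_deg / 2.0
    hist, _, _ = np.histogram2d(directions[:, 0], directions[:, 1], bins=cells,
                                range=[[-half, half], [-half, half]])
    return np.count_nonzero(hist) / hist.size


if __name__ == "__main__":
    omega1 = 2 * np.pi * 50.0                   # assumed 50 revolutions per second
    omega2 = -omega1 * (1 + np.sqrt(5)) / 2     # counter-rotating, irrational speed ratio
    sample_period = 1e-5                        # assumed 100,000 samples per second
    for duration in (0.1, 0.5, 1.0):
        t = np.arange(0.0, duration, sample_period)
        pts = risley_pattern(t, omega1, omega2)
        print(f"{duration:.1f} s: coverage {coverage(pts):.2f}")
```

With an irrational speed ratio the rosette petals never retrace each other, so longer integration fills more of the cone; with a simple rational ratio (for example `omega2 = -2 * omega1`) the same model traces a closed, repeating curve.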
In every point cloud captured by the Livox Mid-40 scanner, we observed structured noise propagating concentrically from the center of the scan. The noise resembles ripples on water and clearly affects the accuracy and precision of the angle and distance measurements. The effect is most noticeable on flat surfaces. The Livox Mid-40 also had trouble distinguishing the brightest areas from the white background of the wall; by comparison, the Leica RTC360 used in [40], which has a 1550 nm source, does not have these problems. The ripple effect is again clearly visible in the scan of the test target, where it appears as circular areas of lower reflectivity. Although the scanned object is flat and the targets are printed on it, light-colored areas appear farther from the Livox Mid-40 and dark areas appear closer to it. This effect and its causes are well known and are usually compensated in most commercial LiDAR systems and surveying instruments. Because no equivalent guidelines exist for LiDAR sensors, [40] also proposes quantitative tests based on the guidelines for optical 3-D measuring systems, specifically the VDI/VDE 2634 guideline [40]. We therefore used the same methods and obtained comparable results or reached similar conclusions. We performed the following tests: measuring a length from a distance of exactly 5 meters, measuring distance and accuracy at 40, 60, 90 and 130 meters, and testing the flatness of the captured data.
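The flatness test can be sketched as fitting a plane to the points captured on a flat surface and examining the signed point-to-plane distances. The Python sketch below fits the plane by total least squares (via SVD) and reports the range of the distances (max minus min), which is how we understand the flatness measurement error in VDI/VDE 2634, together with their RMS as an extra noise indicator. It is a minimal illustration, not the exact procedure of [40] or of the guideline, which also prescribes test-object sizes and positions; the function names and the synthetic wall in the usage part (5 m away, with an assumed 2 cm range noise) are hypothetical.

```python
import numpy as np


def fit_plane(points):
    """Total-least-squares plane through an (N, 3) array of points.

    Returns (centroid, unit_normal). The normal is the right-singular vector of
    the centred points that belongs to the smallest singular value, i.e. the
    direction of least variance.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]


def flatness_error(points):
    """Signed point-to-plane distances summarised as (range, RMS), in input units."""
    centroid, normal = fit_plane(points)
    d = (points - centroid) @ normal  # signed distance of each point to the plane
    return float(d.max() - d.min()), float(np.sqrt(np.mean(d ** 2)))


if __name__ == "__main__":
    # Synthetic example: a 1 m x 1 m wall 5 m in front of the sensor with
    # hypothetical 2 cm (1 sigma) range noise along the viewing axis x.
    rng = np.random.default_rng(0)
    n_points = 5000
    y, z = rng.uniform(-0.5, 0.5, size=(2, n_points))
    x = 5.0 + rng.normal(0.0, 0.02, size=n_points)
    flat_range, rms = flatness_error(np.column_stack([x, y, z]))
    print(f"range {flat_range * 1000:.1f} mm, RMS {rms * 1000:.1f} mm")
```

Because the range is driven by the two most extreme points, it is very sensitive to outliers such as the ripple noise described above, whereas the RMS reflects the overall noise level.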
Machine learning (ML) is the field of computer algorithms whose models improve automatically through experience. It is a subfield of artificial intelligence. Machine learning algorithms build mathematical models from training data, which can then be used to make predictions and decisions without being explicitly programmed to do so. Machine learning is used in a wide range of applications and services, such as e-mail filtering or machine vision, where it is very difficult or even impossible to rely on conventional approaches alone.
Transfer learning of the EfficientDet D1 neural network was performed on an Intel Core i7-8086K processor and an NVIDIA RTX 2080 Ti graphics card. Inference was performed on an Intel Core i7-4600U laptop processor without a dedicated graphics card. The training dataset was built from individual frames of four traffic recordings captured with the third version of the LiDAR system. We resized the images from 1000×400 pixels to 640×640 pixels using a Python script. The frames were annotated manually as presented in Chapter 6.2, and the resulting images were not processed further apart from resizing, so the network was trained on data that also included the roadway and other elements. The dataset, which contained 592 annotated images from the Drive 3_1, 3_3 and 3_4 recordings, was split into a training set and a test set in a ratio of 90% to 10%, i.e. 533 images for training and 59 images for testing. We trained the network for 500,000 steps, which took 62 hours. The training process is shown in Graph 6.1; the knee visible at step 300,000 appears because steps 300,001–500,000 were performed later, as a continuation of the training.
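The dataset preparation described above (resizing 1000×400 frames to 640×640 and splitting 592 images 90:10) can be sketched in Python with NumPy. The thesis does not state which library or resampling method the resizing script used, in what format the boxes were stored, or how the split was drawn; nearest-neighbour resampling, boxes given as [xmin, ymin, xmax, ymax] in pixels, and a seeded random split are our assumptions, and the function names and frame names are hypothetical.

```python
import numpy as np

SRC_SIZE = (1000, 400)  # (width, height) of the original frames, from the text
DST_SIZE = (640, 640)   # (width, height) of the network input, from the text


def resize_nearest(img, size=DST_SIZE):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to size=(width, height).

    The aspect ratio is not preserved: 1000x400 -> 640x640 shrinks x by 0.64
    and stretches y by 1.6.
    """
    out_w, out_h = size
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols]


def scale_boxes(boxes, src=SRC_SIZE, dst=DST_SIZE):
    """Rescale [xmin, ymin, xmax, ymax] pixel boxes from the src to the dst image size."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    return np.asarray(boxes, dtype=float) * np.array([sx, sy, sx, sy])


def split_dataset(items, train_frac=0.9, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    order = np.random.default_rng(seed).permutation(len(items))
    n_train = round(train_frac * len(items))
    return [items[i] for i in order[:n_train]], [items[i] for i in order[n_train:]]


if __name__ == "__main__":
    frame = np.zeros((400, 1000), dtype=np.uint8)  # placeholder frame
    print(resize_nearest(frame).shape)             # (640, 640)
    print(scale_boxes([[100, 50, 500, 200]]))
    train, test = split_dataset([f"frame_{i:04d}" for i in range(592)])
    print(len(train), len(test))                   # 533 59
```

Because this resize does not preserve the aspect ratio, any boxes annotated on the original frames have to be scaled separately in x and y, as `scale_boxes` does; letterboxing (padding to a square before resizing) is an alternative that keeps object proportions.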