A Big Data Approach to Generate Training Data for Automatic Ship Detection


by Lars Henrik Berg-Jensen and Adrian Tofting



A fast and scalable approach to combine global satellite images with ship navigational messages

In this project, we have developed a data processing pipeline that combines multiple sources of data to automate the process of ship detection in satellite images. The central concept is based on integrating Sentinel-2 Optical Multi-Spectral Imagery with ship navigational messages provided by the Automatic Identification System (AIS). The successful integration made it possible to automatically generate a training dataset consisting of images labelled with ship positions. A future goal is for this training dataset to be used in supervised machine learning to train a neural network to recognise ship features in the images automatically. Our vision is for this to accompany AIS in applications ensuring safety in the marine sector.

Our data processing pipeline includes all aspects of data analytics: collecting, preprocessing, cleansing, storing, filtering, combining, analysing, and visualising the data. We have designed the system to be modular and highly scalable, so that it can be developed further to support real-time analysis of any aerial imagery.

The solution we have developed can be divided into three main parts:

Image Selection Optimisation is a proposed approach to selecting satellite images, from anywhere on the globe, that have a high probability of containing ships. This is done by processing global AIS data within an arbitrary time interval. A density analysis of 400 million global ship navigational messages completed with a total execution time below 3 minutes. The optimisation enabled by such analyses will save a substantial amount of time when generating the training dataset.
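To illustrate the idea of a density analysis, the following minimal sketch counts AIS positions per latitude/longitude grid cell and ranks the busiest cells. It is not the pipeline's actual implementation: the function names `density_grid` and `top_cells`, the one-degree cell size, and the in-memory counting are all assumptions made for illustration, and an analysis of hundreds of millions of messages would need a distributed or database-backed approach.

```python
import math
from collections import Counter


def density_grid(positions, cell_deg=1.0):
    """Count AIS messages per grid cell.

    positions: iterable of (lat, lon) pairs in decimal degrees.
    cell_deg:  cell size in degrees (hypothetical default of 1 degree).
    Returns a Counter keyed by (row, col) cell index.
    """
    counts = Counter()
    for lat, lon in positions:
        # Shift to non-negative ranges so cell indices start at 0.
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        counts[(row, col)] += 1
    return counts


def top_cells(counts, n=3):
    """Return the n cells with the most messages, busiest first."""
    return counts.most_common(n)
```

Cells with high counts are candidates for image selection, since an image covering such a cell is more likely to contain ships.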

Ship Position Estimation computes the coordinates for every ship within the satellite image. This includes complex data integrations, both spatially and temporally. It relies on having access to a complete dataset from AIS around the time the image was sensed. It also performs significant corrections, as the provided timestamp was found to be highly inaccurate. The execution time for an arbitrary image is below two seconds.
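One simple way to estimate where a ship was when the image was sensed is to interpolate linearly between the two AIS messages that bracket the sensing time. The sketch below assumes this technique for illustration only; the project's actual estimation and timestamp correction methods are not described here. The function name `interpolate_position` and the `(time, lat, lon)` tuple layout are hypothetical.

```python
def interpolate_position(msg_before, msg_after, t_image):
    """Linearly interpolate a ship position at time t_image.

    msg_before, msg_after: (time_seconds, lat, lon) tuples from the same
    ship, with msg_before no later than msg_after (hypothetical layout).
    t_image: image sensing time in the same time unit.
    Does not handle tracks that cross the antimeridian (+/-180 degrees).
    """
    t0, lat0, lon0 = msg_before
    t1, lat1, lon1 = msg_after
    if not t0 <= t_image <= t1:
        raise ValueError("t_image must lie between the two messages")
    if t1 == t0:
        # Both messages share a timestamp; nothing to interpolate.
        return lat0, lon0
    w = (t_image - t0) / (t1 - t0)  # fraction of the interval elapsed
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)
```

Linear interpolation in degrees is a reasonable approximation only over short time gaps; for longer gaps, the reported speed and course could be used instead.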

Training Dataset Generation extracts regions from the image around each estimated ship position, resulting in smaller training images. In future work, this step will be extended to augment the training dataset by varying the region extraction using appropriate transformations.
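The region extraction step can be sketched as a fixed-size crop centred on the ship's pixel position and shifted where necessary to stay inside the image. This is an illustrative sketch under stated assumptions: the image is represented as a list of pixel rows, the ship position has already been converted to pixel coordinates, and the names `crop_window` and `extract_region` are hypothetical.

```python
def crop_window(height, width, row, col, size):
    """Return (r0, r1, c0, c1) bounds of a size x size window.

    The window is centred on (row, col) where possible and shifted
    inwards so that it never extends beyond the image.
    """
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    half = size // 2
    # Clamp the top-left corner to the valid range [0, dimension - size].
    r0 = min(max(row - half, 0), height - size)
    c0 = min(max(col - half, 0), width - size)
    return r0, r0 + size, c0, c0 + size


def extract_region(image, row, col, size):
    """Crop a size x size training image around pixel (row, col).

    image: list of equally long rows of pixel values.
    """
    r0, r1, c0, c1 = crop_window(len(image), len(image[0]), row, col, size)
    return [line[c0:c1] for line in image[r0:r1]]
```

Varying `row` and `col` by a small offset before cropping would be one simple way to produce additional, slightly shifted training images from the same ship position.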