2pi Software Case Study – ACT Emergency Services Agency (ESA) – Using AWS (and Python/GDAL) to batch process slippy map tiles from aerial photographs
2pi Software teamed up with Canberra partner, Spatialised, to deliver a new image processing service to the Australian Capital Territory government Emergency Services Agency (ACT ESA). The client challenge was delivering web map tiles (often referred to as ‘slippy tiles’) to an air-gapped Emergency Management System (EMS) within an expanding and continuously changing metropolitan landscape.
To keep pace with growth in the ACT region, the territory government increased the frequency of aerial imagery collection to four times a year. However, ACT ESA had no efficient way of getting these images into its EMS. Because imagery and maps in the air-gapped system were not updated as quickly as suburbs were built, emergency call responders were being told to ‘follow the smoke’.
The solution also needed to be predominantly open source and highly scalable, hence the selection of the Amazon Web Services cloud platform.
The key objective was a system in which slippy map tile creation engines could be spawned in essentially unlimited numbers, each scaled according to the requirements of the image chopping task at hand, and each totally independent of the others. Complete and flexible control over memory usage and hardware capacity was critical.
The 2pi Software cloud team designed and delivered a scalable ‘image munching’ architecture based on AWS Batch and Elastic Compute Services. Large datasets were transitioned from local storage compute environments to AWS S3 storage buckets, and this immediately increased the number and range of compute power options available.
The team identified and solved a range of issues with S3 access, permission passing and long-running processes during solution development.
The solution enabled the ESA team to upload new imagery to a location on AWS S3; copying it into a processing bucket then automatically triggered a series of processing steps. A summary of the technical details follows:
- A Python process uses the Geospatial Data Abstraction Library (GDAL) and the Shapely package to create an index of aerial imagery stored in a temporary AWS S3 store.
- The resulting index is used to generate a series of GDAL virtual rasters (VRT files), again stored in S3. Each VRT covers 0.1 x 0.1 degrees, approximately 10 x 10 km.
- Each virtual raster is transformed from its native Coordinate Reference System (CRS) to the EMS target CRS, resulting in a stack of two virtual rasters: one listing the input images with no transformation, and another describing how to transform the input pixels.
- A compute engine is fired up for each generated mosaic (generally 0.1 x 0.1 degrees, or about 10 x 10 km) and each zoom level.
- PNG tiles are cut using the VRTs as input and delivered straight to an S3 location, ignoring any tiles with partial no-data regions.
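The indexing and fan-out steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the project's code: it uses plain bounding-box arithmetic in place of GDAL and Shapely, and the file names, footprints, zoom range and function names are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

CELL_DEG = 0.1  # mosaic edge length given in the text


def cells_for_bbox(min_lon, min_lat, max_lon, max_lat, cell=CELL_DEG):
    """List the (col, row) grid cells that a lon/lat bounding box overlaps."""
    def first(value):
        # round() absorbs floating-point noise before snapping to the grid
        return math.floor(round(value / cell, 9))

    def last(value, lowest):
        # A box ending exactly on a cell edge does not spill into the next cell
        return max(lowest, math.ceil(round(value / cell, 9)) - 1)

    c0, r0 = first(min_lon), first(min_lat)
    c1, r1 = last(max_lon, c0), last(max_lat, r0)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def build_index(footprints, cell=CELL_DEG):
    """Map each grid cell to the images whose footprint overlaps it."""
    index = defaultdict(list)
    for name, bbox in footprints.items():
        for key in cells_for_bbox(*bbox, cell=cell):
            index[key].append(name)
    return dict(index)


def plan_jobs(index, zoom_levels):
    """One independent job per (mosaic cell, zoom level) pair."""
    return list(product(sorted(index), zoom_levels))


# Hypothetical footprints: (min_lon, min_lat, max_lon, max_lat) in degrees
footprints = {
    "img_a.tif": (149.12, -35.33, 149.18, -35.31),
    "img_b.tif": (149.17, -35.32, 149.23, -35.28),
}
index = build_index(footprints)
jobs = plan_jobs(index, zoom_levels=range(12, 15))
```

Each entry in `index` corresponds to one candidate VRT, and each entry in `jobs` to one compute engine, which is what keeps the tile cutting tasks independent of one another.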
At low zoom levels, the computational resources required to generate a comparatively small tile are high. This is where the choice of the highly expandable AWS environment paid dividends, as the process involved reading every pixel of the raw imagery for a 10 x 10 km mosaic into RAM to create a single 256 x 256 pixel output tile.
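To put rough numbers on this, the sketch below estimates how much raw data sits behind one output tile. The case study does not state the ground resolution or band layout of the imagery, so the 10 cm pixel size and 3-band, 8-bit layout used here are assumptions chosen purely for illustration.

```python
def read_cost(extent_m=10_000, gsd_m=0.10, bands=3, bytes_per_sample=1, tile_px=256):
    """Estimate the raw data read to produce one tile covering a square mosaic.

    extent_m: mosaic edge length in metres (10 km, from the text)
    gsd_m:    ground sample distance in metres per pixel (ASSUMED, not from the text)
    Returns (pixels per mosaic edge, uncompressed bytes read, input/output byte ratio).
    """
    edge_px = round(extent_m / gsd_m)
    raw_bytes = edge_px * edge_px * bands * bytes_per_sample
    tile_bytes = tile_px * tile_px * bands * bytes_per_sample
    return edge_px, raw_bytes, raw_bytes / tile_bytes


edge_px, raw_bytes, ratio = read_cost()
print(f"{edge_px} px per edge, {raw_bytes / 1e9:.0f} GB read, {ratio:,.0f}:1 reduction")
```

Under these assumptions a single mosaic is 100,000 pixels on a side and about 30 GB uncompressed, which illustrates why per-job control over memory mattered.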
As well as AWS compute capabilities, the widely used GDAL toolkit (via the ‘gdal2tiles.py’ Python script) was called upon to cut the highest zoom level from raw imagery, then ‘interpolate up’ to the lower zoom levels. This classic ‘delayed compute’ pattern led to impressive image quality, much to the satisfaction of the ACT ESA team.
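The idea of cutting the most detailed level once and deriving coarser levels from it can be sketched as follows. This is an illustrative stand-in rather than the gdal2tiles.py implementation: it assumes XYZ tile numbering (y increasing southwards), single-band grey values held in nested lists, and simple 2 x 2 averaging, and the function names are hypothetical.

```python
def parent_tile(z, x, y):
    """Tile at the next coarser zoom level that contains tile (z, x, y)."""
    return z - 1, x // 2, y // 2


def child_tiles(z, x, y):
    """The four tiles at the next finer zoom level that cover tile (z, x, y)."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def overview_from_children(nw, ne, sw, se):
    """Build a parent tile by mosaicking four child grids and averaging 2 x 2 blocks.

    Each argument is a square list-of-lists of grey values, all the same size.
    """
    # Stitch the four children into one grid twice the tile size
    mosaic = [a + b for a, b in zip(nw, ne)] + [a + b for a, b in zip(sw, se)]
    size = len(mosaic) // 2
    return [
        [
            (mosaic[2 * r][2 * c] + mosaic[2 * r][2 * c + 1]
             + mosaic[2 * r + 1][2 * c] + mosaic[2 * r + 1][2 * c + 1]) / 4
            for c in range(size)
        ]
        for r in range(size)
    ]


# Toy 2 x 2 "tiles" standing in for 256 x 256 images
parent = overview_from_children(
    nw=[[0, 0], [0, 0]], ne=[[4, 4], [4, 4]],
    sw=[[8, 8], [8, 8]], se=[[12, 12], [12, 12]],
)
```

Because each coarser tile is built from four already-cut tiles, only the finest level ever needs to touch the raw imagery.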
TECHNICAL POINTS ABOUT THE AWS SERVICES USED IN THIS PROJECT
The following services from Amazon Web Services were utilised:
- Input S3 bucket containing the image files to be processed
- Working storage S3 bucket
- Output S3 bucket
- Lambda function to start things off
- CodeCommit repo containing software and configurations
- Batch Job Definitions, one for each Batch Job
- Batch Compute specifications, one for each Batch Job
- Batch Job Queue for the Batch Compute Environment
- Docker image for use by the Batch Compute Environment
- SNS entry to send completion notifications
The different jobs were submitted to AWS Batch, a service that uses a managed compute environment to run Docker images on a fleet of container instances. AWS Batch manages auto scaling and the distribution of jobs across the configured number of instances, and can even use Spot Instances for cost efficiency. The AWS Batch console provides a convenient way to monitor the progress of jobs.
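As a rough sketch of how a Lambda function and a Batch job queue can fit together, the Python below submits one job per mosaic and zoom level through the boto3 `submit_job` call, overriding memory per job. It is not the project's code: the queue and job definition names, environment variable names and memory rule are hypothetical placeholders.

```python
import re
from urllib.parse import unquote_plus


def job_name(prefix, cell, zoom):
    """Build a job name using only letters, digits, hyphens and underscores."""
    raw = f"{prefix}-c{cell[0]}-r{cell[1]}-z{zoom}"
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)[:128]


def submit_tile_jobs(batch, cells, zooms, queue, definition, memory_mib):
    """Submit one independent Batch job per (mosaic cell, zoom level) pair.

    batch is a boto3 Batch client (or any object with a compatible submit_job);
    memory_mib is a callable returning the memory to request for a zoom level.
    """
    job_ids = []
    for cell in cells:
        for zoom in zooms:
            response = batch.submit_job(
                jobName=job_name("tiles", cell, zoom),
                jobQueue=queue,
                jobDefinition=definition,
                containerOverrides={
                    # Hypothetical variable names read by the container
                    "environment": [
                        {"name": "CELL_COL", "value": str(cell[0])},
                        {"name": "CELL_ROW", "value": str(cell[1])},
                        {"name": "ZOOM", "value": str(zoom)},
                    ],
                    # Per-job memory override, in MiB
                    "resourceRequirements": [
                        {"type": "MEMORY", "value": str(memory_mib(zoom))},
                    ],
                },
            )
            job_ids.append(response["jobId"])
    return job_ids


def handler(event, context):
    """Hypothetical Lambda entry point for an S3 object-created event."""
    import boto3  # imported here so the helpers above run without AWS access

    s3 = event["Records"][0]["s3"]
    key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
    response = boto3.client("batch").submit_job(
        jobName="index-imagery",
        jobQueue="tile-queue",  # hypothetical queue name
        jobDefinition="index-imagery-job",  # hypothetical job definition
        containerOverrides={
            "environment": [
                {"name": "SOURCE", "value": f"s3://{s3['bucket']['name']}/{key}"},
            ],
        },
    )
    return response["jobId"]
```

Passing the client into `submit_tile_jobs` keeps the fan-out logic testable without AWS credentials, and a `memory_mib` rule that requests more memory at lower zoom levels would reflect the cost pattern described earlier.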
As an AWS Select Consulting Partner, 2pi Software has many experienced cloud systems architects who are skilled in delivering cost efficient, scalable and reliable infrastructures designed to handle just about anything. If your business or organisation is looking to migrate to the cloud, give us a call today!