Duke team devises method for detecting PM2.5 pollution using AI/machine learning, micro-satellite imagery and weather data
27 April 2020
Researchers from Duke University have devised a computer vision algorithm for estimating ground-level PM2.5 at high spatiotemporal resolution by directly processing micro-satellite imagery from Planet Labs: global-coverage, daily, near-real-time images covering areas significantly smaller than 1 x 1 km (e.g., 200 x 200 m). The results appear in the journal Atmospheric Environment.
From Zheng et al.
Such information could help researchers identify hidden hotspots of dangerous pollution, greatly improve studies of pollution's effects on human health, or potentially tease out the effects of unpredictable events on air quality, such as the outbreak of an airborne global pandemic.
We’ve used a new generation of micro-satellite images to estimate ground-level air pollution at the smallest spatial scale to date. We’ve been able to do it by developing a totally new approach that uses AI/machine learning to interpret data from surface images and existing ground stations.—Mike Bergin, professor of civil and environmental engineering at Duke
The specific air quality measurement that Bergin and his colleagues are interested in is the amount of PM2.5. Current best practices in remote sensing to estimate the amount of ground-level PM2.5 use satellites to measure how much sunlight is scattered back to space by ambient particulates over the entire atmospheric column.
This method, however, can suffer from regional uncertainties such as clouds and shiny surfaces, atmospheric mixing, and properties of the PM particles, and cannot make accurate estimates at scales smaller than about a square kilometer. While ground pollution monitoring stations can provide direct measurements, they suffer from their own host of drawbacks and are only sparsely located around the world.
Ground stations are expensive to build and maintain, so even large cities aren’t likely to have more than a handful of them. Plus they’re almost always put in areas away from traffic and other large local sources, so while they might give a general idea of the amount of PM2.5 in the air, they don’t come anywhere near giving a true distribution for the people living in different areas throughout that city.—Mike Bergin
In their search for a better method, Bergin and his doctoral student Tongshu Zheng turned to Planet, an American company that uses micro-satellites to take pictures of the entire Earth's surface every single day at a resolution of three meters per pixel. The team was able to get daily snapshots of Beijing over the past three years.
With help from David Carlson, an assistant professor of civil and environmental engineering at Duke and an expert in machine learning, Bergin and Zheng applied a convolutional neural network with a random forest algorithm to the image set, combined with meteorological data from Beijing’s weather station.
A detailed map of pollution levels in Beijing and its surrounding areas, produced by the new machine learning algorithm from satellite images and weather data. (Left) land features; (right) the same area color-coded by the amount of PM2.5 pollution.
A random forest is a standard machine learning algorithm that combines many different decision trees to make a prediction. Each tree splits on variables such as wind, relative humidity and temperature, and the forest averages the trees' outputs to arrive at an estimate of the PM2.5 concentration.
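The idea can be sketched with scikit-learn's off-the-shelf random forest regressor. This is a minimal illustration on synthetic data, not the authors' code; the feature columns (wind, humidity, temperature) and the relationship to PM2.5 are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: rows of [wind speed, relative humidity, temperature]
# paired with PM2.5 readings from a ground monitoring station.
X_weather = rng.random((500, 3))
y_pm25 = 50 + 100 * X_weather[:, 1] + 10 * rng.standard_normal(500)

# An ensemble of decision trees; each tree predicts and the forest averages.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_weather, y_pm25)

# PM2.5 estimate for one new set of weather conditions.
estimate = forest.predict(X_weather[:1])
```

Because the forest averages many trees trained on random subsets of the data, it is robust to noisy inputs like day-to-day weather fluctuations.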
However, random forest algorithms don’t deal well with images. That’s where the convolutional neural networks come in. These algorithms look for common features in images such as lines and bumps and begin grouping them together. As the algorithm “zooms out,” it continues to lump similar groupings together, combining basic shapes into common features such as buildings and highways. Eventually the algorithm comes up with a summary of the image as a list of its most common features, and these get thrown into the random forest along with the weather data.
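The described pipeline (summarize each image as a feature vector, then feed those features plus weather into a random forest) can be sketched as follows. The `toy_cnn_features` function here is a deliberately simplified stand-in for the paper's convolutional network, using a gradient filter and pooling to mimic the idea of extracting and summarizing low-level image features; all data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def toy_cnn_features(img):
    """Toy stand-in for CNN feature extraction: a horizontal-gradient filter
    (edge detection) followed by pooling into a short summary vector."""
    edges = np.abs(img[:, 2:] - img[:, :-2])  # simple edge response
    # "Zoom out" by pooling: mean/max edge strength plus overall brightness stats.
    return np.array([edges.mean(), edges.max(), img.mean(), img.std()])

# 8 hypothetical 64x64 satellite patches with matched weather and PM2.5 readings.
images = rng.random((8, 64, 64))
weather = rng.random((8, 3))      # wind, humidity, temperature
y_pm25 = rng.random(8) * 150

# Summarize each image, then combine image features with weather data,
# as the article describes, and train the random forest on the result.
features = np.array([toy_cnn_features(img) for img in images])
X = np.hstack([features, weather])
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_pm25)
```

In the actual study the image summary comes from a trained convolutional network with far richer features; the structure of the pipeline, image features concatenated with weather variables feeding a random forest, is the point of the sketch.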
High-pollution images are definitely foggier and blurrier than normal images, but the human eye can’t really tell the exact pollution levels from those details. But the algorithm can pick out these differences in both the low-level and high-level features—edges are blurrier and shapes are obscured more—and precisely turn them into air quality estimates.—David Carlson
The convolutional neural network doesn’t give us as good of a prediction as we would like with the images alone. But when you put those results into a random forest with weather data, the results are as good as anything else currently available, if not better.—Tongshu Zheng
In the study, the researchers used 10,400 images to train their model to predict local levels of PM2.5 using nothing but satellite images and weather conditions. They tested their resulting model on another 2,622 images to see how well it could predict PM2.5.
They show that, on average, their model is accurate to within 24% of actual PM2.5 levels measured at reference stations, which is at the high end of the spectrum for these types of models, while also having a much higher spatial resolution. While most current standard practices can predict levels down to one square kilometer (1 million square meters), the new method is accurate down to 40,000 square meters—about the size of eight football fields placed side by side.
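A "within 24%" figure of this kind corresponds to an average relative error between predicted and station-measured PM2.5 over held-out test images. A generic way to compute such a metric (the paper's exact error definition may differ, and these numbers are hypothetical) is the mean absolute percentage error:

```python
import numpy as np

measured = np.array([35.0, 80.0, 120.0, 60.0])   # hypothetical station readings (µg/m³)
predicted = np.array([30.0, 95.0, 110.0, 70.0])  # hypothetical model estimates

# Mean absolute percentage error: average relative deviation from measurements.
mape = np.mean(np.abs(predicted - measured) / measured) * 100
```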
With that level of specificity and accuracy, Bergin believes their method will open up a wide range of new uses for such models.
This research was supported in part by the Research Initiative for Real-time River Water and Air Quality Monitoring program funded by the Department of Science and Technology, Government of India and Intel and a Duke Energy Initiative Energy Data Analytics PhD Fellowship.
Tongshu Zheng, Michael H. Bergin, Shijia Hu, Joshua Miller, and David E. Carlson (2020) “Estimating Ground-Level PM2.5 Using Micro-Satellite Images by a Convolutional Neural Network and Random Forest Approach,” Atmospheric Environment, doi: 10.1016/j.atmosenv.2020.117451