How is air quality data validated before publishing on the AirVisual platform?

The AirVisual platform aims to centralize and aggregate as much of the currently available air quality information as possible in one place, in order to provide the most comprehensive overview of global air quality.

Data sources reported through the AirVisual platform include sensor data from governmental monitoring stations (typically considered high-cost "reference monitors"), as well as from low-cost sensors such as public AirVisual Pro stations and PurpleAir sensors.

All data published through the AirVisual platform is subject to data validation, and the process differs between these two sources of sensor data.

AirVisual's data validation system is cloud-based and driven by machine learning; all measurements pass through this system before being published on our platform.

Governmental "reference" sensor data

Although high-cost governmental sensors are typically considered the most accurate and reliable source of measured air quality data, these sensors sometimes also report anomalies or inaccurate data. Reasons for this may include temporary periods of maintenance or defects, or even temporary hyperlocal emission sources near the sensor.

Accordingly, all government sensor data passes through a data validation system before publishing. For example, the cloud-based system identifies potential anomalies reported by a station (such as a sudden spike in PM2.5 from 10 µg/m³ to 100 µg/m³ from one hour to the next) and cross-checks them against other nearby measurements to verify whether the spike is representative or an anomaly. The validation process also cross-checks against historical patterns and other parameters such as weather conditions. The value is then published or discounted accordingly.
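To make the spike-and-neighbour check concrete, here is a minimal sketch in Python. It is not AirVisual's actual algorithm: the function name validate_reading and the thresholds spike_ratio and tolerance are hypothetical, and the historical-pattern and weather cross-checks mentioned above are left out for brevity.

```python
from statistics import median


def validate_reading(current, previous, neighbors, spike_ratio=3.0, tolerance=0.5):
    """Decide whether an hourly PM2.5 value (µg/m³) should be published.

    spike_ratio and tolerance are hypothetical thresholds chosen for illustration.
    """
    # A value that has not jumped sharply since the previous hour is accepted as-is.
    if previous > 0 and current / previous < spike_ratio:
        return True
    # Without nearby stations there is nothing to cross-check against, so keep it.
    if not neighbors:
        return True
    # A spike counts as representative if it lies within `tolerance` (relative)
    # of the median reading reported by nearby stations for the same hour.
    local = median(neighbors)
    return abs(current - local) <= tolerance * local
```

With these thresholds, a jump from 10 to 100 µg/m³ is accepted when neighbours read [95, 105, 98] (a genuine regional episode) and discounted when they read [12, 9, 11] (a likely local anomaly). Using the median of neighbours keeps a single faulty nearby station from dominating the comparison.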

Low-cost sensors

Measurements from low-cost sensors also go through a data calibration and correction process, in addition to the validation process described above, which identifies and discounts anomalous readings.
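The difference between the two data paths can be sketched as a small routing function. This is an illustrative assumption, not AirVisual's pipeline: the Measurement type, the process function, and the choice to calibrate before validating are all hypothetical, and the calibrate and is_valid callables stand in for the cloud-based models.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Measurement:
    station_id: str
    pm25: float      # µg/m³
    low_cost: bool   # True for e.g. AirVisual Pro or PurpleAir units


def process(
    m: Measurement,
    calibrate: Callable[[Measurement], float],
    is_valid: Callable[[Measurement], bool],
) -> Optional[Measurement]:
    """Return the measurement to publish, or None if it is discounted."""
    if m.low_cost:
        # Only low-cost readings get the extra calibration/correction step
        # (the ordering relative to validation is an assumption here).
        m = replace(m, pm25=calibrate(m))
    # Every reading, reference or low-cost, goes through validation.
    return m if is_valid(m) else None
```

For example, with a toy calibrate that halves the raw value, a low-cost reading of 40.0 is published as 20.0, while a reference reading of 40.0 is published unchanged.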

The system applied to low-cost sensors takes into account nearby conditions such as temperature, humidity, and pollution composition, and applies a calibration algorithm based on these environmental conditions. For example, high humidity may under some circumstances cause low-cost sensors to over-report PM2.5 levels. Similarly, the pollution composition (traffic-generated pollution, sandstorms, coal-based pollution, etc.) strongly affects the measurement, so the AirVisual platform uses satellite imagery to determine the pollution composition used in the calibration mechanism. The calibration and correction algorithm therefore takes local humidity and other environmental parameters into account, along with regional historical patterns, and adjusts the PM2.5 measurements accordingly.
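One way to see why both humidity and composition matter is a simplified hygroscopic-growth correction in the spirit of κ-Köhler theory, where particles absorb water and grow by a factor of roughly 1 + κ·a_w / (1 − a_w), with water activity a_w approximated by RH/100. The sketch below divides the raw reading by that factor. This textbook approximation is swapped in for illustration and is not AirVisual's calibration; the function name and the kappa and rh_cap values are hypothetical. The link to composition is that κ differs between particle types (salts absorb far more water than mineral dust, for instance).

```python
def humidity_corrected_pm25(raw_pm25, rh_percent, kappa=0.4, rh_cap=95.0):
    """Estimate dry-basis PM2.5 (µg/m³) from a low-cost optical reading.

    kappa (particle hygroscopicity) and rh_cap are hypothetical illustrative
    values; in practice kappa depends on the local pollution composition.
    """
    # Clamp RH: the growth term diverges as RH approaches 100 %.
    aw = min(max(rh_percent, 0.0), rh_cap) / 100.0
    # Approximate growth factor of particles that have taken up water.
    growth = 1.0 + kappa * aw / (1.0 - aw)
    return raw_pm25 / growth
```

At 0 % RH the reading is unchanged; at 50 % RH with kappa = 0.4 the growth factor is 1.4, so a raw 28 µg/m³ becomes about 20 µg/m³. A more hygroscopic aerosol (larger kappa) yields a lower corrected value for the same raw reading.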

The adjustment level is determined by the cloud-based system, which is built on artificial intelligence and machine learning. By aggregating billions of global air quality data points over many years, from reference sensors, AirVisual sensors, meteorological data, and pollution composition derived from satellite imagery, this system has been learning the complex historical relationships between different air quality parameters in different parts of the world.
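As a much simpler stand-in for the machine-learning system, the idea of learning a correction from co-located reference and low-cost data can be illustrated with ordinary least squares: fit reference ≈ b0 + b1·raw + b2·RH, then apply the coefficients to new readings. The linear model, the feature choice, and the function names are assumptions made for illustration only.

```python
def fit_calibration(raw, rh, ref):
    """Fit ref ≈ b0 + b1*raw + b2*rh by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination and partial pivoting. Inputs must not be collinear.
    """
    rows = [(1.0, r, h) for r, h in zip(raw, rh)]
    n = 3
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(x[i] * x[j] for x in rows) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(rows, ref))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return tuple(a[i][n] / a[i][i] for i in range(n))


def calibrate(raw_pm25, rh, coeffs):
    """Apply fitted coefficients to a new low-cost reading."""
    b0, b1, b2 = coeffs
    return b0 + b1 * raw_pm25 + b2 * rh
```

A production system would use far more data, more features (composition, temperature, season), and a more flexible model, but the principle of learning the mapping from paired measurements is the same.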

Since the composition of PM can vary widely between different areas of the world, it is crucial to determine the correlations between PM and factors such as humidity at a local or regional level. Because these correlations depend on the local composition of PM, they must be taken into account in local calibration and correction algorithms.
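The regional dependence can be checked directly: group co-located records by region and compute, per region, the Pearson correlation between relative humidity and the ratio of low-cost to reference PM2.5. A strong positive correlation suggests humidity-driven over-reporting in that region; a weak one suggests humidity matters less there. The record layout and function names below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def humidity_bias_by_region(records):
    """records: iterable of (region, rh_percent, low_cost_pm25, reference_pm25).

    Returns {region: correlation between RH and the low-cost/reference ratio}.
    """
    groups = defaultdict(lambda: ([], []))
    for region, rh, low, ref in records:
        xs, ys = groups[region]
        xs.append(rh)
        ys.append(low / ref)
    return {region: pearson(xs, ys) for region, (xs, ys) in groups.items()}
```

If region A's ratio climbs steadily with humidity while region B's does not, the two regions call for different correction coefficients, which is why a single global calibration would fit neither well.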
