Georisk Continuous (Generalized), Santa Rosa, CA

Jun 17, 2021 (Last modified Sep 15, 2022)
The dataset shows the risk of a water sample exceeding the Maximum Contaminant Level (MCL) after a wildfire.

Grid cell values provide absolute probabilities of drinking water contamination above the State of California MCL for Benzene (1 µg/L) for water samples from the water distribution system after a collocated wildfire event.

Bayesian regularized neural network ensembles were trained using high-resolution data layers comprising topography, soil properties, landcover, vegetation, meteorological parameters, fuel load, and infrastructure data. In combination with post-fire water samples, the input data was used to map the risks of MCL exceedance to the values observed in the cities of Santa Rosa, CA and Paradise, CA.

Many factors and processes that can contribute to post-fire contamination of drinking water in water distribution systems (WDS) are only partially understood, or the corresponding data are unavailable. For example, the system-wide state of pressure, flow, and temperature in a complex pipe network across a town is unknown, except at certain main valves and control points.

Furthermore, these parameters change during wildfires when firefighting efforts or damaged pipes and the associated pressure drops alter flow rates and directions at one or many points of the distribution system.

In addition, current wildfire models cannot model burn probability or fire behavior in built-up areas because fuel models for such structures are lacking. Sections of built-up areas containing structures that may be close to burnable vegetation are currently classified as non-burnable in the fuel layers of fire models. Hence, using a deterministic process model for spatial predictions of post-fire contamination risk is currently infeasible given the available sampling data and process knowledge.

For the spatial analyses here, we use a machine learning approach with pattern recognition networks that have SoftMax classification output layers to spatially predict conditional probabilities of drinking water contamination in wildland-urban interface (WUI) areas after fire has affected the structures and the surrounding areas. We combine analytical results of post-fire water samples, topographic factors, landcover data, information about infrastructure, and physical soil properties with Bayesian regularized neural networks to build ensemble models that predict the conditional probability that benzene levels in the WDS exceed the maximum contaminant level (MCL).

Benzene is considered a carcinogen and poses a severe health threat to humans if consumed in high concentrations. While other contaminants were found in WDS water samples after wildfires, benzene was chosen as a representative volatile organic compound because of its abundance in post-fire water samples in Santa Rosa and Paradise, California.

Using the water samples collected in each study area after the wildfire, the parameters of the neural networks are iteratively optimized to map the input data onto the target data (i.e., the contamination status of post-fire water samples at each point).
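The target data and the SoftMax output layer described above can be illustrated with a minimal sketch. It assumes that a sample counts as contaminated when its benzene concentration is strictly above the 1 µg/L MCL, and it uses hypothetical function names (exceedance_labels, softmax); it is not the authors' implementation and does not include the Bayesian regularized training itself.

```python
import numpy as np

# California MCL for benzene in drinking water, as stated in the description.
MCL_BENZENE_UG_L = 1.0


def exceedance_labels(concentrations_ug_l):
    """Return 1 where a sample exceeds the MCL and 0 otherwise.

    Assumption: "exceeding" means strictly greater than the MCL.
    """
    conc = np.asarray(concentrations_ug_l, dtype=float)
    return (conc > MCL_BENZENE_UG_L).astype(int)


def softmax(logits):
    """Convert raw two-class network outputs into class probabilities."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical sample concentrations in µg/L.
labels = exceedance_labels([0.2, 1.0, 3.5])

# Hypothetical raw outputs for two samples; columns are (no exceedance, exceedance).
probs = softmax([[0.0, 0.0], [2.0, 0.0]])
```

Each row of probs sums to one, so the second column can be read as the predicted probability of MCL exceedance for that sample.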

Once the model is optimized to reproduce the training data and generalizes well enough to model new data (not included in the training process) with sufficient accuracy, it is applied to the entire model domain at a 30 m x 30 m resolution, as illustrated in figure 1. Several models can be averaged to build an ensemble result at each grid cell, which in practice often improves the generalization capabilities of the models and hence their accuracy. The results for the risk of water contamination shown as part of the EEMS model give the conditional probability that post-fire water samples exceed the California MCL for benzene in drinking water (1 µg/L) after a potential fire.
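The ensemble step can be sketched as a per-cell average of the probability grids produced by the individual networks. This is a minimal illustration that assumes a simple unweighted mean and uses a hypothetical function name (ensemble_mean_probability); the description does not state how the ensemble members are weighted.

```python
import numpy as np


def ensemble_mean_probability(member_grids):
    """Average exceedance-probability grids from several models.

    member_grids: array-like of shape (n_members, rows, cols), one grid of
    probabilities per ensemble member on the same 30 m raster.
    Assumption: members are combined with an unweighted mean.
    """
    stack = np.asarray(member_grids, dtype=float)
    if stack.ndim != 3:
        raise ValueError("expected shape (n_members, rows, cols)")
    return stack.mean(axis=0)


# Two hypothetical 2 x 2 probability grids from two ensemble members.
member_a = [[0.2, 0.4], [0.6, 0.8]]
member_b = [[0.4, 0.6], [0.8, 1.0]]
ensemble = ensemble_mean_probability([member_a, member_b])
```

Because every member outputs a probability between 0 and 1, the averaged grid stays in the same range and can be read cell by cell as the ensemble estimate of exceedance probability.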
Data Provided By:
Oregon State University (Dr. Andres Schmidt) created the dataset. All input data variables used to calculate the risk were aggregated to 30 m x 30 m spatial resolution. For the topographic data layers, we used the 30 m NASA Shuttle Radar Topography Mission (SRTM) dataset, version 3.0. Aspect values were calculated using the Esri ArcGIS Surface Parameters tool with adaptive neighborhood selection and quadratic surface functions fitted around each grid cell. Vegetation fuel load was quantified through landcover type, percentage vegetation cover, and vegetation height. We used the LANDFIRE 2016 Remap (LF 2.0.0) for existing vegetation height (EVH) and percentage vegetation cover. The National Land Cover Database (NLCD 2016) from the Multi-Resolution Land Characteristics Consortium was used for landcover type classification. The locations of buildings were taken from the 2018 Microsoft Building Footprint data, which was created from satellite and aerial imagery using the ResNet34 deep neural network. The spatial values for contents of clay, silt, and sand, as well as soil bulk density, were downscaled using the WoSIS and SoilGrids datasets publicly provided through

Locations of fire stations were obtained from the Homeland Infrastructure Foundation-Level Data database (HIFLD).

Wind fields were downscaled with WindNinja (ver. 3.7.2) to account for topography and surface roughness and to obtain 30 m resolution wind fields for the two model domains. The thermal conductivity of soil strongly affects belowground temperatures and hence the heat-related damage that an aboveground fire can cause to buried water pipes, including deformation, melting, and heat-induced release of contaminants. Soil thermal conductivity was therefore estimated using the soil data from the SoilGrids repository in combination with average soil moisture values from the TerraClimate database for the months in which the fires occurred.

Post-fire water samples for network training were collected and provided by the Santa Rosa Water Department.
Content date:
Schmidt, Andres, Lisa M. Ellsworth, Jenna H. Tilt, and Mike Gough. 2022. “Predicting Conditional Maximum Contaminant Level Exceedance Probabilities for Drinking Water after Wildfires with Bayesian Regularized Network Ensembles.” Machine Learning with Applications 7 (March): 100227.
Spatial Resolution:
30 (Meter)
Contact Organization:
Oregon State University
Use Constraints:
The datasets provide a risk assessment based on landcover and infrastructure information from before the 2017 Tubbs Fire. Hence, some MCL exceedance risk values might change spatially as infrastructure and vegetation cover inputs change. Risk values pertain to samples taken from a water distribution system (WDS) and do not apply to collocated surface water bodies or groundwater sources.
Dataset Type:
Layer Package
About the Uploader

Conservation Biology Institute

The Conservation Biology Institute (CBI) provides scientific expertise to support the conservation and recovery of biological diversity in its natural state through applied research, education, planning, and community service.