This tutorial focuses on how to represent and visualize attribute uncertainty, using wetlands as a case study. We will use two resolutions of attribute uncertainty (referred to as 'fine' and 'coarse') in wetland classification and create visualizations to assist in analyzing and interpreting the spatial patterns of uncertainty across multiple scales. We will use confusion matrices to identify areas and types of misclassification across the two attribute resolutions: the coarse resolution will identify a Boolean distinction between the presence and absence of wetlands and the fine resolution will identify agreement and misclassficiation across specific sub-classes (riverine, palustrine, and lacustrine wetlands). Using the confusion matrices, we will construct multi-scale data pyramids to aid in determining where attribute uncertainty is extreme or highly variant. Even though the pyramid contains multe-scale uncertainty in a single unified framework, we can "slice" the pyramid to view individual focal scales and compute statistical summaries of spatial patterns. The ultimate goal here is to provide uncertainty metrics and display tools to assist researchers and practitioners in identifying specific areas and scales where potential gaps in data reliability may exist and may therefore be prioiritized for additional attention or more frequent field checks. While this tutorial focuses on wetlands, this methodology can be used to assess attribute uncertainty in other categorical data such as land use and land cover.
We are assuming you have basic working knowledge of GIS, Python, and MATLAB.
This tutorial requires the use of Anaconda. If you do not already have Anaconda, please download and install it now. We recommend following the preset settings during installation.
In Anaconda, we will set up an environment specific to this tutorial. Open the Anaconda Navigator. On the left hand side of the window, click on the Environments tab. Here you can view and control your environments. If you just installed Anaconda, you should currently only have a base environment. We will create a new environment specific to this tutorial:
conda activate [tutorial environment name]
and hitting enter. Then type pip install lidar
and hit enter. This tutorial also requires the use of MATLAB. If you do not already have MATLAB, please download and install it now. Note that MATLAB is proprietary software, but you may have free access through your university or employer. Follow the preset settings during installation.
Once downloaded and installed, open MATLAB. At the top of the window, you should see that you are on the 'Home' tab. Click on the 'Add-Ons' button to open the Add-On Explorer window. In the search bar, search for 'image processing toolbox' and select the 'Image Processing Toolbox' result that comes up. Click 'Sign In to Install', sign in, then click 'Install'. Follow the installation steps.
This tutorial was created in June 2021, working with Anaconda 4.10.1, Python 3.7.10, and MATLAB R2021a. Python package versions used: lidar 0.6.1, numpy 1.20.2, pandas 1.2.4, scipy 1.6.2, XlsxWriter 1.3.8, GDAL 3.0.2
We recommend that you have one folder that contains all of the tutorial materials. Within your tutorial folder you should have Part 1, Part 2, and Part 3 folders as well as all code files.
Some data pre-processing and cleaning was done for the Part 1 inputs. If you are interested in adapting this tutorial to a different area, here are the pre-processing steps you can take.
Open up Spyder from the Anaconda prompt. You can do this by activating your tutorial environment, typing spyder
in the prompt, and hitting enter. Then, load the first python script: Part1_HGM_NWI_Compare.py.
You need to change the paths in the script to where you've stored the Part 1 folders. Scroll to line 43 and update the set_path
variable to point to your Part1 folder.
set_path = r'C:\Users\ktyle\Documents\MSAU_Tutorial\Part1'
Once you've updated the set_path
variable, you can run the script. This will input the data, generate the HGM, and create the confusion matrices all in one script.
The Cowardin classification system includes five classes: riverine, lacustrine, palustrine, marine, and estuarine (FGDC, 2013). NWI is the largest database of geospatial wetlands data within the United States and uses the Cowardin classification. This tutorial uses NWI as the data source for the Cowardin classed wetlands. Using NWI, three wetland types are found within the study area: 566,902 acres of palustrine (44.3%), 217,458 acres of riverine (17.0%), and 68,720 acres of lacustrine (5.4%) wetlands. Wetlands dominate the study area, covering about 67% of the landscape.
Note: As part of data pre-processing, the NWI data was modified to remove open water from the classification in order to make the data comparable to the HGM system, which does not account for open water.
The HGM classification system includes seven classes reflecting landscape functionality: riverine, depressional, slope, mineral soil flats, organic soil flats, tidal fringe, and lacustrine fringe (Smith et al., 2013). In order to be comparable with the Cowardin system, the HGM classification was modified into three classes matching the Cowardin classes present within the study area (riverine, lacustrine, and palustrine). Riverine and lacustrine classes exist within both classification systems; however, the palustrine class is only in the Cowardin system. To better align the two systems a “palustrine” class was created for the HGM by combining depressional, slope wetlands, mineral soil flats, and organic soil flats definitions. The HGM palustrine class describes wetlands whose main water sources are groundwater or precipitation rather than flows from rivers or lakes. No geospatial layers using the HGM exist for this study area, so a wetlands database was created using ancillary data and the modified HGM classification system. This process is described below, and there are published examples of research that used GIS data to create an HGM dataset (Cedfeldt et al., 2000; Adamus et al., 2010; Van Deventer et al., 2016; Rivers-Moore et al., 2020).
In addition to inputting the data, the script you just ran also generates the HGM dataset. In order to do so, the code uses ancillary data including hydrogrophy, hydric soils, and elvation layers. All rivers and lakes larger than eight hectares are given a 30m buffer and then areas within this buffer with either hydric, predominantlyhydric, or partially hydric soils are classed as riverine. Areas adjacent to lakes with hydric, predominately hydric, or partially hydric soil are classed as lacustrine. Elevation is used to locate topographic depressions and slopes reater than two percent with hydric, predominately hydric, or partially hydric soils - classed as palustrine along with areas that had completely hydric soils.
The HGM dataset was intended for comparability with NWI. It was not created with the goal to be a more accurate wetlands model. Several assumptions were made in generating the HGM dataset that may not be true in every geographic case. For example, it is not correct to assume that soils classed as hydric are always wetlands, or that wetlands adjacent to rivers and lakes are always riverine and lacustrine respectively. Wetland classification is more nuanced and requires on-the-ground validation for the highest accuracy. However, the purpose of this analysis is to demonstrate a method and data framework within which to explore multi-scale patterns of spatial uncertainty, rather than to establish a spatially precise model of wetlands within the study area.
Here we are using confusion matrices to compare the NWI and HGM wetlands classes. We are using the NWI data as the validation dataset and the HGM as the test dataset; this is not to imply that the NWI dataset is more or less reliable than the HGM dataset. Both wetland datasets incur uncertainties resulting from their compilation, processing, and temporal resolution. The confusion matrices simply allow us to identify the agreement and disagreement between the two datasets. Where the two wetland classification systems agree, we have higher certainty that the classification of the wetland presence or type is reliable. Conversely, where there is disagreement, there is higher uncertainty.
Within the confusion matrices, there are three evaluation metrics used to quantify uncertainty: Recall, Precision, and F1.
To check your outputs, navigate into your Part 1 Output folder and open up the Full_Keys.xlsx which contains both your coarse and fine confusion matrices. The first tab in the excel contains your coarse confusion matrix and should look like this:
This coarse resolution assesses the presence versus absence of wetlands. This matrix is shows that we have 12.04% true positives (green) in our study area and 59.94% true negatives (white) - both refer to areas where NWI and HGM agree on the presence or absence of wetlands. We also have 4.41% false positives (red) and 23.62% false negatives (blue) - both referring to areas where NWI and HGM disagree. Since these matrices use NWI as the validation dataset, false positives are areas where the NWI classifies a wetland while HGM does not and false negatives are areas where the NWI classifies no wetlands while HGM classifies the presence of wetlands.
The second tab in your excel contains the fine resolution matrix and should look like this:
This fine resolution confusion matrix compares not only the presence and absence of wetlands, but also the agreement of the specific wetlands type (riverine, lacustrine, or palustrine). The first column shows false negatives (wetlands in NWI but not in HGM) for all categories while the top row shows false positives (wetlands in HGM but not in NWI). The diagonal shows cells that agree on the presence/absence of wetlands in both data sets as well as on their specific type. The six gray cells indicate a misclassification at the finer level, namely that both classification systems agree that wetlands are present but disagree on the type. The percentage values in the matrix indicate the proportion of the pixels in each of the sixteen cases for the entire study area. The cells are color-coded, with hue (blue, green, purple) referring to wetland type. Saturation and value are used to distinguish false negatives and false positives.
Other outputs from Part 1 include:
Note: These instructions were made using ArcMap 10.8
Note: These instructions were made using QGIS 3.16
After creating the confusion matrix surfaces, a progressive focal window analysis is used to transform the coarse and fine uncertainty surfaces into a pyramid data framework. The area used for the scale analysis is a 200-pixel by 200-pixel section of the False River subset. Note that you can use a larger or smaller area to build the pyramids, but it must be a square area. Starting with a three-by-three-pixel size, focal windows are moved across the entire surface and the desired metric is calculated for the pixels falling within the focal window. For each window, the calculated metrics are stored within the center pixel of the window. After the entire surface has been calculated a single layer of the pyramid is complete. The analysis then iterates with increasingly larger focal window sizes, and the resulting calculations are stored in new layers of the pyramid data framework. As the focal window size increases, more of the edge areas are excluded and the number of pixels containing data decreases. This results in a pyramid shape with the full data layer on the bottom and a single cell at the top that encompasses the entire study area.
The pyramid data framework allows multi-scale summary statistics to be stored in a single unified framework that can then be visualized, summarized, and sliced. Animation is possible to slide vertically through the pyramid. Summary statistics can be calculated for each slice allowing you to easily compare the data at different scales. This allows for easy viewing of misclassification at multiple scales and determination of uncertainty patterns within the data across spatial and attribute resolutions.
In this section of the tutorial we will run three separate python scripts. We'll start with "Part2_HGM_NWI_Compare_subset.py". Go ahead and open this script in Spyder via Anaconda.
Before we run this script, we need to update the paths. On line 25, update the path to point to your Part 2 folder.
set_path = r'C:\Users\ktyle\Documents\MSAU_Tutorial\Part2'
Once you've updated the set_path
variable, go ahead and run the script. This will clip the study area to the square analysis area for creating the pyramids.
The next script we'll run is "Part2B_focal_stats_coarse_res.py". In this script, change the set_path
variable on line 206 to point to your Part 2 folder.
set_path = r'C:\Users\ktyle\Documents\MSAU_Tutorial\Part2'
Once you've updated the path, run the script. This will create the pyramid at the coarse attribute uncertainty level.
The last script we'll run for part 2 is "Part2C_focal_stats_fine_res.py". Similar to the last script, you just need to change the paths to point to your Part 2 folder on line 300. Once you've done that, you can run the script to create the fine pyramids. This script may take a while to run, depending on your computer speed it could take around 20 minutes.
set_path = r'C:\Users\ktyle\Documents\MSAU_Tutorial\Part2'
In this final part of the tutorial. We will use MATLAB to visualize the pyramids. You may notice that all of the .mat files that were output in Part 2 are serving as our inputs for Part 3.
Using your file browser, double click on "Part3_Pyramid_model.m" to open the script in MATLAB. You shouldn't need to make any edits to the code for it to run successfully. However, you will need to add a folder connection to the Part 3 folder. You can do this in the left panel ("Current folder") by navigating to your Part 3 folder, then right click on the folder and click Add to Path > Selected Folder and Subfolders.
In your output folder, you should now have 35 files. These outputs include visualization from the coarse attribute F1 pyramid, coarse attribute Precision pyramid, coarse attribute Recall pyramid, fine attribute pyramid showing F1 scores for the palustrine class, and another fine attribute pyramid showing F1 scores for the non-wetland class. Within each of these you have 7 files: 5 .fig files that show you a slice of the pyramid (these can be viewed in MATLAB and exported as jpgs, pngs, pdfs, etc.), one .avi file that show an animation of sliding through multiple slices in the pyramid, and one excel sheet providing more detail about the F1, Recall, or Precision metrics. Note that files ending in "_75.fig" are scratch outputs and not meant for meaningful interpretation.
Below is an example of what your F1_slice5 should look like. Note that in the MATLAB interface you can click and drag to manipulate the perspective.
Cited References
Adamus, P., Christy, J., Jones, A., McCune, M., and Bauer, J. (2010). A Geodatabase and Digital Characterization of Wetlands Mapped in the Willamette Valley With Particular Reference to Prediction of Their Hydrogeomorphic (HGM) Class [Report to USEPA Region 10].
Cedfeldt, P. T., Watzin, M. C., and Richardson, B. D. (2000). Using GIS to Identify Functionally Significant Wetlands in the Northeastern United States. Environmental Management, 26(1), 13–24. https://doi.org/10.1007/s002670010067
FGDC, (Federal Geographic Data Committee). (2013). Classification of Wetlands and Deepwater Habitats of the United States (FGDC-STD-004-2013. Second Edition.). Wetlands Subcommittee, Federal Geographic Data Committee and U.S. Fish and Wildlife Service.
Powers, D.M.W. (2011). Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation. International Journal of Machine Learning Technology, 2(1), 37–63.
Rivers-Moore, N. A., Kotze, D. C., Job, N., and Mohanlal, S. (2020). Prediction of Wetland Hydrogeomorphic Type Using Morphometrics and Landscape Characteristics. Frontiers in Environmental Science, 8. https://doi.org/10.3389/fenvs.2020.00058
Smith, R. D., Noble, C. V., and Berkowitz, J. F. (2013). Hydrogeomorphic (HGM) Approach to Assessing Wetland Functions: Guidelines for Developing Guidebooks (Version 2): Defense Technical Information Center.
Van Deventer, H., Nel, J., Mbona, N., Job, N., Ewart-Smith, J., Snaddon, K., and Maherry, A. (2016). Desktop classification of inland wetlands for systematic conservation planning in data-scarce countries: Mapping wetland ecosystem types, disturbance indices and threatened species associations at country-wide scale. Aquatic Conservation: Marine and Freshwater Ecosystems, 26(1), 57–75. https://doi.org/10.1002/aqc.2605