MACHINE LEARNING BASED CLASSIFICATION OF EARLY SERAL VEGETATION IN CUT-BLOCKS IN THE INTERIOR OF NORTHERN BRITISH COLUMBIA

by

Matt McLean

BSc, University of Northern British Columbia, 2017

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN NATURAL RESOURCES AND ENVIRONMENTAL STUDIES

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

December 2024

© Matt McLean, 2024

Abstract

Globally, forests provide a wide range of essential services, such as lumber for construction, tourism value, and habitat for animals. In many regions, forests are managed to maximize the utilization of these services and to promote sustainable forest ecosystems. Effective management requires detailed information on the current state of forests, on how the forest is projected to develop through time, and on the provisioning of desired forest services, such as forage for wildlife species. Historically this information has been acquired using traditional field surveys, which are both costly and limited in the extent of area that can be sampled. The use of Remotely Piloted Aircraft Systems (RPAS) combined with machine learning potentially allows for more scalable methods of gathering forest inventory information. In this thesis, I evaluate and advance the use of multispectral imagery collected from RPAS for the classification of early seral vegetation. This type of vegetation is a key indicator of both forest regeneration and habitat suitability for ungulates. However, accurate identification and classification of early seral vegetation is particularly challenging due to its small size, the high variability between individuals, and the fact that individuals can overlap without exhibiting distinct boundaries. The process of image classification is broken down into two major components: the segmentation of collected imagery into discrete units of vegetation, and the classification of those units into specific species. These two components are presented as an overall framework for classification, and I provide operational recommendations for achieving successful results. The algorithms used in image segmentation are highly configurable and can be tuned to the input data to yield high-quality results; what is more challenging is determining what a high-quality result is, and applying suitable metrics that allow the accuracy of the segmentation process to be evaluated. In this research I propose a method for scoring the quality of segmentation applied to forest imagery, in a format that can be easily integrated into a larger framework encompassing the classification of results. In the second component of my thesis, I evaluate several common classification algorithms and assess their accuracy. This analysis considers both the overall accuracy of classification and the classification accuracy of species of interest alone. I also explore under what circumstances this type of classification would be feasible, and provide recommendations on which variables are most important to control during the collection of training data, along with best practices for capturing new datasets to be classified with already-trained models. My research demonstrates both the benefits and limitations of using RPAS imagery for segmentation and classification of early seral vegetation and suggests best practices that can be applied when using this framework.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1
1.1 Background
1.2 Technical Background
1.3 Project Roadmap
Chapter 2
2.1 Introduction
2.2 Study Sites
2.3 Data Collection
2.4 Segmentation Algorithms
2.5 Method of Evaluation
2.6 Results of Image Segmentation Algorithms
2.7 Discussion
Chapter 3
3.1 Introduction
3.2 Definitions
3.3 Data Preparation
3.4 Classification Algorithms
3.5 Minimum Data Requirements
3.6 Framework for Evaluation
3.6 Results
3.7 Results Synthesis
3.8 Discussion
Chapter 4
4.1 Introduction
4.2 Data Collection
4.3 Evaluation of Segmentation
4.4 Effectiveness of Classification
4.5 What Information Do These Results Provide to Forest Managers?
4.6 Framework for Evaluation
References
Appendix

List of Tables

Table 1 | Study site data collection statistics with tree counts and capture date.
Table 2 | Extended site attributes, with information from the British Columbia Vegetation Resource Index (Government of British Columbia, n.d.).
Table 3 | Figures demonstrating the scoring of each base metric as the SLIC hyperparameters are changed.
Table 4 | Average segmentation score for SLIC across all four metrics.
Table 5 | Average segmentation score for Quick Shift across all four metrics.
Table 6 | Average segmentation score for Felzenszwalb's Efficient Graph across all four metrics.
Table 7 | Average segmentation score for Mean Shift across all four metrics.
Table 8 | Summary of algorithm performance across all four metrics. Average is the score of each site segmented independently with scores averaged; Aggregate is the score when the same hyperparameters are used on all sites and then averaged.
Table 9 | Stability matrix for segmentation algorithms and scoring metrics; higher stability indicates more consistent results between sites.
Table 10 | Correlations between Metric 1 score and site attributes.
Table 11 | Sample counts by site, both total samples and counts of only target species.
Table 12 | Classification accuracy of training segments for all species at each site with unbalanced data.
Table 13 | Classification accuracy of training segments for all species at each site with SMOTE-balanced data.
Table 14 | Change in classification accuracy of training segments for all species resulting from using SMOTE.
Table 15 | Classification accuracy of training segments for target species at each site with unbalanced data.
Table 16 | Classification accuracy of training segments for target species at each site with SMOTE-balanced data.
Table 17 | Change in classification accuracy of training segments for target species resulting from using SMOTE.
Table 18 | Classification accuracy of training segments for all species by site group with unbalanced data.
Table 19 | Classification accuracy of training segments for all species by site group with SMOTE-balanced data.
Table 20 | Change in classification accuracy of training segments for all species resulting from using SMOTE with site groups.
Table 21 | Classification accuracy of training segments for target species by site group with unbalanced data.
Table 22 | Classification accuracy of training segments for target species by site group with SMOTE-balanced data.
Table 23 | Change in classification accuracy of training segments for target species resulting from using SMOTE with site groups.
Table 24 | Shannon's Diversity Index of sample counts by site.
Table 25 | Shannon's Diversity Index of sample counts by site groups.
Table 26 | Correlation of Shannon's Diversity Index and classification accuracy of individual sites.
Table 27 | Correlation of Shannon's Diversity Index and SMOTE effects on classification accuracy of individual sites.
Table 28 | Correlation of Shannon's Diversity Index and classification accuracy of site groups.
Table 29 | Correlation of Shannon's Diversity Index and SMOTE effects on classification accuracy of site groups.
Table 30 | Classification accuracy of target species using SVM, with top row showing total number of training samples.
Table 31 | Classification accuracy of target species using SVM, with top row showing percentage of samples that are target species.

List of Figures

Figure 1 | ER diagram of the processing pipeline for both initial training and applying trained models to novel data.
Figure 2 | Map of locations where data was collected.
Figure 3 | Example of manually delineated crowns in red and corresponding species code labeling.
Figure 4 | Example of perfect over-segmentation; each value in the right table is mapped to only one value in the left table.
Figure 5 | Examples of segmentations on data; colours represent distinct data, dashed lines the determined segments.
Figure 6 | SLIC False Merges: as segments per hectare increase, false merges decrease.
Figure 7 | SLIC False Splits: as segments per hectare increase, false splits increase.
Figure 8 | SLIC Adapted Random Precision decreases with segments per hectare.
Figure 9 | SLIC Adapted Random Recall increases with segments per hectare.
Figure 10 | Adapted Random Error: a local minimum is achieved at an intermediate number of segments per hectare; this is the only base metric whose optimal solution does not extend to a limit.
Figure 11 | Results of SLIC segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.
Figure 12 | Results of Quick Shift segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.
Figure 13 | Results of Felzenszwalb's Efficient Graph segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.
Figure 14 | Results of Mean Shift segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.
Figure 15 | Example of YOLO segmentation and classification; bounding boxes are labeled with species code: confidence percentage.
Figure 16 | Metric 1 scores across sites; horizontal line represents the score of all sites combined.
Figure 17 | Metric 2 scores across sites; horizontal line represents the score of all sites combined.
Figure 18 | Metric 3 scores across sites; horizontal line represents the score of all sites combined.
Figure 19 | Metric 4 scores across sites; horizontal line represents the score of all sites combined.
Figure 20 | Sobel edge detection from the 200rd_13km site.
Figure 21 | Examples of how coverage is measured based on segment overlap.
Figure 22 | Average reflectance across all five bands of the 10 most common species per site.
Figure 23 | Classification accuracy of Quick Shift segments on individual sites.
Figure 24 | Classification accuracy of Quick Shift segments on groups of sites.
Figure 25 | Classification accuracy of SLIC segments on individual sites.
Figure 26 | Classification accuracy of SLIC segments on groups of sites.
Figure 27 | Relative accuracy of SLIC over Quick Shift on sites.
Figure 28 | Relative accuracy of SLIC over Quick Shift on site groupings.
Figure 29 | Quick Shift oversampling change in accuracy.
Figure 30 | SLIC oversampling change in accuracy.
Figure 31 | Relationship between the number of samples available for training and the minimum overlap required.
Figure 32 | Relationship between the number of classes present in training data and the minimum overlap required.

Chapter 1 Introduction

1.1 Background

Globally, forests provide a range of essential ecosystem services that support communities, including regulating, provisioning, and supporting services (Baskent et al., 2020; Taye et al., 2021), and are the basis of many economic sectors (Costanza et al., 1997; Millennium Ecosystem Assessment, 2005). In western Canada, forest ecosystems provide the timber supply that forms the economic foundation for many communities. Forests also help maintain the quality of water (Pearce, 2001) and air (Nowak et al., 2014) necessary for our survival. As time progresses, more significance may be placed on a variety of other values, such as wildlife habitat (Oettel and Lapin, 2021). In most regions of the world, forests are explicitly managed by forest professionals with the aim of promoting sustainable forest ecosystems and often specific forest attributes and products, such as timber production (Boukherroub et al., 2017) or carbon storage (Lemprière et al., 2013). Forest management involves not only determining how forests should be harvested and regenerated, but also developing strategies that inform which forests or stands should be managed, when they should be managed, and what the most appropriate management strategy is (D'Amato et al., 2011).
Throughout time there have been a variety of forest management goals (McGrath et al., 2015); however, a constant has been the desire to maximize the utility of available forests, regardless of how that utility was defined at the time. As forest managers seek to use better decision processes, having access to the most accurate and detailed information about the environment is a necessity (Tompalski et al., 2015; van Leeuwen et al., 2011). Understanding how well forests are regenerating requires knowing how many trees of each species are present, and how they are regrowing. As forests provide a variety of ecosystem services, a variety of information can help quantify those services, ranging from simple elements such as how many trees of which species are present, to more complicated metrics such as regrowth-rate predictions based on tree density and other factors. More fine-grained knowledge of the forest inventory allows for more precise management plans that maximize forest values.

Historically, information on forest ecosystem condition and forest structure was obtained by having field crews conduct field surveys. These types of surveys can be very expensive to conduct, which limits the amount of knowledge available for making decisions. The primary methods for obtaining tree species data have included ground-survey-based identification of tree species, which is costly, time-consuming, and typically samples only a small percentage of the land base, extrapolating across the full area (British Columbia et al., 2007). Cost reduction has been achieved through manually interpreted air photos, but at the cost of species accuracy ("Forest Health Aerial Survey Manual," 2012; Seely, 1934); even so, this remains a technologically complicated and expensive process. Traditional field surveys are limited in effectiveness by issues of both scale and data quality. The scale of data that can be collected is limited by the high costs of obtaining field data. Traditional plot sampling methods use relatively low sampling intensity combined with extrapolation of the results across a disproportionately broad land base. This extrapolation also introduces a degree of surveyor bias, which typically includes both subjective personal interpretations of observations and design bias resulting from avoiding difficult-to-access or remote areas rather than using an unbiased grid-based design over the entire population.

1.2 Technical Background

Remote sensing technologies provide opportunities to survey forest blocks in their entirety, as opposed to averaging sample plots against the land base (British Columbia et al., 2007). Knowing the vegetation species and quantifying their abundance helps to inform various types of models, enabling us to better understand how to manage our land base. Particular to this research is the goal of developing data that can be used in wildlife models focused on food availability and wildlife cover for ungulates (Brown et al., 2007; Terry et al., 2000; Whitman et al., 2017). This research presents additional value in providing detailed information on the state of regrowth in the cut block (Pitt et al., 2010; Weisberg and Bugmann, 2003). The ability to acquire higher-resolution imagery, in terms of both spatial and spectral resolution, opens new possibilities for observing the environment around us. Different methods of data collection allow different questions to be answered, based on what data is produced.
As an example, one of the earliest forms of remote sensing is aerial photography, which can cover moderately large areas with relatively high resolution. Air photos are most commonly available as panchromatic or true colour. Satellites can cover much larger areas and often include multispectral data that is useful for environmental modeling; however, this comes at the cost of spatial resolution, making identification of smaller objects impossible. Recently we have seen a growth in the use of Remotely Piloted Aircraft Systems (RPAS), commonly known as drones. RPAS, while limited in the amount of area that can be surveyed compared to a manned aircraft, can capture even higher spatial resolutions than traditional aerial photography, with some systems also providing high spectral resolution.

LiDAR (Light Detection and Ranging) is a very useful tool in forest classification (Brandtberg, 2007; Coops et al., 2016), as it provides insight into the structure of the tree canopy, as opposed to viewing only the surface reflectance of the vegetation present. However, this approach begins to have trouble where distinct structures cannot be resolved. This is especially true of early seral vegetation, where individual trees may be too small for any structure to be captured given the available point density. Additionally, by using multispectral imagery for classification, there is the potential for future work leveraging the same raw data for forest health assessments in addition to classification.

Using RPAS provides opportunities to examine the entire land base of interest, allowing for more thorough sampling. RPAS may carry a variety of sensors; some of these, such as the MicaSense RedEdge-M used in this research, are capable of sensing spectrums of light that are not visible to the human eye. The addition of nonvisible light spectrums improves the ability to discriminate species compared with standard colour air photos intended primarily for human viewing. Finally, due to the level of interpretation needed, different surveyors may produce differing final reports; adding a system of automation that can be executed across all sites can help to reduce operator bias. Reducing survey costs would allow land managers to collect more data, enabling more refined management strategies. Machine learning (ML) represents a potential avenue for increasing the completeness and repeatability of such surveys. Developing robust methods for collecting remote sensing data is essential to the reliable implementation of machine learning for classification.

The first step in the process of using machine learning for classification is the collection of data. This can be split into two separate but related problems: what data to collect, and how to collect it. In terms of what data to collect, we have a variety of methods at our disposal, including photos, LiDAR, and radar. Further, each of these methods has multiple resolutions and modes available. Using the example of photos, it is important to select appropriate resolutions and spectral ranges, then select the appropriate capture method, such as satellite, manned aircraft, or RPAS. Another crucial decision involves determining appropriate times to collect the data. Some of these factors might relate to weather, sunlight, seasons, or the phenological cycle of the vegetation being imaged. Once data is captured, it is then processed into usable products.
For RPAS data this generally involves the use of photogrammetry to produce orthomosaics covering the entire capture area in a single image. It is at this stage that we may also apply radiometric calibrations; for RPAS data this is generally done using a combination of images of panels with known reflectance and sunlight sensors mounted on the drone (as is the case for the MicaSense cameras used in this research). Satellite data would generally undergo atmospheric correction, and other types of data have other standard processes to prepare them for analysis. Following this, the machine learning pipeline converts this standardized data into a format suitable for use within the algorithm. For example, many of the algorithms in this research require that the image, a grid of pixels, be converted into an average reflectance over all pixels contained within an object (see the sketch below). Once data is in the proper format, machine learning is then finally used to determine a class for the input data.
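As a concrete illustration of this pixel-to-object conversion, the sketch below reduces each segment to a single mean-reflectance vector. It is a minimal example assuming NumPy arrays in a (rows, cols, bands) layout; the helper name is hypothetical rather than taken from the thesis code.

```python
import numpy as np

def mean_reflectance(image: np.ndarray, segments: np.ndarray) -> dict[int, np.ndarray]:
    """Reduce a (rows, cols, bands) image to one mean-reflectance vector per
    segment, given a (rows, cols) map of integer segment ids."""
    features = {}
    for seg_id in np.unique(segments):
        mask = segments == seg_id                     # this segment's pixels
        features[seg_id] = image[mask].mean(axis=0)   # one value per band
    return features
```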
Combined, these processing steps from data capture to classified data comprise the analysis framework. While each step of the framework can be relatively simple to understand, they all work together to produce the final outputs, where the quality of the final product depends on how robust the framework is and how well the various stages work together. A correct classification depends upon proper data collection, processed in such a way that accurate data goes into the classification step.

Despite recent advancements in RPAS technology, segmentation and classification still rely on surface reflectance as opposed to structural attributes. Leveraging differences in reflected light spectrums has been a staple of machine learning classification in other fields; a trivial example is self-driving cars, for which white and yellow lines on the road have different meanings even when they have a similar structural appearance. This can be taken further by using multispectral sensors to gain more accurate identifications under changing conditions (Takumi et al., 2017). However, in the field of forestry, and in environmental management more generally, using surface reflectance to identify features can be problematic, as reflectance can change through the phenological cycle of trees. This is made more challenging when looking at deciduous vegetation, which tends to have less defined boundaries than conifer species. Additionally, young trees, being smaller, present a greater identification challenge at a given data resolution. This is further complicated by the natural variance between trees of a given species, and by the fact that their presentation will also be affected by the presence of disease and by differences in environmental factors such as access to water and nutrients.

This research focuses specifically on early seral vegetation, which is an important food source for ungulates. Early seral is the first stage of vegetation in a forest lifecycle as it regenerates from cut to old growth. This vegetation is small and may be less rigid or well defined than mature forest; studying this stage of forest growth presents some challenges unique from forests in general. While a more mature forest may have measurements of its structure from LiDAR (Holmgren et al., 2008), the small size of early seral vegetation can mean there is little difference in height between trees and the understory. Additionally, at this stage very high spatial resolutions are required to have any opportunity to delineate the edges of tree stems, due to their small size and less defined edges compared to more mature forest. Furthermore, the species of particular interest in this research are deciduous, whose branches tend to overlap with neighboring trees, further reducing the ability to cleanly delineate tree edges.

This research presents methods for collecting and processing RPAS multispectral imagery and for applying and evaluating machine learning based classification of image segments representing early seral vegetation. The goal is to present methods that are abstractable: as imaging technology improves in both spatial and spectral resolution, and as new machine learning algorithms are developed, these methods can be repeated, and the evaluations used to compare the potential improvements of new technologies. The research presented shows results for a specific type of imagery and a specific set of algorithms. It should not be interpreted as the optimal solution for early seral vegetation classification, but rather as a proof of concept and a starting point for continuous evaluation of technological possibilities.

Image segmentation is the process of splitting a whole image into discrete units known as segments. These segments can then be passed into other algorithms for further processing. While segmentation is a generally well-researched field, this research looks to apply the techniques to the specific case of early seral vegetation. There is a wide variety of algorithms already developed, e.g., Random [decision] Forests (Tin Kam Ho, 1995) or Support Vector Machines [Support-vector Networks] (Cortes and Vapnik, 1995); however, not all approaches are equal, and different approaches work better for different problems. This research seeks to explore how some of the common approaches can be used to segment early seral vegetation.

Quality analysis of segmentations is inherently difficult, as in almost all cases a decision about what constitutes a correct segmentation must be made. For the purposes of this research, human-drawn polygons are taken to be correct. There are three primary sources of error with this form of truth. First, humans may make mistakes in drawing the boundaries; this is true of any human-derived dataset. Second is the risk of oversimplification, as there is a tendency for humans to draw simplified polygons with fewer vertices that either over- or under-encapsulate the vegetation. The final source of error is impure pixels: in the context of this research, the multispectral data is made of pixels that represent a 3 cm to 6 cm square on the ground, and a problem arises at the edges of trees, where a pixel covers only a percentage of the tree. This poses a question of minutiae: should these impure pixels be included, excluded, or partially included?

1.3 Project Roadmap

At a high level, this research seeks to provide a process for taking data collected from RPAS, along with ground truth data, and producing models that show the amount of coverage of each species of vegetation captured in the data, along with an analysis of the accuracy of the data produced. This is accomplished using a modular approach to the process of identifying early seral vegetation, such that any piece can be replaced without altering other pieces of the process, thus providing a path for natural evolution and continual growth.
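A minimal sketch of what this modularity can look like in code is given below, using Python's typing.Protocol to express the two interchange points. The interface names and signatures are hypothetical illustrations, not the thesis's actual code.

```python
from typing import Protocol
import numpy as np

class Segmenter(Protocol):
    """Any segmentation algorithm: image in, integer label map out."""
    def segment(self, image: np.ndarray) -> np.ndarray:
        """(rows, cols, bands) image -> (rows, cols) segment-id map."""
        ...

class Classifier(Protocol):
    """Any classification algorithm operating on per-segment features."""
    def fit(self, features: np.ndarray, species: np.ndarray) -> None:
        """Train on (n_segments, n_features) features and species labels."""
        ...
    def predict(self, features: np.ndarray) -> np.ndarray:
        """Return one species label per feature vector."""
        ...
```

Any component honouring these signatures, whether a new segmentation algorithm, a new classifier, or a new scoring metric wrapped around them, can be swapped in without altering the rest of the pipeline.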
While this research was conducted specifically with a MicaSense RedEdge-M camera, the processing pipeline presented here could work just as well with another camera system. Likewise, more machine learning algorithms could be added, or different scoring metrics used. As the processing pipeline is in place to take images with truth segments and output trained models with their achieved accuracies, determining whether a change to the process is beneficial becomes trivial.

There are two options for running the framework for early seral vegetation classification demonstrated in this research (Figure 1). The first is the Training Process, where inputs of imagery and ground truth data are provided. This option can be used any time new training data is collected, allowing accuracy to improve as training sets grow; alternatively, additional algorithms can be added to the test. An important point is that, thanks to the automatic storing of results, this can be run as a single monolithic script; while it may take days or even weeks to run, minimal human effort is needed to find new optima. Once a model has been trained it can be applied very quickly, needing only the segmentation and classification steps to be performed. Again, there are opportunities to automate this process, such that when a new segmentation or classification algorithm is identified, existing imagery can be reprocessed, ideally providing enhanced results.

Figure 1 | ER diagram of the processing pipeline for both initial training and applying trained models to novel data.

Chapter 2 defines a framework for segmentation of early seral vegetation from RPAS multispectral imagery. This is done by establishing a method for evaluating the effectiveness of segmentations, then segmenting the imagery for all the sample sites using standard segmentation algorithms, and finally using the scoring techniques to identify the most effective algorithm, along with the parameters that produce the most effective results. Classification of imagery with RPAS-based systems provides the flexibility to capture data as needed at moderate scales, as well as various options regarding the type of imagery collected. In this work we examine specifically the ability to train models to identify tree species using the spectral information collected by a MicaSense RedEdge-M camera system. This camera provides not just standard colour images, but also adds Near-Infrared and RedEdge bands, which are known to be useful in the analysis of vegetation (Schuster et al., 2012). Chapter 3 builds upon the results of Chapter 2, using the best segments produced as the basis for classification. In that chapter, several common algorithms are optimized, scored, and compared for accuracy. A proposed framework for classification is presented based on the combination of algorithms from Chapters 2 and 3, along with discussion of the usability of these results.

Chapter 2 Image Segmentation for Early-Stage Vegetation in RPAS Imagery

2.1 Introduction

Effective forest management requires widespread yet detailed information on forest composition to select the management techniques that will achieve the wide-ranging objectives of resource stewardship.
This balancing of competing objectives is often addressed at the landscape level, where regions of forest are assigned a class based upon species composition (Baleshta et al., 2015; Dhar, 2013), age of trees in the stands (Zheng et al., 2007), and the structural attributes of the forest (Bugmann et al., 1996). Forest composition can be segmented at a variety of scales, ranging from landscape-scale determination of surface cover types (Walsh, 1980; Wulder, 2003), to surveys of individual sites, down to segmenting individual trees to build more detailed forest models. The use of RPAS combined with modern image analysis techniques allows data to be captured at a fine enough scale to separate individual trees from the forest, while providing the efficiency required to survey large areas that would be cost prohibitive with traditional survey techniques. The ability to segment all trees as individuals can then be used to produce highly accurate models of ecosystem dynamics (Seidl et al., 2012).

The objective of image segmentation is to inspect an image and return boundaries delineating its individual features. An analogue to the research presented here is Optical Character Recognition (OCR) (Gupta and Nair, 2005), a process in which each letter is segmented from the image for identification, and words are segmented from groups of symbols. Segmentation is also used in a variety of computer vision tasks, such as recognizing licence plates or barcodes.

Effective segmentation has a variety of challenges that must be addressed to ensure optimal accuracy. Ideally, a system for segmentation will be able to identify features that are similar, yet not identical, to previously seen objects; this requires the collection of very large datasets of tagged training data. This process is made more challenging when tagging of training data requires expert intervention, in comparison to a problem such as OCR, for which training data can be collected from anyone with the ability to read; an example of this is reCAPTCHA used on websites (Pettis, 2023). Additionally, segmentation relies upon the ability to separate objects from the background. Depending upon the size and types of objects to be segmented, along with the backgrounds upon which they are placed, different algorithms may present different levels of success on a given dataset.

This research seeks to delineate individual stems (trees) within early seral vegetation. The identification of this vegetation is useful for managing forest regrowth after cutting, forest fires, or other disturbances that displace mature stands. Early seral vegetation is important as it provides sources of food and camouflage for animals within the stand and is a meaningful metric for habitat suitability. Further, this vegetation will grow, and an accurate inventory can help provide the base information needed for predictive forest models.

Early seral vegetation also provides some distinct challenges for segmentation. Segmentation is generally less complex when uniform boundaries are present; however, in the case of vegetation, the structure of branches and leaves leads to a broken silhouette without clearly defined boundaries. As tree crowns grow and increase in size, the crowns of individual trees begin to overlap, and their boundaries become obscured or ambiguous.
The other primary challenge faced is one of scale: more mature stands have taller trees with more defined shapes, and gaps between them, allowing for segmentation using spatial information in addition to spectral information (Yancho et al., 2019). The small size of early seral vegetation necessitates the use of higher-resolution data, but the spatial component becomes less usable due to the granular blending of crown edges with the understory or forest floor.

In this chapter I demonstrate scoring metrics that can be used to evaluate the quality of returned segments. The scoring metrics presented are used to evaluate the effectiveness of four different image segmentation algorithms. These results are used to determine which algorithm is the most effective for the given dataset, in this case early seral vegetation captured using RPAS. While there are many more than four segmentation algorithms available, this provides a starting point for developing the methodology. The results for this section include both the accuracy of segmentation and how stable this accuracy is from site to site, an important factor for a generalized solution that could be implemented at scale.

2.2 Study Sites

The data used in this research was captured from a selection of sites in the Ungulate Winter Range north-west of Mackenzie, as well as some sites closer to Prince George that provided easier access. The sites were all logged approximately five years prior to data collection and are being monitored for their quality as ungulate habitat as they regenerate. The sites contain a variety of young vegetation; of particular interest to this research are the deciduous species that act as a food source and provide cover for ungulates in the area. Five target vegetation species have been identified as valuable for moose browse by the British Columbia Ministry of Forests: Trembling Aspen (Populus tremuloides), Red-osier Dogwood (Cornus stolonifera), Paper Birch (Betula papyrifera), Highbush-Cranberry (Viburnum edule), and Willow (Salix spp.). Eleven case study sites were analyzed; these sites were situated in the central interior of B.C., primarily in the Sub-boreal Spruce (SBS) biogeoclimatic zone (Beaudry et al., 1999).

Table 1 | Study site data collection statistics with tree counts and capture date.

Site | Total Samples | Target Samples | Target % | Capture Date
200rd | 11529 | 2031 | 18% | August 24th, 2020
700rd | 3774 | 1977 | 52% | September 29th, 2020
Alezza | 5649 | 2064 | 37% | September 25th, 2020
Bend05km | 1521 | 636 | 42% | September 21st, 2020
ChiefLake | 12861 | 1491 | 12% | August 24th, 2020
ConifexH47 | 1398 | 261 | 19% | September 2nd, 2020
ConifexK14 | 1077 | 513 | 48% | September 3rd, 2020
NorthFraser11 | 8331 | 3453 | 42% | September 22nd, 2020
NorthFraser41 | 1677 | 270 | 16% | September 14th, 2020
NorthFraser50 | 1107 | 468 | 42% | September 21st, 2020
Olson5km | 2862 | 567 | 20% | September 22nd, 2020

Figure 2 | Map of locations where data was collected.

Table 2 | Extended site attributes, with information from the British Columbia Vegetation Resource Index (Government of British Columbia, n.d.).
Site | Leading Species | Planted | Brushed | Previous Dominant | BEC Zone | BEC Subzone
200rd | Pl (33%) | May 2016 | NA | PLI | SBS | mk
700rd | Cs (32%) | July 2017 | August 2014 | SW | SBS | wk
Alezza | Sx (42%) | June 2019 | NA | SX | SBS | wk
Bend05km | Sx (50%) | July 2013 | August 2015 | BL | SBS | vk
ChiefLake | Pl (23%) | June 2013 | August 2018 | -- | SBS | mk
ConifexH47 | Li (23%) | July 2017 | NA ¹ | SX | ESSF | mv
ConifexK14 | Bl (42%) | NA | NA ² | PLI | BWBS | dk
NorthFraser11 | Sx (42%) | August 2017 | June 2008, August 2010 | PLI | SBS | mk
NorthFraser41 | Ri (21%) | July 2017 | August 2018 | SX | SBS | wk
NorthFraser50 | Sx (44%) | July 2016 | August 2015 | BL | SBS | wk
Olson5km | Bl (32%) | June 2015 | NA | PLI | SBS | mk

¹ Records indicate the most recent brushing was completed after imagery was collected; previous brushing date unknown.
² Records indicate the most recent brushing was completed after imagery was collected; previous brushing date unknown.

2.3 Data Collection

Aerial image data were collected by flying survey missions with a DJI Matrice 210 RPAS carrying a MicaSense RedEdge-M multispectral camera as payload. The collected imagery was then processed using Agisoft Metashape 1.5, with the high-quality setting used for all stages. Processing first calibrates the sensor against its Downwelling Light Sensor (DLS), then matches tie points and creates a dense point cloud. A ground filter is applied to this dense cloud, and a Digital Terrain Model (DTM) is produced using only ground points. Finally, an orthomosaic is produced by ortho-correcting and mosaicking the collected images, and is exported as a GeoTIFF for the segmentation process.

In addition to the imagery, a corresponding set of ground truth data was also collected. This dataset was generated by field crews marking in-person species assessments on an orthophoto. Once back in the office, these spatial delineations were converted to a digital format using GIS software to draw polygons around each feature, adding a species code as an attribute to each polygon. It is recognized that this process of digitizing is not a pixel-perfect representation, as it occasionally excludes the tip of a branch or includes a small piece of ground. Subsequent use of this data is evaluated on a per-pixel basis, thereby adding an element of error to the results. However, classifying at an object level and analyzing at a pixel level is believed to have minimal impact, and represents errors that would also occur with existing fully manual methods of tree segmentation.

To prepare the data for machine learning, truth segments needed to be created to represent the trees to be classified. After the drone imagery was collected and processed, the orthomosaics were imported into GIS software, where the crowns were manually delineated as polygons, including an attribute for species code (Figure 3). One plausible way to convert these polygons into a pixel-aligned training mask is sketched below.

Figure 3 | Example of manually delineated crowns in red and corresponding species code labeling.
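Before scoring (Section 2.5), the digitized crown polygons must be rasterized onto the same grid as the orthomosaic to form the training mask. The sketch below shows one plausible way to do this, assuming the rasterio library and an iterable of (geometry, integer id) pairs; it is illustrative only, not the thesis's actual tooling.

```python
import rasterio
from rasterio import features

def build_training_mask(ortho_path: str, crown_shapes, out_path: str) -> None:
    """Burn (geometry, id) pairs into a single-band mask aligned with the
    orthomosaic's grid; 0 marks background pixels."""
    with rasterio.open(ortho_path) as src:
        mask = features.rasterize(
            crown_shapes,
            out_shape=(src.height, src.width),
            transform=src.transform,   # reuse the orthomosaic's georeferencing
            fill=0,
            dtype="int32",
        )
        profile = src.profile.copy()
    profile.update(count=1, dtype="int32", nodata=0)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(mask, 1)
```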
2.4 Segmentation Algorithms

The algorithms used for image segmentation come from SciKit-Learn and SciKit-Image. The motivation for this choice is based upon several factors, including the relative popularity of SciKit and the extensive documentation and community that come with that widespread use. The open-source nature of the code makes it an ideal foundation to build upon, allowing others to continue future research without needing to worry about licensing costs. SciKit provides many algorithms that use standardized APIs, which allows an easy path to developing modular code in which algorithms can be readily compared on the specific datasets being used. Finally, the SciKit ecosystem interoperates with PyTorch and CUDA, industry-standard tools for GPU-accelerated machine learning. All of the algorithms selected focus on spectral reflectance, as opposed to algorithms that look for structure, such as the watershed algorithm applied to DEMs, because the trees examined were too small to be properly represented in surface models. A sketch of how the four selected segmenters are invoked follows at the end of this section.

2.4.1 SLIC

Simple Linear Iterative Clustering (SLIC) (Achanta et al., 2010) is a method for creating superpixels, a computer vision term closely analogous to clustering in remote sensing. Superpixels can be thought of simply as groups of pixels sharing similar characteristics; each superpixel then has a border drawn around it, and this border is the segment. The method is based upon K-Means clustering (Pollard, 1982) and starts from a number of seeds placed on a roughly uniform grid of pixels; clusters are then grown by assigning each pixel to the neighboring cluster with the most similar properties. In the SciKit-Image implementation of this algorithm, the two primary parameters we can programmatically test across are n_segments and compactness.

n_segments is the estimated number of segments we expect the algorithm to output; this variable can be informed by knowing how large objects are in relation to the size of the image. compactness refers to how smooth we want individual segments to be, where maximum compactness yields squares (not circles, as each segment must touch other segments on all sides and this algorithm seeds from a grid). Lowering the compactness allows segments to contour to the objects being detected; however, this can also lead to more ambiguity, as less significant features may be considered edges.

2.4.2 Quick Shift

Quick Shift (Vedaldi and Soatto, 2008) is a mode-seeking clustering algorithm in the same family as Mean Shift; because of the computational time such algorithms require, Vedaldi and Soatto proposed Quick Shift as a way to reduce the computational complexity. Quick Shift uses slightly different hyperparameters: kernel_size (sigma) has an impact similar to n_segments in SLIC, and the ratio hyperparameter has an effect like compactness, adjusting the relative importance of changes in colour versus changes in position within the image.

2.4.3 Felzenszwalb's Efficient Graph (F-Graph)

Felzenszwalb's Efficient Graph (Felzenszwalb and Huttenlocher, 2004) works by looking at the image both in terms of regions and neighboring pixels. By examining the difference in intensities of regions, an estimate of where segments should exist in the image is found. The next step is to look at neighboring pixels to find where the intensity changes, determining the precise edge of the segments.

2.4.4 Mean Shift

Mean Shift (Comaniciu and Meer, 2002) is, like SLIC, a mean-based clustering scheme. Where SLIC uses parameters to constrain the clusters and thus the computational time, Mean Shift has a single bandwidth hyperparameter, which was calculated at runtime by the SciKit library. This algorithm has a much higher computation time than the other algorithms; however, it may have benefits where prior testing for hyperparameters is not possible.

2.4.5 YOLO

The final algorithm that was examined, but was ultimately determined to be unsuitable at this stage, is YOLOv3 (Redmon and Farhadi, 2018). YOLO performs object identification as opposed to segmentation, and as a result the boundaries of detected objects are unclear, as they are defined in terms of a bounding box rather than a trace of the object.
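As an illustration of the unified API described above, the sketch below invokes all four segmenters on a multispectral reflectance array. The hyperparameter values are placeholders rather than the tuned settings reported later; skimage ≥ 0.19 is assumed for the channel_axis argument, and Mean Shift is shown with runtime bandwidth estimation, as described in Section 2.4.4.

```python
import numpy as np
from skimage.segmentation import slic, quickshift, felzenszwalb
from sklearn.cluster import MeanShift, estimate_bandwidth

def run_segmenters(image: np.ndarray) -> dict[str, np.ndarray]:
    """image: (rows, cols, bands) float reflectance; returns one label map
    per algorithm."""
    out = {}
    # SLIC: n_segments can be seeded from the expected crown count (e.g. the
    # 9576 crowns/ha derived from the training data, scaled by image area);
    # low compactness lets segments contour to crown edges.
    out["slic"] = slic(image, n_segments=9576, compactness=0.1,
                       channel_axis=-1, start_label=1)
    # Quick Shift: kernel_size plays a role similar to n_segments, and ratio
    # trades colour distance against spatial distance. convert2lab=False
    # because the stack is five-band multispectral rather than RGB (some
    # versions may require a three-band composite instead).
    out["quickshift"] = quickshift(image.astype(np.float64), kernel_size=5,
                                   ratio=0.5, convert2lab=False)
    # Felzenszwalb's efficient graph segmentation.
    out["felzenszwalb"] = felzenszwalb(image, scale=100, sigma=0.8,
                                       min_size=20)
    # Mean Shift: a single bandwidth hyperparameter, estimated at runtime;
    # far slower than the others, so the estimate subsamples pixels.
    flat = image.reshape(-1, image.shape[-1])
    bandwidth = estimate_bandwidth(flat, quantile=0.2, n_samples=500)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(flat)
    out["mean_shift"] = ms.labels_.reshape(image.shape[:2])
    return out
```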
2.5 Method of Evaluation

To evaluate the effectiveness of the algorithms, a training mask is used to compare their outputs. The output of the segmentation algorithms is a bitmap (image) in which each pixel's value is linked to a unique ID for a given segment. The training mask is derived from the collected ground truth data and has identical resolution to the output of the segmentation algorithms. The training mask can then be used as a template for scoring algorithms. An important note is that the segmentation output and training mask must have identical position and number of pixels. These images are then stacked, and scoring is based upon the relative similarity of the images. By relative similarity it is meant that pixels will not have identical values, but it should be possible to map them: if a pixel in the segmentation has value α and the corresponding pixel in the training mask has value β, then for every pixel in the segmentation with value α, the corresponding pixel should also be β. Where only some pixels with value α map to β, we know an error has been made; likewise, when multiple values from the segmentation mask map to the same value in the training mask, an error has been made.

Figure 4 | Example of perfect over-segmentation; each value in the right table is mapped to only one value in the left table.

The segmentation methods are applied using the Python frameworks SciKit and ImageAI. These frameworks provide the base implementations of the algorithms used and are designed to accept hyperparameters that can be used to fine-tune the algorithms. The advantage of using such frameworks is that they provide rapid development, unified APIs that make it easy to switch consistently between algorithms, and optimized code allowing computation to be completed in a reasonable amount of time.

2.5.1 Base-Metrics

The metrics listed below represent various methods of comparing the segmentation to the training mask, each placing value on different types of errors. Used in combination, these metrics put more weight on errors that would be more detrimental to the results.

A False Merge (FM; Figure 4) is the case where a segment encompasses more than a single object: for example, a segment that includes multiple trees, or the ground surrounding a tree. False merges are a measure of entropy associated with the segmentation and are a representation of under-segmentation (many objects per segment). For this metric a lower score is better, with 0 implying that no under-segmentation is present, which is to say there is never a segment including more than one tree (Meilă, 2007).

False Splits (FS; Figure 4) occur when a single object is represented by multiple segments. False splits are a measure of entropy associated with the segmentation and are a representation of over-segmentation (many segments per object). For this metric a lower score is better, with 0 implying that no over-segmentation is present (Meilă, 2007).

As a note, if both False Merges and False Splits were 0, the output of the segmentation would be a pixel-perfect representation of the training mask (Figure 5).

Figure 5 | Examples of segmentations on data; colours represent distinct data, dashed lines the determined segments.
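The mapping test can be made concrete with a contingency count over the two aligned label maps, as sketched below. This is a simplified counting variant for illustration only; the FM and FS scores actually used are entropy-based, following Meilă (2007). The helper name is hypothetical.

```python
import numpy as np

def merge_split_counts(seg: np.ndarray, truth: np.ndarray) -> tuple[int, int]:
    """Count segments spanning multiple truth objects (merges) and truth
    objects covered by multiple segments (splits). Arrays must be aligned
    pixel-for-pixel, as required above."""
    pairs = np.stack([seg.ravel(), truth.ravel()], axis=1)
    unique_pairs = np.unique(pairs, axis=0)            # observed (α, β) mappings
    _, seg_spans = np.unique(unique_pairs[:, 0], return_counts=True)
    _, obj_spans = np.unique(unique_pairs[:, 1], return_counts=True)
    merges = int((seg_spans > 1).sum())   # one segment maps to many β values
    splits = int((obj_spans > 1).sum())   # one object covered by many α values
    return merges, splits
```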
Adapted Random Precision (ARP) is the probability that a pixel of a given class in the results is the same class in the truthing data, normalized over the total number of pixels in the classified data (Arganda-Carreras et al., 2015); in simple terms, this can be thought of as the percentage of pixels correctly segmented. Higher values are better for this base metric. ARP can be thought of as akin to False Merges, in that if a segment is too large it will lower the ARP, in proportion to the area incorrectly merged rather than the quantity of segments.

Adapted Random Recall (ARR) is the probability that a pixel of a given class in the results is the same class in the truthing data, normalized over the number of pixels in the truthing data (Arganda-Carreras et al., 2015). Higher values are better for this base metric. As ARP is to FM, ARR is to False Splits, again reflecting the number of pixels placed into alternate segments rather than the number of additional segments produced.

ARP and ARR are closely related statistics, and it is useful to combine them. Adapted Random Error is defined by the equation ARE = 1 - (2 * ARP * ARR) / (ARP + ARR) (Arganda-Carreras et al., 2015). In this case we seek a lower value, with 0 being the perfect classification; at 0, all pixels can be directly mapped from the classified data to the training mask.

Table 3 | The figures below each demonstrate the scoring of a base metric as the SLIC hyperparameters are changed.

Figure 6 | SLIC False Merges: as segments per hectare increase, false merges decrease.
Figure 7 | SLIC False Splits: as segments per hectare increase, false splits increase.
Figure 8 | SLIC Adapted Random Precision decreases with segments per hectare.
Figure 9 | SLIC Adapted Random Recall increases with segments per hectare.
Figure 10 | Adapted Random Error: a local minimum is achieved at an intermediate number of segments per hectare; this is the only base metric whose optimal solution does not extend to a limit.

The graphs above show how the base metrics respond to the hyperparameters used in the SLIC algorithm (discussed later). They are useful for visualizing how False Merges and False Splits compete against each other, as do Adapted Random Precision and Recall. Adapted Random Error is the only metric that does not extend towards a limit, as it is made up of two competing metrics. It is for this reason that combined metrics must be defined to produce useful results.

2.5.2 Evaluation-Metrics

For this research, scoring methods that help to moderate the base metrics were needed. At first glance it is easy to think that the goal is to min/max the metrics used; however, these metrics taken to the extreme (perhaps except for Adapted Random Error) will not actually provide useful results. To illustrate this, consider False Merges: if we simply make every pixel its own segment, there will be no false merges; likewise, making the entire image a single segment will present as no false splits. ARP and ARR also suffer from this, although to a lesser extent, since the training data has multiple classes, and a single segment covering the whole image means that some of the classes must be wrong; however, even here caution must be used in relation to unbalanced datasets.
Five base-metrics were utilized to evaluate the quality of the segmentation algorithms and the segments they produced: False Splits (FS), False Merges (FM), Adapted Random Error (ARE), Adapted Random Precision (ARP), and Adapted Random Recall (ARR) (Arganda-Carreras et al., 2015). While each of these metrics provides insight into the general accuracy of the segmentation algorithms, they represent different and sometimes contrasting accuracy components. For example, a segment that covers the entire image contains no false splits, while a segmentation where every pixel is its own segment contains no false merges. Given this quandary it becomes necessary to find a formula that balances these metrics.

Subtracting the false merges reduces the favourability of parameter sets that produce segments which are too large. The theory here is that it is better to have multiple segments per tree than multiple trees per segment, or worse yet, segments that include ground. The logic behind this assumption is that a segment representing half of an individual of species γ should still present most of the characteristics of a segment representing an entire individual of species γ, though potentially with lower deviation. In contrast, a segment that includes both species γ and species ε will have characteristics that are an average of the two species; this would make classification extremely difficult, as such a segment would not resemble any of the training samples.

Four equations are proposed to merge the base-metrics into a form where a maximum can be sought as the desirable solution. The one exception is that Metric 1 converges towards a limit; for this equation the most desirable state is the first occurrence of that limit, making it somewhat harder to work with than the other three.

2.5.2.1 Metric 1: "Small Segments"

MAX (ARP − FM)

This metric is chosen for its strong emphasis on reducing the amount of under-segmentation in the image, based on the theory that a tree split into multiple segments could still have all segments classified as that species, whereas multiple trees in the same segment could never all be classified correctly.

2.5.2.2 Metric 2: "Average of Metrics"

MAX (−ARE − FM − FS)

This metric represents an evenly weighted combination of all metrics.

2.5.2.3 Metric 3: "Weighted Small Segments"

MAX (−FM − 0.5 × FS)

This metric puts a strong emphasis on avoiding false merges while still considering false splits, to a lesser degree. This reflects that, while less impactful, false splits may still be damaging to the classification.

2.5.2.4 Metric 4: "Weighted Average of Metrics"

MAX (−ARP − 0.5 × ARR − FM − 0.5 × FS)

This final metric can be thought of as a combination of Metrics 2 and 3, where all base-metrics are considered but higher weighting is given to those that avoid false merges.

2.5.2.5 Stability

For the purposes of this paper, stability is the absolute value of the mean score divided by its standard deviation; this provides a way of comparing scores from metrics that may not produce results within the same range, with higher values indicating more consistent results between sites.

2.5.2.6 Sharpness

Sharpness is calculated as the mean edge intensity from a Sobel filter and is used as an indication of how distinctly edges are presented within the image. This method has been used for simple camera autofocus, and in this case it attempts to evaluate the amount of motion blur present in the image. It should be noted, however, that the content of the image also affects the sharpness, making it only a proxy for the amount of motion blur.
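A minimal sketch of this sharpness proxy, using scikit-image (the orthomosaic is assumed to be readable and reducible to a single grey band; names are illustrative):

    import numpy as np
    from skimage import filters, io
    from skimage.color import rgb2gray

    def sharpness(path):
        gray = rgb2gray(io.imread(path))   # collapse to a single band
        edges = filters.sobel(gray)        # per-pixel gradient magnitude
        return float(np.mean(edges))       # mean edge intensity; higher = sharper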
2.5.3 Compilation of Results

Experiments were run on all study sites with all four algorithms, where the hyper-parameters were selected by distributing tests across the entire range of possible settings. The result of this experimentation is over 140,000 tests, producing a very large data set; results are picked from this set to distill a more usable set of results. For each pair of metric and algorithm, results are extracted using an SQL query that picks the highest score for each site, as well as the hyper-parameters that produce the highest average score when those parameters are applied to all sites. With 4 algorithms, 4 metrics, and 11 sites this yields 4 × 4 × (11 + 1) = 192 results, providing a much more manageable dataset to draw conclusions from.

2.6 Results of Image Segmentation Algorithms

Results are based upon the highest scoring hyper-parameters for these sites, with complete tables located in Appendix C through I.

2.6.1 SLIC

To make effective use of SLIC it is helpful to have an estimate of the number of segments that should be expected in the image. This was calculated by determining the average size of a feature from the ground truth data, then determining how many times that area could be placed in each hectare. This parameter was expressed per hectare, as opposed to over the whole image, to help provide transferability of the results. Analysis of the training data shows that the average canopy size of trees at the study sites is 0.519 m², or 9,576 clusters per ha.

Table 4 | Average segmentation score for SLIC across all four metrics

    SLIC        Metric 1   Metric 2   Metric 3   Metric 4
    Average     -0.1137    -0.4829    -0.3045    -0.2907
    SD           0.0601     0.2835     0.1797     0.1872
    Stability    0.5283     0.5871     0.5902     0.6439

Table 4 shows the scores attained with SLIC, along with the deviation in those scores and a third value representing stability. A full list of site scores is available in Appendix C.

Figure 11 | Results of SLIC segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.

Overall, the results show that the cluster scale has a heavy impact on the final segmentation, producing what is nearly a grid, with segments larger than the smallest trees yet still requiring many segments for the larger trees. This method could theoretically be advanced further by looking at cluster-merging algorithms to merge the larger trees into single segments; however, as we are currently more concerned with coverage of species than actual counts, this is left to future work should tree counts become a desirable attribute. Additionally, it may be easier to merge segments after species classification, as it can be assumed that only segments of the same species should be merged.
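Concretely, the per-hectare estimate translates into SLIC's n_segments parameter; a hedged sketch with scikit-image, where the orthomosaic array and the site area are assumptions, not values from the thesis experiments:

    from skimage.segmentation import slic

    CLUSTERS_PER_HA = 9576            # from the 0.519 m^2 average crown area above
    site_area_ha = 2.5                # hypothetical site size

    labels = slic(
        ortho,                        # orthomosaic pixel array for the site
        n_segments=int(CLUSTERS_PER_HA * site_area_ha),
        compactness=10.0,             # one of the hyper-parameters swept in the experiments
    )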
2.6.2 Quick Shift

Quick Shift shows a much more complicated pattern of results; however, it is not visually obvious that the segments are following tree crowns, and many of the segments cover both crown and non-crown. Compared to SLIC there is generally higher stability, but at the cost of lower scores (except for Metric 1). A complete list of site-specific hyper-parameters can be found in Appendix D.

Table 5 | Average segmentation score for Quick Shift across all four metrics

    Quick Shift   Metric 1   Metric 2   Metric 3   Metric 4
    Average       -0.0755    -0.5020    -0.3464    -0.3315
    St. Dev        0.0295     0.2610     0.1195     0.1255
    Stability      2.5569     1.9230     2.8991     2.6417

Figure 12 | Results of Quick Shift segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.

2.6.3 F-Graph

Felzenszwalb's Efficient Graph produced interesting results: looking at Figure 13 it can be seen that while many smaller trees were segmented, some trees, including some of the larger ones, were missed completely. A complete list of site-specific hyper-parameters can be found in Appendix E.

Table 6 | Average segmentation score for Felzenszwalb's Efficient Graph across all four metrics

    F-Graph     Metric 1   Metric 2   Metric 3   Metric 4
    Average     -0.0823    -4.7949    -2.3340    -2.8789
    St. Dev      0.0496     2.8222     1.2867     1.1941
    Stability    1.6588     1.6990     1.8139     2.4109

Figure 13 | Results of Felzenszwalb's Efficient Graph segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.

2.6.4 Mean Shift

Mean Shift was run using scikit-learn's estimate_bandwidth function; given how poor the results of the algorithm were, additional tests were done manually setting the bandwidth at various points ranging from 1 to 1000. This did not produce any improvement in results, only a reduction in computation time, as it bypassed the bandwidth estimation. Mean Shift missed segmenting nearly all trees. The individual site scores are available in Appendix F.

Table 7 | Average segmentation score for Mean Shift across all four metrics

    Mean Shift   Metric 1   Metric 2   Metric 3   Metric 4
    Average      -1.8970    -3.3371    -2.9115    -4.1571
    St. Dev       3.3994     3.8410     3.3919     3.1491
    Stability     0.5580     0.8688     0.8584     1.3201

Figure 14 | Results of Mean Shift segmentation on the 200rd site optimized for Metric 1 in red, training segments in black.

2.6.5 YOLO

Figure 15 below shows the results: each object is given a bounding box, each box belongs to a class (mapped to the species of the training data), followed by the likelihood that the object is of that class. One interesting result is that attempting to train the algorithm to detect trees in general (tree vs. no tree) was essentially a complete failure. By providing the training data classified by species, it was able to achieve a higher success rate, though with a great deal of variance between species.

    detection_model-ex-006--loss-0033.073.h5
    Evaluation samples: 300
    Using IoU: 0.5
    Using Object Threshold: 0.3
    Using Non-Maximum Suppression: 0.5
    0: 0.0550 - Cs / Red-osier Dogwood (Cornus stolonifera)
    1: 0.0000 - Ac / Cottonwood (Populus balsamifera)
    3: 0.0000 - Bl / Subalpine Fir (Abies lasiocarpa)
    4: 0.2057 - Ep / Paper Birch (Betula papyrifera)
    5: 0.0000 - Fd / Douglas-Fir (Pseudotsuga menziesii)
    7: 0.1273 - Sx / Hybrid White Spruce (Picea glauca x engelmannii)
    mAP: 0.0647

Figure 15 | Example of YOLO segmentation and classification; bounding boxes are labeled with species code: confidence percentage.

The detection rate of YOLOv3 was overall very low; however, it does show a strong contrast between the ability to detect different species of trees. Due to its distinctly different output format compared to the other algorithms, and the low accuracy of its results, this algorithm was not explored further in this paper.
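For reference, the evaluation report above is the standard output of ImageAI's model evaluation. A hedged sketch of the call that produces such a report is shown below (ImageAI 2.x; the directory layout and file names are hypothetical, not the exact project files used here):

    from imageai.Detection.Custom import DetectionModelTrainer

    trainer = DetectionModelTrainer()
    trainer.setModelTypeAsYOLOv3()
    trainer.setDataDirectory(data_directory="tree_dataset")  # Pascal-VOC style train/validation folders
    trainer.evaluateModel(
        model_path="tree_dataset/models",                    # checkpoints such as detection_model-ex-006...h5
        json_path="tree_dataset/json/detection_config.json",
        iou_threshold=0.5,       # "Using IoU: 0.5"
        object_threshold=0.3,    # "Using Object Threshold: 0.3"
        nms_threshold=0.5,       # "Using Non-Maximum Suppression: 0.5"
    )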
2.6.6 Summary

Table 8 below provides a summary to allow easier comparison between methods. In Table 8 each metric has an Average accuracy across all sites, where each site used its optimal parameters, as well as an Aggregate value where the same hyper-parameters were used for all sites. It is expected that the Aggregate scores will be lower than the Average scores, as they are less tuned to specific sites.

Table 8 | Summary of algorithm performance across all four metrics. Average is the score when each site is segmented independently and the scores averaged; Aggregate is the score when the same hyper-parameters are used on all sites and then averaged.

    Metric 1     SLIC       QuickShift   F_Graph    MeanShift
    Average      -0.11367   -0.07550     -0.08227   -1.89704
    Aggregate    -0.1161    -0.0763      -0.0897    -1.897

    Metric 2     SLIC       QuickShift   F_Graph    MeanShift
    Average      -0.48293   -0.50196     -4.79493   -3.33711
    Aggregate    -5.0441    -5.6223      -5.3433    -3.3371

    Metric 3     SLIC       QuickShift   F_Graph    MeanShift
    Average      -0.30448   -0.34644     -2.334     -2.91146
    Aggregate    -2.5232    -2.765       -2.7298    -2.9115

    Metric 4     SLIC       QuickShift   F_Graph    MeanShift
    Average      -0.29074   -0.33147     -2.87894   -4.15706
    Aggregate    -3.1395    -3.2964      -3.2869    -4.1571

Based upon the scores above, SLIC had the highest scores for Metrics 3 and 4 and was a strong contender on all metrics; Metric 2 does show Mean Shift with the advantage for the aggregate score. Finally, Quick Shift is the leader for Metric 1 and takes 2nd place for average score on the other metrics. To examine this further it is helpful to understand the stability of the metrics between sites, as the aggregate hyper-parameters are what would be applied to novel sites without training data, and stable results provide expectations of the data quality.

Table 9 | Stability matrix for segmentation algorithms and scoring metrics; higher stability indicates more consistent results between sites.

    Stability     Metric 1   Metric 2   Metric 3   Metric 4
    SLIC          1.892777   1.703242   1.69425    1.553036
    Quick Shift   2.55691    1.923048   2.899101   2.641684
    F Graph       1.658837   1.698979   1.813909   2.410898
    Mean Shift    0.558048   0.868819   0.858369   1.320089

Table 9 shows that Quick Shift with Metric 3 provided the most stable results, with Quick Shift under Metrics 4 and 1 claiming 2nd and 3rd place respectively; Quick Shift was also the top performer on Metric 2. Conversely, Mean Shift took last place in every category for stability.

2.6.7 Inter-site results

The results were also calculated for the case where every site is scored separately but all sites use the same hyper-parameters. Using the same parameters for all sites provides an indication of the expected results when segmenting a novel image without training. The calculated hyper-parameters are in Appendix G, Appendix H, and Appendix I.

Figures were created for each of the four scoring metrics: Metric 1 in Figure 16, Metric 2 in Figure 17, Metric 3 in Figure 18, and Metric 4 in Figure 19. These graphs show the accuracy of the site-specific hyper-parameters as blue dots and the aggregate hyper-parameters as red dots. The vertical distance between the blue and red dots shows the loss in accuracy from using aggregate hyper-parameters.

Figure 16 | Metric 1 scores across sites; horizontal line represents the score of all sites combined.

Figure 17 | Metric 2 scores across sites; horizontal line represents the score of all sites combined.

Figure 18 | Metric 3 scores across sites; horizontal line represents the score of all sites combined.

Figure 19 | Metric 4 scores across sites; horizontal line represents the score of all sites combined.
Figure 16 shows that there is almost no difference in score between site-specific and aggregate hyper-parameters, while the other metrics show better scores when using site-specific parameters; the exception is Mean Shift, which had no tuned hyper-parameters and therefore has identical results for all four metrics. To help examine why some sites show larger differences between single-site and aggregate hyper-parameters, several additional factors were compared for correlations with the aggregate scores in Table 10. The various sites represent a variety of conditions, variations in species composition, as well as age distribution (here represented as tree size).

Table 10 | Correlations between Metric 1 score and site attributes (attributes: Conifer Coverage %, Deciduous Coverage %, Plant Coverage %, Leading spp. Coverage %, 2nd spp. Coverage %, 3rd spp. Coverage %, Tree Size, Tree Size Variation, Tree Count, Sharpness; algorithms: SLIC, F-Graph, Quick Shift, Mean Shift)
SLIC Fgraph 0.0184 0.0888 0.3397 0.4988 -0.157 -0.2813 Quick Shift 0.353 0.5964 -0.5394 Mean Shift -0.0489 0.4531 -0.149 -0.1655 -0.1387 0.5085 0.4722 0.5895 0.6098 -0.2081 -0.2692 -0.1079 -0.102 -0.5408 -0.51 -0.3125 -0.3462 0.1435 0.6989 0.4173 -0.4849 -0.0487 -0.4105 -0.4116 0.0952 -0.0685 0.3034 -0.5384 -0.3578 0.3193 0.0624

Some interesting notes on the correlations are that the coverage of the third leading species appears to correlate strongly with segmentation score, while tree size and tree size variation show low correlation with segmentation scores.

2.7 Discussion

With the analysis completed, it is interesting to note that, based on Metric 1, Quick Shift produces the best results. However, when comparing the results between SLIC and Quick Shift (available in the appendices), SLIC's segments appear closer in size to the trees than Quick Shift's; it is Quick Shift's lack of over-segmentation that places it higher in the scoring system. Looking at this holistically, it will be essential to carry a few of these metrics into future classification work to see which works best in the final classification.

Moving forward to classification of segments, SLIC and Quick Shift optimized to Metrics 1 and 3 seem the most beneficial to proceed with. Between these two metrics, Metric 3 was chosen for its higher stability, especially when used with the Quick Shift segmentation algorithm. SLIC consistently produced the highest scores; additionally, its uniform distribution may be beneficial, as there are generally few crowns per segment due to the small size of each segment.

While there is likely an optimal combination of the base-metrics, for the sake of producing an initial dataset that could be used to start training models, the naïve approach of rating all five base-metrics equally was the initial attempt at balancing these extremes. Upon further testing, however, taking the ARP and subtracting the FM seemed, on visual examination, to produce the most usable results. The visual examination focused on segmentations where the lines most closely followed the edges of the trees and the segments were small enough to capture the smaller trees. At this stage it is impossible to definitively say this is the ideal metric, as it is unclear whether the visual intuition will translate directly to classification performance, and it was only practical to review a subset of the 135,000 experiments that were completed. A variety of metrics for determining the best segmentation were used to select the experiments from the database that would receive a manual review.
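The extraction described earlier (best score per site, plus the single parameter set with the best all-site average) can be expressed compactly. A sketch assuming a hypothetical experiments table with site, algorithm, params_id, and score columns; none of these names come from the thesis database:

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("experiments.db")   # hypothetical results database
    runs = pd.read_sql("SELECT site, algorithm, params_id, score FROM experiments", conn)

    # Highest-scoring run for each (algorithm, site) pair.
    best_per_site = runs.loc[runs.groupby(["algorithm", "site"])["score"].idxmax()]

    # Parameter set with the highest mean score when applied to every site.
    best_aggregate = (runs.groupby(["algorithm", "params_id"])["score"].mean()
                          .groupby(level="algorithm").idxmax())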
The parameters that maximize the metric ARP − FM produce segments that most closely align with the original training data. It should be noted, however, that this formula was chosen by qualitative visual analysis of various simple candidate metrics. This formula is likely not the ideal metric for optimizing segmentation, but there is a bit of a chicken-and-egg problem at this stage: although the goal is to determine which segments produce the best classification, segments are concurrently used to develop the classification algorithms, which are then needed to quantitatively score the segmentation properly. As such, there is room for future work, once the classification algorithm is developed, to run various segmentations against that classification to find an optimal solution.

It is notable that none of the algorithms produced cleanly delineated tree crowns. A possible explanation for this is the lack of sharpness between the crowns and the surrounding vegetation, as these algorithms rely upon edge detection to place the segment boundaries. As an example, Figure 20 below shows a Sobel edge filter applied to the 200rd_13km site, displayed as intensity of edge. This image covers roughly the same area as the figures above showing segments, yet it is very hard to identify crowns at all in this image.

Figure 20 | Sobel edge detection from the 200rd_13km site.

Another point of interest in this research is that SLIC performed better than anticipated; as the simplest of the algorithms, it was selected to provide a control of sorts, to show the naive solution. However, it has shown itself to be the strongest contender on three of the metrics, and 2nd on the fourth metric.

There is also future work in segmentation that operates at multiple scales, as there are large discrepancies in size between the smallest and largest trees, presenting a challenge when optimizing for segment size. Alternatively, if a particular size of tree is of interest, it could be useful to filter the training data to train only upon trees of that size, giving up accuracy on the set of all trees to gain accuracy on the subset of the target tree size.

Finally, it should be emphasised that many training sites must be used in the development of a set of generic aggregate hyper-parameters. Initial testing for this paper was conducted using a single site (North Fraser 41km) and led to early results that were dramatically different from those for the full set of sites. Even at this stage, eleven sites is still a small sample size; while it may be tempting to say that the sites [bend_0-5km, conifex_k14, 700rd_28km, conifex_h47, 200rd_13km, alezza_lake, north_fraser_11km] represent the most common presentation of sites, it cannot be ignored that four of the sites performed atypically, three of them in relatively similar ways.

Chapter 3

Image Classification for Early-Stage Vegetation in RPAS Imagery

3.1 Introduction

Classification is the process by which the computer evaluates information and places that information into a class, with the objective of providing meaning or context to that information. Classification is the fundamental problem that machine learning was pioneered upon; the origin of machine learning has been traced to Frank Rosenblatt, who developed the 'perceptron' in 1957 for the detection of letters of the alphabet (Fradkov, 2020).
As research advances, machine learning is being used to classify an ever-growing variety of objects, such as the automated counting of cars (Biswas et al., 2017), monitoring sea ice (Dumitru et al., 2019), and forest fire detection (Seydi et al., 2022). In the same way that we as humans use our perception of the world to identify our surroundings, a computer's ability to classify information allows it to solve problems in the real world.

The classification of information is an extensive discipline able to use a variety of techniques to work with a range of different data types. In this thesis a specific vein of research is pursued: techniques that treat regions of images as tabular data for classification.

The motivation to perform these classifications automatically is twofold. The first and more obvious motivation is that the identification of thousands or (if done at scale) millions of samples is a very mundane job, which would be cost-prohibitive and likely unfulfilling to complete manually with human operators. Secondly, the data produced are of a dimensionality that is difficult to present to human operators; the computer can take a more precise view of the differences between samples in a much shorter time.

No classification system will yield perfect results; in practice a human would perform tree identification with much greater accuracy. However, a human would simply not undertake the task of individual tree identification at the landscape level. In this respect, this research seeks to make accessible information at scales not currently practical, despite the seemingly simple problem presented. An example of why this distinction matters is the development of Optical Character Recognition (OCR), which allowed for the digitization of entire libraries; while reading and typing a book is not a particularly challenging task, completing this process for entire libraries would have been effectively impossible without the aid of machine learning.

The ability to take imagery and classify it by vegetation type will provide land managers with new opportunities for decision making about their land base. An understanding of what vegetation is present, and the ability to watch its development over time, will allow for a better understanding of how healthy the environment is, and support choices about how best to enhance ecosystems for a variety of factors. Some of these factors may include keeping habitat suitable for wildlife, ensuring economic value in the timber supply, or considering the impact of vegetation type on forest fire susceptibility.

This chapter aims to answer the following questions:
- How should data be segmented?
- Should algorithms be used to balance the quantity of training data?
- Which algorithm should be used for classification?
- Are trained models re-usable?
- What are the key areas for improvement?

3.2 Definitions

The definitions presented below are intended to provide clarity as to the specific meanings applied to terms in this thesis. These are especially important as the terms used can carry domain-specific definitions that may not be consistent across the domains of study involved: Remote Sensing, Machine Learning, Ecology, Computer Vision, etc. The details within the definitions also help to clarify the methods presented.

Feature: an instance of what is being modelled in the real world; in the case of this work, a feature is an individual tree.
Segment: a region of imagery; these may be hand drawn (Truth Segments, as seen in Figure 3) or automatically produced by segmentation algorithms. In an ideal circumstance there would be a 1:1 mapping between features and segments; in practice this is rarely the case.

Sample: the information about the intersection of imagery and segments. The machine learning algorithms used here do not work directly on imagery, but rather on numeric information; samples are produced by calculating raster statistics for each segment. Additionally, Truth Segments contain information about the species class of the feature they represent, while algorithm-generated samples contain information about the overlap between the given sample and any intersecting Truth Segments.

Coverage: for data that has been automatically segmented, the segment boundaries will very rarely be fully contained within the ground truth polygons; as such, it is beneficial to calculate how much of the area in the segment being trained on is represented by the labeled species. Note that coverage is a measure of sample purity only and does not convey how much of the original feature is contained by the segment; see the first and last examples in Figure 21, which have the same coverage even though the first example covers more of the original feature. A limitation of assigning a single species to each segment is that it does not account for the case where a segment overlaps two trees of the same species; coverage is calculated only against the single tree with the greatest overlap, potentially underrepresenting coverage.

Figure 21 | Examples of how coverage is measured based on segment overlap.

Dimensionality: refers to the number of attributes each sample contains; for example, five bands, each with a mean and standard deviation, would provide ten dimensions to the training process. High dimensionality is a relative term, generally comparing the ratio of dimensions to training samples; this is useful in conveying comparisons of the general behavior of various algorithms.

Model: the result of running a training algorithm against the training dataset; the resultant model can then be used to classify novel data.

Overfitting: an occurrence where the trained model can easily identify the training data but fails to maintain accuracy on novel datasets. Conceptually, a fit model would be able to identify people in photographs, where an overfit model may only identify specific persons as people. In terms of this work, a fit model would be able to identify aspen trees, while an overfit model would only identify a specific subset of aspen trees, likely those contained in the training data or possessing an extremely similar representation.

3.3 Data Preparation

The data used in this chapter is the same dataset that was used in Chapter 2, with the results of Chapter 2 preparing the data for further analysis. More generally, before the data can undergo object-oriented classification it must be provided to the algorithms as segments of the image, where each segment will be treated as a unit to be classified. This begins to show the modular approach of the overall process from data collection to classification, in that any process which segments the image could be used in place of the results of Chapter 2 as the input data for this section.

3.3.1 Zonal Statistics

To prepare the previously segmented imagery for classification it must be converted from raster into tabular data. This is done by overlaying the segments on the orthomosaic, calculating statistics for each band, and saving the results into a database table. The statistics calculated are the minimum, maximum, mean, number of pixels, standard deviation, median, and range of all pixels that fall within the segment.
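A sketch of this step with the rasterstats package, run once per band (the file names are hypothetical, and "count" stands in for the number of pixels):

    from rasterstats import zonal_stats

    stats = zonal_stats(
        "segments.shp",            # one polygon per segment
        "orthomosaic_band1.tif",   # repeated for each of the five bands
        stats=["min", "max", "mean", "count", "std", "median", "range"],
    )
    # Each element of `stats` is a dict of statistics for one segment, ready to be
    # written to the database table alongside site, capture date, and band count.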
In addition to the zonal statistics, the table includes the site, the date of capture, and how many bands are present. This information is required because the geometry of the segments is not included in the algorithm training process.

3.3.2 Species Coverage

It is also important to know what species are represented by the recorded statistics; this is achieved by overlaying the segments with the ground truth data. Each segment is assigned a species code based upon which shape in the ground truth data has the most overlap; percent coverage is calculated as the percentage of the new segment contained within the truth segment with the greatest overlap.

Table 11 | Sample counts by site, both total samples and counts of only the target species

    Site            All     Target
    200rd           11529   2031
    700rd           3774    1977
    Alezza          5649    2064
    Bend05km        1521    636
    ChiefLake       12861   1491
    ConifexH47      1398    261
    ConifexK14      1077    513
    NorthFraser11   8331    3453
    NorthFraser41   1677    270
    NorthFraser50   1107    468
    Olson5km        2862    567

Table 11 shows how many training samples are present at each site, both in total and for the target species only. This will be used later to see if there is a correlation between the quantity of training data and accuracy, as well as how the ratio of target species affects the target accuracy specifically.

Figure 22 | Average reflectance across all five bands of the 10 most common species (Aa, Al, At, Bl, Ep, Fd, Li, Pl, Sx, W) per site.

The sites varied considerably in terms of quantity and quality of data. Even after calibration, 200rd, Chief Lake, and Olson5km have significantly lower reflectance than the other sites (Figure 22). This is believed to be due to issues with the calibrated reflectance panel used with the RedEdge-M camera. Through other ongoing research projects it has been observed that newer versions of the MicaSense cameras, utilizing version 2 of the DLS sensor, provide more consistent and reliable calibration results. This lack of consistent calibration is likely the cause of negative impacts on the accuracy of models trained on multiple sites.

3.4 Classification Algorithms

In this research a variety of common classification algorithms were tested and compared for effectiveness at classifying early seral vegetation. While machine learning is a powerful tool for solving a variety of problems, not all methods of learning work for all types of data. Different algorithms may perform better or worse depending upon the amount of noise in the data, the number of variables present, and the correlation between input variables, amongst other factors.

3.4.1 Support Vector Machines (SVM)

Support Vector Machines (Cristianini and Shawe-Taylor, 2000) are binary non-probabilistic classifiers, building upon work at UC Berkeley and Bell Laboratories (Boser et al., 1992) on classifiers that optimize the margins between classes. SVM classifiers are particularly suited to problems with high dimensionality (Erfani et al., 2016).
However, SVM classifiers have the drawback of being binary, such that if the goal is to detect multiple classes, multiple classifiers must be used along with a method of combining their results (Duan and Keerthi, 2005). SVM could represent an advantage in this study due to the high number of input variables: with five image bands, each contributing multiple statistics (mean, median, SD, etc.), avoiding the 'curse of dimensionality' (Friedman, 1997) is valuable. Conversely, it is also hypothesized that if the representation of a given species were to change due to lighting conditions or position in the phenological cycle, the classifier could have trouble differentiating these different representations.

Implementation used: https://scikit-learn.org/stable/modules/svm.html, with the gamma parameter set to auto.

3.4.2 Random Forest (RF)

The Random Forest classifier (Breiman, 2001) is one of the more common machine learning algorithms used today. Random Forest classifiers have a relatively high resistance to overfitting, as well as resistance to outliers compared to some other methods; however, random forests can be heavily impacted by the curse of dimensionality, especially when using imbalanced quantities of training data (Evangelista et al., 2006).

Random Forest is a type of ensemble learning building upon the traditional decision tree. To build the random forest, multiple decision trees are constructed, each from a random subset of the data using a random subset of the variables; this collection of trees comprises the forest. When classifying new data, each sample is run through all the decision trees and the most common result across the trees is the identified class. This method of building prevents overfitting, as the complexity of each individual tree is contained, and provides resistance to outliers, because even if one tree produces invalid results it can still be overruled by the rest of the forest. Conversely, highly imbalanced data may produce poor results, as the trees are built by seeking an even split at each decision point, which may force the algorithm to split within classes and produce an overabundance of leaf nodes for the overrepresented class. Additionally, the curse of dimensionality can be particularly challenging here, as redundant variables will cause more decisions to be made on the redundant information, potentially reducing the impact of other important variables.

For the purposes of this study it is expected that Random Forest will handle the large number of input classes well, as well as handle outliers such as a tree that is unhealthy or in shadow in the imagery. It is also expected to be susceptible to the unbalanced quantities of training data, and to the fact that all five bands of imagery are highly correlated due to their relative closeness in spectrum, especially in the visible bands.

Implementation used: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html, with the number of estimators increased from 10 to 1000.

3.4.3 K Nearest Neighbors (KNN)

K Nearest Neighbor classification (Taunk et al., 2019) is a common algorithm in the field of remote sensing, often utilized for its ease of implementation and computation while still producing useful results in many cases. The algorithm works by placing all training samples into n-dimensional space, where n is the number of input variables provided.
New data is then classified by locating its k nearest training samples in that space, based on n-dimensional distance, and assigning the most common class among those neighbours. KNN is utilized in this research as a baseline, representing a relatively naive, lowest-effort approach to classification.

Implementation used: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html.

3.4.4 Multinomial Naive Bayes Probabilistic Classifier (MNB)

The Multinomial Naive Bayes probabilistic classifier works by fitting a distribution of values for each class and then placing each new sample into the class under whose distribution it has the highest probability. The Naïve Bayes classifier is well suited to handling the curse of dimensionality; however, it also assumes that all input variables are independent. For the purposes of this research this has the potential to be beneficial, as the data has relatively high dimensionality, but at the same time it may struggle to tell a more reflective species from an over-exposed photo, as the relative weights of variables are not considered.

Implementation used: https://scikit-learn.org/stable/modules/naive_bayes.html.

3.5 Minimum data requirements

At the scale of the current research, the ground truth serves as a baseline for classification effectiveness with the given dataset. At the scale of the broader project, the accuracy achieved on ground truth as opposed to algorithmically generated segments provides some indication of the feasibility of fully automated systems.

During training, species with too few samples must be filtered out; the minimum requirement is that there be at least as many samples as folds during training. In the case where oversampling is to be used, there must be at least (number of unique classes × number of folds) samples of each species. If a species does not have enough samples for a given set of sites, it is removed from the training.

As the data used in this research has large variations in the quantity of training samples available per species per site, there is a potential for the classification algorithms to over-classify those samples with higher representation. To attempt to mitigate this effect, all classification algorithms were also tested with the Synthetic Minority Over-sampling Technique (SMOTE; Chawla et al., 2002). SMOTE works by oversampling underrepresented classes while simultaneously under-sampling overrepresented classes; the synthetic oversampling produces new samples within the sample space of the underrepresented classes, as opposed to simply repeating real samples.

3.6 Framework for Evaluation

To evaluate the comparative effectiveness of the different ML classification algorithms, each was tested for accuracy using 4-fold cross-validation across a variety of configurations. Each configuration is defined by the following factors:

1. Algorithm used: Random Forest, Support Vector Machines, K Nearest Neighbor, or Multinomial Naive Bayes classifier.
2. Samples: ground truth, SLIC segmentation, or Quick Shift segmentation.
3. How many sites are included: 1, 3, 7, 8, or 11.
4. Whether or not the data is over-sampled using the Synthetic Minority Over-sampling Technique (SMOTE) function (Chawla et al., 2002).

The above options produce 120 scenarios for the configuration of the processing pipeline, covering the classification algorithm, the segmentation algorithm, how many sites to use, and whether or not to oversample the data. Within each scenario the accuracy is calculated for a range of minimum coverage thresholds. The objective is to determine which of the scenarios has the highest classification accuracy, as well as to look for trends in specific options (i.e., does SLIC produce consistently better, consistently worse, or mixed performance compared to Quick Shift when used as the input segments, regardless of the other options chosen?).
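To make one such scenario concrete, the sketch below shows 4-fold cross-validation of the four classifiers with optional SMOTE, using scikit-learn and imbalanced-learn. It is a minimal illustration under stated assumptions, not the exact thesis pipeline: X and y stand for the zonal-statistics table and species labels for the chosen sites and segment source.

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    classifiers = {
        "RF": RandomForestClassifier(n_estimators=1000),  # estimators raised from 10 to 1000
        "SVM": SVC(gamma="auto"),
        "MNB": MultinomialNB(),   # requires non-negative features, as reflectance statistics are
        "KNN": KNeighborsClassifier(),
    }

    cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    for name, clf in classifiers.items():
        # SMOTE sits inside the pipeline so only the training folds are resampled;
        # its neighbour requirement is one reason very rare species must be filtered first.
        pipeline = Pipeline([("smote", SMOTE()), ("clf", clf)])
        scores = cross_val_score(pipeline, X, y, cv=cv)  # X, y assumed prepared as above
        print(name, scores.mean())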
3.6 Results

The sites can be broken down into the following sets, based on the results of the single-site segmentation accuracy (Figure 16 to Figure 19):

Alpha (α), 1 site: Chief Lake
Beta (β), 3 sites: North Fraser 41, North Fraser 50, Olson 5km
Gamma (γ), 7 sites: 200rd, 700rd, Alezza, Bend05km, ConifexH47, ConifexK14, NorthFraser11
Delta (δ), 8 sites: 200rd, 700rd, Alezza, Bend05km, ChiefLake, ConifexH47, ConifexK14, NorthFraser11
Epsilon (ε), 11 sites: all sites

Group α contains only a single site (Chief Lake), chosen for having the largest quantity of training samples available. Group β is composed of the three sites with the lowest segmentation scores (Figure 18) when looking at only the target species for Quick Shift and SLIC segmentation. Group γ represents the seven sites with the highest segmentation scores, and group δ is the combination of groups α and γ, making it the set of the eight highest segmentation accuracies. Finally, group ε represents the set of all sites with training data.

These sets are used for separating the data when training and testing models and are intended to help determine the stability of models moving from site to site. Training was performed using all available species, and the accuracy of the classification was computed in two ways: using all species in the data set (Appendix A), and using only a subset of species. The target species are a subset that have been identified as valuable for moose browse: Trembling Aspen (Populus tremuloides), Red-osier Dogwood (Cornus stolonifera), Paper Birch (Betula papyrifera), Highbush-Cranberry (Viburnum edule), and Willow (Salix spp.).

3.6.1 Ground Truth Segments

This section begins by analyzing what accuracy can be achieved by training the ML algorithms on the ground truth segments. Answering this question has two motivations: first, it shows whether the trees can be identified by spectral reflectance at all; second, it provides a baseline accuracy that the segmentations can be compared against, as the true effectiveness of a segmentation cannot be directly observed, only how good the final classification built upon it is.

3.6.1.1 Classification of individual sites

First the accuracies for each site are determined using unaltered data in Table 12, then the accuracies with SMOTE are shown in Table 13; finally, Table 14 shows how SMOTE changed the classification accuracy achieved. The ideal classification algorithm will provide accuracies that are consistently high across sites.
Table 12 | Classification accuracy of training segments for all species at each site with unbalanced data

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41   NorthFraser50   Olson5km
    RF          63%     81%     73%      78%        63%         68%          79%          81%             56%             75%             61%
    SVM         65%     83%     74%      78%        65%         68%          80%          82%             52%             79%             63%
    MNB         34%     57%     47%      60%        33%         54%          54%          42%             34%             49%             28%
    KNN         52%     79%     66%      77%        48%         62%          69%          75%             46%             72%             40%

Table 13 | Classification accuracy of training segments for all species at each site with SMOTE-balanced data

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41³   NorthFraser50   Olson5km
    RF          63%     85%     73%      78%        62%         74%          87%          82%             NA               83%             57%
    SVM         65%     85%     71%      77%        64%         74%          88%          78%             NA               83%             59%
    MNB         41%     74%     53%      67%        40%         66%          72%          61%             NA               70%             33%
    KNN         43%     77%     69%      68%        44%         62%          76%          66%             NA               75%             37%

³ Insufficient training samples were present to complete classification.

Table 14 | Change in classification accuracy of training segments for all species resulting from using SMOTE

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41   NorthFraser50   Olson5km
    RF          0%      4%      0%       0%         -1%         6%           8%           1%              NA              8%              -4%
    SVM         0%      2%      -3%      -1%        -1%         6%           8%           -4%             NA              4%              -4%
    MNB         7%      17%     6%       7%         7%          12%          18%          19%             NA              21%             5%
    KNN         -9%     -2%     3%       -9%        -4%         0%           7%           -9%             NA              3%              -3%

Table 14 shows that the overall net effect of SMOTE is very small, with the exception of improvements across the board for the MNB classifier; these results must, however, be tempered with the understanding that even with these improvements the MNB classifier still produces consistently lower accuracies than the other classifiers. Further, NorthFraser41 had too few samples to use SMOTE, resulting in a failure to train, highlighting that while SMOTE may help balance the data, sufficient data must still be captured.

Another question one might ask is what accuracy can be obtained when looking only for a few target species, as opposed to identifying everything in the land base. Table 15 presents the accuracy of models tuned to provide the best possible accuracy on the five target species presented in this paper. Again, SMOTE was used to oversample and balance the number of samples in the dataset, shown in Table 16, with the difference in results in Table 17.
Table 15 | Classification accuracy of training segments for target species at each site with unbalanced data

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41   NorthFraser50   Olson5km
    RF          59%     90%     56%      62%        46%         38%          82%          78%             35%             67%             43%
    SVM         61%     91%     69%      64%        44%         37%          81%          79%             18%             71%             43%
    MNB         35%     57%     43%      49%        26%         34%          60%          34%             45%             52%             28%
    KNN         48%     88%     56%      63%        30%         28%          70%          70%             27%             64%             29%

Table 16 | Classification accuracy of training segments for target species at each site with SMOTE-balanced data

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41   NorthFraser50   Olson5km
    RF          60%     95%     66%      65%        53%         35%          83%          78%             NA              67%             52%
    SVM         59%     96%     68%      68%        57%         43%          83%          77%             NA              71%             49%
    MNB         42%     83%     50%      56%        30%         30%          68%          63%             NA              59%             48%
    KNN         42%     89%     51%      56%        43%         32%          69%          65%             NA              61%             45%

Table 17 | Change in classification accuracy of training segments for target species resulting from using SMOTE

    Algorithm   200rd   700rd   Alezza   Bend05km   ChiefLake   ConifexH47   ConifexK14   NorthFraser11   NorthFraser41   NorthFraser50   Olson5km
    RF          1%      5%      10%      3%         7%          -3%          1%           0%              NA              0%              9%
    SVM         -2%     5%      -1%      4%         13%         6%           2%           -2%             NA              0%              6%
    MNB         7%      26%     7%       7%         4%          -4%          8%           29%             NA              7%              20%
    KNN         -6%     1%      -5%      -7%        13%         4%           -1%          -5%             NA              -3%             16%

Table 17 shows that, when examining only the target species, SMOTE is generally positive for Random Forest and SVM, in addition to the MNB improvements seen when using all species. Table 16 shows that most sites saw a reduction in classification accuracy when looking at only the targeted subset of species. This could be for a variety of reasons, such as how hard the target species are to classify relative to the overall set, or how well represented the target species are in the training data. This demonstrates that there are different ways to assess accuracy based upon the question being asked: whether a forest manager wants to know about all species or only target species can, as in the case of ConifexH47, make a difference of over 30 percentage points, which may lead to different conclusions about the viability of machine learning for classification.

3.6.1.2 Classification with Groups of Sites

The next set of results looks at training the classification models on groups of sites, such that the models are more generalized than those using only a single site as input. By training on multiple sites, the models are more likely to be representative of the results that might be expected when classifying novel data. Just as with the individual sites, these results are presented as Table 18 showing the accuracy with the original unbalanced data, Table 19 showing the results after oversampling with SMOTE, and Table 20 showing the effects of SMOTE on classification accuracy.
Table 18 | Classification accuracy of training segments for all species by site group with unbalanced data

    Algorithm   α     β     γ     δ     ε
    RF          63%   64%   75%   73%   71%
    SVM         65%   65%   76%   74%   72%
    MNB         33%   37%   50%   48%   45%
    KNN         48%   53%   69%   66%   62%

Table 19 | Classification accuracy of training segments for all species by site group with SMOTE-balanced data

    Algorithm   α     β     γ     δ     ε
    RF          62%   70%   77%   76%   74%
    SVM         64%   71%   77%   75%   74%
    MNB         40%   52%   62%   59%   58%
    KNN         44%   56%   66%   63%   62%

Table 20 | Change in classification accuracy of training segments for all species resulting from using SMOTE with site groups

    Algorithm   α     β     γ     δ     ε
    RF          -1%   6%    3%    2%    4%
    SVM         -1%   6%    1%    1%    3%
    MNB         7%    15%   12%   12%   13%
    KNN         -4%   3%    -3%   -3%   -1%

With the site groupings we again see that MNB has the biggest improvement in accuracy from SMOTE, but is also still the lowest-accuracy algorithm tested, even with SMOTE applied. For both Random Forest and SVM there is, on average, a small increase in accuracy from using SMOTE on the data before classification.

The groups were then again tested for classification on the target species only, in Table 21 and Table 22, just as for the individual sites.

Table 21 | Classification accuracy of training segments for target species by site group with unbalanced data

    Algorithm   α     β     γ     δ     ε
    RF          46%   48%   66%   64%   60%
    SVM         44%   44%   69%   66%   60%
    MNB         26%   42%   45%   42%   42%
    KNN         30%   40%   60%   57%   52%

Table 22 | Classification accuracy of training segments for target species by site group with SMOTE-balanced data

    Algorithm   α     β     γ     δ     ε
    RF          53%   60%   69%   67%   65%
    SVM         57%   60%   71%   69%   67%
    MNB         30%   54%   56%   53%   53%
    KNN         43%   53%   58%   56%   55%

Table 23 | Change in classification accuracy of training segments for target species resulting from using SMOTE with site groups

    Algorithm   α     β     γ     δ     ε
    RF          7%    11%   2%    3%    6%
    SVM         13%   16%   2%    3%    7%
    MNB         4%    12%   11%   11%   11%
    KNN         13%   13%   -3%   -1%   3%

Looking at Table 23, we see a stronger positive effect from using SMOTE when training and testing on groups of sites as opposed to individual sites. While it is hard to determine why a bigger increase is seen here, one potential cause is that the most underrepresented classes have more samples in the grouped data, potentially causing less error to be induced by SMOTE.

3.6.2 Effects of Segmentation and Coverage on Classification Accuracy

The accuracy of classification using the automatically generated segments is presented in graphical form. The minimum coverage required for a segment to be considered identified as a species has two primary, opposing effects, making the selection of a coverage threshold non-trivial. As the minimum required coverage decreases, the purity of samples is reduced, presenting more noise to the training algorithms; conversely, as the minimum required coverage increases, the number of samples available for training decreases. This creates the need to find an optimal trade-off between sample quality and quantity. It is also at this stage that the differences between the human-created and computer-generated segments may become most apparent, as a difference of even a single pixel shows as error; to prevent training set sizes from becoming vanishingly small, some level of error must likely be tolerated at this stage.
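Since the coverage threshold trades sample purity against sample count, the sweep itself is simple; a sketch under the assumption that the samples sit in a hypothetical table with a fractional coverage column (names illustrative, not the thesis schema):

    import numpy as np
    import pandas as pd

    samples = pd.read_sql("SELECT * FROM segment_samples", conn)  # hypothetical table/connection

    for min_cov in np.arange(0.50, 1.001, 0.05):
        kept = samples[samples["coverage"] >= min_cov]
        # Fewer but purer samples as min_cov rises; each `kept` set is then
        # cross-validated exactly as in the ground-truth experiments above.
        print(f"{min_cov:.2f}: {len(kept)} samples")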
Due to the large amount of computation required to train the models at the various coverage levels, only the two best-performing segmentation algorithms, Quick Shift and SLIC, were carried forward to this stage of analysis.

The graphs presented below show the classification accuracies with SMOTE applied to the data; tables without SMOTE can be found in Appendix J and Appendix K.

3.6.2.1 Quick Shift Segments

Figure 23 | Classification accuracy of Quick Shift segments on individual sites.

Figure 24 | Classification accuracy of Quick Shift segments on groups of sites.

For the Quick Shift algorithm, Figure 23 and Figure 24 show that requiring a very high level of coverage is generally very beneficial, with the exception that at or very near 100% coverage less training data is available, due to segments being rejected for differences as small as one pixel, which could easily be explained by human error in the delineation process. I would suggest that a 75% minimum coverage may be a good starting point when working with groups of sites, as it is around this range that diminishing returns start to be observed, while for a single site 90% coverage may make more sense.

3.6.2.2 SLIC Segments

Figure 25 | Classification accuracy of SLIC segments on individual sites.

Figure 26 | Classification accuracy of SLIC segments on groups of sites.

The SLIC segments provide more interesting results in terms of ideal coverage levels, with most cases preferring high coverage, with some exceptions. The individual sites 200rd and NorthFraser11, along with groups γ and ε, prefer coverage in the range of 75%–80% before accuracy begins to decrease.

3.6.2.3 Segmentation Accuracy Comparison

The graphs below provide a clearer comparison of the difference in accuracies achieved with SLIC and Quick Shift.

Figure 27 | Relative accuracy of SLIC over Quick Shift on sites.

Figure 28 | Relative accuracy of SLIC over Quick Shift on site groupings.

3.6.2.4 SMOTE

Figure 29 | Quick Shift oversampling change in accuracy.

Figure 30 | SLIC oversampling change in accuracy.

The use of SMOTE to oversample the generated segments increased accuracy in most cases (Figure 29, Figure 30), except when using the MNB classifier. This is an interesting result, considering that SMOTE produced the biggest improvements for the MNB classifier when using the ground truth data (Table 14, Table 20).

Figure 31 | Relationship between the number of samples available for training and the minimum overlap required.

Figure 32 | Relationship between the number of classes present in training data and the minimum overlap required.

3.6.3 Shannon's Diversity Index

When comparing classification accuracies, and asking why some sites perform better than others, one factor to examine is how well distributed the samples being trained on are. Shannon's Diversity Index (Shannon, 1948) is one method of measuring the diversity of sample counts, where higher values indicate the training samples are spread across a greater diversity of classes. Shannon's Diversity Index was calculated both for the individual sites (Table 24) and for the groups of sites tested (Table 25).
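Shannon's index here is computed over the per-species sample counts, H = −Σ p_i ln(p_i); a minimal sketch (the counts shown are hypothetical):

    import numpy as np

    def shannon_index(counts):
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()       # drop empty classes and normalize to proportions
        return float(-np.sum(p * np.log(p)))

    shannon_index([1491, 700, 320, 85, 12])   # hypothetical per-species counts for one site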
Table 24 | Shannon's Diversity Index of sample counts by site

    Site             Truth    QuickShift   SLIC
    200rd            2.260    2.060        1.916
    700rd            2.105    2.045        1.937
    Alezza           2.066    2.292        2.220
    Bend05km         1.680    1.696        1.569
    ChiefLake        2.364    2.156        2.031
    ConifexH47       2.563    2.270        2.173
    ConifexK14       1.881    1.297        1.231
    NorthFraser11    1.761    1.665        1.663
    NorthFraser41    2.424    2.082        1.996
    NorthFraser50    1.569    1.419        1.337
    Olson5km         1.716    1.749        1.720

Table 25 | Shannon's Diversity Index of sample counts by site group

    Group        α       β       γ       δ       ε
    Truth        2.364   2.480   2.598   2.720   2.660
    QuickShift   2.156   2.342   2.596   2.646   2.660
    SLIC         2.031   2.295   2.563   2.578   2.649

These diversity indexes can then be compared against both the classification accuracy of sites (Table 26, Table 28) and the effect of SMOTE on those accuracies (Table 27, Table 29).

Table 26 | Correlation of Shannon's Diversity Index and classification accuracy of individual sites

    Algorithm     RF         SVM        MNB        KNN
    Correlation   -0.53937   -0.58521   -0.19679   -0.41848

Table 27 | Correlation of Shannon's Diversity Index and SMOTE's effect on classification accuracy of individual sites

    Algorithm     RF         SVM        MNB        KNN
    Correlation   0.000663   0.224732   -0.30708   -0.00788

Table 28 | Correlation of Shannon's Diversity Index and classification accuracy of site groups

    Algorithm     RF         SVM        MNB        KNN
    Correlation   0.854157   0.783865   0.536354   0.536354

Table 29 | Correlation of Shannon's Diversity Index and SMOTE's effect on classification accuracy of site groups

    Algorithm     RF         SVM        MNB        KNN
    Correlation   0.329936   0.112355   0.500587   -0.07188

From Table 26 a negative correlation between Shannon's Diversity Index and classification accuracy can be observed, suggesting that as the diversity of sample counts increases, accuracy decreases, which is intuitive. However, that is somewhat countered by the positive correlations seen for groups of sites in Table 28; there is no good explanation for this other than that the different groupings of sites may be affecting the outcome independently of their diversity. The correlations with the effect of SMOTE on individual sites (Table 27) show no clear pattern, with SVM positive and MNB negative. Table 29, on the other hand, shows mostly positive correlations, suggesting that higher site diversity leads to greater effectiveness of SMOTE at improving accuracy.

3.7 Results Synthesis

3.7.1 How should data be segmented?

While this chapter is focused on classification rather than segmentation, it must be noted that it is not possible to completely decouple these processes. The two highest-ranked approaches to segmentation were carried forward into the classification process. In Figure 27 and Figure 28, points above the 0 line indicate better performance for SLIC, while points below the line indicate that Quick Shift performed better. From these figures we can see that SLIC generally performs better than Quick Shift, with a few outliers, such as northfraser41 as well as groups β and ε, where improvement was not seen until coverages were above 50%, and olson5km, where there was little difference in performance. This is unexpected, given that the segmentation results indicated Quick Shift had marginally better scores and stability.

3.7.2 Should SMOTE resampling be used?

The use of SMOTE provides generally positive results, and thus the general recommendation is that SMOTE be used.
The notable exceptions, seen in Figure 29 and Figure 30, are that MNB performs poorly with SMOTE applied, and that the effect is limited on SLIC segments with very high coverage when using the Random Forest classifier.

3.7.3 Which algorithm should be used for training?

The two algorithms with the highest performance were Random Forest and Support Vector Machines, as can be seen in Figure 25 and Figure 26. Of interest in these figures is that Support Vector Machines performed better on most of the single-site tests, while Random Forest performed better on groupings of sites.

3.7.4 Are trained models re-usable?

The results strongly suggest that the current models are not re-usable, as each site presents differences in the relative reflectance of species; looking at Figure 22 we can see, for example, that 200rd, ChiefLake, and Olson5km all have very low reflectance values. Without a way of anchoring all data to a common reference, moving trained models between sites may be difficult.

3.8 Discussion

The evaluation of multi-species accuracy depends on the number of species and their relative proportions. Accuracy decreases as the number of species increases; conversely, it will increase as the dataset becomes more imbalanced. As a simple example, in the most trivial case where the algorithm simply labels all samples as the most common class, we might expect 50% accuracy for two balanced classes and 33% accuracy for three; however, if 80% of the samples were class 2, we could classify all data as class 2 and produce an overall accuracy of 80%. In practical application, at the Chief Lake site the leading species was pine with a 23% representation, yet classification accuracies just over 70% were achieved; this demonstrates that the ability to classify tree species is genuinely greater than chance.

It must also be noted that this work shows lower accuracy rates than some previous work; for example, Csillik et al. (2018) demonstrated greater than 96% accuracy on the identification of citrus trees. This work, in comparison, deals with a very diverse ecosystem including more species of trees, understory, and a mixture of tree sizes. It is the hope of this thesis that, even if the results have room for improvement, a methodology for comparing approaches to segmentation and classification has been demonstrated that will allow future work to continue testing more algorithms with larger datasets. As the fields of Computer Vision and Machine Learning continue to develop, a systematic approach to evaluating the effectiveness of new tools against existing tools will prove valuable for those seeking more comprehensive management of natural resources.

The availability of training data must also be considered when training models. While this research was conducted with what may appear to be a very large dataset, the lack of balance in the data makes the sample size small in many respects. Many of the under-represented species were simply removed from classification due to not having enough samples at a given site to split into K folds, and even many of those that were included were very underrepresented, potentially leading to low training accuracy. Operationally this introduces the need to be thoughtful during the collection of training data, ensuring training sites contain representation of the species of interest and that these samples are adequately documented.
3.8.1 Considerations of algorithms tested

This chapter reviewed four common classification algorithms; while this does not represent an exhaustive search of available methods, the algorithms were chosen for their popularity of use and for being mechanically different, with the goal of highlighting differences between approaches. Overall, the highest performance came from SVM and RF. KNN and MNB are faster and more naive algorithms, providing a baseline of very common and relatively accessible tools.

K-Nearest Neighbors is the simplest of the algorithms used and is common in remote sensing projects due to its simplicity and speed of implementation. Due to the nature of the algorithm, it is relatively robust to highly dimensional data, provided that each class has either a single representation in each dimension, or the number of clusters is sufficiently high to capture each representation of any given class. At the same time, this simplicity makes the algorithm poor for use with noisy data, especially when the values of a given dimension overlap between classes.

The Multinomial Naïve Bayes probabilistic classifier treats each dimension as an independent variable (Murphy, 2006); this can be very advantageous if individual dimensions are particularly noisy, allowing the more consistent dimensions to take the lead. This type of classifier is more commonly applied to text classification than to imagery. Both KNN and MNB present issues for this classification problem in that they look for mean values in each dimension, something that is particularly challenging to achieve when the calibration of reflectance is not consistent between sites.

Support Vector Machines provide a compelling option for the classification of tree species given the collected dataset, as they are very well suited to handling data with very high dimensionality and to filtering out redundant dimensions. Additionally, the mechanics of calculating decision boundaries maximize the distance between clusters, rather than placing the boundary nearest to the mean of a cluster; while subtle, this can make the algorithm much more robust to certain types of noise in the data. However, SVM shares the recurring limitation that it expects all elements of a class to present in the same way, a condition that did not hold due to the poor calibration of the datasets used in this research. Should a way of securing better calibration be achieved, SVM could be the preferred method of classification. Finally, SVM, like the previous two algorithms, has a very fast training time; this is beneficial because, as more training data is collected over time, it can be used for efficient retraining of models.

Random Forest was the final model used for training; in brief, compared to SVM, RF handles high dimensionality less well but handles multiple representations of a class better. Among the methods tested, RF is unique in its ability to handle dependent dimensions; this is particularly useful as it can allow the model to respond to changes in exposure through the nested decision trees, such that a high value on dimension A might lead to a higher threshold on dimension B and vice versa, providing a basic correction for image exposure. It should, however, be remembered that having multiple representations of a class makes the training set effectively smaller, as each representation will contain only a portion of the training data for that class.
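For reference, the sketch below shows a cross-validated comparison of the four classifiers discussed, using their standard scikit-learn implementations. The feature and label arrays are placeholders; reflectance features are assumed to be non-negative, as MultinomialNB requires.

```python
# Minimal sketch: cross-validated comparison of the four classifiers discussed.
# Feature/label arrays are placeholders, not the thesis data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((500, 5))                 # e.g. mean reflectance per band, per segment
y = rng.choice(["Pl", "Sx", "Ep", "At"], 500)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "MNB": MultinomialNB(),              # requires non-negative features
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```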
3.8.2 Future work and areas of improvement

The work here demonstrates the potential of machine learning to be used in a production environment, especially with further development on the primary limitations. The first limitation shown is the need for more control of site variability, which it is hypothesized could be addressed with better calibration of the multispectral cameras used. This has the potential to be as simple as using more modern revisions of the downwelling light sensors. With the calibration challenges better addressed, there would be further room to examine the impacts of the phenological cycle of plants and of general soil conditions.

The other primary limitation demonstrated is sample size; while the training data did include 40,257 labeled trees, the highly imbalanced nature of the data meant many species did not have sufficient training data. It is proposed that, to make the methods presented more robust, effort would need to be placed into collecting samples from sites with better representation of the rarer species; it would be ideal to have a minimum of at least 100 samples of each species, with a target closer to 1,000 or more. The hope is that this research presents a framework for evaluating the accuracy of various tools that will allow future research to continually try new algorithms against existing solutions, providing a path for continual improvement in accuracy as new machine learning techniques are developed and better imagery becomes available.

Finally, it is believed that the development of technologies, whether higher resolution cameras or RPAS-based LiDAR allowing structural attributes to be captured at a finer scale, may greatly improve these results. Other research papers reviewed were able to achieve higher accuracies (Brandtberg, 2007; Csillik et al., 2018); one notable difference in these papers was a focus on mature trees that were easier to discriminate from the understory, removing, for example, the need to distinguish between small deciduous trees and grasses. As more research is conducted, the models produced will be useful to ecologists and land managers, providing more detailed information about the trees present in terms of species, number, and size. This information could allow new ecological questions to be asked of a more detailed inventory than is currently possible. It also allows for positive feedback whereby, as more data is collected, models can be continuously retrained to be more accurate.

Chapter 4 Synthesis

4.1 Introduction

The research presented above offers a framework for the identification of early seral vegetation. At present these results provide an accuracy of 52%-83%, depending on the site and which species are targeted for classification. This accuracy is respectable on its own given the unique challenges of classifying relatively small targets. The methods presented are offered as a framework that will be able to continually integrate new technologies, and see further increases in the quality of results as technology advances.
4.2 Data Collection

As effective machine learning requires a high-quality dataset for training and implementation, this research also provided the opportunity to consider strategies for effective data collection moving forward.

The first group of suggestions concerns what data should be collected, as well as when and where it should be collected. This research had some outliers, in that some sites did not train well with the other sites. There are several possible explanations for this: differences in the phenological cycle, differences in site conditions such as soil nutrients, or inconsistent distribution of species from site to site. The importance of the phenological cycle is most easily seen in that this classification is based on the reflectance of light in various spectrums, something that very visibly changes from summer to fall and, to a greater extent on deciduous trees, during leaf-off. It is therefore recommended that future work either develop controls for differences in the phenological cycle, such as multiple training sets based on position in the cycle, or take measures to ensure all data is captured at the same point within the cycle. Both options present operational challenges, as research would require either much larger training sites to obtain sufficient samples, or very short and potentially unpredictable data collection windows.

Site condition is another element that could not be adequately explored in this research. Given that work is being done to monitor forest health using RPAS multispectral imagery (Fraser and Congalton, 2021), it follows that a tree's health will alter its appearance and thus how a given class is represented in machine learning models. An additional complication is that preliminary research suggests some disturbances, such as bark beetle infestation, can be detected faster with multispectral imaging than with traditional methods (Bárta et al., 2021); thus it may not even be clear at the time of analysis whether the trees are healthy. The effects of site conditions on forest health should be easier to account for by including available moisture and soil nutrients as variables within the dataset.

Finally, in terms of training models, adverse impacts on classification accuracy could be observed when training on one site and applying the model to another, due to differences in species distribution between the sites. A model may work very well with the set of species on one site, but when moved to a new site where previously unrepresented species are more plentiful, the model may be biased against classifying those species because of the imbalanced training data used. A method for selecting training data that strives to balance the classes is therefore more useful than one that focuses on the spatial distribution of sites.
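As an illustration of class-balanced selection, the sketch below caps the number of training samples kept per class. It is a hypothetical alternative to purely spatial selection; the function and variable names are illustrative, not from the thesis code.

```python
# Minimal sketch: class-balanced subsampling of labeled segments before training.
import numpy as np

def balanced_subsample(y, cap, seed=None):
    """Return indices keeping at most `cap` samples per class."""
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        keep.extend(idx[:cap])
    return np.sort(np.array(keep))

y = np.array(["Pl"] * 1200 + ["Sx"] * 400 + ["Ri"] * 60)
idx = balanced_subsample(y, cap=100, seed=0)
# Classes larger than the cap are down-sampled; rare classes are kept whole.
print({label: int((y[idx] == label).sum()) for label in np.unique(y)})
```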
The quality of the data collected will also affect how effective classification can be. This applies both to the collection of the imagery and to the validation surveys. For imagery collection it is important that images be sharp, that the same spectral bands be used on all collected data to which a model is to be applied, and that best efforts be made to control differences in sunlight. The most prominent data issues found during this research were variations in sharpness within individual sites, hypothesized to be caused by motion blur from tail winds, with sharper images captured on the return path where a slower relative ground speed is present. Some sites also included visible shadow from changing lighting conditions; high quality light sensors to account for this would be beneficial. For the survey component, the first and most obvious point is that accuracy matters: a bias in the delineation of tree segments will affect model accuracy. Additionally, while not observed as an issue in this research, it should be a priority to ensure that all training data is correctly labeled.

Another issue to be aware of is the need to classify everything in the collected imagery. Most classification algorithms do not have a default option for 'unknown', instead classifying everything as whatever it most resembles in the training data. In initial testing, roads were being classified as trees despite being very different in appearance; this was solved after the fact by adding a bare earth class to the training data. Caution is urged at the data collection stage to identify all types of vegetation, not just target species, as adding to the training set after the fact may not be an option, and non-target vegetation may exhibit characteristics similar to target species.

These suggestions reduce to the summary that training data must include representations of all data to which the model is to be applied, in all its aspects, including species, health, and phenological cycle. To account for reductions in accuracy due to highly imbalanced data, at the time of data collection either training data must be collected in quantities sufficient for training, or site selection must be done to limit the appearance of under-represented species of little consequence in the training data. Data quality must then be ensured: any errors in the training data will lead to flawed models regardless of how good the model is. While minor errors can be mitigated simply by having large datasets, attention to accuracy is critical at the training stage.

4.3 Evaluation of Segmentation

When looking at the accuracy of segmentation, it is very hard to develop a comparison to other work, as the literature reviewed did not reveal any consistent methods for measuring the quality of segmentation. This thesis therefore proposes some possible equations to address this. To be clear, it is not the concept of rating segmentation that is novel, but rather the presentation of quality as a single score. While metrics such as false segments and false merges are common, they pose a problem for automated optimization processes due to the ambiguity of which result is better when multiple metrics compete. While there is undoubtedly room to fine-tune the effectiveness of the algorithms presented here, the segments produced were usable for viable classification in the following stages of analysis. Potentially more importantly, a framework has been produced for the rapid evaluation of other algorithms. The core idea is that, by producing standardized testing frameworks, novel segmentation approaches can be scored and examined against previous results as research advances, allowing an optimal method to be determined that evolves over time. This also allows for automated retraining and testing as datasets grow.
It was found that two of the proposed metrics, Metric 1, MAX(ARP - FM), and Metric 3, MAX(-FM - 0.5 * FS), were the most useful. Both focus on reducing the number of false merges; the difference is that Metric 1 rewards the number of pixels correctly segmented, while Metric 3 penalizes the number of false splits. Metric 1 could in theory be a better representation of land cover, while Metric 3 may be a better representation of individual stems. Given industry trends towards per-stem forest management (Gray et al., 2021; Seidl et al., 2012), it would seem logical to focus on the refinement of Metric 3 as an ongoing measurement of segmentation quality.

All the algorithms tested produced an over-segmentation according to the developed scoring metrics. Given that this process is based solely on reflectance data and not on the texture or height of trees, this is likely as good as could be expected, due to the often ambiguous boundaries between small trees and the understory. There are many successful applications of individual tree segmentation (Jing et al., 2012; Morsdorf et al., 2003; Zhang et al., 2015), but these works all deal with more mature trees that can be more easily discerned from noise in the data. One issue with early seral vegetation is that the elevation models produced do not accurately reflect the heights of vegetation after noise filtering is completed; this is due to both the resolution used and movement of vegetation caused by even a small breeze.
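Expressed as code, these two scores reduce to simple functions of per-image error measurements, which makes them easy to drop into an automated hyper-parameter search. The sketch below is one illustrative reading, in which ARP is taken to be the proportion of correctly segmented pixels and FM/FS are normalized false-merge and false-split counts; it is not the thesis's exact implementation.

```python
# Minimal sketch of the two segmentation scores described above.
def metric_1(arp, fm):
    """Score to maximize: reward correctly segmented area, penalize false merges."""
    return arp - fm

def metric_3(fm, fs):
    """Score to maximize: penalize false merges, half-weight false splits."""
    return -fm - 0.5 * fs

# Picking the best hyper-parameter set is then a simple argmax over candidates.
candidates = [
    {"params": {"kernel_size": 3, "sigma": 9}, "arp": 0.71, "fm": 0.10, "fs": 0.40},
    {"params": {"kernel_size": 5, "sigma": 9}, "arp": 0.66, "fm": 0.05, "fs": 0.55},
]
best = max(candidates, key=lambda c: metric_3(c["fm"], c["fs"]))
print(best["params"])
```

Collapsing the competing error counts into a single number is what removes the ambiguity that makes multi-metric comparisons unsuitable for automated optimization.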
4.4 Effectiveness of Classification

Measuring the effectiveness of classification is a complex topic, and it is hard to reduce accuracy to a single number. Effectiveness will vary based upon how well balanced the data is, the quality of the imagery collected, and the quality of the segmentation and training labels provided. When reviewing the results below, it is important to remember that they represent real-world data; not all of these variables can be controlled, producing some natural variance from site to site in classification accuracy.

Table 30 | Classification accuracy of target species using SVM, with the total number of training samples at each site

Site            Total Samples   SVM   SVM SMOTE
200rd           11529           65%   65%
700rd           3774            83%   85%
Alezza          5649            74%   71%
Bend05km        1521            78%   77%
ChiefLake       12861           65%   64%
ConifexH47      1398            68%   74%
ConifexK14      1077            80%   88%
NorthFraser11   8331            82%   78%
NorthFraser41   1677            52%   NA
NorthFraser50   1107            79%   83%
Olson5km        1862            63%   59%

Table 31 | Classification accuracy of target species using SVM, with the percentage of samples that are target species

Site            % Target Samples   Target SVM   Target SVM SMOTE
200rd           18%                61%          59%
700rd           52%                91%          96%
Alezza          37%                69%          68%
Bend05km        42%                64%          68%
ChiefLake       12%                44%          57%
ConifexH47      19%                37%          43%
ConifexK14      48%                81%          83%
NorthFraser11   42%                79%          77%
NorthFraser41   16%                18%          NA
NorthFraser50   42%                71%          71%
Olson5km        20%                43%          49%

Looking at Table 30, simply having a larger training dataset does not ensure better accuracy; however, Table 31 shows that, when attempting to classify target species, there is a very strong correlation (0.929) between the prevalence of the target species and the ability to classify those species. One possible explanation is that the variance in the number of samples of each species is very important: if Shannon’s Diversity Index is calculated for the sample counts per site, as in Table 24, a correlation of -0.585 with SVM's classification accuracy is observed, suggesting that sites with less diversity have better classification accuracy. It remains an open question whether homogeneity is needed only in the training data, or whether more uniform sites would classify better even with a homogeneous training set. SMOTE did increase the accuracy of classification on average, suggesting that it is only the training data that needs to be homogeneous, but with so many species being extremely under-represented this is hard to discern from the current dataset.

For those sites able to achieve accuracies of 78% and higher, RPAS imagery is a relatively effective method for classifying entire sites. In comparison, the other options for classification would be manual surveys of small plots (typically a circle with a 3.99 m radius, hereafter referred to as a 3.99 plot) (Raymer, 2001) interpolated over the entire land base, which would still contain errors, or the prohibitively expensive manual survey of every tree, which, while it would have 100% accuracy, could not realistically be accomplished. Another interesting avenue for future research would be comparisons of RPAS classification to 3.99 plots, as well as the potential for a hybrid approach.

4.5 What information do these results provide to forest managers?

Just as important as developing methods for data collection and classification is developing an understanding of what the information can be used for. The methods presented in this research classify ground cover by species type, which is importantly distinct from classifying individual trees. Due to the relatively high number of false splits allowed at the segmentation stage, this is not a method for counting trees, only for measuring how much area each species covers. Because early seral vegetation is small and lacks defined edges in RPAS imagery, the orthomosaics produced by a photogrammetry workflow cannot separate it from grass and brush in the understory. Further, the small size of early seral vegetation results in its height being filtered out as noise in the photogrammetry workflow, leaving no data on the vertical dimension that could be used to derive tree heights or total volume of biomass.

The information collected is, however, still very useful from an ecological and forest management perspective, as it allows a general picture of the forest makeup by area; combined with knowledge of how recently logging occurred, it may, for example, make good estimates of food supply for ungulates possible. This information is also useful for monitoring change over time; RPAS is a relatively cost-effective method for collecting data, and changes in surface cover vegetation over time can provide indicators of how the forest is developing. This methodology also allows for more holistic data collection, although the need to classify every type of vegetation to produce accurate models could be a disadvantage in some cases.
The collection of information on every type of vegetation, instead of just merchantable timber, could lead to datasets usable for multiple aspects of forest management, including not just how profitable logging could be, but also what other ecosystem services are provided and how much biodiversity is present in our forests.

4.6 Framework for evaluation

The technologies used in remote sensing continue to advance, and have done so since the beginning of this research. As new approaches are implemented into working processes, they will continually have room for improvement with modern advancements. This demonstrates the need for processes that can evaluate these new advancements in a timely manner, providing for faster integration. This research should not be understood as a prescriptive method for classification, but rather as a framework for considering how the tools can be applied.

Early in this research it became clear that there is no simple and definitive way to say how good the segmentation of an image is. One of the primary reasons is that different problems may be more affected by different types of errors. The optimal segmentation is the segmentation that produces the highest accuracy classification. The most trivial solution would be to train models on segmentations whose generating parameters spanned the full range of possibilities; however, given the computational time needed to train machine learning algorithms, combining all the potential parameters of both segmentation and classification would be nearly impossible to compute with current technology. For this reason, these two steps were decoupled. Providing a quantitative metric for scoring the segmentation is a critical piece of producing automated training pipelines. As spatial and spectral resolutions change with different sensors, different hyper-parameters are needed to produce optimal results. Determining a consistent way of weighing the importance of the different error measurements, such as false splits and false merges, is essential for performing fair comparisons between sensors.
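The decoupling described above can be sketched as a two-stage search: segmentation hyper-parameters are chosen by the standalone segmentation score, and only the winning segmentation is carried into classifier tuning, so the two parameter grids add rather than multiply. The helpers seg_fn, seg_score_fn, and clf_eval_fn are assumed to exist elsewhere; all names here are illustrative, not the thesis's implementation.

```python
# Minimal sketch of decoupled two-stage hyper-parameter tuning.
from itertools import product

def tune_decoupled(image, truth, seg_fn, seg_score_fn, clf_eval_fn,
                   seg_grid, clf_grid):
    # Stage 1: choose segmentation parameters without training any classifier.
    best_seg = max(
        (dict(zip(seg_grid, vals)) for vals in product(*seg_grid.values())),
        key=lambda p: seg_score_fn(seg_fn(image, **p), truth),
    )
    segments = seg_fn(image, **best_seg)
    # Stage 2: tune the classifier only on the chosen segmentation.
    best_clf = max(
        (dict(zip(clf_grid, vals)) for vals in product(*clf_grid.values())),
        key=lambda p: clf_eval_fn(segments, **p),
    )
    return best_seg, best_clf

# Usage (hypothetical grids): the stages add, so each grid can stay dense.
# seg_grid = {"kernel_size": [3, 5], "sigma": [1, 9]}
# clf_grid = {"C": [1.0, 10.0]}
```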
References

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2010. SLIC Superpixels [WWW Document]. Infoscience. URL http://infoscience.epfl.ch/record/149300 (accessed 3.21.21).
Arganda-Carreras, I., Turaga, S.C., Berger, D.R., Cireşan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., Laptev, D., Dwivedi, S., Buhmann, J.M., Liu, T., Seyedhosseini, M., Tasdizen, T., Kamentsky, L., Burget, R., Uher, V., Tan, X., Sun, C., Pham, T.D., Bas, E., Uzunbas, M.G., Cardona, A., Schindelin, J., Seung, H.S., 2015. Crowdsourcing the creation of image segmentation algorithms for connectomics. Front. Neuroanat. 9. https://doi.org/10.3389/fnana.2015.00142
Baleshta, K.E., Simard, S.W., Roach, W.J., 2015. Effects of thinning paper birch on conifer productivity and understory plant diversity. Scandinavian Journal of Forest Research 0, 1–47. https://doi.org/10.1080/02827581.2015.1048715
Bárta, V., Lukeš, P., Homolová, L., 2021. Early detection of bark beetle infestation in Norway spruce forests of Central Europe using Sentinel-2. International Journal of Applied Earth Observation and Geoinformation 100, 102335. https://doi.org/10.1016/j.jag.2021.102335
Beaudry, L., Coupe, R., Delong, C., Pojar, J., 1999. Plant Indicator Guide for Northern British Columbia: Boreal, Sub-Boreal, and Subalpine Biogeoclimatic Zones: BWBS, SBS, SBPS, and northern ESSF [WWW Document]. URL https://www.for.gov.bc.ca/hfd/pubs/docs/Lmh/Lmh46.htm (accessed 4.1.24).
Biswas, D., Su, H., Wang, C., Blankenship, J., Stevanovic, A., 2017. An Automatic Car Counting System Using OverFeat Framework. Sensors 17, 1535. https://doi.org/10.3390/s17071535
Boser, B.E., Guyon, I.M., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory - COLT ’92. Presented at the fifth annual workshop, ACM Press, Pittsburgh, Pennsylvania, United States, pp. 144–152. https://doi.org/10.1145/130385.130401
Boukherroub, T., LeBel, L., Ruiz, A., 2017. A framework for sustainable forest resource allocation: A Canadian case study. Omega, New Research Frontiers in Sustainability 66, 224–235. https://doi.org/10.1016/j.omega.2015.10.011
Brandtberg, T., 2007. Classifying individual tree species under leaf-off and leaf-on conditions using airborne lidar. ISPRS Journal of Photogrammetry and Remote Sensing 61, 325–340. https://doi.org/10.1016/j.isprsjprs.2006.10.006
Breiman, L., 2001. Random Forests. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324
British Columbia, Forest Inventory and Monitoring Program, British Columbia, Resources Information Standards Committee, British Columbia, Forest Analysis and Inventory Branch, British Columbia, Ministry of Forests and Range, Growth and Yield Program, 2007. Forest Inventory and Monitoring Program: growth and yield standards and procedures. Resources Information Standards Committee, Victoria, B.C.
Brown, G.S., Rettie, W.J., Brooks, R.J., Mallory, F.F., 2007. Predicting the impacts of forest management on woodland caribou habitat suitability in black spruce boreal forest. Forest Ecology and Management 245, 137–147. https://doi.org/10.1016/j.foreco.2007.04.016
Bugmann, H.K.M., Yan, X.D., Sykes, M.T., Martin, P., Lindner, M., Desanker, P.V., Cumming, S.G., 1996. A comparison of forest gap models: Model structure and behaviour. Climatic Change 34, 289–313.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357. https://doi.org/10.1613/jair.953
Comaniciu, D., Meer, P., 2002. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 603–619. https://doi.org/10.1109/34.1000236
Coops, N.C., Tompaski, P., Nijland, W., Rickbeil, G.J.M., Nielsen, S.E., Bater, C.W., Stadt, J.J., 2016. A forest structure habitat index based on airborne laser scanning data. Ecological Indicators 67, 346–357. https://doi.org/10.1016/j.ecolind.2016.02.057
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach Learn 20, 273–297. https://doi.org/10.1007/BF00994018
Costanza, R., d’Arge, R., de Groot, R., Farber, S., Grasso, M., Hannon, B., Limburg, K., Naeem, S., O’Neill, R.V., Paruelo, J., Raskin, R.G., Sutton, P., van den Belt, M., 1997. The value of the world’s ecosystem services and natural capital. Nature 387, 253–260. https://doi.org/10.1038/387253a0
Cristianini, N., Shawe-Taylor, J., 2000. An introduction to support vector machines: and other kernel-based learning methods. Cambridge University Press, Cambridge; New York.
Csillik, O., Cherbini, J., Johnson, R., Lyons, A., Kelly, M., 2018. Identification of Citrus Trees from Unmanned Aerial Vehicle Imagery Using Convolutional Neural Networks. Drones 2, 39. https://doi.org/10.3390/drones2040039
D’Amato, A.W., Bradford, J.B., Fraver, S., Palik, B.J., 2011. Forest management for mitigation and adaptation to climate change: Insights from long-term silviculture experiments. Forest Ecology and Management 262, 803–816. https://doi.org/10.1016/j.foreco.2011.05.014
Dhar, A., 2013. Birch (Betula papyrifera) white spruce (Picea glauca) interactions in mixedwood stands: implications for management. Journal of Forest Science 59, 137–149.
Duan, K.-B., Keerthi, S.S., 2005. Which Is the Best Multiclass SVM Method? An Empirical Study, in: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 278–285. https://doi.org/10.1007/11494683_28
Dumitru, C.O., Andrei, V., Schwarz, G., Datcu, M., 2019. MACHINE LEARNING FOR SEA ICE MONITORING FROM SATELLITES. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLII-2/W16, 83–89. https://doi.org/10.5194/isprs-archives-XLII-2-W16-83-2019
Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C., 2016. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition 58, 121–134. https://doi.org/10.1016/j.patcog.2016.03.028
Evangelista, P.F., Embrechts, M.J., Szymanski, B.K., 2006. Taming the Curse of Dimensionality in Kernels and Novelty Detection, in: Abraham, A., de Baets, B., Köppen, M., Nickolay, B. (Eds.), Applied Soft Computing Technologies: The Challenge of Complexity, Advances in Soft Computing. Springer, Berlin, Heidelberg, pp. 425–438. https://doi.org/10.1007/3-540-31662-0_33
Felzenszwalb, P.F., Huttenlocher, D.P., 2004. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision 59, 167–181. https://doi.org/10.1023/B:VISI.0000022288.19776.77
Forest Health Aerial Survey Manual, 2012.
Fradkov, A.L., 2020. Early History of Machine Learning. IFAC-PapersOnLine 53, 1385–1390. https://doi.org/10.1016/j.ifacol.2020.12.1888
Fraser, B.T., Congalton, R.G., 2021. Monitoring Fine-Scale Forest Health Using Unmanned Aerial Systems (UAS) Multispectral Models. Remote Sensing 13, 4873. https://doi.org/10.3390/rs13234873
Friedman, J.H., 1997. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality.
Government of British Columbia, n.d. VRI - HISTORICAL Vegetation Resource Inventory (2002 - 2022) - Open Government Portal [WWW Document]. URL https://open.canada.ca/data/en/dataset/02dba161-fdb7-48ae-a4bb-bd6ef017c36d (accessed 12.9.24).
Gray, A.N., McIntosh, A.C.S., Garman, S.L., Shettles, M.A., 2021. Predicting canopy cover of diverse forest types from individual tree measurements. Forest Ecology and Management 501, 119682. https://doi.org/10.1016/j.foreco.2021.119682
Gupta, D.D., Nair, L.M., 2005. IMPROVING OCR BY EFFECTIVE PRE-PROCESSING AND SEGMENTATION FOR DEVANAGIRI SCRIPT: A QUANTIFIED STUDY. Vol. 52.
Holmgren, J., Persson, Å., Söderman, U., 2008. Species identification of individual trees by combining high resolution LiDAR data with multi-spectral images. International Journal of Remote Sensing 29, 1537–1552. https://doi.org/10.1080/01431160701736471
Jing, L., Hu, B., Noland, T., Li, J., 2012. An individual tree crown delineation method based on multi-scale segmentation of imagery. ISPRS Journal of Photogrammetry and Remote Sensing 70, 88–98. https://doi.org/10.1016/j.isprsjprs.2012.04.003
Lemprière, T.C., Kurz, W.A., Hogg, E.H., Schmoll, C., Rampley, G.J., Yemshanov, D., McKenney, D.W., Gilsenan, R., Beatch, A., Blain, D., Bhatti, J.S., Krcmar, E., 2013. Canadian boreal forests and climate change mitigation. Environ. Rev. 21, 293–321. https://doi.org/10.1139/er-2013-0039
McGrath, M.J., Luyssaert, S., Meyfroidt, P., Kaplan, J.O., Bürgi, M., Chen, Y., Erb, K., Gimmi, U., McInerney, D., Naudts, K., Otto, J., Pasztor, F., Ryder, J., Schelhaas, M.-J., Valade, A., 2015. Reconstructing European forest management from 1600 to 2010. Biogeosciences 12, 4291–4316. https://doi.org/10.5194/bg-12-4291-2015
Meilă, M., 2007. Comparing clusterings—an information based distance. Journal of Multivariate Analysis 98, 873–895. https://doi.org/10.1016/j.jmva.2006.11.013
Millennium Ecosystem Assessment, 2005. Ecosystems and Human Well-Being: Synthesis.
Morsdorf, F., Meier, E., Allgower, B., 2003. CLUSTERING IN AIRBORNE LASER SCANNING RAW DATA FOR SEGMENTATION OF SINGLE TREES.
Murphy, K.P., 2006. Naive Bayes classifiers.
Nowak, D.J., Hirabayashi, S., Bodine, A., Greenfield, E., 2014. Tree and forest effects on air quality and human health in the United States. Environmental Pollution 193, 119–129. https://doi.org/10.1016/j.envpol.2014.05.028
Oettel, J., Lapin, K., 2021. Linking forest management and biodiversity indicators to strengthen sustainable forest management in Europe. Ecological Indicators 122, 107275. https://doi.org/10.1016/j.ecolind.2020.107275
Pearce, D.W., 2001. The Economic Value of Forest Ecosystems. Ecosystem Health 7, 284–296. https://doi.org/10.1046/j.1526-0992.2001.01037.x
Pettis, B.T., 2023. reCAPTCHA challenges and the production of the ideal web user. Convergence 29, 886–900. https://doi.org/10.1177/13548565221145449
Pitt, D.G., Comeau, P.G., Parker, W.C., MacIsaac, D., McPherson, S., Hoepting, M.K., Stinson, A., Mihajlovich, M., 2010. Early vegetation control for the regeneration of a single-cohort, intimate mixture of white spruce and trembling aspen on upland boreal sites. Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere 40, 549–564. https://doi.org/10.1139/X10-012
Pollard, D., 1982. Quantization and the method of k-means. IEEE Transactions on Information Theory 28, 199–205. https://doi.org/10.1109/TIT.1982.1056481
Raymer, B., 2001. Juvenile Spacing Quality Inspection. Forest Renewal BC.
Redmon, J., Farhadi, A., 2018. YOLOv3: An Incremental Improvement. arXiv:1804.02767 [cs].
Schuster, C., Förster, M., Kleinschmit, B., 2012. Testing the red edge channel for improving land-use classifications based on high-resolution multi-spectral satellite data. International Journal of Remote Sensing 33, 5583–5599. https://doi.org/10.1080/01431161.2012.666812
Seely, H.E., 1934. AERIAL PHOTOGRAPHY IN FOREST SURVEYS. The Forestry Chronicle 10, 226–229. https://doi.org/10.5558/tfc10226-4
Seidl, R., Rammer, W., Scheller, R.M., Spies, T.A., 2012. An individual-based process model to simulate landscape-scale forest ecosystem dynamics. Ecological Modelling 231, 87–100. https://doi.org/10.1016/j.ecolmodel.2012.02.015
Seydi, S.T., Saeidi, V., Kalantar, B., Ueda, N., Halin, A.A., 2022. Fire-Net: A Deep Learning Framework for Active Forest Fire Detection. Journal of Sensors 2022, e8044390. https://doi.org/10.1155/2022/8044390
Shannon, C.E., 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Takumi, K., Watanabe, K., Ha, Q., Tejero-De-Pablos, A., Ushiku, Y., Harada, T., 2017. Multispectral Object Detection for Autonomous Vehicles, in: Proceedings of the on Thematic Workshops of ACM Multimedia 2017, Thematic Workshops ’17. Association for Computing Machinery, New York, NY, USA, pp. 35–43. https://doi.org/10.1145/3126686.3126727
Taunk, K., De, S., Verma, S., Swetapadma, A., 2019. A Brief Review of Nearest Neighbor Algorithm for Learning and Classification, in: 2019 International Conference on Intelligent Computing and Control Systems (ICCS). Presented at the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), pp. 1255–1260. https://doi.org/10.1109/ICCS45141.2019.9065747
Terry, E.L., McLellan, B.N., Watts, G.S., 2000. Winter habitat ecology of mountain caribou in relation to forest management. Journal of Applied Ecology 37, 589–602. https://doi.org/10.1046/j.1365-2664.2000.00523.x
Tin Kam Ho, 1995. Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition. Presented at the Proceedings of 3rd International Conference on Document Analysis and Recognition, pp. 278–282 vol.1. https://doi.org/10.1109/ICDAR.1995.598994
Tompalski, P., Coops, N.C., White, J.C., Wulder, M.A., Pickell, P.D., 2015. Estimating Forest Site Productivity Using Airborne Laser Scanning Data and Landsat Time Series. Canadian Journal of Remote Sensing 41, 232–245. https://doi.org/10.1080/07038992.2015.1068686
van Leeuwen, M., Hilker, T., Coops, N.C., Frazer, G., Wulder, M.A., Newnham, G.J., Culvenor, D.S., 2011. Assessment of standing wood and fiber quality using ground and airborne laser scanning: A review. Forest Ecology and Management 261, 1467–1478. https://doi.org/10.1016/j.foreco.2011.01.032
Vedaldi, A., Soatto, S., 2008. Quick Shift and Kernel Methods for Mode Seeking, in: Forsyth, D., Torr, P., Zisserman, A. (Eds.), Computer Vision – ECCV 2008, Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 705–718. https://doi.org/10.1007/978-3-540-88693-8_52
Walsh, S.J., 1980. Coniferous tree species mapping using LANDSAT data. Remote Sensing of Environment 9, 11–26. https://doi.org/10.1016/0034-4257(80)90044-9
Weisberg, P.J., Bugmann, H., 2003. Forest dynamics and ungulate herbivory: from leaf to landscape. Forest Ecology and Management 181, 1–12.
Whitman, E., Parisien, M.-A., Price, D.T., St-Laurent, M.H., Johnson, C.J., DeLancey, E.R., Arseneault, D., Flannigan, M.D., 2017. A framework for modeling habitat quality in disturbance-prone areas demonstrated with woodland caribou and wildfire. 8, e01787. https://doi.org/10.1002/ecs2.1787
Wulder, M., 2003. EOSD Land Cover Classification Legend Report (No. V2). Victoria, B.C.
Yancho, J.M.M., Coops, N.C., Tompalski, P., Goodbody, T.R.H., Plowright, A., 2019. Fine-Scale Spatial and Spectral Clustering of UAV-Acquired Digital Aerial Photogrammetric (DAP) Point Clouds for Individual Tree Crown Detection and Segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12, 4131–4148. https://doi.org/10.1109/JSTARS.2019.2942811
Zhang, C., Zhou, Y., Qiu, F., 2015. Individual Tree Segmentation from LiDAR Point Clouds for Urban Forest Inventory. Remote Sensing 7, 7892–7913. https://doi.org/10.3390/rs70607892
Zheng, G., Chen, J.M., Tian, Q.J., Ju, W.M., Xia, X.Q., 2007. Combining remote sensing imagery and forest age inventory for biomass mapping. Journal of Environmental Management, Carbon Sequestration In China’s Forest Ecosystems 85, 616–623. https://doi.org/10.1016/j.jenvman.2006.07.015
Appendix

Appendix A | List of all species recorded in data collection.

Latin Name                        sp_code   Common Name                     Class
Amelanchier alnifolia             Aa        Saskatoon                       plant
Aruncus dioicus                   Ad        Goat's Beard                    plant
Acer glabrum                      Ag        Douglas Maple                   deciduous
Alnus sp                          Al        Alder                           deciduous
Anaphalis margaritacea            Am        Pearly Everlasting              plant
Aralia nudicaulis                 An        Wild Sarsaparilla               plant
Actaea rubra                      Ar        Baneberry                       plant
Arctostaphylos uva-ursi           Auu       Kinnikinnick                    plant
Cornus canadensis                 Cc        Bunchberry                      plant
Crataegus douglasii               Cd        Black Hawthorn                  plant
Castilleja miniata                Cm        Red Paintbrush                  plant
Corylus cornuta                   Coco      Beaked Hazelnut                 plant
Cornus stolonifera                Cs        Red-osier Dogwood               plant
Disporum hookeri                  Dh        Hooker's Fairybells             plant
Epilobium angustifolium           Ea        Fireweed                        plant
Equisetum sp                      Eq        Horsetail                       plant
Geocaulon lividum                 Gl        Bastard Toad-flax               plant
Heracleum lanatum                 Hl        Cow-parsnip                     plant
Juniperus communis                Jc        Common Juniper                  plant
Lysichiton americanum             La        Skunk Cabbage                   plant
Linnaea borealis                  Lb        Twinflower                      plant
Ledum groenlandicum               Lg        Labrador Tea                    plant
Lonicera involucrata              Li        Black Twinberry                 plant
Lupinus sp                        Lu        Lupine                          plant
Lycopodium annotinum              Lya       Stiff Clubmoss                  plant
Mitella nuda                      Mn        Common Mitrewort                plant
Paxistima myrsinites              Pm        Falsebox                        plant
Petasites palmatus                Pp        Palmate Coltsfoot               plant
Rosa acicularis                   Ra        Prickly Rose                    plant
Rubus idaeus                      Ri        Red Raspberry                   plant
Ribes lacustre                    Rl        Black Gooseberry                plant
Rubus parviflorus                 Rp        Thimbleberry                    plant
Symphoricarpos albus              Sa        Common Snowberry                plant
Streptopus amplexifolius          Sap       Clasping Twistedstalk           plant
Sambucus racemosa                 Sar       Red Elderberry                  plant
Spiraea betulifolia               Sb        Birch-leaved Spirea             plant
Shepherdia canadensis             Sc        Soopolallie                     plant
Spiraea douglasii ssp. menziesii  Sd        Douglas Spirea (pink/hardhack)  plant
Smilacina racemosa                Sr        False Solomon's-seal            plant
Sorbus sp                         Ss        Mountain-ash                    plant
Viburnum edule                    Ve        Highbush-cranberry              plant
Vaccinium membranaceum            Vm        Black Huckleberry               plant
Veratrum viride                   Vv        Indian Hellebore                plant
Salix sp                          W         Willow                          deciduous
Populus balsamifera               Ac        Cottonwood                      deciduous
Populus tremuloides               At        Trembling Aspen                 deciduous
Abies lasiocarpa                  Bl        Subalpine Fir                   conifer
Betula papyrifera                 Ep        Paper Birch                     deciduous
Pseudotsuga menziesii             Fd        Douglas-fir                     conifer
Pinus contorta                    Pl        Lodgepole Pine                  conifer
Picea glauca x engelmannii        Sx        Hybrid White Spruce             conifer

Appendix B | Site Attributes

Site                SP_CD_1  SP_COUNT_1  SP_PCT_1  SP_CD_2  SP_COUNT_2  SP_PCT_2  SP_CD_3  SP_COUNT_3  SP_PCT_3
200rd_13km          Pl       1267        32.97     At       564         14.68     Li       389         10.12
700rd_28km          Cs       399         31.72     Sx       288         22.89     Ep       221         17.57
alezza_lake         Sx       791         42.01     Ep       337         17.9      W        198         10.52
bend_0-5km          Sx       251         49.51     Ep       159         31.36     W        49          9.66
chief_lake_site_1   Pl       969         22.6      Fd       770         17.96     Al       555         12.95
conifex_h47         Li       106         22.75     Bl       88          18.88     Al       75          16.09
conifex_k14         Bl       104         28.97     W        88          24.51     At       66          18.38
north_fraser_11km   Sx       1153        41.52     Ep       751         27.04     W        225         8.1
north_fraser_41km   Ri       118         21.11     Li       104         18.6      Mixed    46          8.23
north_fraser_50km   Sx       163         44.17     Ep       91          24.66     W        43          11.65
north_olson_5km     Bl       309         32.39     Pl       252         26.42     Fd       121         12.68

Appendix B (continued) | Site Attributes

Site                CON_COUNT  CON_PCT  DECD_COUNT  DECD_PCT  PLANT_COUNT  PLANT_PCT  Tree_size  Tree_size_variation  tree_count  AVG_ELEVATION  Var_ELEVATION
200rd_13km          1708       44.44    1035        26.93     1100         28.62      0.849268   2.498069             3843        714            1.756
700rd_28km          373        29.65    287         22.81     598          47.54      1.044152   2.47825              1258        713            16.778
alezza_lake         976        51.83    587         31.17     320          16.99      1.021664   5.773248             1883        758            6.424
bend_0-5km          257        50.69    226         44.58     24           4.73       0.0872799  5.26019              507         577            9.331
chief_lake_site_1   2189       51.06    1092        25.47     1006         23.47      1.265636   5.97062              4287        674            10.454
conifex_h47         126        27.04    146         31.33     194          41.63      0.901288   3.355597             466         1118           4.981
conifex_k14         129        35.93    161         44.85     69           19.22      0.998359   4.203311             359         1009           7.082
north_fraser_11km   1386       49.91    1048        37.74     343          12.35      0.912881   1.617215             2777        596            7.693
north_fraser_41km   9          1.61     87          15.56     463          82.83      1.538062   6.820703             559         522            3.93
north_fraser_50km   174        47.15    142         38.48     53           14.36      1.279441   5.500881             369         492            5.405
north_olson_5km     742        77.78    193         20.23     19           1.99       1.111404   3.99158              954         732            3.118

Appendix C | SLIC Site-Specific Hyperparameters

Score:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          -0.16414   -0.67233   -0.3148    -0.3148
700rd_28km          -0.11134   -0.43038   -0.22408   -0.1645
alezza_lake         -0.23842   -0.72822   -0.47856   -0.47856
bend_0-5km          -0.07174   -0.48668   -0.28838   -0.21667
chief_lake_site_1   -0.12559   -1.14613   -0.6889    -0.6889
conifex_h47         -0.05464   -0.25366   -0.10666   -0.09455
conifex_k14         -0.02676   -0.14643   -0.0451    -0.03739
north_fraser_11km   -0.15121   -0.47346   -0.22789   -0.22789
north_fraser_41km   -0.14689   -0.4463    -0.4463    -0.4463
north_fraser_50km   -0.09012   -0.29921   -0.29921   -0.29921
north_olson_5km     -0.06955   -0.22939   -0.22939   -0.22939

Compactness:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          0.003052   0.003052   0.003052   0.003052
700rd_28km          0.003052   0.003052   0.003052   0.003052
alezza_lake         0.390625   0.006104   0.012207   0.012207
bend_0-5km          6.25       0.024414   0.195313   0.006104
chief_lake_site_1   0.003052   0.001526   0.003052   0.003052
conifex_h47         0.001526   0.001526   0.001526   0.001526
conifex_k14         0.003052   0.001526   0.001526   0.572344
north_fraser_11km   0.003052   0.003052   0.003052   0.003052
north_fraser_41km   0.003052   0.001526   0.001526   0.001526
north_fraser_50km   0.001526   0.001526   0.001526   0.001526
north_olson_5km     0.001526   0.001526   0.001526   0.001526

Segments/ha:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          13300      1800       5200       5300
700rd_28km          12700      1500       3800       6500
alezza_lake         13600      3000       5500       5500
bend_0-5km          13600      1000       2000       3200
chief_lake_site_1   13700      500        1100       1100
conifex_h47         10100      1600       3500       4800
conifex_k14         13700      1400       2400       2400
north_fraser_11km   12700      3800       8700       8200
north_fraser_41km   12600      500        500        500
north_fraser_50km   13900      500        500        500
north_olson_5km     13400      500        500        500
Appendix D | Quick Shift Site-Specific Hyperparameters

Score:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          -0.11529   -0.82122   -0.43327   -0.43373
700rd_28km          -0.0923    -0.55086   -0.3214    -0.32147
alezza_lake         -0.10861   -0.95105   -0.52079   -0.52075
bend_0-5km          -0.03424   -0.33437   -0.33437   -0.33426
chief_lake_site_1   -0.07106   -0.50668   -0.50668   -0.50668
conifex_h47         -0.03572   -0.4337    -0.19262   -0.19829
conifex_k14         -0.05452   -0.37687   -0.37687   -0.20615
north_fraser_11km   -0.07675   -0.82614   -0.40515   -0.40515
north_fraser_41km   -0.11377   -0.33876   -0.33876   -0.33876
north_fraser_50km   -0.07619   -0.22037   -0.21943   -0.21943
north_olson_5km     -0.05211   -0.16153   -0.16153   -0.16153

Ratio:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          0.9        0.9        0.6        0.5
700rd_28km          0.6        0.9        0.6        0.4
alezza_lake         0          0.6        0.5        0.5
bend_0-5km          0.5        0.4        0.4        0.8
chief_lake_site_1   0.9        0.1        0.1        0.1
conifex_h47         0.4        0.7        0.6        0.3
conifex_k14         0.8        0.8        0.8        0.8
north_fraser_11km   0.1        0.1        0.1        0.1
north_fraser_41km   0.3        0.9        0.9        0.9
north_fraser_50km   0.6        0.9        0.8        0.8
north_olson_5km     0.7        0.9        0.9        0.9

Kernel Size:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          1          5          3          3
700rd_28km          1          5          3          3
alezza_lake         1          5          3          3
bend_0-5km          1          5          5          5
chief_lake_site_1   1          5          5          5
conifex_h47         1          5          3          3
conifex_k14         1          5          5          3
north_fraser_11km   1          5          3          3
north_fraser_41km   1          5          5          5
north_fraser_50km   1          5          5          5
north_olson_5km     1          5          5          5

Sigma:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          1          3          1          1
700rd_28km          5          11         9          9
alezza_lake         5          9          9          7
bend_0-5km          1          9          9          11
chief_lake_site_1   3          11         11         11
conifex_h47         1          1          3          3
conifex_k14         3          5          5          3
north_fraser_11km   1          3          3          3
north_fraser_41km   5          1          1          1
north_fraser_50km   3          1          1          1
north_olson_5km     3          7          7          7

Appendix E | Felzenszwalb’s Efficient Graph Site-Specific Hyperparameters

Score:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          -0.13814   -3.29638   -1.68415   -2.16572
700rd_28km          -0.10638   -2.92586   -1.4949    -2.13797
alezza_lake         -0.14963   -3.4425    -1.6962    -2.15994
bend_0-5km          -0.05281   -2.77591   -1.42957   -2.05404
chief_lake_site_1   -0.08896   -6.61233   -3.35091   -3.68367
conifex_h47         0.00665    -2.34211   -1.13341   -1.84296
conifex_k14         -0.01532   -2.1713    -1.11794   -1.85426
north_fraser_11km   -0.09844   -2.8154    -1.44071   -2.06405
north_fraser_41km   -0.12722   -8.6102    -4.10873   -4.54648
north_fraser_50km   -0.08017   -8.78572   -4.0638    -4.53363
north_olson_5km     -0.05455   -8.96649   -4.15368   -4.62563

Scale:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          0.0425     0.28       0.2025     0.225
700rd_28km          0.1025     0.4275     0.3125     0.31
alezza_lake         0.0775     0.505      0.3125     0.3575
bend_0-5km          0.05       0.585      0.3        0.2925
chief_lake_site_1   0.0325     0.995      0.345      0.43
conifex_h47         0.2375     0.575      0.4475     0.37
conifex_k14         0.1825     0.505      0.3625     0.275
north_fraser_11km   0.1075     0.615      0.31       0.31
north_fraser_41km   0.055      0.9975     0.9975     0.9975
north_fraser_50km   0.06       0.9925     0.9925     0.9925
north_olson_5km     0.03       0.985      0.985      0.985
Appendix F | Mean Shift Site Scores

Score:
Site                Metric 1   Metric 2   Metric 3   Metric 4
200rd_13km          -0.16414   -0.67233   -0.3148    -0.3148
700rd_28km          -0.11134   -0.43038   -0.22408   -0.1645
alezza_lake         -0.23842   -0.72822   -0.47856   -0.47856
bend_0-5km          -0.07174   -0.48668   -0.28838   -0.21667
chief_lake_site_1   -0.12559   -1.14613   -0.6889    -0.6889
conifex_h47         -0.05464   -0.25366   -0.10666   -0.09455
conifex_k14         -0.02676   -0.14643   -0.0451    -0.03739
north_fraser_11km   -0.15121   -0.47346   -0.22789   -0.22789
north_fraser_41km   -0.14689   -0.4463    -0.4463    -0.4463
north_fraser_50km   -0.09012   -0.29921   -0.29921   -0.29921
north_olson_5km     -0.06955   -0.22939   -0.22939   -0.22939

Appendix G | Aggregate Hyperparameters for SLIC

SLIC          Metric 1      Metric 2      Metric 3      Metric 4
Compactness   0.003051758   0.003051758   0.003051758   0.003051758
Segments/ha   13900         500           2300          2300

Appendix H | Aggregate Hyperparameters for QuickShift

QuickShift    Metric 1   Metric 2   Metric 3   Metric 4
Ratio         0.5        0.6        0.3        0.1
Kernel Size   1          5          5          5
Sigma         1          9          9          5

Appendix I | Aggregate Hyperparameters for F-Graph

F-Graph   Metric 1   Metric 2   Metric 3   Metric 4
Scale     0.045      0.635      0.5        0.4825

Appendix J | Classification Accuracy of Quick Shift Segments on individual sites.

Appendix K | Classification Accuracy of SLIC Segments on individual sites.