APPLICATION OF BAYESIAN NETWORKS TO LARGE-SCALE PREDICTIVE ECOSYSTEM MAPPING by Adrian Walton BES, University of Waterloo, 1995 PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF NATURAL RESOURCES AND ENVIRONMENTAL STUDIES UNIVERSITY OF NORTHERN BRITISH COLUMBIA March 2005 © Adrian Walton, 2005 .IJNIVERSITY of NORTHERN · BIUTISH COLUMlst,. LIBRARY ·1frJnce George, B.C. ABSTRACT Large-scale ecosystem maps are essential tools for managers of forest-related activities. In British Columbia, the prevailing approach for ecosystem mapping has been to use an expert system that captures expert knowledge in the form of a belief matrix. In this project, I replaced the belief matrix with a Bayesian network in an attempt to overcome some of the drawbacks of the belief-matrix approach. I created a Bayesian-network knowledge base applied to an area encompassing Prince Rupert and part of the following three biogeoclimatic units: Coastal Western Hemlock Very Wet Maritime Montane, Coastal Western Hemlock Very Wet Maritime Submontane and, Coastal Western Hemlock Very Wet Hypermaritime Central. Using each knowledge base, I produced a map of grouped site series. Accuracy assessments performed on each of the maps of grouped site series revealed that the maps poorly predicted the spatial distribution of rare and very wet site-series groups. The results of the map-accuracy assessment, however, were consistent with those resulting from a belief-matrix approach. I considered the Bayesian-network knowledge bases easier to develop, interpret and update than belief matrices. 11 TABLE OF CONTENTS ABSTRACT ....... .. ................................... .. ......... .. ... ............. .......................... ..... ........... .. ......... ii LIST OF FIGURES ....................................... ....................... ...... ........ .. .................................... v LIST OF TABLES .................... .. ............................................... ........................... .. ................ vii ACKNOWLEDGEMENTS ........ .......... .. .... ... .. .. ............................................... ............ ......... viii 1 INTRODUCTION .... .... .... ....................................... ......................................................... 1 1.1 Statistical Methods .... .......................................................................................... .. ........ 4 1.2 Geostatistical Methods .. .... .. .......... ....... ......... .. .......... .................................... .... .... ........ 5 1.3 Artificial Neural Network Methods .......................................................................... .... 6 1.4 Decision Tree Analysis ................................................................................................. 8 1.5 Expert Systems ..................................................................... ..................................... .... 9 1.6 Hard Versus Soft Classification ..... .......... .............. ... .. ....... ...................................... ... 15 1. 7 Accuracy Assessment ....................................................... ................. ......................... 18 1.8 Statement of the Problem and Objectives ................................................................... 21 2 METHODS ............................. ... ... .................................................................................. 24 2.1 Study Area .............. ............................... .. ............................... .. .................................. 24 2.2 Methods Overview .. .................................................................................... ................ 27 2.3 GIS Database Development ... ........................................................ ............................. 29 2.4 Knowledge Base Construction .............................................................. ................... ... 43 2.5 Final Map Generation .. .............. ..... ............................. .. ............................................. 64 2.6 Map Accuracy Assessment ............. ......... ............................ .............................. ......... 64 3 RESULTS ................... ......... ............ .. ..................................................... .......... ........... .. . 66 3.1 Knowledge Base Structure ... .... .... .. .............................. .... .. ........................... .............. 66 3.2 Predictive Ecosystem Maps .......................................................................... .............. 70 111 3.3 Map Accuracy Assessment ..... .................. ................. ................................................. 74 4 DISCUSSION ....... .. ..... ... ....................... ......................................................................... 80 5 CONCLUSION AND RECOMMENDATIONS .............................. ........ ...... ................ 96 6 REFERENCES ............................... ......... .... ........................................... .. ...................... 99 IV LIST OF FIGURES Figure 1 - A graphical representation of the semantic-import model used to define the relationship between slope gradient and the semantic descriptions of flat and sloping. 17 Figure 2 - The study area which was used to conduct large-scale predictive ecosystem mapping ................... ... ...... ........... ..................... .............................. .. .... ........... ....... ......... 25 Figure 3- This diagram illustrates the methodology used to meet the objectives of this study. There were three general sections in the methodology that also comprises most expert systems: create the database, develop the knowledge base and perform inference . .... .. . 28 Figure 4- This sample map (a) demonstrates the difference between the upper-landform elements as defined by Jones (2003) (b) and the upper-landform elements as defined by the continuous-landform-elements layer (c) .............. .............. .. ................................... .. 34 Figure 5 -The Bayesian network of continuous landform elements, which was developed in Netica version 2.17, was developed to predict the spatial arrangement of landform elements .. ... ..... ................................. ....... .. .... .. .......... ....... ....... ... ... ............. ................ ... .. 37 Figure 6 - Four examples of the GIS data layers used to create the database used in this study. This diagram highlights the different spatial scales that occurred within the GIS · database ........ ... ..... ... ................... ... ................................................................. ....... .......... 42 Figure 7 - The influence diagram created to explain the relationship between the GIS database used in this study and soil-moisture regime . .. .. .................................... .... ........ 48 Figure 8 - The influence diagram created to explain the relationship between the GIS database used in this study and soil-nutrient regime ....................................................... 49 Figure 9 - The Bayesian network used to defme the relationships between the GIS data layers used in this study, and soil-moisture and soil-nutrient regimes within the CWHvm2 BEC unit. This network was created by combining the soil-moisture regime and soil-nutrient regime influence diagrams in Figure 7 and Figure 8 ............. .............. 50 Figure 10- Edatopic grids and corresponding site-series groups used for the CWHvm1, CWhvm2 and CWHvh2 Bayesian networks ................. ...... .. ...................................... .... 55 Figure 11- The portion of the CWHvm2 Bayesian-network, which illustrates the relationship between site-series group (Site Series Edatopic), soil-moisture regime (Soil Moisture) and soil-nutrient regime (Soil Nutrient) ...................... .. ...................... 56 Figure 12 - The portion of the Bayesian network for the CWHvm2, which illustrates the influence of slope (Slope Group2) on the previous estimation of site-series-group (Site Series Edatopic) to produce a new prediction of site-series group (Site Series Slope Modifier) . ..... ...... ....... ... .... .... ............... .. ....... ... .... ...... ............... ............ ......... .. ........ .... ... 57 v .----------------------------------- ----- Figure 13- The portion of the Bayesian network for the CWHvm2, which illustrates the influence of forest-stand productivity (Site Productivity) and forest-stand species (Species Influence) on the previous estimation of site-series group (Site Series Slope Modifier) to produce a final estimation of site-series group (Site Series Veg Modifier) for the CWHvm2 ................................. .. .......................................................................... 57 Figure 14- The portion of the Bayesian network for the CWHvm1, which illustrates the influence of floodplains (Floodplain) on the previous estimate of site-series group (Site Series Veg Modifier) to produce a final estimate of site-series group (Site Series Fluvial Modifier) for the CWHvm1 .................. ... .... ........ .... ......................................... 59 Figure 15- The Bayesian network for the CWHvh2, which illustrates the influence of ocean wind (Salt Spray Shoreline) and ocean estuaries (Gully Adjacent to Shore) on the previous estimation of site-series group (Site Series Fluvial Modifier) to produce a final estimation of site-series group (Site Series Shore Modifier) for the CWHvh2 .... 60 Figure 16 - The complete Bayesian network developed to predict site-series group in the CWHvm2 BEC unit of this study. .................................................................................. 67 Figure 17 - The complete Bayesian network developed to predict site-series group in the CWHvm1 BEC unit of this study....... ....... ;......... ... ....... ......... ................... ..................... 68 Figure 18 - The complete Bayesian network developed to predict site-series group in the CWHvh2 BEC unit of this study............ :...........:.'.............. ................ .... .......................... 69 Figure 19 - An overview map of predicted site-series groups in all three BEC subzone/variants in the study area. This map shows the relative position and abundance of each of the predicted site-series groups within each of the three BEC subzone/variants .............................................................................................................. 71 Figure 20 - A perspective view illustrating the relative terrain position of the final site-seriesgroup predictions for the CWHvm2 and CWHvm1 BEC units. The wet site-series groups, such as 05, appear to be correctly placed within the gullies and toe slopes, and the dry site-series groups, such as 03, appear to be correctly placed on ridges and shoulders ........................... ............ ..................................................................... ............. 72 Figure 21- A perspective view illustrating the relative terrain position of the final site-seriesgroup predictions for the CWHvh2 BEC unit. The wet site-series groups, such as 07 and 14, appear to be correctly placed in the gullies and toe slopes, and the dry site-series groups, such as 03, appear to be correctly placed on ridges and shoulders .................... 73 Vl LIST OF TABLES Table 1- A summary of the Jones (2003) rule-based logic used to create the TIN/DEM dependant GIS layers used in this project ....................................................................... 31 Table 2 - The GIS layers ofbedrock, stream density, and ocean wind, which were derived using the procedures of Jones (2003), were not derived solely from the TIN or raster DEM, but rather depended on other data sources ................................................. .......... 32 Table 3 - The conditional-probability table used to define the relationship between landform elements, terrain context and form elements ........... ................. .................. .... ................ 39 Table 4 - The heuristic rule-base used to convert input values into fuzzy landform attributes. ····································································································································· 40 Table 5- A list of the data layers contained within the GIS database, along with comments regarding each layer's ability to assist site series prediction .......................................... 41 Table 6- Site-series groups used for the CWHvm2 BEC unit. CWHvm2 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers ........................ ............................ ... ..... ... ... ...... 44 Table 7- Site-series groups used for the CWHvm1 BEC unit. CWHvm1 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers ........... ... ................. ........ ................................ . 45 Table 8- Site-series groups used for the CWHvh2 BEC unit. CWHvh2 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers ............................................................ ................... 46 Table 9- The accuracy-assessment contingency-matrix for the CWHvm2, which includes percentage correct, user's and producer's accuracy........................ ................................ 75 Table 10- The matrix of average posterior probabilities for the CWHvm2, which includes the average posterior probabilities for each mapped site-series group .............. ............. 75 Table 11- The accuracy-assessment contingency-matrix for the CWHvm1, which includes percentage correct, user's and producer's accuracy ................... ... ....................... ..... ...... 76 Table 12- The matrix of average posterior probabilities for the CWHvm1, which includes the average posterior probabilities for each mapped site-series group ........................... 76 Table 13 -The accuracy-assessment contingency-matrix for the CWHvh2, which includes percentage correct, user's and producer's accuracy ................ .......... .............................. 78 Table 14- The matrix of average posterior probabilities for the CWHvh2, which includes the average posterior probabilities for each mapped site-series group ................................. 79 Vll ACKNOWLEDGEMENTS It is a pleasure to thank the many people that made this degree possible over the past six years. Foremost, I wish to state my gratitude to my MNRES supervisor, Roger Wheate. His patience allowed me to explore many possible research topics and degrees within the College of Science and Management. I would also like to express my gratitude to Michael Gillingham for being a part of my supervising committee and providing thorough reviews. I thank my co-workers for providing me with useful advice during the completion of my final project. The greatest contribution came from Del Meidinger. He not only agreed to work closely with me to create the ecological models but also agreed to be part of my supervising committee. This project would not have been possible with out his advice and support. Marvin Eng, my work supervisor, very graQiously allowed me to focus my past year on the completion of my final project. I would not have completed this project without this opportunity. I also thank Peter Ott for providing me with statistical advice and Shirley Mah for helping me with my writing -both valuable contributions. I wish to thank my siblings and extended family. My sisters, Heidi Robinson and Liesl Walton, and my brother-in-law, Michael Robinson, were particularly supportive and patient during the past six years. Also very important to me was the direct support I received at home from my partner, Jodi Mucha. She graciously accepted the impact that combined academic and work life had on our life at home and provided support during times of frustration. Lastly, and most importantly, I thank my parents, Christine Walton and Colin Walton. They raised me and supported me, which allowed me to be who I am today. To them I dedicate this degree. Vlll 1 INTRODUCTION It is essential for land managers to have information regarding the nature and distribution of ecosystems to manage forest-dependant activities. In order to facilitate this need, a working definition of ecosystems is required, including inventories of the components that comprise ecosystems, and an understanding of the spatial pattern of ecosystems, their formation and interrelationship. Hierarchical systems that classify land based on ecological principles have been developed for scales ranging from global to local. At the global scale, also referred to as small-scale, several researchers have developed ecological land classifications using a bioclimatic approach: e.g., Udvardy (1975), Walter and Box (1976), and Bailey (1989a; 1989b). In the United States, Drisco11(1984), Omernik (1987), and Cleland et al. (1997) developed ecologically-based classifications at the regional scale. In New Zealand, Huser (1999) proposed a hierarchical ecosystem-classification scheme ranging from the regional scale to the local scale, also referred to as large scale. In Canada, nation-wide ecosystem classifications have been created by Wiken (1986), the Ecological Stratification Working Group (1996), and Marshall and Schuts (1999). Large-scale ecological-classification schemes have been presented in the United States (Barnes et al. 1982), New Brunswick (Zelazny eta!. 1989), Ontario (Jones 1983; Sims eta!. 1989; Chambers eta!. 1997), Manitoba (Zolaneski eta!. 1995) Saskatchewan (Beckingham et al. 1996b) and Alberta (Beckingham & Archibald 1996; Klappstein & Corns 1996; Beckingham et al. 1996a). In the province of British Columbia, Pojar et al. (1987) formalized the Biogeoclimatic . Ecosystem Classification (BEC) approach to ecosystem classification. 1 The BEC system is an integrated hierarchical classification scheme that combines climate, vegetation and site classifications. Ecosystems ranging in scale from regional to local are defined using zonal (climatic) concepts and the vegetation of zonal climax or nearclimax ecosystems. In the BEC scheme, the basic unit of zonal or climatic classification is the subzone. Grouping subzones forms zones, while subzone divisions define variants (Pojar et al. 1987). Site series, which are the largest-scale ecosystems unit of the BEC system, are defined within variants, or subzones if variants are not described for a subzone. Site-series units describe all land areas that are capable of supporting a specific climax vegetation within a climatic area (subzone or variant) (Pojar et al. 1987). The ecosystem-forming model ofPojar et al. (1987) is based on the vegetation and soil-formation theory of Major (1951): vegetation and soils are products of climate, organisms, topography, parent material, and time. Plants and soil, considered simultaneously, integrate all the components of an ecosystem and reflect ecosystem formation. As a result, an ecological unit can be described by the plant community (a volume of relatively uniform vegetation) and the soil polypedon (a volume, to the depth of the solum, of relatively uniform soil) upon which the plant community occurs (Pojar et al. 1991). Delineating the geographical distribution of the ecological units that belong to an ecological classification scheme is broadly referred to as ecosystem, or ecological, mapping. The approaches used to delineate ecological units on the landscape are very similar to those employed in vegetation (Skidmore 1989; Kimes et al. 1996; Carpenter et al. 1997; Carpenter et al. 1999; Miller & Franklin 2002; Linderman et al. 2004) and soils mapping (McCracken & Cate 1986; Burrough 1989; Moore et al. 1993; Skidmore et al. 1996; MacMillan et al. 2 2000; Zhu et al. 2001; Qi & Zhu 2003). This is because all three domains rely on similar relationships between topography, climate, parent material, qrganisms and time (Jenny 1941; Major 1951; Pojar et al. 1987). In addition, large-scale mapping techniques employed in each domain have evolved similarly over time (Scull et al. 2003). Traditionally, large-scale ecosystem-, soil- and vegetation-mapping were derived through a site-survey approach (Scull et al. 2003). This mapping approach involved conducting intensive field surveys over the entire area to be mapped. Although the survey approach is useful for small spatial extents, high costs and time are required to conduct intensive field surveys over large areas. Instead, most current mapping projects oflargescale units over large spatial extents must rely on information gathered remotely. For at least two decades large-scale ecosystem, soil and vegetation mapping projects covering large extents were based on manual procedures for delineating ecological units from remotely gathered aerial photos (Klinka & Skoda 1977; Mitchell & Eremko 1987; RIC 1998; Scull et al. 2003). This required a human interpreter to view and analyze aerial photographs and infer the classification units using visible landscape attributes such as slope gradient, slope position, surface curvature, drainage and vegetation (RIC 1998). Recognizing the high cost ($3.00- $8.00/ha) and time required to manually produce large-scale ecosystem maps from aerial photos, however, research began to focus on producing maps using computerbased predictive approaches (Mulder & Corns 1994; Biggs et al. 1997; Jones et al. 1999). Computer-based approaches were seen as an attractive alternative to ecosystem maps derived from aerial photo due to their low reliance on field surveys and low costs, which ranged from $0.20 to $2.00/ha (Biggs et al. 1997). 3 In BC, procedures that use a computer to stratify the landscape into large-scale ecological units are referred to as predictive ecosystem mapping (PEM) (Jones eta/. 1999). Only BC (RIC 1999) and Alberta (Mulder & Corns 1994; Beckingham eta/. 1999) generate large-scale, ecosystem-unit maps using computer-based predictive approaches, although ecosystem-Classification systems exist that define large-scale ecological units in the United States (Cleland et a/. 1997) and New Zealand (Huser 1999), BC (Pojar eta/. 1987), Alberta (Beckingham & Archibald 1996; Klappstein & Corns 1996; Beckingham eta/. 1996a), Saskatchewan (Beckingham eta/. 1996b), Manitoba (Zolaneski eta/. 1995), Ontario (Chambers eta/. 1997) and New Brunswick (Zelazny eta/. 1989). Although the application of computer-based predictive approaches to large-scale, ecological-unit mapping is limited, it is closely related to the field of predictive vegetation mapping, and predictive soil mapping. All these terrain-related predictive-mapping domains usually start with the development of a numerical or statistical model of the relationship between environmental variables and ecosystem properties, which is then applied to a geographic database to create a predictive map (Jones eta/. 1999). The methods used to predict ecosystem, soil and vegetation distribution could generally be divided into five separate types: statistical, geostatistical, artificial neural networks, decision trees and expert systems (Scull eta/. 2003). 1.1 STATISTICAL METHODS Statistical methods are used to explore and model the relationship between quantifiable environmental variables and ecological properties to create predictive maps (Scull eta/. 2003). Statistical models that have successfully been applied to predictive soils 4 mapping include linear regression (Moore eta!. 1993), logistic regression (King eta!. 1999), and exponential regression (Bell eta!. 2000). Miller and Franklin (2002) successfully applied generalized linear models to predictive vegetation mapping, and Beckingham et a!. (1999) used a combination oflinear regression and correspondence analysis to successfully predict ecological unit distribution. Most statistical methods, excluding generalized linear models, however, are limited by their assumption that the data are normally distributed (Scull eta!. 2003). It is more likely that the mapped units do not fit a normal distribution, but rather fit a multinomial distribution (Foody 2002a). Statistical methods are also limited by their high data requirements. Also, the inflexibility of these statistical methods prevent integration with a variety of potential data sources, such as expert knowledge (Scull eta!. 2003). 1.2 GEOSTATISTICAL METHODS Similar to statis~ical methods, geostatistical methods are another approach employed in creating predictive maps. Geostatistical methods are a form of statistics that deal with spatial data and interpolation (Scull eta!. 2003). Their primary purpose is to spatially interpolate values for locations where values are not known using locations where values are known. Kriging, which was developed by Krige (1963), is the main approach used for interpolation ofunknown values. Using this approach, geostatistics have been applied to predictive soil mapping through the work of such authors as: Burgess and Webster (1980), Burrough (1989), McBratney eta!. (1991), Webster (1994), and McBratney eta!. (2000). Criticisms of geostatistic methods, however, include their heavy reliance on data, requiring mapping projects to have a large number of closely spaced data points. 5 Geostatistical methods work best in simple terrain because they rely on a high amount of spatial autocorrelation, but perform poorly in complex terrain where there are abrupt changes in environmental variables (Zhu 1997). Furthermore, like traditional statistics, geostatistics do not incorporate expert knowledge (Scull et al. 2003). 1.3 ARTIFICIAL NEURAL NETWORK METHODS Artificial neural networks (ANNs) are another method employed in the creation of ecosystem, soil and vegetation maps through predictive methods. ANNs are a type of classification model that attempt to emulate the learning process of the human neural network (Atkinson & Tatnall1997). The term artificial neural network is really a broad term that encompasses a number of approaches that attempt to emulate the neural network process. ANN models first start by creating relationships between a set of known attributes mapped and a set of known outcomes. This data set, which is commonly referred to as a training data set, trains the neural network to establish relationships between the known attributes and the known outcomes. The artificial neural network can then be used to predict outcomes from known attributes once it has established the relationships using the training data (Atkinson & Tatnall 1997). The most commonly used ANN technique for vegetation mapping is the multilayer perception neural network, but other successfully employed techniques include the self-organizing feature-map neural network and the adaptive-resonance-theory network (Atkinson & Tatnall1997; Carpenter et al. 1999; Foody 1999). The popularity of ANNs has stemmed from their ability to integrate data acquired at a low level of precision, to handle categorical and continuous data sources, and their freedom from linear or simple non-linear distribution assumptions (Atkinson & Tatnall1997; Zhang 6 & Foody 2001). The process of training an ANN is computationally intensive and can take considerable time, but once trained an ANN can quickly process new input data (Kimes et al. 1996). Criticism of ANNs include their potential to terminate model development at a "local minimum" and produce a sub-optimal model (Kanellopoulos & Wilkinson 1997). On the other hand, ANNs can also produce a model that "over-fits" the training data and produces poor predictions on other independent data sets (Park & Vlek 2002; Qi & Zhu 2003). The potential of over-fitting to the training data increases as the training data increasingly misrepresents the conditions in the data set to be classified (Qi & Zhu 2003). ANNs have been applied extensively to land-cover classification and predictivevegetation mapping, primarily where satellite imagery was the main component of the geographic database. Kimes et al. (1996) used a multilayer perception neural-network to create a model of forest age using Landsat satellite imagery and topographic data. Linderman et al. (2004) successfully used Landsat Thematic Mapper imagery to predict the presence of understory bamboo. Carpenter et al. (1997; 1999) successfully applied adaptive-resonancetheory networks to vegetation mapping using Landsat remote sensing imagery and digital terrain data. The application of ANNs to predictive soil mapping has been less extensive. Qi and Zhu (2003) used neural networks to model the relationships between existing soil maps and terrain variables. Park and Vlek (2002) compared the performance of neural networks, regression trees and general linear models at predicting soil variability using soil type, vegetation attributes and terrain attributes. 7 1.4 DECISION TREE ANALYSIS Decision-tree (DT) analysis is similar to the ANN approach to predictive mapping. However, DT analysis uses a multi-stage approach when making a prediction, unlike ANNs, which use all the attributes simultaneously and make a single prediction. The prediction process is considered a chain of simple decisions, where tests are applied at each node and each branch represents the resulting decision path. The "leaves" (or branch termini) of the tree represent the prediction labels (Pal & Mather 2003). Like artificial neural networks, decision trees are developed using a training data set and then applied to a data set of attributes for which predictions are to be generated (Qi & Zhu 2003). Manually "pruning" the tree is often required to prevent the DT from overfitting the training data and to reduce tree complexity (Qi & Zhu 2003; Pal & Mather 2003). Tree pruning is accomplished by merging two leaf nodes into a single node, usually under the direction of expert knowledge (Qi & Zhu 2003). Advantages of DT analysis over traditional statistical methods include: its ability to handle missing data values and outliers, capture nonadditive and nonlinear behaviour, the lack of assumptions about the data distribution, and ease of updating when new data become available (Moore et al. 1991). Also, expert knowledge can be indirectly applied to DTs through tree pruning. This highlights a significant advantage DTs have over neural networks. With DT methods, the relationships established between the input variables and the predicted outcomes are explicit and interpretable by the analyst, unlike neural networks where the relationships are hidden. This explicit presentation of the relationships can help the expert understand which environmental variables influence the relationship between input 8 and output variables and make adjustments to the tree structure if necessary (Pal & Mather 2003). DT models, however, are negatively affected by outliers and can produce very different results if outliers are included. Maps produced using DTs are more likely to retain the spatial pattern of the input data because DT models partition the data using one input variable at a time (Miller & Franklin 2002). Pal and Mather (2003) also found that the classification accuracy of decision-tree methods decreased relative to neural networks when the number of input variables used during the model training step exceeded 25. They still achieved high classification accuracy (86%) when classifying land-cover type from satellite imagery. Cialella et al. (1997) applied decision-tree analysis to predictive soil mapping to model soil drainage from satellite imagery and digital elevation models. 1.5 EXPERT SYSTEMS Expert systems differ substantially from the statistical, geostatistical, ANN, and decision-tree approaches to predictive mapping. They are computer programs that are built using a human expert's knowledge to simulate the expert's reasoning (Stock 1987). fu predictive mapping the purpose of an expert system is to capture the information a surveyor accumulates while working in the field and integrate that knowledge into a predictive model (McCracken & Cate 1986, as cited in Scull et al. 2003). Expert systems are generally comprised of data, a knowledge base, and an inference engine (Skidmore et al. 1996). The knowledge base contains a set of rules or decisions that define the relationships between the input variables and the output prediction (Stock 1987). The inference engine applies the 9 input data, which would comprise the GIS database in the case ofPEM, to the knowledge base to infer a solution, which in the case ofPEM would be the ecological unit. What makes expert systems different from conventional statistical methods is their ability to store qualitative information and that, essentially, they are meta-models where the knowledge is separate from the model (Davis 1993). This allows expert models to apply the relevant input at different stages ofthe modelling process, and allows for the input to be updated easily (Scull eta/. 2003). Expert systems offer many benefits over traditional statistical approaches. Some of these advantages include the ability to integrate qualitative information (Scull et al. 2003), the ability to integrate expertise from multiple sources (Renooij 2001), the ability to capture knowledge before it becomes unavailable, and improved knowledge transfer to novice domain specialists (Zhu et al. 2001). Despite these advantages, the expert system does have some limitations. These limitations include being unsuitable in areas where there is little expert knowledge regarding the relationships between input variable and output conditions (Scull eta/. 2003), and that experts may have difficulty expressing their own expertise or eliciting a set of rules without bias (Renooij 2001). A criticism of decisions derived from expert opinion is that the result is unfalsifiable because it contains unknown assumptions, limitations and accuracy (Scull et al. 2003). For the results from expert systems to be falsifiable the elicitation of expert knowledge must be presented transparently and explicitly (Varis & Kuikka 1997). Expert systems have been applied to a wide range of predictive-mapping projects. Examples include forest-fire modelling (Davis et al. 1986), forest-vegetation mapping 10 (Skidmore 1989), forest-soil mapping (Skidmore et al. 1991 ), terrain mapping (MacMillan et al. 2000; Schmidt & Hewitt 2004), mineral exploration (D'Ercole et al. 2000), and land-use planning (Zhu et al. 1998). These may employ one of four types of knowledge bases, as described below. Rule-Based Knowledge Base Most expert systems use a rule-based knowledge method. In a rule-based system, the relationships between the input variables and output predictions are derived through a series of if-then-else statements. Simple rule-based knowledge bases are binary, where a condition is either true or false; partially-true and partially-false conditions are not allowed. In a rulebased expert system there are two general methods in which the inference engine can make a decision: forward chaining or backward chaining (Moon 1999). ·In forward chaining, the inference engine compares the attributes of the unknown object to the rule base until arriving at a prediction. This approach is sometimes referred to as data-driven reasoning. In backward chaining, a prediction is hypothesised for an unknown object and the inference compares the attributes of the unknown object to the attributes of the prediction. This approach is sometimes referred to as goal-driven reasoning (Teng & Fairbairn 2000). BeliefMatrix Knowledge Base A less popular knowledge-base method used in expert-based systems is that of the belief or decision matrix. The belief matrix records the belief that an event will occur given a set of conditions (Moon 1999; Hailu & Sommer 1999). For example, in PEM, a belief matrix would record the belief that an ecological unit occurs on slopes over 50%. The values in a belief matrix can be any positive or negative value, where a highly positive value 11 indicates a high belief that an event could occur given a condition, and a highly negative value indicates a low belief or impossibility that an event could occur. The inference engine produces a prediction by summing, for each possible output class, the belief values associated with the input conditions. High sums ofbeliefvalues indicate high belief that the output event occurs given the input conditions, while low belief-value sums indicate low belief. Fuzzy Logic Network Knowledge Base A third type of knowledge-base method used in expert based systems is that ofthe fuzzy-logic network. The fuzzy-logic network uses the theory of fuzzy logic, which attempts to recognize the concept of partial truth developed by Zadeh (1965). The theory of fuzzy logic permits an object to belong to more than one class, recognizing that objects in nature rarely fit exactly the classification types to which they are assigned (Scull eta!. 2003). The fuzzy-logic network extends the rule-based knowledge base to include partial membership to decision paths. When the inference engine processes the input variables through a fuzzylogic network, the output is a membership value, usually ranging from zero to one, for each possible output class. Membership values of zero indicate no membership while values of one indicate full membership. Bayesian Network Knowledge Base Lastly, Bayesian networks (BN) are another common type of knowledge-base method used in expert-based systems. Bayesian networks, also known as belief networks, Bayesian belief networks, Bayes networks and causal probabilistic networks, were first introduced to probability-based decision-systems through the work ofPearl (1988). BNs are so called 12 because they commit to Bayes' rule for probabilistic inference. This rule, developed by mathematician and theologian Reverend Thomas Bayes, generally states that our belief that an event will occur is based on the number oftimes the event has or hasn't occurred in the past given similar conditions. Bayesian methods have been gaining use in applied science, including ecology. Fisheries biology and management has, by far, seen the greatest use of Bayesian methods (Marcot et al. 2001). There has been to date, however, no application of Bayesian networks to predictive ecosystem mapping. Bayesian networks, like fuzzy-logic networks, are numerical models that are primarily built as graphical networks, sometimes referred to as directed-acrylic graphs (Jensen 1996). Bayesian networks consist of the following: a set of variables, also referred to as nodes, which have a finite set of mutually exclusive states, and a set of directed edges, or links, among the variables. When focusing on a single node, parent nodes are those nodes that are at the beginning of a link, which is pointing at the node of interest. Child nodes are those nodes that are at the end of a link, which is exiting the node of interest. BNs cannot contain feedback loops, where the child of a node is also the node's parent or grandparent in some way (Jensen 1996). Those nodes that have no parent are commonly referred to as root or input nodes, while those nodes with no children are referred to as leaf or output nodes (Freidman et al. 1997). Bayesian networks are similar to belief matrices, however, instead ofthe knowledge base containing a single large table, the BN simplifies the problem by breaking the single large table into several smaller tables. A conditional-probability table is associated with each variable with parents, while an unconditional-probability table is associated with each variable with no parent (Jensen 1996). A complete Bayesian network (i.e., nodes, links and 13 conditional probabilities) can wholly be defined using expert knowledge. When the inference engine processes input variables through a Bayesian network the output is a posterior-probability distribution for the output variable, which contains the occurrence likelihood of each output class. A simple Bayesian network could be envisioned as one parent node, which could represent some environmental variable such as soil moisture (SM), and one child node, such as site series (SS). SM ss According to Bayes theorem: P(SS ISM)= P(SM I SS)P(SS) P(SM) Using this theorem, the probability of a given site-series (SS) outcome given soil moisture (SM) is dependent on the likelihood of SM given a specific value of SS (P(SMjSS)) and on the prior probabilities of both SS (P(SS)) and SM (P(SM)). The result is the "posterior" probability for site series (SS). An important aspect of Bayesian networks is the ability to update the conditional probabilities using a "learning" algorithm such as "expectation maximization" (Dempster et a/. 1977). Learning generally involves using observations to update the conditional probabilities of some or all of non-parent nodes to reflect the conditions encountered in the 14 observations (Plach 1999). The values for variables that have no findings are determined through causal inference (Norsys 1997). 1.6 HARD VERSUS SOFT CLASSIFICATION Statistical, geostatistical, ANNs, decision-trees, and expert-systems approaches to predictive mapping can all produce hard classifications (sometimes referred to as crisp classifications). Some of these approaches can also produce soft classifications (sometimes referred to as fuzzy classifications). In a hard classification, only one output class is permitted per observation or location. Soft classification, however, allows an observation or location to belong to several or all possible output classes. The soft-classification result records how strongly an observation or location represents each class (MacMillan et a/. 2000). Bayesian networks and fuzzy-logic models are both examples of approaches that can produce both hard and soft classifications. Predictive-mapping projects are increasingly producing soft-classification results (MacMillan eta/. 2000; Zhang & Foody 2001; Foody 2002a; Foody 2002b). Soft classification can capture prediction uncertainty whether it is a result of class-definition (thematic) uncertainty or spatial uncertainty (Zhang & Foody2001). In predictive mapping, there are mainly two advantages that soft classification has over hard classification. Firstly, many predictive-mapping projects attempt to map concepts that are not discretely defined within the landscape. Instead, many class concepts occur as a continuum across the landscape, which overlap with other class concepts. For example, when mapping landscape elements such as ridge, hill, valley, etc., rarely do these semantically defined features occur discretely or mutually exclusively across the landscape. Some locations will strongly match 15 the semantic definition of a ridge, some locations will weakly match the concept of a ridge, although some locations will not match the concept of a ridge at all. Also, a location may weakly represent the concept of a ridge but strongly represent the semantic definition of a hill. Soft classification can capture this continuum of landscape features that are nonmutually exclusive, but hard classification can not. The second advantage of soft classification involves the scale of the observation or input data. If the scale or resolution of the input data is such that a boundary separating two different feature classes occurs within the minimum-mappable area, then soft classification can capture the proportional occurrence of each class within that area. The occurrence of multiple features within the minimummappable area is affected by the latter's size. The larger the minimum-mappable area the greater the chance that more than one class exists within its area boundary. This is sometimes referred to as a "mixed pixel" in satellite-imagery classification, because more than one feature can exist within an image pixel (Foody 1996). Similar to soft-classification output, which allows the outputs to contain fuzzy membership, is the semantic-import model of Burrough (1989), which allows the inputs to contain fuzzy membership. Using the semantic-import model, the hard thresholds between classes can be converted to fuzzy boundaries. For example, in a hard classification the classification of slope gradient into flat and sloped would occur at a single slope-gradient value. Using the semantic-import model, the belief that a location belongs to either the flat or sloped class would vary according to a range of slope-gradient values. With the semantic.import inodel, instead of the transition between sloped and flat being determined by a single value, the transition from one class to another occurs over a zone defined by two values (ST1 and ST2 in Figure 1). The advantage of the semantic-import model over the hard-class 16 approach is that the semantic-import model captures the uncertainty inherent in the semantic definition oflandscape features (Burrough 1989). This uncertainty in class membership may result from either the continuous nature of the input or the varying semantic definitions of the input classes. ........... • flat ~ sloping ~ ••• ••• ••• ••• "0 0 0 ~ •• ••• ~ ••• ••• •• 0 ~............................J.,_··· sh •• J . sr2 slope gradient Figure 1 -A graphical representation of the semantic-import model used to define the relationship between slope gradient and the semantic descriptions of flat and sloping. Slope gradients between zero and STJ are considered flat, and slope gradients greater than ST2 are considered sloping. As slope gradients increase from STJ to ST2, the likelihood that the slope is considered flat decreases linearly to zero while the likelihood that the slope is considered sloping increases linearly to.one. Flat and sloping semantic-descriptions are considered equally likely at the mid-point between STJ and ST2, although both likelihoods are 0.5 . Although maps resulting from soft classification are gaining popularity, in some circumstances it may be most appropriate to produce a hard classification. For example, decision makers often find probability distributions unclear and prefer a single answer upon which to make decisions. In addition, accuracy-assessment procedures are often based on hard-classification results. To this end, it is not necessary to solely produce hardclassification results, but rather the results from a soft-classification model can be "hardened". Classification "hardening" often involves taking the dominant class from each soft-classification result (Foody 1996). This usually means choosing the class with the 17 highest likelihood (Skidmore et a/. 1996), but in the case of fuzzy logic, the dominant class is the class with the highest membership value (Zhang & Foody 2001). 1.7 ACCURACY ASSESSMENT Most assessments of map accuracy are built around hard-classification results. The contingency table, which is also known as the contingency matrix, error matrix and confusion matrix, is currently at the core of accuracy-assessment procedures for land-cover classification and is used extensively for accuracy assessment in the field of remote-sensing classification (Congalton & Story 1986; Foody 2002b ). The contingency table is simply a square array that relates the number of sample units assigned to a particular class in one classification relative to the number of sample units assigned to a particular class in another classification (Congalton & Green 1999). Contingency tables are almost exclusively applied to hard-classification results, because they are unable to handle soft-classification results. Commonly, one of the classifications is considered correct and is assigned to the matrix columns while the predicted classification is assigned to the matrix rows. The major diagonal then indicates the agreement between the two data sets. The contingency table is an effective way to portray individual accuracies of each class along with both the errors of inclusion (commission errors) and errors of exclusion (omission errors). Commission errors result from putting an area into a class to which it does not belong: committing the act of getting it wrong. Omission errors result from excluding an area from a class to which it does belong: omitting the act of getting it right. Omission and commission errors are related, because every error is both an omission from the correct class and a commission to the incorrect class. Other accuracy-assessment measures commonly 18 derived the contingency matrix are: percentage correct, producer's accuracy, user's accuracy and kappa analysis (Congalton & Story 1986). Percentage correct is commonly used to represent overall map accuracy (Congalton & Story 1986). To derive this measurement the diagonal elements of the contingency table are summed and divided by the total number of samples. Producer's accuracy is a class-specific accuracy-assessment measure that reports how well a specific landscape feature can be mapped (Congalton & Story 1986). Its calculation involves taking the total number of correctly classified samples in a class and dividing by the total number of reference samples in the same class. The result indicates the probability that a sample is placed in the correct class, expressed as a percentage. The producer's accuracy for each class essentially measures the error of omission; the proportion of samples that are omitted from the correct class, where the error of omission is equal to one minus the producer's accuracy. User's accuracy is a class-specific accuracy-assessment measure that essentially measures the error of commission. It is closely related to the producer's accuracy because every error of omission in one class is an error of commission into another class. The user's accuracy is so named because it measures the reliability that the map actually represents what is really on the ground. It is calculated by taking the total number of correctly classified samples in a class and dividing by the total number of samples that are placed in that class. The percentage generated by this calculation indicates to the map user the probability that a location on the classified map actually represents that category on the ground (Congalton & Story 1986). 19 Kappa analysis is a discrete multivariate technique that expresses the degree of agreement between two classifications while compensating for chance agreement (Congalton & Green 1999). The result of performing a Kappa analysis is a KHAT statistic, also denoted K, which is an estimate of Kappa. The KHAT statistic was introduced into the field of land-cover classification through the work ofCongalton and Mead (1983), and has been used extensively to assess the accuracy of land-cover classifications derived from remotely sensed imagery (Foody 2002b). KHAT values range from -1 to 1. Positive values KHAT values, however, should be expected, because there should be a greater than random, or chance, agreement between the classification map and observation data. Congalton and Green (1999) suggest using three KHAT value classes to express agreement. KHAT values greater than 0.8 represent strong agreement, values between 0.4 and 0.8 represent moderate agreement and values less than 0.4 represent poor agreement. In addition to the KHAT statistic, the KHAT variance (Congalton & Green 1999) and Z statistic (Devore 2000) can be calculated to determine ifthe generated classification map is significantly greater than a random, or chance, classification. For map accuracy assessment, Congalton and Green (1999) recommend 50 reference samples per class, however, they also recognize that it may not be possible to obtain 50 samples for rare classes. In the case of rare classes it is acceptable to collect less than 50 samples as long as a minimum number of samples collected per class (Congalton & Green 1999). If obtaining 50 samples per class is not possible then Congalton & Green (1999) recommend a balance between the project's time and cost limitations, and the need to adequately populate the contingency matrix. 20 1.8 STATEMENT OF THE PROBLEM AND OBJECTNES In BC, ecosystem maps have been applied most widely to forest-related management activities. Managers of forest-related activities routinely rely on detailed, large-scale ecosystem maps where the base ecological unit typically occupies no more than 10 ha. Forest-development planning, wildlife-habitat suitability/capability mapping, and ecologically-based yield analysis with site-based site-index curves are all example uses of large-scale ecosystem maps for forest-related management (Biggs et al. 1997). During the past six years, predictive mapping of large-scale ecological units has been developing in BC to meet the management needs of forest-related activities. All of the completed PEM projects have attempted to predict the spatial distribution of site-series ecological units from the biogeoclimatic ecosystem-classification scheme ofBC. The first pilot projects ~ small areas and employed expert-based systems, utilising either a rule- based or a belief-matrix knowledge-base approach (Jones 1999). Since that beginning, over 40 PEM projects have been completed. Although plans exist to complete large-scale ecosystem mapping of the entire province, the completed projects now cover approximately 15% of the province's land base- primarily in non-coastal regions. Of the 32 completed project reports available for review, 28 reported using an expert-based system with a beliefmatrix knowledge base. Each of the completed projects have first stratified the landscape using the mapping method of Eng and Meidinger (1999), which delineates biogeoclimatic subzone/variants, and then developed a separate expert-based system for each subzone/variant stratum. 21 All of the completed large-scale predictive ecosystem maps have been evaluated using the 65% correct de facto standard established by Meidinger (2003). Based on the review of completed projects, results of map-accuracy assessment have been mixed, with approximately half of the mapped BEC-subzone/variants only just meeting the 65% correct target. Approximately 25% of the mapped subzone/variants achieved percent-correct scores well above the 65% target and approximately 25% did not meet this target. Achieving 65% correct targets, however, does not guarantee the applicability of the map product to all forestrelated management activities. Meidinger's (2003) 65% correct criterion was established to evaluate whether a large-scale predictive ecosystem map could be included in a timber supply analysis. A predictive site-series map may not be suitable for such purposes as establishing silviculture prescriptions or identifying rare ecosystems, especially if the map only just achieves the 65% correct target. Efforts to create large-scale predictive-ecosystem maps in BC (e.g., Jones 1999; Timberline Forest Inventory Consultants Ltd. 2000; Rosen et al. 2001; Atticus Resource Consulting Ltd. 2001; Ketcheson et al. 2001a; Ketcheson et al. 2001b; Ketcheson et al. 2001c; Jones & McGregor 2002; Sulyma & Alward 2004; Jones 2004), however, have encotintered several key problems, specifically: 1) the procedure used to model the relationships between environmental variables and ecological units has been difficult to understand for anyone other than those individuals who created the model; 2) the model of the relationships between environmental variables and ecological units could not be automatically updated with field observations; and 3) the resulting spatial database did not report prediction uncertainty. 22 To address the problems encountered by earlier PEM projects, and to meet the demand for large-scale ecosystem maps by managers of forest-related activities, this project aimed to develop and evaluate a modelling procedure that: 1) predicted the spatial distribution of large-scale ecological units, with a minimum 65% correct, through the incorporation of expert knowledge, but still allowing automated model adjustment based on field gathered observations; 2) improved the interpretability of the ecological relationships for users other than the ecologist used to develop the model; and 3) retained prediction uncertainty. 23 2 METHODS 2.1 STUDY AREA The study area was located along the west coast of mainland BC, centred around Work Channel, and falling roughly between 54°4'N 130°31 'Wand 54°49'N 129°51 'W (Figure 2). The study area was approximately 150,000 ha and was situated between Portland Inlet and the mouth of the Skeena River. It was defined by the biogeoclimatic subzone/variant units: Coastal Western Hemlock Very Wet Maritime Montane (CWHvm2), Coastal Western Hemlock Very Wet Maritime Submontane (CWHvm1), and Coastal Western Hemlock Very Wet Hypermaritime Central (CWHvh2), that fell within six BC Ministry of Sustainable Resource Management landscape units: Tuck, Kaien, Quottoon, Khutzeymateen, Somerville and Union. I chose this study area for two reasons. Firstly, the area appeared to contain an adequate amount of georeferenced field data to aid model development and perform accuracy assessment. Secondly, the study area overlapped with the North Coast PEM Project study · area (Jones 2003), which was a previously completed PEM project that employed a belief matrix expert system. I felt that overlapping with a previously completed PEM project would increase the availability of the GIS data required to build the GIS database. The study area lies within the Coast and Mountains Ecoprovince of the Humid Maritime and Highlands Ecodivision of the B.C. Ministry of Sustainable Resource Management's (MSRM) ecoregional classification (Demarchi 1995). The climate is generally temperate and rainy with typically warm summers. The climate is dominated by 24 - KWIN"AMASS Pf:AK f () I ,_ .. ,_ (j) 0 c. z - .. I!) Q ...._ Study Area BEC Units C\NH vh 2 C\NH vm 1 C\NH vm 2 ~ j Figure 2 - The study area which was used to conduct large-scale predictive ecosystem mapping. The study area was comprised of the CWHvm2, CWHvml and CWHvh2 BEC units within the BC Ministry of Sustainable Resource Management Tuck, Kaien, Quottoon, Khutzeymateen, Somerville and Union landscape units. It was approximately 150,000 ha and centred on Work Channel in North-Coastal BC. 25 the arrival of frontal systems from the Pacific Ocean and their subsequent lifting over the Coast Mountains. As a result, rain is abundant throughout the year, but markedly lower in the summer months. Ecoprovinces are divided into ecoregions, and the study area overlapped with both the Coastal-Gap Ecoregion and the Northern Coastal Mountains Ecoregion. The Northern Coastal Mountains Ecoregion, which comprises the northern third of the study area, contains typically large, rugged, ice-capped mountains that rise sharply from the ocean. The Coastal Gap Ecoregion portion of the project area contains somewhat lower and more rounded mountains, although valley sides are still rugged and steep (Campbell eta/. 1990). Starting at sea level, the Coastal Western Hemlock (CWH) zone occupies elevations less than 600 m near the ocean and less than 800 ni inland. Western hemlock (Tsuga heterophylla) is usually the most common tree species although western redcedar (Thuja plicata) is abundant throughout the Coastal Gap Ecoregion. Amabilis fir (Abies amabilis) and yellow-cedar (Chamaecyparis nootkatensis) are both found in wetter sites, but with arnabilis fir dominating the zone's high elevations and northern latitudes. Douglas-fir (Pseudotsuga menziesii) is abundant in drier parts of the zone, which lie south of the study area (Meidinger & Pojar 1991). The hypermaritime portion of the study area contains two distinct areas: the lowland areas and the mountainous areas. The lowland area is dominated by subdued terrain occurring adjacent to the ocean, while the mountainous area is dominated by rounded mountains occurring adjacent to Work Channel (Banner eta/. 1993). Bog woodlands, bog forests and swamp forests occur extensively in the low-lying areas of the CWHvh2. These 26 sites are characterized by extremely wet soils with poor nutrient availability and poor soilmoisture transmissivity, especially at low slope gradient. The upland sites of the CWHvh2 are dominated by wet and mesic soils with good soil transmissivity (Banner eta!. 1993). The maritime portion of the study area occurs inland from the CWHvh2 and is dominated by large rugged mountains. In the CWHvm2 and CWHvm1 BEC units, which occur in the maritime climate, soil transmissivity is generally constant with slope gradient. This is in contrast to the CWHvh2, which has extremely low soil-moisture transmissivity on low slope gradients (Banner eta!. 1993). The subalpine elevations that lie above the CWH zone are occupied by the Mountain Hemlock (MH) and Alpine Tundra (AT) biogeoclimatic zones (Meidinger & Pojar 1991). The MH zone was excluded from the study area due to a lack of field data. The AT zone was excluded because it typically contains very little forest and did not fit with the project's objective of producing ecosystem maps that aid the management of forest related activities. 2.2 METHODS OVERVIEW I used a multistage methodology (Figure 3) to create the expert system and meet the objectives in this study. Identifying the required GIS database layers was my primary step in the methodology. To verify the spatial and thematic accuracy of the GIS layers I used expert opinion and comparisons to other spatial-data sets. To build the Bayesian-network knowledge base I used Norsys Software Corporation's Netica (version 2.17) software application (Norsys 1997). Building a Bayesian network in Netica generally involved creating: 1) input, output and intermediate nodes, 2) links between 27 Create GIS Database Add nodes to adjust intial estimate of siteseries group Acquire necessary GIS data layers Adjust conditional probabilities to re fleet beliefs Run/adjust spatial models and queries necessary to produce additional GIS layers Create test map Spatial models and queries performing as expected? Assemble individual GIS layers into a GIS database ~ Develop nowledge Ba se Create Bayesian network to predict soil-moisture and soil-nutrient regimes ... .. Create final map of site-series group Add node to reflect edatopic grid (site-series group) I I I I I I ·--------------------------------------------· I Assess map accuracy Figure 3- This diagram illustrates the methodology used to meet the objectives of this study. There were three general sections in the methodology that also comprises most expert systems: create the database, develop the knowledge base and perform inference. 28 the nodes 3) classes for each node, and 4) conditional-probability tables. I created an initial Bayesian network to predict soil-moisture and soil-nutrient regimes. I made a preliminary site-series-group prediction by adding a node to reflect the edatopic-grid relationships between soil-moisture and soil-nutrient regimes in each BEC subzone/variant. By adding new nodes and node relationship, I adjusted the initial siteseries-group prediction, which was derived from the edatopic grid. Using the advice ofthe ecological-domain expert, I adjusted the conditional probabilities by assessing those posterior probabilities that reflect the entire study-area input conditions. I used two procedures to test the knowledge base: 1) visual assessment of a map of predicted site-series group that covered a portion of the study area, and 2) comparison of predicted site-series group with a reference-data set. Ifthe results of either of these two testing procedures were not satisfactory then the knowledge base received further conditional-probability value adjustments. Using Netica as the inference engine, I processed the entire GIS database through the knowledge base to generate a final predictive ecosystem map. To assess the accuracy ofthe final ecosystem maps, I compared the map results to an independent reference-data set using several statistical procedures in Microsoft Excel 97. 2.3 GIS DATABASE DEVELOPMENT Prior to Bayesian-network knowledge base construction, the ecological-domain expert, Del Meidinger of the BC Ministry ofForests, and I reviewed a belief matrix developed for the North Coast PEM Project by Jones (2003). Although encompassing a 29 larger geographic area and a greater number ofBEC subzone/variants, the North Coast PEM Project contained predictive site-series mapping of the CWHvm2, CWHvm1 and CWHvh2 BEC units. This review of the North Coast PEM Project allowed us to understand how existing and modelled GIS layers influenced site-series prediction, and which GIS layers could be available for the Bayesian-network knowledge base. Based on that review, we identified 14 GIS layers that we considered ecologically significant for site-series prediction. Four of these layers were directly available from an existing GIS database (described below). The remaining 10 were a product of applying spatial models to existing GIS databases (described below). The four GIS layers directly available from an existing GIS database were foreststand age, forest-stand height, forest-stand species, and forest-stand crown closure. I acquired all four forest-stand attributes from the B.C. Ministry of Sustainable Resource Management's (MSRM) vegetation resource inventory (VRI) GIS database. The 10 GIS layers that were a product of applying spatial models to existing GIS databases were: ridge, hill, river bench, lake/wetland bench, gully, slope gradient, bedrock, stream density, ocean wind, and toe slope. I derived these GIS layers using the same logic as Jones (2003). Using Arc/Info version 8.3 GIS software from Environmental Systems Research Institute. (ESRI), I created a triangulated irregular network (TIN) from MSRM's terrain resource information management (TRIM) digital elevation points and air-photointerpreted ridges and gullies. I modelled the GIS layers of slope gradient, ridge, river bench, lake/wetland bench, and gully using the TIN digital terrain model (DTM) (Table 1). 30 Table 1- A summary of the Jones (2003) rule-based logic used to create the TIN/DEM dependant GIS layers used in this project. GIS Layer Slope Gradient Rule-Based Logic A natural component of a TIN. Slope gradient measures the maximum rate of change in elevation across the triangle. Ridge - TIN polygons with a slope gradient greater than 40% and that were Ridge adjacent and within 20m of a TRIM ridge Ridge Buffers - TIN polygons with a slope gradient greater than 40% and that were between 20m to 40m of a TRIM ridge Low River Bench - TIN polygons with a slope gradient less than 5% and that were adjacent and within 1OOm of a fourth order TRIM definite stream High River Bench - TIN polygons with a slope gradient less than 5% and River Bench between 1OOm and 200m from a fourth order TRIM definite stream River Terrace - TIN polygons with a slope gradient between 5% and 10% and that were between 200m and 500m from a fourth order TRIM definite stream. Or, polygons with a slope gradient less than 10% and that were between 500m and 1OOOm from a fourth order TRIM definite stream. Gully - TIN polygons with a slope gradient greater than 30% and that were Gully adjacent and within 20m of a TRIM gully Gully Buffer - TIN polygons with a slope gradient greater than 30% and that were between 20m and 40m of a TRIM gully Lake/Wetland . TIN polygons with a slope gradient less than 5% and that were adjacent and Bench within 200m of a TRIM lake or wetland Hill - Hills were derived from TRIM contour lines. A hill was represented by the largest outside contour line, that would form a closed area, and was less than Hill or equal to 1200m in length, but which was not a depression. Hill Buffer - TIN polygons with a slope greater than 30% slope gradient and were adjacent and within 40m of a hill polygon Using a digital elevation model, a location was considered to be a toe slope if Toe Slope slope gradient was less than 25% and was down-slope and within 100 metres of an area with a slope gradient greater than 40% 31 Toe slope was generated from a raster digital elevation model (DEM), which was produced from the TIN (T able 1). During the creation of aDEM from a TIN, I set the raster resolution at 625m2 , or 1/ 16 ha, which was the product of a 25 x 25 m pixel. I chose this size because it approximated the minimum geographic spacing between TRIM elevation points along ridge and gully terrain features. This resolution also matched the MSRM provincial raster DEM product. The remaining GIS layers that I created using Jones (2003) procedures were not derived directly from a TIN or DEM but were generated from other data sources (Table 2). Table 2 - The GIS layers ofbedrock, stream density, and ocean wind, which were derived using the procedures of Jones (2003), were not derived solely from the TIN or raster DEM, but rather depended on other data sources. GIS Layer Procedu re The B.C. Ministry of Energy and Mines' 1:250000 bedrock geology GIS Bedrock database was reclassified into 5 classes: limestone, volcanic, rich, intermediate, and poor. The length ofMSRM TRIM stream that intersects a stream density "base-unit" polygon. The "base-unit" polygon was defined by the intersection ofVRI Stream Density polygons with TIN slope gradient polygons that has been simplified into five classes, and with TIN slope-aspect polygons that has been simplified into two classes. Ocean Wind Those areas of land that were eastwardly visible from a sea-levelline running down the middle of Chatham Sound, to a maximum elevation of 50 m. In addition to thos eGIS layers generated using the procedures of Jones (2003), the ecological-domain expert and I identified five other GIS layers useful in predicting site series. These GIS layers were floodplains, fans, soil moisture, shoreline and a continuous landform-elements layer. 32 The floodplains layer and the continuous-landform-elements layer, which delineates ridges, shoulders, spurs, hollows, back-slopes, terraces, foot-slopes and valley-bottoms, were created to supplement the individual GIS landform layers (ridge, gully, hill-top, etc.) and the river-benches layer that were derived using the Jones (2003) procedures. This supplementation was conducted due to the ecological-domain expert's concern with some of the Jones (2003) procedures used to generate landform elements and river floodplains (see Figure 4 for an example). The ecological-domain expert and I felt that by incorporating two methods ofmodelling landform elements and river floodplains within the knowledge base v.;e could increase our chances of successfully predicting the location of landform elements and river floodplains. I modelled floodplains, fans and soil moisture in the GIS software application SELES (Spatially Explicit Landscape Event Simulator), which was available from Gowlland Technologies Inc., using routines developed by Fall (2002). To generate these knowledge base inputs, I used the same DEM that was used to generate the toe slopes. I used the SELES' floodplain model (Fall 2002) to generate floodplains . Moving · outward from the location of approximate fourth-order streams, the floodplain model accumulated cost at a rate of (1 +percent slope gradient) 1.5 until the cost exceeded 100. I divided the resulting floodplain-cost layer into low-river bench, high-river bench and river terrace based on a visual comparison with air photos, satellite imagery and DEM terrain. I considered floodplain-cost values less than 10 to represent low-river benches; cost values between 10 and 35 to represent high-river benches; and cost values greater than 35 to represent river terraces. 33 Kilometres 1 2 ~ ~ ~ c ~~ ~~~~~ (c) Figure 4- This sample map (a) demonstrates the difference between the upper-landform elements as defined by Jones (2003) (b) and the upper-landform elements as defined by the continuous-landform-elements layer (c). After reviewing the results of the Jones (2003) ridges and hill layers (b), the ecological-domain expert and I felt that these layers were underestimating upper-landform elements (convex areas that shed water). Additional upper-landform elements (c) were identified using the continuous-landform-elements layer, which was based on the Schmidt and Hewitt (2004) procedure. 34 I delineated alluvial fans using the SELES alluvial-fan model (Fall2002). The model first identifies confluence points between medium-sized streams and floodplains or larger bodies of water. Starting from these confluence points, the fan model spreads across slope gradients between 10% and 20% without accumulating cost. Cost will accumulate if slope gradients exceed 20% or drop below 10%. To control fan size, I set the maximum allowable accumulated-cost to be 500 or 1000 m from the confluence point. I estimated soil moisture using the SELES soil-moisture model (Fall 2002), which implements the SINMAP algorithm developed by Packet al. (1998). In the SINMAP soil moisture algorithm, a raster cell's soil moisture is related to the amount of upstream area that is feeding into the cell and the cell's slope (Packet al. 1998). Although this algorithm can incorporate local rates of soil transmissivity and recharge, these rates were unknown within the study area. Consequently, the default values were used for transmissivity and recharge. I created the shoreline GIS layer using raster analysis functions in Arc/info. This GIS layer represented all areas less than 5 m above-sea-level that were adjacent to the ocean. Based on the advice of the ecological-domain expert, I chose 5 m because it approximated the influence of tidewater and large ocean swells on the shoreline. I developed the layer of continuous landform elements using an approach similar to Schmidt and Hewitt (2004), which uses the semantic-import model of Burrough (1989). Using the raster DEM, each cell was assigned a membership value for the landform elements: ridges, shoulders, spurs, hollows, back-slopes, terraces, foot-slopes and valleybottoms. To assign membership values, however, I used a Bayesian network, instead of the fuzzy-logic approach used by Schmidt and Hewitt (2004). The follow section describes the 35 Bayesian network of continuous landform elements, which was created for this study. During this description, and during subsequent descriptions of Bayesian networks, node names will be displayed in bold and node classes will be displayed with an underline. Using the Schmidt and Hewitt (2004) approach, six GIS input layers were required to produce the continuous-landform-elements layer: tangential curvature (Dikau 1989), profile curvature (Dikau 1989), maximum and minimum curvature (Wood 1996), slope gradient and terrain context. I generated each of the four types of curvature, plus slope gradient, using Wood's (1996) multi-scale geomorphic characterization ofDEMs. I created terrain context, which describes whether a raster cell was a valley or hill, using the TOPHAT function of Rodriguez et al. (2002). Using the logic of Schmidt and Hewitt (see Figure 3 in Schmidt & Hewitt 2004), I used combinations of tangential curvature (TangentialCurve) and profile curvature (ProfileCurve) to assign sloping areas into one of nine local form elements: shoulder (Shoulder), shoulder slope (ShoulderSlope), hollow shoulder (HollowShoulder), spur (Spur), planar spur (PlanarSpur), hollow (Hollow), spur foot (SpurFoot), foot slope (FootSlope) and hollow foot (HollowFoot). This combination of tangential curvature and profile curvature was represented in the Landform BN by the SlopingShape node, which was conditional on the TangentialCurve node and the ProfileCurve node (Figure 5). Again using the logic of Schmidt and Hewitt (see Figure 3 in Schmidt & Hewitt 2004), I used combinations of maximum curvature (MaxCurve) and minimum curvature (MinCurve) to assign flat areas into one of six local form elements: peak (Peak), ridge (Ridge), plain (Plain), saddle (Saddle), channel (Channel) and pit (Pit). Figure 5 depicts this relationship, with the FlatShape node being conditional on the MaxCurve and the MinCurve nodes. 36 PronteCurve TangenttaiCurve Convex 100 i ! i straight 0 i I i 0 ~ ; ; i \ Convex Straight Concave SlopingShape Shoulder ShoulderS lope HolloWShoulder Spur PlanarSiope Hollow SpurFoot FootSiope HollowFoot 100 0 0 0 0 0 0 0 0 ,I 100 0 0 i i i i i i ; i 100 0 0 SlopeGradientDegrees Flat Sloping ! ~ TerralnContext Hill 0 ;' ; ' HIIISiope 100 ' Valley 0 j ' ~ MaxCurve Convex straight Concave ~I i i ! FormEiement 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Shoulder ShoulderSiope HolloWShoulder Spur PlanarSiope Hollow SpurFoot FootSiope HollowFoot Peak Ridge Plain Saddle Channel PH I ' / MlnCurve Convex 100 i ! ! Straight 0 i ! I Concave 0 i ! i ! i i i i ! "' i I ' FlatShape Peak Ridge Plain Saddle Channel Pit I 100 ' 0 0 0 0 ! 0 LandformEiement Ridge Shoulder BackSiope Hollow Spur Terrace FootSiope ValleyBollom 0 25.0 0 0 75.0 0 0 0 ... ' I i i i ~ l ;' l ; i Figure 5 - The Bayesian network of continuous landform elements, which was developed in Netica version 2.17, was developed to predict the spatial arrangement of landform elements. In this diagram, the boxes represent nodes and the arrows represent the relationship between nodes. The input nodes, which represent GIS database layers and are not related to a parent node, were TangentialCurve, ProfileCurve, MaxCurve, MinCurve, SlopeGradientDegrees and Terrain Context. The output node, which is not related to a child node, was LandformEiement. The remaining nodes, which were SlopingShape, FlatShape, and FormElement, were intermediate nodes. The conditional probability tables for the intermediate nodes and output nodes were defined using expert opinion and the logic of Schmidt and Hewitt (2004). For each node, the node name is provided at the top of the box with the node classes listed on the left. The thick horizontal black bars and proceeding number indicate the probability, or belief, that a class occurs given the parentnode conditions. For example, given that the form-element node (FormElement) was 100% Shoulder and the terrain-context node (TerrainContext) was Hillslope, there was 25% probability that the landform element (LandformEiement) was Shoulder and 75% probability that it was Spur. ! 1 37 Once each raster cell was assigned to either a sloping-form element or a flat-form element through the FormElement node, I assigned each cell a landform-element probability (LandformElement) based on terrain context (TerrainContext) and FormElement. The result was that each cell had a probability ofbelonging to each of the eight landform elements: Ridge, Shoulder, Backslope, Hollow, Spur, Terrace, Footslope and Valley Bottom. To produce these fuzzy landform-element probabilities I incorporated some uncertainty into the LandformElement node's conditional probabilities (Table 3). In Table 3 and Figure 5, for example, if a cell had a Shoulder FormElement and fell on a Hillslope in TerrainContext, then the cell was considered to have a 75% probability ofbeing called a Spur LandformElement and a 25% probability of being called a Shoulder. To capture the uncertainty in defining flat, slope or curved, I used the semanticimport model ofBurrough (1989) for each of the input nodes. The semantic-import models allowed the BN inputs to reflect the class uncertainty, or fuzzy membership, resulting from either the continuous nature of the input or the varying semantic definitions of the input classes. Table 4 lists the thresholds I used to convert the continuous input values into soft classes. I heuristically determined these thresholds by visually assessing the spatial arrangement of each class relative to terrain. I ran the continuous-landform-element BN three times to capture the multiple spatial scales at which a semantically-defined landform element can exist. Each time I used a different search-window radius for curvature and slope gradient: 49 m, 74 m and 99 m. During all three runs, the search-window radius for the 38 Table 3- The conditional-probability table used to define the relationship between landform elements, terrain context and form elements. This table determines the probability of occurrence of each landform-element class given the input conditions of terrain context and form element. For example, if the TerrainContext was Hill and FormElement was Shoulder then there was a 60% probability that LandformElement node was Ridge and a 40% that it was Shoulder TerrainContext FormElement Hill LandformElements Ridge Shoulder BackSlot;2e Hollow St;2Ur Terrace FootSlot;2e ValleyBottom Shoulder 0.6 0.4 0 0 0 0 0 0 Hill ShoulderSlot;2e 0 1 0 0 0 0 0 0 Hill HollowShoulder 0 0.6 0 0.4 0 0 0 0 Hill St;2ur 0.6 0.2 0 0 0.2 0 0 0 Hill PlanarSlot;2e 0 0 1 0 0 0 0 0 Hill Hollow 0 0 0 1 0 0 0 0 Hill St;2urFoot 0.6 0 0 0 0.4 0 0 0 Hill FootSlot;2e 0 0 1 0 0 0 0 0 Hill HollowFoot 0 0 0.4 0.6 0 0 0 0 Hill Peak 1 0 0 0 0 0 0 0 Hill Ridge 1 0 0 0 0 0 0 0 Hill Plain 1 0 0 0 0 0 0 0 Hill Saddle 0.5 0 0 0.5 0 0 0 0 Hill Channel 0 0 0 1 0 0 0 0 Hill Pit 0 0 0 1 0 0 0 0 HillSlot;2e Shoulder 0 0.25 0 0 0.75 0 0 0 HillSlot;2e ShoulderSlot;2e 0 0.25 0.75 0 0 0 0 0 HillSlot;2e HollowShoulder 0 0.25 0 0.75 0 0 0 0 HillSlot;2e St;2ur 0 0 0 0 1 0 0 0 HillSlot;2e PlanarSlot;2e 0 0 1 0 0 0 0 0 HillSlot;2e Hollow 0 0 0 1 0 0 0 0 HillSlot;2e St;2urFoot 0 0 0 0 0.75 0 0.25 HillSlot;2e 0 FootSlot;2e 0 0 0.75 0 0 0 0.25 0 HillSlot;2e HollowFoot 0 0 0 0.75 0 0 0.25 0 Hil1Slot;2e Peak 0.25 0 0 0 0.75 0 0 0 Hil1Slot;2e Ridge 0.25 0 0 0 0.75 0 0 Hil1Slot;2e 0 Plain 0 0 0 0 0 1 0 Hil1Slot;2e 0 Saddle 0 0 0 0.5 0 0.5 0 HillSlot;2e 0 Channel 0 0 0 1 0 0 0 Hi11Slot;2e 0 Pit 0 0 0 0.5 0 0.5 Valley 0 0 Shoulder 0 0 0 0 0.75 0 Valley 0.25 0 ShoulderSlot;2e 0 0 0 0 0.5 0 Valley 0.5 0 HollowShoulder 0 0 0 0.35 0.35 0 Valley 0.3 St;2Ur 0 0 0 0 0 0.5 Valley 0 0.5 PlanarSlot;2e 0 0 0 0.5 0 0 0 Valley 0.5 Hollow 0 0 0 0 0.35 0 Valley 0 0.65 St;2urFoot 0 0 0 0 0 0.25 Valley 0 0.75 FootSlot;2e 0 0 0 0 0 0 Valley 0 1 HollowFoot 0 0 0 0 0.25 0 Valley 0 Peak 0.75 0 0 0 0 0 0.1 Valley 0 0 Ridge 0.9 0 0 0 0 0.1 Valley 0 0 Plain 0.9 0 0 0 0 0 Valley 0 0 Saddle 1 0 0 0 0.1 0.1 Valley 0 0 Channel 0.8 0 0 0 0 Valley 0 0 Pit 0 0 1 0 0 0 0 0 0 1 39 terrain context node remained fixed at 1875 m. My choice of search-window radii, which attempted to capture the scale that site series were influenced by local terrain geometry, was based on the advice and opinion of the ecological-domain expert. I created the final layer of continuous landform elements by taking the arithmetic mean from the output of the three continuous-landform-element runs. Table 4 - The heuristic rule-base used to convert input values into fuzzy landform attributes. For example, slope gradients less than 4° had full membership to the Flat class. Membership to the Flat class decreased linearly to zero between 4° and 6° slope-gradient. Above 6° slope-gradient there was no membership to the Flat class. Node Class Curvature (Tangential, Profile, Minimum and Maximum) SlopeGradientDegree TerrainContext Threshold Full Membership No Membership Concave < -0.002 > -0.0004 Straight > -0.0004 and < 0.0004 < -0.002 and > 0.002 Convex > 0.002 < 0.0004 Flat <4 >6 Sloping >6 <4 Valley < -0.6 > -0.3 > -0.3 and < 0 < -0.6 and > 0.4 > 0.4 <0 Hillslope or Plain Hill I combined all of the separate input GIS layers into a single GIS database using a raster data model. The raster-cell dimensions were matched to the SELES raster dimensions, which were 25 x 25 m. I converted the vector data sets ofbedrock, stream density, forest-stand height, forest-stand age, forest-stand crown-closure and forest-stand species to raster using a majority rule. With a majority rule, the value of a raster cell reflects the vector polygon that occupies the largest portion of the cell. Table 5 summarizes the GIS database's contents along with comments regarding each layer's accuracy as it may pertain towards predicting site series. Figure 6 illustrates the varying spatial scales of the GIS input layers. 40 Table 5 - A list of the data layers contained within the GIS database, along with comments regarding each layer's ability to assist site series prediction. GIS Layer Confidence in the GIS layer's spatial and thematic content with respect to the needs of site series prediction Ridge High for sharp ridges Hill Top Medium- identifies some but not all Gully High River Floodplains Low Lake/Wetland Benches Low Slope Gradient High Bedrock Low Stream Density Medium Ocean Wind Medium Shoreline High Toe Slope Low SELES Floodplains Medium SELES Alluvial Fans Low SELES Soil Moisture Medium in high relief areas, low in low relief areas Continuous Landform Elements Medium Forest Tree Age High (but low for old age trees) Forest Tree Height High Forest Tree Crown Closure High Forest Tree Species High 41 Landform Elements 0 Ridge - Shoulder - - BackSiope Hollow Spur - - Terrace - FootSiope 0 ValleyBottom - SELES Soil Moisture Dry - Mesic - Wet Very Wet 0 Outside the study area c=:J acean Forest Crown Closure 0-5% - - 46-55% 6-15% - ~ 16-25% - 66-75% 25-35% - 76-85% 36-45% 66-95% - 9 6 -100% 0 Ocean 0 0 5 Limestone Volcanics Bedrock Geology 0 Rich Mineralogy - Poor Mineralogy c:J Intermediate Mineralogy 0 Outside the study area 10 Kilometres Figure 6- Four examples of the GIS data layers used to create the database used in this study. This diagram highlights the different spatial scales that occurred within the GIS database. For example, the landform-elements layer has a large-spatial scale and the bedrockgeology layer has a small-spatial scale. 42 contents along with comments regarding each layer's accuracy as it may pertain towards predicting site series. Figure 6 illustrates the varying spatial scales of the GIS input layers. Slope-aspect is considered an important ecological factor in relation to site series in some area of the province, however, the ecological domain expert did not consider slopeaspect ecologically important within the study area. The high latitude and the presence of heavy cloud cover for most of the year effectively removes the influence that sun-incidence angle can have on soil moisture within the study area. 2.4 KNOWLEDGE BASE CONSTRUCTION Similar to the traditional approach of predictive site-series mapping (RIC 1999), I first stratified the study area using the BC Ministry of Forest's medium-scale subzone/variant mapping of the biogeoclimatic ecosystem classification. I constructed a separate site-series knowledge base for each of the three resulting strata (CWHvrn2, CWHvml and CWHvh2) using the advice and knowledge oflocal ecological-domain expert, Del Meidinger. A review of the input GIS database contents revealed that, based on the ecologicaldomain expert's opinion, the spatial and thematic accuracy of the GIS database was not sufficient to accurately map all of the site-series units. The solution was to group similar site .series into site-series groups. Although not desirable, site-series grouping is not uncommon in predictive site-series mapping (Timberline Forest Inventory Consultants Ltd. 2000; Jones 2003; Jones 2004). Grouping site series reduced the total number of mapped units from 42 down to 23, and resulted in six site-series groups for CWHvrn2 (Table 6), seven site-series groups for CWHvml (Table 7), and 10 site-series groups for CWHvh2 (Table 8). 43 Table 6- Site-series groups used for the CWHvm2 BEC unit. CWHvm2 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers. Original Site Series 01 - HwBa- Blueberry Final Site-Series Group 01 06- HwBa - Deer fern 02 - HwPl - Cladina 03 03 - HwCw - Salal 04 - CwHw- Sword fern 05 - BaCw- Foamflower 05 08- BaSa- Devil's club 09- CwYc- Goldthread (Bog forest) 09 10 - Pl - Sphagnum (Bog woodland) 10 U - CwSs - Skunk cabbage (Swamp forest) 11 44 Table 7 - Site-series groups used for the CWHvrn1 BEC unit. CWHvrn1 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers. Original Site Series 01 - HwBa- Blueberry 06- HwBa- Deer fern 02 - HwPl - Cladina 03 - HwCw - Salal Final Site-Series Group 01 03 04 - CwHw - Sword fern 05 - BaCw - Foamflower 05 08 - BaS a - Devil ' s club 09 - Ss - Salmonberry (high fluvial bench) 10 - Act- Red-osier dogwood (middle fluvial bench) 09 11 - Act - Willow (low fluvial benches) 12- CwYc - Go1dthread (bog forest) 12 13 - P1 - Sphagnum (bog woodland) 13 14- CwSs - Skunk cabbage (swamp forest) 14 45 Table 8 - Site-series groups used for the CWHvh2 BEC unit. CWHvh2 site-series units of Meidinger and Pojar (1991) were grouped into simpler units due to the low thematic and spatial accuracy of some GIS layers. Original Site Series Final Site Series-Group 01 - CwHw - Salal 01 02- PlYc - Rhacomitrium 03 - CwYc - Salal 04 - HwSs - Lanky moss 03 04 05 - CwSs - Sword fern 06 - CwSs - Foamflower 07 07 - CwSs- Devil's club 08 - Ss - Lily-of-the-valley (high fluvial benches) 09 - Ss - Trisetum (middle fluvial benches) 08 10 - Dr - Lily-of-the-valley (low fluvial benches) 11- CwYc- Goldthread (bog forest) 11 12 - PlYc- Sphagnum (bog woodland) 12 13 - CwSs- Skunk cabbage (swamp forest) 13 14 - Ss - Salal (Rock headlands and beach plains) 15 - Ss - Kindbergia (old beach plains) 16- Ss- Reedgrass (rocky headlands, colluvium, and old dunes) 14 17- Ss - Swordfem (marine terraces and scarps) 18- Ss- Slough sedge (strongly fluctuating brackish water table) 19 - Ss- Pacific crab apple (sites affected by brackish water) 18 46 Predicting soil moisture and soil nutrient regimes My first step in creating the BN knowledge bases was to predict soil-moisture and soil-nutrient regimes using the variables in the GIS database. I developed a preliminary influence diagram using the ecological-domain expert' s opinion on the relationships between soil-moisture regime, soil-nutrient regime and the layers in the GIS database (Figure 7 and Figure 8). To turn the influence diagram into a BN, I added classes to each of the variables using the ecological-domain expert's advice. I also merged the two influence diagrams into one BN and allowed the soil-moisture and soil-nutrient sub-models to share common variables (Figure 9). I populated each of the conditional probability tables using the ecological-domain expert's advice and supplemented with information published in the site identification field guide (Banner et al. 1993). In the soil-moisture-regime sub-model (Figure 7) I used two environmental factors that, based on the advice of the ecological-domain expert, were believed to be related to soilmoisture regime: terrain shape, and surface-visible waters. I estimated the component of soil moisture that was related to terrain shape through the Ground Wetness intermediate node, while the component of soil moisture that was related to surface-visible water was estimated through the Surface Wetness intermediate node. To predict soil wetness related to terrain shape (Ground Wetness), I used a combination of variables that described terrain shape or position (Landform Elements, Hill, Ridge, SELES Floodplain, Lake/Wetland Bench, River Bench, Toe Slope and Gully) and SELES soil moisture (Flow Accumulation Wetness) . I divided the landscape into areas that 47 Figure 7 - The influence diagram created to explain the relationship between the GIS database used in this study and soil-moisture regime. Like the description of the landform-elements Bayesian-network in Figure 5, the input nodes represent the GIS data layers and the intermediate nodes help by simplfying the relationship between input nodes and output nodes by grouping related nodes. Arrows represent the relationships between nodes. 48 Figure 8 -The influence diagram created to explain the relationship between the GIS database used in this study and soil-nutrient regime. Like the description of the landform-elements Bayesian-network in Figure 5, the input nodes represent the GIS data layers and the intermediate nodes help by simplfying the relationship between input nodes and output nodes by grouping related nodes. LakeWetland Bench' 49 I 3.63 6.81 23 .8 23.4 27.9 0.26 13.2 1.0 ! i i / : ' ~[ ',, ' " "' ~ ~~ fliS , GWSDry GWS Mesic GWSWet GWSVe "'"' c ~ 1 "' i_ Soil Moisture SM Dry 27.2 SM Mesic 54.1 SMWet 17.7 SMVeryWet 0.98 ! II 2.18e•003 ± 1 2e•004 Flow Accumuletion Wetnes SELES Dry 18.0 SELES Mesic 57.1 SELE S Wet 20.6 SELE S VeryWet 4.30 Bench BenchTerrace 0.34 BenchHigh 0.30 Benchlow 0.58 BenchNot 98.8 \ 0.16 99.8 LekeWetlend Bench LW Bench LW BenchNot SELES Fen 1.22 98.8 ~ " ' SELE S Fan SELES FanNot Ground Wetness "' LowSiopePos MS Low 30 .3 MS LowBuf 17.3 LowNot 52.3 Terrein Position Wetness GW ury GW Mesic GW W et 31 .5 : GW Verj!(__et 0.15 i ' ' ', ' ',, ',,, ' ',, ',, "'"' ~~~~~ "'"' "'"' "'"'"' "'"' "' ' ' i I SELES Toe 2.80 97.2 I 0.35 0.35 0.56 98.7 GlaciaiFioodplain HighFioodplain LowerFioodplain NotFioodplein Floodplein ... l i 1 --..... ! i I' . S 1to1 0 S 11to35 S 36to60 S 61to80 S 81plus so t 0.28 2.68 18.1 34.4 22.5 220 RPoor R Medium RRich R VeryRich i I l 45.7F. f 37.5 ' 16.7 i 0.14 Soil Richness ! I J SP High SP Medium SP Low / 'j 21 .8a i 33.7 .'_ 44.5 : A251plus 73.6 ~ Age Cless AO 0.21 A 1to20 1.62 A 21to40 2.68 A 41 to60 0.71 A 61to80 1.22 A 81to1 00 0.33 A 101to120 0.83 A 121to140 0.36 A 141to250 18.5 ~ Age Group A Mature AMid A Young CC0to5 CC6to 15 CC16to25 CC26to35 CC 36to45 CC 46to55 CC 56to65 CC66to75 CC76to85 CC86to95 CC 96to1 00 0• 0• Crown aosure Cless , i i . 40 3 m - f1 52.9 : 6.81 j Height Group Height Cless HO 0.22 H 1to1 0 13.5 H 11to19 26.5 H 20to28 34.2 H 29to37 18.8 H 38to46 5.69 H 47to55 1.04 H 56to64 .084 H 65plus 0• H Short H Medium HTall Site Produdivity Bedrock Group BRRich 992 BR Poor 0.82 1 Bedrock BR Limestone 43.6 BR Volcanics 25.0 BRRich 30.6 BRintermed 0.82 BRPoor 0• Richness with Fens RS Poor 45 .5 i RS Medium 37 .5 ' I RS Rich 16.8 j RS VeryRich 0.22 I Slope Cless Soil Nutrient SN Poor SN Medium SN Rich SNVeryRich Seepege with Flow Accum 38.6 SS SeepSome 58 .6 · SS SeepHigh 2.84 -J SS SeepNo 55.9 41 .4 2.71 Seepage ', , Surfece Wetness 75.0 12.4 8.16 4.48 S SeepNo S SeepSome S Seep High ', """ ', ',, ',, Streem Density SO Omha SO 1to20mha SO 21 to40mha SO 41 plusmha 0.32 0.17 0.30 99.2 River Bench Riv Bench Terrace Riv Bench High Riv Bench low Riv BenchNot Figure 9 - The Bayesian network used to define the relationships between the GIS data layers used in this study, and soil-moisture and soil-nutrient regimes within the CWHvm2 BEC unit. This network was created by combining the soil-moisture regime and soil-nutrient regime influence diagrams in Figure 7 and Figure 8. To tum the influence diagram into a Bayesian network, each node was given a set of classes and conditional probabilities. A description of the node format and nomenclature can be found in Figure 5. Hill HiiiBuf HiiiNot Hill 0.54 0.75 98.7 Ridge Shoulder BackSiope Hollow Spur Terrace FootSiope Valley8ottom Lendform Elements SELES Floodplein SELES NotFioodplein 97.1 SELES LowerFioodplein 1.30 SELES HigherFioodplein 0.80 SELES UpperFioodplein 0.80 1.28 ± 6.6 50 I considered water shedding (UpSlopePos - upland areas where water is moving out quicker than being replenished), water accumulating (LowSlopePos - lowland areas where water is moving out slower than being replenished), benches adjacent to water bodies (Bench), and areas between upland and lowland (Mid Flat Landform). For each combination of upperslope position, mid-slope position, lower-slope position and bench, I assigned a conditional probability of belonging to each class in the Terrain Position Wetness intermediate node. To produce Ground Wetness, I created a conditional-probability table ,which took into account the Terrain Position Wetness intermediate node and the SELES' soil moisture input node (Flow Accumulation Wetness). Water visible on the earth's surface, such as streams and rivers, was the other environmental factor that the ecological-domain expert and I believed reflected soil moisture. I estimated surface-visible soil moisture (Surface Wetness) using local stream density combined with slope gradient, where higher stream density was assumed to signify greater soil moisture. · When comparing areas of equal stream density, however, areas of high slope would be drier than areas of low slope. To estimate soil-nutrient regime (Figure 8) I used two environmental factors that, based on the ecological-domain expert's opinion,, were believed to influence soil-nutrient regime. These environmental factors were: soil seepage a ~ with Flow Accum) and soil richness (Soil Richness). Seepage was used as an influence on soil richness because it was believed that increased seepage contributes to increased soil richness. I derived the seepage intermediate node (Seepage with Flow Accum) from lowland, bench, high stream-density and high flow-accumulation areas because, based on the advice of 51 the ecological-domain expert, high soil-moisture levels affect seepage rates. I used two stages to estimate seepage rates: first seepage (Seepage) was estimated from the intermediate nodes representing low-terrain position (LowSiopePos), fluvial benches (Bench) and surface-visible soil-moisture (Surface Wetness). This estimate of seepage was subsequently modified by the SELES estimate of soil moisture (Flow Accumulation Wetness) to produce a final seepage estimate (Seepage with Flow Accum). The other influence on soil-nutrient regime was soil richness. Based on the opinion of the ecological-domain expert, soil richness was related to four environmental variables: alluvial fans, slope gradient, underlying bedrock and forest. I created an initial soil-richness intermediate node (Soil Richness) using slope (Slope Group), bedrock geology (Bedrock Group) and forest productivity (Site Productivity). The ecological-domain expert defined the conditional probabilities for this initial soil-richness node. Based on the ecologicaldomain expert's opinion, forest productivity was given far greater influence on the conditional probabilities of soil richness than bedrock geology or slope. Slope was considered to have only a slight influence on soil richness, and bedrock geology was mapped at such a coarse scale that confidence in its predictions was low. Generally, areas of high forest productivity were considered by the ecological-domain expert to have high soil richness, and areas of low forest productivity were considered to have low soil richness. Forest productivity was seen by the ecological-domain expert as an important effect of soil richness. Unlike any of the previous intermediate nodes, however, which relied on terrain variables, the estimate of forest productivity (Site Productivity) relied solely on the MSRM VRI forest vegetation inventory. Using classes of forest-stand age (Age Group), height (Height Group) and crown closure (Crown Closure Group), the ecological-domain . 52 expert assigned each class combination a probability of being high (SP High), medium (SP Medium) or low productivity (SP Low). The ecological-domain expert's belief was that forested areas with short, old-aged trees with open crown-closure indicated areas of low productivity, while areas with medium or old-aged tall trees with closed crowns indicated areas of high productivity. Once the initial soil-richness node (Soil Richness) was defined, I added the influence of alluvial fans (Fans) on soil richness to produce a final soil-richness node (Richness with Fans). Although occurring infrequently across the landscape, alluvial fans were seen by the ecological-domain expert as an important indicator of rich soils. To reflect this belief, I created the conditional-probability table for Richness with Fans with increased probability of rich soils occurring on sites with alluvial fans. Once I built theBN knowledge base to predict soil-moisture and soil-nutrient regimes, I adjusted the conditional-probability tables to reflect the unique attributes of each BEC subzone/variant. In the CWHvml and CWHvm2 BEC units, I considered the soil transmissivity to be uniform across the landscape. In the CWHvh2, however, many of the areas of low slope-gradient hold moisture due to their low soil-transmissivity. To reflect this difference in soil transmissivity between the hypermaritime BEC subzone/variants and the maritime BEC subzone/variants, I adjusted the implementation ofthe SINMAP algorithm in SELES to increase the influence of low slope gradients on soil moisture. This adjustment allowed the SELES flow-accumulation GIS layers to reflect the ecological-domain expert's belief that low slope-gradients within the CWHvh2 BEC unit were much wetter than low slopes in the either the CWHvml or CWHvm2 BEC units. 53 In addition to the changes to the soil-moisture algorithm in the CWHvh2, I also adjusted two of the nodes required to predict soil nutrient in the CWHvh2. I gave the Crown Closure Group and the Site Productivity intermediate nodes an additional class to distinguish sites with very sparse crown closure. Adding a node to reflect the edatopic grid Once I developed a BN knowledge base for each BEC subzone/variant to reflect the ecological-domain expert's initial beliefs around soil moisture and soil nutrient, I used the domain expert's advice and the edatopic grids (Figure 10) to combine soil-moisture and soilnutrient regimes into an initial site-series-group prediction (Site Series Edatopic) (Figure 11). Although I used soil-moisture and soil-nutrient regimes to place each location in the appropriate general area on the edatopic grid, the ecological-domain expert recognized that other variables were required to adjust site-series prediction. Adjustments were necessary either because of inaccuracies in the soil-moisture and soil-nutrient sections of the model, or because more than one site-series group could occur at a given edatopic-grid position. Based on advice from the ecological-domain expert I used such variables as vegetation, slope, floodplain and shoreline to refine the site-series prediction. For each BEC subzone/variant, the BN knowledge base required different adjustments to the post-edatopic site-series prediction depending on the unique environmental conditions of each BEC subzone/variant. 54 Soil Nutrient Regime Soil Nutrient Regime medium poor I 03 dry very ric h ric h poor medium I ICWHvm21 dry 03 ric h very rich ICWHvml l Q) E C) Q) 0::: - mesic a.. :I ~ 0 :!: mesic 01 Q) 01 05wa 0509wet 12 09 0 en very wa 10 11 poor dry medium 1 03 14 very wet rich 13 14,09 very rich I CWHvh21 Q) .5 C) Q) 0::: - ~ mesic I 11,14 ~ :I ~ 0 wa :!: 07,08, 14 1114 0 en very wa 12, 18 13,08,11 Figure 10 - Edatopic grids and corresponding site-series groups used for the CWHvml, CWhvm2 and CWHvh2 Bayesian networks. Some areas of the edatopic grids were comprised of only one site-series group. For example, only site-series-group 10 in the CWHvm2 was comprised of poor soil-nutrient and very wet soil-moisture. Some areas of the edatopic grids, however, were comprised of more than one site-series group. For example, siteseries groups 12 and 18 in the CWHvh2 were both comprised of poor soil nutrient and very wet soil moisture. 55 Soil Moisture SM Dry 26.2 SM Me sic 52.4 SM Wet 20.2 SM VeryWet 1.19 Soil Nutrient SN Poor 38.4 SN Medium 38.7 SN Rich 20.0 SN VeryRich 2.87 Site Series Edatopic SSE 01 45 2 SSE 03 23. 5 SSE 05 22.9 SSE 09 7.24 SSE 10 0.43 SSE 11 0.67 Figure 11 - The portion of the CWHvm2 Bayesian-network, which illustrates the relationship between site-series group (Site Series Edatopic), soil-moisture regime (Soil Moisture) and soil-nutrient regime (Soil Nutrient). A description of the node format and nomenclature can be found in Figure 5. Adding nodes to adjust the estimation of site-series group The BN knowledge base for the CWHvm2 required the fewest post-edatopic adjustments to site-series group. Based on the ecological-domain expert' s advice, I incorporated slope gradient (Slope group2) (Figure 12) and forest vegetation (Site Productivity and Species Influence) (Figure 13) directly into the site-series-group prediction. Slope gradient was used to express the ecological-domain expert's belief that the 05 site-series group occurred on sites with rich soil and moderate to high slope gradient, and the 11 site-series group occurred on sites with rich soil and little to no slope gradient. In addition, slope was used to differentiate site-series groups with poor/medium soils, such as 03 that occurred on sites with high slope gradient, from 09 and 10 that occurred on low slope gradients. The influence of forest vegetation on the prediction of site-series group was 56 ":.... " - Site Series Edotopic SSE01 SSE 03 SSE 05 SSE 09 SSE10 SSE11 • '• 45 .2 23.5 22 .9 7.2 4 I 0.43 0.67 ~ Slope Group2 SG2 Oto10 SG2 11to35 SG2 36plu s / 2. 96 18.1 78.9 •i // Site Series Slope Modifier sss 01 sss 03 sss 05 sss 09 SSS 10 sss 11 47 .0 23.5 ~ 22 .4 5.65. 0.28 1 16 ~ • I Figure 12 - The portion of the Bayesian network for the CWHvrn2, which illustrates the influence of slope (Slope Group2) on the previous estimation of site-series-group (Site Series Edatopic) to produce a new prediction of site-series group (Site Series Slope Modifier). A description of the node format and nomenclature can be found in Figure 5. Site Series Slope Modifier sss 01 sss 03 SSS 05 sss 09 SSS10 sss 11 47.0 ~ ~ 22.4 . 5.65. 0.28 1.16 i 1 i Site Productivity SPHigh SP Medium SP Low 21.8 ~ ~ 1: • _.-- 31 7 ~ 1 , 44 .5 . . 1 i i Species Influence Poplar LodgepoleOid SpruceOid Site Series Veg Modifier I - - - - YellowCedar SSV 01 413 .... ~ Nolnfluence 03 25.5 - · SSV 05 20.7 • SSV 09 7.98 1 SSV 10 0.67 SSV 11 1 77 ssv 14.3 ~ 13.3 ~ 113 ~ 14.3 .. 44.7 ~ Figure 13 - The portion of the Bayesian network for the CWHvrn2, which illustrates the influence of forest-stand productivity (Site Productivity) and forest-stand species (Species Influence) on the previous estimation of site-series group (Site Series Slope Modifier) to produce a final estimation of site-series group (Site Series Veg Modifier) for the CWHvrn2. A description of the node format and nomenclature can be found in Figure 5. 57 more complex. In general, the presence of old and low-productivity lodgepole pine (Pinus contorta var. contorta) was believed by the ecological-domain expert to often occur on 03, 09 and 10 site-series groups. Sites with highly productive lodgepole pine were more often on 01 site-series group. Sites that were highly productive and contained Sitka spruce (Picea sitchensis) were more likely to be 05 site-series group, and the low-productivity sites with Sitka spruce were more likely to be 11. The presence of highly productive yellow-cedar usually signified 01 site-series group. The BN knowledge base for the CWHvm1 incorporated similar beliefs as the CWHvm2 with respect to the probability that certain site-series groups were associated with certain slope gradients and forest vegetation. Tree species was used similarly in the CWHvm1 as in the CWHvm2, but, based on the ecological-domain expert's opinion, I added amabilis fir, poplar (Populus balsamifera) and red alder (Alnus rubra) to the list of vegetation species. It was the ecological-domain expert's opinion that both poplar and red alder helped to predict the occurrence of 09 site-series group. The 09 site-series group occurred on similar slopes and soil-moisture/nutrient regimes as the 14 site-series group, but was associated with fluvial benches that were commonly vegetated with mature red alder and poplar. In the BN knowledge base ofthe CWHvm1, I incorporated river floodplains (Floodplain) directly into the estimation of site-series group to further refine the prediction of site-series group 09 (Figure 14). I gave low floodplains (LowerFloodplain) a high probability ofbeing 09 site-series group, high floodplains (HighFloodplain) a medium probability ofbeing 09, and glacial floodplains (GlacialFloodplain) a low probability of being 09. 58 • \\ Site Series Veg Modifier 34.3 ~ SSV 01 SSV 03 17.5 ~ SSV 05 19.8 ~ SSV 09 19.0 ~ SSV 12 5.26 SSV 13 0.98 SSV 14 3.13 Floodplain GlaciaiFioodpl... 1.03 HighFioodplain 0.82 LowerFioodplain 0.38 NotFioodplain 97.8 ""' ~ / r-- -· Site Series Fluvial Modifier 34.3 SSF01 SSF03 17.5 SSF05 19.6 19.2 SSF09 5.26 SSF12 SSF13 0.98 SSF14 3.09 : • Figure 14 - The portion of the Bayesian network for the CWHvml , which illustrates the influence of floodplains (Floodplain) on the previous estimate of site-series group (Site Series Veg Modifier) to produce a final estimate of site-series group (Site Series Fluvial Modifier) for the CWHvml. A description of the node format and nomenclature can be found in Figure 5. I developed the BN knowledge base of the CWHvh2 to incorporate slope gradient, vegetation and floodplains variables directly into the site-series-group prediction. I defined conditional-probability tables for these relationships using the ecological-domain expert's opinion. I also incorporated shoreline effects directly into the prediction of site-series group (Figure 15). I converted to site-series group 14, all site-series groups occurring in areas affected by the ocean wind input layer (Salt Spray Shoreline). In addition, based on the ecological-domain expert's advice and the CWHvh2 edatopic grid (Figure 10), I increased the probability that a site was site-series group 18 if it occurred on an existing very wet site (site-series groups 12 and 13) in a gully adjacent to the ocean (Gully Adjacent to Shoreline). 59 Low Area Adjacent to Shor ... LowShoreNot LowShore 98.5 1.50 Site Series Fluvial Modifier Gully Adjacent to Shoreline ShoreCreekNot 99 .9 ShoreC reek .06 1 Salt Spray Shoreline SS F 01 SS F 03 SS F 0<1 SS F 07 SSF0 8 SSF11 SSF1 2 SSF1 3 18.5 10.7 1<1 .2 11 .6 .055 2<1.<1 770 12.8 Wi ndShoreN ot 95.9 4.06 W indS hore Site Series Shore Mod ... SSS h 01 SSSh 03 SSS h 04 SSS h 07 SSS h DB SSS h 11 SSS h 12 SSSh 13 SSS h 14 SSSh 18 17.8 10.3 13.7 11.1 .052 23. 4 ! - 7.39 12.3 4.06 .014 Figure 15 - The Bayesian network for the CWHvh2, which illustrates the influence of ocean wind (Salt Spray Shoreline) and ocean estuaries (Gully Adjacent to Shore) on the previous estimation of site-series group (Site Series Fluvial Modifier) to produce a final estimation of site-series group (Site Series Shore Modifier) for the CWHvh2. A description of the node format and nomenclature can be found in Figure 5. Adjusting the knowledge base conditional probabilities I populated each input node with an unconditional-probability distribution, which represented the proportion that each input class occurred within the corresponding BEC subzone/variant. These unconditional probabilities aided model adjustment because the posterior distributions generated by each of the intermediate nodes could then reflect the relative abundance of each class within the BEC subzone/variant (see the "probability bars" of each node in Figures 7, 8, 9, 10, 11 , 12, and 13 ). If the posterior distribution of an intermediate node did not reflect the relative distribution of the classes within the study area then this signalled the possible need to revise the node's conditional-probability table. 60 For example, in Figure 9, which contains the unconditional probabilities for the CWHvm2 BEC unit, 0.28% of the area has slopes ofO% (S 0), 2.68% of the area has slopes from 1% to 10% (S lto10), 18.1% of the area has slopes from 11% to 35% (S 11to35), 34.4% of the area has slope from 36% to 60% (S 36to60), 22.5% of the area has slopes from 61% to 80% (S 61to80), and 22% of the area has slopes greater than 80% (S 81plus). Based on the unconditional probabilities of Slope Class and the conditional probabilities of Slope Group, the posterior probabilities of the Slope Group classes were 12% Gentle slope and 88% Very Steep slope. If the posterior-probability values for Slope Group did not reflect expert opinion then I would adjust the conditional-probability values for Slope Group in consultation with ecological-domain expert. Although the posterior probabilities that resulted from the BEC subzone/variant's unconditional-probability tables reflected the relative abundance of each class within the BEC subzone/variant, they did not portray the spatial arrangement of each class. To assess the spatial distribution of the intermediate- and leaf-nodes' posterior-probabilities, I routinely generated maps using a sample of the GIS database as input. The ecological-domain expert and I visually interpreted the maps generated from intermediate- and leaf-nodes' posteriorprobabilities to assess how well the knowledge bases were predicting the spatial distribution of the intermediate- and leaf-nodes' classes. If the spatial arrangement of a node's classes did not reflect the ecological-domain expert's expectation then I revised the conditional probabilities using the advice of the ecological-domain expert. 61 Testing the knowledge base I tested the spatial output from the knowledge bases against spatially explicit reference data once the knowledge base appeared to produce predictions that spatially matched the expectation of the ecological-domain expert. To test the knowledge bases of the CWHvml and CWHvm2, I randomly selected 68 raster cells from an area in which I had stereo air-photo coverage. For each randomly located point, I identified the corresponding location on the air photos. Using air-photo interpretation, the ecological-domain expert estimated the site-series group occurring at each randomly selected point. For the CWHvh2. I created 32 randomly distributed points, but I augmented these with 59 site-series field plots gathered for a previous project conducted in the CWHvh2 (Banner eta/. 2004). These field plots occurred in two spatially distinct clusters, with each cluster employing a grid-sampling plan ranging from 100 m to 200 m spacing. By comparing the knowledge base estimation of site-series group with the air-photointerpreted and field-gathered predictions it was possible to assess how well the knowledge base predicted site-series group. I conducted the comparison between the map's site-series group and the site-series group interpreted from air-photos using a simple hard classification. For each sample point, I chose the site-series group with the highest probability from the map and the dominant site-series group from the photo. To rate knowledge base performance I simply tallied the number of times that the highest-probability site-series-group prediction matched the dominant air-photo interpretation. If the knowledge base correctly predicted site-series group greater than 65% of the time then I made no additional adjustments to the knowledge base. If the testing results were low, however, I did make additional adjustments to the conditional-probability table. I based these additional adjustments on both evidence 62 resulting from the test analysis and further visual analysis. All of the conditional-probability adjustments I made were reviewed by the ecological-domain expert prior to further map generation. I repeated this process of adjusting the conditional-probability values, making visual assessments, and testing against reference data until the adjustments, believed to be required to increase accuracy, were no longer resulting in increased prediction success. Visual assessment of the preliminary results from the knowledge bases for the CWHvm2 and CWHvm1 revealed that the conditional-probability values required only minor adjustment. After the first round of testing using the testing reference data, the CWHvm2 knowledge base required no adjustments to the conditional-probability tables, however, the CWHvm 1 required some adjustments to increase the successful prediction of the 09 site-series group. Final testing of knowledge bases for the CWHvm2 and CHWvm1 indicated that the CWHvm2 knowledge base achieved a 92% successful prediction rate for the 14 air-photo reference points, while the CWHvm1 knowledge base correctly predicted 72% of the 54 airphoto reference points. The CWHvh2 knowledge base, however, required extensive adjustments due to either visual assessment or the results generated from the testing procedure. In total the CWHvh2 was tested and adjusted three times, but never achieved the 65% successful prediction rate. At best, the CWHvh2 knowledge base was successful in predicting site-series group on 59% of the test sites. At this point, I used all three BN knowledge bases to generate a final siteseries-group map for each of the three BEC subzone/variants in the study area. 63 2.5 FINAL MAP GENERATION I created a soft-classification map of site-series group from the input GIS database by passing each individual cell in the input database through the appropriate CWHvh2, CWHvm1 and CWHvm2 knowledge base. The resulting map of fuzzy site-series groups recorded the probability of site-series-group occurrence within each raster cell. I created a hard classification of site-series group by choosing the site-series group with the highest probability within each raster cell. 2.6 MAP ACCURACY ASSESSMENT To assess map accuracy, I compared reference data, which were comprised of site- series-group information gathered from georeferenced field observations, to geographically equivalent areas on the map of predicted site-series group. In total, I used 286 reference points for accuracy assessment: 92 in the CWHvm2, 161 in the CWHvm1, and 33 in the CWHvh2. These field observations were gathered for the North Coast PEM Project (Jones 2003) using a cluster-sampling plan. For each point in the reference data, I compared the site-series group in the reference · data to the site-series-group prediction at the same location on the map. I used a simple hard classification for this comparison. For each point in the reference data, I chose the dominant site-series group, and for each point on the map, I chose the site-series group with the highest probability. I constructed a contingency table from these map/reference-data comparisons for each BEC subzone/variant. From these contingency tables, I generated several accuracyassessment measures: percentage correct, producer's accuracy, user's accuracy and Kappa 64 analysis. I used an a= 0.05 level of significance throughout these test for inferring statistical significance. In conjunction with each contingency matrix, I constructed a separate matrix that recorded the average posterior-probability from the site-series group with the highest probability. The content of this matrix revealed the knowledge base's confidence when predicting the most-probable site-series group. For example, if the contingency matrix records that five samples were 01 in the reference data and most likely 01 in the map, an average posterior-probability value of 0. 7 for this combination would indicate that the knowledge base was 70% confident that these locations were 01. 65 3 RESULTS 3.1 KNOWLEDGE BASE STRUCTURE The Bayesian-network knowledge base for the CWHvm2 was composed of 46 nodes, 58 links and 2179 conditional probabilities (Figure 16). Of the 46 nodes in the network, 17 nodes were input nodes, which represented the layers in the GIS database, and one node was an output node, which represented the final estimation of site-series group (Site Series V eg Modifier). Based on the input nodes' unconditional probability values for the entire study area, the posterior probabilities for the final estimation of site-series groups -01 (SSV 01 ), 03 (SSV 03), -05 (SSV 05), -09 (SSV 09), -10 (SSV 10) and -11 (SSV 11), were 46.4%, 25.9%, 20.8%, 4.79%, 0.5% and 1.7%, respectively. The Bayesian-network knowledge base for the CWHvm1 was composed of 47 nodes, 60 links and 2759 conditional probabilities (Figure 17). Of the 47 nodes in the network, 17 nodes were input nodes, which represented the layers in the GIS database, and one node was an output node, which represented the final estimation of site-series group (Site Series Fluvial Modifier). Based on the input nodes' unconditional-probability values for the entire study area, the posterior probabilities for the final estimation of site-series groups -01 (SSF 01), -03 (SSF 03), -05 (SSF 05), -09 (SSF 09), -12 (SSF 12), -13 (SSF 13) and -14 (SSF 14), were 42.6%, 16.3%, 25%, 6%, 5.5%, 0.4% and 4.2%, respectively. The Bayesian-network knowledge base for the CWHvh2 was composed of 50 nodes, 64links and 3725 conditionalprobabilities (Figure 18). Of the 50 nodes in the network, 19 nodes were input nodes, which represented the layers in the GIS database, and one node was 66 Landform Elements GNDry GWMesic GWWet GWVeryWet ~ ""'""' GWSDry GWSMesic GWSWet GWSVeryWet ~ " ~ S 61to80 S 81plus ~~ S 11to35 18.1 S 36to60 34.4 ~ Nolnfluence Poplar LodgepoleOid SpruceOid YeUowCedar 72.7 7.48 19.8 .007 .014 0+ Bedrock BR Umestone BR Volcenics BR Rich BR lntenned BR Poor Specie• Influence "'"' l \i "s 1to10 2.68~~tI! ••'~ S Spruce S AmabMis S YellowCedar S Alder SOther .007 .015 8.01 1.11 19.8 0.24 70.8 Species ' Crown Cloaure Claaa S Lodgepole S Poplar HO H 1to10 H 11to19 H 2Dto28 H 29to37 H38to46 H 47to55 Figure 16- The complete Bayesian network developed to predict site-series group in the CWHvm2 BEC unit of this study. A description of the node format and nomenclature can be found in Figure 5. Ridge Shoulder BackSiope Hollow Spur Terrece FootSiope VelleySottom 67 Landform Elements - ~ GWVeryWet GWWet GWMe1ic ~ GWSVeryWet GWSDry GWSMesK: GWSWet """" " ---. I SSF01 SSF09 SSF12 SSF 13 SSF14 SSFOS SSF03 ~ " ~ S 81plus S 11to35 S 3Sto60 S 61to80 so S 1to10 Nolnftuence 76.-' .063 11 .9 2.\Jo& 7.68 1.91 .032 Species Influence LodgepoleOkt l SpruceOid AmabilisOid YellowCedar AklerOid Poplar 56.1 15.7 26.5 1.76 0+ Bedrock BR Umestone BR Volcanics BR Ri<:h BR lntermed BR Poor Figure 17- The complete Bayesian network developed to predict site-series group in the CWHvml BEC unit of this study. A description of the node format and nomenclature can be found in Figure 5. Terrace FootSiope ValleyBottom Spur Hollow Ridge Shoulder BackSiope ----------------------------------------------------------------------------- 68 L.ndform Elements -;-... ,, -· - GNSVeryWet GWSDrt GWSMesic GWSWet GWWet GWVeryWet GWO.., GWMeaic ~ """""" AkterOid Nolnflu6nee 1.12 70.G Species lnAuence lodgepoleOid 8.82 SpruceOid 5.68 l..-----1 AmabilisOkl 0.75 YellowCedltt" 13.1 Bedrock SAider S Other SPopl• S lodgepole S ~c SAmabilil S YeHowCeder Age Cia.. Height Class HO 1.85 H1to10 11 .1 Figure 18 - The complete Bayesian network developed to predict site-series group in the CWHvh2 BEC unit of this study. A description of the node format and nomenclature can be found in Figure 5. ValleyBoUom Ridge Shoulder Back.Siope Hollow Spur Terrace FootSiope 69 an output node, which represented the final estimation of site-series group (Site Series Shore Modifier). Based on the input nodes' unconditional-probability values for the entire study area, the posterior probabilities for site-series groups -01 (SSSh 01), -03 (SSSh 03), -04 (SSSh 04), - 07 (SSSh 07), -08 (SSSh 08), -11 (SSSh 11), -12 (SSSh 12), -13 (SSSh 13),14 (SSSh 14) and -18 (SSSh 18), were 20.2%, 10.6%, 14%, 8.5%, 0.04%, 28%, 9.3%, 5.2%, 4.1% and 0.01 %, respectively. 3.2 PREDICTIVE ECOSYSTEM MAPS Figure 19 ilhistrates the spatial arrangement of those site-series groups with the highest probability value within each raster cell. The spatial arrangement reveals the general location of each of the predicted site-series-groups and their abundance within the landscape. Figure 20 and Figure 21 provide detailed views of a section of the study area, with the intention of demonstrating the relative terrain position of the predicted site series groups. Within the CWHvm2, 01 was the most abundant site-series group and situated midslope. The second most abundant site-series group was 03, which was situated at higher elevations within large valleys. The third most abundant site-series group was 05, which was placed close to rivers and streams. The 09 and 10 site-series groups occurred only in small isolated areas at lower elevations within the valleys. Within the CWHvm1, like the CWHvm2, 01 was the most abundant site-series group, followed by 03 and 05. With respect to the relative spatial position of these three leading site-series groups, 01 site-series group was placed on mid-slopes, 03 was placed at higher 70 Site Series Group - 01 O o3 - 1 \. c._ --< } ,I 05 \ 09 10 \ 11 • Final CWHvm1 PEM Site Series Group - 01 O o3 - 05 - 09 12 - 13 14 Final CWHvh2 PEM Site Series Group - 01 O o3 - 04 - 07 - 08 - D l> ( 4 11 \ 12 J 13 ( 14 \ J 18 BEC SubzoneNariant Boundaries D I 5 10 Kilometres Figure 19 - An overview map of predicted site-series groups in all three BEC subzone/variants in the study area. This map shows the relative position and abundance of each of the predicted site-series groups within each of the three BEC subzone/variants. The red "viewpoint" boxes indicate the spatial extent of the perspective view in Figure 20 and Figure 21. 71 - 01 - 05 -- 110 - 01 - 05 -- 1134 12 09 72 Figure 20 - A perspective view illustrating the relative terrain position of the fina l site-series-group predictions for the CWHvm2 and CWHvml BEC units. The wet site-series groups, such as 05 , appear to be correctly placed within the gullies and toe slopes, and the dry siteseries groups, such as 03, appear to be correctly placed on ridges and shoulders. In the figure, the CWHvm2 occurs above the orange line, which separates the CHWvml from the CWHvm2. Figure 19 shows the location of this perspective view within the study area. - D o3 Site Series Group Final CWHvm1 PEM 09 D o3 Site Series Group Final CWHvm2 PEM - 01 -- 0047 -- 018 -- 1123 - 18 73 Figure 21 - A perspective view illustrating the relative terrain position of the final site-series-group predictions for the CWHvh2 BEC unit. The wet site-series groups, such as 07 and 14, appear to be correctly placed in the gullies and toe slopes, and the dry site-series groups, such as 03 , appear to be correctly placed on ridges and shoulders. Figure 19 shows the spatial extent of this perspective view within the study area. 14 D o3 Site Series Group Final CWHvh2 PEM elevations within the large valleys, and 05 was placed close to small rivers. The 09 siteseries group was situated along the bottom of larger valleys while 13 and 14 appeared higher up the valley sides than 09. Within the CWHvh2, the low relief areas were defined by the dominance of siteseries-groups 11, 12 and 13, while the 01, 03, 04 and 07 site-series-groups dominated the mountainous areas. Site-series-group 14, which was related to offshore winds, occurred along the western coastline of the study area. 3.3 MAP ACCURACY ASSESSMENT The map of predicted site..;series group for the CWHvm2 obtained an overall percentage correct of 47.8%, with a 95% confidence interval of± 9.8%. Producer's accuracy varied from 0% to 65.6% and user's accuracy varied from 0% to 59.1% (Table 9). The Kappa analysis produced a KHAT of0.249 with a variance of0.006, producing a Z-score of 3.28 (p = 0.001). Average posterior probabilities for the predicted site-series groups ranged from 0% to 55% (Table 10). For those map-accuracy points where there was a match between the site-series-group value in the reference data and in the map, the average posterior probabilities ranged from 0% to 59.4%. The map of predicted site-series group for the CWHvm1 obtained an overall percentage correct of50.3%, with a 95% confidence interval of± 7.4%. Producer's accuracy varied from 0% to 73.8% and user's accuracy varied from 0% to 82.8% (Table 11). The Kappa analysis produced a KHAT of0.37 with a variance of0.022, producing a Z-score of 2.51 (p = 0.012). Average posterior probabilities for the predicted site-series groups ranged from 0% to 53.7% (Table 12). For those map-accuracy points where there was a match 74 ~ Table 9 - The accuracy-assessment contingency-matrix for the CWHvm2, which includes percentage correct, user' s and producer' s accuracy. Map Site Series Group Reference Site Series Group User's Total Accuracy 01 03 05 09 10 11 01 21 4 18 I 1 2 47 44.7% 03 5 13 1 0 3 0 22 59.1 % 05 6 3 10 0 0 3 22 45.4% 09 0 0 0 0 0 0 0 0.0% 10 0 0 0 0 0 0 0 0.0% 11 0 0 0 1 0 0 1 0.0% Total 32 20 29 2 4 5 92 Producer's Accuracy 65.6% 65 .0% 34.5% 0.0% 0.0% 0.0% Percentage Correct 47.8% 1 Table 10 - The matrix of average posterior probabilities for the CWHvm2, which includes the average posterior probabilities for each mapped site-series group. Map Site Series Group Reference Site Series Group Average 01 03 05 09 12 14 01 0.484 0.511 0.500 0.560 0.468 0.459 0.493 03 0.534 0.594 0.516 0 0.400 0 0.550 05 0.493 0.456 0.436 0 0 0.466 0.458 09 0 0 0 0 0 0 0.000 10 0 0 0 0 0 0 0.000 14 0 0 0 0.290 0 0 0.290 75 Table 11 - The accuracy-assessment contingency-matrix for the CWHvml , which includes percentage correct, user's and producer's accuracy. Map Site Series Group Reference Site Series Group Total User's Accuracy 01 03 05 09 12 13 14 01 31 12 15 2 8 0 4 72 43.1 % 03 6 10 0 0 1 4 0 21 47.6% 05 4 0 24 1 0 0 0 29 82.8% 09 1 0 7 9 0 0 7 24 37.5% 12 0 0 0 0 0 1 0 1 0.0% 13 0 0 0 0 0 0 0 0 0.0% 14 0 0 1 3 3 0 7 14 50.0% Total 42 22 47 15 12 5 18 161 0.0% 0.0% 38.9% Producer's Accuracy 73.8% 45.4% 51.1 % 60.0% Percentage Correct 50.3 % Table 12 - The matrix of average posterior probabilities for the CWHvm1, which includes the average posterior probabilities for each mapped site-series group. Map Site Series Group Reference Site Series Group Average 01 03 05 09 12 13 14 01 0.435 0.441 0.423 0.368 0.420 0 0.346 0.425 03 0.469 0.548 0 0 0.820 0.543 0 0.537 05 0.488 0 0.423 0.380 0 0 0 0.431 09 0.371 0 0.506 0.451 0 0 0.474 0.470 12 0 0 0 0 0 0.408 0 0.408 13 0 0 0 0 0 0 0 0 14 0 0 0.426 0.306 0.365 0 0.355 0.352 76 from 0% to 53.7% (Table 12). For those map-accuracy points where there was a match between the site-series-group value in the reference data and in the map, the average posterior probabilities ranged from 0% to 54.8%. The map of predicted site-series group for the CWHvh2 obtained an overall percentage correct of33.3%, with a 95% confidence interval of± 15.2%. Producer's accuracy varied from 0% to 60% and user's accuracy varied from 0% to 100% (Table 13). The Kappa analysis produced a KHAT of 0.199 with a variance of 0.294, producing a Zscore of0.366 (p = 0.714). Average posterior probabilities for the predicted site-series groups ranged from 0% to 57.6% (Table 14). For those map-accuracy points where there was a match between the reference data and the map site-series-group values, the average posterior probabilities ranged from 0% to 54.6%. 77 0 2 0 0 0 0 5 08 11 12 13 14 18 Total 5 6 0 0 0 0 0 0 0 0 4 0 0 0 0 0 7 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 1 0 0 0 0 0 12 5 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 14 2 0 2 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 18 33.3% 0 07 1 2 0 0 1l Percentage Correct 0 04 0 1 0 1 1 2 08 60.0% 20.0% 16.7% 85.7% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0 03 07 04 03 Reference Site Series Group Producer's Accuracy 3 01 01 Map Site Series Group 33 0 0 0 1 5 0 12 3 1 11 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 50.0% 33.3 % 100.0% 27.3% User's Total Accuracy Table 13 - The accuracy-assessment contingency-matrix for the CWHvh2, which includes percentage correct, user's and producer's accuracy. 78 0.380 0 0 0 0 0.522 0 0 0 0 03 04 07 08 11 12 13 14 18 01 01 Map Site Series Group 0 0 0 0 0 0 0 0 0 0 0.600 0.318 0 0.396 04 0 0 0 0.351 0.374 0.385 03 0 0 0 0 0 0 0.546 0 0 0.362 07 0 0 0 0 0 0 0 0 0 0 08 0 0 0 0 0 0 0 0 0 0.393 11 0 0 0 0 0.490 0 0 0 0 0 12 Reference Site Series Group 0 0 0 0.388 0.556 0 0.619 0 0 0 13 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 18 0 0 0 0.388 0.529 0 0.576 0.340 0.374 0.385 Average 79 Table 14 - The matrix of average posterior probabilities for the CWHvh2, which includes the average posterior probabilities for each mapped sitesenes group. 4 DISCUSSION One of the objectives of this project was to develop an approach to map large-scale ecological units that incorporated expert knowledge. It was anticipated that this objective would be met by using Bayesian networks as the knowledge base of an expert-based system. Bayesian networks also held promise of being a good way to allow an ecological-domain expert to build a model that could capture prediction uncertainty when describing the relationships between environmental variables and site series. Unfortunately, none t~ maps of site-series group produced by this study correctly predicted the spatial locations of even 65% of the known site-series groups. Results from the accuracy assessment procedures performed within the CWHvm2 unit indicated that the predictive-ecosystem map was only fairly representing the spatial distribution of site-series group. The predictive ecosystem map for the CWHvm2 obtained an overall percentage correct of 47.8%, with a 95% confidence interval of± 9.8. This indicated that the map performed fairly well, but fell short of the target of 65% established by Meidinger (2003). This result, however, was higher than the result obtained by the North Coast PEM Project, which used a belief-matrix knowledge base. The North Coast PEM Project, although utilizing slightly different site-series groupings, achieved an overall percentage correct of 21% in the CWHvm2 (Meidinger 2004). According to the contingency matrix for the CWHvm2 (Table 9), the map successfully identified the 01 and 03 site-series groups, with producer's accuracies of 65.6% and 65% respectively. The map, however, performed poorly at identifying the remaining site-series groups, with 05 being successfully predicted only 34.5% ofthe time and 09, 10 80 and 11 not predicted at all relative to the reference data. Results of the CWHvrn2 user's accuracy assessment revealed that several of the site-series groups were assigned to the wrong unit. A large number ofthe samples identified as 01 on the map were identified as 05 in the reference data. This was similar for the 05, where the map falsely predicted the 01 site-series groups as 05. These errors of commission were confirmed by the low user's accuracy values (Table 9). The map was also unable to identify any ofthe 09 or 10 units within the reference data, and unable to correctly identify any 11 units within the reference data. Although the KHAT value of 0.249 indicated that the map poorly predicted site-series group, the KHAT variance was 0.006. This meant that, although the map was unable to meet the 65% overall percent correct, the map was significantly better than chance agreement. Evaluation of the average posterior probabilities generated by the BN for each of the site-series groups mapped in the CWHvrn2 (Table 10) revealed that each site-series group was predicted with differing levels of confidence. The 03 site-series group was predicted with the highest confidence, with an average posterior probability of 55%, but none of the other site-series groups were predicted with high confidence. The 01 and 05 site-series groups were mapped with confidence levels of 49.3% and 45.8% respectively, while 14 was predicted with the lowest confidence: 29.2%. Site-series-groups 09 and 10 were not predicted at all in relation to the reference data, resulting in 0% confidence for both. Results from the accuracy assessment procedures for the CWHvml indicated that the predictive ecosystem map only partially represented the spatial distribution of site-series group (Table 11 ). The overall percentage correct of 50.3%, with a 95% confidence interval 81 of± 7 .4, indicated that the map performed fairly well at predicting site-series group, but, as with the CWHvm2, fell short ofthe target of 65%. This result, however, was similar to the result obtained by the North Coast PEM Project, which achieved an overall percentage correct of 50% for the CWHvm1 using similar site-series groups (Meidinger 2004). According to the contingency matrix for the CWHvm1 (Table 11 ), the map of siteseries group did successfully identify the 01, with a high producer's accuracy of73.8%. The map, however, achieved only moderate success when predicting the 09, 05 and 03 sites-series groups, with producer's accuracy results of 60%, 51.1% and 45.4% respectively. The map poorly predicted the 14 site-series group, with only 38.9% producer's accuracy, and was unable to predict any of the 12 or 13 site-series groups. Results of the user's accuracy assessment revealed that several of the site-series groups were assigned to the wrong unit. The user's accuracy results for both 03 and 14 were moderate, with approximately half the mapped sites actually representing the reference data. Although the producer's accuracy for 01 was high, a large number of the accuracy assessment points identified as 01 on the map were identified as 05 or 03 in the reference data, resulting in a low user's accuracy value for the 01 sites-series group. The same was true for the 09, where the map falsely predicted the 05 and 14 site-series groups to be 09. The map was also unable to identify any 13 site-series groups within the reference data, and unable to correctly identify any 12 site-series groups within the reference data. The KHAT value of0.37 suggests that the map poorly/fairly predicted site series group. However, a KHAT variance of 0.022 meant that, like the site-series-group map for the CWHvm2, the map was significantly better than chance agreement, although the map was unable to meet the 65% overall percent correct. 82 The average posterior probabilities generated by the BN were roughly equivalent for each of the site-series groups that were mapped in the CWHvm1 (Table 12), although none of the site-series groups were predicted with high confidence. Like the CWHvm2, the 03 site-series group was predicted with the highest confidence, with an average posteriorprobability of 53.7%. The remaining site-series groups, with the exception of 13, were predicted with moderately low confidence, with values ranging from 35.2% to 47%. Siteseries-group 13 was not predicted at all with respect to the reference data, resulting in 0% confidence. An overall percentage-correct value of33.3%, with a 95% confidence interval of± 15.2%, indicated that the CWHvh2 map poorly predicted site-series group, and fell well short of the target of 65%. Again, this result was similar to the result obtained by the North Coast PEM Project, which achieved an overall percentage correct of 38% in the CWHvh2 using similar site-series groups (Meidinger 2004). The map of site-series group for the CWHvh2 performed well when identifying 07 and moderately well when identifying 01, with producer's accuracies of 85.7% and 60% respectively. The map, however, performed poorly when identifying the remaining siteseries groups. The 03 and 04 site-series groups obtained producer's accuracies of only 20% and 16. 7%, respectively, and site-series-groups 11, 12 and 13 were never.identified correctly with respect to the reference data. The 08, 14 and 18 site-series groups could not generate producer's accuracy values greater than zero because these site-series groups were not present in the reference data. The user's accuracy for the 07 site-series group was fair, with approximately half the mapped 07 sites actually representing 07 in the reference data (Table 13). Several of the other site-series groups, however, were assigned to the incorrect unit. 83 Although the producer's accuracy for 01 was high, a large number of the accuracyassessment points identified as 01 on the map were identified as 11 in the reference data. The 04, 11, 12 and 13 site-series groups were overestimated on the map, which resulted in low user's accuracy. The user's accuracy results for site-series groups 08, 14 and 18 were understandably zero because they were not present in the reference data. The KHAT value of 0.199 indicated that the map poorly predicted site-series group. In addition, due to the low number of reference-data points per site-series group, the comparison between map and reference data produced a high KHAT variance value: 0.294. This meant that not only was the site-series-group map not able to meet the 65% overall percent correct target but the map was not significantly better than chance agreement. The average posterior probabilities for the site-series groups in the CWHvh2 (Table 14) can roughly be broken into three categories: those site-series groups that were not predicted at all, those that were predicted with low confidence, and those that were predicted with moderate confidence. Site-series-groups, 08, 13, 14 and 18, were predicted with 0% confidence because they were never predicted to be the dominant site-series group at the reference-data locations. Site-series-groups, 01, 03, 04 and 12, were all predicted to be the dominant site-series group with a low level of confidence, with confidence values ranging from 34% to 38.8%. The 07 and 11 site-series groups were predicted to be the dominant siteseries group with moderate confidence, with confidence values of 57.6% and 52.9%, respectively. During the accuracy assessment procedures, the georeferenced reference-data were considered correct in both spatial accuracy and thematic content. The impact of this assumption was that, because accuracy-assessment results were relative to the reference data, 84 the accuracy-assessment results indicated how well the generated map represented the reference data, not how well the map represented what was actually on the ground. The spatial juxtaposition of the predicted-site-series groups for the CWHvm2 and CWHvm1, however, were similar to the landscape profiles presented in the field guide of Banner eta/. (1993). The knowledge bases for the CWHvm2 and CWHvm1 were considered to be performing well at predicting the most common site-series groups, based on the ecologicaldomain expert's visual interpretation of relative position and abundance of each site-series group within the final ecosystem maps (Figure 19) and the perspective view (Figure 20). The relative terrain positions of the predicted site-series groups in the CWHvh2 generally match the landscape profiles that were presented in the Banner eta/. (1993) field guide for site-interpretation. Based on ecological-domain expert's visual interpretation of relative position and abundance of each site-series group within the final ecosystem maps (Figure 19) and the perspective view (Figure 21), the CWHvh2 knowledge base appeared to perform fairly at predicting the most common site-series groups. The 07 site-series group occurred at lower elevations on moderate slopes while the 03 occurred at higher elevations. Site-series-groups 01 and 04 were between the 07 and 03, with 04 tending towards steeper slopes than 01. Site-series-groups 11 and 12 were found on low-slope areas with 12 being found closer to water bodies than 11. Observation that the 03 site-series group was being placed in the upland areas within complexes of 11 and 12, however, was disconcerting. According to the ecological-domain expert, these areas of03 within 12/13 should have a higher probability of being 01 rather than 03. This indicated that the knowledge base was underestimating the soil moisture of these upland areas within larger lowland complexes. 85 Comparison of the row totals to the column totals in all three contingency matrices revealed that 01 site-series group was overestimated in all three BEC subzone/variants. In the CWHvm2 and CWHvm1 maps, this overestimation was to the detriment of the second most common site-series group- 05 . While there was overestimation of the 01 on the map, there was also underestimation of some site-series groups. Generally, those site-series groups being underestimated and incorrectly mapped had very wet soil moisture regime and occurred infrequently in the reference data. The underestimation ofthese very wet site-series groups may in part be due to the scale of the GIS data. If these very wet units occurred in small isolated areas then the input GIS data may not have contained adequate spatial resolution to detect the environmental conditions required to delineate these units. For example, site-series-group 13 in the CWHvh2 usually covers no more than a couple of hectares and is commonly associated with depressions with low forest productivity. Although the minimum spacing between TRIM elevation points was 25m, the average was roughly 75 m, making it difficult to detect small one- or two-hectare depressions. In addition, forest-cover polygons had an average polygon area of 12 ha in the study area and did not delineate small one- or two-hectare features. A common technique used in past PEM projects was to indicate that some small-unmapped sites were present within some larger ecological units (Jones 2003). For example, units mapped as site-series 01 in the CWHvh2 of Jones (2003) actually contained approximately 90%-01 and 10%-11. Evaluation of the off-diagonal elements in all three contingency matrices revealed that generally, errors occurred as a result of a cell being incorrectly assigned to the adjacent unit in the edatopic grid. For example, 01 site-series group, which was adjacent to 03, 05 and 09 groups in the edatopic grid of CWHvm2, was commonly misidentified as either 03 or 05 86 site-series group. Although this was an encouraging sign that the knowledge base was producing results close to the correct location on the edatopic grid, the level of misclassification was considered high. The solution employed by previous predictivemapping projects that map site-series was to improve classification accuracy by grouping site-series units into larger site-series groups (Jones 2003). A further grouping of site-series groups was not considered a reasonable solution in the current project because the knowledge bases already incorporated groups of site series. In the CWHvh2, the original 01 and the 04 site series ofBanner et al. (1993) occupy adjacent areas on the edatopic grid and exist within similar environmental conditions. During development of the CWHvh2 knowledge base, it was recognised that the GIS database may not be detailed enough to separate the 01 site series from the 04. Although these two similar site series were not grouped during CWHvh2 model development, they could have been grouped within the Bayesian network through the creation of an alternate output node. In retrospect, it may have been best to create the knowledge bases with the intent to predict site series, not grouped site-series. As a last step in the Bayesian network development, an alternate output node could have been added, which grouped together those ecologically similar site series that the knowledge base was having difficulty differentiating. This alternate approach could also ease future knowledge base updating. For example, with the current knowledge bases, which group site-series from the onset, if new data become available that can differentiate previously grouped site-series then many of the conditional probability tables involving site-series group would need to be altered. Ifthe Bayesian knowledge base was created with the intention to predict site series then the final site-series87 grouping node would be the only node requiring an alteration to the number of site-series classes. Although the BN knowledge bases were constructed solely from expert opinion, the results of the accuracy assessment procedures were lower than anticipated, especially, given the moderately-high agreement between the CWHvm2 and CWHvml testing-data set and the test maps. This disparity between the results of knowledge base testing and the accuracy assessment results could be partially explained by the different methodologies used to collect the two sets of reference data. The reference data used for accuracy assessment was comprised of site-series observations collected through ground surveys, while the set of reference data used for testing was collected using air-photo interpretation. Although both sets of data were gathered using sampling plans that intended to capture the proportional occurrence of each site series within the study area, the testing set contained a higher proportion of points in the more common site-series groups than did the accuracy assessment set. This increase in the more common site-series groups in the set of testing reference data may have contributed to the high percent-correct scores during knowledge base testing. Disagreement between the results ofknowledge base testing and the results of map-accuracy assessment could have been reduced using the same sampling plan to gather the reference data for knowledge base testing and accuracy assessment of the map. A review of the matrices that recorded the average posterior-probabilities for each of the site-series groups revealed that none of the most likely site-series-groups were predicted with high confidence. In all three BEC subzone/variants, the average confidence values for the most-likely site-series group were around 0.4 or 0.5. These values indicated that rarely was there a clear majority in the soft-classification results at each location. Although 88 disappointingly low, these confidence values reflected how difficult it was to determine a location's site-series group, given the information provided in the input GIS database. The ecological-domain expert often indicated that conditional-probability values should be lowered to compensate for a lack of confidence in either the input GIS database or subsequent intermediate nodes. Thus, lowering the conditional-probability values leads to a reduction in prediction confidence. The accuracy assessment results and confidence values revealed the knowledge base and GIS database performed poorly together when making an estimate of site-series group. The overall percentage-correct results, which were produced using the site-series group with the highest probability at each location, indicated that the dominant site-series group was correctly identified by the knowledge base at less-than 50% of the reference data locations. The low posterior-probability values further highlighted the poor performance of the knowledge base. Rarely did the knowledge base confidently predict the correct site-series group from the input ·GIS database. In addition to the low percent-correct and confidence values, the user's accuracy results were poor for almost each site-series group. The methods employed in this project did not produce a large-scale ecosystem classification map that met the needs of forest-related management activities. With low percent-correct results and low prediction confidence, the maps produced by this study would not be suitable for forest management activities that rely on a minimum 65% correct estimation of either site-series or the site-series groups created in this study. For example, the resulting maps would not be suitable for timber supply analysis, identifying the location of rare ecosystems, or silviculture prescriptions. 89 As with all expert systems, the results and knowledge base were closely linked to the opinion of the expert. This could make it difficult to replicate the results without using the same experts involved in the initial model development. It also makes it difficult to assess the model's sensitivities, especially if the model is not explicitly and transparently presenting the assumptions and decisions made by the domain expert. The Bayesian-networks developed for this study, however, explicitly and transparently captured and presented the assumptions and decisions made by the ecological-domain expert. The chances of successfully replicating the process of estimating site-series group from the GIS input layers are greatly improved by the Bayesian network's transparency. The BN was successful at capturing and expressing ecological knowledge with respect to the relationships between environmental variables and site-series group. The opportunity to create intermediate nodes, which were ecologically meaningful to the ecological-domain expert, allowed the ecological-domain expert to focus on only a few ecological interactions at a time. The task of generating conditional probabilities for intermediate nodes that were derived from either multiple parent nodes or parent nodes with multiple classes, however, was sometimes daunting. For example, the number of conditional probabilities required for a node with two classes and two parent nodes, each with two classes, equals eight (2 x 2 x 2). A node with six classes and two parents, each with four classes, however, required the definition of96 (6 x 4 x 4) conditional probabilities. Although nodes with a large number of conditional probabilities existed within the three Bayesian networks, no nodes had more than three parent nodes. By keeping the number of parent nodes to a minimum, the ecological-domain expert could focus on a few environmental factors at a time. This was an improvement over the belief-matrix method of 90 large-scale ecosystem mapping, where the knowledge base could be comprised of a single table requiring 2000 or more defined weights (e.g., Timberline Forest Inventory Consultants Ltd. 2000; Rosen eta/. 2001; Atticus Resource Consulting Ltd. 2001; Jones 2003; Timberline Forest Inventory Consultants Ltd. 2004). Although the total number of conditional probabilities within the three BN knowledge bases ranged from 2179 to 3725, partitioning the ecological influences into smaller, independent interactions made model parameterization manageable for the ecological-domain expert. One difficulty with regards to the parameterization of the Bayesian network's conditional-probability tables, however, was the requirement to enter conditional probabilities for all combinations of the parent-node classes. This meant that conditional probabilities would be required for environmental conditions that seem mutually exclusive. For example, the Ground Wetness node was dependent on parent nodes that defined terrain position: upper-terrain position (UpSlopePos), mid-terrain position (Mid Flat Landform), lower-terrain position (LowSlopePos), and fluvial benches (Bench). Sometimes it was easy entering the conditional probabilities for Ground Wetness, such as when the input combination was: upper-terrain position (MS Up), not mid-terrain position (MidFlatNot), not lower-terrain position (MS LowNot) and not fluvial bench @enchNot). With this input combination, the Ground Wetness was given 100% probability ofbeing ill:y. It was more difficult, however, to enter conditional-probabilities values for Ground Wetness when the combination of input conditions was: upper-terrain position (MS Up), mid-terrain position (MidFlat), lower-terrain position (MS Low) and high bench (BenchHigh). Each one these input conditions is associated with a different level of ground wetness. As a result, with this combination of input conditions, there was less certainty that wetness could be ascribed to 91 only one Ground Wetness class. To capture this uncertainty, Ground Wetness was given a 20% probability of being wet and an 80% probability of being very wet for this combination of input conditions. In previous projects conducted in BC involving large-scale predictive ecosystem mapping, each GIS database layer could only contain a single value for each location (Ketcheson et al. 2001a; Ketcheson et al. 2001b; Ketcheson et al. 2001c; Jones & McGregor 2002; Jones 2003). The ability of the Bayesian network knowledge base to accept fuzzy classified GIS data inputs was seen as an advantage over previously employed methods for predictive-site-series mapping. The BN was able to ·capture the continuous nature of the Landform Elements input node, which was generated by the continuous-landform-elements BN. The BN was also able to capture the composition of tree species within a forestvegetation stand through the forest-species input node (Species). Other input nodes, such as Age Class, Crown Closure Class and Height Class could be t ~ to use this technique of accepting data as a soft classification rather than a hard classification. In all three of the BN knowledge bases created to estimate site-series group, some input and intermediate nodes were parents to more than one child node. For example, in all three models the Lower Landform intermediate node was used to help define floodplains (Floodplain), toe slopes (Toe), low-terrain position (LowSlopePos) and alluvial fans (Fan). This produced a messy and confusing looking network with links crossing each other and pointing different directions, rather than a simple dendritic structure, which progresses from input to output with all links pointing in the same general direction. The use of input and intermediate nodes for multiple conditional-probability tables, however, reduced variable redundancy within the network. 92 Previous projects of large-scale predictive ecosystem mapping conducted in BC, which employed the belief-matrix knowledge base, were unable to use automated procedures to update the knowledge base to reflect field observations (Jones & McGregor 2002; Jones 2003). Bayesian networks, however, are able to incorporate relationships derived either statistically and from an expert. At the outset of this project, however, it was recognized that model development would rely primarily on expert opinion because the availability of useful reference data was low. Although the study area was partially chosen because of reference-data availability, examination of the reference data during model development revealed that not all site-series groups were adequately sampled. Future projects of large-scale predictive mapping should ensure that an adequate representative sample of reference data be acquired, preferably, specifically for the predictive-mapping project. Additional field gathered reference-data would have proved useful at many stagesduring model development, not necessarily to substitute expert opinion with statistical models, but to aid the expert's decision making process. Field data, which recorded site series and were georeferenced, would have proved useful for both model testing and increasing the number of reference points available for accuracy assessment. For example, the disparity between the results of the knowledge base testing, which ranged from 59% to 92% correct, and the results of the map-accuracy assessment, which ranged from 33% to 48% correct, highlighted the difficulty of accurately interpreting site series from air photos. The results of knowledge base testing may have matched the results of map-accuracy assessment if knowledge base testing was conducted using reference points that were field gathered rather than air photo interpreted. 93 The lack of reference data for map-accuracy assessment hindered adequate mapaccuracy assessment, especially when assessing those site-series groups occurring infrequently on the landscape. Although the CWHvml and CWHvm2 had a large number of reference plots in the common site-series groups, there were a low number of reference points in the uncommon sites. There were very few reference points in each site-series groups of the CWHvh2, due to the low number of CWHvh2 field plots gathered by Jones (2003) that overlapped with the study area. This low number ofCWHvh2 field plots contributed to the high kappa variance. In all three of the developed Bayesian networks, some site-series groups were not present in the reference data, preventing accuracy assessment of these site-series groups. Field data, which recorded the environmental conditions estimated by some or all of the BN nodes, would have allowed additional knowledge base adjustments. It would have been possible to perform some exploratory analysis into the relationships between environmental factors and site-series groups using this kind of reference data. Using such techniques as cluster analysis and regression it may have been possible to see how relevant each node was to the estimation of site-series group, and remove those nodes with little or no significance prior to model construction. Reference data would have possibly been useful during conditional-probability adjustment and construction of the input GIS database. All of the input and intermediate nodes in the Bayesian networks were meant to represent measurable environmental factors (e.g., Soil Moisture and Gully). Although the thematic and spatial accuracy of some input nodes were known (i.e. those nodes derived from the Ministry of Sustainable Resource Management's vegetation resource inventory), this information was unknown both for the 94 GIS modelled input nodes and the intermediate nodes. Thematic and spatial accuracy assessment of these nodes was based solely on visual comparison to other informative data sources, such as air photos, water body/course mapping or digital elevation models. Consequently, little was known about the input or intermediate nodes' ability to accurately portray their respective themes. Reference data could have identified either those input GIS models needing adjustment or the conditional-probability tables needing adjustment. One of the objectives of this project was to create a model that could be updated automatically using field data. This objective was achieved by using Bayesian networks, which allow the conditional probabilities to be automatically adjusted using expectationmaximization algorithm. Unfortunately, this procedure was not possible to conduct without reference data, which recorded the environmental conditions estimated by the intermediate nodes. However, by directing future field-sampling plans to collect those environmental conditions modelled in the knowledge bases, it would be possible to continuously update the conditional probabilities, as new sampling data becomes available. 95 5 CONCLUSION AND RECOMMENDATIONS The overall goal of this project was to develop and test a large-scale ecosystem classification mapping method that met the needs of forest-related management activities in British Columbia. The approach employed was an expert-based system built around a Bayesian-network knowledge base. This approach, however, did not meet the project's overall goal because not all three of the objectives were accomplished successfully. None of the large-scale ecosystem maps produced for the study area's biogeoclimatic variants achieved the accuracy assessment result of 65% correct. The approach, however, did successfully achieve the other two study objectives. Bayesian networks, with their graphical components of nodes and links, offered to the ecological-domain expert an easy to use interface to express the interaction between ecological factors. The graphical nature of Bayesian networks also improved the interpretability of the ecological interactions. The ability to create intermediate, or summary nodes, between the model's inputs and outputs allowed the domain expert to reduce the complexity of the problem and allowed the ecologist to focus on only a few interactions at a time. Bayesian networks have the capacity to be updated and adjusted automatically using data gathered from field observations or air-photo interpretation. Due to a lack of supplementary field data, however, the Bayesian-network knowledge bases developed for each ofthe three BEC subzone/variants within the study area were defined solely using expert opinion. Similar site series were grouped to compensate for the lack of spatial resolution in the input GIS database required to discern individual site series. Even with this 96 site-series grouping, however, the accuracy assessment results indicated that the resulting predictive ecosystem maps poorly predicted the dominant site-series group. Despite the lower than expected results from the.accuracy assessment, the Bayesiannetwork approach could still be considered a viable approach to predict large-scale ecological units given some improvements to the procedure. Future attempts to apply Bayesian networks to predictive mapping of site series should consider applying some of the suggestions summarized below. Changes to each of the modelling stages, GIS database construction, knowledge base construction, and accuracy assessment, could improve site series prediction. To potentially improve the successful prediction of site series, all GIS database layers should be verified using georeferenced points that are gathered through field observations or air-photo interpretation. Performing exploratory statistical analysis on the relationships between the GIS layers and site series could help reduce knowledge base complexity by identifying those GIS layers exhibiting a relationship with site series. Model updates could be easier if the knowledge base was designed to predict site series, with site-series groups added as an alternate output node. To aid knowledge base development there should be a representative georeferenced sample of site-series points. Additionally, a representative georeferenced sample of each of the intermediate nodes would be beneficial. These samples could be used to inform the ecologist of the need to change the conditional-probability tables, or to automatically update the conditional-probability tables through the expectation-maximization algorithm. 97 To ensure that the data set used to test the knowledge base adequately represents the data set used to assess map accuracy, both sets of data should come from the same sampling plan. The data set used to test the knowledge base should be a subset of all the samples, while the data set used to assess map accuracy should be comprised of the remaining samples. Although the focus of the PEM work conducted in BC over the past six years has involved the use ofbeliefmatrices, Bayesian networks can offer a viable and advantageous alternative approach. Further pilot studies incorporating some or all of the recommendations suggested herein should result in large-scale ecological-unit maps that meet or exceed map accuracies obtained through the belief-matrix approach. Exploration of the Bayesiannetwork approach should continue with an overall goal of producing the most accurate largescale predictive-ecosystem maps possible for managing forest-related activities. 98 6 REFERENCES Atkinson, P. M ., and Tatnall, A. R. L. 1997. Neural networks in remote sensing. International Journal of Remote Sensing 18:699-709. Atticus Resource Consulting Ltd. 2001. Predictive Ecosystem Mapping (PEM) Operational Pilot Project for the Upper Wood River. Unpublished report prepared for Wood River Forest Inc., Revelstoke, BC. Bailey, R G. 1989a. Ecoregions of the Continents (map) . U.S . Department of Agriculture, Forest Service, Washinton, DC. Scale 1:30,000,000. Bailey, R. G. 1989b. Explanitory supplement to the ecoregions map of the continents. Environmental Conservation 15:307-309. Banner, A., IePage, P. , Shaw, J. & deGroot, A. J. (compilers and editors). 2004: Pattern, Process and Productivity in Hypermaritime Forests of North Coastal British Columbia: The HyP 3 Project. Background and Six Year Results. Unpublished Draft Report, BC Ministry afForests, Smithers, British Columbia. Banner, A., Mackenzie, W. , Haeussler, S., Thomson, S., Pojar, J. & Trowbridge, R. 1993. A Field Guide to Site Identification and Interpretation for the Prince Rupert Forest Region. Land Management Handbook Number 26. Government of British Columbia, Ministry of Forests, Victoria, BC. Barnes, B. V., Pregitzer, K. S., Spies, K. S., and Spooner, T. A. 1982. Ecological forest site classification. Journal of Forestry 80:493-498. Beckingham, J.D. & Archibald, J. H. 1996. Field Guide to Ecosites ofNorthern Alberta. Special Report 5. Natural Resources Canada, Canadian Forest Service Northwest Region, Northern Forestry Centre, Edmonton; Alberta. Beckingham, J. D., Corns, I. G. W. & Archibald, J. H. 1996a. Field Guide to Ecosites of West-Central Alberta. Special Report 9. Natural Resources Canada, Canadian Forest Service Northwest Region, Northern Forestry Centre, Edmonton, Alberta. Beckingham, J.D., Desilets, M., Nielsen, D., and Johns, F. 1999. Multi-layered and statistically based ecosystem mapping: the de facto standard for land resource planning in the 21st century. Earth Observation Magazine . Beckingham, J.D., Nielsen, D. & Futoransky, V. A. 1996b. Field Guide to Ecosites ofthe Mid-Boreal Ecoregions of Saskatchewan. Special Report 6. Natural Resources Canada, Canadian Forest Service Northwest Region, Northern Forestry Centre, Edmonton, Alberta. 99 Bell, J. C., Grigal, D. F., and Bates, P. C. 2000. A soil-terrain model for estimating spatial patterns of soil organic carbon. Pages 295-310 in Wilson, J., and J. Gallant, editors. Terrain analysis: principles and application. John Wiley and Sons, New York City NY. Biggs, W., Eligh, P., Sims, R. & Wiart, R. 1997. A Business Approach to Terrestrial Ecosystem Mapping (TEM) and TEM Alternatives. Government of British Columbia, Resource Inventory Committee, Victoria, B.C. Burgess, T. M., and Webster, R. 1980. Optimal interpolation and isarithmic mapping of soil properties; the semi-variogram and punctual kriging. Journal of Soil Science 31:315331. Burrough, P. A. 1989. Fuzzy mathematical methods for soil survey and land evaluation. Journal of Soil Science 40:4 77-492. Campbell, R. W., Dawe, N. K., Mctaggart-Cowan, I., Cooper, J. M., Kaiser, G. W., and McNall, M. C. E. 1990. Birds ofBritish Columbia. Volume 1 Nonpasserines: introduction and loons through waterfowl. Government ofBritish Columbia, Victoria, BC Carpenter, G. A., Gjaja, M. N., Gopal, S., and Woodcock, C. E. 1997. ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data. IEEE Transactions on Geoscience and Remote Sensing 35:308-325. Carpenter, G. A., Gopal, S., Macomber, S., Martens, S., Woodcock, C. E., and Franklin, J. 1999. A neural network method for efficient vegetation mapping. Remote Sensing of Environment 70:326-338. Chambers, B. A., Naylor, B. J., Nieppola, J., Merchant, B., and Uhlig, P. 1997. Field Guide to Forest Ecosystems of Central Ontario. Ontario Ministry ofNatural Resources, Toronto, Ontario Ciallela, A. T., Dubayah, R., and Lawrence, W. 1997. Predicting soil drainage class using remotely sensed and digital elevation data. Journal of Soil Science 62:171-178. Cleland, D. T., Avers, P. E., McNab, H., Jensen, M. E., Bailey, R. G., King, T., and Russell, W. 1997. National Hierarchical Framework ofEcological Units. Pages 181-200 in M. S. Boyce, and A. Haney editors. Ecosystem Management Applications for Sustainable Forest and Wildlife Resources; Yale University Press, New Haven, CT. Congalton, R. G., and Green, K. 1999. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. Lewis Publishers, Boca Raton, Florida 100 Congalton, R. G., and Mead, A. 1983. A quantitative method to test for consistency and correctness in photo-interpretation. Photogrammetric Engineering and Remote Sensing 49:69-74. Congalton, R. G., and Story, M. 1986. Accuracy assessment: a user's perspective. Photogrammetric Engineering and Remote Sensing 52:397-399. D'Ercole, C., Groves, D. 1., and Knox-Robinson, C. M. 2000. Using fuzzy logic in a geographic information system environment to enhance conceptually based prospectivity analysis of Mississippi Valley-type mineralisation. Australian Journal of Earth Sciences 47:913-927. Davis, J. R. 1993. Experts systems and environmental modelling. Pages 3-35 in Jakeman, A. J., M. B. Beck, and M. J. McAleer, editors. Modelling change in environmental systems. John Wiley and Sons Ltd., New York NY. Davis, J. R., Hoare, J. R. L., and Nanninga, P.M. 1986. Developing a fires management expert system for Kakadu National Park, Australia. Journal ofEnvironmental Management 22 :215-227. Demarchi, D A. 1995. Ecoregions ofBritish Columbia. 4th Edition. Ministry of Environment Lands and Parks, Wildlife Branch, Victoria, B.C. Dempster, A., Laird, N., and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39:1-38. Devore, J. L. 2000. Probability and Statistics for Engineering and the Sciences., 5th edition. Duxbury, Pacific Grove, CA Dikau, R. 1989. The application of a digital relief model to landform analysis in geomorphology. Pages 51-77 in J. Raper editor. Three Dimensional Applications in Geographic Information Systems; Taylor & Francis, London . .Driscoll, R. S. 1984. An Ecological Land Classification Framework for the United States. Miscellaneous Publication 1439. U.S. Department of Agriculture, Forest Service, Washington DC. Ecological Stratification Working Group 1996. A National Ecological Framework for Canada. Environment Canada and Agriculture and Agri-Food Canada, Hull, Quebec. Eng, M. & Meidinger, D. V. 1999. A method for large-scale biogeoclimatic mapping in British Columbia. Web report at: http://www.for.gov.bc.ca!hre/pubs/pubs/l 032.htm, 101 Fall, A. 2002. SELES Predictive Ecosystem Mapping Support Models. Unpublished. Foody, G. M. 1996. Fuzzy modelling of vegetation from remotely sensed imagery. Ecological Modelling 85:3-12. Foody, G. M. 1999. Applications of the self-organising feature map neural network in community data analysis. Ecological Modelling 120:97-107. Foody, G. M. 2002a. Hard and soft classifications by neural network with a non-exhaustively defined set of classes. International Journal of Remote Sensing 23:3853-3864. Foody, G. M. 2002b. Status of land cover classification accuracy assessment. Remote Sensing of Environment 80:185-201. Freidman, N., Geiger, D., and Goldszmidt, M. 1997. Bayesian network classifiers. Machine Learning 29:131-163. Hailu, G. and Sommer, G. 1999. On amount and quality ofbias in reinforcement learning. Pages 1491-1495 in IEEE International Conference on Systems, Man and Cybernetics (IEEE SMC'99). Tokyo, Japan. Huser, B. 1999. Integrated Monitoring: Final Project Report. Environment Waikato, Hamilton East, New Zealand. Jenny, H. 1941. Factors of Soil Formation. McGraw-Hill, New York Jensen, F. V. 1996. Bayesian network basics. AISB Quarterly 94:9-22. Jones, C. 1999. Virtual Ecosystem Mapping Prototype Accuracy Analysis. Unpublished report prepared for Government of British Columbia Ministry of Forests, Victoria, BC. Jones, C. 2003. North Coast Predictive Ecosystem Mapping Report 2002-2003. Unpublished consultant report prepared for Allen Banner, Regional Research Ecologist, Ministry of Forests, Smithers, B.C. Jones, C. 2004. Patry Lake Predictive Ecosystem Mapping Final Report. Unpublished report prepared for Slocan Group, Nelson, BC. Jones, C. & McGregor, G. 2002. EcoGen Predictive Ecosystem Mapping Report for the Lillooet Forest District. Unpublished. 102 Jones, R. K. 1983. Field Guide to Forest Ecosystem Classification for the Clay Belt, Site Region 3e. Ministry ofNatural Resources, Toronto, Canada. Jones, R. K., Meidinger, D. V., Clark, D., Shultz, F. & Moon, D. 1999. Towards the Establishment of Predictive Ecosystem Mapping Standards: A White Paper. Government of British Columbia, Resource Inventory Committee, TEM Alternatives Task Force, Victoria, B.C. Kanellopoulos, I., and Wilkinson, G. G. 1997. Strategies and best practice for neural network image classification. International Journal of Remote Sensing 18:711-725. Ketcheson, M. V., Dool, T., Kernaghan, G., Lessard, K., Bums, G., Wilson, S. & Smith, G. 2001a. TEMBEC Industries Inc.: Canal Plats Operating Area PEM (Predictive Ecosystem Mapping). Unpublished. Ketcheson, M. V., Dool, T. & Wilson, S. 2001b. Arrow TSA Site Index Adjustment Project, Predictive Ecosystem Mapping (PEM), Year Two Final Project Report. Unpublished. Ketcheson, M . V., Dool, T. & Wilson, S. 2001c. TFL 3 Predictive Ecosystem Mapping (PEM): Year Two Final Report. Unpublished. Kimes, D. S., Holben, B. N. , Nickeson, J. E., and McKee, A. 1996. Extracting forest age in a Pacific Northwest forest from thematic mapper and topographic data. Remote Sensing of Environment 56:133-140. King, D., Bourennane, H., Isambert, M., and Macaire, J. J. 1999. Relationships ofthe presence of a noncalcareous clay-loam horizon to DEM attributes in a gently sloping area. Geoderma 89:95-111 . Klappstein, G. D. & Corns, I. G. W. 1996. Field Guide to Ecosites of Southwestern Alberta. Special Report 8. Natural Resources Canada, Canadian Forest Service Northwest Region, Northern Forestry Centre, Edmonton, Alberta. Klinka, K., and Skoda, L. 1977. Synecological Map of the University of British Columbia Research Forest. The Forestry Chronicle 53:348-352. Krige, D. G. 1963. Two dimensional weighted moving average trend surfaces for oreevaluation. Journal of the South African Institution ofMining and Metallurgy 66:1338. Linderman, M., Liu, J., Qi, l, An, L., Ouyang, Z., Yang, J., and Tan, Y. 2004. Using artificial neural networks to map the spatial distribution of understorey bamboo from remote sensing data. International Journal ofRemote Sensing 25:1685-1700. 103 MacMillan, R. A., Pettapiece, W. W., Nolan, S.C., and Goddard, T. W. 2000. A generic procedure for automatically segmenting landforms into landform elements using DEMs, heuristic rules and fuzzy logic. Fuzzy Sets and Systems 113:81-109. Major, J. 1951. A functional, factorial approach to plant ecology. Ecology 32:392-412. Marcot, B. G., Holthausen, M. G., Raphael, M. G., Rowland, M., and Wisdom, M. 2001. Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecology and Management 153:29-42. Marshall, I. B. & Schuts, P. H. 1999. A National Ecological Framework for Canada: Overview. Environment Canada and Agriculture and Agri-Food Canada, Hull, Quebec. McBratney, A. B., Hart, G. A., and McGarry, D. 1991. The use of region partioning to improve the representation of geostatiscally mapped soil attributes. Journal of Soil Science 42:513-532. McBratney, A. B., Odeh, I. 0. A., Bishop, T. F. A., Dunbar, M. S., and Shatar, T. M. 2000. An overview ofpedorhetric techniques for use in soil surveys. Geoderma 97:293-327. McCracken, R. J., and Cate, R. B. 1986. Artificial intelligence, cognitive science, and measurement theory applied in soil classification. Soil Society ofAmerica Journal 50:557-561. Meidinger, D. V. 2003. Ecosystem Mapping Accuracy and Timber Supply Applications. BC Ministry of Forests, Research Branch, Victoria, BC. Meidinger, D. V. 2004. North Coast PEM Project Quality Assurance. Unpublished Draft Report, BC Ministry of Forests, Victoria, BC. Meidinger, D. V. & Pojar, J. 1991. Ecosystems of British Columbia. Special Report Series No.6. B.C. Ministry of Forests, Victoria, B.C. Miller, J., and Franklin, J. 2002. Modeling the distribution of four vegetation alliances using generalized linear models and classification trees with spatial dependencies. Ecological Modelling 157:227-247. Mitchell, W. R. & Eremko, R. 1987. Ecosystem Mapping of the Truax Creek Basin in the Kamloops Forest Region. Research Report RR87007-KL. BC Ministry of Forests and Lands, Victoria, BC. 104 Moon, D. 1999. Situation Analysis for Knowledge-Based Systems. Government of British Columbia, Resource Inventory Committee, TEM Alternatives Task Force, Victoria, B.C. Moore; D. M., Lees, B. G., and Davey, S.M. 1991. A new method for predicting vegetation distributions using decision tree analysis in a geographic information system. Environmental Management 15:59-71. Moore, I. D., Gessler, P. E., Nielsen, G. A., and Peterson, G. A. 1993. Soil attribute prediction using terrain analysis. Soil Science Society ofAmerica Journal57:443-520. Mulder, J. A. and Corns, I. G. W. 1994. Knowledge Based Ecosystem Prediction: Field Testing and Validation. Pages 829-835 in GIS '94 Symposium. Vancouver, BC. Norsys. 1997. Netica Application: User's Guide, version 1.05. Norsys Software Corp., Vancouver, B.C. Omernik, J. M. 1987. Ecoregions ofthe conterminous United States. Annals of the Association ofAmerican Geographers 77:118-125. Pack, R. T., Tarboton, D. G., and Goodwin, C. N. 1998. The SINMAP approach to terrain stability mapping. in 8th Congress of the International Association ofEngineering Geology, Vancouver, British Columbia, Canada, 21-25 September 1998. Pal, M., and Mather, P.M. 2003. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sensing ofEnvironment 86:554-565. Park, S. J., and Vlek, P. L. G. 2002. Environmental correlation of three-dimensional soil spatial variability: a comparison of three adaptive techniques. Geoderma 109:117140. Pearl, J. 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference. in Series in Representation and Reasoning; Morgan Kaufmann Publishers, San Francisco, CA. Plach, M. 1999. Bayesian networks as models of human judgement under uncertainty: the role of causal assumptions on belief updating. Kognitionswissenschaft 8:30-39. Pojar, J., Klinka, K., and Meidinger, D. V. 1987. Biogeoclimatic ecosystem classification in British Columbia. Forest Ecology and Management 22:119-154. 105 Pojar, J., Meidinger, D. V., and Klinka, K. 1991. Chapter 2: Concepts. Pages 9-38 in Meidinger, D. V., and J. Pojar, editors. Ecosystems ofBritish Columbia. Special Report Series 6. BC Ministry of Forests, Victoria, BC. Qi, F., and Zhu, A.-X. 2003. Knowledge discovery from soil maps using inductive learning. International Journal of Geographical Information Science 17:77-795. Renooij, S. 2001. Probability elicitation for belief networks: issues to consider. The Knowledge Engineering Review 16:255-269. RIC 1998. Standard for Terrestrial Ecosystem Mapping in British Columbia. Government of British Columbia, Resource Inventory Committee, Terrestrial Ecosystem Task Force, Victoria. RIC 1999. Standards for Predictive Ecosystem Mapping- Inventory Standards. Government of British Columbia, Resource Inventory Committee, Terrestrial Ecosystems Task Force, Victoria, B.C. Rodriguez, F., Maire, E., Comjault-Rade, P., and Darrozes, J. 2002. The Black Top Hat function applied to aDEM: a tool to estimate recent incision in a mountainous watershed (Estibere Watershed, Central Pyrenees). Geophysical Research Letters 29:9-1-9-4. Rosen, D., Blashill, W. & Coderre, M. 2001. Predictive Ecosystem Mapping for Canfor's TFL48 (Chetwynd). Unpublished. Schmidt, J., and Hewitt, A. 2004. Fuzzy land element classification from DTMs based on geometry and terrain position. Geoderma in press. Scull, P., Franklin, J., Chadwick, 0. A., and McArthur, D. 2003. Predictive soil mapping: a review. Progress in Physical Geography 27:171-197. Sims, R., Towill, W. D., Baldwin, K. A., and Wickware, G. M. 1989. Field Guide to the Forest Ecosystem Classification for Northwestern Ontario. Ontario Ministry of Natural Resources, Toronto, Ontario Skidmore, A. K. 1989. An expert system classifies eucalypt forest types using Landsat Thematic mapper data and digital terrain model. Photogrammetric Engineering and Remote Sensing 55:1449-1464. Skidmore, A. K., Ryan, P. J., Dawes, W., Short, D., and O'Loughlin, E. 1991. Use of an expert system to map forest soils from a geographical information system. International Journal of Geographical Information Science 5:431-445. 106 Skidmore, A. K., Watford, F., Luckananurug, P., and Ryan, P. J. 1996. An operational GIS expert system for mapping forest soils. Photogrammetric Engineering and Remote Sensing 62:501-511. Stock, M. 1987. AI and expert systems: an overview. AI Applications in Natural Resources, Agriculture and Environmental Science 1:9-1 7. Sulyma, R. & Alward, B. 2004. Blackwater PEM Retrofit Accuracy Assessment. Forest Floor Contracting Ltd., Fort St. James, BC. Teng, C. H., and Fairbairn, D. 2000. Comparing expert systems and neural fuzzy systems for object recognition in map object revision. International Journal ofRemote Sensing 23:555-567. Timberline Forest Inventory Consultants Ltd. 2000. An Application of the Standards for Predictive Ecosystem Mapping, Version 1.0: A Report on the Targe CreekSutherland PEM Project. Unpublished report prepared for the BC Ministry of Environment, Lands and Parks, Victoria, BC. Timberline Forest Inventory Consultants Ltd. 2004. Blackwater Predictive Ecosystem Mapping Retrofitting. Unpublished report prepared for Slocan Forest Products Ltd., Mackenzie, BC. Udvardy, M.D. F. 1975. A Classification of the Biogeographical Provinces of the World. IUCN Occasional Paper No. 18. International Union for Conservation of Nature and Natural Resources, Morges, Switzerland. Varis, 0., and Kuikka, S. 1997. BENE-EIA: A Bayesian approach to expert judgement elicitation with case studies on climate change impacts on surface waters. Climate Change 37:539-563. Walter, H., and Box, E. 1976. Global classification ofnatural terrestrial ecosystems. Vegetatio 32:75-81. Webster, R. 1994. The development ofpedometrics. Geoderma 62:1-15. Wiken, E. B. 1986. Terrestrial Ecozones of Canada. Ecological Land Classification Series No. 19. Environment Canada, Hull, Quebec. Wood, J. 1996. The geomorphological Characterization of Digital Elevation Models. Ph.D. Thesis. Department of Geography, University ofLancaster, UK. Zadeh, L.A. 1965. Fuzzy sets. Information and Control8:338-353. 107 Zelazny, V. F., Ng, T. T. M., Hayter, M.G., Bowling, C. L., and Bewick, D. A. 1989. Field Guide to Forest Site Classification in New Brunswick: Harvey- Harcourt and Fundy Site Regions. New Brunswick Ministry ofNatural Resources and Energy, Fredericton, N.B. Zhang, J., and Foody, G. M. 2001. Fully-fuzzy supervised classification of sub-urban land cover from remotely sensed imagery: statistical and artificial neural network approaches. International Journal of Remote Sensing 22:615-628. Zhu, A.-X. 1997. Measuring uncertainty in class assignment for natural resource maps under fuzzy logic. Photogrammetric Engineering and Remote Sensing 63:1195-1202. Zhu, A.-X., Hudson, B., Burt, J., Lubich, K., and Simonson, D. 2001. Soil mapping using GIS , expert knowledge, and fuzzy logic. Soil Science Society ofAmerica Journal 65:1463-1472. Zhu, X., Healey, R. G., and Aspinall, R. J. 1998. A knowledge-based system approach to design of spatial decision support systems for environmental management. Environmental Management 22:35-48. Zolaneski, C. A., Wickwaer, G. M ., Delorme, R. J., Sims, R. & Corns, I. G. W. 1995. Forest Ecosystem Classification for Manitoba: Field Guide. Special Report 2. Natural Resources Canada, Canadian Forest Service Northwest Region, Northern Forestry Centre, Edmonton, Alberta. 108