MODELLING THE DISTRIBUTION OF ADVANCE REGENERATION IN LODGEPOLE PINE STANDS IN THE CENTRAL INTERIOR OF BRITISH COLUMBIA

by

Darin Warren Brooks
B.Sc. (Hons), University of Saskatchewan, 1996

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN NATURAL RESOURCES AND ENVIRONMENTAL STUDIES (GEOGRAPHY)

UNIVERSITY OF NORTHERN BRITISH COLUMBIA
December 2012

© Darin Brooks, 2012

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-94154-6

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Abstract

The recent mountain pine beetle outbreak in the Central Interior of British Columbia is leaving unsalvaged stands with minimal silvicultural treatment, raising questions about their ability to regenerate and the implications of this uncertainty for future timber supply and habitat values. No system currently exists to predict, on a landscape level, which pine stands will have adequate stocking of advance regeneration suitable for release upon canopy death. My research takes a ground-truthed, landscape-level approach to modelling, predicting, mapping, and prioritizing stands for salvage or rehabilitation. The resulting model, derived from recursive partitioning of data from 964 sample plots, created a landscape-level output with a predictive accuracy of 78%. Across the Sub-Boreal Spruce study area, I estimate that 58% of mature pine-leading stands (approximately 840,000 ha) are likely or very likely to be stocked with at least 600 stems/ha of living understory trees.

Table of Contents

Chapter One: Introduction
    Factors Impacting Advance Regeneration and Stand Recovery
    Forest Management Options
    Empirical Models of Species Distribution and Abundance
    Recursive Partitioning and Classification Trees
    Objectives
Chapter Two: Methods
    Study Area
    Field Data Collection
    Post-Field Data Organization
    Statistical Analyses
    Geographical Information System (GIS) Analyses
Chapter Three: Results
    Trends in Advance Regeneration Density
    Classification Tree Analysis
    Applying the Classification Tree in a GIS Model
Chapter Four: Discussion and Recommendations
    Application to Forest Management
    Assessment of the Modelling Approach
Chapter Five: Conclusions
References
Appendix 1
Appendix 2
Appendix 3

List of Tables

Table 1. Number of advance regeneration plots per BEC unit
Table 2. Ecological variables from publicly available geospatial data layers
Table 3. Number of regeneration plots sampled in different BEC unit
Table 4. Number of regeneration plots sampled by age classes/BEC unit
Table 5. Number of regeneration plots sampled by age classes and site series
Table 6. Descriptive statistics for advance regeneration densities (BEC unit)
Table 7. Descriptive statistics for advance regeneration densities (age class)
Table 8. Descriptive statistics for advance regeneration densities (forest district)
Table 9. Total number of stocked and not stocked plots across the study area
Table 10. Key statistics describing the classification tree accuracy
Table 11. List of important variables/significant ecological factors
Table 12. Classification tree variable importance for each stocking group
Table 13. Translation of classification tree model into textual rules (all trees)
Table 14. Translation of classification tree model into textual rules (conifer-only)
Table 15. Translation of classification tree model into textual rules (MSSpa)
Table 16. Probability of being stocked for each BEC unit derived
Table 17. Probability of being stocked for each NTS 1:250,000 maptile
Table 18. Probability of being stocked for each forest district

List of Figures

Figure 1. Example of a classification tree illustrating predictor variable splits
Figure 2. Example of a ROC curve
Figure 3. Study area
Figure 4. Location of sampled stands (thesis plots)
Figure 5. Cross section illustrating site mesoslope positions
Figure 6. Proportion of plot locations among different site meso-slope positions
Figure 7. Proportion of plots in pine-leading forest cover polygons (all trees)
Figure 8. Proportion of plots in pine-leading forest cover polygons (conifers-only)
Figure 9. Proportion of 964 plots in pine-leading forest cover polygons (MSSpa)
Figure 10. Advance regeneration and plot basal area linear model
Figure 11. Advance regeneration and distance to seed source linear model
Figure 12. Advance regeneration and mean annual precipitation linear model
Figure 13. Advance regeneration density and projected height linear model
Figure 14. Advance regeneration density and projected age linear model
Figure 15. Example of a rule path in the conifer-only 600 stems/ha
Figure 16. Final pruned classification tree model (all trees)
Figure 17. Final pruned classification tree model (conifer-only species)
Figure 18. Final pruned classification tree model (MSSpa)
Figure 19. ROC curve chart illustrating model accuracy (all trees)
Figure 20. ROC curve chart illustrating model accuracy (conifer-only species)
Figure 21. ROC curve chart illustrating model accuracy (MSSpa)
Figure 22. Proportion of forest districts that are within the study area
Figure 23. Colour themed map depicting probability of stocking (conifer-only)
Figure 24. Spatial extent of probability of stocking >80% for map tile NTS 93G
Figure 25. Spatial extent of probability of stocking <20% for map tile NTS 93G
Figure 26. Intersection of very likely and very unlikely stocked cells
Figure 27. Landscape view of mapped model output
Figure 28. Advance regeneration location example (mature pine stands)
Figure 29. Advance regeneration location example (5 probability classes)
Figure 30. Location of large, very unlikely to be stocked mature pine areas

Acknowledgements

I would like to thank the following people and agencies: Dr. Phil Burton, Dr. Roger Wheate, and Dr. Michael Gillingham (my committee members), who provided me with more latitude than I deserved. This thesis would not have been possible without their patience, guidance, and support. They have been incredible mentors ... and consequently I am a better student, teacher, and mentor for having known them. Thank you from the bottom of my heart. I thank my wife, kids, and extended families for their patience as they watched me sacrifice holidays, weekends, family nights and evenings of dance for this thesis, always graciously and unselfishly supporting me in this dream. I can't imagine how I will begin to pay them back. To David Kim from KFM in Prince George, BC: thank you for seeing the importance of this thesis and for providing me with the time, material, equipment, and mentorship that was required.
Thank you to everyone who provided funding (whether monetary or in kind): BC Forest Science Program, the Pacific Forestry Centre of the Canadian Forest Service, PFC graduate scholarship, Kim Forest Management Ltd. A special thanks to Heather and Talya for helping me with field research and for providing me with a couple of summers that I will never forget. I thank everyone who provided data to the aggregate dataset: Dave Coates, Craig DeLong, Patience Rakochy, Chris Hawkins, Brad Hawkes, Deb Cichowski, and Phil Burton. And finally, thanks to the College of the North Atlantic for the use of materials and time during the "final push".

Chapter One: Introduction

British Columbia's (BC) lodgepole pine (Pinus contorta var. latifolia) forests have recently (~1999 to present) suffered from an epidemic of mountain pine beetles (Dendroctonus ponderosae), hereafter MPB. Although such infestations are largely a natural occurrence, the intensity and spatial extent of the present infestation are unprecedented, such that attempts to control the growing pest population have failed (Stockdale et al., 2004; Burton, 2010). Pedersen (2004, p. 11) noted that "despite the suppression measures, the epidemic as well as the amount of beetle-killed wood continues to increase". The Provincial Aerial Overview Surveys of Forest Health indicate that the current epidemic reached its peak in 2005; although mountain pine beetles continue to attack pine stands throughout the province, the outbreak is projected to subside by 2021 (Walton, 2012). In the meantime, because pest-management intervention was not sufficient to combat the current damage, research interest has largely shifted to guiding timber salvage and regeneration operations. Social, economic, and operational limitations continue to hamper those attempts to salvage the infested area (Stockdale et al., 2004).
Those constraints have contributed to predictions of a severe future timber shortage (Pedersen, 2004; Burton, 2010). Immature pine stands, previously presumed to be immune to pine beetle attack and expected to provide harvestable timber within the next few decades, became susceptible at the peak of the outbreak, as the older stands most susceptible to beetle-induced mortality were overrun with beetles and their offspring became increasingly desperate for food (Maclauchlan, 2006). With most of the mature pine forest killed, and with many of the plantations established in the 1970s and 1980s equally compromised or unlikely to reach maturity by the time salvage operations are completed, it is expected that BC will soon suffer an unprecedented timber supply fall-down over the midterm of the next 10 to 50 years (Pousette and Hawkins, 2006; Snetsinger, 2011).

Advance regeneration consists of the seedlings, saplings or sprouts that have naturally established in a forest understory before any large-scale disturbance, and it can be found under some mature lodgepole pine stands (Johnson et al., 2003; Burton, 2006). The survival and release of those understory trees have been identified as a valuable mechanism to help avoid timber shortages created by the MPB infestation (Burton, 2006; Coates, 2006; Greisbauer and Green, 2006; Pousette and Hawkins, 2006). The presence of advance regeneration decreases the need for forest managers to use more aggressive intervention and recovery methods to regenerate forest stands (Greene et al., 1999). Forest planners are faced with the quandary of deciding which stands are unlikely to recover and, therefore, should be harvested and planted to speed regeneration, and which stands should be left to recover on their own.
The lack of knowledge regarding the growth of advance regeneration and its capacity to respond to a disturbance will be a major obstacle for the prediction of future harvests and the development of appropriate response efforts (Messier et al., 1999). The ability of advance regeneration to respond to a disturbance can be affected by factors such as the composition of species within the affected stand, the particular characteristics (e.g., height, age, and diameter) of the trees involved, and the availability of light determined by canopy gaps (Mitchell, 2005). This potentially high variability within and among stands has limited advance regeneration modelling using individual tree and stand models such as SORTIE (Hawkins et al., 2012). SORTIE derives its output from complex interactions between light, growth, seed dispersal, and mortality, all ecological factors that vary greatly between stands (Sattler, 2009). Coates (2006) suggests that the variability among stands is so great that stands can only be managed on an individual basis. A landscape-level model for predicting the distribution of advance regeneration is critical for efficient and effective forest planning, particularly in the context of the province's management unit objectives to ensure harvested areas in BC are adequately renewed through stocking standards (McWilliams, 2009).

Despite the widespread occurrence of understory regeneration in lodgepole pine forests, the question remains as to whether those forests will remain adequately stocked upon the death of the overstory. Stocking can be broadly defined through current tree metrics (typically volume based) and early stand conditions that are used to measure the probability of achieving long-term management goals for that stand (Martin et al., 2005; McWilliams, 2009).
A fully stocked stand or sample plot is one in which all open space is (or is projected to be) occupied by living trees. To ensure that BC forests continue to return to their pre-harvest conditions, stocking standards have been established to maximize the probability of successful regeneration. It is, therefore, both critical and prudent for forest planners to assess stand regeneration at a landscape level. The stocking required to regenerate stands, however, is not the same across the landscape (i.e., stands within different biogeoclimatic units and site series require different stocking standards to achieve renewal). In some cases, harvested areas do not have timber production as their primary objective (set by government and industry planners) and, therefore, have different stocking standards to achieve their particular management objectives.

Dhar and Hawkins (2011) identified three critical research-driven purposes for advance regeneration assessment: forecasting long-term development (yield) of attacked stands, selecting stands for further research, and forecasting impacts on ecological attributes. The data to support estimates of stocking in stands following MPB are available, but are scattered across several different research projects. There is a need to collapse those data sets into a single usable format (perhaps even differentiating between predictive and operational) that can be used by industry, research entities, and a variety of other stakeholders (Wilford, 2008; Dhar and Hawkins, 2011).

The need to understand the patterns of advance regeneration occurrence has an operational focus. Current reforestation stocking standards, designed primarily to guide the reforestation of clearcuts, may need to be revisited by forest planners.
Designation of preferred and acceptable species, as well as their densities, may need to be altered or amended in order to economically utilize post-MPB advance regeneration (Lewis, 2005; Greisbauer and Green, 2006). Commercial logging strategies may also require adjustment on a stand-by-stand basis to provide protection to advance regeneration that may or may not contribute to the midterm timber supply. Forest managers require a better understanding of the cost implications associated with the retention of advance regeneration versus 'starting over' with planting after logging in terms of facilitating important midterm forest harvesting opportunities (Dhar and Hawkins, 2011).

Consideration must also be given to the potential effects of climate change on advance regeneration, and its ability to sequester carbon and thereby mitigate some climate change impacts (Brown et al., 2012). A more complete assessment of the distribution of advance regeneration is required to identify the potential for future challenges such as vulnerability to pests and diseases. As forests continue to respond to climate change, so too do the populations and distributions of numerous insects and fungal pathogens that may pose a threat to the regenerating species (Lewis, 2005).

Factors Impacting Advance Regeneration and Stand Recovery

Research on the recovery of stands following a disturbance through advance regeneration is limited because the vast majority of studies have chosen to focus only upon their early development (Messier et al., 1999). Patterns of understory release in forests disrupted by natural disturbance, and the long-term growth, yield and habitat value of such forests are unfortunately poorly documented. Natural disturbances are important ecological processes that drive observed patterns in ecosystems. The study of disturbance regimes and the interaction of disturbance agents has only recently become a central theme in ecology (Mori, 2011).
Aspects of disturbance ecology that require further inquiry include disturbance history, spatiotemporal dynamics, disturbance interactions, regeneration response to disturbance, and the application of resistance and resilience theory to ecosystem management (Wright et al., 2000).

Ecological disturbances are defined as disruptive changes to an ecological system by an external event that changes the resources within the system, but does not necessarily result in the destruction of the ecological system itself (White and Pickett, 1985; Pickett et al., 1989; Johnson and Miyanishi, 2007; Hughes, 2010). Studies indicate that disturbance impacts are defined by more than just the affecting agent (i.e., whether it is insect, fire, wind, or other agents). Pickett et al. (1986) state that the bases for understanding an ecological disturbance are threefold: 1) identifying the existing ecological system that may be affected by a disturbance; 2) discerning only the changes to the ecological system that are a result of a disturbance; and 3) understanding the consequences of the disturbance.

How effective will advance regeneration be in supporting stand recovery after disturbance? Rates of stand recovery through advance regeneration vary. For example, the re-establishment of MPB-attacked stands through regeneration can be delayed by five to ten years due to species composition before the beetle attack and the vigour of overstory trees (Bouchard et al., 2005; Coates, 2006). A more complex example of stand recovery after disturbance is the creation of a thinning effect in balsam fir (Abies balsamea) stands by spruce budworm (Choristoneura fumiferana) in eastern Canada. The natural regeneration is released after budworm attack, only to be attacked itself 30 or 40 years later, perpetuating a cycle of favourable conditions for subsequent spruce budworm attacks (MacLean and Anderson, 2008).
The MPB outbreak in BC is having a profound impact on the makeup of the affected stands. MPB is not unique as a disturbance agent that kills only the overstory and thereby typically releases the understory regeneration and vegetation; the same is true of wind storms, other insect outbreaks, and root rot pockets (Gautreaux, 1999). Pine-dominated forests have varying amounts of live trees, saplings and seedlings remaining after they have been attacked by MPB, which are collectively referred to as "secondary stand structure" or "secondary structure" (Coates et al., 2006). Whereas forest fire will kill most seedlings and saplings, windthrow and insect infestations typically kill the overstory trees, but not the regeneration, thereby facilitating its release (Johnson et al., 2003; Roberts, 2004; Burton, 2008b).

Natural methods of forest regeneration can be particularly attractive to forest planners when stocking is reliable, or when economic and operational factors prohibit the large-scale use of artificial regeneration strategies. Natural regeneration, whether by seed or through the release of existing seedlings, offers several advantages over alternative approaches. For example, natural regeneration is cost-effective when compared to more interventionist reforestation strategies and helps to protect the forest's natural diversity (Weetman and Vyse, 1990).

Forest understory species may thrive following bark-beetle attack, and this will cause a shift in the dominant species within the stand. The dwarf shrubs kinnikinnick (Arctostaphylos uva-ursi) and twinflower (Linnaea borealis) have been documented to increase in cover following MPB-induced canopy opening (Williston et al., 2006). Other studies indicate that plant species richness (particularly grasses) is measurably higher in post-MPB attacked stands, and that non-tree vegetation can represent one-third to two-thirds of the CO2 uptake in these stands (Stone and Wolfe, 1996; Bowler et al., 2012).
Therefore, the beetle outbreak may be regarded as a stand-releasing event that causes the understory vegetation to assume a more dominant position within the stand as it grows to take the place of the lost canopy trees (Greene et al., 1999; Burton, 2008a; Lindenmayer et al., 2008). Young trees and other vegetation surviving in the understory have a strong potential to thrive because of the new availability of resources created by the lost stand members (Coates et al., 1994). Stands with a healthy understory may recover from MPB attack to yield harvestable timber within a timeframe of 40 to 80 years (Coates and Hall, 2005; Coates et al., 2006).

Understory light availability increases in natural canopy gaps (resulting from the mortality of one or more mature trees) and after forest harvesting (Palik et al., 1997; Burgess and Wetzel, 2000; Oguchi et al., 2006; Boucher et al., 2007), and larger gaps provide more light than smaller gaps (McGuire et al., 2001; Gray et al., 2002; Palik et al., 2003). Large canopy gaps due to dead trees can result in reduced evapotranspiration and, therefore, decrease summer groundwater depletion (Helie et al., 2005). Light availability is also greater near gap edges than in forest interior positions (Matlack, 1993; Heithecker and Halpern, 2007). High light availability is associated with greater light-saturated photosynthetic rates (Ellsworth and Reich, 1992; Bond et al., 1999), so any death or removal of overstory trees should promote increased growth of advance regeneration (Williams et al., 1999). The new availability of increased light that results from the loss of shading crowns leads to the stimulation of germination and seedling release in many shade-tolerant trees such as Abies spp. (McCarthy, 2001). The adaptability of these shade-tolerant species is responsible for their establishment in the understory and further supports their recruitment into the canopy of the stand (Messier et al., 1999).
As a result, some tree species are more likely than others to be found within the advance regeneration stratum. For example, shade-tolerant species such as subalpine fir and interior white spruce (a natural hybrid of Picea engelmannii and P. glauca, common throughout the BC Central Interior) are commonly found among the advance regeneration in BC's sub-boreal forests (Coates et al., 1994; Kneeshaw and Burton, 1997).

The distribution of other plant species within a region can also affect the success of advance regeneration strategies. For example, the presence of aggressive non-tree vegetation can negatively affect the ability of advance regeneration to respond to overstory mortality (Bassman et al., 1992; Stone and Wolfe, 1996). Knowledge of sapling and seedling growth is also an essential element for the formulation of accurate predictions of future stand conditions (Wright et al., 2000). The species composition of the stand itself may also have an influence on the density of advance regeneration. Advance regeneration is generally more abundant when accompanied by a greater diversity of tree species within the overstory (Arnup, 1996).

Particular characteristics allow some species to respond more positively than others after an insect outbreak. For example, the relative availability of shade and sunlight has been highlighted as particularly influential for advance regeneration. Wright et al. (2000, p. 1528) found "a clear relationship between shade tolerance and the magnitude of the effects of past periods of suppression and release on sapling growth." Species able to tolerate low light have a propensity to thrive as advance regeneration in the understory, and possibly to adapt to light regimes that vary with changing canopy structure (Oliver and Larson, 1996; Messier et al., 1999; McCarthy, 2001).
Understory light may remain relatively unchanged for up to five years after an MPB attack because it takes that long for the dead foliage to fall, and the residual canopy continues to shade the vegetation in the understory (Coates and Hall, 2005). Furthermore, pine snags are another enduring source of shade for plants within the understory (Coates and Hall, 2005). The shade from these snags and any surviving trees can help to prevent understory trees from being out-competed by other vegetation (Lieffers and Stadt, 1994).

Although some lodgepole pine seedlings can establish under full canopies, lodgepole pine is generally classified as a shade-intolerant species (Burns and Honkala, 1990; Klinka et al., 2000). Shade-intolerant species tend to be less likely to respond well within the cohort of advance regeneration following a disturbance because they have a more fixed, less adaptable structure and physiology (Messier et al., 1999). Its status as a shade-intolerant species, however, does not preclude lodgepole pine from being found as a natural component of the secondary stand structure surviving an MPB outbreak. The presence of lodgepole pine regeneration within a mature forest can be explained by uneven canopy closure, which permits sunlight to reach the light-demanding young pines (McCarthy, 2001). Indeed, in cold dry environments where other tree species are at a competitive disadvantage, lodgepole pine may be the dominant species of advance regeneration, and can release to create an uneven-aged stand after MPB attack (Axelson et al., 2009).

Finally, there is evidence that natural regeneration of some species may be assisted through seed rain recruitment (the deposition of seeds spread by birds, wind, humans, and other animals) (Moles and Drake, 1999), although Burton (2006) found inconsistencies in the relationship between regeneration densities and proximity to non-pine conifer seed sources. Leadem et al. (1997) state that successful regeneration of stands relies on seed production and dispersal, so there must be some fundamental dependency on the availability of seed from shade-tolerant species, either within the stand or from nearby on the landscape.

Forest Management Options

Standard industrial forestry in the form of even-aged management and clearcut harvesting promotes the development of simplified stand structures through the application of homogeneous treatments across stands at fixed intervals that are often shorter than the return intervals of natural disturbances in the same region (Coates and Burton, 1997; Palik et al., 2002; Seymour et al., 2002). The loss of structural complexity in managed forests presents concerns for conserving biodiversity, sustaining key ecosystem functions, and maintaining ecological resilience in the face of a changing climate (Franklin et al., 2002; Lindenmayer and Franklin, 2002; Palik et al., 2002; Tews and Jeltsch, 2004; Drever et al., 2006). These concerns over the ecological consequences of simplified forest structures have created interest in developing novel silvicultural systems that promote more natural patterns of stand development by emulating the frequency, scale, and severity of natural disturbances (Coates and Burton, 1997; Franklin et al., 1997, 2002; Palik et al., 2002; Seymour et al., 2002).

Empirical Models of Species Distribution and Abundance

Ecologists have spent a great deal of time examining the interaction between plant species and their micro- and macro-environments. Through the collection and examination of environmental data, ecologists attempt to discover patterns that assist them in predicting the presence or absence of a particular species, community, or ecosystem (e.g., Franklin, 2009). Unfortunately, the environmental variables used in predictive models are complex and covarying. Their interaction with each other can vary by simply changing where they exist in time or space.
For example, even if we understand how variables interact with each other, our understanding can change when the variables are examined at a slightly higher elevation or on a warmer day. This ties a model to a particular geographical context, as the derived model may lose accuracy when the geography changes, even if the combination of variables remains the same (Guisan et al., 1999). Furthermore, environmental variables can be significantly affected by unknown lurking variables (such as anthropogenic interference) or by a sophisticated interaction of multiple variables. That is, our understanding of the interaction among variable x, variable y, and variable z can change if an additional unknown and unseen variable u is present. These limitations are especially problematic for ecology, where variability and deviations are the norm, not the exception.

General linear models have traditionally been used by ecologists to describe the relationship between causal factors and an observed response (Draper and Smith, 1981; Burnham and Anderson, 2002). Yet despite the well-documented drawbacks of relying on techniques such as stepwise multiple regression, the use of these procedures in ecological studies is prevalent (Stephens et al., 2005). Many ecologists risk drawing erroneous conclusions in the search for the most parsimonious model. This risk is especially important in datasets that have factor interactions too complex for parametric models (Freedman, 1983; Derksen and Keselman, 1992). Recent studies have shown that classification trees, or recursive partitioning, can be more reliable and accurate than traditional parametric linear models (Friedl et al., 1999; Hansen et al., 2000).

Recursive Partitioning and Classification Trees

Classification and regression trees are the statistical application of binary trees first introduced by Breiman et al. (1984).
Classification trees are an intuitive and easily interpreted type of supervised learning method used to explore relationships in data. Further, when the explored relationships take the form of a decision tree and are then applied to new data to predict new values, the classification tree becomes a predictive tool (Han et al., 2011). Classification trees are equipped to deal with continuous and categorical data, missing values, and outliers (Moisen, 2008). Unlike linear regression models, which identify and measure the relationship between the response and explanatory variables, classification trees divide the explanatory variables into homogeneous groups through a series of recursive partitions. Botanists employ classification-style decision trees in the form of the routinely used dichotomous keys, which identify a plant by following a tree of if-then statements. Classification trees are also prevalent in the fields of medicine and psychology for diagnosis and decision making (Ripley, 1996).

In a standard classification tree, the idea is to split the dataset based on homogeneity of data. The goal is to achieve pure homogeneous groupings of data to 1) describe the systematic structure of the data; and 2) predict unobserved data (De'ath and Fabricius, 2000). For example, consider two variables, tree age and tree vigour, that predict whether a tree is likely to be attacked by mountain pine beetle (1), or not (0). If 90% of the trees that are >80 years old in our training data showed signs of beetle attack, we can split the data here and age becomes the top node in the tree. Further, if it is discovered that any tree with a vigour less than five (ten being the healthiest and one being the unhealthiest) is attacked 80% of the time, then vigour <5 or >5 would be the second branch in the tree. Graphically, it would look like the sequence of decisions portrayed in Figure 1.
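The age and vigour example can also be read as a pair of nested if-then rules. The following Python sketch is an illustration only: the function name `predict_attack` is hypothetical, the thesis built its trees with rpart in R rather than Python, and both the ordering of the two tests (age first, then vigour for the remaining trees) and the treatment of trees at exactly 80 years or a vigour of exactly 5 are assumptions, since the example does not specify them.

```python
def predict_attack(age_years: float, vigour: float) -> int:
    """Classify one tree as attacked (1) or not attacked (0).

    Hypothetical rule set mirroring the worked example:
    the first split is on age, the second on vigour.
    """
    # First split: trees older than 80 years were attacked in
    # 90% of the training cases, so this branch predicts attack.
    if age_years > 80:
        return 1
    # Second split (assumed to apply to the remaining trees):
    # low-vigour trees were attacked 80% of the time.
    if vigour < 5:
        return 1
    # Younger, vigorous trees are predicted to escape attack.
    return 0


if __name__ == "__main__":
    for age, vig in [(95, 8), (60, 3), (60, 7)]:
        print(age, vig, predict_attack(age, vig))
```

Each call follows one path from the root to a terminal node, which is how a fitted tree classifies a new observation.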
[Figure 1: classification tree diagram. Root node: all trees; first split on age (>80 years vs. <80 years); second split on vigour (>5.0 vs. <5.0); nodes labelled attacked or not attacked, with one 20% label.]

Figure 1. Example of a classification tree illustrating predictor variable splits. Note that the first split at age >80 years old is an internal node, i.e., other nodes can branch from this node. The next split at vigour >5.0 is a terminal node, signalling the end of the branch.

Classification trees are a useful tool for analyzing data because of their visual simplicity. The trees are rules for predicting or explaining the response category using hierarchical binary splits of the explanatory variables. When predicting the category of response, classification trees are used as an algorithm to classify new data. An observation follows a path through the tree, starting at the top or root node and following the individual splits at the interior or branch nodes until it reaches a terminal or leaf node where no more splitting occurs (Kim and Yates, 2003). The criteria used to split the branches to achieve the nodes represent the "if-then" model.

Classification trees are used to predict membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. Classification tree analysis is one of the main techniques used in data mining (Hastie et al., 2001). Each predictor variable is examined to see how well it can divide the node into two groups. If the predictor is continuous, a trial split is made between each pair of adjacent unique values of the variable (every unique value is treated as a 'category'). The process is repeated by moving the split point across all possible division points in the training data until the best improvement is found. This split point is saved as the best possible split for that predictor variable in this node. The process is then repeated for each of the other predictor variables.
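The search for the best split point on a single continuous predictor can be sketched as follows. This is a minimal illustration, not the rpart implementation: the function names `gini` and `best_split` are hypothetical, the Gini impurity is assumed as the measure of improvement because the passage above does not name one, and placing candidate thresholds midway between adjacent unique values is likewise an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Try every division point of one continuous predictor.

    Returns (threshold, impurity_decrease) for the best binary
    split, or (None, 0.0) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no division point between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_threshold, best_gain = (lo + hi) / 2, gain
    return best_threshold, best_gain


if __name__ == "__main__":
    # Hypothetical training data: tree age and attack status.
    ages = [40, 55, 70, 85, 90, 120]
    attacked = [0, 0, 0, 1, 1, 1]
    print(best_split(ages, attacked))
```

Running the same function on every predictor and keeping the one with the largest impurity decrease gives the split chosen for the node.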
A well-recognized advantage of the decision tree representation of a model is that the paths through the decision tree can be interpreted as a collection of rules. The information associated with the textual rules includes a node number for reference, a decision of 1 or 0 to indicate (in the application considered here) whether the plot is stocked or not stocked, the number of training observations, and the strength or confidence of the decision. The measurement of predictive accuracy of a variable (i.e., mean decrease in accuracy) is the more meaningful importance indicator (Berk, 2005). The mean decrease in accuracy is defined as the normalized difference between the classification accuracy and the accuracy when the values of the variable have been randomly permuted. A higher mean decrease in accuracy indicates that a variable is more important to the accuracy of the classification.

The Receiver Operating Characteristic (ROC) curve, a diagnostic measure (represented by a graphic curve and a numerical score) that plots true positive rates against false positive rates, is generally described as one of the most accurate ways to measure the discriminatory power of the resultant classification tree model through comparison of the relationship between sensitivity (in this case, the true positive rate) and specificity (the true negative rate, i.e., one minus the false positive rate) (Hanley and McNeil, 1982; Beck and Schultz, 1986; Krivec and Matjaz, 2011). In other words, the specificity of the tree is the probability that one more not stocked point added to the analysis will be correctly classified as not stocked; conversely, sensitivity is the probability that one more stocked point added to the analysis will be correctly classified as stocked (Feldman and Gross, 2003). A perfect test (100% sensitive and 100% specific) would score a ROC value of 1.0. The closer the value is to 1.0 (100%), the better the distinguishing capability of the classification tree model.
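The definitions of sensitivity and specificity given above can be made concrete with a small Python sketch. The function name `classification_rates` and the ten example observations are hypothetical and are not thesis data; the code assumes that both classes (1 = stocked, 0 = not stocked) occur among the observed values, since otherwise one of the rates is undefined.

```python
def classification_rates(observed, predicted):
    """Sensitivity, specificity, false positive rate, and accuracy
    for a binary classifier (1 = stocked, 0 = not stocked)."""
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == 1 and pred == 1:
            tp += 1  # stocked plot correctly classified
        elif obs == 0 and pred == 0:
            tn += 1  # not stocked plot correctly classified
        elif obs == 0 and pred == 1:
            fp += 1  # not stocked plot classified as stocked
        else:
            fn += 1  # stocked plot classified as not stocked
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical observed and predicted stocking for ten plots.
    observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(classification_rates(observed, predicted))
```

A ROC curve is traced by recomputing the true positive and false positive rates while the probability cut-off for calling a plot stocked is varied.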
The ROC graphic is a curved line that extends from the 45 degree line to the upper left corner; higher model accuracy is associated with a sharp distinct curve that approaches the top left corner of the graph (Figure 2). The ROC curve is essentially the measurement of the trade-off between sensitivity ...

[Figure 2: ROC curve graphic; legible labels are 1.0, "> 0.5 and < 1.0", and 0.5.]

... 10 cm tall) and saplings at each plot represented the amount of advance regeneration. This number was extrapolated to estimate the number of stems/ha of advance regeneration per plot. The stems/ha of all conifer regeneration >30 cm in height represented the target or response variable for much of the classification tree analysis.

At each of my plot locations, a camera tripod (with built-in level) was placed and levelled at plot center. Digital oblique and upward-directed hemispherical ("fisheye") photographs were taken to assist in site description and subsequent incoming light/canopy openness studies, respectively. Finally, three mature trees that were indicative of the stand's average age were cored using an increment borer. The cores were protected in soda straws, and tree rings were counted post-field to estimate approximate stand age.

Additional data used for my thesis included an aggregation of raw field data collected from advance regeneration studies in the study area from 1996 to 2007. Although the studies may not have had objectives or deliverables similar to those of my thesis, I was able to extract portions of the raw data that were suitable inputs for my analysis. In total, I collected data from 162 plots within 37 stands (sampled in 2006 and 2007). These data were combined with data collected in a similar manner (i.e., along transects from non-pine stands into pine-dominated stands) by P.J. Burton and K.D. Coates (Coates et al.
2006), and other analogous plot data designed to representatively sample entire pine-leading stands rather than specifically the effects of plot distance from a non-pine seed source (e.g., Dhar and Hawkins, 2011). In total, 4241 plots were aggregated into a single plot dataset. Although I only collected 162 plots in support of this analysis, the study derives its conclusions from a 964-plot subset of the aggregated plot dataset. Table 1 lists the source, number of plots, and BEC units of all data used as inputs for the thesis classification tree model development. The 964 plots were selected by a GIS query that isolated only those plots within mature pine-leading stands (>50% lodgepole pine by basal area and mapped as >60 years old at the time of sampling) within the SBS dk, dw2, dw3, mc2, mc3, mk1, mw, and wk1 BEC units of 1:250,000 NTS map tiles 93E, 93F, 93G, 93J, 93K, and 93L.

Post-Field Data Organization

I organized the data from the 964 sample plots in an ArcGIS (ESRI, 2011) file geodatabase. I subsequently used this dataset as the training data for the classification tree analysis. A common table attribute (advance regeneration stems/ha) was established and populated with field-collected data. Classification tree analysis requires the variable being predicted to be categorical.

Table 1. Number of advance regeneration plots sampled per BEC unit by data source.
Source Year dk Brooks 2007 Burton 2005 CarrotLake 2006 Cichowski 2005 2005 Coates DeLong 2005 Fluxnet 2006 Hawkes 2002 2005 Hawkins NIVMA 1996 - 2001 Rakochy 2004 5 38 10 77 4 150 Total Plots 292 dw2 - - 8 21 - - - - 23 1 - 45 BEC Unit (SBS) mc2 mc3 dw3 mk1 59 54 36 138 5 100 18 55 9 6 9 53 13 8 1 4 42 - - - - - 141 259 - wk1 14 - - - - - - - 3 - - - 143 67 mw Total Plots - - 127 208 100 23 93 25 9 77 92 18 192 3 14 964 - -

As my thesis objective was to build a model that assists in predicting the stocking status of stands, it was necessary to convert the advance regeneration stems/ha to a Boolean attribute of stocked or not stocked. An important qualification should be made with regard to my thesis' use of the attribute "stocked". I applied a conservative estimate of stocking, as the data refer only to the stocking of seedlings (>10 cm tall but less than 130 cm tall) and saplings (>130 cm tall but less than 7.5 cm dbh). My analysis does not take into account germinants (<10 cm tall) or existing trees (>7.5 cm dbh), even though both could be considered important elements of advance regeneration. Germinants are not infrequent in the understories of the more moist zones of the SBS (Vyse et al., 2009). I also make the assumption that all pine trees >7.5 cm dbh within the plot will not survive post MPB, and therefore they are not considered in the stocking calculation. This becomes an important consideration when we consider the growing space actually available to the advance regeneration. The results are consequently a conservative estimate of stocking and could be better identified as "stocked with seedlings and saplings".

I selected a threshold of 600 stems/ha to separate stocked from not stocked stands, as several studies reporting the percentage of plots meeting or exceeding minimal stocking standards have used 600 stems/ha as the baseline (Bulmer et al., 2002; Burton, 2006; Vyse et al., 2009).
A density of 600 stems/ha is widely recognized as the minimum number of well-spaced preferred trees/ha in stocking guidelines for the regeneration of clearcuts (e.g., British Columbia Ministry of Forests, 2000), and is the threshold beneath which stand rehabilitation measures might be undertaken.

Based on that threshold, I created two new attributes in the geodatabase, named alltrees600 and conifers600. The alltrees600 attribute was populated with a 1 if the advance regeneration total stems/ha for the sample plot was equal to or greater than 600 stems/ha and with a 0 if it was less than 600 stems/ha. This attribute applied to all tree species within the sample plot. The second 600 stems/ha attribute, conifers600, was also populated with a 1 or 0 following the same criteria as the alltrees600 attribute, but counting only the conifer species in the sample plot. In addition, a third threshold group, MSSpa, was established to assign a stocking value of one or zero according to the appropriate minimum stocking standard of preferred and acceptable species found in the Establishment to Free Growing Guidebook for the Prince George Forest Region (British Columbia Ministry of Forests, 2000). Using the Establishment to Free Growing Guidebook, I identified the MSSpa criteria for each biogeoclimatic unit and site series pairing that contained a study plot. The published MSSpa values and conditions were used to define the level of stocking against which to assess seedling and sapling densities (Appendix 1). MSSpa ranges from as low as 200 stems/ha to as high as 700 stems/ha in the subzone/site series present in my study area (British Columbia Ministry of Forests, 2000). Plots that had less than the prescribed MSSpa were considered not stocked and plots that met or exceeded the prescribed MSSpa were considered stocked.
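The conversion of plot densities into the three binary stocking attributes can be sketched in Python as follows. This is an illustration under stated assumptions rather than the geodatabase procedure itself: the function name `stocking_flags` and its parameter names are hypothetical, and the density of preferred and acceptable species is passed in as a separate value because the text does not say how it was tallied for the MSSpa comparison.

```python
def stocking_flags(all_species_sph, conifer_sph,
                   preferred_acceptable_sph, mss_pa, threshold=600):
    """Return the three 1/0 stocking attributes for one plot.

    all_species_sph          -- advance regeneration, all species (stems/ha)
    conifer_sph              -- advance regeneration, conifers only (stems/ha)
    preferred_acceptable_sph -- preferred and acceptable species (stems/ha);
                                hypothetical input, see the note above
    mss_pa                   -- minimum stocking standard for the plot's
                                BEC unit and site series (stems/ha)
    threshold                -- fixed threshold, 600 stems/ha in the thesis
    """
    return {
        # 1 if the density meets or exceeds the threshold, else 0
        "alltrees600": int(all_species_sph >= threshold),
        "conifers600": int(conifer_sph >= threshold),
        "MSSpa": int(preferred_acceptable_sph >= mss_pa),
    }


if __name__ == "__main__":
    # Hypothetical plot: 800 stems/ha in total, 500 of them conifers,
    # assessed against a site-specific standard of 400 stems/ha.
    print(stocking_flags(800, 500, 500, 400))
```

A plot at exactly 600 stems/ha is flagged as stocked, matching the "equal to or greater than" wording above.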
The "well-spaced" requirement for counting regeneration was ignored, as this value (which typically ranges from 1.0 to 2.5 m between seedlings) is arbitrary and depends on the silvicultural prescription and growth modelling assumptions, while fully mature trees are often observed with stem bases growing <1.0 m apart. This simple but conservative measurement of understory stocking (designed to be applied under open-growing conditions) was critical to the classification tree development and subsequent GIS mapping model.

Statistical Analyses

Initially, basic descriptive statistics were applied to the data to examine patterns in the variables. T-tests, using the R function t.test (R Core Team, 2012), were conducted to examine the contribution of non-conifers to the advance regeneration densities. Linear regression models, fitted with the R function glm (R Core Team, 2012), were applied to the data to explore possible relationships between advance regeneration densities and several key variables.

The primary objective of building a predictive geospatial model of the probability distribution of advance regeneration in target pine stands was pursued by developing classification tree models in DTREG (Sherrod, 2006) and R (R Core Team, 2012), using a recursive partitioning package called rpart (Therneau et al., 2012). The classification tree model required a binary response variable (stocked or not stocked) and publicly available explanatory variables that currently exist in digital format. Therefore, any data I collected in the field that could not be derived from publicly available geospatial datasets (e.g., understory plant cover) were not used as potential explanatory (predictor) variables. I determined that data derived from: 1) the Vegetation Resource Inventory (VRI, 2006); 2) Predictive Ecosystem Mapping (PEM, 2008); and 3) interpolated and elevation-adjusted climate data calculated using ClimateBC (Wang et al.
2006) were all publicly accessible and contained key potential predictors. The three datasets together yielded 54 potential predictors (Table 2). The intention was to find the best combination of potential predictors that could be used to create a parsimonious predictive model to accurately classify each 1-ha cell within target stands across the study area as stocked (1) or not stocked (0). Additionally, each cell would not only be designated as stocked or not stocked, but would also be assigned the probability of being stocked or not stocked. A cross-validation function (cv.tree) was applied to the full tree to minimize the misclassification error associated with overfitting of the tree. The cross-validation pruned the tree back to the optimal number of splits/node pairs. See Appendix 3 for a more detailed description of recursive partitioning, tree pruning using cross-validation, and variable importance calculation.

Geographical Information System (GIS) Analyses

The second objective of my thesis was to create a geospatial model that assists in converting the probability of stocking into a geospatial environment. Through the use of classification tree modelling, the rules for determining stocked or not stocked and the probabilities of stocking were determined. One of the passive inputs (included in the model development but not used as a key predictor) into the classification tree analysis was the geographic coordinates of the 100 m cells. The themed polygons portraying the likely distribution and location of stocked and unstocked stands dominated by mature lodgepole pine in Sub-Boreal Spruce

Table 2. Ecological variables from publicly available geospatial data, evaluated as potential inputs to the probability of stocking classification tree model.
Variable                                                        Source  Type
Elevation                                                       GIS     2
Leading Species Live Volume per Hectare at 12.5 cm              VRI     2
Leading Species Live Volume per Hectare at 17.5 cm              VRI     2
Second Species Live Volume per Hectare at 12.5 cm               VRI     2
Second Species Live Volume per Hectare at 17.5 cm               VRI     2
Leading Species Dead Volume per Hectare at 12.5 cm              VRI     2
Leading Species Dead Volume per Hectare at 17.5 cm              VRI     2
Mean Annual Temperature                                         CBC     2
Mean Warmest Month Temperature                                  CBC     2
Mean Coldest Month Temperature                                  CBC     2
Temperature Difference Between MWMT and MCMT                    CBC     2
Mean Annual Precipitation                                       CBC     2
Mean Summer (May to Sept.) Precipitation                        CBC     2
Annual Heat:Moisture Index                                      CBC     2
Summer Heat:Moisture Index                                      CBC     2
Degree-Days Below 0°C                                           CBC     2
Degree-Days Above 5°C                                           CBC     2
Degree-Days Below 18°C                                          CBC     2
Degree-Days Above 18°C                                          CBC     2
Number of Frost-Free Days                                       CBC     2
Frost-Free Period                                               CBC     2
Precipitation as Snow                                           CBC     2
Extreme Minimum Temperature over 30 Years                       CBC     2
Hargreaves Reference Evaporation                                CBC     2
Hargreaves Climatic Moisture Deficit                            CBC     2
Distance (m) to Nearest Nonpine Seed Source (SW direction)      GIS     2
Species Composition of Nearest Nonpine Seed Source              GIS     1
UTM coordinate (easting)                                        GIS     2
UTM coordinate (northing)                                       GIS     2
Forest District                                                 VRI     1
Biogeoclimatic Subzone/Variant                                  VRI     1
Site Series                                                     PEM     1
Soil Moisture Regime                                            VRI     1
Soil Nutrient Regime                                            VRI     1
Absolute Moisture Regime                                        VRI     1
Surface Expression                                              VRI     1
Modifying Process                                               VRI     1
Mesoslope Position                                              VRI     1
Quadratic Diameter at 12.5 cm                                   VRI     2
Quadratic Diameter at 17.5 cm                                   VRI     2
Crown Closure                                                   VRI     2
Site Index                                                      VRI     2
Basal Area                                                      VRI     2
Species Composition of Leading Species                          VRI     1
Leading Species Percentage                                      VRI     2
Species Composition for Second Species                          VRI     1
Second Species Percentage                                       VRI     2
Percent of Pine in Leading Species                              VRI     2
Percent of NonPine in Stand                                     VRI     2
Ratio of Pine to NonPine in Stand                               VRI     2
Tree Cover Pattern                                              VRI     1
Vertical Complexity                                             VRI     1
Projected Age for Leading Species                               VRI     2
Projected Height for Leading Species                            VRI     2

Data Source: VRI (Vegetation Resource
Inventory), PEM (Predictive Ecosystem Mapping), CBC (ClimateBC), GIS (derived data using ArcGIS). Data Type: 1 (categorical), 2 (continuous).

biogeoclimatic zone. The GIS model incorporates the exported rules from the classification tree to isolate criteria identified in each branch. Each of the terminal nodes in the classification tree could be extracted from the resultant tree individually through a series of single programming statements, with each properly executed statement resulting in a queried output. This, however, would be an inefficient use of the final model, as the result would be a series of if-then strings that would each have to be applied against test data. By creating a classification tree and the splitting rules that make up the tree as a model, all the branches leading to all the terminal nodes could be applied against test data simultaneously. Because the model dataset contained geographical coordinates, the dataset could be imported into a GIS. The result is a spatially attributed file that identifies the variables used in the model, the predicted target value for every 100-m raster cell (as stocked or not stocked), the terminal node used for each predicted value, and a probability value for the designation of stocked or not stocked for every 100-m raster cell. The probability value is the critical attribute that is used in creating the predictive colour-theme map.

Much like the classification tree output, ArcGIS models are components linked together through connectors such as tools, variables, and iterators (Allen, 2011). The final model output is an extensible geospatial model that can be run within the ArcGIS environment. The model automates the geoprocessing tasks that are required to prepare the data for modelling, geospatial analysis, and colour-themed mapping.

Chapter Three: Results

Trends in Advance Regeneration Density

A total of 243 pine-leading stands (containing the 964 plots) were field sampled.
A majority of these plots were located in the SBSdk, SBSdw3, SBSmc2, and SBSmc3 biogeoclimatic units and on 01, 03, 04, or 05 site series (Table 3).

Table 3. Number of regeneration plots sampled in pine-dominated stands on different site series in different biogeoclimatic subzones.

BEC Unit dk dw2 dw3 mc2 mc3 mk1 mw wk1 Total 01 142 16 38 97 20 12 1 0 326 02 71 0 2 21 0 2 0 0 96 03 52 6 27 1 2 16 0 7 111 04 0 13 45 0 114 4 2 3 181 Site Series 07 05 06 3 2 21 2 1 2 39 14 73 13 5 0 2 0 5 9 0 23 0 0 0 1 0 3 137 58 23 08 0 0 3 0 0 0 0 0 3 09 0 4 16 0 0 1 0 0 21 10 1 1 2 4 0 0 0 0 8 Total 292 45 259 141 143 67 3 14 964

The 964 plots were located on a range of topographic locations. The percentages on the site meso-slope cross section (Figure 6) quantify the location of the 964 plots according to their topographical locations (as determined by intersection with VRI digital data). It should be noted that although site meso-slope position was one of the potential predictors, it was a more general topographic descriptor named surface expression (VRI, 2006) that became a key predictor variable.

[Figure 6: meso-slope cross-section diagram showing the percentage of all plots and of stocked plots at each site position (crest, upper, middle, lower, toe, level, and NA).]

Figure 6. Proportion of the study area plot locations (n=964) among different site meso-slope positions. NA represents plots for which meso-slope value was not available in the VRI.

The distribution of plots reflects the current age class structure of pine forests in the Northern Interior, with many mapped as being greater than 100 years of age, but with "old-growth" lodgepole pine and natural stands under 80 years of age much more difficult to find. The distribution of plots by stand age and BEC subzone is shown in Table 4.
Cross-tabulating plots by site series and age class reveals a concentration of sampling in circum-mesic stands 80 to 140 years of age (Table 5). Note that site series 07, 08, 09, and 10 have been binned into a single "moist site" category in order to increase sample size.

Table 4. Number of regeneration plots sampled in pine-dominated forest cover polygons by age classes* and biogeoclimatic subzones.

BEC Unit dk dw2 dw3 mc2 mc3 mk1 mw wk1 Total 4 9 4 2 0 14 0 0 0 29 5 63 4 103 2 5 32 0 14 223 Nominal Mapped Age Class * 9 7 8 6 55 64 101 0 24 0 10 3 22 0 42 90 2 23 104 10 1 0 105 18 4 31 0 0 3 0 0 0 0 0 0 0 113 242 354 3 Total 292 45 259 141 143 67 3 14 964

* age class 4 is 61 to 80 years, age class 5 is 81 to 100, age class 6 is 101 to 120, age class 7 is 121 to 140, age class 8 is 141 to 240, and age class 9 is 240+ years.

Table 5. Number of regeneration plots sampled in pine-dominated stands by age classes* and site series, across all biogeoclimatic subzones.

Site Series 01 02 03 04 05 06 07,08,09,10 Total 4 6 8 3 10 0 1 1 29 5 28 27 37 22 64 28 17 223 Nominal Mapped Age Class * 7 9 6 8 76 2 48 166 16 1 8 36 17 0 23 31 99 0 10 40 19 45 0 9 5 14 0 10 10 0 22 5 242 354 3 113 Total 326 96 111 181 137 58 55 964

* age class 4 is 61 to 80 years, age class 5 is 81 to 100, age class 6 is 101 to 120, age class 7 is 121 to 140, age class 8 is 141 to 240 years, and age class 9 is 240+ years.

Across all 964 plots, advance regeneration by all species (>10 cm tall) had a mean density of 1268 stems/ha and a median density of 600 stems/ha. This means that about one-half of the plots sampled had 600 stems/ha or more of advance regeneration (my thesis threshold for designating a plot as stocked), and about one-half of the plots sampled had less. A paired-samples t-test was conducted to determine if there was a significant difference between the mean densities of advance regeneration when calculations included and excluded non-conifer species.
I found that there was not a significant difference in the mean density between conifers only (mean=1267.87, s.d.=1911.90 stems/ha) and all species (mean=1289.02, s.d.=1909.36 stems/ha) across the study area; t(963) = 4.42, P = 0.08. This result suggests that non-conifer species do not significantly influence the mean density of advance regeneration and, by extension, the stocking status of the stand.

An examination of advance regeneration across biogeoclimatic subzones shows that mean densities in all subzones exceed 600 stems/ha, except in the SBSmw, a subzone with only three samples. Comparison of mean and median advance regeneration densities across the thesis study area proved to be quite valuable, as this comparison indicates that most sub-populations are not normally distributed. Mean regeneration densities for each BEC unit, age class, and forest district are presented in Tables 6 through 8, respectively.

Table 6. Descriptive statistics for advance regeneration densities (stems/ha) for all sample plots in each BEC unit. Mean, median, standard deviation, and interquartile ranges are reported.

BEC Unit     n    Mean   s.d.   25%   Median   75%
dk         287     743   1342     0      200    800
dw2         45    1504   1649   200     1200   1700
dw3        262    1278   1661   200      750   1600
mc2        143    1086   1414   200      600   1400
mc3        143    2120   2992   200      600   3250
mk1         61    2077   2263   400     1200   2700
mw           3     266    115   200      200    300
wk1         20    2035   1571   850     1650   3750

Table 7. Descriptive statistics for advance regeneration densities (stems/ha) for all sample plots per age class. Mean, median, standard deviation, and interquartile ranges are reported.

Age Class    n    Mean   s.d.   25%   Median   75%
4           29     231    355     0      100    300
5          223    1221   1597   200      600   1700
6          113    1132   1435   200      600   1600
7          242    1399   2082   200      600   1675
8          356    1387   2136   200      600   1613
9            1    2800     na  2800     2800   2800

Table 8. Descriptive statistics for advance regeneration densities (stems/ha) for all sample plots per forest district. Mean, median, standard deviation, and interquartile ranges are reported.
Forest District     n    Mean   s.d.   25%   Median   75%
Fort St. James      1    1200     na  1200     1200   1200
Nadina            350     746   1283     0      400   1000
Prince George     275    1410   1663   300      900   1750
Quesnel             4     900    621   500      900   1300
Vanderhoof        334    1762   2446   200      750   2375

Across the entire study area, there was a near equal distribution of stocked and not stocked across the 964 sample plots for all three stocking groups (Table 9).

Table 9. Total number of stocked and not stocked sample plots across the study area based on three different definitions of acceptable stocking (stocking groups).

Stocking Group                         Stocked   Not Stocked
All Tree Species (600 stems/ha)            510           454
Conifer Species only (600 stems/ha)        496           468
Minimum Stocking Standards *               460           504

Based on the sampling within the study area, all but two BEC units (SBSdk and mw) are greater than 50% stocked for all stocking groups, and as high as 70% stocked in the mk1 and wk1 BEC units (Figures 7, 8, and 9). The biogeoclimatic unit SBSmw had no stocked plots within the study area. It is worth noting that there were only three plots gathered in the mw subzone variant, all collected from the same stand.

In an attempt to identify any key variables that may have a direct (positive or negative) effect on the advance regeneration densities, several potential key variables were examined through a simple linear regression model. This exercise was conducted for the purposes of data exploration and description only, searching for any potential relationships between the advance regeneration

[Figure 7: bar chart; x-axis: Biogeoclimatic Unit (dk, dw2, dw3, mc2, mc3, mk1, mw, wk1); y-axis: 0% to 100%.]

Figure 7. Proportion of 964 plots in pine-leading forest cover polygons in the Northern Interior Forest Region that meet a 600 stems/ha stocking density threshold (for all tree species).

[Figure 8: bar chart; x-axis: Biogeoclimatic Unit (dk, dw2, dw3, mc2, mc3, mk1, mw, wk1); y-axis: 0% to 100%.]

Figure 8.
Proportion of 964 plots in pine-leading forest cover polygons in the Northern Interior Forest Region that meet a 600 stems/ha stocking density threshold (for conifer species).

[Figure 9: bar chart; x-axis: Biogeoclimatic Unit (dk, dw2, dw3, mc2, mc3, mk1, mw, wk1); y-axis: 0% to 100%.]

Figure 9. Proportion of 964 plots in pine-leading forest cover polygons in the Northern Interior Forest Region that meet a prescribed stocking density threshold (for minimum stocking standards - preferred and acceptable species).

stems/ha of each stand and basal area (from VRI), distance to nearest seed source (calculated using GIS), live pine volume per ha (VRI), dead pine volume per ha (VRI), mean annual precipitation (ClimateBC), mean annual temperature (ClimateBC), stand height (VRI), and stand age (VRI). The results of the models, presented in Figures 10 through 14, are represented as scatterplots. The exercise did not uncover any variables that had a strong (r2 >0.5) linear relationship with density of advance regeneration. Only distance from nearest seed source yielded a statistically significant result (P=0.007). The r2 (0.008), however, indicates that distance from nearest seed source explains less than 1% of the variation in advance regeneration stems/ha. The resulting linear regression equation for distance from nearest seed source is total stems/ha = 1462.0493 - 0.5851 * m to nearest seed source.

[Figure 10: scatterplot; x-axis: Basal Area (m2/ha), 0 to 140; y-axis: advance regeneration density, 0 to 18000.]

Figure 10.
Scatterplot of advance regeneration density vs. stand basal area (r2 = 0.0002315, P = 0.637, F = 0.2228, n = 964).

[Figure 11: scatterplot; x-axis: Distance to Nearest Non-Pine Seed Source (m); fitted line y = -0.5851x + 1462, R2 = 0.0075]
Figure 11. Scatterplot of advance regeneration density vs. distance to the nearest non-pine seed source (r2 = 0.008, P = 0.007, F = 7.239, n = 964).

[Figure 12: scatterplot; x-axis: Mean Annual Precipitation (mm)]
Figure 12. Scatterplot of advance regeneration density vs. mean annual precipitation (r2 = 0.003, P = 0.058, F = 3.596, n = 964).

[Figure 13: scatterplot; x-axis: Projected Height For Leading Species (m)]
Figure 13. Scatterplot of advance regeneration density vs. stand projected height (r2 = 0.001, P = 0.436, F = 0.6068, n = 964).

[Figure 14: scatterplot; x-axis: Projected Age For Leading Species (yr)]
Figure 14. Scatterplot of advance regeneration density vs. stand projected age (r2 = 0.003, P = 0.076, F = 3.147, n = 964).

Classification Tree Analysis

My model, derived from the rpart package in R, classified the presence/absence of advance regeneration stocking with an accuracy of 77% (11 terminal nodes) for all-tree stocking, 78% (14 terminal nodes) for conifer-only stocking, and 78% (14 terminal nodes) for MSSpa stocking. As expected, the identity and order of predictors deemed important by the models were similar for each of the three stocking groups. The final trees for all three stocking groups had BEC unit, distance to nearest non-pine seed source, projected stand age, and basal area as their top four important variables.
It is also notable that all three stocking groups used the categorical variable BEC unit as their initial split, making it the most important variable for all three groups. This would appear to indicate that the density of advance regeneration is influenced by the broad factors associated with biogeoclimatic subzones (i.e., climate, vegetation, soil) and may imply that the model is suitable as an overarching landscape-level predictor. The critical statistical outputs for the final model (Table 10) indicate that in all three stocking groups, the resultant final tree was larger than it needed to be (i.e., the number of terminal nodes before pruning was overfitting the data) and that a simpler tree with fewer terminal nodes could achieve the same (or better) accuracy. The numbers of terminal nodes after pruning are the final numbers of nodes associated with the best-fitting and most parsimonious classification tree model.

Table 10. Key statistics describing the classification tree accuracy for the advance regeneration prediction model. The classification tree size, accuracy, and evaluation are listed for all three stocking groups. MSSpa is the minimum stocking standard (various stems/ha) of preferred and acceptable species, particular to each site series in each BEC unit.

                                             600 stems/ha   600 stems/ha   MSSpa
                                             (all trees)    (conifers)
Number of terminal nodes (before pruning)          24             31          29
Number of terminal nodes (after pruning)           11             14          14
Number of splits (after pruning)                   10             13          13
Sensitivity (true positive rate)               82.45%         80.85%      71.99%
Specificity (true negative rate)               71.33%         74.15%      83.63%
Misclassification rate                         0.2282         0.2241      0.2189
Model accuracy                                 77.18%         77.59%      78.11%
ROC score                                      0.8386         0.8327      0.8528

The red arrows in Figure 15 illustrate the path of a rule from the initial split to a terminal node. The initial split at SUBZVAR (BEC unit) is governed by the rule that the plot must be located in SBSdk or SBSmw to follow this branch.
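As an aside on how the accuracy statistics in Table 10 relate to one another, the short Python sketch below derives sensitivity, specificity, misclassification rate, and accuracy from a 2x2 confusion matrix. The cell counts are an assumption: they are not reported in the thesis and were back-calculated here from the conifer-only plot totals in Table 9 (496 stocked, 468 not stocked) and the percentages in Table 10. The function name is illustrative only, and the thesis itself used rpart in R rather than this code.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, misclassification rate and accuracy."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "misclassification": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
    }


# ASSUMED counts (not given in the source): back-calculated for the
# conifer-only group from 496 stocked and 468 not stocked plots.
conifer = confusion_metrics(tp=401, fn=95, fp=121, tn=347)

for name, value in conifer.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the four values round to the conifer-only column of Table 10, which also shows that model accuracy is simply one minus the misclassification rate.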
[Figure 15: classification tree diagram with red arrows marking one rule path. Legible split labels: SUBZVAR = dk, mw (yes/no); DISTANCEnear >= 544; BASAL_AREA >= 39; SURFACE_EXPRESSION = N, U; PROJ_HEIGHT < 21. Legible node labels: 51% of 150; 84% of 44; 100% of 8; 74% of 34; 79% of 14]
Figure 15. Example of a rule path in the conifer only 600 stems/ha classification tree. The tree begins with an initial split of BEC unit and terminates after a split based on surface expression. This rule can be found in Table 15 as rule number 10.

The second split at DISTANCEnear (distance to nearest non-pine seed source) is governed by a threshold of 544 m to the nearest non-pine seed source. If the nearest distance to a non-pine seed source is >544 m, then the rule follows to the left and terminates at the dark green node classified as 0, or not stocked. The rule indicates that only 96 plots fit both splits (BEC unit and distance to nearest seed source) and that the likelihood of those 96 plots being not stocked is 17%. This provides a low confidence that BEC unit and distance from nearest non-pine seed source alone are good indicators for predicting stocking. If the plot is <544.3 m from the nearest non-pine seed source, then the rule follows the branch to the right, where it encounters a split at basal area. If the basal area is <39.34 m2/ha, then the rule follows to the right, where it encounters its last split of surface expression being either M (rolling) or P (undulating). The path terminates at a stocked node, where 100% of the 8 plots in the 964 plot dataset that conformed to this set of rules were correctly classified as stocked. The rule in text form reads as follows:

SUBZVAR = dk or mw
DISTANCEnear < 544.3 m
BASAL_AREA < 39.34 m2/ha
SURFACE_EXPRESSION = M (rolling) or P (undulating)

The fully pruned classification trees (for all three stocking groups) provide sets of rules for predicting whether a site is stocked with seedlings and saplings (Figures 16, 17, and 18).
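The rule path described above can be sketched as a walk through a small tree structure. The Python below is a minimal illustration, not the rpart model itself: the class and function names are hypothetical, only the SBSdk/SBSmw branch of the conifer-only tree is encoded (thresholds, probabilities, and rule numbers are copied from Table 14), and the branch for the other subzones is deliberately left empty.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    stocked: int   # 1 = stocked, 0 = not stocked
    prob: float    # probability of being stocked at this node
    rule: int      # rule number in Table 14


@dataclass
class Split:
    label: str
    test: Callable[[dict], bool]
    yes: Union["Split", Leaf, None]   # followed (left) when the test is true
    no: Union["Split", Leaf, None]    # followed (right) when the test is false


# Only the dk/mw branch is encoded; the other subzones are left as None.
TREE = Split(
    "SUBZVAR in {dk, mw}", lambda p: p["SUBZVAR"] in {"dk", "mw"},
    yes=Split(
        "DISTANCEnear >= 544.3", lambda p: p["DISTANCEnear"] >= 544.3,
        yes=Leaf(0, 0.17, 12),
        no=Split(
            "BASAL_AREA >= 39.34", lambda p: p["BASAL_AREA"] >= 39.34,
            yes=Leaf(0, 0.16, 13),
            no=Split(
                "SURFACE_EXPRESSION in {N, U}",
                lambda p: p["SURFACE_EXPRESSION"] in {"N", "U"},
                yes=Split(
                    "PROJ_HEIGHT_1 < 21.35",
                    lambda p: p["PROJ_HEIGHT_1"] < 21.35,
                    yes=Leaf(0, 0.26, 10),
                    no=Leaf(1, 0.79, 4),
                ),
                no=Leaf(1, 1.00, 1),
            ),
        ),
    ),
    no=None,
)


def follow(node, plot):
    """Walk from the root to a terminal node; return (leaf, path taken)."""
    path = []
    while isinstance(node, Split):
        answer = node.test(plot)
        path.append(f"{node.label}: {'yes' if answer else 'no'}")
        node = node.yes if answer else node.no
    return node, path


# Hypothetical plot that satisfies the rule written out in the text.
plot = {"SUBZVAR": "dk", "DISTANCEnear": 300.0, "BASAL_AREA": 25.0,
        "SURFACE_EXPRESSION": "M", "PROJ_HEIGHT_1": 18.0}
leaf, path = follow(TREE, plot)
print(leaf, path)
```

Note that in this sketch any surface expression other than N or U follows the right-hand branch, which is how a binary split behaves; the rule table lists M and P because those are the remaining codes in the data.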
The trees are interpreted (as per Figure 15) by following the splits in the branches: to the left if the split value in the tree is true and to the right if the split value in the tree is false, until a terminal node is reached. The root-to-terminal-node paths range from a minimum of two splits to a maximum of 14 splits.

[Figure 16: classification tree diagram]
Figure 16. Final pruned (using 10-fold cross-validation) classification tree model for predicting the probability of stocking by all tree species.

[Figure 17: classification tree diagram]
Figure 17. Final pruned (using 10-fold cross-validation) classification tree model for predicting the probability of stocking by conifer only species.

[Figure 18: classification tree diagram]
Figure 18. Final pruned (using 10-fold cross-validation) classification tree model for predicting the probability of MSSpa stocking.

The key to building the classification tree is the identification of the important variables used, as they are the only variables used in the resultant rules.
The order of appearance in the classification tree does not necessarily indicate the strength or overall importance of a particular variable. What is more important is the variable's overall contribution to the model (i.e., is the model worse without it?). The important variables for predicting the probability of stocking remained consistent across all three stocking groups. Although the classification model had access to 54 equally weighted potential predictors, only a subset was selected; Table 11 details the important variables (and therefore significant ecological factors) selected for each classification tree.

Table 11. List of important variables/significant ecological factors involved in the final model predicting the probability of stocking. MSSpa is the minimum stocking standard (various stems/ha) of preferred and acceptable species, particular to each site series in each BEC unit.

                                                                    Classification Tree
Variable/Ecological Factor                                     all trees   conifers   MSSpa
Biogeoclimatic Unit                                                ✓           ✓         ✓
Projected Height for Leading Species                               ✓           ✓         ✓
Leading Species Dead Volume/ha at 12.5 cm                          ✓           ✓         ✓
Elevation                                                          ✓           ✓
Basal area                                                         ✓           ✓         ✓
Projected Age for Leading Species                                  ✓           ✓         ✓
Species Composition of Nearest Non-pine Seed Source                ✓           ✓         ✓
Distance (m) to Nearest Non-pine Seed Source (SW direction)        ✓           ✓         ✓
Mean Annual Temperature                                            ✓           ✓
Surface Expression                                                 ✓           ✓         ✓
Mean Annual Precipitation                                                                ✓
Leading Species Live Volume/ha at 12.5 cm                                                ✓

The variable importance for each stocking group is based on the mean decrease in accuracy. The higher the mean decrease in accuracy, the more important the variable is for the overall predictive accuracy of the model (Table 12). The classification tree branches are translated into textual paths that can be interpreted as a collection of rules (Tables 13-15). They are listed in the order of their strength, i.e., highest probability in predicting the target variable (stocked or not stocked).
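The idea behind a mean decrease in accuracy ("is the model worse without it?") can be illustrated with a generic permutation test: shuffle one predictor, re-score the model, and average the loss in accuracy. The Python sketch below is an assumption-laden illustration only. It is not the computation that produced Table 12, the toy model and plot values are hypothetical, and the function names are invented for this example.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class equals the observed class."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def mean_decrease_accuracy(model, rows, labels, column, repeats=20, seed=0):
    """Average drop in accuracy after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        values = [r[column] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, column: v} for r, v in zip(rows, values)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / repeats


# HYPOTHETICAL toy model and plots, for illustration only.
def toy_model(plot):
    return int(plot["distance_m"] < 544.3 and plot["basal_area"] < 39.34)


plots = [
    {"distance_m": 100, "basal_area": 20, "elevation_m": 700},
    {"distance_m": 200, "basal_area": 30, "elevation_m": 900},
    {"distance_m": 900, "basal_area": 25, "elevation_m": 800},
    {"distance_m": 1200, "basal_area": 35, "elevation_m": 750},
    {"distance_m": 300, "basal_area": 50, "elevation_m": 850},
    {"distance_m": 800, "basal_area": 45, "elevation_m": 950},
]
observed = [toy_model(p) for p in plots]  # labels the toy model fits perfectly

importance = {c: mean_decrease_accuracy(toy_model, plots, observed, c)
              for c in ("distance_m", "basal_area", "elevation_m")}
print(importance)
```

In this toy setting the unused predictor (elevation) scores exactly zero, while shuffling either predictor the model relies on lowers accuracy, which is the behaviour the ranking in Table 12 is meant to capture.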
For example, there are 11 textual rules for the all tree species stocking group (listed in Table 13). The two strongest (highest probability) rules are presented as Predictive Rules Rank 1 and Rank 2.

Predictive Rule (Rank 1) probability = 1.00
biogeoclimatic unit = dk or mw
distance to the nearest non-pine seed source < 557 m
stand basal area < 39 m2/ha
surface expression = M (rolling) or P (undulating)

Following the tree in Figure 16, the above predictive rule terminates at node 23. The probability of 1.00 relates to the node text 1:100% of 8. The 1 indicates that the rule results in the prediction of a stocked cell. The 100% of 8 indicates that 8 of the 964 plots remain in the predictive solution at this point of the tree. Of those 8 plots, all are both actually stocked and predicted to be stocked, resulting in a 100% prediction or 1.00 probability. This set of rules has the highest confidence for predictive power on test plots. The rule set with the second strongest predictive power terminates at node 29 in Figure 16, where 13 of the 964 plots remain; it is set out as Predictive Rule (Rank 2) following Tables 12 through 15.

Table 12. Classification tree variable importance (in order of importance) for each stocking group. Order of importance is determined by the mean decrease in accuracy value; higher values indicate a more important variable for prediction.
All tree species stocking group
Rank   Variable Importance                                            MeanDecreaseAccuracy
1      Biogeoclimatic Unit                                            1.01
2      Distance (m) to Nearest Non-pine Seed Source (SW direction)    0.93
3      Projected Age for Leading Species                              0.93
4      Leading Species Dead Volume per Hectare at 12.5 cm             0.91
5      Basal area                                                     0.86
6      Projected Height for Leading Species                           0.86
7      Mean Annual Temperature                                        0.85
8      Species Composition of Nearest Non-pine Seed Source            0.83
9      Elevation                                                      0.81
10     Surface Expression                                             0.44

Conifer only stocking group
Rank   Variable Importance                                            MeanDecreaseAccuracy
1      Biogeoclimatic Unit                                            1.02
2      Distance (m) to Nearest Non-pine Seed Source (SW direction)    0.90
3      Leading Species Dead Volume per Hectare at 12.5 cm             0.90
4      Projected Age for Leading Species                              0.89
5      Basal area                                                     0.88
6      Projected Height for Leading Species                           0.87
7      Species Composition of Nearest Non-pine Seed Source            0.87
8      Mean Annual Temperature                                        0.86
9      Elevation                                                      0.84
10     Surface Expression                                             0.54

MSSpa stocking group
Rank   Variable Importance                                            MeanDecreaseAccuracy
1      Biogeoclimatic Unit                                            1.04
2      Leading Species Dead Volume per Hectare at 12.5 cm             1.01
3      Projected Age for Leading Species                              0.96
4      Basal area                                                     0.95
5      Distance (m) to Nearest Non-pine Seed Source (SW direction)    0.92
6      Projected Height for Leading Species                           0.92
7      Leading Species Live Volume per Hectare at 12.5 cm             0.91
8      Mean Annual Precipitation                                      0.88
9      Species Composition of Nearest Non-pine Seed Source            0.88
10     Surface Expression                                             0.59

Table 13. Translation of classification tree model for predicting the probability of stocking in all tree species 600 stems/ha stocking threshold into textual rules.
600 stems/ha (all trees)

Stocked
1   allsp600=1  prob=1.00  SUBZVAR=dk,mw  DISTANCEnear<557.4  BASAL_AREA<38.78  SURFACE_EXPRESSION=M,P
2   allsp600=1  prob=0.85  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=S,SB
3   allsp600=1  prob=0.75  SUBZVAR=dk,mw  DISTANCEnear<557.4  BASAL_AREA<38.78  SURFACE_EXPRESSION=N,U  PROJ_HEIGHT_1>=21.35
4   allsp600=1  prob=0.69  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT>=4.05  Elevation>=722.5
5   allsp600=1  prob=0.68  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT<4.05

Not stocked
6   allsp600=0  prob=0.37  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=AT,EP,SW,SX
7   allsp600=0  prob=0.31  SUBZVAR=dk,mw  DISTANCEnear<557.4  BASAL_AREA<38.78  SURFACE_EXPRESSION=N,U  PROJ_HEIGHT_1<21.35
8   allsp600=0  prob=0.21  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT>=4.05  Elevation<722.5
9   allsp600=0  prob=0.20  SUBZVAR=dk,mw  DISTANCEnear<557.4  BASAL_AREA>=38.78
10  allsp600=0  prob=0.17  SUBZVAR=dk,mw  DISTANCEnear>=557.4
11  allsp600=0  prob=0.06  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1<77.5

Table 14. Translation of classification tree model for predicting the probability of stocking in conifer-only 600 stems/ha stocking threshold into textual rules.
600 stems/ha (conifers only)

Stocked
1   conifer600=1  prob=1.00  SUBZVAR=dk,mw  DISTANCEnear<544.3  BASAL_AREA<39.34  SURFACE_EXPRESSION=M,P
2   conifer600=1  prob=0.90  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT<4.05  SPECIES_CD_1near=AT,EP,FD,FDI,S,SX  PROJ_HEIGHT_1<19.25
3   conifer600=1  prob=0.85  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=S,SB
4   conifer600=1  prob=0.79  SUBZVAR=dk,mw  DISTANCEnear<544.3  BASAL_AREA<39.34  SURFACE_EXPRESSION=N,U  PROJ_HEIGHT_1>=21.35
5   conifer600=1  prob=0.74  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT<4.05  SPECIES_CD_1near=BL,SB,SW,SXW
6   conifer600=1  prob=0.69  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT>=4.05  Elevation>=722.5
7   conifer600=1  prob=0.63  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT<4.05  SPECIES_CD_1near=AT,EP,FD,FDI,S,SX  PROJ_HEIGHT_1>=19.25  DISTANCEnear<182.4

Not stocked
8   conifer600=0  prob=0.40  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT<4.05  SPECIES_CD_1near=AT,EP,FD,FDI,S,SX  PROJ_HEIGHT_1>=19.25  DISTANCEnear>=182.4
9   conifer600=0  prob=0.36  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=AT,EP,SW,SX
10  conifer600=0  prob=0.26  SUBZVAR=dk,mw  DISTANCEnear<544.3  BASAL_AREA<39.34  SURFACE_EXPRESSION=N,U  PROJ_HEIGHT_1<21.35
11  conifer600=0  prob=0.21  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=77.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  MAT>=4.05  Elevation<722.5
12  conifer600=0  prob=0.17  SUBZVAR=dk,mw  DISTANCEnear>=544.3
13  conifer600=0  prob=0.16  SUBZVAR=dk,mw  DISTANCEnear<544.3  BASAL_AREA>=39.34
14  conifer600=0  prob=0.06  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1<77.5

Table 15.
Translation of classification tree model for predicting the probability of stocking in the minimum stocking standards (MSSpa) stocking threshold into textual rules.

MSSpa (preferred and acceptable)

Stocked
1   MSSpa=1  prob=1.00  SUBZVAR=dk,mw  BASAL_AREA<38.78  DISTANCEnear<557.4  SURFACE_EXPRESSION=M,P
2   MSSpa=1  prob=0.78  SUBZVAR=dk,mw  BASAL_AREA<38.78  DISTANCEnear<557.4  SURFACE_EXPRESSION=N,U  MAP>=510
3   MSSpa=1  prob=0.75  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=EP,S,SB
4   MSSpa=1  prob=0.71  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw2,mc3,mk1,wk1
5   MSSpa=1  prob=0.68  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw3,mc2  DEAD_VOL_PER_HA_SPP1_125<142.8
6   MSSpa=1  prob=0.59  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw3,mc2  DEAD_VOL_PER_HA_SPP1_125>=142.8  LIVE_VOL_PER_HA_SPP1_125>=72.22  SPECIES_CD_1near=EP,FD,SW  PROJ_HEIGHT_1>=24.75
7   MSSpa=1  prob=0.58  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw3,mc2  DEAD_VOL_PER_HA_SPP1_125>=142.8  LIVE_VOL_PER_HA_SPP1_125<72.22

Not stocked
8   MSSpa=0  prob=0.30  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125<42.33  SPECIES_CD_1near=AT,SW,SX
9   MSSpa=0  prob=0.24  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw3,mc2  DEAD_VOL_PER_HA_SPP1_125>=142.8  LIVE_VOL_PER_HA_SPP1_125>=72.22  SPECIES_CD_1near=AT,S,SB,SX
10  MSSpa=0  prob=0.21  SUBZVAR=dk,mw  BASAL_AREA<38.78  DISTANCEnear<557.4  SURFACE_EXPRESSION=N,U  MAP<510
11  MSSpa=0  prob=0.16  SUBZVAR=dk,mw  BASAL_AREA<38.78  DISTANCEnear>=557.4
12  MSSpa=0  prob=0.12  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1>=79.5  DEAD_VOL_PER_HA_SPP1_125>=42.33  SUBZVAR=dw3,mc2  DEAD_VOL_PER_HA_SPP1_125>=142.8  LIVE_VOL_PER_HA_SPP1_125>=72.22  SPECIES_CD_1near=EP,FD,SW  PROJ_HEIGHT_1<24.75
13  MSSpa=0  prob=0.12
SUBZVAR=dk,mw  BASAL_AREA>=38.78
14  MSSpa=0  prob=0.05  SUBZVAR=dw2,dw3,mc2,mc3,mk1,wk1  PROJ_AGE_1<79.5

Of the 964 plots, the 13 remaining in the predictive solution at this point of the tree are correct with 85% prediction, or 0.85 probability.

Predictive Rule (Rank 2) probability = 0.85
biogeoclimatic unit = dw2, dw3, mc2, mc3, mk1, or wk1
stand age > 78 yrs
stand estimated dead volume < 42 m3/ha
species of the nearest non-pine seed source = Spruce

The ROC evaluations for all three stocking groups indicate that my predictive model is a useful application, as all the ROC scores are above 0.83 (0.5-0.7 is considered low accuracy, 0.7-0.9 is considered useful, and ROC values > 0.9 indicate high accuracy) (Manel et al., 2001). The area under the ROC curve (AUC) represents the probability that the model will rank a randomly chosen positive instance (a stocked plot) higher than a randomly chosen negative one (a not stocked plot). The AUC scores are 84%, 83%, and 85% for all tree species, conifers only, and MSSpa, respectively (Figures 19, 20, and 21).

Applying the Classification Tree in a GIS Model

Maps illustrating the probability of stocking were generated for each of the three stocking groups per 1:250,000 map tile. A full series of maps, depicting the probability of stocking themed by five probability classes, for the 1:250,000 map tiles covering the study area is presented in Appendix 2.

[Figure 19: ROC curve for allsp600 = 1; x-axis: False Positive Rate (0.0 to 1.0)]
Figure 19. ROC curve chart illustrating the accuracy of the classification tree model (all tree species 600 stems/ha threshold) through area under curve (AUC) statistic.

[Figure 20: ROC curve for conifer600 = 1; x-axis: False Positive Rate]
Figure 20. ROC curve chart illustrating the accuracy of the classification tree model (conifer-only threshold) through area under curve (AUC) statistic.

[Figure 21: ROC curve for MSSpa = 1; x-axis: False Positive Rate (0.0 to 1.0)]
Figure 21.
ROC curve chart illustrating the accuracy of the classification tree model (minimum stocking standard threshold) through area under curve (AUC) statistic.

The probability of stocking tables indicate that approximately 63% of the study area (when measured by a stocking threshold of 600 stems/ha for all tree species) is likely to very likely stocked; approximately 60% of the study area (when stocking is measured by a 600 stems/ha threshold for conifers only) is likely to very likely stocked; and approximately 44% of the study area (when stocking is measured by MSSpa) is likely to very likely stocked (Tables 16, 17, and 18). Translated into area, this means that the study area (comprising 1.4 million ha of mature pine stands) can be expected to have up to 616,000 ha stocked with seedlings and saplings (measured by MSSpa), 833,000 ha stocked with seedlings and saplings (measured by 600 stems/ha all tree species), or 878,000 ha stocked with seedlings and saplings (measured by 600 stems/ha conifers only).

Table 16. Probability (as a percentage) of being stocked for each BEC unit derived from the classification tree rule set for conifer only 600 stems/ha stocking threshold.
Probability of Being Stocked (as percentage). Classes: 0-20 = Very Likely Not Stocked; 20.1-40 = Likely Not Stocked; 40.1-60 = As Likely Stocked As Not; 60.1-80 = Likely Stocked; 80.1-100 = Very Likely Stocked.

BEC unit     Stocking group   Mature Pine (ha)   0-20   20.1-40   40.1-60   60.1-80   80.1-100
SBSdk        all trees                  282539     40        16         0         5         39
SBSdk        conifers                              40        18         0         5         37
SBSdk        MSSpa                                 46        14         6        21         12
SBSdw2       all trees                   80876     18        15         7        36         23
SBSdw2       conifers                              17        13         0        49         21
SBSdw2       MSSpa                                 36        13        10        21         20
SBSdw3       all trees                  291425     15        15         9        32         29
SBSdw3       conifers                              15        13         0        47         25
SBSdw3       MSSpa                                 32        17         7        22         22
SBSmc2       all trees                  479996     16        15        11        27         31
SBSmc2       conifers                              26        11         0        37         27
SBSmc2       MSSpa                                 34        10         9        21         26
SBSmc3       all trees                   77284     15         7         9        31         38
SBSmc3       conifers                              41         5         0        26         28
SBSmc3       MSSpa                                 39        11        12        16         22
SBSmk1       all trees                  259730     14         6         5        37         38
SBSmk1       conifers                              14        11         0        37         38
SBSmk1       MSSpa                                 31        13         3        15         38
SBSmw        all trees                    5283     71        13         0         3         13
SBSmw        conifers                              70        17         0         3          9
SBSmw        MSSpa                                 92         0         0         6          2
SBSwk1       all trees                   22014     18         3         8        28         44
SBSwk1       conifers                              14        12         0        36         38
SBSwk1       MSSpa                                 23        29         1        29         17
Study Area   all trees                 1499147     20        13         7        26         33
Study Area   conifers                              25        12         0        33         30
Study Area   MSSpa                                 36        13         7        20         24

Table 17. Probability (as a percentage) of being stocked for each 1:250,000 NTS map tile derived from the classification tree rule set for conifer only 600 stems/ha stocking threshold.

NTS 1:250,000 Map tile   Stocking group   Mature Pine (ha)   0-20   20.1-40   40.1-60   60.1-80   80.1-100
93E                      all trees                  161479     20        27        16        11         27
93E                      conifers                              35        11         0        34         20
93E                      MSSpa                                 33         9        12        11         36
93F                      all trees                  360025     29        14         4        19         34
93F                      conifers                              38        13         0        16         33
93F                      MSSpa                                 41         9        10        24         16
93G                      all trees                  220916     21        14         7        33         25
93G                      conifers                              16        15         0        48         20
93G                      MSSpa                                 35        13         8        24         20
93J                      all trees                  278669     13         9         6        35         36
93J                      conifers                              14        12         0        38         36
93J                      MSSpa                                 29        18         4        13         36
93K                      all trees                  305215     16        14         9        26         35
93K                      conifers                              20        10         0        36         34
93K                      MSSpa                                 36        11         6        26         20
93L                      all trees                  172843     20        13         8        17         42
93L                      conifers                              23        15         0        34         29
93L                      MSSpa                                 39        16         5        16         24

Table 18.
Probability (as a percentage) of being stocked for each forest district derived from the classification tree rule set for conifer only 600 stems/ha stocking threshold.

Probability of being stocked (as percentage). Classes: 0-20 = very likely not stocked; 20.1-40 = likely not stocked; 40.1-60 = as likely stocked as not; 60.1-80 = likely stocked; 80.1-100 = very likely stocked.

Forest District              Stocking group   Mature Pine (ha)   0-20   20.1-40   40.1-60   60.1-80   80.1-100
Skeena Stikine               all trees                   21505     34         3        14        13         37
Skeena Stikine               conifers                              23        33         6         0         38
Skeena Stikine               MSSpa                                 28        18         3        34         18
Mackenzie                    all trees                    6769      6        23         9        21         41
Mackenzie                    conifers                               9        11         0        48         32
Mackenzie                    MSSpa                                 11        12         1        63         14
Fort St. James               all trees                  219378     13         5         6        38         38
Fort St. James               conifers                              16        10         0        40         33
Fort St. James               MSSpa                                 40         8         4        20         29
Nadina                       all trees                  554084     26        14         7        20         33
Nadina                       conifers                              33        13         0        27         27
Nadina                       MSSpa                                 42        12         7        17         22
Prince George                all trees                  285404     19         8         6        36         30
Prince George                conifers                              14        13         0        44         29
Prince George                MSSpa                                 31        16         3        17         33
Vanderhoof                   all trees                  376145     17        20         7        21         34
Vanderhoof                   conifers                              26        13         0        28         33
Vanderhoof                   MSSpa                                 30        13        12        27         18
Quesnel                      all trees                   22746     10        19        18        16         37
Quesnel                      conifers                              39        13         0        31         17
Quesnel                      MSSpa                                 38        19        15         5         23
North Island-Central Coast   all trees                   13115      9        18        40        18         14
North Island-Central Coast   conifers                              47         5         0        30         19
North Island-Central Coast   MSSpa                                 28         1        24        23         23

It is important to note that although study plots were located in the Skeena Stikine, Mackenzie, and North Island - Central Coast forest districts, only a fraction of these three forest districts intersected with the study area of interest (Figure 22). Caution should be used when drawing any conclusions involving the probability of stocking within a forest district context.

[Figure 22: map of forest districts; legible label: Mackenzie (0.6%)]
Figure 22. Map illustrating the proportion of forest districts that are within the study area (NTS 1:250,000 93E, F, G, J, K, L).

Figure 23 is an example of a 1:250,000 map predicting the probability of MSSpa stocking in mature pine-leading stands for map tile NTS 93G.
The five colours in the legend correspond to a likelihood of stocking (red = 0-20%, orange = 20-40%, yellow = 40-60%, light green = 60-80%, and dark green = 80-100%). This map and those in Appendix 2 also have a shaded relief in the background for elevation context, major lakes (blue polygons), and major roads (symbolized by black lines).

[Figure 23: map titled "Predicted Locations and Stocking Probability of Advance Regeneration Under Mature Pine Stands in Central British Columbia", tile 93G; legend: Probability of being Stocked (0-20%, 20-40%, 40-60%, 60-80%, 80-100%), 600 stems/ha conifers only; scale bar 0 to 20 km]
Figure 23. Colour themed map depicting probability of being stocked for conifer only 600 stems/ha stocking threshold within NTS 1:250,000 map tile 93G.

Once the classification tree results are in a geospatial environment, specific probability of stocking percentages can be displayed in isolation from the remaining probability classes and can be given spatial context. By placing large areas of "very likely" and "very unlikely" stocking in a spatial context, the data can be used as a GIS layer for more sophisticated analysis regarding forest management. For example, Figure 24 shows the isolation of areas that have a >80% probability of stocking. These isolated areas can be easily identified as cells that are very likely to recover naturally without intervention. Conversely, large areas can also be identified as priorities for salvage or rehabilitation operations. Figure 25 shows that by isolating areas that have a <20% probability of stocking, areas that may require rehabilitation (including planting) can be targeted. The isolated areas can be combined with datasets such as proximity to mill sites to determine economic viability. Further, they may be targeted for rehabilitation, i.e., treatment in which trees are knocked down and piled, then either ground for pellet fuels or other bioenergy (depending on markets) or burned, after which the site is planted.
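The step of theming cells into five probability classes and then isolating the extremes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GIS workflow used in the thesis: the class labels and colours follow Tables 16 to 18 and the map legend, the 3 x 3 grid of probabilities is hypothetical, and the function name is invented for this example.

```python
from bisect import bisect_left

# Upper bounds (as proportions) of the first four classes; the fifth is open.
UPPER_BOUNDS = [0.20, 0.40, 0.60, 0.80]
CLASSES = [
    ("very likely not stocked", "red"),
    ("likely not stocked", "orange"),
    ("as likely stocked as not", "yellow"),
    ("likely stocked", "light green"),
    ("very likely stocked", "dark green"),
]


def stocking_class(prob):
    """Map a probability of stocking (0 to 1) to its class label and colour."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return CLASSES[bisect_left(UPPER_BOUNDS, prob)]


# HYPOTHETICAL 3 x 3 grid of cell probabilities, for illustration only.
grid = [
    [0.06, 0.17, 0.85],
    [0.40, 0.63, 1.00],
    [0.26, 0.90, 0.79],
]

# Cells to leave for natural recovery (>80%) and candidate
# rehabilitation cells (<20%), as (row, column) positions.
very_likely = [(r, c) for r, row in enumerate(grid)
               for c, p in enumerate(row) if p > 0.80]
very_unlikely = [(r, c) for r, row in enumerate(grid)
                 for c, p in enumerate(row) if p < 0.20]
print(very_likely, very_unlikely)
```

Comparing proportions directly against the class bounds, rather than multiplying by 100 first, avoids a floating-point pitfall in which a value such as 0.40 lands in the wrong class.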
The large contiguous areas classified as very likely stocked (green pixels in Figure 24) become an important GIS layer with regard to coordinating logging activity. Perhaps these stands could be allocated as no logging or rehabilitation zones. It is recommended that these stands be identified for natural (unaided) recovery. They may be important as a mid-term supply of timber, or for habitat value.

[Figure 24: map of tile 93G; legend: Probability of being Stocked 80-100%, >600 stems/ha conifers only; scale bar 0 to 20 km]
Figure 24. Spatial extent of probability of stocking for map tile NTS 93G. Green areas denote >80% probability of stocking (conifers-only).

[Figure 25: map of tile 93G; legend: Probability of being Stocked 0-20%, >600 stems/ha conifers only; scale bar 0 to 20 km]
Figure 25. Spatial extent of probability for map tile NTS 93G. Red areas denote <20% probability of stocking (conifer-only).

Chapter Four: Discussion and Recommendations

The main objective of my thesis was to develop a model predicting the distribution of advance regeneration using publicly available data. Many of the initial inputs to my model have been explored for their predictive strength in previous studies. The resultant model was applied within a study area (NTS 1:250,000 93E, F, G, J, K, L map tiles) and probability maps were constructed (Appendix 2). As per the Establishment to Free Growing Guidebook (BC Ministry of Forests, 2000), MSSpa criteria and thresholds vary with BEC unit and site series. These conclusions are consistent with several current advance regeneration publications such as Vyse et al. (2009), who found that more than half of all plots surveyed in lodgepole pine stands in the Kamloops Timber Supply Area exceeded a threshold of 600 stems/ha.
This is echoed in a study by Nigh et al. (2008), which states that over half the stands sampled in the Montane Spruce zone of southern British Columbia had enough advance regeneration (>1000 stems/ha) to form new stands of adequate density. Another study reports that 44% to 98% of stands contained sufficient stems after MPB attack to be considered stocked (Hawkins et al., 2012). A survey of pre-harvest industrial records has indicated that the percentage of pine stands with greater than 600 stems/ha is highest in the moist cool (mk) subzone, followed by the moist cold (mc) subzone, and then by the dry warm (dw) and dry cool (dk) subzones of the SBS (Burton, 2006). Studies also suggest that SBSdk is unlikely to provide a significant contribution to the mid-term timber supply; in contrast, the SBSdw and SBSmc are thought to be contributors to both the mid- and long-term supplies (Hawkins and Rakochy, 2007).

The overall distribution and probability of stocking predicted here align well with these conclusions. A closer examination of the resultant data (tabular and spatial) provides evidence for this claim. The SBSmk had the largest likely/very likely stocked probability (as high as 75% for the all trees stocking group) of all BEC units. The results also support the assessment by Hawkins and Rakochy (2007) that SBSdw and SBSmc will likely be the largest contributors to mid- and long-term timber supplies. In the conifer only stocking group, SBSdw has a 71.2% likely/very likely stocked probability (~208,000 ha) and SBSmc2 and SBSmc3 have a 63.3% (~304,000 ha) and 53.8% (~42,000 ha) likely/very likely stocked probability, respectively. The total area associated with these probabilities is ~554,000 ha of potential advance regeneration >600 stems/ha, or roughly one-third of the total area of the study area.

A closer examination of the colour-themed maps reveals the utility of translating the tree/text based rules into a geospatial output.
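The area figures quoted above follow from multiplying a likely/very likely stocked share by the mature pine area of the BEC unit. The Python sketch below reworks that arithmetic for SBSmc2 and SBSmc3 using the areas in Table 16, and then sums the rounded areas reported in the text. It is a worked check only; the SBSdw figure is taken as reported rather than recomputed, and the variable names are illustrative.

```python
# Mature pine area (ha) per BEC unit, from Table 16.
MATURE_PINE_HA = {"SBSmc2": 479_996, "SBSmc3": 77_284}
STUDY_AREA_HA = 1_499_147

# Likely/very likely stocked share (conifer-only group), as quoted in the text.
SHARE = {"SBSmc2": 0.633, "SBSmc3": 0.538}

estimated_ha = {unit: SHARE[unit] * MATURE_PINE_HA[unit] for unit in SHARE}

# Rounded areas as reported in the text (SBSdw is taken as reported).
reported_ha = {"SBSdw": 208_000, "SBSmc2": 304_000, "SBSmc3": 42_000}
total_ha = sum(reported_ha.values())
share_of_study_area = total_ha / STUDY_AREA_HA

print({unit: round(ha, -3) for unit, ha in estimated_ha.items()})
print(total_ha, round(share_of_study_area, 2))
```

The recomputed SBSmc2 and SBSmc3 areas round to the reported 304,000 ha and 42,000 ha, and the 554,000 ha total works out to about 0.37 of the 1,499,147 ha study area.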
The maps present the probability of stocking in five coloured classes, and because the scale of the map is known, a visual estimate of the size of contiguous areas, or of the distance between neighbouring areas, can be made for any of the probability classes. For example, large areas of likely/very likely stocked polygons that are interrupted by smaller areas very unlikely to be stocked (Figure 26) can be identified and subsequently visited and assessed using ground-based advance regeneration grid sampling.

[Figure 26: map excerpt; legend: Probability of Stocking (0-20%, 20-40%, 40-60%, 60-80%, 80-100%)]
Figure 26. A portion of the mapped final model that illustrates the intersection of large very likely stocked areas (green pixels) with very unlikely stocked cells (red pixels).

The results generated from the classification tree model also support the preliminary results in Hawkins and Rakochy (2007), where it is reported that there was measurably less regeneration in the SBSdk and SBSdw2 than in the SBSdw3 and SBSmc. My classification tree model shows that the predicted probability of the MSSpa group being stocked in SBSdk (12%) is close to half of the probability in SBSdw (25%) and SBSmc (22%). Vyse et al. (2009) also state that the mean density of stems increased with moisture and elevation, both critical elements in determining the biogeoclimatic subzone of a site. This generalization is further supported by my findings, as mean annual precipitation and BEC unit are defined as important variables in the resulting predictor model.
The resultant model indicated that the following variables are key predictors in modelling stocking status: BEC unit, distance to the nearest non-pine seed source in a southwest direction, projected age of the leading species, basal area, leading species of dead volume per hectare at 12.5 cm, surface expression, species composition of the nearest non-pine seed source, mean annual temperature, mean annual precipitation, projected height of the leading species, and elevation. The key variables selected through recursive partitioning are consistent with other study conclusions, specifically with regard to overstory height (Griesbauer and Green, 2006), overstory mortality (leading species dead volume; Lewis, 2011), basal area (Coates and Sachs, 2012; Nigh et al., 2008), distance to nearest seed sources (LePage et al., 2000; Kaufmann et al., 2008), and moisture (precipitation; Kayes and Tinker, 2012).

Application to Forest Management

A landscape-level planning tool is useful for several different aspects of forest planning and management. At the provincial level, this study's model can assist in the implementation of the Forests for Tomorrow Current Reforestation and Timber Supply Mitigation Strategic Plan 2011-2015 (Ministry of Forests, Lands, and Natural Resource Operations, 2011) by providing critical data to meet several goals. First and foremost, the output of my model (hardcopy maps and textual rule sets) can assist forest managers in the establishment of new and up-to-date assessments of potential timber supply. Tools such as this can play a part in helping estimate the new baseline that will be used for forecasting and planning. The need for current information is echoed in a June 8, 2012, memo from the Inventory Section, Forest Analysis and Inventory Branch: Approach of the inventory program in 2012-13 to improve inventory information in MPB-affected management units (Ministry of Forests, 2012).
The working memo states that the provincial inventory program is placing a high priority on improving information related to MPB-affected areas. The memo specifically states that the advance regeneration in the understory is critical to mid-term timber supply and that the assessment of stocking under these MPB-affected stands is important. The model developed here can play a large role in assisting inventory personnel to "short-list" stands that may potentially have this critical mid-term timber supply. The scale of the output is not necessarily fine enough to generate reliable determinations of stand-level stocking, but it is highly useful for directing field validation programs. On a landscape level, the resultant stocking data can be used for more general land-use and resource management plans. By examining the data, areas of sporadic distribution, large anomalies, and "salt and pepper" occurrences can be identified (Figure 27).

Figure 27. Landscape view of mapped model output. Blue circles indicate large anomalies that are easily identified at a broad scale. Legend: probability of stocking in five classes (0-20%, 20-40%, 40-60%, 60-80%, 80-100%).

The tabular results from exercising the classification tree have generated important information about how many hectares are likely/very likely to be stocked within a defined area. They are, however, limited in their explanatory power, as there is a distinct difference between knowing how many hectares have a >80% (very likely to be stocked) probability of stocking and knowing where these very likely to be stocked areas are on the landscape. Importing the tabular stocking information into a GIS allowed me to spatially locate how the five stocking classes are distributed across the landscape. The ability to spatially locate each of the probability of stocking classes enhances the decision-making power of my decision model.
Consider the following example, where a forest manager wants to understand the advance regeneration attributes of a particular management area (Figure 28). Knowing the total hectares of the management area (65180 ha) and total hectares of mature pine within the management area (29815 ha), the forest manager is now in a position to estimate the number of hectares of mature pine that could possibly have advance regeneration. According to the areas reported, the amount of advance regeneration cannot exceed 45% of the management area (or ~29,000 ha) as there are only 29,000 ha of mature pine within the area. Assume further that a classification tree model has been used to calculate the probabilities of stocking as per the following: 1) 4858 ha are very unlikely to be stocked; 2) 3210 ha are unlikely to be stocked; 3) 0 ha are equally likely to be stocked as not stocked; 4) 10280 ha are likely to be stocked; 5) 11467 ha are very likely to be stocked.

Figure 28. Example of a management area of interest in which a forest manager would like to calculate the amount of advance regeneration under mature pine forest stands by using a probability of stocking model. The grey polygons indicate mature pine forest stands.

The detailed information regarding areas of stocking likelihood provides the forest manager with valuable information regarding the amount of potential advance regeneration in the management area. Without spatial context, however, the planner can only know the probability of stocking with regard to advance regeneration. An assumption regarding the spatial distribution of the likely or unlikely stocked areas cannot be made with the information in hand, i.e., the manager can only know how much stocking is in the management area but not whether the stocking is restricted to one particular area or widely dispersed throughout the area.
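The arithmetic behind the management-area example can be checked with a short script. This is a minimal sketch using only the hectare figures quoted in the example; the variable names are illustrative and are not taken from the study's workflow. It shows that the five class areas add up to the mature pine area and that mature pine makes up a little under 46% of the management area (the figure the text reports as 45%).

```python
# Hectare figures quoted in the worked example; names are illustrative only.
class_areas_ha = {
    "very unlikely": 4858,
    "unlikely": 3210,
    "equally likely": 0,
    "likely": 10280,
    "very likely": 11467,
}
management_area_ha = 65180
mature_pine_ha = 29815

# The five probability classes should account for all mature pine.
total_classified_ha = sum(class_areas_ha.values())

# Upper bound on advance regeneration as a share of the management area.
pine_share = mature_pine_ha / management_area_ha

# Area in the two most favourable classes, and its share of mature pine.
likely_or_better_ha = class_areas_ha["likely"] + class_areas_ha["very likely"]
likely_share_of_pine = likely_or_better_ha / mature_pine_ha

print(f"classified area: {total_classified_ha} ha")
print(f"mature pine share of management area: {pine_share:.1%}")
print(f"likely or very likely stocked: {likely_or_better_ha} ha "
      f"({likely_share_of_pine:.1%} of mature pine)")
```

The totals say how much of the mature pine is probably stocked but nothing about where it is, which is the gap the mapped output is meant to fill.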
By providing a geospatial context to the probability information, a much clearer and more complete picture is provided with regard to stocking (Figure 29). Patterns of large contiguous areas of very unlikely and likely stocking become evident. Large patches can be quickly identified and used to supplement landscape level plans. This information can be used as a landscape level planning tool for activities such as annual allowable cut (AAC) allocation, post-MPB management planning, and even identification of mid- and long-term timber supply stands suitable for future research projects. In a VRI newsletter, Martin (2012) identifies one of the goals for inventory information as knowing more about stocking by gathering information regarding small trees under a dead overstory. The primary goal of the BC Government's "Forests for Tomorrow" program, namely to improve the mid- and long-term timber supply and establish resilient forest ecosystems, specifies the need to focus on restoration and reforestation through the identification of sites which have had the greatest negative impact on the mid- and long-term timber supplies.

Figure 29. Example of a BEC area of interest in which a forest manager would like to calculate the probability of stocking. Probability of stocking is depicted in the five coloured classes (0-20%, 20-40%, 40-60%, 60-80%, 80-100%). Spatial patterns of stocking are evident when viewing data in a geospatial context.

To effectively mitigate mid-term timber supply shortfalls, it is imperative to have an overarching plan that helps differentiate between stands that have sufficient advance regeneration (and therefore represent the mid-term supply) and stands that will not be contributing to the mid- to long-term timber supply. Stands that do not have sufficient advance regeneration can be identified through the use of a landscape level model and then targeted for harvesting and subsequent replanting.
Alternatively, a working knowledge of the percentage of mature pine without adequate advance regeneration may be useful for old forest retention in the quest for biodiversity conservation and preservation of critical wildlife habitat. Large areas of mature pine that are classified as very unlikely to be stocked offer no benefit to the mid- and long-term timber supply. These areas can be prioritized for logging efforts, effectively removing post-MPB stands from the inventory without removing any potential mid- and long-term timber represented as advance regeneration, especially large areas classified as very unlikely to be stocked (Figure 30). Roads (red lines with black borders in Figure 30) and existing cutblocks (thick black lines) are added to the maps to provide context for accessibility. The red pixels represent areas very unlikely to be stocked. Goal three in the "Forests for Tomorrow" strategic plan is to develop and implement innovative approaches to reforesting forests damaged by catastrophic disturbance. One of the strategies to achieve this is to engage stakeholders in the development of a strategic and tactical plan. Whether the stakeholders are the Province, the licensees/stewards of the forest, non-traditional users of the forest, or the public, strategic plans must begin with the most current and accurate knowledge possible. The predictive model can help provide a region-wide overview of where the highest potential for mid- and long-term timber supply is located. A GIS-ready dataset of this scale can be used as an informative base layer for refining existing land use plans.

Figure 30. A portion of the final model map that illustrates the location of large, very unlikely to be stocked mature pine areas (red pixels). These areas can be targeted for large scale logging and silviculture programs. Legend: probability of stocking in five classes (0-20%, 20-40%, 40-60%, 60-80%, 80-100%).
Griesbauer and Green (2006) identify two potentially dangerous and expensive scenarios associated with the perceived need to tend advance regeneration: supplementing advance regeneration through planting of unstocked patches, and the requirement for thinning of overstocked understories. Both present a potential danger in the form of falling MPB-killed snags and would require the removal of snags for safety reasons. For the planting scenario, Griesbauer and Green (2006) suggest restricting planting to highly productive sites that have little advance regeneration. The predictive model outputs can be used as a GIS overlay to help forest managers find the intersection between highly productive sites (site index layer) and probability of stocking (generated by the classification tree model). This model may help to economically identify areas that could have regeneration augmented through planting. Conversely, stands predicted to be stocked on highly productive sites may also be identified and flagged for field inspection.

Assessment of the Modelling Approach

R (R Core Team, 2012), DTREG (Sherrod, 2006), and ArcGIS (ESRI, 2011) were used to establish an intuitive workflow to create fully attributed feature classes using publicly available digital data input. The resulting feature classes have correct projection and coordinate systems, are topologically clean and fully attributed, and are GIS-ready for input as a working dataset. The working parts of the model are extensive yet invisible to the user. The users cannot add more tools or explanatory variables to the model without building a new model, as this would jeopardize the integrity of the classification tree solution. To create a stocked/not stocked map for another study area or for pine stands that do not fit the mature pine criteria, the models would have to be recreated and re-run in both R and ArcGIS.
This working model is interpolative, in that its training data were sparsely distributed and it cannot make predictions with regard to advance regeneration stocking beyond the limits of the data used to create the model. In the case of my model, the limitations lie more in the organization and definition of variables than in quantitative maximum and minimum values. It would be misleading to claim that the model developed in this study is unique. Other software packages have been used to predict natural regeneration under mountain pine beetle attacked stands, most notably SORTIE and PROGNOSISBC (Smith, 1990; Ferguson and Carlson, 1993; Ribbens, 1994; Sattler, 2009). The distinction, however, is not in the intended application of the model but rather in the scale and ownership of the model input variables and the ultimate utility of the model output. Although those are mechanistic simulation models, and the work reported here resulted in an empirical statistical model, perhaps the key distinction is the emphasis on the ability to map predicted stocking across a broad region. My study input data are derived solely from publicly available geospatial data that can be downloaded from provincial online repositories or requested from BC government agencies such as British Columbia's Land and Resource Data Warehouse (LRDW) and its Integrated Land Management Bureau (ILMB). Most forest planners and managers with access to the internet or an internal data warehouse have access to my model's inputs. The model does not require additional field data collection on the part of the analyst. SORTIE, a spatially explicit, mixed species forest dynamics simulator, relies heavily on complex field-based measurements of light transmitted through forest canopies and gaps (Canham et al., 1999). PROGNOSISBC, a growth and yield model, is fueled by tree counts within a stand and dbh measurements for each tree, and requires one record of data per tree in the input .txt files.
Although PROGNOSISBC and SORTIE-ND modelling may be superior in their ecological portrayal of the mechanisms of stand development, they require considerable detail in their input data, which limits their application to case study simulations in a few representative stand types. LeMay et al. (2002) used PROGNOSISBC to examine the natural regeneration beneath complex stands in the Interior Douglas-Fir biogeoclimatic zone in the Nelson, Kamloops, and Cariboo Forest Regions. Their final report states that PROGNOSISBC is best suited for models with a ground-based inventory. Similarly, the data needed to feed a SORTIE-ND and PROGNOSISBC hybrid model for predicting natural regeneration in MPB-attacked stands in central and southeastern BC necessitated the collection of ground-based measurements such as individual tree dbh, total tree height, height to live crown, maximum crown diameter, and ratio of live crown to tree height (Sattler, 2009). In contrast, the landscape-level geospatial approach developed here can be applied across a large range of geographic variability and forest stand types with a simple probability of occurrence output. The model uses coarsely collected variables for its input (a majority derived from Vegetation Resource Inventory GIS files), and consequently sacrifices individual stand anomalies for a broad-brush overview of all stands within a study area of interest. This empirical model does not simulate or otherwise address specific understory dynamics (e.g., in terms of growth release, or competition among trees or with brush). The ultimate intention of the model is to create the most parsimonious algorithm derived from the most publicly available data. The more complicated the model becomes, the more limited its function. The process of creating predictive models for the field of ecology is highly contentious. At times, models are built with good intentions only to fall into disuse, or worse, misuse (Bunnell, 1989).
There are multiple ways to look at modelling ecological processes, and therefore there are multiple ways to build a model. The inherent strength of my decision tree model is that it accommodates the possibility of multiple models by creating multiple branches of rules that all represent a possible predictive algorithm, with each branch assigned a probability of accuracy. A single regression model could provide a single parsimonious equation that draws from the most significant variables as they apply to the dataset used to create the model. This, however, "locks" the model into a coefficient + variable_1 + variable_2 + ... + variable_n format, resulting in a single equation to fit the snapshot of data used to build it. With a static list of key predictor variables (and coefficients associated with these variables), the model weakens when key variables are missing from subsequent datasets. Through concepts such as surrogate splitters, the decision tree model can adapt to missing input data and can result in many rule sets with many combinations of contingency that result in a stand, or indeed parts of a forest stand, being stocked with advance regeneration or not. The forest response to MPB or other canopy disturbances is complex, and therefore cannot be easily modelled with a simple catch-all equation. Many stands sharing the same characteristics may respond differently to disturbance. A model that allows for combinations of contingency provides a more multi-faceted approach to predicting the probability of stocking. With each statistical method used to create models come multiple assumptions, limitations, and biases. The temptation to explore more complex interactions between variables, and therefore create more complex models, is fueled by the ease of data accessibility and increased computing power. The more complex the model, the more difficult it becomes to evaluate (Bunnell, 1989; Kimmins, 2005).
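The idea of a surrogate splitter can be illustrated with a toy routing function for a single tree node. This is a hedged sketch of the general concept only, not the rules produced by the study's model: the choice of projected height as a stand-in for projected age, both thresholds, and the default direction are hypothetical values invented for illustration.

```python
from typing import Optional


def route(stand: dict) -> str:
    """Send a stand left or right at one hypothetical tree node.

    All split variables and thresholds here are illustrative assumptions.
    """
    age: Optional[float] = stand.get("projected_age")
    if age is not None:
        # Primary splitter: used whenever the value is available.
        return "left" if age < 120 else "right"

    height: Optional[float] = stand.get("projected_height")
    if height is not None:
        # Surrogate splitter: a second variable whose split mimics the
        # primary one, used only when the primary value is missing.
        return "left" if height < 20 else "right"

    # Neither value is available: fall back to a default direction
    # (assumed here to be the branch that received most training stands).
    return "right"


print(route({"projected_age": 100}))                            # left
print(route({"projected_age": None, "projected_height": 25}))   # right
print(route({}))                                                # right
```

A regression equation has no comparable fallback: if a term in the equation is missing, the prediction cannot be computed without imputing a value first.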
Just because a complex model can be built doesn't necessarily mean it should be built. It is important to recognize that even the most inappropriate and unrelated variables thrown together into a modelling process may result in a predictive algorithm. As with parametric models, the addition of variables (overfitting) tends to increase the model fit. Variables, however, that have little connection to each other or to the ecological process that they are trying to model tend to create a "snapshot" model. A model that fits only the dataset from which it was derived therefore defeats the predictive purpose of the model. The reliance on predictor variables that were derived from static snapshot datasets runs into problems when validation is required, and in most cases the model misfires (Thompson, 1995; Graham, 2001; Knapp and Sawilowsky, 2001). The variables used in the development of my model were based on known or hypothesized ecological mechanisms and results from preliminary studies. A second consideration in developing predictive models is the temptation to prioritize all the variables determined from a purely a priori approach. This is best described by Thompson's basketball team paradox. Thompson (1995) suggests that modelling is similar to a basketball team picking the best player first, the second-best player second (in the context of the first player's strengths and weaknesses), etc. This is in direct contrast to the "all possible players" approach used in constructing a second team, where the five players that play together best as a team are selected. This leaves the distinct possibility that the second team may have a roster that does not include one of the first team players. He goes further to state that the "best team," although possibly comprised of weaker players, may still be stronger as a team than one made of all-stars selected through a purely linear process (Thompson, 1995).
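Thompson's paradox can be made concrete with a toy comparison of stepwise (greedy) selection against an exhaustive search over all pairs. This is only a sketch: the player names and every score below are made-up numbers chosen to show the effect, and a team size of two is used to keep the example small.

```python
from itertools import combinations

# Hypothetical scores: how well each player performs alone,
# and how well each pair performs together.
solo = {"A": 10, "B": 6, "C": 5}
pair = {
    frozenset("AB"): 12,
    frozenset("AC"): 11,
    frozenset("BC"): 15,
}


def greedy_pair() -> frozenset:
    """Pick the best individual first, then the best partner for that pick."""
    first = max(solo, key=solo.get)
    rest = [p for p in solo if p != first]
    second = max(rest, key=lambda p: pair[frozenset((first, p))])
    return frozenset((first, second))


def best_pair() -> frozenset:
    """Evaluate every possible pair and keep the strongest team."""
    return max((frozenset(c) for c in combinations(solo, 2)), key=pair.get)


print(sorted(greedy_pair()), pair[greedy_pair()])  # ['A', 'B'] 12
print(sorted(best_pair()), pair[best_pair()])      # ['B', 'C'] 15
```

The strongest team leaves out the strongest individual, which is the situation described for predictor variables that look weak alone but perform well in combination.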
It was important in the development of this study's model that all combinations of variables were tested to ensure that the "best team" was selected. This can be illustrated in my thesis by examining the potential key variable distance from nearest non-pine seed source. The r² value of density of advance regeneration when plotted against distance to seed source was only 0.0064 (P = 0.007, n = 964). This means that as a single variable considered alone, distance to seed source explains less than 1% of the variance in regeneration density. Yet it was one of the top five important variables, i.e., a factor where a split in the decision tree occurred, for all three stocking groups (and one of the top two for conifer only and all trees, second only to BEC unit in both cases). When taken into account with other predictors, the strength of distance to seed source increases and its importance as a primary splitter in the decision tree becomes evident. The goal of predictive ecological models should be intuitiveness, parsimony, and extensibility, namely the ability to add on to or modify the model without breaking what is already there. The final key variables included in my study's model are ecologically sound and publicly available data that form an intuitive, parsimonious model that has both explanatory and predictive value.

This study began with two main objectives: to create an accurate predictive model of understory stocking probability in mature pine-leading stands for forest planners to use in mid- to long-term planning initiatives, and to create cartographic and tabular output portraying the results of the predictive model. The first objective was accomplished by collecting advance regeneration data in the summers of 2006 and 2007 and combining it with collaborator data to produce an extensive, geo-referenced data library. The second objective was achieved by creating a recursively partitioned classification tree model using R rpart.
The model input variables were limited to readily and publicly available data sources. The final model was composed of data from the Vegetation Resource Inventory, ClimateBC, and the biogeoclimatic ecosystem classification system. The resulting model achieved 78% accuracy in correctly predicting stocked and non-stocked locations in mature pine-leading SBS forests. For the final objective, the predictive model generated using the R package rpart was applied against a study area by creating a geospatial model in ArcGIS 10. The geospatial model partitions the data into the following probability of being stocked classes: 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%. Area statistics were gathered for each membership class and a large-format colour-themed map was generated to display the results for each NTS 1:250,000 map tile. The variables used in the most parsimonious predictive model are consistent with those reported as ecologically important in a number of similar studies (e.g., Ferguson, 1984; Murphy et al., 1998; Sattler, 2009; Coates et al., 2009). With BC's interior timber supply compromised by one of the largest disturbances in history, forest managers and planners are implementing strategies to help understand the mid- and long-term implications. These implications reach further than timber supply and will directly impact wildlife, carbon storage, hydrology, and tourism (Coates et al., 2009). Most of these strategies require current and accurate knowledge of the forest land base in the wake of the mountain pine beetle outbreak. The need for inventory data, not only on what is in the canopy but also on what is beneath it, is obvious. Hopefully, the provincial government is paying close attention to new inventories, or their proxy in the form of modelling output, before it proceeds with new harvesting, rehabilitation, and silviculture regulations.
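The partitioning of predicted probabilities into the five stocking classes, and the tally of area per class, can be sketched as follows. This is an illustration rather than the ArcGIS model used in the study: the function names are hypothetical, the handling of values that fall exactly on a class boundary is an assumption (they are assigned to the higher class), and each cell is assumed to be a 100 m pixel, i.e., 1 ha.

```python
from collections import Counter
from typing import Dict, Iterable

CLASS_LABELS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]


def stocking_class(p: float) -> str:
    """Map a probability of stocking (0.0 to 1.0) to one of five classes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Assumption: boundary values go to the higher class; 1.0 stays in the top class.
    index = min(int(p * 5), 4)
    return CLASS_LABELS[index]


def area_by_class(probabilities: Iterable[float], cell_ha: float = 1.0) -> Dict[str, float]:
    """Sum the area (ha) of cells falling in each class."""
    counts = Counter(stocking_class(p) for p in probabilities)
    return {label: counts[label] * cell_ha for label in CLASS_LABELS}


# Four hypothetical 1 ha cells.
print(area_by_class([0.10, 0.90, 0.95, 0.50]))
```

Keeping the class lookup separate from the area tally means the same classification can feed both the tabular summaries and a per-cell map layer.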
Chapter Five: Conclusions

The results of my thesis show that one way to predict the presence or absence of advance regeneration beneath mature pine stands in BC's northern interior is to create a model that works with the complex scenarios supportive of understory development, rather than trying to simplify them. The response of mature pine to the MPB events of the last decade is complicated, and therefore requires a multiple model approach to prediction. The classification decision tree developed in this thesis provides a multiple branch model that proved to be 78% accurate in predicting whether or not a stand was stocked with seedlings and saplings (>600 stems/ha). Further, a geospatial link between the predictive model and GIS provides an enhanced understanding of the prediction, assisting forest managers to discover spatial patterns and the clustering or dispersion of seedling and sapling stocking at a landscape level. The work presented in my thesis contributes to the ongoing effort to better understand British Columbia's forests post-MPB. Several studies have examined the key variables involved in predicting advance regeneration at a stand level. The model developed in this thesis fills a gap in knowledge with regard to understanding the likely distribution of understory stocking at a landscape level. An accurate broad-brush tool is necessary in assisting with broad-brush management strategies. My predictive model is built to help forest managers make decisions at the landscape level. Although my model output generates a resultant probability of stocking for pixels at a 100 m size, it is the patterns of aggregated pixels that contain the information and not the single pixels themselves. The results of this thesis also contribute to the scientific community with the introduction of recursive partition analysis for stocking presence evaluation.
Although not a new technique, it is presented here as a viable alternative to the more traditional step-wise regression statistical techniques. A second contribution, and an area for future research, is the integration of recursive partitioning and GIS. By providing geospatial awareness to predicted stocking probabilities across a landscape-wide area of interest, patterns previously unseen at a local scale may now emerge and potentially drive new research. By using the outputs of my model, forest planners may also have the information needed to augment silviculture strategies such as variable-retention harvesting. Variable-retention harvest systems have been developed as one method for creating more natural stand structures, particularly in forest types that rarely experience stand-replacing natural disturbances such as severe forest fires. Variable-retention harvest systems retain living and dead structural elements of the pre-harvest stand to restore structural complexity in managed forestlands (Franklin et al., 1997). The relative novelty of formal variable-retention harvest systems, however, leaves many questions about their impacts on stand development unanswered. Simple questions such as how different patterns of overstory retention influence the development of regeneration and growth in residual trees cannot be readily answered for most forest types (Gordon, 1973). More importantly, we have only anecdotal evidence gathered from studies of tree responses to natural disturbances or more traditional management practices to link regeneration and residual tree dynamics following variable-retention harvesting to the underlying physiological mechanisms that drive these responses. Overstory density reductions and canopy gap formation alter resource availability, which should impact tree physiological performance and growth.
The planning of salvage logging operations after MPB, whether by clearcut or variable-retention harvesting, is further complicated by the presence of tree species other than lodgepole pine within the impacted stands. A significant percentage of the pine stands contain other coniferous tree species in both the overstory and the understory (Coates et al., 2009). Incomplete forest inventories, coupled with the widespread use of unsuitable growth and yield models, appear insufficient to account for these stand and understory variations. Therefore, there exists a need to develop strategies for information gathering to augment these models and to subsequently aid in the strategic planning of salvage and regeneration responses. My model's ability to predict the probability of stocking can aid in the adoption of forestry techniques such as variable retention. The model's efficiency and efficacy, however, could be significantly enhanced with a broader geographic (i.e., biogeoclimatic) range of input data and the addition of several unused variables, such as prevailing wind speed and direction. The predictive model is interpolative and therefore can only be used to predict within the limits of the input data. In this study's case, only plot data from the SBS dk, dw2, dw3, mc2, mc3, mk1, mw, and wk1 were used to develop the model. Further study incorporating a more diverse sample of biogeoclimatic zones (or perhaps using temperature and precipitation relationships to extrapolate the equivalence of subzones not sampled) is suggested to extend the model from a central British Columbia-based model to a province-wide model. Although climate data were selected by the model construction algorithm, wind direction and speed data were not considered in this model iteration. Wind data can be generated in a GIS environment; I did not, however, have access to a landscape level wind dataset at the time of this study.
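Because the model is interpolative, one practical safeguard is to check that a stand falls within the sampled BEC units before a prediction is trusted. The sketch below assumes a simple zone and subzone/variant coding for each stand; the function name and the coding scheme are illustrative and are not part of the study's workflow. The list of units is the one given in the text.

```python
# SBS subzones/variants with plot data in this study (as listed in the text).
TRAINED_SBS_UNITS = {"dk", "dw2", "dw3", "mc2", "mc3", "mk1", "mw", "wk1"}


def within_training_domain(bec_zone: str, subzone_variant: str) -> bool:
    """Return True only for BEC units represented in the training data."""
    return bec_zone == "SBS" and subzone_variant in TRAINED_SBS_UNITS


# Stands outside the sampled units should be flagged, not predicted.
print(within_training_domain("SBS", "dk"))   # True
print(within_training_domain("SBS", "vk"))   # False
print(within_training_domain("MS", "dk"))    # False
```

A check of this kind only covers the categorical BEC limits; continuous inputs would need their own range checks against the training data.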
Research has indicated that prevailing wind direction and speeds clearly play a role in the recruitment of seeds that contribute to the advance regeneration (Smidt and Blinn, 1995; Pardy, 1997; Wieland et al., 2011). In addition to the availability and favourability of substrate, abundance and proximity to seed sources (such as parent trees) are considered to be key variables in advance regeneration (Astrup et al., 2008). Given that a majority of the mature pine understories sampled in my study area were populated with species other than pine, advance regeneration establishment must have been aided by some form of seed recruitment mechanism such as wind. Incorporating the wind data into the existing model as an extensible component seems to be a logical extension to this project. This study's predictive model provides forest planners with a unique perspective on advance regeneration inventory. By looking at the landscape as a series of predicted probabilities, detailed inventory work and the continuation of stand-level advance regeneration studies can be focused. Finally, by combining what is known about understories from stand-level advance regeneration studies with a landscape level probability model, forest planners can provide legislators and the public with varying scales of management plans and proactive forest health mitigation strategies.

References

Acuna, E., and C.A. Rodriguez. 2004. Meta analysis study of outlier detection methods in classification. Technical paper, Department of Mathematics, University of Puerto Rico at Mayaguez. Available from academic.uprm.edu/eacuna/paperout.pdf. In proceedings IPSI2004, Venice.
Allen, D.W. 2011. Getting to Know ArcGIS ModelBuilder. ESRI Press, Redlands, CA.
Arnup, R.W. 1996. Site and stand factors influencing the abundance and distribution of coniferous advance growth in northeastern Ontario. Northern Ontario Development Agreement (NODA) Note No. 26.
Natural Resources Canada, Canadian Forest Service, Great Lakes Forestry Centre, Sault Ste. Marie, ON.
Astrup, R., K.D. Coates, and E. Hall. 2008. Recruitment limitation in forests: lessons from an unprecedented mountain pine beetle epidemic. Forest Ecology and Management 256:1743-1750.
Axelson, J.N., R.I. Alfaro, and B.C. Hawkes. 2009. Influence of fire and mountain pine beetle on the dynamics of lodgepole pine stands in British Columbia. Forest Ecology and Management 257:1874-1882.
Banerjee, O., L.E. Ghaoui, and A. d'Aspremont. 2008. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research 9:485-516.
Bassman, J.H., J.C. Zwier, J.R. Olson, and J.D. Newberry. 1992. Growth of advance regeneration in response to residual overstory treatment in northern Idaho. Western Journal of Applied Forestry 7:78-81.
Beck, J.R., and E.K. Schultz. 1986. The use of ROC curves in test performance evaluation. Archives of Pathology and Laboratory Medicine 110:13-20.
Berk, R.A. 2005. Data mining within a regression framework. In: Maimon O., Rokach L., editors. Data mining and knowledge discovery handbook. Springer, New York, NY.
Bond, B.J., B.T. Farnsworth, R.A. Coulombe, and W.E. Winner. 1999. Foliage physiology and biochemistry in response to light gradients in conifers with varying shade tolerance. Oecologia 120:183-192.
Bouchard, M., D. Kneeshaw, and Y. Bergeron. 2005. Mortality and stand renewal following the last spruce budworm outbreak in mixed forests of western Quebec. Forest Ecology and Management 204:297-313.
Boucher, J.F., P.Y. Bernier, H.A. Margolis, and A.D. Munson. 2007. Growth and physiological response of eastern white pine seedlings to partial cutting and site preparation. Forest Ecology and Management 240:151-164.
Bowler, R., A.L. Fredeen, M.G. Brown, and T.A. Black. 2012.
Residual vegetation importance to net CO2 uptake in pine-dominated stands following mountain pine beetle attack in central British Columbia, Canada. Forest Ecology and Management 269:82-91.

Breiman, L. 2001. Statistical modeling: The two cultures. Statistical Science 16:199-215.

Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. 1984. Classification and Regression Trees. Chapman and Hall (Wadsworth Inc), New York, NY.

Brown, M.G., T.A. Black, Z. Nesic, A.L. Fredeen, V.N. Foord, D.L. Spittlehouse, R. Bowler, P.J. Burton, J.A. Trofymow, N.J. Grant, and D. Lessard. 2012. The carbon balance of two lodgepole pine stands recovering from mountain pine beetle attack in British Columbia. Agricultural and Forest Meteorology 153:82-93.

British Columbia Ministry of Forests. 2000. Establishment to free growing guidebook. Prince George Forest Region. Revised edition, Version 2.2. Forest Practices Code of British Columbia Guidebook. Forest Practices Branch, BC Ministry of Forests, Victoria, BC.

Bulmer, C., M. Krzic, and K. Green. 2003. Soil productivity and forest regeneration success on reclaimed oil and gas sites. Final report to Oil and Gas Environmental Fund, April 2003. Available at: https://circle.ubc.ca/handle/2429/9044.

Bunnell, F.L. 1989. Alchemy and uncertainty: what good are models? Gen. Tech. Rep. PNW-GTR-232. U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, Portland, OR.

Burgess, D., and S. Wetzel. 2000. Nutrient availability and regeneration response after partial cutting and site preparation in eastern white pine. Forest Ecology and Management 138:249-261.

Burnham, K.P., and D.R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. 2nd Edition. Springer-Verlag, New York, NY.

Burns, R.M., and B.H. Honkala (Technical coordinators). 1990. Silvics of North America. Volume 1 and 2. U.S. Department of Agriculture. Agriculture Handbook 654.
Forest Service, United States Department of Agriculture, Washington, DC.

Burton, P.J. 2006. Restoration of forests attacked by mountain pine beetle: Misnomer, misdirected, or must-do? BC Journal of Ecosystems and Management 7:1-10.

Burton, P.J. 2008a. The mountain pine beetle as an agent of forest disturbance. BC Journal of Ecosystems and Management 9:9-13.

Burton, P.J. 2008b. The potential role of secondary structure in forest renewal after mountain pine beetle. Canadian Silviculture May 2008:26-29.

Burton, P.J. 2010. Striving for sustainability and resilience in the face of unprecedented change: the case of the mountain pine beetle outbreak in British Columbia. Sustainability 2:2403-2423.

Burton, P.J., D.C. Sutherland, N.M. Daintith, M.J. Waterhouse, and T.A. Newsome. 2000. Factors influencing the density of natural regeneration in uniform shelterwoods dominated by Douglas-fir in the Sub-Boreal Spruce Zone. Working Paper No. 47. BC Ministry of Forests, Research Branch, Victoria, BC.

Canham, C.D., K.D. Coates, P. Bartemucci, and S. Quaglia. 1999. Measurement and modelling of spatially explicit variation in light transmission through interior cedar-hemlock forests of British Columbia. Canadian Journal of Forest Research 29:1775-1783.

Coates, K.D. 2006. Silvicultural approaches to managing MPB damaged stands: Regeneration and midterm timber supply. Presentation at the Northern Silviculture Committee Winter Workshop: January 16-18, 2006. Silviculture tactics to lessen the downfall. Prince George, BC.

Coates, K.D., and P.J. Burton. 1997. A gap-based approach for the development of silvicultural systems to address ecosystem management objectives. Forest Ecology and Management 99:339-356.

Coates, K.D., C. DeLong, P.J. Burton, and D.L. Sachs. 2006. Abundance of Secondary Structure in Lodgepole Pine Stands Affected by Mountain Pine Beetle. Report for the Chief Forester.
Bulkley Valley Centre for Natural Resources Research and Management, Smithers, BC.

Coates, K.D., and E.C. Hall. 2005. Implications of alternative silvicultural strategies in mountain pine beetle damaged stands. Technical Report for Forest Science Project Y051161. Bulkley Valley Centre for Natural Resources Research and Management, Smithers, BC.

Coates, K.D., S. Haeussler, S. Lindeburgh, R. Pojar, and A.J. Stock. 1994. Ecology and silviculture of interior spruce in British Columbia. BC FRDA Report No. 220. BC Ministry of Forests, Victoria, BC.

Coates, K.D., T. Glover, and B. Henderson. 2009. Abundance of secondary structure in lodgepole pine stands affected by mountain pine beetle in the Cariboo-Chilcotin; mountain pine beetle working paper 2009-20. Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Coates, K.D., and D.L. Sachs. 2012. MPB Impacted Stands Assessment Project: Current State of Knowledge Regarding Secondary Structure in Mountain Pine Beetle Impacted Landscapes; unpublished MFLNRO document. Smithers, BC.

De'Ath, G., and K.E. Fabricius. 2000. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178-3192.

DeLong, C., D. Tanner, and M.J. Jull. 1993. A field guide for site identification and interpretation for the southwest portion of the Prince George Forest Region. Land Management Handbook 24. British Columbia Ministry of Forests, Victoria, BC.

DeLong, C. 2003. A field guide for site identification and interpretation for the southeast portion of the Prince George Forest Region. Land Management Handbook 51. British Columbia Ministry of Forests, Victoria, BC.

DeLong, C. 2004. A field guide for site identification and interpretation for the north central portion of the Prince George Forest Region. Land Management Handbook 54. British Columbia Ministry of Forests, Victoria, BC.

Derksen, S., and H.J. Keselman. 1992.
Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45:265-282.

Dhar, A., and C.D.B. Hawkins. 2011. Regeneration and growth following mountain pine beetle attack: A synthesis of knowledge. BC Journal of Ecosystems and Management 12:1-16.

Dhurandhar, A., and A. Dobra. 2008. Probabilistic Characterization of Random Decision Trees. Journal of Machine Learning Research 9:2287-2314.

Draper, N., and H. Smith. 1981. Applied Regression Analysis. 2nd edition. John Wiley and Sons, New York, NY.

Drever, C.R., G. Peterson, C. Messier, Y. Bergeron, and M.D. Flannigan. 2006. Can forest management based on natural disturbances maintain ecological resilience? Canadian Journal of Forest Research 36:2285-2299.

Elith, J., J.R. Leathwick, and T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802-813.

Ellsworth, D.S., and P.B. Reich. 1992. Leaf mass per area, nitrogen content and photosynthetic carbon gain in Acer saccharum seedlings in contrasting forest light environments. Functional Ecology 6:423-435.

ESRI. 2011. ArcGIS Desktop: Release 10. Environmental Systems Research Institute, Redlands, CA.

Feldman, D., and S. Gross. 2003. Mortgage Default: Classification Trees Analysis, Discussion Paper No. 3. The Pinhas Sapir Center for Development, Tel-Aviv University, Israel.

Ferguson, D.E. 1984. Needed: guidelines for defining acceptable advance regeneration. USDA Forest Service Research Note INT-341, 5 p. Intermountain Forest and Range Experiment Station, Ogden, UT.

Ferguson, D.E., and C.E. Carlson. 1993. Predicting Regeneration Establishment with the Prognosis Model. USDA Forest Service, Intermountain Research Station, Ogden, Utah. Research Paper INT-467.

Franklin, J.F., D.R. Berg, D.A. Thornburgh, and J.C. Tappeiner. 1997. Alternative silvicultural approaches to timber harvesting: variable retention harvest systems.
In: Kohm, K.A., Franklin, J.F. (Eds.), Creating a Forestry for the 21st Century. Island Press, Washington, DC.

Franklin, J.F., T.A. Spies, R. Van Pelt, A.B. Carey, D.A. Thornburgh, D.R. Berg, D.B. Lindenmayer, M.E. Harmon, W.S. Keeton, D.C. Shaw, K. Bible, and J. Chen. 2002. Disturbances and structural development of natural forest ecosystems with silvicultural implications, using Douglas-fir forests as an example. Forest Ecology and Management 155:399-423.

Franklin, J. 2009. Mapping Species Distributions. 1st edition. Cambridge University Press, New York, NY.

Friedl, M.A., C. Brodley, and A. Strahler. 1999. Maximizing land cover classification accuracies produced by decision trees at continental to global scales. IEEE Transactions on Geoscience and Remote Sensing 37:969-977.

Friedman, J.H., and J.J. Meulman. 2003. Multiple additive regression trees with application in epidemiology. Statistics in Medicine 22:1365-1381.

Freedman, D.A. 1983. A note on screening regression equations. The American Statistician 37:152-155.

Gautreaux, R. 1999. Vegetation Response Unit Characterizations and Target Landscape Prescriptions, Kootenai National Forest, 1999. USDA Forest Service, Northern Region, Kootenai National Forest, Libby, MT.

Gordon, D.T. 1973. Released advance reproduction of white fir and red fir: Effects on growth, damage, and mortality. USDA Forest Service, Research Paper. Pacific Southwest Forest and Range Experiment Station, Berkeley, CA.

Graham, J.M. 2001. The ethical use of statistical analyses in psychological research. Paper presented at the annual meeting of Division 17 (Counseling Psychology) of the American Psychological Association, Houston, TX.

Gray, J.R., G.D. Glysson, and D.S. Mueller. 2002. Comparability and accuracy of fluvial-sediment data: A view from the U.S. Geological Survey. Proceedings of the American Society of Civil Engineers, Hydraulic Measurements and Methods Symposium, July-August, 2002, Estes Park, CO.

Greene, D.F., J.C.
Zasada, L. Sirois, D. Kneeshaw, H. Morin, I. Charron, and M.J. Simard. 1999. A review of the regeneration dynamics of North American boreal forest tree species. Canadian Journal of Forest Research 29:824-839.

Griesbauer, H., and S. Green. 2006. Examining the utility of advance regeneration for reforestation and timber production in unsalvaged stands killed by the mountain pine beetle: Controlling factors and management implications. BC Journal of Ecosystems and Management 7:81-92.

Guisan, A., S.B. Weiss, and A.D. Weiss. 1999. GLM versus CCA spatial modeling of plant species distribution. Plant Ecology 143:107-122.

Han, J., M. Kamber, and J. Pei. 2011. Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann, San Francisco, CA.

Hanley, J.A., and B.J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.

Hansen, M.C., R.S. DeFries, J.R.G. Townshend, and R. Sohlberg. 2000. Global land cover classification at 1km spatial resolution using a classification tree approach. International Journal of Remote Sensing 21:1331-1364.

Hastie, T., R. Tibshirani, and J. Friedman. 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York, NY.

Hawkes, B.C., S.W. Taylor, C. Stockdale, T.L. Shore, R.I. Alfaro, R.A. Campbell, and P. Vera. 2003. Impact of mountain pine beetle on stand dynamics in British Columbia. In Mountain Pine Beetle Symposium: Challenges and Solutions, October 30-31, 2003, Kelowna, BC.

Hawkins, C.D.B., A. Dhar, N.A. Balliet, and K.D. Runzer. 2012. Residual mature trees and secondary stand structure after mountain pine beetle attack in central British Columbia. Forest Ecology and Management 277:107-115.

Hawkins, C., and P. Rakochy. 2007. Stand-level effects of the mountain pine beetle outbreak in the central British Columbia interior. Mountain pine beetle initiative working paper 2007-2006, mountain pine beetle initiative project #8.23.
Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Heithecker, T.D., and C.B. Halpern. 2007. Edge-related gradients in microclimate in forest aggregates following structural retention harvests in western Washington. Forest Ecology and Management 248:163-173.

Helie, J.F., D.L. Peters, K.R. Tattrie, and J.J. Gibson. 2005. Review and synthesis of potential hydrologic impacts of mountain pine beetle and related harvesting activities in British Columbia. Mountain Pine Beetle Initiative Working Paper 2005-23. Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Hughes, A.R. 2010. Disturbance and diversity: an ecological chicken and egg problem. Nature Education Knowledge 1:26-33.

Johnson, E.A., H. Morin, K. Miyanishi, R. Gagnon, and D.F. Greene. 2003. A process approach to understanding disturbance and forest dynamics for sustainable forestry. In P.J. Burton, C. Messier, D.W. Smith, and W.L. Adamowicz, editors. Towards Sustainable Management of the Boreal Forest. NRC Research Press, Ottawa, ON.

Johnson, E.A., and K. Miyanishi (eds.). 2007. Plant Disturbance Ecology: The Process and the Response. Academic Press, San Diego, CA.

Kaufmann, M.R., G.H. Aplet, M. Babler, W.L. Baker, B. Bentz, M. Harrington, B.C. Hawkes, L.S. Huckaby, M.J. Jenkins, D.M. Kashian, R.E. Keane, D. Kulakowski, C. McHugh, J. Negron, J. Popp, W.H. Romme, T. Schoennagel, W. Shepperd, F.W. Smith, E.K. Sutherland, D. Tinker, and T.T. Veblen. 2008. The status of our scientific understanding of lodgepole pine and mountain pine beetles - a focus on forest ecology and fire behavior. Global Fire Initiative technical report 2008-2, Arlington, VA.

Kayes, L.J., and D.B. Tinker. 2012. Forest structure and regeneration following a mountain pine beetle epidemic in southeastern Wyoming. Forest Ecology and Management 263:57-66.

Kim, H., and S. Yates. 2003. Missing value algorithms in decision trees. In Bozdogan, H.
editor, Statistical Data Mining and Knowledge Discovery. Chapman and Hall/CRC, Boca Raton, FL.

Kimmins, J.P., C. Welham, B. Seely, M. Meitne, R. Rempel, and T. Sullivan. 2005. Science in forestry: why does it sometimes disappoint or even fail us. Forestry Chronicle 81:723-734.

Klinka, K., J. Worral, L. Skoda, and P. Varga. 2000. The distribution and synopsis of ecological and silvical characteristics of tree species of British Columbia's forests. Canadian Cartographics Ltd., Coquitlam, BC.

Knapp, T.R., and S.S. Sawilowsky. 2001. Constructive criticisms of methodological and editorial practices. The Journal of Experimental Education 71:65-79.

Kneeshaw, D.D., Y. Bergeron, and L. DeGrandpre. 1998. Early response of Abies balsamea seedlings to artificially created openings. Journal of Vegetation Science 9:543-550.

Kneeshaw, D.D., and P.J. Burton. 1997. Canopy and age structures of some old sub-boreal Picea stands in British Columbia. Journal of Vegetation Science 8:615-626.

Khoshgoftaar, T., and E.B. Allen. 2001. Controlling overfitting in classification-tree models of software quality. Empirical Software Engineering 6:59-79.

Klock, R., and J. Mullock. 2008. The Weather of British Columbia: Graphic Area Forecast 31. The Meteorological Service of Canada and NavCanada, Kelowna, BC.

Kreakie, B.J., Y. Fan, and T.H. Keitt. 2012. Enhanced migratory waterfowl distribution modeling by inclusion of depth to water table data. PLoS ONE 7:e30142.

Krivec, J., and G. Matjaz. 2011. Data Mining Techniques for Explaining Social Events, Knowledge-Oriented Applications in Data Mining, Kimito Funatsu (Ed.). Jozef Stefan Institute, Slovenia.

Leadem, C., S.L. Gillies, K.H. Yearsley, V. Sit, D.L. Spittlehouse, and P.J. Burton. 1997. Field studies of seed biology. Land Management Handbook No. 40. British Columbia Ministry of Forests, Victoria, BC.

LeMay, V., P. Marshall, H. Temesgen, A.A. Zumrawi, B. Hassini, K. Froese, C. Lencar, and R. Martin. 2002.
Development of natural regeneration and juvenile height growth models for complex stands of Southeastern and Central British Columbia. Report prepared for: Forest Renewal BC, Victoria, BC.

LePage, P.T., C.D. Canham, K.D. Coates, and P. Bartemucci. 2000. Seed sources versus substrate limitations of seedling recruitment in interior cedar-hemlock forest of British Columbia. Canadian Journal of Forest Research 30:415-427.

Lewis, R.J. 2000. An Introduction to Classification and Regression Tree (CART) Analysis. Presented at Annual Meeting of the Society for Academic Emergency Medicine. Available at: http://www.saem.org/download/lewis1.pdf. Accessed June 2012.

Lewis, K.J., and I. Hartley. 2005. Rate of deterioration, degrade and fall of trees killed by mountain pine beetle: A synthesis of the literature and experiential knowledge. Mountain Pine Beetle Initiative Working Paper 2005-14. Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Lewis, K.J. 2011. Forest health and mortality of advance regeneration following canopy tree mortality caused by the mountain pine beetle. MPB Working Paper 2010-03. Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Lieffers, V.J., and K.J. Stadt. 1994. Growth of understory Picea glauca, Calamagrostis canadensis, and Epilobium angustifolium in relation to overstory light transmission. Canadian Journal of Forest Research 24:1193-1198.

Lindenmayer, D.B., P.J. Burton, and J.F. Franklin. 2008. Salvage Logging and Its Ecological Consequences. Island Press, Washington, DC.

Lindenmayer, D.B., and J.F. Franklin. 2002. Conserving Forest Biodiversity: A Comprehensive Multiscaled Approach. Island Press, Washington, DC.

McCarthy, J. 2001. Gap dynamics of forest trees: A review with particular attention to boreal forests. Environmental Reviews 9:1-59.

McGuire, J.P., R.J. Mitchell, E.B. Moser, S.D. Pecot, D.H. Gjerstad, and C.W. Hedman. 2001.
Gaps in a gappy forest: plant resources, longleaf pine regeneration, and understory response to tree removal in longleaf pine savannas. Canadian Journal of Forest Research 31:765-778.

McWilliams, J. 2009. A Review and Analysis of the Effect of BC's Current Stocking Standards on Forest Stewardship. Association of BC Forest Professionals, Vancouver, BC.

Maclauchlan, L. 2006. Status of mountain pine beetle attack in young lodgepole pine stands in central British Columbia. Report to the Chief Forester, Jim Snetsinger, at the 2006 Forest Health Review Committee Meeting, Victoria, BC.

MacLean, D.A., and A.R. Andersen. 2008. Impact of a spruce budworm outbreak in balsam fir and subsequent stand development over a 40-year period. Forestry Chronicle 84:60-69.

Manel, S., H.C. Williams, and S.J. Ormerod. 2001. Evaluating presence-absence models in ecology: the need to account for prevalence. Journal of Applied Ecology 38:921-931.

Martin, P.J., B. Bancroft, K. Day, and K. Peel. 2005. A new basis for understory stocking standards for partially harvested stands in the British Columbia interior. Western Journal of Applied Forestry 20:5-12.

Martin, P. 2012. Improving inventory information in MPB-affected management units. Forest Analysis and Inventory Branch. Available from http://www.for.gov.bc.ca/hts/vri/newsletter/improving_inventory_in%20mpb_units_2012.pdf

Matlack, G.R. 1994. Vegetation dynamics of the forest edge - trends in space and successional time. Journal of Ecology 82:113-123.

Meidinger, D., J. Pojar, and W.L. Harper. 1991. Sub-Boreal Spruce zone. In Ecosystems of British Columbia. Special Report Series No. 6. BC Ministry of Forests, Victoria, BC. Available from: http://www.for.gov.bc.ca/hfd/pubs/Docs/Srs/Srseries.htm

Messier, C., R. Doucet, J. Ruel, Y. Claveau, C. Kelly, and M.J. Lechowicz. 1999. Functional ecology of advance regeneration in relation to light in boreal forests. Canadian Journal of Forest Research 29:812-823.
Ministry of Forests, Lands, and Natural Resource Operations. 2011. Forests for Tomorrow Current Reforestation and Timber Supply Mitigation Strategic Plan 2011 to 2015.

Ministry of Forests, Lands and Natural Resources Operations, Inventory Section, Forest Analysis and Inventory Branch. 2012. Approach of the inventory program in 2012-13 to improve inventory information in MPB-affected management units.

Mitchell, J. 2005. Review and synthesis of regeneration methods in beetle-killed stands following mountain pine beetle (Dendroctonus ponderosae) attack: A literature review. Mountain Pine Beetle Initiative Working Paper 2005-16. Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Moisen, G.G. 2008. Classification and regression trees. In: Jorgensen, S.E. and B.D. Fath (Editor-in-Chief). Encyclopedia of Ecology, volume 1. Oxford, U.K.

Moles, A.T., and D.R. Drake. 1999. Potential contributions of the seed rain and seed bank to regeneration of native forest under plantation pine in New Zealand. New Zealand Journal of Botany 37:83-93.

Mori, A.S. 2011. Ecosystem management based on natural disturbances: hierarchical context and non-equilibrium paradigm. Journal of Applied Ecology 48:280-292.

Murphy, T.E.L., D.L. Adams, and D.E. Ferguson. 1999. Response of advance lodgepole pine regeneration to overstory removal in eastern Idaho. Forest Ecology and Management 120:235-244.

Nigh, G.D., J.A. Antos, and R. Parish. 2008. Density and distribution of advance regeneration in mountain pine beetle killed lodgepole pine stands of the Montane Spruce zone of southern British Columbia. Canadian Journal of Forest Research 38:2826-2836.

Oguchi, R., K. Hikosaka, T. Hiura, and T. Hirose. 2006. Leaf anatomy and light acclimation in woody seedlings after gap formation in a cool-temperate deciduous forest. Oecologia 149:571-582.

Oliver, C., and B. Larson. 1996. Forest stand dynamics (update edition). McGraw-Hill Inc., New York, NY.
Palik, B.J., R.J. Mitchell, and J.K. Hiers. 2002. Modeling silviculture after natural disturbance to sustain biodiversity in the longleaf pine (Pinus palustris) ecosystem: balancing complexity and implementation. Forest Ecology and Management 155:347-356.

Palik, B.J., R.J. Mitchell, S. Pecot, M. Battaglia, and M. Pu. 2003. Spatial distribution of overstory retention influences resources and growth of longleaf pine seedlings. Ecological Applications 13:674-686.

Palik, B.J., R. Mitchell, G. Houseal, and N. Pederson. 1997. Effects of canopy structure on resource availability and seedling responses in a longleaf pine ecosystem. Canadian Journal of Forest Research 27:1458-1464.

Pardy, A.B. 1997. Forest succession following a severe spruce budworm outbreak at Cape Breton Highlands National Park. M.Sc.F. thesis, University of New Brunswick, Fredericton, NB.

Parkins, J., and N. MacKendrick. 2007. Assessing community vulnerability: A study of the mountain pine beetle outbreak in British Columbia, Canada. Global Environmental Change 17:460-471.

Pedersen, L. 2004. Expedited timber supply review for the Lakes, Prince George, and Quesnel Timber Supply Areas. Public Discussion Paper. BC Ministry of Forests and Range, Victoria, BC.

PEM - Ecosystem and Terrain Mapping Data Inventory [GIS shapefile]. 2008. Land and Resource Data Warehouse. Available from https://apps.gov.bc.ca/pub/dwds/addProducts.do. [Accessed: May, 2012].

Pickett, S.T.A., and P.S. White, editors. 1985. The ecology of natural disturbance and patch dynamics. Academic Press, New York, NY.

Pousette, J., and C. Hawkins. 2006. An assessment of critical assumptions supporting the timber supply modelling for mountain-pine-beetle-induced allowable annual cut uplift in the Prince George Timber Supply Area. BC Journal of Ecosystems and Management 7:93-104.

R Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org/.

Refaeilzadeh, P., L. Tang, and H. Liu. 2009. Cross Validation. Encyclopedia of Database Systems. Springer, New York, NY.

Ribbens, E., J.A. Silander Jr., and S.W. Pacala. 1994. Seedling recruitment in forests: Calibrating models to predict patterns of tree seedling dispersion. Ecology 75:1794-1806.

Ripley, B.D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.

Roberts, M.R. 2004. Response of the herbaceous layer to natural disturbance in North American forests. Canadian Journal of Botany 82:58-64.

Sattler, F.D. 2009. A hybrid model to estimate natural recruitment and growth in stands following mountain pine beetle disturbance. M.Sc. thesis, University of British Columbia, Vancouver, BC.

Seymour, R.S., A.S. White, and P.G. deMaynadier. 2002. Natural disturbance regimes in northeastern North America - evaluating silvicultural systems using natural scales and frequencies. Forest Ecology and Management 155:357-367.

Sherrod, P.H. 2006. DTREG Software for Predictive Modelling and Forecasting User Manual. Available from www.dtreg.com.

Smidt, M., and C.R. Blinn. 1995. Logging for the 21st Century: Forest Ecology and Regeneration, FO-06517. University of Minnesota, Minneapolis, MN.

Smith, S.M. 1990. The Prognosis Model Adapted for Dry-Belt Douglas-Fir in British Columbia. Report prepared by Stephen Smith & Associates, Victoria, BC, for BC Ministry of Forests.

Snetsinger, J. 2011. Prince George Timber Supply Area: Rationale for Allowable Annual Cut (AAC) Determination. British Columbia Ministry of Forests, Mines and Lands. Available from http://www.for.gov.bc.ca/hts/tsa/tsa24/tsr4/24ts11ra.pdf.

Stephens, P.A., S.W. Buskirk, G.D. Hayward, and C. Martinez del Rio. 2005. Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology 42:4-12.

Strobl, C., A.L. Boulesteix, and T. Augustin. 2007. Unbiased Split Selection for Classification Trees Based on the Gini Index.
Computational Statistics and Data Analysis 52:483-501.

Strobl, C., J. Malley, and G. Tutz. 2009. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods 14:323-348.

Stockdale, C., S. Taylor, and B. Hawkes. 2004. Incorporating mountain pine beetle impacts on stand dynamics in stand and landscape models: A problem analysis. In: Mountain Pine Beetle Symposium: Challenges and Solutions, October 30-31, 2003, Kelowna, BC. T.L. Shore, J.E. Brooks, and J.E. Stone (editors). Information Report BC-X-399. Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, BC.

Stone, W.E., and M.L. Wolfe. 1996. Response of understory vegetation to variable tree mortality following a mountain pine beetle epidemic in lodgepole pine stands in northern Utah. Plant Ecology 122:1-12.

Sutherland, J. 2012. Guidance for assessing FSP stocking standards alignment with addressing immediate and long-term forest health issues. Memorandum. File: 28030/FSP. Ministry of Forests, Lands, and Natural Resource Operations. Available from https://www.for.gov.bc.ca/hfp/silviculture/Guidance%20for%20assessing%20FSP%20stocking%20standards%20June%2021%202012.pdf

Tews, J., and F. Jeltsch. 2004. Modelling the impact of climate change on woody plant population dynamics in South African savannah. BMC Ecology 4:17.

Therneau, T.M., B. Atkinson, and B. Ripley. 2012. Recursive Partitioning. R package version 3.1-52. http://CRAN.R-project.org/package=rpart.

Thompson, B. 1995. Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply Here: A Guidelines Editorial. Educational and Psychological Measurement 55:525-534.

Vyse, A., C. Ferguson, D.J. Huggard, J. Roach, and B. Zimonick. 2009. Regeneration beneath lodgepole pine dominated stands attacked or threatened by the mountain pine beetle in the south central Interior, British Columbia.
Forest Ecology and Management 258:S36-S43.

VRI - Forest Vegetation Composite Polygons and Rank 1 Layer [GIS shapefile]. 2006. Land and Resource Data Warehouse. DataBC. Available from http://apps.gov.bc.ca/pub/dwds/addProducts.do?orderId=1206753&packagedProductId=-1&indirect=true. [Accessed: May, 2007].

Walton, A. 2012. Provincial-Level Projection of the Current Mountain Pine Beetle Outbreak: Update of the infestation projection based on the Provincial Aerial Overview Surveys of Forest Health conducted from 1999 through 2011 and the BCMPB model (year 9). BC Forest Service, Victoria, BC. Available from http://www.for.gov.bc.ca/hre/bcmpb/Year9.htm.

Wang, T., A. Hamann, D.L. Spittlehouse, and S.N. Aitken. 2006. Development of scale-free climate data for western Canada for use in resource management. International Journal of Climatology 26:383-397.

Weetman, G., and A.H. Vyse. 1990. Natural regeneration. In: Regenerating British Columbia's forests. D.P. Lavender, R. Parish, C.M. Johnson, G. Montgomery, A. Vyse, R.A. Willis, and D. Winston (editors). University of British Columbia Press, Vancouver, BC.

White, P.S., and S.T.A. Pickett. 1985. Natural disturbance and patch dynamics: an introduction. In: Pickett, S.T.A., and P.S. White, editors. The ecology of natural disturbance and patch dynamics. Academic Press, New York, NY.

Wieland, L.M., R.C.G. Mesquita, P.E.D. Bobrowiec, T.V. Bentos, and G.B. Williamson. 2011. Seed rain and advance regeneration in secondary succession in the Brazilian Amazon. Tropical Conservation Science 4(3):300-316.

Williams, H., C. Messier, and D.D. Kneeshaw. 1999. Effects of light availability and sapling size on the growth and crown morphology of understory Douglas-fir and lodgepole pine. Canadian Journal of Forest Research 29:222-231.

Williston, P., D. Cichowski, and S. Haeussler. 2006.
The response of caribou terrestrial mat-forming lichens to mountain pine beetles and forest harvesting in the East Ootsa and Entiako areas: Final Report - 2005 - Years 1 to 5. A report to Morice-Lakes Innovative Forest Practices Agreement, Prince George, BC, the Bulkley Valley Centre for Natural Resources Research and Management, and BC Parks, Smithers, BC.

Whittingham, M., P. Stephens, R.B. Bradbury, and R.P. Freckleton. 2006. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology 75:1182-1189.

Wilford, D. 2008. Timber Growth and Value Conference Proceedings closing remarks. Bulkley Valley Centre for Natural Resources Research and Management, Smithers, BC. Available from http://bvcentre.ca/files/research_reports/0707ProceedingsTimberGrowth.pdf

Wright, E.F., C.D. Canham, and K.D. Coates. 2000. Effects of suppression and release on sapling growth for 11 tree species of northern, interior British Columbia. Canadian Journal of Forest Research 30:1571-1580.

Appendix 1

The attached table lists the published MSSpa values and conditions that were used to define the level of stocking against which seedling and sapling densities were assessed.
Minimum Stocking Standards (MSSpa) for preferred and acceptable species (BC Ministry of Forests, 2000)

BEC Unit    Site Series    TSSpa    MSSpa    MSSp
SBSdk       1              1200     700      600
SBSdk       2              1000     500      400
SBSdk       3              1200     700      600
SBSdk       5              1200     700      600
SBSdk       6              1200     700      600
SBSdk       7              1000     500      400
SBSdw2      1              1200     700      600
SBSdw2      6              1200     700      600
SBSdw2      8              1200     700      600
SBSdw2      9              1200     700      600
SBSdw3      1              1200     700      600
SBSdw3      3              1200     700      600
SBSdw3      4              1200     700      600
SBSdw3      5              1200     700      600
SBSdw3      6              1200     700      600
SBSdw3      7              1200     700      600
SBSdw3      8              1200     700      600
SBSdw3      9              1000     500      400
SBSdw3      10             400      200      200
SBSmc2      1              1200     700      600
SBSmc2      3              1200     700      600
SBSmc2      5              1200     700      600
SBSmc2      6              1200     700      600
SBSmc2      10             1000     500      400
SBSmc3      1              1200     700      600
SBSmc3      4              1200     700      600
SBSmc3      5              1200     700      600
SBSmc3      7              1200     700      600
SBSmk1      3              1200     700      600
SBSmk1      5              1200     700      600
SBSmw       1              1200     700      600
SBSwk1      5              1200     700      600

Appendix 2

Colour-themed maps depicting probability of being stocked within NTS 1:250K map tiles 93E, F, G, J, K, L

The following maps are colour-themed maps built from the classification tree model rules presented in this thesis. This is a full series of probability of stocking maps for all three stocking groups: 1) >600 stems/ha conifer only; 2) >600 stems/ha all tree species; and 3) MSSpa (minimum stocking standards for preferred and acceptable species).
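As an illustration only, the following Python sketch shows how a predicted stocking probability could be assigned to one of the five legend classes used on the maps in this appendix, and how a stem density could be compared against an MSSpa value from Appendix 1. It is not the procedure used to build the maps. The function names, the handling of values that fall exactly on a class break, and the rule that a density equal to the MSSpa counts as stocked are all assumptions made for the example; only three rows of the Appendix 1 table are copied in.

```python
from bisect import bisect_right

# Legend classes shown on the Appendix 2 maps (probability of being stocked).
CLASS_BREAKS = [0.2, 0.4, 0.6, 0.8]
CLASS_LABELS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]

# A few MSSpa values (stems/ha) copied from the Appendix 1 table, keyed by
# (BEC unit, site series). The remaining rows would be added the same way.
MSSPA = {
    ("SBSdk", 1): 700,
    ("SBSdk", 2): 500,
    ("SBSdw3", 10): 200,
}


def legend_class(probability):
    """Return the legend class for a predicted probability in [0, 1].

    Assumption: a value exactly on a class break (e.g. 0.2) is placed in
    the upper class; the map legend does not say which side is inclusive.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return CLASS_LABELS[bisect_right(CLASS_BREAKS, probability)]


def meets_mss(stems_per_ha, bec_unit, site_series):
    """Return True if a density meets the MSSpa for the given site.

    Assumption: a density equal to the MSSpa value counts as stocked.
    """
    return stems_per_ha >= MSSPA[(bec_unit, site_series)]


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.55, 1.0):
        print(p, legend_class(p))
    print(meets_mss(650, "SBSdk", 1))
```

A fixed-break lookup of this kind keeps the class boundaries in one place, so the same breaks can be reused for all three stocking groups.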
[Map series: "Predicted Locations and Stocking Probability of Advance Regeneration Under Mature Pine Stands in Central British Columbia". One map per NTS tile (93E, 93F, 93G, 93J, 93K, 93L) for each of three stocking groups: >600 stems/ha all trees; >600 stems/ha conifers; Minimum Stocking Standards (MSSpa). Legend: Probability of being Stocked, in classes 0-20%, 20-40%, 40-60%, 60-80%, 80-100%. Each map includes a tile index inset and a scale bar in km.]

Appendix 3

Classification Tree Analysis: Recursive Partitioning, Cross Validation, and Variable Importance

For classification trees built with a categorical target variable, the determination of what category to assign a node is more complex: it is the category that minimizes the misclassification cost for the observations in the node. In the simplest case, every row that is misclassified has a cost of 1 and every row that is correctly classified has a cost of 0.
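The node assignment rule can be illustrated with a minimal sketch. It assumes the unit-cost case described above unless a cost table is supplied; the function name, the `cost` dictionary layout, and the category labels are hypothetical and are not taken from the software used in this thesis.

```python
from collections import Counter

def assign_node_category(labels, cost=None):
    """Return the category that minimizes total misclassification cost in a node.

    labels: target values of the rows falling in the node.
    cost:   optional dict mapping (true, predicted) -> cost. Pairs that are
            missing default to 1 for a misclassified row and 0 for a correct one.
    """
    counts = Counter(labels)
    cost = cost or {}

    def pair_cost(true, predicted):
        default = 0 if true == predicted else 1
        return cost.get((true, predicted), default)

    def total_cost(predicted):
        # Cost of labelling every row in the node as `predicted`.
        return sum(n * pair_cost(true, predicted) for true, n in counts.items())

    # Sorting makes tie-breaking deterministic.
    return min(sorted(counts), key=total_cost)
```

With unit costs the rule reduces to a majority vote; an unequal cost table can move the assignment to the minority category.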
A problematic issue in recursive partitioning is the decision of how large to build the tree (Breiman et al., 1994; Khoshgoftaar and Allen, 2001). Too large a tree means excessive branches, which in turn mean excessive nodes. This represents an over-fitting of the model. If two trees provide equivalent predictive accuracy, the simpler tree is preferred because it is less sensitive to outliers and spurious observations, easier to understand, and faster to use for making predictions. Limits must be placed on the size of the resultant tree. Without limits, a tree could be built so large that there is a terminal node for every case in the original dataset. In addition to being computationally expensive, this situation would represent a solution that would be too difficult to interpret and would have no applicability to new cases, i.e., the model would be over-fit. Pre-pruning of the tree can occur by simply providing limits to the classification routine. The analyst can either delimit the number of observations necessary for a node split or program the maximum number of branch levels allowed to be calculated. Both methods will artificially stop the classification splitting before the tree becomes too large. Generally, the classification is provided a generous berth and stops when there are no more statistically significant splits to be made (i.e., maximum purity of nodes is achieved). This strategy usually results in a larger tree than necessary, with over-fitting. A tree with maximum nodes will result in the best-fit model for the dataset. This, however, is not necessarily the best model for all datasets. There is a decision cost associated with producing an over-fitted model, necessitating the identification of the optimal sized tree that will best fit subsequent datasets. Parametric models use penalization strategies such as Akaike Information Criterion (AIC) model fit measurements to ensure parsimony is achieved.
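The two pre-pruning limits can be expressed as a simple stopping check. This is only a sketch: the function name and the default thresholds are hypothetical and do not reflect the settings used for the models in this thesis.

```python
def may_split(n_rows, depth, min_rows_to_split=10, max_depth=5):
    """Return True if a node is still allowed to be split.

    n_rows:            number of observations in the node.
    depth:             branch level of the node (root = 0).
    min_rows_to_split: assumed minimum observations required to attempt a split.
    max_depth:         assumed maximum number of branch levels.
    """
    return n_rows >= min_rows_to_split and depth < max_depth
```

Either limit alone is enough to stop growth at a node; a tree-growing routine would call such a check before searching for the best split.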
In the case of nonparametric modelling, a widely accepted method of model fit is tree pruning (Strobl, 2009). Through the use of v-fold cross validation, one pruning method, the resultant tree can be pruned back to an optimal tree size (Dhurandhar and Dobra, 2008). Cross validation is a widely used statistical strategy to evaluate or compare learning algorithms or decision tree models through a generalization error. The role of v-fold cross validation is to separate the data into a training group and a validation group (folds). These groups, however, are created in such a manner that each point is 'crossed-over' between being a training point and a validation point, ensuring that every data point is evaluated (Refaeilzadeh et al., 2009). This method differs from the hold-out method, where data isolated as a validation set are never used to build the model. In v-fold cross validation, the working dataset is separated into v equal parts (10 equal parts in the case of 10-fold). A test classification tree is built from v - 1 parts, with the remaining part held back for validation. The held-back data are then run through the tree as an independent check of its accuracy, and the result is stored as the initial test. This process is automatically conducted nine more times (in the case of 10-fold), each time with a new independent validation part. Once the process has been run 10 times, the classification error rates calculated for the ten test runs are averaged to provide a generalization error or cross-validation cost (Dhurandhar and Dobra, 2008). The tree is then pruned to the number of nodes that produces the minimum cross validation cost.
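The fold rotation and error averaging can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in the software used here: the function names are hypothetical, the dataset is assumed to have at least v rows, and a majority-class predictor stands in for a classification tree so that the example stays self-contained.

```python
import random
from collections import Counter

def v_fold_indices(n_rows, v=10, seed=0):
    """Shuffle row indices and deal them into v folds of near-equal size."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]

def fit_majority(train_rows, train_labels):
    """Stand-in model (assumption): always predicts the majority training class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority

def cross_validation_cost(rows, labels, fit=fit_majority, v=10, seed=0):
    """Average the held-out error rate over v train/validate rotations."""
    errors = []
    for held_out in v_fold_indices(len(rows), v, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        wrong = sum(model(rows[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / v
```

Every row is held out exactly once, so each observation contributes to the averaged error. To choose a tree size, the same calculation would be repeated for each candidate size and the size with the lowest cost retained.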
The literature does not indicate that using more than ten folds improves the accuracy of the generalized error. Backward pruning requires significantly more calculations than forward pruning, but the tree sizes are much more optimally calculated (Sherrod, 2006). When the target variable and the predictor variables are categorical (and both have more than two categories), this creates a more mathematically complex process. To perform an exhaustive search, the classifier must evaluate a potential split for every possible combination of categories of the predictor variable. The number of splits is equal to $2^{k-1} - 1$, where $k$ is the number of categories of the predictor variable. For example, if there are 5 categories, 15 splits are tried; if there are 10 categories, 511 splits are tried; if there are 16 categories, 32,767 splits are tried; if there are 32 categories, 2,147,483,647 splits are tried. Because of this exponential growth, the computation time to do an exhaustive search becomes prohibitive when there are more than about 12 predictor categories (Sherrod, P.H. DTREG Predictive Modeling Software, personal communication, March 10, 2011). If the target variable is binary and has only two possible categories, as is the case for a stocked or non-stocked condition as posed in my thesis, the exhaustive search can be conducted efficiently. The ideal split would divide a group into two nodes in such a way that all of the observations in the left node are the same (have the same value of the target variable) and all of the observations in the right node are the same, but different from the left node. This is referred to as purity. If such a split can be found, then you can exactly and perfectly classify all of the observations by using just that split, and no further splits are necessary or useful. Such a perfect split is possible only if the observations in the node being split have only two possible values of the target variable.
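The split counts quoted above can be checked with a short sketch. The function names are hypothetical; the enumeration fixes one category on the left side so that each two-way partition is counted once rather than twice.

```python
from itertools import combinations

def n_binary_splits(k):
    """Number of distinct two-way partitions of k categories: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1

def enumerate_splits(categories):
    """List every (left, right) partition of the categories into two non-empty sets."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            if right:  # skip the case where everything falls on the left
                splits.append((left, right))
    return splits

for k in (5, 10, 16, 32):
    print(k, n_binary_splits(k))
```

Enumerating the splits is only practical for small k, which is the point the exponential count makes.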
Unfortunately, perfect splits do not occur often in nature, so it is necessary to evaluate and compare the quality of imperfect splits. Various criteria have been proposed for evaluating splits, but they all have the same basic goal, which is to favour homogeneity within each right/left node and heterogeneity between the right/left nodes. The heterogeneity, or dispersion, of target categories within a node is called the "node impurity". The goal of splitting is to produce nodes with minimum impurity. The impurity of every node is calculated by examining the distribution of categories of the target variable for the rows in the group. A "pure" node, where all rows have the same value of the target variable, has an impurity value of 0 (zero). When a potential split is evaluated, the weighted average of the impurities of the two nodes is subtracted from the impurity of the node from which they were split. This reduction in impurity is called the improvement of the split. The split with the greatest improvement is the one used. Improvement values for splits are shown in the node information that is part of the generated report. Variable importance is the relative importance that each variable plays in the splitting of the tree into nodes, both as primary or surrogate splitters. A surrogate splitter is an imputation technique that is employed when rows of the dataset have missing values. If a variable is called on as a primary splitter in the building of a tree and it has missing data, the developed surrogate will take its place and conduct the split as it was developed in the initial tree building (Acuna and Rodriquez, 2004). It is important to note that a variable's importance is not measured solely by how early it enters the tree to act as a splitter. A strong surrogate splitter may be more "important" to the classification model even though it enters the tree later than a weaker primary splitter (Lewis, 2000; Sherrod, 2006).
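The improvement calculation can be sketched in a few lines. The text above does not name a specific impurity criterion, so the Gini index is used here purely as an assumption for illustration; the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def improvement(parent, left, right):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted
```

For a parent holding two stocked and two non-stocked rows, a perfect split yields an improvement equal to the parent's impurity, while a split that leaves both children as mixed as the parent yields zero.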
The loss of an important variable in the decision model will likely weaken the model as a whole. Variable importance can also act as a recruiter for subsequent analysis, as it is clear that a variable with a higher variable importance score is likely a significant predictor of the response variable (Breiman, 2001). The variable that contributes the highest improvement measure (i.e., the variable that has the greatest effect on error rate increase) achieves a score of 100 (Banerjee et al., 2008). The measures are based on the number of times a variable is selected for splitting, weighted by the squared improvement to the model as a result of each split, and averaged over all trees (Friedman and Meulman, 2003; Strobl et al., 2007). The relative influence (or contribution) of each variable is scaled so that the sum adds to 100, with higher numbers indicating stronger influence on the response (Elith et al., 2008). The remaining contributing predictor variables are scored relative to the most important variable. Importance, however, may refer to how important a variable is to the overall goodness of model fit, or it may refer to how important the variable is to the model's predictive ability.
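The two scaling conventions mentioned above (top variable set to 100, or all scores summing to 100) can be compared with a small sketch. The function names, variable names, and raw scores are hypothetical placeholders, not results from this thesis.

```python
def scale_to_top(raw):
    """Score each variable relative to the most important one, which gets 100."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}

def scale_to_sum(raw):
    """Scale scores so that they add to 100 across all variables."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

# Hypothetical accumulated improvement per predictor (illustrative values only).
raw_scores = {"predictor_a": 2.0, "predictor_b": 1.0, "predictor_c": 1.0}
print(scale_to_top(raw_scores))
print(scale_to_sum(raw_scores))
```

Both conventions preserve the ranking of the variables; only the reference point for the scores differs.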