Bayesian Spatiotemporal Modelling: An Application to Glacier Satellite Imagery by Michael James McCurdy B.Sc. Hons, University of British Columbia, 2015 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICS UNIVERSITY OF NORTHERN BRITISH COLUMBIA December 2020 © Michael James McCurdy, 2020 Abstract Glaciers hold 1.7% of the Earth's total water supply, but they contain 68.7% of its freshwater. Given the global warming trend, accurate and recent inventory is necessary to assess glacial changes over time. However, frequent cloud and debris cover often make it difficult to determine the glacier 's exact edge. Multispectral Landsat 8 imagery along with data from the Global Land Ice Velocity Extraction (GoLIVE) project are combined to to create a Bayesian multivariate general additive model of the glaciers surrounding Mount Rainier, with Autoregressive Moving Average (ARMA) and Gaussian processes used to model the temporal and spatial autocorrelations. Using root mean square error and Watanabe-Akaike information criterion, all 42 combinations of ARMA models up to 4 total pa- rameters and exponential, Matem and spherical covariance kernels were com- pared. The ARMA(3,1) processes with the exponential Gaussian process kernel was determined to be the best fit model. Gaussian mixture models, hierarchical clustering, hard and soft K-means clustering, and support vector machines are used to classify the posterior distribution. The hard K-means algorithm was the best classifier, and it accurately predicted 85.1% of the glaciers, compared to 68.8% from a univariate classification on the Red / SWIR band ratio. 2 TABLE OF CONTENTS Abstract 2 Table of Contents 3 List of Tables 4 List of Figures 5 List of Acronyms 8 Acknowledgements 10 1 Introduction 11 2 Literature Review 2.1 Methods for Glacier Mapping 2.2 Contribution of this Work . . 14 3 Data Preprocessing . . . Variables of Interest 20 21 26 4 Methods 4.1 Modelling Overview 4.2 Bayesian Methodology 4.3 Markov Chain Monte Carlo . . . 4.4 Generalised Additive Model . . 4.5 Seemingly Unrelated Regression 4.6 Autoregressive Moving Average 33 34 35 36 37 38 3.1 3.2 14 18 3 32 4.7 4.8 4.9 Gaussian Process Model Choice Classification 4.9.1 Hard K-means 4.9.2 Soft K-means 4.9.3 Support Vector Machines 4.9.4 Hierarchical Clustering . . 4.9.5 Gaussian Mixture Models 4.10 Principal Components Analysis 40 43 45 45 46 46 47 48 49 5 Results 5.1 Glacier Model Choice 5.2 Model Prediction 5.3 Classification 5.4 Principal Components Analysis 50 53 65 69 6 Conclusions and Discussion 72 Bibliography 78 4 50 LIST OF TABLES 2.1 Multispectral Landsat 8 data 15 3.1 Aquisition Dates and proportion of incomplete observations due to cloud cover for 2013-2019 22 5.1 5.2 5.3 5.4 5.5 RMSE for all potential models and variables of interest. The chosen model is highlighted in red WAIC for all potential models and variables of interest. The chosen models is highlighted in red Summary table for the classification of Mount Rainier 's glaciers in 2015. The total classification was calculated as a weighted average of the glacier and non-glacier classifications with respect to the amount of pixels Loadings of principal components analysis of the variables of interest. The screeplot in Figure 5.12 suggests all four principal components are necessary. Summary table of the rotated data for Mount Rainier 's glaciers in 2015. The univariate classification was not rotated but added for comparison. The total classification was calculated as a weighted average of the glacier and non-glacier classifications with respect to the amount of pixels 5 51 52 66 70 71 LIST OF FIGURES 1.1 3.1 3.2 3.3 3.4 3.5 3.6 3.7 4.1 Debris-covered toe of Emmons Glacier on Mount Rainier, August 1984 (Photo by Topinka Lyn) SWIRl -NIR-red composite of Mount Rainier on August 23, 2020. The red outline represents the extent of Mount Rainier 's glaciers in 2015. This extent was obtained from Andrew Fountain, Portland State University. SWIRl-NIR-red composite image pairs of Mount Rainier for 20132016. Left image is complete image, right image has clouds and cloud shadows removed SWIRl -NIR-red composite image pairs of Mount Rainier for 20172019. Left image is complete image, right image has clouds and cloud shadows removed Grid of Red / SWIR Band ratio of Mount Rainier 's glaciers 20132019. The grey indicates missing data due to clouds or cloud shad- 12 23 24 25 ows Grid of surface temperature of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows. . Grid of NDSI of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows Grid of glacier speed of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows. . . . 31 Demonstration of Gaussian process regression on a toy dataset. Top figure is the data, made of sums of sine functions with random noise. Bottom figure is 100 samples from the posterior of a Gaussian process regression function 42 6 28 29 30 Density plots of the posterior predictive distributions of the Red / SWIR band ratio and NDSI. The samples from the posterior simulate the observed distribution fairly well with the exception of the bottom NDSI figure. The NDSI observations have a bimodal distribution, which MCMC sampling has trouble converging upon 54 5.2 Density plots of the posterior predictive distributions of the surface temperature and glacier speed 55 ' 5.3 Grid of mean estimate of Red / SWIR Band ratio of Mount Rainier s 57 glaciers 2013-2019 5.4 Grid of mean estimate surface temperature of Mount Rainier 's 58 glaciers 2013-2019 5.5 Grid of mean estimate NDSI of Mount Rainier 's glaciers 20132019 59 5.6 Grid of mean estimate of glacier speed of Mount Rainier 's glaciers 60 2013-2019 ' 5.7 Grid of residuals of Red / SWIR Band ratio of Mount Rainier s glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows 61 5.8 Grid of residuals of surface temperature of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud 62 shadows 5.9 Grid of residuals of Mount Rainier 's glaciers 2013-2019. The grey 63 indicates missing data due to clouds or cloud shadows 5.10 Grid of residuals of glacier speed of Mount Rainier 's glaciers 20132019. The grey indicates missing data due to clouds or cloud shad64 ows 5.11 Glacier classification techniques for Mount Rainier 's glaciers in 68 2015. The 2015 glacier extent is outlined in red 5.12 Screeplot of estimated Red / SWIR band ratio, NDSI, surface tem71 perature, and glacial speed for Mount Rainier 's glaciers 5.1 7 LIST OF ACRONYMS AR Autoregressive ARD Analysis Ready Data ARIMA Autoregressive Integrated Moving Average ARMA Autoregressive Moving Average BLUE Best Linear Unbiased Estimator DEM Digital Elevation Model EM Expectation-Maximisation GAM General Additive Model GP Gaussian Process GLIMS Global Land Ice Measurements from Space GoLIVE Global Land Ice Velocity Extraction from Landsat 8 HMC Hamiltonian Monte Carlo MA Moving Average MAR Missing at Random MCAR Missing Completely at Random MCMC Markov Chain Monte Carlo MNAR Missing Not at Random NIR Near Infrared NDSI Normalised Difference Snow Index PCA Principal Components Analysis RMSE Root Mean Squared Error 8 SAR Synthetic Aperture Radar ST Surface Temperature SUR Seemingly Unrelated Regression SWIR Short Wave Infrared TIRS Thermal Infrared USGS United States Geological Survey WAIC Watanabe-Akaike Information Criterion 9 Acknowledgements To my parents, thank you for your love and support. I would not be here without you both. Jeff and Laura, I always look forward to speaking with you, I'm looking forward to many more trips into the mountains. Sarah, I wish you the best in SF, I'm very proud of you. Kevin, thank you for your mentorship; your patient guidance was crucial to finishing this work. Neil and Roger, thank you for your treasured counsel and wisdom. Gillian, Ryan, Andy, and Tiegan thank you for the many many miles, they were sorely needed. And finally to Julia. Thank you for being a ray of sunshine. I'm looking forward to our life in Calgary. 10 Chapter 1 Introduction Glaciers hold less than 2% of the world's total water volume; however they contain more than 68% of its freshwater supply. [1] Given the large and often remote nature of glaciers, it can be difficult to obtain reliable outlines of glacier extents. Manual delineation of the glacial outline can be time consuming and relies on the competence and experience of the technician [2]. Furthermore, clouds and debris cover may obscure the glacier, making the underlying ice hard to identify and model. This modelling can be performed in a Bayesian framework. Bayesian models are based upon developing a model structure for the data and the underlying parameters. These models can be used to make new predictions or understand the underlying process, but they are often expensive computationally, so tact with model complexity and approximation methods is essential. The work outlined in this thesis focuses on investigating methods to perform 11 Figure 1.1: Debris-covered toe of Emmons Glacier on Mount Rainier, August 1984 (Photo by Topinka Lyn). efficient Bayesian multivariate spatiotemporal analysis with application to the glaciers surrounding Mount Rainier (46.8523° N, 121.7603° W), an isolated stratovolcano in Washington State, USA. This work can be summarised by two parts. The first part creates a multivariate analysis that models the spatiotemporal date to fill in gaps in the missing data. The analysis is done under a Bayesian paradigm, with Gaussian process and autoregressive moving averages modelling the spatial and temporal data. Seemingly unrelated regression is used to model the multivariate response variables. The second section is a classification of the previous section's model output. The K-means, support vector machine, hierarchical clus- tering algorithms, and Gaussian mixture models are used for clustering the multivariate structure into glacier. The second chapter of this work gives a summary 12 on current data and methods used within the realm of geostatistics. The third chapter presents the data used in this analysis. The fourth chapter formalises the spatiotemporal model used in this work. The fifth chapter presents the results, and finally the sixth chapter provides a discussion and outlines future work. 13 Chapter 2 Literature Review 2.1 Methods for Glacier Mapping Glacier mapping, or determining the outline of a glacier, is done through multiple ways with a variety of data. The available data can be grouped as passive and active data. Active data includes lidar and radar data. While some success has been found mapping glaciers using lidar, initially a portmonteau of "light" and "radar" [3, 4], the data acquisition can be expensive. Synthetic aperature radar involves repeated pulses of radio waves, of which the echo is recorded and used to create a reconstruction of the landscape. While the radar data is able to "see" through clouds, there can often be quality issues in mountainous terrain. [5]. Passive data includes optical and thermal satellites. Optical sensors collect so- lar radiation reflected by the earth's surface in the visible and near infrared bands of the electromagnetic spectrum. Satellites short revist times and large swaths 14 make them useful for glacier mapping [6]. The Landsat 8 satellite, launched in 2013, has 11 multi spectral bands with resolutions of 15m x 15m and 100m x 100m, and a revisit period of 16 days, to minimise the seasonal snow cover, acquisition dates are typically limited to between August and mid September. This limits the viable images per year to 3-4 images. [7]. Table 2.1 shows the bands, wavelengths, and spatial resolutions of the Landsat 8 satellite. Band ratios or standardised difference ratios, typically calculated from the green, red, Near Infrared (NIR), and Short Wave Infrared (SWIR) bands, are commonly used as indices to differentiate between glaciers and the surrounding bedrock [8-11]. With this multispectral data, thresholding is a common technique, in which either a predetermined threshold is declared [9, 10], or the sample is inspected and a threshold is determined to bisect the sample [5]. As glaciers continue to melt, rock and debris cover increasing amounts of a glacier 's surface. Depending on the thickness of the debris cover, it can have Bands Band 1 - Coastal aerosol Band 2 - Blue Band 3 - Green Band 4 - Red Band 5 - Near Infrared ( NIR) Band 6 - Short Wave Infrared (SWIR) 1 Band 7 - SWIR 2 Band 8 - Panchromatic Band 9 - Cirrus Band 10 - Thermal Infrared (TIRS) 1 Band 11 - Thermal Infrared (TIRS) 2 Wavelength( nm ) Resolution(m) 0.43-0.45 30 0.45-0.51 0.53-0.59 0.64-0.67 0.85-0.88 1.57-1.65 2.11-2.29 0.50-0.68 I.36-1.38 10.6-11.19 II.50-12.51 Table 2.1: Multispectral Landsat 8 data 15 30 30 30 30 30 30 15 30 100 100 both an albedo effect and an insulation effect. Where debris cover is thin, it can decrease the albedo, or proportion of incoming radiation that is absorbed by a surface, causing the glacier to melt more quickly. As the thickness of the debris cover increases, the debris forms barrier between the underlying ice and the atmosphere, causing the glacier to melt more slowly. Withe the current multispectral thresholding techniques, debris cover is commonly a confounding variable, making surrounding bedrock and glacier debris indistinguishable. To help with the debris cover, various techniques and new data sources have been developed. Digital Elevation Models (DEM) have been used to create a slope or elevation variable to help model debris cover [10,12]. Synthetic Aperture Radar (SAR) data has also been used with offset tracking, in which motion between two images is calculated using a cross-correlation algorithm [13], or interferometry. In interferometry, the difference in two images is used to model the spatial change [14,15]. The National Snow and Ice Data Center released a dataset as part of the glacial velocity titled Global Land Ice Velocity Extraction from Landsat 8 (GoLIVE) project. This covers all glaciers with an area over 5 km2. Pairs of Landsat 8 images are superimposed upon one another. Using an image correlation algorithm, pixels corresponding to ice velocity are generated. [16]. Some recent success has been found using this data source [17, 18]; however, the data is noisy, and given it is a product of optical satellite data, the data can be frequently obstructed by cloud cover. Given a set of spatial images aquired over multiple years, mapping glaciers 16 can be done through a spatiotemporal paradigm. The simplest method uses an average at each spatial site to flatten the spatiotemporal data into spatial data [17, 19]. In a similar nature, some researchers use the year as a coefficient in linear regression [20, 21], These simple methods help efficiently model the glacier data, although there is a trade off in complexity. More complex models such as Autoregressive Moving Average (ARMA) [22] and (local) polynomial regression [23, 24] are also in use for deliniation of glacier outlines. Despite multiple variables sampled at each pixel, few true multivariate models are used to model the glacier data. Frequently, some supervised classification techniques are employed, such as multivariable logistic regression [25]; however, caution must be taken due to frequent high correlation among the covariables. In some research, the multivariate data is reduced through principal components analysis (PCA) [26], similar to the previously mentioned band ratios and differences. Co-Kriging is also used, where the undersampled singular variable of note is modelled by highly sampled multivariate covariates. Classification methods are varied and include the previously mentioned thresh- olding and multiple logistic regression. In addition, object-based clustering [12, 27], in which objects are based upon similar sample values and proximity, is often used and is akin to hierarchical clustering. Multivariate clustering techniques such as K-means and K-medoids, which minimise the error around cluster mean or median respectively, [28, 29] are also employed. Machine learning techniques, such as support vector machines and neural nets, have also been applied to glacier classification with some success [28, 30]. Instead of directly classifying data, the 17 Expectation-Maximisation (EM) algorithm is often used when there is missing or unlabelled data. The theory is that it is easier to estimate and model using a missing latent parameter, than to model the distribution directly. In this case, instead of classifying the glacier directly, the EM algorithm attempts to identify the unknown parameters of a mixture distribution, which then can be used to classify the glacier [15, 31]. Manual delineation is also used to define a glacier 's extent, in which an expert defines the glacier edge based on a visual interpretation of the glacier edge. This method is typically performed using colour composites from optical satellites. Manual deliniation may get high results; however, it can be time consuming and not applicable for the delineation of multiple glaciers [32]. 2.2 Contribution of this Work Overall, spatial methods are well utilised in current glacier remote sensing re- search, typically through a Kriging variant. The spatiotemporal framework is typically used to impute missing data with a temporally adjacent sample. Addi- tionally, the majority of research models are developed with a univariate response. The contribution of this work is to broaden the current methods in use in three ways. Firstly, this work is done within a Bayesian paradigm that allows the model parameters to vary and captures the uncertainties involved in the modelling and final classification of the glacier rather than relying on the user 's fixed knowledge of the spatiotemporal processes, as implicit in more commonly used methods. Secondly, this work avoids flattening the data in order to maintain its spatiotem- 18 poral nature. Finally, the model developed in this work has a multivariate re- sponse, to account for correlations within the errors of the response variables and to assist with the final glacier classification. 19 Chapter 3 Data The glaciers surrounding Mount Rainier were chosen for a variety of reasons. Firstly, it is a fairly isolated group of glaciers, so there is a clear split between glacier and surrounding area. Secondly, there was a recent glacier inventory of Mount Rainier in 2015 shown in Figure 3.1. Finally, as it is located in the United States, Landsat 8 Analysis Ready Data (ARD) is available. The United States Geological Survey ( USGS) releases the ARD as preprocessed data that has been tiled, and top of atmosphere and atmospherically corrected for direct analysis and immediate use. The data is also georeferenced so that the internal coordinate reference system matches between scenes. [33] The initial plan for this research involved incorporating radar data and inter- ferometry; however, there were issues with coregistering the SAR scans due to poor quality DEM data in mountainous regions. As a result only optical data from the Landsat 8 satellite and the GoLIVE project, also derived from Landsat 8 20 satellite, were used for this work. The data was clipped to a bounding box 26 x 20 km immediately surrounding Mount Rainier. 3.1 Preprocessing Some further processing for the ARD and GoLIVE data was needed. The USGS uses the Function of Mask (Fmask) algorithm to determine cloud cover on pixels. Theoretically, the SWIR band is able to distinguish between clouds and underlying ice; however, clouds made of ice crystals may cause the two to be indistin- guishable [34]. Along with the multispectral data outlined in Table 2.1, the USGS releases a quality assessment band, which show the cloud and cloud shadow cover results of the Fmask algorithm. This was used to remove cloud and cloud shadows. Figures 3.2 and 3.3 show the pattern of missing data, and Table 3.1 shows the proportion of missing data per year. The data is relatively complete in years 2013, 2016, and 2017. In 2015, a pair of clouds covers up most of the glacier, causing most of the image to be removed. In 2019, the most cloud-free image is one that bisects the glacier, leaving roughly a third of the image complete. In 2014, the western half of the image is covered in clouds causing half of the image to be removed. The resolutions between the ARD and GoLIVE data also differed, the ARD data is 30m x 30m, while the GoLIVE data is 100m x 100m. Given that three quar- ters of the data is at a higher resolution, the GoLIVE data was resampled to 30m x 30m using a nearest neighbour algorithm. The data was clipped to a approximate 21 20km x 20km area surrounding Mount Rainier, resulting in approximately 800,000 observations. The additional processing of the ARD and GoLIVE data is as follows. Firstly, all ARD and GoLIVE images were downloaded from 2013-2019 for the period between August 01 to September 21. Data access data was March 2019 for years 2013-2018, and October 2019 for 2019. Secondly, data was clipped to the area im- mediately surrounding Mount Rainier Thirdly, cloud and cloud shadows were removed using the quality assessment band. Fourthly, for each year, the most com- plete image was selected. Fifthly, ARD and GoLIVE images from each year were combined to create a multivariate spatial matrix. Finally, images were stacked together to create a multivariate spatiotemporal matrix. Note that the release date for the ARD program was November 2017. Given its relative infancy, by March 2019 not all Landsat 8 imagery for 2013-2018 was updated into the ARD program. As a result, more cloud-free images for 20132018 have been added after the data download and processing was completed. Year 2013 2014 2015 2016 2017 2018 2019 Aquisition Date Proportion of Cloud Cover over Area of Interest 0.28 August 20 0.56 August 07 0.53 September 20 0.20 September 13 0.17 15 August 0.40 August 18 0.68 August 14 Table 3.1: Aquisition Dates and proportion of incomplete observations due to cloud cover for 2013-2019. 22 46.95 - 46.90 © - 3 ro -1 46.85 - 46.80 -121.8 -121.7 -121.6 Longitude Figure 3.1: SWIRl -NIR-red composite of Mount Rainier on August 23, 2020. The red outline represents the extent of Mount Rainier 's glaciers in 2015. This extent was obtained from Andrew Fountain, Portland State University. 23 46.95 - 0) 46.90 - D 3 jo 46.85 46.80 - It 2013 n: J*TJ -121.8 46.95 - o 46.90 - 46.80 - 1ro 46.85 - sv -121.7 -121.6 -121.8 longitude — dj O 46.95 - 46.90 - " 46.95 2014 5 O - = ° ro 46.85 - - 46.80 -121.8 -121.7 - -121.6 -121.7 longitude 46.95 - 46.90 - jc 46.85 - 46.80 - dj TD 5 46.95 - 0) 46.90 - o 3 ra 46.85 46.80 -121.7 - -121.6 longitude d) 46.90 D " longitude - 20 1 6 _ro 46.85 - 53 46.80 -121.6 longitude 20 1 5 -121.8 46.95 -121.6 o 469 - 3 « ro 46.85 46.80 -121.7 longitude 46.95 - 0) 46.90 - a 3 = ro 46.85 - * - 46.80 -121.8 -121.7 -121.6 longitude *10 * V -121.8 -121.6 longitude Figure 3.2: SWIRl -NIR-red composite image pairs of Mount Rainier for 2013-2016. Left image is complete image, right image has clouds and cloud shadows removed . 24 46.95 - d) 46.90 - O " 2017 5 to 46.85 46.80 46.95 - CD 46.90 - TD 3 to 46.85 - - 46.80 -121.8 -121.7 -121.8 -121.6 longitude (D D " 46.95 - 46.90 - 2018 3 to 46.85 46.80 -121.8 46.95 - ( B ) = (1 B ) d ( 1 — pBp ) c|)pBP ) « l — (tfB (4.11 ) 4>;+dBp+d ) . (4.12) Here, the root of 1 + e for e > 0, is taken. Therefore an ARIMA(p, d, q ) process can be approximated as an ARMA(p + d, q ) process, and we can approximate a non-stationary process with our ARMA model. There are a few other ways of determining ARMA model order. The first is a fully Bayesian model, where a prior is put on the order of parameters, and the 39 MCMC samples are allowed to transition between the various orders. [55] The second is an empirical Bayesian model, where models of different orders, p, q , are used to fit to the data and the best model is chosen. This is the method that is used for this work. Given that additive ARMA Gaussian prior models have been used with up to ARMA(2,2) parameters [44, 56, 57], all 14 combinations of ARMA processes up to 4 total parameters p + q are fit. 4.7 Gaussian Process Usually in simple linear regression analysis, the form of a regression is y = (3o + |3 ix + e, and Po, Pi are the intercept and slope parameters. The core of a Gaussian process is that the regression function is not fixed, but rather the process has a prior distribution. We can look at this as y = f (x ) + e, with f (x ) ~ GP ( m ( x ), K ( X ) ) ) where GP ( ) is the Gaussian prior distribution for the functional f . The GP has parameters of mean function m and covariance kernel K. Non-Bayesian Gaussian processes in the geostatistics world have been referred to as Kriging [58], named after the South African mining engineer Danie Krige, who developed this method to look for iron ore deposits [59], Traditionally, Krig- ing consists of multiple steps, including exploratory statistical analysis of the data, variogram modelling, creating the surface, and optionally exploring a variance surface and yielding a predicted mean. If the output is Gaussian and covariance function is exactly known, it produces the best linear unbiased estimator (BLUE). Bayesian methods, as previously noted, do not require a completely known vari- 40 ance and do not yield a mean estimate, but rather a posterior distribution from which a mean can be extracted. The mean function of a Gaussian process m ( x ) defines the base function around which all of the realizations of the Gaussian process will be distributed. In this case, we will assume it be 0, as there is a non-zero intercept AS t for each sample / location. If we were to have a more complex Gaussian process with a non-zero mean function it could simply be added to the intercept. The Gaussian process is used to model the deviation due to surrounding points in this regression model. The covariance kernel K ( X ) = k (x, x'), for x the mean function. The larger the covariance x ' controls how around ^between x and x' the more their f varies is outputs f ( x ) and f ( x') vary. [60, 61] In this work, three covariance kernels are compared. Firstly, the squared expo* * * , secondly the spherical: a2 nential: o^ exp 0.5 ' 22 ' and finally the Matem ) + \ u2exp Here, l is length parameter, controlling \ the smoothness of the response. In terms of the application in this work, we can view x and x' as spatial coordinates and l = |x — x'| [37]. Figure 4.1 shows an application of Gaussian processes on a toy dataset, each blue line is a random sample of a Gaussian process. The strength of Gaussian processes comes from the prediction. If there is a Gaussian observation model, y - N , a , we can estimate p at new locations ( y. 2 ) 41 . Figure 4.1: Demonstration of Gaussian process regression on a toy dataset. Top figure is the data, made of sums of sine functions with random noise. Bottom figure is 100 samples from the posterior of a Gaussian process regression function. through a joint density and then find the marginal density of ( j.new [37] y ~N 0 K ( X, X ) + CT2 I K ( X, X ) K ( x, x) 0 42 K ( x, x ), (4.13) with, (4.14) gV, y, T, l, cx ~ N(E( g'), cov ( g')), ^— ^ ^ ^ E ( g') = K ( x', x ) K ( x, x ) + cr21 cov ( g') = K ( x /, x / ) (4.15) y, K ( x', x ) K ( x, x ) + o2! K ( x, x'). (4.16) On first glance, estimating the posterior distribution of g and the new g' may seem simple; however each matrix inversion requires O (n3 ) computations for each MCMC iteration. And thus the computation becomes very expensive with n being more than a few thousand, which is the case here. In a fully Bayesian analysis, the length parameter, l, would be a distribution and be allowed to vary over MCMC samples. In this work, l was fixed to be 12, this brought the memory and time required from three weeks and 64 gigabytes of RAM to one week and 28 gigabytes of RAM. The Gaussian process was coded in Stan, modified from code in the Stan user 's guide [62]. 4.8 Model Choice As discussed in the previous sections, 14 orders of ARMA(p, q ) models and 3 different Gaussian process kernels are being compared. A choice needs to be made that balances the goodness of fit with model complexity. Given that this model will be used to make predictions of the missing data, evaluation of the predicted mean should be measured [63]. Root Mean Squared 43 Error (RMSE ) has been employed for measuring this predictive efficacy in Bayesian models [64, 65]. While the output of a model is a posterior distribution, any statistic may be calculated from this distribution. In this case, the predicted mean can be calculated from the distribution and RMSE calculated henceforth. RMSE = / Eilltx — E( ) ) ty X 3 (4.17) Note, due to the lack of independence between each of the predicted variables, RMSE is calculated for each of these variables. When assessing model performance, complex models with more parameters typically out perform simpler models. [66]. The Akaike Information Criterion (AIC), developed by Japanese statistician Hirotugu Akaike in 1974, is a statistic frequently used to balance goodness of fit of a model with model complexity [66, 67], In 2010, the Bayesian extension, Watanabe-Akaike Information Criterion or Widely Available Information Criterion (WAIC) was developed as an extension for Bayesian models [68]. Note that the WAIC asymptotically approximates the AIC [68]: WAIC = Tn -| Vn-. —n (4.18) Where Tn corresponds to how the model fits the data, and Vn corresponds to the functional variance. 44 ^ XI= logp (Xilw) Vn = Y_ {E [ ( logp ( Xilw ) ) ] Ew [logp ( Xilw )] } . ^ Tn = - * (4.19) i 1 Tl i=1 2 >v - 2 (4.20) Similarly with the RMSE, in this work the WAIC is calculated for each response variables. The WAIC calculation was coded in Stan, modified from code in the Stan user 's guide [62]. The RMSE was calculated using R calculated from the mean model response. 4.9 Classification As a part of the Global Land Ice Measurements from Space (GLIMS) initiative, a census of Mount Rainier 's glaciers was done by Andrew Fountain from Portland State University in 2015 [69]. In this thesis a variety of different classification tech- niques are used to classify the multivariate matrix of predicted variables as glacier. Hard algorithms, in which the classification is expressed as a binary outcomes and soft algorithms, in which the classification is expressed as a probability, are described in the following subsections. 4.9.1 Hard K -means Classical (Hard ) K-means is a iterative clustering algorithm in which n observations are partitioned into K clusters. The clusters are created to minimise the intra- 45 cluster sum of squares, kkfai w= “ Wc ) 2 (4.21) N k=1 i=1 where Uk is the mean of each cluster, and I is an indicator function for a particular observation belonging to a cluster. The hard K-means algorithm was performed on the two clusters of glacier and non-glacier. 4.9.2 Soft K -means Soft K-means, is similar to Hard K-means, being an iterative classification technique with K clusters; however, each yn may be assigned to different cluster, with probability p ( Ik ): K W= N J2=1 = p (k ) ( k ^ i 1 yi - pk ) 2 - (4.22) In classical K-means, the iterative process assigns each observation to a cluster with an index function. This is akin to assigning each observation with a proba- bility of 1. Soft K-means differs in that the probability of an observation belonging to any cluster is no longer 1, but can range between 0 and 1. The Soft K-means algorithm was performed on the two clusters of glacier and non-glacier. 4.9.3 Support Vector Machines Support vector machines (SVM) are a classification technique that has gained pop- ularity over last few decades through machine-learning development. In SVM, 46 the data is mapped to a higher dimension via a kernel and an optimal hyperplane segments the data into groups. The radial basis function kernel used in this analysis is written as, ( K (yi/ Uj) = exp -Y (yt - yj) 2 ) • (4.23) The gamma parameter defines the influence of a singular observation. Higher gamma functions are more influenced by a single outlier, and thus more prone to overfitting. The gamma function of 0.25 was chosen via an iterative search from 0 to 1. The threshold balances correct classification with maximisation of the decision function. As the threshold increases, the classification becomes less sensitive to errors. The threshold value of 0.1 was chosen through an iterative search from 0 to 1 to minimise the sum of type I and II errors. 4.9.4 Hierarchical Clustering Hierarchical clustering builds a hierarchy of nested clusters. The agglomerative technique here is a bottom-up algorithm pairs of observations are combined together until one final cluster remains. In order to choose the pairs of clusters being merged, a dissimilarity matrix is created, and a linkage criterion is applied to create clusters. While it is not necessary to predetermine the number of clusters in hierarchi- cal clustering, given it is a series of nested observations, once observations are assigned to a cluster, they will never separate. If the clusters are not truly nested, 47 hierarchical clustering is not appropriate. Complete, single and average linkages were investigated in the analysis, with average linkage performing the best and used for the final analysis. 4.9.5 Gaussian Mixture Models A mixture model is a probabilistic model for representing groups within a popu- lation. If we assume a Gaussian mixture, we have the following f (*) = ^ N i =1 p (*i ) 9i (x), (4.24) 9i ~ M ( Md / °i ). (4.25) The EM algorithm is an iterative process often used to estimate missing data. In this case, we will assume that the data is a mixture of two distributions, one for the glacier and another for the surrounding area. This Gaussian mixture model £ uses the EM algorithm to estimate the mean n and variance cr for each of the * mixtures. The EM algorithm alternates between an expectation step, which creates a function to for the expectation of the log-likelihood given the parameters, and the maximisation step, which computes new parameters to maximise the expected log-likelihood found in the expectation step. These new parameters are then used in subsequent expectation steps and the algorithm repeats until the values converge. The Gaussian mixture model was coded in Stan, using modified code from the Stan users guide [62] and executed in R using the RStan package. 48 4.10 Principal Components Analysis Principal components analysis (PCA) is a statistical method that aims to find the principal directions in which the data varies. It takes linear combinations of the data in order to capture the maximal variance. The principal components themselves are found though calculating the eigen- value and eigenvectors of either the variance-covariance matrix or the correlation matrix. The eigenvector with the largest eigenvalue denotes the direction with the highest variation, and subsequently the second highest eigenvalue's eigenvector is the second highest variation. If the set of all eigenvectors are normalised, they form an orthonormal basis for the multivariate data structure. If only the eigenvectors that capture the most variance are selected. PCA provides a tool for dimension reduction that limits the loss in variance. Due to the differences in units and scale of the variables of interest in this work, PCA is performed on the correlation matrix. The principal components analysis was performed on the correlation matrix in R. 49 Chapter 5 Results 5.1 Glacier Model Choice All 42 combinations of ARMA(p, q ) models up to 4 parameters and 3 Gaussian process kernels were modelled. The RMSE and WAIC for each of the four variables of interest are shown in Tables 5.1 and 5.2. Note that lower RMSE and lower WAIC correspond to better model fit. No single model was lowest in each of the response variables, so the RMSE were combined via an Euclidean distance, and the WAIC were combined via a sum. The ARMA(3,1) model with exponential Gaussian process kernel was lowest in both of these metrics. This model is highlighted in red in Tables 5.1 and 5.2 and is used for prediction of the missing data and subsequent classification. 50 ARMA Process ARMA (0,1) ARMA (0,2) ARMA (0,3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA (1,2) ARMA (1,3) ARMA (2,0) ARMA (2,1) ARMA (2,2) ARMA (3,0) ARMA (3,1) ARMA (4,0) ARMA (0,1) ARMA (0,2) ARMA (0,3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA (1,2) ARMA (1,3) ARMA (2,0) ARMA (2,1) ARMA (2,2) ARMA (3,0) ARMA (3,1) ARMA (4,0) ARMA (0,1) ARMA (0,2) ARMA (0,3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA (1,2) ARMA (1,3) ARMA (2,0) ARMA (2,1) ARMA (2,2) ARMA (3,0) ARMA (3,1) ARMA (4,0) Surface Gaussian Glacier Band Process Temperature Ratio NDSI Speed Pooled RMSE RMSE RMSE RMSE RMSE Kernel 8.73 7.56 0.153 0.175 11.556 Spherical 7.89 7.20 0.149 0.170 10.692 Spherical 7.11 9.755 6.66 0.138 0.166 Spherical 6.84 6.51 0.131 0.163 9.456 Spherical 0.169 9.387 6.68 6.58 0.130 Spherical 6.46 0.126 0.158 6.69 9.310 Spherical 6.47 0.125 0.157 9.449 6.88 Spherical 6.92 6.42 0.123 0.158 9.451 Spherical 6.49 0.128 0.164 6.57 9.243 Spherical 6.70 6.57 0.125 0.158 9.397 Spherical 6.47 5.29 0.125 0.156 8.363 Spherical 6.84 6.44 0.123 0.160 9.405 Spherical 6.92 6.42 0.123 0.158 9.450 Spherical 0.159 7.00 6.42 0.123 9.504 Spherical Matem 5.92 8.52 0.209 0.162 10.386 Matem 5.55 8.00 0.195 0.160 9.747 Matem 5.26 9.008 7.30 0.177 0.158 Matern 5.11 7.13 0.163 0.156 8.784 Matern 4.93 7.19 0.160 0.159 8.731 Matern 4.83 7.08 0.154 0.153 8.578 Matern 4.83 7.08 0.154 0.153 8.583 Matem 4.80 7.03 0.151 0.153 8.524 7.09 0.156 0.157 Matem 4.87 8.614 Matem 7.18 0.156 0.153 8.662 4.82 7.06 0.153 0.153 Matem 4.78 8.531 Matem 4.80 7.06 0.154 8.544 0.152 4.72 6.99 Matem 0.152 0.153 8.445 Matern 4.82 7.03 0.152 0.154 8.531 7.74 5.22 0.150 0.156 9.345 Exponential 4.86 7.35 0.146 0.155 8.822 Exponential 4.71 6.78 0.134 0.153 8.264 Exponential 4.62 0.126 6.65 0.152 8.110 Exponential 4.44 6.70 0.128 0.154 8.046 Exponential 4.40 6.60 7.936 0.123 0.150 Exponential 4.38 6.60 0.122 0.150 7.925 Exponential 4.30 6.56 0.119 0.150 7.910 Exponential 4.41 6.61 0.125 0.152 7.958 Exponential 4.33 6.66 0.124 0.150 7.955 Exponential 4.39 6.56 0.122 0.150 7.905 Exponential 7.912 4.39 6.57 0.120 0.150 Exponential 0.120 7.881 6.55 0.150 4.37 Exponential 4.39 6.55 0.120 7.894 0.150 Exponential Table 5.1: RMSE for all potential models and variables of interest. The chosen model is highlighted in red. 51 ARMA Process ARMA (0,1) ARMA (0,2) ARMA (0,3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA(1,2) ARMA (1,3) ARMA (2,0) ARMA (2,1) ARMA (2,2) ARMA (3,0) ARMA (3,1) ARMA (4,0) ARMA (0,1) ARMA (0,2) ARMA (0,3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA (1,2) ARMA (1,3) ARMA (2,0) ARMA (2,1) ARMA (2,2) ARMA (3,0) ARMA (3,1) ARMA (4,0) ARMA (0,1) ARMA (0,2) ARMA (0 3) ARMA (0,4) ARMA (1,0) ARMA (1,1) ARMA (1,2) ARMA ( l 3) ARMA (2,0) ARMA (2,1) ARMA ( 2,2 ) ARMA (3,0) ARMA (3,1) ARMA (4,0) / / Gaussian Process Kernel Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Spherical Matern Matern Matern Matern Matern Matern Matern Matern Matern Matern Matern Matern Matern Matern Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Exponential Surface Temperature WAIC 217,735 216,023 215,083 213,836 214,078 213,881 213,263 213,191 213,992 213,562 213,346 213,152 213,137 213,136 217,914 216,181 215.227 214,009 214,214 214,017 213,408 213,339 214,127 213,702 213,367 213,308 216,597 213,288 217,840 216,102 215,155 213,909 214,142 213,948 213,335 213,266 214,056 213,669 213,331 213.228 213,219 213,211 Band NDSI Ratio WAIC WAIC 140,614 -20,621 138,278 -22,745 136,887 -26,053 136,585 -27,270 136,565 -26,422 136,399 -27,771 136,387 -27,774 136,373 -27,657 136,413 -27,434 136,803 -27,768 136.350 -28,009 136,382 -28,056 136,338 -27,671 136,372 -27,687 140,080 -19,399 137,784 -21,658 136,419 -24,980 136,148 -26,268 136,158 -25,593 135,987 -26,859 135,979 -26,864 135,965 -27,246 136,000 -26,553 136.351 -25,732 135,917 -26,885 135,974 -27,135 135,422 -27,421 135,959 -27,279 140,716 -20,595 138,363 -22,727 136,957 -26,036 136,655 -27,248 136,631 -26,397 136,468 -27,748 136,454 -27,751 136,442 -28,129 136,480 -27,413 136,884 -26,508 136,396 -27,767 136,449 -28,029 136,403 -28,142 136,438 -28,161 Glacier Speed WAIC -18,234 -18,775 -19,185 -19,394 -18,686 -19,746 -19,724 -19,224 -19,243 -19,735 -19,816 -19,554 -19,227 -19,151 -18,199 -18,743 -19,166 -19,379 -18,646 -19,727 -19,729 -19,731 -19,224 -19,752 -19,802 -19,533 -19,709 -19,650 -18,184 -18,738 -19,148 Pooled WAIC 319,494 312,781 306,731 303,756 305,535 302,763 302,152 302,683 303.728 302,861 301,872 301,924 302,577 302,669 320,396 313,565 307,500 304,509 306,133 303,419 302,793 302,327 304,351 304,569 302,596 302,614 304,889 302,318 319,777 313,000 306,929 -19,362 303,953 -18,647 305.729 -19,711 302,956 -19,699 302,338 -19,686 301,892 -19,200 303,923 -16,491 307,554 -19,771 302,189 -19,521 302,126 -19,686 301,795 -19,625 301,862 Table 5.2: WAIC for all potential models and variables of interest. The chosen models is highlighted in red . 52 5.2 Model Prediction Posterior predictive checks are a standard model check procedure for Bayesian models. Data is repeatedly simulated from the fitted model and compared to the observed data. This repeated sampling allows the check to include the uncertainty associated with the estimate of the parameter of the model, along with the estimate itself . Figures 5.1 and 5.2 shows the posterior predictive checks for the model. Overall the posterior distribution approximates the observed data distribution well except for the NDSI variable. The NDSI distribution is bimodal, with the modes of the posterior distribution far apart. As the MCMC samples are taken in sequence, with the next sample being dependent upon the single previous sample. If the modes are far apart, the MCMC samples may have dif - ficulty transitioning between the two modes. However, as the NDSI distribution modes are far apart, this may make for easier classification, as the two mixtures can be easily identified. In contract with the Red / SWIR band ratio or the surface temperature, the small deviations from the Gaussian distribution may indicate a mixture with the modes much closer together. Apart from the NDSI variable, the posterior predictive distribution estimates the observed distribution well. 53 ^ Data MCMC Samples 7 3 7 (a ) Posterior predictive distribution of the Red / SWIR band ratio Data MCMC Samples A -1.0 -0.5 0.0 0.5 (b) Posterior predictive distribution of the NDSI Figure 5.1: Density plots of the posterior predictive distributions of the Red /SWIR band ratio and NDSI. The samples from the posterior simulate the observed distribution fairly well with the exception of the bottom NDSI figure. The NDSI observations have a bimodal distribution, which MCMC sampling has trouble converging upon. 54 Data MCMC Samples 270 280 310 300 290 320 (a ) Posterior predictive distribution of the surface temperature — Data MCMC Samples 1 2 3 4 (b) Posterior predictive distribution of the glacier speed Figure 5.2: Density plots of the posterior predictive distributions of the surface temperature and glacier speed. 55 The mean predictions of this model are shown in Figures 5.3, 5.4, 5.5, and 5.6. Overall the model shows clear images of the glaciers without large artifacts with some exceptions. In 2013, the forward predicting ARMA time series has no sup- port, so the 2013 prediction is solely made with the approximate Gaussian process. 2015 has a large amount of missing data, leading to poor prediction, especially in the NDSI variable. Despite missing half the image, 2019 was modelled well compared to 2015. This may be due to the increased number of preceding years. Due to there being no 2012 observations, in 2015 the third and fourth lags in an AR(4) model add nothing, whereas in 2019 more previous years are available. Figures 5.7, 5.8, 5.9, and 5.10 show the residual plots calculated from the posterior means. Due to the prevalent missing data, they are somewhat difficult to inspect. However, apart from 2013 and 2016, the residual plots appear to be randomly distributed. As previously discussed, the forward predicting time series had no support in 2013, so the residuals are not randomly distributed and large artifacts are found throughout the image. The residuals in 2016 appear to suffer from a similar problem, the large amount of missing data in 2015 again affects the time series predictions and there is some non-randomness seen in the western half of the Red / SWIR band ratio image. While the predictions in 2015 appear to be imperfect, due to the missing data, there are no residuals to calculate and inspect. 56 46.95 46.90 - 46.85 46.80 - 46.95 46.90 - 46.85 - Estimate of Red/SWIR Band Ratio 46.80 - 0.0 2.5 46.95 - 5.0 7.5 46.90 - 10.0 46.85 46.80 -121.7 -121.6 46.95 46.90 46.85 46.80 - -121.8 -121.7 -121.6 Longitude Figure 5.3: Grid of mean estimate of Red / SWIR Band ratio of Mount Rainier 's glaciers 2013-2019. 57 46.95 - 46.90 - 46.85 46.80 - 46.95 - 46.90 - 46.85 - 46.80 - Estimate of Surface Temperature ( kelvin) 270 46.95 - 46.90 - 46.85 - 46.80 - 280 290 300 310 -121.7 46.95 - 46.90 - 46.85 - 46.80 -121.8 -121.7 -121.6 -121.6 Longitude Figure 5.4: Grid of mean estimate surface temperature of Mount Rainier 's glaciers 2013- 2019. 58 46.95 - 46.90 - 46.85 46.80 - 46.95 - 46.90 - 46.85 46.80 Estimate of NDSI - (D -1.0 o ~ 3 -0.5 ro 46.95 - 0.0 i 46.90 46.85 - I 0.5 46.80 - -121.8 -121.7 -121.6 46.95 46.90 46.85 - 46.80 - -121.8 -121.7 -121.6 Longitude Figure 5.5: Grid of mean estimate NDSI of Mount Rainier 's glaciers 2013-2019. 59 46.95 - 46.90 - 46.85 - 46.80 - 46.95 - 46.90 - 46.85 - 46.80 - Estimate of Glacier Speed (m/day ) I 0.0 46.95 - 46.90 - 46.85 - 46.80 - |0.5 I ° -121.8 46.95 - 46.90 - -121.7 -121.6 46.85 46.80 -121.8 -121.7 -121.6 Longitude Figure 5.6: Grid of mean estimate of glacier speed of Mount Rainier 's glaciers 2013-2019. 60 2013 - -." Y-. 46.90 - 46.95 a 1 T 46.85 - 46.80 - 46.95 - 46.90 - 46.85 - 46.80 - a " L K V --1 H . A* - . i -- V J r Residaul of Red/SWIR Band Ratio O " 5 -5 CD 46.95 - 46.90 - 46.85 - 46.80 - 0 “ % 5 L. .- -121.8 2019 46.95 - 46.90 - 46.85 - 46.80 -121.8 -121.7 -121.7 -121.6 -121.6 Longitude Figure 5.7: Grid of residuals of Red / SWIR Band ratio of Mount Rainier 's glaciers 20132019. The grey indicates missing data due to clouds or cloud shadows. 61 2013 46.95 - 46.90 - T-/ -'4* 46.85 - L -sr v "1 V ‘ 46.80 - !* .. B ^ r- - . 7r : I - r . :- - 2014 '1 I" * w . VH ' : ' 2016 46.95 - 46.90 - 46.85 - 46.80 - -’1 / r ~ * \ 7. r . _ .i . .t ; "% .& s “ I * . 11 Residual of Surface Temperature J (K) v / ' r* 2016 46.95 - 46.90 - 46.85 - 46.80 - 46.95 - 46.90 - 46.85 - 46.80 - - -< y r ~ * \ 7 r " m% " l . •• . * .- d" / v• . Residual of NDSI -1.0 -0.5 CD O " 5 CD 0.0 0.5 1.0 -121.8 46.95 - 46.90 - 46.85 - 46.80 -121.8 -121.7 -121.7 -121.6 -121.6 Longitude Figure 5.9: Grid of residuals of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows. 63 46.95 46.90 46.85 - 46.80 - 46.95 - ' 46.90 46.85 - 46.80 - © V r= A * Residual of Glacier Speed (m/day) -4 3 -2 46.95 - 0 2 46.90 46.85 - 46.80 - 4 -121.8 46.95 - 46.90 - 46.85 - 46.80 -121.8 -121.7 -121.7 -121.6 -121.6 Longitude Figure 5.10: Grid of residuals of glacier speed of Mount Rainier 's glaciers 2013-2019. The grey indicates missing data due to clouds or cloud shadows. 64 5.3 Classification The classification was performed using the algorithms outlined in Section 4.9 to classify the glacier and non-glacier areas for 2015. The total classification was calculated as a weighted average of the glacier and non-glacier classifications with respect to the amount of pixels. Figure 5.11 shows the classification of the final model sampled from the posterior for 2015 with an outline of the 2015 glacier mapping superimposed upon each image. Table 5.3 summarises the results for true values of glacier and sur- rounding area. Overall, all classification methods failed to classify the debris covered tongues of the northeastern glaciers. The Gaussian mixture model has the highest accuracy in classifying the glacier, and it does pick out some potential toes on the North edge of the glacier; however, it misclassifies these toes, showing them offset in adjacent locations. Compared to the other probabilistic classification techniques, soft K-means, Gaussian mixture model outperforms on both glacier and non-glacier accuracy. However, the Gaussian mixture model appears to classify too much glacier, and the edges of the glacier often protrude outside the red boundaries. The reason Gaussian Mixture Models still outperforms soft K-means is that the soft K-means algorithm had difficulty determining the non- glacier areas. There are large striations in the soft K-means image, and these add up and contribute to its low classification of the surrounding area. The hard K-means and soft K-means perform similarly. Each algorithm finds the true edges of the glacier well, outside of the northern toes. However, both algorithms have difficulty finding the outline between adjacent glaciers. As pre65 Probability of Correct Classification Gaussian Hierarchical Support Hard K -means Band Ratio Glacier Mixture Hard K-means Clustering Soft K-means Vector (2016) Models Machines Yes 0.874 0.852 0.803 0.819 0.764 0.688 No 0.854 0.943 0.832 0.958 0.973 0.940 Total 0.858 0.922 0.915 0.830 0.919 0.916 Table 5.3: Summary table for the classification of Mount Rainier 's glaciers in 2015. The total classification was calculated as a weighted average of the glacier and non-glacier classifications with respect to the amount of pixels. viously noted, the large classification striations in the soft K-means image add up and contribute to its low classification accuracy of the surrounding area . When it comes to finding the outline between adjacent glaciers, the support vector machine is the best model. While the hierarchical clustering and both K means algorithms also did this to an extent, the support vector machine best sep- arated adjacent glaciers. This is seen in the highest classification accuracy of the surrounding area. However, the support vector machine algorithm performed the worst in classifying the glacier itself, with the lowest accuracy of 76.4%. Support vector machines work by drawing a hyperplane to segment the data into groups. As the model predicted variables appear to underestimate the edges of the glacier, the SVM may draw the hyperplane to underlassify the glacier. The hierarchical clustering algorithm had the second highest classification ac- curacy of the non-glacier area at 94.3%, just behind of the support vector machine at 95.8%. However it was least effective for the glacier itself , correctly classifying only 80.3%. To show comparison to current work done in a univariate non-Bayesian para66 digm, a hard K-means classification was performed. As the 2016 data is much more complete than the 2015 data, it was performed on the red / SWIR band ratio on the 2016 data. When classifying the non-glacier areas, the univariate classification performed very well, this is seen in the highest non-glacier classification of 97.3%. This is clearly seen in the middle of Mount Rainer 's glaciers, where only the support vector machines and the univariate K-means managed to delineate the individual glaciers. Despite this, the univariate K-means had the lowest classification of the glaciers itself as it correctly classified 68.8% of the glacier. This suggests that this multivariate and Bayesian analysis adds value. Overall, the hard K-means algorithm appears to perform best. It does not pick up on any glacial toes; however, it does classify the easternmost isolated glacier, and does a good job of isolating adjacent glaciers, unlike the Gaussian mixture model. And unlike the soft K-means, the h K-means accurately classifies the surrounding bedrock as bedrock. While it does not have the highest accu- racy of either the glacier or surrounding area, it is marginally behind the leading algorithms, and it has the highest overall accuracy of 92.2%. 67 Hard K-means 46.95 - l 46.90 - 46.85 - 46.80 - Hierarchical Clust HKM BR only (2016) 46.95 - Probability of Glacier 1.00 46.90 D < 0.75 TD 3 0.50 03 46.85 - 0.25 0.00 46.80 Soft K-means 46.95 - 46.90 - 46.85 - 46.80 - -121.8 -121.7 -121.8 -121.6 Longitude -121.7 Figure 5.11: Glacier classification techniques for Mount Rainier 's glaciers in 2015. The 2015 glacier extent is outlined in red . 68 5.4 Principal Components Analysis In this work, the PCA was performed using the correlation matrix (or equivalently the covariance matrix with standard-normalised variables) on the model output of the four variables of interest for the 2015 data. The screeplot in figure 5.12 shows that at least 3 principal components should be used. Given that there is little difference between the eigenvalues of the third and fourth principal com- ponents, four principal components were chosen. The loadings of the principal components are shown in Table 5.4. The loadings of the first principal component suggests a contrast between the Red / SWIR band ratio and the glacier speed with the surface temperature. The second principal component is the NDSI. The third principal component is the contrast with Red / SWIR band ratio with glacier speed, and finally the fourth principal component is the contrast of Red / SWIR band ratio and surface temperature with glacier speed. 69 Loadings of PCA of mean estimates PCI PC2 PC3 PC4 Band Ratio 0.535 -0.293 0.688 0.394 NDSI 0.206 0.955 0.190 0.099 Surface Temperature -0.598 0.041 0.025 0.800 Glacier Speed 0.560 -0.028 -0.700 0.442 Table 5.4: Loadings of principal components analysis of the variables of interest. The screeplot in Figure 5.12 suggests all four principal components are necessary. Given that all four principal components are used and there is no dimension reduction, this suggests that only a rotation is being applied upon the data. As Gaussian mixture models, the K-means algorithm, hierarchical clustering, and support vector machines with radial bases (as are used here) are invariant under rotation [70-72], this does not aid in the classification. Table 5.5 shows the results of the classification upon the rotated data; there is minimal change compared to Table 5.3. Note that approximate principal components are used for ease, so it is not a true orthogonal transformation, and thus minor differences are seen. Despite not being able to aid in the classification, the PCA does add value. It suggests that one single variable (a linear combination of multiple variables) does not capture enough of variability in the model. So having a model with all four multivariate responses, as in this work, better models the spatiotemporal process compared to a univariate model. 70 Dimensions Figure 5.12: Screeplot of estimated Red / SWIR band ratio, NDSI, surface temperature, and glacial speed for Mount Rainier 's glaciers. Probability of Correct Classification of Rotated Data Gaussian Hierarchical Support Hard K-means Glacier Mixture Hard K-means Clustering Soft K-means Vector Band Ratio (2016) Models Machines Yes 0.851 0.805 0.818 0.764 0.868 0.688 No 0.859 0.940 0.945 0.836 0.958 0.973 Total 0.857 0.922 0.915 0.829 0.919 0.916 Table 5.5: Summary table of the rotated data for Mount Rainier 's glaciers in 2015. The univariate classification was not rotated but added for comparison. The total classification was calculated as a weighted average of the glacier and non-glacier classifications with respect to the amount of pixels. 71 Chapter 6 Conclusions and Discussion In terms of satisfying its research goals, this work successfully modelled the spa- tiotemporal data of the glaciers surrounding Mount Rainier, and it had mixed successes with the delineation of said glaciers. The delineation outperformed a com- parable univariate analysis; however it failed to classify any of the debris covered glacier tongues. In order to do this modelling, a spatiotemporal GAM was chosen for its flexibility and computational speed, with ARMA processes and Gaussian processes respectively modelling the temporal and spatial autocorrelations. The ARMA (3,1) model with the exponential Gaussian process kernel was chosen via lowest pooled RMSE and WAIC of all four variables. Model predictions were then created by sampling from the posterior. These predictions were classified with a variety of techniques, from which the hard K-means algorithm measured by euclidean distance was determined to be the best. A further PCA was performed which showed that a reduction in the number of variables could not capture the 72 variability within the model and that multivariate response regression as justified. Furthermore compared to a univariate classification of the Red / SWIR band ratio only, the hard K-means multivariate classification outperformed the univariate classification by 85% to 68%. One major limitation in this research is that there are only seven time indices. In the realm of time series analysis, this is a small sample size. The seven time indices also allowed for some assumptions in the modelling process that could not be checked by hypothesis tests. Given the nature of a glacier, each pixel is rel- atively stationary throughout time, until a certain point when the ice melts and the observation's expected value is no longer invariant with respect to time and the stationary assumption is violated. If these methods were applied to a longer time period, it would be reasonable to consider a change point model to accommodate nonstationarity, whereby an additional parameter can be used to separate two different time series models. Furthermore, the time and memory required, even with this restricted data, was substantial. A single ARMA model required up to 28 Gigabytes of memory for upwards of seven days of run time. Gaussian processes scale 0 ( n3 ), so if double the data is used, the model requirements scale by 8. So further approximations, or different processes are needed if even a small amount of additional data is used. Since this research has been completed, additional ARD imagery has been uploaded for the years 2013-2019. Further research may use these additional images, older Landsat 7 imagery, or data from 2020 onwards for a larger dataset. The Bayesian methods used allowed us to model the uncertainty not only in 73 the response, but also allows variation in the hyperparameters, unlike current methods. The issue that arose is that Bayesian computation is expensive in both time and memory demands. As a result, further approximations are needed to be made to run the model in a reasonable time. In this case, only one acquisition was used per year, and the Gaussian process length parameter was fixed at 12. This is seen in Figures 5.3, 5.4, 5.5, and 5.6 for 2013, in which the ARMA process had no support and the approximations within the Gaussian process clearly did not effectively outline the glacier. Including multiple scenes per year could help with modelling the intra-annual variation. Given the data is obtained from satel- lites rotating the earth, including multiple scenes per year could help mitigate sampling error. Additionally, the additional data would influence the modelling choice. Instead of modelling the spatial dependencies with Gaussian processes, conditional autoregressive models may be more suitable, as there would no longer be large amounts of missing data, and conditional autoregressive models are more sensitive to rapid changes, such as one would find at the glacier edges. However, given the large memory and time constraints on the analysis, the data was re- stricted to a singular image per year. While a glacier is a stationary object and is not expected to shrink more than a few meters per year, adding additional manner may help with first reconstructing a complete image on an annual basis, then modelling the spatiotemporal dynamics. This may increase the time and memory demands of the algorithm. The methods here, as with many statistical techniques, rely on the missing data being MAR, the assumption of which can certainly be brought into ques- 74 tion. As previously noted, the Fmask algorithm, used to remove the clouds and cloud shadows, has difficulty distinguishing between clouds made of ice crystals and underlying snow or ice. As a result, given equal cloud cover, more glacier observations would be deemed missing than non-glacier observations. However, given the elevation range of Mount Rainier, an isolated strato-volcano, the cloud formation is not uniformly distributed, and will be more likely to form at certain elevations. This research glossed over this with a MAR assumption. I think it would be a valid extension to further investigate this by adding a missing data model parameter. The missing data also influenced the algorithm complexity. Note that the simplest ARMA models, the AR(1) and MA(1), could be ruled out immediately by inspecting the data in Figures 3.4, 3.5, 3.6, and 3.7. The missing data in 2015 and 2019 does not allow any time series process with a single lag to make predictions for the following year. So the 2016 and future 2020 data would be not well modelled under an AR(1), MA (1), or ARMA (1,1) process. Current research avoids this problem by filling in these gaps, either through a Kriging-variant interpolation or through quantile regression [73-75] . The issues arising here are that these methods often flatten the spatiotemporal data, or they use the data twice. The first time is to fill in the gaps, and the second time is to model the spatiotemporal dynamics. A different way of approaching this would be not to use a discrete time series process, such as ARMA, but instead use a continuous time series process. The downside of this is that the sample size may grow many times in size, which will greatly increase the memory and computation time. 75 Glacier observations seem to be taken from several sub-populations. The first is the snow and ice on the glacier. The second is the surrounding bedrock. The third is the surrounding forest and alpine areas, and finally, speculatively, is the border area between the first two. One possible way would be to model this using a multi-modal distribution. The issue with applying this stems from the MCMC methods used to approximate the posterior distribution. The MCMC iterations happen in sequence, so if the modes of the posterior distribution are far apart then the MCMC samples have trouble transitioning from one peak to another. Given there are a finite, predetermined amount of MCMC iterations, the model may not transition in time from one peak to another for accurate modelling. The potential solution to this would be to split the data into three sections and model each independently. A difficulty with this is that as the glacier melts, pixels will transition from one state to another. This would require a dynamic spatiotemporal model with a non-separable covariance structure. This is another potential opportunity not seen in current glacier remote sensing research. Another way to approach the debris cover question would be to involve differ- ent data sources. Given that debris covered glaciers and the surrounding bedrock are indistinguishable to the human eye, research shows little improvement with including finer resolution lidar data [76]. However, the addition of non-optical data, such as SAR data, may improve upon the classification. Given the methods themselves have no determinant connection to isolated strato-volcanos, the methods demonstrated here can easily be applied to other glaciers, ice fields, and other permanent ice structures. These methods here are also suited to other stationary 76 spatiotemporal data structures. Potential applications include but are not limited to mountain snow pack, algae bloom, and particulate spread from an emission source. 77 Bibliography [1] USGS, "How much of the Earth's water is stored in glaciers?" https : / / www . usgs . gov / faqs / how-much- earths- water-stored-glaciers? qt - news science products=0, accessed 2019-05-01. _ _ [2] R. Bhambri and T. Bolch, "Glacier mapping: a review with special reference to the Indian Himalayas," Progress in Physical Geography, vol. 33, no. 5, pp. 672-704, 2009. [3] A. Fischer, B. Seiser, M. Stocker Waldhuber, C. Mitterer, and J. Abermann, "Tracing glacier changes in Austria from the Little Ice Age to the present using a lidar-based high-resolution glacier inventory in Austria," Cryosphere, vol. 9, no. 2, 2015. [4] W. G. Rees and N. S. Arnold, "Mass balance and dynamics of a valley glacier measured by high-resolution LiDAR," Polar Record, vol. 43, no. 4, pp. 311 319, 2007. — [5] H. Alifu, B. A. Johnson, and R. Tateishi, "Delineation of debris-covered glaciers based on a combination of geomorphometric parameters and a TIR / NIR / SWIR Band Ratio," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 2, pp. 781-792, 2015. [6] A. E. Racoviteanu, M. W. Williams, and R. G. Barry, "Optical remote sensing of glacier characteristics: a review with focus on the Himalaya," Sensors, vol. 8, no. 5, pp. 3355-3383, 2008. [7] K. Bayr, D. K. Hall, and W. M. Kovalick, "Observations on glaciers in the eastern Austrian Alps using satellite data," International Journal of Remote Sensing, vol. 15, no. 9, pp. 1733-1742, 1994. 78 [8] F. Paul, S. H. Winsvold, A. Kaab, T. Nagler, and G. Schwaizer, "Glacier remote sensing using Sentinel-2. Part II: Mapping glacier extents and surface facies, and comparison to Landsat 8," Remote Sensing, vol. 8, no. 7, p. 575, 2016. [9] D. K. Hall, J. L. Foster, J. Y. Chien, and G. A. Riggs, "Determination of actual snow-covered area using Landsat TM and digital elevation model data in Glacier National Park, Montana," Polar Record, vol. 31, no. 177, pp. 191-198, 1995. [10] R. Sidjak and R. Wheate, "Glacier mapping of the Illecillewaet icefield, British Columbia, Canada, using Landsat TM and digital elevation data," International Journal of Remote Sensing, vol. 20, no. 2, p. 273-284, 1999. [11] M. Zhang, X. Wang, C. Shi, and D. Yan, "Automated glacier extraction index by optimization of red / SWIR and NIR / SWIR ratio index for glacier mapping using Landsat imagery," Water, vol. 11, no. 6, p. 1223, 2019. [12] P. Rastner, T. Bolch, C. Notamicola, and F. Paul, "A comparison of pixel- and object-based glacier classification With optical satellite Images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 3, pp. 853-862, 2014. [13] T. Strozzi, A. Luckman, T. Murray, U. Wegmuller, and C. L. Werner, "Glacier motion estimation using SAR offset-tracking procedures," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 11, pp. 2384-2391, 2002. [14] N. Short and A. Gray, "Potential for RADARSAT-2 interferometry: glacier monitoring using speckle tracking," Canadian Journal of Remote Sensing, vol. 30, no. 3, pp. 504-509, 2004. [15] H. Hindberg, E. Malnes, and K. A. Hogda, "A monitoring system for glaciers on Svalbard based on Envisat ASAR wide swath data," in 2012 IEEE International Geoscience and Remote Sensing Symposium, pp. 1856-1859, 2012. [16] NSIDC, "Global Land Ice Velocity Extraction from Landsat 8 (GoLIVE)." https : // nsidc . org / data /golive, accessed 2019-05-01. [17] B. Altena, T. Scambos, M. Fahnestock, and A. Kaab, "Extracting recent shortterm glacier velocity evolution over southern Alaska and the Yukon from a large collection of Landsat data," The Cryosphere, vol. 13, no. 3, pp. 795-814, 2019. 79 [18] L. Sam, A. Bhardwaj, R. Kumar, M. Buchroithner, and F. J. Martin-Torres, "Heterogeneity in topographic control on velocities of Western Himalayan glaciers," Scientific Reports, vol. 8, 08 2018. [19] C. Nuth, J. Kohler, M. Konig, A. von Deschwanden, J. O. Hagen, A. Kaab, G. Moholdt, and R. Pettersson, "Decadal changes from a multi-temporal glacier inventory of Svalbard," The Cryosphere, vol. 7, no. 5, pp. 1603-1621, 2013. [20] B. Medley, I. Joughin, S. B. Das, E. J. Steig, H. Conway, S. Gogineni, A. S. Criscitiello, J. R. McConnell, B. E. Smith, M. R. van den Broeke, J. T. M. Lenaerts, D. H. Bromwich, and J. P. Nicolas, "Airborne-radar and ice-core observations of annual snow accumulation over Thwaites Glacier, West Antarctica confirm the spatiotemporal variability of global and regional atmospheric models," Geophysical Research Letters, vol. 40, no. 14, pp. 36493654, 2013. [21] D. J. Wingham, D. W. Wallis, and A. Shepherd, "Spatial and temporal evolution of Pine Island Glacier thinning, 1995-2006," Geophysical Research Letters, vol. 36, no. 17, 2009. [22] A. Greene, Response of Mountain Glaciers to Climate Forcing: Analysis and Applications. PhD thesis, Columbia University, 2002. [23] A. Moreno, F. J. Garda-Haro, B. Martinez, and M. A. Gilabert, "Noise reduction and gap filling of fAPAR time series using an adapted local regression filter," Remote Sensing, vol. 6, no. 9, pp. 8238-8260, 2014. [24] M. Neteler, "Estimating daily land surface temperatures in mountainous environments by reconstructed MODIS LST data," Remote Sensing, vol. 2, no. 1, pp. 333-351, 2010. [25] N. E. Barrand and T. Murray, "Multivariate controls on the incidence of glacier surging in the Karakoram Himalaya," Arctic, Antarctic, and Alpine Research, vol. 38, no. 4, pp. 489-498, 2006. [26] A. Dehecq, N. Gourmelen, and E. Trouve, "Deriving large-scale glacier velocities from a complete satellite archive: Application to the PamirKarakoram-Himalaya," Remote Sensing of Environment, vol. 162, pp. 55-66, 2015. 80 [27] P. Kraaijenbrink, J. Shea, F. Pellicciotti, S. De Jong, and W. Immerzeel, "Object-based analysis of unmanned aerial vehicle imagery to map and characterise surface features on a debris-covered glacier," Remote Sensing of Environment, vol. 186, pp. 581-595, 2016. [28] V. Ayma, C. Beltran, P. Happ, G. Costa, and R. Feitosa, "Mapping glacier changes using clustering techniques on cloud computing infrastructure," International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, 2019. [29] S. Z. Gilani and N. I. Rao, "A clustering based automated glacier segmentation scheme using digital elevation model," in 2009 Digital Image Computing: Techniques and Applications, pp. 277-284, IEEE, 2009. [30] N. Karimi, A. Farokhnia, L. Karimi, M. Eftekhari, and H. Ghalkhani, "Combining optical and thermal remote sensing data for mapping debriscovered glaciers ( Alamkouh Glaciers, Iran )," Cold Regions Science and Technology, vol. 71, pp. 73 - 83, 2012. [31] C. E. Rasmussen, "The infinite Gaussian mixture model," in Advances in Neural Information Processing Systems, pp. 554-560, 2000. [32] F. Paul, "Evaluation of different methods for glacier mapping using landsat tm," EARSeL eProceedings, vol. 1, pp. 239-245, 2000. [33] J. L. Dwyer, D. P. Roy, B. Sauer, C. B. Jenkerson, H. K. Zhang, and L. Lymburner, "Analysis ready data: enabling analysis of the Landsat archive," Remote Sensing, vol. 10, no. 9, p. 1363, 2018. [34] S. Foga, P. L. Scaramuzza, S. Guo, Z. Zhu, R. D. Dilley Jr, T. Beckmann, G. L. Schmidt, J. L. Dwyer, M. J. Hughes, and B. Laue, "Cloud detection algorithm comparison and validation for operational Landsat data products," Remote Sensing of Environment, vol. 194, pp. 379-390, 2017. [35] T. Bolch and U. Kamp, "Glacier mapping in high mountains using DEMs, Landsat and ASTER data," Zurich Open Repository and Archive, 2005. [36] A. Bhardwaj, "Applicability of Landsat 8 data for characterising glacier facies and supraglacial debris," International Journal of Applied Earth Observation and Geoinformation, vol. 38, 12 2014. 81 [37] A. Gelman, J. B. Carlin, H. S. Stem, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis. CRC press, 2013. [38] R. R Bukata, Satellite Monitoring of Inland and Coastal Water Quality. CRC Press, 2005. [39] N. Cressie and C. K. Wikle, Statistics for Spatio-Temporal Data. John Wiley & Sons, 2015. [40] G. C. Tiao and G. E. Box, "Some comments on 'Bayes' estimators," The American Statistician, vol. 27, no. 1, pp. 12-14, 1973. [41] D. J. MacKay, "Bayesian interpolation," Neural Computation, vol. 4, no. 3, pp. 415-447, 1992. [42] B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell, "Stan: A probabilistic programming language," Journal of Statistical Software, vol. 76, no. 1, 2017. [43] L. Fahrmeir and S. Lang, "Bayesian inference for generalized additive mixed models based on Markov random field priors," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 50, no. 2, pp. 201-220, 2001. [44] R. Murray-Smith and A. Girard, "Gaussian process priors with ARMA noise models," in Irish Signals and Systems Conference, Maynooth, pp. 147-152, 2001. [45] T. Hengl, G. B. Heuvelink, M. P. Tadic, and E. J. Pebesma, "Spatio-temporal prediction of daily temperatures using time-series of MODIS LST images," Theoretical and Applied Climatology, vol. 107, no. 1-2, pp. 265-277, 2012. [46] Q. Yao and P. J. Brockwell, "Gaussian maximum likelihood estimation for ARMA models II: spatial processes," Bernoulli, vol. 12, no. 3, pp. 403-429, 2006. [47] C. J. Verzilli, N. Stallard, and J. C. Whittaker, "Bayesian modelling of multivariate quantitative traits using seemingly unrelated regressions," Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, vol. 28, no. 4, pp. 313-325, 2005. 82 [48] A. Zellner and T. Ando, "A direct Monte Carlo approach for Bayesian analysis of the seemingly unrelated regression model," Journal of Econometrics, vol. 159, no. 1, pp. 33-45, 2010. [49] B. Liquet, K. Mengersen, A. Pettitt, and M. Sutton, "Bayesian variable selection regression of multivariate responses for group data," Bayesian Analysis, vol. 12, no. 4, pp. 1039-1067, 2017. [50] M. A. Zapala and N. J. Schork, "Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables," Proceedings of the National Academy of Sciences, vol. 103, no. 51, pp. 19430-19435, 2006. [51] A. Schuster, "On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena," Terrestrial Magnetism, vol. 3, no. 1, pp. 13-41, 1898. [52] W. M. Persons, "Correlation of time series," Journal of the American Statistical Association, vol. 18, no. 142, pp. 713-726, 1923. [53] G. Box, "Time Series Analysis, Forecasting, and Control," Francisco Holden-Pay, 1970. [54] H. Wold, A study in the Analysis of Stationary Time Series. Almqvist & Wiksell, 1938. PhD thesis, [55] J. F. Monahan, "Fully Bayesian analysis of ARMA time series models," Journal of Econometrics, vol. 21, no. 3, pp. 307-331, 1983. [56] K. A. Hoover and M. G. Wolman, "Beyond the semivariogram: Patterns, scale, and hydrology in a semi-arid landscape," Advances in Water Resources, vol. 28, no. 9, pp. 885 - 898, 2005. [57] F. Pellicciotti, S. Ragettli, M. Carenzo, and J. McPhee, "Changes of glaciers in the Andes of Chile and priorities for future work," Science of The Total Environment, vol. 493, pp. 1197 - 1210, 2014. [58] M. S. Handcock and M. L. Stein, "A Bayesian analysis of Kriging," Technometrics, vol. 35, no. 4, pp. 403-410, 1993. [59] D. G. Krige and D. Krige, Lognormal-de Wijsian Geostatistics for Ore Evaluation. South African Institute of Mining and Metallurgy Johannesburg, 1981. 83 [60] D. J. Mackay, "Gaussian processes," A replacement for neural networks, NIPS Tutorial, pp. 514-520, 1997. [61] J. G. Anderson and J. N. Brune, "Probabilistic seismic hazard analysis without the ergodic assumption," Seismological Research Letters, vol. 70, no. 1, pp. 19-28, 1999. [62] Stan Development Team, "Stan modeling language users guide and reference manual, version 2.18.0," 2018. [63] P. C. Phillips, "Bayesian model selection and prediction with empirical applications," Journal of Econometrics, vol. 69, no. 1, pp. 289-331, 1995. [64] R. Salakhutdinov and A. Mnih, "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo," in Proceedings of the 25th International Conference on Machine Learning, pp. 880-887, 2008. [65] A. Vehtari, A. Gelman, and J. Gabry, "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC," Statistics and Computing, vol. 27, no. 5, pp. 1413-1432, 2017. [66] T. M. Ludden, S. L. Beal, and L. B. Sheiner, "Comparison of the Akaike Information Criterion, the Schwarz criterion and the F test as guides to model selection," Journal of Pharmacokinetics and Biopharmaceutics, vol. 22, no. 5, pp. 431-445, 1994. [67] D. Anderson, K. Burnham, and G. White, "Comparison of Akaike information criterion and consistent Akaike information criterion for model selection and statistical inference from capture-recapture studies," Journal of Applied Statistics, vol. 25, no. 2, pp. 263-282, 1998. [68] S. Watanabe and M. Opper, "Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory," Journal of Machine Learning Research, vol. 11, no. 12, 2010. [69] B. Raup, A. Racoviteanu, S. J. S. Khalsa, C. Helm, R. Armstrong, and Y. Amaud, "The GLIMS geospatial glacier database: a new tool for studying glacier change," Global and Planetary Change, vol. 56, no. 1-2, pp. 101-110, 2007. 84 [70] S. Abe, "On invariance of support vector machines," in Proceedings of the 4th International Conference on Intelligent Data Engineering and Automated Learning, 2003. [71] S. Luo, W. Mou, M. Li, K. Althoefer, and H. Liu, "Rotation and translation invariant object recognition with a tactile sensor," in SENSORS, 2014 IEEE, pp. 1030-1033, IEEE, 2014. [72] D. Adametz, Invariances for Gaussian models. sity _of _Basel, 2015. PhD thesis, Univer- [73] C. Zhang, W. Li, and D. Travis, "Gaps-fill of SLC-off Landsat ETM + satellite image using a geostatistical approach," International Journal of Remote Sensing, vol. 28, no. 22, pp. 5103-5122, 2007. [74] S. M. Howard and J. M. Lacasse, "An evaluation of gap-filled Landsat SLCoff imagery for wildland fire burn severity mapping," Photogrammetric Engineering and Remote Sensing, vol. 70, no. 8, pp. 877-880, 2004. [75] F. Gerber, R. de Jong, M. E. Schaepman, G. Schaepman-Strub, and R. Furrer, "Predicting missing values in spatio-temporal remote sensing data," IEEE Transactions on Geoscience and Remote Sensing, pp. 1-13, 2018. [76] T. Reid, M. Carenzo, F. Pellicciotti, and B. Brock, "Including debris cover effects in a distributed model of glacier ablation," Journal of Geophysical Research: Atmospheres, vol. 117, no. D18, 2012. [77] S. H. Winsvold, A. Kaab, and C. Nuth, "Regional glacier mapping using optical satellite data time series," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 8, pp. 3698-3711, 2016. [78] R. Frauenfelder and A. Kaab, "Glacier mapping from multi-temporal optical remote sensing data within the Brahmaputra River Basin," pp. 922-925, 01 2009. [79] M. Konig, J.-G. Winther, J. Kohler, and F. Konig, "Two methods for fimarea and mass-balance monitoring of Svalbard glaciers with SAR satellite images," Journal of Glaciology, vol. 50, no. 168, pp. 116-128, 2004. [80] A. Brenning, "Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection," Remote Sensing of Environment, vol. 113, pp. 239-247, 01 2009. 85 [81] D. Farinotti, "On the effect of short-term climate variability on mountain glaciers: insights from a case study," Journal of Glaciology, vol. 59, no. 217, p. 992-1006, 2013. [82] U. Naeem, M. Shamim, N. Ejaz, D. H. Rehman, U. Mustafa, H. Hashmi, and A. R. Ghumman, "Investigation of temporal change in glacial extent of Chitral watershed using Landsat data," Environmental Monitoring and Assessment, vol. 188, pp. 1-13, 12 2016. [83] C. I. Millar, R. D. Westfall, and D. L. Delany, "Thermal and hydrologic attributes of rock glaciers and periglacial talus landforms: Sierra Nevada, California, USA," Quaternary International, vol. 310, pp. 169 - 180, 2013. PACLIM: Proceedings of the 25th Pacific Climate Workshop, 2011. [84] Y. Liu, J. Wu, Y. Liu, B. X. Hu, Y. Hao, X. Huo, Y. Fan, T. J. Yeh, and Z.L. Wang, "Analyzing effects of climate change on streamflow in a glacier mountain catchment using an ARMA model," Quaternary International, vol. 358, pp. 137 - 145, 2015. [85] USGS, "SLC Gap-Fill Methodology." https : // www . usgs . gov / media /files / landsat -7-slc-gap-filled- products- phase- one-methodology, accessed 2019-05-01. [86] USGS, "SLC-off Gap-Filled Products Gap-Fill Algorithm Methodology." https://prd-wret .s3.us-west-2.amazonaws. com/assets/palladium/ production/atoms/f iles/L7SLCGapFilledMethod.pdf, accessed 2019-05-01. [87] R. E. Rossi, J. L. Dungan, and L. R. Beck, "Kriging in the shadows: geostatistical interpolation for remote sensing," Remote Sensing of Environment, vol. 49, no. 1, pp. 32-40, 1994. [88] E. Addink, "A comparison of conventional and geostatistical methods to replace clouded pixels in NOAA-AVHRR images," International Journal of Remote Sensing, vol. 20, no. 5, pp. 961-977, 1999. [89] Y. Zhu, E. L. Kang, Y. Bo, Q. Tang, J. Cheng, and Y. He, "A Robust fixed rank Kriging method for improving the spatial completeness and accuracy of satellite SST products," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 9, pp. 5021-5035, 2015. 86 [90] L. Guo, L. Lei, Z. Zeng, P. Zou, D. Liu, and B. Zhang, "Evaluation of spatiotemporal variogram models for mapping xco 2 using satellite observations: A case study in China," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 1, pp. 376-385, 2014. [91] R. Guhaniyogi and S. Banerjee, "Multivariate spatial meta kriging," Statistics & Probability Letters, vol. 144, pp. 3-8, 2019. [92] P. Scaramuzza and J. Barsi, "Landsat 7 scan line corrector-off gap-filled product development," in Proceeding of Pecora, vol. 16, pp. 23-27, 2005. [93] R. Hodgkins, "Glacier hydrology in Svalbard, Norwegian high Arctic," Quaternary Science Reviews, vol. 16, no. 9, pp. 957-973, 1997. [94] S. Lang, S. B. Adebayo, L. Fahrmeir, and W. J. Steiner, "Bayesian geoadditive seemingly unrelated regression," Computational Statistics, vol. 18, no. 2, pp. 263-292, 2003. [95] M. Smith and R. Kohn, "Nonparametric seemingly unrelated regression," Journal of Econometrics, vol. 98, no. 2, pp. 257-281, 2000. [96] D. K. Hammond and E. P. Simoncelli, "Image modeling and denoising with orientation-adapted Gaussian scale mixtures," IEEE Transactions on Image Processing, vol. 17, no. 11, pp. 2089-2101, 2008. 87