Sigma-point Kalman Filter Data Assimilation Methods For Strongly Nonlinear Dynamical Models

Jaison Thomas Ambadan
B.Sc., Mahatma Gandhi University (India), 2000
M.Sc., Cochin University of Science & Technology (India), 2002

Thesis Submitted In Partial Fulfillment Of The Requirements For The Degree Of Master of Science in Natural Resources and Environmental Studies (Environmental Science)

The University of Northern British Columbia
June 2008

© Jaison Thomas Ambadan, 2008.

ISBN: 978-0-494-48755-6
To my beloved parents

Acknowledgments

It has been my good fortune to encounter many good people who have given me companionship, help, and above all their valuable time during the course of my life at UNBC. First and foremost, I would like to express my sincere gratitude to my supervisor, Dr. Youmin Tang, for his help and guidance. This research work has certainly benefited from his suggestions. I would also like to take this opportunity to acknowledge and thank my graduate supervisory committee members Dr. Peter L. Jackson, Dr. Stephen J. Dery, and Dr. Zhenguo Qiu for the support they have given me over the years. A special word of gratitude to all my co-workers, group members, and my beloved friends and roommates that I have worked with over the years. A special word of gratitude, finally, to all the members of my research group, Dr. Ziwang Deng, Dr. Xiaobing Zhou, Mr. Yanjie Cheng, and Mr. Zhiyu Wang. I also acknowledge and thank the Oregon Graduate Institute, and Dr. Eric A. Wan and Dr. Rudolph Van der Merwe, for providing the ReBEL toolkit, part of which has been used in this research work. This work was supported by an NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant and the CFCAS (Canadian Foundation for Climate and Atmospheric Sciences) network project "Global Ocean-Atmosphere Prediction and Predictability". I am particularly indebted to my parents, my brother, and my sisters for their monumental, unwavering support and encouragement on all fronts.
They have truly always been there for me, and without them none of this would have been even remotely possible.

Prince George, 27th August 2008
Jaison Thomas Ambadan

Abstract

The performance of an advanced derivative-less, sigma-point Kalman filter (SPKF) data assimilation scheme in a strongly nonlinear dynamical model is investigated. The SPKF data assimilation scheme is compared against standard Kalman filters such as the extended Kalman filter (EKF) and the ensemble Kalman filter (EnKF). Three particular cases, namely state estimation, parameter estimation, and joint estimation of states and parameters from a set of discontinuous noisy observations, are studied. The problems associated with the use of the tangent linear model (TLM) or the Jacobian in standard Kalman filters are eliminated when using SPKF data assimilation algorithms. Further, the constraints and issues of SPKF data assimilation in real ocean or atmospheric models are emphasized. A reduced sigma-point subspace approach is proposed and investigated for higher dimensional systems. A low dimensional Lorenz '63 model and a higher dimensional Lorenz '95 model are used as the test-bed for data assimilation experiments. The results of the SPKF data assimilation schemes are compared with those of the standard EKF and EnKF for a highly nonlinear chaotic case. It is shown that the SPKF is capable of estimating the model state and parameters with better accuracy than the EKF and EnKF. Numerical experiments show that in all cases the SPKF gives consistent results with better assimilation skill than the EnKF and EKF, and can overcome the drawbacks associated with their use.

Contents

Acknowledgments
Abstract
1 Introduction
  1.1 Optimal Estimation and Data Assimilation: An Overview
  1.2 Data Assimilation in Nonlinear Systems: A Solution?
  1.3 Research Objectives
  1.4 Contribution of This Research Work
  1.5 Thesis Context and Overview
2 Sequential Data Assimilation Methods
  2.1 Recursive Bayesian Estimation for Nonlinear Systems
  2.2 Extended Kalman Filter
  2.3 Ensemble Kalman Filter
  2.4 Sigma-point Kalman Filters
    2.4.1 Sigma-point Approach
    2.4.2 Sigma-point Unscented Kalman Filter
    2.4.3 Sigma-point Central Difference Kalman Filter
  2.5 A New Ensemble Kalman Filter: Sigma-point Approach
3 Experiments with the Lorenz '63 Model
  3.1 State Estimation
  3.2 Parameter Estimation
  3.3 Joint Estimation of States and Parameters
4 Results and Discussion
  4.1 State Estimation
  4.2 Parameter Estimation
  4.3 Joint Estimation of States and Parameters
5 Reduced Space Sigma-point Kalman Filter
  5.1 SPKF Data Assimilation in Higher Dimensional Systems
    5.1.1 A Subspace Approach with Sigma-points: Design and Implementation
  5.2 Experiments with the Lorenz '95 Model
    5.2.1 Performance and Evaluation
6 Conclusion
A Re-interpretation of the Standard Kalman Gain
B An Alternate Formula for Updating the State Error Covariance Matrix
Bibliography

Chapter 1

Introduction

1.1 Optimal Estimation and Data Assimilation: An Overview

Optimal estimation is the process of determining the state of a system with minimal errors using the information from measured or observed data and a mathematical model. In optimal estimation, it is assumed that all the available information from the numerical model and observations is contaminated with noise, and hence there is uncertainty. If there were no uncertainty, there would be no need for optimal estimation. The uncertainty is due to model errors and measurement or observational errors. The model errors are commonly due to the functional simplifications, linearization, and discretization of the mathematical or numerical model describing the physical phenomena.
The observational or measurement errors are usually associated with sensor and instrument inaccuracy. Optimal estimation is often divided into three classes, namely prediction, filtering, and smoothing, depending on the time at which an estimate is desired relative to the available observed information. Optimal estimation has a wide range of applications, from robotics to artificial intelligence, missile guidance, econometrics, weather forecasting, and so forth. Several optimal estimation methods have been proposed over time. The most commonly used are the method of least squares, maximum likelihood, and Bayesian estimation. The method of least squares, first introduced by Gauss in 1794, is used to optimally fit data using a linear model; the best fit is the one that minimizes the sum of squared residuals. Least Squares Estimation (LSE) is optimal only if the underlying distribution is normal, and the LSE has the minimum variance within the class of linear unbiased estimates [1]. Maximum Likelihood Estimation (MLE), introduced by Fisher in 1922, on the other hand, is generally optimal. The likelihood function is often defined as a probability density function for the observable [2] in terms of which parameters are observed. Bayesian estimation makes use of any prior information about the unknown parameters and the observable; it can simply be considered as an extension of maximum likelihood estimation in which all the available information about the observations and the parameters to be estimated is used. Recursive Bayesian estimation is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming new measurements and a mathematical model. One of the most widely used recursive Bayesian optimal estimation techniques is the Kalman filter.
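The least-squares idea mentioned above can be illustrated with a small numerical sketch. The data, model, and noise level below are invented purely for illustration: a line is fit to noisy observations by minimizing the sum of squared residuals via the normal equations.

```python
import numpy as np

# Toy illustration (not from the thesis): fit y = a*x + b to noisy data by
# least squares, i.e. minimize the sum of squared residuals ||A c - y||^2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)   # "truth": a=2, b=1

A = np.column_stack([x, np.ones_like(x)])           # design matrix
coef = np.linalg.lstsq(A, y, rcond=None)[0]         # least-squares solution
a_hat, b_hat = coef                                 # estimated slope, intercept
```

With moderate noise the recovered slope and intercept land close to the values used to generate the data, which is the sense in which the fit is "optimal" for Gaussian errors.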
Loosely speaking, recursive Bayesian estimation is known in the earth sciences as data assimilation. In meteorology and physical oceanography, data assimilation is a procedure that combines all available observations with information from numerical models to give an optimal estimate of the state of the atmosphere-ocean system. A data assimilation system consists of three major components: a set of observations, a dynamical model, and an assimilation scheme, which is equivalent to an optimal estimation method. The quality of assimilation depends mostly on the assimilation scheme used in the model and the quality of the observations. Generally, all assimilation methods are able to estimate the model state at a given time. Another important task in data assimilation is to use available observations to estimate unknown parameters in a dynamical or an empirical model.

[1] This is the well-known Gauss-Markov (GM) theorem.
[2] In estimation theory, a system is said to be observable if the current state of the system can be determined in time using only the outputs of the system. If a system is not observable, then the current values of its states cannot be determined through observations or measurements.

1.2 Data Assimilation in Nonlinear Systems: A Solution?

Many assimilation techniques have been developed for meteorology and physical oceanography. Methodologies like Optimal Interpolation (OI) and variational methods, in particular the three- and four-dimensional variational data assimilation methods (3D-VAR and 4D-VAR), and Kalman filter based techniques have broad applications in Numerical Weather Prediction (NWP) [Bennet, 1992; Daley, 1991; Ghil and Malanotte-Rizzoli, 1991; Navon, 1998]. In OI (and 3D-VAR) robust estimation is difficult because of the outlier problem, i.e. isolated observations are given more weight than observations close together.
In OI the forecast errors have large correlations at nearby observation points, and when several observations are close together the calculation of weights may be ill-posed. On the other hand, the 4D-VAR assimilation algorithm is expensive [3] to implement in the case of nonlinear systems, although it gives almost the same accuracy as the Kalman filter, since it requires repeated forward integrations with the nonlinear (forecast) model and backward integrations with the tangent linear model (TLM). In addition, writing the adjoint code is the most difficult part of implementing a 4D-VAR assimilation scheme for general circulation models (GCMs), because of their highly nonlinear nature. It has been shown that estimation methods such as the Kalman filter and its derivatives can be used in sequential data assimilation in strongly nonlinear systems such as the atmosphere and the ocean [Miller et al., 1994]. Among sequential data assimilation methods, Kalman filters have been widely used in meteorology and oceanography. The standard Kalman Filter (KF) is a simplification of Bayesian estimation that provides sequential, unbiased, minimum error variance estimates based upon a linear combination of all past measurements and dynamics [Welch and Bishop, 1995]. Since the introduction of the extended Kalman filter (EKF), which is the nonlinear extension of the standard KF, there have been many attempts to use the EKF in weather or climate prediction models. It has been shown that the EKF can be used in sequential data assimilation in strongly nonlinear systems [Miller et al., 1994]. Unfortunately, the requirement of the Jacobian or the tangent linear model (TLM) for the linearization of nonlinear functions limits the use of the EKF for many real world problems.

[3] If the system is linear and Gaussian, 4D-VAR is equivalent to the Kalman filter, and is computationally cheaper.
Another major drawback of the EKF is that it uses only the first order terms of the Taylor expansion of the nonlinear function. It is evident that this approximation often introduces errors in the estimation of covariance matrices in highly nonlinear models [Miller et al., 1994]. In other words, the inaccuracy of the propagated means and covariances resulting from the linearization of the nonlinear model is one of the major drawbacks of the EKF data assimilation algorithm. Another alternative to the standard KF is the Ensemble Kalman Filter (EnKF), introduced by Evensen [1992] (later modified by Houtekamer and Mitchell [1998]), in which the error covariances are estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Planck equation, can be estimated using ensemble integrations [Evensen, 1994, 1997]; the error covariance matrices can be calculated by integrating the ensemble of model states. The EnKF thus overcomes the EKF drawback of neglecting the contributions from higher order statistical moments in calculating the error covariance. The major strengths of the EnKF are: i) there is no need to calculate the tangent linear model, which is extremely difficult to obtain for ocean (or atmospheric) general circulation models (GCMs); ii) the covariance matrix is propagated in time via the fully nonlinear model equations (no linear approximation as in the EKF); and iii) it is well suited to modern parallel computers (cluster computing) [Keppenne, 2000]. However, there are also some disadvantages of using EnKF data assimilation. Firstly, the finite ensemble size has major effects on the performance of the EnKF.
A small ensemble size increases the residual errors and gives inaccurate statistical moments, while a large ensemble size is not computationally feasible in the case of atmospheric or ocean general circulation models. Another disadvantage of the EnKF is that it assumes a linear measurement operator; if the measurement function is nonlinear, it has to be linearized in the EnKF. Nonlinear measurement functions appear in many situations: for example, in the parameter estimation of nonlinear dynamical models, where the measurement relationship between observations and parameters is nonlinear. Similarly, in the case of satellite altimetry data assimilation, the observation (sea level height) is often nonlinearly related to the variable required for assimilation (e.g. temperature). Thus the condition of a linear measurement operator limits the use of the EnKF in some real world problems. The Sigma-Point Kalman Filters (SPKFs) have been proposed in an attempt to address these drawbacks of the EKF and EnKF [Julier et al., 1995; Van der Merwe, 2004; Van der Merwe et al., 2004]. The SPKF is a derivative-less sequential optimal estimation method which uses a novel deterministic sampling approach that eliminates the need for the calculation of the TLM or the Jacobian of the model equations as needed by the standard KF [Haykin, 2001; Ito and Xiong, 2000; Julier et al., 1995; Lefebvre et al., 2002; Nørgaard Magnus et al., 2000b; Van der Merwe, 2004; Wan and Van der Merwe, 2000]. It has been found that the expected error of the sigma-point approach is smaller than that of the truncated Taylor series linearization used in the EKF [Van der Merwe and Wan, 2001a]. The SPKF algorithm has been successfully implemented in many areas such as robotics, artificial intelligence, natural language processing, and global positioning system navigation [Haykin, 2001; Van der Merwe, 2004; Van der Merwe and Wan, 2001a,b; Van der Merwe et al., 2004; Wan and Van der Merwe, 2000].
In this thesis, we will show that the SPKF has great potential for data assimilation problems in meteorology and physical oceanography, and is capable of estimating the model state and parameters with better accuracy than existing methods.

1.3 Research Objectives

Over the last few decades, Kalman filter based data assimilation has become an active area of research in meteorology and physical oceanography. One of the main reasons behind this widespread research is that Kalman filter based optimal estimation has been a great success in other areas of science and engineering. However, the estimation inaccuracy due to the highly nonlinear nature of the dynamics, as well as computational difficulties, limits the use of standard Kalman filters such as the extended Kalman filter and ensemble Kalman filter assimilation schemes in many real world problems. In this thesis, we will show that the SPKF, as an ensemble Kalman filter with a "specific" ensemble, has great potential in the assimilation of nonlinear systems. This thesis is meant to explore the possibility of applying the SPKF in atmospheric and oceanic data assimilation.

Summary of Research Objectives: This study aims to:

• Investigate the performance and capabilities of sigma-point Kalman filter (SPKF) data assimilation schemes in strongly nonlinear dynamical systems for both the model state and parameter estimation problems.
• Study the assimilation performance in the case of higher noise and infrequent observations.
• Compare the assimilation results against existing Kalman filter based data assimilation schemes.
• Explore the possibility of combining the theory of the ensemble Kalman filter and sigma-point Kalman filters.
• Explore possibilities to overcome the drawbacks or constraints of the SPKF data assimilation scheme, if any.
1.4 Contribution of This Research Work

During the course of this thesis work, most of the research objectives stated in the previous section were successfully completed. The following concrete and substantial contributions are made to the body of knowledge regarding SPKF data assimilation schemes and their application to highly nonlinear dynamical models.

Summary of Research Contributions:

• Implemented the SPKF method in three classes of data assimilation problems, namely state estimation, parameter estimation, and joint estimation (of states and parameters) from a set of discontinuous and noisy observations.
• Demonstrated that the SPKF is capable of estimating the model state and parameters with better accuracy and skill than the EKF and the EnKF.
• Unified the theoretical concepts of the sigma-point Kalman filter and the ensemble Kalman filter, and identified the sigma-points as deterministic or specific ensembles.
• Derived a new algorithm for the ensemble Kalman filter based on the sigma-point approach, which does not require a linearized measurement operator.
• Proposed and investigated a new reduced subspace sigma-point Kalman filter data assimilation scheme for higher dimensional systems.

1.5 Thesis Context and Overview

This thesis describes the possibility of using an advanced derivative-less optimal estimation technique in highly nonlinear dynamical models. The thesis is outlined as follows: Chapter 1 (Introduction) gives a brief introduction to general optimal estimation, followed by optimal estimation problems in the earth sciences, commonly known as data assimilation. The key issues in data assimilation methods are discussed, along with the need for more versatile data assimilation schemes. Chapter 2 (Sequential Data Assimilation Methods) introduces the concept of sequential data assimilation methods, and describes the details of commonly used Bayesian methods.
This chapter gives an overview of existing data assimilation methods such as the Extended Kalman Filter (EKF) and the Ensemble Kalman Filter (EnKF), as well as a review of their drawbacks. A new derivative-less optimal estimation method, known as the sigma-point Kalman filter, and the sigma-point approach are introduced in detail. The sigma-point approach forms one of the key components of the SPKF framework. This chapter explains the algorithmic details of two main classes of sigma-point Kalman filters, the sigma-point unscented Kalman filter (SPUKF) and the sigma-point central difference Kalman filter (SP-CDKF). It also explains the unification of sigma-points and ensembles, from which a new ensemble Kalman filter (EnKF) algorithm is derived. Chapter 3 (Data Assimilation Experiments with the Lorenz '63 Model) introduces the Lorenz '63 model used in this research, followed by a number of experiments in which the data assimilation schemes described in Chapter 2 are applied to the Lorenz '63 model. The implementation of three particular cases, namely the state, parameter, and joint estimation of states and parameters from a set of discontinuous noisy observations, is described in detail. Chapter 4 (Results and Discussion) gives the details of the experimental results, and compares them against standard Kalman filters such as the extended Kalman filter (EKF) and the ensemble Kalman filter (EnKF) schemes. Both parameter and state estimation results are discussed in detail. Chapter 5 (A Reduced Sigma-point Subspace Approach) emphasizes the constraints and issues of SPKF data assimilation in higher dimensional models. A reduced sigma-point subspace model is proposed and investigated for higher dimensional systems. Chapter 6 (Conclusion) consolidates the thesis with a detailed discussion of the presented research topics, and provides some suggestions for future research.
Appendix A and Appendix B give detailed interpretations and derivations of the Kalman gain term and the covariance update equation in the standard Kalman filter equations.

Chapter 2

Sequential Data Assimilation Methods

In general, data assimilation methods can be classified into two categories: sequential methods and variational methods. Sequential methods integrate the model forward in time with forcing terms which push the model towards the observations. Variational methods, such as three dimensional variational assimilation (3D-VAR) and four dimensional variational assimilation (4D-VAR), seek to minimize a cost function which measures the error between the model and the observations over the assimilation period. In sequential methods the observations are essentially blended with the model, while in variational methods the analysis is a valid trajectory of the model. One of the most widely used sequential data assimilation methods is the Kalman filter, which is based on recursive Bayesian estimation. In this chapter we take a short tour of basic estimation theory and then introduce the theory of sigma-point Kalman filters and their variants in detail, which will serve as the theoretical background for the remainder of the thesis. The basic theory presented in this chapter is mainly based on the books of Papoulis and Unnikrishna Pillai [2002], Simon [2006], Van Trees [2001], Robert [1994], Bergman [1997], Anderson and Moore [1979], and Jazwinsky [1970].

2.1 Recursive Bayesian Estimation for Nonlinear Systems

The main objective of an estimation problem is to extract information about the states of a physical system given observations of the states.
In the Bayesian approach, both the system states and the measurements are treated as stochastic quantities, and the inference about the estimate is a conditional density function of the states given the measurement or observation outcome. In Bayesian estimation theory the state vector (say $\theta$), which describes the physical states of the system, and the measurement vector (say $\psi$), which is the outcome of the measurement experiment or instrumental observations, are often considered as random vectors. Our objective is to estimate the current state of the system using all the past observations. In the Bayesian approach, all the statistical information about the current state is condensed in the conditional density $p(\theta|\psi)$ given by

$$p(\theta|\psi) = \frac{p(\psi|\theta)\,p(\theta)}{p(\psi)} \quad (2.1)$$

where $p(\psi|\theta)$ is the probability density function of the measurement vector and $p(\theta)$ is the prior density function, which is assumed to be known. $p(\theta|\psi)$ is also known as the posterior density function, which describes all statistical information about the random state vector $\theta$ after the observation process. $p(\psi)$ is given by

$$p(\psi) = \int p(\psi|\theta)\,p(\theta)\,d\theta \quad (2.2)$$

The joint density function of the states and the observations, $p(\psi|\theta)\,p(\theta)$, defines the estimation problem, while the posterior density function $p(\theta|\psi)$ is regarded as the solution to the estimation problem. In practice, it is difficult to find the complete posterior density. However, it is desirable to determine an "optimal" estimate $\hat{\theta}$ of the model state using a mathematical rule or function (known as the estimator). In most cases the cost function, which represents the erroneous estimate, depends only on the estimation error (the difference between the true state and the optimal estimate), given by

$$\tilde{\theta} = \theta - \hat{\theta} \quad (2.3)$$

The two most commonly used optimal estimators are the minimum mean-squared error estimate and the maximum a-posteriori estimate, which under some conditions is equivalent to the maximum likelihood estimate.
In the Bayesian framework it is assumed that the posterior density is symmetric about its mean, hence the optimal Bayesian estimator is the conditional mean. The optimal estimate for the minimum mean-squared estimator is given by

$$\hat{\theta}_{MS} = \int \theta\, p(\theta|\psi)\, d\theta \quad (2.4)$$

Similarly, the maximum a-posteriori estimate is given by

$$\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta|\psi) \quad (2.5)$$

Another quantity which gives information about the goodness of the optimal estimate is the mean-squared error covariance matrix:

$$P = E\left[(\theta - \hat{\theta})(\theta - \hat{\theta})^{T}\right] \quad (2.6)$$

The error covariance can be computed directly from the posterior density:

$$P = \int (\theta - \hat{\theta})(\theta - \hat{\theta})^{T}\, p(\theta|\psi)\, d\theta \quad (2.7)$$

Therefore the mean-square estimate and its estimation error covariance are the first and second central moments of the posterior density $p(\theta|\psi)$, respectively. In the case of recursive estimation, the information about the system state is updated continuously as new measurements are collected. This recursive analysis of observations is suitable for meteorology and physical oceanography problems where the states of the physical system change with time. In recursive estimation the states evolve in time according to a Markov process with initial states $\theta_0 \sim p(\theta_0)$ and the transition kernel, which is the transition function of the stochastic process, given by

$$p(\theta_{k+1}|\theta_k), \quad k = 0, 1, \ldots \quad (2.8)$$

The Markovian property implies that, given the present state, the future states are independent of the past states. The measurement of the state at time $k$ is conditionally independent of the previous measurements given the present state. The prior $p(\theta_0)$, the transition kernel $p(\theta_{k+1}|\theta_k)$, and the likelihood $p(\psi_k|\theta_k)$ define the joint density of all measurements and states from time zero to time $k$.
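As a concrete illustration of the estimators above, the following sketch discretizes a one-dimensional posterior on a grid and evaluates the minimum mean-squared estimate (2.4), the MAP estimate (2.5), and the scalar error variance (2.7) numerically. The Gaussian posterior used here is invented purely for demonstration.

```python
import numpy as np

# Sketch (illustrative posterior): for a posterior density p(theta|psi) on a
# grid, the MMSE estimate is the posterior mean, the MAP estimate is the
# posterior mode, and the error variance is the second central moment.
theta = np.linspace(-5.0, 5.0, 2001)
d_theta = theta[1] - theta[0]

# Assumed posterior: Gaussian with mean 1.2 and standard deviation 0.7.
post = np.exp(-0.5 * ((theta - 1.2) / 0.7) ** 2)
post /= post.sum() * d_theta                    # normalize to a density

theta_ms = np.sum(theta * post) * d_theta       # MMSE estimate, Eq. (2.4)
theta_map = theta[np.argmax(post)]              # MAP estimate, Eq. (2.5)
p_err = np.sum((theta - theta_ms) ** 2 * post) * d_theta  # variance, Eq. (2.7)
```

For this symmetric unimodal posterior the mean and mode coincide, and the error variance recovers the squared posterior spread, illustrating the remark that the MMSE estimate and its error covariance are the first and second central moments of the posterior.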
Applying Bayes' rule to the last measurement vector $\psi_k$ gives

$$p(\theta_k|\psi_{1:k}) = \frac{p(\psi_k|\theta_k)\, p(\theta_k|\psi_{1:k-1})}{p(\psi_k|\psi_{1:k-1})} \quad (2.9)$$

$$p(\psi_k|\psi_{1:k-1}) = \int p(\psi_k|\theta_k)\, p(\theta_k|\psi_{1:k-1})\, d\theta_k \quad (2.10)$$

Here we make use of the Markovian property that the measurement vector $\psi_k$ is conditionally independent of the previous measurements given the present state. The expressions (2.9)-(2.10) are known as the measurement update in recursive Bayesian estimation. Similarly, by assuming the state evolution is Markovian and $\theta_{k+1}$ is independent of the past measurements $\psi_{1:k}$ given the present state $\theta_k$, the effect of a time step is given by

$$p(\theta_{k+1}, \theta_k|\psi_{1:k}) = p(\theta_{k+1}|\theta_k, \psi_{1:k})\, p(\theta_k|\psi_{1:k}) = p(\theta_{k+1}|\theta_k)\, p(\theta_k|\psi_{1:k}) \quad (2.11)$$

Integrating (2.11) with respect to $\theta_k$ gives the time update equation of recursive Bayesian estimation, also known as the Chapman-Kolmogorov equation:

$$p(\theta_{k+1}|\psi_{1:k}) = \int p(\theta_{k+1}|\theta_k)\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.12)$$

Equations (2.9) and (2.12) form the recursive propagation of the posterior density of the states given the measurements, and the recursion is initiated by $p(\theta_0|\psi_{1:0}) = p(\theta_0)$. The estimates corresponding to the minimum mean-squared estimator and the maximum a-posteriori estimator are given by

$$\hat{\theta}_k^{MS} = \int \theta_k\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.13)$$

$$\hat{\theta}_k^{MAP} = \arg\max_{\theta_k}\, p(\theta_k|\psi_{1:k}) \quad (2.14)$$

The corresponding estimation error covariance is given by

$$P_{k|k} = \int_{\mathbb{R}^n} (\theta_k - \hat{\theta}_k)(\theta_k - \hat{\theta}_k)^{T}\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.15)$$

In general, there is no explicit analytical solution to the integrals (2.10) and (2.12), especially in the case of multidimensional nonlinear physical systems with non-Gaussian elements. If the state transition and the measurement relation are linear, and the noises acting on the system are Gaussian, then it is possible to solve the recursive Bayesian estimation problem analytically, and the solution is known as the celebrated Kalman filter.
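The recursion formed by the measurement update (2.9)-(2.10) and the Chapman-Kolmogorov time update (2.12) can be sketched numerically for a one-dimensional system by representing the densities on a grid. The random-walk transition and Gaussian likelihood below are illustrative assumptions, not a model from the thesis.

```python
import numpy as np

# Grid-based sketch of recursive Bayesian estimation (Eqs. 2.9-2.12) for an
# assumed scalar random-walk state with Gaussian observation noise.
rng = np.random.default_rng(1)
grid = np.linspace(-10.0, 10.0, 801)
dg = grid[1] - grid[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

q_std, r_std = 0.3, 1.0               # process / measurement noise (assumed)
post = gauss(grid, 0.0, 2.0)          # prior p(theta_0)

truth = 0.0
for _ in range(20):
    truth += q_std * rng.standard_normal()
    obs = truth + r_std * rng.standard_normal()

    # Time update, Eq. (2.12): convolve the posterior with the transition kernel.
    kernel = gauss(grid, 0.0, q_std)
    prior = np.convolve(post, kernel, mode="same") * dg

    # Measurement update, Eqs. (2.9)-(2.10): multiply by the likelihood and
    # renormalize (the integral in the denominator is the normalizing constant).
    post = prior * gauss(obs, grid, r_std)
    post /= post.sum() * dg

estimate = np.sum(grid * post) * dg   # MMSE estimate of the final state
```

This brute-force grid approach is exactly what becomes infeasible in high dimensions, which motivates the Kalman filter family discussed next.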
However, many geophysical systems are nonlinear with non-Gaussian components, and it is practically impossible to find analytical solutions for the optimal estimators and estimation error covariances. In general there are two ways to tackle this problem. The first is to simplify the dynamical model so that an analytical solution is obtained. In this case, the nonlinear model equations are expanded in a Taylor series around the mean states, assuming a Gaussian distribution [Jazwinsky, 1970; Simon, 2006; Van Trees, 2001]. This approach is commonly known as the Extended Kalman Filter (EKF), which we will discuss in the next subsection. The second way is to approximate the posterior distribution for non-Gaussian systems [Doucet, 1998; Simon, 2006]. This is usually done with a weighted sum of Gaussian distributions, and is one of the most common approaches in Sequential Importance Re-sampling (SIR) based particle filters [Arulampalam et al., 2002; Doucet, 1998; Doucet et al., 1997; Gordon et al., 1993] and Gaussian-Sum Filters (GSF) [Kotecha and Djuric, 2001, 2003]. In a broad sense the sigma-point Kalman filters use the same approach, which we will describe in detail in later sections.

2.2 Extended Kalman Filter

The basic Kalman Filter (KF) is a simplification of Bayesian estimation for linear Gaussian systems that provides sequential, unbiased, minimum error variance estimates based upon a linear combination of all past measurements and dynamics [Welch and Bishop, 1995]. The basic Kalman filter algorithm is limited to linear models. However, most of the atmospheric and ocean dynamical equations are nonlinear. In such situations, a Kalman filter that is linearized about the current mean and covariance, referred to as an extended Kalman filter (EKF), is used.
To formulate the extended Kalman filter equations, consider an L-dimensional dynamical system represented by a set of discretized state-space equations,

θ_k = f(θ_{k-1}, q_{k-1}) (2.16)
ψ_k = h(θ_k, r_k) (2.17)

where θ_k represents the system state vector at time k, f(·) is the nonlinear function of the state, q_k is the random (white) model error, ψ_k is the measured state, h(·) is the measurement function, and r_k is the zero-mean random measurement noise. Also, we assume that the state, observations, and noise terms are Gaussian processes. To solve the estimation problem, we must determine the conditional probability density function p(θ_k | ψ_{1:k}), which conveys all the information about the state, using (2.9). Since we are assuming that this probability density function is Gaussian, we only need to calculate its mean and covariance. The mean (estimate) and the covariance can be obtained using the extended Kalman filter for the nonlinear system (2.16)-(2.17), which consists of a time update (or prediction) via

θ̂_k^- = f(θ̂_{k-1}, q_{k-1}) (2.18)
P_k^- = Φ_k P_{k-1} Φ_k^T + Q_k (2.19)

and a measurement update (or correction) via

θ̂_k = θ̂_k^- + K_k [ψ_k − h(θ̂_k^-)] (2.20)
P_k = [I − K_k H_k] P_k^- [I − K_k H_k]^T + K_k R_k K_k^T (2.21)

where P_k is the state error covariance matrix and K_k is the Kalman gain given by

K_k = P_k^- H_k^T [H_k P_k^- H_k^T + R_k]^{-1} (2.22)

The superscript "−" represents the prior states, given by

θ̂_k^- = E[f(θ̂_{k-1}, q_{k-1})] (2.23)
ψ̂_k^- = E[h(θ̂_k^-, r_k)] (2.24)
= H θ̂_k^- (2.25)

where E[·] represents the mathematical expectation or expected value, and (2.25) holds for the linearized measurement operator H. Q_k and R_k are the model and observation error covariance matrices, respectively. The matrices Φ_k and H_k are the Jacobians of f(·) and h(·), evaluated at the current state estimate. In the following, the nonlinear system is again written in discretized state-space form as

θ_k = f(θ_{k-1}, q_{k-1}) (2.31)
ψ_k = h(θ_k, r_k) (2.32)

where θ_k represents the system state vector at time k, f(·) is the nonlinear function of the state, q_k is the random (white) model error, ψ_k is the measured state, h(·) is the measurement function, and r_k is the zero-mean random measurement noise.
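One EKF cycle, Eqs. (2.18)-(2.22), can be sketched for a hypothetical scalar model; the functions f and h, their Jacobians, and all numerical values below are illustrative assumptions, not the thesis configuration. The covariance update uses the symmetric (Joseph) form (2.21).

```python
import numpy as np

# One EKF cycle for a scalar nonlinear model theta_k = f(theta) + q,
# psi_k = h(theta) + r.  The toy functions are illustrative only.
f = lambda th: np.sin(th)            # nonlinear dynamics
h = lambda th: th ** 2               # nonlinear measurement
F = lambda th: np.cos(th)            # Jacobian of f (Phi_k)
H = lambda th: 2.0 * th              # Jacobian of h (H_k)

theta, P = 1.0, 0.5                  # previous analysis and its variance
Q, R = 0.01, 0.1                     # model and observation error variances
psi_obs = 0.9                        # illustrative observation

# time update (2.18)-(2.19)
theta_f = f(theta)
P_f = F(theta) * P * F(theta) + Q

# measurement update (2.20)-(2.22), Joseph form for the covariance (2.21)
Hk = H(theta_f)
K = P_f * Hk / (Hk * P_f * Hk + R)
theta_a = theta_f + K * (psi_obs - h(theta_f))
P_a = (1.0 - K * Hk) * P_f * (1.0 - K * Hk) + K * R * K
```

The analysis variance P_a is smaller than the forecast variance P_f, reflecting the information gained from the observation, and the state is pulled toward the observation through the innovation ψ_k − h(θ̂_k^-).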
To understand how the sigma-point algorithm works, it is convenient to rewrite the standard Kalman filter optimal state update equation

θ̂_k = θ̂_k^- + K_k (ψ_k − ψ̂_k^-) (2.33)

where the superscript "−" represents the prior states. Here K_k is the Kalman gain, which is optimally chosen such that it minimizes the weighted scalar sum of the diagonal elements of the error covariance matrix P_{θ_k} [Gelb, 1974]. The standard expressions for the Kalman gain and the forecast error covariance matrix are

K_k = P_{θ_k}^- H^T [H P_{θ_k}^- H^T + R]^{-1} (2.34)
P_{θ_k}^- = E[(θ_k − θ̂_k^-)(θ_k − θ̂_k^-)^T] (2.35)

where H is the linearized measurement operator. The error covariance update, or analysis covariance matrix, which represents the change in the forecast error covariance when a measurement is employed, is given by

P_{θ_k} = (I − K_k H) P_{θ_k}^- (2.36)

where I is the identity matrix. For the EKF, the forecast error covariance is given by

P_{θ_k}^- = A_{k-1} P_{θ_{k-1}} A_{k-1}^T + Q_{k-1} (2.37)

where A_{k-1} is the TLM of the nonlinear model (2.31)-(2.32) and Q_{k-1} is the model error covariance matrix. The TLM often introduces errors in highly nonlinear models and is extremely difficult to obtain for GCMs. Another drawback of the EKF is that it uses the linearized measurement operator H to calculate the Kalman gain and to update the error covariance. The linearization of a nonlinear measurement function is computationally difficult and may result in estimation errors. On the other hand, the forecast error covariance for the EnKF can be calculated by integrating an ensemble of model states, and is given by

P_{θ_k}^- = (θ_k − θ̄_k)(θ_k − θ̄_k)^T (2.38)

where the overline denotes the ensemble average. A limited ensemble size often introduces errors in approximating the error covariance matrix. The EnKF also uses the linearized measurement operator H to calculate the Kalman gain; if the model involves nonlinear measurement functions, linearizing them may result in estimation errors.
The SPKF family addresses the above issues of the EKF and EnKF. It uses a different approach to calculating the Kalman gain and the error covariance matrices. The technique employed in the SPKF is to re-interpret the standard Kalman gain and covariance update equations in such a way that neither the TLM nor the linearized measurement operator is needed. This interpretation is explained as follows. The first term, P_{θ_k}^- H^T, in the Kalman gain equation (2.34) can be interpreted as the cross-covariance, P_{θ_k ψ_k}, between the state and observation errors, and the remaining expression can be interpreted as the error covariance, P_{ψ̃_k}, of the difference between model and observation [Gelb, 1974]. A proof of this interpretation can be found in Appendix A; a more detailed statistical derivation and interpretation of these formulations can be found in Simon [2006]. Therefore the optimal gain, or Kalman gain, K_k, can be re-written as

K_k = P_{θ_k ψ_k} P_{ψ̃_k}^{-1} (2.39)

Here ψ̃_k is defined as the error between the noisy observation ψ_k and its prediction ψ̂_k^-, i.e., ψ̃_k = ψ_k − ψ̂_k^-. By using the relation (2.39), the covariance update equation (2.36) can be re-written as (see Appendix B for details)

P_{θ_k} = P_{θ_k}^- − K_k P_{ψ̃_k} K_k^T (2.40)

Unlike the standard KF, the SPKF algorithm makes use of this new interpretation (Eqs. 2.39 & 2.40), which avoids the use of the Jacobian while retaining consistency and accuracy. In the standard KF the state error covariance is calculated during the "time update" process and is updated during the "measurement update" process. Updating the error covariance matrix is important since it represents the change in the forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation since it can calculate the updated error covariance matrix directly from a set of ensembles. Evensen [2003] has derived the analysis covariance equation, which is consistent with the standard KF error covariance update
equation (2.36). But a true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible. The SPKF makes use of the reformulated error covariance update equation (2.40) and "chooses" the ensembles deterministically, in such a way that the statistical moments of the nonlinear model are captured accurately; in other words, the forecast error covariance (2.35) is computed using deterministically chosen samples, called sigma points. In a broad sense, the SPKF algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus the SPKF is fully consistent with the "time update" and "measurement update" formulation of the Kalman filter algorithm. In the next subsections we discuss each SPKF algorithm in detail.

2.4.2 Sigma-point Unscented Kalman Filter

The Sigma-Point Unscented Kalman Filter (SP-UKF) [Julier, 1998; Julier et al., 1995; Wan and Van Der Merve, 2000] is an SPKF that captures the statistical properties of the model state through a method known as the scaled unscented transformation (SUT) [Julier, 2002]. Unlike the EKF, the SP-UKF uses the true nonlinear model and approximates the state distribution using a set of deterministically chosen states, known as sigma points, generated by the SUT. In the SP-UKF the state error covariance matrix is calculated from a set of "particular ensembles" that are generated by "sigma points". Julier et al. [1995] have shown that for the nonlinear model given by (2.31), the number of sigma points needed to compute precisely the mean and covariance of the model state at time k is 2L + 1. The sigma-point state vector is given by [Julier, 2002; Julier et al., 1995; Wan and Van Der Merve, 2000]

X_k = [X_{k,0}  X_{k,i}^+  X_{k,i}^-],  i = 1, ..., L (2.41)

where X_{k,0}, X_{k,i}^+, and X_{k,i}^- are the sigma-point vectors.
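When the measurement operator is in fact linear, the reinterpreted gain and covariance update (2.39)-(2.40) reduce exactly to the standard forms (2.34) and (2.36). A small numerical check of this equivalence is sketched below; the random matrices and dimensions are illustrative only.

```python
import numpy as np

# For a linear measurement psi = H theta + r, the reformulated gain
# K = P_{theta psi} P_{psi~}^{-1} (2.39) equals the standard
# K = P H^T (H P H^T + R)^{-1} (2.34), and the covariance updates
# (2.40) and (2.36) agree.  Small numerical check:
rng = np.random.default_rng(0)
L, Lo = 3, 2
A = rng.standard_normal((L, L))
P = A @ A.T + np.eye(L)              # prior covariance P_theta^-
H = rng.standard_normal((Lo, L))     # linear measurement operator
R = 0.2 * np.eye(Lo)                 # observation error covariance

# cross-covariance and innovation covariance implied by linearity
P_xy = P @ H.T                       # P_{theta psi}
P_yy = H @ P @ H.T + R               # P_{psi~}

K_new = P_xy @ np.linalg.inv(P_yy)                  # (2.39)
K_std = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # (2.34)

# updated covariances: reformulated (2.40) versus standard (2.36)
P_new = P - K_new @ P_yy @ K_new.T
P_std = (np.eye(L) - K_std @ H) @ P
```

The algebra behind the check: K P_{ψ̃} K^T = P_{θψ} P_{ψ̃}^{-1} P_{θψ}^T = K H P, so (2.40) and (2.36) coincide term by term in the linear case; the SPKF simply evaluates P_{θψ} and P_{ψ̃} from samples instead of from H.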
The selection scheme for choosing the sigma points is based on the scaled unscented transformation, which transforms the model state vector according to the following equations:

X_{k,0} = θ̂_k
X_{k,i} = θ̂_k + (√((L + λ) P_{θ_k}))_i,  i = 1, ..., L
X_{k,i} = θ̂_k − (√((L + λ) P_{θ_k}))_{i−L},  i = L + 1, ..., 2L (2.42)

w_0^{(m)} = λ / (L + λ)
w_0^{(c)} = λ / (L + λ) + (1 − α² + β) (2.43)

w_i^{(m)} = w_i^{(c)} = 1 / (2(L + λ)),  i = 1, ..., 2L (2.44)

where (√((L + λ) P_{θ_k}))_i is the ith row (or column) of the weighted matrix square root of the covariance matrix P_{θ_k}, w_i^{(m)} is the weighting term corresponding to the mean, and w_i^{(c)} corresponds to the covariance. λ = α²(L + κ) − L is a scaling parameter. The parameter α is set to a small positive value (0 < α ≤ 1) and determines the spread of the sigma points around the mean state θ̂_k. κ is another control parameter, which guarantees the positive semi-definiteness of the covariance matrix and is usually set to a non-negative value (κ ≥ 0). β is a non-negative weighting term which can be used to incorporate any prior knowledge of the nature of the state distribution.

The sigma-point vector is then propagated through the nonlinear model (2.31)-(2.32):

X_{k,i}^θ = f(X_{k-1,i}^θ, X_{k-1,i}^q) (2.45)
Y_{k,i} = h(X_{k,i}^θ, X_{k,i}^r) (2.46)

and the prior mean, the measurement prediction, and the covariance and cross-covariance terms needed for the Kalman gain are computed as weighted sums over the propagated sigma points:

θ̂_k^- = Σ_{i=0}^{2L} w_i^{(m)} X_{k,i}^θ (2.47)
ψ̂_k^- = Σ_{i=0}^{2L} w_i^{(m)} Y_{k,i} (2.48)
P_{θ_k}^- = Σ_{i=0}^{2L} w_i^{(c)} (X_{k,i}^θ − θ̂_k^-)(X_{k,i}^θ − θ̂_k^-)^T (2.49)
P_{ψ̃_k} = Σ_{i=0}^{2L} w_i^{(c)} (Y_{k,i} − ψ̂_k^-)(Y_{k,i} − ψ̂_k^-)^T (2.50)
P_{θ_k ψ_k} = Σ_{i=0}^{2L} w_i^{(c)} (X_{k,i}^θ − θ̂_k^-)(Y_{k,i} − ψ̂_k^-)^T (2.51)

The Kalman gain, K_k, can then be calculated using equation (2.39), and the state covariance is updated using equation (2.40). A detailed description and derivation of the UKF algorithm and sigma-point formulation can be found in the literature referred to above.

2.4.3 Sigma-point Central Difference Kalman Filter

Unlike the SP-UKF, the SP-CDKF is based on Stirling's interpolation formulae [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000b; Press et al., 1992]. In the SP-CDKF the analytical derivatives of the EKF are replaced by numerically evaluated central divided differences. The linearization in the SP-CDKF is based on weighted statistical linear regression [Lefebvre et al., 2002].
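The SUT selection scheme (2.42)-(2.44) described above can be sketched directly. The helper below generates the 2L + 1 sigma points and verifies that the weighted sums (2.47) and (2.49) recover the prescribed mean and covariance exactly; the particular mean, covariance, and parameter values are illustrative assumptions.

```python
import numpy as np

# Scaled unscented transformation (2.42)-(2.44): generate 2L+1 sigma
# points from a mean and covariance and check that the weighted sample
# moments recover them exactly.
def sut_points(mean, cov, alpha=0.1, beta=2.0, kappa=0.0):
    L = mean.size
    lam = alpha ** 2 * (L + kappa) - L           # scaling parameter
    S = np.linalg.cholesky((L + lam) * cov)      # weighted matrix square root
    X = np.column_stack([mean] + [mean + S[:, i] for i in range(L)]
                        + [mean - S[:, i] for i in range(L)])
    wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    wc = wm.copy()
    wm[0] = lam / (L + lam)
    wc[0] = lam / (L + lam) + (1.0 - alpha ** 2 + beta)
    return X, wm, wc

mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.1],
                [0.0, 0.1, 0.5]])
X, wm, wc = sut_points(mean, cov)

mean_rec = X @ wm                                # weighted mean, cf. (2.47)
diff = X - mean_rec[:, None]
cov_rec = (wc * diff) @ diff.T                   # weighted covariance, cf. (2.49)
```

The recovery is exact because the symmetric ± columns of the weighted Cholesky factor reproduce the second moment by construction; the nonlinearity enters only when the points are pushed through f(·) and h(·) in (2.45)-(2.46).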
It has been shown that the SP-CDKF has superior performance to the SP-UKF and EKF [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000b]³. Using Stirling polynomial interpolation, the nonlinear model given by equation (2.31) can be approximated as

f(θ_k) ≈ f(φ_k) + D̃_k + (1/2!) D̃_k² (2.52)

where f(φ_k) is the linearized model, and D̃_k and D̃_k² are the first- and second-order central divided difference operators, which we explain later in this section. Here the linearization of the nonlinear model is achieved by using a linear transformation that statistically decouples⁴ the state vector θ_k [Schei, 1997]. It has been shown that this approximation is always better than using the Jacobian matrix [Schei, 1997]. The linear transformation is based on the square-root factorization of the model covariance matrix, and is given by

φ_k = S_{θ_k}^{-1} θ_k (2.53)
f̃(φ_k) = f(S_{θ_k} φ_k) = f(θ_k) (2.54)

Here θ̂_k is the mean state and S_{θ_k} is the Cholesky factor of the updated error covariance matrix (2.40), which satisfies the condition

P_{θ_k} = S_{θ_k} S_{θ_k}^T (2.55)

The operators D̃_k and D̃_k² are the first- and second-order central divided difference operators and can be written as follows [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000a; Van der Merwe and Wan, 2001a; Wan and Van der Merwe, 2001].

³But our numerical experiments show that the SP-CDKF does not always outperform the SP-UKF; see the discussion below.
⁴The linear transformation from the stochastic vector φ_k to θ_k, θ_k = S_{θ_k} φ_k, decouples the fully coupled state vector θ_k so that the covariance of φ_k equals the identity matrix. For computational reasons the square-root matrix S_{θ_k} is often kept triangular (Cholesky decomposition). More details on decoupling and its advantages in Kalman filters can be found in Ohmuro [1984], Baheti et al.
[1990], and Daum and Fitzgerald [1983].

D̃_k = (1/δ) ( Σ_{j=1}^{L} Δφ_{k,j} μ_j δ_j ) f(φ̂_k) (2.56)
D̃_k² = (1/δ²) ( Σ_{j=1}^{L} Δφ_{k,j}² δ_j² + Σ_{j=1}^{L} Σ_{q=1, q≠j}^{L} Δφ_{k,j} Δφ_{k,q} (μ_j δ_j)(μ_q δ_q) ) f(φ̂_k) (2.57)

where δ_j and μ_j are the partial central difference and average operators, Δφ_k = φ_k − φ̂_k, and δ is the central difference half-step size. The corresponding sigma-point weights are

w_0^{(m)} = (δ² − L) / δ²
w_i^{(m)} = 1 / (2δ²),  i = 1, ..., 2L
w_i^{(c1)} = 1 / (4δ²),  i = 1, ..., 2L
w_i^{(c2)} = (δ² − 1) / (4δ⁴),  i = 1, ..., 2L (2.58)-(2.62)

The augmented sigma-point vectors are then propagated through the approximated nonlinear model (2.52), and the approximated mean model state vector can be computed as

θ̂_k^- ≈ E[ f(φ_k) + D̃_k + (1/2!) D̃_k² ] (2.63)
= ((δ² − L)/δ²) X_{k,0} + (1/(2δ²)) Σ_{i=1}^{2L} X_{k,i} (2.64)
= Σ_{i=0}^{2L} w_i^{(m)} X_{k,i} (2.65)

Similarly, the measurement prediction, the forecast covariance, and the cross-covariance for the calculation of the Kalman gain are given by

ψ̂_k^- ≈ Σ_{i=0}^{2L} w_i^{(m)} Y_{k,i} (2.66)
P_{θ_k}^- ≈ Σ_{i=1}^{L} [ w_i^{(c1)} (X_{k,i} − X_{k,L+i})(X_{k,i} − X_{k,L+i})^T + w_i^{(c2)} (X_{k,i} + X_{k,L+i} − 2X_{k,0})(X_{k,i} + X_{k,L+i} − 2X_{k,0})^T ] (2.67)
P_{ψ̃_k} ≈ Σ_{i=1}^{L} [ w_i^{(c1)} (Y_{k,i} − Y_{k,L+i})(Y_{k,i} − Y_{k,L+i})^T + w_i^{(c2)} (Y_{k,i} + Y_{k,L+i} − 2Y_{k,0})(Y_{k,i} + Y_{k,L+i} − 2Y_{k,0})^T ] (2.68)-(2.69)
P_{θ_k ψ_k} = √(w_1^{(c1)} P_{θ_k}^-) [Y_{k,1:L} − Y_{k,L+1:2L}]^T (2.70)

One main advantage of the SP-CDKF over the SP-UKF is that it uses only one control parameter (δ), compared to three (λ, α, and κ) in the UKF. For exact derivations and algorithmic details see Ito and Xiong [2000], Nørgaard Magnus et al. [2000a], Van der Merwe and Wan [2001a], and Wan and Van der Merwe [2001].

2.5 A New Ensemble Kalman Filter: Sigma-point Approach

In this section we extend the sigma-point approach to the traditional ensemble Kalman filter (EnKF). In the SPKF the forecast error covariance matrix is computed using deterministically chosen samples, called sigma points. In general we can consider these sigma points as specific conditional ensembles; i.e., the "ensemble" (the sigma points) is chosen using the scaled unscented transformation (SUT) for the SP-UKF and Stirling's approximation formula for the SP-CDKF. In other words, we can consider the ensemble Kalman filter as a general sigma-point Kalman filter without a specific selection scheme for choosing the ensembles.
The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, described by the Fokker-Planck equation [Jazwinsky, 1970], can be estimated using ensemble integrations [Evensen, 1994, 1997]; i.e., the error covariance matrices can be calculated by integrating an ensemble of model states. Thus the forecast error covariance matrix given by the Kalman filter equations can be rewritten as

P^f = P_{θ_k}^- = (θ_k − θ̄_k)(θ_k − θ̄_k)^T (2.71)

where P^f is the forecast covariance matrix from an ensemble of model forecast states and the overline denotes the ensemble average. Theoretically, an infinite ensemble of model states is required to compute the full covariance matrix, which represents the true error statistics of the model. However, it has been shown [Evensen, 1994, 1997; Houtekamer and Mitchell, 1998] that an ensemble of limited size will approximate the error covariance matrices with reasonable accuracy:

P^f ≈ (1/(M − 1)) Σ_{m=1}^{M} (θ_{k,m} − θ̄_k)(θ_{k,m} − θ̄_k)^T (2.72)

where M represents the number of ensemble forecasts. Here we can consider each perturbed initial state as an EnKF sigma point. Thus the selection scheme for choosing the sigma points and the corresponding weighting terms are given by

X_{k,m} = θ_{k,m} + ε_{k,m},  w = 1/(M − 1) (2.73)

where ε_{k,m} is a sequence of white uniform random perturbations and w is the weighting term for the mean and covariance. The number of sigma points needed to approximate the mean and covariance of the model state at time k is M, which is equal to the total number of ensemble members.
The sigma-point vector is then propagated through the nonlinear model (2.31)-(2.32) to get the forecast sigma points, or ensemble:

X_k^θ = f(X_{k-1}^θ, X_{k-1}^q) (2.74)
Y_k = h(X_k^θ, X_k^r) (2.75)

where X_k^θ is the forecast sigma-point vector corresponding to the ensemble forecast states, X_{k-1}^q is the sigma-point vector corresponding to the model error, and X_k^r is that of the observation error. Now the ensemble forecast error covariance given by equation (2.72) can be re-written as

P^f = P_{θ_k}^- ≈ (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̄_k)(X_{k,m}^θ − θ̄_k)^T (2.76)
= (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̂_k^-)(X_{k,m}^θ − θ̂_k^-)^T (2.77)

where

θ̂_k^- = θ̄_k = (1/M) Σ_{m=1}^{M} X_{k,m}^θ (2.78)

Similarly, the approximated measurement mean, covariance, and cross-covariance for the calculation of the Kalman gain are computed as follows:

ψ̂_k^- ≈ (1/M) Σ_{m=1}^{M} Y_{k,m} (2.79)
P_{ψ̃_k} ≈ (1/(M − 1)) Σ_{m=1}^{M} (Y_{k,m} − ψ̂_k^-)(Y_{k,m} − ψ̂_k^-)^T (2.80)
P_{θ_k ψ_k} ≈ (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̂_k^-)(Y_{k,m} − ψ̂_k^-)^T (2.81)

The optimal gain (Kalman gain, K) can be calculated using Eq. (2.39), and the state covariance is updated using equation (2.40). The main advantage of this approach is that it avoids Eq. (2.34), which requires the linear measurement operator H to calculate the Kalman gain, thereby extending the EnKF to nonlinear measurement functions. We have thus extended the sigma-point approach to the ensemble Kalman filter and formulated the EnKF as a general sigma-point Kalman filter without a specific selection scheme. In other words, the forecast sigma points in the SP-UKF, SP-CDKF, and other sigma-point Kalman filter algorithms are actually specific ensembles conditioned on the specific selection schemes, which can represent the error statistics with better accuracy than the EnKF. This allows us to extend the EnKF to the more generalized SPKF approach, which is suitable for nonlinear measurement functions.
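The analysis step of this sigma-point form of the EnKF, Eqs. (2.78)-(2.81) combined with the gain (2.39), can be sketched as follows. The nonlinear measurement function, ensemble size, prior moments, and observation values are all illustrative assumptions; observation-noise perturbations are added to the predicted measurements, one common "perturbed observation" variant rather than necessarily the thesis configuration.

```python
import numpy as np

# EnKF analysis step in the "sigma-point" form: the gain is built from
# the ensemble cross-covariance (2.81) and innovation covariance (2.80),
# so no linearized measurement operator H is needed.  Toy nonlinear h.
rng = np.random.default_rng(1)
M, L = 500, 3
h = lambda th: np.array([th[0] * th[1], np.sin(th[2])])  # illustrative h

# prior ensemble (columns are members) around an assumed mean
ens = rng.standard_normal((L, M)) + np.array([[1.0], [2.0], [0.0]])
Y = np.column_stack([h(ens[:, m]) for m in range(M)])
Y += 0.1 * rng.standard_normal(Y.shape)   # perturbed-observation noise

th_mean = ens.mean(axis=1, keepdims=True)             # (2.78)
y_mean = Y.mean(axis=1, keepdims=True)                # (2.79)
P_yy = (Y - y_mean) @ (Y - y_mean).T / (M - 1)        # (2.80)
P_xy = (ens - th_mean) @ (Y - y_mean).T / (M - 1)     # (2.81)

K = P_xy @ np.linalg.inv(P_yy)                        # gain (2.39)
psi_obs = np.array([[2.5], [0.2]])                    # illustrative observation
ens_a = ens + K @ (psi_obs - Y)                       # update every member
```

Because the gain uses only sample cross- and innovation covariances, the nonlinear h(·) never has to be linearized; the analysis ensemble spread contracts relative to the forecast ensemble, consistent with (2.40).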
Also, it has been shown [Keppenne, 2000] that the ensemble forecast step in the EnKF can be parallelized by running each ensemble member on a separate processor of a parallel computer (or cluster), resulting in large computational savings. This approach can be extended to SPKF data assimilation problems, since the sigma points propagated through the nonlinear model are independent of each other.

Chapter 3

Experiments with the Lorenz '63 Model

In the field of data assimilation, the celebrated Lorenz model has served as a test bed for examining the properties of various data assimilation methods [Evensen, 1997; Gauthier, 1992; Miller et al., 1994], as the Lorenz model shares many common features with the atmospheric circulation and climate system in terms of variability and predictability [Palmer, 1993]. By adjusting the model parameters that control the nonlinearity of the system, the model can be used to simulate nearly regular oscillations or highly nonlinear fluctuations. The Lorenz [1963] model consists of nonlinear ordinary differential equations in three components:

dx/dt = σ(y − x) + q_x (3.1)
dy/dt = ρx − y − xz + q_y (3.2)
dz/dt = xy − βz + q_z (3.3)

where the variables x, y, and z are related to the intensity of convective motion and the temperature gradients in the horizontal and vertical directions, respectively, and the parameters σ, ρ, and β will be referred to as dynamical parameters. q_x, q_y, and q_z represent the unknown model errors, assumed uncorrelated in time (white noise). Also, we assume that all the measurements or observations are linear functions of the nonlinear model states. The true data are created by integrating the model over 4000 time steps using the fourth-order Runge-Kutta scheme [Press et al., 1992], with the parameters σ, ρ, and β set to 10.0, 28.0, and 8/3, respectively, and the initial conditions set to 1.508870, -1.531271, and 25.46091 [Evensen, 1997; Miller et al., 1994]. The integration step is set to 0.01.
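The truth-generation step just described can be sketched as follows, using the stated parameter values, initial conditions, and step size from the text; the function names and the noise-free formulation (q_x = q_y = q_z = 0) are illustrative choices.

```python
import numpy as np

# Fourth-order Runge-Kutta integration of the noise-free Lorenz (1963)
# model (3.1)-(3.3), with sigma=10, rho=28, beta=8/3 and dt=0.01 as in
# the text.
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return np.array([sigma * (y - x),
                     rho * x - y - x * z,
                     x * y - beta * z])

def rk4_step(state, dt=0.01):
    k1 = lorenz(state)
    k2 = lorenz(state + 0.5 * dt * k1)
    k3 = lorenz(state + 0.5 * dt * k2)
    k4 = lorenz(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

state = np.array([1.508870, -1.531271, 25.46091])  # initial conditions
traj = [state]
for _ in range(4000):                              # 4000 time steps
    state = rk4_step(state)
    traj.append(state)
traj = np.asarray(traj)
```

The resulting 4001-point trajectory stays bounded on the familiar butterfly-shaped attractor; synthetic observations would then be produced by sampling this trajectory and adding Gaussian noise, as described next.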
The observation data sets are simulated by adding normally distributed noise to the true data. The assimilation process is completely subject to the model equations (3.1)-(3.3) after the initial guesses are given; at each step of the integration, the initial conditions are the estimated model state from the previous step.

3.1 State Estimation

To apply the KF, we discretize the nonlinear Lorenz model (3.1)-(3.3) using the fourth-order Runge-Kutta method and write it in the form of the state-space equations (2.31) and (2.32), where θ_k represents the system state vector (a column vector composed of x, y, and z), f(·) is the nonlinear function of the state, and q_k is the random (white) process noise vector (a column vector composed of q_x, q_y, and q_z). The measured model state ψ_k, required for the application of the KF, is a function of the states according to equation (2.32), where h(·) is the measurement function and r_k is the random measurement noise vector. To implement the SP-UKF, the state vector is redefined as the concatenation of the model states, model errors, and measurement errors: the augmented state vector Θ_k and the corresponding covariance matrix are given by

Θ_k = [θ_k^T  q_k^T  r_k^T]^T (3.4)

P_{Θ_k} = [ P_{θ_k}  0    0
            0        Q_k  0
            0        0    R_k ] (3.5)

Therefore the augmented state dimension is the sum of the original state dimension, the model error dimension, and the measurement error dimension:

L_Θ = L_θ + L_q + L_r (3.6)

where L_θ is the dimension of the state, L_q is the dimension of the model error vector, and L_r is that of the measurement errors. The augmented sigma points are found using the transformation equations (2.42)-(2.44). The dimension of the augmented sigma-point vector is 2L_Θ + 1. For the Lorenz model discussed here, the augmented sigma-point vector dimension is 19; in other words, the number of sigma points required to approximate the error statistics accurately is 19.
The augmented sigma-point vector is then propagated through (2.45)-(2.46), and the optimal terms for the calculation of the Kalman gain are computed according to equations (2.47)-(2.51). On the other hand, to implement the SP-CDKF, we split the augmentation process between the time update and measurement update sections. For the time update, the augmented state vector, covariance matrix, and sigma-point vector are given by

Θ_{k-1} = [θ̂_{k-1}^T  q̄_{k-1}^T]^T (3.7)

P_{Θ_{k-1}} = [ P_{θ_{k-1}}  0
                0            Q_{k-1} ] (3.8)

X_{k-1} = [ Θ_{k-1}   Θ_{k-1} + δ√P_{Θ_{k-1}}   Θ_{k-1} − δ√P_{Θ_{k-1}} ] (3.9)

and for the measurement update they are given by

Θ_k = [θ̂_k^{-T}  r̄_k^T]^T (3.10)

P_{Θ_k} = [ P_{θ_k}^-  0
            0          R_k ] (3.11)

X_k = [ Θ_k   Θ_k + δ√P_{Θ_k}   Θ_k − δ√P_{Θ_k} ] (3.12)

The optimal terms for the calculation of the Kalman gain are computed using (2.65)-(2.70).

3.2 Parameter Estimation

Model parameter estimation can be regarded as a special case of general state estimation in which the parameters are treated as specific states. Parameter estimation involves determining a nonlinear mapping

ψ_k = g(θ_k, Λ) (3.13)

where the nonlinear map g(·) may be the dynamical model f(·) or an empirical model parameterized by the vector Λ. The state-space representation of the parameter estimation problem for the Lorenz model can be written as

Λ_k = Λ_{k-1} + q_{k-1}^Λ (3.14)
ψ_k = f(θ_k, Λ_k) + r_k^Λ (3.15)

where f(·) is the nonlinear measurement model given by the Lorenz equations (3.1)-(3.3), Λ is the parameter vector, which comprises the dynamical parameters σ, ρ, and β, and q_k^Λ and r_k^Λ represent the model and measurement error vectors, respectively. The SPKF (SP-UKF and SP-CDKF) equations for the parameter estimation problem are similar to those of the state estimation formulation, except that the state (here the states are parameters) time evolution is linear (Eq. 3.14) and the measurement function is nonlinear (Eq. 3.15).
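The parameter-estimation recursion (3.14)-(3.15) can be illustrated on a hypothetical scalar problem: a random-walk parameter observed only through a nonlinear measurement, updated with the cross-covariance gain (2.39) evaluated from a symmetric sigma-point pair (cf. 2.42). The measurement model, noise levels, and all numerical values below are illustrative assumptions, not the Lorenz setup of the thesis.

```python
import numpy as np

# Parameter estimation as filtering (3.14)-(3.15): a scalar parameter
# follows a random walk and is observed through the toy nonlinear
# measurement psi = a * theta**2 + r.
rng = np.random.default_rng(3)
a_true = 2.5
a_hat, P = 0.0, 4.0                  # initial guess and its variance
Q, R = 1e-5, 0.01                    # random-walk and observation variances

for k in range(200):
    theta = rng.uniform(0.5, 1.5)                  # known driving state
    psi = a_true * theta ** 2 + 0.1 * rng.standard_normal()
    P = P + Q                                      # time update (3.14)
    # two symmetric sigma points around the parameter mean (cf. 2.42)
    s = np.array([a_hat + np.sqrt(P), a_hat - np.sqrt(P)])
    y = s * theta ** 2                             # propagated through h (3.15)
    P_yy = np.mean((y - y.mean()) ** 2) + R        # innovation covariance
    P_xy = np.mean((s - a_hat) * (y - y.mean()))   # cross-covariance
    K = P_xy / P_yy                                # Kalman gain (2.39)
    a_hat += K * (psi - y.mean())
    P -= K * P_yy * K                              # covariance update (2.40)
```

As the text notes, the parameter dynamics (3.14) are linear while the measurement (3.15) is nonlinear, so all the work is done in the measurement update; the estimate converges toward the true parameter as observations accumulate and the parameter variance P shrinks.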
3.3 Joint Estimation of States and Parameters

The joint estimation approach is required when "clean" states and parameters are not available. In general there are two approaches to estimating the model state and parameters simultaneously, namely the dual estimation and joint estimation approaches [Haykin, 2001; Nelson, 2000; Van der Merwe, 2004]. In the dual estimation approach, two Kalman filters run simultaneously, one for state and one for parameter estimation. In the joint estimation approach, the system state and parameters are concatenated into a single higher-dimensional joint state vector, and a single Kalman filter is used to estimate the joint vector. For example, the joint state vector J_k for SPKF data assimilation can be written as

J_k = [θ_k  Λ_k]^T (3.16)

In the joint estimation process, the SPKF schemes estimate the states using parameters that are re-estimated at every time step from the prior states. In this study, we present only the joint estimation (instead of dual estimation) of parameters and states, since it incorporates the complete model states and parameters during the assimilation cycles.

Chapter 4

Results and Discussion

In this chapter we demonstrate the feasibility of the SPKF algorithms as effective data assimilation methods for highly nonlinear models. The SPKF algorithms discussed in the previous sections will now be examined and compared with the standard EKF and EnKF methods. To allow this comparison, all experiments were designed almost identically to those of Miller et al. [1994] and Evensen [1997].

4.1 State Estimation

The first set of experiments was carried out with initial conditions, parameters, and observation noise levels identical to those in Miller et al. [1994] and Evensen [1997]: the observations and initial conditions are simulated by adding normally distributed noise N(0, √2).
Also, the interval of observation is set to 25 time steps; i.e., the observed states are assimilated into the nonlinear model every 25 time steps. For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. Since there is no general way to set the model error, the amount of model error to use in the KF is often determined experimentally by trial or by statistical methods such as Monte Carlo, which is computationally expensive [Miller et al., 1994]. In our experiments, the model errors were intentionally designed so that the model would not drift too far from the true state¹. In detail, we set the model errors by calculating the expected errors in the state scaled by a decreasing exponential factor which is a function of the assimilation time; initially, the model covariance matrix is set to an arbitrary diagonal value, and it then anneals towards zero exponentially as the assimilation proceeds. For simulating model errors in the ensemble Kalman filter, we follow the method suggested by Evensen [2003]. An ensemble of 1000 members was used in the EnKF, as in Evensen's experiment [Evensen, 1997]. Figures 4.1(a), (b), (c), and (d) show the state estimates using the EKF, EnKF, SP-UKF, and SP-CDKF, respectively. As can be seen, all four methods generate model states similar to the true values, indicating the good capability of these methods in estimating model states when the size of the initial perturbation and the observation noise are as given here. Results corresponding to Figs. 4.1(a) and (b) were also obtained by Miller et al. [1994] and Evensen [1997]. It should be noted that while both the EKF and the EnKF can produce good state estimates, the former needs to construct the TLM and the latter requires a large ensemble size of 1000. In contrast, the SP-UKF and the SP-CDKF use only 19 "particular" ensemble members (sigma points) here, showing their advantages over the EKF and EnKF.
A comparison among the four methods is shown in Fig. 4.2, which presents the variation of the error square (ES) with time step. The ES is defined here as the square of the difference between the estimated state and the true model state, scaled by N, where N is a scalar²:

ES = (1/N) (θ_k − θ_k^{true})² (4.1)

¹The model is considered to have a relatively large error at the initial time so that the assimilation gives more weight to the observation information. As such, the model prediction does not drift too far from the 'true value'.
²We choose N as 4000, which is the total assimilation period.

Figure 4.1: Case 1 - Assimilation solutions for the Lorenz model: the black curve is the true model, + indicates the noisy observations, and the red trajectory indicates the assimilated solutions.

From Fig. 4.2, we can see that the SP-UKF and SP-CDKF assimilations have smaller ES than the EnKF at most times, although some assimilation steps show the opposite. These "peak" values of ES correspond to either over-estimation or under-estimation of the model states, which are most probably related to the random noise in the "observations" and to the chaotic nature of the Lorenz system. The state estimate tends to be poor when a large noise is assimilated and when the state is in transition from one chaotic regime to the other (see also the discussion below). The overall performance of each assimilation is measured by the root mean square error (RMSE) over all time steps, as shown in Table 4.1.

Table 4.1: RMSE and computation time for Case 1

Assimilation Method          Computation Time (s)   RMSE
EKF                          37.04                  1.812
EnKF (with 1000 ensembles)   7143.57                1.987
EnKF (with 19 ensembles)     132.77                 6.123
SP-UKF                       133.91                 1.640
SP-CDKF                      90.42                  1.592

As shown, the SP-UKF and SP-CDKF have slightly smaller RMSE than the others.
The most impressive point in the table is that the SPKF methods use only 19 sigma points (or 19 conditional ensembles) to estimate the statistical moments of the nonlinear model accurately. This turns out to be an advantage for data assimilation problems in low-dimensional systems, but in the case of atmosphere/ocean GCMs the 2L + 1 integrations are not computationally feasible. More details on SPKF implementation, its limitations, and methods to overcome those limitations are described in the next section (Section 5). For the sake of completeness, we also performed an EnKF assimilation experiment with 19 ensembles instead of 1000. The result of this experiment is shown in Fig. 4.3(a), and the corresponding ES and RMSE are shown in Fig. 4.3(b) and Table 4.1. These results (Figs. 4.2 & 4.3) show that the squared errors of the state estimate from the EnKF with 19 ensemble members are around 5-10 times as large as those of the SPKF.

Figure 4.2: Case 1 - Assimilation errors for state estimation (corresponding to Fig. 4.1): (a) EKF, (b) EnKF, (c) SP-UKF, and (d) SP-CDKF.

Figure 4.3: (a) Ensemble Kalman filter data assimilation solutions for the Lorenz model with 19 ensembles: the black curve is the true model, + indicates the noisy observations, and the red trajectory indicates the assimilated solutions. (b) Assimilation errors.
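The ES (4.1) and RMSE diagnostics used throughout these comparisons can be computed as below; the synthetic trajectories stand in for the estimated and true states and are purely illustrative.

```python
import numpy as np

# Error square (4.1) and RMSE diagnostics for an assimilation run,
# with synthetic "estimated" and "true" trajectories for illustration.
N = 4000                                        # total assimilation period
rng = np.random.default_rng(2)
truth = np.sin(np.linspace(0.0, 40.0, N))       # placeholder true state
estimate = truth + 0.05 * rng.standard_normal(N)  # placeholder estimate

es = (estimate - truth) ** 2 / N                # scaled error square (4.1)
rmse = np.sqrt(np.mean((estimate - truth) ** 2))  # overall RMSE
```

The per-step ES series is what Figs. 4.2, 4.3(b), 4.5, and 4.7 plot, while the single RMSE number summarizes each run in Tables 4.1-4.3.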
Thus the EnKF with only 19 members could not adequately capture the mean and covariance of the highly nonlinear Lorenz model. On the other hand, with just 19 conditional ensembles (or sigma points), the SPKF is able to capture the statistical moments of the highly nonlinear Lorenz model. The assimilation experiments were run on an SMP (Symmetric Multi-Processor) machine with two AMD Opteron 248 CPUs [Advanced Micro Devices, 2007] with a clock speed of 2.2 GHz, running Linux. MATLAB 7.3.0.298 (R2006) [Mathworks, 2007] software was used to implement the model and the data assimilation algorithms. Table 4.1 also compares the computation time taken by each assimilation algorithm discussed above. In order to compare computational efficiency, we used the same programming framework to implement all the data assimilation methods. The computational cost is least for the EKF, followed by the two SPKF methods. The EnKF, which requires 1000 members for a good estimate as shown in Fig. 4.1(b), is the most expensive, around 50-80 times as much as the SPKF.

The second set of experiments was carried out under a more realistic situation by increasing the observation noise level ten-fold: the observations and initial conditions are generated by adding normally distributed noise N(0, √20). The assimilation results are shown in Fig. 4.4 and the corresponding ES in Fig. 4.5. From Fig. 4.4 and Table 4.2 we can see the superior performance of the SPKF algorithms. Between the SPKF schemes, the SP-CDKF shows comparatively better assimilation skill than the SP-UKF. The RMSE values of the EKF, EnKF, SP-UKF, and SP-CDKF assimilation results were 5.39, 6.37, 4.25, and 4.56, respectively. It should be noted that even though the EKF shows a relatively low RMSE value, it requires the TLM for the assimilation, which is very difficult to obtain in the case of GCMs. From Fig.
4.4 and 4.5, it is evident that the SPKF assimilation schemes can give better estimation results even if the system is subject to higher noise. Our numerical experiments also showed that an EnKF assimilation with 1000 ensembles can give results as good as the SPKF assimilation (not shown here). The RMSE values are summarized in Table 4.2.

Assimilation Method          RMSE
EKF                          5.39
EnKF (using 19 ensembles)    6.37
SP-UKF                       4.25
SP-CDKF                      4.56

Table 4.2: RMSE statistics for Case 2

Figure 4.4: Case 2 - Assimilation solutions for the Lorenz model with a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (Here the EnKF simulation is performed with 19 ensembles.)

Figure 4.5: Case 2 - Assimilation errors for state estimation (corresponding to Fig. 4.4): (a) EKF, (b) EnKF, (c) SP-UKF and (d) SP-CDKF.

In the third set of experiments, we increased the observation noise level as well as the interval between consecutive observations; the interval between observations is increased from 25 to 40, and the observations and initial conditions are generated by adding normally distributed noise N(0, √20). The assimilation results and the corresponding ES are shown in Fig. 4.6 and 4.7 respectively.
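The twin-experiment setup used in these cases can be sketched as follows: integrate the Lorenz '63 equations with a fourth-order Runge-Kutta scheme and add N(0, √20) noise at the chosen observation interval. This Python sketch assumes the standard parameter values σ = 10, ρ = 28, β = 8/3; the step size and initial condition are illustrative, not the exact values used in the experiments.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz '63 system."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
dt, n_steps, obs_interval = 0.01, 4000, 40     # case-3-like setup
x = np.array([1.0, 1.0, 1.0])                  # illustrative start
truth, obs = [], {}
for k in range(1, n_steps + 1):
    x = rk4_step(lorenz63, x, dt)
    truth.append(x.copy())
    if k % obs_interval == 0:
        # noisy observation with standard deviation sqrt(20)
        obs[k] = x + rng.normal(0.0, np.sqrt(20.0), size=3)
```

The `truth` trajectory plays the role of the black curve in the figures and the entries of `obs` the + marks; each filter is then judged by how closely it recovers `truth` from `obs` alone.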
The RMSE values of the case 3 experiments are summarized in Table 4.3. Fig. 4.7 shows some divergence at some time steps of the assimilation track among the four methods. For example, the errors (ES) vary almost steadily in SP-UKF, whereas SP-CDKF has a relatively significant variation of ES with time steps. Compared with the SPKF, the variation of ES is more striking in the EKF and EnKF. The significant variation of ES might be related to the chaotic nature of the Lorenz system and the capability of each individual algorithm in capturing the observation information. The chaotic Lorenz attractor is known to have a butterfly shape with two wings. For a good estimate of the transition state from one wing to the other, the assimilation should be able to characterize the information of both wings of the Lorenz attractor. Obviously this depends on two issues: the observations themselves and the assimilation algorithm. If the observations are assimilated more frequently (i.e., the interval between observations is small), sufficient data cover more information of both chaotic regimes in the assimilation. This is the reason why there are many more 'abnormal' values of ES in Fig. 4.7 than in Fig. 4.2, where the observations are more frequent. On the other hand, if an assimilation algorithm has a better capability to mix observation and model information to characterize transitions, it will have better estimates for the transition states. In many cases, this depends strongly on the model and observation error covariances. While the observation error covariance is usually prescribed, the model error covariance is updated at each assimilation step in the family of Kalman filters, depending on the algorithm. Thus Fig.
4.7 suggests that the SPKF is probably better than the EKF and EnKF in the assimilation of some transition states using noisy observations.

Figure 4.6: Case 3 - Assimilation solutions for the Lorenz model with fewer observations and a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (Here the EnKF simulation is performed with 19 ensembles.)

Figure 4.7: Case 3 - Assimilation errors for state estimation (corresponding to Fig. 4.6): (a) EKF, (b) EnKF, (c) SP-UKF and (d) SP-CDKF.

Figure 4.8: Case 3 - EnKF assimilation solutions with 1000 ensembles, for the Lorenz model with fewer observations and a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions.

Assimilation Method            RMSE
EKF                            6.0637
EnKF (with 19 ensembles)       8.2814
EnKF (with 1000 ensembles)     7.0390
SP-UKF                         6.1847
SP-CDKF                        5.5832

Table 4.3: RMSE statistics for Case 3

Again, we repeated the EnKF assimilation (for case 3) with 1000 ensembles and the result is shown in Fig. 4.8. The result is not as good as the SPKF assimilation and appears more "noisy". This is probably because the assimilated observations are less frequent and noisier, so an ensemble size of 1000 is probably not enough to capture the statistical moments accurately.

4.2 Parameter Estimation

Estimating uncertain dynamical model parameters is one of the important tasks in data assimilation, where the measurement function is usually nonlinear.
The requirement of the tangent linear measurement operator H in the optimal gain term given by Eq. (2.40) makes the EKF and EnKF assimilation schemes inaccurate and inappropriate for parameter estimation in nonlinear dynamical systems. It has been shown [Kivman, 2003] that EnKF data assimilation gave poor results in estimating the dynamical parameter of the Lorenz model. The SPKF methods should be better alternatives for parameter estimation since they do not need to linearize the nonlinear measurement function. The experimental setup is identical to that of the first case of the state estimation problem discussed in the above subsection. To simulate a more realistic situation, the initial guesses of the parameters are generated by adding normally distributed noise of covariance 100 to the true parameters. In the first case, we assume that only one parameter (say β) is uncertain. Thus our task is to estimate the correct value of β from infrequent observations contaminated by noise. Figure 4.9 shows the SPKF parameter estimation results: Fig. 4.9-(a) shows the parameter estimation using SP-UKF and Fig. 4.9-(b) that of SP-CDKF. Table 4.4 compares the true and the estimated parameter values, and the corresponding estimation errors. From this figure it is clear that the SPKF assimilation methods can retrieve dynamical parameters well from noisy observations. In the above experiment, even though the initial parameter was far from the true value (the standard deviation is 10), the SPKF method is still able to estimate the parameter accurately. In general, our experiments suggest faster convergence for the SP-CDKF algorithm. This might be due to the algorithm tuning problem, since SP-CDKF uses only one "control parameter" (h) compared to three (λ, α, and β) in SP-UKF.
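The usual way a sigma-point filter estimates a parameter such as β is by state augmentation: the parameter is appended to the state vector with persistence dynamics, so the same filter update that corrects the state also corrects the parameter. A minimal sketch, with illustrative values (the augmented tendency function and starting state below are assumptions for illustration, not the exact experimental code):

```python
import numpy as np

def lorenz63_augmented(z, sigma=10.0, rho=28.0):
    """Lorenz '63 tendency for the augmented state z = (x, y, w, beta).
    The uncertain parameter beta is carried as an extra state variable
    with zero tendency (a persistence model)."""
    x, y, w, beta = z
    return np.array([sigma * (y - x),
                     x * (rho - w) - y,
                     x * y - beta * w,
                     0.0])                 # d(beta)/dt = 0

# Initial parameter guess: true value plus noise of variance 100
# (standard deviation 10), mirroring the setup described above.
rng = np.random.default_rng(0)
beta_true = 8.0 / 3.0                      # = 2.6667
beta_guess = beta_true + rng.normal(0.0, 10.0)
z0 = np.array([1.0, 1.0, 1.0, beta_guess])  # illustrative initial state
```

Because the parameter has no dynamics of its own, all of its correction comes from the cross-covariance between the parameter and the observed state components, which is exactly what the sigma-point propagation supplies.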
In the second case we assume that two dynamical parameters (say ρ and β) are uncertain.

Figure 4.9: Parameter estimation: (a) SP-UKF assimilation, (b) SP-CDKF assimilation. True β - dashed line, estimated β - solid red line.

Assimilation Method    True β    Estimated β    Squared error
SP-UKF                 2.6667    2.5152         0.0229
SP-CDKF                2.6667    2.4979         0.0285

Table 4.4: Single parameter estimation statistics (Case 1)

Figure 4.14: Assimilation errors for SP-CDKF joint estimation: (a) estimation error for x, (b) estimation error for σ.

Chapter 5

Reduced Space Sigma-point Kalman Filter

In the preceding sections, we have demonstrated the power and merits of the SPKF, as well as its advantages over the EKF and EnKF, using the low-dimensional Lorenz model. One of the crucial issues in evaluating a data assimilation algorithm is its computational expense when applied to realistic models of high dimensionality. In this section we will further explore the SPKF using higher dimensional Lorenz models.

5.1 SPKF Data Assimilation in Higher Dimensional Systems

For an L-dimensional system, the number of sigma-points required to estimate the true mean and covariance is 2L + 1. As described in the previous sections, this procedure works well for low dimensional models such as the Lorenz '63 model, but the 2L + 1 sigma-point integration is computationally infeasible if the system dimension is of the order of millions, as in GCMs. Julier [Julier, 2003; Julier and Uhlmann, 2002, 2004] has shown that, using the simplex unscented transformation, the minimum number of sigma-points that gives the same estimation accuracy as SP-UKF can be reduced to (L + 1).
These sigma-points are called simplex sigma-points, but for higher dimensional systems even the (L + 1) simplex sigma-point integration is still computationally intractable. A possible solution to this problem is to reduce the number of sigma-points by selecting a particular subset of sigma-points from the original sigma-point space which can approximate the error statistics of the model. In the following subsections we will examine this possibility.

5.1.1 A subspace approach with sigma-points: Design and Implementation

It has been shown that the number of degrees of freedom necessary to describe most large scale geophysical systems is finite, and their dominant variability can be described by a limited number of modes or functions [Lermusiaux, 1997; Lermusiaux and Robinson, 1999; Teman, 1991]. These functions evolve in time and space in accordance with the system. The techniques commonly used for deriving such functions include dynamical normal modes, dynamical singular vectors and values, empirical orthogonal functions (EOFs) [Davis, 1977; Lorenz, 1965; von Storch and Frankignoul, 1997; Wallace et al., 1992; Weare and Nasstrom, 1982], principal interaction and oscillation patterns (PIPs and POPs) [Hasselmann, 1988; Penland, 1989; Schnur et al., 1993; von Storch et al., 1995], and radial functions and wavelets [Gamage and Blumen, 1993; Meyers et al., 1993]. Lermusiaux et al. [Lermusiaux, 1997; Lermusiaux and Robinson, 1999] proposed a method to reduce the error space, called Error Subspace Statistical Estimation (ESSE). In the ESSE approach, a reduced rank approximation $P_k^p$ to the error covariance $P_k$ is defined by minimizing the norm of the difference between $P_k$ and $P_k^p$; i.e.

\min_{\mathrm{rank}(P_k^p) = p} \| P_k - P_k^p \|    (5.1)

According to the minimum criterion (5.1), the error subspace is characterized by the singular vectors and singular values of $P_k$.
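The minimizer of criterion (5.1) is the classical truncated singular value decomposition (Eckart-Young): keeping the p leading singular vectors and values of P_k gives the best rank-p approximation in the 2-norm and the Frobenius norm. A brief sketch, with a synthetic covariance matrix as an assumed example:

```python
import numpy as np

def reduced_rank(P, p):
    """Best rank-p approximation to P in the sense of criterion (5.1),
    obtained by truncating the singular value decomposition."""
    U, s, Vt = np.linalg.svd(P)
    return (U[:, :p] * s[:p]) @ Vt[:p, :]

# For a covariance matrix, the 2-norm truncation error equals the
# first discarded singular value.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
P = A @ A.T                                # synthetic covariance
P5 = reduced_rank(P, 5)
err = np.linalg.norm(P - P5, 2)
s = np.linalg.svd(P, compute_uv=False)
```

The same truncation idea underlies both the ESSE error subspace and, loosely, the PCA-based sigma-point selection described next.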
We follow a similar idea as in ESSE to form a sigma-point subspace that approximates the mean and error covariance of the system. In our approach, it is assumed that, whereas the estimate of a system's full errors requires all sigma-points, its dominant errors can be estimated using the most important sigma-points. Theoretically, these most important sigma-points should be chosen based on (5.1). However, this would introduce considerable complexity and be difficult to implement. For simplicity, as a good start towards a complete solution to the problem, we have used principal component analysis (PCA) to identify the most important sigma-points which influence the evolution of the error covariance. The main idea behind using PCA is to represent the multidimensional sigma-point space by a smaller number of sigma-points retaining the main features of the original sigma-point space; i.e., sigma-points in the principal component space are used to calculate the error propagation. The selection of sigma-points is based on the proportion of variance explained. Specifically, instead of using the full sigma-point space, we use some leading principal components, thereby reducing the number of sigma-points required to approximate the forecast error covariance. In the following subsections, we will see the potential of this approach in the assimilation of higher dimensional systems.

5.2 Experiments with the Lorenz '95 Model

The Lorenz '95 [Lorenz, 2006] model is a one-dimensional atmospheric model introduced by E. Lorenz in 1995 to explain the dynamics of weather at a fixed latitude. It has similar error growth characteristics as full NWP models. The model contains K variables $X_1, \cdots, X_K$, which may be thought of as atmospheric variables in K sectors of a latitude circle, governed by

\frac{dX_k}{dt} = -X_{k-1}(X_{k-2} - X_{k+1}) - X_k + F    (5.2)

where the constant F, called the forcing term, is independent of k.
By using cyclic boundary conditions, the definition of $X_k$ is extended to all values of k; i.e., $X_{k-K}$ and $X_{k+K}$ equal $X_k$. It is assumed that a unit time Δt = 1 is associated with 5 days. The experimental setup is similar to that of Lorenz [2006], where K = 36 and the magnitude of the forcing is set to 8, for which the system is chaotic. The system is integrated using the 4th order Runge-Kutta scheme, with a time step Δt = 0.05 (i.e., 6 hours). The experiments were carried out with random initial conditions, and the observations are generated by adding normally distributed noise N(0, √2) to the true states. Also, the interval of observation is set to 10; i.e., the observed states are assimilated into the nonlinear model every 10 time steps. A more detailed discussion of the model and its characteristics can be found in Lorenz [2005, 2006], and Lorenz and Emmanuel [1998].

5.2.1 Performance and Evaluation

For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. In the first case we use the "full" sigma-point space for the calculation of the error covariance. Thus we have a total of 217 sigma-points, hence 217 ensemble members. Figure 5.1-(a) shows the state estimate using the SP-UKF. As can be seen, the SP-UKF can estimate model states close to the true values, indicating the good capability of the original SPKF methods in estimating model states. In the second case we use the reduced sigma-point subspace to calculate the error covariance. In this case we select 20 sigma-points, which account for more than 90% of the total variance. The result of this experiment is shown in Fig. 5.1-(b). As can be seen, the model states can be fairly well estimated by the reduced SPKF, although its estimation accuracy is not as good as that of the original SPKF. This suggests a possible solution for applying the SPKF to high dimensional systems.
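The model tendency (5.2), with the cyclic boundary conditions described above, is conveniently written with circular shifts, so the same code serves K = 36 or K = 960. A minimal Python sketch (the perturbed initial state is an illustrative assumption):

```python
import numpy as np

def lorenz95(X, F=8.0):
    """Lorenz '95 tendency, Eq. (5.2); np.roll implements the cyclic
    boundary conditions X_{k-K} = X_{k+K} = X_k."""
    return (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
            - X + F)

K, F = 36, 8.0
X = F * np.ones(K)
X[K // 2] += 0.01      # small perturbation to trigger chaotic growth
# The unperturbed state X_k = F is an (unstable) steady state: its
# tendency is identically zero, a convenient correctness check.
```

From this tendency, the 4th order Runge-Kutta step with Δt = 0.05 described above advances the model, exactly as in the Lorenz '63 experiments.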
For the sake of completeness, we performed an EnKF assimilation experiment with 20 ensembles. The ensemble is generated using the same approach as in the previous experiment with the Lorenz '63 model, where we used 19 ensembles (Fig. 4.3-(a)). The result of this experiment is shown in Fig. 5.1-(c). Comparing Fig. 5.1-(b) and Fig. 5.1-(c) reveals that the reduced SPKF is better than the EnKF for the state estimate, especially for the magnitude estimate. It is apparent that the EnKF underestimates the magnitude of the model states during the transition phase period. The RMSE values for the above experiments are summarized in Table 5.1.

Figure 5.1: 36 variable Lorenz '95 model: assimilation solutions for X_1. The black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (a) Case 1, (b) Case 2, (c) EnKF with 20 ensembles.

Assimilation Method                      RMSE
SP-UKF (using 241 SPs)                   1.7403
Reduced space SP-UKF (using 20 SPs)      3.3491
Standard EnKF (using 20 ensembles)       3.5308

Table 5.1: 36 variable Lorenz '95 model: RMSE statistics for X_1

We also performed the SPKF assimilation experiment for the 960 variable Lorenz '95 model. The experimental setup is identical to that in the previous cases except that K = 960. Two cases are studied with the model. In the first case we use all sigma-points, which constitutes a total of 5761 sigma-points, and in the second case we use 200 important sigma-points for the calculation of the error covariances. The results of these experiments are shown in Fig. 5.2. For comparison, we also performed an EnKF assimilation experiment with 200 ensembles, and the result of this experiment is shown in Fig. 5.2-(c). Apparently the reduced SPKF leads to a better estimate than the EnKF in both phase and magnitude simulation.
As can be seen, the state estimated by the EnKF is often out of phase with the "true" trajectory, which is not the case for the reduced SPKF. The correlation between the estimated trajectory and the "true" trajectory is 0.59 for the reduced SPKF against 0.10 for the EnKF. Table 5.2 shows the comparison between the RMSE values for the different experiments.

Figure 5.2: 960 variable Lorenz '95 model: assimilation solutions for X_1. The black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (a) Case 1, (b) Case 2, (c) EnKF with 200 ensembles.

Assimilation Method                       RMSE
SP-UKF (using 5761 SPs)                   2.2397
Reduced space SP-UKF (using 200 SPs)      3.5864
Standard EnKF (using 200 ensembles)       4.6240

Table 5.2: 960 variable Lorenz '95 model: RMSE statistics for X_1

A great deal of additional research is needed for better design and implementation of these techniques applied to atmosphere/ocean GCMs for state, parameter and joint estimation problems. However, the above experimental results are promising, and a variety of possible extensions to these techniques could be developed to deal with more complicated situations.

Chapter 6

Conclusion

The EKF and EnKF, two important Kalman-type filters, have been widely applied to atmospheric and oceanic data assimilation due to their efficient and simple algorithms. The major weaknesses of the EKF and EnKF are that the former needs to calculate the tangent linear model or Jacobian for linearization of nonlinear forecast models, whereas the EnKF performance is greatly dependent on the ensemble size, which is often an intractable computational burden. Moreover, the EKF and EnKF cannot deal directly with systems whose observed data are a nonlinear transformation of the states. In this study we introduced and presented two recently proposed derivative-less sigma-point Kalman filters.
The SPKF is a technique for implementing derivative-less optimal estimation using a novel deterministic sampling approach that ensures a set of samples accurately estimates the forecast error statistics; this is unlike the EnKF, where a random sampling strategy is used. The technique employed in the SPKF reinterprets the standard Kalman gain and covariance update equations in such a way that neither the nonlinear prediction model nor the nonlinear measurement operator needs to be linearized. Nevertheless, the SPKF can capture the statistical moments of the nonlinear model accurately using its deterministic sampling technique. Thus in the SPKF, the forecast error covariance is computed using deterministically chosen samples, called sigma-points. In a broad sense, the SPKF algorithm can be considered a particular case of the ensemble Kalman filter with a specific sample selection scheme. In other words, the forecast sigma-points in the SPKF algorithms are actually specific ensembles conditioned by the specific selection schemes, which can represent the error statistics accurately. Also, the ensemble forecast step in the SPKF can be parallelized by running each ensemble member on a separate processor of a parallel computer (or cluster), resulting in huge computational savings. Using the highly nonlinear low dimensional Lorenz '63 model and a higher dimensional Lorenz '95 model, we investigated the capability and performance of the SPKF over the standard KF-based data assimilation methods for three different classes of problems, namely state estimation, parameter estimation and joint estimation. The results demonstrated that the SPKF has better estimation accuracy than the EKF and the EnKF in all experiments. The SPKF experiments with a higher dimensional model suggest that it is possible to reduce the number of sigma-points, and thereby the computation time, by using a reduced sigma-point space approach.
The results in this study are encouraging and suggest that the SPKF could become an effective method to assimilate observations into realistic models such as atmospheric or oceanic GCMs. The SPKF also has the advantage that it does not need the tangent linear or Jacobian operators of the original models. The SP-UKF and SP-CDKF data assimilation involve the calculation of the matrix square root of the state covariance matrix, which is a computationally intensive process. It has been shown [Van der Merwe and Wan, 2001a,b] that the square-root formulations of the SP-UKF and SP-CDKF are numerically efficient and stable, with estimation accuracy equal to that of the original SP-UKF and SP-CDKF. Since the state space dimension of the models used in this study is relatively small, it is practically irrelevant to compare the numerical stability of the square-root formulation with the original SP-UKF and SP-CDKF implementations. Therefore this issue is left for future study in GCMs.

In this study, we explored the SPKF using highly simplified nonlinear models. One might be concerned about the performance and efficiency of the SPKF when a realistic GCM is used. Additional research is needed for better implementation of these techniques applied to data assimilation problems in atmospheric or oceanic GCMs. Yet, the present study represents a step in pursuing advanced data assimilation algorithms by using simple nonlinear models, which share some common features with complicated atmospheric and oceanic models.

Appendix A

Re-interpretation of the Standard Kalman Gain

The optimal state update equation in the Kalman filter algorithm can be written as

\hat{\theta}_k = \hat{\theta}_k^- + K_k (\psi_k - \hat{\psi}_k^-)    (A.1)

where the superscript "-" represents the prior states given by the following equations:

\hat{\theta}_k^- = E[f(\theta_{k-1}, q_{k-1})]    (A.2)

\hat{\psi}_k^- = E[h(\theta_k, r_k)]    (A.3)
             = E[\psi_k] = H \hat{\theta}_k^-    (A.4)

Here H is the measurement operator.
Thus the state and covariance update equations can be re-written as

\hat{\theta}_k = \hat{\theta}_k^- + K_k (\psi_k - H \hat{\theta}_k^-)    (A.5)

P_{\theta_k} = (I - K_k H) P_{\theta_k}^-    (A.6)

Now the "standard" Kalman gain equation is given by

K_k = P_{\theta_k}^- H^T [H P_{\theta_k}^- H^T + R]^{-1}    (A.7)

where $P_{\theta_k}^-$ is the forecast error covariance matrix. The first factor in the Kalman gain, $P_{\theta_k}^- H^T$, can be interpreted as the cross-covariance between the state and observation errors [Gelb, 1974]:

P_{\theta_k \psi_k} = E[(\theta_k - \hat{\theta}_k^-)(\psi_k - \hat{\psi}_k^-)^T]    (A.8)
 = E[(\theta_k - \hat{\theta}_k^-)(H \theta_k + r_k - H \hat{\theta}_k^-)^T]    (A.9)
 = E[(\theta_k - \hat{\theta}_k^-)(\theta_k - \hat{\theta}_k^-)^T] H^T + E[\theta_k r_k^T] - E[\hat{\theta}_k^- r_k^T]    (A.10)
 = P_{\theta_k}^- H^T    (A.11)

since the measurement noise $r_k$ is uncorrelated with the state and its prior estimate. Similarly, the bracketed expression in Eq. (A.7) can be interpreted as the error covariance of the difference between model and observation [Gelb, 1974]:

P_{\psi_k} = E[(\psi_k - \hat{\psi}_k^-)(\psi_k - \hat{\psi}_k^-)^T]    (A.12)
 = E[(H \theta_k + r_k - H \hat{\theta}_k^-)(H \theta_k + r_k - H \hat{\theta}_k^-)^T]    (A.13)
 = H E[(\theta_k - \hat{\theta}_k^-)(\theta_k - \hat{\theta}_k^-)^T] H^T + H E[(\theta_k - \hat{\theta}_k^-) r_k^T]    (A.14)
   + E[r_k (\theta_k - \hat{\theta}_k^-)^T] H^T + E[r_k r_k^T]    (A.15)
 = H P_{\theta_k}^- H^T + R    (A.16)

Therefore the Kalman gain can be rewritten as

K_k = P_{\theta_k \psi_k} P_{\psi_k}^{-1}    (A.17)

The main advantage of using this form of the Kalman gain is that we can avoid the use of the measurement operator, especially when the measurement operator is a nonlinear function of the state.

Appendix B

An Alternate Formula for Updating the State Error Covariance Matrix

The estimation error is defined as

\tilde{\theta}_k = \theta_k - \hat{\theta}_k    (B.1)

Similarly, the error between the noisy observation $\psi_k$ and its prediction $\hat{\psi}_k^-$ is given by

\tilde{\psi}_k = \psi_k - \hat{\psi}_k^-    (B.2)

Substituting (B.1) into the state update equation (A.1), we can rewrite the estimation error as

\tilde{\theta}_k = \theta_k - \hat{\theta}_k^- - K_k (\psi_k - \hat{\psi}_k^-)
                = \tilde{\theta}_k^- - K_k \tilde{\psi}_k    (B.3)

Here we made use of the fact that the estimator is unbiased:

E[\tilde{\theta}_k] = 0    (B.4)

Now, the state error covariance $P_{\theta_k}$ and the cross-covariance $P_{\theta_k \psi_k}$ between the state and observation errors given by equations (A.8) and (A.
12) can be rewritten in terms of equations (B.1) and (B.2), and are given by

P_{\theta_k} = E[\tilde{\theta}_k \tilde{\theta}_k^T]    (B.5)

P_{\theta_k \psi_k} = E[\tilde{\theta}_k^- \tilde{\psi}_k^T]    (B.6)

Taking the outer product and expectation of (B.3) produces

E[\tilde{\theta}_k \tilde{\theta}_k^T] = E[(\tilde{\theta}_k^- - K_k \tilde{\psi}_k)(\tilde{\theta}_k^- - K_k \tilde{\psi}_k)^T]
 = E[\tilde{\theta}_k^- (\tilde{\theta}_k^-)^T] - E[\tilde{\theta}_k^- \tilde{\psi}_k^T] K_k^T - K_k E[\tilde{\psi}_k (\tilde{\theta}_k^-)^T] + K_k E[\tilde{\psi}_k \tilde{\psi}_k^T] K_k^T    (B.7)

Using equations (B.5) and (B.6), equation (B.7) can be rewritten as

P_{\theta_k} = P_{\theta_k}^- - P_{\theta_k \psi_k} K_k^T - K_k P_{\psi_k \theta_k} + K_k P_{\psi_k} K_k^T    (B.8)

Substituting the expression for the Kalman gain, given by equation (A.17), back into the above expression, the covariance update equation is given by

P_{\theta_k} = P_{\theta_k}^- - K_k P_{\psi_k} K_k^T    (B.9)

Bibliography

Advanced Micro Devices, I., 2007: AMD Opteron processor tech docs. http://www.amd.com/us-en/processors/technicalresources/O,,30_182_739_9003,00.html. Advanced Micro Devices.

Anderson, B. and J. Moore, 1979: Optimal Filtering. Prentice-Hall, New Jersey, 357 pp.

Arulampalam, M., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE T. Signal Proces., 50 (2), 188-197.

Baheti, R. S., O. Halloran, and H. R. Itekowitz, 1990: Mapping extended Kalman filters onto linear arrays. IEEE T. Automat. Contr., AC-35, 1310-1319.

Bennet, A., 1992: Inverse Methods in Physical Oceanography. Cambridge University Press, 223 pp.

Bergman, N., 1997: Recursive Bayesian estimation. Ph.D. thesis, Linkoping Studies in Science and Technology, 219 pp. [Available from http://www.control.isy.liu.se/research/reports/Ph.D.Thesis/PhD579.pdf].

Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

Daum, F. E. and J. Fitzgerald, 1983: Decoupled Kalman filters for phased array radar tracking. IEEE T. Automat. Contr., AC-28, 269-283.

Davis, R. E., 1977: Techniques for statistical analysis and prediction of geophysical fluid systems. Geophys. Astro. Fluid., 8, 245-277.

Doucet, A., 1998: On sequential simulation-based methods for Bayesian filtering. Tech. rep., Cambridge University, Dept. of Engineering, 26 pp.
[Available on the MCMC preprint service at http://www.stats.bris.ac.uk/MCMC].

Doucet, A., S. Godsil, C. Andrieu, C. Berzuini, N. Best, W. Gilks, and C. Larizza, 1997: Dynamic conditional independence models and Markov chain Monte Carlo methods. J. Am. Stat. Assoc., 92, 1403-1412.

Evensen, G., 1992: Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. J. Geophys. Res., 97 (C11), 17905-17924.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10143-10162.

Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon. Wea. Rev., 125, 1342-1354.

Evensen, G., 2003: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynam., 53, 343-367.

Gamage, N. and W. Blumen, 1993: Comparative analysis of low-level cold fronts: wavelet, Fourier, and empirical orthogonal function decompositions. Mon. Wea. Rev., 121, 2867-2878.

Gauthier, P., 1992: Chaos and quadric-dimensional data assimilation: a study based on the Lorenz model. Tellus, 44A, 2-17.

Gelb, A., 1974: Applied Optimal Estimation. MIT Press.

Ghil, M. and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Adv. Geophys., 33, 141-265.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian filtering. IEE Proc. F., IEEE, 107-113.

Hasselmann, K., 1988: PIPs and POPs: a general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns. J. Geophys. Res., 93, 11015-11021.

Haykin, S., Ed., 2001: Kalman Filtering and Neural Networks. Wiley, 128 pp.

Houtekamer, P. and H. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796-811.

Ito, K. and K.
Xiong, 2000: Gaussian filters for nonlinear filtering problems. IEEE T. Automat. Contr., 45 (5), 910-927.

Jazwinsky, A., 1970: Stochastic Processes and Filtering Theory. Academic Press, New York.

Julier, S., 1998: A skewed approach to filtering. SPIE Conference on Signal and Data Processing of Small Targets, Orlando, Florida, USA, SPIE, 271-282.

Julier, S., 2002: The scaled unscented transformation. Proceedings of the American Control Conference, IEEE, 4555-4559.

Julier, S., 2003: The spherical simplex unscented transformation. Proceedings of the American Control Conference, IEEE, 2430-2434.

Julier, S. and J. Uhlmann, 2002: Reduced sigma-point filters for the propagation of means and covariances through nonlinear transformations. Proceedings of the American Control Conference, IEEE, 887-892.

Julier, S. and J. Uhlmann, 2004: Unscented filtering and nonlinear estimation. Proceedings of the IEEE, IEEE, Vol. 92 (3), 401-422.

Julier, S., J. Uhlmann, and H. Durrant-Whyte, 1995: A new approach for filtering nonlinear systems. Proceedings of the American Control Conference, Seattle, WA, USA, IEEE, 1628-1632.

Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. Mon. Wea. Rev., 128, 1971-1981.

Kivman, G. A., 2003: Sequential parameter estimation for stochastic systems. Nonlinear Proc. Geoph., 10, 253-259.

Kotecha, J. and P. Djuric, 2001: Gaussian sum particle filtering for dynamic state space models. IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, IEEE, Vol. 6, 3465-3468.

Kotecha, J. and P. Djuric, 2003: Gaussian sum particle filtering. IEEE T. Signal Proces., 51 (10), 2602-2612.

Lefebvre, T., H. Bruyninckx, and J. De Schutter, 2002: Comment on "A new method for the nonlinear transformation of means and covariances in filters and estimators". IEEE T. Automat. Contr., 47 (8), 1406-1409.

Lermusiaux, P. F.
J., 1997: Error subspace data assimilation methods for ocean field estimation: theory, validation and applications. Ph.D. thesis, Harvard University, Cambridge, MA, 402 pp. [Available from Harvard Archives Microfilm, Harvard Depository HU 90.14492.9].

Lermusiaux, P. F. J. and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation, part I: theory and schemes. Mon. Wea. Rev., 127, 1385-1407.

Lorenz, E., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130-141.

Lorenz, E., 1965: A study of the predictability of a 28-variable atmospheric model. Tellus, 17, 321-333.

Lorenz, E., 2005: Designing chaotic models. J. Atmos. Sci., 62, 1574-1587.

Lorenz, E. and K. A. Emmanuel, 1998: Optimal sites for supplementary weather observations: simulations with a small model. J. Atmos. Sci., 55, 399-414.

Lorenz, E. N., 2006: Predictability - a problem partly solved. Predictability of Weather and Climate, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40-58.

Mathworks, T., 2007: MATLAB - the language of technical computing. http://www.mathworks.com/products/matlab/index.html. The MathWorks, Inc.

Meyers, S. D., B. G. Kelly, and J. J. O'Brien, 1993: An introduction to wavelet analysis in oceanography and meteorology: with application to the dispersion of Yanai waves. Mon. Wea. Rev., 121, 2858-2866.

Miller, R., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. J. Atmos. Sci., 51, 1037-1056.

Navon, I., 1998: Practical and theoretical aspects of adjoint parameter estimation and identifiability in meteorology and oceanography. Dyn. Atmos. Oceans, 27, 55-79.

Nelson, A. T., 2000: Nonlinear estimation and modeling of noisy time-series by dual Kalman filtering methods. Ph.D. thesis, Oregon Graduate Institute.

Nørgaard, M., N. K. Poulsen, and O.
Ravn, 2000a: Advances in derivative-free state estimation for nonlinear systems. Tech. rep., Dept. of Mathematical Modeling, Technical University of Denmark., 28 Lyngby, Denmark. N0rgad Magnus, P., N. K. Poulsen, and O. Ravn, 2000b: New developments in state estimation of nonlinear systems. Automatica., 36, 1627-1638. Ohmuro, T., 1984: A decoupled kalman tracker using los coordinates. Proc. Int. Symp. Noise and Clutter Rejection in Radars and Imaging Sensors, IEEE, 451-455. Palmer, T., 1993: Extended-range atmospheric prediction and the lorenz model. Bull. Amer. Meteor. Soc, 74, 49-65. Papoulis, A. and S. Unnikrishna Pillai, 2002: Probability, Random Variables, and Stochastic Processes. 4th ed., McGraw-Hill. Penland, C , 1989: Random forcing and forecasting using principal oscillation pattern analysis. Mon. Wea. Rev., 117, 2165-2185. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2 Edition. Robert, C. P., 1994: The Bayesian Choice: A Decision-Theoretic Motivation. Springer Texts in Statistics, Springer-Verlag, 436 pp. Schei, T. S., 1997: A finite-difference method for linearization in nonlinear estimation algorithms. Automatica., 33, 2053-2058. Schnur, R., G. Schmitz, N. Grieger, and H. von Storch, 1993: Normal modes of the atmosphere as estimated by principal oscillation patterns and derived from quasigeostrophic theory. J. Atmos. Sci., 50, 2386-2400. BIBLIOGRAPHY 82 Simon, D., 2006: Optimal State Estimation, Kalman, H^, and Nonlinear Approaches. 1st ed., Wiley-Interscience, 318-320 pp. Teman, R., 1991: Approximation of attractors, large eddy simulations and multiscale methods. Proc. Roy. Soc. London., 434A, 23-29. Van der Merwe, R., 2004: Sigma-point kalman filters for probabilistic inference in dynamic state-space models. Ph.D. thesis, Oregon Health &; Science University. Van der Merwe, R., A. Doucet, N. de Freitas, and E. 
Wan, 2000: The unscented particle filter. Tech. rep., Cambridge University Engineering Department, 46 pp. [Available at http://citeseer.ist.psu.edu/artile/vandermerwe00unscented. html]. Van der Merwe, R. and E. A. Wan, 2001a: Efficient derivative-free kalman filters for online learning. Proceedings of ESANN, Bruges. Van der Merwe, R. and E. A. Wan, 2001b: The square-root unscented kalman filter for state and parameter estimation. Proceedings of the International Acoustics, Speech, and Signal Processing (ICASSP)., Conference on Salt Lake City, Utah, IEEE. Van der Merwe, R., E. A. Wan, and S. I. Julier, 2004: Sigma-point kalman filters for nonlinear estimation and sensor fusion: Applications to integrated navigation. AIAA Guidance, Navigation and Control Conference, Providence, RI, American Institute of Aeronautics and Astronautics, 5120-5122. Van Trees, H. L., 2001: Detection, Estimation and Modulation Theory. Part I. WileyInterscience, New York, 720 pp. von Storch, H., G. Burger, R. Schnur, and J. S. von Storch, 1995: Principal oscillation patterns: A review. J. Climate, 8, 377-400. BIBLIOGRAPHY 83 von Storch, H. and C. Prankignoul, 1997: Empirical modal decomposition in coastal oceanography. Processes and Methods, K. Brink and A. R. Robinson, Eds., John Wiley and Sons, The Sea: The Global Coastal Ocean I., Vol. 10, 419455. Wallace, J. M., C. Smith, and C. S. Bretherton, 1992: Singular value decompositions of winter time sea surface temperature and 500-mb height anomalies. J. Climate, 5, 561-576. Wan, E. and R. Van Der Merve, 2000: The unscented kalman filter for nonlinear estimation. Adaptive Systems for Signal Processing, Communications, and Control Symposium ASSPCC 2000, Lake Louise, Alberta, Canada, IEEE Communications Society, 153-158. Wan, E. A. and R. Van der Merwe, 2001: Kalman filtering and neural networks, haykin, s, ed. Wiley., chapter 7 - The Unscented Kalman Filter., chapter 7 - The Unscented Kalman Filter. Weare, B. C. and J. 
Nasstrom, 1982: Examples of empirical orthogonal function analysis. Mon. Wea. Rev., 110, 481-485. Welch, G. and G. Bishop, 1995: An introduction to the kalman filter, technical report: Tr95-041. Tech. rep., University of North Carolina, Chapel Hill, NC.