Sigma-point Kalman Filter Data Assimilation Methods For Strongly Nonlinear Dynamical Models

Jaison Thomas Ambadan
B.Sc., Mahatma Gandhi University (India), 2000
M.Sc., Cochin University of Science & Technology (India), 2002

Thesis Submitted In Partial Fulfillment Of The Requirements For The Degree Of Master of Science in Natural Resources and Environmental Studies (Environmental Science)

The University of Northern British Columbia
June 2008

© Jaison Thomas Ambadan, 2008.

ISBN: 978-0-494-48755-6
To my beloved parents

Acknowledgments

It has been my good fortune to encounter many good people who have given me companionship, help, and above all their valuable time during the course of my life at UNBC. First and foremost, I would like to express my sincere gratitude to my supervisor, Dr. Youmin Tang, for his help and guidance. This research work has certainly benefited from his suggestions. I would also like to take this opportunity to acknowledge and thank my graduate supervisory committee members Dr. Peter L. Jackson, Dr. Stephen J. Dery, and Dr. Zhenguo Qiu for the support they have given me over the years. A special word of gratitude to all my co-workers, group members, and my beloved friends and roommates that I have worked with over the years. A special word of gratitude, finally, to all the members of my research group, Dr. Ziwang Deng, Dr. Xiaobing Zhou, Mr. Yanjie Cheng, and Mr. Zhiyu Wang. I also acknowledge and thank the Oregon Graduate Institute, and Dr. Eric A. Wan and Dr. Rudolph Van der Merwe, for providing the ReBEL toolkit, part of which has been used in this research work. This work was supported by an NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant and the CFCAS (Canadian Foundation for Climate and Atmospheric Sciences) network project "Global Ocean-Atmosphere Prediction and Predictability". I am particularly indebted to my parents, my brother, and my sisters for their monumental, unwavering support and encouragement on all fronts.
They have truly always been there for me, and without them none of this would have been even remotely possible.

Prince George, 27th August 2008
Jaison Thomas Ambadan

Abstract

The performance of an advanced derivative-less, sigma-point Kalman filter (SPKF) data assimilation scheme in a strongly nonlinear dynamical model is investigated. The SPKF data assimilation scheme is compared against standard Kalman filters such as the extended Kalman filter (EKF) and the ensemble Kalman filter (EnKF). Three particular cases, namely state estimation, parameter estimation, and joint estimation of states and parameters from a set of discontinuous noisy observations, are studied. The problems associated with the use of the tangent linear model (TLM) or the Jacobian in standard Kalman filters are eliminated when using SPKF data assimilation algorithms. Further, the constraints and issues of SPKF data assimilation in real ocean or atmospheric models are emphasized. A reduced sigma-point subspace approach is proposed and investigated for higher dimensional systems. A low dimensional Lorenz '63 model and a higher dimensional Lorenz '95 model are used as the test-bed for data assimilation experiments. The results of the SPKF data assimilation schemes are compared with those of the standard EKF and EnKF for a highly nonlinear chaotic case. It is shown that the SPKF is capable of estimating the model state and parameters with better accuracy than the EKF and EnKF. Numerical experiments show that in all cases the SPKF gives consistent results with better assimilation skill than the EnKF and EKF, and can overcome the drawbacks associated with their use.

Contents

Acknowledgments
Abstract
1 Introduction
  1.1 Optimal Estimation and Data Assimilation: An Overview
  1.2 Data Assimilation in Nonlinear Systems: A Solution?
  1.3 Research Objectives
  1.4 Contribution of This Research Work
  1.5 Thesis Context and Overview
2 Sequential Data Assimilation Methods
  2.1 Recursive Bayesian Estimation for Nonlinear Systems
  2.2 Extended Kalman Filter
  2.3 Ensemble Kalman Filter
  2.4 Sigma-point Kalman Filters
    2.4.1 Sigma-point Approach
    2.4.2 Sigma-point Unscented Kalman Filter
    2.4.3 Sigma-point Central Difference Kalman Filter
  2.5 A New Ensemble Kalman Filter: Sigma-point Approach
3 Experiments with the Lorenz '63 Model
  3.1 State Estimation
  3.2 Parameter Estimation
  3.3 Joint Estimation of States and Parameters
4 Results and Discussion
  4.1 State Estimation
  4.2 Parameter Estimation
  4.3 Joint Estimation of States and Parameters
5 Reduced Space Sigma-point Kalman Filter
  5.1 SPKF Data Assimilation in Higher Dimensional Systems
    5.1.1 A Subspace Approach with Sigma-points: Design and Implementation
  5.2 Experiments with the Lorenz '95 Model
    5.2.1 Performance and Evaluation
6 Conclusion
A Re-interpretation of the Standard Kalman Gain
B An Alternate Formula for Updating the State Error Covariance Matrix
Bibliography

Chapter 1

Introduction

1.1 Optimal Estimation and Data Assimilation: An Overview

Optimal estimation is the process of determining the state of a system with minimal errors using the information from measured or observed data and a mathematical model. In optimal estimation, it is assumed that all the available information from the numerical model and observations is contaminated with noise, and hence there is uncertainty. If there were no uncertainty, there would be no need for optimal estimation. The uncertainty is due to model errors and measurement or observational errors. The model errors are commonly due to the functional simplifications, linearization, and discretization of the mathematical or numerical model describing the physical phenomena.
The observational or measurement errors are usually associated with sensor and instrument inaccuracy. Optimal estimation is often divided into three classes, namely prediction, filtering, and smoothing, depending on the time at which an estimate is desired relative to the available observed information. Optimal estimation has a wide range of applications, from robotics to artificial intelligence, missile guidance, econometrics, weather forecasting, and so forth. Several optimal estimation methods have been proposed over time. The most commonly used are the method of least squares, maximum likelihood, and Bayesian estimation. The method of least squares, first introduced by Gauss in 1794, is used to optimally fit data using a linear model; the best fit is the one that minimizes the sum of squared residuals. Least Squares Estimation (LSE) is optimal only if the underlying distribution is normal, and the LSE has the minimum variance within the class of linear unbiased estimates [1]. Maximum Likelihood Estimation (MLE), introduced by Fisher in 1922, on the other hand, is generally optimal. The likelihood function is often defined as a probability density function for the observable [2] in terms of which parameters are observed. Bayesian estimation makes use of any prior information about the unknown parameters and the observable; it can simply be considered as an extension of maximum likelihood estimation in which all the available information about the observations and the parameters to be estimated is used. Recursive Bayesian estimation is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming new measurements and a mathematical model. One of the most widely used recursive Bayesian optimal estimation techniques is the Kalman filter.
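The least-squares idea mentioned above can be illustrated with a small numerical sketch. The data, model, and noise level below are invented purely for illustration: a line is fit to noisy observations by minimizing the sum of squared residuals via the normal equations.

```python
import numpy as np

# Toy illustration (not from the thesis): fit y = a*x + b to noisy data by
# least squares, i.e. minimize the sum of squared residuals ||A c - y||^2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)   # "truth": a=2, b=1

A = np.column_stack([x, np.ones_like(x)])           # design matrix
coef = np.linalg.lstsq(A, y, rcond=None)[0]         # least-squares solution
a_hat, b_hat = coef                                 # estimated slope, intercept
```

With moderate noise the recovered slope and intercept land close to the values used to generate the data, which is the sense in which the fit is "optimal" for Gaussian errors.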
Loosely speaking, recursive Bayesian estimation is known in the earth sciences as data assimilation. In meteorology and physical oceanography, data assimilation is a procedure that combines all available observations with information from numerical models to give an optimal estimate of the state of the atmosphere-ocean system. A data assimilation system consists of three major components: a set of observations, a dynamical model, and an assimilation scheme, which is equivalent to an optimal estimation method. The quality of assimilation depends mostly on the assimilation scheme used in the model and the quality of the observations. Generally, all assimilation methods are able to estimate the model state at a given time. Another important task in data assimilation is to use available observations to estimate unknown parameters in a dynamical or an empirical model.

[1] This is the well-known Gauss-Markov (GM) theorem.
[2] In estimation theory, a system is said to be observable if the current state of the system can be determined in time using only the outputs of the system. If a system is not observable, then the current values of its states cannot be determined through observations or measurements.

1.2 Data Assimilation in Nonlinear Systems: A Solution?

Many assimilation techniques have been developed for meteorology and physical oceanography. Methodologies like Optimal Interpolation (OI) and variational methods, in particular the three- and four-dimensional variational data assimilation methods (3D-VAR and 4D-VAR), and Kalman filter based techniques have broad applications in Numerical Weather Prediction (NWP) [Bennet, 1992; Daley, 1991; Ghil and Malanotte-Rizzoli, 1991; Navon, 1998]. In OI (and 3D-VAR) robust estimation is difficult because of the outlier problem, i.e. isolated observations are given more weight than observations close together.
In OI the forecast errors have large correlations at nearby observation points, and when several observations are close together the calculation of weights may be ill-posed. On the other hand, the 4D-VAR assimilation algorithm is expensive [3] to implement in the case of nonlinear systems, although it gives almost the same accuracy as the Kalman filter, since it requires repeated forward integrations with the nonlinear (forecast) model and backward integrations with the tangent linear model (TLM). In addition, writing the adjoint code is the most difficult part of implementing a 4D-VAR assimilation scheme for general circulation models (GCMs), because of their highly nonlinear nature. It has been shown that estimation methods such as the Kalman filter and its derivatives can be used in sequential data assimilation in strongly nonlinear systems such as the atmosphere and the ocean [Miller et al., 1994]. Among sequential data assimilation methods, Kalman filters have been widely used in meteorology and oceanography. The standard Kalman Filter (KF) is a simplification of Bayesian estimation that provides sequential, unbiased, minimum error variance estimates based upon a linear combination of all past measurements and dynamics [Welch and Bishop, 1995]. Since the introduction of the extended Kalman filter (EKF), which is the nonlinear extension of the standard KF, there have been many attempts to use the EKF in weather or climate prediction models. It has been shown that the EKF can be used in sequential data assimilation in strongly nonlinear systems [Miller et al., 1994]. Unfortunately, the requirement of the Jacobian or the tangent linear model (TLM) for the linearization of nonlinear functions limits the use of the EKF for many real world problems.

[3] If the system is linear and Gaussian, 4D-VAR is equivalent to the Kalman filter, and is computationally cheaper.
Another major drawback of the EKF is that it uses only the first order terms of the Taylor expansion of the nonlinear function. It is evident that this approximation often introduces errors in the estimation of covariance matrices in highly nonlinear models [Miller et al., 1994]. In other words, the inaccuracy of the propagated means and covariances resulting from the linearization of the nonlinear model is one of the major drawbacks of the EKF data assimilation algorithm. Another alternative to the standard KF is the Ensemble Kalman Filter (EnKF), introduced by Evensen [1992] (later modified by Houtekamer and Mitchell [1998]), in which the error covariances are estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Planck equation, can be estimated using ensemble integrations [Evensen, 1994, 1997]; the error covariance matrices can be calculated by integrating the ensemble of model states. The EnKF thus overcomes the EKF drawback of neglecting the contributions from higher order statistical moments in calculating the error covariance. The major strengths of the EnKF are: i) there is no need to calculate the tangent linear model, which is extremely difficult to obtain for ocean (or atmospheric) general circulation models (GCMs); ii) the covariance matrix is propagated in time via the fully nonlinear model equations (no linear approximation as in the EKF); and iii) it is well suited to modern parallel computers (cluster computing) [Keppenne, 2000]. However, there are also some disadvantages of using EnKF data assimilation. Firstly, the finite ensemble size has major effects on the performance of the EnKF.
A small ensemble size increases the residual errors and gives inaccurate statistical moments, while a large ensemble size is not computationally feasible in the case of atmospheric or ocean general circulation models. Another disadvantage of the EnKF is that it assumes a linear measurement operator; if the measurement function is nonlinear, it has to be linearized in the EnKF. Nonlinear measurement functions appear in many situations: for example, in the parameter estimation of nonlinear dynamical models, where the measurement relationship between observations and parameters is nonlinear. Similarly, in the case of satellite altimetry data assimilation, the observation (sea level height) is often nonlinearly related to the variable required for assimilation (e.g. temperature). Thus the condition of a linear measurement operator limits the use of the EnKF in some real world problems. The Sigma-Point Kalman Filters (SPKFs) have been proposed in an attempt to address these drawbacks of the EKF and EnKF [Julier et al., 1995; Van der Merwe, 2004; Van der Merwe et al., 2004]. The SPKF is a derivative-less sequential optimal estimation method which uses a novel deterministic sampling approach that eliminates the need for the calculation of the TLM or the Jacobian of the model equations as needed by the standard KF [Haykin, 2001; Ito and Xiong, 2000; Julier et al., 1995; Lefebvre et al., 2002; Nørgaard Magnus et al., 2000b; Van der Merwe, 2004; Wan and Van der Merwe, 2000]. It has been found that the expected error of the sigma-point approach is smaller than that of the truncated Taylor series linearization used in the EKF [Van der Merwe and Wan, 2001a]. The SPKF algorithm has been successfully implemented in many areas such as robotics, artificial intelligence, natural language processing, and global positioning system navigation [Haykin, 2001; Van der Merwe, 2004; Van der Merwe and Wan, 2001a,b; Van der Merwe et al., 2004; Wan and Van der Merwe, 2000].
In this thesis, we will show that the SPKF has great potential for data assimilation problems in meteorology and physical oceanography, and is capable of estimating the model state and parameters with better accuracy than existing methods.

1.3 Research Objectives

Over the last few decades, Kalman filter based data assimilation has become an active area of research in meteorology and physical oceanography. One of the main reasons behind this widespread research is that Kalman filter based optimal estimation has been a great success in other areas of science and engineering. However, the estimation inaccuracy due to the highly nonlinear nature of the dynamics, as well as computational difficulties, limits the use of standard Kalman filters such as the extended Kalman filter and ensemble Kalman filter assimilation schemes in many real world problems. In this thesis, we will show that the SPKF, as an ensemble Kalman filter with a "specific" ensemble, has great potential in the assimilation of nonlinear systems. This thesis is meant to explore the possibility of applying the SPKF in atmospheric and oceanic data assimilation.

Summary of Research Objectives: This study aims to:

• Investigate the performance and capabilities of sigma-point Kalman filter (SPKF) data assimilation schemes in strongly nonlinear dynamical systems for both the model state and parameter estimation problems.
• Study the assimilation performance in the case of higher noise and infrequent observations.
• Compare the assimilation results against existing Kalman filter based data assimilation schemes.
• Explore the possibility of combining the theory of the ensemble Kalman filter and sigma-point Kalman filters.
• Explore possibilities to overcome the drawbacks or constraints of the SPKF data assimilation scheme, if any.
1.4 Contribution of This Research Work

During the course of this thesis work, most of the research objectives stated in the previous section were successfully completed. The following concrete and substantial contributions are made to the body of knowledge regarding SPKF data assimilation schemes and their application to highly nonlinear dynamical models.

Summary of Research Contributions:

• Implemented the SPKF method in three classes of data assimilation problems, namely state estimation, parameter estimation, and joint estimation (of states and parameters) from a set of discontinuous and noisy observations.
• Demonstrated that the SPKF is capable of estimating the model state and parameters with better accuracy and skill than the EKF and the EnKF.
• Unified the theoretical concepts of the sigma-point Kalman filter and the ensemble Kalman filter, and identified the sigma-points as deterministic or specific ensembles.
• Derived a new algorithm for the ensemble Kalman filter based on the sigma-point approach, which does not require a linearized measurement operator.
• Proposed and investigated a new reduced subspace sigma-point Kalman filter data assimilation scheme for higher dimensional systems.

1.5 Thesis Context and Overview

This thesis describes the possibility of using an advanced derivative-less optimal estimation technique in highly nonlinear dynamical models. The thesis is outlined as follows: Chapter 1 (Introduction) gives a brief introduction to general optimal estimation, followed by optimal estimation problems in the earth sciences, commonly known as data assimilation. The key issues in data assimilation methods are discussed, along with the need for more versatile data assimilation schemes. Chapter 2 (Sequential Data Assimilation Methods) introduces the concept of sequential data assimilation methods, and describes the details of commonly used Bayesian methods.
This chapter gives an overview of existing data assimilation methods such as the Extended Kalman Filter (EKF) and the Ensemble Kalman Filter (EnKF), as well as a review of their drawbacks. A new derivative-less optimal estimation method, known as the sigma-point Kalman filter, and the sigma-point approach are introduced in detail. The sigma-point approach forms one of the key components of the SPKF framework. This chapter explains the algorithmic details of two main classes of sigma-point Kalman filters, the sigma-point unscented Kalman filter (SPUKF) and the sigma-point central difference Kalman filter (SP-CDKF). It also explains the unification of sigma-points and ensembles, from which a new ensemble Kalman filter (EnKF) algorithm is derived. Chapter 3 (Data Assimilation Experiments with the Lorenz '63 Model) introduces the Lorenz '63 model used in this research, followed by a number of experiments in which the data assimilation schemes described in Chapter 2 are applied to the Lorenz '63 model. The implementation of three particular cases, namely the state, parameter, and joint estimation of states and parameters from a set of discontinuous noisy observations, is described in detail. Chapter 4 (Results and Discussion) gives the details of the experimental results, and compares them against standard Kalman filters such as the extended Kalman filter (EKF) and the ensemble Kalman filter (EnKF) schemes. Both parameter and state estimation results are discussed in detail. Chapter 5 (A Reduced Sigma-point Subspace Approach) emphasizes the constraints and issues of SPKF data assimilation in higher dimensional models. A reduced sigma-point subspace model is proposed and investigated for higher dimensional systems. Chapter 6 (Conclusion) consolidates the thesis with a detailed discussion of the presented research topics, and provides some suggestions for future research.
Appendix A and Appendix B give detailed interpretations and derivations of the Kalman gain term and the covariance update equation in the standard Kalman filter equations.

Chapter 2

Sequential Data Assimilation Methods

In general, data assimilation methods can be classified into two categories: sequential methods and variational methods. Sequential methods integrate the model forward in time with forcing terms which push the model towards the observations. Variational methods, such as three dimensional variational assimilation (3D-VAR) and four dimensional variational assimilation (4D-VAR), seek to minimize a cost function which measures the error between the model and the observations over the assimilation period. In sequential methods the observations are essentially blended with the model, while in variational methods the analysis is a valid trajectory of the model. One of the most widely used sequential data assimilation methods is the Kalman filter, which is based on recursive Bayesian estimation. In this chapter we take a short tour of basic estimation theory and then introduce the theory of sigma-point Kalman filters and their variants in detail, which will serve as the theoretical background for the remainder of the thesis. The basic theory presented in this chapter is mainly based on the books of Papoulis and Unnikrishna Pillai [2002], Simon [2006], Van Trees [2001], Robert [1994], Bergman [1997], Anderson and Moore [1979], and Jazwinsky [1970].

2.1 Recursive Bayesian Estimation for Nonlinear Systems

The main objective of an estimation problem is to extract information about the states of a physical system given observations of the states.
In the Bayesian approach, both the system states and the measurements are treated as stochastic quantities, and the inference about the estimate is a conditional density function of the states given the measurement or observation outcome. In Bayesian estimation theory the state vector (say $\theta$), which describes the physical states of the system, and the measurement vector (say $\psi$), which is the outcome of the measurement experiment or instrumental observations, are often considered as random vectors. Our objective is to estimate the current state of the system using all the past observations. In the Bayesian approach, all the statistical information about the current state is condensed in the conditional density $p(\theta|\psi)$ given by

$$p(\theta|\psi) = \frac{p(\psi|\theta)\,p(\theta)}{p(\psi)} \quad (2.1)$$

where $p(\psi|\theta)$ is the probability density function of the measurement vector and $p(\theta)$ is the prior density function, which is assumed to be known. $p(\theta|\psi)$ is also known as the posterior density function, which describes all statistical information about the random state vector $\theta$ after the observation process. $p(\psi)$ is given by

$$p(\psi) = \int p(\psi|\theta)\,p(\theta)\,d\theta \quad (2.2)$$

The joint density function of the states and the observations, $p(\psi|\theta)\,p(\theta)$, defines the estimation problem, while the posterior density function $p(\theta|\psi)$ is regarded as the solution to the estimation problem. In practice, it is difficult to find the complete posterior density. However, it is desirable to determine an "optimal" estimate $\hat{\theta}$ of the model state using a mathematical rule or function (known as the estimator). In most cases the cost function, which represents the erroneous estimate, depends only on the estimation error (the difference between the true state and the optimal estimate), given by

$$\tilde{\theta} = \theta - \hat{\theta} \quad (2.3)$$

The two most commonly used optimal estimators are the minimum mean-squared error estimate and the maximum a-posteriori estimate, which under some conditions is equivalent to the maximum likelihood estimate.
In the Bayesian framework it is assumed that the posterior density is symmetric about its mean, hence the optimal Bayesian estimator is the conditional mean. The optimal estimate for the minimum mean-squared estimator is given by

$$\hat{\theta}_{MS} = \int \theta\, p(\theta|\psi)\, d\theta \quad (2.4)$$

Similarly, the maximum a-posteriori estimate is given by

$$\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta|\psi) \quad (2.5)$$

Another quantity which gives information about the goodness of the optimal estimate is the mean-squared error covariance matrix:

$$P = E\left[(\theta - \hat{\theta})(\theta - \hat{\theta})^{T}\right] \quad (2.6)$$

The error covariance can be computed directly from the posterior density:

$$P = \int (\theta - \hat{\theta})(\theta - \hat{\theta})^{T}\, p(\theta|\psi)\, d\theta \quad (2.7)$$

Therefore the mean-square estimate and its estimation error covariance are the first and second central moments of the posterior density $p(\theta|\psi)$, respectively. In the case of recursive estimation, the information about the system state is updated continuously as new measurements are collected. This recursive analysis of observations is suitable for meteorology and physical oceanography problems where the states of the physical system change with time. In recursive estimation the states evolve in time according to a Markov process with initial states $\theta_0 \sim p(\theta_0)$ and the transition kernel, which is the transition function of the stochastic process, given by

$$p(\theta_{k+1}|\theta_k), \quad k = 0, 1, \ldots \quad (2.8)$$

The Markovian property implies that, given the present state, the future states are independent of the past states. The measurement of the state at time $k$ is conditionally independent of the previous measurements given the present state. The prior $p(\theta_0)$, the transition kernel $p(\theta_{k+1}|\theta_k)$, and the likelihood $p(\psi_k|\theta_k)$ define the joint density of all measurements and states from time zero to time $k$.
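As a concrete illustration of the estimators above, the following sketch discretizes a one-dimensional posterior on a grid and evaluates the minimum mean-squared estimate (2.4), the MAP estimate (2.5), and the scalar error variance (2.7) numerically. The Gaussian posterior used here is invented purely for demonstration.

```python
import numpy as np

# Sketch (illustrative posterior): for a posterior density p(theta|psi) on a
# grid, the MMSE estimate is the posterior mean, the MAP estimate is the
# posterior mode, and the error variance is the second central moment.
theta = np.linspace(-5.0, 5.0, 2001)
d_theta = theta[1] - theta[0]

# Assumed posterior: Gaussian with mean 1.2 and standard deviation 0.7.
post = np.exp(-0.5 * ((theta - 1.2) / 0.7) ** 2)
post /= post.sum() * d_theta                    # normalize to a density

theta_ms = np.sum(theta * post) * d_theta       # MMSE estimate, Eq. (2.4)
theta_map = theta[np.argmax(post)]              # MAP estimate, Eq. (2.5)
p_err = np.sum((theta - theta_ms) ** 2 * post) * d_theta  # variance, Eq. (2.7)
```

For this symmetric unimodal posterior the mean and mode coincide, and the error variance recovers the squared posterior spread, illustrating the remark that the MMSE estimate and its error covariance are the first and second central moments of the posterior.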
Applying Bayes' rule to the last measurement vector $\psi_k$ gives

$$p(\theta_k|\psi_{1:k}) = \frac{p(\psi_k|\theta_k)\, p(\theta_k|\psi_{1:k-1})}{p(\psi_k|\psi_{1:k-1})} \quad (2.9)$$

$$p(\psi_k|\psi_{1:k-1}) = \int p(\psi_k|\theta_k)\, p(\theta_k|\psi_{1:k-1})\, d\theta_k \quad (2.10)$$

Here we make use of the Markovian property that the measurement vector $\psi_k$ is conditionally independent of the previous measurements given the present state. The expressions (2.9)-(2.10) are known as the measurement update in recursive Bayesian estimation. Similarly, by assuming the state evolution is Markovian and $\theta_{k+1}$ is independent of the past measurements $\psi_{1:k}$ given the present state $\theta_k$, the effect of a time step is given by

$$p(\theta_{k+1}, \theta_k|\psi_{1:k}) = p(\theta_{k+1}|\theta_k, \psi_{1:k})\, p(\theta_k|\psi_{1:k}) = p(\theta_{k+1}|\theta_k)\, p(\theta_k|\psi_{1:k}) \quad (2.11)$$

Integrating (2.11) with respect to $\theta_k$ gives the time update equation of recursive Bayesian estimation, also known as the Chapman-Kolmogorov equation:

$$p(\theta_{k+1}|\psi_{1:k}) = \int p(\theta_{k+1}|\theta_k)\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.12)$$

Equations (2.9) and (2.12) form the recursive propagation of the posterior density of the states given the measurements, and the recursion is initiated by $p(\theta_0|\psi_{1:0}) = p(\theta_0)$. The estimates corresponding to the minimum mean-squared estimator and the maximum a-posteriori estimator are given by

$$\hat{\theta}_k^{MS} = \int \theta_k\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.13)$$

$$\hat{\theta}_k^{MAP} = \arg\max_{\theta_k}\, p(\theta_k|\psi_{1:k}) \quad (2.14)$$

The corresponding estimation error covariance is given by

$$P_{k|k} = \int_{\mathbb{R}^n} (\theta_k - \hat{\theta}_k)(\theta_k - \hat{\theta}_k)^{T}\, p(\theta_k|\psi_{1:k})\, d\theta_k \quad (2.15)$$

In general, there is no explicit analytical solution to the integrals (2.10) and (2.12), especially in the case of multidimensional nonlinear physical systems with non-Gaussian elements. If the state transition and the measurement relation are linear, and the noises acting on the system are Gaussian, then it is possible to solve the recursive Bayesian estimation problem analytically, and the solution is known as the celebrated Kalman filter.
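The recursion formed by the measurement update (2.9)-(2.10) and the Chapman-Kolmogorov time update (2.12) can be sketched numerically for a one-dimensional system by representing the densities on a grid. The random-walk transition and Gaussian likelihood below are illustrative assumptions, not a model from the thesis.

```python
import numpy as np

# Grid-based sketch of recursive Bayesian estimation (Eqs. 2.9-2.12) for an
# assumed scalar random-walk state with Gaussian observation noise.
rng = np.random.default_rng(1)
grid = np.linspace(-10.0, 10.0, 801)
dg = grid[1] - grid[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

q_std, r_std = 0.3, 1.0               # process / measurement noise (assumed)
post = gauss(grid, 0.0, 2.0)          # prior p(theta_0)

truth = 0.0
for _ in range(20):
    truth += q_std * rng.standard_normal()
    obs = truth + r_std * rng.standard_normal()

    # Time update, Eq. (2.12): convolve the posterior with the transition kernel.
    kernel = gauss(grid, 0.0, q_std)
    prior = np.convolve(post, kernel, mode="same") * dg

    # Measurement update, Eqs. (2.9)-(2.10): multiply by the likelihood and
    # renormalize (the integral in the denominator is the normalizing constant).
    post = prior * gauss(obs, grid, r_std)
    post /= post.sum() * dg

estimate = np.sum(grid * post) * dg   # MMSE estimate of the final state
```

This brute-force grid approach is exactly what becomes infeasible in high dimensions, which motivates the Kalman filter family discussed next.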
However, many geophysical systems are nonlinear with non-Gaussian components, and it is practically impossible to find analytical solutions for the optimal estimators and estimation error covariances. In general there are two ways to tackle this problem. The first is to simplify the dynamical model so that an analytical solution is obtained. In this case, the nonlinear model equations are expanded in a Taylor series around the mean states, assuming a Gaussian distribution [Jazwinsky, 1970; Simon, 2006; Van Trees, 2001]. This approach is commonly known as the Extended Kalman Filter (EKF), which we will discuss in the next subsection. The second way is to approximate the posterior distribution for non-Gaussian systems [Doucet, 1998; Simon, 2006]. This is usually done with a weighted sum of Gaussian distributions, and is one of the most common approaches in Sequential Importance Re-sampling (SIR) based particle filters [Arulampalam et al., 2002; Doucet, 1998; Doucet et al., 1997; Gordon et al., 1993] and Gaussian-Sum Filters (GSF) [Kotecha and Djuric, 2001, 2003]. In a broad sense the sigma-point Kalman filters use the same approach, which we will describe in detail in later sections.

2.2 Extended Kalman Filter

The basic Kalman Filter (KF) is a simplification of Bayesian estimation for linear Gaussian systems that provides sequential, unbiased, minimum error variance estimates based upon a linear combination of all past measurements and dynamics [Welch and Bishop, 1995]. The basic Kalman filter algorithm is limited to linear models. However, most of the atmospheric and ocean dynamical equations are nonlinear. In such situations, a Kalman filter that is linearized about the current mean and covariance, referred to as an extended Kalman filter (EKF), is used.
To formulate the extended Kalman filter equations, consider an L-dimensional dynamical system represented by a set of discretized state-space equations,

θ_k = f(θ_{k-1}, q_{k-1}) (2.16)
ψ_k = h(θ_k, r_k) (2.17)

where θ_k represents the system state vector at time k, f(·) is the nonlinear function of the state, q_k is the random (white) model error, ψ_k is the measured state, h(·) is the measurement function, and r_k is the zero-mean random measurement noise. Also, we assume that the state, observations, and noise terms are Gaussian processes. To solve the estimation problem, we must determine the conditional probability density function p(θ_k | ψ_{1:k}), which conveys all the information about the state, using (2.9). Since we are assuming that this probability density function is Gaussian, we only need to calculate its mean and covariance. The mean (estimate) and the covariance can be obtained using the extended Kalman filter for the nonlinear system (2.16)-(2.17), which consists of a time update (or prediction) via

θ̂_k^- = f(θ̂_{k-1}, q_{k-1}) (2.18)
P_k^- = Φ_k P_{k-1} Φ_k^T + Q_k (2.19)

and a measurement update (or correction) via

θ̂_k = θ̂_k^- + K_k [ψ_k − h(θ̂_k^-)] (2.20)
P_k = [I − K_k H_k] P_k^- [I − K_k H_k]^T + K_k R_k K_k^T (2.21)

where P_k is the state error covariance matrix and K_k is the Kalman gain given by

K_k = P_k^- H_k^T [H_k P_k^- H_k^T + R_k]^{-1} (2.22)

The superscript "−" represents the prior states, given by

θ̂_k^- = E[f(θ̂_{k-1}, q_{k-1})] (2.23)
ψ̂_k^- = E[h(θ̂_k^-, r_k)] (2.24)
= H θ̂_k^- (2.25)

where E[·] represents the mathematical expectation or expected value, and (2.25) holds for the linearized measurement operator H. Q_k and R_k are the model and observation error covariance matrices, respectively. The matrices Φ_k and H_k are the Jacobians of f(·) and h(·), evaluated at the current state estimate. In the following, the nonlinear system is again written in discretized state-space form as

θ_k = f(θ_{k-1}, q_{k-1}) (2.31)
ψ_k = h(θ_k, r_k) (2.32)

where θ_k represents the system state vector at time k, f(·) is the nonlinear function of the state, q_k is the random (white) model error, ψ_k is the measured state, h(·) is the measurement function, and r_k is the zero-mean random measurement noise.
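One EKF cycle, Eqs. (2.18)-(2.22), can be sketched for a hypothetical scalar model; the functions f and h, their Jacobians, and all numerical values below are illustrative assumptions, not the thesis configuration. The covariance update uses the symmetric (Joseph) form (2.21).

```python
import numpy as np

# One EKF cycle for a scalar nonlinear model theta_k = f(theta) + q,
# psi_k = h(theta) + r.  The toy functions are illustrative only.
f = lambda th: np.sin(th)            # nonlinear dynamics
h = lambda th: th ** 2               # nonlinear measurement
F = lambda th: np.cos(th)            # Jacobian of f (Phi_k)
H = lambda th: 2.0 * th              # Jacobian of h (H_k)

theta, P = 1.0, 0.5                  # previous analysis and its variance
Q, R = 0.01, 0.1                     # model and observation error variances
psi_obs = 0.9                        # illustrative observation

# time update (2.18)-(2.19)
theta_f = f(theta)
P_f = F(theta) * P * F(theta) + Q

# measurement update (2.20)-(2.22), Joseph form for the covariance (2.21)
Hk = H(theta_f)
K = P_f * Hk / (Hk * P_f * Hk + R)
theta_a = theta_f + K * (psi_obs - h(theta_f))
P_a = (1.0 - K * Hk) * P_f * (1.0 - K * Hk) + K * R * K
```

The analysis variance P_a is smaller than the forecast variance P_f, reflecting the information gained from the observation, and the state is pulled toward the observation through the innovation ψ_k − h(θ̂_k^-).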
To understand how the sigma-point algorithm works, it is convenient to rewrite the standard Kalman filter optimal state update equation

θ̂_k = θ̂_k^- + K_k (ψ_k − ψ̂_k^-) (2.33)

where the superscript "−" represents the prior states. Here K_k is the Kalman gain, which is optimally chosen such that it minimizes the weighted scalar sum of the diagonal elements of the error covariance matrix P_{θ_k} [Gelb, 1974]. The standard expressions for the Kalman gain and the forecast error covariance matrix are

K_k = P_{θ_k}^- H^T [H P_{θ_k}^- H^T + R]^{-1} (2.34)
P_{θ_k}^- = E[(θ_k − θ̂_k^-)(θ_k − θ̂_k^-)^T] (2.35)

where H is the linearized measurement operator. The error covariance update, or analysis covariance matrix, which represents the change in the forecast error covariance when a measurement is employed, is given by

P_{θ_k} = (I − K_k H) P_{θ_k}^- (2.36)

where I is the identity matrix. For the EKF, the forecast error covariance is given by

P_{θ_k}^- = A_{k-1} P_{θ_{k-1}} A_{k-1}^T + Q_{k-1} (2.37)

where A_{k-1} is the TLM of the nonlinear model (2.31)-(2.32) and Q_{k-1} is the model error covariance matrix. The TLM often introduces errors in highly nonlinear models and is extremely difficult to obtain for GCMs. Another drawback of the EKF is that it uses the linearized measurement operator H to calculate the Kalman gain and to update the error covariance. The linearization of a nonlinear measurement function is computationally difficult and may result in estimation errors. On the other hand, the forecast error covariance for the EnKF can be calculated by integrating an ensemble of model states, and is given by

P_{θ_k}^- = (θ_k − θ̄_k)(θ_k − θ̄_k)^T (2.38)

where the overline denotes the ensemble average. A limited ensemble size often introduces errors in approximating the error covariance matrix. The EnKF also uses the linearized measurement operator H to calculate the Kalman gain; if the model involves nonlinear measurement functions, linearizing them may result in estimation errors.
The SPKF family addresses the above issues of the EKF and EnKF. It uses a different approach to calculating the Kalman gain and the error covariance matrices. The technique employed in the SPKF is to re-interpret the standard Kalman gain and covariance update equations in such a way that neither the TLM nor the linearized measurement operator is needed. This interpretation is explained as follows. The first term, P_{θ_k}^- H^T, in the Kalman gain equation (2.34) can be interpreted as the cross-covariance, P_{θ_k ψ_k}, between the state and observation errors, and the remaining expression can be interpreted as the error covariance, P_{ψ̃_k}, of the difference between model and observation [Gelb, 1974]. A proof of this interpretation can be found in Appendix A; a more detailed statistical derivation and interpretation of these formulations can be found in Simon [2006]. Therefore the optimal gain, or Kalman gain, K_k, can be re-written as

K_k = P_{θ_k ψ_k} P_{ψ̃_k}^{-1} (2.39)

Here ψ̃_k is defined as the error between the noisy observation ψ_k and its prediction ψ̂_k^-, i.e., ψ̃_k = ψ_k − ψ̂_k^-. By using the relation (2.39), the covariance update equation (2.36) can be re-written as (see Appendix B for details)

P_{θ_k} = P_{θ_k}^- − K_k P_{ψ̃_k} K_k^T (2.40)

Unlike the standard KF, the SPKF algorithm makes use of this new interpretation (Eqs. 2.39 & 2.40), which avoids the use of the Jacobian while retaining consistency and accuracy. In the standard KF the state error covariance is calculated during the "time update" process and is updated during the "measurement update" process. Updating the error covariance matrix is important since it represents the change in the forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation since it can calculate the updated error covariance matrix directly from a set of ensembles. Evensen [2003] has derived the analysis covariance equation, which is consistent with the standard KF error covariance update
equation (2.36). But a true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible. The SPKF makes use of the reformulated error covariance update equation (2.40) and "chooses" the ensembles deterministically, in such a way that the statistical moments of the nonlinear model are captured accurately; in other words, the forecast error covariance (2.35) is computed using deterministically chosen samples, called sigma points. In a broad sense, the SPKF algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus the SPKF is fully consistent with the "time update" and "measurement update" formulation of the Kalman filter algorithm. In the next subsections we discuss each SPKF algorithm in detail.

2.4.2 Sigma-point Unscented Kalman Filter

The Sigma-Point Unscented Kalman Filter (SP-UKF) [Julier, 1998; Julier et al., 1995; Wan and Van Der Merve, 2000] is an SPKF that captures the statistical properties of the model state through a method known as the scaled unscented transformation (SUT) [Julier, 2002]. Unlike the EKF, the SP-UKF uses the true nonlinear model and approximates the state distribution using a set of deterministically chosen states, known as sigma points, generated by the SUT. In the SP-UKF the state error covariance matrix is calculated from a set of "particular ensembles" that are generated by "sigma points". Julier et al. [1995] have shown that for the nonlinear model given by (2.31), the number of sigma points needed to compute precisely the mean and covariance of the model state at time k is 2L + 1. The sigma-point state vector is given by [Julier, 2002; Julier et al., 1995; Wan and Van Der Merve, 2000]

X_k = [X_{k,0}  X_{k,i}^+  X_{k,i}^-],  i = 1, ..., L (2.41)

where X_{k,0}, X_{k,i}^+, and X_{k,i}^- are the sigma-point vectors.
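When the measurement operator is in fact linear, the reinterpreted gain and covariance update (2.39)-(2.40) reduce exactly to the standard forms (2.34) and (2.36). A small numerical check of this equivalence is sketched below; the random matrices and dimensions are illustrative only.

```python
import numpy as np

# For a linear measurement psi = H theta + r, the reformulated gain
# K = P_{theta psi} P_{psi~}^{-1} (2.39) equals the standard
# K = P H^T (H P H^T + R)^{-1} (2.34), and the covariance updates
# (2.40) and (2.36) agree.  Small numerical check:
rng = np.random.default_rng(0)
L, Lo = 3, 2
A = rng.standard_normal((L, L))
P = A @ A.T + np.eye(L)              # prior covariance P_theta^-
H = rng.standard_normal((Lo, L))     # linear measurement operator
R = 0.2 * np.eye(Lo)                 # observation error covariance

# cross-covariance and innovation covariance implied by linearity
P_xy = P @ H.T                       # P_{theta psi}
P_yy = H @ P @ H.T + R               # P_{psi~}

K_new = P_xy @ np.linalg.inv(P_yy)                  # (2.39)
K_std = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # (2.34)

# updated covariances: reformulated (2.40) versus standard (2.36)
P_new = P - K_new @ P_yy @ K_new.T
P_std = (np.eye(L) - K_std @ H) @ P
```

The algebra behind the check: K P_{ψ̃} K^T = P_{θψ} P_{ψ̃}^{-1} P_{θψ}^T = K H P, so (2.40) and (2.36) coincide term by term in the linear case; the SPKF simply evaluates P_{θψ} and P_{ψ̃} from samples instead of from H.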
The selection scheme for choosing the sigma points is based on the scaled unscented transformation, which transforms the model state vector according to the following equations:

X_{k,0} = θ̂_k
X_{k,i} = θ̂_k + (√((L + λ) P_{θ_k}))_i,  i = 1, ..., L
X_{k,i} = θ̂_k − (√((L + λ) P_{θ_k}))_{i−L},  i = L + 1, ..., 2L (2.42)

w_0^{(m)} = λ / (L + λ)
w_0^{(c)} = λ / (L + λ) + (1 − α² + β) (2.43)

w_i^{(m)} = w_i^{(c)} = 1 / (2(L + λ)),  i = 1, ..., 2L (2.44)

where (√((L + λ) P_{θ_k}))_i is the ith row (or column) of the weighted matrix square root of the covariance matrix P_{θ_k}, w_i^{(m)} is the weighting term corresponding to the mean, and w_i^{(c)} corresponds to the covariance. λ = α²(L + κ) − L is a scaling parameter. The parameter α is set to a small positive value (0 < α ≤ 1) and determines the spread of the sigma points around the mean state θ̂_k. κ is another control parameter, which guarantees the positive semi-definiteness of the covariance matrix and is usually set to a non-negative value (κ ≥ 0). β is a non-negative weighting term which can be used to incorporate any prior knowledge of the nature of the state distribution.

The sigma-point vector is then propagated through the nonlinear model (2.31)-(2.32):

X_{k,i}^θ = f(X_{k-1,i}^θ, X_{k-1,i}^q) (2.45)
Y_{k,i} = h(X_{k,i}^θ, X_{k,i}^r) (2.46)

and the prior mean, the measurement prediction, and the covariance and cross-covariance terms needed for the Kalman gain are computed as weighted sums over the propagated sigma points:

θ̂_k^- = Σ_{i=0}^{2L} w_i^{(m)} X_{k,i}^θ (2.47)
ψ̂_k^- = Σ_{i=0}^{2L} w_i^{(m)} Y_{k,i} (2.48)
P_{θ_k}^- = Σ_{i=0}^{2L} w_i^{(c)} (X_{k,i}^θ − θ̂_k^-)(X_{k,i}^θ − θ̂_k^-)^T (2.49)
P_{ψ̃_k} = Σ_{i=0}^{2L} w_i^{(c)} (Y_{k,i} − ψ̂_k^-)(Y_{k,i} − ψ̂_k^-)^T (2.50)
P_{θ_k ψ_k} = Σ_{i=0}^{2L} w_i^{(c)} (X_{k,i}^θ − θ̂_k^-)(Y_{k,i} − ψ̂_k^-)^T (2.51)

The Kalman gain, K_k, can then be calculated using equation (2.39), and the state covariance is updated using equation (2.40). A detailed description and derivation of the UKF algorithm and sigma-point formulation can be found in the literature referred to above.

2.4.3 Sigma-point Central Difference Kalman Filter

Unlike the SP-UKF, the SP-CDKF is based on Stirling's interpolation formulae [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000b; Press et al., 1992]. In the SP-CDKF the analytical derivatives of the EKF are replaced by numerically evaluated central divided differences. The linearization in the SP-CDKF is based on weighted statistical linear regression [Lefebvre et al., 2002].
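The SUT selection scheme (2.42)-(2.44) described above can be sketched directly. The helper below generates the 2L + 1 sigma points and verifies that the weighted sums (2.47) and (2.49) recover the prescribed mean and covariance exactly; the particular mean, covariance, and parameter values are illustrative assumptions.

```python
import numpy as np

# Scaled unscented transformation (2.42)-(2.44): generate 2L+1 sigma
# points from a mean and covariance and check that the weighted sample
# moments recover them exactly.
def sut_points(mean, cov, alpha=0.1, beta=2.0, kappa=0.0):
    L = mean.size
    lam = alpha ** 2 * (L + kappa) - L           # scaling parameter
    S = np.linalg.cholesky((L + lam) * cov)      # weighted matrix square root
    X = np.column_stack([mean] + [mean + S[:, i] for i in range(L)]
                        + [mean - S[:, i] for i in range(L)])
    wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    wc = wm.copy()
    wm[0] = lam / (L + lam)
    wc[0] = lam / (L + lam) + (1.0 - alpha ** 2 + beta)
    return X, wm, wc

mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.1],
                [0.0, 0.1, 0.5]])
X, wm, wc = sut_points(mean, cov)

mean_rec = X @ wm                                # weighted mean, cf. (2.47)
diff = X - mean_rec[:, None]
cov_rec = (wc * diff) @ diff.T                   # weighted covariance, cf. (2.49)
```

The recovery is exact because the symmetric ± columns of the weighted Cholesky factor reproduce the second moment by construction; the nonlinearity enters only when the points are pushed through f(·) and h(·) in (2.45)-(2.46).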
It has been shown that the SP-CDKF has superior performance to the SP-UKF and EKF [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000b]³. Using Stirling polynomial interpolation, the nonlinear model given by equation (2.31) can be approximated as

f(θ_k) ≈ f(φ_k) + D̃_k + (1/2!) D̃_k² (2.52)

where f(φ_k) is the linearized model, and D̃_k and D̃_k² are the first- and second-order central divided difference operators, which we explain later in this section. Here the linearization of the nonlinear model is achieved by using a linear transformation that statistically decouples⁴ the state vector θ_k [Schei, 1997]. It has been shown that this approximation is always better than using the Jacobian matrix [Schei, 1997]. The linear transformation is based on the square-root factorization of the model covariance matrix, and is given by

φ_k = S_{θ_k}^{-1} θ_k (2.53)
f̃(φ_k) = f(S_{θ_k} φ_k) = f(θ_k) (2.54)

Here θ̂_k is the mean state and S_{θ_k} is the Cholesky factor of the updated error covariance matrix (2.40), which satisfies the condition

P_{θ_k} = S_{θ_k} S_{θ_k}^T (2.55)

The operators D̃_k and D̃_k² are the first- and second-order central divided difference operators and can be written as follows [Ito and Xiong, 2000; Nørgaard Magnus et al., 2000a; Van der Merwe and Wan, 2001a; Wan and Van der Merwe, 2001].

³But our numerical experiments show that the SP-CDKF does not always outperform the SP-UKF; see the discussion below.
⁴The linear transformation from the stochastic vector φ_k to θ_k, θ_k = S_{θ_k} φ_k, decouples the fully coupled state vector θ_k so that the covariance of φ_k equals the identity matrix. For computational reasons the square-root matrix S_{θ_k} is often kept triangular (Cholesky decomposition). More details on decoupling and its advantages in Kalman filters can be found in Ohmuro [1984], Baheti et al.
[1990], and Daum and Fitzgerald [1983].

D̃_k = (1/δ) ( Σ_{j=1}^{L} Δφ_{k,j} μ_j δ_j ) f(φ̂_k) (2.56)
D̃_k² = (1/δ²) ( Σ_{j=1}^{L} Δφ_{k,j}² δ_j² + Σ_{j=1}^{L} Σ_{q=1, q≠j}^{L} Δφ_{k,j} Δφ_{k,q} (μ_j δ_j)(μ_q δ_q) ) f(φ̂_k) (2.57)

where δ_j and μ_j are the partial central difference and average operators, Δφ_k = φ_k − φ̂_k, and δ is the central difference half-step size. The corresponding sigma-point weights are

w_0^{(m)} = (δ² − L) / δ²
w_i^{(m)} = 1 / (2δ²),  i = 1, ..., 2L
w_i^{(c1)} = 1 / (4δ²),  i = 1, ..., 2L
w_i^{(c2)} = (δ² − 1) / (4δ⁴),  i = 1, ..., 2L (2.58)-(2.62)

The augmented sigma-point vectors are then propagated through the approximated nonlinear model (2.52), and the approximated mean model state vector can be computed as

θ̂_k^- ≈ E[ f(φ_k) + D̃_k + (1/2!) D̃_k² ] (2.63)
= ((δ² − L)/δ²) X_{k,0} + (1/(2δ²)) Σ_{i=1}^{2L} X_{k,i} (2.64)
= Σ_{i=0}^{2L} w_i^{(m)} X_{k,i} (2.65)

Similarly, the measurement prediction, the forecast covariance, and the cross-covariance for the calculation of the Kalman gain are given by

ψ̂_k^- ≈ Σ_{i=0}^{2L} w_i^{(m)} Y_{k,i} (2.66)
P_{θ_k}^- ≈ Σ_{i=1}^{L} [ w_i^{(c1)} (X_{k,i} − X_{k,L+i})(X_{k,i} − X_{k,L+i})^T + w_i^{(c2)} (X_{k,i} + X_{k,L+i} − 2X_{k,0})(X_{k,i} + X_{k,L+i} − 2X_{k,0})^T ] (2.67)
P_{ψ̃_k} ≈ Σ_{i=1}^{L} [ w_i^{(c1)} (Y_{k,i} − Y_{k,L+i})(Y_{k,i} − Y_{k,L+i})^T + w_i^{(c2)} (Y_{k,i} + Y_{k,L+i} − 2Y_{k,0})(Y_{k,i} + Y_{k,L+i} − 2Y_{k,0})^T ] (2.68)-(2.69)
P_{θ_k ψ_k} = √(w_1^{(c1)} P_{θ_k}^-) [Y_{k,1:L} − Y_{k,L+1:2L}]^T (2.70)

One main advantage of the SP-CDKF over the SP-UKF is that it uses only one control parameter (δ), compared to three (λ, α, and κ) in the UKF. For exact derivations and algorithmic details see Ito and Xiong [2000], Nørgaard Magnus et al. [2000a], Van der Merwe and Wan [2001a], and Wan and Van der Merwe [2001].

2.5 A New Ensemble Kalman Filter: Sigma-point Approach

In this section we extend the sigma-point approach to the traditional ensemble Kalman filter (EnKF). In the SPKF the forecast error covariance matrix is computed using deterministically chosen samples, called sigma points. In general we can consider these sigma points as specific conditional ensembles; i.e., the "ensemble" (the sigma points) is chosen using the scaled unscented transformation (SUT) for the SP-UKF and Stirling's approximation formula for the SP-CDKF. In other words, we can consider the ensemble Kalman filter as a general sigma-point Kalman filter without a specific selection scheme for choosing the ensembles.
The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, described by the Fokker-Planck equation [Jazwinsky, 1970], can be estimated using ensemble integrations [Evensen, 1994, 1997]; i.e., the error covariance matrices can be calculated by integrating an ensemble of model states. Thus the forecast error covariance matrix given by the Kalman filter equations can be rewritten as

P^f = P_{θ_k}^- = (θ_k − θ̄_k)(θ_k − θ̄_k)^T (2.71)

where P^f is the forecast covariance matrix from an ensemble of model forecast states and the overline denotes the ensemble average. Theoretically, an infinite ensemble of model states is required to compute the full covariance matrix, which represents the true error statistics of the model. However, it has been shown [Evensen, 1994, 1997; Houtekamer and Mitchell, 1998] that an ensemble of limited size will approximate the error covariance matrices with reasonable accuracy:

P^f ≈ (1/(M − 1)) Σ_{m=1}^{M} (θ_{k,m} − θ̄_k)(θ_{k,m} − θ̄_k)^T (2.72)

where M represents the number of ensemble forecasts. Here we can consider each perturbed initial state as an EnKF sigma point. Thus the selection scheme for choosing the sigma points and the corresponding weighting terms are given by

X_{k,m} = θ_{k,m} + ε_{k,m},  w = 1/(M − 1) (2.73)

where ε_{k,m} is a sequence of white uniform random perturbations and w is the weighting term for the mean and covariance. The number of sigma points needed to approximate the mean and covariance of the model state at time k is M, which is equal to the total number of ensemble members.
The sigma-point vector is then propagated through the nonlinear model (2.31)-(2.32) to get the forecast sigma points, or ensemble:

X_k^θ = f(X_{k-1}^θ, X_{k-1}^q) (2.74)
Y_k = h(X_k^θ, X_k^r) (2.75)

where X_k^θ is the forecast sigma-point vector corresponding to the ensemble forecast states, X_{k-1}^q is the sigma-point vector corresponding to the model error, and X_k^r is that of the observation error. Now the ensemble forecast error covariance given by equation (2.72) can be re-written as

P^f = P_{θ_k}^- ≈ (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̄_k)(X_{k,m}^θ − θ̄_k)^T (2.76)
= (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̂_k^-)(X_{k,m}^θ − θ̂_k^-)^T (2.77)

where

θ̂_k^- = θ̄_k = (1/M) Σ_{m=1}^{M} X_{k,m}^θ (2.78)

Similarly, the approximated measurement mean, covariance, and cross-covariance for the calculation of the Kalman gain are computed as follows:

ψ̂_k^- ≈ (1/M) Σ_{m=1}^{M} Y_{k,m} (2.79)
P_{ψ̃_k} ≈ (1/(M − 1)) Σ_{m=1}^{M} (Y_{k,m} − ψ̂_k^-)(Y_{k,m} − ψ̂_k^-)^T (2.80)
P_{θ_k ψ_k} ≈ (1/(M − 1)) Σ_{m=1}^{M} (X_{k,m}^θ − θ̂_k^-)(Y_{k,m} − ψ̂_k^-)^T (2.81)

The optimal gain (Kalman gain, K) can be calculated using Eq. (2.39), and the state covariance is updated using equation (2.40). The main advantage of this approach is that it avoids Eq. (2.34), which requires the linear measurement operator H to calculate the Kalman gain, thereby extending the EnKF to nonlinear measurement functions. We have thus extended the sigma-point approach to the ensemble Kalman filter and formulated the EnKF as a general sigma-point Kalman filter without a specific selection scheme. In other words, the forecast sigma points in the SP-UKF, SP-CDKF, and other sigma-point Kalman filter algorithms are actually specific ensembles conditioned on the specific selection schemes, which can represent the error statistics with better accuracy than the EnKF. This allows us to extend the EnKF to the more generalized SPKF approach, which is suitable for nonlinear measurement functions.
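The analysis step of this sigma-point form of the EnKF, Eqs. (2.78)-(2.81) combined with the gain (2.39), can be sketched as follows. The nonlinear measurement function, ensemble size, prior moments, and observation values are all illustrative assumptions; observation-noise perturbations are added to the predicted measurements, one common "perturbed observation" variant rather than necessarily the thesis configuration.

```python
import numpy as np

# EnKF analysis step in the "sigma-point" form: the gain is built from
# the ensemble cross-covariance (2.81) and innovation covariance (2.80),
# so no linearized measurement operator H is needed.  Toy nonlinear h.
rng = np.random.default_rng(1)
M, L = 500, 3
h = lambda th: np.array([th[0] * th[1], np.sin(th[2])])  # illustrative h

# prior ensemble (columns are members) around an assumed mean
ens = rng.standard_normal((L, M)) + np.array([[1.0], [2.0], [0.0]])
Y = np.column_stack([h(ens[:, m]) for m in range(M)])
Y += 0.1 * rng.standard_normal(Y.shape)   # perturbed-observation noise

th_mean = ens.mean(axis=1, keepdims=True)             # (2.78)
y_mean = Y.mean(axis=1, keepdims=True)                # (2.79)
P_yy = (Y - y_mean) @ (Y - y_mean).T / (M - 1)        # (2.80)
P_xy = (ens - th_mean) @ (Y - y_mean).T / (M - 1)     # (2.81)

K = P_xy @ np.linalg.inv(P_yy)                        # gain (2.39)
psi_obs = np.array([[2.5], [0.2]])                    # illustrative observation
ens_a = ens + K @ (psi_obs - Y)                       # update every member
```

Because the gain uses only sample cross- and innovation covariances, the nonlinear h(·) never has to be linearized; the analysis ensemble spread contracts relative to the forecast ensemble, consistent with (2.40).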
Also, it has been shown [Keppenne, 2000] that the ensemble forecast step in the EnKF can be parallelized by running each ensemble member on a separate processor of a parallel computer (or cluster), resulting in large computational savings. This approach can be extended to SPKF data assimilation problems, since the sigma points propagated through the nonlinear model are independent of each other.

Chapter 3

Experiments with the Lorenz '63 Model

In the field of data assimilation, the celebrated Lorenz model has served as a test bed for examining the properties of various data assimilation methods [Evensen, 1997; Gauthier, 1992; Miller et al., 1994], as the Lorenz model shares many common features with the atmospheric circulation and climate system in terms of variability and predictability [Palmer, 1993]. By adjusting the model parameters that control the nonlinearity of the system, the model can be used to simulate nearly regular oscillations or highly nonlinear fluctuations. The Lorenz [1963] model consists of nonlinear ordinary differential equations in three components:

dx/dt = σ(y − x) + q_x (3.1)
dy/dt = ρx − y − xz + q_y (3.2)
dz/dt = xy − βz + q_z (3.3)

where the variables x, y, and z are related to the intensity of convective motion and the temperature gradients in the horizontal and vertical directions, respectively, and the parameters σ, ρ, and β will be referred to as dynamical parameters. q_x, q_y, and q_z represent the unknown model errors, assumed uncorrelated in time (white noise). Also, we assume that all the measurements or observations are linear functions of the nonlinear model states. The true data are created by integrating the model over 4000 time steps using the fourth-order Runge-Kutta scheme [Press et al., 1992], with the parameters σ, ρ, and β set to 10.0, 28.0, and 8/3, respectively, and the initial conditions set to 1.508870, -1.531271, and 25.46091 [Evensen, 1997; Miller et al., 1994]. The integration step is set to 0.01.
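The truth-generation step just described can be sketched as follows, using the stated parameter values, initial conditions, and step size from the text; the function names and the noise-free formulation (q_x = q_y = q_z = 0) are illustrative choices.

```python
import numpy as np

# Fourth-order Runge-Kutta integration of the noise-free Lorenz (1963)
# model (3.1)-(3.3), with sigma=10, rho=28, beta=8/3 and dt=0.01 as in
# the text.
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return np.array([sigma * (y - x),
                     rho * x - y - x * z,
                     x * y - beta * z])

def rk4_step(state, dt=0.01):
    k1 = lorenz(state)
    k2 = lorenz(state + 0.5 * dt * k1)
    k3 = lorenz(state + 0.5 * dt * k2)
    k4 = lorenz(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

state = np.array([1.508870, -1.531271, 25.46091])  # initial conditions
traj = [state]
for _ in range(4000):                              # 4000 time steps
    state = rk4_step(state)
    traj.append(state)
traj = np.asarray(traj)
```

The resulting 4001-point trajectory stays bounded on the familiar butterfly-shaped attractor; synthetic observations would then be produced by sampling this trajectory and adding Gaussian noise, as described next.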
The observation data sets are simulated by adding normally distributed noise to the true data. The assimilation process is completely subject to the model equations (3.1)-(3.3) after the initial guesses are given; at each step of the integration, the initial conditions are the estimated model state from the previous step.

3.1 State Estimation

To apply the KF, we discretize the nonlinear Lorenz model (3.1)-(3.3) using the fourth-order Runge-Kutta method and write it in the form of the state-space equations (2.31) and (2.32), where θ_k represents the system state vector (a column vector composed of x, y, and z), f(·) is the nonlinear function of the state, and q_k is the random (white) process noise vector (a column vector composed of q_x, q_y, and q_z). The measured model state ψ_k, required for the application of the KF, is a function of the states according to equation (2.32), where h(·) is the measurement function and r_k is the random measurement noise vector. To implement the SP-UKF, the state vector is redefined as the concatenation of the model states, model errors, and measurement errors: the augmented state vector Θ_k and the corresponding covariance matrix are given by

Θ_k = [θ_k^T  q_k^T  r_k^T]^T (3.4)

P_{Θ_k} = [ P_{θ_k}  0    0
            0        Q_k  0
            0        0    R_k ] (3.5)

Therefore the augmented state dimension is the sum of the original state dimension, the model error dimension, and the measurement error dimension:

L_Θ = L_θ + L_q + L_r (3.6)

where L_θ is the dimension of the state, L_q is the dimension of the model error vector, and L_r is that of the measurement errors. The augmented sigma points are found using the transformation equations (2.42)-(2.44). The dimension of the augmented sigma-point vector is 2L_Θ + 1. For the Lorenz model discussed here, the augmented sigma-point vector dimension is 19; in other words, the number of sigma points required to approximate the error statistics accurately is 19.
The augmented sigma-point vector is then propagated through (2.45)-(2.46), and the optimal terms for the calculation of the Kalman gain are computed according to equations (2.47)-(2.51). On the other hand, to implement the SP-CDKF, we split the augmentation process between the time update and measurement update sections. For the time update, the augmented state vector, covariance matrix, and sigma-point vector are given by

Θ_{k-1} = [θ̂_{k-1}^T  q̄_{k-1}^T]^T (3.7)

P_{Θ_{k-1}} = [ P_{θ_{k-1}}  0
                0            Q_{k-1} ] (3.8)

X_{k-1} = [ Θ_{k-1}   Θ_{k-1} + δ√P_{Θ_{k-1}}   Θ_{k-1} − δ√P_{Θ_{k-1}} ] (3.9)

and for the measurement update they are given by

Θ_k = [θ̂_k^{-T}  r̄_k^T]^T (3.10)

P_{Θ_k} = [ P_{θ_k}^-  0
            0          R_k ] (3.11)

X_k = [ Θ_k   Θ_k + δ√P_{Θ_k}   Θ_k − δ√P_{Θ_k} ] (3.12)

The optimal terms for the calculation of the Kalman gain are computed using (2.65)-(2.70).

3.2 Parameter Estimation

Model parameter estimation can be regarded as a special case of general state estimation in which the parameters are treated as specific states. Parameter estimation involves determining a nonlinear mapping

ψ_k = g(θ_k, Λ) (3.13)

where the nonlinear map g(·) may be the dynamical model f(·) or an empirical model parameterized by the vector Λ. The state-space representation of the parameter estimation problem for the Lorenz model can be written as

Λ_k = Λ_{k-1} + q_{k-1}^Λ (3.14)
ψ_k = f(θ_k, Λ_k) + r_k^Λ (3.15)

where f(·) is the nonlinear measurement model given by the Lorenz equations (3.1)-(3.3), Λ is the parameter vector, which comprises the dynamical parameters σ, ρ, and β, and q_k^Λ and r_k^Λ represent the model and measurement error vectors, respectively. The SPKF (SP-UKF and SP-CDKF) equations for the parameter estimation problem are similar to those of the state estimation formulation, except that the state (here the states are parameters) time evolution is linear (Eq. 3.14) and the measurement function is nonlinear (Eq. 3.15).
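The parameter-estimation recursion (3.14)-(3.15) can be illustrated on a hypothetical scalar problem: a random-walk parameter observed only through a nonlinear measurement, updated with the cross-covariance gain (2.39) evaluated from a symmetric sigma-point pair (cf. 2.42). The measurement model, noise levels, and all numerical values below are illustrative assumptions, not the Lorenz setup of the thesis.

```python
import numpy as np

# Parameter estimation as filtering (3.14)-(3.15): a scalar parameter
# follows a random walk and is observed through the toy nonlinear
# measurement psi = a * theta**2 + r.
rng = np.random.default_rng(3)
a_true = 2.5
a_hat, P = 0.0, 4.0                  # initial guess and its variance
Q, R = 1e-5, 0.01                    # random-walk and observation variances

for k in range(200):
    theta = rng.uniform(0.5, 1.5)                  # known driving state
    psi = a_true * theta ** 2 + 0.1 * rng.standard_normal()
    P = P + Q                                      # time update (3.14)
    # two symmetric sigma points around the parameter mean (cf. 2.42)
    s = np.array([a_hat + np.sqrt(P), a_hat - np.sqrt(P)])
    y = s * theta ** 2                             # propagated through h (3.15)
    P_yy = np.mean((y - y.mean()) ** 2) + R        # innovation covariance
    P_xy = np.mean((s - a_hat) * (y - y.mean()))   # cross-covariance
    K = P_xy / P_yy                                # Kalman gain (2.39)
    a_hat += K * (psi - y.mean())
    P -= K * P_yy * K                              # covariance update (2.40)
```

As the text notes, the parameter dynamics (3.14) are linear while the measurement (3.15) is nonlinear, so all the work is done in the measurement update; the estimate converges toward the true parameter as observations accumulate and the parameter variance P shrinks.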
3.3 Joint Estimation of States and Parameters

The joint estimation approach is required when "clean" states and parameters are not available. In general there are two approaches to estimating the model state and parameters simultaneously, namely the dual estimation and joint estimation approaches [Haykin, 2001; Nelson, 2000; Van der Merwe, 2004]. In the dual estimation approach, two Kalman filters run simultaneously, one for state and one for parameter estimation. In the joint estimation approach, the system state and parameters are concatenated into a single higher-dimensional joint state vector, and a single Kalman filter is used to estimate the joint vector. For example, the joint state vector J_k for SPKF data assimilation can be written as

J_k = [θ_k  Λ_k]^T (3.16)

In the joint estimation process, the SPKF schemes estimate the states using parameters that are re-estimated at every time step from the prior states. In this study, we present only the joint estimation (instead of dual estimation) of parameters and states, since it incorporates the complete model states and parameters during the assimilation cycles.

Chapter 4

Results and Discussion

In this chapter we demonstrate the feasibility of the SPKF algorithms as effective data assimilation methods for highly nonlinear models. The SPKF algorithms discussed in the previous sections will now be examined and compared with the standard EKF and EnKF methods. To allow this comparison, all experiments were designed almost identically to those of Miller et al. [1994] and Evensen [1997].

4.1 State Estimation

The first set of experiments was carried out with initial conditions, parameters, and observation noise levels identical to those in Miller et al. [1994] and Evensen [1997]: the observations and initial conditions are simulated by adding normally distributed noise N(0, √2).
Also, the interval of observation is set to 25 time steps; i.e., the observed states are assimilated into the nonlinear model every 25 time steps. For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. Since there is no general way to set the model error, the amount of model error to use in the KF is often determined experimentally by trial or by statistical methods such as Monte Carlo, which is computationally expensive [Miller et al., 1994]. In our experiments, the model errors were intentionally designed so that the model would not drift too far from the true state¹. In detail, we set the model errors by calculating the expected errors in the state scaled by a decreasing exponential factor which is a function of the assimilation time; initially, the model covariance matrix is set to an arbitrary diagonal value, and it then anneals towards zero exponentially as the assimilation proceeds. For simulating model errors in the ensemble Kalman filter, we follow the method suggested by Evensen [2003]. An ensemble of 1000 members was used in the EnKF, as in Evensen's experiment [Evensen, 1997]. Figures 4.1(a), (b), (c), and (d) show the state estimates using the EKF, EnKF, SP-UKF, and SP-CDKF, respectively. As can be seen, all four methods generate model states similar to the true values, indicating the good capability of these methods in estimating model states when the size of the initial perturbation and the observation noise are as given here. Results corresponding to Figs. 4.1(a) and (b) were also obtained by Miller et al. [1994] and Evensen [1997]. It should be noted that while both the EKF and the EnKF can produce good state estimates, the former needs to construct the TLM and the latter requires a large ensemble size of 1000. In contrast, the SP-UKF and the SP-CDKF use only 19 "particular" ensemble members (sigma points) here, showing their advantages over the EKF and EnKF.
A comparison among the four methods is shown in Fig. 4.2, which presents the variation of the error square (ES) with time step. The ES is defined here as the square of the difference between the estimated state and the true model state, scaled by N, where N is a scalar²:

ES = (1/N) (θ_k − θ_k^{true})² (4.1)

¹The model is considered to have a relatively large error at the initial time so that the assimilation gives more weight to the observation information. As such, the model prediction does not drift too far from the 'true value'.
²We choose N as 4000, which is the total assimilation period.

Figure 4.1: Case 1 - Assimilation solutions for the Lorenz model: the black curve is the true model, + indicates the noisy observations, and the red trajectory indicates the assimilated solutions.

From Fig. 4.2, we can see that the SP-UKF and SP-CDKF assimilations have smaller ES than the EnKF at most times, although some assimilation steps show the opposite. These "peak" values of ES correspond to either over-estimation or under-estimation of the model states, which are most probably related to the random noise in the "observations" and to the chaotic nature of the Lorenz system. The state estimate tends to be poor when a large noise is assimilated and when the state is in transition from one chaotic regime to the other (see also the discussion below). The overall performance of each assimilation is measured by the root mean square error (RMSE) over all time steps, as shown in Table 4.1.

Table 4.1: RMSE and computation time for Case 1

Assimilation Method          Computation Time (s)   RMSE
EKF                          37.04                  1.812
EnKF (with 1000 ensembles)   7143.57                1.987
EnKF (with 19 ensembles)     132.77                 6.123
SP-UKF                       133.91                 1.640
SP-CDKF                      90.42                  1.592

As shown, the SP-UKF and SP-CDKF have slightly smaller RMSE than the others.
The most impressive point in the table is that the SPKF methods use only 19 sigma points (or 19 conditional ensembles) to estimate the statistical moments of the nonlinear model accurately. This turns out to be an advantage for data assimilation problems in low-dimensional systems, but in the case of atmosphere/ocean GCMs the 2L + 1 integrations are not computationally feasible. More details on SPKF implementation, its limitations, and methods to overcome those limitations are described in the next section (Section 5). For the sake of completeness, we also performed an EnKF assimilation experiment with 19 ensembles instead of 1000. The result of this experiment is shown in Fig. 4.3(a), and the corresponding ES and RMSE are shown in Fig. 4.3(b) and Table 4.1. These results (Figs. 4.2 & 4.3) show that the squared errors of the state estimate from the EnKF with 19 ensemble members are around 5-10 times as large as those of the SPKF.

Figure 4.2: Case 1 - Assimilation errors for state estimation (corresponding to Fig. 4.1): (a) EKF, (b) EnKF, (c) SP-UKF, and (d) SP-CDKF.

Figure 4.3: (a) Ensemble Kalman filter data assimilation solutions for the Lorenz model with 19 ensembles: the black curve is the true model, + indicates the noisy observations, and the red trajectory indicates the assimilated solutions. (b) Assimilation errors.
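The ES (4.1) and RMSE diagnostics used throughout these comparisons can be computed as below; the synthetic trajectories stand in for the estimated and true states and are purely illustrative.

```python
import numpy as np

# Error square (4.1) and RMSE diagnostics for an assimilation run,
# with synthetic "estimated" and "true" trajectories for illustration.
N = 4000                                        # total assimilation period
rng = np.random.default_rng(2)
truth = np.sin(np.linspace(0.0, 40.0, N))       # placeholder true state
estimate = truth + 0.05 * rng.standard_normal(N)  # placeholder estimate

es = (estimate - truth) ** 2 / N                # scaled error square (4.1)
rmse = np.sqrt(np.mean((estimate - truth) ** 2))  # overall RMSE
```

The per-step ES series is what Figs. 4.2, 4.3(b), 4.5, and 4.7 plot, while the single RMSE number summarizes each run in Tables 4.1-4.3.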
Thus the EnKF with only 19 members could not adequately capture the mean and covariance of the highly nonlinear Lorenz model. On the other hand, with just 19 conditional ensembles (or sigma points), the SPKF is able to capture the statistical moments of the highly nonlinear Lorenz model. The assimilation experiments were run on an SMP (Symmetric Multi-Processor) machine with two AMD Opteron 248 CPUs [Advanced Micro Devices, 2007] with a clock speed of 2.2 GHz, running Linux. MATLAB 7.3.0.298 (R2006) [Mathworks, 2007] software was used to implement the model and the data assimilation algorithms. Table 4.1 also compares the computation time taken by each assimilation algorithm discussed above. In order to compare computational efficiency, we used the same programming framework to implement all the data assimilation methods. The computational cost is least for the EKF, followed by the two SPKF methods. The EnKF, which requires 1000 members for a good estimate as shown in Fig. 4.1(b), is the most expensive, around 50-80 times as much as the SPKF.

The second set of experiments was carried out under a more realistic situation by increasing the observation noise level ten-fold: the observations and initial conditions are generated by adding normally distributed noise N(0, √20). The assimilation results are shown in Fig. 4.4 and the corresponding ES in Fig. 4.5. From Fig. 4.4 and Table 4.2 we can see the superior performance of the SPKF algorithms. Between the SPKF schemes, the SP-CDKF shows comparatively better assimilation skill than the SP-UKF. The RMSE values of the EKF, EnKF, SP-UKF, and SP-CDKF assimilation results were 5.39, 6.37, 4.25, and 4.56, respectively. It should be noted that even though the EKF shows a relatively low RMSE value, it requires the TLM for the assimilation, which is very difficult to obtain in the case of GCMs. From Fig.
4.4 and 4.5, it is evident that the SPKF assimilation schemes can give better estimation results even if the system is subject to higher noise. Our numerical experiments also showed that an EnKF assimilation with 1000 ensembles can give results as good as the SPKF assimilation (not shown here). The RMSE values are summarized in Table 4.2.

Assimilation Method          RMSE
EKF                          5.39
EnKF (using 19 ensembles)    6.37
SP-UKF                       4.25
SP-CDKF                      4.56

Table 4.2: RMSE statistics for Case 2

Figure 4.4: Case 2 - Assimilation solutions for the Lorenz model with a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (Here the EnKF simulation is performed with 19 ensembles.)

Figure 4.5: Case 2 - Assimilation errors for state estimation (corresponding to Fig. 4.4): (a) EKF, (b) EnKF, (c) SP-UKF and (d) SP-CDKF.

In the third set of experiments, we increased the observation noise level as well as the interval between consecutive observations; the interval between observations is increased from 25 to 40, and the observations and initial conditions are generated by adding normally distributed noise N(0, √20). The assimilation results and the corresponding ES are shown in Fig. 4.6 and 4.7 respectively.
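The twin-experiment setup used in these cases can be sketched as follows: integrate the Lorenz '63 equations with a fourth-order Runge-Kutta scheme and add N(0, √20) noise at the chosen observation interval. This Python sketch assumes the standard parameter values σ = 10, ρ = 28, β = 8/3; the step size and initial condition are illustrative, not the exact values used in the experiments.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz '63 system."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
dt, n_steps, obs_interval = 0.01, 4000, 40     # case-3-like setup
x = np.array([1.0, 1.0, 1.0])                  # illustrative start
truth, obs = [], {}
for k in range(1, n_steps + 1):
    x = rk4_step(lorenz63, x, dt)
    truth.append(x.copy())
    if k % obs_interval == 0:
        # noisy observation with standard deviation sqrt(20)
        obs[k] = x + rng.normal(0.0, np.sqrt(20.0), size=3)
```

The `truth` trajectory plays the role of the black curve in the figures and the entries of `obs` the + marks; each filter is then judged by how closely it recovers `truth` from `obs` alone.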
The RMSE values of the case 3 experiments are summarized in Table 4.3. Fig. 4.7 shows some divergence at some time steps of the assimilation track among the four methods. For example, the errors (ES) vary almost steadily in SP-UKF, whereas SP-CDKF has a relatively significant variation of ES with time steps. Compared with the SPKF, the variation of ES is more striking in the EKF and EnKF. The significant variation of ES might be related to the chaotic nature of the Lorenz system and the capability of each individual algorithm in capturing the observation information. The chaotic Lorenz attractor is known to have a butterfly shape with two wings. For a good estimate of the transition state from one wing to the other, the assimilation should be able to characterize the information of both wings of the Lorenz attractor. Obviously this depends on two issues: the observations themselves and the assimilation algorithm. If the observations are assimilated more frequently (i.e., the interval between observations is small), sufficient data cover more information of both chaotic regimes in the assimilation. This is the reason why there are many more 'abnormal' values of ES in Fig. 4.7 than in Fig. 4.2, where the observations are more frequent. On the other hand, if an assimilation algorithm has a better capability to mix observation and model information to characterize transitions, it will have better estimates for the transition states. In many cases, this depends strongly on the model and observation error covariances. While the observation error covariance is usually prescribed, the model error covariance is updated at each assimilation step in the family of Kalman filters, depending on the algorithm. Thus Fig.
4.7 suggests that the SPKF is probably better than the EKF and EnKF in the assimilation of some transition states using noisy observations.

Figure 4.6: Case 3 - Assimilation solutions for the Lorenz model with fewer observations and a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (Here the EnKF simulation is performed with 19 ensembles.)

Figure 4.7: Case 3 - Assimilation errors for state estimation (corresponding to Fig. 4.6): (a) EKF, (b) EnKF, (c) SP-UKF and (d) SP-CDKF.

Figure 4.8: Case 3 - EnKF assimilation solutions with 1000 ensembles, for the Lorenz model with fewer observations and a ten-fold increase in the observation noise levels: the black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions.

Assimilation Method            RMSE
EKF                            6.0637
EnKF (with 19 ensembles)       8.2814
EnKF (with 1000 ensembles)     7.0390
SP-UKF                         6.1847
SP-CDKF                        5.5832

Table 4.3: RMSE statistics for Case 3

Again, we repeated the EnKF assimilation (for case 3) with 1000 ensembles and the result is shown in Fig. 4.8. The result is not as good as the SPKF assimilation and appears more "noisy". This is probably because the assimilated observations are less frequent and noisier, so an ensemble size of 1000 is probably not enough to capture the statistical moments accurately.

4.2 Parameter Estimation

Estimating uncertain dynamical model parameters is one of the important tasks in data assimilation, where the measurement function is usually nonlinear.
The requirement of the tangent linear measurement operator H in the optimal gain term given by Eq. (2.40) makes the EKF and EnKF assimilation schemes inaccurate and inappropriate for parameter estimation in nonlinear dynamical systems. It has been shown [Kivman, 2003] that EnKF data assimilation gave poor results in estimating the dynamical parameter of the Lorenz model. The SPKF methods should be better alternatives for parameter estimation since they do not need to linearize the nonlinear measurement function. The experimental setup is identical to that of the first case of the state estimation problem discussed in the above subsection. To simulate a more realistic situation, the initial guesses of the parameters are generated by adding normally distributed noise of covariance 100 to the true parameters. In the first case, we assume that only one parameter (say β) is uncertain. Thus our task is to estimate the correct value of β from infrequent observations contaminated by noise. Figure 4.9 shows the SPKF parameter estimation results: Fig. 4.9-(a) shows the parameter estimation using SP-UKF and Fig. 4.9-(b) that of SP-CDKF. Table 4.4 compares the true and the estimated parameter values, and the corresponding estimation errors. From this figure it is clear that the SPKF assimilation methods can retrieve dynamical parameters well from noisy observations. In the above experiment, even though the initial parameter was far from the true value (the standard deviation is 10), the SPKF method is still able to estimate the parameter accurately. In general, our experiments suggest faster convergence for the SP-CDKF algorithm. This might be due to the algorithm tuning problem, since SP-CDKF uses only one "control parameter" (h) compared to three (λ, α, and β) in SP-UKF.
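The usual way a sigma-point filter estimates a parameter such as β is by state augmentation: the parameter is appended to the state vector with persistence dynamics, so the same filter update that corrects the state also corrects the parameter. A minimal sketch, with illustrative values (the augmented tendency function and starting state below are assumptions for illustration, not the exact experimental code):

```python
import numpy as np

def lorenz63_augmented(z, sigma=10.0, rho=28.0):
    """Lorenz '63 tendency for the augmented state z = (x, y, w, beta).
    The uncertain parameter beta is carried as an extra state variable
    with zero tendency (a persistence model)."""
    x, y, w, beta = z
    return np.array([sigma * (y - x),
                     x * (rho - w) - y,
                     x * y - beta * w,
                     0.0])                 # d(beta)/dt = 0

# Initial parameter guess: true value plus noise of variance 100
# (standard deviation 10), mirroring the setup described above.
rng = np.random.default_rng(0)
beta_true = 8.0 / 3.0                      # = 2.6667
beta_guess = beta_true + rng.normal(0.0, 10.0)
z0 = np.array([1.0, 1.0, 1.0, beta_guess])  # illustrative initial state
```

Because the parameter has no dynamics of its own, all of its correction comes from the cross-covariance between the parameter and the observed state components, which is exactly what the sigma-point propagation supplies.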
In the second case we assume that two dynamical parameters (say ρ and β) are uncertain.

Figure 4.9: Parameter estimation: (a) SP-UKF assimilation, (b) SP-CDKF assimilation. True β - dashed line, estimated β - solid red line.

Assimilation Method    True β    Estimated β    Squared error
SP-UKF                 2.6667    2.5152         0.0229
SP-CDKF                2.6667    2.4979         0.0285

Table 4.4: Single parameter estimation statistics (Case 1)

Figure 4.14: Assimilation errors for SP-CDKF joint estimation: (a) estimation error for x, (b) estimation error for σ.

Chapter 5

Reduced Space Sigma-point Kalman Filter

In the preceding sections, we have demonstrated the power and merits of the SPKF, as well as its advantages over the EKF and EnKF, using the low-dimensional Lorenz model. One of the crucial issues in evaluating a data assimilation algorithm is its computational expense when applied to realistic models of high dimensionality. In this section we will further explore the SPKF using higher dimensional Lorenz models.

5.1 SPKF Data Assimilation in Higher Dimensional Systems

For an L-dimensional system, the number of sigma-points required to estimate the true mean and covariance is 2L + 1. As described in the previous sections, this procedure works well for low dimensional models such as the Lorenz '63 model, but the 2L + 1 sigma-point integration is computationally infeasible if the system dimension is of the order of millions, as in GCMs. Julier [Julier, 2003; Julier and Uhlmann, 2002, 2004] has shown that, using the simplex unscented transformation, the minimum number of sigma-points that gives the same estimation accuracy as SP-UKF can be reduced to (L + 1).
These sigma-points are called simplex sigma-points, but for higher dimensional systems even the (L + 1) simplex sigma-point integration is still computationally intractable. A possible solution to this problem is to reduce the number of sigma-points by selecting a particular subset of sigma-points from the original sigma-point space which can approximate the error statistics of the model. In the following subsections we will examine this possibility.

5.1.1 A subspace approach with sigma-points: Design and Implementation

It has been shown that the number of degrees of freedom necessary to describe most large scale geophysical systems is finite, and their dominant variability can be described by a limited number of modes or functions [Lermusiaux, 1997; Lermusiaux and Robinson, 1999; Teman, 1991]. These functions evolve in time and space in accordance with the system. The techniques commonly used for deriving such functions include dynamical normal modes, dynamical singular vectors and values, empirical orthogonal functions (EOFs) [Davis, 1977; Lorenz, 1965; von Storch and Frankignoul, 1997; Wallace et al., 1992; Weare and Nasstrom, 1982], principal interaction and oscillation patterns (PIPs and POPs) [Hasselmann, 1988; Penland, 1989; Schnur et al., 1993; von Storch et al., 1995], and radial functions and wavelets [Gamage and Blumen, 1993; Meyers et al., 1993]. Lermusiaux et al. [Lermusiaux, 1997; Lermusiaux and Robinson, 1999] proposed a method to reduce the error space, called Error Subspace Statistical Estimation (ESSE). In the ESSE approach, a reduced rank approximation $P_k^p$ to the error covariance $P_k$ is defined by minimizing the norm of the difference between $P_k$ and $P_k^p$; i.e.

\min_{\mathrm{rank}(P_k^p) = p} \| P_k - P_k^p \|    (5.1)

According to the minimum criterion (5.1), the error subspace is characterized by the singular vectors and singular values of $P_k$.
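The minimizer of criterion (5.1) is the classical truncated singular value decomposition (Eckart-Young): keeping the p leading singular vectors and values of P_k gives the best rank-p approximation in the 2-norm and the Frobenius norm. A brief sketch, with a synthetic covariance matrix as an assumed example:

```python
import numpy as np

def reduced_rank(P, p):
    """Best rank-p approximation to P in the sense of criterion (5.1),
    obtained by truncating the singular value decomposition."""
    U, s, Vt = np.linalg.svd(P)
    return (U[:, :p] * s[:p]) @ Vt[:p, :]

# For a covariance matrix, the 2-norm truncation error equals the
# first discarded singular value.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
P = A @ A.T                                # synthetic covariance
P5 = reduced_rank(P, 5)
err = np.linalg.norm(P - P5, 2)
s = np.linalg.svd(P, compute_uv=False)
```

The same truncation idea underlies both the ESSE error subspace and, loosely, the PCA-based sigma-point selection described next.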
We follow a similar idea as in ESSE to form a sigma-point subspace that approximates the mean and error covariance of the system. In our approach, it is assumed that, whereas the estimate of a system's full errors requires all sigma-points, its dominant errors can be estimated using the most important sigma-points. Theoretically, these most important sigma-points should be chosen based on (5.1). However, this would introduce considerable complexity and be difficult to implement. For simplicity, as a good start towards a complete solution to the problem, we have used principal component analysis (PCA) to identify the most important sigma-points which influence the evolution of the error covariance. The main idea behind using PCA is to represent the multidimensional sigma-point space by a smaller number of sigma-points retaining the main features of the original sigma-point space; i.e., sigma-points in the principal component space are used to calculate the error propagation. The selection of sigma-points is based on the proportion of variance explained. Specifically, instead of using the full sigma-point space, we use some leading principal components, thereby reducing the number of sigma-points required to approximate the forecast error covariance. In the following subsections, we will see the potential of this approach in the assimilation of higher dimensional systems.

5.2 Experiments with the Lorenz '95 Model

The Lorenz '95 [Lorenz, 2006] model is a one-dimensional atmospheric model introduced by E. Lorenz in 1995 to explain the dynamics of weather at a fixed latitude. It has similar error growth characteristics as full NWP models. The model contains K variables $X_1, \cdots, X_K$, which may be thought of as atmospheric variables in K sectors of a latitude circle, governed by

\frac{dX_k}{dt} = -X_{k-1}(X_{k-2} - X_{k+1}) - X_k + F    (5.2)

where the constant F, called the forcing term, is independent of k.
By using cyclic boundary conditions, the definition of $X_k$ is extended to all values of k; i.e., $X_{k-K}$ and $X_{k+K}$ equal $X_k$. It is assumed that a unit time Δt = 1 is associated with 5 days. The experimental setup is similar to that of Lorenz [2006], where K = 36 and the magnitude of the forcing is set to 8, for which the system is chaotic. The system is integrated using the 4th order Runge-Kutta scheme, with a time step Δt = 0.05 (i.e., 6 hours). The experiments were carried out with random initial conditions, and the observations are generated by adding normally distributed noise N(0, √2) to the true states. Also, the interval of observation is set to 10; i.e., the observed states are assimilated into the nonlinear model every 10 time steps. A more detailed discussion of the model and its characteristics can be found in Lorenz [2005, 2006], and Lorenz and Emmanuel [1998].

5.2.1 Performance and Evaluation

For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. In the first case we use the "full" sigma-point space for the calculation of the error covariance. Thus we have a total of 217 sigma-points, hence 217 ensemble members. Figure 5.1-(a) shows the state estimate using the SP-UKF. As can be seen, the SP-UKF can estimate model states close to the true values, indicating the good capability of the original SPKF methods in estimating model states. In the second case we use the reduced sigma-point subspace to calculate the error covariance. In this case we select 20 sigma-points, which account for more than 90% of the total variance. The result of this experiment is shown in Fig. 5.1-(b). As can be seen, the model states can be fairly well estimated by the reduced SPKF, although its estimation accuracy is not as good as that of the original SPKF. This suggests a possible solution for applying the SPKF to high dimensional systems.
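The model tendency (5.2), with the cyclic boundary conditions described above, is conveniently written with circular shifts, so the same code serves K = 36 or K = 960. A minimal Python sketch (the perturbed initial state is an illustrative assumption):

```python
import numpy as np

def lorenz95(X, F=8.0):
    """Lorenz '95 tendency, Eq. (5.2); np.roll implements the cyclic
    boundary conditions X_{k-K} = X_{k+K} = X_k."""
    return (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
            - X + F)

K, F = 36, 8.0
X = F * np.ones(K)
X[K // 2] += 0.01      # small perturbation to trigger chaotic growth
# The unperturbed state X_k = F is an (unstable) steady state: its
# tendency is identically zero, a convenient correctness check.
```

From this tendency, the 4th order Runge-Kutta step with Δt = 0.05 described above advances the model, exactly as in the Lorenz '63 experiments.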
For the sake of completeness, we performed an EnKF assimilation experiment with 20 ensembles. The ensemble is generated using the same approach as in the previous experiment with the Lorenz '63 model, where we used 19 ensembles (Fig. 4.3-(a)). The result of this experiment is shown in Fig. 5.1-(c). Comparing Fig. 5.1-(b) and Fig. 5.1-(c) reveals that the reduced SPKF is better than the EnKF for the state estimate, especially for the magnitude estimate. It is apparent that the EnKF underestimates the magnitude of the model states during the transition phase period. The RMSE values for the above experiments are summarized in Table 5.1.

Figure 5.1: 36 variable Lorenz '95 model: assimilation solutions for X_1. The black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (a) Case 1, (b) Case 2, (c) EnKF with 20 ensembles.

Assimilation Method                      RMSE
SP-UKF (using 241 SPs)                   1.7403
Reduced space SP-UKF (using 20 SPs)      3.3491
Standard EnKF (using 20 ensembles)       3.5308

Table 5.1: 36 variable Lorenz '95 model: RMSE statistics for X_1

We also performed the SPKF assimilation experiment for the 960 variable Lorenz '95 model. The experimental setup is identical to that in the previous cases except that K = 960. Two cases are studied with the model. In the first case we use all sigma-points, which constitutes a total of 5761 sigma-points, and in the second case we use 200 important sigma-points for the calculation of the error covariances. The results of these experiments are shown in Fig. 5.2. For comparison, we also performed an EnKF assimilation experiment with 200 ensembles, and the result of this experiment is shown in Fig. 5.2-(c). Apparently the reduced SPKF leads to a better estimate than the EnKF in both phase and magnitude simulation.
As can be seen, the state estimated by the EnKF is often out of phase with the "true" trajectory, which is not the case for the reduced SPKF. The correlation between the estimated trajectory and the "true" trajectory is 0.59 for the reduced SPKF against 0.10 for the EnKF. Table 5.2 shows the comparison between the RMSE values for the different experiments.

Figure 5.2: 960 variable Lorenz '95 model: assimilation solutions for X_1. The black curve is the true model, + indicates the noisy observations and the red trajectory indicates the assimilated solutions. (a) Case 1, (b) Case 2, (c) EnKF with 200 ensembles.

Assimilation Method                       RMSE
SP-UKF (using 5761 SPs)                   2.2397
Reduced space SP-UKF (using 200 SPs)      3.5864
Standard EnKF (using 200 ensembles)       4.6240

Table 5.2: 960 variable Lorenz '95 model: RMSE statistics for X_1

A great deal of additional research is needed for better design and implementation of these techniques applied to atmosphere/ocean GCMs for state, parameter and joint estimation problems. However, the above experimental results are promising, and a variety of possible extensions to these techniques could be developed to deal with more complicated situations.

Chapter 6

Conclusion

The EKF and EnKF, two important Kalman-type filters, have been widely applied to atmospheric and oceanic data assimilation due to their efficient and simple algorithms. The major weaknesses of the EKF and EnKF are that the former needs to calculate the tangent linear model or Jacobian for linearization of nonlinear forecast models, whereas the EnKF performance is greatly dependent on the ensemble size, which is often an intractable computational burden. Moreover, the EKF and EnKF cannot deal directly with systems whose observed data are a nonlinear transformation of the states. In this study we introduced and presented two recently proposed derivative-less sigma-point Kalman filters.
The SPKF is a technique for implementing derivative-less optimal estimation using a novel deterministic sampling approach that ensures a set of samples accurately estimates the forecast error statistics; this is unlike the EnKF, where a random sampling strategy is used. The technique employed in the SPKF reinterprets the standard Kalman gain and covariance update equations in such a way that neither the nonlinear prediction model nor the nonlinear measurement operator needs to be linearized. Nevertheless, the SPKF can capture the statistical moments of the nonlinear model accurately using its deterministic sampling technique. Thus in the SPKF, the forecast error covariance is computed using deterministically chosen samples, called sigma-points. In a broad sense, the SPKF algorithm can be considered a particular case of the ensemble Kalman filter with a specific sample selection scheme. In other words, the forecast sigma-points in the SPKF algorithms are actually specific ensembles conditioned by the specific selection schemes, which can represent the error statistics accurately. Also, the ensemble forecast step in the SPKF can be parallelized by running each ensemble member on a separate processor of a parallel computer (or cluster), resulting in huge computational savings. Using the highly nonlinear low dimensional Lorenz '63 model and a higher dimensional Lorenz '95 model, we investigated the capability and performance of the SPKF over the standard KF-based data assimilation methods for three different classes of problems, namely state estimation, parameter estimation and joint estimation. The results demonstrated that the SPKF has better estimation accuracy than the EKF and the EnKF in all experiments. The SPKF experiments with a higher dimensional model suggest that it is possible to reduce the number of sigma-points, and thereby the computation time, by using a reduced sigma-point space approach.
The results in this study are encouraging and suggest that the SPKF could become an effective method to assimilate observations into realistic models such as atmospheric or oceanic GCMs. The SPKF also has the advantage that it does not need the tangent linear or Jacobian operators of the original models. The SP-UKF and SP-CDKF data assimilation involve the calculation of the matrix square root of the state covariance matrix, which is a computationally intensive process. It has been shown [Van der Merwe and Wan, 2001a,b] that the square-root formulations of the SP-UKF and SP-CDKF are numerically efficient and stable, with estimation accuracy equal to that of the original SP-UKF and SP-CDKF. Since the state space dimension of the models used in this study is relatively small, it is practically irrelevant to compare the numerical stability of the square-root formulation with the original SP-UKF and SP-CDKF implementations. Therefore this issue is left for future study in GCMs.

In this study, we explored the SPKF using highly simplified nonlinear models. One might be concerned about the performance and efficiency of the SPKF when a realistic GCM is used. Additional research is needed for better implementation of these techniques applied to data assimilation problems in atmospheric or oceanic GCMs. Yet, the present study represents a step in pursuing advanced data assimilation algorithms by using simple nonlinear models, which share some common features with complicated atmospheric and oceanic models.

Appendix A

Re-interpretation of the Standard Kalman Gain

The optimal state update equation in the Kalman filter algorithm can be written as

\hat{\theta}_k = \hat{\theta}_k^- + K_k (\psi_k - \hat{\psi}_k^-)    (A.1)

where the superscript "-" represents the prior states given by the following equations:

\hat{\theta}_k^- = E[f(\theta_{k-1}, q_{k-1})]    (A.2)

\hat{\psi}_k^- = E[h(\theta_k, r_k)]    (A.3)
             = E[\psi_k] = H \hat{\theta}_k^-    (A.4)

Here H is the measurement operator.
Thus the state and covariance update equations can be re-written as

\hat{\theta}_k = \hat{\theta}_k^- + K_k (\psi_k - H \hat{\theta}_k^-)    (A.5)

P_{\theta_k} = (I - K_k H) P_{\theta_k}^-    (A.6)

Now the "standard" Kalman gain equation is given by

K_k = P_{\theta_k}^- H^T [H P_{\theta_k}^- H^T + R]^{-1}    (A.7)

where $P_{\theta_k}^-$ is the forecast error covariance matrix. The first factor in the Kalman gain, $P_{\theta_k}^- H^T$, can be interpreted as the cross-covariance between the state and observation errors [Gelb, 1974]:

P_{\theta_k \psi_k} = E[(\theta_k - \hat{\theta}_k^-)(\psi_k - \hat{\psi}_k^-)^T]    (A.8)
 = E[(\theta_k - \hat{\theta}_k^-)(H \theta_k + r_k - H \hat{\theta}_k^-)^T]    (A.9)
 = E[(\theta_k - \hat{\theta}_k^-)(\theta_k - \hat{\theta}_k^-)^T] H^T + E[\theta_k r_k^T] - E[\hat{\theta}_k^- r_k^T]    (A.10)
 = P_{\theta_k}^- H^T    (A.11)

since the measurement noise $r_k$ is uncorrelated with the state and its prior estimate. Similarly, the bracketed expression in Eq. (A.7) can be interpreted as the error covariance of the difference between model and observation [Gelb, 1974]:

P_{\psi_k} = E[(\psi_k - \hat{\psi}_k^-)(\psi_k - \hat{\psi}_k^-)^T]    (A.12)
 = E[(H \theta_k + r_k - H \hat{\theta}_k^-)(H \theta_k + r_k - H \hat{\theta}_k^-)^T]    (A.13)
 = H E[(\theta_k - \hat{\theta}_k^-)(\theta_k - \hat{\theta}_k^-)^T] H^T + H E[(\theta_k - \hat{\theta}_k^-) r_k^T]    (A.14)
   + E[r_k (\theta_k - \hat{\theta}_k^-)^T] H^T + E[r_k r_k^T]    (A.15)
 = H P_{\theta_k}^- H^T + R    (A.16)

Therefore the Kalman gain can be rewritten as

K_k = P_{\theta_k \psi_k} P_{\psi_k}^{-1}    (A.17)

The main advantage of using this form of the Kalman gain is that we can avoid the use of the measurement operator, especially when the measurement operator is a nonlinear function of the state.

Appendix B

An Alternate Formula for Updating the State Error Covariance Matrix

The estimation error is defined as

\tilde{\theta}_k = \theta_k - \hat{\theta}_k    (B.1)

Similarly, the error between the noisy observation $\psi_k$ and its prediction $\hat{\psi}_k^-$ is given by

\tilde{\psi}_k = \psi_k - \hat{\psi}_k^-    (B.2)

Substituting (B.1) into the state update equation (A.1), we can rewrite the estimation error as

\tilde{\theta}_k = \theta_k - \hat{\theta}_k^- - K_k (\psi_k - \hat{\psi}_k^-)
                = \tilde{\theta}_k^- - K_k \tilde{\psi}_k    (B.3)

Here we made use of the fact that the estimator is unbiased:

E[\tilde{\theta}_k] = 0    (B.4)

Now, the state error covariance $P_{\theta_k}$ and the cross-covariance $P_{\theta_k \psi_k}$ between the state and observation errors given by equations (A.8) and (A.
12) can be rewritten in terms of equations (B.1) and (B.2), and are given by

P_{\theta_k} = E[\tilde{\theta}_k \tilde{\theta}_k^T]    (B.5)

P_{\theta_k \psi_k} = E[\tilde{\theta}_k^- \tilde{\psi}_k^T]    (B.6)

Taking the outer product and expectation of (B.3) produces

E[\tilde{\theta}_k \tilde{\theta}_k^T] = E[(\tilde{\theta}_k^- - K_k \tilde{\psi}_k)(\tilde{\theta}_k^- - K_k \tilde{\psi}_k)^T]
 = E[\tilde{\theta}_k^- (\tilde{\theta}_k^-)^T] - E[\tilde{\theta}_k^- \tilde{\psi}_k^T] K_k^T - K_k E[\tilde{\psi}_k (\tilde{\theta}_k^-)^T] + K_k E[\tilde{\psi}_k \tilde{\psi}_k^T] K_k^T    (B.7)

Using equations (B.5) and (B.6), equation (B.7) can be rewritten as

P_{\theta_k} = P_{\theta_k}^- - P_{\theta_k \psi_k} K_k^T - K_k P_{\psi_k \theta_k} + K_k P_{\psi_k} K_k^T    (B.8)

Substituting the expression for the Kalman gain, given by equation (A.17), back into the above expression, the covariance update equation is given by

P_{\theta_k} = P_{\theta_k}^- - K_k P_{\psi_k} K_k^T    (B.9)

Bibliography

Advanced Micro Devices, I., 2007: AMD Opteron processor tech docs. http://www.amd.com/us-en/processors/technicalresources/O,,30_182_739_9003,00.html. Advanced Micro Devices.

Anderson, B. and J. Moore, 1979: Optimal Filtering. Prentice-Hall, New Jersey, 357 pp.

Arulampalam, M., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE T. Signal Proces., 50 (2), 188-197.

Baheti, R. S., O. Halloran, and H. R. Itekowitz, 1990: Mapping extended Kalman filters onto linear arrays. IEEE T. Automat. Contr., AC-35, 1310-1319.

Bennet, A., 1992: Inverse Methods in Physical Oceanography. Cambridge University Press, 223 pp.

Bergman, N., 1997: Recursive Bayesian estimation. Ph.D. thesis, Linkoping Studies in Science and Technology, 219 pp. [Available from http://www.control.isy.liu.se/research/reports/Ph.D.Thesis/PhD579.pdf].

Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

Daum, F. E. and J. Fitzgerald, 1983: Decoupled Kalman filters for phased array radar tracking. IEEE T. Automat. Contr., AC-28, 269-283.

Davis, R. E., 1977: Techniques for statistical analysis and prediction of geophysical fluid systems. Geophys. Astro. Fluid., 8, 245-277.

Doucet, A., 1998: On sequential simulation-based methods for Bayesian filtering. Tech. rep., Cambridge University, Dept. of Engineering, 26 pp.
[Available on the MCMC preprint service at http://www.stats.bris.ac.uk/MCMC].

Doucet, A., S. Godsil, C. Andrieu, C. Berzuini, N. Best, W. Gilks, and C. Larizza, 1997: Dynamic conditional independence models and Markov chain Monte Carlo methods. J. Am. Stat. Assoc., 92, 1403-1412.

Evensen, G., 1992: Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. J. Geophys. Res., 97 (C11), 17905-17924.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10143-10162.

Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon. Wea. Rev., 125, 1342-1354.

Evensen, G., 2003: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynam., 53, 343-367.

Gamage, N. and W. Blumen, 1993: Comparative analysis of low-level cold fronts: wavelet, Fourier, and empirical orthogonal function decompositions. Mon. Wea. Rev., 121, 2867-2878.

Gauthier, P., 1992: Chaos and quadric-dimensional data assimilation: a study based on the Lorenz model. Tellus, 44A, 2-17.

Gelb, A., 1974: Applied Optimal Estimation. MIT Press.

Ghil, M. and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Adv. Geophys., 33, 141-265.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian filtering. IEE Proc. F., IEEE, 107-113.

Hasselmann, K., 1988: PIPs and POPs: a general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns. J. Geophys. Res., 93, 11015-11021.

Haykin, S., Ed., 2001: Kalman Filtering and Neural Networks. Wiley, 128 pp.

Houtekamer, P. and H. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796-811.

Ito, K. and K.
Xiong, 2000: Gaussian filters for nonlinear filtering problems. IEEE T. Automat. Contr., 45 (5), 910-927.

Jazwinsky, A., 1970: Stochastic Processes and Filtering Theory. Academic Press, New York.

Julier, S., 1998: A skewed approach to filtering. SPIE Conference on Signal and Data Processing of Small Targets, Orlando, Florida, USA, SPIE, 271-282.

Julier, S., 2002: The scaled unscented transformation. Proceedings of the American Control Conference, IEEE, 4555-4559.

Julier, S., 2003: The spherical simplex unscented transformation. Proceedings of the American Control Conference, IEEE, 2430-2434.

Julier, S. and J. Uhlmann, 2002: Reduced sigma-point filters for the propagation of means and covariances through nonlinear transformations. Proceedings of the American Control Conference, IEEE, 887-892.

Julier, S. and J. Uhlmann, 2004: Unscented filtering and nonlinear estimation. Proceedings of the IEEE, IEEE, Vol. 92 (3), 401-422.

Julier, S., J. Uhlmann, and H. Durrant-Whyte, 1995: A new approach for filtering nonlinear systems. Proceedings of the American Control Conference, Seattle, WA, USA, IEEE, 1628-1632.

Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. Mon. Wea. Rev., 128, 1971-1981.

Kivman, G. A., 2003: Sequential parameter estimation for stochastic systems. Nonlinear Proc. Geoph., 10, 253-259.

Kotecha, J. and P. Djuric, 2001: Gaussian sum particle filtering for dynamic state space models. IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, IEEE, Vol. 6, 3465-3468.

Kotecha, J. and P. Djuric, 2003: Gaussian sum particle filtering. IEEE T. Signal Proces., 51 (10), 2602-2612.

Lefebvre, T., H. Bruyninckx, and J. De Schutter, 2002: Comment on "A new method for the nonlinear transformation of means and covariances in filters and estimators". IEEE T. Automat. Contr., 47 (8), 1406-1409.

Lermusiaux, P. F.
J., 1997: Error subspace data assimilation methods for ocean field estimation: theory, validation and applications. Ph.D. thesis, Harvard University, Cambridge, MA, 402 pp. [Available from Harvard Archives Microfilm, Harvard Depository HU 90.14492.9].

Lermusiaux, P. F. J. and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation, part I: theory and schemes. Mon. Wea. Rev., 127, 1385-1407.

Lorenz, E., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130-141.

Lorenz, E., 1965: A study of the predictability of a 28-variable atmospheric model. Tellus, 17, 321-333.

Lorenz, E., 2005: Designing chaotic models. J. Atmos. Sci., 62, 1574-1587.

Lorenz, E. and K. A. Emmanuel, 1998: Optimal sites for supplementary weather observations: simulations with a small model. J. Atmos. Sci., 55, 399-414.

Lorenz, E. N., 2006: Predictability - a problem partly solved. Predictability of Weather and Climate, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40-58.

Mathworks, T., 2007: MATLAB - the language of technical computing. http://www.mathworks.com/products/matlab/index.html. The MathWorks, Inc.

Meyers, S. D., B. G. Kelly, and J. J. O'Brien, 1993: An introduction to wavelet analysis in oceanography and meteorology: with application to the dispersion of Yanai waves. Mon. Wea. Rev., 121, 2858-2866.

Miller, R., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. J. Atmos. Sci., 51, 1037-1056.

Navon, I., 1998: Practical and theoretical aspects of adjoint parameter estimation and identifiability in meteorology and oceanography. Dyn. Atmos. Oceans, 27, 55-79.

Nelson, A. T., 2000: Nonlinear estimation and modeling of noisy time-series by dual Kalman filtering methods. Ph.D. thesis, Oregon Graduate Institute.

Nørgaard, M., N. K. Poulsen, and O.
Ravn, 2000a: Advances in derivative-free state estimation for nonlinear systems. Tech. rep., Dept. of Mathematical Modeling, Technical University of Denmark., 28 Lyngby, Denmark. N0rgad Magnus, P., N. K. Poulsen, and O. Ravn, 2000b: New developments in state estimation of nonlinear systems. Automatica., 36, 1627-1638. Ohmuro, T., 1984: A decoupled kalman tracker using los coordinates. Proc. Int. Symp. Noise and Clutter Rejection in Radars and Imaging Sensors, IEEE, 451-455. Palmer, T., 1993: Extended-range atmospheric prediction and the lorenz model. Bull. Amer. Meteor. Soc, 74, 49-65. Papoulis, A. and S. Unnikrishna Pillai, 2002: Probability, Random Variables, and Stochastic Processes. 4th ed., McGraw-Hill. Penland, C , 1989: Random forcing and forecasting using principal oscillation pattern analysis. Mon. Wea. Rev., 117, 2165-2185. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2 Edition. Robert, C. P., 1994: The Bayesian Choice: A Decision-Theoretic Motivation. Springer Texts in Statistics, Springer-Verlag, 436 pp. Schei, T. S., 1997: A finite-difference method for linearization in nonlinear estimation algorithms. Automatica., 33, 2053-2058. Schnur, R., G. Schmitz, N. Grieger, and H. von Storch, 1993: Normal modes of the atmosphere as estimated by principal oscillation patterns and derived from quasigeostrophic theory. J. Atmos. Sci., 50, 2386-2400. BIBLIOGRAPHY 82 Simon, D., 2006: Optimal State Estimation, Kalman, H^, and Nonlinear Approaches. 1st ed., Wiley-Interscience, 318-320 pp. Teman, R., 1991: Approximation of attractors, large eddy simulations and multiscale methods. Proc. Roy. Soc. London., 434A, 23-29. Van der Merwe, R., 2004: Sigma-point kalman filters for probabilistic inference in dynamic state-space models. Ph.D. thesis, Oregon Health &; Science University. Van der Merwe, R., A. Doucet, N. de Freitas, and E. 
Wan, 2000: The unscented particle filter. Tech. rep., Cambridge University Engineering Department, 46 pp. [Available at http://citeseer.ist.psu.edu/artile/vandermerwe00unscented. html]. Van der Merwe, R. and E. A. Wan, 2001a: Efficient derivative-free kalman filters for online learning. Proceedings of ESANN, Bruges. Van der Merwe, R. and E. A. Wan, 2001b: The square-root unscented kalman filter for state and parameter estimation. Proceedings of the International Acoustics, Speech, and Signal Processing (ICASSP)., Conference on Salt Lake City, Utah, IEEE. Van der Merwe, R., E. A. Wan, and S. I. Julier, 2004: Sigma-point kalman filters for nonlinear estimation and sensor fusion: Applications to integrated navigation. AIAA Guidance, Navigation and Control Conference, Providence, RI, American Institute of Aeronautics and Astronautics, 5120-5122. Van Trees, H. L., 2001: Detection, Estimation and Modulation Theory. Part I. WileyInterscience, New York, 720 pp. von Storch, H., G. Burger, R. Schnur, and J. S. von Storch, 1995: Principal oscillation patterns: A review. J. Climate, 8, 377-400. BIBLIOGRAPHY 83 von Storch, H. and C. Prankignoul, 1997: Empirical modal decomposition in coastal oceanography. Processes and Methods, K. Brink and A. R. Robinson, Eds., John Wiley and Sons, The Sea: The Global Coastal Ocean I., Vol. 10, 419455. Wallace, J. M., C. Smith, and C. S. Bretherton, 1992: Singular value decompositions of winter time sea surface temperature and 500-mb height anomalies. J. Climate, 5, 561-576. Wan, E. and R. Van Der Merve, 2000: The unscented kalman filter for nonlinear estimation. Adaptive Systems for Signal Processing, Communications, and Control Symposium ASSPCC 2000, Lake Louise, Alberta, Canada, IEEE Communications Society, 153-158. Wan, E. A. and R. Van der Merwe, 2001: Kalman filtering and neural networks, haykin, s, ed. Wiley., chapter 7 - The Unscented Kalman Filter., chapter 7 - The Unscented Kalman Filter. Weare, B. C. and J. 
Nasstrom, 1982: Examples of empirical orthogonal function analysis. Mon. Wea. Rev., 110, 481-485. Welch, G. and G. Bishop, 1995: An introduction to the kalman filter, technical report: Tr95-041. Tech. rep., University of North Carolina, Chapel Hill, NC.