LUMPING OF ATMOSPHERIC ORGANIC CHEMICAL SPECIES
BY MACHINE LEARNING
by
Pruthvi Polam
B.E., Mechanical Engineering
Bangalore University (India), 1999

THESIS SUBMITTED IN PARTIAL EULFILLMENT OF
THE REQUIREMENTS FOR THE DECREE OF
MASTER OF SCIENCE
in
MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES
(CHEMISTRY AND COMPUTER SCIENCE)

THE UNIVERSITY OF NORTHERN BRITISH COLUMBIA
April 2006
© Pruthvi Polam, 2006

1^1

Library and
Archives Canada

Bibliothèque et
Archives Canada

Published Heritage
Branch

Direction du
Patrimoine de l'édition

395 W ellington Street
Ottawa ON K 1A 0N 4
Canada

395, rue W ellington
Ottawa ON K 1A 0N 4
Canada
Your file Votre référence
ISBN: 978-0-494-28366-0
Our file Notre référence
ISBN: 978-0-494-28366-0

NOTICE:
The author has granted a non­
exclusive license allowing Library
and Archives Canada to reproduce,
publish, archive, preserve, conserve,
communicate to the public by
telecommunication or on the Internet,
loan, distribute and sell theses
worldwide, for commercial or non­
commercial purposes, in microform,
paper, electronic and/or any other
formats.

AVIS:
L'auteur a accordé une licence non exclusive
permettant à la Bibliothèque et Archives
Canada de reproduire, publier, archiver,
sauvegarder, conserver, transmettre au public
par télécommunication ou par l'Internet, prêter,
distribuer et vendre des thèses partout dans
le monde, à des fins commerciales ou autres,
sur support microforme, papier, électronique
et/ou autres formats.

The author retains copyright
ownership and moral rights in
this thesis. Neither the thesis
nor substantial extracts from it
may be printed or otherwise
reproduced without the author's
permission.

L'auteur conserve la propriété du droit d'auteur
et des droits moraux qui protège cette thèse.
Ni la thèse ni des extraits substantiels de
celle-ci ne doivent être imprimés ou autrement
reproduits sans son autorisation.

In compliance with the Canadian
Privacy Act some supporting
forms may have been removed
from this thesis.

Conformément à la loi canadienne
sur la protection de la vie privée,
quelques formulaires secondaires
ont été enlevés de cette thèse.

While these forms may be included
in the document page count,
their removal does not represent
any loss of content from the
thesis.

Bien que ces formulaires
aient inclus dans la pagination,
il n'y aura aucun contenu manquant.

Canada

A bstract
Lumping of atmospheric chemical species into different groups is one of the
effective techniques used to reduce the complexity of the reaction mechanisms.
Since lumping of chemical species into different categories is a classification
problem, the application of machine learning by Artificial Neural Networks
(ANNs) is appropriate to address the problem from a computational perspec­
tive. The conventional notation used to represent chemical species is not in
a form which can be directly given as an input for machine learning. Issues
such as what type of chemical information is appropriate and how best it is
given as an input for ANN to obtain good results in classifying the chemical
species into different lumped categories are discussed. Both the supervised and
unsupervised learning methods are explored. The study in this thesis suggests
that supervised ANNs can be gainfully employed for lumping of atmospheric
chemical species when compared to the unsupervised ANNs.

11

C ontents

1

A b s tr a c t....................................................................................................................

ii

Table of C o n te n ts....................................................................................................

iii

List of T a b l e s ...........................................................................................................

vil

List of F ig u res...........................................................................................................

viii

A cknow ledgem ents.................................................................................................

x

In troduction

1

1.1

................................................................................

1

Atmospheric Air Quality Simulation M o d elin g ..........................

2

Chemical Mechanism Reduction M eth o d s................................................

4

1.2.1

Mechanism Reduction without Time Scale A n a ly s is ................

6

1.2.2

Kinetic Lumping Approach

..........................................................

8

1.2.3

Reduction Based on the Investigation of Time S c a l e s .............

11

1.2.4

Approximate

Chemical Mechanisms
1.1.1

1.2

1.3
2

Lumping

in

Systems

with

Time

Scale

S ep aratio n ...........................................................................................

12

1.2.5

Structural and Molecular Lumping Approach

14

1.2.6

Advantages and Disadvantages of Condensed Mechanism Ap­

..........................

proaches ..............................................................................................

20

M otivation.......................................................................................................

21

A rtificial N eural N etw orks

22

2.1

In tro d u ctio n ....................................................................................................

22

2.2

P attern Recognition

23

....................................................................................
iii

2.3

A rch itectu re....................................................................................................

24

2.4

A pplications....................................................................................................

26

2.5

Learning M eth o d s..........................................................................................

26

2.5.1

Supervised L e a r n in g .......................................................................

27

2.5.2

Unsupervised L e a rn in g ...................................................................

28

Area of R e s e a r c h ..........................................................................................

29

2.6
3

4

M eth od ology - G eneration and P ru n in g o f C hem ical Species D atabase 31
3.1

In tro d u ctio n ....................................................................................................

31

3.2

Generation of Chemical Species D atab ase................................................

31

3.3

EPI (Estimation Program Interface) S u i t e .............................................

35

3.4

Pruning of Chemical Species D a ta b a s e ...................................................

36

3.4.1

Functional Croup A p p ro a c h ..........................................................

37

3.4.2

Vapor P r e s s u r e ................................................................................

38

M eth od ology - R ep resen tation o f C hem ical Species

41

4.1

In tro d u ctio n ....................................................................................................

41

4.2

SMILES (Simplified Molecular Input Line Entry System) Notation . .

42

4.2.1

A to m s .................................................................................................

42

4.2.2

B o n d s .................................................................................................

43

4.2.3

B ranches.............................................................................................

43

4.2.4

Cyclic S tr u c tu r e s .............................................................................

44

4.2.5

A ro m a tic ity .......................................................................................

44

M atrix N o tatio n ..............................................................................................

45

4.3.1

................................................

45

The A p p ro ach .................................................................................................

50

4.3

4.4
5

Techniques used in the literature

M eth od ology - R eaction s and Lum ping

53

5.1

Tropospheric Chemical R e a c tio n s .............................................................

53

5.1.1

53

Reactions of A lk a n e s .......................................................................

IV

5.2
6

55

5.1.3

Alkyne R e a c tio n s...............................................................................

57

5.1.4

Reaction of Oxygen-containing Organic S p e c i e s .......................

58

...................................

59

Lumping Approach Employed for Classification

62

6.1

Nature of Input to A N N ..............................................................................

62

6.2

Supervised Learning - Multilayer Feedforward NeuralNetwork . . . .

63

6.2.1

68

6.4

Backpropagation A lg o r ith m ...........................................................

Unsupervised Learning - Competitive Neural networks

......................

69

6.3.1

Similarity Measure L a y e r.................................................................

71

6.3.2

Competitive Layer (or M a x n e t) ....................................................

72

6.3.3

The Combination of these Two L a y e r s ........................................

72

Usecase D ia g r a m ..........................................................................................

74

A pplication o f th e A N N for C lassification o f C hem ical Species: Im ­
plem en tation and R esults

76

7.1

Supervised Neural N e tw o rk s.......................................................................

77

7.1.1

Network Development

....................................................................

77

7.1.2

Training and Testing the Network M o d e l....................................

80

Unsupervised Neural N etw orks...................................................................

85

7.2.1

Network developm ent........................................................................

85

7.2.2

Training and Testing the Network M o d e l....................................

86

D iscu ssio n.......................................................................................................

90

7.2

7.3
8

Reactions of A lk e n e s ........................................................................

M eth od ology - A rtificial N eural N etw orks

6.3

7

5.1.2

C onclusions and Future D irection s

97

8.1

C o n clu sio n s....................................................................................................

97

8.2

Future D ire c tio n s..........................................................................................

99

A p p e n d ic e s

101

Appendix A - Back Propagation A lgorithm ......................................................

101

Appendix B - Chemical Species L is t....................................................................

104

B ib lio g ra p h y

141

VI

List of Tables
3.1 List of the empirical formulas used to generate the chemical species
d a ta b a s e ...........................................................................................................

34

4.1 Some examples of SMILES notation to represent m o lecu les.................

42

4.2 A list of chemical species with SMILES n o ta tio n ....................................

43

7.1 Number of chemical species in the d a t a s e t ..............................................

79

7.2 Network parameters adopted for supervised neural network experimen­
tation

..............................................................................................................

79

7.3 Classification accuracy of lumping chemical species into appropriate
groups with 27 and 35 hidden nodes(H N ).................................................

82

7.4 Classification accuracy of lumping chemical species into appropriate
groups with 65 and 75 hidden nodes ( H N ) .............................................

83

7.5 Best classification accuracy of chemical species into appropriate lump­
ing g r o u p s ........................................................................................................

84

7.6 Network parameters for unsupervised neural n e tw o r k ...........................

85

7.7 Examples for chemical species represented in vector notation (VN) and
normalized vector notation (NVN)

..........................................................

86

7.8 Analysis of results for alcohols - misclassification of chemical species in
supervised learning m eth o d o b tain ed from 5 ite r a tio n s .............................

91

7.9 Analysis of results for other chemical species - misclassification of chem­
ical species in supervised learning method obtained from 5 iterations

vn

92

List o f Figures
1.1 Taxonomy of air quality simulation m odeling.................................

4

2.1 A typical feedforward neural network architecture (after Figure 1in [35])

25

2.2 Competitive neural network (after Figure in [34])

25

..................................

3.1 Methodology of the re s e a rc h ..............................................................
4.1 SMILES notation for molecules with branched structure

32

.....................

43

4.2 SMILES notation for cyclic structured molecules (after figure in [48])

44

4.3 SMILES notation for aromatic chemical species (after figure in[48]) .

44

4.4 BE matrix representations th a t specify atomic connectivity and elec­
tronic environment for (left) ethane, (center) ethyl radical, and (right)
ethene (after Figure 5. in [1])

....................................................................

45

4.5 Reaction m atrix representation for (a) H-abstraction, (b) /3-scission,
(c) Recombination, (d) Bond fission, and (e) Radical addition, (after
Figure 6. in [ 1 ] ) ....................................................................................
4.6 Bond fission reaction in matrix notation

..................................................

4.7 A set of reaction pathways and its m atrix operations
4.8 Various transformations of the chemical species

6.1 Various transformations of the chemical species

48

...........................

50

.....................................

52

5.1 Scission of C-C b o n d ...........................................................................

6.2 Transfer functions

46

55

.....................................

64

........................................................................................

67

6.3 Training process for competitive neural n e tw o r k ...........................
viii

73

6.4 Usecase diagram for the s y s t e m ........................................................

74

7.1 Training of a neural n e tw o rk ..............................................................

84

7.2 Prototype weight vector (W^) formed after 1000 epochs [Columnsrep­
resent the clusters and rows represent the connectivity information]
7.3 Results obtained for unsupervised learning method
7.4 Example for the input data

.

88

..............................

89

........................................................................

93

IX

A ck n o w le d g m e n t s
I would like to express my g ratitu d e to all those who gave th e support to complete
this thesis. I am greatly indebted to my co-supervisor, Dr. M argot M andy for pro­
viding suggestions and encouragement which helped me in all th e tim e of research
and w riting of this thesis. Her comments have been of greatest help a t all times. My
gratitude also goes to my co-supervisor Dr. Charles Brown for his suggestions and
encouragement th a t led to substantial improvements of this thesis. I th an k Dr. Peter
Jackson for serving on my graduate com m ittee and m onitoring th e work w ith his
valuable suggestions. I would also like to th an k th e external exam iner for reviewing
this thesis.
I also extend my sincere thanks to Dr. Alex A ravind and Dr. M aheshwari for their
guidance and assistance during my education.
I th an k th e chair of Com puter Science departm ent, Dr. W aqar Haque, for the
departm ental research assistantship and th e chair of Chem istry departm ent. Dr. Ron
Thring, for providing me w ith teaching assistantship. I would also like to th an k my
co-supervisor Dr. M argot M andy for the financial support from her N atural Sciences
and Engineering Research Council of C anada grant.
I th an k A lida Hall and Janis Shandro for their support during Teaching Assis­
tantship.

I acknowledge all persons in th e program s of Chem istry and C om puter

Science a t th e University of N orthern British Columbia, for their efforts during my
education. I th an k Sr ini vas, Baljeet, Jeyaprakash, Joanne and K ouhyar for their nice
company and suggestions during my education. Lastly, b u t m ost im portantly I am
very grateful for th e love and support of my parents N arasim ha R eddy and V anaja
and my fiancée Swapna Reddy to fulfill my dreams.

X

C hapter 1
Introduction
1.1

C hem ical M echanism s

A chemical mechanism is a detailed description of the sequence of elementary pro­
cesses which occur during an overall chemical reaction. It includes a list of all primary,
secondary, and intermediate reactions which gives certain, essentially quantitative, in­
formation about the fate of the chemical species. The chemical mechanisms which
describe pyrolysis, combustion, atmospheric, and oxidation chemistry of even light
hydrocarbons can be extremely complex [1]. Hundreds or thousands (or more) of
kinetically significant chemical species, elementary reactions, and a large number of
reactive intermediates can be involved. In atmospheric chemistry, even relatively mi­
nor emissions into the atmosphere can play an im portant role in the formation of
undesirable byproducts, and their properties and reactions need to be modeled in
some detail in order to make accurate predictions. There are many existing models
for combustion, pyrolysis, and atmospheric chemistry. For example, in the pyrolysis
process, the consideration of only species with two or fewer carbon atoms generated 11
chem ical species a n d 55 chem ical reactions, but w hen th e num ber of c a rb o n atom s was

increased to three, 99 chemical species and 611 chemical reactions were generated [1].
In combustion systems, kinetic models have thousands of elementary reactions and
a large number of reactive intermediates.
1

For example, there are 3,662 chemical

reactions involving 470 chemical species considered in the simulations of n-hexane
combustion by Glande and co-workers [2] and 479,206 reactions and 19,052 species in
simulation of tetradecane combustion performed by De W itt and co-workers [3].

1.1.1

A tm o sp h er ic A ir Q u a lity S im u la tio n M o d e lin g

Ozone is not directly emitted into the atmosphere, but ozone and other oxides are
formed by the complex reactions between nitrogen oxides (NO^) and reactive organic
chemical species. It requires reliable and scientifically valid methods to formulate
appropriate and cost-effective control strategies to estimate the type of emission re­
ductions needed to reduce the formation of ozone.
Air quality simulation models can be used to address these kinds of problems. Air
quality simulation models are designed to estimate parameters related to air quality
th a t cover large geographic regions. These types of problems can be addressed using
available models of chemical and physical processes which influence the formation of
ozone. Apart from the meteorological data, an im portant component of such mod­
els is the gas-phase chemical reaction mechanism which is used to describe the fate
of emitted chemical species in the atmosphere. Atmospheric modeling can be bro­
ken down into three main components: atmospheric meteorology; emission inventory
modeling which includes quantity, location, and rate of pollutant emissions; and the
chemical mechanism.
1. M eteorological Data: Local and regional scale meteorological processes pro­
vide information such as wind speed, wind direction, cloud cover, and turbu­
lence th at affect the transport, dispersion, deposition, and chemistry of airborne
pollutants.
2. E m ission Inventory: The emissions of biogenic and anthropogenic organic
chemical species and their precursors are simulated. This simulation includes
the quantity, location, and the rate of chemical species emissions which are
required to gain an accurate understanding of the different emission sources.

3. A tm ospheric C hem ical M echanism :

The atmospheric chemical mecha­

nisms within air quality models have grown in complexity.

These chemical

mechanisms can be either constructed manually or the process can be auto­
mated. In manual construction of the detailed chemical mechanisms, chemists
examine which chemical species are most likely to be present in the system and
which reactions are likely to occur under appropriate conditions. The inter­
conversion between reactants and products also introduces a large number of
intermediate species. A huge number of highly coupled reaction steps have to
be examined and there is always a possibility of human error. Since atmospheric
chemistry involves many different organic species, the number of reactions may
become too unmanageable for application in models used to describe a certain
airshed. Manual construction of the chemical mechanisms is extremely timeand labor-intensive to develop even for simple systems. This is because the
emissions of even a light hydrocarbon into atmosphere can involve hundreds of
kinetically significant chemical species, elementary reactions and reactive inter­
mediates.
For the past two decades, it has been identified th at the process of construct­
ing chemical kinetic models could be computationally automated [1,4-8]. In
order to automate the mechanism generation the following points have to be
considered:
(a) The structure of the chemical species should be stored in a form which can
be accessed and manipulated computationally.
(b) All combinations of a species must be considered, but a specific reaction
must not be produced twice.
(c) A program should be able to parameterize all the reactions based on em­
pirical rules associated with the type of reaction and the size and structure
of the reactants.
(d) Because a reaction generator produces a very large number of possible

reactions, it should be possible to filter out automatically the reactions
which are obviously unimportant.
The chemical mechanisms are the most computationally intensive aspects of
photochemical air quality simulation models. This is because of the presence of
thousands of atmospheric chemical species and reactions as well as the amount
of computer time required for the numerical integration of the rate equations
associated with thousands of chemical reactions. This computational burden is
partly due to the fact th a t atmospheric chemical kinetic systems are very “stiff” ,
and involve changes associated with disparate time scales [9]. This becomes a
serious limitation for the application of simulations. The taxonomy of the air
quality simulation models is as shown in the Figure 1.1.

( Modeling Atmospheric Chem is^y at Urban and Regional Scale

(

Chemical M echanism )

(M anual Constructuion J

( Meteorological Data

(

)

Other Inputs

( Automation

(D etailed Chemical M echanism ]

(C ondensed Chemical M echanism ]

(o th e r Grouping Techmques^^^^ ^ ^

(M athem atical A pproach^ ( Structural Approach ) (M olecular A pproach)

Figure 1.1: Taxonomy of air quality simulation modeling

1.2

C hem ical M echanism R ed u ction M eth od s

A fully explicit mechanism for representing gas-phase atmospheric chemistry would
contain 20 000 or more reactions and thousands of chemical species. Due to the large

numbers of chemical species and reactions present in atmospheric chemical mecha­
nisms and limited computational resources, explicit chemical mechanisms are gener­
ally not used in atmospheric air quality simulation models. Rather, the mechanisms
for air quality models are highly condensed in a various ways to substantially reduce
the number of reactions and species in order to be computationally tractable while
maintaining accuracy. Even the developers of highly detailed mechanisms adopt some
method to limit the size. Lumping of chemical species has been widely employed in
the development of condensed mechanisms. Several condensed chemical mechanisms
have been designed in tractable form for air quality simulation modeling. Some of
the mechanism reduction methods are:
1. Mechanism reduction without time scale analysis [10,11]
• Identifying redundant species
• Identifying redundant reactions
• Sensitivity of temperature to rate coefficient
2. Formal lumping procedures [11-13]
3. Reduction based on the investigation of time scales [11,14-16]
• Low-dimensional systems
• Jacobian analysis
• Computational singular perturbation theory
• Slow/inertial manifolds
4. Approximate lumping in systems with time scale separation [17-19]
5. Structural and Molecular lumping approach [20-24]

1.2.1

M ech a n ism R e d u c tio n w ith o u t T im e S cale A n a ly sis

The first step for the mechanism reduction is to find the subset of the detailed mech­
anism which consists of fewer chemical reactions and species and which still describes
the system adequately. The reduced mechanism may be tailored later according to
specific requirements. The primary stage for finding an appropriate mechanism is to
find the redundant species.
Identifying R edundant Species
Species in a chemical mechanism can be classified into 3 categories: Im portant species
which include reaction products or initial reactants; necessary species are the chemical
species which assist in accurate reproduction of the concentration profiles of important
species, tem perature profiles or other im portant reaction features; and the remaining
species which are redundant species [11]. Two methods have been proposed to identify
redundant species.
1. If a species has no consuming reactions, a change in its concentration has no
influence on the concentration of the other species. Therefore, a species which
does have consuming reactions could be classified as redundant if the elimination
of the reactions th a t consume it has no significant effect on the output of the
model when compared with the full model.
2. A species may be considered as redundant if its concentration change has no
effect on the rate of production of im portant species. The Jacobian, J =
where / is rate of production of species and c is concentration, of the ordinary
differential equations which describe the kinetic system is used for this inves­
tigation. An element of the normalized Jacobian,

shows the fractional

change of the rate of production of species i caused by the fractional change
of the concentration of the species j. The influence of the change of the con­
centration of species i on the rate of production of an A-membered group of

important species is given by the sum of squared elements of the normalized
Jacobian [11].

n—l ^

^

The higher the Bi value for the species, the greater its direct effect on the concen­
trations of the im portant species. This provides a quantitative measure allowing the
identification of possible redundant species.
Identifying R edundant R eactions
A reaction is also considered to be redundant if its contribution to the production
rate of each necessary species is small throughout the modeling regime. To do this,
all reaction contributions to each necessary species at several reaction times need to
be considered. This can require the analysis of very large matrices. An alternative
technique for the reduction of the mechanism by eliminating the redundant reactions
is through overall sensitivity measures and principal component analysis of the rate
sensitivity matrix [11]

where
Vÿ is the stoichiometric coefficient of species i in reaction j,
Rj is the rate of reaction j,
kj is the rate coefficient for reaction j, and
f i is the rate of production of species i.
The reactions whose contributions, on the basis of eigenvalues of F^F, are below a
desired precision threshold may be eliminated in th a t region.

S en sitiv ity o f Tem perature to R a te CoeiRcients
Temperature is one of the important features in the reaction modeling, especially in
combustion modeling. Reactions may be modeled over wide ranges of temperature
profiles. Therefore, the sensitivity of the rate of change of tem perature to a change in
rate parameters is of importance. The temperature sensitivities become useful when
the reduced model is required only to produce accurate temperature profiles [11]. The
normalized temperature rate sensitivity is given by
dln{dT/dt)
dlnkj

~QjRj
Cp{dT/dt)

(1.3)

where
T is the temperature,
kj is the rate coefficient for reaction j,
Qj is the exothermicity of the reaction step j,
Rj is the rate of reaction j, and
Cp heat capacity per unit volume.

1.2.2

K in etic L u m p in g A p p roach

In this approach, new lumped variables are related to the original variables by a
mathematical lumping function which can be either linear or nonlinear and depends
on the original species’ concentrations as well as on other significant parameters. The
main aim in this approach to lumping is to identify mathematical procedures which
can be applied to a general reaction system and provide an automatic algorithm
for reduction. Developments of such procedures will involve rigorous mathematical
principles.
Linear methods, where the new species are represented as linear combinations of
the original ones, work well for linear kinetic systems and also provide some degree
of reduction th at may be appropriate for nonlinear schemes. Nonlinear methods are

more general, but can involve complicated algebraic methods which might limit their
use.
The terms “exact” lumping and “approximate” lumping have been used to dis­
tinguish whether the lumped model has used approximations. The technique used
in the kinetic lumping approach is an exact lumping method, which represents the
exact features of the full model.
The kinetics of a dynamic system with n dependent variables can be described
by an n-dimensional ordinary differential equation system dy/ dt — f{y). In mathe­
matical terms, lumping reduces the system to n dimensions if a differential equation
system dy/dt = f{y) can be found th at adequately models the kinetics of interest,
where
n <n,

ÿ = /iW,
h is some linear or nonlinear function of the original variable y,
y represents the lumped species.
Linear lum ping
In effect, the objective of the linear lumping is to construct an n x n lumping matrix
M and its inverse y = M~^y such th a t the original set of species is mapped into n
lumped species.

,

For example M = \

1 1 0

,

| implies th a t y \ = y i + 3/2 and y 2 — ys

0 0 1
A method of determining M is to define the m atrix J^{y) as an expansion

= A + "^Akak{y)

( 1.4 )

k=l

and finding the n subspace th a t is simultaneously invariant with respective all
where, J^{y) is the transpose at Jacobian of f(y) and Ak is viewed as a set of basis
matrices of J^{y). The problem in this approach is finding the invariant subspaces
9

of the original equations, i.e. invariant subspaces of the transpose of the J'^{y) so
th at the eigenvalues of

and J ^ { M~^ My ) are identical. The eigenvalues of the

reduced system at a fixed point form a subset of the eigenvalues for the full equations.
By choosing which eigenvalues are retained it is possible to ensure the equivalence of
the full and reduced systems [11-13]. The simplest technique to find the invariant
subspaces is to determine eigenvalues and eigenvectors of A, where A = Y^I^ qA^
which is equivalent to the invariant subspaces of J'^{y). If the eigenvector matrix is
given by X = (xi, Xg, .....,x„), where xi, xg, ..... ,x„ represent the column of X, then
the subspaces are given by the span of these columns. The lumping matrices M of
different dimensions can then be formed by taking the linear combinations of these
columns of X.
Linear lumping cannot always find suitable lumping schemes or achieve the degree
of reduction required to give an efficient model, especially if the system is nonlinear.
Either the system has to be considered as locally linear by applying the analysis over
suitable time periods or by developing nonlinear lumping methods, which will give
more flexibility in how the lumped species can be represented as a function of the
original species. The first approach can slow down the calculations because of the
necessity of switching between different lumped schemes for different time periods.
N onlinear Lum ping
It is not always possible to achieve linear lumping with the desired lower dimension.
For problems which are highly nonlinear in nature, the lumping scheme has to be
applicable for the desired reaction period. One method for this is to consider the
system to be locally linear over short time periods. If the system is highly nonlinear
then, a high degree of reduction by linear lumping may not be achieved. This is
because it m ay require a large num ber of lum ping schemes. A nonlinear analysis m ay

be a better approach to solve these problems. The disadvantage of this approach
is th at the generalization to nonlinear lumping may involve complicated analytical
theory but this may be compensated by better accuracy of the final model.
10

1.2.3

R e d u c tio n B a se d on th e In v estig a tio n o f T im e Scales

The computational expense of the modeling of chemical kinetic systems is not only
due to the large number of chemical species involved but also due to the fact th a t the
reactions involve a large range of time scales. Such chemical systems are considered
to be stiff because of the range of reaction rates. The removal of the fast species will
automatically lead to a reduction in the stiffness of the system and a computationally
explicit integration scheme may be used. Reduction techniques th at take advantage
of this separation on the basis of time scale analysis include methods such as quasi­
steady-state approximation (QSSA) [25], sensitivity analysis [26], inertial manifolds
approach [27, 28], center manifolds approach [29], and the slow manifold approach
[30,31].
The literature includes many models. Green and co-workers proposed an adaptive
chemistry approach [15] and an optimal reduced kinetic model [14] to avoid includ­
ing species and reactions in regions where they are negligible. The work presented
by Whitehouse and co-workers [16] describes the application of systematic and auto­
mated methods for reducing complex mechanisms while maintaining the accuracy of
the model with respect to important species and features.
A full mechanism which consists of 10,763 reactions and 3,487 species was first
considered. Redundant species were identified and removed from the mechanism. As
a result, 1,396 species and 1,224 reactions were removed from the mechanism. Later,
redundant reactions were removed by calculating the local rate sensitivity matrix and
using overall sensitivity measures and principal component analysis to interpret the
resulting matrix. By using overall sensitivities the mechanism was reduced to 8,410
reactions. A further reduction is done by a second stage of identification of redundant
species using a smaller set of im portant species. Eigenvector - eigenvalue analysis of
the Jacobian is carried out leaving only 6,927 reactions in the mechanism. A further
reduction is carried out by applying principal component analysis on the sensitivity
matrix. Despite extensive experimentation with possible thresholds only 8 reactions
were removed while maintaining the desired accuracy. Finally the mechanism re­
11

duction involved identifying QSSA species and removing them by defining algebraic
expressions for the QSSA species as functions of the non-QSSA species.
These approaches did not always lead to lumped models th a t performed ade­
quately. A more detailed description of these methods is illustrated by Tomlin and
co-workers in [11]. To address the problems in the above mentioned approaches a
technique called approximate lumping for systems with time scale separation has
been developed.

1 .2 .4

A p p r o x im a te L u m p in g in S y ste m s w ith T im e Scale
S ep aration

Exact lumping may not meet the practically desired goals, therefore approximate
nonlinear lumping is more readily used for realistic systems. Li and Rabitz have
presented a general analysis of exact nonlinear lumping which can serve as a starting
point for the development of an approximate nonlinear lumping [17,18].
The kinetics of dynamical systems with n dependent variables can be described
by an n-dimensional ordinary differential equation system d y / d t = f{y)- The linear
differential operator A is defined as

The lumping scheme can then be related to identifying a new basis such th at A
possesses a canonical form in which the corresponding differential equations are par­
tially or completely decoupled. This operator has a one-to-one relationship with the
original kinetic equation system. Exact nonlinear lumping is related to eigenfunctions
and generalized eigenfunctions of the linear differential operator A. These eigenfunc­
tio n s are defined in order to avoid th e difficulty in finding th e inverse of th e lumping

transformation for nonlinear systems [18]. Lumped systems are obtained after the
full determination of the generalized eigenfunctions. Generalized eigenfunctions may
not be completely determined for elementary or other simple functions, thus the ap­
12

plication of an exact lumping scheme is quite restricted. This situation may arise if
the matrix contains rows and columns th at are linear combinations of other rows and
columns.
When the conditions of exact lumping are not fulfilled, approaches using ap­
proximate lumping with small errors may be more appropriate. For linear lumping,
approximate lumping will become more exact by increasing the dimension of the
lumped model, whereas for nonlinear lumping increased accuracy is achieved by in­
cluding more terms in the lumped differential equation. To address the problems
involved in the applications of exact nonlinear lumping, Li and Rabitz developed a
modified approach to constrained nonlinear lumping using numerical methods [17].
If we wish to delete a variable j/j, which is only the function of the

^ i), then

may be constructed by the constrained nonlinear approach. Additional variables are
deleted in the same way. If k variables are thus eliminated, the resultant values of % 's
are substituted into the remaining n-k differential equations for further calculations.
This reduction of the dimension (i.e. a transformation from the basis {yi, ....,y„} to
the basis {yi,

....,y„(,3}) is achieved only if the new variable

is intro­

duced. This is a purely fast variable such th a t it approaches zero in a short time.
Therefore we can set it to zero from this point on without introducing significant er­
rors. The sufficient conditions for it being purely a fast variable are elaborated upon
by Li and Rabitz [17].
In systems with time scale separation, it is possible to carry out approximate
lumping by selecting the slow subspaces, without significant loss of information about
the system. The singular perturbation method may also be used for eliminating the
chemical species which are not of interest. Li and Rabitz [19] have employed a singular
perturbation method to determine an approximate lumped model. New fast variables
are defined, which can be nonlinear combinations of the original variables, and these
fast variables can be eliminated from the scheme using singular perturbation methods.
The application of an algebraic method in nonlinear perturbation theory requires the
existence of a group of small parameters e,. Identifying the e/s can be a nontrivial

13

task even for a small set of equations. Numerical investigation of time scales is used
to minimize this problem. The key point of the singular perturbation method for
approximate lumping is to separate a purely fast variable vector (j>{i) from the vector
z(t), leading to the elimination of 0 (t) from the lumped differential equation system.
It is assumed th a t these fast species equilibrate rapidly. The resulting lumped scheme
will be valid over a wide range of conditions. The faster the variables (f), the fewer
are the terms needed in the approximation. Computational singular perturbation
methods can be used to characterize the relative time scales in order to identify the
faster species. This approach is related to QSSA approach with some extra terms
added. The application of singular perturbation methods over QSSA has showed
significant improvement in the accuracy of the resulting models [11].

1.2.5

S tru ctu ra l an d M olecu lar L u m p in g A p p roach

There are two main diagnostic lumping approaches employed in the literature apart
from the approaches discussed earlier in this chapter. They are the structural ap­
proach and the molecular approach. In the structural approach, the molecular struc­
tures or functional groups within the hydrocarbon molecules provide the lumping
category. In the molecular lumping approach, the numerous emitted organic com­
pounds are represented by a limited number of species, each of which represents a
certain class of compounds. The principal requirement for this approach is th at the
average behavior of the lumped categories must not depart substantially from the
behavior of the individual compounds th a t are lumped. Carbon Bond IV Mechanism
(GEM IV) [22] is an example of the lumped structure approach which was developed
by Grey in 1988-89. The Statewide Air Pollution Research Center (SAPRC) mech­
anism [20] is an example of the lumped molecule approach which was developed by
Carter in 1990. Other lumping approaches are used in the RADM (Regional Acid
Deposition Model) mechanism [24], and RACM (Regional Atmospheric Chemistry
Mechanism) [23] developed by Stockwell. The morphecule approach is a new ap­
proach which is under the development at University of North Carolina [21]. Each of
14

these previously published mechanisms will be discussed in detail below.
C B M IV m echanism
In the CBM IV mechanism, organic compounds are grouped together according to
bond type (e.g. carbon single bonds, carbon double bonds, or carbonyl bonds). The
main advantage of this structure-lumping approach is th at fewer surrogate categories
are needed to represent bond groups [22], [9]. The CBM IV mechanism was evaluated
against 170 experiments conducted in 3 different smog chambers. This mechanism
allocates the chemical species in the atmosphere into four different classes of species:
1. Inorganic species are treated explicitly without lumping.
2. Organic species are represented by carbon bond surrogates. These carbon bond
surrogates are used to describe the chemistry of three different types of carbon
bonds. These three surrogates are described as follows:
(a) The single bonded one-carbon-atom surrogate PAR is used to represent the
chemistry of alkanes and most of the alkyl groups found in other organics.
(b) The double bonded two-carbon-atom surrogate OLE (Olefins) are used to
represent the chemistry of alkenes whose carbon-carbon double bonds are
found in 1-alkenes.
(c) The third surrogate, two-carbon-atom surrogate ALD2 is used to represent
acetaldehyde and higher aldehydes th a t contain a -CHO group and adja­
cent carbon atoms. It is also used to represent 2-alkenes, because these
species react very rapidly in the atmosphere to produce aldehyde products.
3. Organic species are represented by molecular surrogates. There are two molec­
ular surrogates th at are used to represent the chemistry of aromatic hydrocar­
bons.
(a) The surrogate TOL is a seven-carbon species used to categorize monoalkylbenzene structures.
15

(b) The surrogate XYL is an eight-carbon surrogate used to represent dialkylbenzene and trialkylbenzene structures.
4. Organic species like formaldehyde, ethene, and isoprene are also treated explic­
itly because of their unique chemistry or special importance in the atmosphere.
S tatew id e Air P ollu tion R esearch C enter (S A P R C ) M echanism
In the SAPRC mechanism, the chemical species which have similar reactivity con­
tributing to the formation of ozone and other oxidants are lumped together. The
SAPRC mechanism is based on detailed model species for which kinetic and mech­
anistic parameters have been evaluated against over 500 environmental chamber ex­
periments [9], [20]. In this mechanism, the reactions of alkanes, alkenes (excluding
ethene), aromatics and biogenics are represented using generalized kinetic and mech­
anistic parameters specified by the user.

The SAPRC mechanism contains more

organic species than are represented in CBM IV.
1. Inorganic species, formaldehyde, acetaldehyde, acetone, glyoxal, methyl glyoxal
and ethene are explicitly represented in the mechanism.
2. Species such as the higher aldehydes and ketones are represented using the
surrogate species approach.
3. Species such as alkanes, aromatics, and higher alkenes are represented in the
mechanism using generalized reactions and variable kinetic as well as mechanis­
tic parameters assigned for each species by the user.
4. Species such as haloalkanes and haloalkenes, for which reaction mechanisms are
highly uncertain, are represented by the generic mechanism species.
According to the principles of this approach, organic species can be grouped into
three functional groups such as alkanes, alkenes, and aromatics. Alcohols and ethers
are estimated to have similar mechanistic reactivity characteristics to alkanes, there­
fore they are lumped in the same group with alkanes . W ithin each group, the organic
16

species can be specified further according to their reaction rates with the OH radicals
or other oxidizing agents. Generally, there are three classes which can be specified
within each of these three groups:
1. Slowly reacting species, for which only a relatively small fraction reacts during
the model simulation.
2. Rapidly reacting species, which are essentially reacted completely during a oneday simulation.
3. Species with intermediate reaction rates, which fall in neither of the other two
categories above.
Thus within each of these three classes, the organic species can be lumped together.
R A D M (R egional A cid D ep osition M odel) M echanism
The RADM mechanism was developed by Stockwell in 1990 [24] and has been used in
U.S. Environmental Protection Agency’s Regional Acid Deposition Model (RADM).
Like the SAPRC mechanism, the RADM is also a generalized species mechanism.
The hydrocarbons are represented using lumped species with fixed rather than user
specified parameters.
1. Inorganic species, methane, formaldehyde, ethane, ethene, and isoprene are
explicitly represented in the mechanism.
2. Alkanes other than methane and ethane are represented using three species with
respect to the rate of reaction with OH falling within the following range:
(a) Between 3.4 x

and 6.8 x 10“ ^^ ppm~^ min“ ^

(b) Less than 3.4 x 10“ ^^ ppm~^ min“ ^
(c) Greater than 6.8 x 10“ ^^ ppm“ ^ min“ ^
3. Alkenes other than ethene and isoprene are represented using two species:
17

(a) First surrogate species (propene) is used to represent 1-alkenes.
(b) Second surrogate species (trans-2-butene) is used to represent internal
alkenes, cyclic alkenes and dienes.
4. Aromatic hydrocarbons are simulated using two surrogate species:
(a) Toluene to represent aromatics of low reactivity,
(b) Xylene to represent aromatics of high reactivity.
5. Five species are used to represent carbonyl compounds:
(a) Acetaldehyde is used as a surrogate to represent all other aldehydes other
than formaldehyde.
(b) Ketones are treated as a mixture of acetone and methyl ethyl ketone.
(c) Three species (glyoxal, methylglyoxal and a lumped unsaturated dicar­
bonyl) are used to represent dicarbonyls formed during the oxidation re­
action of aromatics.
6. Finally, generalized species are included to represent each of the following 9
organic species:
1) Alkylnitrates

6) Unsaturated PAN’s

2) Formic acid

7) Methyl hydrogen peroxide

3) Peroxyacetic acid

8) Higher organic peroxides

4) Acetic acid and higher acids

9) PAN and higher saturated acyl-

5) Cresol

peroxy nitrates

Stockwell in 1997 [23] developed another mechanism called Regional atmospheric
chemistry mechanism (RACM) which is an updated and extended version of the
RADM mechanism. A completely new condensed reaction mechanism scheme was
included for biogenic compounds like isoprene, a-pinene, and d-limonene. This new
scheme is based on recent kinetic and mechanistic data obtained for isoprene in various
laboratory studies and includes methacrolein as one of the reaction products.
18

M orphecule m echanism
Recently, a method called the morphecule approach [21] has been under development
by the University of North Carolina.

The main objective of this approach is to

eliminate some of the weaknesses in the existing condensed chemical mechanisms.
This approach centers around the use of surrogate species called morphecules, the
composition, concentration, and rate of reaction of which are updated after each
time step in the simulation. Some of the weaknesses of other approaches which the
morphecule approach is attem pting to address are as follows:
1. Hundreds of atmospheric VOCs are grouped into a few lumped surrogates re­
sulting in the loss of individual chemical species characteristics.
2. All the parameters such as reaction rate coefficients and product yields of the
lumped surrogates are kept constant throughout the simulation. The atmo­
spheric chemical mechanism progresses by depleting the more reactive species
first. In the morphecule approach, the rate coefficients for a particular lumped
surrogate and the type of products are updated at each time step.
3. Highly generic products are formed by lumping chemical species in a condensed
mechanism.
4. In all lumping mechanisms, the number of organic radicals included in the mech­
anism is limited, fn the CBM IV mechanism alkyl radicals produced from the
NO to NO2 oxidation reactions are classified as XO2 If the NO concentrations
is very low, XO2 reacts only with itself or with HO2. XOg includes all species
even when NO concentrations is very low. The morphecule approach considers
the rate of RO2 and its time evolution in more detail.

19

1.2.6

A d v a n ta g es and D isa d v a n ta g es o f C o n d e n sed M ech a ­
n ism A p p ro a ch es

Many of these kinds of chemical mechanisms are proposed in the literature, using dif­
ferent techniques and assumptions in order to represent the condensed mechanisms.
Some of the advantages and disadvantages of these condensed mechanisms are as fol­
lows:

Advantages:
It is not feasible to represent explicitly the chemistry of the hundreds of organic species
present in the atmosphere because most of the air quality simulation models require
repeated chemical calculations. The main advantage of the lumping mechanism is
th a t fewer surrogate categories are needed to represent the chemical species resulting
in fewer chemical species and therefore they can be easily implemented in the large air
quality simulation models. This greatly reduces the computational resources required.

D isadvantages:
There are always some uncertainties involved in the condensed mechanisms since
there is a lot of flexibility and judgment involved in choosing the kinetics and prod­
ucts th a t represent the whole group of organics (i.e., the timing and magnitude of
the chemical species produced during the chemical reactions). Despite the fact that
these approaches have certainly reduced the complexity by condensing the chemical
mechanism, it is still not an exact representation of the chemical processes and may
not make accurate predictions.
The major limitation of these kinds of approaches is often associated with inaccu­
racies due to the fact th at these lumped mechanisms have typically been optimized
to fit the observed time concentration profile of a specific species. In order to in­
corporate the errors and uncertainties in kinetics and mechanisms of key reactions,
these studies are frequently updated. These uncertainties vary from the one lumping

20

approach to the other depending upon the assumptions and techniques used.
The limitation th a t there is a loss of accuracy in individual simulations by con­
densed representation of the chemical mechanism may be sufficiently compensated by
increased computational tractability.

1.3

M otivation

Applying computational techniques to chemistry is becoming increasingly popular in
recent times. Computational modeling has become an essential tool to understand
and trace the atmospheric chemical species. There are many attem pts in the literature
to autom ate the generation of chemical species and reaction mechanisms [1,3-8].
Continued advances in the computational techniques are warranted because these
models need to be more accurate and efficient. Automation of chemical mechanisms
is very complex,

fn this context, lumping techniques can be used to reduce the

complexity of the process [12,13,17-24]. We have discussed the various attem pts made
in the literature for reducing the complexity of the chemical mechanisms through
lumping approaches. Most of the existing lumped models which were developed are
done by brute force methods.
fn another context, artificial neural networks are used as effective tools for pattern
recognition and classification [32]. The ability of the neural network is to generalize.
Generalization refers to the neural network producing reasonable outputs for inputs
not encountered during learning. Since lumping of chemical species is a classification
problem, from the computational perspective we believe applying neural networks is
appropriate to solve this problem. The main objective of this approach is to reduce
the “drudgery” involved in lumping problems by using a previously trained neural
network. Once trained, a neural network is then used to classify a new chemical
species which was not involved in the learning process to an appropriate lumped
category.

21

C hapter 2
A rtificial N eural N etw orks
2.1

In troduction

Artificial neural networks (ANN) are relatively crude electronic models based on the
neural structure of the brain. The brain learns from experience as it stores the infor­
mation as patterns. This process of storing information as patterns, utilizing those
patterns, and then solving problems, encompasses a new field in computing involv­
ing the creation of massively parallel networks and the training of those networks to
solve specific problems. The training of these networks is achieved by extracting the
knowledge or patterns from complicated or imprecise data and detecting trends that
are too complex to be noticed either by humans or by other computer techniques. A
trained neural network can be thought of as an “expert” in the category of informa­
tion it has been given to analyze. This expert can then be used to provide projections
in relation to new situations of interest and to answer “what if” questions [32].
When using a neural network, it is always im portant to think about how to repre­
sents the data to the neural network. The usual method of data representation for a
neural network is by a vector. Each element in the vector represents various param­
eters of the pattern th a t influence the decision of assigning the pattern to a certain
class. For example, in forecasting problems for air quality, various parameters such
as concentrations of various pollutants, wind speed, wind direction, and temperature
22

serve as the vector components. The neural network will be able, in this case, to
forecast pollutant concentration.
Sometimes there is a need for input d ata to be normalized, depending upon the
type and the objective of the problem. For example, if we wanted to find the difference
between the two vectors, one approach for this would be to find the dot product of
the normalized vectors. The dot product is maximum if the vectors have a minimal
difference. The need for normalization varies from one problem to another and the
nature of the input data. Normalization of the input vector may be applied effectively
only when this approach does not lead to the loss of any information needed for the
network to be classified appropriately.

2.2

P a ttern R ecogn ition

Pattern recognition is widely used, often under the name of ‘classification’. A pattern
may be loosely defined as any entity th a t could be given a name. For example, a
pattern could be a fingerprint image, a handwritten cursive word, a human face, or
a speech signal [33]. Pattern recognition is defined formally as the process where
a given pattern is assigned to one of a prescribed number of categories. A neural
network may be used for pattern recognition if it first undergoes training. During
training, the network is repeatedly presented with a set of only input patterns for
unsupervised learning and for supervised learning the input patterns are presented
along with a category to which each particular pattern belongs.
Later, once training is term inated or completed, a pattern th a t has not been
seen before (i.e. not been used in training) but belongs to the same population of
patterns used to train the network is presented to a network. The network identifies
the category of the pattern on the basis of the information it has extracted during
the training process. Pattern recognition is achieved if the information is carried in
the relative rather than the absolute values of the vector component and the category
identified is correct or acceptable.

23

2.3

A rch itectu re

The major building block for any ANN architecture is the processing element or
neuron. These neurons are located in one of the three types of layers: the input
layer, a hidden layer, or the output layer as shown in Figure 2.1. The input neurons
receive d ata from the outside environment, the hidden neurons receive signals from
all of the neurons in the preceding layer, and the output neurons receive signals from
all of the neurons in the preceding layer and send information back to the external
environment. It is possible to have one or more hidden layers of neurons in a neural
network depending upon the complexity of the problem. These neurons are linked
by a line of communication called a connection. The way in which the neurons are
connected has a great effect on the operation and the performance of the network.
ANN models can have a variety of topologies or paradigms. Detailed descriptions of
all the paradigms are presented in Neural Network Design by Hagan [32].
A “Feedforward” neural network has been used for the supervised method (see
Fig 2.1) and a “Competitive” neural network has been used for the unsupervised
method (see Fig 2.2) in this thesis work to lump the chemical species into appropriate
categories.
The network topology depends critically upon the number of training examples
and the complexity of the pattern th at the network is trying to learn. The optimum
number of hidden nodes varies from one type of the problem to the other. It is very
difficult to determine a good network topology just from the number of inputs and
outputs. Some authors refer to a “rule of thumb” for choosing the network topology
i.e., the number of hidden nodes should be greater than the sum of input nodes and
output nodes [36]. In this work we have followed this “rule of thum b” to determine
the appropriate number of hidden nodes required for the neural network.

24

Input Layer

Hidden Layer
IN P U T

Net

O U TPU T

Output layer

Activation function

Figure 2.1; A typical feedforward neural network architecture (after Figure 1 in [35])

Similarity Measure Layer

Competitive Layer

M xR

ndist
R xl

Input

M xl
Output

M xl

Learning rule

Figure 2.2: Competitive neural network (after Figure in [34])

25

2.4

A pplications

Artificial neural networks can be used in fields such as signal processing, robotics,
pattern recognition, medicine, chemistry, speech recognition, business and in vision such as face recognition, edge detection, and for visual search engines. Artificial neu­
ral networks have been applied in different fields of chemistry. A detailed description
of the application of ANNs in chemistry and the representation of chemical informa­
tion was given by John A. Burns and George M. Whitesides in 1993 [37] including
applications in biological sequences, in the interpretation of spectra, in sensor arrays,
and Quantitative Structure-Activity Relationships (QSAR).

2.5

Learning M eth od s

The purpose of learning is to train the network to perform the desired task. Learn­
ing rules are the methodologies used in support of training a neural network, where
the learning rules are used to extract information and knowledge of patterns from
the training examples and to adjust the neural networks accordingly. Each input
has an associated weight th at represents the strength of th a t particular connection.
The learning rule allows the network to adjust the two parameters (i.e., the connec­
tion weights and associated biases) in order to associate given input vectors with
corresponding output vectors. The learning rules are methodologies for modifying
the weights and biases dynamically in an efficient way such th at accurate pattern
recognition is achieved. During training periods, the input vectors are repeatedly
presented, and the weights and biases are modified according to the learning rule,
until the network produces the desired associations with the desired accuracy.
There are as many learning rules as there are neural networks. As the architecture
of the neural networks vary, the learning rules also vary, but mostly all the learning
rules are categorized into two main types. They are the supervised learning method
and the unsupervised learning method. In both cases, the neural network is able to
generalize from what it has learned from the training patterns, so th a t when a pre­
26

viously unseen input pattern is presented, the network responds with an appropriate
answer.

2.5.1

S u p e rv ised L earning

In the supervised learning method, each individual output node has an external
“teacher” . Thus for each given input, the output unit is told what the desired re­
sponse ought to be. Supervised learning tries to match the output of the network
to values th at have already been defined. Methods of supervised learning include
error-correction learning, reinforcement learning, and stochastic learning.
The im portant issue in supervised learning is th a t the total training error converges
to a minimum, in th a t the error between the desired output (or the target output)
and the computed network output decreases. One of the most commonly used meth­
ods in learning process is least mean square (LMS) convergence which minimizes the
Euclidean distance between the desired output and the network output. Some of neu­
ral networks which use supervised methods are the Perceptron neural network, the
Adaline neural network, the Feedforward neural network with backpropagation algo­
rithm (BP), and the Learning Vector Quantization (LYQ) [32]. Supervised learning
methods usually perform better than unsupervised learning methods, but supervised
training is not necessarily faster, or more efficient. W hether it is appropriate depends
on the problem.
We have discussed different methods used for lumping the chemical species to
reduce the complexity in the chemical mechanism. Most of the methods used in the
literature consist of rigorous mathematical methods to obtain the lumps such that
the behavior of the lumped chemical species should not depart significantly from the
actual chemical species. The objective of using machine learning through artificial
neural network is to reduce “drudgery” involved in the existing methods by utilizing
a previously trained neural network. A previously classified set of chemical species
may be used for supervised training of neural network.

27

2 .5 .2

U n su p e r v ise d L earn in g

Everyday life is filled with unexpected aspects of situations where exact training sets
do not exist. Unsupervised learning may be used for problems in which we lack
comprehensive prior knowledge.

Unsupervised learning, in contrast to supervised

learning, does not provide the network with target output values. For unsupervised
learning, the training set consists of input training patterns only. As usual, inputs
are applied to the input layer, and the outputs from the output layer nodes are
considered. There are no known corresponding correct outputs, in contrast to the
supervised learning. A raw datum with no prior knowledge about the desired output
for a given input is analyzed and the network is trained without target values. Weights
and biases are modified in response to network inputs only. The network learns to
adapt based on the experiences collected through the previous training patterns. The
only possible way to classify is by enhancing differences as well as similarities from
the training patterns of the data and by arranging the d ata in clusters so th at the
vectors similar to each other are grouped together.
After the network has been tuned to the statistical regularities of the input data,
it develops to form the internal representations for encoding the features of the input
and to create new classes automatically. Unsupervised learning usually performs a
mapping from input to output space, data compression or clustering. Some of the
popular unsupervised neural networks are Grossberg classifier, Kohonen self organiz­
ing feature map, competitive neural networks, and fuzzy associative memory [32].
Unsupervised learning neural network may be used with two layers - an input layer
and competitive layer as in the case of competitive neural network. Figure 2.2. The
input layer receives the available input and the competitive layer consists of neurons
th a t compete with each other for the opportunity to respond to features contained in
the input data. The network operates in accordance to the “winner takes all” strategy.
In this strategy the neuron with the greatest total output wins the competition and
all other neurons are switched off. The methodology of competitive neural network
is discussed in Section 6.3.
28

A potential advantage of the unsupervised learning method is th a t it does not re­
quire any prior classification process for training the neural network. In our approach
of solving the lumping problem by neural network, both supervised and unsupervised
methods have been attempted.

2.6

A rea o f R esearch

Lumping of atmospheric chemical species can be achieved by two approaches: (i) the
lumped structural approach and (ii) the lumped molecular approach. In the lumped
structural approach, organic chemical species are grouped together according to the
types of bonds and functional groups in each chemical species. In the lumped molec­
ular approach, generalized chemical species are used to represent a certain class of
chemical species which have similar chemical behavior. Many techniques have been
used in the literature to lump atmospheric chemical species [20,22-24]. In this thesis
we adopted the functional group approach to lump the Volatile Organic Compounds
(VOCs). A functional group is defined as a group of atoms within the molecule
th a t is largely responsible for certain chemical behaviors of the parent molecule and
the reactions in which it takes part. These functional groups can be used to gener­
ate and characterize a detailed chemical mechanism which gives certain, essentially
quantitative, information about the fate of atmospheric chemical species. Lumping
of chemical species into different categories is a classification problem. From the com­
puter science perspective, classification problems are appropriately conducted by the
application of machine learning by Artificial Neural Networks (ANN). An im portant
application of neural networks is classification by pattern recognition. Once trained,
the neural network is able to recognize similarities when presented with a new input
pattern, resulting in a predicted output pattern. The research in this thesis deals
with the issue of classification of chemical species into different lumped categories
using artificial neural networks. The feasibility of applying neural networks to solve
the lumping problem is not straightforward because:

29

1. The conventional notation used to represent chemical species is not in a form
which can be given directly as an input for the neural networks.
2. A unique or canonical representation of each chemical species is required in
order to avoid ambiguity and misinterpretation.
3. An atmospheric chemical species database needs to be available which is ade­
quate for a neural network to be trained.
Machine learning techniques by artificial neural networks have the potential to
automate the process of classifying atmospheric chemical species into appropriate
lumped categories.

30

C hapter 3
M ethodology - G eneration and
Pruning of Chem ical Species
D atabase
3.1

In trodu ction

The issue of applying neural networks to solve the problem of lumping chemical
species is not straightforward. One of the major problems mentioned previously is
the availability of a database of atmospheric chemical species which is adequate for a
neural network to be trained. Even if the database is generated, not all the chemical
species will be im portant in the atmosphere. It was realized th a t a systematic pruning
of chemical species database would be required. The stages of this work are depicted
in Figure 3.1 and are discussed in this chapter.

3.2

G eneration o f C hem ical Species D atab ase

Once it has been decided to use the neural network to solve the problem, gathering
the d ata for neural network training purposes is the first task. The training data set
includes a number of categories (also called patterns) and a number of cases (also
31

G enerating the C hem ical Species D atabase
--------------using SA M S Softw are

C o nvertin g the g en erated C hem ical Species
^
to SM IL E S N o tation
___

P ru ning o f C hem ical Species w hich
are obviously u nim p o rtan t____

L um pin g o f C h em ical Species into
D ifferent G r o u p s
_

C o nvertin g the C hem ical Species to a
M atrix N otatio n w ith Species U niqueness

C o n verting to V ecto r N otatio n and
R edu cing th e dim en sion ality

A p plication o f A rtificial N eural N etw o rk
for P attern R ecog nition

S upervised L earning
M ethods

U nsu perv ised L earning
M ethods

C lassifying to differen t p ro posed lum ped categories

Figure 3.1: Methodology of the research
called samples) in each category. This requirement of a number of categories and a
number of cases in each category for neural network training frequently presents dif­
ficulties. For most practical problems, the number of cases required will be hundreds
or thousands. Even more cases may be required if the problem is more complex. If
the training dataset is smaller, the information given may not be adequate to train
the n eu ral netw ork.
The purpose of using a neural network is to generalize (i.e., when inputs which
are not in the training set are given to the network, the outputs of the network
should closely approach target values). Generalization requires prior knowledge. T his
32

can be achieved by knowing the relevant inputs (usually in large numbers) and the
input to output relationship th at contains adequate information for the network to
be trained. The effective performance of the neural network lies in the accuracy
of classification. For a neural network to have acceptable performance, there is a
need for a set of chemical species available in the database th a t is adequate for a
neural network to be trained to give correct classification. There are some existing
databases available in the literature such as Master Chemical Mechanism (MCM)
developed by the University of Leeds [38], a database of atmospheric chemical species
developed by Syracuse Research Corporation [39], and the Regional Atmospheric
Chemistry Mechanism developed by Stockwell [23]. The limited number of chemical
species in these databases are insufficient to train a neural network. For example in
MCM database consists of 124 chemical species which are divided into 14 lumped
categories. This is the reason we have been motivated to develop our own database
of VOCs (Volatile Organic Compounds).
Generating the chemical species database has been done with the help of exist­
ing software developed by Spectrum Research group. The software is a ComputerAssisted Structure Elucidation of Q-2 called SAMS (Structure Assembly Made Sim­
ple) [40]. SAMS is a powerful tool used for both Structure Elucidation and New
Chemical Entities (NCEs) generation. SAMS was designed for optimized structure
generation based on known empirical formula and bond constraints derived from small
molecule fragments. This software takes the empirical formula as an input and gen­
erates all possible complete unique structures of the given empirical formula as an
output. A database of chemical species, excluding the cyclic chemical species, has
been generated for the empirical formulas shown in Table 3.1.
Cyclic species are not considered in this thesis because few cyclic species are
present in the atmosphere.

The SAMS software produced thousands of chemical

species isomers for a given empirical formula. This case is especially true with the
empirical formulas which have more carbon atoms (usually for a carbon count greater
than 5). Identifying the possible cyclic species in the atmosphere and considering

33

Table 3.1: List of the empirical formulas used to generate the chemical species
database
n = 1 to 8
c . iH2„+2
Cri^2n
Orii^2n-2

n = 1 to 7
C„H2„ + 20
Cn^2nO

n = 1 to 6

n = 1 to 5

CnH2n-20

CrjH2„_202

^71^271—40

Cnll2n —4O2

OnH2?a—
gO CnH2n_602
OyiH2n—
sO CnH2n_s02
CnH2n-loO C„H2n_io02
CnH2n+202

them in the chemical species database can be considered as one area of future work.
The heavier chemical species are also not considered in this thesis. As the number
of carbon atoms in the carbon chain increases and as the molecular weight increases
the vapor pressure is decreased. Organic compounds with low vapor pressures (i.e.,
less than 10 Pa at 20° C) [41] are not considered to be volatile organic compounds in
the atmosphere.
Only the Volatile Organic Compounds (VOCs) are considered in this thesis be­
cause VOCs are very important trace atmospheric constituents. In the atmosphere
these play a critical role in tropospheric chemistry and can have strong direct ad­
verse effects on the environment depending on their concentrations. VOCs affect
the oxidation capacity of the troposphere and contribute to photochemical ozone for­
mation. The formation of many im portant secondary pollutants in the atmosphere,
such as ozone, peroxides, aldehydes, and secondary organic particulate m atter de­
pend critically on the availability of VOCs. Tropospheric ozone is mainly formed
when pollutants emitted by cars, power plants, chemical plants, and other sources
react chemically in the presence of sunlight. Motor vehicle exhaust and industrial
emissions, gasoline vapors, and chemical solvents are the major sources of NOa, and
VOCs. These two pollutants are the primary reactants in tropospheric ozone forma­
tion. Ozone at ground level is considered as a “bad ozone” because it is a harmful
pollutant and has proved to be toxic to living things. Hence, VOCs are of central
importance for tropospheric chemistry. VOCs include a wide range of carbon based
34

molecules which participate in atmospheric photochemical reactions. The VOCs in­
clude aldehydes, ketones, alcohol, and hydrocarbons with single, double, and triple
bonds.
W ithout including the cyclic species, the SAMS software assisted in generating
4200 unique chemical species for the empirical formulas listed in the table 3.1. All
the chemical species generated by this process may not be im portant or may not exist
in the atmosphere. A systematic pruning of chemical species is done by eliminating
the chemical species which are obviously unim portant in the atmosphere due to low
volatility, lack of sources, or structure instability. EPI (Estimation Program Interface)
Suite assisted in pruning some of the chemical species from the database.

3.3

E P I (E stim ation Program Interface) S uite

The EPI Suite is a group of programs th at provides physical properties, chemical
properties and environmental fate of the chemical species. It was developed by the
Environmental Protection Agency’s (EPA’s) Office of Pollution Prevention and Toxics
and Syracuse Research Corporation (SRC). The vapor pressure program of the suite
has been used to determine whether or not the generated chemical species is a VOC.
The EPI suite [42] provides users with both experimental values and estimated values
of physical and chemical properties which assist in predicting the environmental fate
of a chemical species. This software requires only the chemical structure of the com­
pound in SMILES (Simplified Molecular Input Line Entry System) notation as an
input. A detailed description of the SMILES notation of chemical species structure
is given in the next chapter. The interface of EPI suite transfers a single SMILES
notation to ten separate structure estimation programs th at are part of the EPI Suite:

35

1) Atmospheric oxidation rates

6) Bioconcentration factor

2) Biodégradation probability

7) Aquatic toxicity

3) Henry’s law constant

8) Water solubility

4) Octanol-water partition coefficient

9) Aqueous hydrolysis rates

5) Soil absorption coefficient

10) Melting point, Boiling point,
and Vapor pressure

3.4

P runing o f C hem ical Species D atabase

After generation, the chemical species are first sorted according to different functional
groups. The functional groups considered are:
1. Alkanes
2. Alkenes (with one, two and three double bonds in the molecule)
3. Alkynes (with one, two and three triple bonds in the molecule)
4. Combination of double and triple bonds (with the combinations

of 1 double

and 1 triple bonds, 1 double and 2 triple bonds, 1 double and 3 triple bonds,2
double and 1 triple bonds, 2 double and 2 triple bonds, 3 double and 1 triple
bonds in the molecule)
5. Alcohols
6. Aldehydes
7. Ketones
8. Ethers
9. Esters
10. Carboxylic acids

36

11. Unstable chemical species (chemical species with patterns -C (0 )0 , -C (0 )(0 )C -,
and -COCOC-)
12. Vicinal diols (chemical species with patterns -C (0 )C (0 )C - )
A systematic pruning of the chemical species database was been done by exclud­
ing chemical species which are obviously unim portant in the atmosphere using the
following two approaches:
1. Functional Group Approach
2. Vapor Pressure

3.4.1

F u n ctio n a l G rou p A p p roach

Some of the chemical species which have been excluded from the chemical species
database on the basis of functional group are as follows:
1. E thers and esters: Ethers and esters have been excluded from the database
because the chemical species with these functional groups are considered to be
unim portant in the atmosphere. The atmospheric fate of the chemical species
with these functional groups are not described in comprehensive books such as
Chemistry of Upper and Lower Atmosphere by Finlayson-Pitts and Pitts [43]
and Atmospheric Chemistry and Global Change by Brasseur et al. [44].
2. C arboxylic acids: Carboxylic acids react very slowly in the atmosphere (for
example the life time for HCOOH at [OH] = 1x10® radicals cm~® is approxi­
mately 26 days). Due to the high solubility and stickiness of these acid molecules,
they are likely to be removed by wet and dry deposition rather than by OH rad­
ical reactions [43].
3. V icinal diols: Vicinal diols have been excluded from the database because a
limited number of studies have been done on the oxidation of diols by molecular

37

oxygen as an oxidant. These are unstable in the atmosphere and undergo a re­
arrangement by cleaving the C-C bond and forming either an aldehyde group or
a ketone group. As these two groups have already been included, the molecules
formed by cleaving of the diols will be present in those groups.
4. U n sta b le m olecules: The functional group denoted by SMILES patterns such
as -C (0 )0 , -C (0 )(0 )C -, or -COCOC- is unstable in the atmosphere and breaks
down relatively quickly, forming either aldehydes or ketones which are already
included in the chemical species database.

3 .4 .2

V ap or P re ssu re

Not all the chemical species which were generated by SAMS software may be VOCs.
VOCs are organic chemicals th a t can easily vaporize at ambient temperatures. The
remaining chemical species in the database were further pruned by eliminating the
chemical species which were found to be nonvolatile organic compounds. This can
be done by considering the vapor pressures of the organic species. If the chemical
species does not have a measurable vapor pressure then it is a nonvolatile chemical
species. The recently published EU VOC-directive [41] defines a VOC as an organic
compound which has a vapor pressure above 10 Pa at 20° C, or has a corresponding
volatility under the particular condition of use. In the absence of measurable data,
the vapor pressure of the each chemical species in the database was estimated from
the EPI suite using three methods:
1. Antoine method
2. Modified Grain method and
3. Mackay method.
All three methods use the normal boiling point to estimate the vapor pressure.
The Antoine method [45] was developed for liquids and gases. The general equation

38

is:
1

\n{VP) =

L(r&-c)

(T -c)

(3.1)

where
A H„6 is the heat of vaporisation at the boiling point (cal/mol)
Tft is the temperature of the normal boiling point in Kelvin
C is a constant estimated to be = -18 + 0.19T& (in Kelvin)
T is the temperature in Kelvin
AZft (compressibility factor) is assumed to have the value of 0.97
R is the gas constant = 1.987 cal/(mol K)
Vapor pressure is defined with respect to the reference state of standard pressure 1
atm.
The modified Grain method [46] is a modification of the Watson method. It is
applicable for solids, liquids and gases. Equation 3.2 applies only for compounds
which are liquid or gaseous at the tem perature of interest and equation 3.3 can be
used for solid and liquid compounds.

ln ( y p ) - ^-^-(8.75 + RZTr(T),))

In(VP)

(Ï6 - C):
0.97RT

R 'f (8.75 -I- RZM(T),))

0.97R

1-

(2^ - C )

(3 - 2T*)

(T-C)J

(3.2)

2m (3 - 2r*)™-Vm(T*)
(3.3)

where
VP = vapor pressure [atm]
Kp = compound class specific constant
R = gas constant [cal/mol K]
Tb = boiling point [K]
T = environmental temperature [K]
C = -18 + 0.19 Tfc
T* = T/Tfe
The value of Kp ranges between 0.97 to 1.23. The constant m depends on T* and
39

the physical state of the compound at the temperature of interest:
Liquids: m — 0.19
Solids: T* > 0.6 then m = 0.36
0.5 < T* < 0.6 then m — 0.8
T* < 0.5 then m = 1.19
Mackay [47] fitted the following empirical equation to estimate the vapor pressure:

In(yp) = -(4 .4 + Z7%(T1,)) 1.803 ( p - l ] -0 .8 0 3 X /n

-

6.8

T

(3.4)
The equation includes the boiling point (Tb), the melting point (T ^) and the
tem perature (T) in Kelvins. The melting point is ignored for liquids. EPI reports the
vapor pressure estimate from all the three methods and reports a “suggested” vapor
pressure. The modified Grain estimates the suggested vapor pressure for solids, while
for liquids and gases, the suggested vapor pressure is the average of the Antoine and
the modified Grain estimates. The Mackay method is not used for the suggested
vapor pressure because its application is limited to the chemical species from which
it was derived.
After pruning the list of chemical species by the functional groups, a further
pruning has been done by taking the vapor pressure into consideration. According to
the statement th a t a VOG is an organic compound which has a vapor pressure above
10 Pa at 20° C in the recently published EU VOC-directive, a further pruning of the
chemical species database has been done by eliminating the chemical species from the
database which have a vapor pressure less than 10 Pa. W ith the help of the above
two pruning approaches, a data set representative of the chemical species present in
the atmosphere was obtained.

40

C hapter 4
M ethodology - R epresentation of
C hem ical Species
4.1

In troduction

After generating the chemical species database, the next task was to determine how
to give the information about the chemical species to the neural network. Tradition­
ally a chemical species is represented either as an empirical formula or with structural
notation. This type of notation is not suitable as input to the neural network. There
needs be some intermediate notation which can be easily accessible and should pro­
vide appropriate information about the chemical species to the computer software.
We have employed SMILES (Simplified Molecular Input Line Entry System) nota­
tion to generate a matrix notation to convey the structural information to the neural
network. SMILES notation, while not suitable for the purpose of giving the chemical
species information to the neural network, did serve as an intermediate representation
into which the structural representation of the chemical species could be converted.
SMILES notation can be used later to convert to matrix notation which can be acces­
sible by the neural network. This chapter discusses the methodology of the SMILES
notation and the techniques used in the literature to represent chemical species in
matrix notation. This chapter also presents our approach to m atrix representation of
41

Table 4.1: Some examples of SMILES notation to represent molecules
Molecular Name
Methane
Ammonia
Water

SMILES Notation
C
N
0

Molecular Formula
CH4
NH3
H2O

chemical species and its uniqueness which avoids misinterpretation and ambiguity.

4.2

SM ILES (Sim plified M olecular Input Line En­
try S ystem ) N o ta tio n

SMILES is an effective method which is used widely by chemists to encode chemical
species data for computer use. SMILES is a line notation for chemical structures
which represents the two-dimensional valence-oriented picture th a t chemists often
use to represent a molecule. SMILES notation is written as a single sequence of
characters in the form of strings without any space characters [48]. A space character
denotes the end of the string. Hydrogen atoms are suppressed in this type of notation.
Among several approaches to computerized chemical notation, this line notation is
popular because it represents molecular structure by a linear string of symbols. Rules
for generating SMILES for any chemical structure are illustrated in Tables 4.1, 4.2
and Figures 4.1, 4.2, 4.3 shown in this section.

4 .2 .1

A to m s

Atoms are represented by their atomic symbols. In general the first or only letter of
the symbol is written in an upper-case letter, the second (if present) must be lower­
case. Atoms in aromatic rings are specified by the lower-case letters. Some examples
of SMILES notation to represent an atom in a molecule are shown in the Table 4.1.

42

Table 4.2: A list of chemical species with SMILES notation
Molecular Name
Ethene
Ethyne
3-Heptene(Cis)
3-Heptene(Trans)

4 .2 .2

SMILES Notation

Molecular Formula

C=C

C2H4

c#c
cc/c=c\ccc
cc\c=c\ccc

CyHw
C7 H14

B onds

Single, double, and triple bonds are represented by the symbols - (hyphen), = (equals
sign), and # (hash symbol) respectively. For 'els' chemical species a backward and
a forward slash is introduced immediately before and after the two carbon atoms
linked by the double bond. For ‘tran s’ chemical species two backward slashes are
used. Single bonds between atoms are not explicitly shown. A list of chemical species
with SMILES notation are shown in Table 4.2.

4 .2 .3

B ra n ch es

Branches are specified by enclosing the atoms within parentheses. Branches can be
nested to any depth or stacked as illustrated in Figure 4.1.

H
H H -C-H H
H -Ç —

H

H

Ç— Ç — C— Ç - H

—

^ CC(CXC)CC(C)C

h h - c - h h h - c- h H

H

H
2,2,4-Trimethylpentane

Figure 4.1: SMILES notation for molecules with branched structure

43

4 .2 .4

C yclic S tru ctu res

Cyclic structures are usually represented by breaking one single or aromatic bond
in each ring and labeling the atoms which participated in the broken bond with the
same integer. The bonds are numbered in any order, designating ring-opening (or
ring-closure). For example SMILES notation for l-methyl-3-bromo-cyclohexene is
shown in the Figure 4.2. The notation C C l=C C (B r)C C C l is the canonical notation
according to the lUPAC convention.
Br
Br
(a) CCl=CC(Br)CCCl

Cl

C

cx -^

HjC

(b) CCl=CC(CCCl)Br

Figure 4.2: SMILES notation for cyclic structured molecules (after figure in [48])

4 .2 .5

A r o m a tic ity

Aromatic structures may be distinguished from cyclic species by writing the atoms in
the aromatic ring in lower case letters. For example the SMILES notation of benzoic
acid is shown in Figure 4.3.
C—

Cl

' ®

clccccclC(=0)0

OH

Figure 4.3: SMILES notation for aromatic chemical species (after figure in [48])

44

4.3

M atrix N o ta tio n

4 .3 .1

T ech n iq u es u sed in th e litera tu re

The bond and electron (BE) matrix provides description of connectivity of atoms
within a moiecuie [1,4,7]. Atom connectivity is given by nonzero entries equal to the
bond order in the off-diagonal locations of the matrix. Radicals include an entry on
the diagonal of the m atrix element indicating the unpaired electron. The BE matrices
of ethane, ethyl radical, and ethene are shown in the Figure 4.4.
01110001
10001110
10000000
10000000
01000000
01000000
01000000
10000000

1111000
1000111

1000000
1000000
0100000
0100000
0100000

021001

200 1 10
100000
010000
010000

100000

Figure 4.4: BE matrix representations th a t specify atomic connectivity and electronic
environment for (left) ethane, (center) ethyl radical, and (right) ethene (after Figure
5. in [1])
The sum of the entries in a given row is equal to the number of valence electrons
for th a t atom. The diagonal elements in ethane BE matrix are all zero, as all of the
entire valence electrons for carbon and hydrogen are involved in bonding. The BE
m atrix for the ethyl radical contains a nonzero entry in the diagonal to denote the
unpaired electron defining the radical center. Non-unity entries for multiple bonds
are illustrated in the BE matrix of ethene.
R ep resen tation for C hem ical Species and R eactions
The BE m atrix is well suited for description of chemical reactions. Broadbelt and
co-workers [1] developed an automated system which describes the chemical reaction
mechanism through BE matrix notation. The number of atoms in a molecule ac­
tually affected in a chemical reaction is small. The BE sub-matrix comprises only
45

those atoms which are actually affected in chemical reaction. These sub-matrices are
relatively small and dense. This makes the m atrix operation R -t- B = E simple
and effective. The reaction matrix, R, is added to the reactant sub-matrix, B, to
yield the product sub-matrix, E, which gives the altered connectivity of the atoms
involved in the reaction. The product sub-matrix E can then be incorporated back
into the overall BE m atrix and adjacency structure to represent the entire product
molecules [1,7]. Figure 4.5 shows the reactant sub matrices for different types of
reactions.
H 0 -1 1 R* -1 1 0
C -1 1 0 Ca 1 0-1
R 1 0 -1
0-1 1
S

4 -1 1
Ri 1-1

(b)

(a)

Cl 1-1
C2 -1 1

(c)

Cl 0-1 1
C2 -1 1 0
R. 1 0-1
(e)

(d)

Figure 4.5: Reaction matrix representation for (a) H-abstraction, (b) /3-scission, (c)
Recombination, (d) Bond fission, and (e) Radical addition, (after Figure 6. in [1])
The general form of the chemical reactions shown in the Figure 4.5 is as follows:

(a) H-abstraction:

(b) /3-scission:

R

-

C -

H +

OH

C — Co

C

+

\

/

I

c

Ca +

-0/3

/

\

I

(c) Recombination: Ri- 4- Rg- + M —> R1-R2 + M

46

R -

H2O

(d) Bond fission:

_

Cj

-

\
(e) Radical addition:

— -

Cr

+

/
Ci

/

C2 -

=

C2 +

R-

R

-

C2-

I

I

— Ci

— C2'

I

I

\

The mechanism generator contains three species lists: unreacted components
(molecules and radicals), reacted molecules, and reacted intermediates (radicals).
These lists are visited through an iterative algorithm. Reaction generation begins
by placing the reactants in the unreacted components array, a list of species which
are yet untested for reaction. The first species is then extracted from the unreacted
component array. Sequential tests are done on the molecule to indicate what type of
reactions are plausible, and later the reaction operations are applied. These tests are
done by user-defined rules which define the atoms in a molecule th a t are involved in
the chemical reaction. For example [1] in:
• H -abstraction: Abstraction of any hydrogen atom in a molecule is allowed.
• ,0-scission: /3-scission requires a C atom th a t is in (3 position to the radical
center and it should be singly bonded to the a atom.
• R adical recom bination: A radical recombination can occur if there exist any
two radical centers in a reaction.
• B ond fission: If there exists a carbon-carbon single bond, a bond fission reac­
tion can occur.
• R adical addition: Radical addition reaction occurs if there exists a radical
center with the atoms of a double or triple bond.
Reaction operations are applied on the small area of the m atrix containing the
atoms th a t are actually involved in the reaction. M atrix representation of the product
47

matrix is obtained with altered connectivity of atoms in the molecule. A systematic
connectivity check has to be done on each product matrix to determine the number
of chemical species formed and their correct matrix representation.
The methodology of reaction generation is best explained with the pyrolysis of
pure ethane [7]. The process first checks whether the species is a molecule or radical
because the reaction properties are different. As ethane is a molecule it is subjected
to tests like bond fission, H-abstraction, and radical addition. Ethane should fail
the test for H-abstraction and radical addition as there are no co-reactant radicals in
the species lists for the pyrolysis mechanism. Ultimately the first application is the
bond fission reaction. Bond fission requires a carbon-carbon single bond. The model
is first tested for bond fission by determining if there is a single carbon to carbon
bond. After the single carbon to carbon bond is located the connectivity information
is placed into a BE matrix to compute the reaction. The fission reaction is carried
out by addition of the reactant (ethane) sub-matrix and the reaction (fission) matrix
as shown in the Figure 4.6. A new BE sub-matrix describing the connectivity and
electron configuration of the reactive sites of the product molecule is obtained.
c O il 11000
c 10000111
H 10000000
H 10000000
H 10000000
H 01000000
H 01000000
H 01000000

+

1-1
- 11

_____ ^

c 10111000
c 01000111
H 10000000
H 10000000
H 10000000
H 01000000
H 01000000
H 01000000

Figure 4.6: Bond fission reaction in matrix notation
The connectivity of these reactive sites in the product side is unaltered. The prod­
uct molecule is produced by reassembling the adjacency information of the reacted
and unreacted atoms. This entire matrix is then subjected to a connectivity check to
determine the number of products formed.
The current molecule or radical subjected to the reaction tests (ethane, in this
48

example) is now completed as a reactive component. It is removed from the unre­
acted component list and placed in the appropriate molecule or radical list so th at
subsequently generated species can participate with it in bimolecular reactions. All
combinations of species for a given reaction type will ultimately be tested. Subse­
quent passes through the generation algorithm follow the same logic and treat all
species in a systematic manner to ensure all possible combinations are generated.
The algorithm continues as long as all the new species in the unreacted components
list are completed. Thus, the methyl radical was tested for the pyrolysis radical reac­
tions of recombination, radical addition, disproportionation, /3-scission, and hydrogen
abstraction.
The functionality of the model developed by Broadbelt and co-workers [1] is also
explained with a bimolecular reactions example. A set of chemical reaction pathways
for methyl radical (ie., 2 CH3

C^Hg, C2H6 + OH —> C2H5 + H2O) is considered

in the Figure 4.7. For bimolecular reactions, two reactants are combined together to
form a single reactant matrix.
Step I describes the merging of the two methyl radicals into a single reactant
matrix. In Step II the forward reaction operation describes the recombination of two
methyl radicals to one ethane molecule. The reverse reaction describes the bond fis­
sion of one ethane molecule into two methyl radicals. Steps III, IV, and V describe the
H-abstraction from ethane. Step HI shows the merging of two BE reactant matrices
(i.e., ethane and OH radical) into a single reactant matrix.
This section illustrates the way a chemical species can be represented in matrix
notation and how well this matrix notation is suited for the description of chemical
reactions for computational purposes. Although this notation is not the exact matrix
notation employed in this thesis, the work produced by Broadbelt and coworkers has
introduced the idea of representing the chemical species in a matrix notation which
may be useful for application to neural network processing.

49

c
H
H
H
C
H
H
H

n i l
1000
1000
1000
+
n i l
1000
1000
1000

1 1 1000
0001 1 1

10000000
10000000
10000000

Ri.
Ri

Cl

01000000

Cl

01 1 1 1 0 0 0 0 0
1 0 0 0 0 1 1 1 00
1000000000
1000000000
1000000000
0100000000
0100000000
0 100000000
000000001 1
0000000010

c 01 1 1 1 0 0 0
c 10 0 0 0 1 1 1
H 10000000
H 10000000
H 10000000
H 0 1000000
H 0 1000000
H 0 1000000

1-1

R ecom bination

01000000
01000000

-11

1-

B ond F ission

+
S te p I

o . 11
H 10

Step II

H 00 10 0 0 0 0 0 0
c 010 0111000
0 100 1 0 0 0 0 0 0
H 00 1 0 0 0 0 0 0 0
c OIOOOOOI 1 1
H 0100000000
H 0100000000
H 0000 100000
H 0000100000
H 0000100000

111100000
100001110
100000000
100000000
010000000
010000000
010000000

001
001

Step V

Step III

H 0 -1 1
C -1 1 c
rJi 0

H 010 0000000
c 100 0111000
0 00 1 1000000

0010000000

H - A bstraction c 0 1 0 0 0 0 0 1 1 1
H 0100000000
H 0100000000
H 0000 I 00000
H 0000100000
H 0000100000

Step IV

110
Figure 4.7: A set of reaction pathways and its m atrix operations

4.4

T he A pproach

After obtaining the pruned chemical species list, chemical species in SMILES nota­
tion are converted to the internal matrix notation. The hydrogens are suppressed
in the matrix notation approach used in this thesis. The various transformations of
the chemical species are depicted in Figure 4.8. Figure 4.8(a) shows the molecular
structure of the chemical species where the numbers in the parentheses refer to the
corresponding atoms in matrix notation shown in Figure 4.8(c). The structural nota­
tions are converted to SMILES notation as shown in Figure 4.8(b) for computer use
in order to convert the chemical species from external notation to an internal con­
nectivity matrix notation as shown in Figure 4.8(c). The connectivity matrix is the
hydrogen suppressed matrix. Atom connectivity in the connectivity matrix is given
by nonzero entries equal to the bond order in the off-diagonal locations of the matrix.

50

In order to avoid ambiguity, misinterpretation, and to preserve the atom connectiv­
ity information of each chemical species, a unique representation of chemical species
is done. The uniqueness of the chemical species is obtained by labeling the atom ’s
connectivity as a real number weight in each of the non-zero matrix elements. The
decimal part of the entry represents which type of atom is connected and what type
of bond conformation exists. In Figure 4.8(c) the entities are represented as follows:
1. If a C atom is singly bonded to another C atom, then it is represented as 1.0
2. If a C atom is doubly bonded to another C atom, then it is represented as 2.0
3. If a C atom is singly bonded to an O atom, then it is represented as 1.5. The
5 in the decimal place represents the oxygen atom.
4. Similarly, If a C atom is doubly bonded to an 0 atom, then it is represented as
2.5
5. If there exists a double bond between the two C atoms

which isassociated with

the ‘cis’ conformation, then it is represented as 2.25
6. If there exists a double bond between the two C atoms

which isassociated with

the ‘tran s’ conformation, then it is represented as 2.75. This initial choice for
representing ‘trans’ was later modified as discussed in Section 7.1.2.
After identfying the matrix notation for the chemical species, the next task of
the work is to find the appropriate lumped categories and sort the chemical species
database into those lumped categories. These are discussed in the Chapter 5.

51

C C (=0)C C (C )C
(4)

CH 3
(7)

(SM ILES Notation)

(b)

4-M ethyl-2-Pentanone
(Structural Notation)
(a)

c

(4)

c

c
(5)

0

0

0

0

0

0

2.5

1.0

0

0

0

0

2.5

0

0

0

0

0

0

1.0

0

0

1.0

0

0

0

0

0

1.0

0

1.0

1.0

0

0

0

0

1.0

0

0

0

0

0

0

1.0

0

0

C
(1)

(2)

c

0

1.0

c
(2)

1.0

(1)
(=0)
(3)
C
(4)
C
(5)
(C)
(6)
C
(7)

c

(7)

(c)

Figure 4.8: Various transformations of the chemical species

52

C hapter 5
M ethodology - R eactions and
Lum ping
5.1

T ropospheric C hem ical R eaction s

5.1.1

R e a c tio n s o f A lk an es

Alkanes are hydrocarbons which have only single carbon-carbon bonds and have a
general formula C„H2„+2- The hydroxyl radical has a strong tendency to abstract a
hydrogen atom from an alkane forming an alkyl radical and a water molecule. This
is the only im portant initial chemical reaction of an alkane in the atmosphere [43].
RH + OH -> R + H2O
The other two possible reactions are:
RH + NO3 ^ R -I- HNO3 (Relatively slow)
RH + Cl ^ R + HCl.
The only significant reaction of the alkyl radical in the atmosphere is with O2. It
readily combines with the O2 molecule to form alkyl peroxy radicals.
R + O2
53

RO2

The reactions of RO2 with NO are fast and these are the predominant reactions
oxidizing NO to NO2.
RO2 + NO —> RO T NO2
The second class of reactions of RO2 with NO is the addition of RO2 to NO forming
alkyl nitrate. This type of reaction can be significant for larger molecules (with 5 or
more carbons).
RO2 + NO ^ RONO2
The other two possible reactions of RO2 are with HO2 or with another RO2 .
RO2 + HO2

ROOH + O2

RO2 + HO2 —> Carbonyl compound + H2O
RO2 + HO2 —^ ROH + O3
RO2 + RO2

2R 0 + O2 (Contributes significantly)

RO2 + RO2 —> ROH + RCHO + O2 (Contributes significantly)
RO2 + RO2 —» ROOR + O2 (Small contribution)
The alkoxy radical produced can react by several pathways depending upon the struc­
ture of the molecule. The reaction could be with O2, decomposition, isomerization,
with NO and with NO2.
R eaction w ith O 2: Molecular oxygen, O2, reacts with the alkoxy radical by abstract­
ing the hydrogen, producing a carbonyl group and HO2. The hydrogen abstracted is
the hydrogen bonded to the same carbon as the alkoxy.
RO -j- O2 —>O—C<c -|- HO2
Scission o f a C-C bond: C-C bond scission tends to produce the larger alkyl radical
of those possible. In Figure 5.1, in the case of 2-butoxy radical, path (b) dominates
over path (a) [43].
R eaction w ith N O and N O 2 : These reactions are not as significant as the reactions
with O2.
54

o

a
C H g -^

C - t -

C H .C H ,

CH,

+

CCHoCH

H

|b
O
CH3C

+

CH2 CH3

H

Figure 5.1: Scission of C-C bond
RO -b NO ^ RONO
RO + NO2 ^ RONO2
Alkyl nitrates (RONO2 and RONO) absorb light strongly in the actinic region,
where they appear to photolyse rapidly. Thus other reactions, such as reaction with
OH, do not compete significantly.

5.1 .2

R e a c tio n s o f A lk en es

Alkenes are hydrocarbons which have one or more carbon-carbon double bonds.
In itiation reaction w ith OH: This type of reaction occurs to a limited extent.
This is particularly the case with larger and highly branched compounds. This reac­
tion occurs by adding OH to the double bond, forming an adduct or an intermediate
th a t can decompose back to reactants. In the case of asymmetrical alkenes, such as
propene, the OH radical can add to either end of the double bond but preferentially
adds to form the secondary radical as shown below [43].

CH3CH = CH2 + OH

CH3CH - CH2OH

CH3CH = CH2 + OH ^ CH3CH(0 H) - ÔH2
CH3 ÔH - CH2 OH + O2 -> CH3C(0 0 -)H - CH2OH
CH3C(0 0 -)H - CH2OH + NO ^ CH3C(0 -)H - CH2OH + NO2
55

CH3C(0 0 -)H - CH2OH + NO -4 CH3C(0 N0 2 )H - CH2OH
CH3C(0 -)H-CH2 0 H formed from the above reaction can then undergo the following
process:
1. Reaction with O2
CR1R20H + O2

R 1R 2C =0 + HO2

2. Decomposition (Carbon count < 4)
CH3C(0 -)H - CH2OH

CH3CHO + CH2OH

3. Isomerization (Carbon count > 5)
CH2OH CHgCHG- ^ Isomer
In itiation reaction w ith O3 : Typical peak ozone concentrations found throughout
the world currently range from 30 to 40 ppb for the most remote places, or as high
as 500ppb or more in highly polluted urban areas. Peak levels in the rural-suburban
areasare typically in the 80 - 150 ppb range [43]. Under these conditions the reaction

O3 is the significant reaction for alkenes.
Sym m etrical Alkenes: The initial step in this type of reaction is the addition of
O3 across the double bond forming a primary ozonide
^2 > 0 1 = C2<^4 + 0 3 —^ Primary Ozonide
These primary ozonides are not stable in the atmosphere. They readily break down
to an aldehyde or ketone and to diradical Criegee intermediates
Primary Ozonide —>

> 0 = 0 + ^ >C 00-

Primary Ozonide

> 0 = 0 + ^ >C 00

A sym m etrical Alkenes: The following are the reactions of asymmetrical alkenes
and O3 with their approximate branching ratios:
RICH = CH2 + O3

RICHOO- + ECHO (0.5)

RICH = CH2 + 0 3 ^ RICHO + HCHOO- (0.5)
56

^ >C = CHg + Û 3 ^ ^ >COQ. + HCHO (0.65)
m >C = CHa + O3 -» R1C(0)R2 + HCHOO- (0.35)
R2
^ . _ _ CHR3 + 0 3 - ^ ^ > C 0 0 - + R3CH0 (0.65)

Rl
Rl
R2

>C = CHR3 + O3 ^ R 1C(0)R2 + R 3C H 00- (0.35)

Term inal A lkenes and Internal Alkenes: Alkenes can be classified as terminal
or internal alkenes. Alkenes with a double bond involving a terminal C (such as
propene) are called terminal alkenes. Internal alkenes, such as trans-2-butene, have
double bonds within the molecule which do not involve C atoms in the terminal
positions. The reaction reactivities and the end products of internal and terminal
alkenes with respect to the OH radical and O3 are significantly different. Terminal
alkenes react in the atmosphere to form aldehydes while internal alkenes react in the
atmosphere to form ketones.

5 .1 .3

A lk y n e R e a ctio n s

Alkynes are hydrocarbons which have one or more carbon-carbon triple bonds. The
only significant initial reaction of alkyne is with the OH radical. The OH radical is
initially added to the triple bond. These reactions give major products with corre­
sponding dicarbonyls: acetylene gives glyoxal [(CHOjg], propyne gives methylglyoxal
[CH3COCHO] and 2-butyne gives biacetyl [(CH3CO)2]. The following is the reaction
mechanism for the ethyne [43]:
HC = CH + OH ^ HC(OH) = CH
HC(OH) = CH + O2 ^ HC(OH) = CHOOHC(OH) = CHOO + N 0

HC(OH) = C(0-)H -t- NO2

HC(OH) = C (0-)H ^ HC(OH) - C (H )= 0
HC(OH) - C (H )=0 -b O2 ^ (CH0)2 + HO2
Term inal A lkynes and Internal Alkynes: Similar to the alkenes, alkynes are
also classified as terminal or internal alkynes. Alkynes with a triple bond involving a
57

carbon atom at the end of the molecule, such as 1-butyne, are called terminal alkynes.
Internal alkynes, such as 2-butyne, have a triple bond within the molecule. Terminal
alkynes react with OH radical resulting in the formation of formaldehyde and other
aldehyde radicals as the end product whereas in the case of internal alkynes the final
products are ketones and some other aldehydes.

5 .1 .4

R e a c tio n o f O x y g en -co n ta in in g O rganic S p ecies

The reactions of oxygen-containing organics with the OH radical are fast. In most
cases, NO3 reactions are slower than reactions with the OH radical, the latter being
the primary oxidant for the oxygen-containing compounds [43].
Aldehydes: Aldehydes are chemical species which have a carbonyl functional group
( 0 = 0 ) in the terminal position. In an aldehyde, at least one hydrogen atom is bonded
to the carbon of the carbonyl group, so the aldehyde functional group is -OHOReactions with aldehydes occur with abstraction of the relatively weakly bonded
aldehydic hydrogen
R C H O + OH (N O 3 , Cl)

R C O + H 2 O (H N O 3 , HCl)

The RCO radical which was obtained is then added to Og. An illustration of the
reaction mechanism for aldehydes is as follows:

RCO + O2

RC(=0)00-

RC(=0)00- + NO

RC(=0)0 + NOg

R C (=0)00 + NOg ^ RC(=0)-00N0g
RC(=0)0 ^ R + COg
K etones: Chemical species with a carbonyl group functional group(C =0) in the
nonterminal position are referred to as ketones. The reactions of ketones are similar
to those of alkanes with abstraction of hydrogen by OH, N O 3, and Cl.
A lcohols: Alcohols are the chemical species which have a hydroxyl (-0H) functional
group. The possible hydrogen abstraction sites for the reaction with alcohols are:
58

1. The alcohol 0-H
2. Primary hydrogen
3. Secondary hydrogen
4. Tertiary hydrogen
The reaction with OH tends to abstract the hydrogen th at is the most weakly bonded.
The hierarchy of the C-H and 0-H bond strengths are:

Tertiary < Secondary < Primary < 0-H .

5.2

Lum ping A pproach E m ployed for C lassifica­
tion

In our approach, we have employed features of both the structural and the molecular
lumping approaches. The most important oxidation reaction for alkanes is reaction
with an OH radical. The OH radical abstracts one hydrogen atom to form an alkyl
radical and a water molecule. Under atmospheric conditions oxygen reacts rapidly
with the alkyl radical to form an alkyl peroxy radical. The peroxy radical reacts with
NO to make NOg, organic nitrates, and unstable alkoxy radicals. The latter may
decompose or isomerize.
Hydroxyl radical reaction rate coefficients and ozone reaction rate coefficients for
VOCs can be estimated from structure-reactivity relationships. Kwok [49] developed
an algorithm to estimate the hydroxyl radical reaction rate coefficients for gas-phase
organic compounds. The EPI suite provides the experimental and estimated reaction
rate coefficients for all the VOCs with respect to OH and O3 using structure-reactivity
relationships.
For example, hydrogen atom abstraction rate coefficients from C-H and 0-H are
based on the estimation of group rate coefficients for hydrogen atom abstraction from
59

-CHa, -CH2-, >CH-, and -OH groups. The rate coefficient for hydrogen atom ab­
straction from these groups depends on the identity of the substituents attached to
them [49].

The hydroxyl rate coefficient for n-butane is as shown below;

CH3 -CH2 CH2 CH3 ^

tprimF(-CH2 -) -b

(-CH3 ) f (-CH2 - ) -fi

(-CH3)F(-CH2-) -b &pHmF(-CH2 -)
where
^(-CHa-X) = k p r i m F { X ) ,

A(X-CH2-Y) =

( X ) f (Y),

&(X-CH<^) -

( X ) f ( Y ) f (Z)

and kprim, kgec, and ktert are derived

using the expression k = CT^e^^/^, where

C (cm^ molecule"^ s“ ^) and D (K) are the temperature-dependent parameters.

The

values of these parameters for the group rate coefficients for H-atom abstraction from
-CHa, -CH2-, >CH-, and -OH groups are given in Table 1 of Kwok’s paper [49].

Relative rate coefficients (i.e., O a:O H ) may be determined with the O H reaction
rate coefficient based on ambient concentration of 1.5 6 x 10® O H cm~® and the O3
reaction rate coefficient with O3 ambient concentration of 7x 10^^ molecules cm"® [42].
Relative rate coefficients are calculated to decide whether a chemical species will
initiate the oxidation reaction preferentially with an O H radical or with O3. All the
chemical species which react preferentially with O3 are lumped into a separate group
because the product species follow a different chemical pathway of decomposition.
For example, O3 reacts across the double bond of alkenes to make a very short-lived
intermediate th at decomposes to carbonyl and Criegee intermediate radicals.
The im portant chemical loss process for aldehydes includes photolysis and the
reaction with OH, N O 3 radical and oxygen atom. All these reactions are first ex­
pected to form acylperoxy radicals. The subsequent reactions lead to the formation
of peroxyacyl nitrates, alkyl-peroxy radicals, and eventually, formaldehyde.
60

Reaction of the alcohols in the atmosphere are of particular interest because of
their use as alternative fuels. The reaction of alcohols with the OH radical depends
on the possible hydrogen abstraction sites. For the reaction of methanol with the OH
radical, the hydrogen abstraction from the alkyl group is predominant. In the case
of the reaction of ethanol with OH radical, the secondary hydrogen abstraction is the
predominant reaction.
Terminal chemical species refer to those species which have a double or triple
bond in the terminal position of the carbon chain. Thesechemical species react in
the atmosphere to form aldehydes. Internal chemical species refer to the chemical
species which have a double bond in the internal position of the carbon chain. These
chemical species react in the atmosphere to form ketones.
The lumping of chemical species is done based on the knowledge gained from the
existing lumped mechanisms [9,20-24] and from the reviews on tropospheric chemical
mechanisms [43,44]. The lumped groups utilized in this thesis are as follows:
1. All the chemical species whose dominant reaction is with ozone are lumped into
a group.
2. All alkanes except methane are grouped together.
3. All aldehydes are grouped together.
4. All ketones are grouped together.
5. All alcohols are grouped together.
6. Alkynes and alkenes which react predominantly with OH radical and have a
double or triple bond at a terminal position in a carbon chain are grouped
together.
7. Alkynes and alkenes which react predominantly with OH radical and with a
double or triple bond at internal position in a carbon chain are grouped together.

61

C hapter 6
M ethodology - Artificial N eural
N etw orks
6.1

N ature o f Input to A N N

The reliability of prediction is not only dependent on the architecture of the artificial
neural network, but also on the input data. To obtain reliable results, the input data
should be concise and minimize redundant or irrelevant information. To achieve this,
the matrix notation is collapsed into a final vector notation of the chemical species
which is suitable as an input to the neural network. This is depicted in the Figure 6.1.
To represent the chemical species as an input to the neural network in an effective
way, the following steps were applied:
1. Since the matrix has diagonal symmetry, it contains redundant information.
The information which is present below the diagonal of the matrix is sufficient
to represent a structure of the chemical species. The upper half of the matrix
is redundant and will not be considered further.
2. All input vectors should have the same size, but the length of the chemical
species in SMILES notation varies from one chemical species to another re­
sulting in different sizes of the matrix. We restricted the maximum size of
62

the SMILES notation and matrix restricting molecules to under by 8 carbon
atoms. The remaining parts of the matrix are filled with zeros which describes
no connectivity as shown in Figure 6.1(e).
3. There are large numbers of zeros in the matrix notation even after reducing the
size of the matrix. These zeros give little information to ANN for classification
and if the input size of the vector is large, the performance of the neural network
may decrease. Since we are not considering any radical species in our work, the
entire diagonal of zero elements (diagonally shaded region in Figure 6.1(f)) is
removed.
4. Since in this work, we did not consider the cyclic species involving the first and
last atoms, the shaded entities of the matrix always have zeros as shown in
Figure 6.1(f).
5. Converting from this matrix notation to the vector notation is relatively straight­
forward. All the rows excluding the shaded entities are arranged in a sequential
manner (as shown in Figure 6.1(g)) which preserves the connectivity informa­
tion.
6. The ratio of rate coefficients (OaiOH radical) is added as a last element of the
chemical species input vector.

6.2

Supervised Learning - M ultilayer Feedforward
N eural N etw ork

Feedforward neural networks are the most popular and most widely used models of
ANN in many practical applications. They are applied to a wide variety of chemistryrelated problems. This class of networks consists of multiple layers of computational
units interconnected with each other in the feedforward way. Each neuron in one
layer has a directed connection to each of the neurons in the subsequent layer. The
63

(4)

(5)^

(1)

CC (=0)C C (C )C

CHj

(SM ILES Notation)

(b)

(7)

4-MethyI-2-Pentanone
(Structural Notation)
(a)

c
(1)
c
(2)
(=0)
(3)
w
(§
(C)
(6)
C
(7)

C
(1)

c
(2)

c
((g ) (4)

c
(5)

c
(7)

0

1.0

0

0

0

0

0

c

0

1.0

0

2.5

1.0

0

0

0

c

1.0

0

0

2.5

0

0

0

0

0

(=0)

0

2.5

0

0

1.0

0

0

1.0

0

0

c

0

1.0

0

0

0

0

0

1.0

0

1.0

1.0

c

0

0

0

1.0

0

0

0

0

0

1.0

0

0

(C)

0

0

0

0

1.0

0

0

0

0

0

1.0

0

0

c

0

0

0

0

1.0

0

0

(d)

(c)

c

0

c

1.0

0

(=0)

0

2.5

0

c

0

1.0

0

0

c

0

0

0

1.0

0

(C)

0

0

0

0

1.0

0

c

0

0

0

0

1.0

0

0

0

0

0

0

0

0

0

(=(),

(C)

1 .0

0

(e)

1.0 2.5 1 .0 0 0 0 1 . 0 0 0 0 1 . 0 ODD 1 .0 0 0 0
(8)

Relative Rate Coefficient (Ozone/OH Radical)

Figure 6.1; Various transformations of the chemical species

64

first layer is called the input layer where each unit represents the external inputs and
does not perform any calculations. The last layer is called the output layer where the
output units of the network as a whole represents the “answer” . The layers between
these two layers are the hidden layers which do not correspond to either external
inputs or outputs of the network. Each unit in the layer generates an output which
is a simple function of its inputs, and may include external data or the outputs of
previous layers. Each unit takes the output generated by the previous layer as an
input, performs calculations, and provides its output as an input to the subsequent
unit in a sequential manner. The coefficients, which multiply the inputs to a unit,
are known as “weights” . The weight between the unit i and the unit j of a network
is indicated with Wj, , assuming th a t all elements of a layer are connected to all
elements of the successive layer. In this way, the connections between two layers can
be represented by a weight matrix W. In this matrix the entry ji corresponds to the
connection between node i and node j in the succeeding layer.
These weight coefficients are used to make the network “learn” training data. The
processing element multiplies each input by its connection weight and usually sums
these products. The summed output n, often referred to as the net input, is used
as input to the transfer function which produces the neuron output a. The transfer
function can be linear or nonlinear. The transfer function is chosen depending upon
the specification of the problem th a t the neuron is attem pting to solve. There are
different varieties of transfer function which are used depending upon the type of
problem and threshold value. Some of the standard transfer functions are [32] :
• The hard limit transfer function sets the output of the neuron to 0 if the function
argument is less than 0; or 1 if its argument is greater than or equal to 0.
• The symmetrical hard limit transfer function sets the output of the neuron to
-1 if the function argument is less than 0; or 1 if its argument is greater than
or equal to 0.
• The linear transfer function where the output of the transfer function is pro­
65

portional to its input.
• The saturating linear transfer function sets the output of the neuron to 0 if the
function argument is less than 0; the output is equal to function argument if its
arguments is greater than or equal to 0 and less than or equal to 1; or 1 if its
argument is greater than 1. The input/output relation is:
a =

0 if n < 0

a=

n

a =

1 if n > 1

if 0 < n < l

• The symmetric saturating linear transfer function sets the output of the neuron
ing the range -1 to 1 as follows:
a = -1

if

n < -1

a= n

if - l < n < l

a= I

if n > 1

• The log-sigmoid transfer function takes the input of any value from -foo and
-oo and maps the output into the range 0 to 1. The in p ut/output relation is:

1 + er
In our approach, log-sigmoid and saturating linear transfer functions have been
used in the hidden layer and the output layer respectively for the feedforward neural
network. The feedforward neural network uses the backpropagation algorithm. The
backpropagation algorithm uses differentiation of the transfer function. Therefore the
most popular choice of the transfer function for the feedforward neural network in
the hidden layer is sigmoidal because it has a continuous derivative.
The multilayer feedforward neural network learns using the backpropagation algo­
rithm. Backpropagation can be applied to a large number of problems. It is successful

66

a = logsig (n)

a = satlin (n)

L o g -S ig m o id T ran sfe r F u n ctio n

S aturating L in ear T ran sfe r Function

Figure 6.2: Transfer functions

in practical applications. The backpropagation algorithm is simple to use, and reaches
an acceptable error level reasonably quickly. The network is provided with the inputoutput pairs and then tries to find the weights which minimize the squared error of
the approximation produced by the network. In this case the best solution is obtained
by the least mean squared method. In a multilayer networks with nonlinear transfer
functions, the relation between the network weights and the error is more complex.
An iterative gradient descent has to be used in order to minimize the squared error
for the training set. This is achieved by means of the backpropagation algorithm.
As the name implies in “backpropagation” , the error of the network is propagated
“backwards” from output nodes to hidden layer(s). In this algorithm, the input data
is repeatedly presented to the network. W ith each iteration of presented d ata an
output is generated. The output of the neural network is compared to the target
output and an error is computed. The error computed here is the mean square error
(F(x) = E(e^) = E[(t-a)^]). The vector of network weights and biases is ‘x’. The
target output is denoted as ‘t ’ and ‘a ’ is the network output. This error is fed back
to the neural network and used to adjust the weights such th a t the error is decreased
in each iteration. The weights are randomly assigned to the network in the first
iteration. As a result in each iteration of the training process, the network model

67

gets closer to producing the desired output. The training of a neural network can be
done by finding those weights th a t minimize the network’s error on the given samples.

6.2.1

B a ck p ro p a g a tio n A lg o r ith m

The backpropagation algorithm consists of two phases: a forward propagation and
a backward propagation. In forward propagation, the input units are applied to the
input neurons and all the outputs are calculated using the sigmoid threshold of the
inner product of the corresponding weight and input vector. The outputs of the pre­
vious layer k are propagated to the next layer k+l. Finally, a set of outputs are
produced as the actual response of the neural network. During the forward propaga­
tion the weights of the network are not changed. The network output is compared
with the target output, and calculates the overall system error by squaring the differ­
ence between this pair of vectors. The accumulated error for all of the input-output
pairs is defined as the Euclidean distance in the weight space. During the backward
propagation phase this error signal is then propagated backward through the network.
The weights are adjusted in the direction of decreasing error in accordance with an
error-correction rule in order to attem pt to minimize this distance using the gradient
descent approach. The objective of the gradient descent approach is to make the
function decrease for every iteration. Gradient descent is a function approximation
which uses the derivative of a function to determine the steepest descent of the slope.
The function moves in the negative direction of the slope so th a t the value of the
function is reduced in each iteration. A detailed description of gradient descent in
backpropagation algorithm is illustrated in Appendix A.
Backpropagation provides a way to compute the necessary gradients, so th a t the
network finds a local minimum of the training error function with respect to network
weights. For the multiple layer neural network the error is not an explicit function
of the weights in the hidden layer, therefore the differentiation is not computed so
easily. Because the error is an indirect function of the weights in the hidden layer, a
chain rule of differentiation is applied to compute the gradient of the error function
68

with respect to the weights [32].
If yj{n) is the value of the j t h unit at n th iteration, for each Wji connecting the
output of the neuron i to the input of the neuron j at iteration n, one can write the
partial derivative of the error function as:

dS{n) _ dS{n) dej{n) dyj{n) dvj{n)
dwji{n)
dej{n) dyAn) dvj{n) dwji{n)
where
6 (n) refers to instantaneous value of the error at iteration n.
6j (n) refers to error signal at the output of neuron j at an iteration n.
yj{n) is the functional signal appearing at the output at iteration n.
Vj{n) refers to the induced local field (i.e., weighted sum of all inputs
plus bias) produced at the input of activation function of neuron j at iteration
n.
4>j is the transfer function.

The update for the weights will be Awji —

where, 77 is a parameter

known as the learning rate. The detailed mathematical description of the backprop­
agation algorithm is explained in the Appendix-A.

6.3

U n su p ervised Learning - C om p etitive N eural
netw orks

Unsupervised learning performs a mapping from input to output space. Unsupervised
learning is used in a wide variety of fields, the most common of which is cluster
analysis.

As discussed in Chapter 2, unsupervised learning does not provide the

network with the target outputs. For unsupervised learning the training set consists
of input patterns only. Raw data with no prior knowledge of classification is to be

69

analyzed. The network learns to adapt based on the experiences collected through
the previous training patterns. The furthur reading can be sought in [32,34].
Cluster analysis is one of the classification methods th at are used to arrange a set
of cases into a cluster such th at the cases within the cluster are more similar to each
other when compared to the cases in the other clusters. Such a process can be readily
performed using simple competitive networks. The architecture of the competitive
neural network is shown in the Figure 2.2. In the network p^ and aj indicate ith input
node and j t h output node respectively. Wj is the prototype weight vector stored in
the j t h row of weight matrix W and connected to the j t h output node from all the
input nodes.
The main tasks of the competitive neural network are:
• The initial values of the prototype weight vectors are assigned as random values
in order to avoid the possibility of any user influence.
• For each input vector p (R x l) determine the winning neuron, j, for which its
weight vector, Wj, is closest to the input vector. For this neuron, the output aj
—

1.

• Adjust the weight vector of the winning neuron, Wj, in the direction of the input
vector, p with the training algorithm. Weight vectors of the losing neurons are
not modified.
To perform the above-mentioned objectives, the competitive neural network is
divided into two different layers of neurons which perform two different actions:
1. The similarity measure layer which measures how much the input vector resem­
bles the weight vector of each perceptron.
2. The maxnet layer (or competitive layer) which declares the weight vector closest
to the current input vector as the winner.

70

6,3.1

S im ilarity M ea su re Layer

The main objective of this layer is to perform the correlation between the input vector
and the weight vector. The correlation is done by generating the distance signal that
indicates the similarity between those two vectors and leads to the formation of a
prototype weight vector. The ||ndist|l box in the Figure 2.2. accepts the input vector
p and the input weight matrix W, and produces the vector having M elements which
describes the distance between the input vectors and the weights vectors.
In order to represent the output node with a cluster for a given input, each per­
ceptron at the similarity measure layer calculates distance between the input vector
and the weight vector by two methods:
1. Measure of the similarity for the normalized vector
2. Measure of the similarity for unnormalized vector
M easure o f th e sim ilarity for th e norm alized vector:
The distance between the input vector and the prototype weight vector can be
calculated by the weighted sum. This weighted sum can be interpreted as the dot
product of the input vector p and the weight vector W. Through a weighted sum each
perceptron calculates a measure of how closely its weight vector resembles the input
vector. The highest dot product indicates the highest similarity between these two
vectors.

max (a) = ^

WiPi = W * p

M easure o f th e sim ilarity for th e unnorm alized vector:
The most obvious similarity measure for the unnormalized vectors is the Euclidean
norm, i.e., the magnitude of the difference vector S.

d = \ \ P - W\\ =

+
71

+ ....... + J 2 =

This measure is relatively complex to calculate, hence the square of the magnitude
of the difference vector is used:

i=l

where N is the size of the input vector.

6 .3 .2

C o m p e titiv e Layer (or M a x n e t)

The objective of the competitive layer is to declare as winner the node which has a
weight vector th a t is closest to the vector of the input. This layer is a fully connected
network with each node connecting to every other node, including itself. The basic
idea is th a t the nodes compete against each other by sending out inhibiting signals
to each other. In this layer the net input n is computed by adding the distance signal
and the biases b. The competitive transfer function in the competitive layer accepts
the net input vector and returns the neuron outputs of zero for all neurons except
the winner which has an output of 1.

6 .3 .3

T h e C o m b in a tio n o f th e se T w o Layers

The combination of these networks forms the simple competitive neural network. In
a simple competitive network, a maxnet connects the top nodes of the similarity
measure layer. Whenever an input is presented, the similarity measure layer finds
the distance of the weight vector of each node from the input vector. The calculated
signal is then fed to the competitive transfer function in the maxnet. Using this
competitive transfer function only one node wins and all other nodes converge to 0
except for the node with the maximum initial value, which is deemed as the winner.
In this way the maxnet identifies the node with the maximum value which has the
closest similarity to the input vector. Once it has found the weighting vector of the
winning node, this weight is updated by the training algorithm and all other weights
remain unchanged.

Thus the winning neuron with its weight vector moves more
72

closely towards the similarity input vector. In this study, the competitive neural
network is trained with the Kohonen learning rule. The Kohonen learning rule is
defined as:

where r] is the learning rate.

I

OO
OOO;

###
###

o o

® e#

mOO

o

# #

o

o

O

w4

GO '
^* *® ®
®
J
D uring Training

A fter Training

Figure 6.3: Training process for competitive neural network

This process is repeated iteratively over entire training set of the data until the
weight change becomes minimum and the weight vectors are considered as the repre­
sentative of the clusters. If the input data sets are in the form of clusters, then every
time the winning node is excited, the winning weight vector will move towards the
particular dataset which is in the form of a cluster. Eventually, once the competitive
neural network is trained, each of the weight vectors would converge to the centroid
of one cluster ideally representing the prototypes of the clusters found in the dataset.
This training process is depicted in the Figure 6.3 where i is referred to input data.
When a new dataset is presented it calculates the similarity with the weight vectors
which are the centroids of each cluster and generates the output th at is the closest
73

resemblance to the given input vector.

6.4

U secase D iagram

The first step in any system design is to identify the usecases and actors.

UML

(Unified Modeling Language) tools are among the best tools for designing software
systems. ‘ Usecase' is a system level function th a t helps one to visualize by describing
the interaction between the user and the system. It emphasizes the behavior as it
appears to the outside user environment. The elements in the usecase diagram are
“usecases” , “actors” , and the association between them.

Identifying th e usecases:
SAMS Software

Chemical species database
^

in SMILES notation ,
< uses >

VOCs L ist in
SM ILES Notation

Functional Group Sorter

< uses >
Pruned Chemical Species

SMILES to Matrix

List in SM ILES Notation,

^

C onverter^

Actor
< uses >
V ector Notation

Artificial Neural
Network

O utput

Figure 6.4: Usecase diagram for the system

Usecases are the services provided by the system from the user’s perspective. In this

74

system, we identified the following usecases as shown in Figure 6.4:
1. VOCs list extraction and the extracted list is given to the functional group
sorter as an input
2. Pruned chemical species generation and the pruned list is given as an input to
the SMILES to matrix notation converter
3. Optimized vector generation and the vector is given as an input to artificial

neural network for training and testing process
4. O utput
Identifying th e actors:
An actor is a person, organization, or external system th at plays a role in interacting
with the system.
In this case, the researcher or user, the EPI Suite (External System), and the SAMS
software are the only actors.

Usecases help to represent problem scenarios and to identify the cases th at need
to be taken care of during design or implementation. The implementation of lump­
ing of atmospheric chemical species through artificial neural network is discussed in
Chapter 7.

75

C hapter 7
A pplication of th e A N N for
C lassification of Chem ical Species:
Im plem entation and R esults
Neural networks have the ability to learn and therefore generalize. Generalization
refers to the neural network producing reasonable outputs for the inputs not en­
countered during training. Neural network performance depends on two attributes:
knowledge representation and architecture.
Knowledge representation includes what type of information is actually made ex­
plicit to the network and how the information is physically encoded for the subsequent
use. An accurate solution for the problem depends upon the appropriate representa­
tion of the input data to the neural network.
For a given problem, an appropriate neural network architecture has to be con­
sidered. This consists of selection of suitable neural network model and network
parameters. Unfortunately, there are currently no well defined rules to do this, so
th a t ad-hoc procedures are used to yield good results.
A neural network may perform badly if the network model is poorly fitted. There
are many other reasons for a network model to underperform. Some of the reasons
may be due to improper input node assignment, insufficient hidden nodes, too few
76

training epochs, inappropriate values of design parameters, and the nature of the
dataset. A number of experiments in lumping chemical species with artificial neural
networks have been carried out.
In this work, network modeling and programming is performed using MATLAB™
version-7. M ATLAB^^ (Matrix laboratory) is a high performance interactive soft­
ware package for performing numerical computations and graphics with matrices and
vectors. MATLAB™ features a family of add-on application-specific solutions called
toolboxes. MATLAB^-^ toolboxes are available for signal processing, control sys­
tems, neural networks, fuzzy logic, wavelets, simulation, and many others.

The

MATLAB™ Neural Network Toolbox was used in this work.
This chapter discusses the network development, parameters used, testing meth­
ods, and results obtained for both supervised and unsupervised learning. Finally this
chapter also discusses why one learning method performed better than the other.

7.1

Supervised N eural N etw orks

7.1.1

N etw o rk D e v elo p m e n t

The feedforward neural network has 19 elements in the input vector. The first 18
elements describe the connectivity of atoms within a molecule and the final element
describes the relative rate coefficient for reaction of the chemical species with ozone
and OH. There are seven output layer units, with each unit describing a lumped
category. The lumped categories considered in this work are aldehydes, terminal
alkenes or alkynes, internal alkenes or alkynes, alcohols, chemical species which react
predominantly with ozone, alkanes, and ketones. Various combinations of the training
epochs, hidden layers, and hidden layer units have been considered to optimize the
performance of the network. The training and testing process is carried out for 25
iterations. An epoch is defined as a single cycle in which a sequence of all the input
vectors is presented to the neural network. An iteration is defined as a cycle of a
designated number of epochs. At each epoch, when an input vector is presented to
77

the neural network, an error is computed. The weights of the network are adjusted
based on the error computed for all the input vectors. The modification of the weights
is carried out for the complete cycle of epochs such th at the mean squared error is
decreased for every epoch. This process is carried out until the network reaches
the maximum number of epochs or the network reaches the minimum mean squared
error. This complete process is one iteration. If the results are presented for only one
iteration, then the results obtained can not be justified as a good result. Therefore
to evaluate the performance of the network and the results obtained, the network
training and testing process is carried out for 25 iterations.
Various combinations of the hidden layer and its units are considered because
these greatly influence the performance of the network. If the number of nodes in
the hidden layer is too low, it may not yield an appropriate classification of the data.
On the other hand, if “more-than-necessary” hidden neurons or hidden layers are
used, the network may tend to overfit the data. According to a rule of thumb often
mentioned in the literature, a network performs better if the sum of hidden nodes
is greater than the sum of input nodes and output nodes [36]. We have carried out
the experiments with 27, 35, 45, 55, 65 or 75 hidden nodes in a single hidden layer
and [25, 10] nodes respectively in a two hidden layers network. The notation [25, 10]
indicates th a t it has two hidden layers with 25 hidden nodes in the first hidden layer
which is connected to the input layer and the 10 nodes in the second hidden layer
which is connected to the first hidden layer and the output layer.
Prior to the network training operation, a dataset consisting 1,016 chemical species
is sorted into seven categories as shown in Table 7.1. The dataset is divided into two
distinct parts - a training set and a testing set. A testing set of 102 chemical species
is generated using a uniform random distribution within each lumped category so
th a t 10% of the chemical species from every lumped category is included for testing
the network. This testing set is excluded from the training set and used later, once
the model is developed to test the performance of the network model. The network
parameters for the supervised neural network are shown in the Table 7.2.

78

Table 7.1: Number of chemical species in the dataset
Lum ped C ategory
Aldehydes
Terminal alkenes or alkynes
Internal alkenes or alkynes
Alcohols
Molecules which react predominantly with O3
Alkanes
Ketones

N um ber o f C hem ical Species
98
500
119
36
152
38
73

Table 7.2: Network parameters adopted for supervised neural network experimenta­
tion
Architecture
Learning Algorithm
Input units
Hidden Layers
Hidden Layer nodes
Number of training Epochs
O utput units
Transfer Function
Adaptive Learning Rate
Initial weights and biases
Performance Goal
Training set
Testing set

Feedforward neural network
Gradient descent with adaptive learning rate backpro­
pagation algorithm
19
One and two hidden layers
Various combinations 27, 35, 45, 55, 65, 75, and [20 15]
250, 500, 1000, and 1500
7 proposed lumped patterns
‘logsig’ in hidden layer and ‘satlin’ in output layer
0.01 (default)
Randomly generated
0
90% of the dataset
10% of the dataset (Uniform random distribution
from each class)

79

7 .1.2

T rain in g an d T estin g th e N etw o rk M o d el

In each epoch of training, when an input vector is presented, the seven elements of the
output vector are generated. The seven elements of the output vector are compared
to the value of the seven elements of the target vector and an error is computed. This
error is the mean-squared error ' ^ { X i - X )^/n which averages the error between the
network’s output and the target output over all n inputs. The error is fed back to
the neural network and the backpropagation mechanism will adjust the weights and
biases of the neurons of the network according to the learning algorithm such that
the error is decreased for every epoch. This training process is repeated until the sum
of the mean squared error is minimized or maximum number of epochs of training
are reached. As a result in each epoch of an iteration, the network model gets closer
to produce a desired output.
The final step in each iteration is testing the performance of the neural network.
In the testing process, when an input vector is presented to the neural network, an
output is generated. Seven elements of the output indicate the “goodness” of the
match with the seven lumps of the chemical species. The output element with the
maximum value indicates the lumped category of chemical species represented by
the input vector. The percentage accuracy of classifying a chemical species into a
distinct lumped group for various network designs are shown in Tables 7.3 and 7.4.
The results presented are the average and the standard deviation of 25 iterations.
Each iteration is first carried out for up to a different maximum number of epochs of
training. The results presented in the Tables 7.3 and 7.4 are trained for 250 epochs.
This process is repeated
• Representation as

for two different representations

of ‘tran s’ double bonds:

matrix element value2.75 (and input vector) for a double

bond between the two C atoms which is associated with the ‘tran s’ conformation
as mentioned earlier in the Section 2.6.2
• Representation as

matrix element value1.75 (and input vector) for a double

bond between the two C atoms which is associated with the ‘tran s’ conforma­
80

tion.
The notation is transformed from 2.75 to 1.75 because T.75’ is closer to ‘2’ (dou­
ble bond) and this representation of ‘cis’ (2.25) and ‘trans’ (1.75) averages to the
representation of ‘double bond’ (i.e., 2).
We can observe from the results th at the performance of the network varied from
one iteration to other for the same network model. The accuracy of classifying the
chemical species into appropriate lumped categories varied approximately from 46%
to 92%. This variation of the results is not considered to be unusual in the field of
neural networks. For example in Table 7.4, for the 5*^ iteration for 45 hidden nodes,
the network responded with only 48% of accuracy. Such results may be obtained if
the network reaches either the minimum gradient or the maximum number of epochs
before the performance goal has been met. This case is an example for reaching
the minimum gradient before the performance goal has been met.

For randomly

assigned initial weights, there is a probability th a t the mean squared error may not
have converged within the given number of epochs.
The transformation of ‘trans’ representation from 2.75 to 1.75 led to improved
results as shown in Tables 7.3 and 7.4. For most of the models, the mean of the
results is increased and the standard deviation is decreased when the ‘trans’ double
bond is represented by 1.75 instead of 2.75. This is a clear indication th a t 1.75 is a
better representation than 2.75 for a ‘tran s’ double bond and th a t the results overall
may be sensitive to the numerical values assigned to various chemical bonds.
The network is trained for various training epochs (250, 500, 1000, 1500). The
training process by gradient descent with an adaptive learning rate backpropagation
algorithm is stopped before reaching the maximum number of epochs if the algorithm
has converged. The network is said to have converged if the mean squared error (MSB)
is nearly constant over several epochs. The best results obtained for the supervised
training method are illustrated in the Table 7.5.
A single hidden layer feedforward neural network with 35 and 65 hidden nodes
produced the best result, classifying 92.16% of the chemical species into appropriate
81

Table 7.3: Classification accuracy of lumping chemical species into appropriate groups
with 27 and 35 hidden nodes(HN)

Iter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
M ean
ST D

H N -27
Trans
2.75

Trans
1.75

66.67
85.29
89 .2 2
85.30
86.28
86.28
72.55
65.69
85.29
85.29
73.53
61.76
77.45
55.88
79.41
87.26
83.33
86.28
78.43
87.26
85.29
80.39
87.26
83.33
82.35

86.28
91.18
91.18
87.26
88.24
86.28
86.28
90.20
80.39
90.20
87.26
89.22
85.29
86.28
86.28
85.29
83.33
84.31
88.24
88.24
86.28
83.33
86.28
89.22
88.24

H N -35
Trans
2.75
89.22
76.47
78.43
87.26
85.29
74.51
87.26
73.53
90.20
72.55
82.35
88.24
9 2 .1 6
86.28
80.39
79.41
79.41
81.37
80.39
88.24
74.51
81.37
87.26
75.49
85.29

79.88
8.97

86.98
2.57

82.28
5.77

Trans
1.75

H N -45
Trans
2.75

88.23
64.71
85.29
87.26
80.39
81.37
72.55

90.20
87.26
90.20
89.22
87.26
87.26
84.31
89.22
88.24
87.26
83.33
87.26
91 .1 8
82.35
84.31
87.26
89.22
84.31
87.26
84.31
89.22
83.33
83.33
83.33
84.31

H N -55
Trans
2.75
90.20
74.51
80.39
84.31
76.47
85.29
65.69
72.55
74.51
77.45
62.75
91.18
77.45
47.06
75.49
66.67
87.26
76.47
86.28
83.33
79.41
85.29
77.45
84.31
77.45

79.41
12.0871

86.59
2.64

77.57
9.64

82.35
79.41
82.35
85.29
85.29
84.31
89.22
85.29
88.24
85.29
84.31
86.28
84.31
88.24
88.24
87.26
81.37
87.26
83.33
83.33
81.37
85.29
9 2 .1 6
84.31
86.28

89 .2 2
81.37
63.73
46.08
86.28
77.45
83.33
87.26
84.31

85.22
2.83

82

83.33
71.57
83.33
88.24
48.04
87.26
86.28

Trans
1.75

Trans
1.75
84.31
85.29
87.26
g g .g g
86.28
88.24
66.67
71.57
75.49
82.35
66.67
88.24
88.24
85.29
85.29
68.63
83.33
84.31
87.26
81.37
88.24
89 .2 2
85.29
85.29
88.24
82.86
7.13

Table 7.4: Classification accuracy of lumping chemical species into appropriate groups
with 65 and 75 hidden nodes (HN)

Iter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

H N -65
Trans
2.75
84.31
88.24
81.37
86.28
73.53
55.88
73.53
86.28
90.20
74.51
75.49
86.28
92 .1 6
85.29
83.33
86.28
78.43
90.20
70.59
88.24
83.33
84.31
87.84
89.22
87.26

Trans
1.75
79.41
82.35
90.20
89.22
71.57
87.26
87.26
78.43
84.31
67.65
79.41
91.18
92 .1 6
90.20
79.41
82.35
85.29
83.33
73.53
88.24
81.37
84.31
84.31
88.24
89.22

H N -75
Trans
2.75
87.26
77.45
83.33
86.28
80.39
86.28
84.31
85.29
76.47
78.43
82.35
87.26
72.55
78.43
88.24
83.33
84.31
89.22
85.29
78.43
85.29
86.26
82.35
90 .2 0
87.26

M ean
ST D

79.29
16.93

83.61
6.29

83.45
4.43

83

Trans
1.75
85.29
88.24
85.29
83.33
81.37
78.43
88.24
83.33
79.41
71.57
71.57
90 .2 0
87.26
88.24
90.20
85.29
84.31
86.28
86.28
77.45
88.24
88.24
87.26
87.26
86.28
84.35
5.11

HN-[25,10]
Trans
2.75
77.45
78.43
74.51
76.47
79.41
81.37
83.33
78.43
76.47
72.55
83.33
80.39
80.39
87.26
80.39
78.43
79.41
71.57
81.37
74.51
77.45
83.33
86.28
86.28

Trans
1.75
77.45
80.39
80.39
82.35
83.33
81.37
8 7 .2 6
82.35
82.35
79.41
84.31
83.33
79.41
80.39
62.75
70.59
76.47
81.37
76.47
84.31
74.51
77.45
72.55
82.35
86.28

79.88
4.42

79.57
5.34

88.24

T raining N eu rai N etw ork
0.2
18

16
0 .1 4

0.12
ui
w

0.1

•M S E

0 .0 8
0 .0 6
0 .0 4

0.02
0
1

5

9 13 17 21 25 29 33 37 41 4 5 4 9 53 57 61 65

Num ber o f E pochs

Figure 7.1: Training of a neural network

Table 7.5: Best classification accuracy of chemical species into appropriate lumping
groups
Network Design
27 hidden nodes
35 hidden nodes
45 hidden nodes
55 hidden nodes
65 hidden nodes
75 hidden nodes
[20 15] hidden nodes

Accuracy of
Prediction (%)
91.18
92.16
91.18
89.22
92.16
90.20
87.26

Performance Goal
Achieved (MSE)
0.0014603900
0.0023992100
0.0006678037
0.0009909770
0.0006780370
0.0008345070
0.0023994000

84

Number of Epochs
83
83
61
60
64
64
94

Table 7.6: Network parameters for unsupervised neural network
Architecture
Learning Algorithm
Input units
Number of Layers
Number of training Epochs
O utput units
Transfer Function
Kohonen learning rate
Conscience learning rate
Initial weights and biases

Training set
Testing set

Competitive neural network
Kohonen learning rule for updating the weights and
bias learning rule for updating the biases
19
Similarity measure layer and competitive layer.
1000
1 (i.e., the winner)
Competitive transfer function (compet) in comp­
etitive layer
0.01 (default)
0.001 (default)
The initial values of weights are the midpoint of
the input range and biases are initialized by con­
science bias initialization function
90% of the dataset
10% of the dataset (Uniform random distribution
from each class)

categories. Less than 8% of the chemical species are misclassihed. This is discussed
further in Section 7.3.

7.2

U nsu p ervised N eural N etw orks

7.2.1

N etw o rk d e v elo p m en t

In contrast to the supervised learning, unsupervised learning does not use any target
outputs for training the neural networks. Unsupervised learning methods can be used
for applications such as mapping from input to output space, d ata compression or
clustering. The network parameters used for the unsupervised learning method are
as shown in Table 7.6.
The initial values of weights assigned for a network are the midpoints of the
input range and the biases are initialized by a conscience bias initialization function.
Midpoint is a weight initialization function th a t sets weight (row) vectors to the
85

Table 7.7: Examples for chemical species represented in vector notation (VN) and
normalized vector notation (NVN)

F o rm ald eh y d e
VN
NVN
H ex ald e h y d e
VN
NVN

[1.0 2.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0.3714 0.9285 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1.0 1.0 0 1.0 0 0 1.0 0 0 0 1.0 0 0 0 0 2.5 0 0 0]
[0.2981 0.2981 0 0.2981 0 0 0.2981 0 0 0 0.2981 0 0 0 0 0.7454 0 0 0]

center of the input ranges. This function takes two arguments (S, PR), where S is
the number of neurons and PR is an R x Q matrix of input value ranges [Pmm Pmax]
and returns an S x R matrix with rows set to (Pmin + Pmax)^/2. The conscience bias
initialization function assigns random values depending upon the number of neurons.
No m atter how long the training is continued in competitive networks, there is a
possibility th at randomly assigned neuron weight vectors can start out far from any
input vectors and will not lead to learning. As a result the neuron can never win the
competition. This limitation can be mitigated by using the bias learning rule. The
functionality and this algorithm are documented in the neural network toolbox [34].

7.2.2

T rain in g an d T estin g th e N etw o rk M o d el

Usually, the inputs are applied to the network through the input layer. As mentioned
earlier in section 2.3, the first process for a competitive neural network is to calculate
the distance between the input vector and the weight vector. The distance between
the input vector and the weight vector can be calculated by the weighted sum. The
weighted sum can be interpreted as the dot product of the input vector and the
weight vector. The input vectors need to be normalized for this approach. If the
input vector has been normalized, the weights of each element in the input vector
which contain the connectivity information will be changed.

Consider the vector

notation for two chemical species from the aldehyde group shown in Table 7.7. The

86

connectivity information of carbon to carbon single bond in the vector notation is
represented by 1.0 in all the chemical species. But when the vector is normalized, the
same carbon to carbon single bond is represented by different values thus changing
the weights of the representation. This means the similarity can not be measured.
This process of normalizing the vector alters the connectivity information of all the
atoms within the molecule. Because the information is degraded the normalization
is not appropriate to the application. Therefore the similarity measure is carried out
for the un-normalized input vectors.
When an input vector is presented it finds the similarity measure by calculating
how far the weight vector is from the input vector as discussed earlier in section
6.3. The output obtained by the similarity measure layer is fed to the competitive
layer where each node is connected to every other node including itself. Every node
competes with the other nodes and finally the network presents the winner. The
winning node indicates th a t the input vector which is presented to the neural network
belongs to the winner class.

The prototype vectors formed after 1000 epochs of

training are as shown in Figure 7.2.
The 8*^ and 12*^ elements of the prototype vector are zeros because all the chemical
species in the vector notation has their

and 12‘^ elements as zero. Ten percent of

the dataset (102 chemical species) from each group (Aldehydes - 10, Terminal alkenes
or alkynes - 50, Internal alkenes or alkynes - 12, Alcohol - 4, Chemical species which
react predominantly with ozone -15, Alkanes - 4 and Ketones - 7) is used as the testing
set. As in the case of supervised learning a testing set is generated using a uniform
random distribution within each lumped category so that 10% of the chemical species
from every lumped category is included for testing the network. Once the network
has been trained, every weight vector in the weight matrix forms a prototype weight
vector of the input cluster (each weight vector contributes to the centroid of the each
cluster). When a new input vector from the testing set is presented to the network,
it finds the similarity between the input vector and the prototype weight vector and
classifies to a cluster which is the closest representation to the input vector. The

87

1,2344 2.3544
1.7313 0.9929
0.2835 0.0000
0.9794 1.9291
0.0079 0.0000
0.2334 0.4881
0.8768 0.5008

0
0
0.0762 0.0822
0.2508 0.0542
1.0186 0.7385
0
0
0.0000 0.0101
0.0367 0.0060
0.2874 0.3103
0.7050 0.3949
0.2142 0.1542
0.3781 0.1508
1.9605 0.1009

1.9677
1.2074
0.3459
0.6554
0.0000
0.0000
2.4186

0
0.2276
0.2671
0.4878
0
0.0000
0.0000
0.0091
1.3530
0.2695
0.3224

0.2900

2.2025

1.0652
0.0457
1.7036
0.0002
0.5249
0.4787
0
0.0000
0.0000
2.5174
0
0.0000
0.0706
0.1535
0.5052
0.0542
0.3211
0.2900

1.2756
1.6809
0.4864
0.4736
0.0977
0.0224
0.9211
0

2.0091
1.1343
0.1667
1.0835
0.0080

0.0298

0.2884

1.0026
0.2478
0.9496
0.0182
0.3637
0.8597
0
0.1966

0.3377
0.4760
0
0.0012
0.1075
0.1195

0.3035

0.1682

0.4082
0
0.0000
0.0000
0.0000

0.9499
0
0.0705
0.0937
0.3756

0.2870

2.5526

0.4898

0.0750
0.1480
0.0852

0.0431
0.4751
0.2496

0.0000

0.3582

0.8510
0

2.1598

2.0534
0.2548

Figure 7.2: Prototype weight vector (W^) formed after 1000 epochs [Columns repre­
sent the clusters and rows represent the connectivity information]

competitive layer assigns each input vector to one of the categories by producing an
output of 1 for a neuron with the weight vector is closest to input vector. The results
obtained for the unsupervised learning method after training the network for 1000
epochs are illustrated in the Figure 7.3. The numbers in the results (Figure 7.3)
indicate the winning neuron for each chemical species given as test data.
Terminal alkenes or alkynes

Aldehyde

1 2 3 4 5 6 7 8 9 Id 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30' >— .Order of Chemical species
in testing set

6 6 6 6 6 4 2 4 7 4 3 3 6 7 6 7 2 2 3 6 2 3 7 7 7 5 2 6 2 7-

■Output Neuron won

Terminal alkenes or alkynes

31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
5 4 5 3 7 6 2 2 2 4 2 2 2 3 3 6 6 7 2 2 2 4 7 2 3 7 7 7 7 2
f

\
Internal alkenes or alkynes

Alcohol

Chemical species which predominently react with ozone

61 62 63 64 65 66 67 68 69 70 71 72

73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91

1 1 6 5 2 5 2 3 2 2 5 4

2 5 2 5

Alkanes

1 1 1 1 1 1 1 1 3

6 1 1 3

11

Ketones

92 93 94 95 96 97 98 99 100 101 102
5 5 5 5

2 5 3 2 3

6 5

Figure 7.3: Results obtained for unsupervised learning method

In supervised learning method, a previously classified set of the chemical species
is required for training the neural network. Once trained the network will be able to
classify the chemical species into appropriate categories. The potential of the unsu­
pervised learning method is th a t it does not require any prior classification process
for training the neural network. The unsupervised learning method is able classify
the chemical species into categories based on the input patterns only. The results ob­
tained by the unsupervised learning method do not appear promising when compared
to the supervised learning method.

89

7.3

D iscu ssion

The main purpose of the experimentation with ANNs was to design and test ANNs for
automating the classification of chemical species. We employed both the supervised
learning and the unsupervised learning neural networks.
The supervised learning method was able to classify accurately 92.16% of the
chemical species. Misclassification is occured for less than 8% of the chemical species.
An analysis of the results is done for the first five iterations of training and testing. In
supervised learning, it was the alcohols th at were most frequently misclassihed. For
example, the chemical species with a SMILES pattern of C = C C (0 )C —C was never
successfully classified as a member of the alcohol lump. When this chemical species
is given for testing it was misclassified once as an aldehyde lump and four times as
terminal lump. From the neural network perspective, it might be appropriate th a t it
has been classified as terminal lump because it has two double bonds in the terminal
positions of the carbon chain. But from the lumping point of view we have told
the supervised network to classify it as a member of the alcohol lump. There are
chemical species in all the lumped categories with more than one functional group.
We have not employed any ranking scheme for multiple functional groups within a
chemical species. The chemical species with multiple functional groups were able to
be classified properly for other lumps but not for the alcohol lump.
It was also observed from the results th a t there are few cases in which a chemical
species from the alcohol lump were not able to be classified into any of the proposed
lumped categories. Two examples are presented in Table 7.8.
Another reason for misclassification of alcohols may be the low number of chemical
species available in this lump. There are only 36 chemical species in this lump in the
training set, less than half the number of chemical species present in the other lumps.
As shown in Table 7.8, two other chemical species in the alcohol lump which were
misclassified frequently are C C (0)C C = C and CCCO.
Another lump for which misclassification was frequent is the Terminal lump. Fifty

90

Table 7.8: Analysis of results for alcohols - misclassification of chemical species in
supervised learning method obtained from 5 iterations
Lump
Aldehyde
Terminal
Internal
Alcohol
Ozone
Alkanes
Ketones
None
Misclassification

C = C C (0)C = C

C C (0)C C =C

CCCO

1
4
5

2
1
1
1
5

2
2
1
3

chemical species were given for testing from the terminal lump. As shown in Table
7.9, three chemical species (C=CCCC(=C)C, C #C C (= C )C , C # C C # C ) were mis­
classified frequently.
Supervised learning performed better than the unsupervised learning. There are
several factors th a t may account for this.
1. Supervised learning performed fairly well in classifying the chemical species
into appropriate lumped categories because the goal for supervised learning
having the computer to learn a classification previously created. In contrast,
unsupervised learning has the goal is to have the computer learn how to do
something th a t the network has not been previously told how to do.
2. The nature of the dataset contributed to the superior performance of supervised
learning over unsupervised learning. The elements in the vector representation
of chemical species consists of just O’s, I's, 2’s and 3’s. The vector notation of
chemical species in one lump has a close resemblance to the chemical species in
other lumps. For example, the chemical species with patterns C—O and C—C
have 2.5 and 2.0 respectively as elements in the vector notation which describes
the connectivity information between the atoms. Similarly the patterns CO and
CC are represented as 1.5 and 1.0. These are the key elements in the vector no91

Table 7.9: Analysis of results for other chemical species - misclassification of chemical
species in supervised learning method obtained from 5 iterations
Lump
Aldehyde
Terminal
Internal
Alcohol
Ozone
Alkanes
Ketones
None
Misclassification

C-CCCC(-C)C

C#CC(=C)C

c#cc#c

-

1
-

4

2
-

2
1
1
4

3
-

3

1
4

tation th at differentiate between the lumped groups. These weights to represent
the connectivity information between the carbon and oxygen atoms have been
chosen arbitrarily. For example, trans double bond is initially represented as
2.75 but later changed to 1.75 which increased the accuracy of results. However
this value is very close to the values represented for carbon-carbon double bond
(2.0) and carbon-oxygen double bond (2.5). Classes of similar properties may
be separated by supervised learning because supervised learning has an exter­
nal teacher (target output) for each input vector during training. In the case of
unsupervised learning they could be lumped into one class. If the choice of the
numerical values representing different functional groups were more separated,
improved results may be anticipated. This is definitely one of the approaches
we would like to consider in our future work.
3. The supervised learning method can learn about the direction of the error in
which the network is moving. In the case of unsupervised learning, the direction
in which the error is changing does not play a role.
Unsupervised learning methods have been gainfully applied to problems where
each element in the vector of all input data indicates the fixed position or fixed
characteristics. For example in the Figure 7.4 the character ‘A’ can be represented in
92

a vector notation as: [ 0 1 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 1 1 ] and the character ‘B’
can be represented in a vector notation as: [ 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0
]. In these two vector notations the first and second elements in the vector indicates
the first and second cells in the first column respectively. This is not the case with
the dataset in this work. Each element position in the vector does not represent the
same functional group. This type of method may not be readily applicable to our
dataset. Even though it has learned to form a prototype weight vector by calculating
the similarity measure, these prototype vectors are not the actual representative of
each cluster. To solve this problem, one approach could be representing the chemical
species by using some form of canonical notation.

Figure 7.4: Example for the input data

In Figure 7.3, the chemical species numbered from 77 to 91 in the testing set belong
to the lump th at reacts preferentially with ozone. When these chemical species are
given as an input to the network, one of seven categories will produce an output of
1 for the neuron whose weight vector is closest to the input vector. Twelve of fifteen
chemical species which are given as an input to the network were able to produce an
output of 1 for the 1®* neuron. This explains th at the E* prototype weight vector
in the weight m atrix closely resembles the ozone lump (i.e., the 1st vector in the
weight matrix was able to form the centroid of the ozone lump cluster). Two of the
chemical species from the ozone lump are misclassified to the 3'’'^ prototype vector
and one is misclassified to the 6*^ prototype vector. We can also confirm th at the 1®*
prototype vector in the weight matrix indicates the ozone lump with the help of a
relative rate coefficient. Only the chemical species which predominantly react with
ozone have a relative rate coefficient greater than unity. The nineteenth element in the
93

vector notation for all the chemical species in the database contains the relative rate
coefficient. It can be observed th a t only the

weight vector ([1.2344 1.7313 0.2835

0.9794 0.0079 0.2334 0.8768 0 0.0762 0.2508 1.0186 0 0.0000 0.0367 0.2874 0.7050
0.2142 0.3781 1.9605]) in the prototype weight m atrix has a nineteenth element greater
than unity. This is the reason th a t the ozone lump was able to classify reasonably
well when compared to the other lumps.
Similarly the alkane lump (chemical species numbers from 92 to 95) was able to
map the 5*^ prototype weight vector. All the four chemical species from the alkane
lump which are given for testing were classified properly. The elements in the vector
notation for all the chemical species in the alkane lump should not be greater than
or equals to 2 because the carbon atoms in the carbon chain of alkanes are single
bonded, which is denoted as 1. It can be observed from the 5*^ prototype weight
vector th a t none of the elements is greater than or equals to 2. Therefore the
prototype weight vector was able to train and map to form the centroid of the alkane
lump cluster. The aldehyde lump (chemical species numbers from 1 to 10) was able
to map with the 6*^ prototype weight vector. Only the first five of the ten chemical
species which are given for testing were able to be classified properly. This is because
the 16*^ element in the prototype weight vector and the first five chemical species
which are given for testing has 2.5 (i.e., oxygen atom is connected to the carbon
atom with a double bond). Therefore it was able to map with 6*^ prototype weight
vector. A similar kind of mapping is done for the rest of the chemical species in the
aldehydes testing dataset and it misclassified accordingly as shown in the following
tables. From this we can conclude th a t unsupervised learning method itself is not
performing badly, but it is because of the nature of the d ata th a t contributes to the
unsupervised learning method performing poorly. Similarly, other lumps have been
misclassified depending upon the input given for testing and the resulting prototype
weight vector formed.
In conclusion, it can be said th at the dataset in its current form may not be ap­
propriate for applying unsupervised learning methods. A potential future direction

94

of research would be to improve the representation of the chemical species for use in
unsupervised learning methods.

Table 7.10(a): Classification and misclassification for th e aldehyde lump.

P rototyp e

1

2

3

4

5

P rototyp e

W eight

W eight

V ector - 6

V ector - 4

6

8

1 0

2.0091

2 .0 0 0 0

2 .0 0 0 0

2 .0 0 0 0

2 .0 0 0 0

1 .0 0 0 0

2.2025

1 .0 0 0 0

1 .0 0 0 0

1 .0 0 0 0

1.1343

1 .0 0 0 0

1 .0 0 0 0

1 .0 0 0 0

1 .0 0 0 0

1 .0 0 0 0

1.0652

1 .0 0 0 0

1 .0 0 0 0

1 .0 0 0 0

0.1667

0

0

0

0

1 .0 0 0 0

0.0457

0

0

1 .0 0 0 0

1.0835

1 .0 0 0 0

2 .0 0 0 0

1 .0 0 0 0

2 .0 0 0 0

0

1.7036

2.2500

1 .0 0 0 0

0

0.0080

0

0

0

0

0

0.0002

0

0

1 .0 0 0 0

0.3582

0

0

2 .0 0 0 0

1 .0 0 0 0

0

0.5249

0

0

0

0.8510

1 .0 0 0 0

1 .0 0 0 0

0

0

2 .0 0 0 0

0.4787

1 .0 0 0 0

1 .0 0 0 0

0

0

0

0

0

0

0

0

0

0

0

0.2884

0

0

0

0

0

0.0000

0

0

0

0.3035

1 .0 0 0 0

1 .0 0 0 0

0

0

1 .0 0 0 0

0.0000

0

0

0

0.4082

0

0

1 .0 0 0 0

1 .0 0 0 0

0

2.5174

2.5000

2.5000

2.5000

0

0

0

0

0

0

0

0

0

0

0.0000

0

0

0

0

0

0.0000

0

0

0

0.0000

0

0

0

0

0

0.0706

0

0

0

0.0000

0

0

0

0

0

0.1535

0

0

0

2.5526

2.5000

2.5000

2.5000

2.5000

2.5000

0.5052

0

0

0

0.0431

0

0

0

0

0

0.0542

0

0

0

0.4751

0

0

0

0

0

0.3211

0

0

0

0.2496

0 .1 0 2 0

0.0580

0.0580

0.0500

0.0230

0.2900

0 .0 1 1 0

0

0

95

Table 7.10(b): Misclassification for th e aldehyde lump.

P rototyp e

7

P rototyp e

W eight

W eight

V ector - 2

V ector - 7

9

2.3544

3.0000

2.1598

1.0000

0.9929

1.0000

1.0026

1.0000

0 .0 0 0 0

0

0.2478

1.0000

1.9291

2.5000

0.9496

0

0 .0 0 0 0

0

0.0182

0

0.4881

0

0.3627

0

0.5008

0

0.8597

1.0000

0

0

0

0

0.0822

0

0.1966

0

0.0542

0

0.1682

1.0000

0.7385

0

0.9499

0

0

0

0

0

0 .0 1 0 1

0

0.0705

0

0.0060

0

0.0937

0

0.3103

0

0.3756

0

0.3949

0

0.4898

1.0000

0.1542

0

0 .0 0 0 0

0

0.1508

0

2.0534

2.5000

0.1009

0.0010

0.2548

0

96

C hapter 8
C onclusions and Future D irections
8.1

C onclusions

During the past decade, it has been identified th a t lumping of atmospheric chemical
species into groups is an effective technique to reduce the complexity of the reaction
mechanism by the condensed representation of atmospheric chemistry. Lumping of
chemical species into different categories is a classification problem. We identified
from the computational perspective th a t application of machine learning techniques
by artificial neural networks has the potential to automate the process. In this study,
we
1. Generated a chemical species database for training the neural network. This
includes:
(a) The generation of all the possible chemical species isomers for a given
empirical formula.
(b) Conversion of structural notation of the chemical species to a SMILES
notation for computer use.
(c) Pruning of the chemical species database by retaining the chemical species
which are important in the atmosphere.

97

2. Proposing seven lumped categories using a functional group approach.
3. Implementing a method of transforming the chemical species structural infor­
mation into a notation which can be used by the neural network.
4. Conducting training and testing processes for both supervised and unsupervised
learning methods.
The results presented in this study for supervised learning are more promising than
for unsupervised learning and suggest th a t supervised neural networks can be gainfully
employed for lumping atmospheric chemical species.

The best result obtained is

92.16% accuracy of classification for a supervised neural network learning method.
However, some improvements are needed to quantitatively describe practical systems.
The results of unsupervised learning indicate th a t it is sensitive to the numerical
values assigned to various chemical bonds. Improvements towards the representation
of the chemical species for unsupervised learning must be considered in the future
work.
The percentage accuracy of classification depends on the complexity of the prob­
lem and availability of the data set. We could not compare the results obtained in
this work directly with the results in the literature because this is a novel approach.
When the results obtained in this work are compared with some other chemistry
and environmental science applications in the field of neural networks, these results
are considered to be acceptable. In “The integrated strategy of pattern classification
and its application in chemistry” [50], Huafeng Wang and co-workers tried to classify
the Nature Spearmint Essence (NSE) specimens into a 3 graded ranks of qualitygood, middling and bad and also to classify the toxicity of amine into highly-toxic,
medium-toxic and low-toxic. The neural networks alone showed only 46% accuracy.
An improvement of the model has been obtained by integrating other approaches
such as Bayes method and correlative component analysis leading to a classificafion
accuracy of 100%. In another paper, “Assessment and prediction of tropospheric
ozone concentration levels using artificial neural networks” , by Abdul-Wahab and co­
98

workers [35] the ozone concentration was predicted in advance with an accuracy rate
of98%k
The lumped mechanism (RACM) proposed by Stockwell [23] was used to conduct
a simulation study of environmental chamber experiments, testing a complete system
of atmospheric chemical mechanism. The chamber contains the mixture of NO^, and
the organic species exposed to either sunlight or artificial light and concentration
measurements were made as a function of time. Taking all the uncertainties into
consideration, he was able to predict ozone concentration no better than ± 30%.
Taking this study into consideration and through indirect comparisons, we would
conclude th a t an error rate of 8% obtained from the supervised learning method is
very likely to be acceptable in kinetic models.

8.2

Future D irection s

There are many directions for improvements in which the work presented in this thesis
can be extended. We outline some possible directions of the future work:
1. The neural network methods used in this work are general. An alternative ap­
proach such as “network pruning” [51] is a possible future direction. A minimum
size neural network is less likely to be influenced by the noise in the training
data and thus will be able to generalize better. Network pruning is one of the
techniques th a t can be used to achieve this objective. Network pruning assists
in minimizing the system complexity. In network pruning, a large network with
an adequate performance is first considered and later once the network has been
trained the network is pruned by eliminating certain synaptic weights in a se­
lective or orderly fashion. The potential exploration of network pruning could
involve the addressing of issues such as which weights to be eliminated, and
how the remaining weights to be adjusted.
2. The work could be expanded by working with larger data sets, considering all
possible chemical species in the atmosphere. In the present work we did not
99

consider cyclic chemical species, aromatics and radicals. By considering these,
the system developed might evolve to a system with more practical applications.
If the d ata set is large, a more adequate testing set (for example 20% - 30%)
can be used to test the performance of the model. It is also im portant to have a
good number of chemical species in each lump to produce a good classification
accuracy for all the categories. The problem of unsupervised learning of the
alcohols lump might also be examined in future work.
3. The application of neural networks to various fields has shown wide ranges in the
quality of results. Many attem pts have been made in the literature to improve
the performance of the models through hybrid networks if the neural network
alone is performing unsuccessfully. Future work could consider other approaches
such as hybrid models. The aim of the hybrid model is to combine the advantage
of different architectures into a single system. This involves the use of more than
one problem solving technique. Through this approach a system can perform
better by increasing the strength of the combined techniques and decreasing the
weakness of using either technique alone.
4. Modification of the input vector notation by choosing more separated numer­
ical values representing different functional groups and adopting a canonical
representation have to be considered such th a t the improved results can be
anticipated for unsupervised learning method.
The overall results confirm the hypothesis th a t it is possible to automate (with
reasonable error) the classification of atmospheric chemical species by a neural net­
work approach. The supervised learning method performed reasonably well when
compared to the unsupervised learning method. This is because the dataset in its
current form of representation may not be appropriate for applying unsupervised
learning methods.

100

A p p en d ices
A pp en d ix A - B ackpropagation A lgorithm
The error signal at the output of neuron j at an iteration n is defined as

^W = 4)W -

(Ai)

where yj{n) and dj(n) is the functional signal appearing at the output and desired
response of neuron j at iteration n respectively.
Instantaneous value £(n) of the error energy for neuron j is defined as:

==

(^12)

Instantaneous value for total error energy is obtained by summing |e |( n ) for all neu­
rons. This can be written as:

tec
where C is the set of all neurons.
Let N be the total number of training patterns. Therefore the average squared energy
is obtained by
1 ^
(A4)

£av —

n=l

For a given set, Sav represents the cost function as a measure of a learning per­
formance. The objective of the learning process is to adjust the parameters of the
network to minimize £av The induced local field (i.e., weighted sum of all synaptic
inputs plus bias) produced at the input of activation function of neuron j at iteration
n is given as:

101

W

^

W

(AS)

2=0

where m is the total number of inputs and Wj^(n) is the synaptic weight connecting
the output of neuron i to the input of neuron j.
The bias applied to neuron j is denoted by hj ; its effect is represented by a synapse of
weight Wjo = hj connected to a fixed input equal to 4-1. The functional signal y^(n)
appearing at the output of neuron j at iteration n is

%j(n) ==<Aj(%j(m))

(^L6)

The backpropagation algorithm applies a correlation A Wji{n) to the synaptic weight
Wji{n), which is proportional to the partial derivative

According to the chain

rule of calculus, we can write as:

dwji (n)
The partial derivative

dej (n) dyj (n) dvj (n) dwji (n)

represents the sensitive factor, determining the direction

of search in weight space for a synaptic weight Wjj(n). Differentiating both sides of
the eqn. A2 with respective ej{n)

Differentiating both sides of the eqn. A l with respective yj{n)
dejjn) ^
#%(m)
Next, differentiating eqn. A6 with respect to V j { n )

102

(A9)

Finally, differentiating eqn. A5 with respect to Wji{n
dvj{n)

dwjiin)

= %(n)

(A ll)

The use of eqns. A8 to A ll in A7 yields

dwji{n
The correction A Wji{n) applied to Wji{n) is defined by the delta rule:

where r) is the learning rate parameter of the backpropagation algorithm.
The minus sign indicates the gradient descent in weight space. By substituting eqn.
A12 in A13 yields

Awji(n) =

(A14)

where the local gradient 5j{n) is defined by

= -eXM)<;6j(uX7i))

103

(A17)

A pp en d ix B - C hem ical Species List
Experim entally determ ined vapor pressures are denoted by *.

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec"i

(mol/cm®)sec"i

Os:OH

cc#cccccc

7.50*

0.000050600

0 .0 0 0 0 0 0 0 2 1

0.000415134

c/c=c\c=c/c=c\c

7.63

0.000263503

0.000040800

0.154974681

c/c=c/c=c/c=c/c

7.63

0.00027358

0.000044800

0.163818435

c#cc=cc=cc=c
c=cc#cc=cc=c

7.87

0.000288935

0.000006916

0.023936146

7.94

0.000296708

0.000006916

0.023309104

C C (=0)/C = C \C C

7.96

0.000081100

0.000010700

0.132605608

C C (=0)/C = C /C C

7.96

0.000091700

0.000021500

0.234361847

C C (C )C #C C =0

8 .0 0

0.000044200

3.36E-09

0.000076000

C C #C C (C )C =0

8 .0 0

0.000077800

0 .0 0 0 0 0 0 0 2 1

0.000269871

C=CCC(C)(0)C

8.17

0.000048400

0.000008400

0.173569613

C #C C (C #C )C =0

8.19

0.000056500

0.00000042

0.007434254

C C(C)=CC(=0)C

8.21

0.000122591

0.000139742

1.139902843

CC#CC#CC(C)C

8.24

0.000113273

0.000000042

0.000370787

CC#CC(C)C#CC

8.24

0.000088000

0.000000042

0.000477535

CC(C )=C C =0

8.35

0.000072800

0.000008281

0.113678545

c#cc#cc=o

8.39

0.000059100

0.000000024

0.000411885

C /C = C \C (= 0)C C

8.5

0.000081300

0.000003980

0.048959047

C /C = C /C (= 0)C C

8.5

0.000092000

0.000007960

0.086559861

CC(C)(0)CCC

8.6

0.000010700

No

No

c#cc#cc#cc

8.62

0.000140347

0.000000063

0.000448888

CC(0)CC=C

8.66

0.000056300

0.000008400

0.149126100

CC=C(CC)C=0

8.74

0.000074400

0.000008281

0.111376176

CCC(C)=CC=0

8.74

0.000074400

0.000008281

0.111376176

CC(C)=CCC=0

8.74

0.000169467

0.000301000

1.776161789

CC=C(C)CC=0

8.74

0.000169467

0.000301000

1.776161789

CCC(0)CC

8.77*

0.000020300

No

No

104

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° G

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec"i

(mol/cm®)sec~^

0 3 :0 H

C=CC(0)C=C

8.89

0.000092900

0.000016800

0.180904263

C=CC(C)(0)C=C

8.96

0.000083100

0.000002450

0.029489433

CC#CC=C(C)CC

8.96

0.000214178

0.000051800

0.241749163

CC=C(C)C#CCC

8.96

0.000214178

0.000051800

0.241749163

CC=C(CC)C#CC

8.96

0.000214178

0.000051800

0.241749163

CCC#CC=C(C)C

8.96

0.000214178

0.000051800

0.241749163

CCC=C(C)C#CC

8.96

0.000214178

0.000051800

0.241749163

CC#CCC=C(C)C

8.96

0.000179778

0.000301021

1.674408380

C #C C C (=C )C =0

9.03

0.000066000

0.000001295

0.019630349

C #C C (=C )C C =0

9.03

0.000146999

0.000079800

0.054309737

CC(C)(C)CCC=0

9.06

0.000041500

No

No

CCC(C)(C)CC=0

9.06

0.000036300

No

No

CCCC(C)(C)C=0

9.06

0.000043100

No

No

C(C)C(CC)(C)C=0

9.06

0.000045100

No

No

CCC(CC)(C)C=0

9.06

0.000045100

No

No

C=CC(0)CC

9.13

0.000056300

0.000008400

0.149126100

CC(C)C(C)0

9.15*

0.000017800

No

No

C#CC(C#C)=CCC

9.16

0.000249385

0.000008000

0.032096934

C#CC(CC)=CC#C

9.16

0.000249385

0.000008000

0.032096934

C#CC=C(C)CC#C

9.16

0.000198104

0.000051800

0.261469458

C#CCG(C#C)=CC

9.16

0.000198104

0.000051800

0.261469458

C#CCC=C(C)C#C

9.16

0.000198104

0.000051800

0.261469458

c/c=c\o
c/c=c/o

9.17

0.000088400

0.000003980

0.045029348

9.17

0.000100271

0.000007960

0.079410148

C#CC(=C)C#CCC

9.25

0.000182313

0.000001267

0.006949587

C#CGC#CC(C)=C

9.25

0.000152630

0.000008000

0.052443681

G#GGG(=G)G#GG

9.25

0.000152630

0.000008000

0.052443681

G#GG#GGG(G)=G

9.25

0.000150805

0.000008442

0.055979502

SMILES N otation

105

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(m ol/cm ^)sec“ ^

C = C (C )C = C C = 0

9.44

0.000101859

0.000005900

0.057891996

C = C C (C )= C C = 0

9.44

0.000101859

0.000005900

0.057891996

C =C C =C (C )C =0

9.44

0.000101859

0.000005900

0.057891996

C C =C (C =C )C =0

9.44

0.000101859

0.000005900

0.057891996

C C = C C (= C )C = 0

9.44

0.000101859

0.000005900

0.057891996

C C (= 0 ) C C = 0

9.51

0.000103664

No

No

C#CC=CC=C(C)C

9.56

0.000395026

0.000239578

0.606487401

C#CC=CC(C)=CC

9.56

0.000395026

0.000239578

0.606487401

CC=CC(C#C)=CC

9.56

0.000395026

0.000239578

0.606487401

C # C C = C (C )C = C C

9.56

0.000395026

0.000239578

0.606487401

C#CC(C)=CC=CC

9.56

0.000395026

0.000239578

0.606487401

C=C(C)C(C)0

9.63

0.000091500

0.000008400

0.091835448

C =C (C )C #C /C =C \C

9.65

0.000249474

0 .0 0 0 0 1 2 0 0 0

0.047959827

C =C (C )C #C /C =C /C

9.65

0.000264887

0.000015946

0.060199205

C=CC(C)=CC#CC

9.65

0.000324203

0.000036876

0.113743679

C=C(C)C=CC#CC

9.65

0.000324202

0.000036876

0.113743788

C=CC=C(C)C#CC

9.65

0.000324202

0.000036876

0.113743788

CC=C(C=C)C#CC

9.65

0.000324202

0.000036876

0.113743788

CC=CC(=C)C#CC

9.65

0.000324202

0.000036876

0.113743788

C=CC#CC(C)=CC

9.65

0.000260426

0.000053000

0.203521643

C = C C # C C = C (C )C

9.65

0.000260426

0.000053000

0.203521643

cccc#cccc
c#cccccc#c

9.67"

0.000049900

0 .0 0 0 0 0 0 0 2 1

0.000420947

9.68

0.000029800

0.000000042

0.001407838

C C (=0)C C #C C

9.71

0.000043600

0 .0 0 0 0 0 0 0 2 1

0.000481828

C C C (=0)C #C C

9.71

0.000040000

0 .0 0 0 0 0 0 0 2 1

0.000524471

C C (=0)C #C C C

9.71

0.000039800

0.000000057

0.001425155

0 = C C (C )C = 0

9.72

0.000207901

No

No

c=ccccc=o

9.79

0.000083500

0.000008400

0.100551540

SMILES N otation

106

0 3

:0 H

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° G

concentration 1.56x10®

concentration 7x10^^

coefficient

m m of Hg

1.56E6 (OH/cm®)sec“ ^

(mol/cm®)sec^i

0 3 :0 H

C #C C (=C )C =C C #C

9.86

0.000392933

0.000005712

0.014536837

C #C C (C #C )=C C =C

9.86

0.000392933

0.000005712

0.014536837

c#c/c=c\cccc
c#c/c=c/cccc
c=ccco
c#ccc/c=c\cc
c#cccc/c=c\c
c#cc/c=c\ccc
c#cc/c=c/ccc
c#ccc/c=c/cc
c#cccc/c=c/c

9.89

0.000129711

0.000004000

0.030855160

9.89

0.000145124

0.000007980

0.055011698

9.89

0.000049300

0.000008400

0.170346931

9.89

0.000109791

0.000091000

0.828848081

9.89

0.000104905

0.000091021

0.867652909

9.89

0.000104206

0.000091021

0.873469397

9.89

0.000116062

0.000140021

1.206429817

9.89

0.000121647

0.000140000

1.150871644

9.89

0.000116761

0.000140021

1.199211873

CC(C)CC(=0)CC

9.90

0.000015600

No

No

CC(C)C(=0)CCC

9.90

0 .0 0 0 0 1 1 2 0 0

No

No

C #C C (=0)C C #C

9.93

0.000021800

0.000000042

0.001923008

C # C C (= C )C # C C = C

9.95

0.000228561

0.000002492

0.010902995

C=CC#CGCCC

9.98

0.000095200

0.000001246

0.013089874

c=ccccc#cc
c=ccc#cccc
c=cccc#ccc

9.98

0.000089100

0.000008421

0.094459994

9.98

0.000088500

0.000008421

0.095206026

9.98

0.000088500

0.000008421

0.095206026

C = C C (= 0 )C = 0

10.3

0.000034100

0.000001225

0.035916318

C C (= 0 )/C = C \C # C

10.4

0.000112382

0.000001670

0.014902244

C C (= 0 )/C = C /C # G

10.4

0.000126254

0.000003330

0.026363538

G #G G (=0)/G =G \G

10.4

0.000034800

0.000004000

0.114912876

G #G G (=0)/G =G /G

10.4

0.000039000

0.000007980

0.204819827

GGG(G)G=0

10.4

0.000041900

No

No

C = C C C C (= 0 )C

10.5*

0.000048200

0.000008400

0.174224440

GG(G)GO

10.5*

0.000010700

No

No

GG(=0)G#GG=G

10.5

0.000085700

0.000001280

0.014952637

SMILES N otation

107

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(m ol/cm ^)sec“ ^

OaiOH

C = C C (= 0 )C # C C

10.5

0.000029300

0.000001246

0.042505346

ccc#ccccc
c#ccc#cc#c

10.7

0.000049900

0 .0 0 0 0 0 0 0 2 1

0.000420947

10.7

0.000081300

0.000000063

0.000774631

C /C =C \C (C )(0)C

10.7

0.000090000

0.000003980

0.044222856

c=cc#cccc=c

10.7

0.000133751

0.000009646

0.072119272

C /C =C C /(C )(0)C

10.7

0.000101883

0.000007960

0.078153390

c=ccc#ccc=c

10.7

0.000127090

0.000016821

0.132354941

1 1 .1

0.000215734

0.000009650

0.044736717

1 1 .1

0.000225812

0.000013600

0.060371009

1 1 .1

0.000253453

0.000127820

0.504314011

1 1 .1

0.000265309

0.000176820

0.666467654

cc=cc=ccc=c

1 1 .1

0.000264217

0.000232400

0.879579376

CC(C)C(C)CC=0

1 1 .2

0.000052800

No

No

CC(C)CC(C)C=0

1 1 .2

0.000047200

No

No

CC(C)C(CC)C=0

1 1 .2

0.000054100

No

No

CCC(C)C(C)C=0

1 1 .2

0.000052800

No

No

cccccc=o

11.3

0.000045000

No

No

CC(C)(0)C(=C)C

11.5

0.000082200

0.000007960

0.096835618

CC(C)=C(C)C=0

1 1 .6

0.000085300

0.000053800

0.630950557

CCC(C)C(=0)C

1 1 .6

0.000009720

No

No

CC(=0)CCCC

1 1 .6

0.000010500

No

No

C #C C (C )C C =0

11.7

0.000056700

0 .0 0 0 0 0 0 0 2 1

0.000370282

C #C C C (C )C =0

11.7

0.000052600

0 .0 0 0 0 0 0 0 2 1

0.000399184

C #C C (C C )C =0

11.7

0.000052000

0 .0 0 0 0 0 0 0 2 1

0.000403529

C#CC#CCC(C)C

11.9

0.000075000

0.000000042

0.000560085

C#CC#CC(C)CC

11.9

0.000075000

0.000000042

0.000560085

C#CCC(C)C#CC

11.9

0.000059000

0.000000042

0.000711530

C#CC(C)CC#CC

11.9

0.000059000

0.000000042

0.000711530

c=c/c=c\c=c/cc
c=c/c=c/c=c/cc
c=c/c=c\c/c=c\c
c=c/c=c/c/c=c/c

108

SMILES N otation

Vapor

OH R ate coeff­

O 3

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefRcient

mm of Hg

(OH/cm®)sec“ ^

(moI/cm®)sec“ ^

OsiOH

C#CC(CC)C#CC

11.9

0.000059000

0.000000042

0.000711530

C#CCC#CC(C)C

11.9

0.000058000

0.000000042

0.000723571

C#CC(C)C#CCC

11.9

0.000058000

0.000000042

0.000723571

1 2 .0

0.000042400

3.36E-09

0.000079300

1 2 .0

0.000075800

0 .0 0 0 0 0 0 0 2 1

0.000277007

C#CC(C#C)=C(C)C

1 2 .0

0.000308992

0.000051800

0.167636044

C#CC(C)=C(C)C#C

1 2 .0

0.000308992

0.000051800

0.167636044

C #C C (C #C )C C #C

1 2 .1

0.000038300

0.000000063

0.001646000

1 2 .2

0.000111442

0.000000042

0.000376877

12.2

0.000111442

0.000000042

0.000376877

1 2 .2

0.000086100

0.000000042

0.000487684

C C (C )/C = C \C = 0

12.3

0.000059800

0.000000637

0.010653489

C C (C )/C =C /C = 0

12.3

0.000063800

0.000001274

0.019961251

C /C = C \C (C )C = 0

12.3

0.000123679

0.000091000

0.735776701

C /C = C /C (C )C = 0

12.3

0.000135535

0.000140000

1.032944831

C C (C )/C =C \C #C C

12.4

0.000153942

0.000004000

0.025998473

C /C =C \C #C C (C )C

12.4

0.000153942

0.000004000

0.025998473

C C (C )/C =C /C #C C

12.4

0.000169355

0.000007980

0.047140751

C /C =C /C #C C (C )C

12.4

0.000169355

0.000007980

0.047140751

C /C=C\C(C)C#CC

12.4

0.000133816

0.000091021

0.680196678

C /C=C/C(C)C#CC

12.4

0.000145672

0.000140210

0.962506735

C #C C (C )/C =C \C #C

12.7

0.000137869

0.000004020

0.029181774

C #C C (C )/C =C /C #C

12.7

0.000153281

0.000008000

0.052220952

C #C C (C =C )C =0

12.7

0.000086600

0.000008421

0.097236542

C #C C (C #C )/C =C \C

12.7

0.000113063

0.000091042

0.805235615

C #C C (C #C )/C =C /C

12.7

0.000124919

0.000140420

1.124092369

C#CC(C)C#CC=C

12.8

0.000103346

0.000001267

0.012259813

C#CC#CC(C)C=C

12.8

0.000113267

0.000008442

0.074531603

ccc#cc=o
CG#CCC=0

ccc#cc#cc
ccc#cc#cc
cc#ccc#cc

109

R ate coeff­

Relative

R ate coeff­

Relative

Vapor

OH R ate coeff­

O 3

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec“ ^

OsiOH

c#cc#ccc=cc

12.8

0.000113267

0.000008442

0.074531603

C#CC(C=C)C#CC

1 2 .8

0.000097300

0.000008442

0.086756741

SMILES N otation

c#cc=cc=o
c#c/c=c/c=o

12.9

0.000074500

0.000000119

0.001597302

12.9

0.000079700

0.000000217

0.002721312

CCCC=C(C)C#C

12.9

0.000189573

0.000051800

0.273126190

C#CC=C(C)CCC

12.9

0.000189573

0.000051800

0.273126190

CC=C(C#C)CCC

12.9

0.000189573

0.000051800

0.273126190

C#CCCC=C(C)C

12.9

0.000150493

0.000301021

2.000238111

CC=C(C)CCC#C

12.9

0.000150493

0.000301021

2.000238195

C#CCC=C(C)CC

12.9

0.000149871

0.000301021

2.008530941

CCC=C(C)CC#G

12.9

0.000149871

0.000301021

2.008530941

CC=C(CC)CC#C

12.9

0.000149871

0.000301021

2.008530941

c#c/c=c\c#cc
c#c/c=c/c#cc

13.0

0.000193989

0.000000655

0.003373901

13.0

0.000214026

0.000001267

0.005919848

c=cc#cc=o

13.0

0.000091800

0.000001230

0.013375181

C C C C (-C )C =0

13.0

0.000057400

0.000001274

0.022180637

c/c=c\c#cc#c
c/c=c/c#cc#c

13.0

0.000175330

0.000004020

0.022946734

13.0

0.000190743

0.000008000

0.041964896

CC#CC(=C)CCC

13.0

0.000144099

0.000007980

0.055403035

C=C(C)C#CCCC

13.0

0.000144099

0.000007980

0.055403035

CCC(=C)C#CCC

13.0

0.000143477

0.000007980

0,055642965

C=C(C)CCC#CC

13.0

0.000126313

0.000008421

0.066667946

CC#CCC(=C)CC

13.0

0.000125691

0.000008421

0.066997516

C=C(C)CC#CCC

13.0

0.000125691

0.000008421

0.066997516

C=C(C)CCC=0

13.0

0.000119808

0.000008400

0.070111997

CCC(=C)CC=0

13.0

0.000115380

0.000008400

0.072802753

c=cc#cc#cc

13.1

0.000154847

0.000001267

0.008182262

C #CC(=C)CCC#C

13.3

0.000128024

0.000008000

0.062523362

110

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec"^

(mol/cm®)sec“ ^

0 3 :0 H

C#CCC(=C)CC#C

13.3

0.000104938

0.000008442

0.080447441

CCC(CC)C(=0)C

13.4

0.000015500

No

No

CC(C)C(=:0)C(C)C

13.4

0.000007850

No

No

C /C =C \C /C =C (C )\C

13.5

0.000225642

0.000392000

1.737268921

C /C =C (G )\C /C =C \C

13.5

0.000225642

0.000392000

1.737268921

C /C =C /C /C =C (C )/C

13.5

0.000237498

0.000441000

1.856861495

C /C =C (C )/C /C =C /C

13.5

0.000237498

0.000441000

1.856861495

CCC=CC=C(C)C

13.5

0.000298542

0.000700000

2.344725982

C C = C (C )C = C C C

13.5

0.000298542

0.000700000

2.344725982

CCC=C(C)C=CC

13.5

0.000298542

0.000700000

2.344725982

C C = C C = C (C )C C

13.5

0.000298542

0.000700000

2.344725982

C C = C C (= C )C (C )C

13.5

0.000298542

0.000700000

2.344725982

C C = C (C C )C = C C

13.5

0.000298542

0.000700000

2.344725982

c#ccccccc

13.6*

0.000021400

0 .0 0 0 0 0 0 0 2 1

0.000982300

C =C (C )C /C =C \C #C

13.9

0.000205514

0.000012400

0.060347353

C =C (C )C /C =C /C #C

13.9

0.000220927

0.000016400

0.074157913

CCC=C(C=C)C#C

13.9

0.000298976

0.000036820

0.123153686

C=CC=C(CC)C#C

13.9

0.000298976

0.000036820

0.123153686

C=CC(CC)=CC#C

13.9

0.000298976

0.000036820

0.123153686

C # C C = C C (= C )C C

13.9

0.000298976

0.000036841

0.123223926

CCC=CC(=C)C#C

13.9

0.000298976

0.000036876

0.123340992

C = C C C (C # C )= C C

13.9

0.000228212

0.000060200

0.263689659

C=CCC=C(C)C#C

13.9

0.000228212

0.000060200

0.263689659

C=CCC(C)=CC#C

13.9

0.000228212

0.000060200

0.263689659

C /C =C \C C (=C )C #C

13.9

0.000203174

0.000099000

0.487184901

C C = C (C = C )C C # C

13.9

0.000234109

0.000126021

0.538300075

C=CC(C)=CCC#C

13.9

0.000234109

0.000126021

0.538300075

CC=CC(=C)CC#C

13.9

0.000234109

0.000126021

0.538300075

Ill

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25= C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m ol/cm ^)sec“ ^

03:0H

C = C (C )C = C C C # C

13.9

0.000234109

0.000126021

0.538300075

C /C =C /C C (=C )C #C

13.9

0.000203174

0.000147984

0.728357052

C C C (= 0 )C C C

13.9

0.000009210

No

No

C # C C (= 0 )C C C

14.0

0.000017100

0 .0 0 0 0 0 0 0 2 1

0.001226955

C #C C C (=0)C C

14.0

0.000013905

0 .0 0 0 0 0 0 0 2 1

0.001507215

C=CC(=C)C#CCC

14.0

0.000250460

0.000005691

0.022722189

C=CC(=C)CC#CC

14.0

0.000207589

0.000009821

0.047309783

C=CC#CC(=C)CC

14.0

0.000189725

0.000009210

0.048535955

C = C C (= C )C C = 0

14.0

0.000197278

0.000009800

0.049676035

C=C(C)CC#CC=C

14.0

0.000170992

0.000009646

0.056412128

C=C(C)C#CCC=C

14.0

0.000182738

0.000016400

0.089655486

C = C C C (= C )C # C C

14.0

0.000182738

0.000016400

0.089655486

C =C C C (=C )C =0

14.0

0.000096100

0.000009674

0.100689778

C /C = C \C C (= 0)C

14.0

0.000089400

0.000091000

1.017350181

C /C = C /C C (= 0)C

14.0

0.000101304

0.000140000

1.381978175

cccccccc

14.1*

0.000012900

No

No

C #C C (0)C

14.3

0 .0 0 0 0 2 2 0 0 0

0 .0 0 0 0 0 0 0 2 1

0.000954902

c=cccccc#c

14.4

0.000059900

0.000008421

0.140488212

C =C /C =C (C )\C =C /C

14.5

0.000263503

0.000040800

0.154974681

C =C (C )/C =C \C =C /C

14.5

0.000263503

0.000040800

0.154974681

C =C /C =C (C )/C =C /C

14.5

0.000273580

0.000044800

0.163818435

C =C (C )/C =C /C =C /C

14.5

0.000273580

0.000044800

0.163818435

C /C =C (C =C )\C =C /C

14.5

0.000235259

0.000040800

0.173580028

C /C = C \C (= C )/C = C \C

14.5

0.000235259

0.000040800

0.173580028

C /C =C (C =C )/C =C /C

14.5

0.000245337

0.000044800

0.182677654

C /C = C /C (= C )/C = C /C

14.5

0.000245337

0.000044800

0.182677654

C = C C = C C = C (C )C

14.5

0.000254884

0.000057400

0.225303719

C C = C (C )C = C C = C

14.5

0.000287238

0.000240782

0.838267891

112

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec“ ^

CC=CC=C(C)C=C
C=CC(C=C)=C(C)C

14.5

0.000287238

0.000240782

0.838267891

14.5

0.000249447

0.000240782

0.965264674

CC(=0)CC(C)(C)C

14.6

0.000002290

No

No

CCC(=0)C(C)(C)C

14.8

0.000004400

No

No

C=C(C)CO

14.9

0.000085700

0.000008400

0.098000098

C#CC(C=C)=CC=C
C#CC(=C)C=CC=C
C#CC=C(C=C)C=C

15.0

0.000288935

0.000006916

0.023936146

15.0

0.000288935

0.000006916

0.023936146

15.0

0.000252218

0.000006916

0.027420673

C#CC=CC(=C)C=C

15.0

0.000252218

0.000006916

0.027420673

C=CC(CC)CC=C

15.0

0.000090500

0.000016800

0.185676713

c=cc=ccccc
c=ccccc=cc

15.0

0.000169850

0.000036820

0.216779920

15.0

0.000135013

0.000099400

0.736226208

cc/c=ccc=o

15.0

0.000123180

0.000091000

0.738754780

c=cc/c=c\ccc

15.0

0.000134314

0.000099400

0.740055317

c=ccc/c=c\cc

15.0

0.000134314

0.000099400

0.740055317

C=CC(C=C)CCC

15.0

0.000146869

0.000148400

1.010425274

c=cc/c=c/ccc
c=ccc/c=c/cc

15.0

0.000146170

0.000148400

1.015254236

15.0

0.000146170

0.000148400

1.015254236

cc/c=c/cc=o

15.0

0.000135036

0.000140000

1.036758633

CCC(C)CC=0

15.0

0.000050600

No

No

C =C(C)CC(=0)C

15.r

0.000081600

0.000008400

0.102880583

C=CC(=C)C#CC=C

15.1

0.000296708

0.000006916

0.023309104

C #C C C (=0)C =C

15.1

0.000048900

0.000001246

0.025460798

C #C C (=0)C C =C

15.1

0.000051900

0.000008421

0.162101978

c/c=c\cc/c=c\c
c/c=c/cc/c=c/c
c#cco

15.5

0.000179977

0.000182000

1.011242707

15.5

0.000203689

0.000280000

1.374647540

15.6

0.000016200

0 .0 0 0 0 0 0 0 2 1

0.001293260

C=CC(=0)/C=C\C

15.7

0.000045400

0.000005210

0.114760144

SMILES N otation

113

0 3

:0 H

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

Rate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec^i

(mol/cm®)sec“ ^

0 3 :0 H

C = C C (= 0)/C = C /C

15.7

0.000049500

0.000009190

0.185546236

C #C C(C)(0)C

16.0*

0.000012800

0 .0 0 0 0 0 0 0 2 1

0.001646953

CC(C)(C)CO

16.0*

0.000007270

No

No

c=cc=cccc=c
c=cc/c=c\cc=c
c=cc/c=c/cc=c

16.2

0.000208412

0.000045220

0.216973804

16.2

0.000172954

0.000107800

0.623286829

16.2

0.000184810

0.000156800

0.848438570

c /c = c \c c c c c

16.3*

0.000096500

0.000091000

0.943492864

c /c = c /c c c c c

16.3*

0.000108306

0.000140000

1.292632402

c c /c = c \c c c c

16.3

0.000095800

0.000091000

0.950376240

c c /c = c /c c c c

16.3

0.000107608

0.000140000

1.301023928

CC(C)(0)CC

16.8*

0.000007630

No

No

CC#CC(CC)CC

16.8

0.000051100

0 .0 0 0 0 0 0 0 2 1

0.000411288

CC#CCC(C)CC

16.8

0.000051000

0 .0 0 0 0 0 0 0 2 1

0.000411628

C C #C C C C (C )C

16.8

0.000050600

0 .0 0 0 0 0 0 0 2 1

0.000415241

CC#CC(C)CCC

16.8

0.000050600

0 .0 0 0 0 0 0 0 2 1

0.000415241

CCC#CCC(C)C

16.8

0.000049900

0 .0 0 0 0 0 0 0 2 1

0.000421056

CCC#CC(C)CC

16.8

0.000049900

0 .0 0 0 0 0 0 0 2 1

0.000421056

CC(C)C#CCCC

16.8

0.000049500

0 .0 0 0 0 0 0 0 2 1

0.000424126

C#CC(C)=C(C)CC

16.9

0.000234504

0.000336437

1.434671011

C#CC(CC)=C(C)C

16.9

0.000234504

0.000336446

1.434710712

C#CCC(C)=C(C)C

16.9

0.000184614

0.000840021

4.550159631

CC(C)(C)C(C)C=0

16.9

0.000036800

No

No

CC(C)C(C)(C)C=0

16.9

0.000046400

No

No

CCC(CC)C=0

16.9

0.000047700

No

No

CC(C)CCC=0

16.9

0.000045000

No

No

C#CCC(C#C)CC

17.1

0.000030300

0.000000042

0.001387795

C#CCC(C)CC#C

17.1

0.000030300

0.000000042

0.001387795

C#CC(C#C)CCC

17.1

0.000029800

0.000000042

0.001408457

SMILES N otation

114

Vapor

OH R ate coeff­

O 3

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec“ ^

O ^O H

C#CC(C)CCC#C

17.1

0.000029800

0.000000042

0.001408457

C /C = C \C (= 0 )C

17.3

0.000079600

0.000010700

0.135115593

C /C = C /C (= 0 )C

17.3

0.000090200

0.000021500

0.238273281

c=ccccccc
c#cc#cccc

17.4*

0.000051500

0.000008400

0.163149893

17.5

0.000072800

0.000000042

0.000576945

C#CC#CC(C)(C)C

17.5

0.000069700

0.000000042

0.000602282

c#cccc#cc
c#ccc#ccc

17.5

0.000056800

0.000000042

0.000738963

17.5

0.000056200

0.000000042

0.000747131

C#CC(C)(C)C#CC

17.5

0.000053800

0.000000042

0.000781045

CC=C(C)C(C)=CC

17.6

0.000406449

0.007000000

17.22234819

CC(C)=C(C)C=CC

17.6

0.000406449

0.007000000

17.22234819

CC=C(C)C=C(C)C

17.6

0.000406449

0.007000000

17.22234819

c#cccc=o

17.7

0.000050300

0.000000021

0.000417228

C #C C (C )(C )C =0

17.7

0.000045000

0.000000021

0.000467079

ccc/c=c\ccc
ccc/c=c/ccc
cc^^=ccccc

1718*
17\8*

0.000095700

0.000091000

0.951141751

0.000107530

0.000140000

1.301956334

17.8'

0.000142038

0.000301000

2.119151213

C #C C (C #C )(C )C #C

17.9

0.000033000

0.000000063

0.001907877

C#C /C =C \C C (C )C

17.9

0.000129698

0.000004000

0.030858278

C#C /C =C \C (C )C C

17.9

0.000129698

0.000004000

0.030858278

C#C /C =C /C C (C )C

17.9

0.000145111

0.000007980

0.055016665

C #C /C =C /C (C )C C

17.9

0.000145111

0.000007980

0.055016665

cc=c^^c=o

17.9

0.000072800

0.000008281

0.113678545

C/C =C \C C (C )C #C

17.9

0.000104892

0.000091021

0.867761304

C/C =C \C (C )C C #C

17.9

0.000104892

0.000091021

0.867761304

C /C =C \C (C C )C #C

17.9

0.000104892

0.000091021

0.867761304

C #C C /C =C \C (C )C

17.9

0.000103909

0.000091021

0.875964888

C C /C =C \C (C )C #C

17.9

0.000103909

0.000091021

0.875964888

SMILES N otation

115

R ate coeff­

Relative

R ate coeff­

Relative

Vapor

OH R ate coeff­

O 3

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec~^

(mol/cm®)sec“ ^

OsiOH

C/C =C /C C (C )C #C

17.9

0.000116748

0.000140021

1.199346475

C/C =C /C (C )C C #C

17.9

0.000116748

0.000140021

1.199346475

C/C =C /C (C C )C #C

17.9

0.000116748

0.000140021

1.199346475

C#C C /C =C /C (C )C

17.9

0.000115765

0.000140021

1.209523577

CC /C =C /C (C )C #C

17.9

0.000115765

0.000140021

1.209523577

CCCC(C)C=0

17.9

0.000045000

No

No

C=CC#CCC(C)C

18.0

0.000095200

0.000001246

0.013091677

C=CC#CC(C)CC

18.0

0.000095200

0.000001246

0.013091677

C=CCC(C)C#CC

18.0

0.000089100

0.000008421

0.094473880

C=CC(C)CC#CC

18.0

0.000089100

0.000008421

0.094473880

C=CC(CC)C#CC

18.0

0.000089100

0.000008421

0.094473880

C=CCC#CC(C)C

18.0

0.000088200

0.000008421

0.095526644

C=CC(C)C#CCC

18.0

0.000088200

0.000008421

0.095526644

CCC=C(C)CCC

18.0

0.000141339

0.000301000

2.129625086

CC(C)C(=0)CC

18.1

0.000058406

No

No

CC(C)C(=0)CC

18.1

0.000058406

No

No

C =C C ((^C C =0

1&2

0.000086800

0.000008400

0.096750242

C =G C C ((^C =0

18.2

0.000082700

0.000008400

0.101553213

C=CC(CC)C=0

1&2

0.000082100

0.000008400

0.102253447

CC=C(C)C(=C)C#C

18.2

0.000395026

0.000239578

0.606487401

C=C(C)C(C#C)=CC

1&2

0.000395026

0.000239578

0.606487401

C#CC(C=C)=C(C)C

1&2

0.000395026

0.000239788

0.607019011

CC(C)=CC(=C)C#C

1&2

0.000395026

0.000239788

0.607019011

C=CC(C)=C(C)C#C

1&2

0.000395026

0.000239788

0.607019011

C=C(C)C(C)=CC#C

1&2

0.000395026

0.000239788

0.607019011

C=C(C)C=C(C)C#C

1&2

0.000395026

0.000239788

0.607019011

CC=C(C)CCCC

18.2

0.000142038

0.000301000

2.119151213

CCC(C)=CCCC

1&2

0.000141339

0.000301000

2.129625086

SMILES N otation

116

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at2& :C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m ol/cm ^)sec“ ^

OsiOH

CC=C(CC)CCC

18.2

0.000141339

0.000301000

2.129625086

CC(C)=C(C)CCC

1&3*

0.000176082

0.000840000

4.770512129

CCC(0)C

18.3*

0.000015600

No

No

cc/c=c\c#cc
c/c=c\c#ccc
cc/c=c/c#cc
c/c=c/c#ccc
c/c=c\cc#cc
c/c=c/cc#cc

18.3

0.000152111

0.000040000

0.026311311

18.3

0.000152111

0.000040000

0.026311311

1&3

0.000167524

0.000007980

0.047655805

18.3

0.000167524

0.000007980

0.047655805

18.3

0.000131985

0.000091021

0.689629516

1&3

0.000143841

0.000140021

0.973440462

C=C(C)C#CC(C)=C

18.4

0.000239334

0.000015946

0.066626438

C=C(C)C(=C)C#CC

18.4

0.000324202

0.000036876

0.113743788

C#CC(C#C)CC=C

18.4

0.000068400

0.000008442

0.123452457

C#CCC(C#C)C=C

18.4

0.000068400

0.000008442

0.123452457

cc/c=c\c=o

18.5

0.000058000

0.000000637

0.010989908

cc/c=c/c=o
c/c=c\cc=o

1&5

0.000062000

0.000001274

0.020550605

18.5

0.000121674

0.000091000

0.747897766

c/c=c/cc=o

18.5

0.000133530

0.000140000

1.048450516

C =C (C )C (=C )C =0

1&6

0.000101859

0.000005900

0.057891996

cc=cc=cc^^c

18.6

0.000225281

0.000224000

0.994315816

CC=CC((^C=CC

18.6

0.000179680

0.000182000

1.012913489

C C = C C (C )C = C C

18.6

0.000203392

0.000280000

1.376653959

c#cc/c=c\c#c
c#cc/c=c/c#c

18.7

0.000136038

0.000040200

0.029574405

18.7

0.000151451

0.000008000

0.052852064

C #C C (=C )C (=C )C #C

1&8

0.000392933

0.000005712

0.014536837

C#CC(=C)CCCC

1&8

0.000119571

0.000007980

0.066767929

C=C(C)CCCC#C

1&8

0.000097100

0.000008421

0.086716513

C#CCC(=C)CCC

1&8

0.000096400

0.000008421

0.087349078

C#CCCC(=C)CC

1&8

0.000096400

0.000008421

0.087349078

SMILES N otation

117

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm ®)sec"i

(mol/cm®)sec“ ^

O^OH

c=cc#ccc#c
c#ccc#cc=c
c=ccc#cc#c

1&9

0.000101515

0.000001267

0.012480860

18.9

0.000101515

0.000001267

0.012480860

1&9

0.000111437

0.000008442

0.075755781

C=CC(C)C#CC=C

19.4

0.000133454

0.000009646

0.072279702

C = C C (C = C )C # C C

19.4

0.000127415

0.000016821

0.132017879

cc=cc=cc#c

19.5

0.000297470

0.000036876

0.123965375

CCC(C)CCCC

19.6*

0.000013400

No

No

C = C C (C )= C C C C

19.6

0.000225577

0.000126000

0.558566584

C=C((^C=CCCC

19.6

0.000225577

0.000126000

0.558566584

CC=C(C=C)CCC

19.6

0.000225577

0.000126000

0.558566584

CC=CC(=C)CCC

19.6

0.000225577

0.000126000

0.558566584

C=C(C)CC/C=C\C

19.6

0.000172177

0.000099400

0.577314303

C=C(C)C/C=C\CC

19.6

0.000171555

0.000099400

0.579405251

C/C=C\CC(=C)CC

19.6

0.000171555

0.000099400

0.579405251

C=C(C)CC/C=C/C

19.6

0.000184033

0.000148400

0.806378975

c=c^^cc=ccc

19.6

0.000183411

0.000148400

0.809110770

C/C=C/CC(=C)CC

19.6

0.000183411

0.000148400

0.809110770

C = C C = C (C )C C C

19.6

0.000225577

0.000224000

0.993007261

G=CCCC=C(C)C

19.6

0.000180601

0.000309400

1.713172831

CC=C((^CCC=C

19.6

0.000180601

0.000309400

1.713172831

c=ccc=c^^cc

19.6

0.000179979

0.000309400

1.719087272

CCC=C(C)CC=C

19.6

0.000179979

0.000309400

1.719087272

CC=C(CC)CC=C

19.6

0.000179979

0.000309400

1.719087272

CCC(=C)CCCC

19.7*

0.000088000

0.000008400

0.095507126

c=cc#c/c=c\c

19.7

0.000198359

0.000052300

0.026352411

C #C C (=C )C (=0)C

19.7

0.000103256

0.000003.30

0.032235290

c=cc#c/c=c/c

19.7

0.000213725

0.000009210

0.043085648

C = C C (C = C )C = 0

19.7

0.000116711

0.000016800

0.143945000

118

R ate coeff­

Relative

Vapor

OH R ate coeff­

O 3

P ressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec~^

O^OH

c=cc=cc#cc

19.7

0.000248954

0.00005691

0.228596301

C #C C (=0)C (C )=C

19.7

0.000032100

0.000007980

0.248718323
No

SMILES N otation

CCC(C)(C)C(=0)C

1&8

0.000007760

No

CC=C(C)C(=0)C

19.9

0.000122591

0.000139742

1.139902843

CC(C)CC(=0)C

19.9

0.000013800

No

No

CCC(CC)CCC

20.0*

0.000014000

No

No

c=cc=cc=o

20.0

0.000082000

0.000000907

0.011060490

CCC(=0)C(C)==C

20.0

0.000074300

0.000007960

0.107169828

C=C(C)CCCCC

20.2*

0.000095700

0.000091000

0.951141751

C=C(C)CCCCC

20.2*

0.000107530

0.000140000

1.301956334

C=CCCC(=C)C#C

20.3

0.000158133

0.000016821

0.106372177

CCCC(C)CCC

20.5*

0.000013400

No

No

CCCC(=C)CCC

20.7

0.000087900

0.000008400

0.095590884

C =C (0)C

20.7

0.000080600

0.000007960

0.098772480

CC(C)(C)C(C)(C)C

20.9*

0.000001570

No

No

ccco

21.0*

0.000008540

No

No

C =C C (=C )/C =C \C C

21.1

0.000194850

0.000009650

0.049531709

C=CC=CC(=C)CC

21.1

0.000209104

0.000013600

0.065194698

C =C C (=C )/C =C /C C

21.1

0.000204928

0.00001.600

0.066523518

C=CC=C(CC)C=C

21.1

0.000224884

0.000038080

0.169331874

C=C(C)CC=CC=C

21.1

0.000245653

0.000045220

0.184080647

CCC=C(C=C)C=C

21.1

0.000196640

0.000038080

0.193653349

C =C C (=C )C /C =C \C

21.1

0.000253453

0.000100800

0.397706559

CC=C(C=C)CC=C

21.1

0.000264217

0.000134400

0.508672410

C=CCC=C(C)C=C

21.1

0.000264217

0.000134400

0.508672410

C=CC=C(C)CC=C

21.1

0.000264217

0.000134400

0.508672410

C=CCC(=C)C=CC

21.1

0.000264217

0.000134400

0.508672410

C=C(C)C=CCC=C

21.1

0.000264217

0.000134400

0.508672410

119

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

a t 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec^i

(mol/cm®)sec^i

OaiOH

C=CC(=C)C/C=C/C

2 1 .1

0.000265309

0.000149800

0.564624220

C=CC(=0)CCC

21.3

0.000044200

0.000001225

0.027706946

C = C C C (= 0 )C C

21.3

0.000044000

0.000008400

0.190731433

CCC(C)C(C)CC

21.7*

0.000013800

No

No

CC(C)C(C)=C(C)C

22.5*

0.000175785

0.000840000

4.778568638

c= cccccc= c

2&T

0.000090000

0.000016800

0.186565150

C = C C (= 0 )C C = C

2&9

0.000079000

0.000009625

0.121764573

CCC(CC)(C)CC

23.0*

0.000006420

No

No

CC(C)C(C)CCC

23.4*

0.000013300

No

No

CC(C)C=C(C)C#C

2&4

0.000189276

0.000051800

0.273554572

C#CC=C(C)C(C)C

2&4

0.000189276

0.000051800

0.273554572

C#CC(=CC)C(C)C

2&4

0.000189276

0.000051800

0.273554572

C#CC(C)C=C(C)C

23.4

0.000150196

0.000301021

2.004191664

CC=C(C)C(C)C#C

234

0.000150196

0.000301021

2.004191664

C=CC(C)(0)C

2&4

0.000042900

0.000001225

0.028582217

CC#CC(=C)C(C)C

23.6

0.000143802

0.000007980

0.055517410

C=C(C)C#CC(C)C

2&6

0.000143802

0.000007980

0.055517410

CC=C(C)CC#CC

2&6

0.000126016

0.000008421

0.066825002

C=C(C)C(C)C#CC

2&6

0.000126016

0.000008421

0.066825002

C=C(C)CCC(=C)C

2&9

0.000164377

0.000016800

0.102204343

CC#CC=C((^C

2&9

0.000212672

0.000051800

0.243460921

CC=C((^C#CC

2&9

0.000212672

0.000051800

0.243460921

CCC(C)=C(C)CC

2&9

0.000175460

0.000840000

4.787405676

CC(C)=C(CC)CC

23.9

0.000175460

0.000840000

4.787405676

C#CC(C)C(=C)C#C

24.1

0.000127729

0.000008000

0.062668038

C#CC(C#C)C(C)=C

24.1

0.000105263

0.000008442

0.080199456

C#CCCC(C)CC

24.2

0.000021800

0 .0 0 0 0 0 0 0 2 1

0.000962893

C#CCC(C)CCC

24^

0.000021800

0 .0 0 0 0 0 0 0 2 1

0.000962893

SMILES N otation

120

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec~^

OsiOH

C#CCCCC(C)C

24.2

0.000021400

0 .0 0 0 0 0 0 0 2 1

0.000982903

C#CC(C)CCCC

24.2

0.000021400

0 .0 0 0 0 0 0 0 2 1

0.000982903

CC(C)C(=C)C=0

24^

0.000057100

0.000001274

0.022295874

C=C(C)C(C)C=:0

24^

0.000115879

0.0000084

0.07248951500

CC(C)C(C)C(=0)C

242

0.000016100

No

No

C #C C (=C C )C #C

24.5

0.000247879

0.000010045

0.040523752

c#cc=c^ ^c#c

245

0.000247879

0.000010045

0.040523752

C #C C (=C )C #C C

247

0.000180807

0.000001267

0.007007467

c= c^ ^ c#cc#c

247

0.000165190

0.000008000

0.048456335

CC#CCC(C)(C)C

248

0.000044900

0 .0 0 0 0 0 0 0 2 1

0.000467630

CC#CC(C)(C)CC

248

0.000044900

0 .0 0 0 0 0 0 0 2 1

0.000467630

CCC#CC(C)(C)C

248

0.000044600

0 .0 0 0 0 0 0 0 2 1

0.000470632

C=CCCC(C)CC

249

0.000051900

0.000008400

0.161794420

-CC(C)=CCC(C)C

25.0*

0.000142025

0.000301000

2.119346738

C=CCCCC(C)C

25.2*

0.000051500

0.000008400

0.163191428

C/C=C\CC(C)CC

2&2

0.000096900

0.000091000

0.939296740

C/C=C\C(CC)CC

2&2

0.000096900

0.000091000

0.939296740

C/C=C\C(C)CCC

2&2

0.000096400

0.000091000

0.943621068

C/C=C\CCC(C)C

2&2

0.000096400

0.000091000

0.943621068

CC/C=C\C(C)CC

2&2

0.000095700

0.000091000

0.950504772

CC/C=C\CC(C)C

2&2

0.000095700

0.000091000

0.950504772

C/C=C/C(CC)CC

2&2

0.000108779

0.000140000

1.287011793

C/C=C/CC(C)CC

2&2

0.000108737

0.000140000

1.287510326

C/C=C/C(C)CCC

2&2

0.000108293

0.000140000

1.292788817

C/C=C/CCC(C)C

2&2

0.000108293

0.000140000

1.292788817

CC/C=C/C(C)CC

2&2

0.000107595

0.000140000

1.301180494

CC/C=C/CC(C)C

2&2

0.000107595

0.000140000

1.301180494

C #C C (=C )C = 0

2&3

0.000071100

0 .0 0 0 0 0 0 0 2 2

0.000305406

121

Vapor

OH R ate coeff­

O3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O3

R ate

a t 2&:C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(m ol/cm ^)sec“ ^

O^OH

C #C C (C )C (=0)C

2&3

0.000014200

0 .0 0 0 0 0 0 0 2 1

0.001481341

C#C C (=0)C (C )C

2&3

0.000013800

0 .0 0 0 0 0 0 0 2 1

0.001521800

C#CC(C)(C)CC#C

2&3

0.000024200

0.000000042

0.001738835

C#CC(CC)(C)C#C

2&3

0.000024200

0.000000042

0.001738835

C=CCC(C)CCC

25.4

0.000051900

0.000008400

0.161794420

CC(C)(C)CC=0

25.4

0.000034500

No

No

CCC(C)(C)C=0

25M

0.000040000

No

No

CCC(=C)C(C)CC

2&7

0.000087900

0.000008400

0.095521189

C=C(C)CC(C)=CC

25.7

0.000217842

0.000309400

1.420298573

C=C(C)CC=C(C)C

25.7

0.000217842

0.000309400

1.420298573

C=CC(C)=C(C)CC

2&7

0.000298542

0.000700000

2.344725982

C=C(C)C(C)=CCC

25.7

0.000298542

0.000700000

2.344725982

C = C (C )C = C (C )C C

25.7

0.000298542

0.000700000

2.344725982

CC(C)=CC(=C)CC

2&7

0.000298542

0.000700000

2.344725982

C=CC(CC)=C(C)C

25.7

0.000298542

0.000700000

2.344725982

CC=C(C)C(=C)CC

25J

0.000298542

0.000700000

2.344725982

CC=C(CC)C(=C)C

2&7

0.000298542

0.000700000

2.344725982

C=CCC(C)=C(C)C

2&7

0.000214722

0.000848400

3.951164182

C C #C C ((^ = 0

2&9

0.000038300

0.000000057

0.001481219

C C (=0)C #C C

2&9

0.000038300

0.000000057

0.001481219

C=CCC(CC)C#C

26

0.000060400

0.000008421

0.139485552

C#CCC(C=C)CC

26

0.000060400

0.000008421

0.139485552

C=CCC(C)CC#C

26

0.000060400

0.000008421

0.139485552

C=CC(C#C)CCC

26

0.000059900

0.000008421

0.140518932

C=CCCC((^C#C

26

0.000059900

0.000008421

0.140518932

C=CC(C)CCC#C

26

0.000059900

0.000008421

0.140518932

ccccc= o

26.01

0.000042800

No

No

c=cco

26.1

0.000046300

0.000008400

0.181245258

SMILES N otation

122

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

a t2 5 °C

concentration 1.56x10®

concentration 7x10^^

coefficient

m m of Hg

(O H /cm ^)sec“ ^

(mol/cm®) sec

OaiOH

cc#ccccc
c#ccccc#c

2&3

0.000048400

0 .0 0 0 0 0 0 0 2 1

0.000434047

2&3

0.000027600

0.000000042

0.001520159

C=CC(C)C(C)CC

2&3

0.000051900

0.000008400

0.161836727

CC(C)(C)C(C)=CC

2&3

0.000136771

0.000051800

0.378415026

CC(C)/C=C\CCC

2&3

0.000095400

0.000091000

0.954102231

CC(C)/C=C/CCC

26.3

0.000107234

0.000140000

1.305560700

C #C /C =C \C (C )(C )C

2&4

0.000124444

0.000000634

0.005090643

C #C /C =C /C (C )(C )C

2&4

0.000139857

0.000001246

0.008909112

ccc/c=c\c#c

2&4

0.000127506

0.000004000

0.031388610

C /C =C \C (C )(C )C #C

2&4

0.000099600

0.000004000

0.040167853

ccc/c=c/c#c

2&4

0.000142919

0.000007980

0.055860217

C /C =C /C (C )(C )C #C

2&4

0.000111494

0.000007980

0.071604663

c/c=c\ccc#c
cc/c=c\cc#c
c/c=c/ccc#c

2&4

0.000102700

0.000091021

0.886276838

2&4

0.000102079

0.000091021

0.891671542

2&4

0.000114556

0.000140021

1.222288588

C=CC(C)CC#C

2&4

0.000114556

0.000140021

1.222288588

cc/c=c/cc#c

26.4

0.000113935

0.000140021

1.228954373

C #C C (= 0 )C # C

26.5

0.000007644

0.000000042

0.005494505

C=C(C)CC(=C)C#C

2&5

0.000195374

0.000016400

0.083856943

C#CC(=C)C(=C)CC

2&5

0.000298976

0.000036876

0.123340992

C=C(C)C(=C)CC#C

2&5

0.000234109

0.000126021

0.538300075

C=CC(C)(C)C#CC

2&6

0.000083900

0.000001246

0.014854176

C=C(C)CCC(C)C

26.7

0.000088600

0.000008400

0.094768533

c=cccc#cc
c=ccc#ccc
cc=cc=c^^c

2&7

0.000086900

0.000008421

0.096854984

2&7

0.000086300

0.000008421

0.097552140

2&8

0.000297036

0.000700000

2.356612898

CCC(C)(C)C(C)C

27.0*

0.000006820

No

No

CC(C)C(C)C(C)C

27.1*

0.000013300

No

No

SMILES N otation

123

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m oI/cm ^)sec“ ^

O^OH

C#CC(C=C)(C)C#C

2T2

0.000063100

0.000001267

0.020070019

C=CC=CCC(C)C

27.2

0.000169837

0.000036820

0.216796646

C=CC=CC((^CC

2T2

0.000169837

0.000036820

0.216796646

C=CC(C)C/C=C\C

2T2

0.000135468

0.000099400

0.733753985

C=CCC(C)/C=C\C

2T2

0.000135000

0.000099400

0.736297671

C=CC(CC)/C=C\C

27^

0.000135000

0,000099400

0.736297671

C=CC/C=C\C(C)C

2T2

0.000134017

0.000099400

0.741694647

C =CC(C)/C=C\CC

2T2

0.000134017

0.000099400

0.741694647

C=CCC(C)/C=C/C

27^

0.000146856

0.000148400

1.010515435

C=CC(C)C/C=C/C

27^

0.000146856

0.000148400

1.010515435

C=CC(CC)/C=C/C

27.2

0.000146856

0.000148400

1.010515435

C=CC/C=C/C(C)C

2^2

0.000145873

0.000148400

1.017320387

C=CC(C)/C=C/CC

2T2

0.000145873

0.000148400

1.017320387

C=CC(C)(C)C=0

274

0.000075100

0.000001225

0.016318474

C=CC(C)CCCC

274

0.000049900

0.000008400

0.168291845

C =C (C )C (=C )/C =C \C

27.6

0.000235259

0.000040800

0.173580028

C =C (C )C (=C )/C =C /C

27.6

0.000245337

0.000044800

0.182677654

C C = C (C = C )C (= C )C

27.6

0.000228629

0.000044800

0.196027257

C=CC(=C)C=C(C)C

27^

0.000233999

0.000057400

0.245412143

C=CC(C)=C(C)C=C

2T6

0.000287238

0.000240782

0.838267891

C=C(C)C(C)=CC=C

27.6

0.000287238

0.000240782

0.838267891
0.838267891

SMILES N otation

C = C (C )C = C (C )C = C

27^

0.000287238

0.000240782

CC=C(C)C(=C)C=C

27^

0.000249447

0.000240782

0.965264674

ccc=cc=cc

27^

0.00022345

0.000224000

1.002460557

c /c = c c /c = c \c

2T6

0.000177849

0.000182000

1.023337953

c /c = c /c /c = c /c

27.6

0.000201561

0.000280000

1.389155144

C CC(=C)C(=0)C

2T8

0.000074000

0.000021500

0.290355917

C=C(C)C(C)CCC

28

0.000088600

0.000008400

0.094768533

124

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m ol/cm ^)sec“ ^

O ^O H

c = c c /c = c \c # c

2&4

0.000166146

0.000012400

0.074646588

c = c c /c = c /c # c

2&4

0.000181559

0.000016400

0.090237864

c=cc=ccc#c

2&4

0.000176177

0.000036841

0.209113514

C#CC(=C)C(=C)C=C

2&5

0.000252218

0.000006916

0.027420673

CCC(C)(C)CCC

28.6'

0.000006830

No

No

c= cc#cccc

2&6

0.000093000

0.000001246

0.013400206

C=CC(=C)CCCC

2&6

0.000169850

0.000009800

0.057698078

C=C(C)CCCC=C

2&6

0.000127213

0.000016800

0.132062128

C=CCC(=C)CCC

2&6

0.000126514

0.000016800

0.132791330

G=CCCC(=C)CC

2&6

0.000126514

0.000016800

0.132791330

c= ccc#cc= c

2&7

0.000131623

0.000009646

0.073284819

C=C(C)C(CC)CC

29.2

0.000089100

0.000008400

0.094296211

C = C C (C )C = C C = C

2&2

0.000208115

0.000045220

0.217283308

C = C C (C = C )/C = C \C

2&2

0.000173279

0.000107800

0.622119667

C =C C (C =C )/C =C /C

2&2

0.000185135

0.000156800

0.846951536

cc=cc=cc=c

29.7

0.000223378

0.000038080

0.170473399

C=CCC(C)C(C)C

2&8

0.000051900

0.000008400

0.161836727

c/c=cc=o

30.0*

0.000056500

0.000000637

0.011283043

c /c = c /c = o

30*

0.000060500

0.127400000

2106.222383

C =C C (=C )C (=0)C

30.0

0.000147579

0.000015309

0.103734187

C = C C (= 0 )C (C )= C

30.0

0.000042600

0.000009190

0.215485093

C=C(C)C(C)(C)CC

3&2*

0.000083000

0.000007960

0.095966974

CC(=C)C(C)(C)CC

3&2'

0.000083000

0.000007960

0.095966974

c#co

30.2

0 .0 0 0 0 1 1 1 0 0

0 .0 0 0 0 0 0 0 2 1

0.001885370

CC(C)CCC(C)C

30.3*

0.000012900

No

No

CC#CC(C)C(C)C

30.3

0.000050600

0 .0 0 0 0 0 0 0 2 1

0.000415349

CC(C)C#CC(C)C

30.3

0.000049200

0 .0 0 0 0 0 0 0 2 1

0.000426685

CC(C)CC(C)CC

30.4*

0.000013300

No

No

SMILES N otation

125

Vapor

OH R ate coeff­

O3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O3

Rate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

m m of Hg

(OH/cm®)sec~^

(mol/cm®)sec^i

OsiOH

C=CC(=C)CCC=C

30.8

0.000208412

0.000018200

0.087326918

C=CCC(=C)CC=C

30.8

0.000165154

0.000025200

0.152584786

C#CC(C#C)C(C)C

31.0

0.000029800

0.000000042

0.001409069

C#CC(C)C(C)C#C

31.0

0.000029800

0.000000042

0.001409069

C C C (= 0 )C (= 0 )C

31.1

0.000002080

No

No

CC(C)C(C)C=0

31.4

0.000048300

No

No

CC(C)(C)C(=0)C

31.5

0.000002640

No

No

C#CC#CC(C)C

31.7

0.000072500

0.000000042

0.000579307

C#CC(C)C#CC

31.7

0.000056500

0.000000042

0.000742843

CC(C)(C)C(C)CC

32.1*

0.000007680

No

No

CCCfCCCC

32.1

0.000047700

0 .0 0 0 0 0 0 0 2 1

0.000440406

C/C=C\CC(C)(C)C

32.5

0.000090800

0.000091000

1.002520049

C/C=C/CC(C)(C)C

3&5

0.000102627

0.000140000

1.364160077

C #C C ((^ C = 0

3&0

0.000179306

0.000006146

0.034276643

CC/C=C(C)C(C)C

33.0

0.000090500

0.000003980

0.043999085

CC/C=C(C)/C(C)C

33.0

0.000102341

0.000007960

0.077803742

CC(C)=CC(C)CC

33.0

0.000142025

0.000301000

2.119346738

CC=C(C)C(C)CC

33.0

0.000142025

0.000301000

2.119346738

CC=C(C)CC(C)C

33.0

0.000142025

0.000301000

2.119346738

CC(C)C=C(C)CC

33.0

0.000141042

0.000301000

2.134109905

CCC(C)=CC(C)C

33.0

0.000141042

0.000301000

2.134109905

CCC=C(C)C(C)C

33.0

0.000141042

0.000301000

2.134109905

CC=C(CC)C(C)C

33.0

0.000141042

0.000301000

2.134109905

c#cc#cc#c

3&2

0.000099600

0.000000063

0.000632690

C=C(C)C(C)=C(C)C

33.6

0.000406449

0.007000000

17.22234819

C=CC(0)C

33.7

0.000052100

0.000008400

0.161229063

cc#cc= o

3&8

0.000040900

3.36E-09

0.000082200

CC(C)(C)CCCC

34*

0.000007240

No

No

SMILES N otation

126

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

m m of Hg

(OH/cm®)sec“ ^

(m ol/cm ^)sec“ ^

Os-.OH

C#CC(=C)CC(C)C

34.1

0.000119558

0.000007980

0.066775247

C#CC(=C)C(C)CC

34.1

0.000119558

0.000007980

0.066775247

C=C(C)C(C)CC#C

34.1

0.000097100

0.000008421

0.086732397

C=C(C)CG(C)C#C

34.1

0.000097100

0.000008421

0.086732397

C#CC(CC)C(=C)C

34.1

0.000097100

0.000008421

0.086732397

C#CCC(=C)C(C)C

34.1

0.000096100

0.000008421

0.087618886

C#CC(C)C(=C)CC

34.1

0.000096100

0.000008421

0.087618886

ccc=c^ ^c#c

34.5

0.000187445

0.000051800

0.276225751

C#CC=C(C)CC

34.5

0.000187445

0.000051800

0.276225751

CC=C(CC)C#C

34.5

0.000187445

0.000051800

0.276225751

C#CCC=C((^C

34.5

0.000148365

0.000301021

2.028916898

CC=C(C)CC#C

34.5

0.000148365

30.1020986

202891.6898

CCC(=C)C#CC

3L9

0.000141971

0.000007980

0.056233162

C=C((^C#CCC

34.9

0.000141971

0.000007980

0.056233162

C=C((^CC#CC

34.9

0.000124185

0.000008421

0.067809925

C C (= 0 )C C C

3&4

0.000007450

No

No

CC=C(C=C)C(C)C

35.5

0.000225281

0.000126000

0.559302647

C=C(C)C(C)/C=C\C

3&5

0.000171880

0.000099400

0.578311432

C=C(C)C(C)/C=C/C

35.5

0.000183736

0.000148400

0.807681869

C=CC=C(C)C(C)C

3&5

0.000225281

0.000224000

0.994315816

C=CC(C)=CC(C)C

35.5

0.000225281

0.000224000

0.994315816

C = C (C )C = C C (C )C

35.5

0.000225281

0.000224000

0.994315816

C=CC(C)C=C(C)C

3&5

0.000180304

0.000309400

1.715993550

CC=C(C)C(C)C=C

3&5

0.000180304

0.000309400

1.715993550

C #C CC(=C)C#C

35.6

0.000125898

0.000008000

0.063579125

C#CCCC(C)(C)C

35.7

0.000015700

0 .0 0 0 0 0 0 0 2 1

0.001337633

C#CC(C)(C)CCC

35.7

0.000015700

0 .0 0 0 0 0 0 0 2 1

0.001337633

C#CCC(C)(C)CC

35.7

0.000015300

0 .0 0 0 0 0 0 0 2 1

0.001373696

SMILES N otation

127

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m oI/cm ^)sec“ ^

O ^O H

C#CC(C)(CC)CC

3&7

0.000015300

0 .0 0 0 0 0 0 0 2 1

0.001373696

CC(C)(C)C=C(C)C

35.9*

0.000136771

0.000051800

0.378415026

CC=C(C)C=CC

36.0

0.000297036

0.000700000

2.356612898

C =C ((^C C =0

3&4

0.000113874

0.000008400

0.073765492

C=CC(C)(C)CCC

36.6

0.000045800

0.000001225

0.026742420

C=CCCC(C)(C)C

36.6

0.000045800

0.000008400

0.183376597

C=CC(=C)C(C)C#C

3&7

0.000178007

0.000009821

0.055171867

C=CC(C)C(=C)C#C

36.7

0.000157837

0.000016400

0.103800394

C=C(C)C(C=C)C#C

36.7

0.000135371

0.000016821

0.124258923

CC=C(C=C)C#C

3T2

0.000297470

0.000036876

0.123965375

C =C C =C ((^C #C

3T2

0.000297470

0.000036876

0.123965375

CC=CC(=C)C#C

37^

0.000297470

0.000036876

0.123965375

C = C C (C )= C C # C

37^

0.000297470

0.000036876

0.123965375

c=c^^c=cc#c

3^2

0.000297470

0.000036876

0.123965375

C/C=C\C(C)(C)CC

3T3

0.000090800

0.000003980

0.043860252

C/C=C/C(C)(C)CC

37.3

0.000102627

0.000007960

0.077586604

C #C C C (=0)C

3T4

0 .0 0 0 0 1 2 2 0 0

0 .0 0 0 0 0 0 0 2 1

0.001725285

C #C C (=0)C C

3Z4

0.000011700

0 .0 0 0 0 0 0 0 2 1

0.001787507

C=C(C)CC(=C)CC

37^

0.000163755

0.000016800

0.102592144

C=C(C)C(=C)CCC

3T4

0.000225577

0.000126000

0.558566584

CCC(=C)C(=C)CC

37.4

0.000224956

0.000126000

0.560109393

C=CC(=C)C#CC

3T5

0.000248954

0.000005691

0.022859630

C=C(C)C#CC=C

37.5

0.000188219

0.000009210

0.048924272

CCC(=0)CC

37.7

0.000003840

No

No

C=CC(=0)CC

3&2

0.000038800

0.000001225

0.031535262

C=CC(C)(C)CC#C

3&5

0.000054300

0.000001246

0.022962619

C=CC(CC)(C)C#C

3&5

0.000054300

0.000001246

0.022962619

C =CC(=0)C(C)C

3&5

0.000040800

0.000001225

0.029987872

SMILES N otation

128

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

atay-C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(m ol/cm ^)sec“ ^

O^OH

c=ccccc#c

3&5

0.000057700

0.000008421

0.145851792

C=CCC(C)(C)C#C

38.5

0.000054300

0.000008421

0.155191181

C=CC(C)C(=0)C

3&5

0.000044300

0.000008400

0.189683289

cc#ccc^%c

39

0.000048400

0 .0 0 0 0 0 0 0 2 1

0.000434165

c= cccc= o
C=CC(=C)C=0

39.1

0.000080400

0.000008400

0.104425462

393

0.000082000

0.000000907

0.011060490

C=CCC(C)CC=C

3&6

0.000090500

0.000016800

0.185676713

C=CC(C)CCC=C

39.6

0.000090000

0.000016800

0.186592303

C=CC=CC(C)(C)C

40.1

0.000164583

0.000005670

0.034450743

C=CC(C)(C)/C=C\C

40.1

0.000129746

0.000005210

0.040126436

C=CC(C)(C)/C=C/C

40.1

0.000141602

0.000009190

0.064882496

c=cc=cccc
c=ccc/c=c\c
c=cc/c=c\cc
c=ccc/c=c/c
c=cc/c=c/cc

40.2

0.000167645

0.000036820

0.219630448

40.2

0.000132808

0.000099400

0.748446544

40.2

0.000132187

0.000099400

0.751964632

40.2

0.000144664

0.000148400

1.025822411

40.2

0.000144043

0.000148400

1.030247426

C #C C (=0)C = C

40.3

0.000018200

0.000001246

0.068530069

C=CC(=C)C(=C)CC

40.3

0.000188220

0.000013600

0.072428569

C=C(C)CC(=C)C=C

40.3

0.000245653

0.000018200

0.074088186

C=C(C)C(=C)CC=C

4&3

0.000264217

0.000134400

0.508672410

CC(C)(0)C

40.7*

0.000002640

No

No

C=CC(C=C)(C)C#C

41.4

0.000093200

0.000002471

0.026502358

CC/C=C\C(C)(C)C

41.8

0.000090500

0.000003980

0.043999085

CC/C^C\C(C)(C)C

41.8

0.000090500

0.000003980

0.043999085

CC/C=C/C(C)(C)C

4L8

0.000102341

0.000079600

0.077803742

CC/C=C/C(C)(C)C

41.8

0.000102341

0.000007960

0.077803742

oc=o

4Z6*

0.000000811

No

No

C=CC(C=C)CC=C

4&6

0.000128599

0.000025200

0.195958587

SMILES N otation

129

Vapor

OH R ate coeff­

O3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at2&=C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(mol/cm®)sec~^

OaiOH

c=ccc=cc=c

43.2

0.000206285

0.000045220

0.219211243

CC(C)/C=C\C(C)C

43.4*

0.000095100

0.000091000

0.957082768

CC(C)/C=C/C(C)C

4&4*

0.000106937

0.000140000

1.309186988

C = C C (= C )C (= C )C = C

4&6

0.000236691

0.000011340

0.047910567

C#CCC(C)C(C)C

43.8

0.000021800

0 .0 0 0 0 0 0 0 2 1

0.000963472

C#CC(CC)C(C)C

4&8

0.000021800

0 .0 0 0 0 0 0 0 2 1

0.000963472

C#CC(C)C(C)CC

4&8

0.000021800

0 .0 0 0 0 0 0 0 2 1

0.000963472

C#CC(C)CC(C)C

43.8

0.000021400

0 .0 0 0 0 0 0 0 2 1

0.000983506

CC#CC(C)CC

44.8

0.000048400

0 .0 0 0 0 0 0 0 2 1

0.000434165

C#CC(C)=C(C)C

4&2

0.000232998

0.000336437

1.443943282

CC(0)C

4&4'

0.000011300

No

No

C/C=C\C(C)C(C)C

45.7

0.000096400

0.000091000

0.943749306

C/C=C/C(C)C(C)C

45J

0.000108280

0.000140000

1.292945269

C#C C ((X ^C #C

4&8

0.000027600

0.000000042

0.001520880

C#CC(C)CC#C

4&8

0.000027600

0.000000042

0.001520880

ccccccc

46.0'

0.000010700

No

No

C = C (C )C = C C

46.0

0.000221944

0.000126000

0.567709955

C =C (C )C (=0)C

46.8

0.000072500

0.000021500

0.296383715

c#cc#ccc

46.9

0.000070700

0.000000042

0.000594311

cc#ccc#c

46.9

0.000054700

0.000000042

0.000767695

C#CC(C=C)C(C)C

47.1

0.000059900

0.000008421

0.140549299

C=CC(C)C(C)C#C

47.1

0.000059900

0.000008421

0.140549299

C C (C )/C =C \C #C

4T8

0.000127210

0.000004000

0.031461861

C C (C )/C =C /C #C

4T8

0.000142622

0.000007980

0.055976490

C /C =C \C (C )C #C

4T8

0.000102450

0.000091021

0.888440125

C /C =C /C (C )C #C

47^

0.000114260

0.000140021

1.225464326

C=C(C)CC(C)CC

48.1

0.000089100

0.000008400

0.094251646

CCC(=C)CC(C)C

48.1

0.000087900

0.000008400

0.095521189

SMILES N otation

130

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec~^

(moI/cm®)sec” ^

O^OH

CCCC(=C)C(C)C

48.1

0.000087600

0.000008400

0.095914915

C=CC#CC(C)C

48.2

0.000092700

0.000001246

0.013443126

C=CC(C)C#CC

48.2

0.000086600

0.000008421

0.097186824

c /c = c \c c c c

48.5*

0.000094200

0.000091000

0.965559859

c /c = c /c c c c

48.9*

0.000106102

0.000140000

1.319487014

CC(C)(C)CC(C)C

4&3*

0.000007230

No

No

C #C C(C=C)C#C

49.3

0.000065900

0.000008442

0.128114073

c#ccc=o

49.7

0.000044400

0 .0 0 0 0 0 0 0 2 1

0.000472992

C # C /C = C \C # G

50.0

0.000167257

0.000000655

0.003913141

c # c /c = c /c # c

50.0

0.000187294

0.000001267

0.006764780

CC(C)CC=0

50.0

0.000046100

No

No

CC(C)=C(C)CC

50.2*

0.000173955

0.000840000

4.828848701

C#CC(=C)C(C)(C)C

50.3

0.000114304

0.000001246

0.010900755

C#CC(=C)CCC

50.3

0.000117366

0.000007980

0.068021998

C=C(C)C(C)(C)C#C

50.3

0.000091800

0.000007980

0.086930118

C=C(C)CCC#C

5&3

0.000094900

0.000008421

0.088735122

C#CCC(=C)CC

50.3

0.000094300

0.000008421

0.089319933

CC=C(C)CCC

50.5*

0.000139834

0.000301000

2.152559024

c#cc#cc= c

50.5

0.000114075

0.000001267

0.011106728

c= cc^ ^ c= o

51.1

0.000076500

0.000008400

0.109788611

c=co

51.3

0.000041200

0.000001225

0.029699562

CC(C)CCCCC

51.5*

0.000012900

No

No

CCC(=C)C=0

51.6

0.000055300

0.000001274

0.023033698

C=CC(=C)CC(C)C

5L8

0.000169837

0.000009800

0.057702529

C=CC(=C)C(C)CC

5L8

0.000169837

0.000009800

0.057702529

C=C(C)C(C)CC=C

5L8

0.000127200

0.000016800

0.132075733

C=C(C)CC(C)C=C

5L8

0.000127200

0.000016800

0.132075733

C=CC(CC)C(=C)C

5L8

0.000127200

0.000016800

0.132075733

131

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at2yG

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm^)sec~^

(m ol/cm ^)sec“ ^

O^OH

5L8
5L8

0.000126217

0.000016800

0.133103660

0.000126217

0.000016800

0.133103660

52.1

0.000174720

0.000005691

0.032572115

0.000123873

0.000004000

0.032309207

0.000139286

0.000007980

0.057317252

0.000004080

No

No

cc/c=c/ccc
c#cccccc

5Z2
5Z2
5Z2
5Z4'
52.5*

0.000105403

0.000140000

1.328232012

0.000019200

0 .0 0 0 0 0 0 0 2 1

0.001095235

CCC#CC((^C

52.5

0.000047400

0 .0 0 0 0 0 0 0 2 1

0.000443167

CCC=G(C)C=C

0.000223450

0.000126000

0.563884064

C=C(C)C=CCC

5&5
5Z5

0.000223450

0.000126000

0.563884064

CC=C(CC)C=C

5Z5

0.000223450

0.000126000

0.563884064

CC=CC(=C)CC

5&5

0.000223450

0.000126000

0.563884064

C=C(C)C/C=C\C

5&5

0.000170049

0.000099400

0.584536161

C=C(C)C/C=C/C

52.5

0.000181905

0.000148400

0.815808836

C=CC=C(C)CC

5Z5

0.000223450

0.000224000

1.002460557

C=CCC=C(C)C

5Z5

0.000178473

0.000309400

1.733592061

CC=C(C)CC=C

5Z5

0.000178473

0.000309400

1.733592061

C=C(C)C(=C)C(C)=C

52.7

0.000228629

0.000044800

0.196027257

CC=C(CC)CC

0.000139212

0.000301000

2.162166573

0.000093500

0.000091000

0.972770216

CC(C)=CCCC

52.8*
53.2*
&L0*

0.000139834

0.000301000

2.152559024

C=CC(=C)CC#C

54.2

0.000176177

0.000009821

0.055745062

C=CCC(=C)C#C

0.000156006

0.000016400

0.105018235

C=CCC(C)(C)CC

54^
54^

0.000045300

0.000008400

0.185612708

C=CC(C)(CC)CC

54.3

0.000045300

0.000012250

0.270689398

CCC(=C)CCC

55.1

0.000085700

0.000008400

0.097962300

C=CC(=G)C(C)C=C

5&7

0.000208115

0.000018200

0.087451486

SMILES N otation

C=CCC(=C)C(C)C
C=CC(C)C(=C)CC

c#cc=cc=co
c#c/c=c\c
c#c/c=c/c
CC(=0)C(C)C

ccc=cccc

132

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at2&'C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm^)sec~^

(mol/cm®)sec~^

OsiOH

C=CC(C=C)C(=C)C

55.7

0.000165479

0.000025200

0.152285589

C =C C (=C )/C =C \C

56.5

0.000193344

0.000009650

0.049917489

C =C C (=C )/C =C /C

5&5

0.000203422

0.000013600

0.067015971

C=CC=C(C)C=C

56.5

0.000223378

0.000038080

0.170473399

C=C(C)C=CC=C

56.5

0.000223378

0.000038080

0.170473399

CC=C(C=C)C=C

56.5

0.000195134

0.000038080

0.195147790

C C (= 0)C (= 0)C

56.8*

0.000000318

No

No

C=C C C (=0)C

5&9

0.000042300

0.000008400

0.198675967

C=C(C)C=C(C)C

56.9

0.000297036

0.000700000

2.356612898

CCC(CC)CC

57.9*

0.000011800

No

No

C=CC(CC)(C)C=C

5&4

0.000084400

0.000002450

0.029038722

C=CC(C)(C)CC=C

5&4

0.000084400

0.000009625

0.114080694

c=ccccc=c

5&5

0.000087800

0.000016800

0.191246615

c= cccccc

59.3*

0.000049300

0.000008400

0.170447766

cco

59.3'

0.000005580

No

No

C#CCCC(C)C

5&8

0.000019200

0 .0 0 0 0 0 0 0 2 1

0.001095984

C=C(C)CCCC

60.9'

0.000086400

0.000008400

0.097170668

C =C C(=0)C =C

61.2

0.000028700

0.000002450

0.085307595

CCC(C)CCC

61.5*

0 .0 0 0 0 1 1 2 0 0

No

No

C=CC(C=C)(C)C=C

62.9

0.000123345

0.000003675

0.029794482

C#CC(C)C(C)(C)C

64.6

0.000015700

0 .0 0 0 0 0 0 0 2 1

0.001338751

C#CC(C)(C)C(C)C

64.6

0.000015700

0 .0 0 0 0 0 0 0 2 1

0.001338751

C#CCC(C)CC

64.7

0.000019600

0 .0 0 0 0 0 0 0 2 1

0.001071164

C#CC(C)CCC

64.7

0.000019200

0 .0 0 0 0 0 0 0 2 1

0.001095984

CC(C)CCCC

66.0'

0.000010700

No

No

CC#CC(C)(C)C

6&2

0.000043100

0 .0 0 0 0 0 0 0 2 1

0.000487070

C=CC(CC)C(C)C

6&5

0.000051900

0.000008400

0.161836727

C=CC(C)CC(C)C

6&5

0.000051500

0.000008400

0.163232983

SMILES N otation

133

Vapor

OH R ate coeff­

O3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O3

R ate

a t 2&:C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec^i

OaiOH

CCC(=C)C(C)C

67.4*

0.000085500

0.000008400

0.098302816

CC=C(C)C(C)C

67^*

0.000139537

0.000301000

2.157138650

C#CC(C)(C)C#C

67.6

0.000022400

0.000000042

0.001878188

C=C(C)C(C)C(C)=C

6T7

0.000164080

0.000016800

0.102389261

C=C(C)C(=C)C(C)C

67J

0.000225281

0.000126000

0.559302647

C=CC(C)=C(C)C

6&7

0.000297036

0.000700000

2.356612898

C=C(C)C(C)=CC

6&7

0.000297036

0.000700000

2.356612898

cc#c/c=c\c
cc#c/c=c/c

6&8

0.000150606

0.000004000

0.026574391

6&8

0.000166018

0.000007980

0.048088066

CC(C)C(C)CC

68.9*

0 .0 0 0 0 1 1 1 0 0

No

No

C=CC(CC)C#C

6&6

0.000057700

0.000008421

0.145884902

C=CCC((^C#C

69.6

0.000057700

0.000008421

0.145884902

C=CCCC(=C)C

6&9

0.000125008

0.000016800

0.134390952

C/C=C/CC(C)C

69.9

0.000106089

0.000140000

1.319649996

C/C=C\CC(C)C

69.9

0.000009420

0.000091000

9.656941299

c#c/c=c\cc
c#c/c=c/cc
c/c=c\cc#c
c/c=c/cc#c

70.6

0.000125379

0.000004000

0.031921157

70.6

0.000140792

0.000007980

0.056704205

70.6

0.000100573

0.000091021

0.905022412

70.6

0.000112429

0.000140021

1.245414892

C=C(C)C(=C)C#C

70.8

0.000297470

0.000036876

0.123965375

cc#ccc=c

71.3

0.000084800

0.000008421

0.099284107

C/C=C\C(C)CC

71.3

0.000094200

0.000091000

0.96569413

C/C=C/C(C)CC

71.3

0.000106089

0.000140000

1.319649996

C=CC(C=C)C(C)C

71.6

0.000090000

0.000016800

0.186619141

C=CC(C)C(C)C=C

71.6

0.000090000

0.000016800

0.186619141

C=CC=CC(C)C

7Z6

0.000167348

0.000036820

0.220020062

C=CC(C)/C=C\C

7&6

0.000132512

0.000099400

0.750123302

C=C C (C )/C =C /C

7Z6

0.000144368

0.000148400

1.027931845

SMILES N otation

134

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

Rate

at2&'C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec~^

(mol/cm®)sec“ ^

OaiOH

C=CCC(C)CC

73.5*

0.000049700

0.000008400

0.168970458

C=CC(C=C)C#C

74.9

0.000096000

0.000016400

0.170657192

CC/C=C\C(C)C

76.1*

0.000093300

0.000091000

0.975868718

CC/C=:C/C(C)C

76.1*

0.000105106

0.000140000

1.331985516

c#cccc#c

76.1

0.000025400

0.000000042

0.001651966

C=CC(=C)C(C)(C)C

76.4

0.000164583

0.000005670

0.034450743

C=CC(C)(C)C(C)=C

76.4

0.000121946

0.000009190

0.075340641

C=CC(=C)CCC

76.5

0.000167650

0.000009800

0.058455141

C=CCC(=C)CC

76.5

0.000124387

0.000016800

0.135062272

c=ccc= o

77

0.000074500

0.000008400

0.112742259

C = C C C C (C )C

7&2

0.000049300

0.000008400

0.170493100

c= cc#ccc

79.2

0.000090900

0.000001246

0.013713944

c=cc#cc=c

79.2

0.000137105

0.000002471

0.018022749

CC(C)CC(C)C

7&4*

0.000010700

No

No

cc#cccc

80.8

0.000046200

0 .0 0 0 0 0 0 0 2 1

0.000454768

C=C(C)C(C)CC

81.1'

0.000086500

0.000008400

0.097106531

C=CC(CC)CC

81.5'

0.000049700

0.000008400

0.168970458

CCC=C(C)CC

82.2'

0.000139212

0.000301000

2.162166573

C=CC(=C)CC=C

8Z3

0.000206285

0.000018200

0.088227435

C=CC(C)CCC

8 24'

0.000049300

0.000008400

0.170493100

C #C C (=0)C

824

0.000009990

0.000000057

0.005677312

CCC(C)(C)CC

82.7'

0.000004630

No

No

c#cc= cc= c

84

0 .0 0 0 2 2 2 2 2 2

0.000005691

0.025609526

CC(C)=CC(C)C

84.4'

0.000139537

0.000301000

2.157138650

C=C(C)C(C)C(C)C

8&9

0.000088600

0.000008400

0.094782546

CC(C)C(=C)C(C)C

8&9

0.000087300

0.000008400

0.096241323

cc=cc= cc

87.9'

0.000221944

0.000224000

1.009262143

c= cc= cc= c

8 29'

0.000174104

0.000006895

0.039602812

SMILES N otation

135

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec"i

OsiOH

CC(=0)CC

90.6

0.000002080

No

No

C#C C(-C)C(C)C

91

0.000117070

0.000007980

0.068194490

C=C(C)C(C)C#C

91

0.000094600

0.000008421

0.089013575

C = C C (= 0 )C

91.3

0.000037100

0.000003310

0.089188638

c#cc=c^^c

9^3

0.000185940

0.000051800

0.278462820

CCCfCCC

9&8

0.000045600

0 .0 0 0 0 0 0 0 2 1

0.000460971

c=c^^c#cc

93.1

0.000140466

0.000007980

0.056836012

C=CC(C)C(C)C

9&5

0.000049300

0.000008400

0.170538458

C=C(C)CC(C)C

94.5*

0.000086400

0.000008400

0.097185400

C #C C (=C )C #C

95.2

0.000154075

0.000001267

0.008223270

C#CCC(C)(C)C

95.4

0.000013500

0 .0 0 0 0 0 0 0 2 1

0.001556121

C#CC(C)(C)CC

9&4

0.000013500

0 .0 0 0 0 0 0 0 2 1

0.001556121

CC(C)(C)C=0

96.7

0.000034900

No

No

C=CC(C)(C)C(C)C

9&1

0.000045800

0.000001225

0.026750073

C=CC(C)C(C)(C)C

98.1

0.000083000

0.000007960

0.095966974

C=C(C)CC(C)=C

9&9

0.000162249

0.000016800

0.103544322

C=C(C)C(=C)CC

9&9

0.000223450

0.000126000

0.563884064

CC(C)(C)C(C)C

1 0 2

*

0.000005030

No

No

C=CC(C)(C)C#C

103

0.000052500

0.000001246

0.023746914

c=cccc#c

103

0.000055500

0.000008421

0.151641597

C=CC(C)(C)CC

104*

0.000043600

0.000001225

0.028094338

CC(C)(C)CCC

105*

0.000005040

No

No

C=CC(CC)C=C

106

0.000087900

0.000016800

0.191214016

C=CC(C)CC=:C

106

0.000087900

0.000016800

0.191214016

C=CC(C)CC=C

106

0.000087800

0.000016800

0.191275148

C=C(C)C(=C)C=C

108

0.000195134

0.000038080

0.195147790

C = C C (C )= C C

109

0.000221944

0.000126000

0.567709955

C/C=C\C(C)(C)C

1 1 1

*

0.000089000

0.000003980

0.04474372

SMILES N otation

136

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

a t 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec“ ^

(mol/cm®)sec“ ^

O^OH

0.000100835

0.000007960

0.078965662

0.000039700

No

No

C/C=C/C(C)(C)C

1 1 1

cccc= o

1 1 1

C=C(C)C(C)(C)C

1 1 2

*

0.000081200

0.000007960

0.09808575

C=CC(C=C)C=C

114

0.000126110

0.000025200

0.199824915

C#CC(C)C(C)C

117

0.000019100

0 .0 0 0 0 0 0 0 2 1

0.001096734

CC#CC((^C

1 2 0

0.000045900

0 .0 0 0 0 0 0 0 2 1

0.000457710

C C (= 0 )C = 0

1 2 1

0.000019900

No

No

C#CC(C)C#C

1 2 2

0.000025100

0.000000042

0.001671483

c#cc#cc

125

0.000069200

0.000000042

0.000607251

CC(C)=C(C)C

126*

0.000136200

0.000301000

2.209977527

CO

127*

0.000000961

No

No

c= cc= ccc

129*

0.000165518

0.000036820

0.222453109

C=CCC(C)(C)C

130*

0.000043600

0.000008400

0.192646888

G#CC#C

132*

0.000028392

0.000000042

0.001479290

c#ccccc

133*

0.000017000

0 .0 0 0 0 0 0 0 2 1

0.001237501

C#CC(=C)CC

134

0.000115239

0.000007980

0.069277625

C=C(C)CC#C

134

0.000092800

0.000008421

0.090769748

C=CC(=C)C(C)C

138

0.000167348

0.000009800

0.058560473

C=C(C)C(C)C=C

138

0.000124712

0.000016800

0.134710861

C=CC(=C)C#C

144

0 .0 0 0 2 2 2 2 2 2

0.000005691

0.025609526

c /c = c \c c c

150*

0.000092100

0.000091000

0.988549769

c/c=c/ccc
cccccc

150*

0.000103897

0.000140000

1.347483138

151*

0.000008510

No

No

C=C(C)C(C)=C

152*

0.000221944

0.000126000

0.567709955

C = C (C )C = 0

155*

0.000053800

0.000001274

0.023678361

C=CC(C)(C)C=C

156

0.000082600

0.000002450

0.029668928

cc= c^ ^ cc

157*

0.000137706

0.00030100

2.185810635

C#CC(C)=CC

164

0.000185940

0.000051800

0.278462820

*

137

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

atay-C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(mol/cm^) sec~i

O^OH

165*

0.000091400

0.00009100

0.99540521

165*

0.000103276

0.000140000

1.355590109

172*

0.000130681

0.000099400

0.760629685

172*

0.000142537

0.000148400

1.041131719

172

0.000017000

0 .0 0 0 0 0 0 0 2 1

0.001238458

CC(C)C=0

173

0.000036300

No

No

CCC(=C)CC

175*

0.000036000

0.000008400

0.100454369

c#ccc#c
c=ccccc
c=cc^^c#c

180

0.000023300

0.000000042

0.001802804

184*

0.000047100

0.000008400

0.178428504

185

0.000055200

0.000008421

0.152456610

CCC(C)CC

190*

0.000089500

No

No

C=C(C)CCC

195*

0.000084200

0.000008400

0.099713438

c#cc=o

197

0.000030000

3.36E-09

0.000111842

C#CCC(C)C

2 0 1

0.000017000

0 .0 0 0 0 0 0 0 2 1

0.001238458

C=CC(=C)CC

203

0.000165518

0.000009800

0.059208052

C/C=C\C(C)C

207*

0.000091700

0.000091000

0.991884683

C/C=C/C(C)C

207*

0.000103601

0.000140000

1.351344360

CC(C)CCC

2 1 1

*

0.000008500

No

No

cc#cc=c

214

0.000089400

0.000001246

0.013945072

C=CC(=C)C=C

218

0.000153219

0.000006895

0.045000858

c=cccc=c

221*

0.000085600

0.000016800

0.196169428

C C (=0)C

232

0.000000318

No

No

CC(C)C(C)C

235*

0.000008490

No

No

CCfCCC

236*

0.000044100

0 .0 0 0 0 0 0 0 2 1

0.000476729

C=C(C)CC=C

244

0.000122881

0.000016800

0.136717415

C=C(C)C(C)C

252*

0.000083900

0.000008400

0.100066072

o=cc=o

255*

0.000039500

No

No

C=CC(C)CC

269*

0.000047100

0.000008400

0.178478183

SMILES N otation

cc/c=c\cc
cc/c=c/cc
c/c=c\cc=c
c/c=c/cc=c
c#cc^^cc

138

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at2&:C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(O H /cm ^)sec“ ^

(m oI/cm ^)sec“ ^

O^OH

C=CCC(C)C

272'

0.000047100

0.000008400

0.178478183

c= cc= o

274

0.000040300

0.000000196

0.004866035

ccc= o

317

0.000034300

No

No

CC(C)(C)CC

319*

0.000002840

No

No

c#ccc= c

404

0.000053400

0.000008421

0.157681747

c= cc= cc

405*

0.000164012

0.000036829

0.224495550

C=CC(C)(C)C

431*

0.000041800

0.000001225

0.029298645

CCC=C(C)C

431*

0.000137706

0.000301000

2.185810635

CC=C(C)C

468*

0.000136188

0.000301000

2.210180045

C#CC(C)(C)C

482

0.000011700

0 .0 0 0 0 0 0 0 2 1

0.001794441

c /c = c \c c

506'

0.000089900

0.000091000

1.012076092

c/c=c/cc
ccccc

506'

0.000101770

0.000140000

1.375648436

514*

0.000006310

No

No

C=C((^C=C

550'

0.000164012

0.000009800

0.059751667

C#CC(C)=C

592

0.000113733

0.000007980

0.070194884

C=C(C)CC

610*

0.000082100

0.000008400

0.102296573

c=cccc

635'

0.000044900

0.000008400

0.187193955

CC(C)CC

689'

0.000006300

No

No

cc#cc

706*

0.000042500

0 .0 0 0 0 0 0 0 2 1

0.000493603

C#CC(C)C

726

0.000014500

0 .0 0 0 0 0 0 0 2 1

0.001451441

c= ccc= c

748'

0.000083500

0.000016800

0.201166189

cc= o

902

0.000026500

No

No

C=CC(C)C

903'

0.000044600

0.000008400

0.188440622

CC(C)(C)C

1290*

0.000001040

No

No

c#cc= c

1350*

0.000062600

0.000001246

0.019898305

c#ccc
c#cccc
c/c=c\c

1410*

0.000012600

0 .0 0 0 0 0 0 0 2 1

0.001661652

1570*

0.000014800

0 .0 0 0 0 0 0 0 2 1

0.001422259

1760*

0.000088400

0.000091000

1.029314888

SMILES N otation

139

SMILES N otation

Vapor

OH R ate coeff­

O 3 R ate coeff­

Relative

Pressure

icient w ith OH

icient W ith O 3

R ate

at 25° C

concentration 1.56x10®

concentration 7x10^^

coefficient

mm of Hg

(OH/cm®)sec^i

(moI/cm®)sec“ ^

OaiOH

c/c=c/c
cccc

1760*

0.000100264

0.000140000

1.396309275

1820*

0.000004340

No

No

c= cc= c

2 1 1 0

*

0.000103896

0.000005670

0.054573805

c= ccc

2250'

0.000042700

0.000008400

0.196509486

C=C(C)C

2310*

0.000080600

0.000008400

0.104207605

CC(C)C

2610*

0.000003810

No

No

cc#c
c#c
ccc

4310*

0 .0 0 0 0 1 1 1 0 0

0 .0 0 0 0 0 0 0 2 1

0.001886426

5240*

0.000001270

0 .0 0 0 0 0 0 0 2 1

0.016517225

7150*

0.000001980

No

No

cc= c

8690'

0.000041200

0.000008400

0.203684952

cc

31500*

0.000000424

No

No

c= c

52100*

0.000013300

0.000001225

0.092166245

140

Bibliography
[1] Broadbelt, L.J., Stark, S.M. and Klein, M.T. “Computer generated py­
rolysis modeling: on-the-fly generation of species, reactions, and rates,”
Industrial & Engineering Chemistry Research, vol. 33, pp. 790-799, 1994.
[2] Claude, P.A., Battin-Leclerc, F., Foment, R., Warth. V., Come, G.M.
and Scacchi, G. “Construction and simplification of a model for the
oxidation of alkanes.,” Combustion and Flame., vol. 122, pp. 451-462,
2000 .

[3] De W itt, M.J., Dooling, D.J. and Broadbelt, L.J. “Computer generation
of reaction mechanisms using quantitative rate information: application
to long-chain hydrocarbon pyrolysis,” Industrial & Engineering Chem­
istry Research, vol. 39, pp. 2228-2237, 2000.
[4] Broadbelt. L.J., Stark, S.M. and Klein, M.T. “Computer generated reac­
tion modelling: decomposition and encoding algorithms for determining
the species uniqueness,” Computers and chemical engineering., vol. 20,
pp. 113-129, 1996.
[5] Bournez, O., Come, G.M., Conraud, V., Kirchner, H. and Ibanescu, L.
“Automated generation of kinetic chemical mechanisms using rewrit­
ing,” Lecture Notes in Computer Science., vol. 2659, pp. 367-376, 2003.
[6] Bournez, O., Come, G.M., Conraud, V., Kirchner, H. and Ibanescu, L.
“A rule-based approach for automated generation of kinetic chemical
141

mechanism,” Lecture Notes in Computer Science., vol. 2706, pp. 30-45,
2003.
[7] Grenda, J.M., Androulakis, I.P., Dean, M.A., and Green, W.H. Jr. “Ap­
plication of computational kinetic mechanism generation to model the
autocatalytic pyrolysis of methane” . Industrial & Engineering Chem­
istry Research, vol. 42, pp. 1000-1010, 2003.
[8] W arth, V., Battin-Leclerc, P., Fournet, R., Glaude, P.A., Come, G.M.
and Scacchi, G. “Computer based generation of reaction mechanisms for
gas-phase oxidation,” Computers and Chemistry., vol. 24, pp. 541-560,
2000 .

[9] Wang,

S.W.,

Levy, H., Li, G.,

Rabitz,

H. and Georgopoulos,

P.G. “Studies of methods for condensing atmospheric photochem­
istry mechanisms into computationally efficient schemes,” Technical
Report ORC-TR99-05 - Computational Chemodynamics Laboratory.,
http://www.ccl.rutgers.edu/reports/orc/orctr99-05.pdf 2001.
[10] Turanyi, T., Tomlin, A.S., Pilling, M.J., Merkin, J.H., and Brind­
ley, J. “Mechanism reduction for the oscillatory oxidation of hydrogensensitivity and quasi-steady-state analysis.,” Combustion and Flame.,
vol. 91, pp. 107-130, 1992.
[11] Tomlin, A.S., Turanyi, T., and Pilling, M.J. “M athematical tools for the
construction, investigation and reduction of combustion mechanisms.,”
Compr. Chem. Kinet. vol. 35, pp. 293-437, 1997.
[12] Coxson, P.G., and Bischoff, K.B. “Lumping strategy. 1. introductory
techniques and applications of cluster analysis.,” Ind. Eng. Chem. Res,
vol. 26, pp. 1239-1248, 1987.

142

[13] Coxson, P.G., and Bischoff, K.B. “Lumping strategy. 2. A system theo­
retic approach.,” Ind. Eng. Chem. Res, vol. 26, pp. 2151-2157, 1987.
[14] Bhattacharjee, B., Schwer, D.A., Barton, P.I., and Green W.H Jr.
“Optimally-reduced kinetic models: reaction elimination in large-scale
kinetic mechanisms” , Combustion and Flame vol. 135, pp. 191-208,
2003.
[15] Schwer, D.A., Lu. P.S., and Green. W.H Jr. “An adaptive chemistry
approach to modeling complex kinetics in reacting flows” , Combustion
and Flame vol. 133, pp. 451-465, 2003.
[16] Whitehouse, L.E., Tomlin, A.S., and Pilling, M.J. “Systematic reduction
of complex tropospheric chemical mechanisms. P art I: sensitivity and
time-scale analyses.,” Atmos. Chem. Phys., vol. 4, pp. 2025-2056, 2004.
[17] Li, G., and Rabitz, H. “Gombined symbolic and numerical approach to
constrained nonlinear lumping - with application to an H2/O 2 oxidation
model.,” Chem. Eng. Sci., vol. 51, pp. 4801-4816, 1996.
[18] Li, G., Tomlin A.S., Rabitz, H., and Toth, J. A general analysis of exact
nonlinear lumping in chemical kinetics.,” Chem. Eng. Sci., vol. 49, pp.
343-361, 1994.
[19] Li, G., Rabitz, H., and Toth, J. Determination of approximate lumping
schemes by a singular peturbation method.,” J. Chem. Phys., vol. 99,
pp. 3562-3574, 1993.
[20] Carter, W.P.L. “A detailed mechanism for the gas-phase atmospheric
reactions of organic com pounds,” A tm ospheric E n viro n m e n t P a rt A -

General Topics, vol. 24A, pp. 481-518, 1990.
[21] Dodge, M.C., “Chemical oxidant mechanisms for air quality modeling:
critical review,” Atmospheric Environment, vol. 34, pp. 2103-2130, 2000.
143

[22] Gery, M.W., W hitten, G.Z., Killus, J.P., and Dodge, M.G. “A pho­
tochemical kinetics mechanism for urban and regional scale computer
modeling,” J. Geophys. Res. - Atmospheres, vol. 94, pp. 12925-12956,
1989.
[23] Stockwell, W.R., Kirchner, F., Kuhn, M., and Seefeld, S. “A new mech­
anism for regional atmospheric chemistry modeling,” J. Geophys. Res.,
vol. 102, pp. 25847-25879, 1997.
[24] Stockwell, W.R., Middleton, P., Ghang, J.S., and Tang, X.T., “The
2nd generation regional acid deposition model chemical mechanism for
regional air quality modeling,” J. Geophys. Res., vol. 95, pp. 1634316367,1990.
[25] Peters, N., and Rogg B., “Reduced kinetic mechanisms for applications
in combustion system,” Springier, Berlin, 1992.
[26] Rabtiz, H., “Chemical dynamics and kinetics phenomena as revealed by
sensitivity analysis techniques,” Chemical Reviews, vol. 87, pp. 101-112,
1987.
[27] Foias, C. Sell, G.R., and Temam, R., “Inertial manifolds for nonlinear
evolutionary equations,” Journal of Differential Equations, vol. 73, pp.
309-353, 1988.
[28] Foias, C. Nicolaenko, B. Sell, and Temam, R., “Inertial manifolds for
the kuramoto-sivashinsky equation and an estimate of their lowest di­
mension,” Journal de Mathématiques Pures et Appliquées, vol. 67, pp.
197-226, 1988.
[29] Garr, L., “Application of centre manifold theory,” Springer, New York,
1981.

144

[30] Roussel, M.R., and Fraser, S.J, “On the geometry of transient relax­
ation,” Journal of Chemical Physics, vol. 94, pp. 7106-7113, 1991.
[31] Maas, U.A., and Pope S.B., “Simplifying chemical-kinetics - intrin­
sic low-dimensional manifolds in composition space” . Combustion and
Flame, vol. 88, pp. 239-264, 1992.
[32] Hagan, H.T., Demuth, H.B. and Beale, M. Neural network design, PWS
Publishing Company, USA., 1996.
[33] Jain, A.K., Duin, R.P.W., and Mao, J. “Statistical pattern recognition:
A review” , IEEE Transactions on Pattern Analysis and Machine Intel­
ligence vol. 22, pp. 4-37, 2000.
[34] Demuth, H.T., and Beale, M. “Neural network toolbox for use with
MATLAB - Version 5” , The MathWorks, Inc., USA., 1992.
[35] Abdul-Wahab, S.A., and Al-Alawi, S.M., “Assessment and prediction of
tropospheric ozone concentration levels using artificial neural networks” ,
Environmental Modelling & Software, vol. 17, pp. 219-228, 2002.
[36] A simple walkthrough,
http://grb.mnsu.edu/grbts/doc/manual/Gamma_Ray_Burst_ToolShed_He.html.
(accessed Feb 22, 2006)
[37] Burns, J.A., and Whitesides, G.M., “Feedforward neural networks in
chemistry: Mathematical systems for classification and pattern recogni­
tion” , Chemical Reviews vol. 93 pp. 2583-2601, 1993.
[38] Master Chemical Mechanism, http://chm lin9.leeds.ac.uk/M CM /hom e.htt.
(accessed Nov 1, 2004).
[39] Syracuse research corporation,
http://www.syrres.com/esc/smiles_ex_notations.htm (accessed June 15,
2004).
145

[40] Spectrum research (SAMS), Accelarating computer assisted structure
elucidation (United States), http://www.specres.com / (accessed Dec 26,
2004).
[41] Olsen, E. and Nielsen, F., “Predicting vapour pressures of organic com­
pounds from their chemical structure for classification according to the
VOC directive and risk assessment in general,” Molecules, vol. 6, pp.
370-389, 2001.
[42] U.S Environmental protection agency. Estimation program interface
(EPI) Suite, http://w w w .epa.gov/opptintr/exposure/docs/episuite.htm .
(accessed Mar 4, 2004).
[43] Finlayson-Pitts, B.J. and Pitts, J.N. Jr., “Chemistry of the upper and
lower atmosphere” , Academic press, 2000.
[44] Brasseur, G.P, Orlando, J.J. and Tyndall, G.S., Atmospheric chemistry
and global change, Oxford university press, 1999.
[45] Antoine, C., “Tensions des vapeurs: Nouvelle relation entre les tensions
et les temperatures” , Compt. Rend., vol. 107, pp. 681, 1985.
[46] Grain, C.F., “Vapor pressure, in handbook of chemical property esti­
mation methods” , Lyman, W .J., Reehl, W.F., Rosenblatt, D.H., Eds.,
McGraw-Hill, New York., 14, 1982.
[47] Mackay, D., Bobra, A., Chan, D.W. and Shin, W.Y., “Vapor pressure
correlation for low-volatility environmental chemicals” . Environmental
Science and Technology, vo. 16, pp. 645-649, 1982.
[48] Weininger, D. “SMILES, A chemical language and information system.
1. Introduction to methodology and encoding rules,” J. Chem. Inf. Comput. Sci., vol. 28, pp. 31-36, 1988.

146

[49] Kwok, E.S.C., Atkinson. R., and Roger, A. “Estimation of hydroxyl
radical reaction rate constants for gas-phase organic compounds using
a structure-reactivity relationship: An update” , Atmospheric Environ­
m ent vol. 29, pp. 1685-1695, 1995.
[50] Wang, H., Chen, D. and Chen, Y. “The integrated strategy of pattern
classification and its application in chemistry,” Chemometrics and In­
telligent Laboratory Systems.,., vol. 70, pp. 23-31, 2004.
[51] Hassibi, B., Stork, D.G. and Wolff, G.J., “Optimal brain surgeon and
general network pruning,” Proceedings o f IEEE International Joint Con­
ference on Neural Networks, vol. 2, pp. 293-299, 1992.

147