1
Theoretical studies of the unimolecular and bimolecular tautomerization of cytosine

2
Computational investigations of the unimolecular and bimolecular tautomerization of isolated and dimeric cytosine have been performed.

3
Stationary and transition states of the isolated and dimeric cytosine systems were characterized at the MP2(full)/6-311+G(2d,2p)//MP2(full)/6-31G* and MP2(full)/6-311+G(2d,2p)//B3LYP/6-31G* levels of theory, respectively.

4
In the solid phase, cytosine exists in a single tautomeric state.

5
In contrast, experiments conducted in the gas phase find that cytosine exists as a mixture of several tautomeric forms.

6
The energy barriers for unimolecular tautomerization of the tautomeric form found in solids to those observed in the gas phase are high and vary between 142.2 and 169.9 kJ mol−1.

7
The formation of dimers with dual hydrogen bonding interactions results in a significant lowering of the barriers to tautomerization, thus facilitating tautomerization during the sublimation process.

8
Based on such bimolecular tautomerization mechanisms, we believe that the relative populations of the cytosine tautomers produced in the gas phase via thermal vaporization cannot be accurately predicted without considering intermolecular hydrogen bonding interactions present in the condensed phase.

Introduction

9
Nucleic acid bases are constituents of DNA and RNA and play important roles in the genetic code transformation.

10
They occur as one predominant isomer, but can exist in other minor tautomeric forms.

11
The presence of these minor tautomers in nucleic acids can lead to base-pair mismatches and result in gene mutation.1,2

12
Previous theoretical studies3–17 have examined a variety of the possible tautomers of isolated cytosine and found that six lie relatively low in energy.

13
Several of these six tautomers have been calculated to be of very similar stability such that the relative energy ordering of these tautomers is sensitive to the theoretical approach employed.

14
In the present work, we examine these six most stable tautomers of cytosine as shown in Fig. 1.

15
The keto–amino form (C1) is the “canonical” structure of cytosine found in DNA and RNA.

16
Indeed, X-ray diffraction (XRD) studies find that this is the only tautomeric form that occurs in cytosine crystals.18,19

17
Similarly, XRD19 and neutron diffraction20 studies find that C1 is the only form observed in cytosine monohydrate.

18
In addition, experimental21,22 and theoretical17,23 studies indicate that cytosine adopts predominantly the C1 form in aqueous solution.

19
In contrast, resonance enhanced multiphoton ionization (REMPI) experiments24 find that both keto–amino (C1) and enol–amino (C2 and possibly C3) tautomers coexist in the gas phase.

20
A mixture of C1 and C2 (and possibly C3) as well as the keto–imino (C4) tautomers have also been observed in IR matrix isolation studies.25

21
Similarly, molecular beam microwave (MW) spectroscopy studies indicate that the C1, C2, and C4 tautomers coexist in the gas phase.26

22
The IR matrix isolation and MW spectroscopy studies discussed above both made use of solid cytosine, which is commercially available only as the C1 tautomer, and heated it to sublimation to produce the gas phase mixture of tautomers observed.

23
In the REMPI experiments,24 in contrast, the solid cytosine sample was desorbed into the gas phase by laser ablation jet-cooling.

24
In an attempt to understand the experimental observation of a mixture of cytosine tautomers in the gas phase, Russo et al.14 calculated the relative stabilities of the C1, C2, C3, and C4 tautomers and the barriers for unimolecular tautomerization processes that allow interconversion of these species.

25
Their studies indicate that the barriers for the C1 → C2 and C1 → C4 unimolecular tautomerization processes (156.5 and 181.6 kJ mol−1, respectively) are too large to be overcome by thermal vaporization.

26
In contrast, the C2 → C3 unimolecular tautomerization requires significantly less energy, 38.9 kJ mol−1, suggesting that interconversion of these tautomers may occur via thermal vaporization.

27
However, direct conversion of C1 to C3 is expected to require significantly greater energy than the two step process, C1 → C2 → C3, and therefore is not likely to occur.

28
Thus, these results suggest that formation of C3 from C1 via thermal vaporization of cytosine is also not possible.

29
In the REMPI studies,24 the authors suggest that tautomerization of C1 → C2 (and possibly C3) takes place in the desorption step or can be induced by multiple collisions.

30
Because of the energetic nature of laser ablation this may be possible, but these conclusions do not explain the tautomeric mixture observed in the MW and IR matrix isolation studies.

31
In order to reconcile these findings, it is clear that alternative mechanisms by which C1 can be converted to C2 and C4 must exist that are energetically accessible via thermal vaporization.

32
Our interest in resolving this apparent discrepancy arose from studies we have been performing that examine the threshold collision-induced dissociation (TCID) behavior of complexes of cytosine with a variety of metal ions in order to extract the corresponding metal ion binding affinities.

33
Our initial results indicated that theory seriously overestimated the binding affinities of cytosine.

34
However, this conclusion was based upon the assumption that the canonical form of cytosine, C1, was the form accessed upon thermal vaporization.

35
Although C1 may be formed upon thermal vaporization, the poor agreement between theory and experiment in our studies, as well as the previous experimental studies of cytosine, suggests that it is not the only tautomer accessed.

36
It should be noted that the TCID technique is a threshold technique and therefore the thermochemistry derived from such studies is only sensitive to the lowest energy dissociation pathway available.

37
Therefore, the measured thresholds can only provide the metal ion binding affinity of the tautomeric form present in the mixture generated upon thermal vaporization that has the lowest binding affinity.

38
Thus, for appropriate interpretation of our TCID results, it is necessary that we determine which tautomeric forms of cytosine are accessed in the thermal vaporization process.

39
It is possible that tautomerization could also occur during complex formation or collision-induced dissociation of the metal ion–cytosine complexes.

40
However, these issues will not be dealt with here; they will be discussed in a later paper that deals with these experimental measurements.

41
In the present work, we re-examine the unimolecular tautomerization processes studied by Russo et al.14 and extend these studies to include two additional tautomers, C5 and C6.

42
In addition, we also examine three bimolecular tautomerization processes of dimeric cytosine as an alternative means by which the C2 and C4 tautomers might be formed from solid cytosine, C1.

43
This paper is, to our knowledge, the first theoretical work to investigate bimolecular tautomerization processes of dimeric cytosine, thus providing an alternative mechanism that might explain how the tautomerization energy barriers can be overcome by thermal vaporization.

44
It should be noted that three of the six low-energy tautomers of isolated cytosine, C2, C3, and C6, are not accessible in DNA and RNA because the ribose would not migrate.

45
In addition, the bimolecular tautomerization of cytosine dimers is probably also not accessible in DNA and RNA because cytosine is base paired with guanine, not itself.

46
However, if the results of studies of isolated cytosine and other nucleobases are to be put into appropriate biological context and act as models to provide insight into tautomerization processes that may occur in DNA and RNA, then a complete and accurate understanding of all tautomeric forms and mechanisms by which they may be accessed is necessary.

47
For example, the canonical form of cytosine present in nucleic acids, C1, might be converted to C4 via an analogous double proton transfer mechanism in the C:G base pair, where the N1 H atom of guanine is transferred to N3 of cytosine, while an amino H atom of cytosine is transferred to the carbonyl oxygen atom of guanine.

48
Thus the bimolecular double proton transfer mechanisms examined here might provide insight into mutation of DNA resulting from tautomerization of cytosine or other bases during replication.

Computational details

49
To obtain structures and energetics for the isolated cytosine tautomers and the transition states (TSs) for unimolecular tautomerization (i.e., interconversion via intramolecular proton transfer), ab initio theory calculations were performed using Gaussian 98.27

50
Geometry optimizations and vibrational analyses were performed at the MP2(full)/6-31G* level.

51
When used to calculate zero point, thermal, and free energy corrections, the MP2(full)/6-31G* vibrational frequencies are scaled by a factor of 0.9646.28

52
The optimized geometries, rotational constants, and scaled vibrational frequencies thus obtained for each tautomer and unimolecular transition state examined are available as ESI and are listed in Tables S1 through S4.

53
Single point energy calculations were performed at the MP2(full)/6-311+G(2d,2p) level using the MP2(full)/6-31G* optimized geometries.

54
To obtain accurate energetics, zero point energy corrections were included.
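
As an illustration of how a scaled zero point energy correction of this kind is typically evaluated, the following minimal sketch sums harmonic frequencies and applies the scale factor of 0.9646 quoted above. The function name and the three frequencies are hypothetical placeholders chosen for the example; they are not values from Tables S1 through S4.

```python
# Minimal sketch: harmonic zero point energy from scaled vibrational frequencies.
# The frequency list below is a hypothetical placeholder, not data from the ESI.

HC_NA_KJ_PER_MOL_CM = 0.0119626566  # h * c * N_A in kJ mol^-1 per cm^-1

def zero_point_energy(frequencies_cm, scale_factor):
    """Return the harmonic ZPE (kJ/mol) as 1/2 * sum of scaled wavenumbers."""
    scaled_sum = scale_factor * sum(frequencies_cm)
    return 0.5 * scaled_sum * HC_NA_KJ_PER_MOL_CM

MP2_SCALE = 0.9646  # MP2(full)/6-31G* scale factor quoted in the text
example_frequencies = [500.0, 1000.0, 1500.0]  # hypothetical, in cm^-1

zpe = zero_point_energy(example_frequencies, MP2_SCALE)
print(f"Scaled ZPE: {zpe:.3f} kJ/mol")
```

Only real (positive) frequencies should be passed to such a routine; for a transition state, the imaginary mode is excluded from the sum.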

55
As a probe of alternative mechanisms for tautomerization during sublimation, transition states for bimolecular tautomerization of cytosine dimers (i.e., interconversion via intermolecular double proton transfer) were studied.

56
With the computational resources available, we were unable to perform these calculations at the same level of theory employed for the monomers.

57
Therefore, geometry optimizations and vibrational analyses were performed at the B3LYP/6-31G* level of theory.

58
When used to calculate zero point, thermal, and free energy corrections, the B3LYP/6-31G* vibrational frequencies are scaled by a factor of 0.9804.28

59
The optimized geometries, rotational constants, and scaled vibrational frequencies determined for each cytosine dimer and bimolecular transition state are available as ESI and are given in Tables S5 through S8.

60
Single point energy calculations were carried out at the MP2(full)/6-311+G(2d,2p) level using the B3LYP/6-31G* optimized geometries.

61
To obtain accurate energetics, zero point energy and basis set superposition error (BSSE) corrections were included in the full counterpoise approximation.29,30

Results and discussion

Isolated cytosine tautomers

Relative stabilities

62
Previous theoretical studies9,10,12–14 are in general agreement that of all of the possible tautomeric forms of cytosine, there are six low-energy tautomers C1, C2, C3, C4, C5, and C6 (Fig. 1).

63
In agreement with most of these studies, the calculations performed here find that C2 is the most stable tautomer in the gas phase.

64
C3, the tautomer derived from C2 via 180° rotation of the hydroxy group about the C2–O bond, is found to be the next most stable tautomer, lying only 3.1 kJ mol−1 higher in energy than C2.

65
Although the absolute differences in the stabilities of the C2 and C3 tautomers differ somewhat depending upon the level of theory employed, all previous studies also find that C2 is more stable than C3.

66
Because the chemical bonding in C2 and C3 is the same, i.e., C2 and C3 are simply rotamers of each other, the relative stabilities of these tautomers are easily understood based upon their dipole moments, 3.72 and 5.11 D, respectively.

67
The C1 tautomer, the most stable and only tautomer observed in solid cytosine, is found to be the next most stable gas phase tautomer, lying 5.9 kJ mol−1 higher in energy than C2.

68
The dipole moment of the C1 tautomer, 7.20 D, is larger than those of C2 and C3.

69
Although the chemical bonding of C1 differs somewhat compared to C2 and C3, a simple correlation between the relative stabilities of these tautomers and their dipole moments still exists.

70
It should be noted that although there is general agreement that C2 is the most stable tautomeric form of cytosine in the gas phase, density functional theory calculations tend to overestimate the stability of the C1 tautomer and actually find it to be more stable than C2 when certain basis sets are employed.13,16

71
The C4 tautomer is found to be the next most stable cytosine tautomer, lying 13.3 kJ mol−1 higher in energy than the C2 tautomer.

72
The dipole moment of C4 is calculated to be 5.46 D. This suggests that if the relative stabilities of the cytosine tautomers could be predicted based solely upon their dipole moments as the above correlation for the C1, C2, and C3 tautomers suggests, then C4 would be expected to be more stable than C1, but less stable than C2 and C3.

73
Indeed, Fogarasi15 argues that some high level theoretical calculations find that the C1 and C4 tautomers are of very nearly identical stability, but was unable to rationalize this claim based upon comparison to the estimated populations of these tautomers observed in the IR matrix isolation and molecular beam MW studies.

74
The lesser stability of the C4 tautomer calculated here and in other studies as well as the lower population of this tautomer observed in the experimental studies suggests that the imino functionality is less favorable than the amino functionality.

75
The C5 tautomer is found to be the next most stable cytosine tautomer, lying 20.6 kJ mol−1 higher in energy than the C2 tautomer.

76
The C5 tautomer can be derived from C4 via 180° rotation of the imino hydrogen atom about the C4=N bond.

77
However, in order for this rotation to occur, the π bond must be broken, and thus the interconversion of C4 and C5 should require more energy than the interconversion of C2 and C3, as is found here and will be discussed later.

78
The dipole moment of C5, 2.78 D, is smaller than that of C4 and in fact is the smallest of all of the tautomeric forms examined here.

79
This indicates that the relative stabilities of the tautomers cannot be explained based simply upon their dipole moments and that other factors must be examined.

80
All previous theoretical studies agree that C5 is less stable than C4.

81
The decreased stability of C5 compared to C4 arises from steric repulsion between the hydrogen atom bound to N3 and the imino hydrogen atom, which causes the N3–C4–NH bond angle to increase by ∼8° compared to that observed for the other tautomers.

82
C6 is calculated to be the least stable of the cytosine tautomers examined here, lying 37.4 kJ mol−1 higher in energy than C2.

83
Most previous theoretical studies concur that C6 is the least stable of these six tautomeric forms of cytosine.

84
The calculated dipole moment of C6 is 8.88 D, the largest of all six tautomers examined here.

85
The very large dipole moment and the steric repulsion between the hydrogen atom bound to N3 and the adjacent amino hydrogen atom result in decreased stability of this tautomer and lead to an ∼2° decrease in the H–N–H angle.

86
In summary, the calculations performed here find that the relative energies of the six tautomers examined follow the order C2 < C3 < C1 < C4 < C5 < C6.

87
Our results as well as those from earlier theoretical work9,12,13 performed at fairly high levels of theory that support this trend are summarized in Table 1.

88
Although absolute consistency in these trends is not found in every theoretical study performed, the calculations or arguments that suggest that C1 and C4 are more stable have not been adequately supported by either the theoretical or experimental studies.

89
The above discussion of relative stabilities of the cytosine tautomers is based upon 0 K energetics.

90
Because experimental studies are carried out at elevated temperatures, typically 298 K, but much higher temperatures in the cytosine studies discussed above, ∼490 K for the IR matrix isolation experiments25 and ∼570 K for the molecular beam MW experiments,26 it is more appropriate to examine the relative stabilities of the tautomers at these temperatures.

91
To ascertain the relative stabilities of these tautomers at the appropriate experimental temperatures, we also calculated thermal energy corrections at 298, 490, and 570 K. The relative free energies of the cytosine tautomers at these temperatures are also summarized in Table 1.

92
As expected, the absolute differences in stability decrease with increasing temperature, but the thermal corrections are not large enough to result in a change in the stability order over this range of temperatures.

Relative populations

93
As discussed above, the matrix isolation IR study of cytosine reported the observation of a mixture of C1, C2 and C4.

94
Based upon comparison between the measured IR intensities and those predicted from ab initio calculations, the authors estimated the population ratios of the C1, C2, and C4 tautomers present in their experiments as 0.4–0.5∶1.0∶0.1–0.25, respectively.25

95
The rotamers of C2 and C4 (i.e., C3 and C5) were not reported, and no explanation for their absence was provided.

96
By examining the vibrational frequencies of C2 and C3, and those of C4 and C5 (Table S2), it can be seen that they are virtually identical.

97
Therefore, these rotamers cannot really be distinguished by their IR spectra.

98
The molecular beam MW spectroscopy study of cytosine also reported the observation of a mixture of C1, C2 and C4.

99
Based upon comparison between the measured MW intensities and those predicted from ab initio calculations and the dipole moment components, the authors estimated the population ratios of the C1, C2, and C4 tautomers present in their experiments as 1.0∶1.0∶0.25, respectively.26

100
The C3, C5 and C6 tautomers were not observed.

101
Identification of the tautomers based on their rotational constants alone (Table S2) could not absolutely rule out the presence of C3 and C5; however, the differences in the magnitude and direction of the dipole moments of the rotamers (Fig. 1) make it possible to distinguish them in their MW spectra and thus confirm the absence of these species in the MW experiments.

102
Because the cytosine vapor in both the matrix isolation IR and molecular beam MW studies was generated by thermal vaporization, albeit at somewhat different temperatures, it can be concluded that C3 and C5 were also not generated in the matrix isolation IR studies.

103
Using the relative stabilities of the cytosine tautomers determined from our calculations at 298, 490, and 570 K, we estimated the Maxwell–Boltzmann populations of these species based solely on their relative stabilities.

104
These populations are summarized in Table 2.
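
A population estimate of this kind can be sketched as follows. As an assumption made only for this example, the 0 K relative energies quoted in the text stand in for the temperature-dependent relative free energies of Table 1, so the printed numbers are illustrative and are not the entries of Table 2. The function name is ours.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def boltzmann_populations(relative_energies, temperature):
    """Return fractional populations from relative energies given in kJ/mol."""
    weights = {
        name: math.exp(-energy / (R_KJ * temperature))
        for name, energy in relative_energies.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# 0 K relative energies from the text (kJ/mol); a stand-in for the free
# energies of Table 1, which are not reproduced here.
RELATIVE_ENERGIES = {
    "C2": 0.0, "C3": 3.1, "C1": 5.9, "C4": 13.3, "C5": 20.6, "C6": 37.4,
}

populations_by_temperature = {
    T: boltzmann_populations(RELATIVE_ENERGIES, T) for T in (298.0, 490.0, 570.0)
}
for T, populations in populations_by_temperature.items():
    summary = ", ".join(f"{name}: {p:.3f}" for name, p in populations.items())
    print(f"{T:.0f} K -> {summary}")
```

Even with these placeholder energies, a purely thermodynamic estimate assigns C3 a substantial share of the population, which is the feature examined in the comparison with experiment.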

105
Also listed in Table 2 are the estimated populations of the cytosine tautomers derived from Fogarasi's coupled cluster electron correlation study and those estimated from the matrix isolation IR and molecular beam MW experiments.

106
As can be seen in Table 2, our estimated relative populations for the C1, C2, and C4 tautomers are in good agreement with those measured experimentally.

107
Good agreement is also found between experiment and Fogarasi's estimated populations for the C1 and C2 tautomers, but his results tend to overestimate the relative population of the C4 tautomer.

108
This suggests that the coupled cluster results tend to overestimate the stability of the C4 tautomer as discussed above.

109
In contrast, a major discrepancy between theory and experiment exists for the C3 tautomer.

110
Based solely on the relative stabilities of the tautomers, both our results and those of Fogarasi predict that C3 should have an appreciable population in the gas phase, whereas experiments do not find any direct evidence that C3 coexists with C1, C2, and C4 in the gas phase mixture of tautomers generated by thermal vaporization of cytosine.

111
Therefore, it is clear that population estimates based solely upon the relative stabilities of the isolated cytosine tautomers are not reliable.

112
In the following sections we will examine several possible mechanisms by which tautomerization of cytosine may occur and the influence of these mechanisms on the relative populations of the various tautomers generated by thermal vaporization.

Unimolecular tautomerization of cytosine

Barrier heights

113
In earlier work, Russo et al.14 investigated the relative stabilities and unimolecular tautomerization processes that allow interconversion of the four most stable tautomers of cytosine: C1, C2, C3, and C4.

114
Their calculations were carried out at the B3LYP/6-311+G(2df,2p) level of theory.

115
As mentioned earlier, density functional theory tends to overestimate the stability of the C1 tautomer, and at the level of theory employed in their work the relative stabilities of these tautomers follow the order of C1 < C2 < C3 < C4.

116
Their studies also found that the barriers for the C1 ↔ C2 and C1 ↔ C4 unimolecular tautomerization processes are too large to be overcome by thermal vaporization.

117
The C2 ↔ C3 unimolecular tautomerization was found to require significantly less energy such that it was concluded that thermal vaporization may provide enough energy to allow interconversion of these tautomers, but this process is only relevant if C1 can first undergo tautomerization to C2.

118
Thus, they concluded that because the energy barriers are so large, the possibility of obtaining C2, C3, and C4 by heating C1 in the gas phase is very small.

119
Because density functional theory does not determine the correct relative stabilities of these tautomers, we reinvestigated these tautomerization processes and extended our studies to include the unimolecular tautomerization processes involving C5 and C6.

120
There are five direct unimolecular tautomerization processes that allow interconversion of the six tautomers of cytosine via simple proton transfer, C5 ↔ C4 ↔ C1 ↔ C2 ↔ C3 ↔ C6.

121
These unimolecular tautomerization processes can be divided into three groups:

1–2 Proton transfer

122
Three of the unimolecular tautomerization pathways involve a 1–2 proton transfer between adjacent atoms.

123
The C1 ↔ C2 and C3 ↔ C6 transformations correspond to simple keto–enol tautomerization, while C4 ↔ C1 interconversion corresponds to an amino–imino tautomerization.

124
During these unimolecular tautomerization processes, a σ-bond is broken and a new σ-bond is formed.

125
As a result, these processes exhibit very large activation energy barriers, C1 → C2 (142.2 kJ mol−1), C1 → C4 (169.1 kJ mol−1), and C3 → C6 (158.9 kJ mol−1).

cis–trans Isomerization

126
The C4 ↔ C5 unimolecular tautomerization corresponds to a cis–trans isomerization.

127
During this unimolecular tautomerization process, the hydrogen atom remains bonded to the imino nitrogen atom, but the imino π bond is broken and reformed.

128
Because π bonds are weaker than σ bonds, the activation energy barrier for this process, C4 → C5 (94.3 kJ mol−1), is somewhat smaller than those for the 1–2 proton transfer tautomerization processes.

σ-Bond rotation

129
The C2 ↔ C3 unimolecular tautomerization corresponds to 180° rotation about the C2–OH σ bond.

130
This conversion does not require bond breakage and thus exhibits a much lower energy barrier, C2 → C3 (35.2 kJ mol−1).

131
The relative energies of the cytosine tautomers and the transition states for unimolecular tautomerization at 0 K determined here are summarized in Table 3.

132
Also listed in Table 3 are the results of Russo et al.14

133
Fig. 2 shows the potential energy landscape for the unimolecular tautomerization of the isolated tautomers of cytosine at 0 K determined here.

134
Overall, our results are in qualitative agreement with those of Russo et al., the primary differences being that we find a different stability order for the four lowest energy tautomers and the barriers we calculate for the C1 ↔ C2 and C1 ↔ C4 transformations are ∼8–18 kJ mol−1 lower.

135
Based upon the 0 K energetics, Russo et al. concluded that the unimolecular tautomerization of C1 to generate the mixture of tautomers observed in the matrix isolation IR and molecular beam MW experiments (C1, C2, and C4) cannot occur in the gas phase by heating solid cytosine below its decomposition temperature, ∼600 K. To confirm this conclusion, we also calculated the thermal and free energy corrections for the tautomers and their unimolecular transition states at 298, 490, and 570 K. As the temperature is increased from 0 to 298 K, the unimolecular tautomerization barriers decrease for all processes, and specifically by 5.9 and 6.8 kJ mol−1 for the C1 → C2 and C1 → C4 transformations, respectively.

136
However, as the temperature is increased beyond 298 K, to 490 and 570 K, the barriers change very little and in fact increase slightly.

137
Therefore, our conclusions remain the same as those of Russo et al.; thermal vaporization of cytosine below its decomposition temperature does not provide sufficient energy to overcome the unimolecular tautomerization barriers.

138
It is worth noting that the unimolecular tautomerization energy barriers can be overcome given sufficient energy.

139
Szczesniak et al. observed the C1 → C2 transformation in isolated cytosine upon absorption of UV radiation at 220 nm (543.7 kJ mol−1).25

140
Absorption of a single photon at 220 nm provides 543.7 kJ mol−1, more than enough energy to overcome the calculated unimolecular tautomerization barrier.
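
The molar energy quoted for 220 nm radiation follows from E = N_A hc/λ. The short sketch below reproduces that arithmetic with CODATA constants and compares the result with the 0 K barriers for C1 → C2 and C1 → C4 given earlier in this section; the function name is ours.

```python
PLANCK = 6.62607015e-34       # J s
LIGHT_SPEED = 2.99792458e8    # m s^-1
AVOGADRO = 6.02214076e23      # mol^-1

def photon_energy_kj_per_mol(wavelength_nm):
    """Return the energy of one mole of photons (kJ/mol) at the given wavelength."""
    energy_per_photon = PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)  # J
    return energy_per_photon * AVOGADRO / 1000.0

uv_energy = photon_energy_kj_per_mol(220.0)
barriers = {"C1 -> C2": 142.2, "C1 -> C4": 169.1}  # kJ/mol, 0 K values from the text

print(f"220 nm photon: {uv_energy:.1f} kJ/mol")
for process, barrier in barriers.items():
    print(f"{process}: barrier {barrier} kJ/mol, excess {uv_energy - barrier:.1f} kJ/mol")
```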

141
Thus it is clear that another mechanism by which C1 can be converted to C2 and C4 under thermal vaporization conditions must exist.

Tunneling effects on the unimolecular tautomerization rates

142
Reactions involving light species such as H+ or H (e.g., the tautomerization processes examined here) can experience enhanced rates as a result of tunneling processes.

143
Tunneling can easily increase the rate constant by a factor of 3 or more.31

144
Using the simplest tunneling correction, the Wigner transmission coefficient,32–35 given by K(T) = 1 + (ħ|ω|/kT)²/24, where ω is the imaginary frequency of the unbound normal mode at the saddle point, ħ is Planck's constant divided by 2π, and k is the Boltzmann constant, we calculated corrections for the unimolecular tautomerization processes that convert the tautomeric form found in solids to those observed in the gas phase, C1 ↔ C2 (ω = −1820 cm−1) and C1 ↔ C4 (ω = −1880 cm−1).

145
The tunneling correction coefficient for these processes is ∼2.00 ± 0.21, assuming an experimental temperature of 500 to 600 K for thermal vaporization of cytosine.

146
The tunneling correction coefficients for the other unimolecular tautomerization processes examined here, C2 ↔ C3 (ω = −496 cm−1), C3 ↔ C6 (ω = −1812 cm−1), and C4 ↔ C5 (ω = −1089 cm−1) are smaller and vary between 1.06 and 2.13 over the same range of temperatures.
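
The Wigner correction can be evaluated directly from the imaginary frequencies quoted above. In the minimal sketch below, the frequencies are treated as wavenumbers, so that ħ|ω|/kT becomes hc|ν̃|/kT with hc/k = 1.4388 cm K; the function and variable names are ours.

```python
HC_OVER_K = 1.438776877  # second radiation constant hc/k in cm K

def wigner_correction(imag_freq_cm, temperature):
    """Wigner tunneling correction 1 + (hc|nu|/kT)^2 / 24 for a wavenumber in cm^-1."""
    x = HC_OVER_K * abs(imag_freq_cm) / temperature
    return 1.0 + x * x / 24.0

# Imaginary frequencies (cm^-1) of the unimolecular transition states from the text.
IMAGINARY_FREQUENCIES = {
    "C1 <-> C2": -1820.0,
    "C1 <-> C4": -1880.0,
    "C2 <-> C3": -496.0,
    "C3 <-> C6": -1812.0,
    "C4 <-> C5": -1089.0,
}

for process, freq in IMAGINARY_FREQUENCIES.items():
    low_t = wigner_correction(freq, 500.0)
    high_t = wigner_correction(freq, 600.0)
    print(f"{process}: {high_t:.2f} (600 K) to {low_t:.2f} (500 K)")
```

For C1 ↔ C2 and C1 ↔ C4 this gives values between about 1.79 and 2.22 over 500 to 600 K, and for the remaining processes values between about 1.06 and 2.13, consistent with the ranges stated in the text.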

147
Although tunneling effects result in a doubling of the rate of tautomerization, they do not explain why cytosine vapor exists as a mixture of tautomers with C2 the predominant form.

148
Thus, our conclusion remains: another mechanism by which C1 can be converted to C2 and C4 under thermal vaporization conditions must exist.

Cytosine dimers

Assisted tautomerization mechanisms

149
As discussed above, the pathways for unimolecular tautomerization of C1 to produce the gas phase mixture of tautomers observed experimentally, C1, C2, and C4, require significantly more energy than is provided by heating below cytosine's decomposition temperature.

150
In contrast, proton transfer processes that take place between two species stabilized by intermolecular hydrogen bonds generally exhibit much lower energy barriers to tautomerization, particularly when dual hydrogen bonding interactions exist between the two molecules where one molecule acts as a proton donor and the other as a proton acceptor in one hydrogen bonding interaction, and each plays the reverse role in the second hydrogen bonding interaction.36–50

151
For example, studies of gas-phase tautomerization mechanisms based on the formation of hydrated complexes51 or dimers of the nucleic acid bases52–54 have found that proton transfer is facilitated by such dual hydrogen bonding interactions between the base and a water molecule or between the two bases.

152
Therefore, the formation of such dimers might provide an energetically accessible pathway by which the experimentally observed mixture of cytosine tautomers might be generated upon thermal vaporization.

Structures of the cytosine dimers

153
Because thermal vaporization of solid cytosine is the means by which the cytosine vapor is generated in the experimental studies mentioned above,25,26 it is worthwhile to review its crystal structure.

154
X-ray diffraction studies18 have found that C1 is the only tautomeric form present in crystalline cytosine.

155
A schematic representation of the crystal structure derived from this work is reproduced in Fig. 3.

156
The unit cell is orthorhombic with a layered structure in which each cytosine molecule engages in a total of six hydrogen bonding interactions with four neighboring cytosine molecules.

157
The N1 hydrogen atom interacts with the N3 nitrogen atom of the first neighbor, the carbonyl oxygen atom interacts with an amino hydrogen atom on the first and second neighbors, the N3 nitrogen atom interacts with the N1 hydrogen atom of the third neighbor, and the amino hydrogen atoms interact with the carbonyl oxygen atoms of the third and fourth neighbors.

158
Only one structurally distinct dimer involving dual hydrogen bonds can be formed directly from the crystal structure.

159
We designate this cytosine dimer as C1∶C1,α as shown in Fig. 4a.

160
Although an alternative structurally distinct dimer can also be formed, it only involves a single hydrogen bond and therefore could not act to relay hydrogen atoms between the two molecules to facilitate tautomerization.

161
During the vaporization process, thermal motion of the cytosine molecules may allow the formation of other hydrogen bonding structures, and thus alternative dimers may be formed.

162
If this occurs, two alternative C1∶C1 dimers, C1∶C1,β and C1∶C1,γ, might also be generated as shown in Fig. 4b and c, respectively.

163
Based upon previous theoretical studies,55,56 these three C1∶C1 dimers are the most stable of all of the possible cytosine dimers.

164
Furthermore, C1∶C1,α has been reported in the gas phase by Nir et al.55

165
Other C1∶C1 dimer conformations are possible, but are not likely to be of importance because they only engage in one hydrogen bonding interaction and thus proton transfer produces radical species of much lower stability and would require much greater energies for their generation.

Bimolecular tautomerization of cytosine

166
In the dimer associated tautomerization studies,51–54 the synchronous proton transfer between two pairs of proton donors and acceptors was obtained by computation.

167
Compared with stepwise tautomerization processes, in which one proton is transferred in each step, the simultaneous double-proton-transfer mechanism exhibits a much lower barrier.52

168
Tautomerization via simultaneous double proton transfer of the three C1 dimers examined here leads to three new cytosine dimers: C1∶C1,α ↔ C2∶C4, C1∶C1,β ↔ C2∶C2 and C1∶C1,γ ↔ C4∶C4.

169
The corresponding transition states for these bimolecular tautomerization processes were also calculated.

170
The relative energies of the dimers and the corresponding transition states are listed in Table 4.

171
To provide an overall view of the bimolecular tautomerization processes, the calculated structures and relative energies of the dimers and transition states are shown as reaction coordinate diagrams at 0 K in Fig. 4a, b, and c, respectively.

172
In all cases, the reactant C1∶C1 dimers are more stable than the corresponding product dimers, C2∶C4, C2∶C2, and C4∶C4, indicating that intermolecular interactions can alter the relative stability of the cytosine tautomers.

173
Similar results were also found in the theoretical study of alternative cytosine dimers by Czerminski et al.57

174
Although this observation relates only to dimeric clusters, it likely extends to higher order clusters and may provide an explanation as to why C1 is the only tautomeric form found in the solid state.

Barrier heights for bimolecular tautomerization

175
Based on our calculations, the simultaneous double-proton-transfer mechanism significantly reduces the activation energy as compared to the corresponding unimolecular tautomerization processes for isolated cytosine monomers.

176
The bimolecular tautomerization of C1∶C1,α is not directly comparable to the unimolecular tautomerization processes as it generates two different tautomers, C2 and C4, in a single process.

177
However, the bimolecular tautomerizations of C1∶C1,β and C1∶C1,γ generate two C2 and two C4 molecules, respectively.

178
The calculated barrier heights at 0 K for the C1∶C1,α, C1∶C1,β, and C1∶C1,γ bimolecular tautomerization processes are 26.6, 10.1, and 41.2 kJ mol−1, respectively, compared to 142.2 and 169.1 kJ mol−1 for the C1 → C2 and C1 → C4 unimolecular tautomerization processes.
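
To give a rough sense of what this barrier lowering means kinetically, the following minimal sketch compares Arrhenius-type Boltzmann factors for the bimolecular and unimolecular routes. It is an illustration only, under assumptions that are not part of this work: the 0 K barrier heights quoted above are used directly as activation energies, the pre-exponential factors of the two routes are taken to be equal, tunneling is ignored, and the temperature of 500 K is an arbitrary illustrative choice. The function name `arrhenius_factor_ratio` is hypothetical.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def arrhenius_factor_ratio(barrier_low, barrier_high, temperature):
    """Ratio exp(-E_low/RT) / exp(-E_high/RT) for two barriers in kJ/mol.

    Assumption (not from the paper): both processes share the same
    pre-exponential factor, so only the barrier difference matters.
    """
    return math.exp((barrier_high - barrier_low) / (R * temperature))


# (bimolecular label, bimolecular barrier, unimolecular label, unimolecular barrier)
# Barrier heights at 0 K in kJ/mol, as quoted in the text.
PAIRS = [
    ("C1:C1,beta -> C2:C2", 10.1, "C1 -> C2", 142.2),
    ("C1:C1,gamma -> C4:C4", 41.2, "C1 -> C4", 169.1),
]

T = 500.0  # K, illustrative temperature (assumption)

for bi_label, bi_barrier, uni_label, uni_barrier in PAIRS:
    ratio = arrhenius_factor_ratio(bi_barrier, uni_barrier, T)
    print(
        f"{bi_label} vs {uni_label}: "
        f"barrier lowered by {uni_barrier - bi_barrier:.1f} kJ/mol, "
        f"Boltzmann factor ratio ~ 10^{math.log10(ratio):.1f} at {T:.0f} K"
    )
```

The C1∶C1,α process is left out of the comparison because, as noted above, it yields two different tautomers and has no single unimolecular counterpart.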

Tautomerization mechanisms

179
Another important feature of the bimolecular tautomerization processes is that both the dimers and bimolecular transition states are of much greater stability than the isolated monomers.

180
The heat of sublimation of cytosine, i.e., the difference in enthalpy of solid versus gas phase cytosine, was determined to lie in the range between 147 and 176 kJ mol−1 at 423–483 K.58–60

181
This is approximately four times larger than the difference in stability of two C1 isolated monomers and the three C1∶C1 dimers at 0 K, and 40 to 50 times larger than their difference in stability at 490 K. Therefore, the relative stability of cytosine in various environments is temperature dependent.

182
Thus, it is clear that the extensive hydrogen bonding network present in solid cytosine provides additional stabilization compared to the dimer, and that the hydrogen bonding in the dimer likewise stabilizes cytosine compared to isolated molecules. This holds so long as the temperature is not so high as to provide enough internal energy to break the hydrogen bonding interactions and allow entropy to take over and vaporize the cytosine molecules.

183
This suggests that C1 might be converted into other tautomeric forms during the sublimation process by the following three mechanisms:

184
(1) C1 is evaporated into the gas phase as C1 monomers.

185
Two C1 monomers associate in the gas phase to form one of the three types of C1∶C1 dimers.

186
It should be noted that although the cytosine sample may be heated to 500–600 K, the vaporized molecules may undergo cooling to near room temperature as a result of collisions with other vaporized cytosine molecules and/or bath gases present under the vaporization conditions.

187
Because the dimer association energy is greater than the barrier to bimolecular tautomerization, these dimers may tautomerize to produce alternative cytosine dimers (C2∶C4, C2∶C2, and C4∶C4) that may dissociate to produce C2 and C4 monomers upon collision or absorption of heat.

188
This mechanism is particularly favorable for the conversion of C1 into C2, because C2 is more stable than C1 in the gas phase.

189
Because of the difficulty of evaporating cytosine solid, the concentration of C1 in the gas phase is likely to be very low, and the possibility of two cytosine molecules associating into a dimer might not be very significant.

190
However, in the region very close to the heated cytosine sample, it is reasonable to believe that the density of cytosine molecules is large enough that such association of cytosine might be feasible under the experimental vaporization conditions.

191
If tautomerization were to occur via this mechanism, then the relative populations of C1, C2, and C4 produced via thermal vaporization of solid cytosine would depend upon the relative gas phase stabilities of the isolated cytosine tautomers and cytosine dimers.

192
(2) C1∶C1 dimers are directly evaporated into the gas phase and may be converted into the corresponding cytosine dimers (C2∶C4, C2∶C2, and C4∶C4) via bimolecular tautomerization pathways.

193
The mixture of C∶C dimers thus generated then dissociate into monomers upon collisions with other cytosine molecules or bath gases present, or by the absorption of heat.

194
If tautomerization were to occur via this mechanism, then the relative populations of C1, C2, and C4 produced via thermal vaporization of solid cytosine would again depend upon the relative gas phase stabilities of the isolated cytosine tautomers and cytosine dimers.

195
(3) Tautomerization via double proton transfer might occur in the solid phase resulting in a mixture of cytosine tautomers being present in the heated solid.

196
Then as the solid absorbs enough energy to overcome the hydrogen bonding and other stabilizing interactions, the cytosine sample is evaporated into the gas phase as a mixture of the tautomers generated in the heated solid state.

197
In this case, the additional hydrogen bonding interactions with adjacent cytosine molecules not directly involved in the specific tautomerization mechanism occurring at a given site probably provide additional stabilization that might further facilitate the tautomerization process.

198
Thus, it seems plausible that such a solid-state tautomerization mechanism might dominate the tautomerization processes that occur upon thermal heating of cytosine.

199
If tautomerization were to occur via this mechanism, then the relative populations of C1, C2, and C4 produced via thermal vaporization of solid cytosine could not be predicted simply from the relative stabilities of the isolated gas phase cytosine tautomers and cytosine dimers; the extended intermolecular hydrogen bonding interactions would also need to be considered.

200
The bimolecular double proton-transfer tautomerization mechanism can only be regarded as an approximate model for the actual tautomerization process that might occur in the solid phase.

201
In order to obtain a better description of the tautomerization processes in the solid phase and accurately predict the relative populations of the various tautomers that would be produced upon thermal vaporization, larger molecular clusters should be investigated.

202
Simulated annealing or direct dynamics simulations might also provide additional insight into this much more complicated process.

Relative populations

203
The crystal structure of cytosine suggests that only the tautomerization process associated with the C1∶C1,α dimer, i.e., formation of equal amounts of C2 and C4 in the solid state, is likely to occur.

204
However, the relative populations of C2 and C4 estimated from the matrix isolation IR and MW spectroscopy experiments25,26 suggest that C2 is generated in much greater abundance than C4.

205
As suggested earlier, thermal motion as a result of heating the solid cytosine sample may allow alternative C1∶C1 dimers to be formed.

206
If this is the case, then the relative stabilities of the cytosine dimers as well as the relative stabilities of the C1, C2, and C4 monomers could be used to obtain a better estimate for the relative populations of tautomers generated via thermal vaporization.

207
Based upon this simple idea, the relative populations of the C1, C2, and C4 tautomers generated upon thermal vaporization over the temperature range from 500 to 600 K are estimated as 0.2–0.5∶1.0∶0.01–0.05, respectively.
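
For readers who prefer mole fractions to ratios, the estimated range can be normalized as in the short sketch below. It only rescales the numbers quoted above and does not reproduce the calculation behind them. Pairing the lower ends of the C1 and C4 ranges with each other, and likewise the upper ends, is an assumption made here for illustration; the helper name `to_fractions` is hypothetical.

```python
def to_fractions(ratio):
    """Normalize a dict of relative populations so the values sum to 1."""
    total = sum(ratio.values())
    return {name: value / total for name, value in ratio.items()}


# Relative populations C1 : C2 : C4 quoted in the text (C2 set to 1.0).
# Assumption: the low ends of the ranges belong together, as do the high ends.
LOW = {"C1": 0.2, "C2": 1.0, "C4": 0.01}
HIGH = {"C1": 0.5, "C2": 1.0, "C4": 0.05}

for label, ratio in (("lower ends", LOW), ("upper ends", HIGH)):
    fractions = to_fractions(ratio)
    summary = ", ".join(f"{name}: {100 * x:.1f}%" for name, x in fractions.items())
    print(f"{label}: {summary}")
```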

208
This estimate agrees reasonably well with the C1 and C2 populations estimated from the matrix isolation IR study, but the MW study suggests that the estimated population of C1 is somewhat low.

209
Both studies also suggest that the estimated C4 population is too low.

210
Thus it seems clear that accurate prediction of the relative populations of the various tautomers that would be produced upon thermal vaporization can only be achieved if the influence of the extended intermolecular hydrogen bonding interactions present in the condensed phase is examined in greater detail.

Comparison of theory and experiment

Solid state structure of cytosine

211
Based upon the calculated relative stabilities of the three C1∶C1 dimers, C1∶C1,α (4.4 kJ mol−1), C1∶C1,β (0.0 kJ mol−1), and C1∶C1,γ (12.9 kJ mol−1), we might expect that the hydrogen bonding structures analogous to C1∶C1,α and C1∶C1,β might coexist in the solid state, while the hydrogen bonding structure analogous to C1∶C1,γ is much less likely to be present.

212
As mentioned above, the X-ray diffraction experiment18 found that the only hydrogen bonding configuration that exists in crystalline cytosine is that analogous to the C1∶C1,α dimer.

213
This contradiction might be resolved based upon the following considerations.

214
First, the relative stabilities of the cytosine dimers are based upon isolated gas phase dimers.

215
However, in the solid phase, the relative stabilities of the various C1∶C1 hydrogen bonding interactions may be altered by the extended intermolecular interactions present in the solid state.

216
In order to evaluate this conclusion, additional calculations of higher order clusters would need to be performed.

217
Second, the XRD data were obtained for crystalline cytosine, while the IR and MW experiments made use of amorphous cytosine powder.

218
It may be possible that the cytosine molecules are aligned differently in the powder than in the crystals, and hydrogen bonding interactions analogous to the C1∶C1,β and C1∶C1,γ dimers might also exist.

219
Thus, powder XRD experiments may provide additional structural information.

Gas phase dimer populations

220
Because the relative stabilities of C1∶C1,β (0 kJ mol−1), C1∶C1,α (4.4 kJ mol−1), C2∶C2 (4.8 kJ mol−1), and C1∶C1,γ (12.97 kJ mol−1) are very similar, and the energy barriers for tautomerization of these dimers via double proton-transfer are relatively small, it might be expected that C1∶C1,α, C1∶C1,β, C1∶C1,γ, and C2∶C2 would coexist in the gas phase.

221
In contrast, the other cytosine dimers examined in this work are sufficiently less stable, C2∶C4 (22.76 kJ mol−1), and C4∶C4 (42.80 kJ mol−1), that they would not be expected to have significant populations in the gas phase at temperatures below the decomposition temperature of cytosine.
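
The expectation stated above can be illustrated with a simple Boltzmann weighting of the six dimers. The sketch below is not the procedure used in this work; it assumes that the quoted relative energies can stand in for relative free energies, that all dimers interconvert freely, and that differences in entropy, symmetry number, and degeneracy can be neglected. The temperatures are illustrative choices, and the function name `boltzmann_populations` is hypothetical.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1

# Relative energies of the dimers in kJ/mol, as quoted in the text.
DIMER_ENERGIES = {
    "C1:C1,beta": 0.0,
    "C1:C1,alpha": 4.4,
    "C2:C2": 4.8,
    "C1:C1,gamma": 12.97,
    "C2:C4": 22.76,
    "C4:C4": 42.80,
}


def boltzmann_populations(energies, temperature):
    """Return normalized Boltzmann populations for relative energies in kJ/mol.

    Assumption (not from the paper): energies are treated as free energies
    and every species has the same degeneracy.
    """
    weights = {
        name: math.exp(-energy / (R * temperature))
        for name, energy in energies.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


for T in (300.0, 500.0):  # K, illustrative temperatures (assumption)
    populations = boltzmann_populations(DIMER_ENERGIES, T)
    print(f"T = {T:.0f} K")
    for name, fraction in populations.items():
        print(f"  {name:12s} {100 * fraction:8.4f} %")
```

Under these simplifications the four lowest-energy dimers carry essentially all of the population at both temperatures, while C2∶C4 and C4∶C4 remain minor, in line with the qualitative argument above.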

222
In agreement with our results, molecular dynamics/quenching calculations56 found that the C1∶C1,α dimer is the most populated species of all possible dimeric clusters.

223
In addition, it was found that the C1∶C1,β and C1∶C1,γ dimers are also populated in reasonable abundance (note: the C2∶C2 dimer was not examined in that work).

224
However, C1∶C1,α was the only dimer observed in the REMPI spectra of laser desorbed jet-cooled cytosine.55

225
Considering that similar experiments for the cytosine monomers found that both C1 and C2 are formed under analogous experimental conditions, and that these tautomers are calculated to have larger differences in relative stability, it is not obvious why the C1∶C1,β and C2∶C2 dimers were not formed.

226
However, the calculated relative stabilities of the cytosine dimers are sensitive to the computational method employed.

227
The relative stabilities of the cytosine dimers determined here were derived from MP2(full)/6-311+G(2d,2p)//MP2(full)/6-31G* calculations, while the molecular dynamics/quenching calculations were performed at a slightly lower level of theory.

228
This suggests that calculations at higher levels of theory might be needed to obtain accurate relative stabilities of the cytosine dimers, that the spectra of these dimers overlap and cannot be distinguished, or that there are some dynamical effects in the REMPI experiments that favor the formation of C1∶C1,α over the other dimers.

229
Additional experiments such as molecular beam MW experiments that are sensitive to large differences in the dipole moments of the cytosine dimers might be helpful in understanding this apparent discrepancy between theory and experiment.

Conclusions

230
The relative stabilities of the cytosine tautomers, C2 < C3 < C1 < C4 < C5 < C6, as determined at the MP2(full)/6-311+G(2d,2p)//MP2(full)/6-31G* level of theory including zero point energy and basis set superposition error corrections were found to be in agreement with the overwhelming majority of data from previous theoretical and experimental studies.

231
Based upon the calculated relative stabilities of these tautomers, it might be expected that the C1, C2, C3, and C4 tautomers would be generated in measurable populations in the gas phase.

232
Indeed, several experimental studies found that C1, C2, and C4 are generated from vaporization or laser desorption of solid cytosine; however, no direct evidence for the formation of C3 has been observed.

233
It is reasonable that C1 exists, because it is the only tautomeric form present in the original solid sample and should be able to be sublimated into the gas phase directly by heating.

234
At first glance it seems surprising that although C3 is more stable than C1, there is no evidence that C3 is formed, while the less stable tautomer C4 has been observed in the gas phase.

235
This apparent contradiction can be understood based on both the unimolecular and bimolecular tautomerization mechanisms examined here.

236
The C2 and C4 tautomers can be formed directly from C1 by either unimolecular or bimolecular tautomerization processes, whereas formation of C3 directly from C1 is not possible via either mechanism.

237
The barriers to unimolecular tautomerization for the C1 → C2 and C1 → C4 tautomerization processes are too large to be overcome by thermal vaporization.

238
However, formation of dual hydrogen bonds in cytosine dimers facilitates the tautomerization processes, C1∶C1,α → C2∶C4, C1∶C1,β → C2∶C2, and C1∶C1,γ → C4∶C4, and thus provides an energetically accessible mechanism for the formation of C2 and C4.

239
Tautomerization via double proton-transfer seems feasible both in the gas and condensed phases.

240
Whether or not the actual tautomerization occurs in the gas phase or in the solid sample is not clear, but the additional stabilization provided by neighboring cytosine molecules in the solid should make the tautomerization processes even more favorable.

241
The formation of C3 can only be achieved via unimolecular tautomerization of C2, but requires additional energy (∼30 kJ mol−1) to overcome the activation barrier.

242
It seems extremely unlikely that the C2 monomers would be formed with enough internal energy to overcome this barrier, which would explain why C3 has not been observed in the experiments published to date.

243
Because it is not clear whether tautomerization occurs in the gas phase or in the solid or both, accurate estimation of the relative populations of the C1, C2, and C4 tautomers is not possible.

244
The populations measured in the IR and MW experiments agree fairly well with those estimated solely on the basis of the relative stabilities of these three tautomers, but are also consistent with estimates that take the dimer stabilities into account.

245
In any event, it seems clear that intermolecular interactions between the cytosine molecules must be involved in the mechanism by which cytosine undergoes tautomerization to produce a mixture of tautomeric species in the gas phase.

246
Additional experiments and theoretical studies could shed light on the gas phase versus condensed phase contributions to this process.

247
Although the cytosine dimers examined here cannot be formed in DNA and RNA, the similarity of these systems to the DNA base pairs suggests that such bimolecular double proton transfer mechanisms might provide a means by which mutation of DNA can be achieved and explained.