Small-Molecule Interaction with G-Quadruplex DNA: Context of Anti-Cancer Drug Design

Targeting G-quadruplex (G4) DNA structures by small molecules is a potential strategy for directing gene therapy of cancer disease. Herein, novel insights into non-covalent interactions between a structurally diversified spectrum of ligands and a G-quadruplex DNA (formed in the c-Myc oncogene promoter region) are reported. Solvation-induced effects on and entropic contributions to the binding free energy are explored. In addition, the correlation of G4 domain motions and active site rearrangements with the binding of highest affinity ligands, being associated with the fundamentally distinguishable modes of interaction (external stacking: BRACO-19, TMPyP4, and CX-3543; groove binding: Sanguinarine, Tetrahydropalmatine, and Hoechst 33258), is quantitatively evaluated and elaborated by observing thermodynamic consequences of the receptor conformational flexibility changes in the asymptotic regime (t → ∞) of molecular dynamics (MD) simulation. BRACO-19 and Tetrahydropalmatine are identified as unique (thermodynamically favorable and highly selective) G4-DNA binders. Implications of the present study for experimental research are elucidated.


INTRODUCTION
ESIDES folding into different duplex structures, highly dynamical DNA molecules can fold into hairpin, triplex, G-quadruplex, and i-motif structures that contain noncanonical base pairs.G-quadruplex (or G-tetraplex) structures stem from guanine (G)-rich tracts, and are composed of two or more stacked G-tetrads (or G-quartets) assembled from a single strand of DNA in an intramolecular (backfolded) way or from two-, three-, or four DNA strands that can associate in intermolecular G4-DNAs.G4s adopt a great diversity of conformations and folding energies, and their thermodynamic stabilities are often comparable to those of corresponding duplex structures.The presence and function of G4s in vivo are not quite clear.Being located in several significant genomic regions (human telomeres, oncogene promoter regions, immunoglobulin switch regions, ribosomal DNA, and some regions of RNA), G4s are hypothesized to be involved in important biological phenomena, such as telomere maintenance, end-capping and protection, chromosome stability, gene expression, viral integration, and recombination. [1,2]For example, the inhibition of telomere elongation by telomerase in cancer cells is the consequence of the formation of quadruplexes in telomeric DNA. [3,4]As a growing number of proteins with a strong propensity to bind to G4s are being identified indicates that their interactions with G4s may be quite relevant for the cellular processes.Thus, targeting G4-DNA by small molecules, with the aim to disrupt the recognition of G4-DNA by the binding proteins, emerges as a potential anticancer strategy. [5]For this approach, the structural design and development of highly selective small ligand molecules is of vital importance.
G4-DNAs have more compact structures than duplex DNAs and display well-defined binding sites supposed to accommodate ligands that are complementary in shape and charge to the biological target. [1,2]Regardless of this fact, the way in which the ligand/G4 interaction is mediated currently seeks a more physically grounded elucidation in order to get closer to the successful identification of optimal G4-DNA binders.The main reason is related to finding optimal ligand/receptor configurations and predicting binding B tumor cells. [8,9]A set of structurally different ligand molecules (Figure 1), having propensity to bind to the particular G4, [9] is chosen with intention to be representative (in terms of exhibited structural diversities) as much as possible.Their interactions with the G4 are characterized from a biophysical point of view.As a result, the highest affinity ligands, associated with external stacking interaction and groove binding respectively, are identified.Given that various conformations in static crystal structures may be due to differences in crystallization conditions or procedures, the dynamics of binding of the highest affinity ligands is investigated.For the various parts (complete structure, sugar-phosphate backbone, system of all bases, G-tetrad bases) of the G4-DNA structure, MD simulations are employed to evaluate the asymptotic configurational entropy change -the thermodynamic consequence of DNA flexibility change upon ligand binding.The configuartional entropy is conceivable as an upper bound of the true entropy contribution to the free energy in non-covalent binding.It is the entropy of the solute, mainly without translational and rotational contributions.While the separation of translational entropy of the solute is acceptable without any approximation, the removal of rotational entropy is based on the assumption of a negligible correlation between the internal degrees of freedom and the global rotation of the solute.The configurational entropy is determined using the quasi harmonic approximation, considering a system of atoms with many degrees of freedom as a system of uncoupled harmonic oscillators.

METHODS
A convenient review of distinct groups of small molecules with natural inclination to bind to G-quadruplex DNA was previously given in the literature. [9]Particular ligands (Figure 1) were selected to be typical structural representatives of the ligand families and adequate (in terms of atomic composition) to be treated by current computational methods.To obtain the initial coordinates of the target atoms, the experimental structure of the monomeric parallelstranded G-quadruplex (PDB ID: 2A5P) was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). [10]The assembly of G-tetrads is shown in Figure 2.
Flexible docking of each ligand in the receptor was performed by AutoDock 4.2. [11,12]The lowest energy and physically meaningful (in terms of the ligand spatial orientation with respect to the compact binding sites of G4) conformations were extracted from docking experiments.The Lamarckian Genetic Algorithm in combination with a gridbased energy evaluation method was employed to calculate grid maps, while atomic potential grid map was computed by AutoGrid4 with a 0.536 Å spacing in a 65 Å × 65 Å × 65 Å (1 Å = 10 -10 m) box centered on the macromolecule.
All the other default options were chosen for the set up of runs by the AutoDock 4 tools. [12]or the calculations done by the Amber 11 suite of programs, [13,14] the solute was prepared using the Amber11 utility program tLeap in association with the ff99sb force field. [15]Every ligand was initially prepared by parametrizing its atom types, charges, and connectivity in order to be treated as part of the solute.The molecular geometry was optimized by Gaussian 98 at the MP2/6-31G* level of theory. [16]The molecular electrostatic potential was calculated by Gaussian 98 at the HF/6-31G* level of theory, [16] while the atomic charges were derived by means of the RESP fitting technique [17] that is part of AmberTools 1.5. [13,14]Remaining parameters were assigned from the General Amber Force Field (GAFF), [18] being entirely compatible with the ff99sb macromolecular force field [15] and containing the parameters for almost all the organic molecules.
The single point free energy change upon complex formation was calculated using the Sander module under Amber11 [13,14] on the basis of the following equation: where Gcomplex, Gligand, and Gquadruplex are the Molecular Mechanics-Generalized Born Solvent-Accessible Surface Area (MM-GBSA) free energies [Eq.( 2)] for the complex, ligand, and quadruplex respectively.The free energies (GMM-GBSA) were calculated by taking into account molecular mechanics contribution (GMM) and solvation-induced effects (Gsolv) through the MM-GBSA approach, [19][20][21][22][23] as implemented in Amber 11. [13,14] The MM contribution (GMM) consists of the internal (bond, angular, and dihedral) energy (Einternal), the van der Waals interaction (EvdW), and the Coulombic (electrostatic) interaction (Eelec).ΔEinternal was set to zero.Solvation was included implicitly through GBSA model treating solvent as a continuous medium instead of individual (explicit solvent) molecules.The solvation free energy (Gsolv) comprises the electrostatic contribution (EGB) [24] and the nonpolar (SA-dependent) contribution (Esurf) -the attractive (dispersion) and repulsive (cavity) interactions. [25]Thus, it follows: The nonpolar solvation term was computed in the following way: where γ is the surface tension that was set to 0.0072 kcal mol -1 Å -2 , while b is a constant that was set to 0. [25] SASA means the Solvent-Accessible Surface Area (Å 2 ) that was predicted using the Molsurf algorithm.The GBSA terms are formally identical to the free energies of hydration that can be directly compared with experimental data. [19]To define the dielectric boundary around the molecular surface, the solvent probe radius was set to 1.4 Å.The dielectric constant was set to 1 for the solute and to 80 for the surrounding solvent molecules.
The solute entropy change at 300 K (T ΔS) was estimated using the normal (translational, rotational, and vibrational) modes for the complex, receptor, and ligand, which were calculated by the Nmode module inside Amber 11. [13,14] Before running production MD, every solute was solvated using a 8 Å pad of about 3220 TIP3P water molecules and the counter ions K + were added to neutralize each system, what was followed by minimization and equilibration through a number of stages.At the outset, the positions of the solute atoms were kept fixed, while the positions of the water atoms were minimized by gradually reducing an initial harmonic restraint of 2 kcal mol -1 Å -2 on all non-hydrogen non-water atoms via 100 combined steepest descent and conjugate gradient minimization steps.Afterwards, the entire system was minimized without restrains by means of 200 combined steepest descent and conjugate gradient minimization steps.Following minimization, a 50 ps (1 ps = 10 -12 s) MD run was performed to linearly heat the water up to 300 K in the canonical NVT ensemble (constant number of particles, N; constant volume, V; constant temperature, T) using a Langevin thermostat, with a collision frequency of 2 ps −1 and harmonic restraints of 1 kcal mol -1 Å -2 .By releasing the position restraints, the entire system was heated to 300 K via a subsequent 50 ps constant-pressure simulation.To equilibrate the system, a further 500 ps long run at 300 K was executed in the NPT ensemble, without positional restraints and with a pressure relaxation time of 2 ps.Production run was then made for 10 ns (1 ns = 10 -9 s) in the NPT ensemble at 300 K. Temperature was controlled by way of a Langevin thermostat with a 2 ps -1 collision frequency.A time step used throughout all stages was 2 fs (1 fs = 10 -15 s) and all hydrogen atoms were constrained using the Shake algorithm. [26]The long-range interactions were included on every step using the Particle Mesh Ewald algorithm [27] with a 4th order B-spline interpolation, a grid spacing of < 1 Å, and a direct space cutoff of 12 Å.The minimizations and MD simulations were performed using the Sander module of Amber11. [13,14]The production trajectories were processed using the Ptraj analysis tool included in the Amber11 program suite. [13,14]or the various structural parts of the receptor (Figure 3), the configurational entropy was estimated using the approximation of Andricioaei and Karplus -A&K, [28] treating a system of atoms with many degrees of freedom as a system of uncoupled harmonic oscillators, assuming a multivariate normal distribution of the atomic fluctuations, and providing an upper bound of the true entropy [Eq.( 4)].where ħ is the reduced Planck constant, k is the Boltzmann constant, and the sum runs over all 3N-6 vibrational degrees of freedom that are associated with the harmonic oscillator frequencies (ωi).The frequencies were determined by converting the nonzero eigenvalues (λi) of the mass-weighted covariance matrix ( = i i ω kT λ ).The eigenvaules were obtained by means of the quasi-harmonic analysis, consisting of: (i) a calculation of the massweighted covariance matrix using the Ptraj command matrix in combination with the keyword mwcovar and (ii) a diagonalization of the mass-weighted covariance matrix using the Ptraj command analyze matrix. [13,14]The frequencies and entropies were numerically computed by specifying the key word thermo on the analyze matrix command line.Another computationally less expensive formula for the evaluation of configurational entropy was proposed by Schlitter. [29]However, in case of DNA, both procedures [28,29] yielded equal upper bounds of the true entropy with numerical precision. [30]

Initial Binding Modes of Highest Affinity Ligands
The MM-GBSA parameters that characterize the initial G4 : ligand interfaces are given in Table 1.Even though this approximate method is known to have many limitations (such as overestimation of the binding free energy), [31] this method is usable for single point calculation [5] in terms of scoring the initial structures that are further subjected to some, more rigorous, theoretical and computational methods.The question how inclusion of solvation in the computational protocol affects the binding free energy, defines the main criterion (ΔGMM-GBSA in Table 1) for determining highest affinity ligands.In practice, entropy contributions (T ΔS) obtained by performing normal mode analysis on the three species (complex, receptor, and ligand) can be neglected if a comparison of states of similar entropy is sought such as two ligands binding to the same receptor.The reason for this is that normal mode analysis calculations are computationally expensive and their large margin of error introduces significant uncertainty in the result. [13,14,30]Thus, the values of ΔGMM-GBSA illustrate that BRACO-19 (15), TMPyP4 (1), and CX-3543 (2) nearly equally stabilize the G4-DNA through external stacking interaction, while Sanguinarine ( 5), Tetrahydropalmatine (6), and Hoechst 33258 (3) do the same through groove binding.[34][35][36][37][38][39][40][41] It means that the complex with the ligand bound to the G-tetrad is more stable than that with the ligand in the groove-binding mode by circa 10 kcal mol -1 .
The spatial orientations of BRACO-19 (15) and Tetrahydropalmatine (6) with respect to the G4 are shown in Figure 4, while those of the other highest affinity ligands are given in Figure S1 (Supplementary Information).
Table 1.Various contributions to the free energy associated with the initial mode of ligand-G4 interaction.a) Taking into account that the current computational methods do not enable a routine determination of enthalpy, the data for different ligands need to be analyzed by observing the trends of ΔGMM-GBSA and T ΔS separately. (b) ΔGMM = ΔEinternal + ΔEvdW + ΔEelec; the molecular mechanical (MM) contributions are: Einternal -the internal (bond, angular, and dihedral) energy, EvdW -van der Waals interaction from MM and Eelec -electrostatic energy calculated by the MM force field. (e) The solute entropy change (multiplied by 300 K) is estimated using the normal (translational, rotational, and vibrational) modes for the complex, receptor, and ligand in Eq. ( 1).
In the context of predicting the solute entropy change (T ΔS in Table 1) using the normal (translational, rotational, and vibrational) modes of the complex, receptor, and ligand, it is interesting to acquire some knowledge on which portion (in kcal mol -1 ) of T ΔS goes to ligand structural alterations upon complex formation specifically.Flexible docking of ligand using the AutoDock 4.2 algorithm is based on active torsions in ligand structure, [11,12] which can be viewed as particular sp 3 bonds being directly involved in finding the lowest energy ligand:receptor configurations.Entropy of ligand binding is essentially related to the loss of its degrees of freedom upon binding.The torsional potential, as determined by AutoDock 4.2, [11,12] indirectly takes care of it by being proportional to the number of active torsions in ligand structure.An active torsion is estimated to cost 0.3 kcal mol -1 energetically. [11,12]It means that the structures of BRACO-19 (15), TMPyP4 (1), and CX-3543 (2), having 11, 4, and 6 active torsions, experience the negative entropy changes that roughly cost -3.3,-1.2, and -1.8 kcal mol -1 upon external stacking interaction with G4, respectively.By not having any active torsion in its structure, the groove binder Sanguinarine (5) does not entropically associate with a negative energetic amount.The entropy reductions of about -1.2 and -0.3 kcal mol -1 correspond to the groove binding of Tetrahydropalmatine ( 6) with 4 active torsions and Hoechst 33258 (3) with 1 active torsion, respectively.
The trend of ΔGMM-GBSA and the trend of T ΔS are separately analyzed above, because the enthalpy calculation is often impractical and hard to converge.There is a proposal of calculating the binding free energy by summing ΔGMM-GBSA and (-T ΔS). [42]If this proposal is conditionally taken as a correct one, the values from Table 1 give the following BFEs: i) -11.37, -7.21, and -3.18 kcal mol -1 for ( 15), (1), and (2) upon external stacking interaction and ii) -5.93, -3.89, and -1.30 kcal mol -1 for ( 5), (6), and (3) upon groove binding respectively.All the negative values mean that the particular G4:ligand complexes are favorable thermodynamically.
The results suggest the π-π stacking of ligand at the end of the G-quadruplex to be the favorable mode of interaction, what is in agreement with experimental [9,43] and theoretical [5] reports.Also, the groove or loop is suggested to be a viable site, which is of vital importance for blocking the interaction between the G4 and its binding proteins in aqueous solution and which is not related to crystal packing effects. [5]Even though nonspecific ligand-groove or ligandloop binding is not inherently stable and depends on the particular topology of the loop, [5,9] the groove or loop may be seen as a recognition motif of interest for the structurebased drug design.

Stable Regime of MD Simulation and Asymptotic Configurational Entropy
The reference structure for each MD run was calculated as an average over all frames fitted to the first frame of the specific trajectory.The configurational entropy was evaluated by considering vibrational degrees of freedom. [28]The translational entropy is removed without making any approximation, while the removal of rotational entropy is acceptable for a negligible correlation between the internal degrees of freedom and the global rotation of the solute. [44]The global translational and rotational entropies were separated using a least square fitting to the reference structure.The computed entropies were found to be insensitive to any iterative improvement of the reference structures.For the different parts of the G4 structure (Figure 3) the configuartional entropy was evaluated using a procedure that treats only non-hydrogen atoms as input to the Ptraj command matrix, [13,14] which is described in the section Methods.The terms used in the caption of Figure 3 mean the following: i) Complete Structure refers to all of the non-hydrogen atoms of the entire G4, ii) Backbone refers to the non-hydrogen atoms (C1', C2', C3', O3', C4', O4', C5', O5', OP1, OP2, P) of the sugar and phosphate moieties of the nucleotides, iii) Bases refers to the non-hydrogen atoms (N1, C2, O2, N2, N3, C4, O4, C5, C6, O6, C7, N7, C8, N9) of all nucleobases, and iv) G-tetrads refer to the non-hydrogen atoms (N1, C2, N2, N3, C4, C5, C6, O6, N7, C8, N9) of Gs making the G 2 G 6 G 11 G 15 , G 3 G 7 G 12 G 16 , and G 17 G 4 G 22 G 13 tetrads respectively.Not including the fast motion of hydrogen atoms in MD simulation may cause some error in the absolute value of entropy. [45,46]As the present focus is on the entropy change upon ligand binding to the apo (ligand-free) G4, the possibility of accumulating this type of inaccuracy was circumvented.
Because the entropy used for the analysis should converge with increasing trajectory length, the simulation trajectory was divided into nonoverlapping segments of 0.5, 1, 2, 4, and 10 ns.Then, the entropy was averaged over each time interval.The reported entropy values were extrapolated to an infinitely long simulation by the following empirical equation: where Sinf is the estimation of entropy for an infinite time length of MD simulation, [30,47] while the physical of the parameters p and q have not yet been determined. [38]The specific values of p and q were obtained by fitting S(t) to the sets of the entropy values obtained by 0.5, 1, 2, 4, and 10 ns long simulations (Figures 7 and 9).The fitting procedure was separately done for each part of G4 (Figure 3) in order to eliminate unwanted correlation of the particular entropy with the entropies of the other parts.Sinf multiplied by 300 K is reported in Tables S1 and S3 (Supplementary Information) that contain the absolute (asymptotic) configurational entropies ± standard deviations (SDs).To calculate the relative entropies (Tables 2 and 4) as the difference of the absolute entropies for a complex and the ligand-free G4, it was necessary to use the same type of Sinf that provides a best fit [Eq.( 5)] for each system respectively.This condition was satisfied by Sinf that takes into account the upper bound (+SD).A satisfactory convergence of the entropies was seen through the correlation coefficients being in between 0.9 and 1.In the case of apo G4, the parameter q correctly converged to 0.7 -an expected value that was previously reported. [47]In the case of the G4:ligand complexes, the parameter q was in between 0.5 and 0.6, what is in agreement with the literature recommendation. [30]he simulations of both ligand-free (apo) G4 and its complexes with the highest affinity ligands reveal a stable trend around 4 ns, as it is demonstrated by the Root Mean Square Deviation (RMSD) relative to the first frame of the trajectory (Figure 5).The RMSD of apo G4 is stabilized around 2 Å, while the RMSD of its complexes with the ligands is stabilized in between 2.5 and 3 Å (Figure 5).Representative snapshots of the systems in the stable   Numerical comparisons between the entropies generated by 4 ns-long simulations (Figures 7 and 9) and the entropies extrapolated to infinitely long simulations (Tables S1 and  S3) confirm that 92 ± 3 % of the asymptotic (t → ∞) values was reached within 4 ns for each part of the G4 structure.

G4 Flexibility Change Upon External Stacking Interaction
The configurational entropies extrapolated to infinitely long MD simulations are reported in Table S1 (Supplementary Information), while their changes are summarized in Table 2.The different structural parts of G4 (complete structure, sugar-phosphate backbone, complete system of bases, and G-tetrad bases), for which the configurational entropy change was computed, are displayed in Figure 3. Binding of BRACO-19 ( 15) is associated with a configurational entropy change of 17.22 kcal mol -1 , indicating a remarkable amount of increased flexibility of the entire G4-DNA structure.The increased conformational flexibility of G4 contributes favorably to the free energy of non-covalent binding.The backbone itself gets more flexible as an entropic increase of 7.95 kcal mol -1 is observed.The complete system of nucleobases itself retains the same trend as a relevant entropy change of 5.39 kcal mol -1 is found, of which almost nothing goes to the system of the G-tetrad nucleobases.The increased flexibility of both the sugar-phosphate backbone and the non-tetrad nucleobases may be associated with creating binding site, while the ability of the central channel of apo G4 to accommodate a solvent cation between each pair of tetrads does not get affected by binding of BRACO-19 (15).
Binding of TMPyP4 ( 1) is associated with a configurational entropy change of -6.36 kcal mol -1 , indicating a reasonable amount of increased rigidity of the entire G4-DNA structure.It means that the reduced conformational flexibility of G4-DNA contributes unfavorably to the free energy of non-covalent binding.This is mainly reflected through   5).
the complete system of nucleobases as an entropy decrease of -10.58 kcal mol -1 is established.Of all the nucleobases, those making the G-tetrads have a key contribution of -8.16 kcal mol -1 .At the same time, the backbone does not experience any notable flexibility change as a slight entropic increase (2.73 kcal mol -1 ) is detected.The reduced flexibility of G4 may be seen through the strengthening of Hoogsteen hydrogen bonds between the G-tetrad base pairs, thereby narrowing down the central channel that is supposed to accommodate a solvent cation between each pair of tetrads.
Binding of CX-3543 ( 2) is associated with a configurational entropy change of -35.38 kcal mol -1 , indicating a substantial amount of increased rigidity of the entire G4-DNA structure.It means that the reduced conformational flexibility of G4-DNA contributes unfavorably to the free energy of non-covalent binding.This is reflected through both the sugar-phosphate backbone and the complete system of nucleobases.The backbone itself is associated with an entropic decrease of -16.23 kcal mol -1 , while the complete system of nucleobases itself is associated with an entropic decrease of -15.13 kcal mol -1 .Among the nucleobases, the G-tetrad bases have a key entropy contribution of -13.69 kcal mol -1 .In comparison to binding of TMPyP4 (1), a considerably larger increase of the rigidity of G4 due to binding of CX-3543 (2) may be conceivable by means of both a severe curvature of the backbone (Figure S2) and a conspicuous strengthening of Hoogsteen hydrogen bonds between the G-tetrad base pairs, thereby making the space of the central channel vertically distorted, more narrowed down, and less capable of accommodating a solvent cation between each pair of tetrads.
The configurational entropy calculated for each base pair (bp) of the G-tetrads is given in Table S2 (Supplementary Information).The values obtained by 4 ns simulations are presented because the calculated entropy converges very quickly with the length of trajectory for these systems, reaching 98 % of S inf in a 4 ns run.This measure quantifies the entropy within a specific base pair and is uncorrelated with the motion of the other base pairs.All the individual intra-bp entropies remain almost unchanged (± 2 kcal mol -1 , Table S2) by external stacking interaction of every single ligand with the G4.The sum of all the intra-bp entropies estimates the total intra-bp entropy of G-tetrads, which changes by 6.74, -14.20, and -22.79 kcal mol -1 (Table 3) after having the G4 complexed with BRACO-19 (15), TMPyP4 (1), and CX-3543 (2) respectively.Only in case of BRACO-19 (15), the total intra-base pair conformational flexibility of G-tetrads contributes favorably to the free energy of binding, and the entropic change of 6.74 kcal mol -1 (Table 3) is comparable with the entropic change of 5.39 kcal mol -1 (Table 2) that refers to the system of all nucleobases.
The sum of entropies of all individual bps (denoted by Base Pairs in Table 3) can be conveniently observed with respect to the entropy of the system of all G-tetrad nucleobases (denoted by Bases in Table 3).By subtracting the Base Pairs values from the Bases values, it is possible to quantitatively evaluate the thermodynamic purpose of total inter-base pair motion (denoted by Difference in Table 3).Only binding of BRACO-19 ( 15) is associated with a negative value of -6.74 kcal mol -1 , indicating that the reduced inter-base pair conformational flexibility of Gtetrads contributes unfavorably to the binding free energy.a) Ligand no.: BRACO-19 (15), TMPyP4 (1), and CX-3543 (2).Difference (G4:15) -Difference (G4) -6.74 Difference (G4:1) -Difference (G4) 6.04 d) The difference of the values Bases and Base pairs gives the estimate of the total inter-base pair entropy.
Noteworthy is another uniqueness of BRACO-19 (15) among these three ligands.The reduced inter-base pair conformational flexibility of G-tetrads (-6.74 kcal mol -1 , Table 3), which is unfavorable thermodynamically, is lined up with a comparable increase of the intra-base pair conformational flexibility of G-tetrads (6.74 kcal mol -1 , Table 3) that is favorable thermodynamically.

CX-3543 (
Based on the discussion in this subsection, BRACO-19 (15) can be viewed as a thermodynamically favorable and highly selective G4-DNA binder that is of special interest for the structure-based design of anti-cancer agents.
There is an experimental evidence of the activity of BRACO-19 at the viral G4 level, supporting the use of Gquadruplexes as new anti-HIV-1 targets. [50]On the basis of both the present study and our recent report, [2] BRACO-19 comes out to fit the context of anti-HIV-1 drug design as a G4 binder that is functionally compatible with a benzophenanthridine derivative (11).

G4 Flexibility Change Upon Groove Binding
The configurational entropies extrapolated to infinitely long MD simulations are given in Table S3 (Supplementary Information), while their changes are reported in Table 4.
Binding of Tetrahydropalmatine ( 6) is associated with a configurational entropy change of 5.54 kcal mol -1 , indicating some increase of the flexibility of the entire G4-DNA structure.The increased conformational flexibility of G4 contributes favorably to the free energy of non-covalent binding.At the same time, the backbone and the complete system of nucleobases themselves become more flexible as positive entropic changes of 1.98 and 4.27 kcal mol -1 are observed respectively.Of the increased flexibility of all nucleobases, a slight portion (0.72 kcal mol -1 ) is reflected through the G-tetrad nucleobases.
Groove binding of Sanguinarine (5) and Hoechst 33258 (3) increases the rigidity of the entire G4-DNA structure as negative configurational entropy changes of -19.85 and -34.96 kcal mol -1 are observed respectively.It means that Sanguinarine (5) contributes less unfavorably than Hoechst 33258 (3) to the free energy of non-covalent interaction by about 15.11 kcal mol -1 .The same trend holds for both the sugar-phosphate backbone and the complete system of nucleobases.The backbone itself shows an entropic decrease of -14.25 kcal mol -1 for (5) and -21.27 kcal mol -1 for (3).The complete system of nucleobases itself displays an entropic decrease of -1.94 kcal mol -1 for (5) and -11.73 kcal mol -1 for (3).

3.76
d) The difference of the values Bases and Base pairs gives the estimate of the total inter-base pair entropy.
binding of ( 6), the reduced total intra-bp conformational flexibility of G-tetrads upon binding of ( 5) and ( 3) is unfavorable to the binding free energy.The sum of entropies of all individual bps (denoted by Base Pairs in Table 5) can be conveniently observed relative to the entropy of the system of all G-tetrad nucleobases (denoted by Bases in Table 5).Subtracting the Base Pairs values from the Bases values evaluates the thermodynamic purpose of total inter-base pair motion (denoted by Difference in Table 5) quantitatively.The total inter-bp entropy of G-tetrads changes by 0.12, 3.76, and 11.42 kcal mol -1 upon binding of ( 6), (5), and (3) respectively, thereby contributing favorably to the free energy.
Overall, Tetrahydropalmatine (6) can be seen a thermodynamically favorable and most specific groove binder.

Implications for Experimental Research
The structural design of optimal groove (or loop) binders is a challenge, as this mode of interaction is nonspecific in terms of its dependence on the particular arrangement of groove (or loop) residues.Ligand binding in a pure quartetbinding mode is more stable than that in a multiple-binding mode (the simultaneous external stacking and loop binding of ligands).This is likely because loop-binding ligand induces loop rearrangement, thereby influencing the interaction between the side chains of G-quartet-binding ligand and the loops/grooves of G-quadruplex.There are indications that combined ligand loop and G-quartet binding enhances G-quadruplex rigidity and decreases flexibility of both G-quartets and backbone. [5]In this light, it is important to highlight the following.BRACO-19, a pure G-quartetbinding ligand, concomitantly takes part in π-π stacking with the G 2 G 6 G 11 G 15 tetrad by way of its core aromatic scaffold and in five electrostatic interactions with the residues A 1 , G 6 , G 11 , and G 15 (of which none is loop nucleotide) by way of its side chains (Figure 10).Tetrahydropalmatine, a preferred groove binder, shows inclination to induce some rearrangement of G4 loop (Figure 11).Therefore, we propose the determination of an experimental structure, which would simultaneously embody the external stacking of BRACO-19 and the loop/groove binding of Tetrahydropalmatine to the G4.The necessary structural basis, in combination with remarkable theoretical [41] and computational [51] advances in the accurate determination of the small   6) is inclined to induce the rearrangement of loop residues G 8 , G 9 , A 10 , and T 14 by binding to the apo G4 in the stable regime of MD simulation around 4 ns.ligand/G4 binding free energy, would facilitate the development of effective ligand molecules capable of blocking the interaction of G4 with its binding proteins.

Final Standpoint
The quasi harmonic approximation only considers the equilibrium structure of molecule, allowing each atomic coordinate to fluctuate around a single equilibrium value [Eq.( 4)].In view of this, it is useful to analyze the calculated entropy change of G4 upon ligand binding by considering an expected enthalpic cost for creating a bincding site.The enthalpic cost, or the energy of G4, was estimated to be 38-78 kcal mol -1 . [52]Subtracting the configurational entropy changes (Tables 2 and 4) for the whole G4 structure from this specific value gives the following free energies: a)  6), (5), and (3) upon groove binding respectively.The positive values mean that the quasi harmonic approximation holds for the systems under present study.However, the applications of the current methodology are not always straightforward.Some contradictory examples show that the intercalation of aromatic ligand molecules may yield negative free energies because of an overestimation of the true entropy contribution by the quasi harmonic approximation.This inconsistency means that the entire DNA would spontaneously unwind to create an intercalation site. [53]When the present methodology is not applicable to very flexible targets (especially proteins), a more relevant theoretical framework, which goes beyond the quasi harmonic approximation, [48] is needed.
In general, conformational flexibility of molecules (a large protein or small ligand) is a very important feature.Conformational sampling that generates an ensemble of biologically meaningful conformations is needed for computer aided drug design (CADD).Computational methods, such as the Site Identification by Ligand Competitive Saturation (SILCS) protocol for structure based drug design and the Conformationally Sampled Pharmacophore (CSP) protocol for ligand based drug design, have advantages over other CADD methods that are based on single crystal structure or limited ligand conformations. [54]For protein-ligand interactions, more advanced MD techniques, such as replica exchange methods, [55,56] can be employed to enhance the sampling efficiency at micro-second scales and reveal new modes of interaction.

CONCLUSIONS
Among structurally different ligand molecules (Figure 1), those having highest affinity for the G-quadruplex from the c-Myc oncogene promoter region and being involved in different modes of interaction are identified (external stacking: BRACO-19, TMPyP4, and CX-3543; groove binding: Tetrahydropalmatine, Sanguinarine, and Hoechst 33258).The ligands are established to be more prone to bind at the end of G-quadruplex via π-π stacking interaction.The groove or loop of G4 is proposed to be a viable site -a recognition motif of interst for the structure-based anticancer drug design.
Conformational flexibility changes of the receptor upon ligand binding are observed by evaluating the changes of configurational entropy in the asymptotic regime (t → ∞) of MD simulation.
The preferred mode of interaction is the external stacking of BRACO-19 as a remarkably increased flexibility of the entire G4-DNA structure is observed, while the backbone and the complete system of nucleobases retain the same trend respectively.The origin of the additional conformational flexibility is dissected by analyzing the configurational entropy contributions at the level of individual base pairs of the G-tetrads and by evaluating the thermodynamic purpose of the total intra-base pair and inter-base pair motions within G-tetrads.
Groove binding of Tetrahydropalmatine is uniquely found to increase the flexibility of the entire G4-DNA structure, thus contributing favorably to the free energy of noncovalent binding.
Implications of the present work for experimental research are discussed.
The way of applying the present methodology to very flexible targets, such as proteins, is clarified.

Figure 6 .
Figure 6.Cartoon display of the complex formed upon external stacking of BRACO-19 (15) to apo (ligand-free) G4 in the stable regime of MD simulation around 4 ns.

Figure 7 .
Figure 7. Dependence of configurational entropy (T S in kcal mol -1 , T = 300 K) on the length of MD trajectory for the various parts of the G4 structure in complex with the external stacking ligands BRACO-19 (15), TMPyP4 (1), and CX-3543 (2) respectively.Solid lines are fitted functions based on Eq. (5).

Figure 8 .
Figure 8. Cartoon display of the complex formed upon groove binding of Tetrahydropalmatine (6) to apo (ligandfree) G4 in the stable regime of MD simulation around 4 ns.

Figure 9 .
Figure 9. Dependence of configurational entropy (T S in kcal mol -1 , T = 300 K) on the length of MD trajectory for the various parts of the G4 structure in complex with the groove binding ligands Tetrahydropalmatine (6), Sanguinarine (5), and Hoechst 33258 (3) respectively.Solid lines are fitted functions based on Eq. (5).

Figure 10 .
Figure 10.BRACO-19 (15) is involved not only in π-π stacking with the G4, but also in five electrostatic interactions (given by dashed line) with the residues A 1 , G 6 , G 11 , and G 15 in the stable regime of MD simulation around 4 ns.

Figure 11 .
Figure 11.Tetrahydropalmatine (6) is inclined to induce the rearrangement of loop residues G 8 , G 9 , A 10 , and T 14 by binding to the apo G4 in the stable regime of MD simulation around 4 ns.