Journal of the Brazilian Chemical Society
A Structure-Activity Relationship (SAR) Study of Neolignan Compounds with Anti-schistosomiasis Activity
Silva, Albérico B. F. da; Alves, Claúdio N.; Santos, Lourival S.; Barata, Lauro E. S.; Camargo, Ademir J.; Honório, Káthia M.; Macedo, Luiz G. M. de; Jardim, Iselino N.
Universidade Estadual de Campinas, Campinas, Brazil; Universidade de São Paulo, São Carlos, Brazil; Universidade Federal do Pará, Belém, Brazil
Keywords: PM3; neolignans; anti-schistosomiasis; principal component analysis; hierarchical cluster analysis; discriminant analysis; Kth nearest neighbor
DOI: 10.1590/S0103-50532002000300003
Subject classification: Chemistry (general)
Source: SciELO
【 Abstract 】
Neolignans are dimers formed by oxidative coupling of allyl- and propenylphenols; they occur in the Myristicaceae and other primitive plant families. Virola is the most representative genus of Myristicaceae and is found throughout the Americas.1,2 In 1970, initial studies showed that leaves of Virola surinamensis were highly effective in cercariae-blocking tests with Schistosoma mansoni in mice.3 The active substances responsible for this protection were isolated and identified as the natural neolignans virolin and surinamensin. To investigate the biological activity of neolignans, Barata et al.4 and Santos5 synthesized eighteen neolignan analogues, which were tested against fungi, bacteria, leishmaniasis, schistosomiasis, cancer and PAF (platelet-activating factor).6 Of these eighteen compounds, five were classified as active and thirteen as inactive against schistosomiasis (all tests in vitro).4,5 In the present work we calculated selected molecular descriptors for the eighteen synthesized neolignan derivatives4,5 and then used statistical methods (principal component analysis, PCA; hierarchical cluster analysis, HCA; and discriminant analysis, DA) to relate the molecular descriptors to the biological activity. The PCA, HCA and DA results were then tested on a new set of neolignan compounds, and the K-nearest neighbor (KNN) method was used to predict the activity of these new compounds. The descriptors were chosen to capture steric, electronic and hydrophobic characteristics of the compounds, since each of these can contribute to the biological activity and give information about the interactions between the compounds and their biological receptor.
Calculations
Figure 1 shows the central chemical structure and atom numbering used for all eighteen neolignan compounds studied here, and Figure 2 displays the individual chemical structure of each of the eighteen molecules, along with its activity. The compounds shown in Figure 2 comprise five active molecules (5, 6, 8, 9 and 17) and thirteen inactive molecules (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18). All geometry calculations were performed with the PM3 method,7 and the geometries were fully optimized with the EF algorithm of the AMPAC 6.5 molecular package.8 Among the many descriptors (variables) used in SAR studies,9-11 we chose to evaluate the following:
- HOMO: energy of the highest occupied molecular orbital (eV);
- LUMO: energy of the lowest unoccupied molecular orbital (eV);
- χ: Mulliken electronegativity (eV);
- POL: molecular polarizability (a.u.);
- QN: net atomic charge on atom N;
- τ: torsional angle (see Figure 1);
- δ: bond angle (see Figure 1);
- HE: hydration energy (kcal mol⁻¹);
- MR: molecular refractivity (Å³);
- VOL: molecular volume (Å³);
- log P: partition coefficient.
The descriptors were selected to represent electronic (HOMO, LUMO, χ, POL, Q2, Q3, Q12, Q13, Q19, MR and HE), steric (τ, δ and VOL) and hydrophobic (log P) properties of the compounds, giving fifteen variables in total. These properties are expected to be relevant to the anti-schistosomiasis activity of the neolignans studied here,9 and the number of descriptors was limited by the software used in the calculations. The structural descriptors τ and δ were obtained during geometry optimization, and the most stable structures were used to compute the other descriptors.
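As an illustration of how χ relates to the orbital energies listed above, a common frontier-orbital (Koopmans-type) approximation takes the ionization energy as I ≈ -E(HOMO) and the electron affinity as A ≈ -E(LUMO), so that χ = (I + A)/2 ≈ -(E(HOMO) + E(LUMO))/2. The paper does not state which formula HyperChem/Chemplus applied; the sketch below simply assumes this approximation, and the function name and orbital energies are hypothetical.

```python
def mulliken_electronegativity(e_homo: float, e_lumo: float) -> float:
    """Mulliken electronegativity (eV) from frontier-orbital energies (eV).

    Assumes Koopmans-type estimates I = -E_HOMO and A = -E_LUMO,
    so chi = (I + A) / 2 = -(E_HOMO + E_LUMO) / 2.
    """
    ionization_energy = -e_homo
    electron_affinity = -e_lumo
    return (ionization_energy + electron_affinity) / 2.0


# Hypothetical orbital energies, for illustration only (not values from this study).
print(mulliken_electronegativity(-9.0, -0.5))  # 4.75
```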
The descriptors HOMO, LUMO, χ, POL and log P were obtained with the HyperChem/Chemplus molecular package,12 and the atomic charges were obtained with the electrostatic potential method of the Spartan program.13 The electrostatic potential method fits a set of point atomic charges so that they best reproduce the quantum-mechanical molecular electrostatic potential at a set of points defined around the molecule.14,15 The point-generation routine developed by Connolly16 was used, with a density of 1 point per Å² in four layers placed at 1.4, 1.6, 1.8 and 2.0 times the van der Waals radii.16 Charges derived from the electrostatic potential are physically more satisfactory than Mulliken charges,17 especially when related to biological activity. All statistical analyses (PCA, HCA, DA and KNN) were performed with MATLAB 6.0.18
Results and Discussion
Principal component analysis (PCA)
The central idea of PCA is to reduce the dimensionality of a data set while preserving as much as possible of its variance-covariance structure. This is achieved by a linear transformation of the original variables into a smaller number of uncorrelated principal components (PCs). Geometrically, this transformation is a rotation of the original coordinate system. The first principal component axis lies along the direction of maximum variance; the second, orthogonal to the first, lies along the direction of maximum remaining variance, and so on. Projections that retain the largest amount of statistical information can therefore be plotted to reveal the structure of the data.19-21 Before PCA, each variable was autoscaled (mean-centered and divided by its standard deviation) so that all variables could be compared on the same scale. Table 1 shows the correlation matrix of all calculated variables, which was computed first in order to eliminate correlated variables.
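Returning to the electrostatic-potential charges described at the start of this passage, the fit can be sketched as a least-squares problem: choose atomic charges q_i minimizing the squared difference between a reference potential V_k and the point-charge potential Σ q_i / r_ik at every fitting point, subject to the charges summing to the molecular charge (imposed here with a Lagrange multiplier). This is a minimal sketch, not the Spartan implementation: it assumes a Fibonacci lattice instead of Connolly's surface routine, a Coulomb constant of 1, and a hypothetical three-atom geometry whose "reference" potential is generated from known charges so that the fit can be checked.

```python
import numpy as np


def fibonacci_sphere(n):
    """Return n roughly uniformly spread unit vectors (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((np.cos(azimuth) * np.sin(polar),
                            np.sin(azimuth) * np.sin(polar),
                            np.cos(polar)))


def fitting_points(coords, vdw_radii, scales=(1.4, 1.6, 1.8, 2.0), density=1.0):
    """Points on scaled van der Waals spheres (density in points per unit area),
    keeping only points that lie outside every atom's sphere of the same scale."""
    layers = []
    for s in scales:
        radii = s * vdw_radii
        for center, r in zip(coords, radii):
            n = max(int(round(4.0 * np.pi * r * r * density)), 1)
            shell = center + r * fibonacci_sphere(n)
            dist = np.linalg.norm(shell[:, None, :] - coords[None, :, :], axis=2)
            layers.append(shell[np.all(dist >= radii - 1e-9, axis=1)])
    return np.vstack(layers)


def fit_esp_charges(coords, points, potential, total_charge=0.0):
    """Least-squares point charges reproducing `potential` at `points`
    (Coulomb constant taken as 1), constrained to sum to `total_charge`."""
    inv_r = 1.0 / np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=2)
    n = len(coords)
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = inv_r.T @ inv_r          # normal equations of the fit
    system[:n, n] = system[n, :n] = 1.0       # Lagrange-multiplier row and column
    rhs = np.append(inv_r.T @ potential, total_charge)
    return np.linalg.solve(system, rhs)[:n]


# Hypothetical geometry (Å) and radii; the reference potential comes from known charges.
coords = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.0, 0.0]])
vdw_radii = np.array([1.7, 1.5, 1.2])
true_q = np.array([0.4, -0.6, 0.2])
pts = fitting_points(coords, vdw_radii)
potential = (true_q / np.linalg.norm(pts[:, None, :] - coords[None, :, :], axis=2)).sum(axis=1)
q = fit_esp_charges(coords, pts, potential)
print(np.round(q, 4))
```

With a real molecule the reference potential would come from the quantum-mechanical wavefunction, and the fitted charges would only approximate it.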
After several attempts to obtain a good classification of the compounds (separation between anti-schistosomiasis active and inactive compounds), the best separation was obtained with 3 of the 15 initial variables (see Table 2). This suggests that the other 12 variables are not important for classifying these compounds according to their anti-schistosomiasis activity. Table 3 shows the correlation matrix for the three variables used to separate active from inactive compounds. The first two principal components (PC1 and PC2) describe 84.63% of the overall variance (PC1 = 67.11% and PC2 = 17.52%). Since most of the variance is explained by the first two PCs, their score plot is a reliable representation of the spatial distribution of the points in the data set. Of the score plots examined, the most informative is PC1 versus PC2, shown in Figure 3; Table 4 lists the loading vectors for PC1 and PC2. Figure 3 shows that the eighteen neolignan compounds are separated into two groups, A and B. Group A contains the active compounds (5, 6, 8, 9 and 17) and group B the inactive compounds (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18). Figure 3 also shows that PC1 alone is responsible for the separation between active and inactive compounds. Figure 4 displays the loading vectors for PC1 and PC2. According to Table 4, PC1 can be expressed as a linear combination of the autoscaled variables MR, HE and Q19, each weighted by its loading. From this expression, active molecules are associated with higher values of MR, more positive charges on C19 and lower values of HE (note that HE appears with a negative sign in the PC1 equation). These characteristics can be useful in the design of new neolignan compounds with anti-schistosomiasis activity.
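The PCA steps described above (autoscaling, correlation matrix, explained variance, scores and loadings) can be sketched with a singular value decomposition. This is a minimal NumPy sketch, not the MATLAB 6.0 code used in the paper; the data are random placeholders standing in for 18 compounds by 3 descriptors, so the printed percentages will not match 67.11% and 17.52%.

```python
import numpy as np


def autoscale(x):
    """Mean-center each column (variable) and divide it by its sample standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def pca(z):
    """PCA of an autoscaled matrix z (objects x variables) via SVD.

    Returns the scores (objects x PCs), the loadings (variables x PCs) and the
    percentage of the total variance explained by each PC, in decreasing order."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = 100.0 * s**2 / np.sum(s**2)
    return u * s, vt.T, explained


# Placeholder values standing in for 18 compounds x 3 descriptors (e.g. MR, HE, Q19).
rng = np.random.default_rng(0)
raw = rng.normal(size=(18, 3))
corr = np.corrcoef(raw, rowvar=False)      # variable-by-variable correlation matrix
z = autoscale(raw)
scores, loadings, explained = pca(z)
pc1 = z @ loadings[:, 0]                   # PC1 = sum of loading_j * autoscaled variable_j
print(np.round(explained, 2), np.round(loadings[:, 0], 3))
```

The sign of each loading vector is arbitrary in PCA, so only relative signs within a PC (for example, HE opposing MR) carry meaning.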
It is worth noting that Q19, HE and MR are all electronic descriptors and reflect the strength of molecular association through electrostatic interactions.
Hierarchical cluster analysis (HCA)
Another widely used technique for analyzing complex data is hierarchical cluster analysis (HCA). In HCA, each object (here, each of the 18 molecules) is initially treated as its own cluster. A similarity matrix is built, usually from the Euclidean distances between all pairs of objects, and scanned for the smallest distance. The corresponding objects are merged and treated as a single cluster. Successive iterations cluster all objects, producing a dendrogram in which objects are grouped according to their similarity level. Figure 5 shows the HCA results. In Figure 5, the horizontal lines represent the compounds, and the vertical lines represent the similarity values between pairs of compounds, between a compound and a group of compounds, and between groups of compounds. The similarity value between the two classes of compounds was 0.15, a low value indicating that the two classes are distinct. The HCA results in Figure 5 are very similar to those obtained with PCA: the compounds are grouped into two categories, active (5, 6, 8, 9 and 17) and inactive (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18).
Stepwise discriminant analysis
Discriminant analysis is a multivariate technique with two main goals: (1) to separate objects from distinct populations and (2) to allocate new objects to previously defined populations.21,22 Here we consider two groups: Group A, containing the active compounds (5, 6, 8, 9 and 17), and Group B, containing the inactive compounds (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18).
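The agglomerative procedure described under Hierarchical cluster analysis can be sketched in a few lines of plain Python. The paper does not state which linkage rule or similarity scale was used; this sketch assumes average linkage by default and a similarity index of 1 - d/d_max (d_max being the largest pairwise distance), and the six 2-D points are hypothetical, chosen to form two tight groups.

```python
import math
from itertools import combinations


def agglomerate(points, linkage="average"):
    """Agglomerative hierarchical clustering on Euclidean distances.

    Starts with every object as its own cluster and repeatedly merges the two
    closest clusters. Returns the merge history as (cluster_a, cluster_b, distance)
    tuples plus the largest pairwise distance, used to normalize similarities."""
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def cluster_distance(a, b):
        ds = [dist[min(i, j), max(i, j)] for i in a for j in b]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average linkage

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges, max(dist.values())


# Hypothetical 2-D points (not the paper's data) forming two tight groups.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
merges, d_max = agglomerate(pts)
for a, b, d in merges:
    # Assumed similarity index: 1 = identical objects, 0 = the most distant pair.
    print(sorted(a), sorted(b), round(1.0 - d / d_max, 3))
```

The last merge joins the two groups at a low similarity, which is the dendrogram feature the text reads as two distinct classes.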
Stepwise discriminant analysis is a linear discriminant method based on Fisher's F-test for the significance of the variables.22 At each step, one variable is selected on the basis of its significance. After two steps, the two most significant of the fifteen variables under investigation were selected: MR and HE. The discriminant functions obtained for Groups A and B are linear functions of these two variables. The variables MR and HE represent the strength of molecular association through electrostatic interactions. Using these discriminant functions, we obtain the classification summary shown in Table 5. The classification error rate was 0%, i.e. every compound was assigned to its correct group, giving a satisfactory separation of the two groups. The allocation rule derived from the DA results for assessing the anti-schistosomiasis activity of a new neolignan compound is: (a) calculate, for the new compound, the values of the two most important variables identified by DA (MR and HE); (b) substitute these values into the two discriminant functions obtained in this work; (c) check which discriminant function (Group A, anti-schistosomiasis active compounds, or Group B, anti-schistosomiasis inactive compounds) gives the higher value. The new compound is predicted to be active if the higher value belongs to the Group A function, and inactive if it belongs to the Group B function. Comparing the DA and PCA results, MR and HE are important in both methods: in the PC1 equation, MR and HE are the two variables with the largest weights. Thus, combining the DA and PCA results, MR and HE are key variables for explaining the anti-schistosomiasis activity of the neolignan compounds studied here, and Q19 should also be considered when designing neolignan compounds with anti-schistosomiasis activity.
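The allocation rule (a) to (c) can be sketched with linear classification functions built from a pooled within-group covariance matrix, f_g(x) = w_g·x + c_g. This NumPy sketch covers only the classification-function and allocation steps, not the stepwise F-test variable selection; it assumes equal prior probabilities for the two groups, and the (MR, HE) values are hypothetical, not the fitted functions of this work.

```python
import numpy as np


def classification_functions(x, labels):
    """Linear classification functions with a pooled within-group covariance matrix.

    For each group g returns (w_g, c_g) such that f_g(x) = w_g @ x + c_g, with
    w_g = S^-1 mean_g and c_g = -0.5 * mean_g @ S^-1 @ mean_g (equal priors assumed)."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    pooled = np.zeros((x.shape[1], x.shape[1]))
    means = {}
    for g in groups:
        xg = x[labels == g]
        means[g] = xg.mean(axis=0)
        pooled += (xg - means[g]).T @ (xg - means[g])
    pooled /= len(x) - len(groups)
    inv = np.linalg.inv(pooled)
    return {g: (inv @ means[g], -0.5 * means[g] @ inv @ means[g]) for g in groups}


def allocate(funcs, x_new):
    """Allocation rule: pick the group whose function gives the higher value."""
    return max(funcs, key=lambda g: funcs[g][0] @ x_new + funcs[g][1])


# Hypothetical (MR, HE) values; not the paper's data.
x = np.array([[80.0, -10.0], [82.0, -12.0], [85.0, -11.0],             # group A
              [60.0, -5.0], [62.0, -4.0], [58.0, -6.0], [61.0, -3.0]])  # group B
y = ["A", "A", "A", "B", "B", "B", "B"]
funcs = classification_functions(x, y)
predicted = [allocate(funcs, row) for row in x]
error_rate = 100.0 * sum(p != t for p, t in zip(predicted, y)) / len(y)
print(predicted, f"error rate = {error_rate:.0f}%")
```

Re-classifying the training compounds themselves, as here, gives a resubstitution error rate comparable in spirit to the summary in Table 5.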
All three variables found here to play an important role in anti-schistosomiasis activity (MR, HE and Q19) are electronic descriptors; therefore we conclude that electronic properties play a very important role in the anti-schistosomiasis activity of neolignan compounds. In particular, since MR, HE and Q19 represent the strength of molecular association through electrostatic interactions, it is reasonable to suggest that electrostatic interactions play an important role in the mechanism of anti-schistosomiasis activity.
Kth nearest neighbor (KNN)
The KNN method classifies a new compound (object) according to its distances to the objects of the training set. The K nearest neighbors in the training set are found, and the object is assigned to the class to which the majority of these neighbors belong. The method is self-validating because, within the training set, each sample (object) is compared with all of the others but not with itself. The best value of K can therefore be chosen from the training-set results alone.23 The classical KNN approach has no outlier-detection capability, i.e. a classification is always made whether or not the unknown object belongs to any class in the training set. This method was used to validate the initial data set, and Table 6 presents the results obtained with 1, 3 and 5 nearest neighbors. For the
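The KNN procedure and its leave-one-out self-validation, as described under Kth nearest neighbor (KNN), can be sketched in plain Python. This sketch assumes Euclidean distance and a simple majority vote (with two classes and odd K there are no ties); the descriptor values and class labels are hypothetical, not the data behind Table 6.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Class held by the majority of the k training objects closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def leave_one_out_errors(xs, ys, k):
    """Count objects misclassified when each is predicted from all the others only."""
    errors = 0
    for i in range(len(xs)):
        others_x = xs[:i] + xs[i + 1:]
        others_y = ys[:i] + ys[i + 1:]
        errors += knn_predict(others_x, others_y, xs[i], k) != ys[i]
    return errors


# Hypothetical autoscaled descriptor values (not the paper's data).
xs = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (1.0, 0.8),
      (-1.0, -1.0), (-1.1, -0.9), (-0.9, -1.2), (-1.2, -1.1), (-0.8, -1.0),
      (-1.0, -0.7), (-1.3, -1.0)]
ys = ["active"] * 5 + ["inactive"] * 7
for k in (1, 3, 5):
    print(k, leave_one_out_errors(xs, ys, k))
```

Comparing the error counts for K = 1, 3 and 5 is one way to choose K from the training set alone, as the text describes.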
【 License 】
Unknown