PREDICTIVE QSAR ANALYSIS OF FLAVONOID ANALOGUES AS ANTIPSORIATIC AGENTS
P. K. Sharma and B. V. Vakil*
Guru Nanak Institute for Research and Development, G. N. Khalsa College, Nathalal Parekh Marg, Matunga, Mumbai - 400019, Maharashtra, India.
ABSTRACT: Objective: Newly designed antipsoriatic agents, a substituted series of analogues of the flavonols kaempferol and quercetin, were subjected to 2D-QSAR analysis using VLIFEMDS-QSARPro software in order to derive and understand the possible correlation between biological activity as the dependent variable and descriptors such as molecular weight and XLogP as independent variables. Methods: Several statistical regression expressions were obtained using simulated annealing as the variable selection method, coupled with model building methods such as partial least squares (PLS) regression and multiple linear regression (MLR). Results: For the analogues of both quercetin and kaempferol, a total of 9 QSAR models were generated, each using a test set of 15 and a training set of 45 similar compounds. The best QSAR model for quercetin, generated by the PLS model building method, was model Q4, with a correlation coefficient r2 of 0.9021 and a significant cross-validated correlation coefficient q2 of 0.5791. Similarly, the best QSAR model for kaempferol, generated by the MLR method, was model K2, with r2 of 0.687 and a significant q2 of 0.5676. Both models gave significant results and revealed that the SdOE-index and SdsCH count descriptors favour the biological activity of quercetin analogues, whereas the SdssCE-index contributes positively towards the biological activity of kaempferol analogues. Conclusion: The study provides better insight for the design and chemical synthesis of more potent antipsoriatic agents, and suggests that such descriptors will be helpful in designing them.
Keywords: Quercetin, Kaempferol, MLR, VLIFEMDS-QSARPro, 2D-QSAR, PLS
INTRODUCTION: Psoriasis is one of the most common chronic, inflammatory and non-contagious autoimmune skin diseases that produces dry flakes as plaques of thickened, scaling skin. The dry flakes are thought to result from the excessively rapid proliferation of skin cells that is triggered by inflammatory chemicals produced by lymphocytes 1.
The disease is considered as a T- cell mediated immune response characterized by hyper proliferative keratinocytes coupled with infiltration of T cells, dendritic cells, macrophages and neutrophils 2.
The several subsets of T cells each have a distinct function. T-helper cell (TH1 and TH2) initiate production of defining cytokines whereas IFN-γ and interleukin (IL)-4 are essential for initiating psoriatic lesions 3, 4. TH cells assist other white blood cells in immunological processes, including maturation of B cells into plasma cells, memory B cells along with activation of cytotoxic T cells and macrophages.
These cells are also known as CD4+ T cells because they express the CD4 glycoprotein on their surfaces. TH cells become activated when they are presented with peptide antigens by MHC class II molecules, which are expressed on the surface of antigen-presenting cells (APCs). Once activated, they divide rapidly and secrete small proteins called cytokines that regulate the active immune response. These cells can differentiate into one of several subtypes, including TH1, TH2, TH3, TH17, TH9 and TFH, which secrete different cytokines to facilitate different types of immune responses. Signalling from the APCs directs T cells into particular subtypes 5.
There are many reported drug targets in psoriasis, such as STAT3, p53 and caspases. One of them is the protein calcineurin (CaN), also known as protein phosphatase 2B (PP2B) 6. It is a ubiquitously expressed Ca2+-dependent cytosolic Ser/Thr protein phosphatase and is highly conserved in eukaryotes 7. Cyclosporin A (CsA) is the molecule of choice for treatment of psoriasis. CsA is a potent immunosuppressant that induces its biological effects by first forming a complex with cytosolic proteins termed immunophilins. These drug-immunophilin complexes then bind to and inhibit the serine/threonine protein phosphatase calcineurin (CaN) 8.
CaN seems to be the only protein phosphatase that dephosphorylates NFATc (nuclear factor of activated T cells) 7. Upon stimulation of T cells and subsequent calcium mobilization, activated calcineurin dephosphorylates NFATc (at 13 serine residues in the regulatory region), leading to its nuclear translocation by exposure of the nuclear localization sequence 8. Concerted rephosphorylation of NFATc leads to its translocation into the cytosol and abrogation of NFATc transcriptional activity. CaN modulates the activity not only of NFATc but also of several other transcription factors such as NF-κB, AP-1 and Elk1 1. The inhibition of calcineurin activity is so far the only effective therapeutic strategy to suppress the activation and proliferation of memory CD4+ and CD8+ T cells, which play an important role in the initiation of psoriasis 1, 4.
CsA is considered a classical drug that targets calcineurin activity and subsequently inhibits NFATc activation at the onset of psoriasis 7. It is one of the few drugs that suppress the activation not only of naïve and effector TH cells but also of memory TH cells 9. The efficacy of CsA has been used as a powerful argument to support the fundamental role of T cells in the pathogenesis of psoriasis 10. Systemic use of CsA is effective in the treatment of psoriasis. However, its usage is restricted by serious side effects such as nephro- and neurotoxicity. Over the past two decades, considerable progress has been made in further elucidating the complex pathogenesis of psoriasis, which will hopefully facilitate the development of a new armamentarium of more effective, targeted therapies 11. Despite these important advances, substantial gaps remain in our understanding of psoriasis and its treatment, necessitating further research 12.
Herbal Drugs may be developed as an alternative control medication without the harmful side effects known to be associated with drugs like CsA. Flavonoids of herbal or synthetic origin are reported to exhibit various anti-inflammatory properties 13. Flavonoids are hydroxylated phenolic substances and are known to be synthesized by plants in response to microbial infections 14. Activities of flavonoids are structure dependent and they have been shown to have an ability to induce human protective enzyme systems 15. A number of studies have suggested protective effects of flavonoids against many infectious bacterial, viral diseases and degenerative diseases such as cardiovascular diseases, cancers and other age-related diseases, etc 15, 16. Kaempferol and quercetin are flavonoids that belong to the subclass flavonols 14, 15 and are reported to inhibit the phosphatase activity by binding to the catalytic domain of calcineurin and act independently of any matchmaker protein. Kaempferol suppresses IL-2 gene expression in Jurkat T cells 17. Surprisingly, it also inhibits the calcineurin-independent TNFα-induced NF-κB activation in HEK293 cells 1.
In silico Analysis: In silico protein modelling and novel drug design save considerable time and effort by identifying potential drug targets via bioinformatics tools 18. In silico tools can be used to analyze target protein structures for possible binding sites and to generate candidate drugs that can bind effectively to the target 19. Various software packages are designed to suit this approach, such as the modelling environments of Schrödinger Maestro, Discovery Studio and VLIFEMDS 20. New substituted series of analogues have been designed using the Maestro 11 modelling interface of the Schrödinger software (https://www.schrodinger.com/maestro) 21. Maestro 11 is a powerful and versatile molecular modelling environment and is the linchpin of Schrödinger's computational technology. It has a vast array of visualization options, making it possible to glean molecular properties as well as detailed intermolecular interactions. It is the portal to the most advanced science in computational chemistry 22.
QSAR: Quantitative structure-activity relationship (QSAR) (sometimes QSPR: quantitative structure property relationship) is the process by which a chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity. It is based on the fact that biological activity of a compound is a function of its physicochemical parameters, i.e. physical properties, such as molecular weight, solubility, surface tension, partition coefficient and chemical properties such as dissociation or ionization, electron density, and rate of hydrolysis, etc 23. There are two main objectives for the development of QSAR: Development of predictive and robust QSAR, with a specified chemical domain, for prediction of activity of untested molecules and it can also act as an informative tool by extracting significant patterns in descriptors related to the measured biological activity leading to understanding of mechanisms of given biological activity. Such information could help in suggesting designing of novel lead molecules with improved activity profile 24, 25.
The quantitative approach in QSAR depends upon expressing a structure by numerical values and then relating these values to the corresponding changes in biological activity by statistical methods. QSAR is also an important tool for studying the role of various physicochemical properties of a drug in providing the necessary biological activity. There is a wide choice of relevant paid and free software tools for developing QSAR models, such as VLIFEMDS, Schrödinger and Vega 24 - 26. QSAR-based modelling uses a variety of statistical regression methods to build structure-activity models based on descriptors such as molecular weight, XLogP and the type of hydrogen bond donor or acceptor 27. Multiple linear regression (MLR), partial least squares (PLS) and principal component regression (PCR) are widely used statistical regression methods. This bioinformatics-based approach is expected to give better insight when predicting the biological activity of new analogues 25, 26.
VLIFEMDS-QSARPro (www.vlifesciences.com) is an exhaustive regression analysis tool with facilities for predicting the biological activities of newly designed analogues. It also comes with a powerful pack of facilities for quantitative structure-activity relationship analysis, statistical modelling, activity / property prediction and visualization 24-26. The software performs rapid calculation of more than 1000 descriptors (2D, 3D and alignment-independent), and provides graphical representation of the relative distribution of descriptor values through distribution and pattern plots and a cross-correlation matrix. In addition, VLIFE-QSARPro offers a wide choice of methods for variable selection and model building. Any of the variable selection methods can be coupled with any of the model building methods, e.g. simulated annealing variable selection with statistical model building methods such as MLR, PCR and PLS 28.
2D QSAR models are based on descriptors derived from a 2D graph representation of a molecule. Such models are generated using training and test set molecules with uniformly distributed biological activities. The observed selection of test set molecules is made by considering the fact that test set molecules will represent a range of biological activity similar to the training set 24.
Biological Activity: Any structure and activity study is based on the assumption that there is an underlying relationship between the molecular structure of the compound and its biological activity. The QSAR analysis is an attempt to establish a correlation between various molecular properties of a set of molecules with their experimentally known biological activity. A dataset of a series of synthesized molecules tested for its desired biological activity is needed for carrying out the QSAR analysis. The quality of the predicted model is totally dependent on the quality of the experimental data used for building these models. Biological activity for the purpose of QSAR studies can be of two types: 1) Continuous Response: MEC, IC50, ED50, % inhibition 2) Categorical Response: Active / Inactive. Also to have confidence in QSAR analysis, biological data of at least 20 molecules is recommended 24.
Descriptor Calculation: Good descriptors should characterize the molecular properties important for molecular interactions: hydrophobic, electronic, steric / size / shape and hydrogen bonding. The handbook of molecular descriptors published in the year 2000 describes more than 2000 molecular descriptors used in QSAR and molecular modelling 29. The VLIFEMDS-QSARPro software has been employed for the calculation of different 2D descriptors. There are three categories of descriptors: thermodynamic parameters, which describe the free energy change during drug-receptor complex formation; spatial parameters, which are the quantified steric features of a drug molecule required for its complementary fit with the receptor; and electronic parameters, which deal with weak non-covalent bonding between drug molecules and the receptor. Some examples of descriptors are XlogP, logP, hydrophobicity, elemental count, path count, chain count, path cluster count and molecular connectivity indices. The calculated descriptors are gathered in a data matrix for selection of the test and training sets 27, 30.
Selection of Training and Test Set: In order to obtain a validated QSAR model for the purpose of meaningful prediction, an available dataset should be divided into training and test sets. For the prediction statistics to be reliable, the test set must include at least five compounds. Ideally, the division into the training and test sets must satisfy the following three conditions: (i) All representative compound-points of the test set in the multidimensional descriptor space must be close to those of the training set. (ii) All representative points of the training set must be close to those of the test set. (iii) The representative points of the training set must be distributed within the whole area occupied by the entire dataset 24. Some of the methods for dividing the dataset into training and test sets are: 1) Manual Selection: This is done by visualizing the variation in the chemical and biological space of the given dataset. 2) Random Selection: This method creates the training and test sets by random distribution. 3) Sphere Exclusion Method: This is a rational method for creation of the training and test sets. It ensures that the points in both sets are uniformly distributed with respect to chemical and biological space. 4) Other methods based on experimental design, for example full factorial, fractional factorial and Onion Design 31, 32.
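The random selection method described above can be sketched in a few lines of Python. This is a minimal illustration only: the compound identifiers, the fixed seed and the helper `activity_range_covered` (a simple check that test activities fall inside the training activity range) are assumptions introduced here, not part of the software used in the study.

```python
import random

def random_split(compound_ids, test_size, seed=0):
    """Randomly divide compound identifiers into training and test sets."""
    if not 0 < test_size < len(compound_ids):
        raise ValueError("test_size must be between 1 and len(compound_ids) - 1")
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    test = sorted(shuffled[:test_size])
    train = sorted(shuffled[test_size:])
    return train, test

def activity_range_covered(train_activity, test_activity):
    """Return True if every test activity lies within the training activity range."""
    low, high = min(train_activity), max(train_activity)
    return all(low <= a <= high for a in test_activity)

# Hypothetical dataset of 60 compounds divided 45:15
ids = [f"cmpd_{i:02d}" for i in range(60)]
train_ids, test_ids = random_split(ids, test_size=15, seed=42)
```

A split that fails the range check would normally be discarded and redrawn with a different seed, since the test set should represent an activity range similar to that of the training set.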
Variable Selection Method: Variable selection methods help to shortlist the descriptors that play an important role in determining the biological activity of structures. These methods are divided mainly into two categories: 1) Systematic variable selection: These methods add or delete descriptors one at a time, and the procedure can be stepwise forward, stepwise forward-backward or stepwise backward. 2) Stochastic variable selection: These methods are based on simulation of various physical or biological processes 33. They start from randomly generated models and then modify these models using different process operators (e.g. perturbation, crossover etc.) to obtain better models. Various stochastic variable selection methods are available, such as simulated annealing, genetic/evolutionary algorithms or a user-defined selection method 24. The variables obtained with any variable selection method are then used to construct 2D and 3D QSAR models by coupling with an appropriate statistical model building method such as MLR, PCR or PLS 28, 34.
Multiple Linear Regressions (MLR): It is a widely used method for building QSAR models. MLR models are developed as a mathematical equation that relates chemical structure to activity. The results obtained could help pharmacologists and medicinal chemists to come up with improved drugs such as antipsoriatic agents 35. This method has been used for modelling the linear relationship between a dependent variable Y (IC50) and independent variables X (2D descriptors). MLR is based on least squares calculations, in which the sum of squared differences between observed and predicted values is minimized to achieve a better model. MLR estimates the values of the regression coefficients by least-squares curve fitting, and the quality of the fit is reported as r2 28, 34. The resultant model is a linear relationship that best approximates all the individual data points. In regression analysis, the conditional mean of the dependent variable Y (IC50) depends on the descriptors X. MLR analysis extends this idea to include more than one independent variable.
Regression equation takes the form:
$$Y = b_1 x_1 + b_2 x_2 + b_3 x_3 + c \qquad \text{(i)}$$
Where Y is dependent variable, ‘b’s are regression coefficients for corresponding ‘x’s (independent variable), ‘c’ is a regression constant or intercept 36.
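As a rough illustration of how the regression coefficients and the intercept c of an equation of this form are obtained, the sketch below fits an MLR model by solving the least-squares normal equations in plain Python. The function names and the toy descriptor values are hypothetical; the source does not describe how VLIFEMDS-QSARPro implements the fit internally.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mlr(x_rows, y):
    """Return ([b1, ..., bk], c) minimising the sum of squared residuals."""
    design = [list(row) + [1.0] for row in x_rows]   # last column models the intercept c
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    return coeffs[:-1], coeffs[-1]

def predict_mlr(b, c, row):
    """Evaluate Y = b1*x1 + ... + bk*xk + c for one compound."""
    return sum(bi * xi for bi, xi in zip(b, row)) + c

# Toy data generated from Y = 2*x1 - 1*x2 + 0.5 (hypothetical descriptors)
x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 5.0]]
y = [2.5, -0.5, 1.5, 3.5, 1.5]
b, c = fit_mlr(x_rows, y)
```

Because the toy activities were generated without noise, the fit recovers the generating coefficients; with real data the residuals are non-zero and the singular-matrix check guards against perfectly collinear descriptors.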
Principal Component Regression Analysis (PCR): It is a data compression method based upon the correlation among the descriptors (independent variables). PCR provides a method for finding structure in datasets. Its aim is to group correlated variables, replacing the original descriptors by a new set called principal components (PCs) 37. These PCs are uncorrelated and built as simple linear combinations of the original variables. The method rotates the data onto a new set of axes, selected in decreasing order of variance within the data, such that the first few axes reflect most of the variation 38. The purpose of PCR is the estimation of values of a dependent variable on the basis of selected PCs of the independent variables 37, 39.
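The idea can be sketched with a single principal component. The example below is a simplified illustration, assuming power iteration for the first PC and unscaled descriptors; a production PCR would extract several components with a full eigendecomposition and usually autoscale the descriptors first. All function names and data are hypothetical.

```python
import math

def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def first_principal_component(rows, iterations=200):
    """Return (means, unit loading vector) of the first PC via power iteration."""
    means = column_means(rows)
    centred = [[v - m for v, m in zip(r, means)] for r in rows]
    p = len(means)
    # Scatter matrix; proportional to the covariance matrix, so eigenvectors match
    cov = [[sum(r[i] * r[j] for r in centred) for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def fit_pcr_one_component(rows, y):
    """Regress y on the scores of the first principal component only."""
    means, loading = first_principal_component(rows)
    scores = [sum((v - m) * l for v, m, l in zip(r, means, loading)) for r in rows]
    y_mean = sum(y) / len(y)
    slope = (sum(t * (yi - y_mean) for t, yi in zip(scores, y))
             / sum(t * t for t in scores))

    def predict(row):
        t = sum((v - m) * l for v, m, l in zip(row, means, loading))
        return y_mean + slope * t

    return loading, predict

# Two perfectly correlated hypothetical descriptors (x2 = 2 * x1)
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [1.0, 2.0, 3.0, 4.0]
loading, predict = fit_pcr_one_component(rows, y)
```

In this toy case MLR would fail because the two descriptors are collinear, whereas the single principal component captures all of the descriptor variance and the regression on its scores is well defined.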
Partial Least Squares Regression (PLS): It is a popular regression technique which can be used to relate one or more dependent variables (Y) to several independent variables (X) 40. PLS relates a matrix Y of dependent variables to a matrix X of molecular structure descriptors. PLS is useful in situations where the number of independent variables exceeds the number of observations, when the X data contain collinearities, or when N is less than 5 M, where N is the number of compounds and M is the number of dependent variables. The main aim of PLS regression is to predict the activity Y from X and to describe their common structure 41.
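A one-component PLS model for a single activity (often called PLS1) is small enough to write out directly. The sketch below is an illustration under that simplifying assumption and uses hypothetical data with more descriptors than compounds; real PLS models usually extract several latent components with deflation between them.

```python
import math

def fit_pls1_one_component(rows, y):
    """Fit a single-latent-variable PLS1 model; returns (weights, predict)."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[v - m for v, m in zip(r, x_mean)] for r in rows]
    yc = [yi - y_mean for yi in y]
    # Weight vector: direction in descriptor space with maximal covariance with y
    w = [sum(r[j] * yi for r, yi in zip(xc, yc)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent scores of each compound and regression of y on those scores
    t = [sum(v * wj for v, wj in zip(r, w)) for r in xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((v - m) * wj for v, m, wj in zip(row, x_mean, w))
        return y_mean + q * score

    return w, predict

# Three hypothetical descriptors but only two compounds
rows = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
y = [1.0, 3.0]
w, predict = fit_pls1_one_component(rows, y)
```

The second descriptor is constant across both compounds, so it receives zero weight, which shows how PLS concentrates on the directions in X that actually co-vary with Y.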
Model Validation: Model validation is a process to test the internal stability and predictive ability of the QSAR models 39. Developed QSAR models have been validated by some procedure as follows:
Internal validation is carried out using the leave-one-out (LOO) method. To calculate q2, each molecule in the training set is eliminated once and the activity of the eliminated molecule is predicted using the model developed from the remaining molecules. The q2 value, which describes the internal stability of a model, is calculated using the following equation 27, 40:
$$q^2 = 1 - \frac{\sum (Y_{pred} - Y_{act})^2}{\sum (Y_{act} - Y_{mean})^2} \qquad \text{(ii)}$$
In Eq. (ii), Ypred and Yact indicate the predicted and observed activity values respectively, and Ymean indicates the mean activity value. A model is considered acceptable when the value of q2 exceeds 0.5.
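The leave-one-out procedure can be made concrete with a short sketch that computes q2 as one minus the ratio of the predictive residual sum of squares to the total sum of squares about the mean activity. For brevity it assumes a model with a single descriptor fitted by ordinary least squares; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def loo_q2(x, y):
    """Leave-one-out q2 = 1 - PRESS / SS_total for the training set."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit the model without molecule i, then predict molecule i
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_rest, y_rest)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and activities
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2.0, 4.0, 6.0, 8.0, 10.0]
y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]
q2_exact = loo_q2(x, y_exact)
q2_noisy = loo_q2(x, y_noisy)
```

Unlike r2, q2 can become negative when the left-out predictions are worse than simply predicting the mean activity, which is why it is used as a stability check rather than a goodness-of-fit measure.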
For external validation, the activity of each molecule in the test set is predicted using the model developed from the training set. The pred_r2 value is calculated as follows:
$$pred\_r^2 = 1 - \frac{\sum (Y_{pred(test)} - Y_{test})^2}{\sum (Y_{test} - Y_{train})^2} \qquad \text{(iii)}$$
In Eq. (iii), Ypred(test) and Ytest indicate the predicted and observed activity values respectively of the test set compounds, and Ytrain indicates the mean activity value of the training set. The pred_r2 value should be more than 0.5, which signifies that the model can be taken for further analysis 39.
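A minimal sketch of the external validation statistic follows. It takes the observed and predicted activities of the test set together with the mean activity of the training set; the function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def pred_r2(y_test_observed, y_test_predicted, y_train_mean):
    """External predictivity: 1 - sum((pred - obs)^2) / sum((obs - train_mean)^2)."""
    numerator = sum((p - o) ** 2
                    for p, o in zip(y_test_predicted, y_test_observed))
    denominator = sum((o - y_train_mean) ** 2 for o in y_test_observed)
    return 1.0 - numerator / denominator

# Hypothetical test set of three compounds
observed = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
value = pred_r2(observed, predicted, y_train_mean=6.0)
```

Here the squared prediction errors sum to 0.5 and the squared deviations from the training mean sum to 2.0, so the statistic is 1 - 0.5 / 2.0 = 0.75, above the 0.5 threshold.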
Randomization Test: The randomization test, or Y-scrambling, is an important and popular means of statistical validation. To evaluate the statistical significance of the QSAR model for an actual dataset, one-tailed hypothesis testing is commonly used. The robustness of the models for training sets is examined by comparing these models to those derived for random datasets. Random sets are generated by rearranging the activities of the molecules in the training set. Statistical models are derived using the randomly rearranged activities (random sets) with the selected descriptors, and the corresponding q2 values are calculated. The significance of the models thus obtained is derived from a calculated Z score.
The Z score is calculated by the following formula:
$$Z\ \text{score} = \frac{h - \mu}{\sigma} \qquad \text{(iv)}$$
where h is the q2 value calculated for the actual dataset, µ is the average q2 and σ is its standard deviation, calculated over the iterations using models built from different random datasets. The probability (α) of significance of the randomization test is derived by comparing the Z score value with the critical Z score values as reported if the Z score value is less than 4.0; otherwise it is calculated by the formula given in the literature. For example, a Z score value greater than 3.10 indicates that there is a probability (α) of less than 0.001 that the QSAR model constructed for the real dataset is random. The randomization test suggests that all the developed models have a probability of less than 1% that the model is generated by chance 30, 32, 38.
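The Z score calculation can be sketched as below. Two points are assumptions made for this illustration: the population standard deviation is used for the spread of the random q2 values, and α is taken as the upper-tail probability of a standard normal distribution rather than being read from a table of critical values. The q2 values are hypothetical.

```python
import math
import statistics

def randomization_z_score(q2_actual, q2_random):
    """Z = (h - mu) / sigma, with mu and sigma taken over the scrambled models."""
    mu = statistics.mean(q2_random)
    sigma = statistics.pstdev(q2_random)   # population SD (an assumption, see lead-in)
    return (q2_actual - mu) / sigma

def one_tailed_alpha(z):
    """Upper-tail probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical q2 of the real model and of three Y-scrambled models
z = randomization_z_score(0.6, [0.1, 0.2, 0.3])
alpha_at_3_10 = one_tailed_alpha(3.10)
```

Under the normal assumption, a Z score of 3.10 corresponds to an upper-tail probability just below 0.001, which agrees with the threshold quoted above.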
The present work was undertaken because currently there are very few drugs available for the treatment of the debilitating psoriasis disease and there is a need to quickly identify the substituted analogues of promising phytochemical compounds using in silico approach that may help to identify novel compounds having potential biological activity and less side effects. The promising analogues predicted may then be actually synthesized in the laboratory and evaluated using in vitro and in vivo testing methods. The objective of the present work was to perform in silico experiments on novel substituted series of analogues of quercetin and kaempferol to predict their biological activity.
MATERIALS AND METHODS:
Databases and Software (Computational Data): The flavonoids utilized for the study were quercetin and kaempferol. New substituted series of analogues of both flavonoids were designed using the Maestro modelling suite of Schrödinger software (https://www.schrodinger.com/maestro) 21. QSAR analysis was then used to calculate the in silico biological activity of these new analogues, which may have potential antipsoriatic activity. For this study, VLIFEMDS-QSARPro (www.vlifesciences.com) 24 version 3.5 was used, as it provides exhaustive regression analysis tools and facilities for prediction of biological activity. The analysis was run on a Lenovo computer with an Intel Core Duo processor. The software comes with a powerful pack of facilities for quantitative structure-activity relationship analysis, statistical modelling, activity / property prediction and visualization 24 - 26.
Biological Activity: Biological activity was used as the dependent variable and correlated, as a quantity linearly related to the free-energy change, with 2D descriptors such as XLogP, hydrogen bond donors and hydrogen bond acceptors 35. For the development of the quercetin and kaempferol QSAR models, IC50 values in μM (in vitro biological activity in terms of half maximal inhibitory concentration) were taken from the PubChem database and are presented in Annexure 1; the models were then used to predict the biological activity of the new analogues.
Experimentally reported IC50 values (half maximal inhibitory concentration) were converted to the pIC50 scale (−log IC50) to narrow down the range. Thus, a higher pIC50 value indicates a more potent compound. These values were then manually entered into VLIFEMDS-QSARPro 31.
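The conversion is a one-line calculation, shown here as a small helper. One assumption is made explicit: the IC50 is converted from μM to molar units before taking the negative logarithm, which is the usual convention but is not stated in the text. The function name is hypothetical.

```python
import math

def pic50_from_micromolar(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L).

    Assumes the molar convention: -log10(x * 1e-6) = 6 - log10(x).
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)

# A tenfold drop in IC50 raises pIC50 by one unit
weak = pic50_from_micromolar(10.0)
strong = pic50_from_micromolar(1.0)
```

If the logarithm were instead taken directly on the μM values, every pIC50 would be shifted by a constant of 6, which would change only the intercept of a linear QSAR model and not its descriptor coefficients.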
ANNEXURE 1: SPECIFICATION OF TRAINING SET DATA FOR 2D-QSAR
|Quercetin Training Set||Kaempferol Training Set|
|Sr. no.||Accession id (CID)||IC50 (μM)||pIC50||Accession id (CID)||IC50 (μM)||pIC50|
|Note: CID: PubChem compound identification|
The variable selection method chosen for the analysis of quercetin analogues was simulated annealing, which simulates the physical process of annealing: the system is heated to a high temperature and then gradually cooled to a preset temperature (e.g., room temperature). During this process, the system samples possible configurations distributed according to the Boltzmann distribution, so that at equilibrium low-energy states are the most populated 33. The variable selection method chosen for kaempferol analogues was one of the systematic methods, stepwise forward-backward, because simulated annealing was not able to produce acceptable model parameters for these analogues.
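A generic sketch of simulated annealing applied to descriptor selection is given below. It is not the implementation used by VLIFEMDS-QSARPro, which the source does not describe: the cooling schedule, the swap move and the toy scoring function (a sum of made-up per-descriptor relevance values standing in for the r2 or q2 of a fitted model) are all assumptions for illustration.

```python
import math
import random

def anneal_descriptor_subset(n_descriptors, subset_size, score,
                             t_start=1.0, t_end=0.01, cooling=0.95,
                             moves_per_temperature=20, seed=0):
    """Search for a high-scoring descriptor subset by simulated annealing."""
    rng = random.Random(seed)
    current = rng.sample(range(n_descriptors), subset_size)
    current_score = score(current)
    best, best_score = list(current), current_score
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temperature):
            # Perturbation: swap one selected descriptor for an unselected one
            candidate = list(current)
            outside = [d for d in range(n_descriptors) if d not in current]
            candidate[rng.randrange(subset_size)] = rng.choice(outside)
            candidate_score = score(candidate)
            delta = candidate_score - current_score
            # Metropolis rule: always accept improvements, and accept worse
            # subsets with Boltzmann probability exp(delta / temperature)
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = list(current), current_score
        temperature *= cooling   # gradual cooling
    return sorted(best), best_score

# Toy score: hypothetical relevance of each of 8 descriptors
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.0, 0.3]

def toy_score(subset):
    return sum(relevance[d] for d in subset)

best, best_score = anneal_descriptor_subset(8, 3, toy_score, seed=1)
```

At high temperature almost any swap is accepted, so the search explores widely; as the temperature falls, worse subsets are accepted less and less often and the search settles on a high-scoring combination.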
For QSAR analysis, the test and training sets used for generation of the QSAR models of quercetin and kaempferol were taken in a ratio of 1:3. For each flavonoid, the dataset of 15 newly designed analogues was taken as the test set.
A dataset of 45 structures of already available compounds similar to quercetin and kaempferol was taken as the training set, along with their biological activity (IC50) values 41 from the PubChem database 42.
The 2D descriptors, covering thermodynamic, spatial and electronic parameters, were calculated, which resulted in 95 descriptors for quercetin and 100 for kaempferol in the data matrix. Among the variable selection methods, simulated annealing was chosen and coupled with the three statistical model building methods MLR, PCR and PLS to develop the final QSAR models 28, 34. The cross-correlation limit was set at 1.000000 and the number of variables in the final equation at 10. For the model building methods MLR, PCR and PLS, the term selection criterion was r2, with F-test ‘in’ at 4.000000 and ‘out’ at 3.990000. The variance cutoff was set at 0.0, scaling to auto scaling and the number of random iterations to 100.
2D QSAR Model Development and its Validation: In silico QSAR analysis using VLIFE-QSARPro software was carried out by dividing the dataset into training and test sets; for both quercetin and kaempferol the test set comprised 15 structures and the training set 45 structures. Chemical structures of both parent flavonoids, quercetin and kaempferol, are shown in Fig. 1:
FIG. 1: PARENT CHEMICAL STRUCTURE OF 1(A) QUERCETIN, 1(B) KAEMPFEROL 42
Annexure 1 shows the training data set used for 2D QSAR analysis of the new quercetin and kaempferol analogues; the compounds were retrieved from the PubChem database and their IC50 values converted to pIC50 values.
The substituent groups introduced at positions of the parent structure of quercetin are presented as the test set in Table 1, whereas those introduced in the parent structure of kaempferol are shown as the test set in Table 2.
TABLE 1: SPECIFICATION FOR TEST SET OF QUERCETIN ANALOGUES
|Positions and substitutions in parent flavonoid structure of quercetin as test set|
|Quercetin Analogues Name||R1||R2||R3||R4||R5||R6|
Note: (-) indicates no substitution in that position.
Table 3 shows the best QSAR models predicted by VLIFE-QSARPro for the substituted series of analogues of quercetin using the simulated annealing variable selection method coupled with the three model building methods PLS, MLR and PCR. Similarly, Table 4 shows the QSAR models developed for analogues of kaempferol using the systematic variable selection method coupled with the same model building methods.
TABLE 2: SPECIFICATION FOR TEST SET OF KAEMPFEROL ANALOGUES
|Positions and substitutions in parent flavonoid structure of kaempferol as test set|
|Kaempferol Analogues Name||R1||R2||R3|
TABLE 3: 2D-QSAR MODELS PREDICTED FOR QUERCETIN ANALOGUES
|Model building Method||MLR||PLS||PCR|
Note: % denotes the ratio of test set versus training set data.
TABLE 4: 2D - QSAR MODELS PREDICTED FOR KAEMPFEROL ANALOGUES
|Model building Method||MLR||PLS||PCR|
Note: % denotes the ratio of test set versus training set data.
TABLE 5: OTHER STATISTICAL REGRESSION PARAMETERS OBTAINED FOR QUERCETIN ANALOGUES BEST MODEL (Q4) AND FOR KAEMPFEROL ANALOGUES BEST MODEL (K2)
|Model building method||PLS||MLR|
|Z score _ran_ r2||4.7648||7.52232|
Key: % = ratio of test set versus training set data, MLR = multiple linear regression, PLS = partial least squares, N = number of molecules of training set, Df = degree of freedom, r2 = coefficient of determination, q2 = cross-validated r2, pred_r2 = r2 for external test set, Z score = the Z score calculated by q2 in the randomization test, best_ran_q2 = the highest q2 value in the randomization test and a_ran_q2 = the statistical significance parameter obtained by the randomization test.
From the data obtained and shown in the above two tables, models Q4 and K2 were selected as promising models, with the best cross-validated squared correlation coefficient q2 > 0.3 for the training set and r2 > 0.6 for the test set, which shows the accuracy of the statistical calculation. All other statistical regression parameters were obtained using the simulated annealing variable selection method in combination with the PLS model building method for model Q4 of quercetin analogues. Similarly, parameters were retrieved for model K2 of kaempferol analogues using the systematic variable selection method coupled with MLR. These results are shown in Table 5. The Q4 and K2 models fulfil the selection criteria: a correlation coefficient r2 > 0.6 for anti-inflammatory activity, together with a low standard error of the squared correlation coefficient (r2_se < 0.3), shows the relatively good fitness of the model, and an F value more than 11 times the tabulated F value shows 99% statistical significance of the regression model. These models fulfil all validation criteria, with a low standard error of the cross-validated squared correlation coefficient (q2_se < 0.3) and a standard error of pred_r2 (pred_r2se) < 0.1. The randomization test suggests that the developed models have a probability of less than 1 percent of having been generated by chance 43, 44.
Quercetin: Model Q4: It gave a correlation coefficient r2 of 0.9021, significant cross validated correlation coefficient q2 of 0.5791, F test of 48.4013 and degree of freedom 21. The model is validated by best_ran_r2 = 0.67502, best_ran_q2 = 0.17430, Z score_ran_r2 = 4.7648 and Z score_ran_q2 = 2.24988. Significance of model is shown by equations developed from analysis. Statistical data has been depicted in Table 5.
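Equation for model Q4:
$$\text{IC50} = 0.1421\,(\text{XlogP}) - 0.0087\,(\text{5 Path Count}) - 0.1223\,(\text{SssO count}) - 0.0030\,(\text{2 Path Count}) + 0.0165\,(\text{0 Path Count}) - 1.3120\,(\text{SsBr count}) - 1.9669\,(\text{Sulfurs Count}) + 0.0797\,(\text{SdOE-index}) + 0.1412\,(\text{SdsCH count}) - 0.0812\,(\text{chiV3}) + 4.3399 \qquad \text{(v)}$$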
Kaempferol: Model K2: The model K2 gave a correlation coefficient r2 of 0.687, a significant cross-validated correlation coefficient q2 of 0.5676, an F test value of 14.2654 and 26 degrees of freedom. The model is validated by best_ran_r2 = 0.40784, best_ran_q2 = 0.24601, Zscore_ran_r2 = 7.52232 and Zscore_ran_q2 = -1.33145. Statistical data is shown in Table 5.
TABLE 6: PREDICTED BIOLOGICAL ACTIVITIES FOR QUERCETIN AND KAEMPFEROL ANALOGUES
|Analogues||Predicted IC50 (μM)||Analogues||Predicted IC50 (μM)|
Note: Q: quercetin analogues, K: kaempferol analogues; Predicted biological activities (IC50) of analogues in model quercetin (Q4) and kaempferol (K2).
Table 6 shows predicted biological activities of new analogues of Quercetin in model Q4 and Kaempferol analogues in model K2.
Equation for Model K2:
$$\text{IC50} = -0.2611\,(\text{XlogP}) + 0.3320\,(\text{SdssCE-index}) - 0.4438\,(\text{SsssCHE-index}) - 0.1215\,(\text{SaasC count}) + 6.3433 \qquad \text{(vi)}$$
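Equations (v) and (vi) are plain linear combinations of descriptor values, so the predicted activity of an analogue can be computed once its descriptors are known. The sketch below encodes the published coefficients; the dictionary layout, the function name and the descriptor values in the usage example are hypothetical, and the returned number is simply the right-hand side of the equation, which the source labels IC50.

```python
Q4 = {
    "intercept": 4.3399,
    "coefficients": {
        "XlogP": 0.1421,
        "5 Path Count": -0.0087,
        "SssO count": -0.1223,
        "2 Path Count": -0.0030,
        "0 Path Count": 0.0165,
        "SsBr count": -1.3120,
        "Sulfurs Count": -1.9669,
        "SdOE-index": 0.0797,
        "SdsCH count": 0.1412,
        "chiV3": -0.0812,
    },
}

K2 = {
    "intercept": 6.3433,
    "coefficients": {
        "XlogP": -0.2611,
        "SdssCE-index": 0.3320,
        "SsssCHE-index": -0.4438,
        "SaasC count": -0.1215,
    },
}

def predict_activity(model, descriptors):
    """Evaluate intercept + sum(coefficient * descriptor value)."""
    missing = set(model["coefficients"]) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return model["intercept"] + sum(
        coef * descriptors[name] for name, coef in model["coefficients"].items()
    )

# Hypothetical descriptor values for one kaempferol analogue (not from the study)
example = {"XlogP": 2.0, "SdssCE-index": 1.0, "SsssCHE-index": 0.0, "SaasC count": 5}
k2_value = predict_activity(K2, example)
```

Keeping the coefficients in a dictionary keyed by descriptor name makes the sign of each contribution easy to inspect: in model K2 only the SdssCE-index has a positive coefficient.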
The above two equations (v, vi) have led to the development of statistically significant QSAR models of quercetin and kaempferol analogues. In addition, this 2D QSAR study allowed investigation of the influence of very simple and easy-to-compute descriptors on biological activity, which could shed light on the key factors that may aid in the design of novel potent molecules. All the parameters in the generated models that may potentially contribute to the specific antipsoriatic inhibitory activity are discussed below, along with their importance. The fitness plots of observed vs. predicted activity for the training and test set data are shown in Fig. 2A for quercetin model Q4 and in Fig. 2B for kaempferol model K2. Actual vs. predicted activities for the training and test set data of quercetin analogues, predicted by the PLS model building method, are shown in Fig. 3A. Similarly, the activities of kaempferol analogues predicted by the MLR model building method are shown in Fig. 3B.
FIG. 2A: FITNESS PLOT OF QUERCETIN MODEL Q4
Key: The training set is represented by red dots and test set by blue dots
FIG. 2B: FITNESS PLOT OF KAEMPFEROL MODEL K2
Key: The training set is represented by red dots and test set by blue dots
FIG. 3A: PLOT OF QUERCETIN MODEL Q4
FIG. 3B: PLOT OF KAEMPFEROL MODEL K2
Key: Training set- Red dots and Test Set- Blue dots.
Contributing Parameters: The important parameters contributing to the specific biological activity in the generated quercetin and kaempferol models, together with the percentage contribution of each descriptor to the explained variation in biological activity, are shown in Fig. 4A and 4B.
FIG. 4A: PERCENTAGE CONTRIBUTION PLOT OF EACH DESCRIPTOR IN DEVELOPED MODEL EXPLAINING VARIATION IN THE BIOLOGICAL ACTIVITY OF QUERCETIN MODEL Q4
FIG. 4B: PERCENTAGE CONTRIBUTION PLOT OF EACH DESCRIPTOR IN DEVELOPED MODEL EXPLAINING VARIATION IN THE BIOLOGICAL ACTIVITY OF KAEMPFEROL MODEL K2
The noteworthy observations from the contribution plot (Fig. 4A) for quercetin analogues are as follows:
- XlogP = This descriptor signifies the logarithm of the ratio of solute concentration in octanol and water, generally termed the octanol-water partition coefficient. This atom-based evaluation of logP contributed positively (+10 %) towards the activity of the structures.
- 5 Path Count = This descriptor signifies the total number of fragments of fifth order (five-bond paths) in a compound. It contributed negatively (-10 %) towards the activity of the structures.
- SssO count = This descriptor defines the total number of oxygen atoms connected with two single bonds. It contributed negatively (-7 %) towards the activity of the structures.
- 2 Path Count = This descriptor signifies the total number of fragments of second order (two-bond paths) in a compound. It contributed negatively (-1 %) towards the activity of the structures.
- 0 Path Count = This descriptor signifies the total number of fragments of zero order (atoms) in a compound. It contributed positively (+3 %) towards the activity of the structures.
- SsBr count = This descriptor defines the total number of bromine atoms connected with one single bond. It contributed negatively (-10 %) towards the activity of the structures.
- Sulfurs Count = This descriptor signifies the number of sulphur atoms in a compound. It contributed negatively (-20 %) towards the activity of the structures.
- SdOE-index = Electrotopological state index for oxygen atoms connected with one double bond. It contributed positively (+26 %) towards the activity of the structures.
- SdsCH count = This descriptor defines the total number of –CH groups connected with one double and one single bond. It contributed positively (+9 %) towards the activity of the structures.
- chiV3 = This descriptor signifies the atomic valence connectivity index (order 3). It contributed negatively (-4 %) towards the activity of the structures.
Similarly, the noteworthy observations from the contribution plot (Fig. 4B) for kaempferol analogues are as follows:
- XlogP = This descriptor signifies the logarithm of the ratio of solute concentration in octanol and water, generally termed the octanol-water partition coefficient. This atom-based evaluation of logP contributed negatively (-30 %) towards the activity of the structures.
- SdssCE-index = Electrotopological state index for carbon atoms connected with one double and two single bonds. It contributed positively (+25 %) towards the activity of the structures.
- SsssCHE-index = Electrotopological state index for –CH groups connected with three single bonds. It contributed negatively (-27 %) towards the activity of the structures.
- SaasC count = This descriptor defines the total number of carbon atoms connected with one single bond along with two aromatic bonds. It contributed negatively (-15 %) towards the activity of the structures.
The predicted biological activities (IC50) of the test set molecules of quercetin and kaempferol are presented in Table 6 as the best predicted model values. These values were obtained from the output file of the VLIFE-QSARPro software.
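The contribution values listed above can be tabulated alongside the regression coefficients of equations (v) and (vi) for a quick consistency check. The sketch below ranks descriptors by the magnitude of their reported contribution, sums the absolute contributions, and lists any descriptor whose contribution sign disagrees with its coefficient sign. The helper names are illustrative, and the code makes no assumption about how VLIFE-QSARPro derives the percentages.

```python
# (coefficient, reported contribution in %) per descriptor.
Q4 = {
    "XlogP": (0.1421, 10),
    "5 Path Count": (-0.0087, -10),
    "SssO count": (-0.1223, -7),
    "2 Path Count": (-0.0030, -1),
    "0 Path Count": (0.0165, 3),
    "SsBr count": (-1.3120, -10),
    "Sulfurs Count": (-1.9669, -20),
    "SdOE-index": (0.0797, 26),
    "SdsCH count": (0.1412, 9),
    "chiV3": (-0.0812, -4),
}
K2 = {
    "XlogP": (-0.2611, -30),
    "SdssCE-index": (0.3320, 25),
    "SsssCHE-index": (-0.4438, -27),
    "SaasC count": (-0.1215, -15),
}


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def summarize(model):
    """Rank descriptors by |contribution| and flag sign disagreements."""
    ranked = sorted(model, key=lambda name: abs(model[name][1]), reverse=True)
    total = sum(abs(pct) for _, pct in model.values())
    mismatches = [n for n, (coef, pct) in model.items() if sign(coef) != sign(pct)]
    return ranked, total, mismatches


for label, model in (("Q4", Q4), ("K2", K2)):
    ranked, total, mismatches = summarize(model)
    print(label, "top descriptor:", ranked[0], "| sum of |%|:", total, "| sign mismatches:", mismatches)
```

The absolute contributions for Q4 sum to 100, whereas those for K2 sum to 97; the shortfall may reflect rounding of the values read from the plot.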
DISCUSSION: The 2D QSAR analysis was performed using VLIFE-QSARPro software. For each flavonoid, 15 newly designed substituted analogues (calcineurin protein inhibitors) were taken as the test set and 45 similar compounds from the PubChem database were used as the training set. The QSAR analysis revealed useful information about the structural features of the novel analogues by calculating descriptors such as logP, molecular weight, and hydrogen bond donors and acceptors, and by predicting which descriptors favor the activity of the structures.
Firstly, we tried to develop models for both flavonoid analogue series with the systematic variable selection method (i.e. forward-backward) in combination with model building methods such as MLR, PLS and PCR, but the attempt was not successful for quercetin analogues: the resulting quercetin QSAR models could not be validated (data not shown). Therefore, the simulated annealing variable selection method was used for quercetin analogues in combination with MLR, PLS and PCR, which gave validated QSAR models. In this analysis, a total of 9 QSAR models were obtained for quercetin and kaempferol. The PLS method showed promising results for quercetin with model Q4, whereas for kaempferol analogues model K2, obtained by the MLR method coupled with the systematic variable selection method, gave significant results.
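To illustrate the general idea of simulated annealing variable selection, the sketch below searches for a fixed-size descriptor subset that maximizes the r2 of an MLR fit. It is not the VLIFE-QSARPro implementation: the scoring function (training r2), the swap move, the geometric cooling schedule, the parameter defaults and the synthetic data are all assumptions made for illustration, and a cross-validated q2 could be substituted as the score.

```python
import math
import random

import numpy as np


def fit_r2(X, y):
    """r2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))


def anneal_select(X, y, k, steps=300, t_start=1.0, t_end=0.01, seed=0):
    """Pick k descriptor columns by simulated annealing on the r2 score."""
    n_desc = X.shape[1]
    if not 0 < k < n_desc or steps < 2:
        raise ValueError("need 0 < k < number of descriptors and steps >= 2")
    rng = random.Random(seed)
    current = rng.sample(range(n_desc), k)
    current_score = fit_r2(X[:, current], y)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose swapping one selected descriptor for an unselected one.
        candidate = list(current)
        outside = [j for j in range(n_desc) if j not in current]
        candidate[rng.randrange(k)] = rng.choice(outside)
        score = fit_r2(X[:, candidate], y)
        # Metropolis rule: always accept improvements, sometimes accept worse subsets.
        if score >= current_score or rng.random() < math.exp((score - current_score) / temp):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
    return sorted(best), best_score


# Synthetic data: 30 compounds, 8 descriptors, activity driven by columns 1 and 5.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(30, 8))
y = 2.0 * X[:, 1] - 1.5 * X[:, 5] + data_rng.normal(scale=0.1, size=30)
selected, score = anneal_select(X, y, k=2)
print(selected, round(score, 3))
```

Accepting occasional worse subsets at high temperature is what lets the search escape local optima that can trap a purely stepwise forward-backward procedure.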
In the above QSAR models, r2 is the squared correlation coefficient; multiplied by 100, it gives the percentage of variance in biological activity explained by the model, and in these models r2 was > 0.6. The predictive ability of the generated QSAR models was evaluated by q2 using the leave-one-out (LOO) method. The F-test value reflects the ratio of the variance explained by the model to the variance due to error in the regression; a high F-test value indicates that the model is statistically significant. A cross-validated q2 > 0.3 indicates good internal predictive power of the model. Another parameter, used for prediction of the test set analogues, is pred_r2; a value > 0.4 shows good external predictive power of the model 45.
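The three statistics discussed here can be made concrete with a short sketch for an MLR model. It uses commonly cited definitions (q2 from leave-one-out predictions, pred_r2 referenced to the training set mean); whether VLIFE-QSARPro uses exactly these formulas is an assumption. The data are synthetic and only mirror the 45/15 training/test split used in this study.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2_fit(X, y):
    """Fraction of training variance explained by the fitted model."""
    resid = y - predict(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def pred_r2(X_train, y_train, X_test, y_test):
    """External predictivity, referenced to the training set mean."""
    resid = y_test - predict(fit_mlr(X_train, y_train), X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)


# Synthetic descriptors and activities (illustration only, not study data).
rng = np.random.default_rng(1)
X_all = rng.normal(size=(60, 4))
y_all = 5.0 + X_all @ np.array([0.3, -0.4, 0.2, -0.1]) + rng.normal(scale=0.05, size=60)
X_tr, y_tr, X_te, y_te = X_all[:45], y_all[:45], X_all[45:], y_all[45:]

stats = {
    "r2": r2_fit(X_tr, y_tr),
    "q2": q2_loo(X_tr, y_tr),
    "pred_r2": pred_r2(X_tr, y_tr, X_te, y_te),
}
# Thresholds quoted in the text: r2 > 0.6, q2 > 0.3, pred_r2 > 0.4.
passes = stats["r2"] > 0.6 and stats["q2"] > 0.3 and stats["pred_r2"] > 0.4
print({name: round(float(value), 3) for name, value in stats.items()}, passes)
```

For a least-squares fit, q2 can never exceed r2 on the same training data, so a large gap between the two is a warning sign of overfitting.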
The best quercetin model, Q4, revealed that the SdOE-index (oxygen atoms) and SdsCH count (CH groups) favor the activity of quercetin analogues (equation v). Mahesh et al., (2011) reported QSAR studies on fluoroquinolones as DNA gyrase inhibitors, which revealed that, of the three optimized models, the MLR method gave significant results. The E-state contribution, chi, path cluster and alignment-independent descriptors were the major contributors 30.
Sharma (2015) employed a similar QSAR analysis to study the quantitative effects of the molecular structure of benzimidazoles on their activity as inhibitors of the IgE response. In that model, special emphasis was given to the contribution of electrotopological indices in predicting the biological activity of 2-phenyl-benzimidazole derivatives; they were found to improve the QSAR model and make it more precisely predictive 35.
In our case, the best kaempferol model, K2, gave very significant results and revealed that the SdssCE-index (carbon atom) favors the activity of the analogues (equation vi). Doreswamy et al., (2014) conducted a similar study on a series of sulfathiazole derivatives as Mycobacterium tuberculosis (H37Rv) inhibitors and obtained useful predictive molecular models in which chi2 and SdsN count contribute positively to the activity of the sulfathiazole derivatives. This suggests that changing chi2 and SdsN count will be helpful in designing more potent H37Rv inhibitors 27.
B Bertosa et al., (2012) conducted 2D-QSAR studies of substituted pyrazolone derivatives as anti-inflammatory agents and showed that their model fulfils selection criteria such as a correlation coefficient r2 > 0.8 for anti-inflammatory activity; a low standard error of the squared correlation coefficient (r2_se < 0.3), showing the relatively good fitness of the model; and an F value more than 11 times the tabulated F value, showing 99% statistical significance of the regression model. Two descriptors, chi2 and SdsN count, contribute positively to the models 44, 45.
CONCLUSION: The novel substituted series of quercetin and kaempferol analogues were subjected to QSAR analysis. For quercetin, the best model was Q4, built using the PLS method, which showed significant predictive power and reliability compared with the other two methods. Similarly, for kaempferol analogues, model K2 gave significant results with the MLR method coupled with the systematic variable selection method. It is anticipated that the present study may prove helpful in the development and optimization of newly designed antipsoriatic agents. Hence, the models proposed in this work are expected to be useful for designing new analogues of flavonoids with specific antipsoriatic inhibitory activity.
ACKNOWLEDGEMENT: We are thankful to Ms. Yogini Dixit and Mr. Aseem Wagle, G. N. Khalsa College, Matunga, Mumbai, for their scientific contributions during the tenure of the project. We express gratitude to Mr. Elvis Martis of Bombay College of Pharmacy for his support during the research work.
CONFLICT OF INTEREST: There is no conflict of interest.
REFERENCES:
- Sieber M and Baumgrass R: Novel inhibitors of the calcineurin / NFATc hub-alternatives to CsA and FK506? Cell Communication and Signaling 2009; 7: 25.
- Lowes MA, Suarez-Farinas M and Krueger JG: Immunology of psoriasis. Annual Review of Immunology 2014; 21: 227-255.
- Sun L and Zhang X: The immunological and genetic aspects in psoriasis. Applied Informatics springer 2014; 1: 1-3.
- Eberle FC, Bruck J, Holstein J, Hirahara K and Ghoreschi K: Recent advances in understanding psoriasis. F1000 Research 2016; 5: 1-9.
- Flatz L and Conrad C: Review on role of T-cell-mediated inflammation in psoriasis: pathogenesis and targeted therapy. Psoriasis: Targets and Therapy 2013; 3: 1-10.
- Varadwaj PK, Sharma A and Kumar R: An overview of psoriasis with respect to its protein targets. Egyptian Dermatology Online Journal 2010; 6: 1.
- Chen XE and Zhang Y: Molecular Cloning and Characterization of the Calcineurin Subunit A from Plutella xylostella. International Journal of Molecular Sciences 2013; 10: 692-703.
- Tedesco D and Haragsim L: Cyclosporine: a review. Journal of transplantation 2012; 1-7.
- Cai Y, Fleming C and Yan J: New insights of T cells in the pathogenesis of psoriasis. Cellular and Molecular Immunology 2012; 9: 302-309.
- Thomson AW: A text book of Cyclosporin: Mode of Action and Clinical Applications. Springer Science and Business Media, first edition 2012.
- Dubois DS and Pouliot R: Promising new treatments for psoriasis. The Scientific World Journal 2013; 1-9.
- Khandpur S and Bhari N: Newer targeted therapies in psoriasis. Indian Journal of Dermatology, Venereology and Leprology 2013; 7: 47-52.
- Singh KK and Tripathy S: Natural Treatment Alternative for Psoriasis: A Review on Herbal Resources. Journal of Applied Pharmaceutical Science 2014; 4: 114-121.
- Kumar S and Pandey AK: Chemistry and biological activities of flavonoids: An overview. The Scientific World Journal 2013; 1-16.
- Herman A and Herman AP: Topically used herbal products for the treatment of psoriasis–mechanism of action, drug delivery, clinical studies. Journal of Planta Medica 2016; 17: 1447-1455.
- Leyva-Lopez N, Gutierrez-Grijalva EP, Ambriz-Perez DL and Heredia JB: Flavonoids as cytokine modulators: a possible therapy for inflammation-related diseases. International Journal of Molecular Sciences 2016; 6: 3-15.
- Zhou CL, Lei H, Zhang DS, Zheng J and Wei Q: Kaempferol: A New Immunosuppressant of Calcineurin. International Union of Biochemistry and Molecular Biology Life 2008; 60: 549–554.
- Gangrade D, Sawant G and Mehta A: Re-thinking drug discovery: In silico method. Journal of Chemical and Pharmaceutical Research 2016; 8: 1092-1099.
- Lionta E, Spyrou G, K Vassilatis D and Cournia Z: Structure-based virtual screening for drug discovery: principles, applications and recent advances. Current Topics in Medicinal Chemistry 2014; 14: 1923-1938.
- Click 2 Drug: directory of computer-aided drug design tools. Available at: http://www.click2drug.org/
- Maestro 11: A drug designing suite of Schrödinger: Available at https://www.Schrodinger.com/maestro 2005.
- Srinivasan P, Perumal CP and Sudha A: Discovery of novel inhibitors for nek6 protein through homology model assisted structure based virtual screening and molecular docking approaches. Scientific World Journal 2014; 1-9.
- Asirvatham S, Dhokchawle BV and Tauro SJ: Quantitative structure activity relationships studies of non-steroidal anti-inflammatory drugs: A review. Arabian Journal of Chemistry 2016; 03: 1-2.
- VLIFEMDS: Integrated platform for computer aided drug design (CADD). Available at http://www.vlifesciences.com/products/vlifemds/product.vlifemds.phph 2008.
- Shanno Pathan SM and Shrivastava M: Quantitative structure activity relationship and drug design: A Review. International Journal of Research in Biosciences 2016; 5: 1-5.
- Parekh B: QSAR modeling for drug discovery and development: Applications and methodology in computer science. International Journal of Scientific Research 2015; 4: 2277 - 8179.
- Vastrad CM: Predictive comparative QSAR analysis of sulfathiazole analogues as mycobacterium tuberculosis h37rv inhibitors. Journal of Advanced Bioinformatics Applications and Research 2014; 3: 379-390.
- Kulkarni S, Patil P, Virupaksha B, Alpana G, Prashant K and Baikerikar S: Molecular dynamics, docking and QSAR analysis of napthoquinone derivatives as topoisomerase I inhibitors. International Journal of Computational Bioinformatics in Silico 2013; 2: 223-233.
- Todeschini R and Consonni V: Handbook of Molecular Descriptors, Wiley, First Edition 2000.
- Palkar MB, Noolvi MN, Patel HM, Maddi VS and Nargund LVG: 2D-QSAR study of fluoroquinolone derivatives: an approach to design anti-tubercular agents. International Journal of Drug Design and Discovery 2011; 3: 559-574.
- Jerzy L: Handbook of Computational Chemistry. Springer Science and Business Media, First Edition 2011.
- Antre RV, Oswal RJ, Kshirsagar SS, Kore PP, Mutha MM and Rishikesh V: 2D-QSAR studies of substituted pyrazolone derivatives as anti-inflammatory agents. Medicinal Chemistry 2012; 2: 126-130.
- Guyon I and Elisseeff A: An introduction to variable and feature selection. Journal of Machine Learning Research 2003; 3: 1157-82.
- Kumar SVSA and Gupta SP: A QSAR study on some series of ATP-sensitive potassium channel openers. Letters in Drug Design & Discovery 2008; 5: 173-177.
- Sharma MC: Molecular Modeling Studies of Some Substituted 2-Phenyl-benzimidazole Derivatives as Inhibitors of IgE Response. Alternative and Integrative Medicine 2015; 4: 1-9.
- Croux C and Joossens K: Influence of observations on the misclassification probability in quadratic discriminant analysis. Journal of Multivariate Analysis 2005; 96: 384-403.
- Hwang JG and Nettleton D: Principal components regression with data chosen components and related methods. Technometrics 2003; 45: 70-9.
- Abdi H: Partial least squares regression and projection on latent structure regression. Wiley Interdisciplinary Reviews: Computational Statistics 2010; 2: 97-106.
- Belinfante A and Coxe KL: Principal components regression–selection rules and application. Journal of American Economic Review 1986; 20: 429-431.
- Balajee R and Rajan MD: Molecular docking and simulation studies of farnesyl trasnferase with the potential inhibitor the flavin. Journal of Applied Pharmaceutical Science 2011; 1: 141.
- Noolvi MN and Patel HM: A comparative QSAR analysis and molecular docking studies of quinazoline derivatives as tyrosine kinase (EGFR) inhibitors: A rational approach to anticancer drug design. Journal of Saudi Chemical Society 2013; 17: 361-379.
- Materska M: Quercetin and its derivatives: chemical structure and bioactivity-a review. Polish Journal of Food and Nutrition Sciences 2008; 58: 407-413.
- Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP and Agrawal RK: Validation of QSAR models-strategies and importance. International Journal of Drug Design and Discovery 2011; 3: 511-519.
- PubChem Project: A database for chemical compounds and substances. Available at: https://pubchem.ncbi.nlm.nih.gov/ 2004.
- Vujasinovic I, Paravic-Radicevic A, Mlinaric-Majerski K, Brajsa K and Bertosa B: Synthesis and biological validation of novel pyrazole derivatives with anticancer activity guided by 3D-QSAR analysis. Bioorganic and Medicinal Chemistry 2012; 20: 2101-2110.
How to cite this article:
Sharma PK and Vakil BV: Predictive QSAR analysis of flavonoid analogues as antipsoriatic agents. Int J Pharm Sci Res 2017; 8(12): 5146-60. doi: 10.13040/IJPSR.0975-8232.8(12).5146-60.
All © 2013 are reserved by International Journal of Pharmaceutical Sciences and Research. This Journal licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
08 April, 2017
14 June, 2017
17 September, 2017
01 December, 2017