INHIBPRED: A WEB SERVER FOR PREDICTING INHIBITORY ACTIVITY OF MOLECULES AGAINST HUMAN HDAC6 PROTEIN
S. Vijayasarathy * and J. Chatterjee
Department of Biotechnology, PES Institute of Technology, Bangalore - 560085, Karnataka, India.
ABSTRACT: Oral squamous cell carcinoma (OSCC) is the most common malignancy of the oral cavity. Over the years, the incidence and morbidity of oral cancer have increased worldwide. Although treatment modalities have advanced over the last few decades, the survival rate of oral cancer patients remains very low. There is therefore a need to identify new drug targets and to develop new, effective drugs for this disease. This paper describes the development of Quantitative Structure-Activity Relationship (QSAR) models using machine learning techniques, namely Multiple Linear Regression (MLR), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), for predicting the inhibitory activity (IC50) of anti-cancer compounds against the HDAC6 protein. After data pre-processing and selection of relevant features, predictive models were developed, and the two best-performing models were selected based on the squared correlation coefficient (r2) and Mean Absolute Error (MAE). Based on this study, an open-source platform (www.inhibpred.com) has been developed to facilitate the identification of promising leads against this disease. This free online resource will enable the drug discovery community to evaluate the potential of their compound libraries prior to actual synthesis and experimental testing, thus saving time, labour, and expense.
Keywords: Oral Squamous Cell Carcinoma, HDAC inhibitors, Quantitative Structure-Activity Relationship, Machine Learning
INTRODUCTION: Oral Cancer (OC) is a subdivision of head and neck cancer 1. It is the sixth most common malignancy in Asia, with more than 274,300 new cases every year 2. India has the largest number of oral cancer cases, accounting for one-third of the total burden of oral cancer in the world 3. In India, around 77,000 new cases and 52,000 deaths are reported every year, which is almost one-fourth of global occurrences 4.
Compared with western countries, the burden of oral cancer is significantly higher in India. Around 70% of the cases are reported in the advanced stages. Because of late detection and patient negligence, the chances of cure are very low, with a five-year survival rate of about 20% 5. Oral squamous cell carcinoma (OSCC) accounts for 92-95% of all types of oral cancer. It can occur at various sites within the oral cavity, such as the lips, tongue, buccal mucosa, floor of the mouth, palate, gingiva, and oropharynx 6.
The risk factors include excessive use of tobacco and alcohol, chronic inflammations, infection by human papillomavirus, betel quid chewing, poor oral hygiene, family history of Oral Cancer, and genetic predisposition.
The constant use of tobacco in many forms like cigarettes, gutka, hookah, mawa, zarda, kharra, khaini, bidi, etc., is the main cause of tumor growth in the oral cavity among the Indian population 7.
Potentially malignant disorders (PMDs) such as erythroplakia, leukoplakia, oral submucous fibrosis, dyskeratosis congenita, and lichen planus are some of the indicators of the preclinical phase of oral cancer; accurate and timely diagnosis is therefore very important 8. Over the years, the incidence and morbidity of oral cancer have increased continuously worldwide.
Although oral cancer treatment modalities have advanced over the last few decades, the survival rate of oral cancer patients is yet to improve significantly 9, 10. Recurrent treatment failures point to the need to identify new targets and to develop novel, effective drugs for this disease.
Histone deacetylases (HDACs) are histone-modifying enzymes involved in transcriptional repression via removal of acetyl groups from target histone molecules 11. Histone deacetylases are a promising class of anti-cancer drug targets that are capable of reversing abnormal epigenetic states related to cancer. Inhibition of HDACs is emerging as an important strategy in human cancer therapy, and HDAC inhibitors (HDACIs) enable histones to maintain a high degree of acetylation 12. Cell proliferation, apoptosis, metastasis, invasion, and mitosis are some of the cell type-specific effects that they elicit 13.
Histone deacetylase isoform 6 (HDAC6) in humans plays an imperative role in cell motility and aggresome formation. Most of the HDAC inhibitors in the clinical or preclinical stage are non-selective, have hydroxamate zinc-binding groups, and exhibit mutagenicity and off-target effects.
The identification of selective HDAC6 inhibitors with novel structural features or chemical properties has not yet been successful, and the absence of crystallographic structural information makes the rational design of HDAC6-selective inhibitors more challenging 14. Studies show that HDACs play an important role in oral squamous cell carcinoma.
As histone deacetylases are over-expressed in cancer cells, inhibition of HDACs can be useful in inducing growth arrest and differentiation of tumor cells or in specifically promoting apoptosis. For example, HDAC6 deacetylates alpha-tubulin and increases cell motility. Over-expression of HDAC6 is associated with tumor growth and a higher rate of cell survival. Therefore, this enzyme could be considered a marker for oral cancer prognosis 15. According to various studies, inhibition of HDAC6 has resulted in apoptosis, as seen in several myeloma cells. Further, HDAC6 is known to contribute to cancer metastasis: upregulation of this enzyme can increase the motility of breast cancer cells, and the enzyme also interacts with cortactin.
Transcription and translation processes are also affected by HDAC6 through regulation of stress granules and heat-shock protein 90 (HSP 90), respectively 16. This enzyme was also found to be upregulated in OSCC, with a further increase in advanced stages of cancer. Selective inhibition of the HDAC6 enzyme is thus a plausible approach to the treatment of oral cancer, and HDAC6 was therefore selected for this study. Drug discovery through experimental screening of an enormous number of compounds is an intricate, expensive, and laborious process 17. The use of computational models such as Quantitative Structure-Activity Relationship (QSAR) predictive models could therefore be an alternative approach.
Significant insights into the rational design of novel and potent compounds can be gained by studying the relationships between the bioactivities of several similar compounds and their structures using reliable and robust QSAR models. Since the conception of QSAR in the 1960s, there has been keen interest in QSAR modelling in the drug discovery process to enable the design of potential drug candidates 18.
It is primarily used to predict biological activities and ADMET properties, thereby providing vital information necessary for drug development. QSAR modelling offers several advantages, the most important being that it reduces the time, money, and material resources required, because fewer traditional tests and animal experiments are needed 19. Hence, it is commonly used in drug discovery in collaboration with pharmaceutical companies 20. In this study, the inhibitory activity of various HDAC6 inhibitors was predicted by QSAR modelling using machine learning techniques with internal and external validation, and a web server was developed in Java by integrating the best-performing QSAR models. To date, there is no web-based interface or web server for predicting the inhibitory activity of potential HDAC inhibitors targeting an oral cancer protein based on molecular descriptors and machine learning techniques. This free online web server will help the drug discovery community and bioinformaticians gain insights into the inhibitory activity of potential HDAC inhibitors of interest prior to experimental studies, thereby saving time, labour, and expense. The developed QSAR models might also pave the way towards the discovery of new HDAC inhibitors and the optimization of existing ones.
MATERIALS AND METHODS:
Data Set: A dataset of forty HDAC6 inhibitors that are active in PubChem BioAssay 21, with known half-maximal inhibitory concentration (IC50) values against human HDAC6, was collected. The inhibitors displayed a wide range of activity (0.0004 µM to 0.661 µM). IC50 values were converted to their negative logarithms to obtain the observed/actual pIC50 values (i.e., pIC50 = -log10 IC50), which serve as the dependent variable in the QSAR model. The structure files were downloaded from the PubChem database 22.
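The conversion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `pic50` is hypothetical, and the formula is applied to the IC50 numbers exactly as quoted (in µM), since the text does not state whether the values were first converted to molar units.

```python
import math

def pic50(ic50: float) -> float:
    """Return pIC50 = -log10(IC50) for a strictly positive IC50 value."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50)

# The two extremes of the activity range quoted in the text.
# Assumption: the logarithm is taken of the value as given, in µM.
activity_range = [0.0004, 0.661]
pic50_values = [pic50(value) for value in activity_range]
```

Because the logarithm is negated, the most potent compound (lowest IC50) receives the highest pIC50.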
Calculation of Descriptors: Descriptors such as constitutional descriptors, connectivity indices, topological descriptors, functional group counts, edge adjacency indices, molecular properties, walk and path counts, 2D autocorrelations, topological charge indices, information indices, and Burden eigenvalues were calculated using the E-Dragon tool (http://www.vcclab.org/lab/edragon/) 23.
Feature Selection: To reduce bias in the predictions and to find a computationally tractable set of descriptors/features, descriptor selection is necessary to remove redundant and highly correlated descriptors. To achieve this, invariable descriptors were removed and the CfsSubsetEval module of WEKA was then applied to the dataset 24. The CfsSubsetEval module, together with the best-first search method, identifies the significant descriptors by performing selection according to the predictive ability of each descriptor.
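The idea behind this step can be illustrated with a small Python sketch. It is a simplified stand-in, not WEKA's implementation: it drops constant columns, scores subsets with the correlation-based merit k·r_cf / sqrt(k + k(k-1)·r_ff) using Pearson correlations, and uses a plain greedy forward search in place of WEKA's search method. All function names and the toy descriptor names (`desc_a` to `desc_d`) are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def drop_invariant(columns):
    """Remove descriptors that take a single value for every compound."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}

def cfs_merit(subset, columns, target):
    """Correlation-based merit: rewards relevance, penalises redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, target):
    """Greedy forward search: add the best descriptor until merit stops improving."""
    columns = drop_invariant(columns)
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(selected + [f], columns, target), f)
                  for f in columns if f not in selected]
        if not scored:
            break
        merit, feature = max(scored)
        if merit <= best:
            break
        selected.append(feature)
        best = merit
    return selected

# Toy data (made up): desc_b is largely redundant with desc_a, desc_c is constant.
toy_columns = {
    "desc_a": [1, 2, 3, 4, 5],
    "desc_b": [1, 2, 3, 5, 4],
    "desc_c": [7, 7, 7, 7, 7],
    "desc_d": [1, -1, 1, -1, 1],
}
toy_target = [3, 0, 5, 2, 7]  # equals desc_a + 2 * desc_d
selected = greedy_cfs(toy_columns, toy_target)
```

On this toy data the search keeps the two complementary descriptors and rejects the redundant and constant ones, which is the behaviour the feature selection step is meant to achieve.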
Model Building: Various machine learning algorithms such as Multiple Linear Regression (MLR), k-Nearest Neighbor (kNN) 25, and Library for Support Vector Machine (LibSVM) 26 were used to build the models on a training set in WEKA 3.6.8.
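As an illustration of the simplest of these learners, the following Python sketch shows kNN regression: the predicted activity of a query compound is the mean activity of its k nearest training compounds in descriptor space. The function name, the toy data, and k = 3 are assumptions for illustration; the text does not state the k value or distance settings used in WEKA, and no attribute normalisation is applied here.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query` (Euclidean distance)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k

# Toy training set (made-up descriptor vectors and activities).
toy_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
toy_y = [1.0, 2.0, 3.0, 10.0]
prediction_k3 = knn_predict(toy_X, toy_y, (0.1, 0.1), k=3)
prediction_k1 = knn_predict(toy_X, toy_y, (0.1, 0.1), k=1)
```

With k = 3 the distant fourth compound is ignored, so the prediction is the mean of the three nearby activities.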
Evaluation of Models: The fitness of all the models was evaluated using the following statistical parameters.
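Correlation coefficient: $$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{N}}{\sqrt{\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right)\left(\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{N}\right)}} \qquad (1)$$
Mean Absolute Error: $$\mathrm{MAE} = \frac{\sum \left| y_i - x_i \right|}{N} \qquad (2)$$
Root Mean Squared Error: $$\mathrm{RMSE} = \sqrt{\frac{\sum \left( y_i - x_i \right)^2}{N}} \qquad (3)$$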
Here, xi and yi represent the actual and predicted pIC50 values for the ith compound, and N is the number of compounds 27. The models were validated using two different procedures, namely internal cross-validation and external validation. For internal cross-validation, the Leave-One-Out (LOOCV) strategy was implemented, in which one molecule is removed from the training set and its activity is predicted by a model trained on the remaining molecules.
The process is repeated n times, until every molecule has appeared in the test set once. External validation was carried out by evaluating the performance of the generated model on a random set of ten compounds (independent test set).
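The evaluation procedure can be sketched in Python as follows. The three metric functions follow equations (1) to (3); `loocv` implements the leave-one-out loop for any predictor passed in as a callable. The q2 formula used here (1 minus PRESS over the total sum of squares) is the commonly used definition and is an assumption, since the text does not give it. The one-dimensional nearest-neighbour predictor and the toy data are purely illustrative.

```python
import math

def pearson_r(actual, predicted):
    """Correlation coefficient, written as in equation (1)."""
    n = len(actual)
    sx, sy = sum(actual), sum(predicted)
    sxy = sum(x * y for x, y in zip(actual, predicted))
    sxx = sum(x * x for x in actual)
    syy = sum(y * y for y in predicted)
    return (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

def mae(actual, predicted):
    """Mean absolute error, equation (2)."""
    return sum(abs(y - x) for x, y in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, equation (3)."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(actual, predicted)) / len(actual))

def loocv(X, y, fit_predict):
    """Leave each sample out once and predict it from the remaining samples."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(fit_predict(train_X, train_y, X[i]))
    return predictions

def q2(actual, predicted):
    """Cross-validated r2 (assumed definition): 1 - PRESS / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    press = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    total = sum((x - mean_actual) ** 2 for x in actual)
    return 1 - press / total

def nearest_neighbour(train_X, train_y, query):
    """Illustrative 1-NN predictor for a single numeric descriptor."""
    best = min(range(len(train_X)), key=lambda i: abs(train_X[i] - query))
    return train_y[best]

# Toy data (made up): one descriptor, activity equal to the descriptor value.
toy_X = [0.0, 1.0, 3.0, 4.0, 8.0]
toy_y = [0.0, 1.0, 3.0, 4.0, 8.0]
cv_predictions = loocv(toy_X, toy_y, nearest_neighbour)
cv_mae = mae(toy_y, cv_predictions)
cv_rmse = rmse(toy_y, cv_predictions)
cv_q2 = q2(toy_y, cv_predictions)
cv_r = pearson_r(toy_y, cv_predictions)
```

Passing the predictor as a callable keeps the validation loop independent of the learning algorithm, so the same loop could wrap an MLR, kNN, or SVM model.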
RESULTS AND DISCUSSION:
QSAR Models: In this study, QSAR models were developed on a training set of thirty HDAC6 inhibitors using the MLR, kNN, and LibSVM techniques. The models were built on six molecular descriptors obtained from E-Dragon after removing highly correlated descriptors (Table 1).
TABLE 1: LIST OF DESCRIPTORS OBTAINED AFTER ATTRIBUTE / FEATURE SELECTION PROCESS IN WEKA 3.6.8
|Descriptor||Description (descriptor class)|
|X0Av||Average valence connectivity index of order 0 (Connectivity index)|
|BIC0||Bond Information Content index (neighbourhood symmetry of 0-order) (Information index)|
|JGI3||Mean topological charge index of order 3 (2D autocorrelation)|
|nArCNO||Number of oximes (aromatic) (Functional group count)|
|nArNR2||Number of tertiary amines (aromatic) (Functional group count)|
|Hypertens-80||Ghose-Viswanadhan-Wendoloski antihypertensive-like index at 80% (Drug-like index)|
The equation obtained for MLR is given below:
$$\mathrm{pIC_{50}} = -5.3313\,\mathrm{X0Av} - 13.3235\,\mathrm{BIC0} + 7.3719\,\mathrm{JGI3} - 1.0443\,\mathrm{nArCNO} + 1.4028\,\mathrm{nArNR2} + 0.7771\,\text{Hypertens-80} + 13.5977 \qquad (4)$$
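Equation (4) can be evaluated directly once the six descriptor values of a compound are known. The Python sketch below uses the coefficients reported above; the function name and the example descriptor values are hypothetical and are not taken from the dataset.

```python
# Coefficients and intercept as reported in equation (4).
MLR_COEFFICIENTS = {
    "X0Av": -5.3313,
    "BIC0": -13.3235,
    "JGI3": 7.3719,
    "nArCNO": -1.0443,
    "nArNR2": 1.4028,
    "Hypertens-80": 0.7771,
}
MLR_INTERCEPT = 13.5977

def predict_pic50_mlr(descriptors):
    """Apply equation (4) to a mapping of descriptor name -> value."""
    missing = [name for name in MLR_COEFFICIENTS if name not in descriptors]
    if missing:
        raise KeyError("missing descriptors: " + ", ".join(missing))
    return MLR_INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in MLR_COEFFICIENTS.items()
    )

# Made-up descriptor values for a hypothetical compound.
example_compound = {
    "X0Av": 0.5, "BIC0": 0.5, "JGI3": 0.0,
    "nArCNO": 0, "nArNR2": 1, "Hypertens-80": 1,
}
example_pic50 = predict_pic50_mlr(example_compound)
```

The signs of the coefficients indicate the direction in which each descriptor shifts the predicted pIC50 within this linear model.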
A correlation (r/r2) of 0.86/0.74 between the predicted and actual pIC50 values was obtained for the MLR model, with an MAE of 0.32. The kNN model achieved a high r/r2 of 0.98/0.96 between predicted and actual pIC50 values. An r2 of 0.96 indicates that 96% of the variation in pIC50 values is explained by the descriptors, and the model also had a low MAE of 0.06. QSAR models were also developed on the same dataset using different kernels of LibSVM, namely the linear, polynomial, radial basis function, and sigmoid kernels. From Table 2, it is evident that, of all the LibSVM kernels, the linear kernel performed best, with an r/r2 of 0.84/0.71 and a mean absolute error of 0.35. Further, leave-one-out cross-validation showed that the cross-validated squared correlation coefficient (q2) was greater than 0.5 for both MLR and kNN, which suggests that these models are reliable. The actual, predicted, and residual pIC50 values of the best-performing MLR and kNN models are provided in Table 3.
TABLE 2: THE PERFORMANCE OF QSAR MODELS DEVELOPED BASED ON BEST DESCRIPTORS COMPUTED USING VARIOUS TECHNIQUES
|Model||r||r2||MAE||RMSE|
|k Nearest Neighbour||0.98||0.96||0.06||0.15|
|Multiple Linear Regression||0.86||0.74||0.32||0.39|
|LibSVM (Linear kernel)||0.84||0.71||0.35||0.41|
|LibSVM (RBF kernel)||0.82||0.67||0.4||0.5|
|LibSVM (Sigmoid kernel)||0.81||0.66||0.44||0.57|
|LibSVM (Polynomial kernel)||0.8||0.64||0.52||0.65|
TABLE 3: ACTUAL, PREDICTED AND RESIDUAL PIC50 VALUES OF BEST PERFORMING MLR AND KNN MODELS
|S. no.||Compound ID||Actual pIC50||Predicted pIC50 (MLR-based model)||Residual (MLR-based model)||Predicted pIC50 (kNN model)||Residual (kNN model)|
* indicates test set compounds
Among the limited literature available on QSAR studies involving HDAC6, a kNN model with genetic-algorithm variable selection achieved an R2 of 0.947 and an MAE of 0.173 for HDAC1 inhibitors, and an R2 of 0.911 and an MAE of 0.261 for HDAC6 inhibitors 28. The kNN model from this study achieved a higher squared correlation coefficient (R2 = 0.96) and lower errors (MAE = 0.06, RMSE = 0.15) on this dataset of HDAC6 inhibitors, making it the best fit among all the models developed in this study.
Web Server: To date, there is no web server for predicting the inhibitory activity of potential HDAC inhibitors targeting an oral cancer protein based on molecular descriptors and machine learning techniques; this work is an effort in that direction. The web server was developed in a Windows environment using Java. The top two performing models, namely the kNN and MLR models, were integrated into the web server (InhibPred), which can be accessed at http://www.inhibpred.com/. A high-level architecture diagram for InhibPred is depicted in Fig. 1.
FIG. 1: HIGH-LEVEL ARCHITECTURE DIAGRAM
Input and Output: In this web server, the user uploads a file in .csv format containing the following attributes: X0Av, BIC0, JGI3, nArCNO, nArNR2, and Hypertens-80 (obtained from E-Dragon), along with the corresponding values for each compound. After uploading the file in this format, the user clicks the 'Submit' button to view the results (pIC50 values). A sample file is also available on the website for testing purposes. The prediction results are displayed on the same page after the input file is submitted.
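A file of this kind could be read and validated as in the following Python sketch. It is not the server's Java code; the exact layout of the accepted file (for example, whether a compound identifier column is present) is not stated in the text, so the sketch only assumes a header row naming the six descriptors and one row of numeric values per compound. The function name is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["X0Av", "BIC0", "JGI3", "nArCNO", "nArNR2", "Hypertens-80"]

def read_descriptor_csv(text):
    """Parse CSV text into one {descriptor: float} mapping per compound."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError("missing descriptor columns: " + ", ".join(missing))
    rows = []
    # Data rows start on line 2, assuming one record per line after the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append({column: float(row[column]) for column in REQUIRED_COLUMNS})
        except (TypeError, ValueError):
            raise ValueError(f"non-numeric descriptor value on line {line_no}")
    return rows

# Made-up sample input with a single compound.
sample_csv = (
    "X0Av,BIC0,JGI3,nArCNO,nArNR2,Hypertens-80\n"
    "0.5,0.5,0.0,0,1,1\n"
)
parsed_rows = read_descriptor_csv(sample_csv)
```

Validating the header before reading any values lets a malformed upload be rejected with a specific message rather than failing during prediction.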
CONCLUSION: The critical factors in tackling the oral cancer burden in India are precautionary measures, early diagnosis, and timely treatment. Public awareness of the causes and fatalities of oral cancer, and of the importance of quitting alcohol and tobacco and maintaining oral hygiene, must be raised as widely as possible to control oral cancer.
The QSAR models (MLR and kNN) obtained herein offered some interesting insights into the descriptors that contributed significantly to the stability of the models and to the inhibitory activity of HDAC6 inhibitors. Among all the methods, the 2D QSAR models developed with kNN and MLR performed best in terms of predictivity, with r2 = 0.96, MAE = 0.06, and RMSE = 0.15 for kNN, and r2 = 0.74, MAE = 0.32, and RMSE = 0.39 for MLR. Hence, these two models were integrated into an open-source platform.
This is the first open-source platform available to the scientific community for predicting the inhibitory activity of anti-cancer compounds against the human HDAC6 protein and thereby supporting the discovery of new drugs against it. Its responsive web design works across platforms such as phones, computers, and tablets. This online resource can be used to estimate the activity of a given compound prior to actual synthesis, thereby aiding the decision-making process and saving the time, effort, and money involved in the post-synthesis phases of drug development. Further, this web service will benefit the drug discovery community and may encourage other scientists to develop free software and web servers in the field of Computer-Aided Drug Discovery (CADD).
ACKNOWLEDGEMENT: The authors would like to thank the Department of Biotechnology, PES Institute of Technology, Bangalore, for its support throughout the project work.
CONFLICTS OF INTEREST: The authors declare that there are no conflicts of interest.
REFERENCES:
1. Sharma N and Om H: Data mining models for predicting oral cancer survivability. Netw Model Anal Health Inform Bioinforma 2013; 2: 285-95.
2. Glick M: Burket's Oral Medicine. People's Medical Publishing House, USA, Edition 12, 2015: 173-99.
3. Borse V, Konwar AN and Buragohain P: Oral cancer diagnosis and perspectives in India. Sensors International 2020; 1: 100046.
4. Laprise C, Shahul HP, Madathil SA, Thekkepurakkal AS, Castonguay G, Varghese I, Shiraz S, Allison P, Schlecht NF, Rousseau MC, Franco EL and Nicolau B: Periodontal diseases and risk of oral cancer in Southern India: results from the HeNCe Life study. Int J Cancer 2016; 139: 1512-19.
5. Veluthattil A, Sudha S, Kandasamy S and Chakkalakkoombil S: Effect of hypofractionated, palliative radiotherapy on quality of life in late-stage oral cavity cancer: a prospective clinical trial. Indian J Palliat Care 2019; 25: 383.
6. Arrangoiz R, Cordera F, Caba D, Moreno E, de Leon EL and Muñoz M: Oral tongue cancer: literature review and current management. Cancer Rep Rev 2018; 2: 1-9.
7. Varshitha A: Prevalence of oral cancer in India. J Pharmaceut Sci Res 2015; 7: 845-48.
8. Ajay P, Ashwinirani S, Nayak A, Suragimath G, Kamala K, Sande A and Naik R: Oral cancer prevalence in Western population of Maharashtra, India, for a period of 5 years. J Oral Res Rev 2018; 10: 11.
9. Leemans CR, Braakhuis BJM and Brakenhoff RH: The molecular biology of head and neck cancer. Nat Rev Cancer 2011; 11: 9-22.
10. Patil TT, Kowtal PK, Nikam A, Barkume MS, Patil A, Kane SV, Juvekar AS, Mahimkar MB and Kayal JJ: Establishment of a tongue squamous cell carcinoma cell line from Indian gutka chewer. Journal of Oral Oncology 2014.
11. Wu YW, Hsu KC, Lee HY, Huang TC, Lin TE, Chen YL, Sung TY, Liou JP, Hwang-Verslues WW, Pan SL and HuangFu WC: A novel dual HDAC6 and tubulin inhibitor, MPT0B451, displays anti-tumor ability in human cancer cells in-vitro and in-vivo. Front Pharmacol 2018; 9: 205.
12. Abdizadeh R, Hadizadeh F and Abdizadeh T: QSAR analysis of coumarin-based benzamides as histone deacetylase inhibitors using CoMFA, CoMSIA and HQSAR methods. Journal of Molecular Structure 2020; 1199: 126961.
13. Li T, Zhang C, Hassan S, Liu X, Song F, Chen K, Zhang W and Yang J: Histone deacetylase 6 in cancer. Journal of Hematology & Oncology 2018; 11: 111-21.
14. Goracci L, Deschamps N and Randazzo GM: A rational approach for the identification of non-hydroxamate HDAC6-selective inhibitors. Scientific Reports 2016; 6: 29086.
15. Sakuma T, Uzawa K, Onda T, Shiiba M, Yokoe H, Shibahara T and Tanzawa H: Aberrant expression of histone deacetylase 6 in oral squamous cell carcinoma. Int J Oncol 2006; 29: 117-24.
16. Aldana-Masangkay GI and Sakamoto KM: The role of HDAC6 in cancer. Journal of Biomedicine and Biotechnology 2011; 2011: 875824.
17. Chen J, Luo X, Qiu H, Mackey V, Sun L and Ouyang X: Drug discovery and drug marketing with the critical roles of modern administration. Am J Transl Res 2018; 10: 4302-12.
18. Puzyn T, Leszczynski J and Cronin MT: Recent Advances in QSAR Studies: Methods and Applications (Challenges and Advances in Computational Chemistry and Physics). Springer, Edition 1, 2010: 261-82.
19. Martínez MJ, Razuc M and Ponzoni I: MoDeSuS: a machine learning tool for selection of molecular descriptors in QSAR studies applied to molecular informatics. Biomed Res Int 2019.
20. Chen S, Xue D, Chuai G, Yang Q and Liu Q: FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery. Bioinformatics 2020.
21. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, Bolton E, Gindulyte A and Bryant SH: PubChem's BioAssay Database. Nucleic Acids Res 2012; 40: 400-12.
22. Bolton E, Wang Y, Thiessen PA and Bryant SH: PubChem: integrated platform of small molecules and biological activities. Annual Reports in Computational Chemistry, Elsevier, Amsterdam 2008: 217-41.
23. Tetko IV, Gasteiger J, Todeschini R, Mauri A, Livingstone D, Ertl P, Palyulin VA, Radchenko EV, Zefirov NS, Makarenko AS, Tanchuk VY and Prokopenko VV: Virtual computational chemistry laboratory - design and description. J Comput Aid Mol Des 2005; 19: 453-63.
24. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P and Witten IH: The WEKA data mining software: an update. SIGKDD Explor Newsl 2009; 11: 10-18.
25. Aha D and Kibler D: Instance-based learning algorithms. Machine Learning 1991; 6: 37-66.
26. Chang CC and Lin CJ: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2011; 2: 1-27.
27. Singla D, Anurag M, Dash D and Raghava GPS: A web server for predicting inhibitors against bacterial target GlmU protein. BMC Pharmacology 2011; 11: 5-14.
28. Zhao L, Xiang Y, Song J and Zhang Z: A novel two-step QSAR modeling workflow to predict selectivity and activity of HDAC inhibitors. Bioorg Med Chem Lett 2013; 23: 929-33.
How to cite this article:
Vijayasarathy S and Chatterjee J: InhibPred: a web server for predicting inhibitory activity of molecules against human HDAC6 protein. Int J Pharm Sci & Res 2021; 12(8): 4400-06. doi: 10.13040/IJPSR.0975-8232.12(8).4400-06.
All rights © 2013 are reserved by the International Journal of Pharmaceutical Sciences and Research. This journal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Received: 28 August 2020
Revised: 27 January 2021
Accepted: 19 May 2021
Published: 01 August 2021