Data Representations:

Molecules are typically represented as Kekulé diagrams showing atoms and bonds. With advancements in computational chemistry, machine-readable formats have been developed to enable faster computation, searching, and storage of molecular data. Over time, scientists have also developed various notations to represent different chemical and structural properties of compounds³.

1. SMILES (Simplified Molecular Input Line Entry System):

In SMILES notation, atoms are denoted by their atomic symbols, with the second letter of two-character symbols written in lowercase. Atoms of the organic subset (B, C, N, O, P, S, F, Cl, Br, and I) can be written without brackets, while all other elements must be enclosed in brackets together with any formal charge and attached hydrogens (for example, [Fe+2]). Aromatic atoms are indicated by lowercase letters; for instance, "C" stands for an aliphatic carbon atom and "c" for an aromatic carbon. Bonds are represented by the symbols - (single), = (double), # (triple), and : (aromatic), although single and aromatic bonds are usually omitted. Branches are enclosed in parentheses, and cyclic structures are written by breaking one ring bond and placing the same ring-closure digit after the symbols of the two atoms it joins. A single molecule can be written as many different valid SMILES strings, but canonicalization algorithms assign one unique string to each molecule. For use in machine learning models, SMILES strings are frequently converted into one-hot vectors. SMILES strings are more compact and cheaper to process than explicit graph representations. However, atomic connectivity is encoded only implicitly through the string syntax, so some structural information may be harder for a model to recover. Furthermore, generative models may produce invalid SMILES strings, because a valid string must satisfy both syntactic constraints (paired ring-closure digits and balanced parentheses) and atomic valence rules^27-29.
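The tokenization and one-hot encoding step described above can be sketched in a few lines of Python. This is a minimal illustration, not a full SMILES parser: the regular expression, the function names tokenize and one_hot, and the per-molecule vocabulary are assumptions made for the example, and only the common SMILES symbols are covered.

```python
import re

# Bracket atoms and two-letter halogens must be tried before single letters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[-=#:()./\\]")

def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens

def one_hot(tokens, vocabulary):
    """Return one row per token, with a single 1 at the token's vocabulary index."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    return [[1 if index[tok] == col else 0 for col in range(len(vocabulary))]
            for tok in tokens]

tokens = tokenize("CC(=O)Cl")        # acetyl chloride
vocabulary = sorted(set(tokens))     # in practice, built from the whole dataset
matrix = one_hot(tokens, vocabulary)

for tok, row in zip(tokens, matrix):
    print(f"{tok:>2} {row}")
```

In a real pipeline the vocabulary would be fixed across the training set and the strings padded to a common length, so that every molecule maps to a matrix of the same shape.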

2. SELFIES (Self-Referencing Embedded Strings):

SELFIES represents molecular graphs as character strings in which every string corresponds to a valid molecule, giving 100% validity by construction. Unlike SMILES, which may contain syntactic errors such as unbalanced parentheses or unpaired ring identifiers, SELFIES uses a formal grammar with derivation rules that rule out such errors. Rings and branches are each defined at a single location with special symbols (e.g., [Branch1], [Ring2]), and their sizes are determined by the tokens that follow, which are overloaded in a way akin to function overloading in programming. The digit in the symbol gives the number of following tokens that are read as an index Q rather than as atoms; a branch then spans the next (Q + 1) symbols, and a ring bond connects the current atom to the (Q + 1)-th preceding atom. SELFIES is human-readable and widely applicable in tasks such as molecular fingerprint construction, similarity calculations, reaction detection, and drug discovery. Its robustness and compatibility with machine learning make it well suited to molecular representation learning and, in generative settings, more robust than SMILES. SELFIES functions as a simple programming language for chemistry: all outputs represent valid molecular graphs, without any modification of the underlying ML models. It supports isotopes, charges, radicals, chirality, and stereochemistry but is not yet capable of fully encoding macromolecules, crystals, or complex bonds. The SELFIES library provides core functions for translating between SMILES and SELFIES, which makes it straightforward to use in cheminformatics and AI-driven applications. For example, benzene can be encoded from SMILES to SELFIES and decoded back to an equivalent SMILES string^30-32.
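The idea that derivation rules guarantee validity can be illustrated with a toy decoder. The sketch below is not the real SELFIES grammar: it handles only unbranched chains, the valence table and the function name toy_decode are assumptions made for the example, and the token spelling is merely borrowed from SELFIES. It shows the core mechanism, which is to track the free valence of the previous atom and reduce or drop any bond that would exceed it.

```python
# Toy valence-tracking decoder in the spirit of SELFIES (chains only).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}   # assumed standard valences
BOND_ORDER = {"": 1, "=": 2, "#": 3}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}

def toy_decode(tokens):
    """Turn any sequence of tokens such as '[C]' or '[=O]' into a valid chain."""
    smiles = []
    remaining = None                      # free valence on the previous atom
    for token in tokens:
        body = token.strip("[]")
        bond = body[0] if body[0] in "=#" else ""
        element = body[len(bond):]
        capacity = MAX_VALENCE[element]
        if remaining is None:             # first atom: any bond prefix is ignored
            smiles.append(element)
            remaining = capacity
            continue
        if remaining == 0:                # previous atom is saturated: stop here
            break
        # Cap the requested bond order by what both atoms can still accept.
        order = min(BOND_ORDER[bond], remaining, capacity)
        smiles.append(BOND_SYMBOL[order] + element)
        remaining = capacity - order
    return "".join(smiles)

print(toy_decode(["[C]", "[=O]"]))                 # formaldehyde skeleton
print(toy_decode(["[F]", "[#C]", "[#N]"]))         # triple bond to F is reduced
print(toy_decode(["[O]", "[=O]", "[=O]", "[C]"]))  # tokens after saturation are dropped
```

Because the decoder never emits a bond that violates the valence table, every token sequence decodes to a chemically plausible chain, which is the property a generative model benefits from.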

3. Group SELFIES:

Group SELFIES differs from SELFIES primarily in its use of groups: predefined sets of atoms and bonds that represent molecular substructures with specific attachment points. Each attachment point indicates how the group can form bonds and has a defined maximum valency, which allows bonding capacity to be tracked during decoding. The encoder and decoder use the indices assigned to attachment points to navigate bonding relationships. To use Group SELFIES, users must construct a "group set," a dictionary that maps group names to their definitions. This dictionary tells the encoder which groups to recognize and tells the decoder how to interpret group tokens. Each group set defines a distinct instance of Group SELFIES, and the decoder cannot process strings that contain group tokens from outside the current group set. Group tokens are distinguished by a ":" prefix and take the form [:S<group-name>], where S is the starting attachment index (for example, [:1parabenzene]). The group name must be an alphanumeric string that contains no dashes and does not begin with a number. Groups may optionally be given a priority value, which directs the encoding process by setting the order in which groups are recognized. In addition, each group can be assigned an index overload value, which specifies how the decoder should interpret the token as a number when required³².
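The token format and the role of the group set can be made concrete with a small parser. This sketch is based only on the description above and does not use or reproduce the Group SELFIES library: the regular expression, the function name parse_group_token, and the fields stored in the example group set are all hypothetical.

```python
import re

# "[:" + starting attachment index + group name (alphanumeric, not starting
# with a digit, no dashes) + "]", following the rules stated in the text.
GROUP_TOKEN = re.compile(r"^\[:(\d+)([A-Za-z][A-Za-z0-9]*)\]$")

def parse_group_token(token, group_set):
    """Return (starting attachment index, group name, group definition)."""
    match = GROUP_TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a well-formed group token: {token!r}")
    start_index, name = int(match.group(1)), match.group(2)
    if name not in group_set:
        # Mirrors the rule that tokens outside the current group set
        # cannot be processed by the decoder.
        raise KeyError(f"group {name!r} is not in the current group set")
    return start_index, name, group_set[name]

# Hypothetical group set: the fields are placeholders, not the library's schema.
group_set = {
    "parabenzene": {"attachment_points": 2, "priority": 0},
}

print(parse_group_token("[:1parabenzene]", group_set))
```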

4. DeepSMILES:

Deep neural networks are increasingly used to build generative models that propose new molecules, and these models frequently encode molecular structures as SMILES strings. The models output SMILES strings intended to meet particular target properties, but the generated strings sometimes contain unmatched parentheses or unpaired ring-closure symbols and therefore do not correspond to valid molecules. To address these limitations, O’Boyle and Dalke introduced DeepSMILES, a syntax better suited to deep generative models and automated inverse design. Instead of using two symbols to denote a ring closure, DeepSMILES uses a single symbol, a number that indicates how far back in the string the ring connects. Branches are indicated by one or more closing parentheses only, with the number of parentheses representing the branch length. By removing several common sources of syntax errors in SMILES, this simplified grammar makes generative models more resilient to errors and random mutations. Despite these improvements, DeepSMILES strings can still describe semantically invalid molecules that violate basic physical constraints, which underscores the continuing need for a more robust molecular grammar^33,34.
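The ring-closure rule can be sketched for the simplest case. The function below, whose name rings_to_deepsmiles is an assumption made for this example, rewrites paired SMILES ring-closure digits as a single ring-size number. It deliberately handles only branch-free, bracket-free SMILES with rings of at most nine atoms and is not a substitute for the reference DeepSMILES converter.

```python
import re

TOKEN = re.compile(r"Br|Cl|[A-Za-z]|\d|[-=#]")
BONDS = {"-", "=", "#"}

def rings_to_deepsmiles(smiles):
    """Replace paired ring-closure digits with one ring-size number."""
    if any(ch in smiles for ch in "()[]%"):
        raise ValueError("sketch handles only branch-free, bracket-free SMILES")
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    output = []
    opened_at = {}       # ring digit -> index of the atom that opened the ring
    atom_count = 0
    for token in tokens:
        if token.isdigit():
            if atom_count == 0:
                raise ValueError("ring-closure digit before any atom")
            if token in opened_at:
                # Closing digit: ring size counts both end atoms.
                ring_size = atom_count - opened_at.pop(token)
                if ring_size > 9:
                    raise ValueError("rings larger than 9 atoms are out of scope")
                output.append(str(ring_size))
            else:
                # Opening digit: remember the atom, emit nothing.
                opened_at[token] = atom_count - 1
        else:
            if token not in BONDS:
                atom_count += 1
            output.append(token)
    if opened_at:
        raise ValueError("unclosed ring in input SMILES")
    return "".join(output)

print(rings_to_deepsmiles("c1ccccc1"))   # benzene
print(rings_to_deepsmiles("C1CC1"))      # cyclopropane
```

Because each ring is described by one symbol instead of two, a generated string can no longer contain an opening digit without its partner, which is one of the failure modes noted above.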

5. InChI (International Chemical Identifier):

The International Union of Pure and Applied Chemistry (IUPAC) developed the InChI system to provide a standardized notation for molecular structures, offering a canonical representation that assigns a unique string to each molecule. InChI encodes detailed molecular information, including distinctions between mobile and immobile hydrogens, but its complex syntax makes it challenging to interpret and less suited for generative modelling. Unlike SMILES, which allows multiple valid representations for the same molecule, InChI ensures a single canonical representation. This standardization simplifies database creation and enables efficient searching by mapping each structure to a unique identifier. First released in 2005 and distributed as open-source software, InChI produces strings that consist of six main layers and various sublayers, each encoding specific molecular details, such as chemical formula, bonding, charges, and stereochemistry.

InChI offers several advantages. Its canonical format facilitates linking in databases, and its layered structure hierarchically encodes molecular information, allowing derivatives of a molecule to share the same parent structure. InChI is also more expressive than SMILES, capturing features like hydrogen mobility and tautomeric equivalence. For instance, tautomers are represented by the same InChI string, while SMILES assigns different strings to each tautomer. Similarly, InChI consolidates resonance structures into a single representation, whereas SMILES produces multiple variants.

However, InChI has notable limitations. Its complex hierarchical syntax is harder to read and interpret than SMILES, though familiarity improves comprehension. This complexity poses challenges for generative modeling, as the strict rules and syntax are difficult for deep-learning models to learn and reproduce. Additionally, the current InChI standard disconnects bonds to metal atoms, leading to the loss of stereochemical and bonding information, though future updates may address this issue. In practice, despite its expressiveness, InChI has been found less effective than SMILES in machine learning applications, likely due to its syntactic complexity and limitations in handling certain molecular features^35,36.
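The layered structure of an InChI string can be seen by splitting it at the "/" separators. The sketch below assumes a standard InChI that contains a formula layer; the function name split_inchi and the human-readable layer names are choices made for this example, while the single-letter layer prefixes are those used by InChI itself.

```python
# Single-letter prefixes that introduce InChI layers and sublayers.
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed hydrogens",
    "r": "reconnected",
}

def split_inchi(inchi):
    """Return the layers of an InChI string as a list of (name, value) pairs."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    if len(parts) < 2:
        raise ValueError("sketch assumes a version and a formula layer")
    layers = [("version", parts[0]), ("formula", parts[1])]
    for segment in parts[2:]:
        # A list (not a dict) keeps repeated prefixes, which can occur
        # after the fixed-hydrogen or isotopic layers.
        layers.append((LAYER_NAMES.get(segment[0], segment[0]), segment[1:]))
    return layers

ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for name, value in split_inchi(ethanol):
    print(f"{name:>12}: {value}")
```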

Models and Algorithms used to build AI:

Before the advent of deep learning, traditional machine learning models played a significant role in virtual screening, particularly for tasks like predicting drug-likeness, physicochemical properties, pharmacokinetics, and pharmacodynamics. Depending on the problem domain, various machine learning techniques were employed based on their underlying principles and capabilities, as described below:

Supervised Learning:

This approach follows a task-driven strategy, where algorithms are trained on labeled data to achieve specific objectives, such as classifying data or predicting outcomes. For instance, it can be used to identify spam emails. The most common supervised learning tasks are classification, which involves predicting categorical labels, and regression, which focuses on forecasting continuous values³⁷.

Unsupervised Learning:

This is a data-driven method that, in contrast to supervised learning, looks for patterns, structures, or insights in unlabeled data. Clustering, visualization, dimensionality reduction, association rule mining, and anomaly detection are typical unsupervised learning tasks^38,39.

Classification Technique:

Support Vector Machines (SVMs), sometimes referred to as support vector networks, are supervised learning models used for classification tasks. They operate by finding the hyperplane that maximizes the margin between data points of different classes. Although many hyperplanes may be capable of separating the data, an SVM chooses the one with the largest distance to the nearest data points of each class.

Convolutional Neural Networks (CNNs) are a deep learning technique with a feedforward architecture. CNNs are built from three primary types of layers: convolutional, pooling, and fully connected. The convolutional layer learns feature representations from the input data, the pooling layer downsamples the feature maps to reduce the number of parameters and the computation in later layers, and the fully connected layer carries out the final reasoning and produces classification scores. In contrast to conventional machine learning techniques, CNNs eliminate the need for manual feature engineering by automatically extracting features from raw input.

A recurrent neural network (RNN) is a type of artificial neural network (ANN) designed to handle sequential data. Unlike feedforward networks, RNNs contain layers connected in a loop, so the hidden state from one step is fed back in at the next step and contextual information is maintained over time. This feature sets RNNs apart from conventional neural networks and makes them well suited to sequential data tasks in drug discovery and design.

Generative Adversarial Networks (GANs) are a deep learning framework made up of a generator and a discriminator. The generator produces new data samples that closely resemble the training data, while the discriminator determines whether a sample is generated or real. In contrast to other deep learning approaches and conventional machine learning methods, GANs are reported to perform well in situations with small sample sizes, which makes them especially useful for producing synthetic data with realistic features⁴⁰.
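The maximum-margin criterion used by SVMs can be illustrated without training a model. The sketch below compares two hand-picked separating hyperplanes on a toy two-dimensional data set and keeps the one with the larger geometric margin; the data, the candidate hyperplanes, and the function name geometric_margin are assumptions made for this example, and a real SVM would find the optimal hyperplane by solving an optimization problem.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0."""
    norm = math.hypot(*w)
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy data: class -1 near the origin, class +1 further out.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, 1, 1]

# Two hyperplanes that both separate the classes, written as (w, b).
candidates = {
    "x1 + x2 = 6": ((1.0, 1.0), -6.0),
    "x1 = 3.5": ((1.0, 0.0), -3.5),
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, margin in margins.items():
    print(f"{name}: margin = {margin:.3f}")
print("preferred hyperplane:", best)
```

Both candidates classify every point correctly, but the first leaves a wider gap to the nearest points of each class, which is the property an SVM optimizes.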

Regression Analysis Technique:

Multiple Linear Regression (MLR) models the relationship between multiple independent variables and a dependent variable by fitting a linear equation using least squares to minimize prediction errors. Decision Trees (DTs) are nonlinear models for classification and regression, structured with nodes and branches that apply decision rules from the root to leaf nodes, representing outcomes. Logistic Regression (LR) predicts the probability of an event by modeling log odds and can be binary, nominal, or ordinal based on the response variable type.
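The log-odds formulation of binary logistic regression can be written out directly. In the sketch below the coefficients, the intercept, and the two input features are made-up values chosen only to show the arithmetic; in practice they would be estimated from labeled data.

```python
import math

def log_odds(features, coefficients, intercept):
    """Linear predictor: intercept + sum of coefficient * feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

def probability(features, coefficients, intercept):
    """Logistic function applied to the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(features, coefficients, intercept)))

# Hypothetical fitted model with two descriptors.
coefficients = [1.2, -0.8]
intercept = -0.5

compound = [2.0, 1.0]
z = log_odds(compound, coefficients, intercept)
p = probability(compound, coefficients, intercept)
print(f"log odds = {z:.2f}, probability = {p:.3f}")
```

A one-unit increase in a feature changes the log odds by that feature's coefficient, which is what makes the fitted coefficients directly interpretable.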

Clustering Techniques:

K-means clustering is a popular approach for grouping similar data points into clusters, such that the points in a given cluster are more similar to one another than to those in other clusters. The algorithm starts from a predetermined number of centroids, assigns each data point to its closest centroid, and then recomputes each centroid as the average of all the points assigned to it. These two steps are repeated until the assignments no longer change. In contrast, agglomerative hierarchical clustering treats each data point as a separate cluster at first and then repeatedly merges the two nearest clusters. The merging continues until every point belongs to a single cluster, and the sequence of merges is displayed as a dendrogram.
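The assign-and-update loop of K-means can be sketched in plain Python. The toy points and the fixed starting centroids below are assumptions made for this example; practical implementations usually choose the starting centroids at random or with a seeding scheme and run on many more dimensions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, assignments) after the assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        if new_assignments == assignments:
            break                      # converged: nothing moved
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignments

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignments)
print(centroids)
```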

Dimension Reduction:

Principal Component Analysis (PCA) is a linear technique that reduces the dimensionality of a dataset while preserving most of its variability by identifying principal components and projecting the data onto them. The process includes standardization, computing the covariance matrix, extracting its eigenvalues and eigenvectors, and selecting the leading components. t-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear method that visualizes high-dimensional data in 2D or 3D by converting pairwise similarities into probabilities and minimizing the Kullback-Leibler divergence between the high- and low-dimensional representations^41,42.
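The PCA steps listed above can be followed by hand for two variables, where the eigenvalues of the 2 x 2 covariance matrix have a closed form. The sketch below uses made-up descriptor values and assumes that neither column is constant; the function names are choices made for this example, and real analyses would use a numerical library on many more dimensions.

```python
import math

def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]

def pca_2d(xs, ys):
    """Return (first component, scores, explained variance ratio)."""
    zx, zy = standardize(xs), standardize(ys)
    n = len(zx)
    # Covariance matrix [[a, b], [b, d]] of the standardized data.
    a = sum(v * v for v in zx) / (n - 1)
    d = sum(v * v for v in zy) / (n - 1)
    b = sum(u * v for u, v in zip(zx, zy)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix.
    mid = (a + d) / 2
    spread = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    largest, smallest = mid + spread, mid - spread
    # Eigenvector belonging to the largest eigenvalue.
    vx, vy = b, largest - a
    if vx == 0 and vy == 0:
        vx, vy = 1.0, 0.0
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)
    # Project each standardized point onto the first component.
    scores = [component[0] * u + component[1] * v for u, v in zip(zx, zy)]
    explained = largest / (largest + smallest)
    return component, scores, explained

# Two strongly correlated, hypothetical molecular descriptors.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
component, scores, explained = pca_2d(xs, ys)
print(component, round(explained, 4))
```

Because the two descriptors are almost perfectly correlated, nearly all of the variance falls on the first component, so one coordinate per compound is enough to summarize both.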

Applications of AI in drug discovery:

1. Virtual Screening:

Virtual screening involves computationally analyzing large compound libraries to identify potential drug candidates, a key step in drug discovery. Traditional methods like molecular docking and pharmacophore modeling often relied on rigid structures, limiting predictive accuracy. Modern machine learning (ML) approaches provide a more flexible and precise alternative, leveraging large datasets to uncover complex patterns in ligand-target interactions. ML models, trained on annotated datasets of known interactions, can identify subtle structural and physicochemical features linked to binding affinity, enabling accurate predictions for novel compounds. They also integrate diverse data, including protein structures, gene expression, drug properties, and phenotypic changes, to enhance performance. Popular ML techniques in virtual screening include support vector machines (SVMs), random forests, and deep learning models^43,44.

2. Target Identification and Validation:

AI-driven drug discovery accelerates the identification and validation of molecular targets by analyzing diverse datasets, such as drug databases and public libraries, using advanced techniques like deep autoencoders, relief algorithms, and binary classification. AI excels in uncovering novel targets and hidden patterns in large datasets that traditional methods might miss, revealing new biological pathways with therapeutic potential. While experimental methods like affinity pull-downs and genome-wide knockdown screens are labor-intensive and resource-heavy, AI-based computational approaches streamline the process, reducing time, effort, and resource requirements while enhancing efficiency and success rates^45-47.

3. Protein structure prediction:

Accurate prediction of protein 3D structures is vital for structure-based drug design. Machine learning and deep learning, particularly models like AlphaFold, have revolutionized this process by leveraging extensive protein sequence and structural data. These models identify patterns linking amino acid sequences to 3D structures, offering a faster, cost-effective alternative to traditional methods. AI-driven techniques, including molecular dynamics simulations and graph machine learning, enhance drug discovery by analyzing protein-drug interactions, predicting efficacy, and exploring drug repurposing opportunities. This integration of AI into protein structure prediction accelerates drug development and improves outcomes in biomedical research⁴⁸.

4. Predictive Modeling for ADME Properties:

Technological advancements have revolutionized drug discovery, with AI-driven simulations enhancing R&D efficiency and aiding in designing novel drug candidates. Failures in later stages of drug development often stem from pharmacokinetic (ADME) issues. AI models address these challenges by enabling faster, cost-effective screening of large chemical datasets. Pharmacokinetics (PK) and pharmacodynamics (PD) are key to understanding drug efficacy and safety. AI simulations optimize drug dosing by considering patient-specific factors like age and weight, improving outcomes and minimizing side effects. These tools predict drug concentration-time profiles, supporting therapeutic drug monitoring and personalized treatment⁴⁹.

5. Lead Identification:

In compound screening, AI-powered virtual screening helps quickly find potential drug candidates from large compound databases. AI also makes planning chemical synthesis easier through automated retrosynthesis pathway prediction. Additionally, AI-based models are important in identifying cell targets and improving cell sorting, making the process of separating cells more efficient⁵⁰.

6. Prediction of drug–protein interactions:

Drug–protein interactions are crucial for therapy success, enabling drug efficacy, repurposing, and minimizing adverse effects. AI techniques like SVM, RF, and deep learning predict these interactions with high accuracy. Wang et al.'s SVM model, trained on 15,000 interactions, identified new compounds and targets, while Yu et al.'s RF models integrated pharmacological and chemical data to predict drug-target associations. AI aids drug repurposing by reducing costs and directly qualifying drugs for clinical trials. Approaches like SOMs, DNNs, and cellular network-based platforms (e.g., deepDTnet) identify new therapeutic uses, including topotecan for multiple sclerosis and inhibitors for viral diseases like SARS-CoV and HIV. These tools ensure faster drug discovery with reduced adverse effects. AI also addresses polypharmacology by predicting off-target effects, as seen in platforms like Ligand Express and KinomeX, which analyze drug selectivity and bioactivity. These innovations improve safety and support novel drug design⁵¹.

7. AI in Chemical Synthesis:

AI has revolutionized chemical synthesis in drug discovery, enhancing efficiency and precision by optimizing reaction conditions and enabling autonomous, error-free synthesis. Automation, real-time monitoring, and AI integration accelerate workflows and expand the potential for complex molecule synthesis. However, reliance on AI risks oversimplifying reaction complexities, highlighting the need for balanced integration with chemical expertise⁵².

8. Prediction of Drug Toxicity with AI:

Reducing clinical trial failures and increasing the effectiveness of drug research depend on the ability to predict drug toxicity during preclinical phases. Conventional approaches frequently suffer from small datasets and crude models. Machine learning (ML) and deep learning (DL), two AI-based techniques, use a variety of data sources, including chemical structures, biological pathways, and clinical data, to provide more accurate toxicity predictions. Key areas have been the focus of recent developments in AI-based toxicity prediction models:

Cardiac Toxicity (hERG Prediction):

ML algorithms like RF, SVM, and DL methods like CNN and GNN are used to predict hERG toxicity, a key marker for cardiac risks. Models like HergSPred and DeepHit have achieved high accuracy and ROC-AUC scores, outperforming traditional methods.

LD50 Prediction:

AI models replace animal-based acute toxicity tests, using tools like FP-ADMET and QuantitativeTox to predict lethal dose values with improved accuracy, sensitivity, and specificity.

Drug-Induced Liver Injury (DILI):

AI techniques, including RF, kNN, and DL frameworks like DeepDILI, achieve high accuracy in predicting DILI, reducing drug recalls and enhancing patient safety.

Carcinogenesis Prediction:

AI models like CapsCarcino and DeepCarc address challenges in identifying carcinogenic compounds, especially with sparse datasets, improving predictive performance and reducing reliance on animal studies. These AI-driven tools optimize toxicity prediction, accelerate drug development, reduce costs, and enhance patient outcomes^9,53-57.
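The evaluation metrics named above for toxicity classifiers (accuracy, sensitivity, and specificity) follow directly from the confusion matrix. The sketch below uses invented labels, with 1 standing for "toxic" and 0 for "non-toxic", and assumes both classes are present; it does not reproduce results from any of the cited models.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # toxic, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # toxic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # safe, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # safe, flagged
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of toxic compounds detected
        "specificity": tn / (tn + fp),   # share of safe compounds cleared
    }

# Invented example: 4 toxic and 6 non-toxic compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters in toxicity prediction because the classes are often imbalanced and a missed toxic compound is usually costlier than a false alarm.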

9. De novo drug design:

AI has transformed de novo drug design, enabling the creation of novel drug-like molecules without relying on existing templates. By leveraging machine learning and deep learning, AI overcomes challenges like complex synthetic routes and bioactivity prediction. Generative models like VAEs and GANs, along with deep reinforcement learning (DRL), have shown success in generating molecules with desired properties and therapeutic targets. AI also enhances synthesis planning by identifying synthesizable structures and optimizing synthesis routes. Beyond small molecules, AI supports reaction prediction and mechanism exploration using techniques like DNNs and Monte Carlo tree searches, accelerating chemical space exploration. It also advances the understanding of protein-protein interactions (PPIs), a key area for therapeutic innovation. While challenges remain in bioactivity prediction and chemical space exploration, AI's integration into de novo design holds immense potential to accelerate the discovery of safe and effective drugs^45,58-59.

10. Clinical Trial Optimization:

AI tools are important in clinical trials because they help with identifying patient diseases, finding specific gene targets, and predicting how molecules will behave. They also improve how patients stick to their medication plans and make it easier to monitor risks, leading to more efficient and successful clinical trials^60-64.

CONCLUSION:

This review presented numerous applications of artificial intelligence (AI) that enhance the process of new drug discovery. AI encompasses several branches, including machine learning (ML), deep learning (DL), and natural language processing (NLP). ML helps predict how drugs will interact with targets and flag potential side effects. DL models, such as those that analyze images and molecular structures, help in understanding complex biological processes. NLP extracts valuable information from research papers and clinical data to guide drug discovery. AI relies on data from sources such as PubChem, ChEMBL, and DrugBank, as well as real-world evidence from patient records. This information is prepared in machine-readable formats such as molecular strings, molecular graphs, or 3D protein structures so that AI models can process it effectively. With these tools, AI can improve every step of drug discovery. It helps find potential drug candidates through virtual screening, designs entirely new molecules, predicts how drugs behave in the body, and helps design better clinical trials. AI is also well suited to finding new uses for existing drugs, saving time and resources. In short, AI brings together learning algorithms, large datasets, and advanced models to transform drug discovery, making the process faster, cheaper, and more reliable and leading to better treatments for patients.

REFERENCES

1. Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. Artificial intelligence in drug discovery and development. Drug Discovery Today. 2021; 26: 80–93. https://doi.org/10.1016/j.drudis.2020.10.010

2. Krosuri P, Prasuna GG. Artificial intelligence in drug discovery. Journal of Xidian University. 2023; 17(9): 734-753. http://xadzkjdx.cn/

3. Deng J, Johnson J, Yang Z. Artificial Intelligence in Drug Discovery: Applications and Techniques. 2021. http://dx.doi.org/10.48550/arXiv.2106.05386

4. Naman S, Sharma S, Baldi A. Reinventing Spice Authentication: Merging Artificial Intelligence Insights with Traditional Methods for Authentication of Cardamom. Res J Pharm Technol. 2024; 17(10): 4907-4.

5. Garza-Ulloa J. Artificial Intelligence: Predictive vs Generative vs New Mixing AI. Am J Biomed Sci and Res. 2024; 22(3): 491-500. www.biomedgrid.com

6. Sharma VK, Bharatam PV. Artificial Intelligence in Drug Discovery (AIDD). Current Research and Information on Pharmaceutical Sciences. 2022; 16(1): 3-7.

7. Zaeri N. Drug discovery for COVID-19 and related mutations using artificial intelligence. Res J Pharm Technol. 2023; 16(11): 5384–91.

8. Meenakshi K, Maragatham G. Computational intelligence in diagnosis and prognosis of gestational diabetes using deep learning. Res J Pharm Technol. 2019;12(8):3891–5.

9. Rehman AU, Li M, Wu B, Ali Y, Rasheed S, Shaheen S, et al. Role of Artificial Intelligence in Revolutionizing Drug Discovery. Fundamental Research. 2024. https://doi.org/10.1016/j.fmre.2024.04.021

10. Mishra DK, Awasthi H. Artificial Intelligence: A New Era in Drug Discovery. Asian Journal of Pharmaceutical Research and Development. 2021; 9(5): 87–92. http://dx.doi.org/10.22270/ajprd.v9i5995

11. Tabana Y, Babu D, Fahlman R, Siraki AG, Barakat K. Target identification of small molecules: an overview of the current applications in drug discovery. BMC Biotechnology. 2023; 23: 44.

12. Ha J, Park H, Park J, Park SB. Recent advances in identifying protein targets in drug discovery. Cell Chemical Biology. 2021; 28: 394–423.

13. Vora LK, Gholap AD, Jetha K, Thakur RRS, Solanki HK, Chavda VP. Artificial Intelligence in Pharmaceutical Technology and Drug Delivery Design. Pharmaceutics. 2023; 15: 1916.

14. Visan AI, Negut I. Integrating Artificial Intelligence for Drug Discovery in the Context of Revolutionizing Drug Delivery. Life. 2024; 14: 233.

15. Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, et al. PubChem 2023 update. Nucleic Acids Res. 2023; 51: D1373-D1380.

16. Gaulton A, Bellis LJ, Bento AP, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012 Jan; 40: D1100-7.

17. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Research. 2018; 46: D1074–D1082. https://doi.org/10.1093/nar/gkx1037

18. Irwin JJ, Sterling T, Mysinger MM, et al. ZINC: a free database of commercially available compounds for virtual screening. J Chem Inf Model. 2012; 52(7): 1757-68.

19. Chen J, Swamidass SJ, Dou Y, Bruand J, Baldi P. ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources. Bioinformatics. 2005; 21(22): 4133–4139. https://doi.org/10.1093/bioinformatics/bti671

20. Yamanishi Y, Araki M, Gutteridge A, et al. Drug Target Commons (DTC): a unified knowledgebase of drug targets and drug-target interactions. Nucleic Acids Res. 2011; 39: D1046-52.

21. Sorokina, M., and Steinbeck, C. COCONUT: The collection of Open Natural products. Journal of Cheminformatics. 2020; 12(1): 20. https://doi.org/10.1186/s13321-020-00424-9

22. Yang JJ, Chen Y, Suzek TO, et al. DGIdb: a database for genomic interactions with drugs and chemicals. Nucleic Acids Res. 2013; 41: D1069-75.

23. Srivastava R, Srivastava A, Kumar N. INPUT: Integrated Network for Proteins, Universes, and Targets for Systems Biology and Drug Target Identification. Database. 2014. https://doi.org/10.1093/database/bau092

24. Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER Database of Drugs and Side Effects. Nucleic Acids Research, 2016; 44: D1075–D1079. https://doi.org/10.1093/nar/gkv1075

25. Kuhn M, Szklarczyk D, Franceschini A, et al. STITCH 4: improved prediction of protein-chemical interactions with increased coverage and reliability. Nucleic Acids Res. 2014; 42: D339-46.

26. Varadi M, Anyango S, Deshpande M, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research. 2022; 50: D439–D444. https://doi.org/10.1093/nar/gkab1061

27. David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform. 2020; 12: 56.

28. O’Boyle NM. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J Cheminform. 2012; 4: 22. https://doi.org/10.1186/1758-2946-4-22

29. Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, et al. SELFIES and the future of molecular string representations. Patterns. Cell Press; 2022; 3(10): 100588.

30. Yuksel A, Ulusoy E, Unlu A, Dogan T. SELFormer: molecular representation learning via SELFIES language models. Mach Learn Sci Technol. 2023; 4(2): 025035.

31. Kosonocky CW, Feller AL, Wilke CO, and Ellington AD. Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches. Patterns. 2023; 4: 100865.

32. Cheng AH, Cai A, Miret S, Malkomes G, Phielipp M, Aspuru-Guzik A. Group SELFIES: a robust fragment-based molecular string representation. Digital Discovery. 2023; 2(3): 748–58.

33. O’Boyle NM, Dalke A. DeepSMILES: An adaptation of SMILES for use in machine-learning of chemical structures. Theoretical and Computational Chemistry. 2018. https://doi.org/10.26434/chemrxiv.7097960.v1

34. Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D. InChI, the IUPAC International Chemical Identifier. J Cheminform. 2015; 7: 23. https://doi.org/10.1186/s13321-015-0068-4

35. Pletnev I, Erin A, McNaught A, Blinov K, Tchekhovskoi D, Heller S. InChIKey collision resistance: An experimental testing. J Cheminform. 2012; 4: 29.

36. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery. 2019; 18: 463–77.

37. Ajay I. Patel, Pooja K. Khunti, Amit J. Vyas, Ashok B. Patel. Explicating Artificial Intelligence: Applications in Medicine and Pharmacy. Asian Journal of Pharmacy and Technology; 2022; 12(4): 401-6.

38. Chen W, Liu X, Zhang S, Chen S. Artificial intelligence for drug discovery: Resources, methods, and applications. Molecular Therapy Nucleic Acids. 2023; 31: 691–702.

39. Sarker IH. AI Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Computer Science. 2022; 3: 158.

40. Sanjay S. Patel, Sparsh A. Shah. Artificial Intelligence: Comprehensive Overview and its Pharma Application. Asian Journal of Pharmacy and Technology. 2022; 12(4): 337-8.

41. Patel K, Nand K, Student P. Artificial Intelligence and its Models Artificial Intelligence Journal of Applied Science and Computations. 2020; VII(II): 95-97. https://www.researchgate.net/publication/339472454

42. Rojanala R. Algorithms, Models and Applications of Artificial Intelligence. International Journal of Scientific Research in Computer Science Engineering and Information Technology. 2019; 5(4): 2456-3307.

43. Serrano DR, Luciano FC, Anaya BJ, Ongoren B, Kara A, Molina G, et al. Artificial Intelligence (AI) Applications in Drug Discovery and Drug Delivery: Revolutionizing Personalized Medicine. Pharmaceutics. 2024; 16(10): 1328.

44. Holey VS, Baitule AW. A Wide Application of Artificial Intelligence in Pharma Field. Asian Journal of Pharmaceutical Research. 2024; 14(4): 403-0.

45. Patil P, Nrip NK, Hajare A, Hajare D, Patil MK, Kanthe R, et al. Artificial Intelligence and Tools in Pharmaceuticals: An Overview. Res J Pharm Technol. 2023; 16(4): 2075–82.

46. Bairagi A, Singhai AK, Jain A. Artificial Intelligence: Future Aspects in the Pharmaceutical Industry an Overview. Asian Journal of Pharmacy and Technology. 2024; 14(3): 237-6.

47. R. R. Kulkarni, P. S. Pawar. Artificial Intelligence in Pharmacy. Asian Journal of Pharmacy and Technology. 2023; 13(4): 304-6.

48. Fatima MJ, Parthiban C. Artificial Intelligence [AI] -The Game Changer in Pharmaceutical Industry. Asian Journal of Pharmacy and Technology. 2024.

49. Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. ChemBioChem. 2024; 25(14): e202300816. https://doi.org/10.1002/cbic.202300816

50. Prusty A, Panda SK. The Revolutionary Role of Artificial Intelligence (AI) in Pharmaceutical Sciences. Indian Journal of Pharmaceutical Education and Research. 2024; 58: s768–76.

51. Mak KK, Pichika MR. Artificial intelligence in drug development: present status and future prospects. Drug Discovery Today. 2019; 24: 773–80.

52. Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. Artificial intelligence in drug discovery and development. Drug Discovery Today. 2021; 26: 80–93.

53. Adnan R. Ahmad. Chemical Reaction Prediction using Machine Learning. Research Journal of Pharmacy and Technology. 2024; 17(11): 5435-8. doi: 10.52711/0974-360X.2024.00831

54. Farghali H, Kutinová Canová N, Arora M. The Potential Applications of Artificial Intelligence in Drug Discovery and Development. Physiol Res. 2021; 70(4): S715-S722.

55. Nasnodkar S, Cinar B, Ness S. Artificial Intelligence in Toxicology and Pharmacology. Journal of Engineering Research and Reports. 2023; 25(7): 192-206.

56. Tran TTV, Surya Wibowo A, Tayara H, Chong KT. Artificial Intelligence in Drug Toxicity Prediction: Recent Advances, Challenges, and Future Perspectives. J Chem Inf Model. 2023; 63(9): 2628-2643.

57. Sudhaar P, SelvaPreethi S, Anjali G. Artificial Intelligence in Clinical Trials - Future Prospectives. Bioequiv and Bioavailab Int J. 2023; 7(1): 000196.

58. Meenakshi K, Maragatham G. Computational Intelligence in Diagnosis and Prognosis of Gestational Diabetes using Deep Learning. Research J. Pharm. and Tech. 2019; 12(8): 3891-3895. doi: 10.5958/0974-360X.2019.00669.3

59. Bora SJ, Chakravorty R, Gupta PD. The use of Artificial Intelligence in Pharmacy. Asian Journal of Pharmacy and Technology. 2023; 13(3): 229-4.

60. Ingale S, Shrisunder N, Gophane G, Birajdar A. Ascent of Artificial Intelligence (AI) in Pharmacy. International Journal of Technology. 2024; 14(1): 54-8.

61. Khirfan R, Kotb H, Atiyeh H. Utilizing Artificial Intelligence to Improve Patient Safety: Innovations, Obstacles, and Future Paths. Research Journal of Pharmacy and Technology. 2024; 17(9): 4630-6.

62. Mahajan BS, Mahale BSP, Pawar AR, Patil VV, Patil PS, Songire J. A Review on Artificial Intelligence in Pharmacy. Research Journal of Science and Technology. 2024; 16(2): 129-6.

63. Lakshmidevi Sigatapu, S. Sundar, K. Padmalatha, Sravya. K, D. Ooha, P. Uha Devi. Artificial Intelligence in Healthcare- An Overview. Asian Journal of Pharmacy and Technology. 2023; 13(3): 218-2.

64. Tahilani P, Swami H, Goyanar G, Tiwari S. The Era of Artificial Intelligence in Pharmaceutical Industries - A Review. Research Journal of Science and Technology. 2022; 14(3): 183-7.

Received on 29.12.2024 Revised on 22.06.2025

Accepted on 31.10.2025 Published on 10.02.2026

Available online from February 16, 2026

Research J. Pharmacy and Technology. 2026;19(2):970-978.

DOI: 10.52711/0974-360X.2026.00137

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Database	Description	Website URL	Developing Institution	Reference
PubChem	A freely available chemistry database with information on molecules, including their chemical structures, identifiers, physical and chemical properties, and biological activities.	https://pubchem.ncbi.nlm.nih.gov/	National Institutes of Health (NIH), USA; launched in 2004	15
ChEMBL	A manually curated database of bioactive, drug-like compounds. It gathers chemical, bioactivity, and genomic data to help translate genomic knowledge into effective new medications.	https://www.ebi.ac.uk/chembl/	European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK	16
DrugBank	A database with details on drugs, their targets, three-dimensional structures, and other pertinent information.	http://www.drugbank.ca/	Wishart Research Group, University of Alberta, Canada	17
ZINC	A free database of commercially available compounds for virtual screening.	https://zinc.docking.org/	Irwin and Shoichet Laboratories, University of California, San Francisco (UCSF), USA	18
ChemDB	A chemical database of almost 5 million commercially available small molecules, with physical and chemical properties that are either predicted or experimentally determined.	http://cdb.ics.uci.edu/	Institute for Genomics and Bioinformatics, University of California, Irvine (UCI), USA	19
DTC	A platform that uses crowd-sourcing to provide data on drug-target interactions and classifies the targets.	http://drugtargetcommons.fimm.fi/	Aalto University School of Science, Finland, and collaborators	20
COCONUT	A database with information on 407,270 distinct natural compounds, including their descriptions and molecular characteristics.	https://coconut.naturalproducts.net/	Friedrich Schiller University Jena, Germany	21
DGIdb	A database of drug-target interactions (DTI) and the druggable genome, compiled from more than 30 trusted sources.	http://www.dgidb.org/	McDonnell Genome Institute, Washington University in St. Louis, USA	22
INPUT	A platform for network pharmacology in traditional Chinese medicine that includes 29,812 substances derived from 4,716 Chinese plants.	http://cbcb.cdutcm.edu.cn/INPUT/	Department of Systems Biology and Translational Medicine, Texas A&M Health Science Center, USA	23
SIDER	A database that offers details on approved medications and their associated adverse reactions.	http://sideeffects.embl.de/	European Molecular Biology Laboratory (EMBL), Heidelberg, Germany	24
STITCH	A database of known and predicted interactions between chemical compounds and proteins, covering 9,643,763 proteins from 2,031 organisms.	http://stitch.embl.de/	European Molecular Biology Laboratory (EMBL), Heidelberg, Germany	25
AlphaFold DB	A database of high-quality predicted protein structures covering the proteomes of many organisms, including humans. It is an essential tool for structural biology and the development of new drugs.	https://www.alphafold.ebi.ac.uk/	European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI)	26
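
Several of the databases listed above can be queried programmatically as well as through their websites. As an illustration only, the following Python sketch looks up a compound by name in PubChem through its public PUG REST interface. The function names build_property_url and fetch_properties are hypothetical helpers introduced here, not part of any PubChem library, and the sketch assumes network access and that the service returns its usual JSON property table.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

DEFAULT_PROPERTIES = ("MolecularFormula", "MolecularWeight")


def build_property_url(name, properties=DEFAULT_PROPERTIES):
    """Build a PUG REST URL that requests properties for a compound name.

    Hypothetical helper: it only assembles the URL and performs no request.
    """
    encoded_name = quote(name, safe="")  # e.g. spaces become %20
    property_list = ",".join(properties)
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{property_list}/JSON"
    )


def fetch_properties(name, properties=DEFAULT_PROPERTIES, timeout=10.0):
    """Query PubChem and return a list of property records (dicts).

    Requires network access; assumes the response contains a
    PropertyTable/Properties structure.
    """
    url = build_property_url(name, properties)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["PropertyTable"]["Properties"]


if __name__ == "__main__":
    # Show the request URL without contacting the server.
    print(build_property_url("aspirin"))
```

Calling fetch_properties("aspirin") on a networked machine should return one record per matching compound, with the PubChem compound identifier (CID) alongside the requested properties. Request limits and available property names are set by PubChem and should be checked against its current documentation before use at scale.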