APPENDIX D   DATABASE FORMATS

 

IUPAC database

The IUPAC database (.iup or .txt) is a text file in which each line contains the IUPAC name of one molecule.

 

File format example:

Bis(2-naphthyl)methane
dinaphthalen-2-ylmethane
2-(naphthalen-2-ylmethyl)naphthalene
4H-benzo[d][1,3]dithiine
3,6,8-trioxabicyclo[3.2.2]nonane
2(10),3-Pinadiene
Perfluoro(1-methylperhydronaphthalene)
4-(Isopropylidenehydrazono)-2,5-cyclohexadiene-1-carboxylic acid
1-Ethylidene-5-(2-naphthyl)carbonohydrazide
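
Reading such a file amounts to collecting its lines. The following Python sketch illustrates this; the function name parse_iupac_database is hypothetical and not part of the program, and skipping blank lines and trimming surrounding whitespace are assumptions, since the format description does not address either case.

```python
from typing import List


def parse_iupac_database(text: str) -> List[str]:
    """Return one IUPAC name per non-empty line of the database text.

    Assumption: blank lines are ignored and surrounding whitespace is
    trimmed; the format description does not say how these are handled.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Names are kept verbatim, including brackets, commas, and internal spaces (as in "carboxylic acid"), because the whole line is the name.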

 

SMILES database

The SMILES database (.smi or .txt) is a text file in which each line contains two fields, the SMILES string and the molecule name, separated by one or more spaces or tabs. If the molecule name contains spaces, it must be enclosed in double quotes.

 

File format example:

CC(=O)Oc1ccccc1C(O)=O "Aspirin (ASA)"
CC1=CN(C2CC(N=NN)C(CO)O2)C(=O)NC1=O AZT
CN1C(=O)c2c([n]c[n]2C)N(C)C1=O Caffeine
[NH3+][Pt]([NH3+])(Cl)Cl Cisplatin
Nc1ccc(cc1)S(=O)(=O)c1ccc(N)cc1 Dapsone
CN1C(=O)CN=C(c2cc(Cl)ccc12)c1ccccc1 Diazepam
CNCC(O)c1cc(O)c(O)cc1 Epinephrine
CC12CCC3C(CCc4cc(O)ccc43)C1CCC2O Estradiol
CC(C)Cc1ccc(cc1)C(C)C(O)=O Ibuprofen
CN(C)CCCN1c2ccccc2CCc2ccccc12 Imipramine
CN1CCCC1c1c[n]ccc1 Nicotine
CN(C)CC1CCC(CSCCNC(=C[n](:o):o)NC)O1 Ranitidine
CC1OC(OC2C([nH]:c(:[nH2]):[nH2])C(O)C([nH]:c(:[nH2]):[nH2])C(O)C2O)C(OC2OC(CO)C(O)C(O)C2NC)C1(O)C=O Streptomycin
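
The two-field layout described above can be parsed as in the following Python sketch. The function names are hypothetical and not part of the program. Two points are assumptions rather than documented behavior: blank lines are skipped, and a line without a name field is rejected with an error.

```python
from typing import Iterator, Tuple


def parse_smiles_line(line: str) -> Tuple[str, str]:
    """Split one database line into (smiles, name).

    The SMILES string is the first whitespace-delimited token; the rest
    of the line is the molecule name, with enclosing double quotes removed.
    """
    # Split on the first run of spaces or tabs only.
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        # Assumption: a line lacking either field is treated as an error.
        raise ValueError(f"expected a SMILES string and a name, got: {line!r}")
    smiles, name = parts
    name = name.strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = name[1:-1]
    return smiles, name


def parse_smiles_database(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, name) for every non-empty line (assumption: blanks are skipped)."""
    for line in text.splitlines():
        if line.strip():
            yield parse_smiles_line(line)
```

The line is split by hand rather than with a shell-style tokenizer such as shlex, because SMILES strings may contain backslashes (used for double-bond stereochemistry), which such a tokenizer would interpret as escape characters.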