13.3 Included scripts

 

13.3.1 Introduction

The VEGA ZZ package includes a set of scripts in the ...\VEGA ZZ\Scripts directory, organized in the following sub-directory structure:

Scripts
  _Templates (hidden folder)
  Ammp
  Build
  Calculation
    Molinspiration
  Color
  Common
  Communication
  Database
  Docking
    AutoDock
    PLANTS
    Vina
    Other docking scripts
  Examples
  File conversion
  Interaction surface
  Movie
  Protein tools
    Homology modelling services
  PubChem
    Database rename
    Download
    Get
  QSAR
    Linear regression
    Virtual screening
  Trajectory
  Utilities

 

13.3.2 _Templates

This folder contains the templates used when a new script is created.

OpenGL.c Template for OpenGL C scripts.
    
Rebol.r Template for REBOL scripts.
    
Standard.c Template for standard C scripts.
    
Window API.c Template window with Close button (Windows API version).
    
Window GraphApp.c Template window with Ok button (GraphApp GUI version).
    
Window GraphApp Calc.c Template window with Calculate button. When this button is clicked, the main window is hidden and the abort dialog is shown; pressing its Abort button stops the calculation. This script template requires the GraphApp GUI.

 

13.3.3 Ammp

The scripts included in this directory run some AMMP jobs automatically.

Automatic Boltzmann jump.c This script performs a conformational analysis of the molecule in the current workspace using the Boltzmann jump algorithm. In detail, it generates 1000 conformations at a temperature of 500 K and minimizes each of them with the conjugate gradients algorithm (3000 steps, 0.01 RMS). The conformations are automatically saved to a DCD trajectory. After the conformational search, a cluster analysis discards the redundant conformations and keeps only the most significant conformers (one for each cluster). In particular, conformations whose average flexible torsion angle values differ by no more than 60 degrees are included in the same cluster. Two files are automatically generated: a trajectory file (* - clust.dcd) with the best conformer of each cluster and a text file (* - clust.ene) with the energy of the best conformer and the number of conformers of each cluster.
All calculation parameters can be changed by editing the definitions at the beginning of the script source code. For more information, see the conformational search section of the manual.
    
Dipole.c

Compute the dipole moment by AMMP. If the charges aren't assigned, they are assigned by the Gasteiger-Marsili method (see AMMP's DIPOLE command).

    
Interaction analysis.c Evaluate the non-bond interaction energy between two molecules. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. For more information, see ANALYZE command in AMMP manual.
The results are shown in VEGA ZZ console. AMMP shows the energy for each atom in the selection range:
Vnonbon internal lys.n 137 Eq -12.860423 E6 -1.397398 E12 2.191076
Vnonbon external lys.n 137 Eq 16.632879 E6 -5.806829 E12 9.177713
Vnonbon total lys.n 137 Eq 3.772456 E6 -7.204227 E12 11.368790

where internal is the intramolecular energy, external is the intermolecular (interaction) energy, total is the sum of the intramolecular and intermolecular energies, Eq is the electrostatic (coulombic) energy, and E6 and E12 are the Lennard-Jones terms. At the end of the atom dump, AMMP also shows:

Vnonbon total internal 151.439880
Vnonbon total external 2.272158
Vnonbon total 153.712067
153.712067 non-bonded energy
153.712067 total potential energy

where Vnonbon total internal is the total intramolecular energy, Vnonbon total external is the total intermolecular (interaction) energy and Vnonbon total is the total non-bond interaction energy (the sum of Vnonbon total internal and Vnonbon total external). Non-bonded energy and total potential energy are self-explanatory.

Finally, the results (Vnonbon total internal, Vnonbon total external and Vnonbon total) are copied to the clipboard.
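If you need the three summary values in your own post-processing, the console output shown above can be parsed with a few lines of code. The following is only a minimal sketch in Python (it is not part of the VEGA ZZ script); the function name parse_vnonbon_summary is hypothetical and the text layout is assumed to be exactly the one shown in the example output.

```python
def parse_vnonbon_summary(console_text):
    """Extract the three 'Vnonbon total ...' summary values from AMMP output."""
    summary = {}
    for line in console_text.splitlines():
        fields = line.split()
        if fields[:2] != ["Vnonbon", "total"]:
            continue
        if len(fields) == 4 and fields[2] in ("internal", "external"):
            # e.g. "Vnonbon total internal 151.439880"
            summary[fields[2]] = float(fields[3])
        elif len(fields) == 3:
            # e.g. "Vnonbon total 153.712067"
            summary["total"] = float(fields[2])
        # Longer lines are per-atom records and are skipped.
    return summary


# Sample lines copied from the output shown above.
SAMPLE_OUTPUT = """Vnonbon total lys.n 137 Eq 3.772456 E6 -7.204227 E12 11.368790
Vnonbon total internal 151.439880
Vnonbon total external 2.272158
Vnonbon total 153.712067
153.712067 non-bonded energy
"""

print(parse_vnonbon_summary(SAMPLE_OUTPUT))
```

The per-atom lines also begin with "Vnonbon total", so the sketch tells them apart from the summary lines by the number of fields.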

    
Neural network.c

Use AMMP's Kohonen neural network to find the 3D space-filling curve corresponding to the structure. If the charges aren't assigned, they are assigned by the Gasteiger-Marsili method (see AMMP's KOHONEN command).

    
Rigid docking.c

Perform a genetic algorithm rigid docking using AMMP. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. The ligand is moved to obtain the complex. Both molecules must have hydrogens; the charges are automatically assigned (Gasteiger-Marsili method) if they are missing.
This script has a graphical user interface (provided by the GraphApp library). To understand the meaning of each field, it's strongly recommended to read the GDOCK documentation.

 

13.3.4 Build

These scripts make it possible to build complex structures:

Aromaticity fix.c Fix the bond order in aromatic rings, changing the alternated single and double bonds to partial double bonds.
    
Coordinate transformation.c This script applies the specified transformation matrix to all atoms or to the visible/active atoms only (see the Active atoms only checkbox). It's useful to build multimeric structures from the information included in the REMARK 300 and 350 tags of PDB files.
REMARK 300
REMARK 300 BIOMOLECULE: 1
REMARK 300 THIS ENTRY CONTAINS THE CRYSTALLOGRAPHIC ASYMMETRIC UNIT
REMARK 300 WHICH CONSISTS OF 2 CHAIN(S). SEE REMARK 350 FOR
REMARK 300 INFORMATION ON GENERATING THE BIOLOGICAL MOLECULE(S).
REMARK 350
REMARK 350 GENERATING THE BIOMOLECULE
REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN
REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE
REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS
REMARK 350 GIVEN BELOW.  BOTH NON-CRYSTALLOGRAPHIC AND
REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN.
REMARK 350
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: B, A
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000      174.00000
REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000      174.00000            
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000

To build this homodimeric macromolecule:

  • Open the original PDB file.
  • Run the Coordinate transformation script.
  • Enter in the dialog window the values of the second BIOMT transformation (the three BIOMT rows with serial number 2, highlighted in red in the original manual).
  • Click the Apply button.
  • Reopen the original PDB file in the same workspace as the transformed structure and click Append in the dialog window.
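To make the role of the BIOMT rows clearer, here is a minimal sketch in Python of how such a transformation acts on a coordinate. It is not the code of the script: parse_biomt and apply_transform are hypothetical helper names, and the sketch assumes that each BIOMT row holds three rotation terms followed by one translation term, as in the REMARK 350 block above.

```python
def parse_biomt(pdb_lines):
    """Collect REMARK 350 BIOMT rows as {serial: [row1, row2, row3]}."""
    transforms = {}
    for line in pdb_lines:
        fields = line.split()
        # Expected fields: REMARK 350 BIOMTn serial r1 r2 r3 t
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            serial = int(fields[3])
            row = [float(value) for value in fields[4:8]]
            transforms.setdefault(serial, []).append(row)
    return transforms


def apply_transform(matrix, xyz):
    """Apply a 3x4 matrix (rotation plus translation) to one point."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


# The second transformation of the example above.
REMARKS = [
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: B, A",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000      174.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000      174.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]

matrix = parse_biomt(REMARKS)[2]
print(apply_transform(matrix, (10.0, 20.0, 30.0)))
```

Transformation 1 in the example is the identity, which is why only the second one has to be applied before appending the original structure.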
    
Graphite.r Create one or more graphite planes.
    
Nanotube.r

Generates single-walled carbon nanotube (SWCNT) structures. It's based on VBS code developed by Roberto G. A. Veiga at Instituto de Física - Universidade Federal de Uberlândia (UFU) - Brazil, using the algorithm described in the article: White et al., Phys. Rev. B, 1993, Vol. 47, No. 9, pp. 5485-88.

   
Protein mutagenesis.c This script generates mutated proteins from a template structure and a list of mutations. As a first step, it asks whether you want to generate all possible combinations of the mutations or only one mutation for each column of the mutation file. The output molecules are stored in a database and an additional CSV file is generated, containing the molecule names and their amino acid sequences.
The mutation list file must include one mutation per line in the following format:
ResName:ResNum:ChainID:MolNum List_Of_Aminoacids

where ResName is the name of the residue (max. 4 characters), ResNum is the residue number (max. 4 characters), ChainID is the chain identifier (1 character), MolNum is the molecule number (non-zero unsigned integer) and List_Of_Aminoacids is the list of the amino acids that will be sequentially substituted (max. 20 characters, single-letter amino acid codes). ChainID and MolNum are optional parameters, but if you want to specify the molecule number without indicating the chain, you can use * as the chain ID. Lines beginning with # or ; are treated as remarks.

 

Example:

; Mutation list example

THR:3 EYF
SER:6:Y AL

It generates 6 mutants, involving the residues in 3 and 6: EA, YA, FA, EL, YL and FL.
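The way the six mutants arise from the two lines of the example can be sketched as follows. This is only an illustration in Python of the file format and of the "all combinations" mode, not the code of the script; the function names and the dictionary keys are hypothetical.

```python
from itertools import product


def parse_mutation_list(text):
    """Parse 'ResName:ResNum:ChainID:MolNum List_Of_Aminoacids' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank lines and remarks
        target, residues = line.split()
        parts = target.split(":")
        chain = parts[2] if len(parts) > 2 and parts[2] != "*" else None
        entries.append({
            "res_name": parts[0],
            "res_num": parts[1],
            "chain": chain,
            "mol_num": int(parts[3]) if len(parts) > 3 else None,
            "residues": residues,
        })
    return entries


def all_combinations(entries):
    """One string per mutant, one letter for each mutated position."""
    return ["".join(combo)
            for combo in product(*(e["residues"] for e in entries))]


EXAMPLE = """; Mutation list example

THR:3 EYF
SER:6:Y AL
"""

entries = parse_mutation_list(EXAMPLE)
print(all_combinations(entries))
```

With three candidates at position 3 and two at position 6, the product contains 3 x 2 = 6 entries, matching the mutants listed above (the order in which the script generates them may differ).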

 

Each mutated protein is automatically minimized by NAMD 2 (5000 steps), keeping the backbone fixed.

 

WARNING:
To run this script, the NAMD 2 package and the parm.prm parameter file must be installed. For more details, see the related sections of the manual.

    
Protonation fix.c Fix the protonation state of the molecule in the current workspace, removing the acidic hydrogens (bonded to carboxylate, sulfonate, phosphite and phosphate groups) and adding the basic hydrogens (to the nitrogens of primary amines and guanidines).
    
Solvent cluster racemizer.c This script creates a racemic mixture from a solvent cluster of chiral molecules built from a single enantiomer. The solvent cluster must be opened in the current workspace.
    
Stereoisomers.c Build all possible stereoisomers of a chiral molecule that must be opened in the current workspace. Diastereoisomers are automatically minimized (conjugate gradients, 3000 steps, toler 0.01).
For safety reasons, the maximum number of chiral centers is limited to 8 (2^8 = 256 stereoisomers), but it can be increased to 32 by changing the VGP_MAX_CHIRAL_CENTERS and VGP_MAX_CHIRAL_CENTERSSTR definitions.

When you start the script, a file requester is shown in which you can choose the output format and the file name that is used as prefix, because each stereoisomer is named by adding the configuration of all stereocenters. Remember that if the bond order of the starting molecule is assigned incorrectly, the chirality attribution (according to the Cahn-Ingold-Prelog rules) could be wrong.
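The growth of the number of stereoisomers with the number of chiral centers can be illustrated with a short sketch in Python. The naming scheme used here (prefix, underscore, one R or S letter per stereocenter) is only an assumption for illustration and may differ from the names produced by the script.

```python
from itertools import product

MAX_CHIRAL_CENTERS = 8  # mirrors the default limit described above


def stereo_labels(n_centers, prefix="mol"):
    """Return one hypothetical output name per R/S configuration."""
    if n_centers > MAX_CHIRAL_CENTERS:
        raise ValueError("too many chiral centers")
    return [prefix + "_" + "".join(config)
            for config in product("RS", repeat=n_centers)]


print(stereo_labels(2))
print(len(stereo_labels(8)))
```

Each additional stereocenter doubles the number of structures to build and minimize, which is the reason for the default limit.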
    
Zero coord.c

Place the atoms at the specified coordinates. Checking Active atoms only, only the visible atoms are moved.

 

13.3.5 Calculation

This directory includes scripts for generic calculations:

APBS membrane energy.c Evaluate the energy required by a molecule to leave the hydration shell and reach a biological membrane. This calculation is performed by APBS and both solvents are implicitly defined by their dielectric constants (78.0 for water and 9.0 for the membrane).
This script uses APBS for Windows, which is included in the VEGA ZZ package. APBS is software for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/
    
APBS solvation energy.c Evaluate the solvation energy of the molecule in the current workspace by APBS. The results are shown in VEGA ZZ console and copied to the clipboard.
This script uses APBS for Windows, which is included in the VEGA ZZ package. APBS is software for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/
    
Copy properties.c Copy some molecular properties to the clipboard in selective mode.
    
Database properties.c This script calculates several properties for all molecules included in a database as 3D structures. The script asks for a database in one of the VEGA ZZ supported formats as input and for a CSV file as output. During the calculation, a log file is also created in which all errors are recorded.
This script is especially useful for those database formats that don't include molecular properties, such as Mol2, Sdf and Zip.
   
Druglikeness.c Check the druglikeness of the molecule in the current workspace. Two methods are used:

 

Lipinski's rule of five
This rule establishes that an orally active drug must have:

  • not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms);
  • not more than 10 (2 x 5) hydrogen bond acceptors (nitrogen or oxygen atoms);
  • a molecular weight under 500 g/mol;
  • a partition coefficient logP less than 5.

Ghose's rule
This rule establishes that an orally active drug must have:

  • partition coefficient logP in -0.4 to 5.6 range;
  • molar refractivity from 40 to 130;
  • molecular weight from 160 to 480;
  • number of heavy atoms from 20 to 70.

The molar refractivity is calculated according to the Ghose and Crippen method.
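The two rule sets listed above can be expressed as simple range checks. The following sketch in Python only illustrates the thresholds given in this section; it is not the code of Druglikeness.c, the function names are hypothetical, and the descriptor values in the usage example are invented for illustration.

```python
def lipinski_violations(h_donors, h_acceptors, mol_weight, logp):
    """Count the violated conditions of Lipinski's rule of five."""
    checks = [
        h_donors <= 5,
        h_acceptors <= 10,
        mol_weight < 500.0,
        logp < 5.0,
    ]
    return checks.count(False)


def ghose_violations(logp, molar_refractivity, mol_weight, heavy_atoms):
    """Count the violated conditions of Ghose's rule as stated above."""
    checks = [
        -0.4 <= logp <= 5.6,
        40.0 <= molar_refractivity <= 130.0,
        160.0 <= mol_weight <= 480.0,
        20 <= heavy_atoms <= 70,
    ]
    return checks.count(False)


# Invented descriptor values, used only to show the calls.
print(lipinski_violations(h_donors=2, h_acceptors=4, mol_weight=350.0, logp=2.5))
print(ghose_violations(logp=2.5, molar_refractivity=95.0, mol_weight=350.0,
                       heavy_atoms=25))
```

Returning the number of violations instead of a yes/no answer makes it easy to apply a tolerance (for example, accepting one violation) when filtering a database.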

 

References:
Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J.
"Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings"
Adv. Drug Delivery Rev. 1997, 23, 3-25.

Ghose, A. K.; Viswanadhan, V. N.; Wendoloski, J. J.
"A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases"
J. Comb. Chem. 1999, 1, 55-68.

    
Electrostatic energy.c

Evaluate the electrostatic energy of the molecule in the current workspace. The default dielectric constant is 1 (vacuum).

    
Log kw IAM.MG/DD2.c Since the scale of log kwIAM values was frequently found to mimic the drug/membrane interactions actually occurring in vivo better than lipophilicity in n-octanol, this script implements a method to predict the kwIAM for both MG and DD2 chromatographic columns. In particular, you can estimate the retention as log kw values for a molecule in the current workspace or, alternatively, for any molecule in the PubChem database. The results can be copied to the clipboard. If the descriptors used for the prediction or the calculated log kw are outside the prediction domain, warning messages are shown in the console.

You can choose between two prediction methods that use different approaches to predict log P: the former, more accurate, is based on miLogP and requires sending the data to the Molinspiration service; the latter, less accurate, is based on virtual log P and runs off-line. If you have to manage sensitive data that you don't want to share on the Web, you should choose the second method.

The miLogP-based predictions use these two correlation equations:

log kwIAM.MG = -0.1405 + 0.4401 miLogP + 0.0536 HeavyAtoms - 0.0833 HLBM - 0.0435 FlexDihedrals
n = 204   r^2 = 0.81   q^2 = 0.80   SE = 0.438   F = 213.92   P < 1.0 x 10^-8   PC = 39.403

log kwIAM.DD2 = -2.3989 + 0.4936 miLogP + 0.4354 Vdiam - 0.0640 HLBPSA - 0.0497 FlexDihedrals
n = 160   r^2 = 0.85   q^2 = 0.84   SE = 0.459   F = 212.94   P < 1.0 x 10^-8   PC = 33.974

where:

FlexDihedrals = number of flexible dihedral angles;
HeavyAtoms = number of heavy atoms;
HLBM = hydrophilic-lipophilic balance (HLB) as the mean of HLBD, HLBG and HLBPSA;
HLBPSA = hydrophilic-lipophilic balance (HLB) calculated as the ratio between PSA and total surface;
Vdiam = volume diameter in Å.

The virtual log P-based predictions use these other correlation equations:

log kwIAM.MG = -0.3867 + 0.4159 VirtualLogP + 0.0741 HeavyAtoms - 0.0806 HLBG - 0.0657 FlexDihedrals
n = 205   r^2 = 0.75   q^2 = 0.74   SE = 0.501   F = 151.79   P < 1.0 x 10^-8   PC = 51.739

log kwIAM.DD2 = -3.0812 + 0.4809 VirtualLogP + 0.5464 Vdiam - 0.0765 HLBPSA - 0.0829 FlexDihedrals
n = 161   r^2 = 0.80   q^2 = 0.79   SE = 0.523   F = 155.22   P < 1.0 x 10^-8   PC = 44.319

where:

HLBG = hydrophilic-lipophilic balance (HLB) calculated with Griffin's method.

 

 

WARNING:
These equations are valid only for neutral non-ionic molecules.
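The four regression equations given above translate directly into code. The sketch below, in Python, only restates those equations with their published coefficients; the function and argument names are hypothetical, the descriptors must be computed elsewhere, and no check of the prediction domain is performed.

```python
def log_kw_iam_mg_milogp(milogp, heavy_atoms, hlb_m, flex_dihedrals):
    """log kwIAM.MG from miLogP (first equation above)."""
    return (-0.1405 + 0.4401 * milogp + 0.0536 * heavy_atoms
            - 0.0833 * hlb_m - 0.0435 * flex_dihedrals)


def log_kw_iam_dd2_milogp(milogp, v_diam, hlb_psa, flex_dihedrals):
    """log kwIAM.DD2 from miLogP (second equation above)."""
    return (-2.3989 + 0.4936 * milogp + 0.4354 * v_diam
            - 0.0640 * hlb_psa - 0.0497 * flex_dihedrals)


def log_kw_iam_mg_virtual(virtual_logp, heavy_atoms, hlb_g, flex_dihedrals):
    """log kwIAM.MG from virtual log P (third equation above)."""
    return (-0.3867 + 0.4159 * virtual_logp + 0.0741 * heavy_atoms
            - 0.0806 * hlb_g - 0.0657 * flex_dihedrals)


def log_kw_iam_dd2_virtual(virtual_logp, v_diam, hlb_psa, flex_dihedrals):
    """log kwIAM.DD2 from virtual log P (fourth equation above)."""
    return (-3.0812 + 0.4809 * virtual_logp + 0.5464 * v_diam
            - 0.0765 * hlb_psa - 0.0829 * flex_dihedrals)


# Invented descriptor values, used only to show the call.
print(round(log_kw_iam_mg_milogp(2.0, 20, 5.0, 4), 4))
```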

   
Mopac.r  Run multiple Mopac jobs.
XLOGP2.c Calculate the logP by the XLOGP V2 method. The result is shown in the VEGA ZZ console and copied to the clipboard.
This script requires X-Score 1.3 for Windows, which is not included in the VEGA ZZ package. To install X-Score, read the X-Score script manual.

 

For more information about X-Score and XLOGP, visit http://www.sioc-ccbg.ac.cn/

 

13.3.5.1 Molinspiration

Scripts to access Molinspiration services.

Database properties.c This script calculates all properties provided by the Molinspiration Web site for each molecule of a database. The output is written to a CSV file. The calculated properties are:
  • SMILES string (smiles);
  • number of atoms (natoms);
  • number of rotatable bonds (nrotb);
  • number of violations of Lipinski's rule (nviolations);
  • number of N and O atoms (H-bond acceptors, nON);
  • molecular volume (volume);
  • topological polar surface area (TPSA);
  • number of hydrogens bound to N and O atoms (H-bond donors, nOHNH);
  • log P (miLogP, logP);
  • molecular weight (MW).

WARNING:
The access to the service is limited to 1000 requests per month and when you reach this limit, an error message is shown.

    
miLogP.c This script gets the miLogP value of the molecule in the current workspace from Molinspiration Web server. If it's impossible to perform the calculation, an error message is shown.
    
TPSA.c This script gets the Topological Polar Surface Area (TPSA) of the molecule in the current workspace from Molinspiration Web server. If it's impossible to perform the calculation, an error message is shown.

 

 

13.3.6 Color

Scripts to color the molecule:

Color RasMol.c Color the molecule using the RasMol color scheme.
    
Color VMD.c

Color the molecule using the VMD color scheme.

 

13.3.7 Common

This directory contains the initialization scripts to include in REBOL scripts:

Fmod.r Fmod commands.
    
Formats.r File format keywords and other definitions.
    
Utils.r Functions for path manipulation.
    
Vega.r VEGA ZZ interface (don't change it without a good reason!).
    
Vutils.r REBOL/View utilities.

The C header files contained in this directory are hidden and can't be changed directly from the VEGA ZZ environment.

 

13.3.8 Communication

This directory includes communication and Internet-related scripts:

ActiveSync VRML send.c

Convert the molecule in the current workspace to VRML and send it to mobile devices (e.g. PocketPC) using Microsoft ActiveSync. The molecule can be shown using a pocket VRML viewer (e.g. Pocket Cortona). Because of the hardware limits of mobile devices, avoid transferring overly complex molecules. The script requires Microsoft ActiveSync and works with all Windows versions.

    
Download molecule from URL.c This script downloads a molecule from a given URL.
   
E-mail PDB send.c Save the molecule in PDB format, compress it and attach it to a user-editable e-mail. This script uses the MAPI layer and so it's compatible with MAPI compliant e-mail clients only (e.g. Outlook, Outlook Express, etc). To change the output format or other settings, see the script source code.
    
FTP put.r

Copy the molecule in the current workspace to a remote host via FTP.

    
IrDA VRML send.c

Convert the molecule in the current workspace to VRML and send it to mobile devices (e.g. PocketPC) over an infrared link. The molecule can be shown using a pocket VRML viewer (e.g. Pocket Cortona). Because of the hardware limits of mobile devices, avoid transferring overly complex molecules. The script requires at least Windows 2000.

    
Web server.r

Micro Web server for on-line manual.

 

13.3.9 Database

This directory includes scripts to manage databases:

Count functional groups.c This script creates a CSV file including the counts of the functional groups for each molecule in a database. The functional groups are recognized by the GROUPS ATDL template. To check it, select Calculate → Charges & Pot. in the main menu, click GROUPS in the Force field box and click the edit button at the top right corner of the dialog window. The resulting file can be useful for QSAR analysis.
    
Database expander.r A REBOL/View script that extracts the molecules contained in a database to a directory. It lets you specify the file format, the compression and the save attributes (connectivity and constraints).
    
Database logP.c Calculate the logP by Testa's MLP method for each molecule in the database and export the results to a CSV file (Output file). The input must be a supported database (Input database) and its structures can be pre-processed by adding the hydrogens (Add the hydrogens) with the geometry method (default) or the bond order method (Use bond order). The latter is recommended if the molecules have an assigned bond order. In the pre-processing phase, the structures can be optimized by the steepest descent (Steepest minimization) and/or the conjugate gradients (Conjugate minimization) methods. For both minimization algorithms, it's possible to set the number of iterations (Steps), the toler value (Toler) and the dielectric constant (Dielectric). If Update the graphic is checked, the 3D graphic output is updated every 20 minimization steps. Increasing the Dot density value improves the logP prediction. A good value is from 10 to 50 dots per Å^2.

 

WARNING:

Even if in theory it's possible to manage a 2D database by adding the hydrogens with the bond order method and optimizing the structures, this procedure is not recommended because the distance geometry optimization is not performed. A better choice is to convert the database from 2D to 3D (see the Database 2D to 3D.c script) and to use the resulting database directly to predict the logP values.

    
Database volume.c Calculate the volume of each molecule in the database. It has the same options as the Database logP.c script.
    
Database to 0D.c Convert a 2D or 3D database to a 0D SDF database, translating all atoms to the specified coordinates, usually (0, 0, 0).
    
DrugBank SDF fix.c The DrugBank SDF files aren't standard, because the header of each record has only two lines instead of three and the first line contains a tab character that separates the molecule name from the DrugBank ID.
This script creates a new file, adding the _fix.sdf suffix to the file name, in which the missing line is added and the tab character and the "SDF file of " string are removed from the molecule name line.
    
Force field check.c This script assigns the force field to each molecule in the database and checks if it is correctly assigned. An output file is created in the same directory as the database file and named after the database, followed by the - force field check.txt suffix.
This script is useful to check if there are problems in the atom type assignment before running a virtual screening calculation.
    
Mol2 merge.c Join two or more databases in Mol2 format into a new file. This script doesn't perform any change to the data and therefore it's extremely fast.
    
SDF merge.c Join two or more databases in SDF format into a new file. This script doesn't perform any change to the data and therefore it's extremely fast.
    
SDF metadata extractor.c This script extracts the metadata (e.g. InChI, SMILES, biological activity, etc.) from an SDF file and puts it into a Comma Separated Values (CSV) file. The output file is placed in the same directory as the source database and its name is generated from the database name by adding the _meta.csv suffix.
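For readers unfamiliar with the layout of SDF data items, the following sketch in Python shows the general idea of such an extraction. It is not the code of the script: the function names are hypothetical, the field separator of the real output file is not documented here (a comma is assumed), and the sample record is invented. It relies only on the standard SDF conventions: records end with a $$$$ line, the first line of a record is the molecule name, and each data item starts with a "> <TAG>" line followed by its value and a blank line.

```python
import csv
import io


def extract_sdf_metadata(sdf_text):
    """Return one dictionary (name plus data items) per SDF record."""
    records = []
    for block in sdf_text.split("$$$$"):
        if block.startswith("\n"):
            block = block[1:]  # drop the newline that followed "$$$$"
        lines = block.splitlines()
        if not any(line.strip() for line in lines):
            continue
        record = {"Name": lines[0].strip()}
        index = 0
        while index < len(lines):
            line = lines[index]
            index += 1
            if line.startswith(">") and "<" in line:
                tag = line[line.index("<") + 1:line.rindex(">")]
                values = []
                # The value runs until the next blank line.
                while index < len(lines) and lines[index].strip():
                    values.append(lines[index].strip())
                    index += 1
                record[tag] = " ".join(values)
        records.append(record)
    return records


def metadata_to_csv(records):
    """Write the records as CSV text, one column per tag found."""
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented two-record sample with empty molecule blocks.
SAMPLE = """mol_a
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SMILES>
CCO

> <Activity>
1.5

$$$$
mol_b
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SMILES>
CCN

$$$$
"""

print(metadata_to_csv(extract_sdf_metadata(SAMPLE)))
```

Records that lack a given tag simply get an empty cell in the corresponding column.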
    
SMILES to database.c This script converts the SMILES molecules of a CSV file to 3D and puts them in a database. The CSV file must have two fields for each line separated by a semicolon (;): the former must be the molecule name and the latter must be the SMILES string.
    
Splitter.c This script splits a database into several files, which can be useful to distribute calculations on different PCs. The Input database must be in one of the formats supported by VEGA ZZ.
    
Subset creator.c Create a new database in SQLite format, including a subset of molecules of another database. The molecules must be specified in a text file in which molecule names (not ID) are placed one for each line.
The subset database is created in the same directory as the source, preserving its name as prefix and adding the _subset.db suffix. A log file is also generated, in which possible problems are reported.

This script was specially developed to prepare input databases for virtual screening studies.
    
ZINC get by ID.c This script downloads a structure from ZINC database to the current workspace by specifying the molecule ID. If the code is wrong or the entry doesn't exist, an error message is shown.

 

13.3.10 Docking

Scripts for molecular docking.

 

13.3.10.1 AutoDock

These scripts prepare input files for AutoDock 4:

Box calc.c Calculate the box dimensions and its center coordinates containing the active (visible) atoms and show the results in the console. This script is useful to define a macromolecule region to dock ligands.
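The quantities reported by a box calculation of this kind can be sketched as follows. This is only an illustration in Python, assuming an axis-aligned box with no extra margin around the atoms; the function name is hypothetical and the script may apply different conventions.

```python
def bounding_box(coords):
    """Return (size, center) of the axis-aligned box enclosing the atoms."""
    xs, ys, zs = zip(*coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    return size, center


# Invented coordinates of three visible atoms.
atoms = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (5.0, 1.0, 1.0)]
print(bounding_box(atoms))
```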
    
DLG to PDB multimodel.c Convert an AutoDock 4 DLG output to a standard PDB multimodel file, keeping the energy information in the remarks. This conversion is not required by VEGA ZZ, which reads DLG files as trajectories, but it is needed by programs that are unable to manage this kind of file.
    
Ki calculator.c Evaluate the Ki and the interaction energy of a given ligand - receptor complex. This script is useful to recalculate the AutoDock 4 score after an energy minimization (e.g. performed by NAMD). This calculation requires at least two molecules in the workspace and atom constraints defining the region in which the AutoDock 4 grid maps will be calculated. Only the free atoms are considered to define this region. If there are more than two molecules or the ligand is ambiguous, the script asks you to specify the molecule ID of the ligand.
The results are shown in the VEGA ZZ console and copied to the clipboard.
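As background, an inhibition constant is related to a binding free energy by the standard thermodynamic relation Ki = exp(ΔG / RT). The sketch below, in Python, shows this conversion; it assumes that the energy is expressed in kcal/mol and that the temperature is 298.15 K, and it is not guaranteed to reproduce the exact constants used by the script or by AutoDock 4.

```python
import math

GAS_CONSTANT = 1.9872e-3  # kcal/(mol K)
TEMPERATURE = 298.15      # K, assumed value


def ki_from_binding_energy(delta_g, temperature=TEMPERATURE):
    """Ki (mol/l) from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


def binding_energy_from_ki(ki, temperature=TEMPERATURE):
    """Binding free energy in kcal/mol from Ki (mol/l)."""
    return GAS_CONSTANT * temperature * math.log(ki)


# Invented energy value, used only to show the call.
print(ki_from_binding_energy(-8.0))
```

More negative binding energies give smaller Ki values, so a change of about 1.4 kcal/mol at room temperature corresponds to roughly one order of magnitude in Ki.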
    
Ligand.c Prepare and save the current molecule as ligand for AutoDock 4, performing these steps:
  1. If needed, add the hydrogens by protein method.
  2. If required, assign the atom charges.

If the molecule has two dimensions only, the 2D to 3D conversion is performed as explained below:

  1. Send the molecule to AMMP.
  2. Gauss-Seidel distance geometry optimization (15 steps).
  3. Steepest descent energy minimization (50 steps, toler = 1).
  4. Conjugate gradients energy minimization (3000 steps, toler = 0.01).
  5. Send the resulting structure to VEGA ZZ.

These steps are performed for both 2D and 3D structures:

  1. Fix the atom types, applying the AutoDock force field.
  2. Remove the apolar hydrogens.
  3. Save the molecule in PDBQT format.
    
Receptor.r Prepare and save the molecule in the current workspace as receptor for AutoDock 4, performing these steps:
  1. If needed, add the hydrogens by protein method.
  2. If required, assign the atom charges.
  3. Fix the atom types, applying the AutoDock force field.
  4. Remove the apolar hydrogens.
  5. Save the molecule in PDBQT format.
  6. Run AutoGrid4 to calculate the maps if the user confirms the operation.

The pre-defined docking box is set to explore the entire receptor, but if you want to explore a specific protein region, you must select the atoms defining that region before running the script.
The grid spacing is automatically adjusted if the number of grid points along an axis exceeds 63, because AutoGrid 4 and AutoDock 4 can't manage grids larger than 63x63x63 points.

 

13.3.10.2 PLANTS

These scripts are useful to manage PLANTS docking software.

Docking.c This script performs a molecular docking or a virtual screening calculation with the PLANTS software, which must be installed as explained in the manual and in the PLANTS node of the script tree. The receptor and the ligand must be in Sybyl Mol2 format and, if you want to run a virtual screening, the ligands must be included in a Mol2 database (Mol2 multimodel format).

In the graphic interface, some parameters can be set:

  • Receptor
    File name of the target macromolecule (receptor) in Mol2 format.
  • Ligand
    File name of the ligand to dock in Mol2 format.
  • Output directory
    Directory in which the output files will be created. This field is automatically completed by selecting receptor and ligand.
  • Flex. residues
    List of residues (space separated, in ResNum format) whose side chains will be considered flexible. By clicking the Get button, the field is automatically filled with the residues that are active in the current workspace.
  • Center
    X, Y, Z coordinates of the binding site center.
  • Radius (Å)
    Radius in Å of the sphere including the binding site atoms. By clicking the Calc. button, the Center and Radius fields are automatically filled, considering as binding site the visible atoms of the molecule shown in the current workspace.
  • Clusters
    Number of solution clusters.
  • RMSD
    Root Mean Square Deviation for the cluster analysis.
  • Multimodel output
    If this box is checked, all solutions are saved in a single multimodel file in Mol2 format.
  • Atom scoring
    This checkbox allows the scoring values of each atom to be saved in the Mol2 output. The atomic charges are replaced by scoring values.
  • Rigid ligand
    The ligand is kept rigid.
  • Shape constraints
    In these fields, you can specify the molecule and the weight that is used for the volume overlap calculation (the more ligand atoms overlap, the better). For more information, read the PLANTS manual.
  • Score
    Scoring function (chemplp, plp and plp95).
  • Search
    Search mode: speed1 (highest reliability, slowest settings), speed2 (good reliability, twice as fast as speed1) and speed4 (modest reliability, four times as fast as speed1).
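The values placed in the Center and Radius fields describe a sphere that encloses the binding site atoms. A simple way to obtain such a sphere is sketched below in Python; the use of the geometric centroid as center is an assumption made for illustration, and the function name is hypothetical, so the Calc. button may give slightly different numbers.

```python
import math


def binding_site_sphere(coords):
    """Return (center, radius) of a sphere enclosing all given atoms."""
    count = len(coords)
    center = tuple(sum(atom[i] for atom in coords) / count for i in range(3))
    radius = max(math.dist(center, atom) for atom in coords)
    return center, radius


# Invented coordinates of the visible atoms.
site = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(binding_site_sphere(site))
```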

Clicking the Run button starts the calculation and shows a window in which the run can be stopped by clicking the Abort button.

 

WARNING:

If you close VEGA ZZ, the PLANTS calculation is not stopped, but when it finishes, the script doesn't convert the output files into a form that can be read directly by Microsoft Excel.

 

For more information about PLANTS, visit http://www.tcd.uni-konstanz.de/

 

PLANTS installation:

  • Complete the registration form in download page at http://www.tcd.uni-konstanz.de/
  • Download the PLANTS Win32 (minGW).
  • Rename the file to Plants.exe and copy it to the ...\VEGA ZZ\Bin\Win32 directory, where ...\VEGA ZZ is the VEGA installation directory.
  • Download mingwm10.dll and copy it to ...\VEGA ZZ\Bin\Win32 directory.

If you installed the 1.1 version built by Mingw32, it's strongly recommended to patch it by running Patch bin 1.1 script.

Patch bin 1.1.c This script applies a patch to the PLANTS 1.1 binary (Mingw32 version) in order to fix the S.O and S.O2 atom types, which are incorrectly defined as S.o and S.o2.
A backup copy of the original version of PLANTS is made in ...\VEGA ZZ\Bin\Win32 directory (Plants.bak).
Receptor.c This script saves the receptor in the current workspace to be used in PLANTS calculations. In particular, it marks the backbone atoms and bonds by BACKBONE label that is required to consider the flexibility of the receptor side chains during the docking.

WARNING:
If you don't need to consider the receptor flexibility, you can save a normal Sybyl Mol2 file from VEGA ZZ main menu.
Rescore ChemPlp.c

 

Rescore Plp.c

 

Rescore Plp95.c

Evaluate the ligand - receptor interaction energy by ChemPlp, Plp and Plp95 scoring functions implemented in PLANTS. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand.
The results are shown in VEGA ZZ console and copied to the clipboard.
This script requires PLANTS for Windows that is not included in VEGA ZZ package.
RMSD calc.c This script calculates the root mean square deviation (RMSD) of a given set of poses obtained by a docking calculation. The first pose of each ligand, which in the case of PLANTS is the best ranked one, is taken as the reference structure. The script also calculates the RMSD after aligning each pose to the reference one (ALNRMSD), which is useful to evaluate the conformational changes between the poses, and the mean values of both types of RMSD. You only need to specify the database including the docking poses (without the target/protein) and the name of the output CSV file.

You can also give as input a database including both receptors and ligands. The script tries to detect the ligand automatically and, when that is not possible, a requester is shown.
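
As a rough illustration of the unaligned RMSD described above, here is a minimal Python sketch. It is not the code of RMSD calc.c: the function names are hypothetical, each pose is assumed to be a list of (x, y, z) tuples with the atoms in the same order, and the mean is assumed to be taken over the poses other than the reference.

```python
import math

def rmsd(reference, pose):
    """RMSD between two coordinate lists with identical atom ordering."""
    if len(reference) != len(pose) or not reference:
        raise ValueError("both structures need the same, non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, pose):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))

def mean_rmsd(poses):
    """Mean RMSD of every pose against the first one (assumed reference)."""
    reference = poses[0]
    values = [rmsd(reference, pose) for pose in poses[1:]]
    return sum(values) / len(values) if values else 0.0
```

The aligned variant (ALNRMSD) would additionally superimpose each pose on the reference before measuring the deviation; that step is left out of this sketch.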

 

13.3.10.3 Vina

These scripts prepare the input files for AutoDock Vina and run it:

Docking.c This script performs a molecular docking calculation using AutoDock Vina. The receptor and the ligand files must be in PDBQT format and can be prepared by Receptor.c and Ligand.c scripts. In the graphic interface of this script, you can specify the following parameters:
  • Receptor
    File name of the target macromolecule (receptor) in PDBQT format.
  • Ligand
    File name of the ligand to dock in PDBQT format.
  • Output model
    File name of the ligand poses in PDBQT multimodel format. Remember that this file doesn't include the receptor structure.
  • Log file
    Vina log file name.
  • Center
    X, Y, Z coordinates of the binding site center.
  • Size (Å)
    Dimensions in Å of the cube including the binding site.
  • Exhaustiveness
    Exhaustiveness of the global search (roughly proportional to time).
  • Binding modes
    Maximum number of binding modes to generate.

For more information about AutoDock Vina, click here.
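
The parameters listed above correspond to options that Vina can also read from a configuration file. The following Python sketch shows how such a file could be assembled; the function name is hypothetical, the defaults are Vina's own (exhaustiveness 8, 9 binding modes), and the optional log entry is only understood by older Vina releases. It does not reproduce what Docking.c writes.

```python
def vina_config(receptor, ligand, out, center, size,
                exhaustiveness=8, num_modes=9, log=None):
    """Return the text of a Vina configuration file, one 'key = value' per line."""
    settings = {
        "receptor": receptor,          # target macromolecule (PDBQT)
        "ligand": ligand,              # ligand to dock (PDBQT)
        "out": out,                    # output poses (PDBQT multimodel)
        "center_x": center[0],
        "center_y": center[1],
        "center_z": center[2],
        "size_x": size[0],
        "size_y": size[1],
        "size_z": size[2],
        "exhaustiveness": exhaustiveness,
        "num_modes": num_modes,
    }
    if log is not None:
        settings["log"] = log          # accepted by older Vina versions only
    return "".join(f"{key} = {value}\n" for key, value in settings.items())
```

The resulting text can be saved to a file and passed to Vina with its --config option.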

    
Ligand.c Prepare and save the current molecule as ligand for Vina, performing these steps:
  1. If needed, add the hydrogens by protein method.
  2. If required, assign the atom charges.

If the molecule has two dimensions only, the 2D to 3D conversion is performed as explained below:

  1. Send the molecule to AMMP.
  2. Gauss-Seidel distance geometry optimization (15 steps).
  3. Steepest descent energy minimization (50 steps, toler = 1).
  4. Conjugate gradients energy minimization (3000 steps, toler = 0.01).
  5. Send the resulting structure to VEGA ZZ.

These steps are performed for both 2D and 3D structures:

  1. Fix the atom types, applying the Vina force field.
  2. Remove the apolar hydrogens.
  3. Save the molecule in PDBQT format.
    
Receptor.c Prepare and save the molecule in the current workspace as receptor for Vina, performing these steps:
  1. If needed, add the hydrogens by protein method.
  2. If required, assign the atom charges.
  3. Fix the atom types, applying the Vina force field.
  4. Remove the apolar hydrogens.
  5. Save the molecule in PDBQT format.
    
Virtual screening.c This script performs structure-based virtual screenings by AutoDock Vina. To run one, you need:
  • the receptor structure in PDBQT format. You can prepare it from any type of file using the Receptor.c script;
  • the database containing the ligands to screen. It can be in any format supported by VEGA ZZ (Microsoft Access, Merck MMD, Mol2 multimodel, ODBC data source, SDF file, SQLite and Zip archive). The database doesn't need to be prepared before the screening, because the script is able to detect the missing features and fix them. In particular, it can add hydrogens using the best strategy, fix the atomic charges and convert structures from 2D to 3D.

The graphical user interface of this script lets you set up the screening easily by changing the following parameters:

  • Receptor
    File name of the target macromolecule (receptor) in PDBQT format.
  • Energies
    Output file in localized CSV format containing the energy of the best pose for each ligand. The first column is the molecule progressive number (MolID), the second one is the molecule name (Name) and the third one is the Vina energy of the best pose (Energy).
  • Ligand database
    Database of the ligands to screen.
  • Output models
    File name of the Zip archive in which the poses in PDBQT multimodel format are stored. The script adds a numerical suffix to the file name, which is incremented automatically every time the file size exceeds the limit of 2 GB.
  • Log file
    Log file name.
  • Center
    X, Y, Z coordinates of the binding site center.
  • Size (Å)
    Dimensions in Å of the cube including the binding site.
  • Exhaustiveness
    Exhaustiveness of the global search (roughly proportional to time, default 8).
  • Binding modes
    Maximum number of binding modes to generate (default 1).


Clicking the Calc button, the Center and Size fields are automatically filled in using the atoms selected in the current workspace, which are considered as the binding site.
Clicking Save cfg saves the current configuration, which can be restored by clicking Load cfg. The resulting .vcf file is not compatible with Vina, while the one generated by Docking.c is (see the --config option of Vina).
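
How the Calc button derives Center and Size from the selected atoms is not documented here. One plausible approach, given as a sketch under that assumption, is to take the midpoint of the bounding box of the selection as the center and its largest extent plus a margin as the cube edge. The function name and the 4 Å margin are hypothetical.

```python
def binding_site_box(coords, margin=4.0):
    """Return (center, edge) of a cube enclosing the given atoms.

    coords -- iterable of (x, y, z) tuples of the selected atoms
    margin -- extra space in Å added on each side (illustrative value)
    """
    xs, ys, zs = zip(*coords)
    # Midpoint of the bounding box along each axis
    center = tuple((min(axis) + max(axis)) / 2.0 for axis in (xs, ys, zs))
    # Largest extent of the selection, padded on both sides
    edge = max(max(axis) - min(axis) for axis in (xs, ys, zs)) + 2.0 * margin
    return center, edge
```

For two atoms at (0, 0, 0) and (10, 4, 2), this gives a center of (5.0, 2.0, 1.0) and an edge of 18.0 Å.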

 

About the restart
The restart procedure is automatically performed if the energy CSV file is found. You can choose to restart the calculation or to run it from the beginning by a requester window.

For more information about AutoDock Vina, click here.

If you want to run a Vina docking calculation, follow these steps:

 

13.3.10.4 Other docking scripts

Here are other scripts for generic analysis.

APBS binding energy.c This script evaluates the binding energy of a given ligand - receptor complex. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand. The results are shown in VEGA ZZ console and copied to the clipboard.
This script uses APBS for Windows that is included in VEGA ZZ package. APBS is a software for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/
    
Best score of isomers.c This script was developed to manage the docking results obtained when the database of ligands was expanded with the stereoisomers, geometric isomers and tautomers of each molecule. It chooses the best isomer of a molecule on the basis of the best (lowest) docking score. When you run the script, you must specify the input file in CSV format including the data (molecule name, scores, etc.) of all docked species, the output CSV file, the column with the ligand names and the column with the scores.
The isomers are detected by name: they must share the same prefix followed by the underscore character ("_").
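
The selection rule can be sketched in a few lines of Python. This is an illustration, not the script's code: rows are assumed to be dictionaries as produced by csv.DictReader, the column names are hypothetical, and the prefix is assumed to be everything before the last underscore.

```python
def best_isomers(rows, name_col, score_col):
    """Keep, for each name prefix, the row with the lowest docking score."""
    best = {}
    for row in rows:
        # "mol1_2" -> "mol1"; a name without underscore is its own prefix
        prefix = row[name_col].rsplit("_", 1)[0]
        score = float(row[score_col])
        if prefix not in best or score < float(best[prefix][score_col]):
            best[prefix] = row
    return best

rows = [
    {"Name": "mol1_1", "Score": "-7.2"},
    {"Name": "mol1_2", "Score": "-8.5"},
    {"Name": "mol2_1", "Score": "-6.0"},
]
selected = best_isomers(rows, "Name", "Score")
```

Here selected maps "mol1" to the row of mol1_2, the isomer with the lowest score, and "mol2" to its only entry.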
    
Contact surface.c This script measures the ligand/receptor contact surface (shared surface) in a complex. The results are automatically copied to the clipboard and are: contact surface, percentage of contact surface referred respectively to the ligand, receptor and complex surfaces. All data are expressed in Å².
   
Fred2 scrore.c Calculate the interaction score of a ligand - protein complex using OpenEye's Fred2 docking software. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand. The scores extracted from the Fred output are: Chemgauss2, Chemscore, Plp, Screenscore, Shapegauss and Zapbind. The results are automatically copied to the clipboard.

 

Warning:

This script requires Fred2 installed on your PC. You can request/buy it at http://www.eyesopen.com/

   
Mean score of multiple poses.c This script calculates the mean, minimum, maximum, range and standard deviation of the docking scores for all poses of each ligand. When you run the script, you must specify the input file in CSV format including the data (molecule name, scores, etc.) of all docked species, the output CSV file, the column with the ligand names and the column with the scores.
The ligands are detected by name: they must share the same prefix followed by the underscore character ("_").
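
A minimal Python sketch of these per-ligand statistics follows. It assumes that the poses are given as (name, score) pairs, that the ligand name is everything before the last underscore, and that the sample standard deviation is wanted; the script itself may make different choices.

```python
import statistics
from collections import defaultdict

def pose_statistics(records):
    """Summarize the docking scores of all poses of each ligand."""
    groups = defaultdict(list)
    for name, score in records:
        groups[name.rsplit("_", 1)[0]].append(score)
    summary = {}
    for ligand, scores in groups.items():
        summary[ligand] = {
            "mean": statistics.mean(scores),
            "min": min(scores),
            "max": max(scores),
            "range": max(scores) - min(scores),
            # The sample standard deviation needs at least two poses
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return summary
```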
   
Mopac binding enthalpy.c This script evaluates the binding enthalpy with MOPAC, by performing the following steps:
  • the calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand;
  • the receptor is simplified keeping only the residues included in a spheroid of 3 Å around the ligand. The user can change this value in the script (VS_PROXIMITY constant);
  • the complex geometry is optimized until the GNORM termination criterion is met. By default, this value is set to 10 (see VS_MOPACKEYS_MIN constant in the script);
  • the heat of formation is evaluated for both ligand and receptor separated and complexed;
  • the binding enthalpy is obtained by subtracting the two energies;
  • the results are shown in VEGA ZZ console and copied to the clipboard, if requested by the user.
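
The subtraction step is presumably the usual difference between the heat of formation of the complex and the sum of those of the separated partners. The Python sketch below states that assumption explicitly; the function name and the numbers are illustrative and do not come from the script or from a real calculation.

```python
def binding_enthalpy(hf_complex, hf_receptor, hf_ligand):
    """Binding enthalpy from heats of formation, all in the same unit.

    Assumed form: dH_bind = Hf(complex) - (Hf(receptor) + Hf(ligand)).
    A negative value means that the complex is favoured.
    """
    return hf_complex - (hf_receptor + hf_ligand)

# Made-up heats of formation in kcal/mol
delta_h = binding_enthalpy(-150.0, -100.0, -30.0)
```

With these made-up values, delta_h is -20.0 kcal/mol.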

This script requires at least Mopac 2012 for Windows that is not included in VEGA ZZ package. For more information, see 2.1 Installation of optional components.

   
Rescore+.c This script recalculates the interaction scores between a given set of ligand poses in a database and a target molecule or, alternatively, between a ligand and a receptor both included in a trajectory file. To run the calculation, you must specify the Receptor file name, the Database including the docked ligands, the CSV output file to store the scores, the Log file in which the errors are written and finally one or more scoring functions. For more information about the scoring functions, you can consult the VEGA ZZ manual.

WARNING:
the database must contain ligand poses obtained by a previous docking calculation. This script doesn't perform any kind of docking calculation.
   
RPScore.c Calculate the RPScore of a given protein-protein complex or a trajectory of protein-protein complexes. In this second case, the results are saved to a CSV file. The complex or the trajectory must be open in the current workspace.

This script is the VEGA ZZ implementation of the well known RPScore program. For more details:

http://www.sbg.bio.ic.ac.uk/docking/rpscore.html

Gidon Moont, Henry A. Gabb, and Michael J.E. Sternberg, "Use of Pair Potentials Across Protein Interfaces in Screening Predicted Docked Complexes", PROTEINS: Structure, Function, and Genetics 35:364-373 (1999).
   
WarpEngine GRAMM extractor.c This script extracts one or more complexes from the output generated by GRAMM docking software used in WarpEngine parallel execution environment. To complete the extraction, the script asks you:
  • the database containing the ligands that were docked;
  • the receptor file in any format supported by VEGA ZZ;
  • the WarpEngine GRAMM output file (it has the .csv file extension);
  • the output directory;
  • the number of top ranked complexes that you want to extract.

By default, the script saves the complexes in IFF format and assigns CHARMM force field and Gasteiger-Marsili atom charges. These default parameters can be changed by editing the script code.

   
X-Score.c Evaluate the interaction score of a given ligand - receptor complex. This calculation requires at least two molecules in the workspace and if there are more than two molecules or the ligand is ambiguous, the script asks to specify the molecule ID of the ligand.
The results are shown in the VEGA ZZ console and copied to the clipboard.

This script requires X-Score 1.2 or 1.3 for Windows that is not included in VEGA ZZ package.

For more information about X-Score, visit http://www.sioc-ccbg.ac.cn/

To install the X-Score package in the VEGA ZZ environment:

  • Open the following Web site:
    http://www.sioc-ccbg.ac.cn/?p=42&software=xscore
  • Complete the on-line registration form.
  • Log in with your credentials and download the X-Score package for the Windows platform.
  • Open the tar file with WinRAR or other software able to unpack gzipped tar files.
  • Extract xscore_win32.exe from xscore_win32\bin to ...\VEGA ZZ\Bin\Win32, where ...\VEGA ZZ is the VEGA ZZ installation path (usually C:\Program Files\VEGA ZZ).
  • Rename xscore_win32.exe to xscore.exe.
  • Extract parameter directory from xscore_win32 to ...\VEGA ZZ\Data directory. This last directory is hard to identify, because every Windows version creates it in a different place. To find it, open VEGA console from Start menu and type OpenDataDir.
  • Rename parameter to Xscore.
  • Now, you are ready to use X-Score script. If you want to use xscore.exe from command prompt, open VEGA console and use xs command, that is a shell script that fixes the environment variable required by X-Score.

 

13.3.11 Examples

This directory includes the example scripts:

Benzene In this folder, you can find several examples showing you how to build a benzene ring using different scripting languages (C-Script, JavaScript, PHP, Python, REBOL).
    
HyperDrive This folder includes C-scripts showing you how to use HyperDrive APIs.
    
Log kw IAM In this folder, there are minimalist codes in different scripting languages to calculate log kwIAM.DD2 and log kwIAM.MG of the molecule in the current workspace.
    
Command console.htm This script demonstrates how to control VEGA ZZ by JavaScript in an HTML page.
    
Demo.bat Demo script.
    
Demo.r The same as the above, but written in REBOL.
    
Distances.r This REBOL script explains how to measure interatomic distances.
    
Graph.r Demo of the extended commands to manage the plots.
    
GraphApp demo.r Demo of the GraphApp GUI library.
    
Info.r  Show some information in the VEGA ZZ console.
    
Meshload.r Load & display a 3D rabbit mesh model.
    
Mini-XML demo.c Demo script of Mini-XML library.
   
MP3 player.r Minimalist mp3 player (fmod demo).
    
NAMD minimization.c This script shows how to use the NAMD helper to perform an energy minimization by NAMD 2. It requires only a molecule in the current workspace.
    
REBOL View\VEGA ZZ toolbar.r Show a REBOL/View toolbar to control the VEGA ZZ main features.
    
Requesters.r Simple demo of the VEGA ZZ built-in requesters.
    
VEGA GL.c Application example of VEGA GL commands.

 

13.3.12 File conversion

This directory includes scripts for file format conversion:

CSSR SOMFA export.c Export the current molecule in CSSR format readable by SOMFA.
    
CSV export.c Save the molecule in Comma Separated Values (CSV) format.
    
Format conversion.r 

This script performs the batch file format conversion of all molecules contained in a folder. Some parameters can be changed in the dialog window:

  • Source dir.
    Name of the source directory containing the files to convert. Click the Open button to show the directory requester.
  • Destination dir.
    Name of the destination directory in which all converted files will be inserted. Click Open button to show the directory requester.
  • Output format
    Use this list to select the output format.
  • Compression
    Compression method (default none).
  • Add hydrogens
    - None
    No hydrogens will be added.
    - Generic
    Generic organic geometry-based method.
    - Generic BO
    Generic organic bond order-based method.
    - Nucleic acid
    Nucleic acid geometry-based method.
    - Nuc. acid BO
    Nucleic acid bond order-based method.
    - Protein
    Protein geometry-based method.
    - Protein BO
    Protein bond order-based method.
  • Include the connectivity
    If checked, the atom connectivity is included (if the file format supports it).
  • Include the atom constraints
    If checked, the atom constraints are saved into the file (if the file format supports it).
  • Normalize the coordinates
    If checked, the molecule is translated to the axis origin (0, 0, 0).
  • Assign the Gasteiger/Marsili charges
    If checked, the Gasteiger - Marsili atom charges are assigned.

    Clicking Convert button, the conversion starts and clicking Close the dialog window is closed.
    
PDB ren export.c Export the molecule in PDB format renumbering the atoms.
    
XYZ import.c Import XYZ files giving the possibility to adapt the filter to each sub-format.

 

13.3.13 Interaction surface

These scripts calculate and manage ligand-receptor interaction surfaces.

CHARMM interaction surface.c Calculate the CHARMM non-bond interaction energy of each ligand-receptor atom pair and project it on the Van der Waals surface. The user must enter the molecule ID/number to indicate the ligand.
    
Lipophilic interaction surface.c Calculate the lipophilic interaction of each ligand-receptor atom pair and project it on the Van der Waals surface. The user must enter the molecule ID/number to indicate the ligand.
    
MEP interaction surface.c Calculate the electrostatic interaction energy of each ligand-receptor atom pair and project it on the Van der Waals surface. The user must enter the molecule ID/number to indicate the ligand.
    
MLPInS color ramp.c This script normalizes the color ramp calculated by MLPInS interaction surface script, using the user-defined range of values. The normalization is useful to compare surfaces of different molecules using the same color scheme.
It recognizes MLPInS surfaces only and changes them selectively.
   
MLPInS interaction surface.c Calculate the MLP Interaction Score (MLPInS) of each ligand-receptor atom pair and project it on the Van der Waals surface. The user must enter the molecule ID/number to indicate the ligand.

 

13.3.14 Movie

Scripts to create movies.

Movie maker.c This script generates a movie file starting from the molecule in the current workspace, rotating it around one or more axes. The parameters that the user can change are: Output movie (file name of the output movie), Number of frames (number of frames to put in the trajectory), Preview (if checked, the animation is shown in the main window without saving the output movie), X rotation (rotation in degrees around the X axis), Y rotation (rotation in degrees around the Y axis) and Z rotation (rotation in degrees around the Z axis).
Click Animate to create the movie. The codec requester is shown to select the required compression options. Take care when choosing the Render mode, because not all graphics cards support the Hardware mode. Software rendering is the most reliable, even if it cannot reach the Hardware quality.
    
Sec. structure anim.c This script generates a movie file starting from the peptide in the current workspace, changing the secondary structure. The parameters that you can change are: Output movie (file name of the output movie), Number of frames (number of frames to put in the animation), Preview (if checked, the animation is shown in the main window without saving the output movie), Start Phi (starting value of the Phi dihedral angle), Start Psi (starting value of the Psi dihedral angle), Start Omega (starting value of the Omega dihedral angle), End Phi (ending value of the Phi dihedral angle), End Psi (ending value of the Psi dihedral angle), End Omega (ending value of the Omega dihedral angle).
Click Animate to create the movie file. The codec requester is shown to select the required compression options. Take care when choosing the Render mode, because not all graphics cards support the Hardware mode. Software rendering is the most reliable, even if it cannot reach the Hardware quality.
For the most common Phi, Psi and Omega values, click here.
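
The manual does not say how the torsions are varied between the start and end values. The simplest scheme would be a linear interpolation over the frames, sketched below in Python as an assumption rather than as the script's actual behaviour. The function name is hypothetical and angle periodicity is not handled.

```python
def torsion_frames(start, end, frames):
    """Linearly interpolate (phi, psi, omega) from start to end.

    Returns one tuple of angles in degrees per frame; the first frame
    equals start and the last one equals end.
    """
    if frames < 2:
        return [tuple(start)]
    step = 1.0 / (frames - 1)
    return [
        tuple(s + (e - s) * i * step for s, e in zip(start, end))
        for i in range(frames)
    ]

# Illustrative values only: three frames between two conformations
path = torsion_frames((-60, -45, 180), (-120, 120, 180), 3)
```

The middle frame of this example is (-90.0, 37.5, 180.0).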

 

13.3.15 Protein tools

This directory includes the scripts for protein analysis and manipulation:

Aminoacid selector.r

Show the amino acids by selection and/or by chemical/physical properties.

    
Dump backbone torsions.c Dump the phi and psi backbone torsions of a protein.
    
Fasta to text.r

Convert a Fasta file into a text file. This is useful to load it into Microsoft Excel.

    
HIS protonantion.c

Find the histidine protonation state (on NE2 or on ND1) using the CHARMM potential and swap the hydrogens (e.g. H-NE2 to H-ND1) according to the hydrogen bond energy. If the energy difference between the H-NE2 and H-ND1 tautomers is more than 2.0 kcal/mol, the hydrogen is placed on the nitrogen that gives the structure with the lower hydrogen bonding energy. The starting structure must include the hydrogens.

    
Move hydrogens to end.c

Move the hydrogens to the end of the atom list. In this way, you can obtain files split in two parts: the first one containing the heavy atoms and the second one, placed at the end, containing the hydrogens. As an example, that's useful to write mol2 files compatible with GOLD docking system.

    
Score.c Calculate the interaction score between a ligand and a generic target biomacromolecule. The ligand must be previously docked in the target structure. This calculation requires two molecules in the workspace: the first one must be the receptor and the second one must be the ligand.
The script can calculate:
  • Electrostatic energy (Coulomb).
  • Electrostatic energy with distance-dependent dielectric constant.
  • R6-R12 Lennard-Jones non-bond energy using the CHARMM and CVFF force fields.
  • Hydrophobic interaction using the Broto-Moreau parameters with different distance functions (linear, square, cube and Fermi's function).

The results are automatically copied to the clipboard.
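
To make the first two terms concrete, the following Python sketch sums the Coulomb energy over all ligand-receptor atom pairs, with either a constant or a distance-dependent dielectric. It is an illustration under stated assumptions: atoms are (x, y, z, charge) tuples, the distance-dependent dielectric is taken as eps = dielectric * r, and 332.0637 converts e²/Å to kcal/mol. The form actually used by Score.c may differ.

```python
import math

COULOMB_CONSTANT = 332.0637  # kcal*Å/(mol*e^2)

def coulomb_energy(ligand, receptor, distance_dependent=False, dielectric=1.0):
    """Electrostatic interaction energy in kcal/mol between two atom sets.

    Overlapping atoms (r = 0) are not handled in this sketch.
    """
    energy = 0.0
    for xl, yl, zl, ql in ligand:
        for xr, yr, zr, qr in receptor:
            r = math.dist((xl, yl, zl), (xr, yr, zr))
            # Assumed distance-dependent form: eps grows linearly with r
            eps = dielectric * r if distance_dependent else dielectric
            energy += COULOMB_CONSTANT * ql * qr / (eps * r)
    return energy
```

With a distance-dependent dielectric the interaction decays as 1/r² instead of 1/r, which damps long-range contributions.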

 

13.3.15.1 Homology modelling services

This folder includes on-line services for homology modelling.

FUGUE.htm FUGUE is a program for recognizing distant homologues by sequence-structure comparison. It utilizes environment-specific substitution tables and structure-dependent gap penalties, where scores for amino acid matching and insertions/deletions are evaluated depending on the local environment of each amino acid residue in a known structure. Given a query sequence (or a sequence alignment), FUGUE scans a database of structural profiles, calculates the sequence-structure compatibility scores and produces a list of potential homologues and alignments.

For more information, visit this Web site: http://tardis.nibio.go.jp/fugue/
    
I-TASSER.htm I-TASSER server is an Internet service for protein structure and function predictions. 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER assembly simulations; function insights are then derived by matching the predicted models with protein function databases. I-TASSER (as 'Zhang-Server') was ranked as the No 1 server for protein structure prediction in recent CASP7, CASP8 and CASP9 experiments. It was also ranked as the best for function prediction in CASP9. The server is in active development with the goal to provide the most accurate structural and function predictions using state-of-the-art algorithms.
    
Phyre 2.htm Protein Homology/analogY Recognition Engine.
    
ROBETTA.htm Robetta provides both ab initio and comparative models of protein domains. It uses the ROSETTA fragment insertion method (Simons et al. (1997) J Mol Biol. 268:209-225). Domains without a detectable PDB homolog are modeled with the Rosetta de novo protocol (Bonneau et al. (2002) J Mol Biol. 322:65-78). Comparative models are built from Parent PDBs detected by UW-PDB-BLAST or HHSEARCH and aligned by various methods which include HHSEARCH, Compass, and Promals. Loop regions are assembled from fragments and optimized to fit the aligned template structure (Rohl et al. (2004) Proteins 55:656-677). The procedure is fully automated.

For more information, visit this Web site: http://robetta.bakerlab.org/
    
SWISS-MODEL.htm SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make Protein Modelling accessible to all biochemists and molecular biologists worldwide.

For more information about the service, visit: http://swissmodel.expasy.org/

 

13.3.16 PubChem

 

PubChem-related scripts. They require an Internet connection.

 

13.3.16.1 PubChem database rename

Scripts to rename the molecules in a database.

By CID.c This script renames all molecules in a database according to their CID code. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.
    
By IUPAC.c This script renames all molecules in a database according to their IUPAC name. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.
    
By name.c This script renames all molecules in a database according to their most common name in PubChem. A log file containing the errors is automatically created in the database directory by adding "- rename.log" as suffix to the database file name.

 

13.3.16.2 PubChem download

Multiple by CID.c This script downloads multiple molecules to a directory by specifying their CIDs in a CSV file (with semicolon-separated fields). The first line of this file must contain the labels, the first column the CIDs and, optionally, the second column the molecule names that are used for the files.
The molecules are downloaded in 3D SDF format and, if an error occurs, it is reported in the log file, which has the same prefix as the CSV file and " - download.log" as suffix.

CSV file example with CIDs only:

CID
10075246
10110916
10111186
10114637

CSV file example with CIDs and names:

CID;Name
10075246;"Mol 1"
10110916;"Mol 2"
10111186;"Mol 3"
10114637;"Mol 4"
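
Both layouts shown above can be read with the same few lines of Python. The sketch below is only an illustration of the file format: the function name is hypothetical and, as an assumption, the CID itself is used as the name when the second column is missing.

```python
import csv
import io

def read_cid_list(text):
    """Parse a semicolon-separated CID list whose first line holds the labels."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader, None)  # skip the label line
    entries = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore empty lines
        cid = int(row[0])
        has_name = len(row) > 1 and row[1].strip()
        entries.append((cid, row[1].strip() if has_name else str(cid)))
    return entries

sample = 'CID;Name\n10075246;"Mol 1"\n10110916;"Mol 2"\n'
molecules = read_cid_list(sample)
```

Here molecules holds the pairs (10075246, "Mol 1") and (10110916, "Mol 2"); the quotes around the names are removed by the CSV reader.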
    
Multiple by name.c This script downloads multiple molecules to a directory by specifying their names in a text file. This file must contain the names of the molecules to download, one per line. The molecules are downloaded in 3D SDF format and, if an error occurs, it is reported in the log file, which has the same prefix as the input file and " - download.log" as suffix.

Text file example:

Ethanol
Benzene
Aspirin
Phenol
    
Single by CID.c Download a structure from PubChem to the current workspace by specifying the CID code. If the code is wrong, an error message is shown.
    
Single by name.c Download a structure from PubChem to the current workspace by specifying its name. If the molecule is not available, an error message is shown.

 

13.3.16.3 PubChem get

CID.c This script asks PubChem for the CID code of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.
    
IUPAC name.c This script asks PubChem the IUPAC name of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.
    
Name.c This script asks PubChem the name of the molecule in the current workspace. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.
    
Multiple IUPAC names.c This script asks PubChem for the IUPAC names of the molecules whose CIDs are listed in a CSV/text file. The first line of this file can be the column label (not mandatory). The IUPAC names are stored in a CSV file that can be specified by the user.

Input file example:

CID
243
3339
128563
3236

Output file example:

CID;IUPAC
243;"benzoic acid"
3339;"propan-2-yl 2-[4-(4-chlorobenzoyl)phenoxy]-2-methylpropanoate"
128563;"methyl (2S,4aR,6aR,7R,9S,10aS,10bR)-9-acetyloxy-2-(furan-3-yl)-6a,10b-dimethyl-4,10-dioxo-2,4a,5,6,7,8,9,10a-octahydro-1H-benzo[f]isochromene-7-carboxylate"
3236;"1-(4-ethylphenyl)-2-methyl-3-piperidin-1-ylpropan-1-one"
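
The manual does not state which PubChem interface the script uses. For readers who want to reproduce a similar lookup, PubChem's public PUG REST service accepts a list of CIDs and returns the requested property. The Python sketch below only builds the request URL and performs no network access; the function name is hypothetical.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def iupac_names_url(cids):
    """Build a PUG REST URL that returns the IUPAC names of the given CIDs as CSV."""
    cid_list = ",".join(str(int(cid)) for cid in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/property/IUPACName/CSV"

url = iupac_names_url([243, 3339, 128563, 3236])
```

Note that the CSV returned by PubChem is comma-separated, so it would need converting to obtain the semicolon-separated layout shown in the output file example.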
    
XLogP.c This script gets the XLogP value of the molecule in the current workspace from PubChem. If the molecule is not included in the database, an error message is shown. The molecule is identified by submitting its SMILES string.

 

13.3.17 QSAR

 

Scripts for QSAR.

 

Data normalizer.c The script normalizes the values of the specified columns of a given spreadsheet in CSV format to the 0-1 range, assuming that the first row is the header of each column. The output spreadsheet is saved by adding the "- normalized.csv" suffix to the file name.
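
Normalization to the 0-1 range is usually done by min-max scaling. The Python sketch below assumes that this is what the script does; mapping a constant column to zeros is our own choice for the degenerate case.

```python
def normalize_column(values):
    """Scale a list of numbers linearly so that min -> 0 and max -> 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Constant column: no range to scale by
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]

scaled = normalize_column([2.0, 4.0, 6.0])
```

In this example scaled is [0.0, 0.5, 1.0].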
   
Principal component analysis.c This script performs the Principal Component Analysis of a given dataset in CSV format. You can choose the columns to include in the matrix to be analyzed. The script saves two files: the first one includes statistical data for each selected column such as the mean of the values, their standard deviations and the PCA results such as the eigenvalues, the eigenvectors and the coefficients to project the data in the PCA space, whose values are in the second file. The PCA calculation is done only for the first three principal components.
    
Table join.c This script joins two or more tables in CSV format. That's useful when the number of columns/rows is too large to be managed by Microsoft Excel. You can select an unlimited number of tables/spreadsheets and you can specify individually the join position (Bottom or Right). The output file is automatically saved when you stop adding spreadsheets by clicking Cancel in the file requester, and its name is obtained from the first file by adding the - join.csv suffix.
    
Training and test set creator.c This script helps the user to create a random training set and test set from a given data set in CSV format. That's useful to validate a QSAR model, by calculating the linear regression of the training set and using the test set to predict the dependent variable.
This script writes two CSV files as output for the training set and for the test set, respectively adding to the file name "- training" and "- test".
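
A random split of this kind can be sketched in Python as follows. The function name and the seed parameter are hypothetical; the script's own sampling procedure is not documented here.

```python
import random

def split_dataset(rows, training_size, seed=None):
    """Randomly split rows into a training set and a test set.

    The original row order is preserved inside each set.
    """
    if not 0 <= training_size <= len(rows):
        raise ValueError("training_size must be between 0 and len(rows)")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    chosen = set(indices[:training_size])
    training = [row for i, row in enumerate(rows) if i in chosen]
    test = [row for i, row in enumerate(rows) if i not in chosen]
    return training, test
```

Passing a fixed seed makes the split reproducible, which helps when the same training and test sets must be regenerated later.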

 

13.3.17.1 Linear regression

Scripts for linear regression.

Automatic linear regression.c This script generates automatically all possible multiple regression models by these steps:
  • Selection of the best independent variables by calculating the corresponding equation with a single regressor. Independent variables whose regression has an R2 value less than 0.10 are automatically excluded. If the number of variables found is less than the maximum number of regressors, the "desperate mode" is automatically enabled and 50% of the best variables are selected.
  • Identification of collinear independent variables by calculating the Variance Inflation Factor (VIF) value for each regressor pair. Variable pairs with VIF > 5.0 are considered collinear and aren't considered in the model calculation.
  • Calculation of the models with a number of regressors from one to a user-defined value (default 3).
  • For each model, a cross-validation procedure (leave-one-out) is performed and the predictive power is shown as Q2. If the number of observations is more than 200, the script asks to confirm the cross-validation.
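
For a pair of regressors, the VIF reduces to 1 / (1 - r²), where r is the Pearson correlation between the two variables, so the VIF > 5.0 threshold corresponds to r² > 0.8. The Python sketch below illustrates this pairwise check; the function names are hypothetical and the script's implementation may differ.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pair_vif(x, y):
    """Variance Inflation Factor of a regressor pair: 1 / (1 - r^2)."""
    r2 = pearson_r(x, y) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

def is_collinear(x, y, threshold=5.0):
    """True when the pair exceeds the VIF threshold quoted in the text."""
    return pair_vif(x, y) > threshold
```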

The script requires a CSV file as input, that can be exported from your preferred spreadsheet software (e.g. Microsoft Excel) and generates an output file with the same prefix of the input followed by - regression.txt as name. The output file includes some information as the best independent variables, the collinear variable pairs, all regression models and the best regression models (three for each number of regressors).

   
Linear regression.c This script performs the multiple linear regression and requires a CSV file as input. In two steps, you can select the dependent variable (usually the activity) and the independent variables from the list built from the first row in the spreadsheet.
    
Model validator.c This script allows the QSAR models to be validated by splitting randomly the whole dataset in a number of training and test set pairs. For each training set, the regression coefficients are calculated to evaluate the test set in terms of standard deviation of errors, angular coefficient, intercept and r2 of the trend line of the chart of the predicted vs. experimental activities.
To use this script, you must specify the file containing the data of the regression analysis, which must be in CSV format and can be easily exported from your preferred spreadsheet. Then, you must select the dependent variable (usually the activity) and the independent variables of the QSAR model that you have found previously, for example by the Automatic linear regression script. Finally, you must enter the number of molecules of the training set and the number of random trials. At the end of the calculation, a CSV output file is written in the same directory as the data file by adding the "- validation.csv" suffix to the original file name. This output can be opened by a spreadsheet and it includes the columns shown below:
  • Trial
    Progressive number of the trial.
  • Rsq
    Multiple correlation coefficient of the training set (r2).
  • RsqAdj
    Adjusted r2 of the training set.
  • PC
    Amemiya Prediction Criterion of the training set.
  • P
    Probability of the training set.
  • F
    Fisher F statistic for regression of the training set.
  • StdDevOfErrs
    Standard deviation of errors (SE) of the training set.
  • Test_MeanErr
    Mean error in prediction of the test set.
  • Test_StdErrOfErrs
    Standard deviation of errors (SE) of the test set.
  • Test_M
    Angular coefficient of the trend line of the chart of predicted vs. experimental activities.
  • Test_B
    Intercept of the trend line of the chart of predicted vs. experimental activities.
  • Test_Rsq
    r2 of the chart of predicted vs. experimental activities.
  • Test_PC
    Amemiya Prediction Criterion of the test set.
  • Test_P
    Probability of the test set.
  • Test_F
    Fisher F statistic for regression of the test set.
  • Intercept
    Intercept of the regression equations.
  • Coefficients of the independent variables
    The list of the coefficients for each regressor.

 

The output file also includes the mean (Mean) and the standard deviation (StdDev) of the previous columns, together with the labels of the columns selected as dependent (DepVar) and independent (InDepVar) variables.
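
The validation scheme can be sketched as follows. This is only an illustration under stated assumptions, not the code of the script: it uses a single regressor, it takes the trend line with the predicted activity on the vertical axis and the experimental activity on the horizontal one, and the names validate, fit_line, r_squared and the Coefficient key are hypothetical. The other keys reuse the column names listed above.

```python
import random


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def validate(xs, ys, n_train, n_trials, seed=0):
    """Random training/test splits; one result row per trial."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rows = []
    for trial in range(1, n_trials + 1):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        # Regression coefficients come from the training set only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        experimental = [ys[i] for i in test]
        predicted = [a + b * xs[i] for i in test]
        # Trend line of predicted (vertical) vs. experimental (horizontal).
        test_b, test_m = fit_line(experimental, predicted)
        rows.append({"Trial": trial, "Intercept": a, "Coefficient": b,
                     "Test_M": test_m, "Test_B": test_b,
                     "Test_Rsq": r_squared(experimental, predicted)})
    return rows


# Made-up data, for illustration only.
x = [float(i) for i in range(1, 11)]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.2, 20.9]
rows = validate(x, y, n_train=7, n_trials=5)
```

For a model that predicts well, Test_M should be close to 1 and Test_B close to 0 in every trial.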

 

13.3.17.2 Virtual screening

Scripts for the analysis of virtual screening results.

 

Automatic model builder.c This script uses the same approach as Model builder.c and adds the automatic selection of the variables to obtain the best mathematical models in terms of enrichment factors. You can select the activity and the independent variables (molecular descriptors and scores) to be combined to obtain the maximum enrichment factor. Although this script has a parallel design, the calculation can take a long time, especially when you select a large number of equation terms (more than three). You can also specify the threshold for the detection of active and inactive compounds, the number of variables used to build the models and the cluster size for the cluster analysis.
The results are sorted from best to worst enrichment factor and saved in a CSV file that can be analyzed with your preferred spreadsheet. The output file (named prefix - model.csv) includes several columns:
  • ModID = identification number of the model; the models are ranked by ModelScore (from the best to the worst);
  • NV = number of variables/scores used to build the model;
  • Active_N = number of active molecules in the first N percentile;
  • ActivePerc_N = percentage of the active molecules in the first N percentile;
  • EF_N = enrichment factor in the first N percentile;
  • Kurtosis = kurtosis of the histogram profile obtained by cluster analysis;
  • Skewness = skewness of the histogram profile obtained by cluster analysis;
  • ModelScore = score of the model (larger = better);
  • Model = equation of the model with calculated coefficients (if you selected more than one score);
  • ScoreMin = minimum score evaluated by the model;
  • ScoreMax = maximum score evaluated by the model;
  • Clustrer_N = percentage of active molecules in each cluster.

This script also validates the best models by building five pairs of training and external sets (with a 70/30 % ratio) from the starting dataset. The training set is used to recalculate the models and the external set to predict the activity. The results of this analysis are saved to prefix - validation.csv, which contains the same data as for the models obtained from the whole dataset, with the exception of the cluster populations. The column headers carry the ts and es prefixes to identify the training and the external sets respectively.

    
CSV to SVM light.c This script converts a standard CSV file to SVM Light format. It requires the molecule names in the first column and an activity (dependent variable) column that you choose through a requester. You can also select the independent variables that are exported to the output file.
For more information, read http://svmlight.joachims.org/.
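
As an illustration of the target format, the following minimal sketch converts CSV text to SVM Light lines, where each line holds the target value followed by index:value pairs with indices starting from 1 and an optional comment after #. It is not the code of the script: the function name, the sample column names and the sample values are hypothetical, and writing the molecule name in the comment field is an assumption.

```python
import csv
import io


def csv_to_svmlight(csv_text, target_col, feature_cols):
    """Return SVM Light lines: <target> 1:<v1> 2:<v2> ... # <name>."""
    reader = csv.DictReader(io.StringIO(csv_text))
    name_col = reader.fieldnames[0]  # molecule names are in the first column
    lines = []
    for row in reader:
        # Feature indices start from 1 and follow the selected column order.
        feats = " ".join(f"{i}:{float(row[col]):g}"
                         for i, col in enumerate(feature_cols, start=1))
        lines.append(f"{float(row[target_col]):g} {feats} # {row[name_col]}")
    return lines


# Made-up sample, for illustration only.
sample = "Name,Activity,LogP,MW\naspirin,1,1.19,180.16\nethanol,-1,-0.31,46.07\n"
out = csv_to_svmlight(sample, "Activity", ["LogP", "MW"])
print("\n".join(out))
```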
    
Enrichment analysis.c This script helps to set up a virtual screening calculation by analyzing the enrichment factor that you can obtain by screening sets that include true-active and decoy molecules. The data must be in CSV format and you can select the activity and score columns. You can also specify the activity threshold that decides whether a molecule is considered active, and the cluster size for the cluster analysis. The script sorts the rows in ascending order of the score/property used to predict the activity, then performs the cluster analysis and shows the results in a bar plot. If the score/property successfully detects the active compounds, they are ranked at the top of the sorted list and populate the first clusters. The enrichment quality is evaluated in terms of skewness and kurtosis. In particular, a kurtosis value close to zero indicates a Gaussian distribution, whereas a high value indicates an asymmetric curve. The aim of this kind of analysis is to obtain a highly asymmetric curve shifted to the left of the plot, which corresponds to a high kurtosis value. As a rule of thumb, kurtosis values less than 5 can be considered poor and values greater than 5 good.
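
The analysis can be sketched as follows. This is an illustration and not the code of the script: it assumes that a molecule is active when its activity is greater than or equal to the threshold, that the clusters are consecutive blocks of the sorted list, and that skewness and excess kurtosis are taken over the cluster index weighted by the number of active molecules. The formulas actually used by the script may differ, and the function names are hypothetical.

```python
def cluster_histogram(scores, activities, threshold, cluster_size):
    """Sort by ascending score and count the actives in consecutive clusters."""
    ranked = sorted(zip(scores, activities))
    counts = []
    for start in range(0, len(ranked), cluster_size):
        chunk = ranked[start:start + cluster_size]
        counts.append(sum(1 for _, act in chunk if act >= threshold))
    return counts


def skewness_kurtosis(counts):
    """Skewness and excess kurtosis of the cluster index weighted by counts.

    Fails with a division by zero if all actives fall in a single cluster.
    """
    total = sum(counts)
    mean = sum(i * c for i, c in enumerate(counts)) / total
    m2 = sum(c * (i - mean) ** 2 for i, c in enumerate(counts)) / total
    m3 = sum(c * (i - mean) ** 3 for i, c in enumerate(counts)) / total
    m4 = sum(c * (i - mean) ** 4 for i, c in enumerate(counts)) / total
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up data: 20 molecules, the actives mostly have the lowest scores.
scores = [float(i) for i in range(20)]
activities = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
counts = cluster_histogram(scores, activities, threshold=0.5, cluster_size=5)
skew, kurt = skewness_kurtosis(counts)
```

In this sketch a positive skewness means that the active molecules are concentrated in the first clusters, which is the profile described above.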
    
Model builder.c This script can be used to improve the enrichment factors of a virtual screening analysis. More in detail, it builds a new scoring function as the linear combination of two or more user-defined descriptors such as docking scores and molecular properties. The coefficients of this first-degree equation are calculated by maximizing the number of active compounds at the top of the list in which the molecules are ranked by the score calculated through the new equation. The maximization is performed by the gradient-free Hooke-Jeeves algorithm and, in order to avoid local maxima, a random sampling is also applied. As input, a CSV file is required, containing one activity column and several score/property columns that you must select. You must also specify the activity threshold that decides whether a molecule is considered active. The output is shown in the VEGA ZZ console as in the following example:
  File name.....................: bestranking.csv
  Activity column...............: ACTIVITY
  Activity range................: 0.00 - 1.00
  Activity threshold............: 0.50
  Number of molecules...........: 2513
  Number of active molecules....: 38
  Max. minimization steps.......: 5000
  RMS to stop minimization......: 0.001
  Random sampling steps.........: 36
  Random selection probability..: 1.51 %

  Score = 1.0000 SCORE_0000 + 0.2309 SCORE_+000 - 0.5851 SCORE_0+00
   Top %  Mols   Act  Act %    EF
  =================================
    1.00    25     4  16.00  10.58
    2.00    50     6  12.00   7.94
    5.00   125    13  10.40   6.88
   10.00   251    18   7.17   4.74
   20.00   502    24   4.78   3.16

The coefficients of the equation are divided by the coefficient of the first term.
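
The figures in the example table can be checked with a short sketch. It assumes that the enrichment factor is the percentage of active molecules in the top fraction divided by the percentage of active molecules in the whole dataset (the random selection probability), and that the number of molecules in the top fraction is truncated to an integer. Both assumptions agree with the example output; the function name is hypothetical.

```python
def enrichment_row(top_percent, n_mols, n_active, active_in_top):
    """Return (Mols, Act %, EF) for one row of the table."""
    mols = int(n_mols * top_percent / 100.0)     # truncated, as in the example
    act_percent = 100.0 * active_in_top / mols
    random_percent = 100.0 * n_active / n_mols   # random selection probability
    return mols, act_percent, act_percent / random_percent


# Top % and Act values taken from the example output above.
for top, act in [(1, 4), (2, 6), (5, 13), (10, 18), (20, 24)]:
    mols, act_pct, ef = enrichment_row(top, 2513, 38, act)
    print(f"{top:6.2f} {mols:5d} {act:5d} {act_pct:6.2f} {ef:6.2f}")
```

For example, the first row has 4 active molecules out of 25, that is 16.00 %, against 1.51 % in the whole dataset, which gives an enrichment factor of 10.58.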

 

13.3.18 Trajectory

This directory contains scripts for trajectory management.

Anim maker.c This script generates a trajectory file starting from the molecule in the current workspace, rotating it around one or more axes. This is useful to create video files. The parameters that you can change are:
  • Output trajectory
    File name of the output trajectory. In the file requester, it is possible to select the output format (default Gromacs XTC).
  • XTC comp. (1-6)
    Gromacs XTC compression ratio. It has a meaning only if the Gromacs XTC format is selected as output (default 3).
  • Save the animation
    Check this gadget to save/render the animation (e.g. avi, mpeg, etc).
  • Number of frames
    Number of frames to put in the trajectory (default 50).
  • X rotation
    Rotation in degrees around the X axis (default 0). Negative values are allowed.
  • Y rotation
    Rotation in degrees around the Y axis (default 360). Negative values are allowed.
  • Z rotation
    Rotation in degrees around the Z axis (default 0). Negative values are allowed.
  • Animate
    Push this button to create the animation trajectory.
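
The way the rotation parameters translate into frames can be sketched as follows. This is an illustration and not the code of the script: it assumes that the total rotation is divided evenly among the frames, that the rotations are applied in X, Y, Z order around the origin, and the function names are hypothetical. The defaults mirror the ones listed above.

```python
import math


def rotate(point, rx, ry, rz):
    """Rotate a point by rx, ry, rz degrees around the X, Y and Z axes, in that order."""
    x, y, z = point
    a = math.radians(rx)
    y, z = y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a)
    a = math.radians(ry)
    x, z = x * math.cos(a) + z * math.sin(a), -x * math.sin(a) + z * math.cos(a)
    a = math.radians(rz)
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    return x, y, z


def make_frames(coords, n_frames=50, x_rot=0.0, y_rot=360.0, z_rot=0.0):
    """Build n_frames copies of coords, each rotated a little further."""
    frames = []
    for i in range(n_frames):
        f = i / n_frames  # fraction of the total rotation reached at frame i
        frames.append([rotate(p, x_rot * f, y_rot * f, z_rot * f) for p in coords])
    return frames


# One atom on the X axis, four frames, full turn around Y.
frames = make_frames([(1.0, 0.0, 0.0)], n_frames=4)
```

With a full 360 degree turn, the last frame stops one step before the starting orientation, so the animation loops without a repeated frame.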
     
    
APBS trajectory.c This script calculates the solvation energy for each frame included in an MD trajectory and saves the values in a CSV file. It uses APBS for Windows, which is included in the VEGA ZZ package. APBS is a software package for modeling biomolecular solvation through solution of the Poisson-Boltzmann equation (PBE), developed by Nathan Baker in collaboration with J. Andrew McCammon and Michael Holst.

For more information about APBS, visit http://www.poissonboltzmann.org/apbs/
    
Automatic quenching.r This script extracts the frames from a trajectory file, then minimizes them using AMMP or Mopac. The results are stored in an output trajectory file. You can set the following parameters:
  • Input molecule
    File name of the input molecule. When you select a molecule using Open button, Input trajectory, Output trajectory and Output energy fields are automatically updated.
  • Input trajectory
    File name of the input MD trajectory. When you select a new trajectory using Open button, Output trajectory and Output energy fields are automatically updated.
  • Output trajectory
    File name of the output trajectory. When you select a new trajectory using Open button, Output energy field is automatically updated. In the file requester, you can select the output format.
  • XTC comp. (1-6)
    Gromacs XTC compression ratio. It has a meaning only if Gromacs XTC format is selected as output.
  • Output energy file
    File in which the energy values are stored (CSV format). It's available only if Mopac is selected as Minimization type.
  • First frame
    Trajectory frame from which the quenching starts.
  • Last
    Trajectory frame to which the quenching ends.
  • Step
    Increment for the frame enumeration.
  • Minimization type
    This field lets you select the minimization type: None (nothing is performed), AMMP (molecular mechanics method based on the conjugate gradients algorithm) and Mopac (semiempirical method).
  • AMMP min. steps
    Number of minimization steps used by AMMP.
  • AMMP toler
    It's the convergence criterion used by AMMP to stop the minimization.
  • Mopac keywords
    In this field, you can put the keywords to control the Mopac calculation.
  • Calculate
    Press this button to perform the quenching. If a parameter is incorrect or missing, an error message is shown.
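
The meaning of the First frame, Last and Step fields can be shown with a minimal sketch. It assumes that frames are numbered from 1 and that the last frame is included; the function name and the checks are hypothetical and only mimic the kind of parameter validation mentioned above.

```python
def frames_to_process(first, last, step):
    """Frame numbers from first to last (inclusive), taking every step-th frame."""
    if first < 1 or last < first or step < 1:
        raise ValueError("expected 1 <= first <= last and step >= 1")
    return list(range(first, last + 1, step))


selected = frames_to_process(1, 10, 3)
print(selected)
```

With a step of 3, only about one third of the frames are minimized, which is a simple way to shorten a long Mopac run.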
    
DCD fix for VMD.c All pre-3.0.0 VEGA ZZ releases write buggy DCD files that can't be read by VMD. This script fixes the problem by patching the DCD trajectory, and only if the problem is detected.
    
Dump energy.c This script calculates the energy for each MD frame and dumps the molecular mechanics energy components in a CSV file. It also performs a histogram analysis.
  • Input molecule
    File name of the input molecule. When you select a molecule using Open button, Input trajectory, Output trajectory and Output energy fields are automatically updated.
  • Input trajectory
    File name of the input MD trajectory. When you select a new trajectory using Open button, Output trajectory and Output energy fields are automatically updated.
  • Output energy
    Output energy file in CSV format (Comma Separated Values). Each column contains the following data: frame number, bond, angle, torsion, hybrid, non-bond and total energies.
  • Output histogram
    Output histogram in CSV format.
  • First frame
    Trajectory frame from which the calculation starts.
  • Last
    Trajectory frame at which the calculation ends.
  • Step
    Increment for the frame enumeration.
  • Minimization type
    This field lets you select the minimization type: None (nothing is performed), AMMP (molecular mechanics method based on the conjugate gradients algorithm) and Mopac (semiempirical method).
  • AMMP min. steps
    Number of minimization steps used by AMMP.
  • AMMP toler
    It's the convergence criterion used by AMMP to stop the minimization.
  • Mopac keywords
    In this field, you can put the keywords to control the Mopac calculation.
  • Calculate
    Press this button to perform the calculation. If a parameter is incorrect or missing, an error message is shown.
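
A histogram analysis of an energy column can be sketched as follows. This is an illustration and not the code of the script: the column headers, the number of bins and the sample values are made up, and the equal-width binning is an assumption.

```python
import csv
import io


def energy_histogram(csv_text, column, n_bins):
    """Bin one energy column into n_bins equal-width intervals.

    Returns a list of (lower bound of the bin, number of frames).
    """
    values = [float(row[column]) for row in csv.DictReader(io.StringIO(csv_text))]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # All values go to the first bin if they are identical;
        # the maximum value falls into the last bin.
        k = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    return [(lo + k * width, counts[k]) for k in range(n_bins)]


# Made-up energies; the header names are hypothetical.
sample = (
    "Frame,Bond,Angle,Torsion,Hybrid,NonBond,Total\n"
    "1,2.0,3.0,1.0,0.5,3.5,10.0\n"
    "2,2.5,3.5,1.5,0.5,4.0,12.0\n"
    "3,2.0,3.0,1.5,0.5,4.0,11.0\n"
    "4,4.0,6.0,3.0,0.9,6.0,19.9\n"
    "5,4.0,6.0,3.0,1.0,6.0,20.0\n"
)
hist = energy_histogram(sample, "Total", n_bins=2)
```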
    
Enantiomerizer.r Convert the trajectory to another format inverting all chiral atoms. You can specify the following parameters:
  • Input traj.
    File name of the input trajectory. Clicking Open button, the file requester is shown.
  • Output traj.
    File name of the output trajectory.
  • Output format
    File format of the output trajectory.
  • Compression
    Compression level. It has an effect only if XTC format is selected.
  • Append if the file exists
    If it's checked and the output trajectory exists, the converted frames are appended.
  • Consider selected atoms only
    If it's checked, only the active (visible) atoms are saved into the new trajectory.
  • Swap endian
    If it's checked, the byte order (endianness) of the converted trajectory is swapped. This function has an effect only if the DCD format is selected.

Click Go ! button to start the conversion and Cancel button to close the window.
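
To clarify what swapping the byte order means, the following sketch detects the byte order of a DCD file and swaps 32-bit words. It is not the code of the script. It assumes the common CHARMM/NAMD layout, in which the file starts with a 32-bit record marker equal to 84, and the function names are hypothetical. A real converter must swap each numeric field according to its size and leave text fields untouched.

```python
import struct


def dcd_byte_order(header):
    """Return '<' (little endian) or '>' (big endian) from the first record marker."""
    if struct.unpack("<i", header[:4])[0] == 84:
        return "<"
    if struct.unpack(">i", header[:4])[0] == 84:
        return ">"
    raise ValueError("not a DCD header with a 32-bit record marker")


def swap32(data):
    """Reverse the byte order of every 32-bit word in data."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    n = len(data) // 4
    return struct.pack(f">{n}I", *struct.unpack(f"<{n}I", data))


little = struct.pack("<i", 84)   # marker as written on a little-endian machine
big = swap32(little)             # the same marker with swapped byte order
```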

    
Frame extractor.r Extract the frames from a trajectory file (Input Traj.), saving them in the specified directory (Output Dir.). You can change Quenching step, Output format and Compression method.
    
NAMD SMD force plot.c This script shows the force/frame, force/distance and distance/frame of a steered molecular dynamics simulation by reading the NAMD output file.
    
PELE PDB fix.c This script fixes the non-standard PDB files generated by PELE so that they can be read by VEGA ZZ.
   
Ramachandran.c

This script performs the Ramachandran analysis for each trajectory frame. Before running it, you must open a trajectory file. For each frame, the Phi and Psi backbone torsion angles are measured and checked against the allowed areas of the Ramachandran plot. The percentage of residues inside the allowed areas, referred to the total number of residues, is calculated for each frame and these values are shown in a plot. This calculation is useful to highlight the evolution of the secondary structure during an MD simulation. If the percentage of residues with Phi and Psi values inside the allowed areas decreases during the simulation, the secondary structure is getting worse. Vice versa, if the percentage grows, the secondary structure is improving.
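
The per-frame percentage can be sketched as follows. This is an illustration and not the code of the script: the two rectangular areas are a deliberately crude, hypothetical stand-in for the real Ramachandran areas, and the angles in the example are made up.

```python
# Hypothetical, crude areas as (phi_min, phi_max, psi_min, psi_max) in degrees.
ALLOWED = [
    (-180.0, -45.0, 90.0, 180.0),   # roughly the beta-sheet area
    (-180.0, -30.0, -90.0, 30.0),   # roughly the right-handed helix area
]


def percent_allowed(frame):
    """Percentage of residues of one frame whose (Phi, Psi) pair is inside an area."""
    inside = sum(
        1 for phi, psi in frame
        if any(p0 <= phi <= p1 and s0 <= psi <= s1 for p0, p1, s0, s1 in ALLOWED)
    )
    return 100.0 * inside / len(frame)


# Two made-up frames of a four-residue peptide: (Phi, Psi) for each residue.
trajectory = [
    [(-60.0, -45.0), (-120.0, 130.0), (-65.0, -40.0), (60.0, 45.0)],
    [(-60.0, -45.0), (70.0, -60.0), (-65.0, 150.0), (60.0, 45.0)],
]
profile = [percent_allowed(frame) for frame in trajectory]
```

Plotting profile against the frame number gives the kind of chart described above: here the percentage drops from the first to the second frame.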

    
SDF export.c Converts the current trajectory into an SDF database. Each structure in the database corresponds to one frame of the trajectory file.
    
Water remover.r

Removes all water molecules from a trajectory, converting it into a PDB multimodel file. This script is obsolete and is maintained as an example only. The same function is now implemented in VEGA ZZ without external scripts.

 

13.3.19 Utilities

This directory includes the generic scripts. Some of these require REBOL/View.

Bin2h.c This script for developers converts a binary file to a C header file including a byte vector or a Base64 encoded string. In this last case, to decode the data, you can use HD_Base64EncodeMem() HyperDrive function by including hdbase64.h file.
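
The two output forms can be illustrated with a minimal sketch. It is not the code of the script: the layout of the generated header, the function names and the array name are hypothetical, and the input is assumed to be non-empty.

```python
import base64


def bin_to_c_array(data, name, per_line=12):
    """Render bytes as a C unsigned char array declaration."""
    lines = [f"static const unsigned char {name}[{len(data)}] = {{"]
    for i in range(0, len(data), per_line):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + per_line])
        lines.append(f"    {chunk},")
    lines.append("};")
    return "\n".join(lines) + "\n"


def bin_to_c_base64(data, name):
    """Render bytes as a C string constant holding the Base64 encoding."""
    text = base64.b64encode(data).decode("ascii")
    return f'static const char {name}[] = "{text}";\n'


blob = bytes([0x00, 0xFF, 0x10])
header = bin_to_c_array(blob, "blob")
header64 = bin_to_c_base64(blob, "blob_b64")
print(header + header64)
```

The Base64 form is more compact in the source file, but the data must be decoded at run time before use.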
   
Calculator.r Simple calculator (script by Ryan S. Cole).
    
Calendar.r Calendar and scheduler (script by Sterling Newton).
    
Clock.r Digital clock (script by Carl Sassenrath).
    
Console.r Open the REBOL console.
   
CPU load.c Show the CPU load in a small window.
    
Image viewer.r Image viewer.