Creating a cheminformatics workflow using the command-line interface to ChemAxon tools, Aabel and AppleScript.

You may have used Marvin, a collection of tools for drawing, displaying and characterizing chemical structures, substructures and reactions. Most of the time you would access these tools via the GUI provided by ChemAxon; however, it is also possible to access them from the command line. Open a Terminal window, type cxcalc -h, and you should see the following options.
ChrisMacBookPro:~ username$ cxcalc -h

Calculator, (C) 1998-2009 ChemAxon Ltd.
version 5.2.2
Runs various molecule calculations: charge, pKa, logP, etc.

Usage:
  cxcalc [general options] [input files/strings]
        [plugin options] [input files/strings]
  cxcalc [general options] [input files]
        [plugin1 options] [input files/strings]
        [plugin2 options] [input files/strings]
       ...
  cxcalc [training options] [input file (the training set)]

General options: 
  cxcalc -h, --help              this help message,
                                 list of available calculations
  cxcalc  -h, --help     plugin specific help message
  -o, --output         output file path (default: stdout)
  -t, --tag            name of the SDFile tag to store the
                                 calculation results, tag name prefix
                                 to default tag names in case of multiple
                                 plugins (default: see plugin help)
  -i, --id      SDFile tag that stores the molecule ID
                                 if no such tag exists in the input molecule
                                 then molecule ID is the molecule itself
                                 converted to the specified format
                                 (default: ID = molecule index)
  -N, --do-not-display   do not display molecule ID and/or
                                 table header (in table output form):
                                 i  - no molecule ID
                                 h  - no table header
                                 ih - neither molecule ID nor table header
  -S, --sdf-output               SDF output with results in SDF tags
  -M, --mrv-output               result molecule output in MRV format
                                 (if neither -S nor -M is specified then
                                 plugin results are written in table form)
  -g, --ignore-error             continue with next molecule on error
  -v, --verbose                  print calculation warnings to the console

Training options: 
  -T, --train-knowledge-base [logP|pKa]
                                 generate knowledge base for the specified
                                 calculation
  -o, --output             logP: output file path
                                 pKa: output directory path
  -t, --tag            name of the SDFile tag that stores the
                                 experimental values (logP only)
  -a, --add-built-in-training-set
                                 add built-in training set (logP only)

Available calculations:

  atomcount, composition, dotdisconnectedformula, 
  dotdisconnectedisotopeformula, elemanal, elementalanalysistable, 
  exactmass, formula, icomposition, iformula, isotopecomposition, 
  isotopeformula, mass

Charge
  atomicpolarizability, atompol, averagemolecularpolarizability, 
  averagepol, avgpol, axxpol, ayypol, azzpol, charge, formalcharge, 
  ioncharge, molecularpolarizability, molpol, oen, 
  orbitalelectronegativity, pol, polarizability, tholepolarizability, 
  tpol, tpolarizability

Conformation
  conformers, hasvalidconformer, leconformer, lowestenergyconformer, 
  moldyn, moleculardynamics

Geometry
  aliphaticatom, aliphaticatomcount, aliphaticbondcount, 
  aliphaticringcount, aliphaticringcountofsize, angle, aromaticatom, 
  aromaticatomcount, aromaticbondcount, aromaticringcount, 
  aromaticringcountofsize, asa, asymmetricatom, asymmetricatomcount, 
  balabanindex, bondcount, bondtype, carboaromaticringcount, 
  carboringcount, chainatom, chainatomcount, chainbond, chainbondcount, 
  chiralcenter, chiralcentercount, connected, connectedgraph, 
  cyclomaticnumber, dihedral, distance, distancedegree, dreidingenergy, 
  eccentricity, fragmentcount, fusedaliphaticringcount, 
  fusedaromaticringcount, fusedringcount, hararyindex, 
  heteroaromaticringcount, heteroringcount, hindrance, hyperwienerindex, 
  largestatomringsize, largestringsize, largestringsystemsize, 
  maximalprojectionarea, maximalprojectionradius, minimalprojectionarea, 
  minimalprojectionradius, molecularsurfacearea, msa, plattindex, 
  polarsurfacearea, psa, randicindex, ringatom, ringatomcount, ringbond, 
  ringbondcount, ringcount, ringcountofatom, ringcountofsize, 
  ringsystemcount, ringsystemcountofsize, rotatablebond, 
  rotatablebondcount, shortestpath, smallestatomringsize, 
  smallestringsize, smallestringsystemsize, stereodoublebondcount, 
  stericeffectindex, sterichindrance, szegedindex, topanal, 
  topologyanalysistable, vdwsa, wateraccessiblesurfacearea, wienerindex, 
  wienerpolarity

Isomers
  canonicaltautomer, dominanttautomerdistribution, 
  doublebondstereoisomercount, doublebondstereoisomers, generictautomer, 
  majortautomer, moststabletautomer, stereoisomercount, stereoisomers, 
  tautomercount, tautomers, tetrahedralstereoisomercount, 
  tetrahedralstereoisomers

Markush Enumerations
  enumerationcount, enumerations, markushenumerationcount, 
  markushenumerations, randommarkushenumerations

Name
  name

Partitioning
  logd, logp

Protonation
  averagemicrospeciescharge, chargedistribution, isoelectricpoint, 
  majormicrospecies, majorms, microspeciesdistribution, msdistr, pi, pka

Other
  acc, acceptor, acceptorcount, acceptorsitecount, acceptortable, 
  accsitecount, aromaticelectrophilicityorder, 
  aromaticnucleophilicityorder, canonicalresonant, chargedensity, don, 
  donor, donorcount, donorsitecount, donortable, donsitecount, 
  electrondensity, electrophilicityorder, 
  electrophiliclocalizationenergy, energy, frameworks, hbda, 
  hbonddonoracceptor, huckel, huckeleigenvalue, huckeleigenvector, 
  huckelorbitals, huckeltable, localizationenergy, msacc, msdon, 
  nucleophilicityorder, nucleophiliclocalizationenergy, order, 
  pichargedensity, pienergy, refractivity, resonantcount, resonants, 
  totalchargedensity

Examples:
  cxcalc mols.sdf charge
  cxcalc -i smiles mols.sdf logP pKa
  cxcalc -S -t myLOGP mols.sdf logp -t increments,logP -p 3
  cxcalc -t my mols.sdf logd -l 3 -u 7 -s 0.5 logp -t increments,logP -p 3
  cxcalc -T logP -t LOGP -o logPparameters.txt trainingset.sdf

We can use these tools as a cheminformatics computation engine, either for quick calculations from the command line or as part of a workflow. For example, the following calculates the LogP of the input SMILES string.
ChrisMacBookPro:~ username$ cxcalc 'c1(c(cccc1)Br)C(=O)C' logp
id      logP
1       2.30

Alternatively you might want to calculate the LogD at a particular pH (usually physiological pH 7.4).
ChrisMacBookPro:~ username$ cxcalc 'c1(c(cccc1)Br)C(=O)C' logd -H 7.4
id      logD[pH=7.4]
1       2.30
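
When a single value like this feeds a larger workflow, you usually want the number without the id column and header. The sketch below is one way to pull a column out of the table. It assumes the columns are separated by tabs, and table_column is a hypothetical helper name, not part of cxcalc. (The help text above also lists a -N option for suppressing the id and header at source.)

```shell
# Read cxcalc-style table output on stdin (one header line, then one
# tab-separated row per molecule) and print only the requested column.
# table_column is a hypothetical helper, not part of cxcalc.
table_column() {
    awk -F '\t' -v col="$1" 'NR > 1 { print $col }'
}

# Sample text copied from the logP example above; in real use you would
# pipe the output of cxcalc into table_column instead.
printf 'id\tlogP\n1\t2.30\n' | table_column 2
```

Run as written, this prints 2.30.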

Several calculations can be combined in a single command, for example to calculate the Lipinski "Rule of Five" properties (doi:10.1016/S0169-409X(00)00129-0).
ChrisMacBookPro:~ username$ cxcalc 'c1(c(cccc1)Br)C(=O)C' logp mass acceptorcount donorcount
id      logP    Mass    acceptorcount   donorcount
1       2.30    199.045 2       0

These commands can also be used on files, so to calculate the Ro5 properties for every structure in a file use:-
ChrisMacBookPro:~ username$ cxcalc /Users/username/Desktop/acetophenones.smiles -o /Users/username/Desktop/results.tab logp mass acceptorcount donorcount

Or if you want the results added to an SDFile use:-
ChrisMacBookPro:~ username$ cxcalc /Users/username/Desktop/acetophenones.sdf -S -o /Users/username/Desktop/results.sdf logp mass acceptorcount donorcount
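
Once the tabular results exist, the Rule of Five itself can be checked with a few lines of awk. This is only a sketch: it assumes the column order used above (id, logP, Mass, acceptorcount, donorcount), it treats the cxcalc acceptor and donor counts as stand-ins for the counts in the original rule, and ro5_violations is a hypothetical helper name.

```shell
# Count Rule of Five violations per molecule in a tab-delimited table whose
# columns are: id, logP, Mass, acceptorcount, donorcount.
# ro5_violations is a hypothetical helper name.
ro5_violations() {
    awk -F '\t' '
        NR == 1 { next }        # skip the header row
        {
            v = 0
            if ($2 > 5)   v++   # logP greater than 5
            if ($3 > 500) v++   # molecular mass greater than 500
            if ($4 > 10)  v++   # more than 10 hydrogen bond acceptors
            if ($5 > 5)   v++   # more than 5 hydrogen bond donors
            print $1 "\t" v     # molecule id, then number of violations
        }
    ' "$1"
}

# Demonstration on the single row shown earlier in this article.
sample=$(mktemp)
printf 'id\tlogP\tMass\tacceptorcount\tdonorcount\n1\t2.30\t199.045\t2\t0\n' > "$sample"
ro5_violations "$sample"
rm -f "$sample"
```

For the bromoacetophenone row this reports zero violations. To run it on real data, pass the results.tab file written by the first command instead of the temporary file.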

The attraction of the command-line options is that they can be included in an AppleScript to automate the processing of a chemical structure file. However, when using AppleScript to run UNIX commands there are a few things you need to bear in mind. The AppleScript command "do shell script" always uses /bin/sh to interpret your command, not your default shell, and it ignores the configuration files that an interactive shell would read, so commands that work in the Terminal may need modifying to work in an AppleScript. In particular, you will probably have to give full paths to commands, and it is a good idea to enclose paths in single quotes to avoid problems with spaces in folder and file names. The other thing to note is that AppleScript uses a colon ":" as the directory separator, whereas UNIX uses POSIX file paths in which the slash "/" is the separator. Fortunately, one of the additions in AppleScript 1.8 was the ability to convert between the two file reference systems. You can demonstrate this using the simple AppleScript below.
set this_file to choose file
set this_file_text to (this_file as text)
display dialog this_file_text
set posix_this_file to POSIX path of this_file
display dialog posix_this_file
set this_file_back to (POSIX file posix_this_file) as string
display dialog this_file_back
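
Because "do shell script" does not see the PATH you have set up in the Terminal, it helps to look up each tool's full path once and paste it into the AppleScript. A minimal sketch, in which resolve_tool is a hypothetical helper name:

```shell
# Print the absolute path of a command found on the current PATH, so that it
# can be pasted into a "do shell script" string.
# resolve_tool is a hypothetical helper name.
resolve_tool() {
    tool_path=$(command -v "$1") || {
        echo "not found on PATH: $1" >&2
        return 1
    }
    printf '%s\n' "$tool_path"
}

# The location printed depends on the machine this is run on.
resolve_tool awk
```

In the Terminal you would call it as resolve_tool cxcalc or resolve_tool obgrep and copy the result into the script.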
A note on using quotes and backslashes in the shell command: strings in AppleScript run from an opening double quote to a closing double quote, so to put a literal double quote in a string you must "escape" it with a backslash character. Some punctuation has a special meaning to the shell, so use "quoted form of" to stop the shell from interpreting it.
set the_text to "this is a test."
do shell script "echo " & quoted form of the_text & " | perl -n -e 'print \"\\U$_\"'"

-- shell sees: echo 'this is a test.' | perl -n -e 'print "\U$_"'
-- result: "THIS IS A TEST."
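
To see what "quoted form of" hands to the shell, the same rule can be written in the shell itself: wrap the string in single quotes and replace every embedded single quote with '\''. This is a sketch of that rule, not Apple's code, and sh_quote is a hypothetical helper name; the folder "My Files" is made up to show a path containing a space.

```shell
# Wrap a string in single quotes for /bin/sh, replacing each embedded
# single quote with '\'' so that the shell reads it back unchanged.
# sh_quote is a hypothetical helper name.
sh_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# A made-up path containing a space survives as a single shell word.
sh_quote "/Users/username/My Files/acetophenones.sdf"
```

This prints '/Users/username/My Files/acetophenones.sdf', which the shell treats as one argument despite the space.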
So a script to calculate the Ro5 properties on a file might look like this:-
set this_file to choose file
set this_file_text to (this_file as text)
--get the posix path to chosen file
set posix_this_file to quoted form of POSIX path of this_file

set shell_script to "'/Applications/ChemAxon/MarvinBeans/bin/cxcalc' " & posix_this_file & "   -o '/Users/username/Desktop/results.tab' logp mass acceptorcount donorcount polarsurfacearea"

do shell script shell_script
This script creates a tab-delimited file called results.tab on the desktop. Hard coding the path to the desktop will fail for any account with a different user name (and an HFS-style path would also fail if the user has renamed the hard drive), so we use a short script snippet to get the path to the desktop.
set user_path to (path to desktop) as text
--file for calculated results
set result_file to user_path & "results.tab" as text
set posix_result_file to quoted form of POSIX path of result_file
Now that we have a file containing the data in tab-delimited format, we can use a plotting application to plot the distribution of molecular properties for the compounds in the file. I use Aabel. Using AppleScript to create a chart in Aabel is efficient but perhaps not straightforward at first sight, so let's take it in small steps. You will also need to download the latest patched version of Aabel: while developing this script I identified a couple of bugs in the AppleScript support, which the developers rapidly fixed.
First we define the tab delimited file to be imported and then import the data into a new worksheet. The next part simply defines the working directory to save files to. Aabel separates data from charts so now we need to create a new viewer to plot the data onto.
SelectChart selects a chart type. Its parameters are the Viewer button and menu coordinates, given as tab delimited TEXT. So SelectChart "4 3" means select the fourth button from the top row of the viewer (pie charts) and then select the third item from its dropdown menu (Square Pie). In our case we choose button 8 (histograms) and menu item 1 (continuous data). We then define the fill colour (the numbers are available from the "Edit colour palette" menu) and select the variables, here the second column of data from the topmost worksheet. Finally we position the chart. Dimensions are given in the current real-world page units (inches, cm etc.) in the order horizontal start point, horizontal end point, vertical start point, vertical end point. The origin 0,0 is the upper-left corner of the page, and the positive directions are down and to the right.
tell application "Aabel_3"
	Run
	
	set thetabdelimitedfile to (result_file as text) as alias
	
	
	ImportDataIntoNewWorksheet thetabdelimitedfile
	
	set currentdirectory to alias "Macintosh HD:Public"
	SetCurrentDirectory currentdirectory
	
	
	CreateNewViewer "1"
	activate
	
	SelectChart "8 1"
	
	SelectDefaultFillColor "120"
	
	SelectVariables "1 2"
	
	SetChartInstanceDimensions "0.2 3.2 0.2 3.2"
	
end tell
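
When several charts share a page, the dimension strings follow a regular grid, and it is easy to mistype one. The sketch below generates them, assuming square charts 3 page units wide starting 0.2 units from the top-left corner, as in the example above. chart_dims is a hypothetical helper name, and the values are separated by spaces as they appear in this article; if your version of Aabel expects tabs, change the printf format accordingly.

```shell
# Print "hstart hend vstart vend" for the chart in column $1 and row $2
# (both counted from 0) of a grid of square charts.
# margin and size are in page units; 0.2 and 3 match the chart above.
# chart_dims is a hypothetical helper name.
chart_dims() {
    awk -v col="$1" -v row="$2" -v margin=0.2 -v size=3 'BEGIN {
        h = margin + col * size
        v = margin + row * size
        printf "%.1f %.1f %.1f %.1f\n", h, h + size, v, v + size
    }'
}

chart_dims 0 0    # top-left chart
chart_dims 1 0    # the chart to its right
```

The two calls print 0.2 3.2 0.2 3.2 and 3.2 6.2 0.2 3.2, ready to paste into SetChartInstanceDimensions.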
The Aabel script above is the simplest type, and it is possible to refine the display. With SelectChart you can add a further 15 parameters. These correspond to the settings in the Variables & Plot Options palette controls, and can include:
(1) 5 popup menu values
(2) 5 checkbox values
(3) 5 slider values

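So SelectChart "8 1 1 3 1 1 -1 0 0 0 0 1 20 0.2 -1 -1 -1" selects a histogram (button 8, menu item 1) followed by the 15 palette values. Read in the order listed above, these would be popup menu values 1 3 1 1 -1, checkbox values 0 0 0 0 1 and slider values 20 0.2 -1 -1 -1.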

Finally, if we want to add text, we use these commands.

tell application "Aabel_3"
	Run
	
	SetDefaultTextLineFont "Helvetica"
	
	SetDefaultTextLineFontStyle "Bold"
	
	SetDefaultTextLineFontSize "14"
	
	SelectDefaultLineColor "1"
	
	CreateTextLine the_text
	
end tell
The final complete script, which includes export of the finished chart as a PDF, is shown below and the actual script can be downloaded here.
property obgrepPath : "'/usr/local/bin/obgrep'"

set user_path to (path to desktop) as text
--file for calculated results
set result_file to user_path & "results.tab" as text
set posix_result_file to quoted form of POSIX path of result_file

set this_file to choose file
set this_file_text to (this_file as text)
tell application "Finder" to set file_name to (name of this_file)



--get the posix path to chosen file
set posix_this_file to quoted form of POSIX path of this_file
--use openbabel to get number of structures
--posix_this_file is already a quoted form, so it must not be wrapped in further single quotes
set obgrep_command to obgrepPath & " -v -c \"NNNNNN\" " & posix_this_file

set obgrep_command_shell to obgrep_command & " |cut -d  \" \" -f2"
set count_lines to (do shell script obgrep_command_shell) as string

set the_text to " The file " & file_name & " contains " & count_lines & " structures."


--get molecular properties
set shell_script to "'/Applications/ChemAxon/MarvinBeans/bin/cxcalc' " & posix_this_file & "   -o " & posix_result_file & " logp mass acceptorcount donorcount polarsurfacearea rotatablebondcount"


do shell script shell_script

--Use Aabel to create histograms

tell application "Aabel_3"
	Run
	
	set thetabdelimitedfile to (result_file as text) as alias
	
	
	ImportDataIntoNewWorksheet thetabdelimitedfile
	
	set currentdirectory to alias "Macintosh HD:Public"
	SetCurrentDirectory currentdirectory
	
	
	CreateNewViewer "1"
	activate
	
	SelectChart "8 1"
	
	SelectDefaultFillColor "120"
	
	SelectVariables "1 2"
	
	SetChartInstanceDimensions "0.2 3.2 0.2 3.2"
	
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "21"
	
	SelectChart "8 1"
	
	SelectVariables "1 3"
	
	SetChartInstanceDimensions "3.2 6.2 0.2 3.2"
	
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "64"
	
	SelectChart "8 1 1 3 1 1 -1 0 0 0 0 1 10 0.2 -1 -1 -1"
	
	SelectVariables "1 4"
	
	SetChartInstanceDimensions "0.2 3.2 3.2 6.2"
	
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "140"
	
	
	SelectChart "8 1 1 3 1 1 -1 0 0 0 0 1 10 0.2 -1 -1 -1"
	
	SelectVariables "1 5"
	
	SetChartInstanceDimensions "3.2 6.2 3.2 6.2"
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "45"
	
	SelectChart "8 1"
	
	SelectVariables "1 6"
	
	SetChartInstanceDimensions "6.2 9.2 0.2 3.2"
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "64"
	
	SelectChart "8 1 1 3 1 1 -1 0 0 0 0 1 20 0.2 -1 -1 -1"
	
	SelectVariables "1 7"
	
	SetChartInstanceDimensions "6.2 9.2 3.2 6.2"
	
	
	CreateNewChartInstance " "
	
	SelectDefaultFillColor "140"
	SetDefaultObjectLocation "1  7"
	
	SetDefaultTextLineFont "Helvetica"
	
	SetDefaultTextLineFontStyle "Bold"
	
	SetDefaultTextLineFontSize "14"
	
	SelectDefaultLineColor "1"
	
	CreateTextLine the_text
	
	
	ExportVisibleViewerContent "MyfileNew.pdf"
	
	
	Quit
	
end tell

And the result is shown here.

This script could be stored in the AppleScript menu folder or converted into a droplet and left on the desktop. Many more properties could be calculated using the ChemAxon tools. I've only tested this with SMILES and SDF format files, but you should be able to use Open Babel to convert most file formats to these formats.
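
For SD files, the structure count that the script obtains from obgrep can be cross-checked without Open Babel, because every record in an SD file ends with a line containing "$$$$". A minimal sketch, in which count_sdf_records is a hypothetical helper name:

```shell
# Count the records in an SD file by counting its "$$$$" delimiter lines.
# count_sdf_records is a hypothetical helper name.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Example call (the path is the one used earlier in this article):
#   count_sdf_records /Users/username/Desktop/acetophenones.sdf
```

If the number differs from the one shown in the chart caption, the file probably contains records that one of the tools could not read.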