Category Archives: science

IPython notebook custom css

Here is the custom css I use for my ipython notebook. It is not finished but I think it is a great improvement from the default, the biggest change being that I removed the In/Out prompt. There are some changes to the margins and colours too, and a red dotted line is drawn around running cells. The font for the code was changed to Inconsolata-bz (you need to install that font to get exactly the same look).

ipy_ccss

To use this custom css as default, first create an ipython default profile:

$ ipython profile create

Create the directory that will contain the custom.css file:

$ mkdir -p ~/.config/ipython/profile_default/static/custom

Save the code below as custom.css in the directory you just created.

Download the file here: custom.css

Compiling Aria2, CNS, procheck and aqua for NMR structure calculation

A quick guide to compiling ARIA 2.2, CNS 1.21, AQUA 3.2 and PROCHECK, under Ubuntu Karmic 9.10 (32-bit)

If you use Gentoo linux, you can find aria under sci-chemistry (masked by ~x86). Procheck is in the science overlay (see the layman and overlays documentation). Aqua has to be installed manually at the time of writing this post.

In Ubuntu, the following packages are needed:

  • numpy
  • scipy
  • python-scientific
  • tcl 8.5
  • tk 8.5
  • tix
  • matplotlib
  • cns (to be compiled manually, see below)
  • the intel fortran compiler (to compile cns, gfortran does not seem to work)
  • tcsh
  • openjdk6 (needed for intel fortran compiler)
  • libstdc++5 (for cns)
  • flex (for cns)

Some files need to be copied from ARIA to CNS before it is compiled, this can be found in the ARIA instructions. Both CNS and ARIA can be compiled according to the instructions, but during the compilation of CNS with gfortran 4.3 I got a Segmentation fault. Using the Intel fortran compiler (ifort, free for academic use), there were no problems. For the installation of the Intel compiler, I needed to have openjdk6, which I installed using the Synaptic "default_jre" package.

CNS required libstdc++5 (download the old .deb) and flex ("lex" command not found) to compile correctly.

Procheck and aqua can be installed according to the instructions. I used ifort to compile procheck, and g++ for aqua.

And here are the lines to include in .tcshrc (of course substituting the directories for the ones that are used on your system):

# for cns:
source /home/louic/software/cns_solve_1.21/cns_solve_env
 
# for aria:
setenv ARIA2 '/home/louic/software/aria2.2'
alias aria2 '/usr/bin/python -O $ARIA2/aria2.py'
 
# for procheck and aqua:
setenv prodir '/home/louic/software/procheck'
setenv aquaroot '/home/louic/software/aqua3.2'
source $aquaroot/aqsetup

There seemed to be some problems on a 64-bit system, which is why I switched back to using 32-bit for now. I did not fully investigate this issue and do not know what the exact problem was.

Using procheck_nmr on a large number of structures

After a structure calculation with the aria 2.2 software I used aqua and procheck_nmr to assess the result. Although procheck_nmr worked fine on my 20 refined structures (in aria's refine directory), it ran into trouble with the 100 structures after iteration 8 (directory it8):

 * Restraints read in from file:
 * allpdb.nrv                                                                  
* Warning. Error reading restraint on line          34
* Warning. Error reading restraint on line          35
* Warning. Error reading restraint on line          36
* Warning. Error reading restraint on line          37
* Warning. Error reading restraint on line          38
* Warning. Error reading restraint on line          39
* Warning. Error reading restraint on line          40
* Warning. Error reading restraint on line          41
* Warning. Error reading restraint on line          42
(... and so on...)

A look at the allpdb.nrv file indicated that nothing was wrong with it, and as I mentioned the analysis worked fine when a lower number of structures was analysed.

A quick look at the procheck_nmr script indicated that these error messages were produced by vplot, which has it's source code in the vplot.f file. The source-code defines a string called IREC, that will hold the record with all the relevant distances in all the structures, but IREC is defined as:

       CHARACTER*512 IREC

...which is not long enough to hold all the distances that are read from the .nrv file. A similar problem occurs somewhere else in the code, where a format string is used on the record that is read from the .nrv file.

By increasing the size of the relevant variables and recompiling vplot.f, the analysis runs smoothly. Here is the diff-file that may be used as a patch:

louic@picadilly:~/software/procheck$ diff -u ../old/procheck/vplot.f vplot.f
--- ../old/procheck/vplot.f	2010-01-19 13:37:32.000000000 +0000
+++ vplot.f	2010-01-21 18:46:38.000000000 +0000
@@ -2141,7 +2141,7 @@
       CHARACTER*4   ATTYP(2)
       CHARACTER*9   RESDET(2)
       CHARACTER*80  FNAME
-      CHARACTER*512 IREC
+      CHARACTER*1024 IREC
       INTEGER       IATNO(2), IATTYP, ICONST, ICTYPE, IERR, IFILE,
      -              IMODEL, IRES, IRESNO(2), ITYPE, JERR, LCOUNT, LINE,
      -              NFILE, MAXCON, UCOUNT
@@ -2292,7 +2292,7 @@
 C----                     Read in the restraint violations for all the models
                           READ(IREC,460,IOSTAT=IERR)
      -                        (ACDIST(IFILE,ICONST), IFILE = 1, NFILE)
- 460                      FORMAT(57X,60F7.2)
+ 460                      FORMAT(57X,100F7.2)
 
 C----                     Extract the data for just those models that
 C                         have been selected by the user

To apply this patch, go to the directory where vplot.f is located, save the above patch as vplot.f.patch, and run the following commands:

patch < vplot.f.patch
make

Note that although this modification makes the analysis run on 100 structures, the length of IREC and the float in the format string are still limited. They may need to be increased if you want to analyse more structures at the same time.

Compiling and installing MolMol under Linux

The molmol software that is available from the website of ETH Zurich does not compile on Ubuntu Linux without some changes. It seems that it is no longer supported by ETH Zurich, but can be downloaded from several other websites (just google it). I describe below what I needed to do to get it to compile and run on Ubuntu Linux.

A summary of the installation procedure as described in the readme file follows for reference, but it does not seem to work without the changes mentioned below:

tar xvzf molmol-2k.2.0-doc.tar.gz
tar xfzf molmol-2k.2.0-src.tar.gz
 
cd tiff-v3.4
./configure
make
sudo make install
cd ..
 
cp makedef.lnx makedef
./INSTALL
make
sed -i 's/ksh/sh/g' molmol 
cd src/main
strip molmol
cp molmol ../../molmol.lnx
cd ../..
 
./molmol

First, some additional packages are needed in Ubuntu:

sudo apt-get install libmotif-dev x11proto-print-dev libxpm-dev libxt-dev

(depending on your current ubuntu installation, you may need some other packages as well. Some error messages during the installation may give you an indication of what else is needed.)

However, when I tried to compile molmol, the following error occured:

/usr/bin/gcc -I../../tools/include -Dsqrtf=sqrt -Dexpf=exp -Dlogf=log -Dpowf=pow -Dsinf=sin -Dcosf=cos -Dtanf=tan -Dasinf=asin -Dacosf=acos -Datanf=atan -Datan2f=atan2 -Dfabsf=fabs -Dceilf=ceil -O2 -I../../tools/include -Dsqrtf=sqrt -Dexpf=exp -Dlogf=log -Dpowf=pow -Dsinf=sin -Dcosf=cos -Dtanf=tan -Dasinf=asin -Dacosf=acos -Datanf=atan -Datan2f=atan2 -Dfabsf=fabs -Dceilf=ceil  -c -o RandNum.o RandNum.c
In file included from /usr/include/math.h:94,
                 from RandNum.c:29:
/usr/include/bits/mathcalls.h:55: error: conflicting types for ‘acos’
/usr/include/bits/mathcalls.h:57: error: conflicting types for ‘asin’
/usr/include/bits/mathcalls.h:59: error: conflicting types for ‘atan’
/usr/include/bits/mathcalls.h:61: error: conflicting types for ‘atan2’
/usr/include/bits/mathcalls.h:64: error: conflicting types for ‘cos’
/usr/include/bits/mathcalls.h:66: error: conflicting types for ‘sin’
/usr/include/bits/mathcalls.h:68: error: conflicting types for ‘tan’
/usr/include/bits/mathcalls.h:101: error: conflicting types for ‘exp’
/usr/include/bits/mathcalls.h:110: error: conflicting types for ‘log’
/usr/include/bits/mathcalls.h:154: error: conflicting types for ‘pow’
/usr/include/bits/mathcalls.h:157: error: conflicting types for ‘sqrt’
/usr/include/bits/mathcalls.h:179: error: conflicting types for ‘ceil’
/usr/include/bits/mathcalls.h:182: error: conflicting types for ‘fabs’
make[4]: *** [RandNum.o] Error 1

This was easily solved by editing the makedef file as follows:

change

MISSFUNC = -Dsqrtf=sqrt -Dexpf=exp -Dlogf=log -Dpowf=pow \
           -Dsinf=sin -Dcosf=cos -Dtanf=tan \
           -Dasinf=asin -Dacosf=acos -Datanf=atan -Datan2f=atan2 \
           -Dfabsf=fabs -Dceilf=ceil

to

# MISSFUNC = -Dsqrtf=sqrt -Dexpf=exp -Dlogf=log -Dpowf=pow \
#         -Dsinf=sin -Dcosf=cos -Dtanf=tan \
#         -Dasinf=asin -Dacosf=acos -Datanf=atan -Datan2f=atan2 \
#         -Dfabsf=fabs -Dceilf=ceil

At this point, I got a second error

/usr/bin/sleep 2
make[6]: /usr/bin/sleep: Command not found

This one is even easier to solve, by again editing makedef:

change

WAIT   = /usr/bin/sleep 2

to

WAIT   = /bin/sleep 2

Another error occurs that requires us to make some changes in the source code. The error message is:

/usr/bin/gcc -o molmol -I../../tools/include -I../../sg/include -I../../include  -O2 MolMol.o MolInit.o ../../lib/libcip.a ../../lib/libcmd.a ../../lib/libui.a ../../lib/libgraph.a ../../lib/libio.a ../../lib/libpu.a ../../lib/libcalc.a ../../lib/libprim.a ../../lib/libdata.a ../../lib/libattr.a ../../lib/libfileio.a ../../lib/libos.a ../../sg/lib/libsg.a ../../tools/lib/libtools.a  -L/usr/X11R6/lib -lXm -lXt -lX11 -lm -lc -lieee
../../lib/libos.a(GFile.o): In function `raiseError':
GFile.c:(.text+0x37): warning: `sys_errlist' is deprecated; use `strerror' or `strerror_r' instead
/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference in ../../lib/libos.a(GFile.o)
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[4]: *** [molmol] Error 1

The following changes to the source-code solved the problem.

Add the following line to the file ./src/os/GFile.c

#include <errno.h>

add it for example under the line that says

#include <linlist.h>

Also change the following line (line 85 after the previous edit):

msg = sys_errlist[errno];

to

msg = strerror(errno);

Molmol shoud compile now, but unfortunately the problems are not finished yet. After having stripped and copied molmol from ./src/main to molmol.lnx as in the original instructions, when we try to run it, it says:

MOLMOL 2K.2 
 
Version 2.1-2.6: Copyright (c) 1994-98 by
    Institut fuer Molekularbiologie und Biophysik, ETH Zurich
    Spectrospin AG, Faellanden, Switzerland
 
Version 2K.2: Custom version by Reto Koradi, 1999-2003
 
using Motif/OpenGL
unknown IO device

This can be solved by editing the startup script (the file called molmol):

Comment out the following lines (line 192-209):

#if [ -n "$nograph" ]; then
#  MOLMOLDEV=TTY/NO
#elif [ -n "$MOLMOLDEV" ]; then
#  true  # already set
#elif [ -n "$localdev" -a  $display =  ]; then
#  MOLMOLDEV=$localdev
#elif [ -x $xdpy ]; then
#  xdpyout=`$xdpy -d $display 2>&1 | egrep 'GLX|unable'`
#  case $xdpyout in
#    *unable*) MOLMOLDEV=TTY/NO
#              nograph=y
#              continue;;
#    *GLX*)    if [ -n "$glxdev" ]; then
#                MOLMOLDEV=$glxdev
#              fi
#              continue;;
#  esac
#fi

I hope this helps!

Drawing structures of organic molecules in (Gentoo) Linux

In gentoo portage, there are several programs to draw structures of organic molecules. The best one is gchempaint. Not in portage, but also very interesting is bkchem. A Gentoo ebuild is available, but needs to be installed from a portage overlay. To install it, just use the commands below (where I assume you have a portage overlay correctly set up in /usr/local/portage).

# mkdir -p /usr/local/portage/sci-chemistry/bkchem
# cd /usr/local/portage/sci-chemistry/bkchem
# wget http://bkchem.zirael.org/download/bkchem-0.12.5.ebuild
# ebuild bkchem-0.12.5.ebuild digest
# emerge -va bkchem

If you want to make some final adjustments to the structure of the molecule, I suggest saving the molecule as an svg image, and making the adjustments with inkscape.

BKChem website
GChemPaint website

Some xmgrace tips

Xmgrace is a very nice piece of software to create publication quality figures. Even better is that it stores your data and graph layout options as plain text (which is especially nice in combination with subversion or scripts to edit your graph).

But some things are not very intuitive, which is why I keep a list of some useful possibilities below:

  • Subscript, superscript
    x-squared: x\S2\N
    subscript: 3\s10\N
  • Greek letters, example: theta
    \f{Symbol}q\f{} or shorter: \xq\0q to get a theta symbol followed by the letter q: \x switches to the Symbol font and \0 switches back to Times-Roman.
  • Special symbols, example: Angstrom symbol
    \cE\C
    For other characters, look at this list: ascii table with low and high characters. Just use the character from the left column between \c and \C to produce the one from the right column. I highlighted the most interesting characters (for a scientist). The \c and \C option are listed as deprecated in the xmgrace manual., but what is the new way?. The new method to insert special characters in xmgrace is:
    • Press ctrl-e while positioned in a text-edit field to bring up the font dialog box.
    • Select the desired font from the drop-down list. You probably want to use Symbol because it contains many of the commonly used special characters.
    • Click on the character you want to insert
  • Saving the default settings for new graphs:
    open xmgrace, make the desired settings, save them as:
    ~/.grace/templates/Default.agr
    Unfortunately, this does not save the "print" settings, but see below.
  • Setting the default printer to print to .png files with 300dpi:
    create the file ~/.grace/gracerc.user and enter the following text:
    HARDCOPY DEVICE "PNG"
    DEVICE "PNG" DPI 300
  • Changing the definition of the default colors:
    Just edit the lines that say
    @map color 7 to (220, 220, 220), "grey"
    in the saved file. Edit the default file (see above) if you wish to use the new colors everywhere from now on.

See also the Grace users guide and the grace forums.

If you have more helpful hints, please post them in the comments, so that this blog post will become an interesting collection of tips that can be turned into a useful "cheat sheet".

Draw a helical wheel plot from a pdb file

Many programs are available that draw helical wheels from a given peptide primary structure (sequence), assuming a perfect alpha helix. Sometimes it may be interesting to see what the helical-wheel plot of a non-ideal helix looks like. Because I could not find such a program, I created one myself.

This script needs the PDL and SVG modules, and makes some assumptions as to how your pdb file looks. Please read below.

It is a Perl script that uses the PDL (perl data language) module for the calculations and the SVG module to generate the image. You should have Perl and these two modules installed to use it. The modules can be downloaded from http://www.cpan.org. You also need the Getopt::Long and Pod::Usage modules, but you probably already have those. I only tested it on Linux, but it should work on windows XP (and other versions) as well.

I have made the script available under the creative commons by-nc-sa licence, which means you may use and distribute it as long as you say you got it from me. And not for commercial purposes. If you publish any results, I would appreciate it if you mention my name in your article. At the moment there is no scientific publication related to this software.

click here to download (wheel.tar.bz2)
(but please continue reading for more information)

To learn how to use the program, you can take a look at the man-page with

./wheel.pl -m

The program goes through the following steps to draw the helical wheel:

  • Read the coordinates of the CA atoms from the pdb file
  • Find the principal axis of those atoms (the longest axis)
  • Move the centre of mass to the origin
  • Calculate the mass distribution tensor of the CA atoms
  • Calculate the eigenvectors and eigenvalues of this matrix (the matrix of eigenvectors is the rotation matrix)
  • Rotate the principal axis to coincide with the z-axis
  • Draw the xy-plane, which is the helical wheel plot

The numerical data for all steps is written to a log-file, in case you want to know what happened.

The output image will look similar to this (it is resized and converted to a png file here for web publishing):
helical wheel plot example
Because the program creates an SVG image, it may be edited and resized without losing quality. I recommend inkscape for this purpose.

It is important to realize that this program does not show the idealized helical wheel representation, but a projection of the CA atoms on the xy plane of the actual pdb coordinates. This is useful, but of course leads to images that "look less nice" than the idealized helical wheels. If a helix is curved, has a kink, or flexible termini, the output may not be what you expect. If this is the case, the principal helix axis that is calculated by the software does not lead to the "nice" representation you want to see, even though it is technically correct.

There are three ways I can think of to solve this problem:

  1. remove some amino acid residues (by using the -startres and -endres options of the program, or by removing them from the pdb file). But this may not be what you want.
  2. align (a part of) the helix to the z-axis manually (for example using molmol), and then use the -norotate option of the program. Although the helix will not be aligned to the principal axis, it may better represent the relative position of the amino acids as you want to show it (and still be correct).
  3. a combination of (1) and (2)

Please also note that the script makes several assumptions regarding the way your pdb file looks. If your pdb-file looks like the example below, it should work:

ATOM      1  N   LYS+    1       2.006  -6.797 -16.583  1.00  0.00
ATOM      2  H   LYS+    1       2.409  -5.881 -16.443  1.00  0.00
ATOM      3  CA  LYS+    1       1.934  -7.721 -15.428  1.00  0.00
ATOM      4  HA  LYS+    1       0.893  -7.948 -15.252  1.00  0.00
ATOM      5  CB  LYS+    1       2.689  -9.032 -15.717  1.00  0.00

At the moment, pdb-files that contain a chain identifier field cannot be read without making a small change to the source code of the script. The easiest way around this is probably to temporarily remove the chain identifier from your pdb file with your favorite editor (try a regexp in vi).

Feel free to contact me if you have any ideas, questions, problems or comments.

download (wheel.tar.bz2)