Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit

Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics, for example data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit.

Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel.
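The iterator idea mentioned above can be illustrated without any toolkit at all. The following is a minimal standard-library sketch, not Pybel's implementation: the function name `iter_sdf_records`, the helper `_parse_record`, and the sample data are hypothetical, and the parser handles only the simplest SD file layout (records terminated by `$$$$`, data fields introduced by `> <NAME>`).

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, Dict[str, str]]


def _parse_record(lines: List[str]) -> Record:
    """Extract the title line and the data fields from one SD record."""
    title = lines[0].strip() if lines else ""
    data: Dict[str, str] = {}
    field: Optional[str] = None
    for line in lines:
        if line.startswith(">") and "<" in line:
            # A data header such as "> <ID>" opens a new field.
            field = line[line.index("<") + 1 : line.rindex(">")]
            data[field] = ""
        elif field is not None:
            if not line.strip():
                field = None  # a blank line closes the current field
            elif data[field]:
                data[field] += "\n" + line
            else:
                data[field] = line
    return title, data


def iter_sdf_records(handle: TextIO) -> Iterator[Record]:
    """Lazily yield (title, data_fields) for each record in an SD file."""
    lines: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # record terminator
            yield _parse_record(lines)
            lines = []
        else:
            lines.append(line)


# Hypothetical two-record input, with empty molecule blocks for brevity.
SAMPLE_SDF = "\n".join([
    "mol-A", "  hypothetical", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <ID>", "1", "", "> <NOTE>", "first record", "", "$$$$",
    "mol-B", "  hypothetical", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <ID>", "2", "", "$$$$", "",
])

for title, data in iter_sdf_records(io.StringIO(SAMPLE_SDF)):
    print(title, data)
```

Because the function is a generator, only one record is held in memory at a time, which is the property that makes a plain `for` loop practical for files containing thousands of molecules.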

Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.

Figure: The relationship between the Python modules described in the text and the OpenBabel C++ library. Python modules are shown in green; the C++ library is shown in blue.

Cheminformaticians often need to write once-off scripts to extract data from text files, prepare data for analysis or carry out simple statistics. Scripting languages such as Perl, Python and Ruby are ideally suited to these day-to-day tasks. Such languages are, however, an order of magnitude or more slower than compiled languages such as C++. Since cheminformaticians regularly deal with molecular files containing thousands of molecules, and many cheminformatics algorithms are computationally expensive, cheminformatics toolkits are typically written in compiled languages for performance.

OpenBabel is a C++ toolkit with extensive capabilities for reading and writing molecular file formats (over 80 are supported) as well as for manipulating molecular data [2]. Many standard chemistry algorithms are included, for example, determination of the smallest set of smallest rings, bond order perception, addition of hydrogens, and assignment of Gasteiger charges. In relation to cheminformatics, OpenBabel supports SMARTS searching, molecular fingerprints (both Daylight-type and structural-key based), and includes group contribution descriptors for LogP, polar surface area (PSA) and molar refractivity (MR).

Of the current popular scripting languages, Python is the de facto standard language for scripting in cheminformatics. Several commercial cheminformatics toolkits have interfaces in Python: OpenEye's closed-source successor to OpenBabel, OEChem, is a C++ toolkit with interfaces in Python and Java; Rational Discovery's RDKit, which is now open source, is a C++ cheminformatics toolkit with a Python interface; the Daylight toolkit from Daylight Chemical Information Systems, written in C, only has Java and C++ wrappers, but PyDaylight, available separately from Dalke Scientific, provides a Python interface to the toolkit; the Cambios Molecular Toolkit from Cambios Consulting is a commercial C++ toolkit with a Python interface. There are also toolkits implemented entirely in Python: Frowns, an open source cheminformatics toolkit by Brian Kelley, and PyBabel, an open source toolkit included in the MGLTools package from the Molecular Graphics Labs at the Scripps Research Institute. Note that the latter is not related to the OpenBabel project; rather, its name derives from the fact that its aim was to implement in Python some of the functionality of Babel v1.6, a command-line application for converting file formats which is a predecessor of OpenBabel.

Here we describe the implementation and application of Pybel, a Python module that provides access to the OpenBabel C++ library from the Python programming language. Pybel builds on the basic Python bindings to make it easier to carry out frequent tasks in cheminformatics. It also aims to be as 'Pythonic' as possible; that is, to adhere to Python language conventions and idioms, and where possible to make use of Python language features such as iterators. The result is a module that takes advantage of Python's expressive syntax to allow cheminformaticians to carry out tasks such as SMARTS matching, data field manipulation and calculation of molecular fingerprints in just a few lines of code.
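As a rough illustration of the "few lines of code" claim, the sketch below combines SMARTS matching, data field manipulation and fingerprint calculation for a single molecule. It assumes an installed Open Babel with its Python bindings. The function name `describe_molecule`, the SMARTS pattern and the `source` field are illustrative choices, not taken from the paper. The import path depends on the release: Open Babel 3.x exposes `from openbabel import pybel`, while the 2.x series used a top-level `pybel` module.

```python
def describe_molecule(smiles: str) -> dict:
    """Summarize one molecule, given as a SMILES string, using Pybel."""
    # Imported lazily so this module still loads where Open Babel is absent.
    try:
        from openbabel import pybel  # Open Babel 3.x layout
    except ImportError:
        import pybel  # Open Babel 2.x layout

    mol = pybel.readstring("smi", smiles)

    # SMARTS matching: each match is a tuple of atom indices.
    hydroxyl = pybel.Smarts("[OX2H]")  # illustrative pattern
    matches = hydroxyl.findall(mol)

    # Data field manipulation: mol.data behaves like a dictionary.
    mol.data["source"] = "example"  # hypothetical field name and value

    # Fingerprint calculation with the default fingerprint type.
    fingerprint = mol.calcfp()

    return {
        "molwt": mol.molwt,
        "hydroxyl_matches": matches,
        "bits_set": len(fingerprint.bits),
        "source": mol.data["source"],
        "obmol": mol.OBMol,  # underlying OBMol for anything not wrapped
    }
```

Calling `describe_molecule("CCO")` returns a dictionary of these values for ethanol. For a multi-molecule file, the same body could sit inside a loop of the form `for mol in pybel.readfile("sdf", path):`, where `path` is a placeholder for a real file name.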

O'Boyle, N.M., Morley, C. & Hutchison, G.R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chemistry Central Journal 2, 5 (2008).