     Chemistry Informatics

Software to draw chemical structures
MarvinView (live demo)
ACD ChemSketch
OOChemistry – JChemPaint plugin for OpenOffice that embeds chemical structures in ODF documents.
Chem4Word – Chemistry Add-in for Microsoft Word that tags chemical entities and switches their representation (2D, common name, formula, …).

Cheminformatics toolkits

OEChem TK (OpenEye)
Facile management of molecules, atoms, bonds, and conformers
Conformational and frame-of-reference coordinate transformations
Maximum common substructure and exact substructure searching
Extremely fast 2D similarity using LINGOS
Perception of aromaticity with multiple models
Chemical reaction parsing and processing
Tetrahedral and E/Z stereochemistry recognition
Ring perception and Kekulization
Molecular normalization and canonicalization
Multiconformer molecule handling
Support for residues and bases
Ability to store and recall generic primitives or user-defined objects on molecules, atoms, bonds, or conformers
Multiple file format handling: robust reading and specification-compliant writing of SMILES, SLN, SDF, MOL, MOL2, PDB, FASTA, MOPAC, MacroModel, XYZ, CCP4, XPLOR, and OEBinary
Available in C++, Python & Java
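
The "2D similarity using LINGOS" entry above refers to comparing molecules through short substrings of their SMILES strings. The following is a minimal sketch of that idea, not the toolkit's implementation: it assumes overlapping 4-character substrings and a plain multiset Tanimoto coefficient, and it omits the SMILES normalization (for example of ring-closure digits) that the published LINGO method applies. The function names `lingos` and `lingo_tanimoto` are hypothetical.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping q-character substring of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto coefficient over the substrings of two SMILES strings.

    Simplified illustration only: no SMILES canonicalization or
    ring-closure normalization is performed here.
    """
    counts_a, counts_b = lingos(a, q), lingos(b, q)
    shared = sum((counts_a & counts_b).values())  # multiset intersection
    total = sum((counts_a | counts_b).values())   # multiset union
    return shared / total if total else 0.0


if __name__ == "__main__":
    print(lingo_tanimoto("CCCCO", "CCCCN"))
```

Because the comparison works on text alone, both inputs should be written in the same canonical SMILES form; otherwise two depictions of the same molecule can score well below 1.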


JOELib
Graph-based structure to modify molecular structures
Classes for getting the aromatic flags for atoms and bonds
Classes for getting the hybridisation of atoms
Descriptor calculation classes (I-State, E-State, Burden, …)
SMiles ARbitrary Target Specification (SMARTS) substructure search
Base classes for reading and writing molecular file formats
Support for SMILES, Chemical Markup Language (CML), CACTVS's clear text format (CTX), POVRay export (including aromatic rings)
Atom and bond properties classes (including import and export filter)
Processes / External processes and process decision filters
Regression module using Neural Networks (JavaNNS)
Regression module using Support Vector Machines
JOELib-Matlab connection, e.g. for feature extraction
External processing modules for 3D structure generation with Corina and descriptor calculation with Petra (especially atom and bond property descriptors).
Database module checking for duplicate molecules
Available in Java


Chemistry Development Kit (CDK)
2D rendering (see also the Renderer Tutorial, the Chemblaics blog by Egon Willighagen, and the mailing-list posts for more examples)
JChemPaint 2D diagram editor
Structure Diagram Layout
3D Rendering
integration with Jmol
CML, SMILES parsing/generation, MDL Molfile support (limited), InChI (via JNI bridge), readers for XYZ, ShelX, HIN, GhemicalMM, Mol2
interface to OpenBabel (via command line)
rule based IUPAC name parser
Virtual Screening
molecular, atomic and bond descriptors
LogP, TPSA, Rule-of-Five, many more
Gasteiger-Marsili charges (sigma *and* pi)
interface to R and Weka for modelling
path-based Fingerprinter
3D model builder
atom typing
MM2, MMFF94, CDK-internal
MMFF94 force field
Kabsch alignment
Chemical Graphs
isomorphism detection
maximal common substructure search
substructure searching (SMARTS-like)
ring searches (Smallest Set of Smallest Rings (SSSR), all rings)
NMR prediction
Structure Generation
deterministic generator
stochastic generators (based on genetic algorithms and simulated annealing)
BioJava interface
Protein Structures
PDB reading
active site detection
sequence to connectivity table
CDK-Taverna Project
Commercial and proprietary cheminformatics tools
A task-oriented comparison of multiple cheminformatics toolkits, Chemical Informatics Toolkits by Andrew Dalke
Python for Computational Chemistry
Dataflow vs. Scripting Languages
Python and Chemical Informatics
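
For the ring searches listed above, one fact can be checked without any toolkit: the number of rings in a Smallest Set of Smallest Rings equals bonds minus atoms plus connected components (the cyclomatic number of the molecular graph). The sketch below computes only that count, not the rings themselves, and does not use any toolkit API. The function name `sssr_size` and the atom-index bond list are assumptions made for illustration.

```python
def sssr_size(num_atoms: int, bonds: list[tuple[int, int]]) -> int:
    """Return the SSSR ring count: bonds - atoms + connected components."""
    parent = list(range(num_atoms))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)  # merge the two fragments joined by a bond

    components = len({find(i) for i in range(num_atoms)})
    return len(bonds) - num_atoms + components


# Heavy-atom graphs: benzene (one ring) and naphthalene (two fused rings).
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
NAPHTHALENE = BENZENE + [(4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]

if __name__ == "__main__":
    print(sssr_size(6, BENZENE), sssr_size(10, NAPHTHALENE))
```

The count is useful as a sanity check on the output of a ring-perception routine: a returned SSSR with a different number of rings indicates a problem in either the input graph or the search.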

Bio Events

International Conference on Intelligent Systems for Molecular Biology (ISMB)
BioIT World Conference & Expo
Bio International Convention
Courses & Events on Bioinformatics Training Network
Conferences & Events on
Chemistry Conferences WorldWide Calendar

Learning material

Cold Spring Harbor Laboratory
Scientific Literature Digital Library
Journal of Computational Science
Basic Introduction to Systems Biology – Online Course
The Open University – Chemistry

Blogs and forums

CADD Group Chemoinformatics Tools and User Services blog
Open Source Chemical Engineering Software Forum
A question and answer site for bioinformatics
Blue Obelisk Exchange – the place to ask about the use and development of Open Data, Open Source, and Open Standards
Journal of Cheminformatics

Search engines and databases

PubMed and GoPubMed
Google Scholar, Google Patents
biobar Firefox addon allows a biologist to browse and retrieve data from many databases
Reflect – highlights protein and small-molecule names in web pages
Concept Web Knowledge Enhancer – highlights concepts for search
NCBO Bioportal – ontologies used in biomedical communities
The BioCatalogue – a curated catalogue of Life Science Web Services
Google DB Directory
Online Public Compound Databases
Chemical compound search
Chemical databases:
IBM Contributes Data To The National Institutes Of Health To Speed Drug Discovery And Cancer Research Innovation (see also IBM BAO's strategic IP insight platform (SIIP))
Beilstein database query
Chemical Identifier Resolver
Chemical Identifier Resolver plugin for Bioclipse
Text extraction and analysis
OSRA – a utility designed to convert graphical representations of chemical structures into SMILES or SD files
Imago OCR – a toolkit for 2D chemical structure image recognition
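
The Chemical Identifier Resolver listed above is queried through URLs of the form base/identifier/representation. The sketch below only builds such a URL and performs no network request. The base address and the representation keywords `smiles` and `stdinchikey` are stated here as assumptions about the public NCI/CADD service and should be checked against its documentation; the helper name `cir_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed base address of the public resolver service.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str = "smiles") -> str:
    """Build a resolver URL for a name, SMILES, InChI, or other identifier."""
    # Percent-encode everything, including '/', '=' and '#', which occur
    # in SMILES and InChI strings and would otherwise alter the URL path.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


if __name__ == "__main__":
    print(cir_url("aspirin"))
    print(cir_url("acetic acid", "stdinchikey"))
```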

Text extraction and analysis

Text Analysis Tools
Improved chemical text mining of patents using infinite dictionaries, translation and automatic spelling correction (Roger A Sayle, Plamen Petrov, Jon Winter and Sorel Muresan) (online)
Preserving Nuance in Chemical Nomenclature Translation (Roger Sayle)
Semantic Analysis of Chemical Patents (David Jessop, Peter Murray-Rust, Lezan Hawizy) [2010] (online)
High-Throughput Identification of Chemistry in Life Science Texts (Peter Corbett and Peter Murray-Rust) [2006] (online)
Annotation of chemical named entities (Peter Corbett, Colin Batchelor, Simone Teufel) [2007]
Text mining in Bioclipse with Oscar4
Identification of Chemical Entities in Patent Documents (Tiago Grego, Piotr Pęzik, Francisco M. Couto, and Dietrich Rebholz-Schuhmann) (online)
Example of chemical extraction from SureChem
Chemicalize project in action by ChemAxon


Chemical Informatics Functionality in R (Rajarshi Guha) [2007] – describes the rcdk package that provides the R user with access to the CDK
Polymer Markup Language (PML). Chemical Markup, XML and the World-Wide Web (Nico Adams, Jerry Winter, Peter Murray-Rust, Henry S. Rzepa) [2008] (DOI: 10.1021/ci8002123)
Drawing Polymers in JChemPaint
Edit reaction with JChemPaintPanel
Representation of chemical structures (Wendy A. Warr) [2011]
Canonical Line Notations: InChI vs SMILES (Krisztina Boda) [2010]
Unofficial InChI FAQ: What Can InChI Currently Not Represent?
What is Std InChI?
Partial Standard InChIKey Lookup
Representation of Markush structures (Szabolcs Csepregi) [2010]
Chemical name resolving
Bio-Linux for bioinformatics workstations
Bioclipse – a Rich Client for the Life Sciences
The management, analysis, and visualization of chemical structures and related information
The management and analysis of biological sequences (DNA, RNA, and proteins)
Pharmacological research and drug discovery
Data analysis engine based on the statistical language R
Poster for EclipseCon
SemanticWeb features in Bioclipse 2.2
Ten Simple Rules for Providing a Scientific Web Resource
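
The Std InChI and partial InChIKey lookup entries above rely on the fixed layout of a standard InChIKey: a 14-letter block hashed from the connectivity, a hyphen, an 8-letter block hashed from the remaining layers followed by a standard/non-standard flag and a version letter, a hyphen, and one protonation letter. The sketch below splits a key into those parts; a partial lookup would then match on the first block alone. The function name `split_inchikey` and the returned field names are hypothetical, and the example key is the one commonly listed for aspirin.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 protonation letter.
_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = _INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, remainder, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,  # block used for partial lookups
        "remainder": remainder,        # hash of stereo, isotope and other layers
        "standard": flag == "S",       # 'S' marks a standard InChIKey
        "version": version,
        "protonation": protonation,    # 'N' means neutral
    }


if __name__ == "__main__":
    print(split_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Two keys that share the first block but differ in the second describe the same connectivity with different stereochemistry or isotopic labelling, which is what makes a lookup on the first block alone useful.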

