OSCAR3 is a tool for shallow, chemistry-specific parsing of chemical documents. It identifies (or attempts to identify):
Chemical names: singular nouns, plurals, verbs, etc., as well as formulae, acronyms, and some enzyme and reaction names.
Ontology terms: if you can do it by string-matching, you can get OSCAR to do it.
Chemical data: spectra, melting/boiling points, yields, etc. in experimental sections.
In addition, where possible, the detected chemical names are annotated with structures, either via lookup or name-to-structure parsing ("OPSIN"), and with identifiers from the chemical ontology ChEBI.
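To make the string-matching and data-extraction ideas above concrete, here is a minimal Python sketch. It is not OSCAR3's own code (OSCAR3 is a separate tool with its own lexicons and parsers); the lexicon, the placeholder identifiers such as `ONT:0001`, the function names, and the melting-point pattern are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon: surface string -> ontology identifier.
# These identifiers are placeholders, not real ChEBI or other ontology IDs.
LEXICON = {
    "ketone": "ONT:0001",
    "carboxylic acid": "ONT:0002",
    "acid": "ONT:0003",
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any lexicon term."""
    # Longest terms first, so "carboxylic acid" is preferred over "acid".
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_ontology_terms(text, lexicon=LEXICON):
    """Return (start, end, matched text, identifier) for each lexicon hit."""
    pattern = build_pattern(lexicon)
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Assumed surface form for melting points: "mp 120-122 °C" or "m.p. 85 °C".
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*:?\s*(\d+(?:\.\d+)?)(?:\s*[-–]\s*(\d+(?:\.\d+)?))?\s*°\s*C"
)


def find_melting_points(text):
    """Return (low, high) melting-point ranges in degrees Celsius."""
    results = []
    for m in MP_PATTERN.finditer(text):
        low = float(m.group(1))
        # A single value is reported as a range with equal endpoints.
        high = float(m.group(2)) if m.group(2) else low
        results.append((low, high))
    return results
```

Plain string matching of this kind only finds the exact surface forms listed in the lexicon (a plural such as "ketones" is missed here), which is one reason a real system pairs it with other recognition methods.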
Current work on OSCAR3 by Peter Corbett focuses on its use in SciBorg, a framework for the deep parsing of chemical text.
OSCAR3 also includes the Oscar Server, a Jetty-powered set of servlets. These provide the following services:
Parsing of text/HTML by OSCAR.
Text/InChI/SMILES/SMILES substructure/SMILES similarity search of papers, coupled with keyword and ontology-based search, using Lucene and the CDK.
List of all names found / all names that co-occur with a search term or terms.
Online management of a chemical/stopword lexicon.
Manual editing of SciXML fragments containing named entities, for creating gold standards and training data.
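The co-occurrence listing mentioned among the services can be illustrated with a small in-memory sketch. This is only an assumption about the general idea, not the Oscar Server's implementation (which relies on Lucene); the function name `cooccurring_names` and the input shape (a mapping from paper ID to the set of names found in it) are hypothetical.

```python
from collections import Counter


def cooccurring_names(documents, search_terms):
    """Count names that appear in papers containing every search term.

    documents: dict mapping a paper ID to the set of chemical names found in it.
    search_terms: iterable of names that must all be present in a paper.
    """
    wanted = {term.lower() for term in search_terms}
    counts = Counter()
    for names in documents.values():
        lowered = {name.lower() for name in names}
        # Only papers that mention all search terms contribute.
        if wanted <= lowered:
            # Count the other names, excluding the search terms themselves.
            counts.update(lowered - wanted)
    return counts
```

Calling `cooccurring_names(docs, ["benzene"])` returns a `Counter` keyed by the other names found in papers that mention benzene, so `most_common()` gives a ranked list.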
Summary by 李晓霞 on 2013-04-24
Last updated by 李晓霞 on 2013-04-24