text2genome takes a distinctive approach to mapping scientific articles to genomic locations: all words that resemble DNA sequences are extracted from a full-text scientific article and its supplementary data files, then mapped to public genome sequences. The mapped sequences can then be displayed on genome browser websites and used in data-mining applications.
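The extraction step described above can be illustrated with a minimal Python sketch. This is not the text2genome implementation: the function name `extract_dna_like_words`, the 20-character minimum length, and the restriction to the letters A, C, G, T and N are all assumptions chosen for illustration. The subsequent mapping of candidates to a genome is not shown.

```python
import re


def extract_dna_like_words(text, min_length=20):
    """Return upper-cased runs of nucleotide letters found in text.

    Illustrative only: min_length and the A/C/G/T/N alphabet are
    assumptions, not parameters taken from text2genome itself.
    """
    # Match a run of nucleotide letters that is not embedded in a longer
    # alphabetic word (no letter directly before or after the run).
    pattern = re.compile(
        r"(?<![A-Za-z])[ACGTN]{%d,}(?![A-Za-z])" % min_length,
        re.IGNORECASE,
    )
    return [match.group(0).upper() for match in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Primer 5'-ACGTACGTACGTACGTACGTAC-3' amplified the CAT gene."
    # Short words such as "CAT" fall below the length threshold.
    print(extract_dna_like_words(sample))
```

A length threshold of this kind is one simple way to avoid treating ordinary short words made of the same letters as sequences. A real pipeline would likely need further filtering before mapping candidates to a genome.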
The publication describing the text2genome system as applied to open-access publications is: Haeussler, Gerner and Bergman (2011) Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics 27:980-6.
Source code for the text2genome application can be found at the project's SourceForge repository.
This website demonstrates how the results from the 2011 article can be used. You can search, browse and download data obtained from running text2genome on more than 150,000 open-access articles from PubMed Central.
Data can be overlaid onto the Ensembl and UCSC genome browsers. For some examples, please see the Search page and the links on the Browse page.
Update: The text2genome project is now being extended to cover a larger part of the scientific literature by Maximilian Haeussler and David Haussler at the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz, and Casey Bergman at the University of Manchester, UK.
The first results of this collaboration are a Sciverse application that makes sequences clickable in Science Direct full-text articles (see Sciverse App Gallery) and native tracks on the UCSC Genome Browser showing mapped sequences from PubMed Central and Science Direct full-text articles (see Genome-preview).
For current developments, including progress with non-open-access publishers and extensions to the original text2genome system, see the UCSC Genocoding project.