Posted to dev@cocoon.apache.org by bu...@apache.org on 2005/07/14 16:56:56 UTC

DO NOT REPLY [Bug 35741] New: - [Link] iHOP - Information Hyperlinked over Proteins

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35741>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35741

           Summary: [Link] iHOP - Information Hyperlinked over Proteins
           Product: Cocoon 2
           Version: Current SVN 2.1
          Platform: Other
               URL: http://www.pdg.cnb.uam.es/UniPub/iHOP/
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Documentation
        AssignedTo: dev@cocoon.apache.org
        ReportedBy: hoffmann@cbio.mskcc.org


Version used: Cocoon 2.1.7

---
Summary
iHOP (Information Hyperlinked over Proteins) is a public information system for
the biomedical sciences that provides the network of gene synonyms as a means of
navigating the scientific literature.
By using genes and proteins as hyperlinks between sentences and abstracts,
the more than twelve million abstracts in PubMed (National Library of Medicine)
can be converted into one navigable resource, a small world that brings the
advantages of the internet to scientific literature research.
The current iHOP network contains two million sentences and 30,000
different genes from human, mouse, fly, worm, fish, plants, and yeast.
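
The idea of genes acting as hyperlinks between sentences can be illustrated
with a small inverted index. The Python sketch below is not iHOP's
implementation: the synonym table, the example sentences, and all function
names are invented for illustration, and matching is a naive whole-word lookup.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: each synonym maps to one canonical gene symbol.
SYNONYMS = {"p53": "TP53", "TP53": "TP53", "MDM2": "MDM2", "HDM2": "MDM2"}

# Invented example sentences standing in for sentences taken from abstracts.
SENTENCES = [
    "MDM2 binds p53 and promotes its degradation.",
    "Loss of TP53 is frequent in human tumours.",
    "HDM2 is amplified in some sarcomas.",
]

def genes_in(sentence):
    """Return the canonical genes whose synonyms occur as whole words."""
    words = re.findall(r"[A-Za-z0-9]+", sentence)
    return {SYNONYMS[w] for w in words if w in SYNONYMS}

def build_index(sentences):
    """Map each canonical gene to the indices of the sentences mentioning it."""
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for gene in sorted(genes_in(sentence)):
            index[gene].append(i)
    return index

def linked_sentences(index, sentences, i):
    """Follow the gene 'hyperlinks' from sentence i to other sentences."""
    linked = set()
    for gene in genes_in(sentences[i]):
        linked.update(index[gene])
    linked.discard(i)  # a sentence does not link to itself
    return sorted(linked)

index = build_index(SENTENCES)
print(linked_sentences(index, SENTENCES, 0))
```

Because synonyms are resolved to a canonical symbol before indexing, the first
sentence links to both of the others even though they use different names for
the same genes.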

Hoffmann, R., Valencia, A. A Gene Network for Navigating the Literature. Nature
Genetics 36, 664 (2004)
---

Additional Technical information

Underlying data
In a preprocessing step that runs before the web application, genes, proteins,
and MeSH terms (a biomedical thesaurus) are identified in about 12 million
biomedical abstracts from PubMed. Of a total of 200,000 genes, about 30,000
were identified in 2 million abstracts.
Starting from this index, one XML document was created for each abstract, one
for each gene, and two different XML documents for each of the genes found in
the literature.
Thus the total number of different XML documents is around 2.3 million.
These documents essentially contain the original text divided into individual
sentences with gene synonyms, MeSH terms, and verbs tagged. Gene documents also
contain general information, such as database references, synonyms, and a list
of homologous genes, which in the web application are used to provide links to
external resources.
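
As a rough illustration of such a sentence-level document, the Python sketch
below parses a made-up abstract and collects the tagged terms per sentence.
The element and attribute names (abstract, sentence, gene, mesh, verb, pmid,
n, id) and the sample content are assumptions for this example only; the
actual iHOP schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the element and attribute names are assumptions,
# not the actual iHOP schema, and the PMID is a placeholder.
ABSTRACT_XML = """
<abstract pmid="0000000">
  <sentence n="1"><gene id="G1">MDM2</gene> <verb>binds</verb> <gene id="G2">p53</gene>.</sentence>
  <sentence n="2">Expression was studied in <mesh>Neoplasms</mesh>.</sentence>
</abstract>
"""

def tagged_terms(xml_text):
    """Return, per sentence number, the tagged genes, MeSH terms, and verbs."""
    root = ET.fromstring(xml_text)
    result = {}
    for sentence in root.iter("sentence"):
        result[sentence.get("n")] = {
            "genes": [g.text for g in sentence.iter("gene")],
            "mesh": [m.text for m in sentence.iter("mesh")],
            "verbs": [v.text for v in sentence.iter("verb")],
            # itertext() joins the plain text and the tagged terms in order.
            "text": "".join(sentence.itertext()),
        }
    return result

terms = tagged_terms(ABSTRACT_XML)
print(terms["1"]["genes"], terms["1"]["text"])
```

Keeping the original sentence text inline with the tags means one document can
serve both as the readable sentence and as the source of its links.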

Web application
The web application currently consists of the data in 2.3 million XML documents,
33 XSP scripts, and 37 different transformation style sheets for the
different views of the gene and abstract data.
Dynamic effects are implemented in the HTML and JavaScript layer on the client
side to minimize server load and to avoid complex front-end database queries.
This gives extremely fast response times and lets many users work with the
system concurrently.

---

* How can we verify this site is actually built with Cocoon?
Calling a nonexistent page, e.g.
"http://www.pdg.cnb.uam.es/UniPub/iHOP/nil/anything", will produce the Cocoon
error message.
Credit is given to the Cocoon project on the first page of iHOP. 

* How much time did it take to build the site from design to publication?
16 months.

* How many people were involved in the project?
Just me.

* How much traffic does the site handle?
Around 10,000 different IP addresses per day.

* What made you choose Cocoon to build the site?
The book "Java and XML" by Brett McLaughlin. The perfect separation between lots
of fairly stable data in XML and its ever-changing HTML representation.

* What other information do you want to disclose (e.g. how does it work, how did
you build it, what parts of Cocoon did you use)?
See technical information. 

* Contact 
Robert Hoffmann
Memorial Sloan-Kettering Cancer Center, MSKCC
hoffmann@cbio.mskcc.org, http://www.cbio.mskcc.org/~hoffmann/
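
The verification described above (requesting a nonexistent page and looking
for the Cocoon error message) could be scripted roughly as follows. This is a
sketch under assumptions: the function names are invented, and it assumes the
error page contains the word "Cocoon", which may not hold for every Cocoon
configuration.

```python
import urllib.error
import urllib.request

def fetch_body(url, timeout=10):
    """Return (status, body) for url, including for HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the error object still carries the page.
        return err.code, err.read().decode("utf-8", "replace")

def mentions_cocoon(body):
    """Assumption: a Cocoon error page contains the word 'Cocoon'."""
    return "cocoon" in body.lower()

def check_site(url):
    """Fetch url and report the status and whether the body mentions Cocoon."""
    status, body = fetch_body(url)
    return status, mentions_cocoon(body)
```

Calling check_site("http://www.pdg.cnb.uam.es/UniPub/iHOP/nil/anything") would
perform the check; network failures such as an unreachable host are not handled
and surface as urllib.error.URLError.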
