Posted to dev@stanbol.apache.org by Maciek Sykulski <ma...@gmail.com> on 2012/09/05 17:18:12 UTC
Apache Stanbol in Cytogenetic Lab demo

Hello Stanbol Developers,

We are a bioinformatics group <http://bioputer.mimuw.edu.pl/> at the
University of Warsaw, and we have deployed Apache Stanbol technologies at
the Institute of Mother and Child, Cytogenetics Labs
<http://www.imid.med.pl/klient1/view-content/218/Pracownia-Cytogenetyki-Molekularnej--oferta-diagnostyczna.html>,
Warsaw, PL.

You can try the *demo* of what we've achieved so far here:
http://bioputer.mimuw.edu.pl:9442/welcome/

This is a realization of the Cytogenetic Proposal
<http://wiki.iks-project.eu/index.php/Cytogenetic_Proposal>, a part of IKS
Early Adopters.

The demo presents the integration of Apache Stanbol with software used at
the IMID Cytogenetics Labs. The purpose of the integration is to let
geneticists annotate experiment data with relevant content (information
about genes, genetic diseases, scientific publications, and more), and
later to search through the annotated content and produce summarized
reports of it.

The demo features:

   1. annotation of biological content (in this case, the result of an
   aCGH <http://en.wikipedia.org/wiki/aCGH> experiment) with relevant
   Linked Data: the user may add enhancements found by the engines and
   build their own tree of enhanced content.
   2. a set of semi-automated enhancers which communicate with Apache
   Stanbol and search Linked Data for relevant entries.
   3. faceted search
   <http://bioputer.mimuw.edu.pl:9442/welcome/stanbol_search/search/>
   through content annotations provided by Apache Stanbol Contenthub.



   - For the purpose of this demo, UNIPROT Linked Data was indexed using
   Apache Stanbol tools (the resulting index is 10GB in size):
      - UNIPROT RDF release
      <ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/README>
      - http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
      - This UNIPROT release also contains the following Linked Data: *Gene
      Ontology* terms, *PubMed* abstracts
   - The site was built on web2py <http://web2py.com/>
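
For readers who want to query an index like the UNIPROT one above, here is
a minimal sketch of a label search against a Stanbol Entityhub referenced
site. It is not taken from our code. It assumes the index is installed as a
referenced site with the hypothetical id "uniprot" on a Stanbol instance at
http://localhost:8080, and that the site's "find" endpoint accepts
form-encoded "name" and "limit" parameters; check the id and parameters
against your own installation.

```python
import json
import urllib.parse
import urllib.request


def build_find_request(name, site_id="uniprot",
                       base_url="http://localhost:8080", limit=10):
    """Build (but do not send) a label search against a referenced site.

    site_id and base_url are assumptions for illustration; use the id
    your index was installed under.
    """
    # Form-encoded body; urllib sets the form Content-Type when sending.
    form = urllib.parse.urlencode({"name": name, "limit": limit})
    return urllib.request.Request(
        url="{}/entityhub/site/{}/find".format(base_url.rstrip("/"), site_id),
        data=form.encode("ascii"),
        headers={"Accept": "application/json"},
        method="POST",
    )


def find_entities(name, **kwargs):
    """Send the search and return the decoded JSON response."""
    request = build_find_request(name, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, find_entities("BRCA*", limit=5) would ask the assumed
"uniprot" site for up to five entities whose label starts with "BRCA".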

What you may find useful in your projects:

   - A Stanbol connection module in Python
   <http://bioputer.mimuw.edu.pl:33000/projects/ikscyt/repository/entry/applications/welcome/modules/stanbol.py>
   was created for the purpose of this integration.
   - A set of content enhancers in Python for web2py
   <http://bioputer.mimuw.edu.pl:33000/projects/ikscyt/repository/entry/applications/welcome/controllers/stanbol_enhancers.py>
   was also created. If you don't use web2py, you'll still find examples
   there of how to use the stanbol.py module. You can try these enhancers
   at the demo site.
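
To give an idea of the kind of call such a connection module wraps, here is
a minimal sketch of posting plain text to the Stanbol Enhancer REST
endpoint. It is not the code from stanbol.py. The function names are
hypothetical, and it assumes a Stanbol instance at http://localhost:8080
exposing the standard /enhancer endpoint with a JSON response format.

```python
import json
import urllib.request

# Assumption: a local Stanbol launcher; adjust to your own setup.
STANBOL_URL = "http://localhost:8080"


def build_enhancer_request(text, base_url=STANBOL_URL,
                           accept="application/json"):
    """Build (but do not send) a POST request to the Enhancer endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/enhancer",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, base_url=STANBOL_URL, timeout=30):
    """Send text to the Enhancer and return the decoded JSON response."""
    request = build_enhancer_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling enhance("Deletion in the BRCA1 region") would return whatever
enhancements the configured engines produce for that text; the structure
of the result depends on the engines enabled on the server.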

What we haven't quite managed yet:

   - We wanted to integrate VIE <http://viejs.org/> on our presentation
   side; however, as far as I know, VIE does not support getting content
   from Stanbol Contenthub. We tried to modify VIE to allow for that:

   http://bioputer.mimuw.edu.pl:33000/projects/ikscyt/repository/diff/applications/welcome/static/vie/vie-2.0.0.debug.js?utf8=%E2%9C%93&rev=4175&rev_to=4164
   However, more modifications would be needed, since the resulting JSON
   from Contenthub is parsed in an unexpected way.
   - The resulting content/enhancements graph is not yet stored in a triple
   store, and our Stanbol does not allow SPARQL queries for content ("There
   is no registered TripleCollection." I'll make another post about it).


The demo site: http://bioputer.mimuw.edu.pl:9442/welcome/
Home page for this Early Adopters project: http://bioputer.mimuw.edu.pl/iks/


Don't hesitate to ask questions or send any comments.
Best,
Maciek Sykulski