Posted to common-user@hadoop.apache.org by Martin Krallinger <kr...@gmail.com> on 2015/02/19 17:53:21 UTC
CALL FOR PARTICIPATION: CHEMDNER-patents task (Biocreative V)

CALL FOR PARTICIPATION: CHEMDNER-patents task: Chemical and drug name
recognition task in patents  (
http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/)




The CHEMDNER-patents task (BioCreative V - http://www.biocreative.org) is a
community challenge on named entity recognition of chemical compounds in
patents and text classification.





*Task Organizers*

   - Martin Krallinger, Spanish National Cancer Research Centre
   - Florian Leitner, Universidad Politecnica de Madrid
   - Obdulia Rabal, Center for Applied Medical Research (CIMA), University
   of Navarra
   - Julen Oyarzabal, Center for Applied Medical Research (CIMA),
   University of Navarra
   - Alfonso Valencia, Spanish National Cancer Research Centre




Registration and participation

Teams interested in the CHEMDNER-patents task should register for track 2
of BioCreative V:

http://www.biocreative.org/events/biocreative-v/biocreative-v-team/




Background

This task will address the automatic extraction of chemical and biological
data from medicinal chemistry patents. The identification and integration
of all information contained in these patents (e.g., chemical structures,
their synthesis and associated biological data) is currently a very hard
task not only for database curators but also for life sciences researchers
and biomedical text mining experts. Despite the valuable
characterizations of biomedically relevant entities such as chemical
compounds, genes and proteins contained in patents, academic research in
the area of text mining and information extraction using patent data has
been minimal. Pharmaceutical patents covering chemical compounds provide
information on their therapeutic applications and, in most cases, on their
primary biological targets.



*CHEMDNER-patents tasks*

This task covers three essential steps for the identification of
biomedically relevant descriptions of chemical compounds:

·  *CEMP* (chemical entity mention in patents, main task): the detection of
chemical named entity mentions in patents (start and end indices
corresponding to all the chemical entities).

·  *CPD* (chemical passage detection, text classification task): the
detection of sentences that mention chemical compounds.

·  *CER* (chemical entity relation): the extraction of chemical compound
relations, covering biologically relevant chemical relations (e.g.
chemical-biological target relations).

Participating teams do not need to send results for all three sub-tasks.
They can also send results for individual sub-tasks only.
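
As a rough illustration of what CEMP and CPD output could look like, here is
a minimal sketch, not the official task format. It assumes character offsets
with an inclusive start and exclusive end, and uses a toy dictionary lookup
in place of a real recognizer. The names Mention, find_mentions,
mentions_chemical and LEXICON are hypothetical and not part of the task
definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # offset of the first character (inclusive, assumed convention)
    end: int    # offset one past the last character (exclusive, assumed convention)
    text: str


def find_mentions(sentence, lexicon):
    """CEMP-style toy lookup: return every occurrence of a lexicon term."""
    mentions = []
    lowered = sentence.lower()
    for term in lexicon:
        needle = term.lower()
        pos = lowered.find(needle)
        while pos != -1:
            end = pos + len(needle)
            mentions.append(Mention(pos, end, sentence[pos:end]))
            pos = lowered.find(needle, pos + 1)
    return sorted(mentions, key=lambda m: m.start)


def mentions_chemical(sentence, lexicon):
    """CPD-style label: True if the sentence contains at least one mention."""
    return bool(find_mentions(sentence, lexicon))


# Hypothetical lexicon and sentence, for illustration only.
LEXICON = ["aspirin", "acetic acid"]
sentence = "The compound was treated with acetic acid and compared to aspirin."

mentions = find_mentions(sentence, LEXICON)
for m in mentions:
    print(m.start, m.end, m.text)
print(mentions_chemical(sentence, LEXICON))
```

A real system would replace the dictionary lookup with a trained recognizer;
the point here is only that CEMP yields spans while CPD yields one label per
sentence.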


CHEMDNER session at the BioCreative V workshop

At the BioCreative V Workshop, to be held in Seville (Spain) on September
9-11, 2015, there will be a session devoted to the CHEMDNER-patents task.
This session will include an overview talk presenting the datasets used and
the results obtained by the participating teams. A number of teams will also
be invited to present their systems. We also plan to hold a discussion
session where teams, task organizers and domain experts will discuss the
results and future steps. Finally, during the poster session, all teams will
be able to present their strategies.


CHEMDNER patents workshop proceedings and journal special issue

Participating teams will be invited to contribute to the Proceedings of
the Fifth BioCreative Challenge Evaluation Workshop. A selected number of
top-performing teams will also be invited to contribute a system
description paper to a special issue of a relevant journal in the field.




Previous CHEMDNER (Biocreative IV)

The CHEMDNER-BioCreative IV special issue was published in the Journal of
Cheminformatics: Volume 7 Supplement 1, 'Text mining for chemistry and the
CHEMDNER track'. It focused on the detection of chemical entities in
PubMed abstracts. The entire supplement is available from the *Journal of
Cheminformatics*: http://www.jcheminf.com/supplements/7/S1



The special issue includes an overview paper on the task, a paper on the
CHEMDNER corpus and 13 selected system description papers. Top scoring
teams obtained an F-score of 87.39% for the recognition of chemical entity
mentions, a very competitive result already close to the human
inter-annotator agreement (IAA). In addition, some systems showed further
improvements over their original submissions.
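
For readers unfamiliar with the metric, the sketch below shows one common
way to compute an F-score for mention detection. It assumes the balanced
F-score (harmonic mean of precision and recall) and exact matching of
(start, end) spans. This message does not spell out the official evaluation
settings, so treat both as assumptions; the function name f_score and the
spans are hypothetical.

```python
def f_score(gold, predicted):
    """Return (precision, recall, F1) over exact-match (start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: two of three predictions match the gold annotations.
gold = {(30, 41), (58, 65), (70, 78)}
predicted = {(30, 41), (58, 65), (80, 85)}

precision, recall, f1 = f_score(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```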



In addition, participating teams provided short system description papers
for the BioCreative workshop proceedings; see:

http://www.biocreative.org/resources/publications/chemdner-proceed-publications/



*References*

   1. Krallinger, M., Leitner, F., Rabal, O., Vazquez, M., Oyarzabal, J., &
   Valencia, A. CHEMDNER: The drugs and chemical names extraction challenge.
   Journal of Cheminformatics 2015, 7(Suppl 1):S1
   2. Krallinger, M. et al. The CHEMDNER corpus of chemicals and drugs and
   its annotation principles. Journal of Cheminformatics 2015, 7(Suppl 1):S2
   3. Krallinger, M., Leitner, F., Rabal, O., Vazquez, M., Oyarzabal, J., &
   Valencia, A. (2013, October). Overview of the chemical compound and drug
   name recognition (CHEMDNER) task. In BioCreative Challenge Evaluation
   Workshop (Vol. 2, p. 2).
   4. Akhondi, S. A., Klenner, A. G., Tyrchan, C., Manchala, A. K.,
   Boppana, K., Lowe, D., ... & Muresan, S. (2014). Annotated Chemical Patent
   Corpus: A Gold Standard for Text Mining. PloS one, 9(9), e107477.
   5. Grego, T., Pęzik, P., Couto, F. M., & Rebholz-Schuhmann, D. (2009).
   Identification of chemical entities in patent documents. In Distributed
   Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and
   Ambient Assisted Living (pp. 942-949). Springer Berlin Heidelberg.
   6. Jessop, D. M., Adams, S. E., & Murray-Rust, P. (2011). Mining
   chemical information from open patents. Journal of Cheminformatics, 3(1),
   40.
   7. Gurulingappa, H., Müller, B., Klinger, R., Mevissen, H. T.,
   Hofmann-Apitius, M., Friedrich, C. M., & Fluck, J. (2010). Prior Art Search
   in Chemistry Patents Based On Semantic Concepts and Co-Citation Analysis.
   In TREC.
   8. Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M.,
   Stothard, P., ... & Woolsey, J. (2006). DrugBank: a comprehensive resource
   for in silico drug discovery and exploration. Nucleic acids research,
   34(suppl 1), D668-D672.
   9. Zhu, F., Han, B., Kumar, P., Liu, X., Ma, X., Wei, X., ... & Chen, Y.
   (2010). Update of TTD: therapeutic target database. Nucleic acids research,
   38(suppl 1), D787-D791.