Posted to users@opennlp.apache.org by Xavier Sumba Toral <c....@hotmail.com> on 2017/10/11 03:23:23 UTC

Tag publications with the UNESCO nomenclature

Hi,

I’m having trouble tagging publication metadata with a taxonomy. The problem is the following:


The UNESCO nomenclature [1] defines areas for classifying research papers. Each area has a code, and the areas are organized in three levels [2]: 1) fields (two-digit code), 2) disciplines (four-digit code), and 3) subdisciplines (six-digit code). I'd like to map a publication to one or more of these areas: given a publication's metadata, return the matching UNESCO areas.
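To make the code structure concrete, here is a minimal Python sketch that splits a code into its three levels. The function name unesco_levels is made up for illustration, and it assumes codes are written either as plain digits or with a dot before the last two digits (e.g. 1203.04).

```python
def unesco_levels(code):
    """Return the field, discipline, and subdiscipline codes contained in `code`.

    Hypothetical helper: assumes a 2-, 4-, or 6-digit code, optionally
    written with a dot before the last two digits (e.g. "1203.04").
    """
    digits = code.replace(".", "")
    if not digits.isdigit() or len(digits) not in (2, 4, 6):
        raise ValueError("expected a 2-, 4-, or 6-digit code, got %r" % code)
    # Each level's code is a prefix of the next one: 2, then 4, then 6 digits.
    return [digits[:n] for n in (2, 4, 6) if n <= len(digits)]


print(unesco_levels("1203.04"))
```

Because every level is a prefix of the level below it, a matcher only has to predict the most specific code; the parent levels come for free.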

For example, let's say I have the title, abstract, and keywords of a publication.

Input

Title: Learning representations by back-propagating errors

Abstract: There have been many attempts to design self-organizing neural networks. The aim is to find a powerful synaptic modification rule that will allow an arbitrarily connected neural network to develop an internal structure that is appropriate for a particular task domain.....

Keywords: Neural net, back-propagation, artificial intelligence.

Output: If we match areas bottom-up, we get the subdisciplines first, and the other levels are then easy to infer. BTW, there could be more than one output for interdisciplinary publications, or for areas that have since been combined, because the taxonomy hasn't been updated. So, based on the UNESCO taxonomy, I might get the following subdiscipline: 1203.04 Artificial Intelligence > Computer Sciences > Mathematics.
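As a rough baseline for this kind of matching, one option is to score each subdiscipline by how many of its terms appear in the title, abstract, and keywords, and to return every subdiscipline above a threshold. The sketch below is only an illustration under stated assumptions: the TAXONOMY table, its seed terms, the placeholder entry 0000.00, and the names tokenize and match_areas are all made up, and only the 1203.04 label comes from the example above.

```python
import re

# Tiny illustrative table. Only the 1203.04 label comes from the example;
# the seed terms and the 0000.00 placeholder entry are invented for this sketch.
TAXONOMY = {
    "1203.04": ("Artificial Intelligence",
                {"artificial", "intelligence", "neural", "learning"}),
    "0000.00": ("Placeholder Area", {"placeholder", "soil", "crops"}),
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_areas(title, abstract, keywords, min_overlap=2):
    """Return (code, label) pairs whose seed terms overlap the metadata.

    Results are sorted by descending overlap, so several subdisciplines
    can be returned for an interdisciplinary publication.
    """
    tokens = tokenize(" ".join([title, abstract, keywords]))
    scored = []
    for code, (label, terms) in TAXONOMY.items():
        overlap = len(tokens & terms)
        if overlap >= min_overlap:
            scored.append((overlap, code, label))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(code, label) for _, code, label in scored]


result = match_areas(
    "Learning representations by back-propagating errors",
    "There have been many attempts to design self-organizing neural networks.",
    "Neural net, back-propagation, artificial intelligence.",
)
print(result)
```

Plain term overlap ignores synonyms and word order, so it is at best a starting point; a trained text classifier over labelled publications would be the natural next step.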


Can anyone help me with some insights on implementing a taxonomy matcher, or point me to related work that has already been done?

Cheers.

[1] https://en.wikipedia.org/wiki/UNESCO_nomenclature
[2] http://unesdoc.unesco.org/images/0008/000829/082946eb.pdf