Posted to user@mahout.apache.org by turnguard <ts...@yahoo.com> on 2009/09/20 16:21:00 UTC

newbie intro

hi!

could someone please shed some light on the following newbie questions? i'd
definitely like to use mahout (currently i'm using kea) for keyphrase
extraction based on a controlled vocabulary, so here's my situation:

1. i'm using a modified version of kea (http://www.nzdl.org/Kea/) that is
   capable of getting its controlled vocabulary from any SAIL-RDF-Repository
   (http://www.openrdf.org). kea has two modes to extract keyphrases from a
   text document: a free one and a controlled one (which checks keyphrase
   candidates against a skos thesaurus).
2. it's possible to train such an extraction model on the fly via a web
   interface (i should admit that i'm not convinced that "training" is the
   correct term for giving an extraction model new input data: people would
   assume it's getting better and better, but in most cases it's only
   getting different).
3. what i'd also like to achieve: if someone creates a new skos:Concept
   with some skos:prefLabel, i'd like to suggest where to place this new
   concept in the thesaurus (suggest what its skos:narrower or skos:broader
   concepts could be). currently i'm doing this via the bridge of indexed
   documents (e.g. a new skos:Concept gets the skos:prefLabel "house"; i
   search for all documents containing "house", count the already existing
   skos:Concepts these documents are tagged with, and print out the list of
   concepts with the number of their occurrences).
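
the counting described in point 3 could be sketched roughly like this in
python. everything here is an assumption made for illustration: the function
name `suggest_related_concepts`, the `text` and `concepts` keys, the sample
documents, and the naive substring match that stands in for a real search
against the document index.

```python
from collections import Counter


def suggest_related_concepts(label, documents):
    """Count the concepts attached to documents whose text mentions `label`."""
    needle = label.lower()
    counts = Counter()
    for doc in documents:
        # Naive substring match; a real setup would query the document index.
        if needle in doc["text"].lower():
            # Count each concept at most once per document.
            counts.update(set(doc["concepts"]))
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: each document carries the concepts it is tagged with.
documents = [
    {"text": "A house with a garden", "concepts": ["building", "garden"]},
    {"text": "House prices are rising", "concepts": ["building", "economy"]},
    {"text": "XML parsing basics", "concepts": ["markup"]},
]

for concept, occurrences in suggest_related_concepts("house", documents):
    print(concept, occurrences)
```

the concepts at the top of the list are only candidates for a broader or
narrower relation; the count by itself says nothing about the direction of
the relation.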


- does anyone have experience with extracting keyphrases from a document
  using mahout?
- does anyone have experience with extracting keyphrases based on a
  controlled vocabulary from a document using mahout?
- does anyone have experience with making thesaurus suggestions? e.g. in my
  thesaurus there's a concept with prefLabel "xml" and someone enters a new
  concept "extensible markup language": how could i suggest not creating a
  new concept, but using "extensible markup language" as an altLabel for
  "xml" instead?
- could someone point me in the right direction for a basic intro to
  keyphrase extraction using mahout?

any help or comments are really appreciated
wkr www.turnguard.com

