Posted to dev@opennlp.apache.org by Damiano Porta <da...@gmail.com> on 2016/08/16 09:49:17 UTC

RegexNameFinderFactory with SimpleTokenizer

Hello,

After persons, addresses, etc., I also need to extract email addresses and
telephone numbers from my documents. I just found
https://github.com/apache/opennlp/blob/cac4db6d3cb74ae3414fc8c438eec770af783538/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinderFactory.java

Reading the code, it seems the EMAIL/TELEPHONE regexes can only be used
with the whitespace tokenizer. At the moment I am using the
SimpleTokenizer, and I can't change it because I am using a NER model (MAXENT)
that works fine with it. Is there a workaround to match those regexes?
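
One workaround I can imagine, as a rough sketch and not something the
library provides as far as I know: run the regex on the raw text and map the
character offsets of each match back to token indexes. The token offsets
below are assumed to come from something like Tokenizer.tokenizePos(); the
class name, the method name and the simplified EMAIL pattern are my own
placeholders, not the ones in RegexNameFinderFactory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
class RegexOverTokens {

    // Simplified e-mail pattern for illustration only (an assumption,
    // not the pattern defined in RegexNameFinderFactory).
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /**
     * Runs the pattern on the raw text and converts each match into a
     * token range.
     *
     * tokenOffsets[i] = {startChar, endCharExclusive} of token i, in
     * document order (assumed to be produced by the tokenizer).
     * Each result is {firstTokenIndex, lastTokenIndexExclusive}.
     */
    static List<int[]> findTokenSpans(String text, int[][] tokenOffsets, Pattern pattern) {
        List<int[]> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            int first = -1;
            int last = -1;
            for (int i = 0; i < tokenOffsets.length; i++) {
                // Keep every token whose character range overlaps the match.
                boolean overlaps = tokenOffsets[i][0] < m.end()
                    && tokenOffsets[i][1] > m.start();
                if (overlaps) {
                    if (first < 0) {
                        first = i;
                    }
                    last = i;
                }
            }
            if (first >= 0) {
                result.add(new int[] {first, last + 1});
            }
        }
        return result;
    }
}
```

For example, if "Mail me at joe@example.com today" is tokenized as
Mail | me | at | joe | @ | example | . | com | today, the match would cover
tokens 3 to 7, so the token-level result stays aligned with what the
SimpleTokenizer gives to the NER model.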

Thanks!
Damiano