Posted to users@opennlp.apache.org by "Paterson, Norman [CRI]" <np...@bsd.uchicago.edu> on 2015/12/02 18:04:58 UTC
Jira OpenNLP-216 and OpenNLP-217
I suggest adding the following to the documentation (source: https://www.mail-archive.com/search?l=issues@opennlp.apache.org&q=subject:%22%5C%5Bjira%5C%5D+%5C%5BComment+Edited%5C%5D+%5C(OPENNLP%5C-216%5C)+Add+Detokenizer+API+section%22&o=newest&f=1):
Create an instance of SimpleTokenizer:
    String sentence = "He said \"This is a test\".";
    SimpleTokenizer instance = SimpleTokenizer.INSTANCE;
Tokenize the sentence using the tokenize(String str) method of SimpleTokenizer:
    String[] tokens = instance.tokenize(sentence);
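To see how many tokens the example sentence yields (and therefore how long the operations array has to be), here is a self-contained sketch of a character-class tokenizer. It is a simplified stand-in and not the OpenNLP source: it assumes that a token ends whenever the character class changes, and that two different punctuation characters never share a token. The class name CharClassTokenizerSketch and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassTokenizerSketch {

    // Hypothetical character classes used to decide where tokens end.
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    static String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 when between tokens
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            CharClass cls = classify(c);
            if (start >= 0) {
                char prev = s.charAt(i - 1);
                // Assumption: a class change ends the token, and so does a
                // change from one punctuation character to a different one.
                boolean boundary = cls != classify(prev)
                        || (cls == CharClass.OTHER && c != prev);
                if (boundary) {
                    tokens.add(s.substring(start, i));
                    start = -1;
                }
            }
            if (start < 0 && cls != CharClass.WHITESPACE) {
                start = i; // a new token begins at any non-whitespace character
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start)); // flush the last token
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("He said \"This is a test\".");
        System.out.println(tokens.length);             // prints 9
        System.out.println(String.join("|", tokens));  // prints He|said|"|This|is|a|test|"|.
    }
}
```

Under these assumptions the closing quote and the period become separate tokens, so the sentence produces nine tokens and the operations array needs nine entries.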
The operations array must contain one operation per token, so its length must
equal the length of the tokens array.
Store the operation name tokens.length times in the operations array:
    // Operation is opennlp.tools.tokenize.DetokenizationDictionary.Operation
    Operation[] operations = new Operation[tokens.length];
    String oper = "MOVE_RIGHT"; // please refer to the list of operations above
    for (int i = 0; i < tokens.length; i++) {
        operations[i] = Operation.parse(oper);
    }
    System.out.println(operations.length);
Here the length of the operations array equals the length of the tokens array.
Now create an instance of DetokenizationDictionary by passing the tokens and
operations arrays to the constructor:
    DetokenizationDictionary detokenizeDict =
        new DetokenizationDictionary(tokens, operations);
Pass the DetokenizationDictionary instance to the DictionaryDetokenizer class to
detokenize the tokens:
    DictionaryDetokenizer dictDetokenize =
        new DictionaryDetokenizer(detokenizeDict);
DictionaryDetokenizer.detokenize requires two parameters: (a) the tokens array
and (b) the split marker:
    String st = dictDetokenize.detokenize(tokens, " ");
Output:
    He said " This is a test " .
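The interplay between the per-token operations and the split marker can be sketched without the library. The following is a simplified, self-contained stand-in and not OpenNLP's implementation: the Op enum, the class name DetokenizeSketch, and the Map-based dictionary are hypothetical. It assumes that a token marked MOVE_RIGHT attaches to its right neighbour, that one marked MOVE_LEFT attaches to its left neighbour, and that the split marker (when not null) is written at each such join in place of a space.

```java
import java.util.HashMap;
import java.util.Map;

public class DetokenizeSketch {

    // Hypothetical stand-in for the dictionary operations; not the OpenNLP enum.
    enum Op { MOVE_LEFT, MOVE_RIGHT, MOVE_BOTH }

    // Joins tokens with spaces, except where a token attaches to a neighbour;
    // at those joins the split marker (if non-null) is written instead of a space.
    static String detokenize(String[] tokens, Map<String, Op> dict, String splitMarker) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            sb.append(tokens[i]);
            if (i + 1 == tokens.length) {
                break; // nothing follows the last token
            }
            Op current = dict.get(tokens[i]);   // null when the token is not listed
            Op next = dict.get(tokens[i + 1]);
            boolean attached =
                    current == Op.MOVE_RIGHT || current == Op.MOVE_BOTH
                    || next == Op.MOVE_LEFT || next == Op.MOVE_BOTH;
            if (!attached) {
                sb.append(' ');
            } else if (splitMarker != null) {
                sb.append(splitMarker);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "said", "\"", "This", "is", "a", "test", "\"", "."};

        // Every token gets MOVE_RIGHT, as in the walkthrough above.
        Map<String, Op> allRight = new HashMap<>();
        for (String t : tokens) {
            allRight.put(t, Op.MOVE_RIGHT);
        }
        System.out.println(detokenize(tokens, allRight, " "));
        // prints: He said " This is a test " .

        // Only the period is listed, and it attaches to its left neighbour.
        Map<String, Op> punctuation = new HashMap<>();
        punctuation.put(".", Op.MOVE_LEFT);
        System.out.println(detokenize(tokens, punctuation, null));
        // prints: He said " This is a test ".
    }
}
```

When every token carries the same operation and the split marker is a space, the result looks just like the tokenized text. Listing only punctuation tokens, each with a direction suited to it, is what actually removes unwanted spaces.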
(Jira comment author: prakash111...@gmail.com)
-----------------------------------------------------
Norman Paterson - Software Engineer
University of Chicago Div. of Biological Sciences
Ph. 773-834-4809 Cel. 312-350-9838
-----------------------------------------------------