Posted to dev@opennlp.apache.org by "Manoj B. Narayanan" <ma...@gmail.com> on 2017/09/26 04:05:13 UTC

Features to tokenizer

Hi,

I was wondering whether it is possible to provide custom features to the
tokenizer. Sometimes, the correct tokenization depends on additional factors.

For example, the word 'semi-supervised' shouldn't be split into separate
tokens, while 'august-september' should be.
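
To make the distinction concrete, here is a minimal sketch of the kind of
word-level features I have in mind for a candidate split point. It is plain
Java and does not use any OpenNLP API; the class name, the feature names, and
the two small lexicons are all hypothetical and only serve as an illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of OpenNLP: it only illustrates what
// word-level features for a candidate split point could look like.
final class WordLevelSplitFeatures {

    // Illustrative lexicons (assumptions made for this example only).
    private static final Set<String> MONTHS = new HashSet<>(Arrays.asList(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december"));
    private static final Set<String> PREFIXES = new HashSet<>(Arrays.asList(
            "semi", "non", "pre", "post", "anti", "multi"));

    private WordLevelSplitFeatures() {
    }

    // Returns features for splitting `chunk` at the separator found at `index`.
    // `chunk` is one whitespace-delimited string such as "semi-supervised".
    static List<String> features(String chunk, int index) {
        String left = chunk.substring(0, index).toLowerCase(Locale.ROOT);
        String right = chunk.substring(index + 1).toLowerCase(Locale.ROOT);

        List<String> features = new ArrayList<>();
        // Character-level feature: the separator character itself.
        features.add("sep=" + chunk.charAt(index));
        // Word-level features: properties of the words on either side.
        if (MONTHS.contains(left)) {
            features.add("left_is_month");
        }
        if (MONTHS.contains(right)) {
            features.add("right_is_month");
        }
        if (PREFIXES.contains(left)) {
            features.add("left_is_prefix");
        }
        return features;
    }
}
```

With these features, `features("semi-supervised", 4)` would mark the left part
as a known prefix, while `features("august-september", 6)` would mark both
sides as month names, so a trained model could learn to treat the two hyphens
differently.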

Is there any way to add custom features to the Learnable Tokenizer, similar
to what is possible for NER?

Thanks.

Manoj.

Re: Features to tokenizer

Posted by "Manoj B. Narayanan" <ma...@gmail.com>.
Hi,

I have previously come across a way to add features at the character level.
I would like to know if this can be done at the word level.

Thanks.

Manoj.
