Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/27 11:37:03 UTC

HtmlParser performance

Hello,

I am modifying HtmlParser for my own purposes. After a lot of coding
and testing, I pretty much know what to do.

Suppose we wanted to use a library such as LingPipe to do named
entity recognition at the parse stage. Many libraries, LingPipe
among them, have initialization procedures and require certain data
structures to be set up before they can be put to work. Where should
I put that setup in HtmlParser? I just need to load a map from a text
file and make it accessible to HtmlParser.

Is there a way to do this without reinitializing each time HtmlParser is called?
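
To make the question concrete, here is a minimal sketch of the kind of
one-time loading I have in mind. Everything in it is an assumption for
illustration: the class name EntityMapHolder is hypothetical, the file
format (one tab-separated key/value pair per line, '#' for comments) is
made up, and it uses only the JDK, not any Nutch or LingPipe API. The
map is loaded on the first call and the same instance is returned
afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: loads a "key<TAB>value" text file once per JVM. */
final class EntityMapHolder {
    // volatile so that a fully built map is visible to all threads
    private static volatile Map<String, String> cached;

    private EntityMapHolder() {}

    /** Returns the shared map, reading {@code file} on the first call only. */
    static Map<String, String> get(Path file) throws IOException {
        Map<String, String> result = cached;
        if (result == null) {
            synchronized (EntityMapHolder.class) {
                result = cached;
                if (result == null) {
                    // Read-only view, so callers cannot modify the shared map
                    result = Collections.unmodifiableMap(load(file));
                    cached = result;
                }
            }
        }
        return result;
    }

    // Assumed format: one "key<TAB>value" pair per line; blank lines and
    // lines starting with '#' are skipped, as are lines without a tab.
    private static Map<String, String> load(Path file) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    map.put(parts[0], parts[1]);
                }
            }
        }
        return map;
    }
}
```

With something like this, the parser would call EntityMapHolder.get(path)
wherever it needs the map, and only the first call would pay the loading
cost. Whether a static holder like this is the right approach inside
Nutch, or whether there is a better hook for it, is what I am unsure
about.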

best regards,
c.b.