Posted to user@nutch.apache.org by Alexander Aristov <al...@gmail.com> on 2009/01/20 11:56:00 UTC
how to split a page into separate documents
Hi all
Can someone suggest how to write a plugin (or parser) that can parse a
page and produce more than one document from it?
I have pages that are composed of sections, and I would like to make each
section a separate searchable document in Nutch. I have no problem with
parsing the doc: I can write a special parser, and I know the structure of the
pages.
But parsers return only one document, and that is the problem.
How should I change this behaviour?
--
Best Regards
Alexander Aristov
Re: how to split a page into separate documents
Posted by Doğacan Güney <do...@gmail.com>.
Hi,
On Tue, Jan 20, 2009 at 12:56 PM, Alexander Aristov
<al...@gmail.com> wrote:
> Hi all
>
> Can someone suggest how to write a plugin (or parser) that can parse a
> page and produce more than one document from it?
>
> I have pages that are composed of sections, and I would like to make each
> section a separate searchable document in Nutch. I have no problem with
> parsing the doc: I can write a special parser, and I know the structure of the
> pages.
>
> But parsers return only one document, and that is the problem.
>
> How should I change this behaviour?
Actually, parsers in Nutch trunk can return more than one document. Take a look at
the feed plugin for an example.
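
To illustrate the idea of one page yielding several documents, here is a minimal, self-contained sketch in plain Java. It does not use any Nutch classes: the class name SectionSplitter, the "== " section marker, and the "#section-N" key scheme are all hypothetical choices made for this example. It only shows the general shape of the approach, where each section gets its own key and its own text, similar in spirit to a parser returning several entries keyed by distinct URLs. The actual interfaces should be taken from the feed plugin source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SectionSplitter {

    // Assumption for this sketch: a line starting with "== " opens a new section.
    static final String SECTION_MARKER = "== ";

    /**
     * Splits page text into sections and returns one entry per section,
     * keyed by a synthetic URL (base URL plus a hypothetical fragment).
     * Text that appears before the first section marker is ignored.
     */
    public static Map<String, String> split(String baseUrl, String pageText) {
        Map<String, String> docs = new LinkedHashMap<>();
        StringBuilder current = null;
        int index = 0;
        for (String line : pageText.split("\n")) {
            if (line.startsWith(SECTION_MARKER)) {
                // Close the previous section, if any, before starting a new one.
                if (current != null) {
                    docs.put(baseUrl + "#section-" + index, current.toString().trim());
                }
                index++;
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        // Flush the last open section.
        if (current != null) {
            docs.put(baseUrl + "#section-" + index, current.toString().trim());
        }
        return docs;
    }

    public static void main(String[] args) {
        String page = "preamble\n== Intro ==\nfirst\n== Details ==\nsecond\n";
        Map<String, String> docs = split("http://example.com/page", page);
        for (Map.Entry<String, String> e : docs.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().replace("\n", " | "));
        }
    }
}
```

In a real parser plugin, each map entry would instead become one parse entry in whatever result object the parser returns. Whether fragment-style keys are suitable in an actual crawl is not something this sketch addresses.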
> --
> Best Regards
> Alexander Aristov
>
--
Doğacan Güney