Posted to dev@manifoldcf.apache.org by Michael Kelleher <mj...@gmail.com> on 2013/05/06 04:37:56 UTC
Web Connector
Hello,
I was not sure which list to post to, although I think this one is the most
pertinent.
For the web connector, is there a document that defines what "data" is
sent to the output connector? If I wrote a custom connector, would I have
access, in addition to the HTML payload, to the headers and cookies
generated during the "current" page request?
Thanks.
--mike
Re: Web Connector
Posted by Karl Wright <da...@gmail.com>.
Hi Mike,
The rule is that a repository connector can only include metadata that is
not likely to change as the result of the actual circumstances of the
crawl. Otherwise, incremental crawling is a fiction. Unfortunately,
cookies are exactly the kind of data that would change every time the
document is fetched.
Karl
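
A minimal sketch of why per-fetch data such as cookies undermines
incremental crawling, assuming the crawler decides whether to re-index a
document by comparing a fingerprint of what it fetched. The function
name, the VOLATILE_HEADERS set, and the sample headers are hypothetical
and are not part of the ManifoldCF API:

```python
import hashlib

# Hypothetical set of headers treated as volatile (they can change on every fetch).
VOLATILE_HEADERS = {"set-cookie", "date", "age", "expires"}


def version_string(body: bytes, headers: dict[str, str],
                   include_volatile: bool = False) -> str:
    """Return a SHA-256 digest of the body plus the selected headers."""
    digest = hashlib.sha256(body)
    # Sort header names so the digest does not depend on header order.
    for name in sorted(headers, key=str.lower):
        if not include_volatile and name.lower() in VOLATILE_HEADERS:
            continue  # skip data that varies with the circumstances of the crawl
        digest.update(f"{name.lower()}:{headers[name]}\n".encode("utf-8"))
    return digest.hexdigest()


# Two fetches of the same unchanged page; only the session cookie differs.
first_fetch = {"Content-Type": "text/html", "Set-Cookie": "session=abc123"}
second_fetch = {"Content-Type": "text/html", "Set-Cookie": "session=xyz789"}
page = b"<html><body>unchanged</body></html>"

# Without volatile headers the fingerprints match, so the page can be skipped.
stable_same = (version_string(page, first_fetch)
               == version_string(page, second_fetch))

# With cookies included the fingerprints differ, so the page looks "changed"
# on every crawl even though its content is identical.
volatile_same = (version_string(page, first_fetch, include_volatile=True)
                 == version_string(page, second_fetch, include_volatile=True))

print(stable_same, volatile_same)
```

With the cookie folded into the fingerprint, every crawl would treat the
page as modified and send it to the output connector again, which is the
sense in which incremental crawling becomes a fiction.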