Posted to dev@manifoldcf.apache.org by Michael Kelleher <mj...@gmail.com> on 2013/05/06 04:37:56 UTC

Web Connector

Hello,

I was not sure which list to post to, although I think this one is the
most pertinent.

For the web connector, is there a document that defines what "data" is
sent to the output connector?  If I wrote a custom connector, would I
have access, in addition to the HTML payload, to the headers and cookies
generated during the "current" page request?

Thanks.

--mike

Re: Web Connector

Posted by Karl Wright <da...@gmail.com>.
Hi Mike,

The rule is that a repository connector can only include metadata that is
not likely to change as the result of the actual circumstances of the
crawl.  Otherwise, incremental crawling is a fiction.  Unfortunately,
cookies are exactly the kind of data that would change every time the
document is fetched.
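
To illustrate the point about incremental crawling, here is a minimal
sketch, not ManifoldCF's actual API. It assumes a crawler decides whether
a document has changed by comparing a version string built from its
metadata. The class VersionSketch, its methods, and the choice of "stable"
headers (Content-Type, ETag, Last-Modified) are all hypothetical names and
assumptions made for this example only.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from ManifoldCF.
final class VersionSketch {
    // Assumption: these headers stay the same while the content is unchanged.
    static final List<String> STABLE_HEADERS =
        List.of("content-type", "etag", "last-modified");

    // Version string built only from headers assumed to be stable.
    static String stableVersion(Map<String, String> headers) {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (STABLE_HEADERS.contains(name)) {
                sorted.put(name, e.getValue());
            }
        }
        return sorted.toString();
    }

    // Version string built from every header, including per-request
    // values such as Set-Cookie.
    static String naiveVersion(Map<String, String> headers) {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sorted.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return sorted.toString();
    }

    // Simulates two fetches of the same unchanged page: only the session
    // cookie differs between requests.
    static Map<String, String> simulatedFetch(String sessionId) {
        return Map.of(
            "Content-Type", "text/html",
            "ETag", "\"v1\"",
            "Set-Cookie", "JSESSIONID=" + sessionId);
    }

    public static void main(String[] args) {
        Map<String, String> first = simulatedFetch("abc");
        Map<String, String> second = simulatedFetch("xyz");
        // Same content, same version: the document can be skipped.
        System.out.println(stableVersion(first).equals(stableVersion(second)));  // true
        // The cookie changes the version on every fetch, so the document
        // would look modified each time and be re-indexed on every crawl.
        System.out.println(naiveVersion(first).equals(naiveVersion(second)));    // false
    }
}
```

Under these assumptions, any per-request value that leaks into the version
string makes every crawl look like a full change, which is why such values
are kept out of the metadata.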

Karl



On Sun, May 5, 2013 at 10:37 PM, Michael Kelleher <mj...@gmail.com> wrote:

> Hello,
>
> I was not sure which to post to, although I think this group is the most
> pertinent.
>
> For the web connector, is there a document that defines what "data" is
> sent to the output connector?  I am wondering if in addition to the HTML
> payload, if I wrote a custom connector would I have access to the:  Headers
> generated during the "current" page request, Cookies generated during the
> "current"page request?
>
> Thanks.
>
> --mike
>