Posted to user@nutch.apache.org by Michael Gang <mi...@gmail.com> on 2013/01/08 16:08:13 UTC

nutch 2.1 and session cookies

Hi,

I am looking for a way to scrape pages that require logging in first.
I searched Google and found this JIRA issue:
https://issues.apache.org/jira/browse/NUTCH-827
"
I've created a patch against the trunk which adds support for very
rudimentary POST-based authentication support. It takes a link from
nutch-site.xml with a site to POST to and its respective parameters
(username, password, etc.). It then checks upon every request whether any
cookies have been initialized, and if none have, it fetches them from the
given link.
".
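
The behaviour described in that quote (check for cookies before each request, and POST to a login URL if there are none) can be sketched roughly as follows. This is only an illustration using the standard JDK java.net.http client. It is not the NUTCH-827 patch and does not use any Nutch API. The class name SessionFetcher, the login URL, and the form parameter names are all hypothetical.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Nutch: logs in lazily via a form POST
// and reuses the resulting session cookies for later requests.
class SessionFetcher {
    private final CookieManager cookies = new CookieManager();
    private final HttpClient client =
            HttpClient.newBuilder().cookieHandler(cookies).build();
    private final URI loginUri;                    // assumed to come from configuration
    private final Map<String, String> loginParams; // e.g. username, password

    SessionFetcher(URI loginUri, Map<String, String> loginParams) {
        this.loginUri = loginUri;
        this.loginParams = new LinkedHashMap<>(loginParams);
    }

    // Encode the parameters as an application/x-www-form-urlencoded body.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // True once any cookie has been stored by a previous response.
    boolean hasSession() {
        return !cookies.getCookieStore().getCookies().isEmpty();
    }

    // POST the credentials; the CookieManager keeps any Set-Cookie values.
    void login() throws IOException, InterruptedException {
        HttpRequest post = HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(loginParams)))
                .build();
        client.send(post, HttpResponse.BodyHandlers.discarding());
    }

    // Check for cookies on every request and log in first if there are none.
    String fetch(URI page) throws IOException, InterruptedException {
        if (!hasSession()) {
            login();
        }
        HttpRequest get = HttpRequest.newBuilder(page).GET().build();
        return client.send(get, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A caller would construct it with something like new SessionFetcher(URI.create("https://example.com/login"), params) and then call fetch(...) for each protected page. Note that this sketch only checks whether any cookie exists, so it would not notice an expired session, which matches the "very rudimentary" description above.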

I wanted to ask whether this functionality will be included in Nutch 2.x?

Thanks,
David

Re: nutch 2.1 and session cookies

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Michael,

So far there has been no discussion on this topic with specific focus on
adding the functionality.
I also notice that NUTCH-827 is not marked for inclusion in 2.2.
I would urge you to open another issue describing your approach and
suggested solution specifically for 2.x... if this is possible.

Thanks

Lewis

On Tue, Jan 8, 2013 at 7:08 AM, Michael Gang <mi...@gmail.com> wrote:

> Hi,
>
> I am looking for a way to scrape pages that require logging in first.
> I searched Google and found this JIRA issue:
> https://issues.apache.org/jira/browse/NUTCH-827
> "
> I've created a patch against the trunk which adds support for very
> rudimentary POST-based authentication support. It takes a link from
> nutch-site.xml with a site to POST to and its respective parameters
> (username, password, etc.). It then checks upon every request whether any
> cookies have been initialized, and if none have, it fetches them from the
> given link.
> ".
>
> I wanted to ask whether this functionality will be included in Nutch 2.x?
>
> Thanks,
> David
>



-- 
*Lewis*