Posted to user@nutch.apache.org by kazam <az...@gmail.com> on 2009/04/27 18:25:56 UTC

Nutch fetch creates too many http sessions

Hi there,
I am generating Nutch indexes for our site, which runs on a WebSphere
server. The indexing takes about 20 hours to complete. However, after about
15-16 hours the WebSphere server crashes because too many sessions are being
created.

It seems that each fetch creates a new session. Is there a way for all
Nutch fetches to be done via a single session?

Has anyone else encountered such a problem? All ideas are welcome.

Thanks.
-- 
View this message in context: http://www.nabble.com/Nutch-fetch-creates-too-many-http-sessions-tp23259993p23259993.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Nutch fetch creates too many http sessions

Posted by kazam <az...@gmail.com>.
Thanks Dennis, you are right. I have increased the RAM for the web server,
raised the number of allowed sessions, and shortened the session timeout.
Hopefully, this will allow the indexing to complete.
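
As a rough back-of-the-envelope check on why a shorter timeout helps: if every fetch opens a session that is never reused, then once the crawl has run longer than the timeout, the number of sessions alive at any moment is roughly the fetch rate multiplied by the timeout. The sketch below only does that arithmetic. The fetch rate and the timeout values are hypothetical, not measurements from this crawl, and `liveSessions` is an illustrative helper, not part of Nutch or WebSphere.

```java
public class SessionLoadEstimate {

    /**
     * Estimates how many sessions are alive at steady state, assuming each
     * fetch creates one session that expires timeoutSeconds later and is
     * never reused.
     */
    static long liveSessions(double fetchesPerSecond, long timeoutSeconds) {
        return Math.round(fetchesPerSecond * timeoutSeconds);
    }

    public static void main(String[] args) {
        double fetchesPerSecond = 2.0; // hypothetical crawl rate

        // Hypothetical timeouts: 30 minutes versus 5 minutes.
        System.out.println("30 min timeout: " + liveSessions(fetchesPerSecond, 30 * 60));
        System.out.println(" 5 min timeout: " + liveSessions(fetchesPerSecond, 5 * 60));
    }
}
```

Under these assumed numbers the estimate drops from 3600 to 600 live sessions, so the timeout scales the session count directly, while adding RAM only raises the ceiling.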


Dennis Kubes-2 wrote:
> 
> This seems to be more of a session handling issue on the WebSphere 
> server than a Nutch fetching issue.  Nutch doesn't actually create the 
> session; it just doesn't store cookies or session information, so 
> WebSphere is creating a new session per fetch.
> 
> While having a single stored session for fetching the same domain in 
> Nutch seems like it might be interesting functionality, I don't believe 
> that currently exists.  My suggestion is to look into tuning the WebSphere 
> session timeouts.  My guess would be they are set to a very high value.
> 
> Dennis
> 



Re: Nutch fetch creates too many http sessions

Posted by Dennis Kubes <ku...@apache.org>.
This seems to be more of a session handling issue on the WebSphere 
server than a Nutch fetching issue.  Nutch doesn't actually create the 
session; it just doesn't store cookies or session information, so 
WebSphere is creating a new session per fetch.
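
To make the mechanism concrete, here is a toy simulation, not Nutch or WebSphere code. `ToyServer` is a hypothetical stand-in for any server that tracks sessions through a cookie: a request that arrives without a known session id gets a new session. The two client methods, also hypothetical, differ only in whether they send the cookie back.

```java
import java.util.HashSet;
import java.util.Set;

public class SessionPerFetchSketch {

    /** Hypothetical server that identifies sessions by a cookie value. */
    static class ToyServer {
        private final Set<String> sessions = new HashSet<>();
        private int nextId = 0;

        /**
         * Handles one request. A known session id is reused; anything else
         * (including no cookie at all) makes the server create a new session.
         * Returns the session id the server would send back as a cookie.
         */
        String handle(String sessionCookie) {
            if (sessionCookie != null && sessions.contains(sessionCookie)) {
                return sessionCookie;
            }
            String id = "session-" + nextId++;
            sessions.add(id);
            return id;
        }

        int sessionCount() {
            return sessions.size();
        }
    }

    /** Client that never stores the cookie, so every request looks new. */
    static int fetchWithoutCookies(int pages) {
        ToyServer server = new ToyServer();
        for (int i = 0; i < pages; i++) {
            server.handle(null); // the returned session id is thrown away
        }
        return server.sessionCount();
    }

    /** Client that sends the cookie from the previous response back. */
    static int fetchWithCookies(int pages) {
        ToyServer server = new ToyServer();
        String cookie = null;
        for (int i = 0; i < pages; i++) {
            cookie = server.handle(cookie);
        }
        return server.sessionCount();
    }

    public static void main(String[] args) {
        System.out.println("without cookies: " + fetchWithoutCookies(1000));
        System.out.println("with cookies:    " + fetchWithCookies(1000));
    }
}
```

In this simulation 1000 cookie-less fetches leave 1000 sessions on the server, while the cookie-keeping client leaves one. The sessions from the first client only go away when the server times them out.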

While having a single stored session for fetching the same domain in 
Nutch seems like it might be interesting functionality, I don't believe 
that currently exists.  My suggestion is to look into tuning the WebSphere 
session timeouts.  My guess would be they are set to a very high value.

Dennis
