Posted to user@nutch.apache.org by "Lukas, Ray" <Ra...@idearc.com> on 2009/04/22 22:30:01 UTC

Hadoop thread seems to remain alive

I am hoping to write up an article on my project and all the cool things
that I figured out about Nutch and Java and Eclipse, etc.. I will go
into a long and boring dissertation at that point.. For now I will keep
it short and sweet... As best I can..
I have Eclipse, Java 6, Nutch, and Hadoop running and they all work great,
except for one thing.. After all my code completes, Hadoop seems to still
have open threads. Sometimes I cannot delete the newly created indexes,
or the Hadoop log file, because I believe that Hadoop still holds them.
Basically I am using the Crawl.java from the source directory..

Question:
What is the proper accepted and safe way to shut down nutch (hadoop)
after I am done with it?

Hadoop.getFileSystem().closeAll() ??
Is that what I should be doing??

Thanks guys.. Thanks
Ray

Oh, Hadoop is on a single machine, right out of the box; I did nothing
special with it.. Nothing..
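
For reference, the Hadoop API being reached for here is the static
FileSystem.closeAll(). A minimal sketch of an explicit shutdown attempt,
assuming a stock single-machine Configuration (whether this alone releases
the locks is exactly what the rest of the thread explores):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class NutchShutdown {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // stock, single-machine setup
            FileSystem fs = FileSystem.get(conf);      // same cached instance Nutch uses

            // ... crawl / index work happens here ...

            // Close every FileSystem instance in Hadoop's cache, releasing
            // the file handles it holds. Note: this does not stop threads.
            FileSystem.closeAll();
        }
    }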

Re: Hadoop thread seems to remain alive

Posted by Dennis Kubes <ku...@apache.org>.
I don't know the exact cause but I would say it depends on what OS and
what job you are running, if any, when you shut down.  AFAIK the hadoop
scripts send a SIGQUIT (similar to the SIGINT sent by Ctrl-C) to the
server processes.  They may or may not have shutdown hooks built in.  I
know the HBase server processes did a while ago, don't know if the
hadoop ones do.

The servers do create pid files and put them in the pids directory as 
specified in the hadoop-env.sh file. An indexer job may also have a file 
that is used to lock the index it is creating before it is finished.

Don't know if any of this helps your problem.  Just a guess, but it may
be file-locking issues with Windows as well.

Dennis
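
If anyone wants to experiment with that, a shutdown hook is plain JDK,
not Hadoop-specific. A minimal sketch of registering one (the cleanup
body is illustrative):

    // Runs on normal exit and on SIGINT/SIGTERM, but not on a hard kill -9.
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            // illustrative cleanup: release lock files, close file systems,
            // flush logs before the JVM goes away
            System.out.println("shutdown hook: cleaning up...");
        }
    });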

Raymond Balmès wrote:
> Same problems... even rebooting the PC does not always solve the issue,
> files remain locked.
> 
> I have gone the brutal way and use unlocker.exe but I mean to find out
> what's going wrong so I will keep posted on this one.
> 
> -Ray-
> 
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
> 
>> Question:
>> What is the proper accepted and safe way to shut down nutch (hadoop)
>> after I am done with it?
>>
>> Hadoop.getFileSystem().closeAll() ??
>> I did try this and no luck. Anyone else having this problem?
>>
>> Thanks guys.. Thanks, if/when I find it I will post it for everyone.
>> Ray
>>
> 

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
 
Andrzej:
I agree on all points.. I have (I just recently posted this) implemented
the Crawl in my own class.. For the exact reasons that you mentioned..
You are 100 percent correct.. 

I will, as you suggested, read up on the PluginRepository classloader
issue. Thanks

I was trying to kill that thread but could not find a good way of doing
that.. I am on XP in Eclipse.. How would I do that.. I would love to
know.. If you would tell me, that would be great..

ray



-----Original Message-----
From: Andrzej Bialecki [mailto:ab@getopt.org] 
Sent: Thursday, April 23, 2009 10:35 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Lukas, Ray wrote:
> Hey Ray.. Great name you have there.. HA.. 
> 
> I don't actually care about deleting these files.. That is not the
> issue.. See I have embedded Nutch in my application. That application
> calls Nutch over and over again to do crawling and index creation..
> This thread that stays alive.. It eventually exceeds some limit
> (native thread) in Java and crashes my application.. So that is why I
> need to find and properly close down that service or whatever. I
> noticed that Hadoop files are still locked and so I am taking that as
> a hint that it is Hadoop..
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open..
> That thread is killing me.. What is it that stays alive past the
> completion of the Crawl.java code...
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse,
> something stays alive.. How to close that is the issue..
> 
> See what I am asking.. 
> 

First, don't use the Crawl class to implement continuous crawling in a 
long-running application. This class was never meant to do this - e.g. 
it instantiates various Nutch tools over and over again. Just replicate 
the logic there in your own class so that you instantiate things once.

Second, it's likely that you're experiencing the PluginRepository 
classloader issue, described here: 
https://issues.apache.org/jira/browse/NUTCH-356 . The patch in this 
issue is still not applied, because it's a hack, and there were few 
active users who experienced this problem - because it occurs only in 
long-running applications that run Nutch tools in the context of a 
single JVM, and most users run Nutch tools from command-line.

And finally, if the application is stuck and doesn't exit due to a
still running thread, generate a thread dump and see what that thread is.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Hadoop thread seems to remain alive

Posted by Andrzej Bialecki <ab...@getopt.org>.
Lukas, Ray wrote:
> Hey Ray.. Great name you have there.. HA.. 
> 
> I don't actually care about deleting these files.. That is not the issue.. See I have embedded Nutch in my application. That application calls Nutch over and over again to do crawling and index creation.. This thread that stays alive.. It eventually exceeds some limit (native thread) in Java and crashes my application.. So that is why I need to find and properly close down that service or whatever. I noticed that Hadoop files are still locked and so I am taking that as a hint that it is Hadoop..
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open.. That thread is killing me.. What is it that stays alive past the completion of the Crawl.java code...
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something stays alive.. How to close that is the issue..
> 
> See what I am asking.. 
> 

First, don't use the Crawl class to implement continuous crawling in a 
long-running application. This class was never meant to do this - e.g. 
it instantiates various Nutch tools over and over again. Just replicate 
the logic there in your own class so that you instantiate things once.

Second, it's likely that you're experiencing the PluginRepository 
classloader issue, described here: 
https://issues.apache.org/jira/browse/NUTCH-356 . The patch in this 
issue is still not applied, because it's a hack, and there were few 
active users who experienced this problem - because it occurs only in 
long-running applications that run Nutch tools in the context of a 
single JVM, and most users run Nutch tools from command-line.

And finally, if the application is stuck and doesn't exit due to a still 
running thread, generate a thread dump and see what that thread is.
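
One way to generate that dump from inside the application, using only
plain JDK calls (jstack <pid>, or Ctrl-Break in a Windows console, gives
the same information externally) -- a minimal sketch:

    import java.util.Map;

    // Print every live thread with its state and stack so the lingering
    // non-daemon thread can be identified by name.
    public class ThreadDumper {
        public static void dumpAll() {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                System.out.println(t.getName() + " daemon=" + t.isDaemon()
                        + " state=" + t.getState());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }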


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
Cool.. well at least you know that I was willing to give you a hand and some code.. You are not alone in this cruel world..

-----Original Message-----
From: Raymond Balmès [mailto:raymond.balmes@gmail.com] 
Sent: Saturday, April 25, 2009 5:28 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Hey ray,

Actually found my problem: I wasn't stopping Tomcat at the right moment and
in the right way...  so it kept some threads/locks.
If I do it using the proper Windows service... it works fine.

-Ray-

2009/4/24 Lukas, Ray <Ra...@idearc.com>

> What does that thread do.. Well you guessed it (and this is a hint why I
> first thought it was a problem in Hadoop): it opens up and gathers
> Hadoop segments..
> Take a peek at FetchedSegments, there is a Thread in there (class) called
> SegmentUpdater.. It never dies.. You see, "while (true)".. That is your/our
> problem.. I added a member variable that is set inside the
> FetchedSegments.close method, which is called by the NutchBean.close
> method. Once set, the loop exits and then the thread expires, maybe after
> it comes out of sleeping, but it does expire....
> You see what I am saying..
>
> -----Original Message-----
> From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
>  Sent: Friday, April 24, 2009 2:51 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop thread seems to remain alive
>
> Actually I have my files locked when exiting from Tomcat, no matter how I
> exit, gracefully or not, probably due to some lost threads.
> Since the servlet uses the same NutchBean, it looks like a similar issue
> to yours.
>
> Maybe there is no nutchBean.close() being called; I will look for it when
> I have more time for this.
>
> -The other Ray-
>
>
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
>
> > I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop..
> > This thread is coming out of NutchBean. Sorry.. I should have looked
> > more carefully..  I am still learning this stuff..
> >
> > Here is my code as it is.. Let me look into this some more..
> >
> >                NativeCrawler nativeCrawler =
> >                        new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
> >                int maxHits = 1000;
> >        -->     NutchBean nutchBean =
> >                        new NutchBean(nativeCrawler.getConfig(),
> >                                      nativeCrawler.getIndexPath());
> >                Query nutchQuery =
> >                        Query.parse("credit", nativeCrawler.getConfig());
> >                Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
> >
> >                nutchQuery = null;
> >                nutchBean.close();
> >                nutchBean = null;
> >
> > NativeCrawler is my version of the Crawl.java code.. Which works great..
> > I am not closing down the query part of my system correctly and will now
> > go and read up on that.. Please forgive my taking your time.. I should
> > have been a little more precise in my work.. Sigh.. It happens when you
> > are rushing on too many projects.. Sorry guys and thanks so much for the
> > help that you guys gave me.. I will post the solution to this for us.
> >
> > ray
> >
> > -----Original Message-----
> > From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
> > Sent: Thursday, April 23, 2009 9:21 AM
> > To: nutch-user@lucene.apache.org
> >  Subject: RE: Hadoop thread seems to remain alive
> >
> > Hey Ray.. Great name you have there.. HA..
> >
> > I don't actually care about deleting these files.. That is not the
> > issue.. See I have embedded Nutch in my application. That application
> > calls Nutch over and over again to do crawling and index creation..
> > This thread that stays alive.. It eventually exceeds some limit
> > (native thread) in Java and crashes my application.. So that is why I
> > need to find and properly close down that service or whatever. I
> > noticed that Hadoop files are still locked and so I am taking that as
> > a hint that it is Hadoop..
> >
> > Bottom line is
> >
> > When you run Crawl in the java directory, some thread stays open..
> > That thread is killing me.. What is it that stays alive past the
> > completion of the Crawl.java code...
> > If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse,
> > something stays alive.. How to close that is the issue..
> >
> > See what I am asking..
> >
> > Ray, the other ray..
> >
> > -----Original Message-----
> > From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
> > Sent: Thursday, April 23, 2009 8:23 AM
> > To: nutch-user@lucene.apache.org
> > Subject: Re: Hadoop thread seems to remain alive
> >
> > Same problems... even rebooting the PC does not always solve the issue,
> > files remain locked.
> >
> > I have gone the brutal way and used unlocker.exe, but I mean to find out
> > what's going wrong, so I will keep you posted on this one.
> >
> > -Ray-
> >
> > 2009/4/23 Lukas, Ray <Ra...@idearc.com>
> >
> > > Question:
> > > What is the proper accepted and safe way to shut down nutch (hadoop)
> > > after I am done with it?
> > >
> > > Hadoop.getFileSystem().closeAll() ??
> > > I did try this and no luck. Anyone else having this problem?
> > >
> > > Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> > > Ray
> > >
> >
>

Re: Hadoop thread seems to remain alive

Posted by Raymond Balmès <ra...@gmail.com>.
Hey ray,

Actually found my problem: I wasn't stopping Tomcat at the right moment and
in the right way...  so it kept some threads/locks.
If I do it using the proper Windows service... it works fine.

-Ray-

2009/4/24 Lukas, Ray <Ra...@idearc.com>

> What does that thread do.. Well you guessed it (and this is a hint why I
> first thought it was a problem in Hadoop): it opens up and gathers
> Hadoop segments..
> Take a peek at FetchedSegments, there is a Thread in there (class) called
> SegmentUpdater.. It never dies.. You see, "while (true)".. That is your/our
> problem.. I added a member variable that is set inside the
> FetchedSegments.close method, which is called by the NutchBean.close
> method. Once set, the loop exits and then the thread expires, maybe after
> it comes out of sleeping, but it does expire....
> You see what I am saying..
>
> -----Original Message-----
> From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
>  Sent: Friday, April 24, 2009 2:51 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop thread seems to remain alive
>
> Actually I have my files locked when exiting from Tomcat, no matter how I
> exit, gracefully or not, probably due to some lost threads.
> Since the servlet uses the same NutchBean, it looks like a similar issue
> to yours.
>
> Maybe there is no nutchBean.close() being called; I will look for it when
> I have more time for this.
>
> -The other Ray-
>
>
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
>
> > I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop..
> > This thread is coming out of NutchBean. Sorry.. I should have looked
> > more carefully..  I am still learning this stuff..
> >
> > Here is my code as it is.. Let me look into this some more..
> >
> >                NativeCrawler nativeCrawler =
> >                        new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
> >                int maxHits = 1000;
> >        -->     NutchBean nutchBean =
> >                        new NutchBean(nativeCrawler.getConfig(),
> >                                      nativeCrawler.getIndexPath());
> >                Query nutchQuery =
> >                        Query.parse("credit", nativeCrawler.getConfig());
> >                Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
> >
> >                nutchQuery = null;
> >                nutchBean.close();
> >                nutchBean = null;
> >
> > NativeCrawler is my version of the Crawl.java code.. Which works great..
> > I am not closing down the query part of my system correctly and will now
> > go and read up on that.. Please forgive my taking your time.. I should
> > have been a little more precise in my work.. Sigh.. It happens when you
> > are rushing on too many projects.. Sorry guys and thanks so much for the
> > help that you guys gave me.. I will post the solution to this for us.
> >
> > ray
> >
> > -----Original Message-----
> > From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
> > Sent: Thursday, April 23, 2009 9:21 AM
> > To: nutch-user@lucene.apache.org
> >  Subject: RE: Hadoop thread seems to remain alive
> >
> > Hey Ray.. Great name you have there.. HA..
> >
> > I don't actually care about deleting these files.. That is not the
> > issue.. See I have embedded Nutch in my application. That application
> > calls Nutch over and over again to do crawling and index creation..
> > This thread that stays alive.. It eventually exceeds some limit
> > (native thread) in Java and crashes my application.. So that is why I
> > need to find and properly close down that service or whatever. I
> > noticed that Hadoop files are still locked and so I am taking that as
> > a hint that it is Hadoop..
> >
> > Bottom line is
> >
> > When you run Crawl in the java directory, some thread stays open..
> > That thread is killing me.. What is it that stays alive past the
> > completion of the Crawl.java code...
> > If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse,
> > something stays alive.. How to close that is the issue..
> >
> > See what I am asking..
> >
> > Ray, the other ray..
> >
> > -----Original Message-----
> > From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
> > Sent: Thursday, April 23, 2009 8:23 AM
> > To: nutch-user@lucene.apache.org
> > Subject: Re: Hadoop thread seems to remain alive
> >
> > Same problems... even rebooting the PC does not always solve the issue,
> > files remain locked.
> >
> > I have gone the brutal way and used unlocker.exe, but I mean to find out
> > what's going wrong, so I will keep you posted on this one.
> >
> > -Ray-
> >
> > 2009/4/23 Lukas, Ray <Ra...@idearc.com>
> >
> > > Question:
> > > What is the proper accepted and safe way to shut down nutch (hadoop)
> > > after I am done with it?
> > >
> > > Hadoop.getFileSystem().closeAll() ??
> > > I did try this and no luck. Anyone else having this problem?
> > >
> > > Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> > > Ray
> > >
> >
>

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
What does that thread do.. Well you guessed it (and this is a hint why I first thought it was a problem in Hadoop): it opens up and gathers Hadoop segments..
Take a peek at FetchedSegments, there is a Thread in there (class) called SegmentUpdater.. It never dies.. You see, "while (true)".. That is your/our problem.. I added a member variable that is set inside the FetchedSegments.close method, which is called by the NutchBean.close method. Once set, the loop exits and then the thread expires, maybe after it comes out of sleeping, but it does expire....
You see what I am saying..
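
In outline, the patch replaces the unconditional loop with a flag that
close() flips. A minimal sketch of the idea (names and sleep interval are
illustrative; the real Nutch class differs in detail):

    public class SegmentUpdater extends Thread {
        // volatile so the flag set by close() is seen by the updater thread
        private volatile boolean running = true;

        public void run() {
            while (running) {                  // was: while (true)
                // ... refresh the open segment readers ...
                try {
                    Thread.sleep(60 * 1000L);  // illustrative interval
                } catch (InterruptedException e) {
                    // woken by close(); loop around and re-check the flag
                }
            }
        }

        // invoked from FetchedSegments.close(), which NutchBean.close() calls
        public void close() {
            running = false;
            interrupt();                       // wake it if it is sleeping
        }
    }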

-----Original Message-----
From: Raymond Balmès [mailto:raymond.balmes@gmail.com] 
Sent: Friday, April 24, 2009 2:51 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Actually I have my files locked when exiting from Tomcat, no matter how I
exit, gracefully or not, probably due to some lost threads.
Since the servlet uses the same NutchBean, it looks like a similar issue
to yours.

Maybe there is no nutchBean.close() being called; I will look for it when
I have more time for this.

-The other Ray-


2009/4/23 Lukas, Ray <Ra...@idearc.com>

> I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop..
> This thread is coming out of NutchBean. Sorry.. I should have looked more
> carefully..  I am still learning this stuff..
>
> Here is my code as it is.. Let me look into this some more..
>
>                NativeCrawler nativeCrawler =
>                        new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
>                int maxHits = 1000;
>        -->     NutchBean nutchBean =
>                        new NutchBean(nativeCrawler.getConfig(),
>                                      nativeCrawler.getIndexPath());
>                Query nutchQuery =
>                        Query.parse("credit", nativeCrawler.getConfig());
>                Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
>
>                nutchQuery = null;
>                nutchBean.close();
>                nutchBean = null;
>
> NativeCrawler is my version of the Crawl.java code.. Which works great..
> I am not closing down the query part of my system correctly and will now
> go and read up on that.. Please forgive my taking your time.. I should
> have been a little more precise in my work.. Sigh.. It happens when you
> are rushing on too many projects.. Sorry guys and thanks so much for the
> help that you guys gave me.. I will post the solution to this for us.
>
> ray
>
> -----Original Message-----
> From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
> Sent: Thursday, April 23, 2009 9:21 AM
> To: nutch-user@lucene.apache.org
>  Subject: RE: Hadoop thread seems to remain alive
>
> Hey Ray.. Great name you have there.. HA..
>
> I don't actually care about deleting these files.. That is not the issue..
> See I have embedded Nutch in my application. That application calls Nutch
> over and over again to do crawling and index creation.. This thread that
> stays alive.. It eventually exceeds some limit (native thread) in Java and
> crashes my application.. So that is why I need to find and properly close
> down that service or whatever. I noticed that Hadoop files are still locked
> and so I am taking that as a hint that it is Hadoop..
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open.. That
> thread is killing me.. What is it that stays alive past the completion of
> the Crawl.java code...
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something
> stays alive.. How to close that is the issue..
>
> See what I am asking..
>
> Ray, the other ray..
>
> -----Original Message-----
> From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
> Sent: Thursday, April 23, 2009 8:23 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop thread seems to remain alive
>
> Same problems... even rebooting the PC does not always solve the issue,
> files remain locked.
>
> I have gone the brutal way and used unlocker.exe, but I mean to find out
> what's going wrong, so I will keep you posted on this one.
>
> -Ray-
>
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
>
> > Question:
> > What is the proper accepted and safe way to shut down nutch (hadoop)
> > after I am done with it?
> >
> > Hadoop.getFileSystem().closeAll() ??
> > I did try this and no luck. Anyone else having this problem?
> >
> > Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> > Ray
> >
>

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
NutchBean does have a close().. It just does not kill all the threads.. Each creation of a NutchBean kicks off a new thread and that thread never dies.. So... In my case.. I do so many queries that I throw an exception because I exhaust my allotment of threads.. Yeah, I make that many queries.. I have patched this code and it works great now.. I will email you a copy.. If that is okay.. And give you a hand getting it working.. If you want.. Just let me know..
ray 
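
The exhaustion is easy to reproduce and measure with a plain thread
count -- a rough sketch, reusing the NativeCrawler helper from earlier in
the thread (assumed names):

    // With the unpatched SegmentUpdater, the live-thread count climbs by
    // one per NutchBean and never falls back after close().
    int before = Thread.activeCount();
    for (int i = 0; i < 100; i++) {
        NutchBean bean = new NutchBean(nativeCrawler.getConfig(),
                                       nativeCrawler.getIndexPath());
        bean.close();
    }
    System.out.println("threads: before=" + before
            + " after=" + Thread.activeCount());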

-----Original Message-----
From: Raymond Balmès [mailto:raymond.balmes@gmail.com] 
Sent: Friday, April 24, 2009 2:51 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Actually I have my files locked when exiting from Tomcat, no matter how I
exit, gracefully or not, probably due to some lost threads.
Since the servlet uses the same NutchBean, it looks like a similar issue
to yours.

Maybe there is no nutchBean.close() being called; I will look for it when
I have more time for this.

-The other Ray-


2009/4/23 Lukas, Ray <Ra...@idearc.com>

> I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop..
> This thread is coming out of NutchBean. Sorry.. I should have looked more
> carefully..  I am still learning this stuff..
>
> Here is my code as it is.. Let me look into this some more..
>
>                NativeCrawler nativeCrawler =
>                        new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
>                int maxHits = 1000;
>        -->     NutchBean nutchBean =
>                        new NutchBean(nativeCrawler.getConfig(),
>                                      nativeCrawler.getIndexPath());
>                Query nutchQuery =
>                        Query.parse("credit", nativeCrawler.getConfig());
>                Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
>
>                nutchQuery = null;
>                nutchBean.close();
>                nutchBean = null;
>
> NativeCrawler is my version of the Crawl.java code.. Which works great..
> I am not closing down the query part of my system correctly and will now
> go and read up on that.. Please forgive my taking your time.. I should
> have been a little more precise in my work.. Sigh.. It happens when you
> are rushing on too many projects.. Sorry guys and thanks so much for the
> help that you guys gave me.. I will post the solution to this for us.
>
> ray
>
> -----Original Message-----
> From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
> Sent: Thursday, April 23, 2009 9:21 AM
> To: nutch-user@lucene.apache.org
>  Subject: RE: Hadoop thread seems to remain alive
>
> Hey Ray.. Great name you have there.. HA..
>
> I don't actually care about deleting these files.. That is not the issue..
> See I have embedded Nutch in my application. That application calls Nutch
> over and over again to do crawling and index creation.. This thread that
> stays alive.. It eventually exceeds some limit (native thread) in Java and
> crashes my application.. So that is why I need to find and properly close
> down that service or whatever. I noticed that Hadoop files are still locked
> and so I am taking that as a hint that it is Hadoop..
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open.. That
> thread is killing me.. What is it that stays alive past the completion of
> the Crawl.java code...
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something
> stays alive.. How to close that is the issue..
>
> See what I am asking..
>
> Ray, the other ray..
>
> -----Original Message-----
> From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
> Sent: Thursday, April 23, 2009 8:23 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop thread seems to remain alive
>
> Same problems... even rebooting the PC does not always solve the issue,
> files remain locked.
>
> I have gone the brutal way and used unlocker.exe, but I mean to find out
> what's going wrong, so I will keep you posted on this one.
>
> -Ray-
>
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
>
> > Question:
> > What is the proper accepted and safe way to shut down nutch (hadoop)
> > after I am done with it?
> >
> > Hadoop.getFileSystem().closeAll() ??
> > I did try this and no luck. Anyone else having this problem?
> >
> > Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> > Ray
> >
>

Re: Hadoop thread seems to remain alive

Posted by Raymond Balmès <ra...@gmail.com>.
Actually I have my files locked when exiting from Tomcat, no matter how I
exit, gracefully or not, probably due to some lost threads.
Since the servlet uses the same NutchBean, it looks like a similar issue
to yours.

Maybe there is no nutchBean.close() being called; I will look for it when
I have more time for this.

-The other Ray-


2009/4/23 Lukas, Ray <Ra...@idearc.com>

> I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop..
> This thread is coming out of NutchBean. Sorry.. I should have looked more
> carefully..  I am still learning this stuff..
>
> Here is my code as it is.. Let me look into this some more..
>
>                NativeCrawler nativeCrawler =
>                        new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
>                int maxHits = 1000;
>        -->     NutchBean nutchBean =
>                        new NutchBean(nativeCrawler.getConfig(),
>                                      nativeCrawler.getIndexPath());
>                Query nutchQuery =
>                        Query.parse("credit", nativeCrawler.getConfig());
>                Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
>
>                nutchQuery = null;
>                nutchBean.close();
>                nutchBean = null;
>
> NativeCrawler is my version of the Crawl.java code.. Which works great..
> I am not closing down the query part of my system correctly and will now
> go and read up on that.. Please forgive my taking your time.. I should
> have been a little more precise in my work.. Sigh.. It happens when you
> are rushing on too many projects.. Sorry guys and thanks so much for the
> help that you guys gave me.. I will post the solution to this for us.
>
> ray
>
> -----Original Message-----
> From: Lukas, Ray [mailto:Ray.Lukas@idearc.com]
> Sent: Thursday, April 23, 2009 9:21 AM
> To: nutch-user@lucene.apache.org
>  Subject: RE: Hadoop thread seems to remain alive
>
> Hey Ray.. Great name you have there.. HA..
>
> I don't actually care about deleting these files.. That is not the issue..
> See I have embedded Nutch in my application. That application calls Nutch
> over and over again to do crawling and index creation.. This thread that
> stays alive.. It eventually exceeds some limit (native thread) in Java and
> crashes my application.. So that is why I need to find and properly close
> down that service or whatever. I noticed that Hadoop files are still locked
> and so I am taking that as a hint that it is Hadoop..
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open.. That
> thread is killing me.. What is it that stays alive past the completion of
> the Crawl.java code...
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something
> stays alive.. How to close that is the issue..
>
> See what I am asking..
>
> Ray, the other ray..
>
> -----Original Message-----
> From: Raymond Balmès [mailto:raymond.balmes@gmail.com]
> Sent: Thursday, April 23, 2009 8:23 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: Hadoop thread seems to remain alive
>
> Same problems... even rebooting the PC does not always solve the issue,
> files remain locked.
>
> I have gone the brutal way and used unlocker.exe, but I mean to find out
> what's going wrong, so I will keep you posted on this one.
>
> -Ray-
>
> 2009/4/23 Lukas, Ray <Ra...@idearc.com>
>
> > Question:
> > What is the proper accepted and safe way to shut down nutch (hadoop)
> > after I am done with it?
> >
> > Hadoop.getFileSystem().closeAll() ??
> > I did try this and no luck. Anyone else having this problem?
> >
> > Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> > Ray
> >
>

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
I'm sorry guys.. I made a mistake.. This is not coming out of Hadoop.. This thread is coming out of NutchBean. Sorry.. I should have looked more carefully..  I am still learning this stuff..

Here is my code as it is.. Let me look into this some more.. 

		// NativeCrawler is my own wrapper around the Crawl.java logic
		NativeCrawler nativeCrawler = new NativeCrawler("www.dcu.com", "dcu-index", 2, 5);
		int maxHits = 1000;
	-->	NutchBean nutchBean = new NutchBean(nativeCrawler.getConfig(), nativeCrawler.getIndexPath());  // the marked line is where the extra thread appears
		Query nutchQuery = Query.parse("credit", nativeCrawler.getConfig());
		Hits nutchHits = nutchBean.search(nutchQuery, maxHits);

		nutchQuery = null;
		nutchBean.close();  // close() returns, but a thread survives it
		nutchBean = null;

NativeCrawler is my version of the Crawl.java code.. Which works great.. I am not closing down the query part of my system correctly and will now go and read up on that.. Please forgive my taking your time.. I should have been a little more precise in my work.. Sigh.. It happens when you are rushing on too many projects.. Sorry guys and thanks so much for the help that you guys gave me.. I will post the solution to this for us.

ray
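
Following Andrzej's advice to instantiate things once, the search side can
reuse a single bean for every query and close it only at application
shutdown -- a rough sketch under the same assumed NativeCrawler helper:

    // One NutchBean for the life of the application: one updater thread,
    // closed exactly once.
    NutchBean nutchBean = new NutchBean(nativeCrawler.getConfig(),
                                        nativeCrawler.getIndexPath());
    try {
        Hits credit = nutchBean.search(
                Query.parse("credit", nativeCrawler.getConfig()), 1000);
        Hits savings = nutchBean.search(
                Query.parse("savings", nativeCrawler.getConfig()), 1000);
        // ... many more queries against the same bean ...
    } finally {
        nutchBean.close();
    }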

-----Original Message-----
From: Lukas, Ray [mailto:Ray.Lukas@idearc.com] 
Sent: Thursday, April 23, 2009 9:21 AM
To: nutch-user@lucene.apache.org
Subject: RE: Hadoop thread seems to remain alive

Hey Ray.. Great name you have there.. HA.. 

I don't actually care about deleting these files.. That is not the issue.. See I have embedded Nutch in my application. That application calls Nutch over and over again to do crawling and index creation.. This thread that stays alive.. It eventually exceeds some limit (native thread) in Java and crashes my application.. So that is why I need to find and properly close down that service or whatever. I noticed that Hadoop files are still locked and so I am taking that as a hint that it is Hadoop..

Bottom line is

When you run Crawl in the java directory, some thread stays open.. That thread is killing me.. What is it that stays alive past the completion of the Crawl.java code...
If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something stays alive.. How to close that is the issue..

See what I am asking.. 

Ray, the other ray.. 

-----Original Message-----
From: Raymond Balmès [mailto:raymond.balmes@gmail.com] 
Sent: Thursday, April 23, 2009 8:23 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Same problems... even rebooting the PC does not always solve the issue,
files remain locked.

I have gone the brutal way and used unlocker.exe, but I mean to find out
what's going wrong, so I will keep you posted on this one.

-Ray-

2009/4/23 Lukas, Ray <Ra...@idearc.com>

> Question:
> What is the proper accepted and safe way to shut down nutch (hadoop)
> after I am done with it?
>
> Hadoop.getFileSystem().closeAll() ??
> I did try this and no luck. Anyone else having this problem?
>
> Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> Ray
>

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
Hey Ray.. Great name you have there.. HA.. 

I don't actually care about deleting these files.. That is not the issue.. See I have embedded Nutch in my application. That application calls Nutch over and over again to do crawling and index creation.. This thread that stays alive.. It eventually exceeds some limit (native thread) in Java and crashes my application.. So that is why I need to find and properly close down that service or whatever. I noticed that Hadoop files are still locked and so I am taking that as a hint that it is Hadoop..

Bottom line is

When you run Crawl in the java directory, some thread stays open.. That thread is killing me.. What is it that stays alive past the completion of the Crawl.java code...
If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something stays alive.. How to close that is the issue..

See what I am asking.. 

Ray, the other ray.. 

-----Original Message-----
From: Raymond Balmès [mailto:raymond.balmes@gmail.com] 
Sent: Thursday, April 23, 2009 8:23 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive

Same problems... even rebooting the PC does not always solve the issue,
files remain locked.

I have gone the brutal way and used unlocker.exe, but I mean to find out
what's going wrong, so I will keep you posted on this one.

-Ray-

2009/4/23 Lukas, Ray <Ra...@idearc.com>

> Question:
> What is the proper accepted and safe way to shut down nutch (hadoop)
> after I am done with it?
>
> Hadoop.getFileSystem().closeAll() ??
> I did try this and no luck. Anyone else having this problem?
>
> Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> Ray
>

Re: Hadoop thread seems to remain alive

Posted by Raymond Balmès <ra...@gmail.com>.
Same problems... even rebooting the PC does not always solve the issue,
files remain locked.

I have gone the brutal way and used unlocker.exe, but I mean to find out
what's going wrong, so I will keep you posted on this one.

-Ray-

2009/4/23 Lukas, Ray <Ra...@idearc.com>

> Question:
> What is the proper accepted and safe way to shut down nutch (hadoop)
> after I am done with it?
>
> Hadoop.getFileSystem().closeAll() ??
> I did try this and no luck. Anyone else having this problem?
>
> Thanks guys.. Thanks, if/when I find it I will post it for everyone.
> Ray
>

RE: Hadoop thread seems to remain alive

Posted by "Lukas, Ray" <Ra...@idearc.com>.
Question:
What is the proper accepted and safe way to shut down nutch (hadoop)
after I am done with it?

Hadoop.getFileSystem().closeAll() ??
I did try this and no luck. Anyone else having this problem?

Thanks guys.. Thanks, if/when I find it I will post it for everyone.
Ray