Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2010/11/01 00:12:12 UTC

Re: Solr in virtual host as opposed to /lib

Can you expand on your question? Are you having a problem? Is this idle
curiosity?

Because I have no idea how to respond when there is so little information.

Best
Erick

On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin <er...@makethembite.com> wrote:

> Is there an issue running Solr in /home/lib as opposed to running it
> somewhere outside of the virtual hosts like /lib?
>
> Eric
>
>

RE: Solr in virtual host as opposed to /lib

Posted by Eric Martin <er...@makethembite.com>.
I was speaking about Apache virtual hosts. I was concerned that there was increased processing time due to the Solr and Nutch instances being housed inside a virtual host as opposed to being dropped in the root of my distro.

Thank you for the astute clarification.

-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind@jhu.edu] 
Sent: Monday, November 01, 2010 9:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr in virtual host as opposed to /lib

I think you guys are talking about two different kinds of 'virtual 
hosts'.  Lance is talking about CPU virtualization. Eric appears to be 
talking about apache virtual web hosts, although Eric hasn't told us how 
apache is involved in his setup in the first place, so it's unclear.

Assuming you are using apache to reverse proxy to Solr, there is no 
reason I can think of that your front-end apache setup would affect CPU 
utilization by Solr, let alone by nutch.

Eric Martin wrote:
> Oh. So I should take the installations out and move them to /<some_dir> rather than keeping them inside my virtual host at /home/<my solr & nutch is here>/www?
>
> -----Original Message-----
> From: Lance Norskog [mailto:goksron@gmail.com]
> Sent: Sunday, October 31, 2010 7:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr in virtual host as opposed to /lib
>
> With virtual hosting you can give CPU & memory quotas to your
> different VMs. This allows you to control the Nutch vs. the World
> problem. Unfortunately, you cannot allocate the disk channel. With two
> I/O-bound apps, this is a problem.
>
> On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <er...@makethembite.com> wrote:
>   
>> Excellent information. Thank you. Solr is acting just fine then. I can
>> connect to it with no issues, it indexes fine and there didn't seem to be any
>> complication with it. Now I can rule it out and go about solving what you
>> pointed out, and I agree, is a java/nutch issue.
>>
>> Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open
>> source and found on apache.org
>>
>> Thanks for your time.
>>
>> -----Original Message-----
>> From: Jonathan Rochkind [mailto:rochkind@jhu.edu]
>> Sent: Sunday, October 31, 2010 4:33 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr in virtual host as opposed to /lib
>>
>> What servlet container are you putting your Solr in? Jetty? Tomcat?
>> Something else?  Are you fronting it with apache on top of that? (I think
>> maybe you are, otherwise I'm not sure how the phrase 'virtual host'
>> applies).
>>
>> In general, Solr of course doesn't care what directory it's in on disk, so
>> long as the process running solr has the necessary read/write permissions to
>> the necessary directories (and if it doesn't, you'd usually find out right
>> away with an error message).  And clients to Solr don't care what directory
>> it's in on disk either, they only care that they can get to it by connecting
>> to a certain port at a certain hostname. In general, if they can't get to it
>> on a certain port at a certain hostname, that's something you'd discover
>> right away, not something that would be intermittent.  But I'm not familiar
>> with nutch, you may want to try connecting to the port you have Solr running
>> on (the hostname/port you have told nutch to find solr on?) yourself
>> manually, and just make sure it is connectable.
>>
>> I can't think of any reason that what directory you have Solr in could cause
>> CPU utilization issues. I think it's got nothing to do with that.
>>
>> I am not familiar with nutch; if it's nutch that's taking 100% of your CPU,
>> you might want to find some nutch experts to ask. Perhaps there's a nutch
>> listserv?  I am also not familiar with hadoop; you mention just in passing
>> that you're using hadoop too, maybe that's an added complication, I don't
>> know.
>>
>> One obvious reason nutch could be taking 100% cpu would be simply because
>> you've asked it to do a lot of work quickly, and it's trying to.
>>
>> One reason I have seen Solr take 100% of CPU and become unresponsive is when
>> the Solr process gets caught up in terrible Java garbage collection. If
>> that's what's happening, then giving the Solr JVM a higher maximum heap size
>> can sometimes help (although confusingly, I've seen people suggest that if
>> you give the Solr JVM too MUCH heap it can also result in long GC pauses),
>> and if you have a multi-core/multi-CPU machine, I've found the JVM argument
>> -XX:+UseConcMarkSweepGC to be very helpful.
>>
>> Other than that, it sounds to me like you've got a nutch/hadoop issue, not a
>> Solr issue.
>> ________________________________________
>> From: Eric Martin [eric@makethembite.com]
>> Sent: Sunday, October 31, 2010 7:16 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr in virtual host as opposed to /lib
>>
>> Hi,
>>
>> Thank you. This is more than idle curiosity. I am trying to debug an issue I
>> am having with my installation, and this is one step in verifying that I have
>> a setup that does not consume resources unnecessarily. I am trying to debunk my
>> internal myth that having Solr and Nutch in a virtual host would be causing
>> these issues. Here is the main issue that involves Nutch/Solr and Drupal:
>>
>> /home/mootlaw/lib/solr
>> /home/mootlaw/lib/nutch
>> /home/mootlaw/www/<Drupal site>
>>
>> I'm running a 1333 FSB dual-socket Xeon 5500 series @ 2.4 GHz, Enterprise
>> Linux x86_64 OS, 12 GB RAM. My Solr and Nutch are running. I am using
>> Jetty for my Solr. My server is not rooted.
>>
>> Nutch is using 100% of my CPUs. I see this process in the CPU utilization view in WHM:
>>
>> /usr/bin/java -Xmx1000m -Dhadoop.log.dir=/home/mootlaw/lib/nutch/logs
>> -Dhadoop.log.file=hadoop.log
>> -Djava.library.path=/home/mootlaw/lib/nutch/lib/native/Linux-amd64-64
>> -classpath
>> /home/mootlaw/lib/nutch/conf:
>> /usr/lib/tools.jar:
>> /home/mootlaw/lib/nutch/build:
>> /home/mootlaw/lib/nutch/build/test/classes:
>> /home/mootlaw/lib/nutch/build/nutch-1.2.job:
>> /home/mootlaw/lib/nutch/nutch-*.job:
>> /home/mootlaw/lib/nutch/lib/apache-solr-core-1.4.0.jar:
>> /home/mootlaw/lib/nutch/lib/apache-solr-solrj-1.4.0.jar:
>> /home/mootlaw/lib/nutch/lib/commons-beanutils-1.8.0.jar:
>> /home/mootlaw/lib/nutch/lib/commons-cli-1.2.jar:
>> /home/mootlaw/lib/nutch/lib/commons-codec-1.3.jar:
>> /home/mootlaw/lib/nutch/lib/commons-collections-3.2.1.jar:
>> /home/mootlaw/lib/nutch/lib/commons-el-1.0.jar:
>> /home/mootlaw/lib/nutch/lib/commons-httpclient-3.1.jar:
>> /home/mootlaw/lib/nutch/lib/commons-io-1.4.jar:
>> /home/mootlaw/lib/nutch/lib/commons-lang-2.1.jar:
>> /home/mootlaw/lib/nutch/lib/commons-logging-1.0.4.jar:
>> /home/mootlaw/lib/nutch/lib/commons-logging-api-1.0.4.jar:
>> /home/mootlaw/lib/nutch/lib/commons-net-1.4.1.jar:
>> /home/mootlaw/lib/nutch/lib/core-3.1.1.jar:
>> /home/mootlaw/lib/nutch/lib/geronimo-stax-api_1.0_spec-1.0.1.jar:
>> /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-core.jar:
>> /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-tools.jar:
>> /home/mootlaw/lib/nutch/lib/hsqldb-1.8.0.10.jar:
>> /home/mootlaw/lib/nutch/lib/icu4j-4_0_1.jar:
>> /home/mootlaw/lib/nutch/lib/jakarta-oro-2.0.8.jar:
>> /home/mootlaw/lib/nutch/lib/jasper-compiler-5.5.12.jar:
>> /home/mootlaw/lib/nutch/lib/jasper-runtime-5.5.12.jar:
>> /home/mootlaw/lib/nutch/lib/jcl-over-slf4j-1.5.5.jar:
>> /home/mootlaw/lib/nutch/lib/jets3t-0.6.1.jar:
>> /home/mootlaw/lib/nutch/lib/jetty-6.1.14.jar:
>> /home/mootlaw/lib/nutch/lib/jetty-util-6.1.14.jar:
>> /home/mootlaw/lib/nutch/lib/junit-3.8.1.jar:
>> /home/mootlaw/lib/nutch/lib/kfs-0.2.2.jar:
>> /home/mootlaw/lib/nutch/lib/log4j-1.2.15.jar:
>> /home/mootlaw/lib/nutch/lib/lucene-core-3.0.1.jar:
>> /home/mootlaw/lib/nutch/lib/lucene-misc-3.0.1.jar:
>> /home/mootlaw/lib/nutch/lib/oro-2.0.8.jar:
>> /home/mootlaw/lib/nutch/lib/resolver.jar:
>> /home/mootlaw/lib/nutch/lib/serializer.jar:
>> /home/mootlaw/lib/nutch/lib/servlet-api-2.5-6.1.14.jar:
>> /home/mootlaw/lib/nutch/lib/slf4j-api-1.5.5.jar:
>> /home/mootlaw/lib/nutch/lib/slf4j-log4j12-1.4.3.jar:
>> /home/mootlaw/lib/nutch/lib/taglibs-i18n.jar:
>> /home/mootlaw/lib/nutch/lib/tika-core-0.7.jar:
>> /home/mootlaw/lib/nutch/lib/wstx-asl-3.2.7.jar:
>> /home/mootlaw/lib/nutch/lib/xercesImpl.jar:
>> /home/mootlaw/lib/nutch/lib/xml-apis.jar:
>> /home/mootlaw/lib/nutch/lib/xmlenc-0.52.jar:
>> /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-2.1.jar:
>> /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-api-2.1.jar
>> org.apache.nutch.fetcher.Fetcher
>> /home/mootlaw/lib/nutch/crawl/segments/20101031144443 -threads 50
>>
>> My PIDs cannot be traced and my memory usage is at 5%.
>>
>> My hadoop logs show:
>>
>> 2010-10-31 15:44:11,040 INFO  fetcher.Fetcher - fetching http://caselaw.findlaw.com/us-5th-circuit/1454354.html
>> 2010-10-31 15:44:11,294 INFO  fetcher.Fetcher - fetching http://www.dallastxcriminaldefenseattorney.com/atom.xml
>> 2010-10-31 15:44:11,337 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=48, fetchQueues.totalSize=2499
>> 2010-10-31 15:44:12,339 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:13,341 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:14,344 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:15,346 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:16,349 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:16,568 INFO  fetcher.Fetcher - fetching http://caselaw.findlaw.com/il-court-of-appeals/1542438.html
>> 2010-10-31 15:44:17,308 INFO  fetcher.Fetcher - fetching http://lcweb2.loc.gov/const/const.html
>> 2010-10-31 15:44:17,352 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2499
>> 2010-10-31 15:44:18,354 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:19,356 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:20,358 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
>> 2010-10-31 15:44:21,360 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
>>
>> Can anyone help me out? Did I miss something? Should I be using Tomcat? One
>> interesting part of this is that when I try to change the Nutch settings for
>> post URL and URLs by score to 1, they stay at 10 no matter what I do.
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: Sunday, October 31, 2010 4:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr in virtual host as opposed to /lib
>>
>> Can you expand on your question? Are you having a problem? Is this idle
>> curiosity?
>>
>> Because I have no idea how to respond when there is so little information.
>>
>> Best
>> Erick
>>
>> On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin <er...@makethembite.com> wrote:
>>
>>     
>>> Is there an issue running Solr in /home/lib as opposed to running it
>>> somewhere outside of the virtual hosts like /lib?
>>>
>>> Eric
>>>
>>>
>>>       
>>     
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
>   
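The Fetcher status records in Eric's hadoop.log excerpt are the most concrete data in the thread: in every one, 48 to 50 of the 50 active threads are counted as spinWaiting while about 2,500 URLs sit in the fetch queues. As I understand Nutch 1.x's Fetcher, a spin-waiting thread is one that found no fetchable URL and is sleeping until a queue frees up (for example, because of per-host politeness delays), so these numbers suggest the threads are mostly waiting rather than fetching. Below is a small Python sketch for pulling those numbers out of a log so they can be tracked over a longer run. The regex assumes the exact field layout in this excerpt; other Nutch versions may format the line differently.

```python
import re
from typing import List, NamedTuple

# Assumed format: the status records in the hadoop.log excerpt, e.g.
# "-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=2499".
STATUS_RE = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)


class FetcherStatus(NamedTuple):
    active: int        # active fetcher threads
    spin_waiting: int  # threads waiting instead of fetching
    queued: int        # URLs sitting in the fetch queues

    @property
    def waiting_fraction(self) -> float:
        """Share of active threads that are spin-waiting (0.0 if none active)."""
        return self.spin_waiting / self.active if self.active else 0.0


def parse_status(text: str) -> List[FetcherStatus]:
    """Extract every Fetcher status record from a chunk of log text.

    Whitespace runs are collapsed first so records that were wrapped across
    two lines (as in the mailing-list copy) still match.
    """
    flat = " ".join(text.split())
    return [FetcherStatus(*map(int, m.groups())) for m in STATUS_RE.finditer(flat)]
```

Fed the status records from the excerpt (without the ">>" quote markers), every record comes back with a waiting fraction between 0.96 and 1.0.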


Re: Solr in virtual host as opposed to /lib

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I think you guys are talking about two different kinds of 'virtual 
hosts'.  Lance is talking about CPU virtualization. Eric appears to be 
talking about apache virtual web hosts, although Eric hasn't told us how 
apache is involved in his setup in the first place, so it's unclear.

Assuming you are using apache to reverse proxy to Solr, there is no 
reason I can think of that your front-end apache setup would affect CPU 
utilization by Solr, let alone by nutch.
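Jonathan's earlier suggestion, to connect by hand to the host and port Solr runs on and to make sure the Solr process can read and write its directories, can be scripted so it bypasses any Apache front end entirely. This is a minimal Python sketch, assuming Solr runs in its bundled Jetty on the default port 8983 on the same machine. The data directory path is hypothetical (built from the install path Eric posted), so substitute your own, and run the script as the same user Solr runs as, since os.access answers for the calling process.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dir_is_usable(path: str) -> bool:
    """Return True if path is a directory this process can list, read and write.

    os.access() answers for the *current* process's user, so run this as the
    same user that runs Solr for the answer to mean anything.
    """
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Assumptions: Solr's bundled Jetty on its default port 8983 on this host,
    # and a hypothetical data directory under the install path from the thread.
    print("Solr port reachable:", port_is_open("localhost", 8983))
    print("data dir usable:", dir_is_usable("/home/mootlaw/lib/solr/data"))
```

If both checks pass, the directory Solr lives in is very unlikely to be the problem, which is Jonathan's point.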


RE: Solr in virtual host as opposed to /lib

Posted by Eric Martin <er...@makethembite.com>.
Oh. So I should take the installations out and move them to /<some_dir> rather than keeping them inside my virtual host at /home/<my solr & nutch is here>/www?


Re: Solr in virtual host as opposed to /lib

Posted by Lance Norskog <go...@gmail.com>.
With virtual hosting you can give CPU & memory quotas to your
different VMs. This allows you to control the Nutch vs. the World
problem. Unfortunately, you cannot allocate the disk channel (disk I/O)
that way. With two I/O-bound apps, this is a problem.
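Lance's quotas work at the VM level. On a single box without root access (Eric says his server is not rooted), one lighter-weight stand-in is to start the crawler at a lower CPU scheduling priority with `nice`. This is a different technique from VM quotas and only a partial one, because it only decides who gets the CPU first when there is contention. This minimal Python sketch assumes a POSIX system. The Nutch command in the comment is hypothetical and reuses the segment path from Eric's process listing. Like VM quotas, niceness does nothing for disk I/O.

```python
import os
import subprocess
import sys


def run_low_priority(cmd, niceness=10, **run_kwargs):
    """Run `cmd` in a child process whose CPU priority is lowered by `niceness`.

    os.nice() runs in the child just before exec, so this script keeps its
    own priority. An unprivileged user may raise niceness (lower priority)
    but not lower it, so no root access is needed. POSIX only.
    Extra keyword arguments are passed straight to subprocess.run().
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(niceness),
        check=True,
        **run_kwargs,
    )


if __name__ == "__main__":
    # Hypothetical crawl invocation (segment path from the process listing):
    # run_low_priority(["/home/mootlaw/lib/nutch/bin/nutch", "fetch",
    #                   "/home/mootlaw/lib/nutch/crawl/segments/20101031144443",
    #                   "-threads", "50"])
    # Harmless demo: the child process reports its own niceness.
    demo = run_low_priority(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        capture_output=True,
        text=True,
    )
    print("parent niceness:", os.nice(0), "child niceness:", demo.stdout.strip())
```

From a shell, `nice -n 10 <command>` has the same effect. Output capture is left opt-in, because buffering a long crawl's output in memory would be wasteful.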

On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <er...@makethembite.com> wrote:
> Excellent information. Thank you. Solr is acting just fine, then. I can
> connect to it with no issues, it indexes fine, and there didn't seem to be
> any complication with it. Now I can rule it out and go about solving what
> you pointed out, and I agree, is a Java/Nutch issue.



-- 
Lance Norskog
goksron@gmail.com

RE: Solr in virtual host as opposed to /lib

Posted by Eric Martin <er...@makethembite.com>.
Excellent information. Thank you. Solr is acting just fine, then. I can
connect to it with no issues, it indexes fine, and there didn't seem to be
any complication with it. Now I can rule it out and go about solving what
you pointed out, and I agree, is a Java/Nutch issue.

Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open
source and available from apache.org.

Thanks for your time.

-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind@jhu.edu] 
Sent: Sunday, October 31, 2010 4:33 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in virtual host as opposed to /lib

What servlet container are you putting your Solr in? Jetty? Tomcat?
Something else?  Are you fronting it with Apache on top of that? (I think
maybe you are, otherwise I'm not sure how the phrase 'virtual host'
applies).

In general, Solr of course doesn't care what directory it's in on disk, so
long as the process running Solr has the necessary read/write permissions to
the necessary directories (and if it doesn't, you'd usually find out right
away with an error message).  And clients to Solr don't care what directory
it's in on disk either; they only care that they can get to it by connecting
to a certain port at a certain hostname. In general, if they can't get to it
on a certain port at a certain hostname, that's something you'd discover
right away, not something that would be intermittent.  But I'm not familiar
with Nutch; you may want to try connecting to the port you have Solr running
on (the hostname/port you have told Nutch to find Solr on?) yourself
manually, and just make sure it is connectable.

I can't think of any reason that what directory you have Solr in could cause
CPU utilization issues. I think it's got nothing to do with that. 

I am not familiar with Nutch; if it's Nutch that's taking 100% of your CPU,
you might want to find some Nutch experts to ask. Perhaps there's a Nutch
listserv?  I am also not familiar with Hadoop; you mention just in passing
that you're using Hadoop too, and maybe that's an added complication, I
don't know.

One obvious reason Nutch could be taking 100% CPU would be simply because
you've asked it to do a lot of work quickly, and it's trying to.

One reason I have seen Solr take 100% of CPU and become unresponsive is when
the Solr process gets caught up in terrible Java garbage collection. If
that's what's happening, then giving the Solr JVM a higher maximum heap size
can sometimes help (although confusingly, I've seen people suggest that if
you give the Solr JVM too MUCH heap it can also result in long GC pauses),
and if you have a multi-core/multi-CPU machine, I've found the JVM argument
-XX:+UseConcMarkSweepGC to be very helpful.

Other than that, it sounds to me like you've got a Nutch/Hadoop issue, not a
Solr issue.


RE: Solr in virtual host as opposed to /lib

Posted by Jonathan Rochkind <ro...@jhu.edu>.
What servlet container are you putting your Solr in? Jetty? Tomcat? Something else?  Are you fronting it with Apache on top of that? (I think maybe you are, otherwise I'm not sure how the phrase 'virtual host' applies).

In general, Solr of course doesn't care what directory it's in on disk, so long as the process running Solr has the necessary read/write permissions to the necessary directories (and if it doesn't, you'd usually find out right away with an error message).  And clients to Solr don't care what directory it's in on disk either; they only care that they can get to it by connecting to a certain port at a certain hostname. In general, if they can't get to it on a certain port at a certain hostname, that's something you'd discover right away, not something that would be intermittent.  But I'm not familiar with Nutch; you may want to try connecting to the port you have Solr running on (the hostname/port you have told Nutch to find Solr on?) yourself manually, and just make sure it is connectable.
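Both checks described above are easy to script: whether the Solr process can read and write its directories, and whether a client can actually reach the host and port. This is a minimal Python sketch with two assumptions. It uses the /home/mootlaw/lib/solr path from Eric's listing and the Jetty example default of localhost:8983, so substitute the directories Solr really uses and the hostname/port Nutch is configured with. Run it as the same user that runs Solr, because `os.access` answers for the current user.

```python
import os
import socket


def dir_is_writable(path):
    """True if `path` is an existing directory this process can read, write and enter."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


def port_is_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


if __name__ == "__main__":
    # Assumed values; replace with your Solr directories and Nutch's Solr URL.
    for directory in ["/home/mootlaw/lib/solr"]:
        print(directory, "usable:", dir_is_writable(directory))
    print("localhost:8983 reachable:", port_is_open("localhost", 8983))
```

A plain TCP connect only proves something is listening. It does not prove that the listener is Solr or that Solr is healthy, but it separates "can't connect" from every other failure.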

I can't think of any reason why the directory you have Solr in would cause CPU utilization issues. I think it's got nothing to do with that.

I am not familiar with Nutch; if it's Nutch that's taking 100% of your CPU, you might want to find some Nutch experts to ask. Perhaps there's a Nutch listserv?  I am also not familiar with Hadoop; you mention just in passing that you're using Hadoop too, and maybe that's an added complication, I don't know.

One obvious reason Nutch could be taking 100% CPU would be simply because you've asked it to do a lot of work quickly, and it's trying to.

One reason I have seen Solr take 100% of CPU and become unresponsive is when the Solr process gets caught up in terrible Java garbage collection. If that's what's happening, then giving the Solr JVM a higher maximum heap size can sometimes help (although confusingly, I've seen people suggest that if you give the Solr JVM too MUCH heap it can also result in long GC pauses), and if you have a multi-core/multi-CPU machine, I've found the JVM argument -XX:+UseConcMarkSweepGC to be very helpful.
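To make the GC suggestion concrete, this small Python sketch assembles the JVM arguments for Solr's bundled Jetty. It assumes Solr is started with `java -jar start.jar` from the example directory. The heap sizes are placeholders, not recommendations. The sketch also adds HotSpot GC logging flags so that long collections can be confirmed before anything is tuned. These flags match the Java 6-era HotSpot JVMs of this thread: CMS was removed in JDK 14, and JDK 9 replaced this style of GC logging with `-Xlog:gc`.

```python
import shlex


def solr_jetty_command(max_heap="2g", min_heap=None, use_cms=True, gc_log=None):
    """Return the argument list for starting Solr's example Jetty with GC-related flags.

    max_heap/min_heap become -Xmx/-Xms; use_cms adds -XX:+UseConcMarkSweepGC;
    gc_log (a file path) enables Java 6-era HotSpot GC logging so that long
    pauses can be confirmed from the log instead of guessed at.
    """
    args = ["java"]
    if min_heap:
        args.append("-Xms" + min_heap)
    args.append("-Xmx" + max_heap)
    if use_cms:
        args.append("-XX:+UseConcMarkSweepGC")
    if gc_log:
        args += ["-verbose:gc", "-XX:+PrintGCDetails",
                 "-XX:+PrintGCTimeStamps", "-Xloggc:" + gc_log]
    args += ["-jar", "start.jar"]
    return args


if __name__ == "__main__":
    # Placeholder sizes; prints a shell-ready command line (shlex.join needs Python 3.8+).
    print(shlex.join(solr_jetty_command(max_heap="2g", gc_log="logs/gc.log")))
```

Keeping the flags in one function makes it easy to try a larger or smaller heap with the same GC settings and compare the resulting gc.log files.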

Other than that, it sounds to me like you've got a Nutch/Hadoop issue, not a Solr issue.
________________________________________
From: Eric Martin [eric@makethembite.com]
Sent: Sunday, October 31, 2010 7:16 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in virtual host as opposed to /lib

Hi,

Thank you. This is more than idle curiosity. I am trying to debug an issue I
am having with my installation, and this is one step in verifying that my
setup is not wasting resources. I am trying to debunk my own theory that
having Solr and Nutch in a virtual host could be causing these issues. Here
is the main issue, which involves Nutch/Solr and Drupal:

/home/mootlaw/lib/solr
/home/mootlaw/lib/nutch
/home/mootlaw/www/<Drupal site>

I'm running a 1333 FSB dual-socket Xeon 5500 series @ 2.4 GHz, Enterprise
Linux x86_64, with 12 GB of RAM. My Solr and Nutch are running. I am using
Jetty for my Solr. My server is not rooted.



RE: Solr in virtual host as opposed to /lib

Posted by Eric Martin <er...@makethembite.com>.
Hi,

Thank you. This is more than idle curiosity. I am trying to debug an issue I
am having with my installation, and this is one step in verifying that my
setup is not wasting resources. I am trying to debunk my own theory that
having Solr and Nutch in a virtual host could be causing these issues. Here
is the main issue, which involves Nutch/Solr and Drupal:

/home/mootlaw/lib/solr
/home/mootlaw/lib/nutch
/home/mootlaw/www/<Drupal site>

I'm running a 1333 FSB dual-socket Xeon 5500 series @ 2.4 GHz, Enterprise
Linux x86_64, with 12 GB of RAM. My Solr and Nutch are running. I am using
Jetty for my Solr. My server is not rooted.

Nutch is using 100% of my CPUs. I see this in the CPU utilization view of my WHM:

/usr/bin/java -Xmx1000m -Dhadoop.log.dir=/home/mootlaw/lib/nutch/logs
-Dhadoop.log.file=hadoop.log
-Djava.library.path=/home/mootlaw/lib/nutch/lib/native/Linux-amd64-64
-classpath
  /home/mootlaw/lib/nutch/conf:
  /usr/lib/tools.jar:
  /home/mootlaw/lib/nutch/build:
  /home/mootlaw/lib/nutch/build/test/classes:
  /home/mootlaw/lib/nutch/build/nutch-1.2.job:
  /home/mootlaw/lib/nutch/nutch-*.job:
  /home/mootlaw/lib/nutch/lib/apache-solr-core-1.4.0.jar:
  /home/mootlaw/lib/nutch/lib/apache-solr-solrj-1.4.0.jar:
  /home/mootlaw/lib/nutch/lib/commons-beanutils-1.8.0.jar:
  /home/mootlaw/lib/nutch/lib/commons-cli-1.2.jar:
  /home/mootlaw/lib/nutch/lib/commons-codec-1.3.jar:
  /home/mootlaw/lib/nutch/lib/commons-collections-3.2.1.jar:
  /home/mootlaw/lib/nutch/lib/commons-el-1.0.jar:
  /home/mootlaw/lib/nutch/lib/commons-httpclient-3.1.jar:
  /home/mootlaw/lib/nutch/lib/commons-io-1.4.jar:
  /home/mootlaw/lib/nutch/lib/commons-lang-2.1.jar:
  /home/mootlaw/lib/nutch/lib/commons-logging-1.0.4.jar:
  /home/mootlaw/lib/nutch/lib/commons-logging-api-1.0.4.jar:
  /home/mootlaw/lib/nutch/lib/commons-net-1.4.1.jar:
  /home/mootlaw/lib/nutch/lib/core-3.1.1.jar:
  /home/mootlaw/lib/nutch/lib/geronimo-stax-api_1.0_spec-1.0.1.jar:
  /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-core.jar:
  /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-tools.jar:
  /home/mootlaw/lib/nutch/lib/hsqldb-1.8.0.10.jar:
  /home/mootlaw/lib/nutch/lib/icu4j-4_0_1.jar:
  /home/mootlaw/lib/nutch/lib/jakarta-oro-2.0.8.jar:
  /home/mootlaw/lib/nutch/lib/jasper-compiler-5.5.12.jar:
  /home/mootlaw/lib/nutch/lib/jasper-runtime-5.5.12.jar:
  /home/mootlaw/lib/nutch/lib/jcl-over-slf4j-1.5.5.jar:
  /home/mootlaw/lib/nutch/lib/jets3t-0.6.1.jar:
  /home/mootlaw/lib/nutch/lib/jetty-6.1.14.jar:
  /home/mootlaw/lib/nutch/lib/jetty-util-6.1.14.jar:
  /home/mootlaw/lib/nutch/lib/junit-3.8.1.jar:
  /home/mootlaw/lib/nutch/lib/kfs-0.2.2.jar:
  /home/mootlaw/lib/nutch/lib/log4j-1.2.15.jar:
  /home/mootlaw/lib/nutch/lib/lucene-core-3.0.1.jar:
  /home/mootlaw/lib/nutch/lib/lucene-misc-3.0.1.jar:
  /home/mootlaw/lib/nutch/lib/oro-2.0.8.jar:
  /home/mootlaw/lib/nutch/lib/resolver.jar:
  /home/mootlaw/lib/nutch/lib/serializer.jar:
  /home/mootlaw/lib/nutch/lib/servlet-api-2.5-6.1.14.jar:
  /home/mootlaw/lib/nutch/lib/slf4j-api-1.5.5.jar:
  /home/mootlaw/lib/nutch/lib/slf4j-log4j12-1.4.3.jar:
  /home/mootlaw/lib/nutch/lib/taglibs-i18n.jar:
  /home/mootlaw/lib/nutch/lib/tika-core-0.7.jar:
  /home/mootlaw/lib/nutch/lib/wstx-asl-3.2.7.jar:
  /home/mootlaw/lib/nutch/lib/xercesImpl.jar:
  /home/mootlaw/lib/nutch/lib/xml-apis.jar:
  /home/mootlaw/lib/nutch/lib/xmlenc-0.52.jar:
  /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-2.1.jar:
  /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-api-2.1.jar
org.apache.nutch.fetcher.Fetcher
/home/mootlaw/lib/nutch/crawl/segments/20101031144443 -threads 50

My PIDs cannot be traced, and my memory usage is at 5%.

My Hadoop logs show:

2010-10-31 15:44:11,040 INFO  fetcher.Fetcher - fetching http://caselaw.findlaw.com/us-5th-circuit/1454354.html
2010-10-31 15:44:11,294 INFO  fetcher.Fetcher - fetching http://www.dallastxcriminaldefenseattorney.com/atom.xml
2010-10-31 15:44:11,337 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=48, fetchQueues.totalSize=2499
2010-10-31 15:44:12,339 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:13,341 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:14,344 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:15,346 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:16,349 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:16,568 INFO  fetcher.Fetcher - fetching http://caselaw.findlaw.com/il-court-of-appeals/1542438.html
2010-10-31 15:44:17,308 INFO  fetcher.Fetcher - fetching http://lcweb2.loc.gov/const/const.html
2010-10-31 15:44:17,352 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2499
2010-10-31 15:44:18,354 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
2010-10-31 15:44:19,356 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
2010-10-31 15:44:20,358 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
2010-10-31 15:44:21,360 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
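Reading those status lines: as I understand the Nutch 1.x fetcher, spinWaiting counts threads that asked the fetch queues for a URL and got nothing back, typically because every queued URL belongs to a host that is still inside its politeness delay. A queue size pinned at 2500 with 50 threads is also consistent with the fetcher's feeder capping the in-memory queue at 50 items per thread (50 * 50 = 2500); that per-thread figure is an assumption worth checking against Fetcher.java for the installed version. The hypothetical Python helpers below parse the status lines and report how often all active threads were waiting at once.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Matches the periodic status line the fetcher writes to hadoop.log.
STATUS_RE = re.compile(
    r"-activeThreads=(?P<active>\d+),\s*spinWaiting=(?P<spin>\d+),"
    r"\s*fetchQueues\.totalSize=(?P<queued>\d+)"
)

# Assumption: items buffered per fetcher thread; verify in Fetcher.java.
ASSUMED_ITEMS_PER_THREAD = 50


class FetcherStatus(NamedTuple):
    active: int
    spin_waiting: int
    queued: int


def parse_status(lines: Iterable[str]) -> Iterator[FetcherStatus]:
    """Yield one FetcherStatus per status line, ignoring "fetching" lines."""
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            yield FetcherStatus(int(m["active"]), int(m["spin"]), int(m["queued"]))


def idle_ratio(statuses: Iterable[FetcherStatus]) -> float:
    """Fraction of reports in which every active thread was spin-waiting."""
    reports: List[FetcherStatus] = list(statuses)
    if not reports:
        return 0.0
    idle = sum(1 for s in reports if s.spin_waiting == s.active)
    return idle / len(reports)


def feeder_capacity(threads: int) -> int:
    """Queue ceiling under the per-thread assumption above."""
    return threads * ASSUMED_ITEMS_PER_THREAD


log = [
    "2010-10-31 15:44:11,337 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=48, fetchQueues.totalSize=2499",
    "2010-10-31 15:44:12,339 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500",
    "2010-10-31 15:44:16,568 INFO  fetcher.Fetcher - fetching http://caselaw.findlaw.com/il-court-of-appeals/1542438.html",
]
print(idle_ratio(parse_status(log)))  # 0.5
print(feeder_capacity(50))            # 2500
```

If nearly every report over a long run shows all (or all but one) threads spin-waiting while the queue stays full, that would suggest the fetcher is throttled by per-host politeness settings (for example fetcher.server.delay in nutch-default.xml) rather than starved of CPU or memory, which would also fit the low memory usage reported above.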

Can anyone help me out? Did I miss something? Should I be using Tomcat? One
interesting part of this is that when I try to change the Nutch settings post url
and urls by score to 1, they stay at 10 no matter what I do.
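On the settings that stay at 10: Nutch builds its configuration by loading nutch-default.xml and then nutch-site.xml from the classpath, with later files overriding earlier ones unless a property is marked final. If the edited file is not the copy the fetcher actually loads, the default wins. The Python sketch below imitates that layering so the effective value can be checked by hand. The property name example.urls.by.score is hypothetical, since the thread does not say which configuration keys the "post url" and "urls by score" settings correspond to.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional, Set


def effective_config(resources: Iterable[str]) -> Dict[str, Optional[str]]:
    """Merge Hadoop-style <configuration> XML documents in load order.

    Later documents override earlier ones, except for properties that an
    earlier document marked <final>true</final>.
    """
    values: Dict[str, Optional[str]] = {}
    final: Set[str] = set()
    for xml_text in resources:
        root = ET.fromstring(xml_text)
        for prop in root.iter("property"):
            name = (prop.findtext("name") or "").strip()
            if not name or name in final:
                continue  # missing name, or locked by an earlier final flag
            value = prop.findtext("value")
            values[name] = value.strip() if value is not None else None
            if (prop.findtext("final") or "").strip() == "true":
                final.add(name)
    return values


# Hypothetical contents standing in for nutch-default.xml and nutch-site.xml.
DEFAULTS = """<configuration>
  <property><name>example.urls.by.score</name><value>10</value></property>
</configuration>"""
SITE = """<configuration>
  <property><name>example.urls.by.score</name><value>1</value></property>
</configuration>"""

print(effective_config([DEFAULTS, SITE])["example.urls.by.score"])  # 1
print(effective_config([DEFAULTS])["example.urls.by.score"])        # 10
```

The second call mirrors the case where the edited nutch-site.xml never reaches the fetcher's classpath: only the default is loaded, so the value stays at 10.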

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Sunday, October 31, 2010 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr in virtual host as opposed to /lib

Can you expand on your question? Are you having a problem? Is this idle
curiosity?

Because I have no idea how to respond when there is so little information.

Best
Erick

On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin <er...@makethembite.com> wrote:

> Is there an issue running Solr in /home/lib as opposed to running it
> somewhere outside of the virtual hosts like /lib?
>
> Eric
>
>