Posted to java-user@lucene.apache.org by James Tyrrell <jr...@hotmail.com> on 2004/10/27 12:58:57 UTC
Indexing process causes Tomcat to stop working
Hello,
I am a Java/Lucene/Tomcat newbie. I know that does not bode well as a start
to a post, but I really am in dire straits as far as Lucene goes, so bear
with me. I am working on indexing and replacing the search functionality for
a website (about 10 GB in size, although only about 7 GB is indexed). I
presently have a working model based on the luceneweb demo distributed with
Lucene. This has already proven functional when tested on various sites
(admittedly much smaller, 200-400 MB). However, issues occur when indexing
the main site that I haven't found explained on any of the Lucene forums
thus far.
After a successful index and optimisation of the website (takes around 4hrs
40m unoptimised) I can't get to the index.jsp or even access Tomcat. My
first thought was to restart Tomcat. No joy and no access. Thinking the
larger index had killed the test server, I accessed Apache on port 80, which
worked perfectly. After a few checks I realised the test server was fine and
Apache was fine, and I used the same application to create an index of the
Tomcat docs, so Java was working. Confused, I went back to the forums, FAQs
and groups to see if anyone had had similar problems, and have come up with
a brief list of what my problem is not:
There are no index write.lock files for Lucene in either /tmp or
opt/tomcat/temp, so the index is open to be searched. Nor does top reveal
anything overloading the system. Apache is running fine and displays all
relevant pages. Tomcat cannot be reached with a browser (neither the default
congratulations page nor the Luceneweb application). Tomcat was a fresh
install, as was Java, and the Tomcat logs show nothing different from
standard startup logs. So I logged the entire indexing process and saw two
errors occurring infrequently.
Parse Aborted: Encountered "\"" at line 6, column 129. //where these values vary
Was expecting one of:
<ArgName> ...
"=" ...
<TagEnd> ...
I'm satisfied this is just the HTML parser complaining about some badly
formatted HTML and that it only affects what is indexed, but it's here for
completeness. The other error is more serious:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.receive(PipedInputStream.java:136)
at java.io.PipedInputStream.receive(PipedInputStream.java:176)
at java.io.PipedOutputStream.write(PipedOutputStream.java:129)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:395)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)
at java.io.Writer.write(Writer.java:126)
at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:137)
at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:203)
at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:31)
Again, I'm pretty sure that this is the same error that occurred once before
when I was using maxFieldLength to limit the number of terms recorded. I'm
also confident it's a threading error, and I found the following post by
Doug Cutting that seemed to explain it:
http://java2.5341.com/msg/80502.html
However, I am only assuming that's what it is, and I haven't yet attempted
to change the threading system of the demo due to my lack of Java knowledge.
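
For readers unfamiliar with Java pipes, here is a minimal sketch of how a "Pipe closed" IOException can arise. It assumes, without having confirmed it against the demo source, that the parser thread writes text into a pipe and that the reading side stops early (for example once a term limit is reached). The class and method names are hypothetical and only the standard java.io classes seen in the stack trace are used:

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene or the luceneweb demo.
public class PipeClosedDemo {

    /**
     * Writes into a pipe whose reading end has already been closed and
     * returns the message of the resulting IOException.
     */
    public static String writeAfterReaderClose() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);

            // The consumer gives up early and closes its end of the pipe.
            in.close();

            // The producer is unaware of this and keeps writing.
            writer.write("more text from the parser");
            writer.flush(); // pushes the encoded bytes into the pipe
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterReaderClose()); // prints "Pipe closed"
    }
}
```

If that is what is happening during indexing, the exception would only mean that the remaining text of a document was discarded after the reader stopped, which would not by itself explain Tomcat becoming unreachable.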
The strange thing is that after restarting the server, all aspects of the
Lucene web application work perfectly: stemming, alphanumeric indexing,
summaries etc. are all as expected. So I am left assuming (partly through
running out of options) that Lucene has somehow done something to Tomcat by
building such a large index. Since both run on Java, I guess it's something
to do with that, but I have nowhere near enough experience in Java to work
out what.
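
One check that could be run before restarting the server next time is whether the Tomcat port still accepts TCP connections at all. A refused connection would suggest Tomcat is no longer listening, while an accepted connection with no page would suggest it is running but stuck. The sketch below is only an illustration: the class name is made up, and port 8080 is an assumption (Tomcat's usual default), not something taken from this setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper class for illustration only.
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean accepts(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unknown hosts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Port 80 is Apache in this setup; 8080 is assumed for Tomcat.
        System.out.println("port 80 accepts:   " + accepts("localhost", 80, 2000));
        System.out.println("port 8080 accepts: " + accepts("localhost", 8080, 2000));
    }
}
```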
The system I am currently running on is Java 1.4.2_05, Tomcat 5.0.27,
Lucene 1.4.1, Linux version 2.4.20-8 (gcc version 3.2.2 20030222 (Red
Hat Linux 3.2.2-5)), Apache 2.0.42. I have not modified the mergeFactor or
MaxMergeDocuments, nor am I using RAMdirectories. The processor is 800MHz
and there is 128 MB of RAM.
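
With only 128 MB of RAM on the machine, it may be worth checking how much heap a JVM is actually allowed there. The following is a minimal sketch using only the standard Runtime API. The class and method names are made up for illustration, and nothing above establishes that memory is the cause of the problem.

```java
// Hypothetical helper class for illustration only.
public class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will try to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use (allocated minus free), in megabytes. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

Running it with the same JVM and the same -Xmx setting (if any) that Tomcat is started with would show the limit Tomcat and the web application have to share.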
If more info is required on setup, source code etc., or you think this
should be moved to a Tomcat forum, just post.
Best regards and thanks in advance for any advice you can offer,
J Tyrrell
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
RE: Indexing process causes Tomcat to stop working
Posted by Aad Nales <aa...@rotterdam-cs.com>.
James,
How do you kick off your reindex? Could it be a session timeout?
cheers,
Aad