Posted to solr-user@lucene.apache.org by "Daley, Kristopher M." <da...@ornl.gov> on 2007/09/19 15:04:44 UTC

Index/Update Problems with Solrj/Tomcat and Larger Files

I am using Tomcat 6 and Solr 1.2 on a Windows 2003 server with the
following Java code.  I am trying to index PDF files, and I consistently
get errors on the larger files (always the same ones).

 

      SolrServer server = new CommonsHttpSolrServer(solrPostUrl);

      SolrInputDocument addDoc = new SolrInputDocument();
      addDoc.addField("url", url);
      addDoc.addField("site", site);
      addDoc.addField("author", author);
      addDoc.addField("title", title);
      addDoc.addField("subject", subject);
      addDoc.addField("keywords", keywords);
      addDoc.addField("text", docText);

      UpdateRequest ur = new UpdateRequest();
      // commit as part of this update (waitFlush=false, waitSearcher=false)
      ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);
      ur.add(addDoc);
      UpdateResponse rsp = ur.process(server);

 

The Java error I received is: class
org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Software caused connection abort: recv failed)

Tomcat Log:

SEVERE: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:716)
        at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:746)
        at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
        at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:675)
        at org.apache.coyote.Request.doRead(Request.java:428)
        at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:297)
        at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:405)
        at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:312)
        at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
        at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

 

This happens when I try to index a field containing the contents of the
PDF file.  Its string length is 189002.  If I only index a substring of
the field, say the first 150000 characters, it usually works.  Does
anyone have any idea why this might be happening?
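
For reference, the truncation test is just a one-line change (a sketch):

      // index only the first 150000 characters of the extracted PDF text
      addDoc.addField("text", docText.substring(0, 150000));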

 

I have had this and other files index correctly using a different
combination of Tomcat/Solr versions without any problem (using similar
code; I rewrote it because I thought it would be better to use Solrj).
I get the same error whether I use a simple StringBuilder to create the
add manually or whether I use Solrj.  I have manually encoded each field
before passing it to the add function as well, so I don't believe it
is a content problem.  I have tried to change every setting in Tomcat
and Solr that I can think of, but I'm fairly new to both of them.
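
For reference, the manual (StringBuilder) path looks roughly like this;
escapeXml is a small hypothetical helper shown only for illustration, not
part of Solr or Solrj:

      // Sketch of the hand-built add: escape each value, then assemble the XML.
      static String escapeXml(String s) {
          StringBuilder sb = new StringBuilder(s.length());
          for (int i = 0; i < s.length(); i++) {
              char c = s.charAt(i);
              switch (c) {
                  case '&': sb.append("&amp;"); break;
                  case '<': sb.append("&lt;"); break;
                  case '>': sb.append("&gt;"); break;
                  case '"': sb.append("&quot;"); break;
                  default:  sb.append(c);
              }
          }
          return sb.toString();
      }

      StringBuilder add = new StringBuilder();
      add.append("<add><doc>");
      add.append("<field name=\"url\">").append(escapeXml(url)).append("</field>");
      add.append("<field name=\"text\">").append(escapeXml(docText)).append("</field>");
      // ... remaining fields ...
      add.append("</doc></add>");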

 

I have also tried larger files (docText.length() = 439902), and I get:
class org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Connection reset by peer: socket write error).

 Any help would be greatly appreciated!  Thanks.  


RE: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by "Daley, Kristopher M." <da...@ornl.gov>.
I am running against 1.2.  Where would I get the 1.3-dev version?  

I will try different versions of Tomcat and/or Jetty.  Thanks for all
your suggestions, I'll let you know.

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Wednesday, September 19, 2007 8:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

> 
> However, if I go to the tomcat server and restart it after I have issued
> the process command, the program returns and the documents are all
> posted correctly!
> 
> Very strange behavior....am I somehow not closing the connection
> properly?  
> 

What version is the solr you are connecting to? 1.2 or 1.3-dev?  (I have
not tested against 1.2)

Does this only happen with tomcat?  If you run with jetty do you get the
same behavior?  (again, just stabs in the dark)

If you can make a small repeatable problem, post it in JIRA and I'll 
look into it.

ryan


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by Ryan McKinley <ry...@gmail.com>.
> 
> However, if I go to the tomcat server and restart it after I have issued
> the process command, the program returns and the documents are all
> posted correctly!
> 
> Very strange behavior....am I somehow not closing the connection
> properly?  
> 

What version is the solr you are connecting to? 1.2 or 1.3-dev?  (I have 
not tested against 1.2)

Does this only happen with tomcat?  If you run with jetty do you get the 
same behavior?  (again, just stabs in the dark)

If you can make a small repeatable problem, post it in JIRA and I'll 
look into it.

ryan


RE: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by "Daley, Kristopher M." <da...@ornl.gov>.
Ok, I'll try to play with those.  Any suggestion on the size?

Something else that is very interesting is that I just tried to do an
aggregate add of a bunch of docs, including the one that always returned
the error.

I called a function to create a SolrInputDocument and return it.  I then
did the following:

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
// docs is filled with the SolrInputDocuments returned by that helper, one per file
SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
UpdateRequest ur = new UpdateRequest();
ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);  // commit as part of this update
ur.add(docs);
UpdateResponse rsp = ur.process(server);

In doing this, the program simply hangs after the last command.  If I
let it sit for a while, it eventually returns with the error: class
org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Connection reset by peer: socket write error)

However, if I go to the tomcat server and restart it after I have issued
the process command, the program returns and the documents are all
posted correctly!

Very strange behavior....am I somehow not closing the connection
properly?  
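
For what it's worth, the same batch can also be pushed through the plain
SolrServer add/commit calls in smaller chunks, which I may try next (a
rough sketch; the batch size is an arbitrary example):

      final int BATCH_SIZE = 50;  // arbitrary illustrative value
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (SolrInputDocument doc : docs) {
          batch.add(doc);
          if (batch.size() >= BATCH_SIZE) {
              server.add(batch);   // SolrServer.add(Collection) sends one update request
              batch.clear();
          }
      }
      if (!batch.isEmpty()) {
          server.add(batch);
      }
      server.commit();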

 
-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Wednesday, September 19, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:

  getConnectionManager().getParams().setSendBufferSize( big );
  getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html




Daley, Kristopher M. wrote:
> I tried 10000 and 60000, same result.
> 
> -----Original Message-----
> From: Ryan McKinley [mailto:ryantxu@gmail.com] 
> Sent: Wednesday, September 19, 2007 11:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
> 
> Daley, Kristopher M. wrote:
>> I have tried changing those settings, for example, as:
>>
>> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
>> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
>> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
>> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
>>
>> However, still no luck.  
>>
> 
> Have you tried anything larger than 60?  60ms is not long...
> 
> try 10000 (10s) and see if it works.
> 
> 


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by Ryan McKinley <ry...@gmail.com>.
I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:

  getConnectionManager().getParams().setSendBufferSize( big );
  getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html
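
Something like this, assuming you go through the underlying
commons-httpclient (getHttpClient() on CommonsHttpSolrServer,
getHttpConnectionManager() on the client; the 1 MB figure is only an
example):

   CommonsHttpSolrServer solr = new CommonsHttpSolrServer(solrPostUrl);
   HttpClient client = solr.getHttpClient();   // underlying commons-httpclient 3.x client
   HttpConnectionManagerParams params =
       client.getHttpConnectionManager().getParams();
   params.setSendBufferSize(1024 * 1024);      // 1 MB send buffer
   params.setReceiveBufferSize(1024 * 1024);   // 1 MB receive buffer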




Daley, Kristopher M. wrote:
> I tried 10000 and 60000, same result.
> 
> -----Original Message-----
> From: Ryan McKinley [mailto:ryantxu@gmail.com] 
> Sent: Wednesday, September 19, 2007 11:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
> 
> Daley, Kristopher M. wrote:
>> I have tried changing those settings, for example, as:
>>
>> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
>> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
>> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
>> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
>>
>> However, still no luck.  
>>
> 
> Have you tried anything larger than 60?  60ms is not long...
> 
> try 10000 (10s) and see if it works.
> 
> 


RE: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by "Daley, Kristopher M." <da...@ornl.gov>.
I tried 10000 and 60000, same result.

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Wednesday, September 19, 2007 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Daley, Kristopher M. wrote:
> I have tried changing those settings, for example, as:
> 
> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
> 
> However, still no luck.  
> 

Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by Ryan McKinley <ry...@gmail.com>.
Daley, Kristopher M. wrote:
> I have tried changing those settings, for example, as:
> 
> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
> 
> However, still no luck.  
> 

Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.
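
Concretely, since those setters take milliseconds, something along these
lines (a sketch; the socket read timeout goes through the underlying
HttpClient and assumes getHttpClient() is available):

   CommonsHttpSolrServer server = new CommonsHttpSolrServer(solrPostUrl);
   server.setConnectionTimeout(10000);                       // 10 s to open the connection (ms)
   server.getHttpClient().getParams().setSoTimeout(60000);   // 60 s socket read timeout (SO_TIMEOUT)
   server.setDefaultMaxConnectionsPerHost(100);
   server.setMaxTotalConnections(100);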


RE: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by "Daley, Kristopher M." <da...@ornl.gov>.
I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  

I took the SimplePostTool.java file from the wiki, changed the URL,
compiled it, and ran it on the XML produced by:
UpdateRequest ur = new UpdateRequest();
ur.add(addDoc);
String xml = ur.getXML();

This works.  It seems that it must be a communication setting, but I'm
stumped.  
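
For completeness, the XML was dumped to a file for the post tool roughly
like this (the file name is arbitrary):

   Writer out = new OutputStreamWriter(new FileOutputStream("add.xml"), "UTF-8");
   out.write(xml);    // xml is the string returned by ur.getXML() above
   out.close();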

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Wednesday, September 19, 2007 10:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

> 
> I have had this and other files index correctly using a different
> combination of Tomcat/Solr versions without any problem (using similar
> code; I rewrote it because I thought it would be better to use Solrj).
> I get the same error whether I use a simple StringBuilder to create the
> add manually or whether I use Solrj.  I have manually encoded each field
> before passing it to the add function as well, so I don't believe it
> is a content problem.  I have tried to change every setting in Tomcat
> and Solr that I can think of, but I'm fairly new to both of them.
> 

So it works if you build an XML file with the same content and send it 
to the server using the example post.sh/post.jar tool?

Have you tried messing with the connection settings?
  SolrServer server = new CommonsHttpSolrServer( url );
   ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
   ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
   ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...


ryan

Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Posted by Ryan McKinley <ry...@gmail.com>.
> 
> I have had this and other files index correctly using a different
> combination of Tomcat/Solr versions without any problem (using similar
> code; I rewrote it because I thought it would be better to use Solrj).
> I get the same error whether I use a simple StringBuilder to create the
> add manually or whether I use Solrj.  I have manually encoded each field
> before passing it to the add function as well, so I don't believe it
> is a content problem.  I have tried to change every setting in Tomcat
> and Solr that I can think of, but I'm fairly new to both of them.
> 

So it works if you build an XML file with the same content and send it 
to the server using the example post.sh/post.jar tool?

Have you tried messing with the connection settings?
  SolrServer server = new CommonsHttpSolrServer( url );
   ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
   ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
   ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...


ryan