Posted to solr-user@lucene.apache.org by Giovanni Fernandez-Kincade <gf...@capitaliq.com> on 2010/03/19 00:19:15 UTC

stream.url Contention

I recently switched from posting files (PDFs in this case) directly to the Extract handler to using the stream.url parameter instead. Since then I've noticed a huge amount of contention around opening URL connections, with threads blocked like this:

http-8080-Processor36 [BLOCKED] CPU time: 0:47
sun.net.www.protocol.file.Handler.openConnection(URL)
java.net.URL.openConnection()
sun.net.www.protocol.jar.JarURLConnection.<init>(URL, Handler)
sun.net.www.protocol.jar.Handler.openConnection(URL)
java.net.URL.openConnection()
java.net.URL.openStream()
java.lang.ClassLoader.getResourceAsStream(String)
org.pdfbox.util.ResourceLoader.loadResource(String)
org.pdfbox.util.ResourceLoader.loadProperties(String)
org.pdfbox.util.PDFTextStripper.<init>()
org.apache.tika.parser.pdf.PDF2XHTML.<init>(ContentHandler, Metadata)
org.apache.tika.parser.pdf.PDF2XHTML.process(PDDocument, ContentHandler, Metadata)
org.apache.tika.parser.pdf.PDFParser.parse(InputStream, ContentHandler, Metadata)
org.apache.tika.parser.CompositeParser.parse(InputStream, ContentHandler, Metadata)
org.apache.tika.parser.AutoDetectParser.parse(InputStream, ContentHandler, Metadata)
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(SolrQueryRequest, SolrQueryResponse, ContentStream)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
java.lang.Thread.run()

This seems to be a significant bottleneck, even when running only a handful of threads. Has anyone else run into this? Any ideas on how to reduce the blocking?
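
For reference, here is a rough sketch of the kind of request I mean by "using the stream.url parameter". The base URL, the /update/extract handler path, the literal.id value, and the file path are all placeholders assumed for illustration, not my actual setup:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class StreamUrlRequest {

    // Builds an extract request that asks Solr to fetch the document itself
    // via stream.url, rather than receiving the file in the POST body.
    // Assumption: the Extract handler is mapped to /update/extract.
    static String buildExtractUrl(String solrBase, String docId, String fileUrl)
            throws UnsupportedEncodingException {
        return solrBase + "/update/extract"
                + "?literal.id=" + URLEncoder.encode(docId, "UTF-8")
                + "&stream.url=" + URLEncoder.encode(fileUrl, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical values; port 8080 only mirrors the Tomcat thread name above.
        System.out.println(buildExtractUrl(
                "http://localhost:8080/solr", "doc1", "file:///data/pdfs/report.pdf"));
    }
}
```

Each indexing thread issues one such request per document, and remote streaming has to be enabled in solrconfig.xml for Solr to accept stream.url.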

Thanks,
Gio.