Posted to solr-user@lucene.apache.org by Yossi Nachum <na...@gmail.com> on 2013/09/17 15:47:30 UTC
check which file/document cause solr to work hard
Hi,
I am trying to index the files on my Windows PC with ManifoldCF 1.3 and
Solr 4.4.
A few minutes after I start the crawler job, the Tomcat process
constantly consumes 100% of one CPU (I have two CPUs).
I checked the thread dump in the Solr admin UI and saw that the following
thread takes the most CPU/user time:
"
http-8080-3 (32)
- java.io.FileInputStream.readBytes(Native Method)
- java.io.FileInputStream.read(FileInputStream.java:236)
- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
- java.io.FilterInputStream.read(FilterInputStream.java:133)
- org.apache.tika.io.TailStream.read(TailStream.java:117)
- org.apache.tika.io.TailStream.skip(TailStream.java:140)
- org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
- org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
- org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
- org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
- org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
- org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
- org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
- org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
- org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
- org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
- org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
- org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
- org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
- org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
- org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
- org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- java.lang.Thread.run(Thread.java:679)
"
How can I check which file causes Tika to work so hard?
I don't see anything in the log files, and I am stuck.
Thanks,
Yossi
Re: check which file/document cause solr to work hard
Posted by Erick Erickson <er...@gmail.com>.
You can always send the files one at a time to the ExtractingRequestHandler:
http://wiki.apache.org/solr/ExtractingRequestHandler
Best,
Erick
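
A minimal sketch of the one-file-at-a-time approach suggested above, in
Python. It is an illustration under assumptions, not a tested recipe for this
setup: the URL is a guess (port 8080 comes from the "http-8080-3" thread name
in the dump, and /solr/update/extract assumes the handler is mapped at that
path with no core name), and the helper names post_to_solr and time_files are
hypothetical. The extractOnly and resource.name request parameters belong to
the ExtractingRequestHandler; with extractOnly=true Solr runs Tika on the file
but does not index it. Because the thread dump ends in Tika's Mp3Parser,
trying the .mp3 files first would be a reasonable way to narrow the search.

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint: adjust host, port, and core path to your installation.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"


def post_to_solr(path, timeout=60):
    """Send one file to the ExtractingRequestHandler without indexing it."""
    params = urllib.parse.urlencode({
        "extractOnly": "true",   # run Tika only; do not add a document
        "resource.name": path,   # file-name hint for Tika's type detection
        "wt": "json",
    })
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        SOLR_EXTRACT_URL + "?" + params,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    # The timeout makes a stuck parse show up as an error on the client side.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def time_files(paths, send=post_to_solr):
    """Return (path, seconds, error) for each file, slowest first."""
    results = []
    for path in paths:
        start = time.monotonic()
        error = None
        try:
            send(path)
        except Exception as exc:  # timeouts and HTTP errors are findings too
            error = repr(exc)
        results.append((path, time.monotonic() - start, error))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling time_files(glob.glob("C:/music/**/*.mp3", recursive=True)) (the path
is hypothetical, and glob must be imported) and printing the first few entries
should point at the files that are slow or that time out. Note that a client
timeout does not stop the parse on the server, so the Tomcat thread may keep
running until it finishes or Tomcat is restarted.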