Posted to user@manifoldcf.apache.org by msaunier <ms...@citya.com> on 2018/07/24 09:58:37 UTC

Out of memory, one file bug i think

Hi again Karl,

I ran into an out-of-memory error today, and I think a single document is causing it. This warning is logged just before the crash:

 

------------------------------------------------------------------------

 

WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception extracting: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) [mcf-agents.jar:?]
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) [mcf-agents.jar:?]
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) [mcf-agents.jar:?]
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) [mcf-agents.jar:?]
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) [mcf-pull-agent.jar:?]
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) [mcf-pull-agent.jar:?]
        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) [mcf-jcifs-connector.jar:?]
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        ... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]
        at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]
        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        ... 12 more

 

I think a single file is responsible, because the memory allocation behaves strangely: within one second, ManifoldCF (or Tika) allocates an extra 6 GB of RAM.

How can I find the file?

Thanks,

Maxence


RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
How can I use the history to determine which document is causing the error?

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, 26 July 2018 17:19
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

The way it works in the JCIFS connector is that files that aren't within the specification are removed from the list of files being processed.  If a file is already being processed, however, it is just retried.  So changing this property to make an out-of-memory condition go away is not going to work if you've already got a problem document being processed.

You can restart the job, and that will make it work.  Or you can add the transformation connection instead.

 

FWIW, you could verify whether this was working properly if your simple history was enabled.  Without that, you really can't.

 

Karl

 

 

On Thu, Jul 26, 2018 at 11:09 AM msaunier <msaunier@citya.com> wrote:

In the repository connection. I set the maximum document size to 20971520.

 

Maxence
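
The value 20971520 quoted above corresponds to the 20 MB limit mentioned elsewhere in this thread if the field is read as a byte count. That unit is an assumption here, not something the thread confirms. A small check of the arithmetic:

```python
# Assumption: the "max document size" field is expressed in bytes.
MIB = 1024 * 1024                   # bytes in one mebibyte
MAX_DOCUMENT_SIZE_BYTES = 20 * MIB  # 20 MiB as a byte count

print(MAX_DOCUMENT_SIZE_BYTES)      # prints 20971520
```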

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, 26 July 2018 17:07
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

How are you limiting content size?  Is this in the repository connection, or in an Allowed Documents transformation connection?

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:58 AM msaunier <msaunier@citya.com> wrote:

I have limited documents to 20 MB each, and I am still getting a Java out-of-memory error.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, 26 July 2018 16:23
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

I believe there's also a content length tab in the Windows Share connector, if you're using that.

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <daddywri@gmail.com> wrote:

The ContentLimiter truncates documents.  That's not what you want.

 

Use the Allowed Documents transformer.
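
The difference between the two approaches can be sketched in a few lines. This is a hypothetical illustration, not ManifoldCF code: the function names `truncate` and `allow` and the constant `MAX_BYTES` are made up for the example, and the limit is assumed to be a byte count.

```python
# Hypothetical illustration only; this is not ManifoldCF code.
from typing import Optional

MAX_BYTES = 20 * 1024 * 1024  # the 20 MB limit discussed in this thread, assumed to be bytes


def truncate(content: bytes, limit: int = MAX_BYTES) -> bytes:
    """Content-limiter style: always pass the document on, cut to `limit` bytes."""
    return content[:limit]


def allow(content: bytes, limit: int = MAX_BYTES) -> Optional[bytes]:
    """Allowed-documents style: pass the document on unchanged, or drop it entirely."""
    return content if len(content) <= limit else None
```

With truncation, an oversized file still reaches the next pipeline stage, only as an incomplete file. With filtering, it never reaches that stage at all.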

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaunier@citya.com> wrote:

I have added a Content Limiter transformation before the Tika extractor. It is very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, 25 July 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document is doing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

I have installed the snapshot, and now I am flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, 25 July 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
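
Moving the old jars aside before copying in the new ones could be scripted roughly as follows. This is only a sketch: the helper name `move_old_poi_jars`, the `poi*.jar` file-name pattern, and the backup directory are assumptions for illustration, so check the actual contents of connector-common-lib before relying on it.

```python
import shutil
from pathlib import Path


def move_old_poi_jars(lib_dir: Path, backup_dir: Path) -> list:
    """Move every poi*.jar out of lib_dir into backup_dir; return the moved file names."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Assumption: all POI jars have file names starting with "poi".
    for jar in sorted(lib_dir.glob("poi*.jar")):
        shutil.move(str(jar), str(backup_dir / jar.name))
        moved.append(jar.name)
    return moved
```

It would be called with the path to connector-common-lib and any directory outside the classpath, with ManifoldCF stopped, and the jars from the new POI distribution copied in afterwards.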

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

Okay. So today, I'm going to force ManifoldCF to keep running so that only the problematic documents are left behind.

In the future, could these errors be ignored? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, 24 July 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, 24 July 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

With only connector debugging enabled, I get this output:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled the history to improve performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, 24 July 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
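
As a sketch, assuming the global properties live in an XML properties file using ManifoldCF's usual `<property name="..." value="..."/>` form, the setting would look like this. Which file holds it depends on the deployment (single-process or ZooKeeper-based), so treat the placement as an assumption:

```xml
<!-- Assumed placement: inside the <configuration> element of the global properties file -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```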

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down


RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.

 

POSTGRESQL history table result and distinct

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 17:19
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

The way it works in the JCIFS connector is that files that aren't within the specification are removed from the list of files being processed.  If a file is already being processed, however, it is just retried.  So changing this property to make an out-of-memory condition go away is not going to work if you've already got a problem document being processed.

You can restart the job, and that will make it work.  Or you can add the transformation connection instead.

 

FWIW, you could verify whether this was working properly if your simple history was enabled.  Without that, you really can't.

 

Karl

 

 

On Thu, Jul 26, 2018 at 11:09 AM msaunier <msaunier@citya.com> wrote:

In the repository connection. I have set the max document size to 20971520.

 

Maxence

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 17:07
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

How are you limiting content size?  Is this in the repository connection, or in an Allowed Documents transformation connection?

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:58 AM msaunier <msaunier@citya.com> wrote:

I have set a limit of 20 MB per document, and I am again getting a Java out-of-memory error.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 16:23
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

I believe there's also a content length tab in the Windows Share connector, if you're using that.

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <daddywri@gmail.com> wrote:

The ContentLimiter truncates documents.  That's not what you want.

 

Use the Allowed Documents transformer.
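
To make the distinction concrete, here is a minimal Python sketch of the two behaviors. It is not ManifoldCF code: the function names `limit_content` and `allow_document` and the 20971520-byte limit are assumptions chosen purely for illustration, and the real transformers are configured through the ManifoldCF UI.

```python
from typing import Optional

MAX_BYTES = 20971520  # hypothetical limit: 20 * 1024 * 1024 bytes


def limit_content(content: bytes, max_bytes: int = MAX_BYTES) -> bytes:
    """Truncating behavior: the document is still passed on, cut to max_bytes."""
    return content[:max_bytes]


def allow_document(content: bytes, max_bytes: int = MAX_BYTES) -> Optional[bytes]:
    """Filtering behavior: an oversized document is rejected outright (None)."""
    return content if len(content) <= max_bytes else None


if __name__ == "__main__":
    big = b"x" * (MAX_BYTES + 1)
    # Truncated: a shortened document still reaches the next pipeline stage.
    print(len(limit_content(big)))
    # Filtered: the oversized document never reaches the next pipeline stage.
    print(allow_document(big))
```

With truncation, a cut-off binary file is still handed to the extractor, which may then have to cope with an incomplete file; with filtering, the extractor never sees the oversized document at all.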

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaunier@citya.com> wrote:

I have added a Content Limiter transformation before the Tika extractor. It is very, very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was doing that.  I suspect it is very large already, and for some reason it cannot be streamed.
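
The stack traces posted in this thread give a hint about the document type even without the file name. As a rough sketch (a hypothetical helper, not part of ManifoldCF or Tika), the following Python scans log text for OutOfMemoryError traces and lists the `org.apache.tika.parser` frames inside them. The sample text is abridged from the log in this thread; the function name and the regular expression are assumptions.

```python
import re

# Abridged sample, copied from the traces posted in this thread.
SAMPLE_LOG = """\
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:3308)
        at java.util.BitSet.ensureCapacity(BitSet.java:337)
        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.HashMap.resize(HashMap.java:704)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
"""

# Matches a stack frame line and captures the fully qualified method name.
FRAME = re.compile(r"^\s+at\s+([\w.$<>]+)\(")


def tika_frames_in_oom_traces(log_text):
    """Return the org.apache.tika.parser frames found inside OutOfMemoryError traces."""
    hits = []
    in_oom = False
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank lines (common in pasted mail) do not end a trace
        if "java.lang.OutOfMemoryError" in line:
            in_oom = True
            continue
        match = FRAME.match(line)
        if match is None:
            in_oom = False  # any other non-frame line ends the current trace
            continue
        if in_oom and match.group(1).startswith("org.apache.tika.parser."):
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    for frame in tika_frames_in_oom_traces(SAMPLE_LOG):
        print(frame)
```

On the sample above this reports a frame from `XSSFExcelExtractorDecorator`, Tika's extractor for OOXML Excel files, which suggests (but does not prove) that the problem document may be a large .xlsx spreadsheet.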

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot, and I am now being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
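
As a rough illustration of the "move them somewhere else first" step, here is a small Python sketch. The helper name `move_old_poi_jars`, the `poi*.jar` file pattern, and the directory names in the usage note are assumptions, not something taken from the ManifoldCF distribution; check the actual jar names in your installation before running anything like this.

```python
import shutil
from pathlib import Path


def move_old_poi_jars(lib_dir, backup_dir, pattern="poi*.jar"):
    """Move jars matching pattern out of lib_dir into backup_dir; return moved names."""
    lib, backup = Path(lib_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for jar in sorted(lib.glob(pattern)):
        # Move rather than delete, so the old jars can be restored if needed.
        shutil.move(str(jar), str(backup / jar.name))
        moved.append(jar.name)
    return moved
```

A call such as `move_old_poi_jars("connector-common-lib", "old-poi-jars")` (both paths hypothetical, relative to the installation directory) would then leave the folder ready for the new jars, with the agents process stopped while the files are swapped.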

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could these errors be ignored? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

In another crawl I extract images with the same parameters and have no problems with them. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this output:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war} <mailto:o.e.j.w.WebAppContext@44d52de2%7b/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE%7d%7b/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war%7d> 

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled the history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
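
As a rough sketch of that setting, assuming the usual property syntax of ManifoldCF's XML properties files (which file holds the global properties, and whether it must be reloaded into ZooKeeper afterwards, depends on the deployment and is an assumption here):

```xml
<!-- Assumed placement: inside the <configuration> element of the ManifoldCF properties file -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```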

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down


Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
The way it works in the JCIFS connector is that files that aren't within
the specification are removed from the list of files being processed.  If a
file is already being processed, however, it is just retried.  So changing
this property to make an out-of-memory condition go away is not going to
work if you've already got a problem document being processed.
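
To make the size check concrete, here is a minimal sketch of the "removed from the list" half of that behavior. It is an illustration only: the class and method names are hypothetical and are not ManifoldCF or JCIFS APIs, and treating the limit as inclusive is an assumption. It does not model the retry of a document that is already being processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not part of ManifoldCF or JCIFS.
class MaxSizeFilterSketch {
    // 20 MiB expressed in bytes: 20 * 1024 * 1024 = 20971520, the value quoted in this thread.
    static final long MAX_DOCUMENT_BYTES = 20L * 1024L * 1024L;

    // Assumption: a document is within the specification when its length does not exceed the limit.
    static boolean withinLimit(long lengthBytes) {
        return lengthBytes <= MAX_DOCUMENT_BYTES;
    }

    // Drops oversized entries from a list of candidate document lengths that are not yet queued.
    static List<Long> filterCandidates(List<Long> candidateLengths) {
        List<Long> kept = new ArrayList<>();
        for (long length : candidateLengths) {
            if (withinLimit(length)) {
                kept.add(length);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One small file, one exactly at the limit, one a single byte over it.
        List<Long> candidates = Arrays.asList(1024L, 20971520L, 20971521L);
        System.out.println(filterCandidates(candidates));
    }
}
```

A check of this kind only affects documents that have not been picked up yet, which is why a document already in flight keeps being retried until the job is restarted.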

You can restart the job, and that will make it work.  Or you can add the
transformation connection instead.

FWIW, you could verify whether this was working properly if your simple
history were enabled.  Without that, you really can't.

Karl


On Thu, Jul 26, 2018 at 11:09 AM msaunier <ms...@citya.com> wrote:

> On the repository connection. I have set the max document size to "20971520".
>
>
>
> Maxence
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Thursday, July 26, 2018 17:07
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> How are you limiting content size?  Is this in the repository connection,
> or in an Allowed Documents transformation connection?
>
>
>
> Karl
>
>
>
>
>
> On Thu, Jul 26, 2018 at 10:58 AM msaunier <ms...@citya.com> wrote:
>
> I have set a limit of 20 MB per document and I still get a Java out-of-memory error.
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Thursday, July 26, 2018 16:23
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> I believe there's also a content length tab in the Windows Share
> connector, if you're using that.
>
>
>
> Karl
>
>
>
>
>
> On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <da...@gmail.com> wrote:
>
> The ContentLimiter truncates documents.  That's not what you want.
>
>
>
> Use the Allowed Documents transformer.
>
>
>
> Karl
>
>
>
>
>
> On Thu, Jul 26, 2018 at 10:06 AM msaunier <ms...@citya.com> wrote:
>
> I have added a Content Limiter transformation before the Tika extractor. It’s
> very, very slow now. Is that normal?
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:15
> *To:* user@manifoldcf.apache.org
> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> It looks like you are still running out of memory.  I would love to know
> which document was doing that.  I suspect it is very large already,
> and for some reason it cannot be streamed.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:
>
> Hi Maxence,
>
>
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot and I’m being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I have no
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With just the connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file processed with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
>

RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
In the repository connection. I have set the max document size to "20971520".

 

Maxence

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 17:07
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

How are you limiting content size?  Is this in the repository connection, or in an Allowed Documents transformation connection?

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:58 AM msaunier <msaunier@citya.com> wrote:

I have limited documents to 20 MB each, and I still get a Java out-of-memory error.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 16:23
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

I believe there's also a content length tab in the Windows Share connector, if you're using that.

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <daddywri@gmail.com> wrote:

The ContentLimiter truncates documents.  That's not what you want.

 

Use the Allowed Documents transformer.

 

Karl
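
The difference Karl describes between truncating a document and filtering it out can be sketched as follows. This is a minimal illustration, not ManifoldCF code: the class and method names are hypothetical, and the limit is simply the 20971520 bytes mentioned elsewhere in this thread. Truncation still hands a (shortened) document to the next pipeline stage, while filtering decides from the document length whether it is passed on at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class SizeLimitSketch {
    // Assumed limit: the 20971520 bytes (20 MiB) quoted in the thread.
    static final long MAX_BYTES = 20971520L;

    /** Truncation: copy at most maxBytes bytes; anything beyond that is dropped. */
    static byte[] truncate(InputStream in, long maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                break; // end of stream reached before the limit
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    /** Filtering: the document is either passed on whole or not passed on at all. */
    static boolean accept(long documentLength, long maxBytes) {
        return documentLength <= maxBytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = new byte[100];
        // With a 40-byte limit, truncation still yields a 40-byte document...
        System.out.println(truncate(new ByteArrayInputStream(doc), 40).length);
        // ...whereas filtering rejects the 100-byte document outright.
        System.out.println(accept(doc.length, 40));
    }
}
```

In the truncating case the downstream extractor still receives bytes to parse, which is why it does not serve as a way of skipping oversized files.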

 

 

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaunier@citya.com> wrote:

I have added a Content Limiter transformation before the Tika extractor. It is very, very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was causing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am being flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl
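
Because the errors in this thread are heap exhaustion ("GC overhead limit exceeded"), it can help to confirm how much heap a JVM has actually been given. The following is a minimal sketch using only standard JDK calls; the class name HeapReport is hypothetical and nothing here is part of ManifoldCF. The numbers are only meaningful when it runs with the same heap options as the agents process.

```java
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in mebibytes. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```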

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, and in production that is not good behavior.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with images. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated history to improve performance. So, can I find the last processed file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
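
As a sketch of what that setting might look like, assuming the global properties are kept in the usual ManifoldCF properties.xml file (the file location depends on the installation and is not stated in this thread):

```xml
<!-- Enable debug logging for connectors, using the property name and value given above. -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```

The agents process would presumably need to be restarted or to reload its configuration for the change to take effect; that detail is an assumption here.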

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.conne


Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
How are you limiting content size?  Is this in the repository connection,
or in an Allowed Documents transformation connection?

Karl
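
The quoted messages below distinguish between truncating an oversized document (what the ContentLimiter is said to do) and not letting it through at all (the Allowed Documents approach). A minimal Java sketch of that distinction follows. The class and method names are hypothetical and are not ManifoldCF APIs, and treating "not allowed" as "skipped entirely" is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.Optional;

public class SizeLimitSketch {

    /** Truncation: the document is still passed on, cut to at most limit bytes. */
    static byte[] truncate(byte[] content, int limit) {
        return content.length <= limit ? content : Arrays.copyOf(content, limit);
    }

    /** Rejection: the document is passed on unchanged, or not passed on at all. */
    static Optional<byte[]> allowIfWithin(byte[] content, int limit) {
        return content.length <= limit ? Optional.of(content) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] doc = new byte[100];
        // A truncated document still reaches the next pipeline stage (here, 10 bytes of it).
        System.out.println(truncate(doc, 10).length);           // prints 10
        // A rejected document never reaches the next stage.
        System.out.println(allowIfWithin(doc, 10).isPresent()); // prints false
        System.out.println(allowIfWithin(doc, 1000).isPresent()); // prints true
    }
}
```

In this sketch a truncated document is still handed to whatever comes next, possibly as a damaged partial file, whereas a rejected one never reaches the extractor.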


On Thu, Jul 26, 2018 at 10:58 AM msaunier <ms...@citya.com> wrote:

> I have limited documents to 20 MB each and I am again getting a Java out-of-memory error.
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Thursday, July 26, 2018 16:23
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> I believe there's also a content length tab in the Windows Share
> connector, if you're using that.
>
>
>
> Karl
>
>
>
>
>
> On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <da...@gmail.com> wrote:
>
> The ContentLimiter truncates documents.  That's not what you want.
>
>
>
> Use the Allowed Documents transformer.
>
>
>
> Karl
>
>
>
>
>
> On Thu, Jul 26, 2018 at 10:06 AM msaunier <ms...@citya.com> wrote:
>
> I have added a Content Limiter transformation before the Tika extractor.
> It is very, very slow now. Is that normal?
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:15
> *To:* user@manifoldcf.apache.org
> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> It looks like you are still running out of memory.  I would love to know
> what document it was that was doing that.  I suspect it is very large already,
> and for some reason it cannot be streamed.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:
>
> Hi Maxence,
>
>
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot and I am being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, and in production that is not great behavior.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I do not have
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With just connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at org.apache.manifoldcf.crawler.conne
>
>

RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
I have set a limit of 20 MB per document, and I am again getting a Java out-of-memory error.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 16:23
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

I believe there's also a content length tab in the Windows Share connector, if you're using that.

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <daddywri@gmail.com> wrote:

The ContentLimiter truncates documents.  That's not what you want.

 

Use the Allowed Documents transformer.

 

Karl
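
To illustrate the distinction drawn in this message between truncating a document and rejecting it by size, here is a minimal sketch. It is not the code of ManifoldCF's Content Limiter or Allowed Documents transformers; the class and method names are hypothetical and the limit is an arbitrary number of bytes. Truncation always hands something to the next pipeline stage, possibly an incomplete file, while filtering hands over either the whole document or nothing.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical illustration only; not ManifoldCF code. */
final class SizePolicyDemo {

    /** Truncation: always passes content downstream, cut to at most maxBytes. */
    static byte[] truncate(byte[] content, int maxBytes) {
        return content.length <= maxBytes ? content : Arrays.copyOf(content, maxBytes);
    }

    /** Filtering: passes the whole document downstream, or nothing at all. */
    static Optional<byte[]> allowIfSmallEnough(byte[] content, int maxBytes) {
        return content.length <= maxBytes ? Optional.of(content) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] doc = new byte[30]; // stand-in for a document that exceeds the limit
        int limit = 20;
        System.out.println("truncated length: " + truncate(doc, limit).length);
        System.out.println("filter passes document: " + allowIfSmallEnough(doc, limit).isPresent());
    }
}
```

With truncation, a downstream extractor still has to parse whatever is left of the oversized document; with filtering it never sees the document.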

 

 

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaunier@citya.com> wrote:

I have added a Content Limiter transformation before the Tika extractor. It is very, very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was doing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,
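
The trace in this message ends in a NoClassDefFoundError, which the JVM raises when a class that code was compiled against cannot be loaded at run time. A minimal sketch of probing for a class by name, using only the JDK, might look like the following. `ClassProbe` and `isLoadable` are hypothetical names. Inside ManifoldCF the connector jars may be loaded by a different class loader than the one this probe uses, so the sketch illustrates the mechanism rather than diagnosing this installation.

```java
/** Hypothetical helper; checks whether a class name resolves on the current class path. */
final class ClassProbe {

    /** Returns true if the named class can be loaded by this class's loader, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Covers a missing class as well as NoClassDefFoundError (a LinkageError subclass).
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message; pass another name as the first argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.compress.utils.InputStreamStatistics";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with the same jars on the class path as the agents process gives a rough indication of whether the class named in the error is present.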

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl
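
The step of moving the old POI jars out of the way before dropping in the new ones can be sketched as follows. This is only an illustration under stated assumptions: `PoiJarMover` is a hypothetical name, the glob `poi*.jar` is a guess at how the jars are named, and both directory paths must be supplied by the caller (the message above suggests connector-common-lib as the lib folder). Stop ManifoldCF before moving any jars.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; moves jars matching poi*.jar from a lib folder into a backup folder. */
final class PoiJarMover {

    /** Moves every file matching poi*.jar from libDir into backupDir and returns the sorted file names. */
    static List<String> moveOldPoiJars(Path libDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);

        // Collect the matches first so the directory is not modified while it is being listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "poi*.jar")) {
            for (Path jar : stream) {
                matches.add(jar);
            }
        }

        List<String> moved = new ArrayList<>();
        for (Path jar : matches) {
            Files.move(jar, backupDir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            moved.add(jar.getFileName().toString());
        }
        Collections.sort(moved);
        return moved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PoiJarMover <lib-dir> <backup-dir>");
            return;
        }
        for (String name : moveOldPoiJars(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("moved " + name);
        }
    }
}
```

Moving rather than deleting keeps the old jars available if the nightly build has to be rolled back.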

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and testing again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

"java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this output:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In the global properties, set org.apache.manifoldcf.connectors to the value DEBUG.

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventT


RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Yes, I have found it. Thank you.

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 16:23
To: user@manifoldcf.apache.org
Subject: Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

I believe there's also a content length tab in the Windows Share connector, if you're using that.

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <daddywri@gmail.com> wrote:

The ContentLimiter truncates documents.  That's not what you want.

 

Use the Allowed Documents transformer.
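
The difference between truncating and filtering can be shown with a toy example. This is not how either ManifoldCF transformer is implemented; LengthPolicy and its method names are invented for illustration. Truncation still hands a (shortened) document to the next stage, and an OOXML file is a zip archive, so a cut-off copy generally cannot be parsed. Filtering rejects the oversized document before any parser sees it.

```java
import java.util.Arrays;

class LengthPolicy {
    // Truncation: the document is kept but cut to at most maxLength bytes.
    static byte[] truncate(byte[] content, int maxLength) {
        return content.length <= maxLength ? content : Arrays.copyOf(content, maxLength);
    }

    // Filtering: the document is either passed on whole or rejected entirely.
    static boolean isAllowed(long contentLength, long maxLength) {
        return contentLength <= maxLength;
    }

    public static void main(String[] args) {
        byte[] document = new byte[10];  // placeholder content
        int limit = 4;                   // hypothetical size limit in bytes
        System.out.println("truncated length: " + truncate(document, limit).length);
        System.out.println("allowed: " + isAllowed(document.length, limit));
    }
}
```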

 

Karl

 

 

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaunier@citya.com> wrote:

I have added a Content Limiter transformation before the Tika extractor. It is very, very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was doing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot, and now I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
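
When swapping jars like this, it can help to check which jars in a folder actually contain a given class, for example the one named in a NoClassDefFoundError. The following Python sketch is only an illustration under stated assumptions: the function name is hypothetical, and it relies solely on the fact that jar files are zip archives whose entries mirror the class's package path.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(lib_dir: str, class_name: str) -> List[str]:
    """Return the names of jars directly under lib_dir that contain class_name."""
    # org.example.Foo is stored in a jar as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar (zip) archives.
            continue
    return matches
```

For example, `jars_containing("connector-common-lib", "org.apache.commons.compress.utils.InputStreamStatistics")` would list the jars in that folder providing the class; the relative folder path here is an assumption about where the command is run from. An empty result means no jar in the folder provides the class, and more than one result points to duplicate copies that may conflict.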

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the problem documents are left behind.

In the future, could I ignore these errors? They make the application crash, and in production that is not great behavior.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and I have no problems with images. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only the connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to gain performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
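
As a configuration sketch, and assuming the setting is kept in ManifoldCF's properties.xml (this thread does not say where the global properties of this particular deployment are stored), the property named above would look roughly like this:

```xml
<!-- Sketch only: turns on connector-level debug logging. -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```

Any other, noisier debug loggers that were switched on earlier would be removed or set back at the same time.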

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventT


Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
I believe there's also a content length tab in the Windows Share connector,
if you're using that.

Karl
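
As a rough illustration of the distinction made in the quoted reply below (a content limiter truncates a document, while a length-based filter skips it), here is a minimal standalone Java sketch. It is not ManifoldCF code: the class LengthGate, its methods, and the 512-byte limit are hypothetical names and values chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, not part of ManifoldCF or Tika.
final class LengthGate {

    // "Reject" policy: a document whose declared length exceeds the
    // limit is skipped entirely, so no parser ever sees it.
    static boolean accept(long declaredLength, long maxBytes) {
        return declaredLength >= 0 && declaredLength <= maxBytes;
    }

    // "Truncate" policy: read at most maxBytes and drop the rest.
    // The result may be an incomplete file that a parser cannot read.
    static byte[] truncate(InputStream in, int maxBytes) {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        try {
            while (total < maxBytes) {
                int n = in.read(buf, total, maxBytes - total);
                if (n < 0) {
                    break; // end of stream before the limit was reached
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) {
        byte[] doc = new byte[1000]; // stand-in for a 1000-byte document
        System.out.println(accept(doc.length, 512)); // prints "false"
        System.out.println(truncate(new ByteArrayInputStream(doc), 512).length); // prints "512"
    }
}
```

With the reject policy the oversized document never reaches the extractor. With the truncate policy the extractor still receives the first 512 bytes, which for a container format such as a zipped Office file is unlikely to be a valid document.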


On Thu, Jul 26, 2018 at 10:19 AM Karl Wright <da...@gmail.com> wrote:

> The ContentLimiter truncates documents.  That's not what you want.
>
> Use the Allowed Documents transformer.
>
> Karl
>
>
> On Thu, Jul 26, 2018 at 10:06 AM msaunier <ms...@citya.com> wrote:
>
>> I have added a Content Limiter transformation before the Tika extractor.
>> It is very slow now. Is that normal?
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, 25 July 2018 19:15
>> *To:* user@manifoldcf.apache.org
>> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>>
>>
>>
>> It looks like you are still running out of memory.  I would love to know
>> what document was doing that.  I suspect it is very large already,
>> and for some reason it cannot be streamed.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:
>>
>> Hi Maxence,
>>
>>
>>
>> The second exception is occurring because processing is still occurring
>> while the JVM is shutting down; it can be ignored.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> I have added the snapshot and I am now flooded with this error:
>>
>>
>>
>> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, 25 July 2018 13:12
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> Tomorrow (7/26) the POI project will be delivering a nightly build which
>> should repair the Class Not Found exceptions.  You will need to download it
>> here:
>>
>>
>> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>
>>
>>
>> ... and replace all poi jars with the corresponding ones from the binary
>> distribution.  I believe the poi jars are all in connector-common-lib.  Be
>> sure to delete the old ones (or move them somewhere else) first.
>>
>>
>>
>> I don't know whether this will fix your out of memory problem however.
>> Please let me know what's still not working and I can take it from there.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>>
>> Out of memory errors are fatal, I'm afraid, because they corrupt not only
>> the document in question but all others being processed at the same time.
>> So those cannot be ignored.
>>
>>
>>
>> Tika should ignore documents that it cannot process, however, and that is
>> a great enhancement request for them.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> Okay. So today, I'm going to force ManifoldCF to run so that only the
>> documents are left behind.
>>
>> In the future, could I ignore these errors? They make the application
>> crash, which is not great behavior in production.
>>
>>
>>
>> Thanks
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, 24 July 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> The problem isn't with images in general; it's with certain kinds of
>> images.  There are optional dependencies in Tika for some kinds of images
>> that we cannot include in the MCF distribution because of licensing
>> problems.  I don't know which kinds these are but apparently you are trying
>> to index some of them.
>>
>> You will need to find and download the right jar and put it in the
>> connector-common-lib folder for this to work.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and have no
>> problems with them; they are indexed without errors. Images are necessary
>> for this job. I will try to recreate my job and test.
>>
>>
>>
>> Thanks,
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, 24 July 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> " java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)"
>>
>>
>>
>> This exception is occurring because you are trying to extract content
>> from an image.  In order for this to work you need a jar that isn't
>> supplied with Tika for licensing reasons.  Can you exclude images from your
>> crawl?
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> With only connector debugging enabled, I get this information:
>>
>>
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
>> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970049, negotiated timeout = 40000
>>
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
>> live nodes from ZooKeeper... (0) -> (2)
>>
>> [Thread-269948] INFO
>> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
>> kemp-formation-solr:2181 ready
>>
>> java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)
>>
>>         at java.lang.Class.getConstructor0(Class.java:3082)
>>
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>>
>>         at
>> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>>
>>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>>
>>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>>
>>         at
>> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>>
>>         at
>> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x100000050ae004d, negotiated timeout = 40000
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.resize(HashMap.java:704)
>>
>>         at java.util.HashMap.putVal(HashMap.java:629)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.Arrays.copyOf(Arrays.java:3308)
>>
>>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>>
>>         at java.util.BitSet.expandTo(BitSet.java:352)
>>
>>         at java.util.BitSet.set(BitSet.java:447)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>>         at
>> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004e closed
>>
>> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004e
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004d closed
>>
>> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004d
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004a closed
>>
>> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004a
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004b closed
>>
>> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004b
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970046 closed
>>
>> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970046
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004c closed
>>
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004c
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004c closed
>>
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004c
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970048 closed
>>
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970048
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970049 closed
>>
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970049
>>
>>
>>
>> I have deactivated history to improve performance. So, can I find the
>> last file with an SQL query?
>>
>>
>>
>> Maxence,
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, 24 July 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> You would want to turn on connector debugging INSTEAD of the debugging
>> you've turned on, which is very noisy and not helpful.
>>
>>
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>>
>> With debug:
>>
>>
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044, closing socket
>> connection and attempting reconnect
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>
>>         at
>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired, closing socket connection
>>
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970043
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired, closing socket connection
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>>
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae0049
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x2000000b80d0049, negotiated timeout = 40000
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970045, negotiated timeout = 40000
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.newNode(HashMap.java:1747)
>>
>>         at java.util.HashMap.putVal(HashMap.java:631)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>>
>>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>>
>>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>>
>>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>>
>>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>>
>>         at
>> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>>
>>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d0046 closed
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7538-EventT
>>
>>

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
The ContentLimiter truncates documents.  That's not what you want.

Use the Allowed Documents transformer.

Karl
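
To illustrate the distinction above, here is a minimal sketch in Python. It is
not ManifoldCF code: the function names, the 1 MB limit, and the extension list
are hypothetical, and which criteria the Allowed Documents transformer actually
offers is not stated in this thread. A limiter still hands a cut-off prefix to
the next pipeline stage, whereas a filter accepts or rejects the whole document
before any parsing happens.

```python
import io

MAX_BYTES = 1_000_000  # hypothetical limit, for illustration only


def truncate_content(stream, limit=MAX_BYTES):
    """Limiter-style: pass on at most `limit` bytes and drop the rest."""
    return stream.read(limit)


def allow_document(name, size, limit=MAX_BYTES,
                   extensions=(".docx", ".xlsx", ".pdf")):
    """Filter-style: accept or reject the whole document before parsing."""
    return size <= limit and name.lower().endswith(extensions)


if __name__ == "__main__":
    oversized = io.BytesIO(b"x" * (MAX_BYTES + 10))
    # The parser would still receive a (now incomplete) document.
    print(len(truncate_content(oversized)))
    # The parser would never see this document at all.
    print(allow_document("report.xlsx", MAX_BYTES + 10))
```

A truncated Office file is usually no longer a valid container, so an extractor
placed after a limiter may fail on it; rejecting the document outright avoids
handing the extractor a broken input.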


On Thu, Jul 26, 2018 at 10:06 AM msaunier <ms...@citya.com> wrote:

> I have added a Content Limiter transformation before the Tika extractor.
> It is very slow now. Is that normal?
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:15
> *To:* user@manifoldcf.apache.org
> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> It looks like you are still running out of memory.  I would love to know
> which document was doing that.  I suspect it is very large already,
> and for some reason it cannot be streamed.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:
>
> Hi Maxence,
>
>
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot and I am now flooded with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> problematic documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and have no
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With only connector debugging enabled, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventT
>
>

RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
I have added a Content Limiter transformation before the Tika extractor. It is very, very slow now. Is that normal?

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was doing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl
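
One way to follow up on the suspicion above, that a single very large document is responsible, is to list the biggest files in the crawled tree and test those first. The following is only a sketch, not part of ManifoldCF: it assumes the share is mounted as a local directory on the machine where it runs, and the class name LargestFiles, its methods, and the limit of 20 are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: prints the largest regular files under a directory.
class LargestFiles {

    // Returns up to 'limit' regular files under 'root', largest first.
    static List<Path> largest(Path root, int limit) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .sorted(Comparator.<Path>comparingLong(LargestFiles::sizeOf).reversed())
                       .limit(limit)
                       .collect(Collectors.toList());
        }
    }

    // File size in bytes, or -1 if the size cannot be read.
    static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the locally mounted crawl root.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : largest(root, 20)) {
            System.out.println(sizeOf(p) + "\t" + p);
        }
    }
}
```

Size alone does not prove a file is the culprit, since a small compressed Office file can expand a lot when parsed, but the largest files are a reasonable place to start.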

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,
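
A NoClassDefFoundError like the one in the trace above usually means the named class is missing from the classpath or comes from an older jar. A small standalone check can show whether the class is visible and which jar supplies it. This is only a sketch: the class name is taken from the trace, ClassLocator and locate are hypothetical names, and the result is only meaningful when it is run with the same classpath as the agents process.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class is loaded from, if at all.
class ClassLocator {

    // Returns the jar or directory a class comes from, "(bootstrap)" for
    // JDK classes without a code source, or null if it cannot be loaded.
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.commons.compress.utils.InputStreamStatistics";
        String where = locate(name);
        System.out.println(name + " -> "
            + (where == null ? "NOT FOUND on this classpath" : where));
    }
}
```

If the class is reported as not found, the commons-compress jar on that classpath is probably missing or too old for the POI build in use.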

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and I have no problems with images. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only the connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to improve performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

 


Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

The limit in the solr connection will apply to the extracted data.  The
extracted data, though, cannot be computed until Tika does the extraction.

I suggest introducing a pipeline transformer in front of Tika which limits
the raw input size.

Karl
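
The idea of limiting the raw input size before extraction can be sketched in plain Java. This is only an illustration of the check itself, not ManifoldCF code: the class `RawSizeGate` and its method `accepts` are hypothetical names, and how such a check would be wired into a pipeline transformer is not shown. The 52428800-byte limit and the 189 MB document are the figures quoted in this thread (the 189 MB size is assumed here to mean 189 * 1024 * 1024 bytes).

```java
/**
 * Hypothetical size gate: decides from the raw (pre-extraction) length alone
 * whether a document should be handed on to content extraction.
 * This is not a ManifoldCF class; all names are illustrative.
 */
final class RawSizeGate {
    private final long maxRawBytes;

    RawSizeGate(long maxRawBytes) {
        if (maxRawBytes < 0) {
            throw new IllegalArgumentException("maxRawBytes must be >= 0");
        }
        this.maxRawBytes = maxRawBytes;
    }

    /** Returns true when the raw document is small enough to pass on to extraction. */
    boolean accepts(long rawLengthBytes) {
        return rawLengthBytes >= 0 && rawLengthBytes <= maxRawBytes;
    }

    public static void main(String[] args) {
        // Limit taken from the thread: 52428800 bytes (50 MiB).
        RawSizeGate gate = new RawSizeGate(52_428_800L);

        // Assumed size of the large document mentioned in the thread.
        long bigDoc = 189L * 1024 * 1024;
        long smallDoc = 1024L * 1024;

        System.out.println("189 MB doc accepted: " + gate.accepts(bigDoc));   // false
        System.out.println("1 MB doc accepted: " + gate.accepts(smallDoc));   // true
    }
}
```

The point of checking the raw length first is that the decision is made before any parsing starts, so an oversized file never reaches the extractor at all.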


On Wed, Jul 25, 2018 at 1:40 PM msaunier <ms...@citya.com> wrote:

> The biggest document is a 189 MB doc. But in the Solr connector
> configuration, I have set the limit to 52428800 bytes. Do I need to set the
> limit in every job definition?
>
>
>
> Maxence,
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:15
> *To:* user@manifoldcf.apache.org
> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>
>
>
> It looks like you are still running out of memory.  I would love to know
> which document is doing that.  I suspect it is very large already,
> and for some reason it cannot be streamed.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:
>
> Hi Maxence,
>
>
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot and I am being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, and in production that is not good behavior.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I do not have
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With only the connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
>

RE: ***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
The biggest document is a doc of 189 MB. But in the Solr connector configuration, I have set a limit of 52428800 bytes. Do I need to set this limit in every job definition?
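
For reference, the arithmetic behind the two sizes mentioned above can be sketched as follows. This is only an illustration: it assumes the sizes are counted in units of 1024 * 1024 bytes, and the class and method names (SizeLimitCheck, mibToBytes, exceedsLimit) are hypothetical, not part of ManifoldCF or Solr.

```java
public class SizeLimitCheck {
    // The limit quoted from the Solr connector configuration, in bytes.
    static final long LIMIT_BYTES = 52_428_800L;

    // Convert a size in units of 1024 * 1024 bytes into bytes.
    static long mibToBytes(long mib) {
        return mib * 1024L * 1024L;
    }

    // True if a document of the given size is larger than the limit.
    static boolean exceedsLimit(long sizeBytes) {
        return sizeBytes > LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // The configured limit corresponds to exactly 50 * 1024 * 1024 bytes.
        System.out.println("limit is 50 MiB: " + (LIMIT_BYTES == mibToBytes(50)));
        // A 189 MiB document is well over that limit.
        System.out.println("189 MiB exceeds limit: " + exceedsLimit(mibToBytes(189)));
    }
}
```

This only checks the numbers. It says nothing about which stage of the pipeline enforces the limit, which is the open question here.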



 

Maxence,

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:15
To: user@manifoldcf.apache.org
Subject: ***UNCHECKED*** Re: Out of memory, one file bug i think

 

It looks like you are still running out of memory.  I would love to know which document was doing that.  I suspect it is very large already, and for some reason it cannot be streamed.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored.

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot, and now I am flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
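
One way to confirm that the class named in this error is really absent is a small standalone check, run with the same classpath as the agents process. The following is a minimal sketch: the ClasspathCheck class and its isPresent method are hypothetical helpers, and only the class name is taken from the error above.

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded (without initializing it)
    // by the class loader that loaded this helper.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that is present but cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.commons.compress.utils.InputStreamStatistics";
        System.out.println(name + " present: " + isPresent(name));
    }
}
```

If the check reports the class as missing, the commons-compress jar on the classpath is probably older than the one the new POI jars were built against, but that is an assumption to verify against the jars actually installed.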

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.
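
When chasing this kind of failure it can help to know how much heap the JVM was actually given. A minimal sketch using the standard java.lang.Runtime API follows; the HeapReport class name is illustrative and this is not ManifoldCF code.

```java
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently in use (allocated minus free), in MiB.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Running it with the same JVM options as the agents process shows the heap ceiling those options produce; it does not by itself say which document exhausted the heap.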

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not good behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

In another crawl I extract images with the same parameters and I have no problems with them; they are indexed without errors. Images are necessary for this job. I will try to recreate my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

 


***UNCHECKED*** Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
It looks like you are still running out of memory.  I would love to know
which document is causing that.  I suspect it is very large,
and for some reason it cannot be streamed.

Karl
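
Editorial note: if the crawled share can also be reached as a local or mounted path (an assumption; the thread only shows it being crawled over SMB by the SharedDriveConnector), one rough way to shortlist candidates for "very large" documents is to list the biggest files under the crawl root. The sketch below is a hypothetical helper, not part of ManifoldCF, and file size alone does not prove that a document is the one exhausting the heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of ManifoldCF.
final class LargestFiles {

    // Returns up to limit regular files under root, ordered largest first.
    static List<Path> largest(Path root, int limit) throws IOException {
        Comparator<Path> bySize = Comparator.comparingLong(LargestFiles::sizeOf);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .sorted(bySize.reversed())
                .limit(limit)
                .collect(Collectors.toList());
        }
    }

    // Files.size throws a checked exception, so wrap it for use in the comparator.
    private static long sizeOf(Path path) {
        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The root directory is taken from the first argument; "." is only a placeholder default.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path path : largest(root, 10)) {
            System.out.println(Files.size(path) + "\t" + path);
        }
    }
}
```

The files it reports could then be crawled one at a time in a test job to see whether any of them reproduces the OutOfMemoryError.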


On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <da...@gmail.com> wrote:

> Hi Maxence,
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
> Karl
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
>> Hi Karl,
>>
>>
>>
>> I have added the snapshot and I am being spammed with this error:
>>
>>
>>
>> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 13:12
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> Tomorrow (7/26) the POI project will be delivering a nightly build which
>> should repair the Class Not Found exceptions.  You will need to download it
>> here:
>>
>>
>> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>
>>
>>
>> ... and replace all poi jars with the corresponding ones from the binary
>> distribution.  I believe the poi jars are all in connector-common-lib.  Be
>> sure to delete the old ones (or move them somewhere else) first.
>>
>>
>>
>> I don't know whether this will fix your out of memory problem however.
>> Please let me know what's still not working and I can take it from there.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>>
>> Out of memory errors are fatal, I'm afraid, because they corrupt not only
>> the document in question but all others being processed at the same time.
>> So those cannot be ignored.
>>
>>
>>
>> Tika should ignore documents that it cannot process, however, and that is
>> a great enhancement request for them.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> Okay. So today, I'm going to force ManifoldCF to run so that only the
>> documents are left behind.
>>
>> In the future, could I ignore these errors? They make the application
>> crash, which is not great behavior in production.
>>
>>
>>
>> Thanks
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> The problem isn't with images in general; it's with certain kinds of
>> images.  There are optional dependencies in Tika for some kinds of images
>> that we cannot include in the MCF distribution because of licensing
>> problems.  I don't know which kinds these are but apparently you are trying
>> to index some of them.
>>
>> You will need to find and download the right jar and put it in the
>> connector-common-lib folder for this to work.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and I do not
>> have problems with images. They are indexed without errors. Images are
>> necessary for this job. I will try to recreate my job and test.
>>
>>
>>
>> Thanks,
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> " java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)"
>>
>>
>>
>> This exception is occurring because you are trying to extract content
>> from an image.  In order for this to work you need a jar that isn't
>> supplied with Tika for licensing reasons.  Can you exclude images from your
>> crawl?
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> With only connectors in debug, I get this information:
>>
>>
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
>> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970049, negotiated timeout = 40000
>>
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
>> live nodes from ZooKeeper... (0) -> (2)
>>
>> [Thread-269948] INFO
>> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
>> kemp-formation-solr:2181 ready
>>
>> java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)
>>
>>         at java.lang.Class.getConstructor0(Class.java:3082)
>>
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>>
>>         at
>> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>>
>>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>>
>>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>>
>>         at
>> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>>
>>         at
>> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x100000050ae004d, negotiated timeout = 40000
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.resize(HashMap.java:704)
>>
>>         at java.util.HashMap.putVal(HashMap.java:629)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.Arrays.copyOf(Arrays.java:3308)
>>
>>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>>
>>         at java.util.BitSet.expandTo(BitSet.java:352)
>>
>>         at java.util.BitSet.set(BitSet.java:447)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>>         at
>> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004e closed
>>
>> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004e
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004d closed
>>
>> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004d
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004a closed
>>
>> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004a
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004b closed
>>
>> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004b
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970046 closed
>>
>> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970046
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004c closed
>>
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004c
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004c closed
>>
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004c
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970048 closed
>>
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970048
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970049 closed
>>
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970049
>>
>>
>>
>> I have deactivated history to improve performance. So, can I find the last
>> file with an SQL query?
>>
>>
>>
>> Maxence,
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> You would want to turn on connector debugging INSTEAD of the debugging
>> you've turned on, which is very noisy and not helpful.
>>
>>
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>>
>> With debug:
>>
>>
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044, closing socket
>> connection and attempting reconnect
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>
>>         at
>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired, closing socket connection
>>
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970043
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired, closing socket connection
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>>
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae0049
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x2000000b80d0049, negotiated timeout = 40000
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970045, negotiated timeout = 40000
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.newNode(HashMap.java:1747)
>>
>>         at java.util.HashMap.putVal(HashMap.java:631)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>>
>>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>>
>>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>>
>>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>>
>>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>>
>>         at
>> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>>
>>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d0046 closed
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d0046
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>>
>>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>>
>>         at
>> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>
>>
>>

Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

The second exception occurs because processing is still running while the
JVM is shutting down; it can be ignored.

Karl


On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:

> Hi Karl,
>
>
>
> I have added the snapshot and I am now being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
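
A NoClassDefFoundError like the one above usually suggests that the jar providing the class is missing from the classpath, or is older than what the caller was compiled against. The following is only a minimal sketch for checking that, assuming it is run with the same jars on the classpath as the agents process; the class name ClassPresenceCheck and the method locate are hypothetical names used for illustration and are not part of ManifoldCF, Tika, or POI.

```java
import java.security.CodeSource;

public class ClassPresenceCheck {

    /**
     * Returns the location (jar or directory) a class resolves from,
     * "(bootstrap/JDK)" for classes without a code source, or a
     * "(missing: ...)" marker if the class cannot be resolved.
     */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> c = Class.forName(className, false,
                    ClassPresenceCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(missing: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace above
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.compress.utils.InputStreamStatistics";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the class is reported as missing, the jar that should contain it is the first thing to look at; if it resolves, the printed location shows which jar is actually being picked up.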
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
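
Before swapping the jars, it can help to see exactly which POI jars are currently present, so that none of the old ones are left behind. This is a rough sketch only: it assumes the jars follow a poi*.jar naming pattern and that the folder is passed as an argument (defaulting to a relative connector-common-lib path). PoiJarLister and listPoiJars are hypothetical names, not part of ManifoldCF.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PoiJarLister {

    /** Returns the sorted file names in libDir that match the glob poi*.jar. */
    static List<String> listPoiJars(Path libDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "poi*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real folder as the first argument
        Path libDir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        for (String name : listPoiJars(libDir)) {
            System.out.println(name);
        }
    }
}
```

Running it once before and once after the replacement gives a quick way to confirm that only the new build's jars remain.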
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
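
For context, "GC overhead limit exceeded" is generally raised when the JVM spends nearly all of its time in garbage collection while reclaiming very little heap, so the configured maximum heap (the -Xmx setting) is worth checking. The sketch below only reports the heap figures of the JVM it runs in; HeapReport and its methods are hypothetical names for illustration, and it says nothing about how the agents process itself is configured.

```java
public class HeapReport {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Launching it with the same JVM options as the crawler would show the heap ceiling those options actually produce.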
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today I am going to force ManifoldCF to keep running so that only
> the failing documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not acceptable behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and have no
> problems with images; they are indexed without errors. Images are necessary
> for this job. I will try recreating my job and testing again.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With only connector debugging enabled, I get this output:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated the history to improve performance. So, can I find the
> last file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In the global properties, set org.apache.manifoldcf.connectors to the value DEBUG.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Yes I can do it tonight.

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:09
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

That's what I was afraid of.  The new poi jars have dependencies we haven't accounted for yet.

 

Can you download the Apache commons-compress jar (the latest version should be OK) and also put it in connector-common-lib?  Thanks!!
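
One way to confirm the jar is in place is to check whether the class named in the NoClassDefFoundError can be loaded. The following is only a sketch: the names ClasspathCheck and isClassAvailable are made up for this example, and it assumes the check is run standalone with the jars from connector-common-lib on the classpath. ManifoldCF loads connector jars through its own class loader, so this only shows that the jar contains the class, not that the agents process will see it.

```java
public class ClasspathCheck {
    /**
     * Returns true if the named class can be loaded by this class's loader.
     * Hypothetical helper for illustration; not part of ManifoldCF, Tika, or POI.
     */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only test for presence, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError in the log
        String name = "org.apache.commons.compress.utils.InputStreamStatistics";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Assuming the current directory is the ManifoldCF install directory on a Unix-like system, it could be run with something like: java -cp "connector-common-lib/*:." ClasspathCheck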

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot, and now I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
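
Before and after swapping the jars, it may help to list which poi jars are actually present, so that no old version is left behind next to a new one. This is a minimal sketch, not an official tool: the class PoiJarLister and its method are hypothetical, and the default directory name connector-common-lib is assumed to be relative to the directory the program is run from.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PoiJarLister {
    /**
     * Returns the sorted file names in libDir that match the glob "poi*.jar".
     * Hypothetical helper for illustration only.
     */
    public static List<String> listPoiJars(Path libDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "poi*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Directory may be passed as the first argument; the default is an assumption
        Path libDir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        for (String name : listPoiJars(libDir)) {
            System.out.println(name);
        }
    }
}
```

If two versions of the same jar show up in the output, the older one still needs to be moved out of the directory.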

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
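
The distinction between the two cases can be sketched in plain Java. This is an illustration under stated assumptions, not how Tika or ManifoldCF is implemented: Extractor is a hypothetical interface standing in for a parser. Ordinary exceptions are caught per document and the document is skipped, while OutOfMemoryError is an Error rather than an Exception, so it is deliberately not caught and still propagates.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipOnFailure {
    /** Hypothetical stand-in for a text extractor; not a Tika or ManifoldCF type. */
    public interface Extractor {
        String extract(String documentId) throws Exception;
    }

    /**
     * Extracts text for each id. Documents whose extraction throws an Exception
     * are added to 'skipped' and processing continues with the next document.
     */
    public static List<String> extractAll(List<String> ids, Extractor extractor, List<String> skipped) {
        List<String> texts = new ArrayList<>();
        for (String id : ids) {
            try {
                texts.add(extractor.extract(id));
            } catch (Exception e) {
                // Per-document failure: record it and move on.
                // Errors such as OutOfMemoryError are not caught here.
                skipped.add(id);
            }
        }
        return texts;
    }
}
```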

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could these errors be ignored? They make the application crash, which is not good behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and testing again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only the connectors in debug mode, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled history to improve performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

 


RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
commons-compress-1.17.jar

poi-4.0.0-SNAPSHOT.jar 

poi-ooxml-4.0.0-SNAPSHOT.jar

poi-ooxml-schemas-4.0.0-SNAPSHOT.jar

poi-scratchpad-4.0.0-SNAPSHOT.jar

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 13:44
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

I am wondering whether you moved any jars from dist/connector-common-lib to dist/lib?  If you did this, you will mess up the ability of any of the Tika jars to find their dependencies.  This also explains why commons-compress cannot be found; it's in connector-common-lib.  It sounds like you may have put the new poi jars in the wrong place?  They should *all* be in connector-common-lib too.
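A quick way to check for misplaced jars is to scan dist/lib for names that belong in connector-common-lib. The sketch below is only an illustration: the class name JarPlacementCheck, the default path "dist/lib", and the two filename prefixes ("poi-" and "commons-compress-") are assumptions taken from the jars mentioned in this thread, not part of ManifoldCF.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JarPlacementCheck {
    /**
     * Returns the jar names that, by the assumed prefixes, look like POI or
     * commons-compress jars and so should live in connector-common-lib.
     */
    public static List<String> misplaced(List<String> libJarNames) {
        List<String> result = new ArrayList<>();
        for (String name : libJarNames) {
            boolean suspect = name.startsWith("poi-") || name.startsWith("commons-compress-");
            if (name.endsWith(".jar") && suspect) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real dist/lib path as the first argument.
        String dir = args.length > 0 ? args[0] : "dist/lib";
        String[] names = new File(dir).list();
        if (names == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (String jar : misplaced(Arrays.asList(names))) {
            System.out.println("Move to connector-common-lib: " + jar);
        }
    }
}
```

If the program prints nothing for dist/lib, the jars named in this thread are at least not sitting in the wrong directory.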

 

Karl

 

 

On Thu, Jul 26, 2018 at 6:23 AM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The following error:

 

>>>>>> 

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

 

<<<<<< 

 

.... seems to be the result of putting new POI jars down that are not compatible fully with the version of Tika that's there.  Unfortunately, this cannot be addressed right now in any way I can think of.  Tika's dependencies are legion and they change all the time.

 

The only thing we can really do is wait for: (1) POI to release their new software, and then (2) Tika to release a new release that depends on it.
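To see which of the classes named in these stack traces a given set of jars actually provides, a small probe with Class.forName can help. This is a minimal sketch, not a ManifoldCF tool: the class name ClassProbe is hypothetical, and running it with the jars on a plain classpath only approximates how ManifoldCF loads connector jars.

```java
public class ClassProbe {
    /** Returns true if the named class can be located by the given class loader. */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from missing dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        // Class names taken from the stack traces in this thread
        String[] names = {
            "org.apache.poi.POIXMLTextExtractor",
            "org.apache.commons.compress.utils.InputStreamStatistics"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name, loader) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the jars from connector-common-lib on the classpath, for example with java -cp and a wildcard entry for that directory; a MISSING line points at the class that the Tika version in use expects but the installed jars do not supply.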

 

Karl

 

 

On Thu, Jul 26, 2018 at 5:33 AM msaunier <msaunier@citya.com> wrote:

Hello Karl,

 

For the moment, it is working.

 

I have noted these errors, but they are not fatal:

 

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*' against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving checkInclude for 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path is 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/' and is included.

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.lang.ClassNotFoundException: org.apache.poi.POIXMLTextExtractor

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        ... 18 more

AND 

 

Starting crawler...

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: JBIG2ImageReader not loaded. jbig2 files will be ignored

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

TIFFImageWriter not loaded. tiff files will not be processed

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

J2KImageReader not loaded. JPEG2000 files will not be processed.

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

 

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: org.xerial's sqlite-jdbc is not loaded.

Please provide the jar on your classpath to parse sqlite files.

See tika-parsers/pom.xml for the correct version.

 

Maxence,

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:09
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

That's what I was afraid of.  The new poi jars have dependencies we haven't accounted for yet.

 

Can you download apache-commons-compress jar (latest version should be OK) and also put that in connector-common-lib?  Thanks!!

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
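Leftover old jars are easy to miss, so it can help to list artifacts that appear in more than one version. The following sketch assumes jar names of the form artifact-version.jar where the version starts with a digit; the class name DuplicateJarCheck and the sample filenames in the usage note are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateJarCheck {
    // Assumed naming scheme: <artifact>-<version>.jar, version beginning with a digit
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    /** Maps each artifact that occurs in two or more versions to the versions found. */
    public static Map<String, List<String>> duplicates(List<String> jarNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        // Keep only artifacts present in more than one version
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }
}
```

For example, calling duplicates on the names poi-3.17.jar, poi-4.0.0-SNAPSHOT.jar and poi-ooxml-4.0.0-SNAPSHOT.jar reports only the poi artifact, with both of its versions, which would mean the old jar was not removed.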

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this output:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to r


RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Hi Karl,

 

I have replaced the old jars: I added the new (binary) jars to connector-common-lib. All of the jars.

 

Maxence,

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, July 26, 2018 13:44
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

I am wondering whether you moved any jars from dist/connector-common-lib to dist/lib?  If you did this, you will mess up the ability of any of the Tika jars to find their dependencies.  This also explains why commons-compress cannot be found; it's in connector-common-lib.  It sounds like you may have put the new poi jars in the wrong place?  They should *all* be in connector-common-lib too.
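
One way to check where a class actually lives is to scan a library folder for the jar that contains it. The following is a minimal Java sketch under stated assumptions: the JarClassFinder class and its findJarsContaining method are illustrative names, not part of ManifoldCF or Tika, and the default folder name is simply the one mentioned above. It only looks at jar contents on disk; it does not reproduce how ManifoldCF's class loaders resolve classes at runtime.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

class JarClassFinder {
    /** Returns the jars directly inside 'dir' that contain the given class. */
    static List<Path> findJarsContaining(Path dir, String className) throws IOException {
        // A class org.foo.Bar is stored in a jar as the entry org/foo/Bar.class.
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions for illustration; pass your own folder and class name.
        Path dir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        String cls = args.length > 1 ? args[1] : "org.apache.poi.POIXMLTextExtractor";
        List<Path> hits = findJarsContaining(dir, cls);
        if (hits.isEmpty()) {
            System.out.println(cls + " not found in any jar under " + dir);
        } else {
            for (Path hit : hits) {
                System.out.println(cls + " found in " + hit);
            }
        }
    }
}
```

Running it once against connector-common-lib and once against lib would show whether a class such as org.apache.commons.compress.utils.InputStreamStatistics is present in either folder, and whether two different versions of the same library ended up side by side.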

 

Karl

 

 

On Thu, Jul 26, 2018 at 6:23 AM Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

The following error:

 

>>>>>> 

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

 

<<<<<< 

 

.... seems to be the result of putting new POI jars down that are not compatible fully with the version of Tika that's there.  Unfortunately, this cannot be addressed right now in any way I can think of.  Tika's dependencies are legion and they change all the time.

 

The only thing we can really do is wait for: (1) POI to release their new software, and then (2) Tika to release a new release that depends on it.

 

Karl

 

 

On Thu, Jul 26, 2018 at 5:33 AM msaunier <msaunier@citya.com> wrote:

Hello Karl,

 

For the moment, it is working.

 

I have noted these errors, but they do not stop the crawl:

 

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*' against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving checkInclude for 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path is 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/' and is included.

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.lang.ClassNotFoundException: org.apache.poi.POIXMLTextExtractor

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        ... 18 more

AND 

 

Starting crawler...

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: JBIG2ImageReader not loaded. jbig2 files will be ignored

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

TIFFImageWriter not loaded. tiff files will not be processed

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

J2KImageReader not loaded. JPEG2000 files will not be processed.

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

 

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: org.xerial's sqlite-jdbc is not loaded.

Please provide the jar on your classpath to parse sqlite files.

See tika-parsers/pom.xml for the correct version.

 

Maxence,

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:09
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

That's what I was afraid of.  The new poi jars have dependencies we haven't accounted for yet.

 

Can you download apache-commons-compress jar (latest version should be OK) and also put that in connector-common-lib?  Thanks!!

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am now flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
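Since leftover old jars next to new ones are an easy mistake to make here, a quick check for artifacts that appear in more than one version may be useful. This is a sketch only: DuplicateJarCheck is a hypothetical helper, and it assumes jar files are named artifact-version.jar, with the version starting at the first hyphen that is followed by a digit. That heuristic will misread some unusual names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ManifoldCF): reports artifacts that are
// present in more than one version in a jar directory.
class DuplicateJarCheck {

    // Assumed naming scheme: "<artifact>-<version>.jar", where the version
    // begins at the first "-" followed by a digit.
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    // Map each artifact that occurs at least twice to its list of versions.
    static Map<String, List<String>> duplicates(Collection<String> jarNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                          .add(m.group(2));
            }
        }
        // Keep only artifacts with two or more versions.
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        // The default directory name is an assumption taken from this thread.
        String[] names = new File(args.length > 0 ? args[0] : "connector-common-lib").list();
        List<String> jars = names == null ? Collections.<String>emptyList() : Arrays.asList(names);
        System.out.println(duplicates(jars));
    }
}
```

Any artifact the check reports would be a candidate for having both an old and a new copy on the class path; an empty map means no duplicates were detected under the naming assumption.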

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today I'm going to force ManifoldCF to keep running so that only the problem documents are left behind.

In the future, could these errors be ignored? They crash the application, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated history to gain performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
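As a configuration sketch, the setting above would look something like the following. The file name is an assumption: in many deployments the global properties live in properties.xml, but multi-process setups may keep them elsewhere.

```xml
<!-- Assumed location: the deployment's properties.xml (may differ per setup) -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```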

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to r


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

I am wondering whether you moved any jars from dist/connector-common-lib to
dist/lib?  If you did this, you will mess up the ability of any of the Tika
jars to find their dependencies.  This also explains why commons-compress
cannot be found; it's in connector-common-lib.  It sounds like you may have
put the new poi jars in the wrong place?  They should *all* be in
connector-common-lib too.
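One way to see which location a class was really loaded from is to ask its protection domain for the code source. The sketch below is illustrative only: WhichJar is a hypothetical helper, and to say anything about a running ManifoldCF process it would have to run with the same class path and class loaders as that process, which a standalone run does not guarantee.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of ManifoldCF): reports where a class
// was loaded from.
class WhichJar {

    // Return the jar or directory URL a class came from, or a marker
    // string for classes loaded by the bootstrap loader (no code source).
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap class path)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each argument is a fully qualified class name to look up.
        for (String name : args) {
            System.out.println(name + " <- " + locationOf(Class.forName(name)));
        }
    }
}
```

If a POI or Tika class reports a location under dist/lib rather than dist/connector-common-lib, that would be consistent with the misplaced-jar scenario described above.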

Karl


On Thu, Jul 26, 2018 at 6:23 AM Karl Wright <da...@gmail.com> wrote:

> Hi Maxence,
>
> The following error:
>
> >>>>>>
>
> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
> org/apache/poi/POIXMLTextExtractor
>
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
> <<<<<<
>
> ... seems to be the result of installing new POI jars that are not
> fully compatible with the version of Tika that's there.  Unfortunately,
> this cannot be addressed right now in any way I can think of.  Tika's
> dependencies are legion and they change all the time.
>
> The only thing we can really do is wait for: (1) POI to release their new
> software, and then (2) Tika to release a new release that depends on it.
>
> Karl
>
>
> On Thu, Jul 26, 2018 at 5:33 AM msaunier <ms...@citya.com> wrote:
>
>> Hello Karl,
>>
>>
>>
>> For the moment, it is working.
>>
>>
>>
>> I get these errors, but they are not fatal:
>>
>>
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*'
>> against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving
>> checkInclude for
>> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path
>> is
>> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>> and is included.
>>
>> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
>> org/apache/poi/POIXMLTextExtractor
>>
>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.poi.POIXMLTextExtractor
>>
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> ~[?:1.8.0_171]
>>
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> ~[?:1.8.0_171]
>>
>>         at
>> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>> ~[?:1.8.0_171]
>>
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ~[?:1.8.0_171]
>>
>>         ... 18 more
>>
>> AND
>>
>>
>>
>> Starting crawler...
>>
>> juil. 26, 2018 11:29:01 AM
>> org.apache.tika.config.InitializableProblemHandler$3
>> handleInitializableProblem
>>
>> AVERTISSEMENT: JBIG2ImageReader not loaded. jbig2 files will be ignored
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>> TIFFImageWriter not loaded. tiff files will not be processed
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>> J2KImageReader not loaded. JPEG2000 files will not be processed.
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>>
>>
>> juil. 26, 2018 11:29:01 AM
>> org.apache.tika.config.InitializableProblemHandler$3
>> handleInitializableProblem
>>
>> AVERTISSEMENT: org.xerial's sqlite-jdbc is not loaded.
>>
>> Please provide the jar on your classpath to parse sqlite files.
>>
>> See tika-parsers/pom.xml for the correct version.
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 19:09
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> That's what I was afraid of.  The new poi jars have dependencies we
>> haven't accounted for yet.
>>
>>
>>
>> Can you download apache-commons-compress jar (latest version should be
>> OK) and also put that in connector-common-lib?  Thanks!!
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> I have added the snapshot and I am being spammed with this error:
>>
>>
>>
>> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 13:12
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> Tomorrow (7/26) the POI project will be delivering a nightly build which
>> should repair the Class Not Found exceptions.  You will need to download it
>> here:
>>
>>
>> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>
>>
>>
>> ... and replace all poi jars with the corresponding ones from the binary
>> distribution.  I believe the poi jars are all in connector-common-lib.  Be
>> sure to delete the old ones (or move them somewhere else) first.
>>
>>
>>
>> I don't know whether this will fix your out of memory problem however.
>> Please let me know what's still not working and I can take it from there.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>>
>> Out of memory errors are fatal, I'm afraid, because they corrupt not only
>> the document in question but all others being processed at the same time.
>> So those cannot be ignored.
>>
>>
>>
>> Tika should ignore documents that it cannot process, however, and that is
>> a great enhancement request for them.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> Okay. So today, I'm going to force ManifoldCF to run so that only the
>> failing documents are left behind.
>>
>> In the future, could I ignore these errors? They make the application
>> crash, which is not great behavior in production.
>>
>>
>>
>> Thanks
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> The problem isn't with images in general; it's with certain kinds of
>> images.  There are optional dependencies in Tika for some kinds of images
>> that we cannot include in the MCF distribution because of licensing
>> problems.  I don't know which kinds these are but apparently you are trying
>> to index some of them.
>>
>> You will need to find and download the right jar and put it in the
>> connector-common-lib folder for this to work.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and I have no
>> problems with images. They are indexed without errors. Images are necessary
>> for this job. I will try to recreate my job and test.
>>
>>
>>
>> Thanks,
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> " java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)"
>>
>>
>>
>> This exception is occurring because you are trying to extract content
>> from an image.  In order for this to work you need a jar that isn't
>> supplied with Tika for licensing reasons.  Can you exclude images from your
>> crawl?
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> With only connector debugging enabled, I get this information:
>>
>>
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
>> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970049, negotiated timeout = 40000
>>
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
>> live nodes from ZooKeeper... (0) -> (2)
>>
>> [Thread-269948] INFO
>> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
>> kemp-formation-solr:2181 ready
>>
>> java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)
>>
>>         at java.lang.Class.getConstructor0(Class.java:3082)
>>
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>>
>>         at
>> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>>
>>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>>
>>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>>
>>         at
>> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>>
>>         at
>> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x100000050ae004d, negotiated timeout = 40000
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.resize(HashMap.java:704)
>>
>>         at java.util.HashMap.putVal(HashMap.java:629)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.Arrays.copyOf(Arrays.java:3308)
>>
>>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>>
>>         at java.util.BitSet.expandTo(BitSet.java:352)
>>
>>         at java.util.BitSet.set(BitSet.java:447)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>>         at
>> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004e closed
>>
>> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004e
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004d closed
>>
>> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004d
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004a closed
>>
>> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004a
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004b closed
>>
>> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004b
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970046 closed
>>
>> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970046
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004c closed
>>
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004c
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004c closed
>>
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004c
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970048 closed
>>
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970048
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970049 closed
>>
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970049
>>
>>
>>
>> I have deactivated history to improve performance. So, can I find the
>> last file processed with an SQL query?
>>
>>
>>
>> Maxence,
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> You would want to turn on connector debugging INSTEAD of the debugging
>> you've turned on, which is very noisy and not helpful.
>>
>>
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>>
>> With debug:
>>
>>
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044, closing socket
>> connection and attempting reconnect
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>
>>         at
>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired, closing socket connection
>>
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970043
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired, closing socket connection
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>>
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae0049
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to r
>>
>>

Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

The following error:

>>>>>>

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
~[?:?]

        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
~[?:?]

        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
~[?:?]

        at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
~[?:?]

        at
org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
~[?:?]

        at
org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
~[?:?]

<<<<<<

.... seems to be the result of installing new POI jars that are not fully
compatible with the version of Tika that's there.  Unfortunately, I cannot
think of any way to address this right now.  Tika's dependencies are legion
and they change all the time.
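
A NoClassDefFoundError like the one above usually means the class is either
absent from every jar on the classpath or has moved in a newer jar. As a
rough diagnostic, the sketch below reports which jar (if any) supplies a
given class. It is not part of ManifoldCF or Tika; the class name ClassOrigin
and its describe method are hypothetical, and it only inspects the classpath
you pass to the JVM, which approximates but is not identical to the way
ManifoldCF loads jars from connector-common-lib.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Describes where the named class was loaded from, or why it could not be loaded. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return className + " -> " + (src == null ? "JDK (no code source)" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so broken dependencies land here too.
            return className + " -> NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the two classes reported missing in this thread.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.poi.POIXMLTextExtractor",
            "org.apache.commons.compress.utils.InputStreamStatistics"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Assuming the compiled class sits in the current directory, something like
java -cp "connector-common-lib/*:." ClassOrigin would check the two classes
named in the stack traces against the jars in that folder.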

The only thing we can really do is wait for: (1) POI to release their new
software, and then (2) Tika to publish a release that depends on it.

Karl


On Thu, Jul 26, 2018 at 5:33 AM msaunier <ms...@citya.com> wrote:

> Hello Karl,
>
>
>
> For the moment, it is working.
>
>
>
> I have these errors in the log, but they are not fatal:
>
>
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*'
> against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving
> checkInclude for
> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path
> is
> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
> and is included.
>
> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
> org/apache/poi/POIXMLTextExtractor
>
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.poi.POIXMLTextExtractor
>
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_171]
>
>         at
> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ~[?:1.8.0_171]
>
>         ... 18 more
>
> AND
>
>
>
> Starting crawler...
>
> Jul 26, 2018 11:29:01 AM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
>
> WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
> TIFFImageWriter not loaded. tiff files will not be processed
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
> J2KImageReader not loaded. JPEG2000 files will not be processed.
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
>
>
> Jul 26, 2018 11:29:01 AM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
>
> WARNING: org.xerial's sqlite-jdbc is not loaded.
>
> Please provide the jar on your classpath to parse sqlite files.
>
> See tika-parsers/pom.xml for the correct version.
>
>
>
> Maxence,
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:09
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> That's what I was afraid of.  The new poi jars have dependencies we
> haven't accounted for yet.
>
>
>
> Can you download the Apache commons-compress jar (the latest version
> should be OK) and put it in connector-common-lib as well?  Thanks!!
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot, and now I am being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out-of-memory problem, however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to keep running so that only
> the problem documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I have no
> problems with images. They are indexed without errors. Images are
> necessary for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With just the connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to r
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Hello Karl,

 

For the moment, it is working.

 

I have noted these errors, but they are not fatal:

 

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*' against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving checkInclude for 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'

DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path is 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/' and is included.

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor

java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.lang.ClassNotFoundException: org.apache.poi.POIXMLTextExtractor

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        ... 18 more

AND 

 

Starting crawler...

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: JBIG2ImageReader not loaded. jbig2 files will be ignored

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

TIFFImageWriter not loaded. tiff files will not be processed

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

J2KImageReader not loaded. JPEG2000 files will not be processed.

See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io

for optional dependencies.

 

juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem

AVERTISSEMENT: org.xerial's sqlite-jdbc is not loaded.

Please provide the jar on your classpath to parse sqlite files.

See tika-parsers/pom.xml for the correct version.

 

Maxence,

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 19:09
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

That's what I was afraid of.  The new poi jars have dependencies we haven't accounted for yet.

 

Can you download apache-commons-compress jar (latest version should be OK) and also put that in connector-common-lib?  Thanks!!
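
One way to confirm that a jar is being picked up is to ask the JVM whether the missing classes can be loaded. The following is only a sketch: `ClasspathCheck` is a hypothetical helper, not part of ManifoldCF, Tika, or POI, and it assumes it is run with the connector-common-lib jars on the classpath (for example `java -cp "connector-common-lib/*:." ClasspathCheck`); ManifoldCF's own class loading may behave differently. The two class names are the ones reported in the errors in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of ManifoldCF, Tika, or POI.
final class ClasspathCheck {

    // Returns true if the named class can be located by the given class loader.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack traces in this thread.
        List<String> names = Arrays.asList(
            "org.apache.commons.compress.utils.InputStreamStatistics",
            "org.apache.poi.POIXMLTextExtractor");
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name, loader) ? "found" : "MISSING"));
        }
    }
}
```

A class reported as MISSING here means the jar providing it is not on the classpath used for the check.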

 

Karl

 

 

On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

I have added the snapshot and I am being spammed with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
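
The "move the old jars aside first" step could be scripted roughly as follows. This is a sketch under assumptions: `PoiJarSwap` and the backup directory are hypothetical, and the `poi*.jar` pattern assumes the POI jars follow the usual `poi-...jar` naming. Copying the new jars in afterwards is left as a manual step.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of ManifoldCF.
final class PoiJarSwap {

    // Moves every poi*.jar from libDir into backupDir and returns the moved file names.
    static List<String> moveOldPoiJars(Path libDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);

        // Collect the matches first so that no file is moved while the directory is being listed.
        List<Path> oldJars = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "poi*.jar")) {
            for (Path jar : jars) {
                oldJars.add(jar);
            }
        }

        List<String> moved = new ArrayList<>();
        for (Path jar : oldJars) {
            // Fails with FileAlreadyExistsException rather than overwriting an earlier backup.
            Files.move(jar, backupDir.resolve(jar.getFileName()));
            moved.add(jar.getFileName().toString());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PoiJarSwap <connector-common-lib dir> <backup dir>");
            return;
        }
        for (String name : moveOldPoiJars(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("moved " + name);
        }
    }
}
```

Jars that do not match the pattern are left untouched.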

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
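
To illustrate the distinction: a per-document failure such as a parser exception or a NoClassDefFoundError can in principle be caught and the document skipped, while an OutOfMemoryError has to propagate. The sketch below is not how ManifoldCF or Tika handles this; `SkipOnFailure` and `Extractor` are hypothetical names standing in for a per-document extraction step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not ManifoldCF or Tika code.
final class SkipOnFailure {

    // Stand-in for a per-document extraction step.
    interface Extractor {
        String extract(String documentId) throws Exception;
    }

    // Runs the extractor over each document and returns the ids that were skipped.
    static List<String> processAll(List<String> documentIds, Extractor extractor) {
        List<String> skipped = new ArrayList<>();
        for (String id : documentIds) {
            try {
                extractor.extract(id);
            } catch (OutOfMemoryError e) {
                // Never swallowed: other documents in flight may already be affected.
                throw e;
            } catch (Exception | LinkageError e) {
                // Per-document problem (parse failure, missing class): record it and continue.
                skipped.add(id);
            }
        }
        return skipped;
    }
}
```

The explicit `OutOfMemoryError` branch is redundant with the narrower catch that follows, but it makes the intent visible to a reader.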

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

 


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
That's what I was afraid of.  The new poi jars have dependencies we haven't
accounted for yet.

Can you download apache-commons-compress jar (latest version should be OK)
and also put that in connector-common-lib?  Thanks!!

Karl
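
A NoClassDefFoundError like the one quoted below usually means a jar is missing from the classpath, or an older copy of it is being picked up first. The following is a minimal sketch, not part of ManifoldCF, POI, or Tika, that reports whether a class can be loaded and from which jar. The class name ClasspathProbe and its locate method are hypothetical, chosen only for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of ManifoldCF, POI, or Tika.
final class ClasspathProbe {

    /** Returns where the named class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK runtime itself typically report no code source.
            return src == null ? "(JDK runtime)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent, or one of its own dependencies is.
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.commons.compress.utils.InputStreamStatistics";
        String where = locate(name);
        System.out.println(name
            + (where == null ? " is NOT on the classpath" : " loaded from " + where));
    }
}
```

The result is only meaningful if the probe runs with the same jars on its classpath as the agents process. For example, java -cp "connector-common-lib/*:." ClasspathProbe, where the relative path is an assumption about the local install layout.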


On Wed, Jul 25, 2018 at 1:01 PM msaunier <ms...@citya.com> wrote:

> Hi Karl,
>
>
>
> I have added the snapshot and I'm being spammed with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
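
The distinction drawn above, between fatal out-of-memory errors and documents that simply cannot be parsed, can be sketched in plain Java. This is an illustrative sketch only: SkipOnFailure, Extractor, and extractAll are hypothetical names and do not correspond to ManifoldCF or Tika APIs. Ordinary exceptions cause the document to be skipped, while Errors such as OutOfMemoryError are deliberately left uncaught so the caller can shut down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types are illustrative and are not ManifoldCF or Tika APIs.
final class SkipOnFailure {

    /** Stand-in for a content extractor such as a Tika-based transformer. */
    interface Extractor {
        String extract(String documentId) throws Exception;
    }

    /** Extracts each document, recording those that fail with an ordinary exception. */
    static List<String> extractAll(List<String> ids, Extractor extractor, List<String> skipped) {
        List<String> results = new ArrayList<>();
        for (String id : ids) {
            try {
                results.add(extractor.extract(id));
            } catch (Exception e) {
                // Recoverable failure: note the document and continue with the next one.
                skipped.add(id);
            }
            // java.lang.Error (for example OutOfMemoryError) is not an Exception,
            // so it is not caught here and propagates to the caller.
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        Extractor demo = id -> {
            if (id.startsWith("bad")) {
                throw new IllegalStateException("cannot parse " + id);
            }
            return "text of " + id;
        };
        List<String> ok = extractAll(Arrays.asList("a.docx", "bad.xlsx", "c.pdf"), demo, skipped);
        System.out.println("extracted=" + ok + " skipped=" + skipped);
    }
}
```

Catching Exception rather than Throwable is the design choice that matters here: it keeps a single unparseable file from stopping the batch without attempting to recover from a condition that may have affected other threads.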
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and have no
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With just the connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have disabled history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Hi Karl,

 

I have added the snapshot and I'm now flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
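
The "move the old ones somewhere else first" step can be sketched as below. This is only an illustration, not part of ManifoldCF: the class name PoiJarBackup, the backup directory name, and the poi*.jar pattern are assumptions, and the connector-common-lib path has to be adjusted to the actual installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PoiJarBackup {
    // Moves every file matching poi*.jar from libDir into backupDir and
    // returns the sorted names of the moved files. Other jars are untouched.
    static List<String> moveOldPoiJars(Path libDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);

        // Collect the matches first so the directory is not modified while it is listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "poi*.jar")) {
            for (Path jar : jars) {
                matches.add(jar);
            }
        }

        List<String> moved = new ArrayList<>();
        for (Path jar : matches) {
            Files.move(jar, backupDir.resolve(jar.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
            moved.add(jar.getFileName().toString());
        }
        Collections.sort(moved);
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass the real ones as arguments.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        Path backupDir = Paths.get(args.length > 1 ? args[1] : "poi-jars-backup");
        if (!Files.isDirectory(libDir)) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        System.out.println("Moved: " + moveOldPoiJars(libDir, backupDir));
    }
}
```

After the old jars are out of the way, the jars from the downloaded distribution would be copied into the same folder and ManifoldCF restarted.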

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
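
The distinction between the two cases can be illustrated with a minimal sketch. It is not how ManifoldCF or Tika is implemented: the Extractor interface and the ingestAll method are hypothetical names. The loop skips a document whose extraction throws an ordinary Exception, but deliberately does not catch Error, so an OutOfMemoryError still propagates and stops processing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipUnparseableDocuments {
    /** Hypothetical stand-in for a content extractor. */
    interface Extractor {
        String extract(String documentName) throws Exception;
    }

    // Returns the documents that were extracted; names of documents whose
    // extraction failed with an Exception are appended to 'skipped'.
    static List<String> ingestAll(List<String> documents, Extractor extractor,
                                  List<String> skipped) {
        List<String> ingested = new ArrayList<>();
        for (String doc : documents) {
            try {
                extractor.extract(doc);
                ingested.add(doc);
            } catch (Exception e) {
                // Per-document failure: record it and move on to the next document.
                skipped.add(doc);
            }
            // Error subclasses such as OutOfMemoryError are not caught here on
            // purpose: the heap is shared, so other work may already be damaged.
        }
        return ingested;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        Extractor extractor = name -> {
            if (name.startsWith("broken")) {
                throw new IOException("cannot parse " + name);
            }
            return "text of " + name;
        };
        List<String> ingested =
            ingestAll(Arrays.asList("a.docx", "broken.xlsx", "b.pdf"), extractor, skipped);
        System.out.println("ingested=" + ingested + " skipped=" + skipped);
    }
}
```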

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today I'm going to force ManifoldCF to keep running, so that only the problem documents are left behind.

In the future, could these errors be ignored? They make the application crash, which is not great behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and I have no problems with them; they are indexed without errors. Images are necessary for this job. I will try to recreate my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?
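
What the trace itself shows is a reflective constructor lookup (Class.getDeclaredConstructor) failing because the loaded class has no constructor with the requested parameter types. The following minimal sketch reproduces that mechanism in isolation; OldImpl and the lookup helper are hypothetical and unrelated to the real CTPictureBaseImpl class.

```java
import java.lang.reflect.Constructor;

class ConstructorLookupDemo {
    /** Hypothetical class that only offers a one-argument constructor. */
    static class OldImpl {
        OldImpl(String type) { }
    }

    // Looks up a declared constructor by exact parameter types and reports
    // whether it exists on the class that was actually loaded.
    static String lookup(Class<?> cls, Class<?>... paramTypes) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor(paramTypes);
            return "found: " + ctor;
        } catch (NoSuchMethodException e) {
            return "NoSuchMethodException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The signature the class really has.
        System.out.println(lookup(OldImpl.class, String.class));
        // A signature the caller expects but the loaded class does not provide.
        System.out.println(lookup(OldImpl.class, String.class, boolean.class));
    }
}
```

The second lookup fails in the same way as the XMLBeans call in the trace: the caller asks for a (SchemaType, boolean) constructor that the class on the classpath does not declare.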

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get the following:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to gain performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
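
As a sketch, that setting would typically be written as a property element in the ManifoldCF properties file. Which file holds the global properties depends on the deployment (for example properties.xml in a single-process setup, or a separate global properties file in a ZooKeeper-based multiprocess setup), so treat the location as an assumption to verify against your installation.

```xml
<configuration>
  <!-- Existing properties stay as they are; add this line next to them. -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
```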

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)


TR: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
And:

 

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000636b80000, 2452094976, 0) failed; error='Ne peut allouer de la mémoire' (errno=12)

#

# There is insufficient memory for the Java Runtime Environment to continue.

# Native memory allocation (mmap) failed to map 2452094976 bytes for committing reserved memory.

# An error report file with more information is saved as:

# /opt/manifoldcf-trunk/bin/hs_err_pid4406.log
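
For scale: errno=12 is ENOMEM on Linux, and the French text in the warning means "Cannot allocate memory". The failed request of 2452094976 bytes is 2338.5 MiB, roughly 2.28 GiB. The sketch below only does that unit conversion and prints the maximum heap of whichever JVM runs it. The class and method names are made up for illustration and are not part of ManifoldCF.

```java
public class CommitSize {
    // Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Size taken from the os::commit_memory warning above.
        long failedCommit = 2452094976L;
        System.out.printf("failed commit: %.1f MiB%n", toMiB(failedCommit));

        // Maximum heap this JVM will attempt to use (set by -Xmx or the default).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("this JVM max heap: %.1f MiB%n", toMiB(maxHeap));
    }
}
```

Comparing the configured maximum heap with the memory actually free on the machine is one way to see whether the JVM is being asked to commit more than the operating system can provide.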

 

 

I have attached the hs_err file.

 

Maxence,

 

 

From: msaunier [mailto:msaunier@citya.com]
Sent: Wednesday, July 25, 2018 18:36
To: 'user@manifoldcf.apache.org' <us...@manifoldcf.apache.org>
Subject: TR: Out of memory, one file bug i think

 

I recrawled part of the documents and now I get only these errors:

 

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

2018-07-25 18:33:39,277 Worker thread '9' FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED

        at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)

        at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)

        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)

        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)

        at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)

        at org.apache.log4j.Logger.getLogger(Logger.java:37)

        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)

        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)

        at sun.reflect.GeneratedConstructorAccessor23.newInstance(Unknown Source)

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:545)

        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)

        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)

        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)

        at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<clinit>(PDCIDFontType0.java:52)

        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)

        at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:128)

        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)

        at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)

        at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:179)

        at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)

        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)

        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)

        at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)

        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)

        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)

        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)

        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)

        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)

        at org.apache.tika.extractor.EmbeddedDocumentUtil.parseEmbedded(EmbeddedDocumentUtil.java:220)

        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:124)

        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:100)

        at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:265)

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

 

[Thread-1226691] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@659a6e85

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff0007615ff60078, negotiated timeout = 40000

[Thread-1226691] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-1226691] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

 

 

 

From: msaunier [mailto:msaunier@citya.com]
Sent: Wednesday, July 25, 2018 16:44
To: 'user@manifoldcf.apache.org' <user@manifoldcf.apache.org>
Subject: RE: Out of memory, one file bug i think

 

Hi Karl,

 

I have added the snapshot and now I am being flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
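
A quick way to confirm that no old POI jars are left behind after the swap is to list every file matching poi*.jar in the folder. This is only a sketch: the class name is invented, and the directory is passed as an argument because the exact location of connector-common-lib depends on your installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListPoiJars {
    // Returns the names of files in dir that match poi*.jar, sorted alphabetically.
    static List<String> poiJars(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "poi*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the directory to inspect, for example your connector-common-lib folder.
        for (String name : poiJars(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

If the listing shows two versions of the same jar, the older one is still on the classpath and should be moved away.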

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
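
The distinction can be sketched in plain Java: an ordinary exception from one document can be caught and that document skipped, while an OutOfMemoryError is an Error and is deliberately left to propagate. This is an illustration only, not ManifoldCF or Tika code; the class, the method, and the use of strings as stand-ins for documents are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SkipOrAbort {
    // Runs parser on each document. A document whose parse throws an
    // Exception is recorded as skipped; an Error such as OutOfMemoryError
    // is not caught here and stops the whole run.
    static List<String> processAll(List<String> docs, Consumer<String> parser) {
        List<String> skipped = new ArrayList<>();
        for (String doc : docs) {
            try {
                parser.accept(doc);
            } catch (Exception e) {
                // Recoverable: only this document is affected.
                skipped.add(doc);
            }
        }
        return skipped;
    }
}
```

Catching Exception rather than Throwable is the design choice that matters here: it keeps per-document failures local without pretending the process can continue safely after the heap is exhausted.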

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today I'm going to force ManifoldCF to keep running so that only the problem documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not good behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.
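
One way to test whether a particular class is visible once a jar has been added is a small lookup like the one below. It is a sketch under assumptions: the class and method names are invented, and it only checks the classpath of the JVM that runs it, so it has to be started with the same jars as the agents process to be meaningful. Which class the image support needs is not known here; the example uses the class named in the NoClassDefFoundError elsewhere in this thread.

```java
public class ClassCheck {
    // Returns true if className can be located by this class's class loader.
    // The class is looked up without being initialized.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError reported in this thread.
        String name = "org.apache.commons.compress.utils.InputStreamStatistics";
        System.out.println(name + " available: " + isAvailable(name));
    }
}
```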

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and testing again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
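
As a rough sketch of what that setting might look like, assuming this deployment keeps its global properties in ManifoldCF's properties.xml file (the file location and the surrounding <configuration> element are assumptions about this particular install, not something stated in this thread):

```xml
<!-- properties.xml (assumed: the global properties file of this ManifoldCF install) -->
<configuration>
  <!-- existing properties stay as they are -->
  <!-- Log connector activity at DEBUG level, as suggested above -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
```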

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)


TR: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
I have recrawled part of the documents, and I get only these errors:

 

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

2018-07-25 18:33:39,277 Worker thread '9' FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED

        at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)

        at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)

        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)

        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)

        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)

        at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)

        at org.apache.log4j.Logger.getLogger(Logger.java:37)

        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)

        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)

        at sun.reflect.GeneratedConstructorAccessor23.newInstance(Unknown Source)

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:545)

        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)

        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)

        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)

        at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<clinit>(PDCIDFontType0.java:52)

        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)

        at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:128)

        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)

        at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)

        at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:179)

        at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)

        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)

        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)

        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)

        at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)

        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)

        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)

        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)

        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)

        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)

        at org.apache.tika.extractor.EmbeddedDocumentUtil.parseEmbedded(EmbeddedDocumentUtil.java:220)

        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:124)

        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:100)

        at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:265)

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

 

[Thread-1226691] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@659a6e85

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-1226691-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff0007615ff60078, negotiated timeout = 40000

[Thread-1226691] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-1226691] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

 

 

 

From: msaunier [mailto:msaunier@citya.com]
Sent: Wednesday, July 25, 2018 16:44
To: 'user@manifoldcf.apache.org' <us...@manifoldcf.apache.org>
Subject: RE: Out of memory, one file bug i think

 

Hi Karl,

 

I have added the snapshot, and now I am flooded with this error:

 

FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics

java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics

        at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]

        at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]

        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]

        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]

        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]

        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

 

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 25, 2018 13:12
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions.  You will need to download it here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

 

... and replace all poi jars with the corresponding ones from the binary distribution.  I believe the poi jars are all in connector-common-lib.  Be sure to delete the old ones (or move them somewhere else) first.
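After swapping jars, it can help to confirm which jar each class is actually loaded from. The following is only a minimal sketch, not part of ManifoldCF or POI: the class name ClassOriginCheck is hypothetical, and the two class names it checks by default are taken from the stack trace earlier in this thread. It assumes it is run with the same classpath as the agents process.

```java
import java.security.CodeSource;

public class ClassOriginCheck {

    // Returns "<class> -> <location>", or a MISSING marker if the class cannot be loaded.
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false,
                    ClassOriginCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself have no code source.
            return className + " -> " + (src == null ? "JDK runtime" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> MISSING (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults come from the NoClassDefFoundError trace; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.commons.compress.utils.InputStreamStatistics",
            "org.apache.poi.openxml4j.opc.OPCPackage"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

A class reported as MISSING, or one loaded from an unexpected jar, would suggest that an old jar is still present or that a dependency jar has not been added alongside the new POI jars.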

 

I don't know whether this will fix your out of memory problem however.  Please let me know what's still not working and I can take it from there.

 

Karl

 

 

On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddywri@gmail.com> wrote:

Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time.  So those cannot be ignored.

 

Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
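The distinction between a skippable document failure and a fatal out-of-memory condition can be illustrated in plain Java. This is a generic sketch only, not how ManifoldCF or Tika is implemented; SkipBadDocuments and processAll are hypothetical names. It relies on the fact that OutOfMemoryError is an Error rather than an Exception, so a catch (Exception e) block does not intercept it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class SkipBadDocuments {

    // Runs the parser on every document and returns the ones that were skipped.
    static <T> List<T> processAll(List<T> docs, Consumer<T> parser) {
        List<T> skipped = new ArrayList<>();
        for (T doc : docs) {
            try {
                parser.accept(doc);
            } catch (Exception e) {
                // Per-document failure: record it and continue with the next document.
                skipped.add(doc);
            }
            // Errors such as OutOfMemoryError are deliberately not caught here,
            // so they propagate to the caller and stop the run.
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a.docx", "broken.docx", "b.pdf");
        List<String> skipped = processAll(docs, d -> {
            if (d.startsWith("broken")) {
                throw new IllegalStateException("cannot parse " + d);
            }
        });
        System.out.println("Skipped: " + skipped);
    }
}
```

Note that NoClassDefFoundError, seen elsewhere in this thread, is also an Error, so a loop like this would not skip past it either.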

 

Karl

 

 

On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

Okay. So today, I'm going to force ManifoldCF to run so that only the documents are left behind.

In the future, could I ignore these errors? They make the application crash, which is not good behavior in production.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and have no problems with them; they are indexed without errors. Images are necessary for this job. I will try recreating my job and testing again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only the connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

Tomorrow (7/26) the POI project will be delivering a nightly build which
should repair the Class Not Found exceptions.  You will need to download it
here:

https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

... and replace all poi jars with the corresponding ones from the binary
distribution.  I believe the poi jars are all in connector-common-lib.  Be
sure to delete the old ones (or move them somewhere else) first.
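
As a rough check after swapping the jars, the sketch below lists the jar
files in a directory whose names start with "poi", so that leftover old
versions are easy to spot. It assumes the jars sit in a directory named
connector-common-lib relative to where it is run (another path can be passed
as the first argument). The class name PoiJarCheck and the name-prefix filter
are illustrative assumptions, not part of ManifoldCF or POI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PoiJarCheck {

    // Returns the sorted file names of all "poi*.jar" entries directly inside libDir.
    // The "poi" prefix is an assumed naming convention for the POI jars.
    public static List<String> findPoiJars(Path libDir) throws IOException {
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                .map(p -> p.getFileName().toString())
                .filter(n -> n.startsWith("poi") && n.endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory name taken from the message above; override via the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        for (String name : findPoiJars(libDir)) {
            System.out.println(name);
        }
    }
}
```

If the same artifact shows up twice with different version suffixes, an old
jar was probably left in place.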

I don't know whether this will fix your out of memory problem however.
Please let me know what's still not working and I can take it from there.

Karl


On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <da...@gmail.com> wrote:

> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
> Karl
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:
>
>> Hi Karl,
>>
>>
>>
>> Okay. So today, I'm going to force ManifoldCF to run so that only the
>> documents are left behind.
>>
>> In the future, could I ignore these errors? They make the
>> application crash, and in production that is not great behavior.
>>
>>
>>
>> Thanks
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> The problem isn't with images in general; it's with certain kinds of
>> images.  There are optional dependencies in Tika for some kinds of images
>> that we cannot include in the MCF distribution because of licensing
>> problems.  I don't know which kinds these are but apparently you are trying
>> to index some of them.
>>
>> You will need to find and download the right jar and put it in the
>> connector-common-lib folder for this to work.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and I do not have
>> problems with images. They are indexed without errors. Images are necessary
>> for this job. I will try to recreate my job and test.
>>
>>
>>
>> Thanks,
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> " java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)"
>>
>>
>>
>> This exception is occurring because you are trying to extract content
>> from an image.  In order for this to work you need a jar that isn't
>> supplied with Tika for licensing reasons.  Can you exclude images from your
>> crawl?
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> With only connectors in debug, I get this information:
>>
>>
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
>> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970049, negotiated timeout = 40000
>>
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
>> live nodes from ZooKeeper... (0) -> (2)
>>
>> [Thread-269948] INFO
>> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
>> kemp-formation-solr:2181 ready
>>
>> java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)
>>
>>         at java.lang.Class.getConstructor0(Class.java:3082)
>>
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>>
>>         at
>> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>>
>>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>>
>>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>>
>>         at
>> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>>
>>         at
>> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x100000050ae004d, negotiated timeout = 40000
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.resize(HashMap.java:704)
>>
>>         at java.util.HashMap.putVal(HashMap.java:629)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.Arrays.copyOf(Arrays.java:3308)
>>
>>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>>
>>         at java.util.BitSet.expandTo(BitSet.java:352)
>>
>>         at java.util.BitSet.set(BitSet.java:447)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>>         at
>> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004e closed
>>
>> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004e
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004d closed
>>
>> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004d
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004a closed
>>
>> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004a
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004b closed
>>
>> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004b
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970046 closed
>>
>> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970046
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004c closed
>>
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004c
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004c closed
>>
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004c
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970048 closed
>>
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970048
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970049 closed
>>
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970049
>>
>>
>>
>> I have disabled the history to improve performance. So, can I find the last
>> file with an SQL query?
>>
>>
>>
>> Maxence,
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> You would want to turn on connector debugging INSTEAD of the debugging
>> you've turned on, which is very noisy and not helpful.
>>
>>
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
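>>
>> A minimal sketch of that setting, assuming the global properties are kept
>> in an XML properties file (for example properties-global.xml in a
>> ZooKeeper-based deployment; the exact file depends on the installation):
>>
>> ```xml
>> <!-- Assumed location: the global properties file of the deployment. -->
>> <!-- Enables DEBUG logging for connectors only. -->
>> <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
>> ```
>>
>> Other debug properties that were switched on earlier would be removed or
>> set back to their previous level so the log stays readable.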
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>>
>> With debug:
>>
>>
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044, closing socket
>> connection and attempting reconnect
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>
>>         at
>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired, closing socket connection
>>
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970043
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired, closing socket connection
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>>
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae0049
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x2000000b80d0049, negotiated timeout = 40000
>>
>> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970045, negotiated timeout = 40000
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.newNode(HashMap.java:1747)
>>
>>         at java.util.HashMap.putVal(HashMap.java:631)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>>
>>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>>
>>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>>
>>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>>
>>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>>
>>         at
>> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>>
>>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
>> reestablished.
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-11-thread-2] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
>> ZooKeeper
>>
>> [zkCallback-3-thread-4] INFO
>> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d0046 closed
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-21-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d0046
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>>
>>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>>
>>         at
>> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>> [zkCallback-19-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@43f7378f name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-19-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [zkCallback-15-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@6432608f name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-15-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [zkCallback-13-thread-3] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-13-thread-3] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
>>
>>         at java.lang.StringCoding.encode(StringCoding.java:348)
>>
>>         at java.lang.String.getBytes(String.java:941)
>>
>>         at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)
>>
>>         at
>> org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)
>>
>>         at
>> org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)
>>
>>         at
>> org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)
>>
>>         at
>> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)
>>
>>         at
>> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>>
>>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>>
>>         at
>> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>>
>>         at
>> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>>
>>         at
>> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>>
>>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
>>
>>

Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Out of memory errors are fatal, I'm afraid, because they corrupt not only
the document in question but all others being processed at the same time.
So those cannot be ignored.

Tika should ignore documents that it cannot process, however, and that is a
great enhancement request for them.
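
The distinction above can be sketched in plain Java. This is only an
illustration under stated assumptions, not ManifoldCF or Tika code: the
Extractor interface, ExtractionException, and extractAll are hypothetical
names invented for the example. A per-document exception is caught and the
document is skipped, while an Error such as OutOfMemoryError is deliberately
left uncaught, because heap exhaustion affects every document being
processed at that moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipUnparseable {
    /** Hypothetical stand-in for a per-document extraction failure. */
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    /** Hypothetical extractor interface; not a Tika or ManifoldCF API. */
    interface Extractor {
        String extract(String document) throws ExtractionException;
    }

    /**
     * Extracts every document, skipping those that fail with a checked
     * exception. Errors (for example OutOfMemoryError) are not caught here.
     */
    static List<String> extractAll(List<String> documents, Extractor extractor) {
        List<String> results = new ArrayList<>();
        for (String doc : documents) {
            try {
                results.add(extractor.extract(doc));
            } catch (ExtractionException e) {
                // Recoverable: only this document is affected, so skip it.
                System.err.println("Skipping " + doc + ": " + e.getMessage());
            }
            // No catch for Error: the shared heap is in an unknown state.
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy extractor: fails on ".bad" files, upper-cases everything else.
        Extractor extractor = doc -> {
            if (doc.endsWith(".bad")) {
                throw new ExtractionException("cannot parse");
            }
            return doc.toUpperCase();
        };
        List<String> docs = Arrays.asList("a.docx", "b.bad", "c.xlsx");
        System.out.println(extractAll(docs, extractor));
    }
}
```

In this sketch the unparseable file is logged and skipped, and the other two
are still extracted. An OutOfMemoryError raised inside extract would
propagate out of extractAll instead of being swallowed.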

Karl


On Wed, Jul 25, 2018 at 3:39 AM msaunier <ms...@citya.com> wrote:

> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I do not have
> problems with images. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With just connectors in debug, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@43f7378f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@6432608f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
>
>         at java.lang.StringCoding.encode(StringCoding.java:348)
>
>         at java.lang.String.getBytes(String.java:941)
>
>         at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)
>
>         at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>
>         at
> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Hi Karl,

 

Okay. So today I'm going to force ManifoldCF to keep running so that only the failing documents are left behind.

In the future, could I ignore these errors? They make the application crash, and in production that is not good behavior.

 

Thanks

Maxence,

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.
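
One way to check whether a given class is visible to the JVM, and which jar supplies it, is a small reflection probe. The sketch below is a generic diagnostic and not part of ManifoldCF or Tika; the class name `ClassOrigin` and the method `describe` are hypothetical names chosen for illustration. It reports "missing" when the class cannot be loaded, "bootstrap" for JDK core classes, and otherwise the location the class was loaded from.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns where the named class was loaded from:
     * "missing" if it cannot be loaded, "bootstrap" for JDK core classes
     * (which have no code source), otherwise the jar or directory location.
     */
    public static String describe(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to look up.
        for (String name : args) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

To be meaningful, the probe has to run with the same classpath as the agents process (for example, one that includes the jars in connector-common-lib), passing the class named in the exception as the argument. Note that it only shows whether the class is present and where it comes from; it does not check for a particular constructor.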

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and I have no problems with images; they are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this output:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
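
For reference, a minimal sketch of how that setting might look, assuming the usual `<property name="..." value="..."/>` syntax of ManifoldCF's properties.xml (or properties-global.xml when the configuration is shared through ZooKeeper). Which file applies, and where it lives, depends on the deployment:

```xml
<!-- Assumed placement: inside the existing <configuration> element. -->
<!-- Turns on DEBUG logging for connector activity only. -->
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
```

Any other debug loggers that were enabled earlier would be removed or set back to a quieter level, since the advice is to use this setting instead of them.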

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970044 closed

[Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970044

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004a, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004a closed

[Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004a

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/


RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
How can I find the list of these images so that I can delete all of them?

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:53
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

The problem isn't with images in general; it's with certain kinds of images.  There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems.  I don't know which kinds these are but apparently you are trying to index some of them.

You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.
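After adding a jar, one way to check whether it actually changed anything is to ask the JVM where the class from the stack trace is loaded from and whether it has the constructor that was reported missing. The following is only a minimal sketch, not part of ManifoldCF: the class name ClassOriginCheck and both helper methods are hypothetical, and it assumes you run it with the same jars on the classpath that the agents process uses.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;
import java.util.Arrays;

class ClassOriginCheck {

    /** Reports where a class was loaded from; "unknown" covers classes with no code source (e.g. JDK classes). */
    public static String describeOrigin(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "unknown" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    /** True if the class declares a constructor whose parameter type names match exactly, in order. */
    public static boolean hasConstructor(String className, String... paramTypeNames) {
        try {
            Class<?> c = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            for (Constructor<?> ctor : c.getDeclaredConstructors()) {
                String[] names = Arrays.stream(ctor.getParameterTypes())
                        .map(Class::getName)
                        .toArray(String[]::new);
                if (Arrays.equals(names, paramTypeNames)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or one of its dependencies could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and constructor signature taken from the NoSuchMethodException in this thread
        String impl = "org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl";
        System.out.println("loaded from: " + describeOrigin(impl));
        System.out.println("has (SchemaType, boolean) constructor: "
                + hasConstructor(impl, "org.apache.xmlbeans.SchemaType", "boolean"));
    }
}
```

If the class is reported as "not found", or as found but without that constructor, the jar currently on the classpath is probably not the one the parser expects.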

 

Karl

 

 

On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaunier@citya.com> wrote:

On another crawl I extract images with the same parameters and I have no problems with them; they are indexed without errors. Images are necessary for this job. I will try to recreate my job and test again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?
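To see which files an image exclusion would affect, something like the following could list them. It is only a rough sketch under several assumptions: the share is mounted at a local path (the crawl itself goes over SMB), images can be recognised by file extension, and the extension list and the class name ImageFileLister are hypothetical and not part of ManifoldCF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ImageFileLister {

    // Hypothetical list; adjust it to the formats actually present on the share.
    static final Set<String> IMAGE_EXTENSIONS =
            new HashSet<>(Arrays.asList("jpg", "jpeg", "png", "gif", "bmp", "tif", "tiff"));

    /** True if the file name ends with one of the extensions above (case-insensitive). */
    public static boolean isImage(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && IMAGE_EXTENSIONS.contains(name.substring(dot + 1));
    }

    /** Walks the tree under root and returns all regular files that look like images, sorted by path. */
    public static List<Path> listImages(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(ImageFileLister::isImage)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // First argument: local mount point of the share; defaults to the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listImages(root)) {
            System.out.println(p);
        }
    }
}
```

The printed paths can then be compared with the job's include/exclude rules before anything is changed or deleted.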

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only connector debugging enabled, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated history to improve performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970044 closed

[Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970044

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004a, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004a closed

[Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004a

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
The problem isn't with images in general; it's with certain kinds of
images.  There are optional dependencies in Tika for some kinds of images
that we cannot include in the MCF distribution because of licensing
problems.  I don't know which kinds these are but apparently you are trying
to index some of them.

You will need to find and download the right jar and put it in the
connector-common-lib folder for this to work.

Karl
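
A minimal sketch, not from the thread, for checking after a jar has been
added whether the constructor named in the NoSuchMethodException quoted
below can be found. It repeats the reflective lookup visible in the trace
(Class.getDeclaredConstructor). The class name ConstructorCheck and the
helper check() are hypothetical names for illustration, and the result is
only meaningful if the program is run with the same jars on its classpath
as the agents process, which is an assumption about the local setup.

```java
public class ConstructorCheck {

    // Map a type name from the exception message to a Class object.
    // Only the primitive "boolean" is special-cased, since that is the one in the trace.
    static Class<?> resolve(String name, ClassLoader loader) throws ClassNotFoundException {
        if ("boolean".equals(name)) {
            return boolean.class;
        }
        // initialize=false: load the class without running its static initializers.
        return Class.forName(name, false, loader);
    }

    // Returns "FOUND", "CONSTRUCTOR_MISSING", or "CLASS_MISSING: <detail>".
    static String check(String className, String... paramTypeNames) {
        ClassLoader loader = ConstructorCheck.class.getClassLoader();
        try {
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = resolve(paramTypeNames[i], loader);
            }
            // Same lookup that fails in the quoted stack trace.
            resolve(className, loader).getDeclaredConstructor(params);
            return "FOUND";
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "CLASS_MISSING: " + e.getMessage();
        } catch (NoSuchMethodException e) {
            return "CONSTRUCTOR_MISSING";
        }
    }

    public static void main(String[] args) {
        // Class and parameter types copied from the exception message in the thread.
        System.out.println(check(
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl",
            "org.apache.xmlbeans.SchemaType",
            "boolean"));
    }
}
```

A CLASS_MISSING result would suggest that no jar on the classpath provides
the class (or one of the parameter types); CONSTRUCTOR_MISSING would suggest
the class is present but in a version without that constructor.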


On Tue, Jul 24, 2018 at 11:36 AM msaunier <ms...@citya.com> wrote:

> On another crawl I extract images with the same parameters and have no
> problems with them. They are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:
>
> Hi Karl,
>
>
>
> With only the connectors in debug mode, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have disabled history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
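>
> A minimal sketch of that setting, assuming the usual ManifoldCF XML
> properties format. The file that holds it (for example
> properties-global.xml in a ZooKeeper-based setup) depends on your
> deployment and is an assumption here:
>
> ```xml
> <configuration>
>   <!-- Enables connector debug logging; leave your other properties unchanged -->
>   <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
> </configuration>
> ```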
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@43f7378f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@6432608f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
>
>         at java.lang.StringCoding.encode(StringCoding.java:348)
>
>         at java.lang.String.getBytes(String.java:941)
>
>         at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)
>
>         at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>
>         at
> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970044 closed
>
> [Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970044
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004a, negotiated timeout = 40000
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004a closed
>
> [Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004a
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
On another crawl I extract images with the same parameters and I have no problems with them. They are indexed without errors. Images are necessary for this job. I will try recreating my job and testing again.

 

Thanks,

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 17:32
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

" java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"

 

This exception is occurring because you are trying to extract content from an image.  In order for this to work you need a jar that isn't supplied with Tika for licensing reasons.  Can you exclude images from your crawl?

 

Karl

 

 

On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaunier@citya.com> wrote:

Hi Karl,

 

With only the connectors in debug, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have deactivated the history to gain performance. So, can I find the last file processed with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In global properties: org.apache.manifoldcf.connectors value DEBUG
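
As a rough sketch of what that setting could look like, assuming the usual ManifoldCF properties file layout (properties.xml, or properties-global.xml in a ZooKeeper-synchronized setup; which file applies depends on the deployment and is an assumption here):

```xml
<configuration>
  <!-- Assumed placement: alongside the other <property> entries already in the file. -->
  <!-- Enables connector-level debug logging, as suggested above. -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
```

Any other debug properties that were turned on earlier would be removed or commented out at the same time, since they are the noisy ones.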

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970044 closed

[Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970044

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004a, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004a closed

[Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004a

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0047, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0047 closed

[Thread-7602-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0047

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5748290590258150821.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-1380683823589504600.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

 

 

Any idea?

Thanks.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 13:15
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

I've opened CONNECTORS-1516 to track the Class Not Found issue, and also created an Apache POI bugzilla ticket, which is referenced.

 

Karl

 

 

On Tue, Jul 24, 2018 at 6:15 AM Karl Wright <daddywri@gmail.com> wrote:

The "class not found" error is probably a classloader issue with Tika: the class is present in poi-ooxml-3.17.jar, although to be fair it could also be caused by an out-of-memory condition.

You should be able to find the exception in the Simple History and work out from that which document it came from.  If not, then look at the log prior to the exception, and look at what Worker Thread 1 was doing.

 

Karl

 

 

On Tue, Jul 24, 2018 at 5:58 AM msaunier <msaunier@citya.com> wrote:

Re Karl,

 

I had an Out of Memory error today. I think the problem is with one document. I got this WARNING before the crash:

 

------------------------------------------------------------------------

 

WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception extracting: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) [mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) [mcf-jcifs-connector.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

 

I think a single file is responsible, because the memory allocation behaves strangely: within one second, ManifoldCF (or Tika) allocates +6 GB of RAM.

 

 

How can I find the file?

 

Thanks,

Maxence,


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
" java.lang.NoSuchMethodException: org.openxmlformats.schemas.
wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(
org.apache.xmlbeans.SchemaType, boolean)"

This exception is occurring because you are trying to extract content from
an image.  In order for this to work you need a jar that isn't supplied
with Tika for licensing reasons.  Can you exclude images from your crawl?

Karl


On Tue, Jul 24, 2018 at 10:32 AM msaunier <ms...@citya.com> wrote:

> Hi Karl,
>
>
>
> With only connector debugging enabled, I get this output:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@44d52de2
> {/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@60410cd
> {/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have disabled the history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In the global properties, set org.apache.manifoldcf.connectors to DEBUG.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@43f7378f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@6432608f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
>
>         at java.lang.StringCoding.encode(StringCoding.java:348)
>
>         at java.lang.String.getBytes(String.java:941)
>
>         at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)
>
>         at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>
>         at
> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970044 closed
>
> [Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970044
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004a, negotiated timeout = 40000
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004a closed
>
> [Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004a
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0047, negotiated timeout = 40000
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0047 closed
>
> [Thread-7602-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0047
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5748290590258150821.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-1380683823589504600.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
>
>
>
>
> Any idea?
>
> Thanks.
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 13:15
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> I've opened CONNECTORS-1516 to track the Class Not Found issue, and also
> created an Apache POI bugzilla ticket, which is referenced.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 6:15 AM Karl Wright <da...@gmail.com> wrote:
>
> The "class not found" error is probably a classloader issue with Tika; the
> class is present in poi-ooxml-3.17.jar, although to be fair it might
> possibly be caused by an out-of-memory condition.
>
> You should be able to find the exception in the Simple History and work out
> from that which document it came from.  If not, then look at the log
> prior to the exception, and look at what Worker Thread 1 was doing.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 5:58 AM msaunier <ms...@citya.com> wrote:
>
> Re Karl,
>
>
>
> I got an Out of Memory error today. I think one document is causing it.
> I see this WARNING before the crash:
>
>
>
> ------------------------------------------------------------------------
>
>
>
> WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception
> extracting: TIKA-198: Illegal IOException from
> org.apache.tika.parser.microsoft.OfficeParser@62980adb
>
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException
> from org.apache.tika.parser.microsoft.OfficeParser@62980adb
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[mcf-tika-connector.jar:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> [mcf-tika-connector.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> [mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> [mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> [mcf-jcifs-connector.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> Caused by: java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         ... 12 more
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_171]
>
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ~[?:1.8.0_171]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         ... 12 more
>
>
>
> I think it's a single file, because RAM allocation behaves strangely: within
> one second, ManifoldCF (or Tika) allocates more than 6 GB of RAM.
>
>
>
>
>
> How can I find the file?
>
>
>
> Thanks,
>
> Maxence,
>
>

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
Hi Karl,

 

With only connector debugging enabled, I get this information:

 

[Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000

[Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)

[Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready

java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)

        at java.lang.Class.getConstructor0(Class.java:3082)

        at java.lang.Class.getDeclaredConstructor(Class.java:2178)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)

        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)

        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)

        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)

        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)

        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)

        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)

        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)

        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)

        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)

        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)

        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)

        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)

        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)

        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)

        at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)

        at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.resize(HashMap.java:704)

        at java.util.HashMap.putVal(HashMap.java:629)

        at java.util.HashMap.put(HashMap.java:612)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)

        at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)

        at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)

        at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.Arrays.copyOf(Arrays.java:3308)

        at java.util.BitSet.ensureCapacity(BitSet.java:337)

        at java.util.BitSet.expandTo(BitSet.java:352)

        at java.util.BitSet.set(BitSet.java:447)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

        at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)

        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)

        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)

        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)

        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed

[Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed

[Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed

[Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed

[Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed

[Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed

[Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed

[Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed

[Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed

[Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049

 

I have disabled the history to improve performance. So, can I find the last file with an SQL query?

 

Maxence,

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 16:04
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

Hi Maxence,

 

You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.

 

In the global properties, set org.apache.manifoldcf.connectors to the value DEBUG.
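
Once connector debugging is on, one way to narrow down the offending document is to look at the last connector message each worker thread logged before the process died. The following is only a minimal sketch under stated assumptions: it assumes the log lines have the shape of the WARN line quoted in this thread ("LEVEL timestamp (Worker thread 'N') - message"), and the file name logs/manifoldcf.log and the marker text "Processing" are hypothetical defaults that may not match what your connector actually writes. Pass your own values as arguments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class LastDocumentPerThread {
    // Captures the thread name and the message from lines shaped like:
    //   WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: ...
    private static final Pattern THREAD_LINE =
        Pattern.compile("\\((Worker thread '\\d+')\\) - (.*)$");

    // Returns, for each worker thread, the last message containing the marker.
    public static Map<String, String> lastMessages(Stream<String> lines, String marker) {
        Map<String, String> last = new LinkedHashMap<>();
        lines.forEachOrdered(line -> {
            Matcher m = THREAD_LINE.matcher(line);
            if (m.find() && m.group(2).contains(marker)) {
                last.put(m.group(1), m.group(2)); // later lines overwrite earlier ones
            }
        });
        return last;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions, not values confirmed in this thread.
        String logFile = args.length > 0 ? args[0] : "logs/manifoldcf.log";
        String marker = args.length > 1 ? args[1] : "Processing";
        // ISO-8859-1 never fails on odd bytes; non-ASCII file names may look garbled.
        try (Stream<String> lines = Files.lines(Paths.get(logFile), StandardCharsets.ISO_8859_1)) {
            lastMessages(lines, marker)
                .forEach((thread, msg) -> System.out.println(thread + ": " + msg));
        }
    }
}
```

Because the agents process shuts down after the OutOfMemoryError, the last matching message per thread is a reasonable first guess at what each thread was handling at the time of the crash. It is a hint, not proof, since another thread's document may be the one that exhausted the heap.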

 

Karl

 

 

On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaunier@citya.com> wrote:

With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970044 closed

[Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970044

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004a, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004a closed

[Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004a

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0047, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0047 closed

[Thread-7602-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0047

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5748290590258150821.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-1380683823589504600.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

 

 

Any idea?

Thanks.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 13:15
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

I've opened CONNECTORS-1516 to track the Class Not Found issue, and also created an Apache POI bugzilla ticket, which is referenced.

 

Karl

 

 

On Tue, Jul 24, 2018 at 6:15 AM Karl Wright <daddywri@gmail.com> wrote:

The "class not found" error looks like a classloader issue with Tika: the class is present in poi-ooxml-3.17.jar, although to be fair it might be caused by an out-of-memory condition.

You should be able to find the exception in the Simple History and work out from that which document it came from. If not, look at the log prior to the exception and see what Worker Thread 1 was doing.
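
For the log-based route, a small script can pull out what a given worker thread logged just before the exception. The following is only a sketch: the function name lines_before_marker is made up for illustration, and it assumes each log line carries the thread name in parentheses, as in the "(Worker thread '1')" line of the WARN message quoted in this thread.

```python
from collections import deque


def lines_before_marker(lines, thread_name, marker, keep=20):
    """Return up to `keep` lines logged by `thread_name` before its first
    line that contains `marker`. Returns [] if no such line is found."""
    tag = "(%s)" % thread_name          # e.g. "(Worker thread '1')"
    recent = deque(maxlen=keep)         # rolling window of that thread's lines
    for line in lines:
        if tag not in line:
            continue                    # ignore every other thread
        if marker in line:
            return list(recent)         # context that led up to the exception
        recent.append(line.rstrip("\n"))
    return []


# Hypothetical usage; the log file path is an assumption:
# with open("manifoldcf.log", encoding="utf-8", errors="replace") as fh:
#     for entry in lines_before_marker(fh, "Worker thread '1'", "TIKA-198"):
#         print(entry)
```

With connector debugging enabled, the returned lines would be expected to include whatever the connector logged for the document that thread was processing, though the exact wording of those messages depends on the connector.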

 

Karl

 

 

On Tue, Jul 24, 2018 at 5:58 AM msaunier <msaunier@citya.com> wrote:

Re Karl,

 

I got an Out of Memory error today. I think the problem comes from a single document. I get this WARNING before the crash:

 

------------------------------------------------------------------------

 

WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception extracting: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) [mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) [mcf-jcifs-connector.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

 

I think a single file is responsible, because the RAM allocation behaves strangely: within one second, ManifoldCF (or Tika) allocates more than 6 GB of RAM.

 

 

How can I find the file?

 

Thanks,

Maxence,


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

You would want to turn on connector debugging INSTEAD of the debugging
you've turned on, which is very noisy and not helpful.

In global properties: org.apache.manifoldcf.connectors value DEBUG
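
As a sketch of what that setting might look like, assuming the global properties live in ManifoldCF's properties.xml (or in properties-global.xml for a ZooKeeper-based multiprocess deployment, which is an assumption about this particular install), the entry would be added alongside the existing property elements:

```xml
<configuration>
  <!-- Assumed location: properties.xml or properties-global.xml. -->
  <!-- Turns on DEBUG logging for connectors only. -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
```

Other properties already present in the file should be left in place; only the one line is new.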

Karl


On Tue, Jul 24, 2018 at 9:12 AM msaunier <ms...@citya.com> wrote:

> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0049, negotiated timeout = 40000
>
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)]
> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
> server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970045, negotiated timeout = 40000
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.newNode(HashMap.java:1747)
>
>         at java.util.HashMap.putVal(HashMap.java:631)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>
>         at jcifs.smb.SmbSession.send(SmbSession.java:238)
>
>         at jcifs.smb.SmbTree.send(SmbTree.java:119)
>
>         at jcifs.smb.SmbFile.send(SmbFile.java:776)
>
>         at
> jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>
>         at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper
> reestablished.
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-11-thread-2] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to
> ZooKeeper
>
> [zkCallback-3-thread-4] INFO
> org.apache.solr.common.cloud.ConnectionManager - Connected:true
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0046 closed
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@381a7557 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-21-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0046
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.regex.Matcher.<init>(Matcher.java:225)
>
>         at java.util.regex.Pattern.matcher(Pattern.java:1093)
>
>         at
> de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@43f7378f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-19-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@6432608f name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-15-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-13-thread-3] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
>
>         at java.lang.StringCoding.encode(StringCoding.java:348)
>
>         at java.lang.String.getBytes(String.java:941)
>
>         at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)
>
>         at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)
>
>         at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>
>         at
> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>
>         at
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970044 closed
>
> [Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970044
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004a, negotiated timeout = 40000
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004a closed
>
> [Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004a
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x2000000b80d0047, negotiated timeout = 40000
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d0047 closed
>
> [Thread-7602-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d0047
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@44d52de2
> {/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5748290590258150821.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-1380683823589504600.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
>
>
>
>
> Any idea?
>
> Thanks.
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Tuesday, July 24, 2018 13:15
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> I've opened CONNECTORS-1516 to track the Class Not Found issue, and also
> created an Apache POI bugzilla ticket, which is referenced.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 6:15 AM Karl Wright <da...@gmail.com> wrote:
>
> The "class not found" error is probably a classloader issue with Tika: the
> class is present in poi-ooxml-3.17.jar. To be fair, though, it might also
> be caused by an out-of-memory condition.
>
> You should be able to find the exception in the Simple History and work out
> from that which document it came from.  If not, look at the log just before
> the exception and see what Worker Thread 1 was doing.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 5:58 AM msaunier <ms...@citya.com> wrote:
>
> Re Karl,
>
>
>
> I got an Out of Memory error today. I think the problem is with one
> document. I see this WARNING before the crash:
>
>
>
> ------------------------------------------------------------------------
>
>
>
> WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception
> extracting: TIKA-198: Illegal IOException from
> org.apache.tika.parser.microsoft.OfficeParser@62980adb
>
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException
> from org.apache.tika.parser.microsoft.OfficeParser@62980adb
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[tika-core-1.17.jar:1.17]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[mcf-tika-connector.jar:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> [mcf-tika-connector.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> [mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> [mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> [mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> [mcf-jcifs-connector.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> Caused by: java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         ... 12 more
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_171]
>
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ~[?:1.8.0_171]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
> ~[?:?]
>
>         at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
> ~[?:?]
>
>         at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         ... 12 more
>
>
>
> I think it’s a file, because the RAM allocation behaves strangely: within
> one second, ManifoldCF (or Tika) allocates an extra 6 GB of RAM.
>
>
>
>
>
> How can I find the file?
>
>
>
> Thanks,
>
> Maxence,
>
>
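
Karl's suggestion above (look at the log just before the exception and see what Worker Thread 1 was doing) could be scripted roughly as follows. This is only a sketch: it assumes every relevant log line carries a thread tag shaped like the `(Worker thread '1')` in the WARN line quoted above, and the function name, the marker string default, and the context size are illustrative choices, not part of ManifoldCF. Whether the preceding lines actually name the document depends on the logging level in use.

```python
import re
from collections import deque

# Assumed log line shape, taken from the WARN line quoted in this thread:
#   WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: ...
THREAD_TAG = re.compile(r"\((Worker thread '\d+')\)")


def lines_before_exception(lines, marker="Tika exception extracting", context=20):
    """Find the first line containing `marker` and return (thread, history),
    where history holds up to `context` earlier lines logged by that thread."""
    history = {}  # thread name -> deque of that thread's most recent lines
    for line in lines:
        match = THREAD_TAG.search(line)
        if not match:
            continue  # stack frames and untagged lines are skipped
        thread = match.group(1)
        if marker in line:
            return thread, list(history.get(thread, ()))
        history.setdefault(thread, deque(maxlen=context)).append(line.rstrip("\n"))
    return None, []
```

The function accepts any iterable of lines, so an open log file can be passed directly; it returns `(None, [])` when the marker never appears. The log file's name and location depend on the local ManifoldCF logging configuration.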

RE: Out of memory, one file bug i think

Posted by msaunier <ms...@citya.com>.
With debug:

 

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect

[Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046

[Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.lang.StringBuilder.toString(StringBuilder.java:407)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)

        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)

        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)

        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)

        at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)

        at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)

        at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)

        at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

agents process ran out of memory - shutting down

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect

java.lang.OutOfMemoryError: GC overhead limit exceeded

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired

[Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection

[Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired

[Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection

[zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58

[Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...

[zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...

[zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}

[zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000

[zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.HashMap.newNode(HashMap.java:1747)

        at java.util.HashMap.putVal(HashMap.java:631)

        at java.util.HashMap.put(HashMap.java:612)

        at jcifs.util.transport.Transport.sendrecv(Transport.java:66)

        at jcifs.smb.SmbTransport.send(SmbTransport.java:661)

        at jcifs.smb.SmbSession.send(SmbSession.java:238)

        at jcifs.smb.SmbTree.send(SmbTree.java:119)

        at jcifs.smb.SmbFile.send(SmbFile.java:776)

        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)

        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

[zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[Thread-7538-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0046

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.regex.Matcher.<init>(Matcher.java:225)

        at java.util.regex.Pattern.matcher(Pattern.java:1093)

        at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)

        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)

        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)

        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)

        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)

        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)

        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)

        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)

        at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@43f7378f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-19-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@6432608f name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-15-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@68bb3d74 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None

[zkCallback-13-thread-3] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected

agents process ran out of memory - shutting down

java.lang.OutOfMemoryError: GC overhead limit exceeded

        at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)

        at java.lang.StringCoding.encode(StringCoding.java:348)

        at java.lang.String.getBytes(String.java:941)

        at org.postgresql.core.Utils.encodeUTF8(Utils.java:53)

        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1448)

        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1777)

        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1354)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:292)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)

        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)

        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)

        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)

        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970044 closed

[Thread-31532-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970044

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004a, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004a closed

[Thread-7574-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004a

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session

[Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0047, negotiated timeout = 40000

[Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0047 closed

[Thread-7602-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d0047

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5748290590258150821.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}

[Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-1380683823589504600.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}

 

 

Any idea?

Thanks.

 

 

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, July 24, 2018 13:15
To: user@manifoldcf.apache.org
Subject: Re: Out of memory, one file bug i think

 

I've opened CONNECTORS-1516 to track the ClassNotFoundException issue, and also created an Apache POI Bugzilla ticket, which is referenced there.

 

Karl

 

 

On Tue, Jul 24, 2018 at 6:15 AM Karl Wright <daddywri@gmail.com> wrote:

The "class not found" error looks like a classloader issue with Tika: the class is present in poi-ooxml-3.17.jar. To be fair, it could also be a side effect of an out-of-memory condition.
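
The first point (whether the class really is in a jar under the deployment) can be checked directly, since a jar is a zip archive. The following is a minimal sketch and not part of ManifoldCF; the helper names `class_in_jar` and `jars_containing` are made up for illustration. Finding the class in a jar does not rule out a classloader problem: the jar must also be visible to the classloader that Tika runs under.

```python
import zipfile
from pathlib import Path


def class_in_jar(jar_path, class_name):
    """Return True if the jar holds a .class entry for the fully qualified class_name."""
    # Inside a jar, org.example.Foo is stored as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def jars_containing(lib_dir, class_name):
    """Return the jars directly under lib_dir that contain class_name, sorted by path."""
    hits = []
    for jar_path in sorted(Path(lib_dir).glob("*.jar")):
        try:
            if class_in_jar(jar_path, class_name):
                hits.append(jar_path)
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives
            continue
    return hits
```

For example, `jars_containing(lib_dir, "org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder")`, where `lib_dir` is whichever directory holds the connector jars in your installation (the location is an assumption here). Zero hits means the class is missing; more than one hit may point to conflicting POI versions.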

You should be able to find the exception in the Simple History and work out from it which document it came from. If not, look at the log prior to the exception and see what Worker Thread 1 was doing.
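
The second suggestion (reading what Worker Thread 1 logged before the exception) can be sketched as a small log filter. This assumes lines in the form of the WARN line quoted below, that is `LEVEL timestamp (thread name) - message`; the function name `lines_before` is hypothetical. Whether the earlier lines actually name the document depends on the logging level configured, which is not known from this thread.

```python
import re
from collections import deque

# Matches e.g.: WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: ...
LINE_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+(?P<ts>\S+)\s+\((?P<thread>[^)]*)\)\s+-\s+(?P<msg>.*)$"
)


def lines_before(log_lines, thread, marker, keep=20):
    """Return up to `keep` lines logged by `thread` just before its first
    line whose message contains `marker`; return [] if no such line exists."""
    recent = deque(maxlen=keep)  # rolling window of this thread's lines
    for raw in log_lines:
        m = LINE_RE.match(raw)
        if not m or m.group("thread") != thread:
            continue  # stack-trace lines and other threads are ignored
        if marker in m.group("msg"):
            return list(recent)
        recent.append(raw.rstrip("\n"))
    return []
```

A call such as `lines_before(open(path), "Worker thread '1'", "TIKA-198")` returns the last few entries from that thread, which is where a document URI would appear if the connector logged one.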

 

Karl

 

 

On Tue, Jul 24, 2018 at 5:58 AM msaunier <msaunier@citya.com> wrote:

Hi again Karl,

 

I got an OutOfMemoryError today. I think it is caused by one document. This warning was logged before the crash:

 

------------------------------------------------------------------------

 

WARN 2018-07-24T11:46:22,098 (Worker thread '1') - Tika: Tika exception extracting: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@62980adb

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.17.jar:1.17]

        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.17.jar:1.17]

        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) [mcf-tika-connector.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) [mcf-agents.jar:?]

        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) [mcf-agents.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) [mcf-pull-agent.jar:?]

        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) [mcf-jcifs-connector.jar:?]

        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder

        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_171]

        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_171]

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_171]

        at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148) ~[?:?]

        at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203) ~[?:?]

        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132) ~[?:?]

        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]

        ... 12 more

 

I think a single file is responsible, because the RAM allocation behaves strangely: within one second, ManifoldCF (or Tika) allocates an additional 6 GB of RAM.
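
One way to narrow this down from the crash log is to list where each OutOfMemoryError was thrown. An OutOfMemoryError is raised in whichever thread happens to allocate once the heap is exhausted, so the traces above land in unrelated places (jcifs, the PostgreSQL driver, Tika). One of them passes through XSSFExcelExtractorDecorator, which suggests, without proving it, that an .xlsx file was being parsed at the time. The sketch below only groups the traces; `oom_traces` and `first_match` are illustrative names, and it assumes the double-spaced log layout pasted in this message.

```python
import re

# Captures the qualified method name from a line like
#   "at jcifs.smb.SmbFile.send(SmbFile.java:776)"
FRAME_RE = re.compile(r"^at\s+([\w.$<>]+)\(")


def oom_traces(log_lines):
    """Return one list of frame names per OutOfMemoryError that has a stack trace."""
    traces, current = [], None
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue  # the pasted log has a blank line after every entry
        if line.startswith("java.lang.OutOfMemoryError"):
            if current:
                traces.append(current)
            current = []  # start collecting frames for this error
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            current.append(m.group(1))
        else:
            # Any other log line ends the trace being collected
            if current:
                traces.append(current)
            current = None
    if current:
        traces.append(current)
    return traces


def first_match(frames, prefixes):
    """Return the first frame that starts with one of `prefixes`, else None."""
    for frame in frames:
        if frame.startswith(tuple(prefixes)):
            return frame
    return None
```

Calling `first_match(trace, ("org.apache.tika.parser.",))` on each trace shows which parser, if any, was on the stack when the error surfaced.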

 



 

How can I find the file?

 

Thanks,

Maxence


Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
I've opened CONNECTORS-1516 to track the ClassNotFoundException issue, and also
created an Apache POI Bugzilla ticket, which is referenced there.

Karl



Re: Out of memory, one file bug i think

Posted by Karl Wright <da...@gmail.com>.
The "class not found" error looks like a classloader issue with Tika: the
class is present in poi-ooxml-3.17.jar. To be fair, it could also be a side
effect of an out-of-memory condition.

You should be able to find the exception in the Simple History and work out
from it which document it came from. If not, look at the log prior to the
exception and see what Worker Thread 1 was doing.

Karl

