Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2008/12/11 10:11:44 UTC

[jira] Commented: (SOLR-846) Out Of memory doing delta import with fetch size set to -1

    [ https://issues.apache.org/jira/browse/SOLR-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655574#action_12655574 ] 

Shalin Shekhar Mangar commented on SOLR-846:
--------------------------------------------

Committed revision 725627.

I've committed Noble's patch; however, as he noted, it is only a partial solution. I'm in favor of streaming it, but that will be an invasive change. Let's keep this issue open until we can implement a better solution.
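
A rough sketch of the distinction between buffering and streaming the changed keys, using hypothetical names rather than DIH's actual classes, and assuming the heap pressure comes from holding every changed key at once:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical sketch, not DIH code: the buffering shape holds every
    // changed key in memory at once; the streaming shape consumes each key
    // as it is read and retains nothing.
    public class DeltaKeySketch {

        // Buffering: ~2.5M keys end up on the heap before indexing starts.
        static List<String> collectAllKeys(Iterator<String> changedKeys) {
            List<String> keys = new ArrayList<String>();
            while (changedKeys.hasNext()) {
                keys.add(changedKeys.next());
            }
            return keys;
        }

        // Streaming: memory use stays flat regardless of how many rows changed.
        static void streamKeys(Iterator<String> changedKeys) {
            while (changedKeys.hasNext()) {
                indexDocumentFor(changedKeys.next());
            }
        }

        // Stand-in for fetching the row and handing the document to Solr.
        static void indexDocumentFor(String key) {
        }
    }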

> Out Of memory doing delta import with fetch size set to -1
> ----------------------------------------------------------
>
>                 Key: SOLR-846
>                 URL: https://issues.apache.org/jira/browse/SOLR-846
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>         Environment: Linux 2.6.18-92.1.13.el5xen, mysql 5.0
>            Reporter: Ricky Leung
>         Attachments: SOLR-846.patch
>
>
> Database has about 3 million records.  Doing a full-import, there is no problem.  However, when a large number of changes occurred (2,558,057 rows), delta-import throws an OutOfMemory error after 1,288,338 documents are processed.  The stack trace is below:
> Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
> 	at org.tartarus.snowball.ext.EnglishStemmer.<init>(EnglishStemmer.java:49)
> 	at org.apache.solr.analysis.EnglishPorterFilter.<init>(EnglishPorterFilterFactory.java:83)
> 	at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPorterFilterFactory.java:66)
> 	at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPorterFilterFactory.java:35)
> 	at org.apache.solr.analysis.TokenizerChain.tokenStream(TokenizerChain.java:48)
> 	at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(IndexSchema.java:348)
> 	at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java:44)
> 	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:117)
> 	at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
> 	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
> 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
> 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2118)
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2095)
> 	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:232)
> 	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
> 	at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
> 	at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
> 	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
> 	at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
> 	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
> 	at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
> 	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
> 	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> dataSource in data-config.xml has been configured with a batchSize of "-1":
>     <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/dbname" user="*" password="*" batchSize="-1"/>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
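
For reference, the batchSize="-1" shown in the data-config above appears to be handed to the JDBC driver as a fetch size of Integer.MIN_VALUE, which is what MySQL Connector/J requires before it will stream a result set row by row instead of buffering it on the client. A minimal sketch of the equivalent plain JDBC setup (connection details and table name are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Sketch of the JDBC settings that batchSize="-1" is believed to map to.
    // MySQL Connector/J only streams when the statement is forward-only,
    // read-only, and the fetch size is Integer.MIN_VALUE.
    public class MySqlStreamingSketch {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://host/dbname", "user", "password");
            try {
                Statement stmt = conn.createStatement(
                        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
                stmt.setFetchSize(Integer.MIN_VALUE);
                ResultSet rs = stmt.executeQuery("SELECT id FROM item");
                while (rs.next()) {
                    // each row is read off the wire as it is consumed,
                    // not materialized up front
                }
                rs.close();
                stmt.close();
            } finally {
                conn.close();
            }
        }
    }

Note that this only streams the rows of each individual query; it does not bound whatever the import accumulates across the whole delta run, which would explain why the committed patch is described above as only a partial fix.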


Re: [jira] Commented: (SOLR-846) Out Of memory doing delta import with fetch size set to -1

Posted by Marc Sturlese <ma...@gmail.com>.
Hey there,

I experienced the problem and sorted it out with the patch. But in case I had 5,000,000 rows to modify, would the OutOfMemory problem appear again?

Would a good solution be to run the query with LIMIT 100000, and keep doing that until no more docs have to be updated? Every time a query is run and its data is persisted, I would set the maps to null. Would this be a good way to make DataImportHandler more scalable?

Thanks in advance
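
A rough sketch of the paging idea described above, written as a standalone JDBC loop rather than DIH code; the table, column, cutoff, and page size are only illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Sketch of the proposal above: read changed rows in fixed-size pages so
    // only one page is ever in memory, and stop when a page comes back short.
    public class PagedDeltaSketch {
        public static void main(String[] args) throws SQLException {
            final int pageSize = 100000;
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://host/dbname", "user", "password");
            try {
                long offset = 0;
                while (true) {
                    PreparedStatement ps = conn.prepareStatement(
                            "SELECT id FROM item WHERE last_modified > ?"
                                    + " ORDER BY id LIMIT ? OFFSET ?");
                    ps.setString(1, "2008-12-01 00:00:00");
                    ps.setInt(2, pageSize);
                    ps.setLong(3, offset);
                    ResultSet rs = ps.executeQuery();
                    int rowsInPage = 0;
                    while (rs.next()) {
                        rowsInPage++;
                        // build and index the document for rs.getString("id"),
                        // then let this page's data become garbage
                    }
                    rs.close();
                    ps.close();
                    if (rowsInPage < pageSize) {
                        break; // last (short) page: nothing left to update
                    }
                    offset += pageSize;
                }
            } finally {
                conn.close();
            }
        }
    }

One caveat: OFFSET paging slows down as the offset grows, so paging on the last seen primary key (WHERE id > ? ORDER BY id LIMIT ?) usually scales better, and the modification cutoff should be fixed before the loop starts so rows changed mid-run are not skipped or double-counted.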




-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-846%29-Out-Of-memory-doing-delta-import-with-fetch-size-set-to--1-tp20441742p21145545.html
Sent from the Solr - Dev mailing list archive at Nabble.com.