Posted to solr-dev@lucene.apache.org by "Laurent Chavet (JIRA)" <ji...@apache.org> on 2009/12/04 21:43:20 UTC
[jira] Created: (SOLR-1623) Solr hangs (often throwing
java.lang.OutOfMemoryError: PermGen space) when indexing many different
field names
Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names
--------------------------------------------------------------------------------------------------------------
Key: SOLR-1623
URL: https://issues.apache.org/jira/browse/SOLR-1623
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.4, 1.3
Environment (either or both of the following):
  Tomcat Version              JVM Version   JVM Vendor             OS Name       OS Version      OS Architecture
  Apache Tomcat/6.0 snapshot  1.6.0_13-b03  Sun Microsystems Inc.  Linux         2.6.18-164.el5  amd64
  Apache Tomcat/6.0.18        1.6.0_12-b04  Sun Microsystems Inc.  Windows 2003  5.2             amd64
Reporter: Laurent Chavet
Priority: Critical
With the following fields in schema.xml:
  <fields>
    <field name="id" type="sint" indexed="true" stored="true" required="true" />
    <dynamicField name="weight_*" type="sint" indexed="true" stored="true"/>
  </fields>
Run the following code:
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Wrapper class added so the snippet compiles; the class name is arbitrary.
  public class ManyFieldNamesRepro {
    // args[0] is the URL of the Solr server
    public static void main(String[] args) throws Exception {
      SolrServer server;
      try {
        server = new CommonsHttpSolrServer(args[0]);
      } catch (Exception e) {
        System.err.println("can't create server using: " + args[0] + " " + e.getMessage());
        throw e;
      }
      // 1000 batches of 1000 documents; every document gets its own weight_* field name
      for (int i = 0; i < 1000; i++) {
        List<SolrInputDocument> batchedDocs = new ArrayList<SolrInputDocument>();
        for (int j = 0; j < 1000; j++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", i * 1000 + j);
          // long field names: hangs after 30 to 50 batches
          doc.addField("weight_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
          // short field names: hangs after about 200 batches
          //doc.addField("weight_" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
          batchedDocs.add(doc);
        }
        try {
          server.add(batchedDocs, true);
          System.err.println("Done with batch=" + i);
          // server.commit(); // doesn't change anything
        } catch (Exception e) {
          System.err.println("batchId=" + i + " bad batch: " + e.getMessage());
          throw e;
        }
      }
    }
  }
Soon the client stalls (and sometimes throws) and Solr freezes. Sometimes the server logs show: java.lang.OutOfMemoryError: PermGen space
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1623) Solr hangs (often throwing
java.lang.OutOfMemoryError: PermGen space) when indexing many different
field names
Posted by "Laurent Chavet (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786185#action_12786185 ]
Laurent Chavet commented on SOLR-1623:
--------------------------------------
Yes, this definitely repros in 1.4.
Unfortunately I think I need a lot of fields; here is what I am trying to do:
I want to store news articles and extract many topics for each story, with a score for each topic of each story.
So, for example, a story might have a topic of Crime with a score of 20.
So what I am doing now is storing:
Field:Topic Value:Crime indexed="true" stored="true" (needs to be searched and retrieved)
Field:Weight_Topic_Crime Value:20 indexed="true" stored="true" (needs to be sorted and retrieved)
Because the topic field can take a lot of different values, with this schema we end up with a lot of fields starting with weight.
Any suggestions on how to achieve the same result in a different way?
Thanks,
Laurent
[jira] Commented: (SOLR-1623) Solr hangs (often throwing
java.lang.OutOfMemoryError: PermGen space) when indexing many different
field names
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786161#action_12786161 ]
Yonik Seeley commented on SOLR-1623:
------------------------------------
This is most likely due to interning of field names. If you really need that many field names, the only option right now is to increase the size of the perm gen.
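For readers unfamiliar with interning, here is a minimal sketch of what it means for field names. It is not Solr or Lucene code; the class and method names are made up for illustration. On the Sun 1.6 JVMs listed in the report, strings interned via String.intern() are kept in the permanent generation, so every distinct field name takes up PermGen space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Solr or Lucene.
class InternDemo {
    // Builds n distinct field names and interns each one, roughly mimicking
    // what happens when every document introduces a new field name.
    static List<String> internFieldNames(int n) {
        List<String> names = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            names.add(("weight_" + i).intern());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = internFieldNames(1000);
        // Interning an equal string returns the same canonical instance,
        // so a reference comparison succeeds.
        String again = new String("weight_42").intern();
        System.out.println(names.get(42) == again);
    }
}
```

The repro in the report creates up to a million such names (1000 batches of 1000 documents), each of them unique.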
[jira] Commented: (SOLR-1623) Solr hangs (often throwing
java.lang.OutOfMemoryError: PermGen space) when indexing many different
field names
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786167#action_12786167 ]
Mark Miller commented on SOLR-1623:
-----------------------------------
What's odd is that he has it marked as affects 1.4 as well - but that doesn't intern to perm gen anymore? Did you really test with 1.4? Or am I missing something?
You can also turn on GC for the perm gen space - not a complete solution, but it can help under the right circumstances (likely in combination with a larger perm gen space).
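The comment does not name specific options. On the Sun 1.6 HotSpot JVMs from the report, the perm gen limit is usually raised with -XX:MaxPermSize=<size>, and with the CMS collector class unloading is usually enabled with -XX:+CMSClassUnloadingEnabled (often together with -XX:+CMSPermGenSweepingEnabled); treat these as an assumption about what is meant and check the documentation for your JVM. To see what limit is in effect, a small sketch using the standard java.lang.management API can list the non-heap memory pools. The class name is hypothetical, and pool names vary by JVM and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Solr.
class NonHeapPoolReport {
    // Returns one line per non-heap memory pool: name, bytes used, and limit.
    static List<String> report() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined limit
            String max = usage.getMax() < 0 ? "undefined" : usage.getMax() + " bytes";
            lines.add(pool.getName() + ": used=" + usage.getUsed() + " bytes, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

Run standalone, this only reports on its own JVM. To watch the JVM that hosts Solr, the same MXBeans would have to be read from inside that process or over JMX.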
[jira] Commented: (SOLR-1623) Solr hangs (often throwing
java.lang.OutOfMemoryError: PermGen space) when indexing many different
field names
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786175#action_12786175 ]
Yonik Seeley commented on SOLR-1623:
------------------------------------
> What's odd is that he has it marked as affects 1.4 as well - but that doesn't intern to perm gen anymore?
The default StringHelper.intern() from Lucene is just a cache - String.intern() is still called.
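To make the point concrete, here is a rough sketch of a cache that sits in front of String.intern(). It is not the actual Lucene StringHelper code; the class name and the HashMap-based cache are assumptions for illustration only. The thing to notice is that a cache miss still ends in a call to String.intern(), so every new field name still reaches the JVM's intern table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Lucene's implementation.
class CachingInterner {
    private final Map<String, String> cache = new HashMap<String, String>();

    // Returns the canonical instance for s. The cache only saves repeated
    // lookups; the first time a name is seen, String.intern() is still called.
    synchronized String intern(String s) {
        String cached = cache.get(s);
        if (cached != null) {
            return cached;
        }
        String canonical = s.intern();
        cache.put(canonical, canonical);
        return canonical;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With a workload like the repro, where almost every field name is new, nearly every call is a miss, so a cache of this shape would not reduce the number of interned strings.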