Posted to solr-user@lucene.apache.org by glumet <ja...@gmail.com> on 2014/09/07 11:31:39 UTC

Nutch + Solr - Indexer causes java.lang.OutOfMemoryError: Java heap space

Hello everyone, 

I have configured my 2 servers to run in distributed mode (with Hadoop), and
my crawling setup is Nutch 2.2.1 with HBase as the storage backend and Solr,
where Solr runs under Tomcat. The problem appears every time I try the last
step, i.e. when I want to index the data from HBase into Solr: the job fails
with the error shown in *[1]*. I tried adding CATALINA_OPTS (or JAVA_OPTS)
like this:

CATALINA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m
-XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m
-XX:+CMSClassUnloadingEnabled"

to Tomcat's catalina.sh script and started the server with that script, but
it didn't help. I also added the properties shown in *[2]* to my
nutch-site.xml file, but it ended with an OutOfMemoryError again. Can you
help me, please?
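
In case it is relevant: as far as I know, Tomcat also sources an optional
bin/setenv.sh on startup, so instead of editing catalina.sh directly the same
options can be put there. A minimal sketch of what I mean (the heap sizes are
just the values I am currently trying):

# $CATALINA_BASE/bin/setenv.sh -- read by catalina.sh if it exists
CATALINA_OPTS="-XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m \
  -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 \
  -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"
export CATALINA_OPTS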

*[1]*
2014-09-06 22:52:50,683 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2367)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587)
	at java.lang.StringBuffer.append(StringBuffer.java:332)
	at java.io.StringWriter.write(StringWriter.java:77)
	at org.apache.solr.common.util.XML.escape(XML.java:204)
	at org.apache.solr.common.util.XML.escapeCharData(XML.java:77)
	at org.apache.solr.common.util.XML.writeXML(XML.java:147)
	at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:161)
	at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:129)
	at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:355)
	at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:271)
	at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:66)
	at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:94)
	at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:104)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:247)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:96)
	at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117)
	at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
	at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
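
One thing I notice is that the error is logged by org.apache.hadoop.mapred.Child,
so I suspect it is the Hadoop task JVM, not Tomcat, that runs out of heap.
Would raising the task heap via mapred.child.java.opts (e.g. in
mapred-site.xml) be the right knob? A sketch of what I have in mind (the
2000m value is only a guess on my part):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2000m</value>
  <description>JVM options for the map/reduce task child processes; the
  indexer map task that throws the OutOfMemoryError runs in one of these,
  not inside Tomcat.</description>
</property>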

*[2]*

<property>
  <name>http.content.limit</name>
  <value>150000000</value>
  <description>The length limit for downloaded content using the http
  protocol, in bytes. If this value is nonnegative (>=0), content longer
  than it will be truncated; otherwise, no truncation at all. Do not
  confuse this setting with the file.content.limit setting.
  For our purposes it is raised above the default of 64 * 1024 (twice the
  default would be 128 * 1024) in order to parse big pages.
  </description>
</property>

<property>
   <name>indexer.max.tokens</name>
   <value>100000</value>
</property>

<property>
  <name>http.timeout</name>
  <value>50000</value>
  <description>The default network timeout, in milliseconds.</description>
</property>

<property>
  <name>solr.commit.size</name>
  <value>100</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running
  out of memory. NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>
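
If it helps to see the batching idea outside of Nutch, here is a rough
standalone SolrJ 4.x sketch of what I understand solr.commit.size to control
(this is not Nutch's actual code; the URL, field names, and the 100-document
batch size are placeholders of mine):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

/** Sketch of batched adds with SolrJ 4.x, the API seen in the trace. */
public class BatchedIndexSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; point this at the real core, e.g. .../solr/collection1.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    int batchSize = 100; // analogous to solr.commit.size above
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 10000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);                 // placeholder fields
      doc.addField("content", "example content " + i);
      batch.add(doc);
      if (batch.size() >= batchSize) {
        server.add(batch); // one bounded XML update request per batch
        batch.clear();     // keeps the client-side buffer (and heap) small
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);   // flush the remainder
    }
    server.commit();       // one explicit commit at the end
    server.shutdown();
  }
}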


