Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2014/02/27 16:03:49 UTC

How To Test SolrCloud Indexing Limits

Hi;

I'm trying to index 2 million documents into SolrCloud via MapReduce jobs
(a really small number of documents for my system). However, I get the
following error in my tasks when I increase the number of documents added per batch:

java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be
cast to java.lang.Exception
	at org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException.<init>(CloudSolrServer.java:484)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:351)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
	at org.apache.nutch.indexwriter.solrcloud.SolrCloudIndexWriter.close(SolrCloudIndexWriter.java:95)
	at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:114)
	at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:649)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
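
(A side note on the trace: the ClassCastException itself is secondary; the
underlying failure is the OutOfMemoryError that RouteException fails to wrap,
i.e. the map task is running out of heap. One knob to consider is the Hadoop
1.x child-task heap setting; the 2g value below is only an illustrative
assumption, not a recommendation:)

```xml
<!-- mapred-site.xml: JVM options for child (map/reduce) tasks.
     Can also be passed per job, e.g. -D mapred.child.java.opts=-Xmx2g -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2g</value>
</property>
```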


I am using Solr 4.5.1, and I do not see any errors on my SolrCloud
nodes. I want to test my indexing capability and have changed some
parameters to tune it. Is there any advice on the autocommit and
softcommit maxTime/maxDocs parameters to test with? I don't need exact
numbers; I just want a policy to follow, something like: raise the
autocommit maxDocs, and disable softcommit and its maxTime (or maybe
there is no free lunch and I should try everything!).
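
For reference, the commit knobs in question live in solrconfig.xml. A
minimal sketch of the "large hard commits, no soft commits" policy (the
interval values are illustrative assumptions, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes segments and truncates the transaction log.
       Large maxDocs/maxTime values favor raw indexing throughput. -->
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>60000</maxTime> <!-- ms -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit only controls search visibility; a maxTime of -1
       disables it, avoiding searcher reopens during the benchmark. -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```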

I am not asking this for production purposes; I know I should test
more parameters and tune my system for that. I just want to find my
indexing limits.


Thanks;

Furkan KAMACI

Re: How To Test SolrCloud Indexing Limits

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Markus;

I am already using the existing functionality in Nutch. I have
measured the effect of the batch size, and I think the map tasks
should be tuned.

Thanks;
Furkan KAMACI


2014-02-27 17:21 GMT+02:00 Markus Jelsma <ma...@openindex.io>:

> Something must be eating your memory in your SolrCloud indexer in Nutch.
> We have our own SolrCloud indexer in Nutch and it uses extremely little
> memory. You either have a leak or your batch size is too large.

RE: How To Test SolrCloud Indexing Limits

Posted by Markus Jelsma <ma...@openindex.io>.
Something must be eating your memory in your SolrCloud indexer in Nutch. We have our own SolrCloud indexer in Nutch and it uses extremely little memory. You either have a leak or your batch size is too large.
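
The bounded-batching idea can be sketched without any Solr dependency; the
String documents and the no-op flush below are stand-ins for
SolrInputDocument and CloudSolrServer.add(...), so this only models the
memory behavior, not the real SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of bounded batching: buffer documents and flush every
// `batchSize` adds, so the client never holds the whole job in memory.
class BatchingIndexer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    int flushes = 0; // number of batches sent so far

    BatchingIndexer(int batchSize) { this.batchSize = batchSize; }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // In a real indexer this would be server.add(buffer) followed by
        // the server-side commit policy; here we just drop the batch to
        // model releasing the memory.
        buffer.clear();
        flushes++;
    }

    int buffered() { return buffer.size(); }
}
```

The point is that client-side memory stays proportional to batchSize no
matter how many documents the job emits; an indexer whose close() flushes
one giant buffered batch would explain an OOM at exactly that call site.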
 