Posted to solr-user@lucene.apache.org by al...@aim.com on 2013/03/04 04:16:52 UTC

solr cloud index size is too big

Hello,

I had a non-cloud collection with an index size of around 80 GB for 15M documents with solr-4.1.0, so I decided to use SolrCloud with two shards and sent Solr the following command:

 curl 'http://slave:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=1'

I tried to set replicationFactor=0, but that command returned an error. After reindexing into two separate Linux boxes, each running a single Solr instance, I see that the index in each shard is 90 GB instead of the expected 40 GB, even though each shard holds half (7.5M) of the documents.

Any ideas what went wrong?

Thanks.
Alex.
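The sizes in the question can be checked with a little arithmetic. The bash sketch below uses only the figures quoted in the post and assumes that documents split roughly evenly across the two shards and that 80 GB is the size of the single non-cloud index. Under those assumptions each shard is at 225% of its expected size, and both shards together hold about 180 GB, 2.25 times the original index, which suggests a change in what is stored per document rather than in how the documents were split.

```shell
#!/usr/bin/env bash
# Figures from the post (GB). Assumes an even split of documents by shard.
original_index_gb=80   # single non-cloud index, 15M documents
num_shards=2
observed_shard_gb=90   # reported size of each shard after reindexing

# Expected size per shard if the index simply splits in half: 40
expected_shard_gb=$(( original_index_gb / num_shards ))

# Observed per-shard size as a percentage of the expected size: 225
shard_growth_pct=$(( observed_shard_gb * 100 / expected_shard_gb ))

# Total across both shards, to compare with the original index: 180
total_observed_gb=$(( observed_shard_gb * num_shards ))

echo "expected per shard: ${expected_shard_gb} GB"
echo "observed per shard: ${observed_shard_gb} GB (${shard_growth_pct}% of expected)"
echo "total: ${total_observed_gb} GB vs ${original_index_gb} GB before sharding"
```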

Re: solr cloud index size is too big

Posted by Erick Erickson <er...@gmail.com>.
Well, storing term vectors with positions and offsets would definitely make the index bigger. Why don't you just try it and see?
You should be able to see the effects with a reasonable subset of your docs.

Another thing to keep in mind is whether you have any additional stored="true"
fields defined.

Best
Erick
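Erick's suggestion of testing on a subset can be sketched as follows, assuming the same subset of documents is indexed into two hypothetical test cores, one with the old schema and one with term vectors enabled. The bash functions below only measure disk usage with `du`; they know nothing about Solr, and any paths you pass them are placeholders for each core's index directory.

```shell
#!/usr/bin/env bash
# Compare the on-disk size of two index directories, e.g. a test core indexed
# with the old schema and one indexed with term vectors enabled.

# Print the disk usage of a directory in KiB (du -s summarizes, -k uses KiB).
dir_size_kb() {
  du -sk "$1" | cut -f1
}

# Print the size of directory B as an integer percentage of directory A.
size_ratio_pct() {
  local a_kb b_kb d
  for d in "$1" "$2"; do
    if [ ! -d "$d" ]; then
      echo "error: $d is not a directory" >&2
      return 1
    fi
  done
  a_kb=$(dir_size_kb "$1")
  b_kb=$(dir_size_kb "$2")
  if [ "$a_kb" -eq 0 ]; then
    echo "error: $1 reports zero size" >&2
    return 1
  fi
  echo $(( b_kb * 100 / a_kb ))
}
```

For example, `size_ratio_pct /path/to/old-schema/data/index /path/to/term-vectors/data/index` (hypothetical paths) prints how large the second index is relative to the first. Comparing after both cores have finished indexing and committed keeps the two numbers comparable.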


On Mon, Mar 4, 2013 at 5:51 PM, <al...@aim.com> wrote:

> Hi,
>
> It is the index folder; the tlog is only a few MB.
>
> I have analysed all the changes and found that only one field type in the
> schema was changed.
>
> In the non-cloud setup, this field type was
>  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>
> and in the cloud setup it was changed to
>  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" termVectors="true" termPositions="true" termOffsets="true">
>
> so that the FastVectorHighlighter could be used.
>
> Is it possible that this change could double the index size?
>
> Thanks.
> Alex.

Re: solr cloud index size is too big

Posted by al...@aim.com.
Hi,

It is the index folder; the tlog is only a few MB.

I have analysed all the changes and found that only one field type in the schema was changed.

In the non-cloud setup, this field type was
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">

and in the cloud setup it was changed to
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100" termVectors="true" termPositions="true" termOffsets="true">

so that the FastVectorHighlighter could be used.

Is it possible that this change could double the index size?

Thanks.
Alex.
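For context, the FastVectorHighlighter works from term vectors, which is why it requires termVectors, termPositions and termOffsets on the highlighted field; that per-document data is also what the schema change adds to the index. Because the attributes were added to the fieldType, they presumably apply to every field declared with type "text" unless a field overrides them, not only to the field being highlighted. The bash sketch below just builds a select URL that requests this highlighter with hl.useFastVectorHighlighter=true; the host, collection and field names are hypothetical, and the query is assumed to be URL-encoded already.

```shell
#!/usr/bin/env bash
# Build a select URL that asks Solr to highlight with the FastVectorHighlighter.
# Arguments (all placeholders): base URL, collection, field to highlight, and an
# already URL-encoded query string.
fvh_query_url() {
  local base=$1 collection=$2 field=$3 query=$4
  # The FastVectorHighlighter requires the highlighted field to have
  # termVectors, termPositions and termOffsets set to true.
  printf '%s/solr/%s/select?q=%s&hl=true&hl.fl=%s&hl.useFastVectorHighlighter=true\n' \
    "$base" "$collection" "$query" "$field"
}
```

Usage might look like `curl "$(fvh_query_url http://slave:8983 mycollection content solr)"`, where `content` stands in for whichever field is actually highlighted.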

-----Original Message-----
From: Jan Høydahl <ja...@cominvent.com>
To: solr-user <so...@lucene.apache.org>
Sent: Mon, Mar 4, 2013 2:24 pm
Subject: Re: solr cloud index size is too big


Can you tell whether it's the "index" folder that is that large, or whether it includes the "tlog" transaction log folder?
If you have a huge transaction log, you need to start sending hard commits more often during indexing to flush the tlogs.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

Re: solr cloud index size is too big

Posted by Jan Høydahl <ja...@cominvent.com>.
Can you tell whether it's the "index" folder that is that large, or whether it includes the "tlog" transaction log folder?
If you have a huge transaction log, you need to start sending hard commits more often during indexing to flush the tlogs.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
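Jan's check can be sketched in bash, assuming a hypothetical data directory path for one core: the first function prints the size in KiB of its `index` and `tlog` subdirectories, and the second builds the URL for an explicit hard commit (`commit=true` on the `/update` handler), which can be sent with curl periodically during a large indexing run. The host and collection names reuse those from the original post; the data directory path is a placeholder. A periodic hard commit can also be configured with autoCommit in solrconfig.xml, which is not shown here.

```shell
#!/usr/bin/env bash
# Show whether the space is in the Lucene index or in the transaction log, and
# build the URL for an explicit hard commit. Paths and names are placeholders.

# Print "<KiB><TAB><path>" for the index and tlog subdirectories of a data dir.
report_data_dir() {
  local data_dir=$1 sub
  for sub in index tlog; do
    if [ -d "$data_dir/$sub" ]; then
      du -sk "$data_dir/$sub"
    else
      echo "missing: $data_dir/$sub" >&2
    fi
  done
}

# URL that asks the update handler for a hard commit (commit=true).
hard_commit_url() {
  printf '%s/solr/%s/update?commit=true\n' "$1" "$2"
}
```

For example, `report_data_dir /path/to/core/data` followed by `curl "$(hard_commit_url http://slave:8983 mycollection)"`.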
