Posted to solr-user@lucene.apache.org by Kent Mu <so...@gmail.com> on 2016/07/11 15:17:22 UTC

solrcloud consumes more time than solr when write index

Hi friends!

solr version: 4.9.0.

We use both Solr and SolrCloud in our project, that is, standalone Solr and
SolrCloud run side by side.
However, we have found that SolrCloud takes much longer than Solr to write
the index: nearly 5 times longer or more. I wonder why that is?

In our project, we have a scheduled job that adds to the index and then calls
the method "optimize(false, true, 2)" to optimize the newly added index.
I wonder whether this is caused by SolrCloud internals: when writing the index,
does SolrCloud need to decide which shard each document should be stored in?
And when optimizing, does the replica need some time to synchronize the data
from the leader?
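
For readers following along: assuming the call above is SolrJ's optimize(waitFlush, waitSearcher, maxSegments), the arguments false, true, 2 ask Solr to merge the index down to at most 2 segments and to wait for the new searcher. Below is a minimal Python sketch of what should be the equivalent update-handler request. The host, port, and collection name are placeholders that do not come from this thread, and waitFlush is left out of the sketch.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, wait_searcher=True, max_segments=2):
    """Build the update-handler URL for an explicit optimize request.

    base_url and collection are hypothetical placeholders; only the
    optimize, waitSearcher and maxSegments parameters are used here.
    """
    params = {
        "optimize": "true",
        "waitSearcher": "true" if wait_searcher else "false",
        "maxSegments": str(max_segments),
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and collection name, for illustration only.
    print(optimize_url("http://localhost:8983/solr", "collection1"))
```

Sending this request to a SolrCloud collection triggers the optimize on the cores behind it, which is where the replica-related cost discussed in the replies comes in.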

And what about queries? Will SolrCloud also take more time than
Solr when querying data?

Re: solrcloud consumes more time than solr when write index

Posted by Kent Mu <so...@gmail.com>.
Correcting the URL:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3CCAMCstK6rv0NWH3tqG0MBo%3D1kccDHTH4JQP-sNFvTuEzd2mUYFA%40mail.gmail.com%3E

2016-07-14 1:17 GMT+08:00 Jeff Wartes <jw...@whitepages.com>:

> There’s another thread on this list going on right now touching on the
> need to optimize, might be worth reading.
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3C61f3d01f-c3ef-2d71-7112-6a88b01458f6@elyograg.org%3E

Re: solrcloud consumes more time than solr when write index

Posted by Kent Mu <so...@gmail.com>.
Thanks a lot! I see now.
Well, I have come across another issue: there are too many connections in
SolrCloud. I have raised it on the mailing list. Please help me!
Looking forward to your reply.

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/ajax/%3CCAMCstK6rv0NWH3tqG0MBo%3D1kccDHTH4JQP-sNFvTuEzd2mUYFA%40mail.gmail.com%3E

Thanks
Kent

2016-07-14 1:17 GMT+08:00 Jeff Wartes <jw...@whitepages.com>:

> There’s another thread on this list going on right now touching on the
> need to optimize, might be worth reading.
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3C61f3d01f-c3ef-2d71-7112-6a88b01458f6@elyograg.org%3E

Re: solrcloud consumes more time than solr when write index

Posted by Jeff Wartes <jw...@whitepages.com>.
There’s another thread on this list going on right now touching on the need to optimize, might be worth reading.
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3C61f3d01f-c3ef-2d71-7112-6a88b01458f6@elyograg.org%3E


On 7/12/16, 6:25 PM, "Kent Mu" <so...@gmail.com> wrote:

>Dear Mr. Wartes,
>Thanks for your reply. Well, I see. For Solr we do have replicas, and for
>SolrCloud we have 5 shards, each with one leader and one replica. The index
>holds nearly 100 million records. Do you mean we do not need
>to optimize the index data?
>
>Thanks!
>Kent


Re: solrcloud consumes more time than solr when write index

Posted by Kent Mu <so...@gmail.com>.
Dear Mr. Wartes,
Thanks for your reply. Well, I see. For Solr we do have replicas, and for
SolrCloud we have 5 shards, each with one leader and one replica. The index
holds nearly 100 million records. Do you mean we do not need
to optimize the index data?
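
To put rough numbers on this layout: 5 shards with one leader and one replica each means 10 cores. The quoted reply below notes that some Solr versions optimized replicas one after another while others ran them at the same time, and that the behaviour in 4.9 is not certain. The Python sketch below only compares those two cases. The 60 minutes per core is a made-up figure, not a measurement from this cluster.

```python
def optimize_wall_time(core_minutes, parallel):
    """Wall-clock time for optimizing every core.

    Serial: cores are optimized one after another, so the times add up.
    Parallel: all cores run at once, so the slowest core decides.
    """
    return max(core_minutes) if parallel else sum(core_minutes)


# 5 shards x (leader + replica) = 10 cores, named as in the quoted reply.
# 60.0 minutes per core is a hypothetical value for illustration.
cores = {
    f"shard{shard}_replica{replica}": 60.0
    for shard in range(1, 6)
    for replica in (1, 2)
}

serial_minutes = optimize_wall_time(list(cores.values()), parallel=False)
parallel_minutes = optimize_wall_time(list(cores.values()), parallel=True)

if __name__ == "__main__":
    print(len(cores), serial_minutes, parallel_minutes)
```

Under these assumptions a serial optimize takes ten times as long as a parallel one, which is in the same range as the slowdown reported in the original post, though that alone does not prove it is the cause.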

Thanks!
Kent

2016-07-12 23:02 GMT+08:00 Jeff Wartes <jw...@whitepages.com>:

> Well, two thoughts:
>
>
> 1. If you’re not using solrcloud, presumably you don’t have any replicas.
> If you are, presumably you do. This makes for a biased comparison, because
> SolrCloud won’t acknowledge a write until it’s been safely written to all
> replicas. In short, solrcloud write time is max(per-replica write time).
> The more replicas you add, the bigger the chance some replica randomly
> takes longer (gc pause, perhaps?), and the longer your overall write time,
> assuming a fixed number of indexing threads.
> 2. The parallelism of the optimize operation across replicas has gone back
> and forth a bit, and I’m not sure what it was doing in 4.9. However, at one
> point the optimize happened per-replica, serially. So it’d do
> shard1_replica1, then when that was done, do shard1_replica2, then
> shard2_replica1, etc. Other versions of Solr would do those at the same
> time. Again, I don’t know if you’re comparing to a non-replicated solr
> index, but that could explain some of the difference.
>
> There’s a sort of an obligatory comment at this point that optimize
> doesn’t necessarily save you a lot. There are certainly cases where it
> does, but if you haven’t already, you’ll want to validate that you have one
> of them and that you’re not just doing unnecessary work.

Re: solrcloud consumes more time than solr when write index

Posted by Jeff Wartes <jw...@whitepages.com>.
Well, two thoughts:


1. If you’re not using solrcloud, presumably you don’t have any replicas. If you are, presumably you do. This makes for a biased comparison, because SolrCloud won’t acknowledge a write until it’s been safely written to all replicas. In short, solrcloud write time is max(per-replica write time). The more replicas you add, the bigger the chance some replica randomly takes longer (gc pause, perhaps?), and the longer your overall write time, assuming a fixed number of indexing threads.
2. The parallelism of the optimize operation across replicas has gone back and forth a bit, and I’m not sure what it was doing in 4.9. However, at one point the optimize happened per-replica, serially. So it’d do shard1_replica1, then when that was done, do shard1_replica2, then shard2_replica1, etc. Other versions of Solr would do those at the same time. Again, I don’t know if you’re comparing to a non-replicated solr index, but that could explain some of the difference.
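
Point 1 can be illustrated with a small simulation. This is only a sketch under invented numbers: each replica normally takes 10 ms per write and has a 2% chance of a 200 ms stall (standing in for something like a GC pause). None of these figures are measurements from a real cluster.

```python
import random


def simulated_write_time(num_replicas, rng, base_ms=10.0, pause_ms=200.0,
                         pause_prob=0.02):
    """Latency of one write: the slowest replica decides when it is acknowledged."""
    times = []
    for _ in range(num_replicas):
        t = base_ms
        if rng.random() < pause_prob:  # hypothetical occasional stall
            t += pause_ms
        times.append(t)
    return max(times)


def mean_write_time(num_replicas, trials=20000, seed=42):
    """Average simulated write latency over many writes."""
    rng = random.Random(seed)
    total = sum(simulated_write_time(num_replicas, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "replica(s):", round(mean_write_time(n), 2), "ms on average")
```

With these assumptions the expected latency is 10 + 200 * (1 - 0.98^n) ms for n replicas, so the average rises with every replica added even though no individual replica got slower.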

There’s a sort of an obligatory comment at this point that optimize doesn’t necessarily save you a lot. There are certainly cases where it does, but if you haven’t already, you’ll want to validate that you have one of them and that you’re not just doing unnecessary work.
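
One hedged way to check whether you have one of those cases is to look at how many deleted documents and segments the index carries, since purging deleted documents and merging segments is most of what optimize does. The Python sketch below assumes you have already fetched the "index" section of a Luke request handler response (admin/luke) and that it has numDocs, maxDoc, and segmentCount fields. The sample numbers are invented.

```python
def index_stats(luke_index):
    """Summarize how much an optimize could reclaim, from Luke-style index info.

    luke_index is assumed to be a dict with numDocs, maxDoc and
    (optionally) segmentCount; these field names are an assumption
    about the Luke handler's output.
    """
    max_doc = luke_index["maxDoc"]
    num_docs = luke_index["numDocs"]
    deleted = max_doc - num_docs
    ratio = deleted / max_doc if max_doc else 0.0
    return {
        "deleted": deleted,
        "deleted_ratio": ratio,
        "segments": luke_index.get("segmentCount"),
    }


# Hypothetical sample, not real output from the cluster in this thread.
sample = {"numDocs": 95_000_000, "maxDoc": 100_000_000, "segmentCount": 23}
stats = index_stats(sample)

if __name__ == "__main__":
    print(stats)
```

If the deleted ratio is small and the segment count is already modest, the scheduled optimize is probably mostly unnecessary work.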


On 7/12/16, 7:41 AM, "Kent Mu" <so...@gmail.com> wrote:

>Hello, has anybody else come across this issue? Can anybody help me?


Re: solrcloud consumes more time than solr when write index

Posted by Kent Mu <so...@gmail.com>.
Hello, has anybody else come across this issue? Can anybody help me?

2016-07-11 23:17 GMT+08:00 Kent Mu <so...@gmail.com>:

> Hi friends!
>
> solr version: 4.9.0.
>
> We use both Solr and SolrCloud in our project, that is, standalone Solr and
> SolrCloud run side by side.
> However, we have found that SolrCloud takes much longer than Solr to write
> the index: nearly 5 times longer or more. I wonder why that is?
>
> In our project, we have a scheduled job that adds to the index and then calls
> the method "optimize(false, true, 2)" to optimize the newly added index.
> I wonder whether this is caused by SolrCloud internals: when writing the index,
> does SolrCloud need to decide which shard each document should be stored in?
> And when optimizing, does the replica need some time to synchronize the data
> from the leader?
>
> And what about queries? Will SolrCloud also take more time than
> Solr when querying data?
>