Posted to solr-user@lucene.apache.org by Andrea Gazzarini <gx...@gmail.com> on 2017/04/18 09:45:06 UTC

Index and query time suggester behavior in a SolrCloud environment

Hi,
I have a project, with SolrCloud, where I'm going to use the Suggester 
component (BlendedInfixLookupFactory with DocumentDictionaryFactory).
Some info:

  * I will have a suggest-only collection, with no NRT requirements
    (indexes will be updated with a daily frequency)
  * I'm not yet sure about the replication factor (I have to do some checks)
  * I'm using Solrj on the client side

After reading some documentation I have a couple of doubts:

  * How does the *suggest.build* command work? Can I issue this command
    to just one node, and have that node forward the request to the
    other nodes (so each of them can build its own portion of the
    suggester index)?
  * How do things work at query time? Can I send a request with
    only suggest.q=... to my /suggest request handler and get back
    distributed suggestions?

Thanks in advance
Andrea

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I also opened https://issues.apache.org/jira/browse/SOLR-10532 to fix
this annoying and confusing behavior of SuggestComponent.

On Thu, Apr 20, 2017 at 8:40 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
> Ah great, many thanks again!



-- 
Regards,
Shalin Shekhar Mangar.

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Andrea Gazzarini <gx...@gmail.com>.
Ah great, many thanks again!


On 20/04/17 17:09, Shalin Shekhar Mangar wrote:
> Hi Andrea,
>
> Looks like I gave you some bad information. I looked at the code and
> ran a test locally. The suggest.build and suggest.reload params are in
> fact distributed to all shards, but only to one replica of each
> shard. This is still bad enough and you should use buildOnOptimize as
> suggested, but I just wanted to correct the wrong information I gave
> earlier.


Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi Andrea,

Looks like I gave you some bad information. I looked at the code and
ran a test locally. The suggest.build and suggest.reload params are in
fact distributed to all shards, but only to one replica of each
shard. This is still bad enough and you should use buildOnOptimize as
suggested, but I just wanted to correct the wrong information I gave
earlier.
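
Since the build reaches only one replica per shard, one conceivable
workaround (not the one recommended in this thread, which is
buildOnOptimize) would be to address every replica core directly. The
sketch below only constructs the request URLs. The host names, core
names and the dictionary name "mySuggester" are hypothetical, and it
assumes that distrib=false keeps each request on the addressed core.

```python
from urllib.parse import urlencode


def build_urls(replica_core_urls, dictionary, handler="/suggest"):
    """Return one suggest.build URL per replica core.

    replica_core_urls: base URLs of the individual cores (hypothetical values).
    dictionary: suggester name as configured in solrconfig.xml (assumed).
    """
    params = urlencode({
        "suggest": "true",
        "suggest.build": "true",
        "suggest.dictionary": dictionary,
        "distrib": "false",  # assumption: keep the request on this core only
    })
    return [f"{base.rstrip('/')}{handler}?{params}" for base in replica_core_urls]


# Hypothetical layout: 2 shards x 2 replicas.
cores = [
    "http://solr1:8983/solr/suggest_shard1_replica1",
    "http://solr2:8983/solr/suggest_shard1_replica2",
    "http://solr1:8983/solr/suggest_shard2_replica1",
    "http://solr2:8983/solr/suggest_shard2_replica2",
]
for url in build_urls(cores, "mySuggester"):
    print(url)
```

In a real cluster the list of cores would have to come from the
cluster state rather than being hard-coded, which is part of why the
configuration-based options are simpler.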

On Thu, Apr 20, 2017 at 6:23 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
> Perfect, I don't need NRT at this moment so that fits perfectly
>
> Thanks,
> Andrea



-- 
Regards,
Shalin Shekhar Mangar.

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Andrea Gazzarini <gx...@gmail.com>.
Perfect, I don't need NRT at this moment so that fits perfectly

Thanks,
Andrea

On 20/04/17 14:37, Shalin Shekhar Mangar wrote:
> Yeah, if it is just once a day then you can afford to do an optimize.
> For a more NRT indexing approach, I wouldn't recommend optimize at
> all.


Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Yeah, if it is just once a day then you can afford to do an optimize.
For a more NRT indexing approach, I wouldn't recommend optimize at
all.

On Thu, Apr 20, 2017 at 5:29 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
> Ok, many thanks
>
> I have read that it is better to rely on background merging than to
> issue explicit optimizes, but I think that in this case one optimize
> a day shouldn't be a problem.
>
> Did I understand you correctly?
>
> Thanks again,
> Andrea



-- 
Regards,
Shalin Shekhar Mangar.

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Andrea Gazzarini <gx...@gmail.com>.
Ok, many thanks

I have read that it is better to rely on background merging than to
issue explicit optimizes, but I think that in this case one optimize
a day shouldn't be a problem.

Did I understand you correctly?

Thanks again,
Andrea

On 20/04/17 13:17, Shalin Shekhar Mangar wrote:
> Can the client not send an optimize command explicitly after all
> indexing/deleting is complete?


Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Apr 20, 2017 at 4:27 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
> Hi Shalin,
> many thanks for your response. This is my scenario:
>
>  * I build my index once in a day, it could be a delta or a full
>    re-index.In any case, that takes some time;
>  * I have an auto-commit (hard, no soft-commits) set to a given period
>    and during the indexing cycle, several hard commits are executed. So
>    the buildOnCommit (I guess) it's not an option because it will
>    rebuild that suggest index several times.

Yes, you're right, multiple commits will cause the suggest index to be
rebuilt needlessly.

>
> But I have a doubt on the second point: the reference guide says:
>
> /"Use buildOnCommit to rebuild the dictionary with every soft-commit"/
>
> As I said, I have no soft-commits, only hard commits: does the rebuild happen
> after hard commits (with buildOnCommit=true)?

I peeked at the code and yes, the rebuild actually happens whenever a
new searcher is created, which means that it happens on soft-commits or
on a hard commit with openSearcher=true.
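
To make the consequence concrete, here is a small simplified model
(not Solr's actual code) that counts rebuilds over an indexing cycle.
The event format and function names are invented for illustration, and
it assumes that an optimize opens a new searcher and that
buildOnOptimize reacts only to optimize events.

```python
def opens_searcher(event):
    """True if the event opens a new searcher, per the explanation above."""
    kind = event["type"]
    if kind == "soft_commit":
        return True
    if kind in ("hard_commit", "optimize"):
        # Assumption: openSearcher defaults to true when not specified.
        return event.get("open_searcher", True)
    return False


def count_rebuilds(events, build_on_commit=False, build_on_optimize=False):
    """Count suggester rebuilds in this simplified model."""
    rebuilds = 0
    for event in events:
        if not opens_searcher(event):
            continue
        if build_on_commit:
            rebuilds += 1  # every new searcher triggers a rebuild
        elif build_on_optimize and event["type"] == "optimize":
            rebuilds += 1  # only the optimize triggers a rebuild
    return rebuilds


# Hypothetical daily cycle: three auto hard commits, then one explicit optimize.
cycle = [
    {"type": "hard_commit", "open_searcher": True},
    {"type": "hard_commit", "open_searcher": True},
    {"type": "hard_commit", "open_searcher": True},
    {"type": "optimize"},
]
print(count_rebuilds(cycle, build_on_commit=True))    # 4
print(count_rebuilds(cycle, build_on_optimize=True))  # 1
```

In this model buildOnCommit rebuilds once per searcher-opening commit,
while buildOnOptimize rebuilds once at the end of the cycle.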

>
> The other option, buildOnOptimize, makes me curious: in the scenario above,
> let's say documents are indexed / deleted every morning at 4am, in a window
> that takes 1 to 3 hours at most. How can I build the suggest index (more or
> less) just after that window? I'm ok if the build happens after a reasonable
> delay (e.g. 1, max 2 hours).

Can the client not send an optimize command explicitly after all
indexing/deleting is complete?
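
As a rough sketch of that flow: index first, then send a single
optimize to the update handler. The base URL, the collection name
"suggestions" and the two callables are placeholders; only the URL
construction is concrete here. SolrJ users would presumably call the
client's optimize method instead of building a URL by hand.

```python
from urllib.parse import urlencode


def optimize_url(solr_base, collection, wait_searcher=True):
    """Build the update URL that asks Solr to optimize the collection."""
    params = urlencode({
        "optimize": "true",
        "waitSearcher": "true" if wait_searcher else "false",
    })
    return f"{solr_base.rstrip('/')}/{collection}/update?{params}"


def run_daily_cycle(index_documents, send):
    """Index everything first, then send a single optimize.

    index_documents: callable doing the delta or full re-index (hypothetical).
    send: callable taking a URL and issuing the HTTP request (hypothetical).
    """
    index_documents()
    url = optimize_url("http://localhost:8983/solr", "suggestions")
    send(url)
    return url


# Dry run: no indexing, and the URL is printed rather than requested.
run_daily_cycle(lambda: None, print)
```

With buildOnOptimize=true in the suggester configuration, that one
optimize would be the only event in the cycle that rebuilds the
suggester.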

>
> Many thanks,
> Andrea



-- 
Regards,
Shalin Shekhar Mangar.

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Andrea Gazzarini <gx...@gmail.com>.
Hi Shalin,
many thanks for your response. This is my scenario:

  * I build my index once a day; it could be a delta or a full
    re-index. In any case, that takes some time;
  * I have an auto-commit (hard, no soft-commits) set to a given period,
    and during the indexing cycle several hard commits are executed. So
    buildOnCommit is (I guess) not an option, because it would
    rebuild the suggest index several times.

But I have a doubt on the second point: the reference guide says:

/"Use buildOnCommit to rebuild the dictionary with every soft-commit"/

As I said, I have no soft-commits, only hard commits: does the rebuild
happen after hard commits (with buildOnCommit=true)?

The other option, buildOnOptimize, makes me curious: in the scenario
above, let's say documents are indexed / deleted every morning at 4am,
in a window that takes 1 to 3 hours at most. How can I build the suggest
index (more or less) just after that window? I'm ok if the build happens
after a reasonable delay (e.g. 1, max 2 hours).

Many thanks,
Andrea


On 20/04/17 11:11, Shalin Shekhar Mangar wrote:
> Comments inline:
>
>
> On Wed, Apr 19, 2017 at 2:46 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
>> Hi,
>> any help out there?
>>
>> BTW I forgot the Solr version: 6.5.0
>>
>> Thanks,
>> Andrea
>>
>>
>> On 18/04/17 11:45, Andrea Gazzarini wrote:
>>> Hi,
>>> I have a project, with SolrCloud, where I'm going to use the Suggester
>>> component (BlendedInfixLookupFactory with DocumentDictionaryFactory).
>>> Some info:
>>>
>>>    * I will have a suggest-only collection, with no NRT requirements
>>>      (indexes will be updated with a daily frequency)
>>>    * I'm not yet sure about the replication factor (I have to do some
>>>      checks)
>>>    * I'm using Solrj on the client side
>>>
>>> After reading some documentation I have a couple of doubts:
>>>
>>>    * How does the *suggest.build* command work? Can I issue this
>>>      command to just one node, and have that node forward the
>>>      request to the other nodes (so each of them can build its own
>>>      portion of the suggester index)?
> The suggest.build only builds locally in the node to which you sent
> the request. This makes it a bit tricky because if you send that
> command with just the collection name, it will be resolved to a local
> core and executed there. The safest/easiest way is to set
> buildOnCommit or buildOnOptimize in the suggester configuration.
>
>>>    * How do things work at query time? Can I send a request
>>>      with only suggest.q=... to my /suggest request handler and get
>>>      back distributed suggestions?
> The SuggestComponent works in distributed mode and it will request and
> merge results from all shards.
>
>>> Thanks in advance
>>> Andrea
>>
>
>


Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Comments inline:


On Wed, Apr 19, 2017 at 2:46 PM, Andrea Gazzarini <gx...@gmail.com> wrote:
> Hi,
> any help out there?
>
> BTW I forgot the Solr version: 6.5.0
>
> Thanks,
> Andrea
>
>
> On 18/04/17 11:45, Andrea Gazzarini wrote:
>>
>> Hi,
>> I have a project, with SolrCloud, where I'm going to use the Suggester
>> component (BlendedInfixLookupFactory with DocumentDictionaryFactory).
>> Some info:
>>
>>   * I will have a suggest-only collection, with no NRT requirements
>>     (indexes will be updated with a daily frequency)
>>   * I'm not yet sure about the replication factor (I have to do some
>>     checks)
>>   * I'm using Solrj on the client side
>>
>> After reading some documentation I have a couple of doubts:
>>
>>   * How does the *suggest.build* command work? Can I issue this
>>     command to just one node, and have that node forward the
>>     request to the other nodes (so each of them can build its own
>>     portion of the suggester index)?

The suggest.build only builds locally in the node to which you sent
the request. This makes it a bit tricky because if you send that
command with just the collection name, it will be resolved to a local
core and executed there. The safest/easiest way is to set
buildOnCommit or buildOnOptimize in the suggester configuration.

>>   * How do things work at query time? Can I send a request
>>     with only suggest.q=... to my /suggest request handler and get
>>     back distributed suggestions?

The SuggestComponent works in distributed mode and it will request and
merge results from all shards.
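
For illustration, a query against the collection could be built as
below. This is a sketch: the base URL and the collection name
"suggestions" are hypothetical, and it assumes the /suggest handler's
defaults already set suggest=true and a default dictionary, so that
suggest.q alone is enough.

```python
from urllib.parse import urlencode


def suggest_query_url(solr_base, collection, text, dictionary=None, count=None):
    """Build a /suggest request URL addressed to the collection.

    dictionary and count are optional overrides of the handler defaults.
    """
    params = {"suggest.q": text}
    if dictionary is not None:
        params["suggest.dictionary"] = dictionary
    if count is not None:
        params["suggest.count"] = str(count)
    return f"{solr_base.rstrip('/')}/{collection}/suggest?{urlencode(params)}"


print(suggest_query_url("http://localhost:8983/solr", "suggestions", "solr cl"))
```

Addressing the collection rather than a single core leaves it to Solr
to fan the request out and merge the per-shard suggestions.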

>>
>> Thanks in advance
>> Andrea
>
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: Index and query time suggester behavior in a SolrCloud environment

Posted by Andrea Gazzarini <gx...@gmail.com>.
Hi,
any help out there?

BTW I forgot the Solr version: 6.5.0

Thanks,
Andrea

On 18/04/17 11:45, Andrea Gazzarini wrote:
> Hi,
> I have a project, with SolrCloud, where I'm going to use the Suggester 
> component (BlendedInfixLookupFactory with DocumentDictionaryFactory).
> Some info:
>
>   * I will have a suggest-only collection, with no NRT requirements
>     (indexes will be updated with a daily frequency)
>   * I'm not yet sure about the replication factor (I have to do some
>     checks)
>   * I'm using Solrj on the client side
>
> After reading some documentation I have a couple of doubts:
>
>   * How does the *suggest.build* command work? Can I issue this
>     command to just one node, and have that node forward the
>     request to the other nodes (so each of them can build its own
>     portion of the suggester index)?
>   * How do things work at query time? Can I send a request
>     with only suggest.q=... to my /suggest request handler and get
>     back distributed suggestions?
>
> Thanks in advance
> Andrea