Posted to solr-user@lucene.apache.org by Nitin Solanki <ni...@gmail.com> on 2015/01/20 08:28:27 UTC

Issue : Replacing ID with another will degrade performance in Solr?

Hi,
I am working on Solr 4.10.2 and have run into a *performance issue*. I have
indexed 600MB of data on 4 shards, with a single replica each. I have
defined 2 fields (ngram and frequency). I removed the ID field and replaced
it with the ngram field as the unique key. Since then, search performance
has dropped: *QTime = 134 ms*, which is not good enough for my task.

*schema.xml (sample part)*:
*ngram field*:
<field name="ngram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>

I have posted the same problem on Stack Overflow
<http://stackoverflow.com/questions/27983291/replacing-id-with-another-will-degrade-performance-in-solr/27984428?noredirect=1#comment44431492_27984428>
but was not able to get a correct solution. Please help me.

Thanks and Regards,
 Nitin Solanki.

Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Nitin Solanki <ni...@gmail.com>.
Does anyone have an answer to the question I asked on 20th Jan 2015 at 7:48 PM?

On Tue, Jan 20, 2015 at 11:59 PM, Nitin Solanki <ni...@gmail.com>
wrote:

> Okay, no problem. Could somebody please check the question I mailed on
> 20th Jan 2015 at 7:48 PM, where I posted it along with 2
> attachments? I am also waiting to see whether Shalin is able to answer.
>
> On Tue, Jan 20, 2015 at 11:49 PM, Shawn Heisey <ap...@elyograg.org>
> wrote:
>
>> On 1/20/2015 11:11 AM, Nitin Solanki wrote:
>> > Thanks a lot, Shawn. Is there any way to reduce the time it takes to
>> > retrieve suggestions?
>>
>> I know almost nothing about how to use the suggester and spellcheck
>> features of Solr.  I do know that the suggester is based on spellcheck.
>> I have a spellcheck config in my solrconfig.xml, but we've never used
>> it, and so I don't even know if it's any good.
>>
>> Thanks,
>> Shawn
>>
>>
>

Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Nitin Solanki <ni...@gmail.com>.
Okay, no problem. Could somebody please check the question I mailed on
20th Jan 2015 at 7:48 PM, where I posted it along with 2
attachments? I am also waiting to see whether Shalin is able to answer.

On Tue, Jan 20, 2015 at 11:49 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 1/20/2015 11:11 AM, Nitin Solanki wrote:
> > Thanks a lot, Shawn. Is there any way to reduce the time it takes to
> > retrieve suggestions?
>
> I know almost nothing about how to use the suggester and spellcheck
> features of Solr.  I do know that the suggester is based on spellcheck.
> I have a spellcheck config in my solrconfig.xml, but we've never used
> it, and so I don't even know if it's any good.
>
> Thanks,
> Shawn
>
>

Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/20/2015 11:11 AM, Nitin Solanki wrote:
> Thanks a lot, Shawn. Is there any way to reduce the time it takes to
> retrieve suggestions?

I know almost nothing about how to use the suggester and spellcheck
features of Solr.  I do know that the suggester is based on spellcheck. 
I have a spellcheck config in my solrconfig.xml, but we've never used
it, and so I don't even know if it's any good.

Thanks,
Shawn


Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Nitin Solanki <ni...@gmail.com>.
Thanks a lot, Shawn. Is there any way to reduce the time it takes to
retrieve suggestions?

On Tue, Jan 20, 2015 at 9:33 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 1/20/2015 7:18 AM, Nitin Solanki wrote:
> > Thanks, and sorry about Stack Overflow. You are saying to use the "string"
> > type, but I have used solr.ShingleFilterFactory as a filter to break a
> > string into ngrams.
> > I want to build query correction just like Google's "Did you mean".
>
> Shalin is saying that you can't use a tokenized fieldType for the
> *uniqueKey*.  For most other Solr features, it is acceptable (and
> extremely commonplace) to use fully tokenized and even multi-value
> fields, but you must use a single-value simple field type for uniqueKey,
> to avoid ANY possibility of duplicate uniqueKey field values resulting
> from multiple inputs.  If you use TextField for a uniqueKey, you're
> likely to have problems.  Only StrField or one of the numeric types
> should be used for that field.
>
> Thanks,
> Shawn
>
>

Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/20/2015 7:18 AM, Nitin Solanki wrote:
> Thanks, and sorry about Stack Overflow. You are saying to use the "string"
> type, but I have used solr.ShingleFilterFactory as a filter to break a
> string into ngrams.
> I want to build query correction just like Google's "Did you mean".

Shalin is saying that you can't use a tokenized fieldType for the
*uniqueKey*.  For most other Solr features, it is acceptable (and
extremely commonplace) to use fully tokenized and even multi-value
fields, but you must use a single-value simple field type for uniqueKey,
to avoid ANY possibility of duplicate uniqueKey field values resulting
from multiple inputs.  If you use TextField for a uniqueKey, you're
likely to have problems.  Only StrField or one of the numeric types
should be used for that field.
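A minimal schema.xml sketch of that advice for this particular index, assuming each document holds exactly one ngram, so the raw ngram text can serve as its own key: `ngram` becomes a StrField-backed `string` field and the uniqueKey, and a copy of it goes into a separate tokenized field for shingle-based matching. The `ngram_text` field name is hypothetical; `textSpell` is the fieldType from the original post.

```xml
<!-- Simple, untokenized type for the key (StrField, as recommended above).
     Reuse an existing "string" fieldType if the schema already defines one. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- The whole ngram string is indexed as a single term, so each distinct
     ngram maps to exactly one document. -->
<field name="ngram" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>

<!-- Hypothetical tokenized copy for searching; textSpell is unchanged. -->
<field name="ngram_text" type="textSpell" indexed="true" stored="false"/>

<!-- Copy the raw key value into the analyzed field at index time. -->
<copyField source="ngram" dest="ngram_text"/>

<uniqueKey>ngram</uniqueKey>
```

This is a fragment, not a complete schema; in a real schema.xml the elements go in their usual places (for example inside `<types>` and `<fields>`). Under this layout, queries that need shingle matching would target `ngram_text` rather than `ngram`.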

Thanks,
Shawn


Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Nitin Solanki <ni...@gmail.com>.
Thanks, and sorry about Stack Overflow. You are saying to use the "string"
type, but I have used solr.ShingleFilterFactory as a filter to break a
string into ngrams.
I want to build query correction just like Google's "Did you mean".

i) I am storing ngrams in the gram field, which is the only field I have
in Solr. The ngrams (1- to 5-grams) come from Wikipedia dump data.
ii) I am using the suggester component to get suggestions for the searched
query/words. The suggester gives suggestions for a word by evaluating the
documents, and the suggested words are sorted by freq, as they should be.
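For reference, a SuggestComponent setup of the kind described in ii) might look like the solrconfig.xml fragment below. This is a hypothetical sketch, not the configuration in the attached solrconfig.xml (which is not included in this archive): it assumes suggestions are drawn from the stored `ngram` field, that `frequency` is a stored numeric field used as the weight, and that the suggester is named `ngramSuggester`.

```xml
<!-- Hypothetical suggester: entries come from stored ngram values,
     weighted by the stored frequency field. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">ngramSuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">ngram</str>
    <str name="weightField">frequency</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<!-- Request handler that runs only the suggest component. -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">ngramSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With a fragment like this, the dictionary would be built once with `/suggest?suggest.build=true`, and each lookup sent as `/suggest?suggest.q=...`.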

Right now, I have 600MB of indexed data.
Example: when I apply my algorithm to the input query "what is ago of salman
khn", it corrects the query to "what is age of salman khan", but the
processing takes 10 seconds. This is because I call the Solr API many
times to get suggestions for each word (building candidate queries from
unigrams up to 5-grams to check). A single input query requires
approximately 1500 calls to Solr. How can I reduce this, or make Solr
return suggestions faster? The average QTime for a single request to
Solr is 22 ms.

I have attached schema.xml and solrconfig.xml. Please check them and give
me your suggestions.
Waiting for your reply.




On Tue, Jan 20, 2015 at 5:25 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> I already replied to you on Stack Overflow, but your response there and the
> schema.xml definition here contradict each other.
>
> You are using a tokenized textSpell field as the unique key. As I
> mentioned on Stack Overflow, it is a bad idea. Yes, it will impact
> performance as well as lead to duplicate documents. Switch to a "string" or
> int/long field and you should be fine regardless of what it is named.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Issue : Replacing ID with another will degrade performance in Solr?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I already replied to you on Stack Overflow, but your response there and the
schema.xml definition here contradict each other.

You are using a tokenized textSpell field as the unique key. As I
mentioned on Stack Overflow, it is a bad idea. Yes, it will impact
performance as well as lead to duplicate documents. Switch to a "string" or
int/long field and you should be fine regardless of what it is named.
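A minimal schema.xml sketch of that switch, assuming a new key field named `id` (the name is arbitrary, as noted above) that the indexing client fills with one unique value per ngram document, while `ngram` keeps its tokenized `textSpell` type from the original post for searching:

```xml
<!-- Untokenized type for the key; reuse an existing "string" type if present. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Key field: its value is indexed as a single term, so it cannot be
     split into shingles or unigrams by analysis. -->
<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>

<!-- Unchanged from the original post: still tokenized for shingle matching. -->
<field name="ngram" type="textSpell" indexed="true" stored="true"
       required="true" multiValued="false"/>

<uniqueKey>id</uniqueKey>
```

The same layout works with an int/long type in place of `string`, as long as the client supplies a distinct value for every document.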



-- 
Regards,
Shalin Shekhar Mangar.